Getting familiar with neural networks
To check my understanding of this tool I decided to test myself and see whether I can achieve good results on a publicly known data set. One of the best documented databases with published results is MNIST - 28x28 images of handwritten digits 0-9, each tagged with the correct value. The data set is split into 2 parts: 60k training examples and 10k test examples.
Test 1 - Multilayer Perceptron
First I wanted to see how far I can go with the “most basic” neural network - a model called Multilayer Perceptron (MLP). Why is it worth trying? This kind of neural network, even with 1 hidden layer, is said to be able to approximate any nonlinear input-output relation given enough hidden neurons (I also tried this - and it works fine for a Gaussian function or a sine!).
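A minimal sketch of such a sine check, assuming scikit-learn's MLPRegressor. The 20 tanh neurons, the input range and the random seed are illustrative choices, not the original settings.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Sample one full period of the sine function (range chosen for illustration).
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x).ravel()

# One hidden layer only; its size, the activation and the seed are assumptions.
model = MLPRegressor(hidden_layer_sizes=(20,), activation="tanh",
                     solver="lbfgs", max_iter=5000, random_state=0)
model.fit(x, y)

# Mean squared error of the fit on the sampled points.
mse = float(np.mean((model.predict(x) - y) ** 2))
print(f"MSE on the sampled sine: {mse:.6f}")
```

A small error here only shows that one hidden layer can follow a smooth curve on the sampled range; it says nothing yet about generalization.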
To try MLP I used scikit-learn, a well documented Python library. I played around with different parameters (number of layers, number of hidden layer neurons, activation function, solver, some others...) to see how the neural network behaves. The parameters that mattered most were the solver and the hidden layer size.
With the following config, MLPClassifier(hidden_layer_sizes=(38*5,), solver='lbfgs', max_iter=500), I could get as low as 2.2% errors on the test set, which is quite good and within the 1.53-3.05% range listed on the MNIST page for 3-layer neural networks. If my results were significantly better or significantly worse, I would know I most probably did something wrong. That seems not to be the case, so I can move forward.
Note - here my reference is the MNIST results for a similar NN model. If I were predicting on unknown data with unknown results, it would require much more work and more detailed analysis.
Most important things to note:
- Normalize your data correctly (for this purpose I scaled each image separately so that its values look like samples from a Gaussian distribution).
- Set a proper size of the hidden layer - which is said to be somewhere between the input and output layer sizes. The input in this case was 28x28 = 784 values and the output 10 (number of classes); the 38*5 = 190 neurons used above fall in that range.
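The per-image scaling could look roughly like the sketch below. It assumes that "Gaussian" scaling means standardizing each image to zero mean and unit variance; standardize_per_image and the random stand-in data are hypothetical, not the original code.

```python
import numpy as np

def standardize_per_image(images, eps=1e-8):
    """Flatten each image and scale it to zero mean and unit variance on its own."""
    flat = images.reshape(len(images), -1).astype(np.float64)
    mean = flat.mean(axis=1, keepdims=True)
    std = flat.std(axis=1, keepdims=True)
    return (flat - mean) / (std + eps)  # eps guards against constant images

rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28))  # stand-in for MNIST digits
X_scaled = standardize_per_image(fake_images)
print(X_scaled.shape)
```

Each row of X_scaled is one flattened image with 784 values, which is the input shape MLPClassifier expects.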
Test 2 - CNN - Convolutional Neural Network
CNNs are a bit more sophisticated neural networks that apply internal filters, which help to extract patterns. CNNs are used a lot for image classification, like face recognition. They should be better at detecting handwritten digits, right? To check that I used the TensorFlow Python library for 2 reasons - it can run on a GPU and it allows for easy CNN design using the Keras module. There is also a lot of documentation and many interesting articles containing examples of such CNNs in TensorFlow/Keras.
I found that the following model gets good results (down to a 0.8% error rate on the test set when lucky):
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential() # model type
model.add(Conv2D(32, kernel_size=(7,7), strides=(1, 1), activation='relu', input_shape=(28,28,1)))
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss=tf.keras.losses.categorical_crossentropy, optimizer=tf.keras.optimizers.Adam(), metrics=['accuracy'])
# X: training images shaped (n, 28, 28, 1); y_train: one-hot encoded labels shaped (n, 10)
model.fit(X, y_train, batch_size=150, epochs=4, shuffle=True)
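The X and y_train passed to model.fit need a specific shape: a trailing channel axis for the images, and one-hot labels because of the categorical cross-entropy loss. Below is a sketch of one way to prepare them. The helper prepare_for_cnn, the division by 255 and the random stand-in data are assumptions, not necessarily the preprocessing behind the results above.

```python
import numpy as np

def prepare_for_cnn(images, labels, num_classes=10):
    """Return images as (n, 28, 28, 1) floats in [0, 1] and one-hot labels."""
    X = images.astype("float32") / 255.0              # assumed scaling to [0, 1]
    X = X.reshape(-1, 28, 28, 1)                      # add the single gray channel
    y = np.eye(num_classes, dtype="float32")[labels]  # pick one-hot rows by label
    return X, y

rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)  # stand-in data
fake_labels = np.array([0, 1, 2, 7, 8, 9])
X, y_train = prepare_for_cnn(fake_images, fake_labels)
print(X.shape, y_train.shape)
```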
As you can see, the CNN consists of several layers - in this example 2D convolutional layers, which slide 2D filters over the image, separated by MaxPooling layers, which downsample the data by keeping only the strongest value in each window.
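To see what these layers do to the image size, the shapes can be traced by hand. The sketch below relies on the Keras defaults used by the model above (no padding, pooling stride equal to the pool size); conv_out is a hypothetical helper.

```python
def conv_out(size, kernel, stride=1):
    """Output width of an unpadded ('valid') convolution or pooling window."""
    return (size - kernel) // stride + 1

size = 28
size = conv_out(size, 7)        # Conv2D 7x7         -> 22
size = conv_out(size, 2, 2)     # MaxPooling2D 2x2   -> 11
size = conv_out(size, 3)        # Conv2D 3x3         -> 9
size = conv_out(size, 2, 2)     # MaxPooling2D 2x2   -> 4
flat = size * size * 64         # Flatten over 64 filters -> 1024

# Trainable parameters: (inputs per neuron/filter + 1 bias) * number of outputs.
params = {
    "conv1": (7 * 7 * 1 + 1) * 32,
    "conv2": (3 * 3 * 32 + 1) * 64,
    "dense1": (flat + 1) * 512,
    "dense2": (512 + 1) * 10,
}
total = sum(params.values())
print(size, flat, total)
```

Most of the weights sit in the first Dense layer, not in the convolutions.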
I’m not going any deeper here (to understand why it works this way and not another) than simply checking a few articles to see that this is how CNNs are designed for best results and how many people build them.
I’m also quite satisfied with the results - the CNN makes about 2x-3x fewer errors than the MLP (0.8% vs 2.2% at best), and the results are similar to the relevant entries on the MNIST page (0.8%-0.9% test error rate).
I was surprised that with this module the batch size has a big impact on results: changing it to 180 or 90 gave worse results, even when I adjusted the number of epochs. I need to keep this in mind when working on unknown data.
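One thing that changes with the batch size is the number of weight updates per epoch, which may be part of why the results shift. Here is a quick calculation, assuming the full 60k training set is used and the last partial batch is kept (the Keras default); updates_per_epoch is a hypothetical helper.

```python
import math

TRAIN_SIZE = 60_000  # MNIST training examples

def updates_per_epoch(batch_size, n=TRAIN_SIZE):
    """Number of gradient steps in one pass over the data."""
    return math.ceil(n / batch_size)

for batch_size in (90, 150, 180):
    steps = updates_per_epoch(batch_size)
    print(f"batch {batch_size}: {steps} updates per epoch, {4 * steps} in 4 epochs")
```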
Test 3 - modern deep learning for pattern/image recognition
After having results for MLP and CNN I tried to find out if something better exists… and the search was very promising - every year there are contests in which designers compete for a better CNN model. These are very sophisticated models designed to learn to recognize many real-life objects. These models usually have many more than 10 layers (for example InceptionResNetV2 has 572 layers!).
I was happy to find that these models are included in TensorFlow/Keras and need very few adjustments to run on the MNIST dataset. I tried a few of them: Xception, InceptionResNetV2, ResNet50 and InceptionV3. This time calculations took about 15-40 min depending on the number of epochs and the model.
I was deeply disappointed. Calculations took about 10x-100x more time than with the “normal” CNN described above, and the results were on the level of the Multilayer Perceptron and sometimes much worse… I tried changing the number of epochs to avoid overfitting, but couldn’t get decent results. So I think these super-fancy neural networks give good results only on the data sets they were designed for (maybe colors and image size are very important?). I still believe they could give better results, but they probably need more time spent on tuning to learn a different data set. It is also possible that the results were poor due to small batch sizes, but I could not afford bigger batches due to lack of memory (I used 32-64). The best results I achieved were with InceptionV3, which worked with batch size 150. After 3 epochs I got 1.8% errors on the test set.
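These models were built for larger color images, while MNIST is 28x28 grayscale. One possible way to adapt the images is sketched below; it is not necessarily the adjustment used for the runs above. The helper to_rgb_upscaled and the upscaling factor of 3 are assumptions. Each pixel is repeated to enlarge the image and the gray channel is copied three times.

```python
import numpy as np

def to_rgb_upscaled(images, factor=3):
    """Enlarge (n, h, w) grayscale images by pixel repetition and add 3 channels."""
    big = images.repeat(factor, axis=1).repeat(factor, axis=2)
    return np.stack([big, big, big], axis=-1)

fake_images = np.zeros((4, 28, 28), dtype="float32")  # stand-in for MNIST digits
batch = to_rgb_upscaled(fake_images)

# How many times more memory one adapted image takes than the original.
growth = batch[0].nbytes // fake_images[0].nbytes
print(batch.shape, growth)
```

With these assumed settings every image takes 27 times more memory than the original, which fits the memory limits on batch size mentioned above.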
Summary
It seems the best I can use right now, without going deeply into creating my own neural network designs, is a CNN similar to the one described above. It gives good results and trains very fast on a GPU (about 1 minute total time).
Good read
CNN explained
https://towardsdatascience.com/build-your-own-convolution-neural-network-in-5-mins-4217c2cf964f
https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1
History of recent years' deep learning evolution
https://medium.com/comet-app/review-of-deep-learning-algorithms-for-image-classification-5fdbca4a05e2
Building CNN guide
https://towardsdatascience.com/a-guide-to-an-efficient-way-to-build-neural-network-architectures-part-ii-hyper-parameter-42efca01e5d7
Practical Keras Example
https://medium.com/@jon.froiland/python-deep-learning-part-3-9d9e4cf9035c
Many code snippets
https://www.programcreek.com/python/example/100068/keras.applications.resnet50.ResNet50
Design guide for image classification CNN
https://hackernoon.com/a-comprehensive-design-guide-for-image-classification-cnns-46091260fb92
Face detection
https://towardsdatascience.com/how-does-a-face-detection-program-work-using-neural-networks-17896df8e6ff
Image elements detection
https://blog.athelas.com/a-brief-history-of-cnns-in-image-segmentation-from-r-cnn-to-mask-r-cnn-34ea83205de4