Image recognition using Convolutional Neural Networks

AI is one of the defining buzzwords of the 21st century, and its range of applications is what makes it so intriguing. Today AI shows up in almost every aspect of modern life, from ordering food to self-driving cars. AI can be divided into many subfields, but the two that matter most here are Machine Learning and Deep Learning.

The traditional affair: Machine Learning vs Deep Learning

Machine Learning is a type of AI in which a model learns from data. In traditional machine learning, the features are usually designed by hand and given to the model. If a model is supposed to recognise what is in an image, this approach tends to be less accurate, because there are a ton of features we would have to provide. This is where Deep Learning comes in. Deep Learning is a subset of machine learning in which the features are not provided to the model; the model learns them from the raw data. It is loosely inspired by the human brain.

Imagine a 6-year-old kid. Once the kid has seen a picture of a dog, they can identify dogs for the rest of their life. The kid doesn’t need many encounters with a dog to recognise one in the future. That’s the power of the human brain, and deep learning follows the same idea of learning by example, although in practice a network usually needs many labelled examples rather than just one.

CNN

A deep learning model is a neural network made up of layers of interconnected units called neurons, an arrangement loosely modelled on the structure of the human brain. A convolutional neural network (CNN or ConvNet) is a deep learning architecture that learns directly from data. CNNs are very helpful for recognising objects, classes, and categories in photos because they look for patterns in the images. Today, we will see how to build binary and multiclass classifiers in Keras, starting from simple fully connected networks.
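To make "looking for patterns" a bit more concrete, here is a minimal NumPy sketch of the sliding-window operation a convolutional layer is built on. The function name, the toy image, and the hand-picked kernel are all illustrative assumptions; in a real CNN the kernel values are learned during training, and deep learning libraries compute this (technically a cross-correlation) far more efficiently.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the kernel with the patch underneath it and sum the result
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 4x4 image: dark left half (0), bright right half (1)
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Hypothetical hand-picked kernel that responds to a dark-to-bright vertical edge
kernel = np.array([
    [-1, 1],
    [-1, 1],
], dtype=float)

response = convolve2d_valid(image, kernel)
print(response)
```

The response is non-zero only in the middle column, which is exactly where the image switches from dark to bright. A convolutional layer applies many such kernels at once and stacks the resulting maps.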

About Keras

Keras is a high-level, open-source neural network library written in Python. It provides a user-friendly API for defining and training deep learning models. It was created by François Chollet, an engineer at Google, and acts as an interface for the TensorFlow library, which provides the low-level operations for training and running neural networks. With its ease of use and built-in support for common neural network architectures, Keras has quickly become a popular choice for deep learning practitioners.

Binary Classification

In binary classification, a model is trained on a labeled dataset, where each instance is assigned a label of 0 or 1, and the model's goal is to learn a mapping from input features to the binary outputs. Once trained, the model can be used to make predictions on new, unseen instances. Binary classification is used to solve problems such as spam detection, image classification (e.g., is this picture a cat or not), and medical diagnosis (e.g., does a patient have a disease or not).

A simple binary classifier can be implemented in Keras as follows.

# Import necessary libraries
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense

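# Load the data
# Placeholder: a small synthetic dataset so the script runs end to end.
# Replace this block with your own binary dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)).astype("float32")
y = (X.sum(axis=1) > 0).astype("float32")
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]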

# Define the model
model = Sequential()
model.add(Dense(32, input_dim=X_train.shape[1], activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Fit the model on the training data
history = model.fit(
    X_train, y_train, epochs=10, batch_size=32, 
    validation_data=(X_test, y_test))

# Evaluate the model on the test data
test_loss, test_acc = model.evaluate(X_test, y_test)

# Make predictions on new instances
# (the first five test rows stand in for new data here)
X_new = X_test[:5]
predictions = model.predict(X_new)

In this example, the input data is assumed to be stored in the X_train and X_test arrays, and the binary labels in the y_train and y_test arrays. The input first passes through a fully connected (Dense) layer with 32 units and a ReLU activation function. The output of this layer feeds a single unit with a sigmoid activation function, which outputs a probability in the [0, 1] range; values of 0.5 or more are usually read as class 1.
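For intuition, the forward pass of this two-layer network can be sketched in plain NumPy. This is only an illustration of the arithmetic, not how Keras implements it: the weights are random rather than trained, and the feature count of 20 and the helper names are assumptions.

```python
import numpy as np

def relu(z):
    # Keep positive values, replace negatives with zero
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squash any real number into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    hidden = relu(x @ W1 + b1)         # shape: (batch, 32)
    return sigmoid(hidden @ W2 + b2)   # shape: (batch, 1)

rng = np.random.default_rng(0)
n_features = 20                        # assumed number of input features
x = rng.normal(size=(4, n_features))   # a batch of 4 made-up instances

# Untrained, randomly initialised weights (training would adjust these)
W1 = rng.normal(scale=0.1, size=(n_features, 32))
b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 1))
b2 = np.zeros(1)

probs = forward(x, W1, b1, W2, b2)
labels = (probs >= 0.5).astype(int)    # threshold the probabilities at 0.5
print(probs.ravel(), labels.ravel())
```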

The model is compiled with the Adam optimizer and binary cross-entropy loss, and trained with the fit method. Finally, the evaluate method assesses the model's performance on the test data, and the predict method makes predictions on new instances.
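Binary cross-entropy rewards confident correct predictions and heavily penalises confident wrong ones. Here is a rough NumPy sketch of the quantity being minimised. The function name, the sample values, and the small eps used to avoid log(0) are assumptions made for illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)   # keep log() away from 0
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

y_true = np.array([1.0, 0.0, 1.0, 0.0])

mostly_right = np.array([0.9, 0.1, 0.8, 0.2])   # high probability on the true class
mostly_wrong = np.array([0.1, 0.9, 0.2, 0.8])   # high probability on the wrong class

loss_right = binary_cross_entropy(y_true, mostly_right)
loss_wrong = binary_cross_entropy(y_true, mostly_wrong)
print(loss_right, loss_wrong)
```

The first loss comes out much smaller than the second, and that difference is the signal the optimizer uses to push the weights in the right direction.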

Multiclass Classification

The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits that is frequently used to train and test image processing and machine learning systems. It contains 60,000 training images and 10,000 test images of handwritten digits (0-9). Multiclass classification on MNIST can be done in Keras by training a deep learning model to recognise which of the ten digits each image shows.

Training process. Image via 3Blue1Brown

# Import necessary libraries
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.datasets import mnist

# Load the data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Preprocess the data: scale pixel values to the range [0, 1]
# (the images stay 28x28 here; the Flatten layer in the model turns each one into 784 values)
X_train = X_train / 255.0
X_test = X_test / 255.0
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Define the model
model = Sequential()
model.add(Flatten(input_shape=(28, 28)))
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Fit the model on the training data
history = model.fit(X_train, y_train, 
                    epochs=10, 
                    batch_size=32,
                    validation_data=(X_test, y_test))

# Evaluate the model on the test data
test_loss, test_acc = model.evaluate(X_test, y_test)

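# Make predictions on new instances
# (the first five test images stand in for new data here)
X_new = X_test[:5]
predictions = model.predict(X_new)
predicted_digits = np.argmax(predictions, axis=1)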

In this example, the data is loaded with the mnist.load_data function, which returns the training and test images and their labels as arrays. Each 28 x 28 image is turned into a vector of 784 pixel values before it reaches the Dense layers, and the pixel values are scaled to lie between 0 and 1. The digit labels (0-9) are one-hot encoded using the to_categorical function.
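If one-hot encoding is new to you, this small NumPy sketch shows roughly what to_categorical produces for a few digit labels. The helper name one_hot and the sample labels are made up for illustration.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1 at the label's index."""
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

digits = np.array([5, 0, 9])          # three example labels
encoded = one_hot(digits, 10)
print(encoded)

# Taking the argmax of each row recovers the original labels
recovered = np.argmax(encoded, axis=1)
```

Each row has ten entries, all zero except for a 1 at the position of the digit, which matches the ten-unit output layer of the model.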

The model consists of a flattening layer to convert the 2D image data into a 1D array, followed by two fully connected (Dense) layers: a hidden layer with 128 units and a ReLU activation, and an output layer with 10 units, one for each possible digit. The output layer uses a softmax activation function to produce a probability for each class.
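Softmax turns the ten raw scores of the output layer into probabilities that add up to 1. Here is a minimal NumPy sketch of the idea, with made-up scores. Subtracting the maximum first is a common numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1 along the last axis."""
    shifted = scores - np.max(scores, axis=-1, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical raw scores for the digits 0-9; the score for digit 3 is the largest
scores = np.array([0.5, -1.0, 0.2, 4.0, 0.0, 1.5, -0.5, 0.3, 2.0, -2.0])
probs = softmax(scores)

predicted_digit = int(np.argmax(probs))
print(probs.round(3), predicted_digit)
```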

The model is compiled with categorical cross-entropy loss (as it is a multiclass classification example) and the Adam optimizer, and trained using the fit method. Finally, the model's performance on the test data is evaluated using the evaluate method, and predictions can be made on new instances using the predict method.
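The model above uses only Dense layers. As a rough sketch of how a convolutional version might look, the snippet below swaps in Conv2D and MaxPooling2D layers. The function name build_cnn, the number of filters, and the kernel sizes are illustrative assumptions rather than tuned choices, and a zero-filled batch stands in for real images so that the snippet runs without downloading anything.

```python
import numpy as np
import keras
from keras import layers

def build_cnn(num_classes=10):
    """Small convolutional classifier for 28x28 grayscale images (illustrative sizes)."""
    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),                # height, width, channels
        layers.Conv2D(32, (3, 3), activation="relu"),  # 32 learnable 3x3 filters
        layers.MaxPooling2D((2, 2)),                   # halve the spatial resolution
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

cnn_model = build_cnn()

# Conv2D expects a trailing channel axis, so 28x28 images become 28x28x1.
# A zero-filled batch of two "images" stands in for real data here.
dummy_batch = np.zeros((2, 28, 28), dtype="float32")[..., np.newaxis]
dummy_probs = cnn_model.predict(dummy_batch, verbose=0)
print(dummy_probs.shape)
```

To train it on MNIST, the images would need the extra channel axis, for example X_train.reshape(-1, 28, 28, 1), with the same scaling and one-hot encoded labels as before.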

Congrats on reading this far!

In conclusion, Convolutional Neural Networks have revolutionized the field of computer vision and image classification. They are designed to capture the spatial hierarchies and correlations in image data, making them well suited for tasks such as object detection and image segmentation. With their ability to learn increasingly complex features and representations, they have achieved state-of-the-art results on a wide range of computer vision benchmarks. The flexibility and ease of implementation of CNNs in popular deep learning frameworks, such as Keras, make it easier for practitioners to apply these models to real-world problems. With the ongoing advancements in hardware and research, we can expect to see even more exciting applications and breakthroughs in the future. Whether you are a beginner or a seasoned practitioner, I highly recommend exploring the exciting world of Convolutional Neural Networks.
