Build a CNN for MNIST Image Classification Using Keras
In the last lesson, we explored the basics of Convolutional Neural Networks (CNNs), which are powerful tools for image processing. We learned how CNNs use filters to detect patterns like edges, textures, and shapes in images. We also discussed the roles of convolutional layers, pooling layers, and fully connected layers in building a CNN. If you missed it, I recommend revisiting Lesson 7.1 to understand the foundation before diving into this tutorial.
Now, let’s move forward and apply what we’ve learned to build a CNN for image classification using the MNIST dataset.
Use Case: Building a CNN for MNIST Image Classification
I recently worked on a project where I needed to classify handwritten digits from the MNIST dataset. The goal was to build a model that could accurately identify digits from 0 to 9. This task is a classic example of image classification, and CNNs are perfect for it.
I started by preparing the dataset, defining the CNN architecture, and training the model. Along the way, I faced challenges like overfitting and slow training times, which I solved by tweaking the model and adjusting hyperparameters. By the end, I achieved an accuracy of over 99% on the test set. Let me walk you through the steps I took to accomplish this.
Step 1: Preparing the MNIST Dataset
The MNIST dataset contains 60,000 training images and 10,000 test images of handwritten digits. Each image is a 28x28 grayscale array of pixel values, which makes the dataset easy to work with.
First, I loaded the dataset using Keras, which provides a built-in function to fetch MNIST data. Here’s how I did it:
from tensorflow.keras.datasets import mnist
# Load the dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
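Before doing any preprocessing, it is worth confirming what the loader returns. This quick sanity check is optional; the printed shapes reflect the standard MNIST split described above:
# Inspect the raw arrays: images are 28x28, labels are integers 0-9
print(x_train.shape)  # (60000, 28, 28)
print(x_test.shape)   # (10000, 28, 28)
print(y_train[:5])    # first five labels, e.g. [5 0 4 1 9]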
Next, I reshaped the data to add a channel dimension, which is required for convolutional layers. Since the images are grayscale, the channel dimension is 1.
# Reshape the data
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
Finally, I scaled the pixel values from their original [0, 255] range down to [0, 1]; keeping the inputs in a small, consistent range helps gradient-based training converge faster and more stably.
# Normalize the data
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
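One last check, again optional, confirms the preprocessing produced exactly what the convolutional layers expect: a four-dimensional float tensor with values between 0 and 1.
# Verify shape, dtype, and value range after preprocessing
print(x_train.shape)                 # (60000, 28, 28, 1)
print(x_train.dtype)                 # float32
print(x_train.min(), x_train.max())  # 0.0 1.0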
Step 2: Defining the CNN Architecture
With the data ready, I defined the CNN architecture using Keras. The model consists of convolutional layers, pooling layers, and fully connected layers.
Here’s the code I used:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
# Define the model
model = Sequential()
# Add convolutional layers
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Flatten the output and add fully connected layers
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))
The first convolutional layer has 32 filters and the second has 64, each using a 3x3 kernel. After each convolutional layer, I added a max-pooling layer to halve the spatial dimensions of the feature maps. Finally, I flattened the output and added two fully connected layers: a hidden layer with 128 units and an output layer with 10 units, one per digit class. The softmax activation on the last layer turns the outputs into class probabilities.
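To see how the spatial dimensions shrink through the network and how many parameters each layer holds, you can print a layer-by-layer summary; model.summary() is part of the standard Keras API:
# Print output shapes and parameter counts per layer
model.summary()
With the default 'valid' padding used above, the feature maps go from 26x26 after the first convolution to 13x13 after pooling, then 11x11 and 5x5, so the Flatten layer emits a 1,600-dimensional vector (5 x 5 x 64) and the whole model has roughly 225,000 trainable parameters.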
Step 3: Compiling and Training the Model
Once the model was defined, I compiled it by specifying the optimizer, loss function, and metrics.
# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
I used the Adam optimizer, which is efficient and works well for most tasks. The loss function is sparse categorical crossentropy, which fits multi-class classification when the labels are plain integers, as the MNIST labels are, rather than one-hot vectors.
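As a side note, sparse categorical crossentropy is not the only option. If you prefer one-hot encoded labels, Keras provides to_categorical and the corresponding categorical_crossentropy loss. The lines below are a sketch of that alternative, not what I used in this project:
from tensorflow.keras.utils import to_categorical
# One-hot encode the integer labels, e.g. 3 -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_train_onehot = to_categorical(y_train, num_classes=10)
y_test_onehot = to_categorical(y_test, num_classes=10)
# With one-hot labels you would compile and train like this instead:
# model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train_onehot, epochs=5, batch_size=128, validation_split=0.1)
For the rest of this tutorial I stick with the integer labels and the sparse loss.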
Next, I trained the model on the training data:
# Train the model
model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.1)
I trained the model for 5 epochs with a batch size of 128. I also set aside 10% of the training data for validation; Keras takes this validation slice from the end of the training arrays before shuffling.
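Earlier I mentioned fighting overfitting. One standard remedy, sketched below with assumed hyperparameters rather than the exact values I ended up using, is to add a Dropout layer before the output and let an EarlyStopping callback halt training once the validation loss stops improving. Note that running this sketch rebuilds and retrains the model from scratch:
from tensorflow.keras.layers import Dropout
from tensorflow.keras.callbacks import EarlyStopping
# Variant of the model with dropout in the dense head
# (the 0.5 rate is a common default, not a tuned value)
model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),  # randomly zero half the activations during training
    Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Stop when validation loss hasn't improved for 2 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)
model.fit(x_train, y_train, epochs=20, batch_size=128,
          validation_split=0.1, callbacks=[early_stop])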
Step 4: Evaluating Model Performance
After training, I evaluated the model on the test set to check its performance.
# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")
The model achieved an accuracy of over 99%, which is excellent for this task.
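Beyond the aggregate accuracy, it is instructive to inspect individual predictions. model.predict returns a row of ten class probabilities per image, and taking the argmax recovers the predicted digit:
import numpy as np
# Predict class probabilities for the first five test images
probs = model.predict(x_test[:5])
predicted = np.argmax(probs, axis=1)
print("Predicted:", predicted)
print("Actual:   ", y_test[:5])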
Conclusion
In this tutorial, we walked through the steps to build a CNN for image classification using the MNIST dataset. We prepared the data, defined the CNN architecture, trained the model, and evaluated its performance. By following these steps, you can build your own CNN for similar tasks.
In the next lesson, we’ll explore data augmentation and transfer learning, which are techniques to improve model performance and handle more complex datasets. Don’t miss it!