Sports Image Classification
Built an image classification model that predicts the sport shown in an image.
Have you ever wondered how machines can recognize sports just by looking at images? Well, it's all thanks to the power of Convolutional Neural Networks (CNNs). In this article, we will explore how CNNs can be used to classify sports images and compare the performance of different CNN architectures.
Dataset
The dataset used for this project is a collection of images representing 100 different types of sports and activities. The sports range from traditional sports like "archery", "arm wrestling", "bowling", "football", "water polo", "weightlifting" to non-traditional ones like "wingsuit flying" and "nascar racing". The goal is to predict the correct sport based on the image.
The dataset consists of 13572 train, 500 test, and 500 validation images. Each image is of size 224 x 224 pixels and has been segregated into train, test, and valid directories. The dataset can be downloaded from Kaggle.
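As a rough illustration, the three splits could be loaded with Keras along the following lines. This is a minimal sketch, not the project's actual loading code: it assumes each split directory contains one sub-folder per sport, and the `load_split` helper name and the `data_dir` argument are hypothetical. The batch size of 32 is the value used for training in the experiments.

```python
from pathlib import Path

import tensorflow as tf

IMAGE_SIZE = (224, 224)  # image size stated in the dataset description
BATCH_SIZE = 32          # batch size used for training


def load_split(data_dir, split, shuffle=False):
    """Load one split ("train", "test" or "valid") as a tf.data.Dataset.

    Assumes a layout of <data_dir>/<split>/<sport name>/<image files>,
    which is an assumption about how the directories are organised.
    """
    return tf.keras.utils.image_dataset_from_directory(
        str(Path(data_dir) / split),
        labels="inferred",   # class labels come from the sub-folder names
        label_mode="int",    # integer labels, as sparse cross-entropy expects
        image_size=IMAGE_SIZE,
        batch_size=BATCH_SIZE,
        shuffle=shuffle,
    )
```

For example, `train_ds = load_split("data", "train", shuffle=True)` would return batches of images with integer labels, and `train_ds.class_names` would list the sport names in label order.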
Experiments
To classify the sports images, we compared the performance of five CNN architectures: a custom CNN and four models pre-trained on ImageNet (InceptionV3, ResNet50V2, MobileNetV2, and EfficientNetB3). For each pre-trained model, we started from the ImageNet weights and added two hidden layers of 256 and 128 neurons respectively, with leaky-relu activations. Dropout layers with p=0.1 were added to prevent overfitting. Training ran for at most 50 epochs, with early stopping using a patience of 2 epochs. A batch size of 32 was used for training, and Sparse Categorical Cross-Entropy was used as the loss function.
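The setup described above could look roughly like this in Keras. It is a sketch under stated assumptions rather than the project's actual code: the `build_classifier` helper and its `train_backbone` flag are hypothetical, placing dropout after each hidden layer is a guess, the early stopping callback monitoring validation loss is assumed, and backbone-specific input preprocessing is left out.

```python
import tensorflow as tf

NUM_CLASSES = 100            # number of sports in the dataset
IMAGE_SHAPE = (224, 224, 3)

# Pre-trained backbones named in the article, mapped to Keras constructors.
BACKBONES = {
    "InceptionV3": tf.keras.applications.InceptionV3,
    "ResNet50V2": tf.keras.applications.ResNet50V2,
    "MobileNetV2": tf.keras.applications.MobileNetV2,
    "EfficientNetB3": tf.keras.applications.EfficientNetB3,
}


def build_classifier(name, weights="imagenet", train_backbone=False):
    """Backbone + two hidden layers (256, 128) + 100-way softmax."""
    backbone = BACKBONES[name](
        include_top=False, weights=weights,
        input_shape=IMAGE_SHAPE, pooling="avg",
    )
    # False: only the new dense layers are trained; True: all layers are.
    backbone.trainable = train_backbone

    # Note: each backbone expects its own input scaling; that step is
    # omitted here and would have to match the chosen backbone.
    inputs = tf.keras.Input(shape=IMAGE_SHAPE)
    x = backbone(inputs)
    for units in (256, 128):
        x = tf.keras.layers.Dense(units)(x)
        x = tf.keras.layers.LeakyReLU()(x)
        x = tf.keras.layers.Dropout(0.1)(x)  # placement is an assumption
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        metrics=["accuracy"],
    )
    return model


# Stop once the monitored value has not improved for 2 epochs.
# Monitoring "val_loss" is an assumption; the article only gives the patience.
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2)
```

Training would then be a call such as `model.fit(train_ds, validation_data=valid_ds, epochs=50, callbacks=[early_stopping])`, where `train_ds` and `valid_ds` are placeholder names for the training and validation datasets.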
Custom CNN
We started with a baseline custom CNN model with 3 convolution layers and 3 dense layers. A kernel of size 3 x 3 was used for all the convolution layers. Training the model with an Adam optimizer with a learning rate of 0.001 for 47 epochs yielded an Accuracy of 56.44%, F1-Score of 48.48%, and an ROC-AUC Score of 49.46%.
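A baseline of this shape might be written as follows. Only the layer counts and the 3 x 3 kernels come from the description above. The filter counts, the ReLU activations in the convolution layers, the pooling layers, the pixel rescaling, and treating the output layer as the third dense layer are all assumptions made for illustration, and `build_custom_cnn` is a hypothetical name.

```python
import tensorflow as tf


def build_custom_cnn(num_classes=100, image_shape=(224, 224, 3)):
    """Baseline CNN: 3 convolution layers (3 x 3 kernels) and 3 dense layers."""
    layers = tf.keras.layers
    model = tf.keras.Sequential([
        tf.keras.Input(shape=image_shape),
        layers.Rescaling(1.0 / 255),                 # assumed input scaling
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D(),
        layers.GlobalAveragePooling2D(),             # assumed; flattening also works
        layers.Dense(256),
        layers.LeakyReLU(),
        layers.Dense(128),
        layers.LeakyReLU(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        metrics=["accuracy"],
    )
    return model
```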
InceptionV3
The InceptionV3 model was initialized with pre-trained ImageNet weights. Only the Dense layers were fine-tuned. Training the model with an Adam optimizer with a learning rate of 0.001 for 22 epochs yielded an Accuracy of 68.92%, F1-Score of 64.92%, and an ROC-AUC Score of 66.64%.
ResNet50V2
The ResNet50V2 model was initialized with pre-trained ImageNet weights. Only the Dense layers were fine-tuned. Training the model with an Adam optimizer with a learning rate of 0.001 for 16 epochs yielded an Accuracy of 72.88%, F1-Score of 70.67%, and an ROC-AUC Score of 69.72%.
MobileNetV2
The MobileNetV2 model was initialized with pre-trained ImageNet weights, and all the layers were fine-tuned. Training the model with an Adam optimizer with a learning rate of 0.001 for 8 epochs yielded an Accuracy of 86.68%, F1-Score of 86.79%, and an ROC-AUC Score of 88.36%.
EfficientNetB3
The EfficientNetB3 model was initialized with pre-trained ImageNet weights, and all the layers were fine-tuned. Training the model with an Adam optimizer with a learning rate of 0.001 for 18 epochs yielded an Accuracy of 92.72%, F1-Score of 91.76%, and an ROC-AUC Score of 96.92%.
Results
The best results were obtained using a fine-tuned EfficientNetB3 model, which achieved an Accuracy of 92.72%, F1-Score of 91.76%, and an ROC-AUC Score of 96.92%. The results from all the models have been summarized in the table below:
| Model | Accuracy (%) | F1-Score (%) | ROC-AUC Score (%) |
|---|---|---|---|
| Custom CNN | 56.44 | 48.48 | 49.46 |
| InceptionV3 (fine-tuned) | 68.92 | 64.92 | 66.64 |
| ResNet50V2 (fine-tuned) | 72.88 | 70.67 | 69.72 |
| MobileNetV2 (fine-tuned) | 86.68 | 86.79 | 88.36 |
| EfficientNetB3 (fine-tuned) | 92.72 | 91.76 | 96.92 |
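The three metrics in the table can be computed from a model's predicted class probabilities with scikit-learn, for instance as sketched below. The article does not say how F1 and ROC-AUC were averaged over the 100 classes, so the macro averaging and the one-vs-rest scheme used here are assumptions, and `evaluate` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score


def evaluate(y_true, y_prob):
    """Return accuracy, F1 and ROC-AUC as percentages.

    y_true: integer class labels, shape (n_samples,)
    y_prob: predicted probabilities, shape (n_samples, n_classes),
            with each row summing to 1
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    y_pred = y_prob.argmax(axis=1)           # most likely class per image
    labels = np.arange(y_prob.shape[1])

    return {
        "accuracy": 100 * accuracy_score(y_true, y_pred),
        # Macro averaging is an assumption; every class weighs the same.
        "f1": 100 * f1_score(y_true, y_pred, labels=labels,
                             average="macro", zero_division=0),
        # One-vs-rest ROC-AUC; every class must occur in y_true.
        "roc_auc": 100 * roc_auc_score(y_true, y_prob, labels=labels,
                                       multi_class="ovr", average="macro"),
    }
```

With a Keras model, `y_prob` would typically come from `model.predict` on the test images.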
Deploying the Model
A web app was made using Streamlit to make predictions for new images using the best model. The live app can be viewed here.
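One small piece such an app needs is turning the model's output vector into readable predictions. The helper below is a hypothetical sketch of that step, not code taken from the app: it assumes the model returns one probability per sport and that `class_names` lists the sports in the same order as the model's outputs.

```python
import numpy as np


def top_k_predictions(probabilities, class_names, k=3):
    """Return the k most likely sports as (name, probability) pairs."""
    probabilities = np.asarray(probabilities, dtype=float)
    # Indices sorted from highest to lowest probability, truncated to k.
    order = np.argsort(probabilities)[::-1][:k]
    return [(class_names[i], float(probabilities[i])) for i in order]
```

For instance, `top_k_predictions([0.1, 0.7, 0.2], ["archery", "bowling", "football"], k=2)` returns `[("bowling", 0.7), ("football", 0.2)]`.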
Run Locally
All the code for this project can be found in this GitHub repository. To run the app locally, you can follow the instructions provided in the repository.
- Install required libraries:
pip install -r streamlit/requirements.txt
- Fine-tune models:
python sports-clf-custom-cnn.py
python sports-clf-inception.py
python sports-clf-resnet.py
python sports-clf-mobilenet.py
python sports-clf-efficientnet.py
- Generate predictions:
python sports-clf-final-predictions.py
Conclusion
In conclusion, we have seen how CNNs can be used to classify images of 100 different sports. We compared the performance of five CNN architectures and found that the fine-tuned EfficientNetB3 model achieved the best results, with an Accuracy of 92.72% on this dataset. A model like this could be useful in applications such as sports analytics, sports broadcasting, and sports betting.