Once Upon a Time in CIFAR-10
CIFAR-10 is one of the most popular image datasets in computer vision. It consists of 60,000 32x32 color images in 10 classes, split into 50,000 training images and 10,000 test images. Figure 1 shows the labels and some sample images from CIFAR-10. There are tons of articles about CIFAR-10: some use it to introduce computer vision and deep learning, and some use it to benchmark a new image-classification algorithm. One important thing to keep in mind, however, is the human-level accuracy and the state-of-the-art accuracy on CIFAR-10, which serve as anchors for judging the focus or the methods of such an article. This short post briefly summarizes the human-level accuracy and the state-of-the-art accuracy on CIFAR-10, and offers some general practices for developing solutions for CIFAR-10 or reviewing articles about it.
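As a quick sanity check, the split sizes above are easy to verify yourself. Here is a minimal sketch using the Keras dataset loader (assuming TensorFlow is installed); it simply loads CIFAR-10 and prints the shapes of the training and test splits.

```python
# Minimal sketch: load CIFAR-10 via Keras and confirm the split sizes.
# Assumes TensorFlow is installed; the loader downloads the data on first use.
from tensorflow.keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print(x_train.shape)                # (50000, 32, 32, 3) -> 50,000 32x32 color training images
print(x_test.shape)                 # (10000, 32, 32, 3) -> 10,000 test images
print(len(set(y_train.flatten())))  # 10 classes
```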
The Human-Level Accuracy, The State-of-the-Art Accuracy, and The Quality of CIFAR-10
The error rate of a human on CIFAR-10 is estimated to be around 6%, which means that a model achieving above 94% accuracy can be regarded as reaching super-human performance. According to paperswithcode.com, the best models reach about 99% accuracy on CIFAR-10, as shown in Figure 2.
Roughly speaking, a simple convolutional neural network with a proper training strategy can achieve accuracy between 80% and 90%. Old-school classic models such as ResNet and DenseNet can reach accuracy between 90% and 95%, and new-school classic models such as ViT can reach accuracy above 95%.
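For reference, a "simple convolutional neural network" in this sense might look like the Keras sketch below. The layer widths are illustrative choices, not a recipe from any of the cited sources, and actually landing in the 80-90% range also depends on training details such as data augmentation and the learning-rate schedule.

```python
# A small illustrative CNN for CIFAR-10 in Keras; layer sizes are arbitrary choices.
# With standard augmentation and enough epochs, models of roughly this size
# typically land somewhere in the 80-90% accuracy range.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_simple_cnn(num_classes=10):
    model = models.Sequential([
        layers.Input(shape=(32, 32, 3)),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```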
However, the quality of CIFAR-10 itself might be a fundamental problem for benchmarking. First, some researchers report that many popular datasets contain label errors; in particular, the test set of CIFAR-10 contains about 0.54% label errors. These label errors might compromise, or incorrectly boost, the measured performance of a model. Some label errors in CIFAR-10 are shown in Figure 3. Second, another quality issue is that 3.3% of the CIFAR-10 test images and 10% of the CIFAR-100 test images have duplicates in the corresponding training set. As a result, a model might memorize some images and incorrectly boost its performance. Some duplicated and near-duplicated images in these datasets are shown in Figure 4. The authors of the above studies have also published corrected versions of CIFAR-10 for evaluation.
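To get a feel for the duplicate issue, here is a simplified sketch that checks for exact pixel-level duplicates between the training and test sets by hashing the raw image bytes. This is only a toy illustration: the study cited above measures near-duplicates (crops, shifts, recolorings), which exact byte-level matching will miss.

```python
# Toy check for exact duplicates between the CIFAR-10 train and test sets.
# Exact hashing misses near-duplicates, which is what the cited study measures,
# so treat this as an illustration of the idea rather than a reproduction of it.
import hashlib
from tensorflow.keras.datasets import cifar10

(x_train, _), (x_test, _) = cifar10.load_data()

train_hashes = {hashlib.md5(img.tobytes()).hexdigest() for img in x_train}
exact_dups = sum(1 for img in x_test
                 if hashlib.md5(img.tobytes()).hexdigest() in train_hashes)

print(f"Test images with an exact duplicate in the training set: {exact_dups}")
```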
Some Technical Articles Using CIFAR-10
With the above numbers in mind, we could set a standard or an expectation for reviewing a technical article using CIFAR-10. Here are a few examples for reference.
- TensorFlow/Keras Official Tutorial on Image Classification: This article from tensorflow.org covers data processing and model building with TensorFlow/Keras for image classification on CIFAR-10. The accuracy of the model is 72%. Note: The focus of this article is the whole process of image classification with TensorFlow/Keras, rather than good performance on CIFAR-10.
- PyTorch Official Tutorial on Image Classification: This article from pytorch.org covers data processing and model building with PyTorch for image classification on CIFAR-10. The accuracy of the model is 54%. Note: The focus of this article is the whole process of image classification with PyTorch, rather than good performance on CIFAR-10. However, the performance of the showcase example is well below the norm.
- Tutorial on Image Classification from Machine Learning Mastery: This article introduces a baseline model in TensorFlow/Keras on CIFAR-10 and then describes several mechanisms to improve its performance. The accuracy of the baseline model is 67%, while the accuracy of the advanced model is 88%.
- Image classification with ConvMixer: This article from keras.io illustrates one case in the emerging trend of combining the mechanisms of convolutional neural networks and vision transformers for computer vision applications. Because the article only demonstrates the core idea of ConvMixer rather than fully training a model for CIFAR-10, the reported accuracy is 83%; the original work it refers to reaches accuracy above 90%. A minimal sketch of the ConvMixer block appears after this list.
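For readers unfamiliar with ConvMixer, the sketch below shows the basic block structure described in the keras.io article and the original paper: a depthwise convolution with a residual connection, followed by a pointwise (1x1) convolution, each with GELU activation and batch normalization. The hyperparameters here are placeholders, not the settings used in either source.

```python
# A minimal sketch of a ConvMixer block in Keras.
# dim and kernel_size are illustrative placeholders, not the published settings.
import tensorflow as tf
from tensorflow.keras import layers

def conv_mixer_block(x, dim, kernel_size=5):
    # Depthwise convolution with a residual connection (spatial mixing)
    residual = x
    x = layers.DepthwiseConv2D(kernel_size, padding="same")(x)
    x = layers.Activation("gelu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Add()([x, residual])
    # Pointwise (1x1) convolution (channel mixing)
    x = layers.Conv2D(dim, kernel_size=1)(x)
    x = layers.Activation("gelu")(x)
    x = layers.BatchNormalization()(x)
    return x
```

In the full model, a strided "patch embedding" convolution precedes a stack of these blocks, followed by global average pooling and a classification head.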
Conclusions
CIFAR-10 is a good dataset for beginners to work with. But it is critical to know what a simple convolutional neural network can do and what old-school and new-school classic models can do. It is also important to know how researchers identify the quality issues of CIFAR-10 and whether such approaches can be applied to other datasets.
Thanks for reading. Please feel free to leave your comments.