First of all, one key issue in machine learning and deep learning is feature engineering, whatever the data type. Better features lead to a better learning process for classification, regression, and similar tasks. Second, I don't think k-means on raw images works well, because image content is complex: k-means groups points by distance, and the distance between raw pixel vectors says little about what an image actually shows.
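To illustrate the raw-pixel problem, here is a minimal pure-Python sketch. The toy 4x4 images and the intensity-histogram feature are hypothetical choices for illustration only; a histogram is just one simple hand-crafted feature and throws away spatial layout, so it is not a general recommendation. The sketch compares Euclidean distance (the metric standard k-means uses) in raw pixel space against distance in the feature space.

```python
import math


def pixel_distance(a, b):
    # Euclidean distance between two images treated as flat raw-pixel vectors.
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


def intensity_histogram(img, bins=4, max_value=256):
    # Hypothetical hand-crafted feature: fraction of pixels in each intensity bin.
    counts = [0] * bins
    total = 0
    for row in img:
        for v in row:
            counts[v * bins // max_value] += 1
            total += 1
    return [c / total for c in counts]


def feature_distance(a, b):
    # Euclidean distance between the histogram features of two images.
    ha, hb = intensity_histogram(a), intensity_histogram(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ha, hb)))


# Toy grayscale images (hypothetical data):
bar = [[0, 255, 0, 0] for _ in range(4)]          # bright vertical bar
shifted_bar = [[0, 0, 255, 0] for _ in range(4)]  # same bar, one pixel right
gray = [[128] * 4 for _ in range(4)]              # uniform mid-gray patch

if __name__ == "__main__":
    # In raw pixel space the gray patch is closer to the bar than the shifted bar is.
    print("pixels:  ", pixel_distance(bar, shifted_bar), pixel_distance(bar, gray))
    # In feature space the two bars coincide and the gray patch is far away.
    print("features:", feature_distance(bar, shifted_bar), feature_distance(bar, gray))
```

A one-pixel shift leaves the content unchanged, yet in raw pixel space it moves the image farther away than a completely different image. k-means clustering on those vectors would therefore tend to group images by pixel coincidence rather than by content.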
Regards