From CNN to ViT: Improving Facial Emotion Recognition
🧠 Motivation

My initial attempt at facial expression recognition, completed during my university studies, used a convolutional neural network (CNN). Despite limited resources (a modest GTX 1050 GPU and sparse datasets), I reached an accuracy between 65% and 75%. Although promising, that wasn’t sufficient for real-world applications, particularly for something as sensitive as mental health diagnostics.

Fast forward to today: armed with more experience, better hardware (an RTX 3060), and modern ML frameworks, I revisited this challenge using Vision Transformers (ViT). The goal was clear: surpass 80% accuracy and deepen my expertise in state-of-the-art deep learning techniques. ...