Akash Kumar

I am a fifth-year PhD student at the Center for Research in Computer Vision (CRCV), University of Central Florida (UCF), under the supervision of Prof. Yogesh Singh Rawat.

I have a broad interest in deep learning and computer vision. My current research mainly focuses on limited label understanding for multimodal and unimodal dense video tasks.

Looking for research intern/full-time positions (Summer'25)! Feel free to drop me an email.

Email  /  Google Scholar  /  Github  /  LinkedIn  /  CV

Intern Experience
Applied Scientist Intern
Visual Research, Palo Alto, USA. Summer 2024
Host: Shan Yang, Junbang Liang

Towards open-vocabulary video object understanding.

Publications
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding
Akash Kumar, Zsolt Kira, Yogesh Singh Rawat
Under review

First foundation model adaptation for a dense multimodal video detection task without any labels. A context-aware, self-paced progressive scene learning approach.

Stable Mean Teacher for Semi-Supervised Video Action Detection
Akash Kumar, Sirshapan Mitra, Yogesh Singh Rawat
Association for the Advancement of Artificial Intelligence (AAAI), 2025
paper  /  code

Learns from mistakes on the labeled set and transfers that learning to pseudo-labels from the unlabeled set to enhance spatio-temporal localization. Adds a class-agnostic spatio-temporal refinement module and a temporal coherency constraint for better spatio-temporal localization.

Semi-supervised Active Learning for Video Action Detection
Ayush Singh, Aayush J Rana, Akash Kumar, Shruti Vyas, Yogesh Singh Rawat
Association for the Advancement of Artificial Intelligence (AAAI), 2024
paper  /  code

High-pass filtering for enhanced pseudo-labels to improve spatio-temporal localization. A simple sample augmentation strategy for informative sample selection.

End-to-End Semi-Supervised Learning for Video Action Detection
Akash Kumar, Yogesh Singh Rawat
Computer Vision and Pattern Recognition Conference (CVPR), 2022
paper  /  code

First end-to-end semi-supervised approach for the video action detection task. Short-term and long-term smoothness constraints to exploit spatio-temporal coherency.

Service
Reviewer, IEEE Transactions on Image Processing
Reviewer, CVPR 2023, 2024, 2025
Reviewer, ICLR 2023, 2024, 2025
Reviewer, ECCV 2022, 2024
Reviewer, ICCV 2023
Reviewer, NeurIPS 2023, 2024

Feel free to steal this website's source code. Do not scrape the HTML from this page itself, as it includes analytics tags that you do not want on your own website — use the github code instead. Also, consider using Leonid Keselman's Jekyll fork of this page.