Akash Kumar

I am a fifth-year PhD student at the Center for Research in Computer Vision (CRCV), University of Central Florida (UCF), under the supervision of Prof. Yogesh Singh Rawat.

I have a broad interest in deep learning and computer vision. My current research mainly focuses on data-efficient approaches for dense video tasks.

Looking for full-time positions (Jan'26)! Feel free to drop me an email.

Email / Google Scholar / Github / LinkedIn / Resume / Thesis Poster

Updates

May'25: Started internship at Amazon, Bellevue, WA
Apr'25: Received Doctoral Research Support Award from UCF. 💥
Mar'25: Received Student Travel Award to attend ICLR 2025
Mar'25: STPro accepted at CVPR 2025 💥
Feb'25: Received Student Travel Award to attend WACV 2025
Feb'25: Received Student Travel Award to attend AAAI 2025
Jan'25: CoSPaL (First author paper) accepted at ICLR 2025 💥
Jan'25: Selected for Doctoral Consortium at IEEE/CVF WACV 2025 💥
Dec'24: SMT (First author paper) accepted at AAAI 2025 💥
May'24: Started internship at Amazon, Palo Alto, CA
Dec'23: SSL-AL accepted at AAAI 2024 💥
Oct'23: Benchmark-SSL (First author paper) accepted at NeurIPS Self-Supervised Learning Workshop 2023
Mar'22: MAMA-VAD accepted at CVPR Workshops 2022
Mar'22: E2E-SSL (First author paper) accepted at CVPR 2022 💥

Intern Experience

Applied Scientist Intern
Decision Science Technology, Bellevue, WA. Summer 2025
Host: Eduardo Santiago

Spatio-Temporal Anomaly Detection.

Applied Scientist Intern
Visual Shopping Team, Palo Alto, CA. Summer 2024
Hosts: Shan Yang, Junbang Liang, Sampath Chanda

Towards Open-vocabulary video object understanding.

Publications

Below is a selected list of my work (in reverse chronological order); representative papers are highlighted.

Training-free Spatio-Temporal Video Grounding via Multimodal Foundation Models
Akash Kumar, Yogesh Singh Rawat
Under review

Adapting VLMs by leveraging Large Language Models (LLMs) to model composite spatio-temporal relationships.

STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding
Aaryan Garg, Akash Kumar, Yogesh Singh Rawat
Computer Vision and Pattern Recognition Conference (CVPR), 2025
project page / paper / huggingface

Improved the grounding capabilities of VLMs via action composition and complex spatio-temporal scene understanding.

Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding
Akash Kumar, Zsolt Kira, Yogesh Singh Rawat
International Conference on Learning Representations (ICLR), 2025
project page / paper / code / poster / huggingface

Developed the first vision-language model (VLM) approach for a dense multimodal video detection task without any labels. Devised a context-aware, self-paced progressive scene learning strategy.

Stable Mean Teacher for Semi-Supervised Video Action Detection
Akash Kumar, Sirshapan Mitra, Yogesh Singh Rawat
Association for the Advancement of Artificial Intelligence (AAAI), 2025
project page / paper / code / poster / video / huggingface

Learning from mistakes on the labeled set and transferring that knowledge to pseudo-labels on the unlabeled set to enhance spatio-temporal localization. A class-agnostic spatio-temporal refinement module and a temporal coherency constraint further improve localization.

Semi-supervised Active Learning for Video Action Detection
Ayush Singh, Aayush J Rana, Akash Kumar, Shruti Vyas, Yogesh Singh Rawat
Association for the Advancement of Artificial Intelligence (AAAI), 2024
project page / paper / code / poster / video

High-pass filtering to enhance pseudo-labels and improve spatio-temporal localization. A simple sample augmentation strategy for informative sample selection.

Benchmarking self-supervised video representation learning
Akash Kumar, Ashlesha Kumar, Vibhav Vineet, Yogesh Singh Rawat
Neural Information Processing Systems (NeurIPS Workshops), 2023
4th Workshop on Self-supervised Learning: Theory and Practice
project page / paper / poster

The first exhaustive study on the impact of pre-training in self-supervised learning for videos. Proposed a simple knowledge distillation approach that outperforms previous works using 90% fewer videos.

End-to-End Semi-Supervised Learning for Video Action Detection
Akash Kumar, Yogesh Singh Rawat
Computer Vision and Pattern Recognition Conference (CVPR), 2022
project page / paper / code / video

The first end-to-end semi-supervised approach for the video action detection task, using short-term and long-term smoothness constraints to exploit spatio-temporal coherency.

Video Action Detection: Analysing Limitations and Challenges
Rajat Modi, Aayush Rana, Akash Kumar, Praveen Tirupattar, Shruti Vyas, Yogesh Singh Rawat, Mubarak Shah
Computer Vision and Pattern Recognition Conference (CVPR Workshops), 2022
1st Workshop on Vision Datasets Understanding
paper

Developed a new surveillance-based spatio-temporal dataset capturing real-world challenges.

Publications (Funding projects)

Below is a list of my work on funded projects (in reverse chronological order).

Benchmarking Robustness of Gait Recognition Models
Reeshoon Sayera, Sirshapan Mitra, Prudvi Kamtam, Akash Kumar, Yogesh Singh Rawat
Under review

Investigates the robustness of gait recognition models against perturbations and corruptions, focusing on both key components: the parsing model and the gait recognition model.

Gait recognition under limited labels settings: A generalized approach
Sirshapan Mitra, Akash Kumar, Yogesh Singh Rawat
Under review

A versatile solution applicable to all limited-label settings (semi-supervised learning and domain adaptation) via low-dimensional clustering and knowledge distillation.

Gabriella V2: Towards better generalization in surveillance videos for Action Detection
Ishan Dave, Zaccheeus Scheffer, Akash Kumar, Sania Shiraz, Yogesh Singh Rawat, Mubarak Shah
IEEE Winter Conference on Applications of Computer Vision (WACV Workshops), 2022
Human Activity Detection in Multi-Camera Long-Duration Video
paper / video

Proposed a real-time online action detection system for open-world surveillance videos.

Awards / Recognitions

Received Doctoral Research Support Award 2025

Selected for ICLR Student Travel Grant 2025

Selected for AAAI Student Travel Grant 2025

Selected for IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025 Doctoral Consortium

2nd place, 2023 - IARPA BRIAR: Biometric Recognition and Identification at Altitude and Range

2nd place, 2021 - NIST TRECVID ActEV: Activities in Extended Video

1st place, 2021 - PMiss@0.02tfa, ActivityNet ActEV SDL (CVPR)

Selected for 8th Heidelberg Laureate Forum, Germany, 2021

ORCGS Doctoral Fellowship, 2020-2021

Top 0.01%, 2015 - Joint Entrance Examination (JEE) Main, India

Professional Service

Reviewer, CVPR 2023, 2024, 2025
Reviewer, ICLR 2023, 2024, 2025
Reviewer, ECCV 2022, 2024
Reviewer, ICCV 2023, 2025
Reviewer, NeurIPS 2023, 2024
Reviewer, ICML 2024
Reviewer, ACMMM 2023, 2024
Reviewer, WACV 2023, 2024
Reviewer, IEEE Transactions on Image Processing
Reviewer, IEEE Transactions on Circuits and Systems for Video Technology
