Akash Kumar
I am a fifth-year PhD student at the Center for Research in Computer Vision (CRCV), University of Central Florida (UCF), under the supervision of Prof. Yogesh Singh Rawat.
I have a broad interest in deep learning and computer vision. My current research mainly focuses on data-efficient approaches for dense video tasks.
Looking for research intern/full-time positions (Summer'25)! Feel free to drop me an email.
Email / Google Scholar / Github / LinkedIn / CV
|
|
Updates
Jan'25: First-author paper accepted at ICLR 2025 💥
Jan'25: Selected for the Doctoral Consortium at IEEE/CVF WACV 2025 💥
Dec'24: First-author paper accepted at AAAI 2025 💥
May'24: Started internship at Amazon, Palo Alto, CA
Dec'23: A paper accepted to AAAI 2024 💥
Oct'23: A first-author paper accepted at the NeurIPS Self-Supervised Learning Workshop 2023
Mar'23: A paper accepted to CVPR Workshops 2022
Mar'22: A first-author paper "E2E-SSL" accepted to CVPR 2022 💥
|
Publications
Below is a selected list of my works (in reverse chronological order); representative papers are highlighted.
|
|
Training free Spatio-Temporal Video Grounding via Multimodal foundation models
Akash Kumar, Yogesh Singh Rawat
Under review
Adapts vision-language models (VLMs) using Large Language Models (LLMs) through composite spatio-temporal relationships.
|
|
Weakly Supervised Spatio-Temporal Video Grounding via Progressive Learning
Aaryan Garg, Akash Kumar, Yogesh Singh Rawat
Under review
Improved the grounding capabilities of VLMs via action composition and complex spatio-temporal scene understanding.
|
|
Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding
Akash Kumar, Zsolt Kira, Yogesh Singh Rawat
International Conference on Learning Representations (ICLR), 2025
project page / paper / code / huggingface
Developed the first vision-language models (VLMs) for a dense multimodal video detection task without any labels. Devised a context-aware, self-paced progressive scene learning approach.
|
|
Stable Mean Teacher for Semi-Supervised Video Action Detection
Akash Kumar, Sirshapan Mitra, Yogesh Singh Rawat
Association for the Advancement of Artificial Intelligence (AAAI), 2025
project page / paper / code / poster / video / huggingface
Learns from mistakes on the labeled set and transfers that learning to pseudo-labels from the unlabeled set to enhance spatio-temporal localization.
Introduces a class-agnostic spatio-temporal refinement module and a temporal coherency constraint for better spatio-temporal localization.
|
|
Semi-supervised Active Learning for Video Action Detection
Ayush Singh, Aayush J Rana, Akash Kumar, Shruti Vyas, Yogesh Singh Rawat
Association for the Advancement of Artificial Intelligence (AAAI), 2024
project page / paper / code / poster / video
High-pass filtering for enhanced pseudo-labels to improve spatio-temporal localization. Simple sample augmentation strategy for informative sample selection.
|
|
Benchmarking self-supervised video representation learning
Akash Kumar, Ashlesha Kumar, Vibhav Vineet, Yogesh Singh Rawat
Neural Information Processing Systems (NeurIPS Workshops), 2023
4th Workshop on Self-supervised Learning: Theory and Practices
project page / paper / poster
First exhaustive study on the impact of pre-training in self-supervised learning for videos. Proposed a simple knowledge distillation approach that outperforms previous works with 90% fewer videos.
|
|
End-to-End Semi-Supervised Learning for Video Action Detection
Akash Kumar, Yogesh Singh Rawat
Computer Vision and Pattern Recognition Conference (CVPR), 2022
project page / paper / code / video
First end-to-end semi-supervised approach for the video action detection task. Uses short-term and long-term smoothness constraints to exploit spatio-temporal coherency.
|
|
Video Action Detection: Analysing Limitations and Challenges
Rajat Modi, Aayush Rana, Akash Kumar, Praveen Tirupattar, Shruti Vyas, Yogesh Singh Rawat, Mubarak Shah
Computer Vision and Pattern Recognition Conference (CVPR Workshops), 2022
1st Workshop on Vision Datasets Understanding
paper
Developed a new surveillance-based spatio-temporal dataset for real-world challenges.
|
|
Gabriella V2: Towards better generalization in surveillance videos for Action Detection
Ishan Dave, Zaccheeus Scheffer, Akash Kumar, Sania Shiraz, Yogesh Singh Rawat, Mubarak Shah
IEEE Winter Conference on Applications of Computer Vision (WACV Workshops), 2022
Human Activity Detection in Multi-Camera Long-Duration Video
paper / video
Proposed a real-time online action detection system for open-world surveillance videos.
|
|
Selected for IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025 Doctoral Consortium
2nd place, 2023 - IARPA BRIAR: Biometric Recognition and Identification at Altitude and Range
1st place, 2021 - PMiss@0.02tfa, ActivityNet ActEV SDL (CVPR)
Selected for 8th Heidelberg Laureate Forum, Germany, 2021
ORCGS Doctoral Fellowship, 2020-2021
|
|
Reviewer, CVPR 2023, 2024, 2025
Reviewer, ICLR 2023, 2024, 2025
Reviewer, ECCV 2022, 2024
Reviewer, ICCV 2023
Reviewer, NeurIPS 2023, 2024
Reviewer, ICML 2024
Reviewer, ACMMM 2023, 2024
Reviewer, WACV 2023, 2024
Reviewer, IEEE Transactions on Image Processing
Reviewer, IEEE Transactions on Circuits and Systems for Video Technology
|