World Models
Causal, dynamic, real-time world models for closed-loop RL training in humanoid robotics.
Head of AI at Pocket FM · ex-Tesla Optimus, Meta FAIR, Citadel
Building causal, dynamic, real-time world models for humanoid robotics. Passionate about multimodal foundation models, efficient video generation, and self-supervised learning. 100+ papers at NeurIPS, CVPR, ICLR, ACL, and other venues, with 15k+ citations.
Publications
Citations
Years Experience
Top AI Labs
2026 — Present · San Francisco Bay Area
Leading AI at Pocket Entertainment, a Lightspeed-backed startup with 300M+ users. Built and scaled a 60-person AI org across GenAI Research, Applications, and Personalization.
2025 — 2026 · Palo Alto, CA
Built causal, dynamic, real-time world models for closed-loop RL training in humanoid robotics.
2022 — 2025 · Menlo Park, CA
Worked with Facebook AI Research as part of Meta Superintelligence Labs on large-scale multimodal foundation models.
2019 — 2024 · Chicago, IL
Leveraged machine learning and statistical methods to model financial markets and time-series data at scale.
2021 — 2022 · Sunnyvale, CA
Worked on multimodal models, embodied AI, and the Alexa Prize SimBot Challenge for visual-language navigation.
2017 — 2019 · Pittsburgh, PA
Worked with Prof. Louis-Philippe Morency on multimodal machine learning, adversarial attacks on VQA, facial landmark detection, and natural visual perception.
My work sits at the intersection of vision, language, speech, and robotics, with a focus on what AI systems can perceive, generate, and act upon.
Causal, dynamic, real-time world models for closed-loop RL training in humanoid robotics.
Large-scale models spanning vision, language, speech, and audio on trillion-token datasets.
Efficient DiT-based video generation backbones with 100x throughput improvements.
Building generalization capabilities across diverse real-world robotic use cases.
DINOv2, MetaCLIP, MaViL — learning robust representations without supervision.
Visual-language navigation, multi-agent RL, and smart robot intelligence.
A selection from 100+ papers across NeurIPS, CVPR, ICLR, ACL, EMNLP, TMLR, COLM, NAACL, WACV, Interspeech and more.
M.S. Machine Learning & AI (PhD Dropout)
MLT at Language Technologies Institute (LTI)
GPA: 4.19/4.33 (Department Rank 1)
B.Tech. in Computer Science & Engineering
GPA: 9.99/10.0
JEE All India Rank: 165 (out of more than 1M candidates)
Reflections on AI research, building at scale, and lessons from the frontier.
“Really proud to share the first two accepted papers from the team’s research efforts at ICML 2026 (Culture x AI). Both papers sit in a space that has received far less attention from mainstream AI research: narratology, long-form storytelling, cultural adaptation, and how humans sustain emotional engagement across thousands of interactions with content.”
“The sharpest AI memo I read this week came from someone with a lot to lose. The durable advantage will not be ‘we use the best model.’ Everyone will have access to strong models. The advantage will come from whether your company can turn daily work into a learning system. The scary failure mode is not ‘AI replaces workers.’ It is companies outsourcing their own learning.”
“There is a type of delusion that collapses when reality pushes back, and another type that keeps recruiting engineers, rewriting constraints, and shipping hardware until reality finally moves. SpaceX is the second kind. How many ‘obviously impossible’ companies are we underestimating right now because they still look delusional from the outside?”
“A while ago I delivered a course on Multimodal AI with O’Reilly! From reading O’Reilly books growing up to delivering a course for them, life truly feels like coming full circle! Teaching is something I enjoy a lot. It is one of the easiest and most impactful ways to give back to the community that gave me so much.”