World Models
Head of AI at Pocket FM · ex-Tesla Optimus, Meta FAIR, Citadel
Building causal, dynamic, real-time world models for humanoid robotics. Passionate about multimodal foundation models, efficient video generation, and self-supervised learning. 100+ papers at NeurIPS, CVPR, ICLR, ACL, and other venues, with 15k+ citations.
My work spans the intersection of vision, language, speech, and robotics, extending what AI systems can perceive, generate, and act upon.
Causal, dynamic, real-time world models for closed-loop RL training in humanoid robotics.
Large-scale models spanning vision, language, speech, and audio, trained on trillion-token datasets.
Efficient DiT-based video generation backbones with 100x throughput improvements.
Improving generalization across diverse real-world robotic use cases.
DINOv2, MetaCLIP, and MaViL: learning robust representations without supervision.
Visual-language navigation, multi-agent RL, and robot intelligence.
2026 – Present · San Francisco Bay Area
Leading AI at Pocket Entertainment, a Lightspeed-backed startup with 300M+ users. Built and scaled a 60-person AI organization spanning GenAI Research, Applications, and Personalization.
2025 – 2026 · Palo Alto, CA
Built causal, real-time, dynamic world models for closed-loop RL training on humanoid robotics.
2022 – 2025 · Menlo Park, CA
Worked with Facebook AI Research as part of Meta Superintelligence Labs on large-scale multimodal foundation models.
2019 – 2024 · Chicago, IL
Applied machine learning and statistical methods to model financial markets and time-series data at scale.
2021 – 2022 · Sunnyvale, CA
Worked on multimodal models, embodied AI, and the Alexa Prize SimBot Challenge for visual-language navigation.
2017 – 2019 · Pittsburgh, PA
Worked with Prof. Louis-Philippe Morency on multimodal machine learning, adversarial attacks on VQA, facial landmark detection, and natural visual perception.
A selection from 100+ papers across NeurIPS, CVPR, ICLR, ACL, EMNLP, TMLR, COLM, NAACL, WACV, Interspeech and more.
M.S. Machine Learning & AI (PhD Dropout)
MLT at the Language Technologies Institute (LTI)
GPA: 4.19/4.33 (Department Rank 1)
B.Tech. in Computer Science & Engineering
JEE Rank 165
GPA: 9.99/10.0
Advising startup founders and VCs on AI strategy from ideation through Series A/B/C. Domains include multimodal foundation models, world models, RL environments, video generation, robotics, and data annotation.
Regular guest lecturer at Stanford, CMU, MIT, Oxford, and O'Reilly on topics spanning multimodal AI, large-scale model training, and applied research.
Led the development of an AI research program that helped students publish at ICML, NeurIPS, EMNLP, and EACL. Students were admitted to CMU, Stanford, MIT, and UC Berkeley, and received offers from Anthropic, TikTok, and other companies.
Interested in collaborating on research, startup advising, or speaking engagements? Reach out on LinkedIn.