About
I am a Principal Researcher in the Azure Research - Systems team. I am currently leading the Efficient AI research project, which focuses on the power, energy, and thermal bottlenecks of GenAI deployment in the cloud, and on datacenter sustainability.
For publications, visit this tab; my resume is here.
Most recent news:
- April 2025: Gave an invited talk at the UPenn NSF Expedition workshop on "AI Infrastructure: Foundations for Energy Efficiency and Scalability"
- April 2025: Our paper "Towards Resource-Efficient Compound AI Systems" will be presented at HotOS 2025
- April 2025: Our papers were presented at EuroMLSys and ASPLOS 2025
- April 2025: Served as a panel member at the YArch and SESAME workshops at ASPLOS 2025 in Rotterdam
- March 2025: Gave invited talks at Cornell Tech (guest lecture) and the AMD Power Summit
- March 2025: DynamoLLM received the best paper award at HPCA 2025!
- February 2025: Two of our papers, Splitwise and GreenSKUs, were selected as IEEE MICRO Top Picks 2024!
- February 2025: Our EcoServe paper is now on arXiv
- January 2025: Serving as Associate Editor for IEEE Computer Architecture Letters (CAL)
- December 2024: Gave an invited talk at University of Minnesota Twin Cities
- November 2024: Gave an invited talk at IIT Bombay
- November 2024: Our DroidSpeak paper is now on arXiv
- November 2024: Presented our collaborative paper with AMD on optimizing GPU datacenter power at APCCAS 2024 in Taipei
- November 2024: Attended MICRO 2024, where two of our papers were presented: Mosaic and "Memory Allocation under Hardware Compression"
- October 2024: Gave an invited talk at Georgia Tech
- September 2024: We published a preprint of Mnemosyne, our work on 10-million-token long-context LLM inference, at https://arxiv.org/abs/2409.17264
Some of the amazing students I am working with or have worked with:
- Gohar Irfan Chaudhry, MIT
- Melissa Pan, UC Berkeley
- Yueying Li, Cornell Tech
- Amey Agrawal, Georgia Tech
- Jovan Stojkovic, UIUC
- Yuhan Liu, University of Chicago
- Theo Gregersen, CMU
- Pratyush Patel, UW Seattle
- Muhammad Laghari, Virginia Tech
- Gagandeep Panwar, Virginia Tech
- Jaylen Wang, CMU
- Josh Fried, MIT
- Edwin Lim, CMU
- Kunal Jain, IIIT Hyderabad
- Marcin Copik, ETH Zurich
I received my PhD in 2019 from the University of Texas at Austin, with a thesis on main memory compression for higher effective capacity and bandwidth. I am generally interested in hardware-software co-design for systems challenges.