Portrait of Esha Choukse

Esha Choukse

Principal Researcher

Connect on LinkedIn

About

I am a Principal Researcher in the Azure Research – Systems team. I currently lead the Efficient AI research project, focusing on the power, energy, and thermal bottlenecks of GenAI deployment in the cloud, and on datacenter sustainability.

For publications, visit this tab; my resume is here.

Most recent news:

  1. April 2025: Invited talk at the UPenn NSF Expedition workshop on "AI Infrastructure: Foundations for Energy Efficiency and Scalability"
  2. April 2025: Our paper "Towards Resource-Efficient Compound AI Systems" will be presented at HotOS 2025
  3. April 2025: Our papers were presented at EuroMLSys and ASPLOS 2025
  4. April 2025: Panel member at the YArch and SESAME workshops at ASPLOS 2025, Rotterdam
  5. March 2025: Gave invited talks at Cornell Tech (guest lecture) and the AMD Power Summit
  6. March 2025: DynamoLLM received the Best Paper Award at HPCA 2025!
  7. February 2025: Two of our papers, Splitwise and GreenSKUs, were selected as IEEE Micro Top Picks 2024!
  8. February 2025: Our EcoServe paper is now on arXiv: here
  9. January 2025: Serving as Associate Editor for IEEE Computer Architecture Letters (CAL)
  10. December 2024: Gave an invited talk at the University of Minnesota Twin Cities
  11. November 2024: Gave an invited talk at IIT Bombay
  12. November 2024: Our DroidSpeak paper is out on arXiv: here
  13. November 2024: Presented our collaborative paper with AMD on optimizing GPU datacenter power at APCCAS 2024 in Taipei
  14. November 2024: Attended MICRO 2024, where two of our papers were presented: Mosaic and "Memory Allocation under Hardware Compression"
  15. October 2024: Gave an invited talk at Georgia Tech
  16. September 2024: Published a preprint of our work on 10-million-token long-context LLM inference, Mnemosyne, at https://arxiv.org/abs/2409.17264

Some of the amazing students I am working with or have worked with:

  1. Gohar Irfan Chaudhry, MIT
  2. Melissa Pan, UC Berkeley
  3. Yueying Li, Cornell Tech
  4. Amey Agrawal, Georgia Tech
  5. Jovan Stojkovic, UIUC
  6. Yuhan Liu, University of Chicago
  7. Theo Gregersen, CMU
  8. Pratyush Patel, UW Seattle
  9. Muhammad Laghari, Virginia Tech
  10. Gagandeep Panwar, Virginia Tech
  11. Jaylen Wang, CMU
  12. Josh Fried, MIT
  13. Edwin Lim, CMU
  14. Kunal Jain, IIIT Hyderabad
  15. Marcin Copik, ETH Zurich


I received my PhD in 2019 from the University of Texas at Austin, with a thesis on main memory compression for higher effective capacity and bandwidth. I am generally interested in hardware-software co-design for systems challenges.