Direct Nash Optimization: Teaching language models to self-improve with general preferences
Corby Rosset, Senior Researcher at Microsoft Research AI Frontiers, discusses teaching language models to self-improve using a preference oracle such as GPT-4. The approach frames self-improvement as a two-player game whose optimal policy is a Nash equilibrium, and it achieves state-of-the-art win rates against GPT-4 Turbo on benchmarks such as AlpacaEval and MT-Bench.
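To make the core idea concrete, here is a minimal, hypothetical sketch of one ingredient of this setup: a policy samples pairs of responses, a preference oracle judges which one wins, and a contrastive update shifts probability mass toward preferred responses. The toy response set, the `quality` scores, the Bradley-Terry-style oracle, and the update rule are all illustrative assumptions for this sketch, not the actual Direct Nash Optimization algorithm or its implementation.

```python
import math
import random

random.seed(0)

# Toy setting: the "policy" is a softmax over four canned responses.
# A hypothetical preference oracle (standing in for GPT-4 in the talk)
# prefers responses with higher hidden quality.
responses = ["bad", "okay", "good", "great"]
quality = {"bad": 0.0, "okay": 1.0, "good": 2.0, "great": 3.0}

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def oracle_prefers(y1, y2):
    # Bradley-Terry-style general preference: P(y1 beats y2).
    p = 1.0 / (1.0 + math.exp(quality[y2] - quality[y1]))
    return random.random() < p

logits = [0.0, 0.0, 0.0, 0.0]  # start from a uniform policy
lr = 0.5

for _ in range(200):
    probs = softmax(logits)
    # Sample two on-policy responses and ask the oracle which wins.
    i = random.choices(range(4), weights=probs)[0]
    j = random.choices(range(4), weights=probs)[0]
    if i == j:
        continue
    if oracle_prefers(responses[i], responses[j]):
        win, lose = i, j
    else:
        win, lose = j, i
    # Contrastive update: raise the winner's log-prob, lower the loser's.
    logits[win] += lr * (1 - probs[win])
    logits[lose] -= lr * probs[lose]

final = softmax(logits)
print({r: round(p, 3) for r, p in zip(responses, final)})
```

After a few hundred oracle comparisons, the policy concentrates on the responses the oracle prefers; the full method iterates this kind of self-play with a language model as the policy rather than a fixed response set.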
Series: Microsoft Research Forum
Corby Rosset
Senior Researcher