
Meta Research Introduces Revolutionary Self-Rewarding Language Models Capable of GPT-4 Level Performance

Medial · 10m

Self-Rewarding Language Models (SRLMs) are a new paradigm from Meta Research in which the model generates its own rewards, driving continual improvement in both instruction following and reward modeling. SRLMs are trained with an iterative Direct Preference Optimization (DPO) framework: in each round the model judges its own candidate responses and is then fine-tuned on the resulting preferences, allowing it to push beyond the ceiling of a fixed reward model toward superhuman performance. After just three iterations, the SRLM outperformed state-of-the-art systems, demonstrating its potential to reach GPT-4-level performance. By breaking free from a frozen reward model, SRLMs open the door to language models that keep improving as training continues.
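The loop described above can be sketched as follows. This is a minimal illustration of the self-rewarding idea, not Meta's implementation: the callables generate, judge_score, and dpo_update are hypothetical placeholders standing in for response sampling, LLM-as-a-Judge scoring, and a DPO training step.

```python
# Minimal sketch of one self-rewarding iteration (illustrative names only).
from typing import Callable, List, Tuple

def self_rewarding_iteration(
    model,                                   # current policy at iteration t
    prompts: List[str],
    generate: Callable[[object, str, int], List[str]],   # sample N candidate responses
    judge_score: Callable[[object, str, str], float],    # the model scores its own output
    dpo_update: Callable[[object, List[Tuple[str, str, str]]], object],  # preference fine-tuning
    num_candidates: int = 4,
):
    """One iteration: the model rewards itself, then trains on those preferences."""
    preference_pairs = []
    for prompt in prompts:
        # 1. Sample several candidate responses from the current model.
        candidates = generate(model, prompt, num_candidates)

        # 2. The same model acts as its own reward model, scoring each candidate.
        scored = [(resp, judge_score(model, prompt, resp)) for resp in candidates]
        scored.sort(key=lambda pair: pair[1], reverse=True)

        # 3. Highest- and lowest-scored responses form a preference pair.
        chosen, rejected = scored[0][0], scored[-1][0]
        if chosen != rejected:
            preference_pairs.append((prompt, chosen, rejected))

    # 4. DPO on the self-generated preferences yields the next iteration's model.
    return dpo_update(model, preference_pairs)
```

Repeating this step gives the iterative training loop the article refers to: each new model both answers prompts and rewards itself, so the reward signal improves along with the policy.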

