Bhoop singh Gurjar

AI Deep Explorer | f... • 1d

LLM Post-Training: A Deep Dive into Reasoning LLMs

This survey paper provides an in-depth examination of post-training methodologies in Large Language Models (LLMs), focusing on improving reasoning capabilities. While LLMs achieve strong performance from pretraining on massive datasets, post-training methods such as fine-tuning, reinforcement learning (RL), and test-time scaling are essential for aligning LLMs with human intent, enhancing reasoning, and ensuring safe, context-aware interactions.

Key Highlights

1. Post-Training Taxonomy
The paper introduces a structured taxonomy of post-training strategies:
- Fine-tuning: task/domain-specific adaptation
- Reinforcement learning: optimization using human or AI feedback
- Test-time scaling: inference-time improvements in reasoning and efficiency

2. Fine-Tuning
- Enhances domain-specific capabilities but risks overfitting.
- Parameter-efficient techniques like LoRA and adapters reduce computational overhead (see the LoRA sketch after this post).
- Struggles with generalization if overly specialized.

3. Reinforcement Learning (RL)
- RLHF, RLAIF, and DPO refine model outputs based on preference signals (see the DPO loss sketch below).
- RL in LLMs requires dealing with high-dimensional, sparse, and subjective feedback.
- Chain-of-thought (CoT) reasoning and stepwise reward modeling help improve logical consistency.

4. Test-Time Scaling
- Involves techniques like Tree-of-Thoughts and Self-Consistency (see the self-consistency sketch below).
- Dynamic computation during inference improves multi-step reasoning.
- Includes search-based methods and retrieval-augmented generation (RAG).

5. Advanced Optimization Techniques
- PPO, GRPO, TRPO, OREO, and ORPO are discussed and compared (see the GRPO advantage sketch below).
- These methods balance stability, efficiency, and alignment with human values.

6. Reward Modeling
- Both explicit (human-annotated) and implicit (interaction-based) reward types are covered.
- Process-oriented rewards on intermediate reasoning steps are emphasized for complex reasoning.

7. Practical Benchmarks and Models
- An extensive table covers 40+ state-of-the-art LLMs (e.g., GPT-4, Claude, DeepSeek, LLaMA 3) with their RL methods and architecture types.
- Introduces DeepSeek-R1 and DeepSeek-R1-Zero, showcasing pure RL-based LLM training.

Keep learning and keep growing!!
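To make point 2 concrete, here is a minimal LoRA sketch in PyTorch. It illustrates the general technique, not the survey's implementation; the rank r=8 and alpha=16 are arbitrary example values.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update:
    W x + (alpha / r) * B A x, so only A and B are trained."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        # A starts small and B at zero, so training begins from the base model
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Wrapping only a few layers (say, a transformer's attention projections) with this module is what keeps the trainable parameter count, and hence the compute overhead, small.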
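The DPO objective from point 3 is compact enough to show too. A sketch following the published formulation: the arguments are assumed to be precomputed per-sequence log-probabilities for the preferred (chosen) and dispreferred (rejected) responses, and beta=0.1 is just a common default.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """-log sigmoid(beta * [(pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)]).
    Pushes the policy to prefer the chosen response relative to a frozen reference model."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```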
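Self-consistency from point 4 fits in a few lines: sample several chain-of-thought completions at nonzero temperature and majority-vote the final answers. The generate callable and the "Answer:" extraction convention here are hypothetical placeholders, not from the paper.

```python
from collections import Counter
from typing import Callable

def extract_final_answer(completion: str) -> str:
    # Hypothetical convention: the model ends its reasoning with "Answer: <x>".
    return completion.rsplit("Answer:", 1)[-1].strip()

def self_consistency(generate: Callable[..., str], question: str,
                     n_samples: int = 10) -> str:
    """Sample diverse CoT completions and return the most common final answer."""
    answers = [
        extract_final_answer(generate(question, temperature=0.7))
        for _ in range(n_samples)
    ]
    return Counter(answers).most_common(1)[0][0]
```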
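Of the optimizers in point 5, GRPO's key trick is also easy to sketch: each completion is scored against the mean and standard deviation of a group of completions sampled for the same prompt, which removes the need for a learned critic. A minimal version of that advantage computation, assuming rewards have already been collected per group:

```python
import torch

def grpo_advantages(group_rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: normalize each sampled completion's reward
    by its group's mean and std. Shape: [n_prompts, n_samples_per_prompt]."""
    mean = group_rewards.mean(dim=-1, keepdim=True)
    std = group_rewards.std(dim=-1, keepdim=True)
    return (group_rewards - mean) / (std + eps)

# e.g., 4 completions for one prompt, two of them rewarded
print(grpo_advantages(torch.tensor([[0.0, 1.0, 1.0, 0.0]])))
```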

0 replies · 2 likes

More like this

Recommendations from Medial

Bhoop singh Gurjar

AI Deep Explorer | f... • 19d

"A Survey on Post-Training of Large Language Models" This paper systematically categorizes post-training into five major paradigms: 1. Fine-Tuning 2. Alignment 3. Reasoning Enhancement 4. Efficiency Optimization 5. Integration & Adaptation 1️⃣ Fin

0 replies · 7 likes

Bhoop singh Gurjar

AI Deep Explorer | f... • 19d

A (Long) Peek into Reinforcement Learning How do AI agents master games like Go, control robots, or optimize trading strategies? The answer lies in Reinforcement Learning (RL)—where agents learn by interacting with environments to maximize rewards.
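A minimal sketch of that interaction loop, assuming the Gymnasium package and a random policy standing in for a real agent:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
total_reward = 0.0
for _ in range(200):
    action = env.action_space.sample()  # a learned policy would choose here
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward  # the agent's objective: maximize this
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```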

0 replies · 8 likes

Aura

AI Specialist | Rese... • 7m

Revolutionizing AI with Inference-Time Scaling: OpenAI's o1 Model Inference-time Scaling: Focuses on improving performance during inference (when the model is used) rather than just training. Reasoning through Search: The o1 model enhances reasonin

1 reply · 5 likes

souradip bhattacharjee

Business is an art❣️... • 11m

OpenAI spent millions, if not billions, developing and training LLMs. The same goes for Google, Microsoft, and Meta; the only difference is that they are releasing their LLMs now, when the AI craze is at its peak. But I don't understand how companies like K

9 replies · 15 likes

Ayush Maurya

AI Pioneer • 3m

"Synthetic Data" is used in AI and LLM training !! • cheap • easy to produce • perfectly labelled data ~ derived from the real world data to replicate the properties and characteristics of the rela world data. It's used in training an LLM (LLMs

0 replies · 4 likes

Varun reddy

GITAM • 11m

Fine-Tuning: The Secret Sauce of AI Magic! Ever wonder how AI gets so smart? It’s all about fine-tuning! Imagine a pre-trained model as a genius with general knowledge. 🧠✨ Fine-tuning takes that genius and hones its skills for a specific task, li

1 reply · 4 likes

Bhoop singh Gurjar

AI Deep Explorer | f... • 1d

AI Resources for Beginners
Books:
1. Deep Learning, Illustrated Edition. Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
2. Mathematics for Machine Learning. Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong.
3. Reinforcement Learning: An Introd

1 reply · 4 likes

Parampreet Singh

Python Developer 💻 ... • 1m

3B LLM outperforms 405B LLM 🤯 Similarly, a 7B LLM outperforms OpenAI o1 & DeepSeek-R1 🤯 🤯 LLM: Llama 3 Datasets: MATH-500 & AIME-2024 This was shown in research on compute-optimal Test-Time Scaling (TTS). Recently, OpenAI o1 shows that Test-

1 reply · 5 likes

ProgrammerKR

Code • Create • ... • 14d

Tech News: OpenAI Releases GPT-5 Preview OpenAI's GPT-5 preview offers faster reasoning and better memory. Early testers say it feels "shockingly human." Full launch expected Q3 2025. #AI #OpenAI #GPT5 #TechNews

1 reply · 6 likes

Mohit Singh

18yo ✨ #developer le... • 12m

Meta's Llama 3 model scales open language models, boasting improved performance and various sizes. It utilizes diverse training methods and achieves impressive results, strengthening the open LLM ecosystem.

0 replies · 3 likes
