Bhoop singh Gurjar

AI Deep Explorer | f... • 1m

A (Long) Peek into Reinforcement Learning

How do AI agents master games like Go, control robots, or optimize trading strategies? The answer lies in Reinforcement Learning (RL), where agents learn by interacting with environments to maximize rewards.

Key Concepts:
✓ Agent & Environment: The agent takes actions; the environment responds with a new state and a reward.
✓ States & Actions: The agent moves between states by choosing actions.
✓ Rewards & Policy: Rewards guide learning; a policy is the agent's strategy for choosing an action in each state.
✓ Value Function & Q-Value: Measure how good a state (or a state-action pair) is in terms of expected future rewards.
✓ MDPs & Bellman Equations: Most RL problems can be framed as Markov Decision Processes (MDPs), whose value functions satisfy the Bellman equations.

Solving RL Problems:
✓ Dynamic Programming: Requires a known model of the environment; iteratively evaluates and improves policies.
✓ Monte-Carlo Methods: Learn from complete episodes.
✓ TD Learning (SARSA, Q-Learning, DQN): Learns from incomplete episodes by bootstrapping from current value estimates.
✓ Policy Gradient (REINFORCE, A3C): Directly optimizes the policy using gradient ascent.
✓ Evolution Strategies: Model-agnostic, inspired by natural selection.

Challenges:
✓ Exploration-Exploitation Tradeoff: Balancing gathering new knowledge against maximizing rewards with what is already known.
✓ Deadly Triad: Instability when combining off-policy learning, function approximation, and bootstrapping.

Case Study: AlphaGo Zero
✓ DeepMind's AlphaGo Zero achieved superhuman Go-playing skills using self-play and Monte Carlo Tree Search (MCTS), without human supervision.

link: https://lilianweng.github.io/posts/2018-02-19-rl-overview/
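To make the TD learning and exploration-exploitation points concrete, here is a minimal sketch of tabular Q-learning with an epsilon-greedy policy. It is not taken from the linked article: the five-state chain environment, the `step` and `q_learning` helpers, and all hyperparameter values are hypothetical choices made only for illustration.

```python
import random

N_STATES = 5          # hypothetical toy chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment (an assumption): deterministic move, reward 1.0 at the goal."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Exploration vs. exploitation: act randomly with probability epsilon,
            # otherwise pick a greedy action (ties broken at random).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Bootstrapping: the TD target uses the current estimate of the next
            # state's value instead of waiting for the episode to finish.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q = q_learning()
# Greedy policy read off the learned Q-table for each non-terminal state.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

The update line is a sampled form of the Bellman optimality equation: each step nudges Q(s, a) toward r + gamma * max Q(s', a'). Because the target takes the max over next actions regardless of which action the agent actually takes next, this is off-policy; combined with bootstrapping and a function approximator in place of the table, it is the setting where the deadly triad can appear.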
