
AI Engineer

AI Deep Explorer | f... • 10m

A (Long) Peek into Reinforcement Learning

How do AI agents master games like Go, control robots, or optimize trading strategies? The answer lies in Reinforcement Learning (RL), where agents learn by interacting with environments to maximize rewards.

Key Concepts:
✓ Agent & Environment: The agent takes actions; the environment responds.
✓ States & Actions: The agent moves between states by choosing actions.
✓ Rewards & Policy: Rewards guide learning; a policy is the agent's strategy for choosing an action in each state.
✓ Value Function & Q-Value: Measure how good a state (or a state-action pair) is in terms of expected future rewards.
✓ MDPs & Bellman Equations: Most RL problems are modeled as Markov Decision Processes (MDPs) and solved using Bellman equations.

Solving RL Problems:
✓ Dynamic Programming: Requires a known model; iteratively improves policies.
✓ Monte-Carlo Methods: Learn from complete episodes.
✓ TD Learning (SARSA, Q-Learning, DQN): Learns from incomplete episodes and bootstraps value estimates.
✓ Policy Gradient (REINFORCE, A3C): Directly optimizes policies using gradient ascent.
✓ Evolution Strategies: Model-agnostic, inspired by natural selection.

Challenges:
✓ Exploration-Exploitation Tradeoff: Balancing gathering new knowledge against maximizing rewards.
✓ Deadly Triad: Instability when combining off-policy learning, function approximation, and bootstrapping.

Case Study: AlphaGo Zero
✓ DeepMind's AlphaGo Zero achieved superhuman Go-playing skill using self-play and Monte Carlo Tree Search (MCTS), without human supervision.

Link: https://lilianweng.github.io/posts/2018-02-19-rl-overview/
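
To make the TD-learning and exploration-exploitation points above concrete, here is a minimal sketch of tabular Q-learning with an epsilon-greedy policy. Everything in it is an assumption made for illustration and does not come from the post or the linked article: the five-state chain environment, the function names (`step`, `choose_action`, `train`, `greedy_policy`), and the hyperparameter values.

```python
import random

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Environment: return (next_state, reward, done) for the toy chain."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    # Reward 1.0 only when the terminal state is reached.
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties randomly so the untrained agent does not get stuck.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run Q-learning and return the learned Q-table (assumed settings)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Bootstrapped TD target: reward plus the discounted best
            # estimate for the next state (no bootstrap at the terminal).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

The update uses the maximum over next-state actions rather than the action actually taken, which is what makes Q-learning off-policy; replacing `max(q[next_state])` with the value of the next action the agent actually chooses would turn the sketch into SARSA.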

