
Comet

#freelancer • 3m

Difference between previous LLMs (GPT-4o / Claude 3.5 Sonnet / Meta Llama) and recent thinking/reasoning LLMs (o1/o3):

Think of older LLMs (like early GPT models) as GPS navigation systems that could only predict the next turn. They were like saying "Based on this road, the next turn is probably right" without understanding the full journey.

The problem with RLHF (Reinforcement Learning from Human Feedback) was like trying to teach a driver using only a simple "good/bad" rating system. Imagine rating a driver only on whether they arrived at the destination, without considering their route choices, safety, or efficiency. That limited feedback couldn't scale to teaching more complex driving skills.

Now, let's understand the o1/o3 models:

1. The Tree of Possibilities Analogy: Imagine you're solving a maze, but instead of just going step by step, you:
- Can see multiple possible paths ahead
- Have a "gut feeling" about which paths are dead ends
- Can quickly backtrack when you realize a path isn't promising
- Develop an instinct for which turns usually lead to the exit
o1/o3 models are trained similarly: they don't just predict the next step, they develop an "instinct" for exploring multiple solution paths and choosing the most promising ones.

2. The Master Chess Player Analogy:
- A novice chess player thinks about one move at a time
- A master chess player develops intuition about good moves by:
  * Seeing multiple possible move sequences
  * Having an instinct for which positions are advantageous
  * Quickly discarding bad lines of play
  * Efficiently focusing on the most promising strategies
o1/o3 models are like these master players: they've developed intuition through exploring countless solution paths during training.

3. The Restaurant Kitchen Analogy:
- Old LLMs were like a cook following a recipe step by step
- o1/o3 models are like experienced chefs who:
  * Know multiple ways to make a dish
  * Can adapt when ingredients are missing
  * Have instincts about which techniques will work best
  * Can efficiently switch between cooking methods if one isn't working

The "parallel processing" mentioned (as in o1-pro) is like having multiple expert chefs working independently on different aspects of a meal, each using their expertise to solve their part of the problem.

To sum up: o1/o3 models are revolutionary because they're not just learning to follow steps (like older models) or respond to simple feedback (like RLHF-tuned models). Instead, they're developing sophisticated instincts for problem-solving by exploring and evaluating many possible solution paths during training. This makes them more flexible and efficient at finding solutions, similar to how human experts develop intuition in their fields.
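The maze analogy above can be made concrete with a toy best-first search: score each partial path by a "gut feeling" heuristic, always expand the most promising one, and let unpromising branches sit abandoned on the queue (implicit backtracking). This is only an illustrative sketch of search-over-paths, not how o1/o3 are actually implemented; the maze, the `solve` function, and the Manhattan-distance heuristic are all made up for the example.

```python
import heapq

# Toy maze: 0 = open, 1 = wall. Start at top-left, exit at bottom-right.
MAZE = [
    [0, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]

def solve(maze):
    rows, cols = len(maze), len(maze[0])
    goal = (rows - 1, cols - 1)

    def promise(cell):
        # The "gut feeling": straight-line (Manhattan) distance to the exit.
        # Lower = more promising.
        return abs(goal[0] - cell[0]) + abs(goal[1] - cell[1])

    # Priority queue of (promise, path): many candidate paths coexist,
    # and we always extend the most promising one first.
    frontier = [(promise((0, 0)), [(0, 0)])]
    seen = {(0, 0)}
    while frontier:
        _, path = heapq.heappop(frontier)  # backtracking = popping a different branch
        r, c = path[-1]
        if (r, c) == goal:
            return path
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and maze[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                heapq.heappush(frontier, (promise((nr, nc)), path + [(nr, nc)]))
    return None  # no path exists

path = solve(MAZE)
```

The key design point the analogy gestures at: instead of committing to one next step (greedy, like old next-token prediction), the search keeps many partial paths alive and ranks them, so a dead end costs only a pop from the queue rather than a restart.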


More like this

Recommendations from Medial

Chamarti Sreekar

Fcuk imposter syndro... • 7m

Sam Altman says the o3-mini will be worse than the o1 pro 👀


Harsh Dwivedi


Medial • 4m

This is the only way OpenAI is coming up with models named O3-mini-high and O4-mini-high


Chamarti Sreekar

Fcuk imposter syndro... • 3m

Think of models like hires. Gemini Pro 1.5 is your generalist. Claude 3.5 is your software developer. GPT o3 is your PhD intern. Don’t mix up the job descriptions.


Sarthak Gupta

Developer • 4m

10+ state-of-the-art LLMs, 2+ compound LLMs (the latest in the market). Which models do you want next? BTW, check the reply for the app.


Vishu Bheda


Medial • 2d

𝗧𝗵𝗲 𝟱 𝗔𝗜 𝗺𝗼𝗱𝗲𝗹𝘀 𝘁𝗵𝗮𝘁 𝗮𝗰𝘁𝘂𝗮𝗹𝗹𝘆 𝘀𝘁𝘂𝗰𝗸 𝘄𝗶𝘁𝗵 𝗺𝗲 𝗿𝗶𝗴𝗵𝘁 𝗮𝗳𝘁𝗲𝗿 𝗹𝗮𝘂𝗻𝗰𝗵 (𝗻𝗼𝘁 “𝗯𝗲𝘀𝘁,” 𝗷𝘂𝘀𝘁 𝘂𝗻𝗳𝗼𝗿𝗴𝗲𝘁𝘁𝗮𝗯𝗹𝗲): • Claude 3.5 Sonnet – balanced, smooth, felt almost human. • o3 – search + s


Havish Gupta

Figuring Out • 8m

OpenAI's 12-Day Series has finally ended, and on the last day, they announced the O3 and O3 Mini models, which have smashed all benchmarks! 1. O3 scored 2727 Coding Elo on Codeforces, ranking it equivalent to the 175th best coder globally. 2. On Ha


Chamarti Sreekar

Fcuk imposter syndro... • 2m

Apple just exposed the truth behind so-called AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini: They’re not actually reasoning — they’re just really good at memorizing patterns. Here’s what Apple found:


Vignesh S

Machine Learning Eng... • 1y

A weird analogy here: the AI wave is gonna bring more startups like Perplexity, which is basically the product you get when LLMs get married to web browsers. So even though the marriage rate among humans is going down globally, tech marriages are gonna go up lol 😂


Comet

#freelancer • 1y

In a recent educational lecture, Andrej Karpathy, one of the creators of ChatGPT, provides an introduction to Large Language Models (LLMs). LLMs are advanced technologies that can process and generate human-like text. Karpathy highlights the fut


Baqer Ali

AI agent developer |... • 1m

OpenAI has released the o3-pro model, which is good enough to replace a senior software developer. To make things worse, it could be a foundational step toward AGI by OpenAI. First, for the newbies: we have two types of models

