
Rahul Agarwal

Founder | Agentic AI... • 21d

3 transformer architectures everyone should know. I've explained them in a simple way below.

1. 𝗗𝗲𝗰𝗼𝗱𝗲𝗿-𝗢𝗻𝗹𝘆 𝗠𝗼𝗱𝗲𝗹𝘀
These are mainly used for 𝘁𝗲𝘅𝘁 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻 (like ChatGPT). They predict the 𝗻𝗲𝘅𝘁 𝘁𝗼𝗸𝗲𝗻 𝘀𝘁𝗲𝗽 𝗯𝘆 𝘀𝘁𝗲𝗽.
• 𝗧𝗼𝗸𝗲𝗻𝗶𝘇𝗮𝘁𝗶𝗼𝗻: Break the input text into smaller tokens
• 𝗣𝗼𝘀𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗲𝗻𝗰𝗼𝗱𝗶𝗻𝗴: Add position info so the model understands word order
• 𝗦𝗲𝗹𝗳-𝗮𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻: Each token looks at previous tokens for context
• 𝗤𝘂𝗲𝗿𝘆/𝗞𝗲𝘆/𝗩𝗮𝗹𝘂𝗲: The mechanism used to measure token relationships
• 𝗖𝗮𝘂𝘀𝗮𝗹 𝗮𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻: Tokens can only see earlier tokens, not future ones
• 𝗙𝗲𝗲𝗱𝗳𝗼𝗿𝘄𝗮𝗿𝗱 𝗹𝗮𝘆𝗲𝗿: A neural layer refines each token's representation
• 𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝘂𝗽𝗱𝗮𝘁𝗲: The model refreshes its internal understanding
• 𝗡𝗲𝘅𝘁 𝘁𝗼𝗸𝗲𝗻 𝗽𝗿𝗲𝗱𝗶𝗰𝘁𝗶𝗼𝗻: Predict the most probable next token
• 𝗔𝘂𝘁𝗼𝗿𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝘃𝗲 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻: Repeat the prediction until the full response is formed
• 𝗢𝘂𝘁𝗽𝘂𝘁 𝘀𝗲𝗾𝘂𝗲𝗻𝗰𝗲: The generated text becomes the final result
___________
2. 𝗘𝗻𝗰𝗼𝗱𝗲𝗿-𝗢𝗻𝗹𝘆 𝗠𝗼𝗱𝗲𝗹𝘀
These focus on 𝘂𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱𝗶𝗻𝗴 𝘁𝗲𝘅𝘁 rather than generating it, for tasks like classification, embeddings, and search.
• 𝗧𝗼𝗸𝗲𝗻𝗶𝘇𝗮𝘁𝗶𝗼𝗻: Convert text into tokens
• 𝗘𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴 𝗹𝗮𝘆𝗲𝗿: Transform tokens into numerical vectors
• 𝗣𝗼𝘀𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗲𝗻𝗰𝗼𝗱𝗶𝗻𝗴: Add sequence order information
• 𝗦𝗲𝗹𝗳-𝗮𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻: Each token attends to every other token
• 𝗠𝘂𝗹𝘁𝗶-𝗵𝗲𝗮𝗱 𝗮𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻: Capture multiple relationships simultaneously
• 𝗟𝗮𝘆𝗲𝗿 𝗻𝗼𝗿𝗺𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻: Stabilize values during processing
• 𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝗲𝗻𝗰𝗼𝗱𝗶𝗻𝗴: Build a deep contextual representation
• 𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗲𝘅𝘁𝗿𝗮𝗰𝘁𝗶𝗼𝗻: Identify patterns and meaning in the text
• 𝗖𝗹𝗮𝘀𝘀𝗶𝗳𝗶𝗰𝗮𝘁𝗶𝗼𝗻 𝗹𝗮𝘆𝗲𝗿: Map representations to predictions
• 𝗣𝗿𝗲𝗱𝗶𝗰𝘁𝗲𝗱 𝗹𝗮𝗯𝗲𝗹𝘀: Output results like sentiment, topic, or category
___________
3. 𝗠𝗶𝘅𝘁𝘂𝗿𝗲 𝗼𝗳 𝗘𝘅𝗽𝗲𝗿𝘁𝘀 (𝗠𝗼𝗘)
MoE models improve 𝗲𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆 𝗶𝗻 𝗹𝗮𝗿𝗴𝗲 𝗔𝗜 𝗺𝗼𝗱𝗲𝗹𝘀 by activating only a few specialized networks per token.
• 𝗧𝗼𝗸𝗲𝗻𝗶𝘇𝗮𝘁𝗶𝗼𝗻: Break input text into tokens
• 𝗚𝗮𝘁𝗶𝗻𝗴 𝘀𝘆𝘀𝘁𝗲𝗺: A router decides which experts should process tokens
• 𝗥𝗼𝘂𝘁𝗶𝗻𝗴: Send tokens to the most relevant expert networks
• 𝗖𝗵𝗼𝗼𝘀𝗲 𝗲𝘅𝗽𝗲𝗿𝘁𝘀: Activate only a small subset of experts
• 𝗘𝘅𝗽𝗲𝗿𝘁 𝗰𝗼𝗺𝗽𝘂𝘁𝗮𝘁𝗶𝗼𝗻: Each expert processes the assigned tokens
• 𝗠𝗲𝗿𝗴𝗲 𝗿𝗲𝘀𝘂𝗹𝘁𝘀: Combine outputs from multiple experts
• 𝗪𝗲𝗶𝗴𝗵𝘁𝗲𝗱 𝘀𝘂𝗺: Assign importance scores to expert outputs
• 𝗖𝗼𝗺𝗯𝗶𝗻𝗲𝗱 𝗼𝘂𝘁𝗽𝘂𝘁: Merge expert responses into a unified representation
• 𝗙𝗼𝗿𝘄𝗮𝗿𝗱 𝗹𝗮𝘆𝗲𝗿: Further refine the combined result
• 𝗙𝗶𝗻𝗮𝗹 𝗼𝘂𝘁𝗽𝘂𝘁: Produce an efficient and accurate prediction

✅ Repost for people in your network so they can understand this.
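The decoder-only steps above (Query/Key/Value, causal attention, autoregressive prediction) can be sketched in a few lines of NumPy. This is a toy single-head illustration with random weights, not a trained model; every name and dimension here is invented for the example:

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    # Project each token into query/key/value spaces
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # pairwise token similarity
    mask = np.triu(np.ones_like(scores), k=1)     # 1s mark "future" positions
    scores = np.where(mask == 1, -1e9, scores)    # causal mask: hide the future
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)         # softmax over visible tokens
    return w @ V                                  # context-mixed token vectors

rng = np.random.default_rng(0)
d_model = 8
x = rng.normal(size=(4, d_model))                 # 4 toy "token" embeddings
Wq, Wk, Wv = [rng.normal(size=(d_model, d_model)) for _ in range(3)]
out = causal_self_attention(x, Wq, Wk, Wv)

# Causality check: perturbing the LAST token must not change earlier outputs
x2 = x.copy()
x2[-1] += 10.0
out2 = causal_self_attention(x2, Wq, Wk, Wv)
```

The final check is the whole point of causal attention: because each row of the mask hides later columns, changing a future token leaves every earlier token's output untouched, which is what makes step-by-step next-token generation consistent.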
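The MoE pipeline (gating, top-k routing, weighted sum of expert outputs) can likewise be sketched as a toy NumPy layer. The "experts" here are plain linear maps standing in for the feed-forward blocks a real MoE uses, and all sizes are arbitrary:

```python
import numpy as np

def moe_layer(x, gate_W, experts, k=2):
    # Gating system: the router scores every expert for every token
    logits = x @ gate_W                              # shape (tokens, n_experts)
    p = np.exp(logits - logits.max(axis=-1, keepdims=True))
    p = p / p.sum(axis=-1, keepdims=True)            # softmax gate probabilities
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                      # route each token independently
        top = np.argsort(p[t])[-k:]                  # choose only the top-k experts
        w = p[t, top] / p[t, top].sum()              # renormalize their scores
        for wi, e in zip(w, top):                    # weighted sum of expert outputs
            out[t] += wi * experts[e](x[t])
    return out

rng = np.random.default_rng(1)
d, n_experts = 8, 4
# Each "expert" is just a linear map here; real experts are feed-forward networks
mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda v, M=M: v @ M for M in mats]
gate_W = rng.normal(size=(d, n_experts))
x = rng.normal(size=(5, d))
y = moe_layer(x, gate_W, experts, k=2)               # only 2 of 4 experts run per token
```

The efficiency win is in the routing loop: with k=2 of 4 experts, half the expert compute is skipped for every token, and the same idea scales to models with dozens of much larger experts.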



Recommendations from Medial

Cryptoreach

Crypto News & Analys... • 1y

Goatseus Maximus is a cryptocurrency token on the Solana blockchain, known as the $GOAT. It combines fun internet memes with financial goals to enable fast and secure transactions. Tech expert Andy Ayrey created it to give users a unique and rewardin


Rahul Agarwal

Founder | Agentic AI... • 21d

People think these 3 AI terms are the same; they're not. I've explained the differences for each. 1. 𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝘃𝗲 𝗔𝗜 Using AI to create content across text, images, audio, and video. • 𝗘𝗻𝗰𝗼𝗱𝗶𝗻𝗴-𝗗𝗲𝗰𝗼𝗱𝗶𝗻𝗴 + 𝗟𝗮𝘁𝗲𝗻𝘁 𝗦𝗽𝗮𝗰𝗲:


Baqer Ali

AI agent developer |... • 7m

AGI is a scam. We will never reach AGI with LLM models. A large language model is trained on trillions of tokens. A token is like a word. 10 to the power 14, that is 1 followed by 14 zeros (100000000000000), is roughly all the available text on the internet. It will


Rahul Agarwal

Founder | Agentic AI... • 3m

Most people don't even know these basics of LLMs. I've explained them in a simple way below. 1. 𝗗𝗮𝘁𝗮 𝗖𝗼𝗹𝗹𝗲𝗰𝘁𝗶𝗼𝗻 LLMs are trained on massive amounts of text from books, websites, articles, and documents so they can learn how language is


mg

mysterious guy • 10m

30 AI Buzzwords Explained for Entrepreneurs 1) Large Language Model (LLM) LLMs are like super-smart computer programs that can understand and do almost anything you ask them using regular language. Think of tools like ChatGPT or Gemini – they're a


Rahul Agarwal

Founder | Agentic AI... • 2m

Most people don't know how Gen AI really works. I've explained the core models in a simple way below. 1. 𝗗𝗶𝗳𝗳𝘂𝘀𝗶𝗼𝗻 𝗠𝗼𝗱𝗲𝗹𝘀 They learn by 𝗮𝗱𝗱𝗶𝗻𝗴 𝗻𝗼𝗶𝘀𝗲 to data and then learning how to 𝗿𝗲𝗺𝗼𝘃𝗲 𝘁𝗵𝗮𝘁 𝗻𝗼𝗶𝘀𝗲 step by step.


Varun Bhambhani


Medial • 10d

🗞️ Medial Bulletin 🌍 Geopolitics: The "Legal Lock" on Trade Supreme Court Shockwave: The highly anticipated India-US Trade Deal (the 18% tariff agreement) has hit a massive legal snag. The US Supreme Court just struck down the International Emerge

