
Rahul Agarwal

Founder | Agentic AI... • 12h

3 transformer architectures everyone should know. I've explained them in a simple way below.

1. Decoder-Only Models

These are mainly used for text generation (like ChatGPT). They predict the next token step by step.

• Tokenization: Break input text into smaller tokens
• Positional encoding: Add position info so the model understands word order
• Self-attention: Each token looks at previous tokens for context
• Query/Key/Value: Mechanism used to measure token relationships
• Causal attention: Tokens can only see earlier tokens, not future ones
• Feedforward layer: Neural layer that refines token representations
• Context update: Model refreshes its internal understanding
• Next-token prediction: Predict the most probable next token
• Autoregressive generation: Repeat prediction until the full response is formed
• Output sequence: Generated text becomes the final result

___________

2. Encoder-Only Models

These focus on understanding text rather than generating it — tasks like classification, embeddings, and search.

• Tokenization: Convert text into tokens
• Embedding layer: Transform tokens into numerical vectors
• Positional encoding: Add sequence-order information
• Self-attention: Each token attends to every other token
• Multi-head attention: Capture multiple relationships simultaneously
• Layer normalization: Stabilize values during processing
• Context encoding: Build a deep contextual representation
• Feature extraction: Identify patterns and meaning in the text
• Classification layer: Map representations to predictions
• Predicted labels: Output results like sentiment, topic, or category

___________

3. Mixture of Experts (MoE)

MoE models improve efficiency in large AI models by activating only a few specialized expert networks per token.

• Tokenization: Break input text into tokens
• Gating system: A router decides which experts should process each token
• Routing: Send tokens to the most relevant expert networks
• Choose experts: Activate only a small subset of experts
• Expert computation: Each expert processes its assigned tokens
• Merge results: Combine outputs from multiple experts
• Weighted sum: Assign importance scores to expert outputs
• Combined output: Merge expert responses into a unified representation
• Forward layer: Further refine the combined result
• Final output: Produce an efficient and accurate prediction

✅ Repost for people in your network so they can understand this.
