Founder | Agentic AI... • 21d
3 transformer architectures everyone should know. I've explained them in a simple way below.

1. 𝗗𝗲𝗰𝗼𝗱𝗲𝗿-𝗢𝗻𝗹𝘆 𝗠𝗼𝗱𝗲𝗹𝘀

These are mainly used for 𝘁𝗲𝘅𝘁 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻 (like ChatGPT). They predict the 𝗻𝗲𝘅𝘁 𝘁𝗼𝗸𝗲𝗻 𝘀𝘁𝗲𝗽 𝗯𝘆 𝘀𝘁𝗲𝗽.

• 𝗧𝗼𝗸𝗲𝗻𝗶𝘇𝗮𝘁𝗶𝗼𝗻: Break input text into smaller tokens
• 𝗣𝗼𝘀𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗲𝗻𝗰𝗼𝗱𝗶𝗻𝗴: Add position info so the model understands word order
• 𝗦𝗲𝗹𝗳-𝗮𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻: Each token looks at previous tokens for context
• 𝗤𝘂𝗲𝗿𝘆/𝗞𝗲𝘆/𝗩𝗮𝗹𝘂𝗲: The mechanism used to measure relationships between tokens
• 𝗖𝗮𝘂𝘀𝗮𝗹 𝗮𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻: Tokens can only see earlier tokens, never future ones
• 𝗙𝗲𝗲𝗱𝗳𝗼𝗿𝘄𝗮𝗿𝗱 𝗹𝗮𝘆𝗲𝗿: A neural layer refines each token's representation
• 𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝘂𝗽𝗱𝗮𝘁𝗲: The model refreshes its internal understanding
• 𝗡𝗲𝘅𝘁 𝘁𝗼𝗸𝗲𝗻 𝗽𝗿𝗲𝗱𝗶𝗰𝘁𝗶𝗼𝗻: Predict the most probable next token
• 𝗔𝘂𝘁𝗼𝗿𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝘃𝗲 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻: Repeat the prediction, one token at a time, until the full response is formed
• 𝗢𝘂𝘁𝗽𝘂𝘁 𝘀𝗲𝗾𝘂𝗲𝗻𝗰𝗲: The generated text becomes the final result

___________

2. 𝗘𝗻𝗰𝗼𝗱𝗲𝗿-𝗢𝗻𝗹𝘆 𝗠𝗼𝗱𝗲𝗹𝘀

These models 𝘂𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱 𝘁𝗲𝘅𝘁 rather than generate it, powering tasks like classification, embeddings, and search.

• 𝗧𝗼𝗸𝗲𝗻𝗶𝘇𝗮𝘁𝗶𝗼𝗻: Convert text into tokens
• 𝗘𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴 𝗹𝗮𝘆𝗲𝗿: Transform tokens into numerical vectors
• 𝗣𝗼𝘀𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗲𝗻𝗰𝗼𝗱𝗶𝗻𝗴: Add sequence order information
• 𝗦𝗲𝗹𝗳-𝗮𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻: Each token attends to every other token, in both directions
• 𝗠𝘂𝗹𝘁𝗶-𝗵𝗲𝗮𝗱 𝗮𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻: Capture multiple relationships simultaneously
• 𝗟𝗮𝘆𝗲𝗿 𝗻𝗼𝗿𝗺𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻: Stabilize values during processing
• 𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝗲𝗻𝗰𝗼𝗱𝗶𝗻𝗴: Build a deep contextual representation
• 𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗲𝘅𝘁𝗿𝗮𝗰𝘁𝗶𝗼𝗻: Identify patterns and meaning in the text
• 𝗖𝗹𝗮𝘀𝘀𝗶𝗳𝗶𝗰𝗮𝘁𝗶𝗼𝗻 𝗹𝗮𝘆𝗲𝗿: Map representations to predictions
• 𝗣𝗿𝗲𝗱𝗶𝗰𝘁𝗲𝗱 𝗹𝗮𝗯𝗲𝗹𝘀: Output results like sentiment, topic, or category

___________

3. 𝗠𝗶𝘅𝘁𝘂𝗿𝗲 𝗼𝗳 𝗘𝘅𝗽𝗲𝗿𝘁𝘀 (𝗠𝗼𝗘)

MoE models improve 𝗲𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝗰𝘆 𝗶𝗻 𝗹𝗮𝗿𝗴𝗲 𝗔𝗜 𝗺𝗼𝗱𝗲𝗹𝘀 by activating only a few specialized networks per token.
• 𝗧𝗼𝗸𝗲𝗻𝗶𝘇𝗮𝘁𝗶𝗼𝗻: Break input text into tokens
• 𝗚𝗮𝘁𝗶𝗻𝗴 𝘀𝘆𝘀𝘁𝗲𝗺: A router decides which experts should process each token
• 𝗥𝗼𝘂𝘁𝗶𝗻𝗴: Send tokens to the most relevant expert networks
• 𝗖𝗵𝗼𝗼𝘀𝗲 𝗲𝘅𝗽𝗲𝗿𝘁𝘀: Activate only a small subset of the experts
• 𝗘𝘅𝗽𝗲𝗿𝘁 𝗰𝗼𝗺𝗽𝘂𝘁𝗮𝘁𝗶𝗼𝗻: Each expert processes its assigned tokens
• 𝗪𝗲𝗶𝗴𝗵𝘁𝗲𝗱 𝘀𝘂𝗺: Assign importance scores to the expert outputs
• 𝗖𝗼𝗺𝗯𝗶𝗻𝗲𝗱 𝗼𝘂𝘁𝗽𝘂𝘁: Merge the expert responses into a unified representation
• 𝗙𝗲𝗲𝗱𝗳𝗼𝗿𝘄𝗮𝗿𝗱 𝗹𝗮𝘆𝗲𝗿: Further refine the combined result
• 𝗙𝗶𝗻𝗮𝗹 𝗼𝘂𝘁𝗽𝘂𝘁: Produce an efficient and accurate prediction

✅ Repost for people in your network so they can understand this.
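The decoder-only steps above (causal attention, next-token prediction, autoregressive generation) can be sketched in a few lines of NumPy. This is a toy illustration under made-up assumptions: the sizes, random weights, and the crude positional offset are all invented for the sketch, not taken from any real model.

```python
# Toy decoder-only loop: causal attention + autoregressive next-token prediction.
# All sizes and weights are illustrative assumptions, not a real trained model.
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 8, 5                          # toy embedding size and vocabulary size
E = rng.normal(size=(vocab, d))          # embedding table (tokens -> vectors)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))  # Query/Key/Value
W_out = rng.normal(size=(d, vocab))      # maps hidden state -> next-token logits

def causal_attention(x):
    """Self-attention where each position sees only itself and earlier positions."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[future] = -np.inf             # causal mask: hide future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def next_token(tokens):
    """One decoding step: embed, add position info, attend, pick the argmax token."""
    x = E[tokens] + np.arange(len(tokens))[:, None] * 0.01  # crude positional encoding
    h = causal_attention(x)
    logits = h[-1] @ W_out               # only the last position predicts the next token
    return int(np.argmax(logits))

# Autoregressive generation: repeat prediction, appending each new token.
seq = [0]
for _ in range(4):
    seq.append(next_token(seq))
print(seq)                               # a 5-token generated sequence
```

The only decoder-specific ingredient here is the causal mask; remove it and the same code becomes bidirectional.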
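The encoder-only pipeline differs mainly in two places: attention has no causal mask, and the output head maps a pooled representation to class labels instead of to next-token logits. A minimal sketch, again with invented toy sizes and random weights (mean pooling stands in for feature extraction; real encoders like BERT often use a special [CLS] token instead):

```python
# Toy encoder-only model: bidirectional attention -> pooled features -> label.
# Sizes, weights, and the mean-pooling choice are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
d, vocab, n_classes = 8, 5, 3            # toy sizes
E = rng.normal(size=(vocab, d))          # embedding layer
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
W_cls = rng.normal(size=(d, n_classes))  # classification layer

def encode(tokens):
    """Bidirectional self-attention: every token attends to every other token."""
    x = E[tokens] + np.arange(len(tokens))[:, None] * 0.01  # positional info
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)        # note: no causal mask here
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v                         # contextual representation per token

def classify(tokens):
    """Pool the context vectors (feature extraction) and map them to a label."""
    h = encode(tokens).mean(axis=0)      # mean pooling over the sequence
    return int(np.argmax(h @ W_cls))     # predicted label index

label = classify([0, 2, 4, 1])
print(label)
```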
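The MoE steps (gating, routing, choosing experts, weighted sum) reduce to a short routing loop. A minimal sketch assuming top-2 routing over four tiny tanh "experts"; the expert count, top-k value, and softmax-over-chosen-logits gating are all assumptions for illustration, and real MoE layers vectorize this instead of looping per token.

```python
# Toy Mixture of Experts layer: a router picks top-k experts per token,
# then merges their outputs with a weighted sum. All sizes are illustrative.
import numpy as np

rng = np.random.default_rng(2)
d, n_experts, top_k = 8, 4, 2            # toy sizes (assumed, not from any model)
W_gate = rng.normal(size=(d, n_experts)) # gating system / router weights
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # tiny expert nets

def moe_layer(x):
    """Route each token to its top-k experts; only those experts run."""
    out = np.zeros_like(x)
    for i, tok in enumerate(x):
        logits = tok @ W_gate            # router scores one token against all experts
        chosen = np.argsort(logits)[-top_k:]      # choose experts (top-k)
        w = np.exp(logits[chosen])
        w /= w.sum()                     # importance scores for the weighted sum
        for weight, e in zip(w, chosen):
            out[i] += weight * np.tanh(tok @ experts[e])  # expert computation
    return out                           # combined output, one vector per token

tokens = rng.normal(size=(3, d))         # 3 already-embedded tokens
y = moe_layer(tokens)
print(y.shape)
```

The efficiency claim in the post comes from the routing: each token pays for only `top_k` of the `n_experts` matrix multiplies, so total parameters can grow without growing per-token compute proportionally.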
