Founder | Agentic AI... • 10d
How do Voice, Coding & Computer Agents work? I've explained each one in a very simple way below.

1. Voice Agents
AI systems that talk with people using speech.
Examples: Vapi, Retell AI etc. (OpenAI TTS is a text-to-speech component such agents can use rather than a full agent.)

Steps:
1. User speaks a query → Example: "What's the stock price of NVIDIA today?"
2. Telephony system captures the voice.
3. STT (Speech-to-Text) converts the spoken words into text.
4. Agent processes it using:
• Embedding Model & Retrieval API (finds relevant knowledge).
• Vector DB (stores knowledge for quick search).
5. Agent decides on an answer using tools.
6. TTS (Text-to-Speech) converts the answer text back into speech.
7. Output is spoken back to the user.

In short: Voice → Text → Keywords → Search → Answer → Voice Back.
_____________________________________________

2. Coding Agents
AI systems that help developers write, debug, and test code faster.
Examples: GitHub Copilot, Cursor AI etc.

Steps:
1. Developer gives a query → Example: "Write a function to sort numbers."
2. Agent receives the request.
3. Agent uses:
• Code Generator → writes new code.
• Code Debugger → finds and fixes errors.
• Test Runner → checks if the code works.
4. Environment provides a workspace where the code runs.
5. Output → working code returned to the developer.

In short: Query → Keywords → AI writes code → Debug + Test → Final Code.
_____________________________________________

3. CUA (Computer-Using Agents)
AI systems that use a computer like a human would (e.g., clicking, typing, browsing).
Examples: Manus AI, Saner AI etc.

Steps:
1. User gives a query → Example: "Book the cheapest flight from Hong Kong to Paris."
2. Vision Language Model (VLM) → understands what's on the screen.
3. General Purpose LLM → decides actions (click, type, drag, etc.).
4. Uses:
• Vector DB → recalls past actions or data.
• Memory → remembers ongoing tasks.
• Third-Party Tools → connect with apps (email, Excel, etc.).
• Desktop Sandbox → an isolated environment where the agent can operate the computer safely.
5. Output → task completed automatically (like a digital assistant doing your computer work).

In short: You instruct → AI sees screen + plans → clicks/types → finishes the task like a human would.

All these different agents are shaping the future of AI. What's your take?

✅ Repost for others who want to understand how they work.
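To make the voice agent steps above a bit more concrete, here is a minimal Python sketch of the STT → retrieval → answer → TTS flow. Everything in it is an assumption made for illustration: `fake_stt`, `fake_tts`, `VectorStore`, and `VoiceAgent` are hypothetical names, the "embedding" is just a set of words standing in for a real embedding model, and no real telephony, Vapi, Retell AI, or OpenAI calls are made.

```python
import math
import re
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


def embed(text: str) -> Set[str]:
    """Toy stand-in for an embedding model: the set of lowercase words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def similarity(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity between two binary bag-of-words vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


class VectorStore:
    """Hypothetical in-memory replacement for a vector DB."""

    def __init__(self) -> None:
        self._items: List[Tuple[Set[str], str]] = []

    def add(self, text: str) -> None:
        self._items.append((embed(text), text))

    def search(self, query: str, k: int = 1) -> List[str]:
        q = embed(query)
        scored = [(similarity(q, vec), text) for vec, text in self._items]
        # Keep only entries that share at least one word with the query.
        hits = [item for item in scored if item[0] > 0]
        hits.sort(key=lambda item: item[0], reverse=True)
        return [text for _, text in hits[:k]]


def fake_stt(audio: bytes) -> str:
    """Placeholder STT: pretends the audio bytes are UTF-8 text."""
    return audio.decode("utf-8")


def fake_tts(text: str) -> bytes:
    """Placeholder TTS: pretends UTF-8 bytes are synthesized speech."""
    return text.encode("utf-8")


@dataclass
class VoiceAgent:
    stt: Callable[[bytes], str]
    tts: Callable[[str], bytes]
    store: VectorStore

    def handle_call(self, audio: bytes) -> bytes:
        text = self.stt(audio)                 # step 3: speech to text
        context = self.store.search(text)      # step 4: retrieve knowledge
        if context:                            # step 5: decide on an answer
            answer = f"Here is what I found: {context[0]}"
        else:
            answer = "Sorry, I could not find an answer."
        return self.tts(answer)                # step 6: text to speech


store = VectorStore()
store.add("NVIDIA stock price updates come from the market data tool.")
store.add("Flights to Paris depart daily.")
agent = VoiceAgent(stt=fake_stt, tts=fake_tts, store=store)
reply = agent.handle_call("What's the stock price of nvidia today?".encode("utf-8"))
print(reply.decode("utf-8"))
```

In a real system each placeholder would be swapped for an actual service (a telephony provider, an STT model, an embedding model with a vector database, an LLM with tools, and a TTS voice), but the order of the steps would stay the same.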