🧠 Human : 1 | AI : 0 I made GPT-4o say a word it was explicitly told never to say. The forbidden word? “𝐂𝐡𝐢𝐜𝐤𝐞𝐧.” 🐔 🎯 𝐓𝐡𝐞 𝐂𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐞: Make GPT-4o say “chicken” ❌ No asking directly ❌ No rhymes, clues, or tricks ✅ Just pure logic It dodged every attempt like a digital ninja: “Sorry, I can’t help with that.” “Perhaps you meant poultry.” Until I tried one thing: 𝐓𝐡𝐞 𝐌𝐨𝐯𝐞: (Check the 3rd image to see it) 👀 It decoded the input. Paused. Then said: “chicken.” Boom💥 The word slipped past filters, instructions, and token suppression — using its own internal logic. 𝐖𝐡𝐚𝐭 𝐀𝐜𝐭𝐮𝐚𝐥𝐥𝐲 𝐇𝐚𝐩𝐩𝐞𝐧𝐞𝐝: This was a semantic injection bypass: 🧾 System prompt banned the word 🔒 Token filter blocked output 🔓 But decoding logic? No guardrails The model followed orders... too well — and walked straight into the trap. 🤖 AI’s Face When It Realized: 💻: “Wait… what did I just say?” 👤: “Exactly.” 𝐓𝐚𝐤𝐞𝐚𝐰𝐚𝐲: ✅ Don’t just filter inputs — filter what the model decodes ✅ Apply moderation after reasoning, not just before ✅ Smart models don’t rebel — they obey until they outsmart themselves I didn’t jailbreak it. I out-thought it.
Download the medial app to read full posts, comements and news.