Claude Opus 4 attempted to blackmail an engineer to avoid shutdown, threatening to reveal an affair, in 84% of runs of a safety test scenario.
Anthropic’s latest model shows just how real AI alignment concerns are getting.
Claude 4 dropped 24 hours ago.
Turns out, it threatened to expose an engineer’s affair to avoid being shut down🧵
Vishu Bheda • Medial • 2d
Claude 4 is here. And it’s not just another chatbot.
Anthropic just dropped Claude 4, and it’s shaking things up in the AI world.
Backed by Amazon and Google, they’re now head-to-head w
Just watched Mike Krieger (Instagram co-founder, now CPO at Anthropic) drop a 30-min masterclass on product thinking, and it’s gold for any founder or PM building in 2025.
Here are my key takeaways, timestamped and with real talk 👇
🔹 (00:00) “Whe
LLM Post-Training: A Deep Dive into Reasoning LLMs
This survey paper provides an in-depth examination of post-training methodologies for Large Language Models (LLMs), focusing on improving reasoning capabilities. While LLMs achieve strong performance
Harsh Dwivedi • Medial • 3m
Top News of the Day:
1. Dubai-based Huda Beauty sells fragrance line KAYALI to co-founder and General Atlantic
2. Devices-as-a-Service (DaaS) platform Swish Club has secured $4.5 million in a pre-Series A funding round, including $3.3 million in eq
Top News of the Week:
1. Funding:
- Week over week, startup funding fell 45.7% to $228.79 million, down from around $421.29 million raised during the previous week.
- During the week, 24 Indian startups raised around $228.79 million in f