It's tested and rated on various benchmarks covering tokens, understanding, reasoning, etc.
More like this
Recommendations from Medial
PRATHAM
Apple • 1y
What Are Your Thoughts on NFTs (Non-Fungible Tokens)? 🤔
They're based on blockchain technology: an NFT may be a unique token or a piece of digital art that is traded. The token certifies your ownership of the art or token. People say it's the future. But I think it's
OpenAI launches o3-mini, a new AI reasoning model on Friday.
Here are the highlights -
-> More reliable: Fact-checks before responding, excelling in STEM fields like programming, math, and science.
-> Faster & cheaper: 63% lower cost than o1-min
Chamarti Sreekar
Passionate about Pos... • 3m
Google's Project Mariner is Here
It is an AI agent that browses the web for you.
AI agents to automate web tasks like navigation, form-filling, and decision-making.
With real-time interaction and multimodal understanding, it enhances productivity, acc
Chetan Bhosale
Software Engineer | ... • 10m
Understanding Access Tokens and Refresh Tokens in an Indian Scenario 🇮🇳:
Access Token: Think of it as a cinema ticket 🎟️. It allows you to use an app (like a banking app) for a specific period.
Refresh Token: Imagine having a special pass 🏷️. W
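A minimal sketch of the usual access/refresh token pattern, assuming the refresh token is what you exchange for a new access token once the old one expires (the common convention in OAuth 2.0-style flows). `TokenService`, the TTL values, and the in-memory dicts are all hypothetical and for illustration only; a real banking app would use signed tokens and persistent storage.

```python
import secrets
from dataclasses import dataclass, field

ACCESS_TTL = 15 * 60          # assumed: the "cinema ticket" is valid for 15 minutes
REFRESH_TTL = 7 * 24 * 3600   # assumed: the "special pass" is valid for 7 days


@dataclass
class TokenService:
    """Hypothetical in-memory token issuer (illustration only)."""
    access: dict = field(default_factory=dict)   # token -> (user, expires_at)
    refresh: dict = field(default_factory=dict)  # token -> (user, expires_at)

    def login(self, user, now):
        """Issue a short-lived access token and a longer-lived refresh token."""
        access_token = secrets.token_urlsafe(16)
        refresh_token = secrets.token_urlsafe(16)
        self.access[access_token] = (user, now + ACCESS_TTL)
        self.refresh[refresh_token] = (user, now + REFRESH_TTL)
        return access_token, refresh_token

    def is_valid(self, access_token, now):
        """An access token is usable only until its expiry time."""
        entry = self.access.get(access_token)
        return entry is not None and now < entry[1]

    def renew(self, refresh_token, now):
        """Trade a still-valid refresh token for a fresh access token."""
        entry = self.refresh.get(refresh_token)
        if entry is None or now >= entry[1]:
            raise PermissionError("refresh token missing or expired; log in again")
        user, _ = entry
        access_token = secrets.token_urlsafe(16)
        self.access[access_token] = (user, now + ACCESS_TTL)
        return access_token
```

Passing `now` in explicitly (instead of calling `time.time()` inside) keeps the sketch easy to test: log in at `now=0`, check the access token after `ACCESS_TTL` seconds to see it rejected, then call `renew` with the refresh token to get a working one again.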
Alibaba has unveiled QwQ-32B-Preview, a new “open” AI model that stands out for its logical reasoning and problem-solving capabilities.
With 32.5 billion parameters and the ability to process up to 32,000 words of context, QwQ is already making wave
Comet
#uiux designer #free... • 2m
China is moving VERY fast… 🚀 First DeepSeek, now Kimi – and it’s FREE with unlimited usage!
They claim it BEATS GPT-4o and 3.5 Sonnet on multiple benchmarks. 🤯 Real-time web search, advanced reasoning, 50-file analysis – ALL FOR FREE. Is OpenAI
I am 100% sure all the LLM benchmarks are, well, let's just say incomplete: they look good hypothetically, but they just don't work in real-world scenarios.
We need domain- and industry-specific benchmarks, and we need them now.
Anyone creating anything lik
Aakash kashyap
Building JalSeva and... • 5m
AI Model Performance Benchmarks: 🚀
Comparing Claude, GPT-4o, and Gemini Across Key Tasks
Claude 3.5 Sonnet performs best overall, especially in code (93.7%) and reasoning (65.0%). Gemini 1.5 Pro excels in math (86.5% with 4-shot CoT). GPT-
I want a list of reasoning questions that OpenAI o1 and/or DeepSeek R1 fail to answer correctly. Quick help would be much appreciated.
I'm working on something and want to test its reasoning capabilities.