I am 100% sure all the LLM benchmarks are, well, let's just say incomplete. They just don't work in real-world scenarios; they only do well hypothetically. We need domain- and industry-specific benchmarks, and we need them now. Is anyone creating anything like that?