Multiple Choice Science

The pitfalls of multiple-choice questions in generative AI and medical education

The performance of Large Language Models (LLMs) on multiple-choice question (MCQ) benchmarks is frequently cited as proof of their medical capabilities. We hypothesized that LLM performance on medical ...

Tech Times

OpenAI Life Science Benchmark Reveals AI Passes Only 1 in 3 Scientific Research Tasks

AI life science benchmark LifeSciBench, published June 17 by OpenAI with 173 PhD scientists, shows frontier models clear only ...
