Is it just me, or are they very carefully not reporting performance on GPT-5.4 Pro, only on the default GPT-5.4? They also very carefully left Anthropic models out of their comparison.
I went back to the BixBench benchmark they mentioned. I couldn't find official results for Anthropic models, but I found a project that takes Opus 4.6 from 65.3% to 92.0% (which would be above GPT-Rosalind) using nearly 200 carefully crafted skills [1]. There also appear to be competitor models with scores on par with this tuned GPT.
BixBench seems like a really interesting and useful idea, but most of its value for a layperson (like me) is in comparing how different models score on it. From what I can find, there is no centralised, up-to-date set of model results. Shame.
an0malous 5 mins ago
“GPT-5 is the first time that it really feels like talking to an expert in any topic, like a PhD-level expert.”
For me too, it was around that time last year, with GPT-5, Claude Sonnet 4.5 and then Gemini 3, that I started feeling these models were clearly becoming great at reasoning. I'm not at all opposed to saying that they are around PhD level in at least some domains.
kmaitreys 5 mins ago
I think there's a big difference between sounding like someone and being someone. The models are indeed excellent at pretending.
furyofantares 5 mins ago
I'm all for naming things in honor of Rosalind Franklin, but this seems like incredibly misplaced hubris instead.
peyton 5 mins ago
> GPT‑Rosalind is now available … for qualified customers …
It’s kind of gross to make money off her name (if that’s what’s happening) posthumously. It’s a complicated story anyway. IIRC her sister referred to it as “the Cult of Rosalind” when people were cashing in on books about her.
bombcar 5 mins ago
I'd rather the AI companies make up names, or call their products things like "Clod", than use my name (if they were to ask), because no matter how good it looks today, eventually it'll become some form of laughingstock.
Sanzig 5 mins ago
Claude is most likely a nod to Claude Shannon, father of information theory and an early AI pioneer.
bombcar 5 mins ago
The real hubris will be naming a model Turing, or Alan if you’re a bit more discreet.
huslage 5 mins ago
I work for a life sciences company. It will be a long time before anyone trusts a generative model to do the actual science when mathematically provable models are as good as they are today. There is room for AI in the field, but it's not in the science directly.
modeless 5 mins ago
The voiceover in the promo video on this page seems to be AI generated, with some weird artifacts. Right at the beginning it sounds like it says "cormbiying structure daya retrieval and lirrachure search".
jostmey 5 mins ago
The real issue isn’t finding therapies but getting them tested in clinical trials.
XenophileJKO 5 mins ago
I would argue that as long as trials still fail, there is room to improve trial vetting.
tonfreed 5 mins ago
Who's at fault when it suggests feeding someone cyanide?
falcor84 5 mins ago
> We want to make these capabilities available to the scientists and research organizations best positioned to advance human health, while maintaining strong safeguards against biological misuse. The Life Sciences model is launching through a trusted-access deployment structure for qualified Enterprise customers in the U.S. to start, with controls around eligibility, access management, and organizational governance.
I'm absolutely ok with a legitimate lab scientist conducting biochemical research getting suggestions about substances that are generally considered dangerous but might be appropriate for their study, and it'll be up to the scientist to discern whether it is indeed appropriate to use.
34pasKj 5 mins ago
[flagged]
mrcwinn 5 mins ago
Is society's behavior determined by the administration? Odd way to live your life. This model is a tool, not a servant, but in any case I think paying homage to someone who made incredible contributions is a positive. Eye of the beholder, I suppose.
3asuH 5 mins ago
[flagged]
ceejayoz 5 mins ago
> Rosalind, make me a coffee! There are other ways to pay homage.
Isn't this more akin to "Rosalind! You are a respected world-class expert! Can you help me?"
[1] https://github.com/jaechang-hits/SciAgent-Skills
Sam Altman, August 2025
https://www.bbc.com/news/articles/cy5prvgw0r1o