For those claiming they rigged it: do you have any concrete evidence? What if the models have just gotten really good?
I just asked Gemini Pro to generate an SVG of an octopus dunking a basketball and it did a great job. Not even the Deep Think model. Then I did "generate an svg of raccoon at a beach drinking a beer". You can go try this out yourself. Ask it to generate anything you want in SVG. Use your imagination.
Rant:
This is why AI is going to take over: folks are not even trying in the least.
JumpCrisscross [3 hidden]5 mins ago
> What if the models have just gotten really good?
Kagi Assistant remains my main way of interacting with AI. One of its benefits is you're encouraged to try different models.
The heterogeneity in competence, particularly per unit of time, is growing rapidly. If I'm extrapolating image-creation capabilities from Claude, I'm going to underestimate what Gemini can do without fuckery. Likewise, if I'm using Grok all day, Gemini and Claude will seem unbelievably competent when it comes to deep research.
vessenes [3 hidden]5 mins ago
Simon notes this benchmark is win-win, since he loves pictures of pelicans riding bicycles — if they spend time benchmaxxing it’s like free pelicans for him.
He originally promised to generate a bunch more animals when we got a “good” pelican. This is not a good pelican. This is an OUTSTANDING pelican, a great bicycle, and it even has a little sun ray over the ocean marked out. I’d like to see more animals please Simon!
rustyhancock [3 hidden]5 mins ago
Competition between models is so intense right now that they are definitely benchmaxxing pelican-on-bike SVGs and Will Smith spaghetti dinner videos.
It is not a full benchmark - rather a litmus test.
bayindirh [3 hidden]5 mins ago
So, again, when the indicator becomes a target, it stops being a good indicator.
JumpCrisscross [3 hidden]5 mins ago
> when the indicator becomes a target, it stops being a good indicator
But it's still a fair target. Unless it's hard coded into Gemini 3 DT, for which we have no evidence and decent evidence against, I'd say it's still informative.
kakugawa [3 hidden]5 mins ago
That's how you know you've made it: when your pet benchmark becomes a target.
rcbdev [3 hidden]5 mins ago
Goodhart's law in action.
yieldcrv [3 hidden]5 mins ago
Note that, this benchmark aside, they've gotten really good at SVGs. I used to rely on the nounproject for icons, and sometimes various libraries, but now coding agents just synthesize an SVG tag in the code and draw all the icons.
rcarmo [3 hidden]5 mins ago
I don't think this is a good "benchmark" anymore. It's probably on everyone's training set by now.
staticassertion [3 hidden]5 mins ago
I think it could still be an interesting benchmark. Like, assuming AI companies are genuinely trying to solve this pelican problem, how well do they solve it? That seems valid, and the assumption here is that the approach they take could generalize, which seems plausible.
manojlds [3 hidden]5 mins ago
It's funny how I can know where the post is from just by looking at the title (and it's not just about pelicans)
aidos [3 hidden]5 mins ago
The bicycles are getting pretty cyclable now. I’m enjoying this pelican that’s already sliced and ready to bbq.
throwaway333444 [3 hidden]5 mins ago
Since it’s a* FAQ… Also that pelican is pretty fly
bstsb [3 hidden]5 mins ago
read it aloud. “since it’s an FAQ”, where FAQ is pronounced “eff-ay-queue”
A pelican on a bike is SFW, inclusive, yet cool.