Hacker News

Gemini 3 Deep Think drew me a good SVG of a pelican riding a bicycle

94 points by stared - 17 comments

segmondy 5 mins ago

For those claiming they rigged it: do you have any concrete evidence? What if the models have just gotten really good?

I just asked Gemini Pro to generate an SVG of an octopus dunking a basketball and it did a great job, and that wasn't even the Deep Think model. Then I tried "generate an svg of raccoon at a beach drinking a beer". You can try this out yourself: ask it to generate anything you want in SVG. Use your imagination.

Rant: This is why AI is going to take over; folks aren't even trying in the least.

JumpCrisscross 5 mins ago

> What if the models have just gotten really good?

Kagi Assistant remains my main way of interacting with AI. One of its benefits is you're encouraged to try different models.

The heterogeneity in competence, particularly per unit of time, is growing rapidly. If I extrapolate image-creation capabilities from Claude, I'm going to underestimate what Gemini can do without fuckery. Likewise, if I'm using Grok all day, Gemini and Claude will seem unbelievably competent when it comes to deep research.

vessenes 5 mins ago

Simon notes this benchmark is win-win since he loves pictures of pelicans riding bicycles: if they spend time benchmaxxing it, it's like free pelicans for him.

He originally promised to generate a bunch more animals when we got a “good” pelican. This is not a good pelican. This is an OUTSTANDING pelican, a great bicycle, and it even has a little sun ray over the ocean marked out. I’d like to see more animals please Simon!

rustyhancock 5 mins ago

Competition between models is so intense right now that they are definitely benchmaxxing pelican-on-a-bike SVGs and Will Smith spaghetti dinner videos.

stared 5 mins ago

There was Lenna for digital image compression (https://en.wikipedia.org/wiki/Lenna).

A pelican on a bike is SFW, inclusive, yet cool.

It is not a full benchmark - rather a litmus test.

bayindirh 5 mins ago

So, again, when the indicator becomes a target, it stops being a good indicator.

JumpCrisscross 5 mins ago

> when the indicator becomes a target, it stops being a good indicator

But it's still a fair target. Unless it's hard-coded into Gemini 3 DT, for which we have no evidence and decent evidence against, I'd say it's still informative.

kakugawa 5 mins ago

That's how you know you've made it: when your pet benchmark becomes a target.

rcbdev 5 mins ago

Goodhart's law in action.

yieldcrv 5 mins ago

Note that, this benchmark aside, they've gotten really good at SVGs. I used to rely on the Noun Project for icons, and sometimes various libraries, but now coding agents just synthesize an SVG tag in the code and draw all the icons.

rcarmo 5 mins ago

I don't think this is a good "benchmark" anymore. It's probably on everyone's training set by now.

staticassertion 5 mins ago

I think it could still be an interesting benchmark. Like, assuming AI companies are genuinely trying to solve this pelican problem, how well do they solve it? That seems valid, and the assumption here is that the approach they take could generalize, which seems plausible.

manojlds 5 mins ago

It's funny how I can know where the post is from just by looking at the title (and it's not just about pelicans)

aidos 5 mins ago

The bicycles are getting pretty cyclable now. I’m enjoying this pelican that’s already sliced and ready to bbq.

throwaway333444 5 mins ago

Since it’s a* FAQ… Also that pelican is pretty fly

bstsb 5 mins ago

Read it aloud: “since it’s an FAQ”, where FAQ is pronounced “eff-ay-queue”.

bulletsvshumans 5 mins ago

They rigged it.