HN.zip

ML promises to be profoundly weird

244 points by pabs3 - 233 comments
glitchc [3 hidden]5 mins ago
> Claude launched into a detailed explanation of the differential equations governing slumping cantilevered beams. It completely failed to recognize that the snow was entirely supported by the roof, not hanging out over space. No physicist would make this mistake, but LLMs do this sort of thing all the time.

You have to meet some physicist friends of mine then. They are likely to assume that the roof is spherical and frictionless.

beders [3 hidden]5 mins ago
Thank you for putting it so succinctly.

I keep explaining to my peers, friends and family that what actually is happening inside an LLM has nothing to do with consciousness or agency, and that the term AI is just completely overloaded right now.

erichocean [3 hidden]5 mins ago
AI is exactly the right term: the machines can do "intelligence", and they do so artificially.

Just like we have machines that can do "math", and they do so artificially.

Or "logic", and they do so artificially.

I assume we'll drop the "artificial" part in my lifetime, since there's nothing truly artificial about it (just like math and logic); it's really just mechanical.

No one cares that transistors can do math or logic, and it shouldn't bother people that transistors can predict next tokens either.

mayama [3 hidden]5 mins ago
> AI is exactly the right term: the machines can do "intelligence", and they do so artificially.

AI in pop culture doesn't mean that at all. Most people's impression of AI before the LLM craze came from media based on Asimov's laws of robotics. Now that LLMs have taken over the world, they can define AI as anything they want.

ruszki [3 hidden]5 mins ago
In 2018, i.e. “pre-LLM”, the label “AI” was already stamped on everything, so I highly doubt that most people thought their washing machines were sentient in any way. I remember this starkly, because my team at Ericsson (at the time, about 120k employees) was responsible for one of the crucial steps in getting a model into production, and basically every single project wanted that stamp.

The meaning has been slowly diluted, more and more, across decades.

rudhdb773b [3 hidden]5 mins ago
> what actually is happening inside an LLM has nothing to do with consciousness or agency

What makes you think natural brains are doing something so different from LLMs?

hedgehog [3 hidden]5 mins ago
Structurally a transformer model is so unrelated to the shape of the brain that there's no reason to think they'd have many similarities. It's also pretty well established that the brain doesn't do anything resembling wholesale SGD (which, to spell it out, is evidence that it doesn't learn in the same way).
hackinthebochs [3 hidden]5 mins ago
>Structurally a transformer model is so unrelated to the shape of the brain there's no reason to think they'd have many similarities.

Substrate dissimilarities will mask computational similarities. Attention surfaces affinities between nearby tokens; dendrites strengthen and weaken connections to surrounding neurons according to correlations in firing rates. Not all that dissimilar.
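As an illustrative toy only (not a claim about real neuroscience, and with made-up sizes), the two update rules the comment compares can be sketched side by side: dot-product attention weights pairs of tokens by similarity, while a Hebbian-style rule strengthens connections in proportion to correlated firing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "attention" affinity: tokens attend to each other via dot products,
# so vectors that point the same way get the largest weights.
tokens = rng.normal(size=(4, 8))           # 4 token embeddings, dim 8
scores = tokens @ tokens.T                 # pairwise affinities
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Toy Hebbian rule: connections strengthen in proportion to correlated
# firing ("cells that fire together wire together").
rates = rng.random(4)                      # firing rates of 4 neurons
lr = 0.1
w = np.zeros((4, 4))
w += lr * np.outer(rates, rates)           # correlation-driven update

print(attn.shape, w.shape)
```

Both mechanisms end up weighting pairs of units by a similarity/correlation score, which is the affinity the comment is pointing at; everything around that score (normalization, learning dynamics, substrate) differs.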

rudhdb773b [3 hidden]5 mins ago
Sure the implementation details are different.

I suppose I should have asked: what definition of "consciousness and agency" are today's LLMs (with proper tooling) failing to meet?

And if today's models aren't meeting your standard, what makes you think that future LLMs won't get there?

qsera [3 hidden]5 mins ago
For starters, natural brains have the innate ability to differentiate between things they know and things they have no possibility of knowing...
rudhdb773b [3 hidden]5 mins ago
Modern LLMs are fairly good at that as well.
qsera [3 hidden]5 mins ago
But that is bolted on and is not a core behavior.
krainboltgreene [3 hidden]5 mins ago
Any amount of reading into how we understand brains and LLMs to work.
lamasery [3 hidden]5 mins ago
> People keep asking LLMs to explain their own behavior. “Why did you delete that file,” you might ask Claude. Or, “ChatGPT, tell me about your programming.”

Oh man, every business-side person in my company insists on reporting, all the way up to the UI, a "confidence score" that the LLM generates about its own output. I've seen enough to know not to get between an MBA and some metric they've decided they really want, even when I'm pretty sure the metric is meaningless nonsense, but... I'm pretty sure those are meaningless nonsense.

danieltanfh95 [3 hidden]5 mins ago
I think the discussion has to be more nuanced than this. "LLMs still can't do X so it's an idiot" is a bad line of thought. LLMs with harnesses are clearly capable of engaging with logical problems that only need text. LLMs are not there yet with images, but we are improving with UI and access to tools like figma. LLMs are clearly unable to propose new, creative solutions for problems it has never seen before.
Aperocky [3 hidden]5 mins ago
> LLMs are clearly unable to propose new, creative solutions for problems it has never seen before.

LLMs are incredibly useful but I'm not sure about this statement.

It proposes stuff that I haven't seen before, but I don't know whether it's new or creative relative to the entirety of collective human knowledge.

throwaway27448 [3 hidden]5 mins ago
> LLMs with harnesses are clearly capable of engaging with logical problems that only need text.

To some extent. It's not clear where specifically the boundaries are, but it seems to fail to approach problems in ways that aren't embedded in the training set. I certainly would not put money on it solving an arbitrary logical problem.

__alexs [3 hidden]5 mins ago
Solving arbitrary logical problems seems to be equivalent to solving the halting problem so you are probably wise not to make that bet.
senko [3 hidden]5 mins ago
> LLMs are not there yet with images

https://genai-showdown.specr.net/image-editing

There's been a lot of progress there; it's just that the LLM that's best for, say, coding isn't also going to be the best for image editing.

drob518 [3 hidden]5 mins ago
> "LLMs still can't do X so it's an idiot"

Let’s be careful. That’s a straw man. I don’t know anyone who says that. Aphyr says in the article that AIs can do things. But they have been marketed as “intelligent,” and I agree with Aphyr that the word is suggesting way more than AIs currently deliver. They do not reason and they do not think and are not truly intelligent. As the article says, they are big wads of linear algebra. Sometimes, that’s useful.

stickfigure [3 hidden]5 mins ago
I think it's too early to declare the Turing test passed. You just need to have a conversation long enough to exhaust the context window. Less than that, since response quality degrades long before you hit hard window limits. Even with compaction.

Neuroplasticity is hard to simulate in a few hundred thousand tokens.
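A minimal sketch of what "compaction" means here (hypothetical helper names, not any vendor's actual implementation): evict the oldest turns once the conversation outgrows a token budget, and replace them with a stub summary. Even with this, the model only ever sees a compressed shadow of the earlier conversation.

```python
# Naive compaction: keep a rolling window of recent turns and replace
# older ones with a stub "summary". A real harness would ask the model
# itself to write the summary; here we just stub it with a placeholder.
def compact(history, max_tokens, count_tokens=lambda t: len(t.split())):
    kept = list(history)
    dropped = []
    total = sum(count_tokens(t) for t in kept)
    while kept and total > max_tokens:
        dropped.append(kept.pop(0))        # evict oldest turn first
        total = sum(count_tokens(t) for t in kept)
    if dropped:
        kept.insert(0, f"[summary of {len(dropped)} earlier turns]")
    return kept
```

For example, a ten-turn conversation squeezed into a small budget keeps only the most recent turns plus the stub; everything older is gone, which is exactly the "neuroplasticity" the comment says is hard to simulate.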

zug_zug [3 hidden]5 mins ago
"You're absolutely right!"

I think for a while the test was passed. Then we learned the hallmark characteristics of these models, and now most of us can easily differentiate. That said -- these models are programmed specifically to be more helpful, more articulate, more friendly, and more verbose than people, so that may not be a fair expectation. Even so, I think if you took all of that away, you'd be able to differentiate the two, it just might take longer.

drob518 [3 hidden]5 mins ago
Right. I think the modern LLMs are quite good at mimicking human words, but we were initially taken in like we were in the 1960s by ELIZA. It’s a (increasingly sophisticated) magic trick, but it’s just a trick.
downboots [3 hidden]5 mins ago
It was not meant as a pass/fail
criley2 [3 hidden]5 mins ago
For as rigorous of a Turing test as you present, I believe many (or even most) humans would also fail it.

How many humans seriously have the attention span to have a million "token" conversation with someone else and get every detail perfect without misremembering a single thing?

nine_k [3 hidden]5 mins ago
But context window exhaustion does not look like mere forgetfulness; it looks more like a loss of general coherence, like getting drunk.
stickfigure [3 hidden]5 mins ago
Response quality degrades long before you hit a million tokens.

But sure, let's say it doesn't. If you interact with someone day after day, you'll eventually hit a million tokens. Add some audio or images and you will exhaust the context much much faster.

However, I'll grant you that Turing's original imitation game (text only, human typist, five minutes) is probably pretty close, and that's impressive enough to call intelligence (of a sort). Though modern LLMs tend to manifest obvious dead giveaways like "you're absolutely right!"

dairem [3 hidden]5 mins ago
Doesn't the Turing test require a human too, to be compared to the AI?
drob518 [3 hidden]5 mins ago
> It remains unclear whether continuing to throw vast quantities of silicon and ever-bigger corpuses at the current generation of models will lead to human-equivalent capabilities. Massive increases in training costs and parameter count seem to be yielding diminishing returns. Or maybe this effect is illusory. Mysteries!

I’m not even sure whether this is possible. The current corpus used for training includes virtually all known material. If we make it illegal for these companies to use copyrighted content without remuneration, either the task gets very expensive, indeed, or the corpus shrinks. We can certainly make the models larger, with more and more parameters, subject only to silicon’s ability to give us more transistors for RAM density and GPU parallelism. But it honestly feels like, without another “Attention is All You Need” level breakthrough, we’re starting to see the end of the runway.

munificent [3 hidden]5 mins ago
There is a whole giant essay I probably need to write at some point, but I can't help but see parallels between today and the Industrial Revolution.

Prior to the industrial revolution, the natural world was nearly infinitely abundant. We simply weren't efficient enough to fully exploit it. That meant that it was fine for things like property and the commons to be poorly defined. If all of us can go hunting in the woods and yet there is still game to be found, then there's no compelling reason to define and litigate who "owns" those woods.

But with the help of machines, a small number of people were able to completely deplete parts of the earth. We had to invent giant legal systems in order to determine who has the right to do that and who doesn't.

We are truly in the Information Age now, and I suspect a similar thing will play out for the digital realm. We have copyright and intellectual property law already, of course, but those were designed presuming a human might try to profit from the intellectual labor of others. With AI, we're in the industrial era of the digital world. Now a single corporation can train an AI using someone's copyrighted work and in return profit off the knowledge over and over again at industrial scale.

This completely upends the tenuous balance between creators and consumers. Why would a writer put an article online if ChatGPT will slurp it up and regurgitate it back to users without anyone ever even finding the original article? Who will contribute to the digital commons when rapacious AI companies are constantly harvesting it? Why would anyone plant seeds on someone else's farm?

It really feels like we're in the soot-covered child-coal-miner Dickensian London era of the Information Revolution and shit is gonna get real rocky before our social and legal institutions catch up.

arjie [3 hidden]5 mins ago
If I'm being honest, I've never related to that notion of remuneration and credit being the primary reason to write something. I don't claim to be some great writer or anything, but I do have a blog I write quite often on (though I'm traveling in my wife's Taiwan now and haven't updated it in a while). But for me, I write because it feels good to do so. Sometimes there's a group utility in things like I edit a Google Maps listing to be correct even though "a faceless corporation is going to hoover up my work and profit off it without paying me for my work" and I might pick up a Lime bike someone's dropped into the sidewalk even though "a faceless corporation is externalizing the work of organizing the proper storage of their property on public land without paying the workers" or so on.

I just think it's nice to contribute to the human commons and it's fine if some subset of my fellow organisms uses it in whatever way. Realistically, the fact that Brewster Kahle is paid whatever few hundred thousand he's paid for managing a non-profit that only exists because it aggregates other people's work isn't a problem for me. Or that Larry Page and Sergey Brin became ultra-rich around providing a search interface into other people's work. Or that Sam Altman and Dario Amodei did the same through a different interface.

This particular notion doesn't seem to be a post-AI trend. It seems to have happened prior to the big GPTs coming out where people started doing a lot of this accounting for contribution stuff. One day it'll be interesting to read why it started happening because I don't recall it from the past. Perhaps I just wasn't super plugged in to the communities that were complaining about Red Hat, Inc.

It's not that I don't understand if I sold my Subaru to a guy who immediately managed to sell it to another guy for a million times the money. I get that. I'd feel cheated. But if I contributed a little to it, like I did so Google would have a site to list for certain keywords so that they could show ads next to it in their search results, I just find it so hard to be like "That's my money you're using. Pay me!".

wat10000 [3 hidden]5 mins ago
You do it as a hobby, that's fine. Some people do it for a living. And while they aren't owed a living doing that specific thing, it is going to be a big problem for them if they can't make money at it anymore.

I'm sure plenty of people feel the same way about software. They make software as a hobby and don't care about remuneration or credit. Meanwhile I write software for my day job and losing the ability to make money from it would be devastating.

arjie [3 hidden]5 mins ago
Ah, I see. It’s just straightforward protectionism like dockworkers opposing automation and so on. That I do comprehend, in fact.

I write software too and I may no longer be able to just do it in the old way. Pretty scary world but also exciting. I can’t imagine trying to restrict LLM software writers on that basis but I can comprehend it as simply self-interest.

Fair enough.

wat10000 [3 hidden]5 mins ago
Do you make money writing software? I bet you either try to restrict LLM usage or assign your rights to an employer who does. Putting code in the public domain is pretty rare, and extremely rare for paid work.
steveklabnik [3 hidden]5 mins ago
As you know, I deeply respect you. Not trying to argue here, just provide my own perspective:

> Why would a writer put an article online if ChatGPT will slurp it up and regurgitate it back to users without anyone ever even finding the original article?

I write things for two main reasons: I feel like I have to. I need to create things. On some level, I would write stuff down even if nobody reads it (and I do do that already, with private things.) But secondly, to get my ideas out there and try to change the world. To improve our collective understanding of things.

A lot of people read things, it changes their life, and their life is better. They may not even remember where they read these things. They don't produce citations all of the time. That's totally fine, and normal. I don't see LLMs as being any different. If I write an article about making code better, and ChatGPT trains on it, and someone, somewhere, needs help, and ChatGPT helps them? Win, as far as I'm concerned. Even if I never know that it's happened. I already do not hear from every single person who reads my writing.

I don't mean to say that everyone has to share my perspective. It's just my own.

munificent [3 hidden]5 mins ago
Agreed, totally! I still write and put stuff online.

But it definitely feels different now. It used to feel like I was tending a public garden filled with other people who might enjoy it. It still kind of feels like that, but there are a handful of giant combine machines grinding their way around the garden harvesting stuff and making billionaires richer at the same time.

It's not enough to dissuade me from contributing to the public sphere, but the vibe is definitely different.

Honestly, it reminds me a lot about the early days of Amazon. It's hard to remember how optimistic the world felt back then, but I remember a time when writing reviews felt like a public good because you were helping other people find good products. It was like we all wanted honest product information and Amazon provided a neutral venue for us to build it. Like Wikipedia for stuff.

But as Amazon got bigger and bigger and the externalities more apparent, it felt less like we were helping each other and more like we were helping Bezos buy yet another yacht or media empire. And as the reviews got more and more gamed by shady companies, they became less of a useful public good. The whole commons collapsed.

I worry that the larger web and digital knowledge environment is going that way.

I still intend to create and share my stuff with the world because that's who I want to be. But I'll always miss the early days of the web where it felt like a healthier environment to be that kind of person in.

steveklabnik [3 hidden]5 mins ago
I can totally see that, for sure. I was much more likely to write a review long ago, now I don't even bother. (For buying stuff online, at least.) Maybe I lost my innocence about this stuff a long time ago, and so it's not so much LLMs that broke it for me, but maybe... I dunno, the downfall of Web 2.0 and the death of RSS? I do think that the old internet, for some definition of "old," felt different. For sure. I'll have to chew on this. I certainly felt some shock on the IP questions when all of this came up. I'm from the "information wants to be free" sort of persuasion, and now that largely makes me feel kinda old.

Also I'm not a fan of billionaires, obviously, but I think that given I've worked on open source and tools for so long, I kinda had to accept that stuff I make was going to be used towards ends I didn't approve of. Something about that is in here too, I think.

(Also, I didn't say this in the first comment, but I'm gonna be thinking about the industrial revolution thing a lot, I think you're on to something there. Scale meaningfully changes things.)

lelanthran [3 hidden]5 mins ago
> I don't mean to say that everyone has to share my perspective. It's just my own.

I think you are walking all around the word "consent" and trying very hard to avoid it altogether.

Your perspective, because it refuses to include any sort of consent, is invalid. No perspective that refuses consent can be valid.

steveklabnik [3 hidden]5 mins ago
Consent is absolutely important, but that does not mean that every single thing in the entire world requires explicit consent. You did not ask me for consent to use my words in your comment. That does not mean you're a bad person.

Fair use is an important part of intellectual property law. If it did not exist, the powerful could, for example, stifle public criticism by declaring that they do not consent to you using their words or likeness. The ability to do that is important for society. It is also just generally important for creating works inspired by others, which is virtually every work. There have to be lines between cases where attribution is required and cases where it is not.

lelanthran [3 hidden]5 mins ago
> You did not ask me for consent to use my words in your comment.

I am not representing your words as mine. I am not using your words to profit off. I am not making a gain by attributing your words to you.

> There has to be lines for cases where requiring attribution is required, and cases where it is not.

You are blurring the lines between "using a quote or likeness" and "giving credit to". I am skeptical that you don't know the difference between the two.

Regardless, any "perspective" that disregards the need to acquire consent is invalid. Even if you are going to ignore it, you have to acknowledge that you don't feel you need any consent from the people you are taking from.

This whole "silence is consent" attitude is baffling.

steveklabnik [3 hidden]5 mins ago
You made an incredibly strong statement that is much broader than what we are talking about. I am pointing out various cases where I think that broadness is incorrect, I am not equating the two.

I do not think that, if you read, say, https://steveklabnik.com/writing/when-should-i-use-string-vs... , and then later, a friend asks you "hey, should I use String or &str here?" that you need my consent to go "at the start, just use String" instead of "at the start, just use String, like Steve Klabnik says in https://steveklabnik.com/writing/when-should-i-use-string-vs... ". And if they say "hey that's a great idea, thank you" I don't think you're a bad person if you say "you're welcome" without "you should really be saying welcome to Steve Klabnik."

It is of course nice if you happen to do so, but I think framing it as a consent issue is the wrong way to think about it.

We recognize that this is different than simply publishing the exact contents of the blog post on your blog and calling it yours, because it is! To me, an LLM is a transformative derivative work, not an exact copy. Because my words are not in there, they are not being copied.

But again, I am not telling anyone else that they must agree with me. Simply stating my own relationship with my own creative output.

cjcole [3 hidden]5 mins ago
"but I can't help but see parallels between today and the Industrial Revolution"

You're not the only one.

The current Pope Leo XIV explicitly named himself after the previous Leo, Pope Leo XIII, who was pope during the Industrial Revolution (1878-1903) and issued the influential Encyclical Rerum novarum (Rights and Duties of Capital and Labor) in response to the upheaval.

“Pope Leo XIII, with the historic Encyclical Rerum novarum, addressed the social question in the context of the first great industrial revolution,” Pope Leo recalled. “Today, the Church offers to all her treasure of social teaching in response to another industrial revolution and the developments of artificial intelligence.” A name, then, not only rooted in tradition, but one that looks firmly ahead to the challenges of a rapidly changing world and the perennial call to protect those most vulnerable within it.

https://www.vatican.va/content/leo-xiii/en/encyclicals/docum...

https://www.vaticannews.va/en/pope/news/2025-05/pope-leo-xiv...

konschubert [3 hidden]5 mins ago
> Prior to the industrial revolution, the natural world was nearly infinitely abundant.

The opposite is true. Central Europe was almost devoid of trees. Food was scarce as arable land bore little fruit without fertiliser.

Society was Malthusian until the Industrial Revolution.

pocksuppet [3 hidden]5 mins ago
Stuff gets put online when the reader isn't the customer. Someone is paying for a reader to be told certain things. So it's free at the point of reading.
drob518 [3 hidden]5 mins ago
A couple thoughts…

Mostly, AIs don’t recite back various works. Yes, there are a couple of high-profile cases where people were able to get an AI to regurgitate pieces of New York Times articles and Harry Potter books, but mostly not. Mostly, it is as if the AI is your friend who read a book and gives you a paraphrase, possibly using a couple of sentences verbatim. In other words, it probably falls under fair use.

Secondly, given the modern world, content that doesn’t appear online isn’t consumed much, so creators who are doing it for the money will certainly continue putting content online. Much of that content will be generated by AIs, however.

triceratops [3 hidden]5 mins ago
You're missing the point. This is the crux of munificent's argument IMO (and I've made variations of it as well)

> We have copyright and intellectual property law already, of course, but those were designed presuming a human might try to profit from the intellectual labor of others.

You getting a summary of a copyrighted work from a friend is necessarily limited by the number of friends you have, the amount of time they have to read stuff and talk to you, and so on. Machines (and AIs) don't have any of those limitations.

drob518 [3 hidden]5 mins ago
Yes, true. But does that really shift the argument much? An AI is like the most well-read book nerd you’ve ever met. The AI has read everything. They still won’t recite Harry Potter for you at full length and reading what the original author wrote is part of the pleasure.
nrabulinski [3 hidden]5 mins ago
Does a literal book nerd profit megacorporations when they bring up books to you? While burning through a household's worth of energy in the process? Also, I'd like to talk with such a book nerd, because they'd have opinions on books; if I brought up something I'd read, we could exchange thoughts about it, and they could make recommendations based on their complex experiences instead of statistics from Reddit comments. An LLM can do none of those things, while still doing the former (enriching megacorporations). It's a lose-lose.

Also, a book nerd doesn't need roughly all human-created text to train on before producing meaningful results. It's just such a misplaced analogy, and people have been making it ever since OpenAI announced ChatGPT for the first time. Why do people think "an LLM is just a human who read a lot"?

bluefirebrand [3 hidden]5 mins ago
> It really feels like we're in the soot-covered child-coal-miner Dickensian London era of the Information Revolution and shit is gonna get real rocky before our social and legal institutions catch up

The really discouraging part of this is that it feels like our social and legal institutions don't even care if they catch up or not.

Technology is speeding up and the lag time before anything is discussed from a legal standpoint is way, way too long

xmprt [3 hidden]5 mins ago
I see a lot of researchers working on newer ideas so I wouldn't be surprised if we get a breakthrough in 5-10 years. After all, the gap between AlexNet and Attention is All You Need was only 6 years. And then Scaling Laws was about 3-4 years after that. It might seem like not much progress is being made but I think that's in part because AI labs are extremely secretive now when ideas are worth billions (and in the right hands, potentially more).

Of course 5-10 years is a long time to bang our heads against the wall with untenable costs but I don't know if we can solve our way out of that problem.

supliminal [3 hidden]5 mins ago
The echoes of A.I. winter.
embedding-shape [3 hidden]5 mins ago
> I’m not even sure whether this is possible.

Based on what's happened so far, maybe. At least that's exactly how we got to the current iteration back in 2022/2023: quite literally "let's see what happens when we throw an enormous amount of data at them while training" worked out up until one point; then post-training seems to have taken over, which is where labs currently differ.

drob518 [3 hidden]5 mins ago
Right, but we played the scaling card and it worked but is now reaching limits. What is the next card? You can surely argue that we can find a new one at any time. That’s the definition of a breakthrough. I just don’t see one at the moment.
embedding-shape [3 hidden]5 mins ago
> I just don’t see one at the moment.

Did you see the one before the current one was even found? Things tend to look easy in hindsight, and borderline impossible trying to look forward. Otherwise it sounds like you're in the same spot as before :)

drob518 [3 hidden]5 mins ago
That’s what I said. Breakthroughs happen. No doubt about it, and they are unpredictable; hence "breakthrough". But right now we’re using up runway with nothing yet identified to take us to the next level. And while sometimes breakthroughs happen, sometimes they don’t.
functional_dev [3 hidden]5 mins ago
better tooling and integration
htrp [3 hidden]5 mins ago
We pay people to create more high quality tokens (mercor, turing) which are then fed into data generating processes (synthetic data) to create even more tokens to train on
drob518 [3 hidden]5 mins ago
But does that really help, or do you get distortion? The frequency distribution of human generated content moves slowly over time as new subjects are discussed. What frequency distribution do those “data generating processes” use? And at root, aren’t those “data generating processes” basically just another LLM (I.e., generating tokens according to a probability distribution)? Thus, aren’t we just sort of feeding AI slop into the next training run and humoring ourselves by renaming the slop as “synthetic data?” Not trying to be argumentative. I’m far from being an AI expert, so maybe I’m missing it. Feel free to explain why I’m wrong.
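The worry about distortion in the paragraph above can be made concrete with a toy "estimate, sample, re-estimate" loop (purely illustrative, not any lab's actual pipeline): fit a distribution to data, generate a synthetic corpus from the fit, then refit on the synthetic corpus. Each generation only sees the previous generation's approximation, so estimation noise compounds.

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.normal(0.0, 1.0, size=1000)     # the original "human" corpus

# Toy collapse loop: each "training run" fits a Gaussian to the previous
# run's synthetic output, never touching the original data again.
sigma_history = []
for _ in range(50):
    mu, sigma = data.mean(), data.std()
    sigma_history.append(sigma)
    data = rng.normal(mu, sigma, size=200) # next generation sees only slop
```

The fitted spread drifts as a noisy random walk rather than staying pinned to the true value, which is one way to picture "distortion": nothing in the loop anchors the synthetic distribution back to the original one. Careful synthetic-data pipelines exist precisely to add that anchor back.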
htrp [3 hidden]5 mins ago
That's the problem in a nutshell. There is an art to how you generate the synthdata so that you don't get crappy trained models (especially when mistakes cost XX million dollars).

It's also, in theory, why Facebook paid $14bn for Alex Wang and Scale AI.

krainboltgreene [3 hidden]5 mins ago
> The current corpus used for training includes virtually all known material.

This is just totally incorrect. It's one of those things everyone just assumes, but there's an immense amount of known material that isn't even digitized, much less in the hands of tech companies.

drob518 [3 hidden]5 mins ago
What large caches of undigitized content exist? Surely not everything has been digitized, but I can’t imagine it’s much in percentage terms.
cgh [3 hidden]5 mins ago
The Vatican Library contains roughly 1.1 million printed books and around 75,000 codices, only a small percentage of which have been digitised.
drob518 [3 hidden]5 mins ago
Which is what percent of the world’s content? 0.000000001% or something similar. It’s nothing in the scheme of things. To put it another way, if we were to digitize that content and train on it, our AIs would not get noticeably better in any way. It doesn’t move the needle.
dwallin [3 hidden]5 mins ago
Some people point at LLMs confabulating, as if this wasn’t something humans are already widely known for doing.

I consider it highly plausible that confabulation is inherent to scaling intelligence. In order to run computation on data whose dimensionality makes direct computation infeasible, you will most likely need to create a lower-dimensional representation and do the computation on that. Collapsing the dimensionality is going to be lossy, which means there will be gaps between what the model thinks reality is and what it actually is.
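The lossiness argument above can be sketched numerically (an illustrative toy with arbitrary sizes): project high-dimensional data onto a few components via a truncated SVD and the reconstruction necessarily differs from the original. Those residuals are the "gaps" between the compressed belief and reality.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))             # high-dimensional "reality"

# Compress to a rank-5 representation via truncated SVD, then reconstruct.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 5
X_hat = (U[:, :k] * s[:k]) @ Vt[:k]        # what the compressed model "believes"

gap = np.linalg.norm(X - X_hat)            # lossiness: the belief/reality gap
print(gap)
```

For generic (full-rank) data the gap is strictly positive for any k below the true rank: no low-dimensional summary can answer every question about the original data correctly.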

n4r9 [3 hidden]5 mins ago
The concern for me about LLMs confabulating is not that humans don't do it. It's that the massive scale at which LLMs will inevitably be deployed makes even the smallest confabulation extremely risky.
NiloCK [3 hidden]5 mins ago
I don't understand this. Many small errors distributed across a large deployment sounds a lot like the normal mode of error-prone humans / cogs / whatevers distributed over a wide deployment.
n4r9 [3 hidden]5 mins ago
Let's say a given B2B system deployment typically requires 100 custom behaviours/scripts and 3 years worth of effort. A team of ten people can execute such a deployment in 3-4 months. The team has the capacity to fix up issues caused by small human errors as they arise, since they show up roughly once a week.

With the advent of LLMs, a new deployment now takes 3 days. Consequently, errors requiring human attention crop up several times a day.

xmprt [3 hidden]5 mins ago
There's a difference between 1000 diverse humans with varied traits making errors that should cancel out because of the law of large numbers vs 10 AI with the same training data making errors that would likely correlate and compound upon each other.
GolfPopper [3 hidden]5 mins ago
I have yet to see a comparison of human vs. LLM confabulation errors at scale.

"Many small errors" makes a presumption about LLM confabulation/hallucination that seems unwarranted. Pre-LLM humans (and our computers) have managed vast nuclear arsenals, bioweapons research, and ubiquitous global transport - as a few examples - without any catastrophic mistakes, so far. What can we reasonably expect as a likely worst case scenario if LLMs replacing all the relevant expertise and execution?

krainboltgreene [3 hidden]5 mins ago
Your project vue-skuilder has 6 github action steps devoted to checking the work you do before it's allowed to go out. You do not trust yourself to get things right 100% of the time.

I am watching people trust LLM-based analysis and actions 100% of the time without checking.

bee_rider [3 hidden]5 mins ago
We shouldn’t try to build a worse version of a human. We should try to build a better compiler and encyclopedia.
logicprog [3 hidden]5 mins ago
We tried that. It was called Cyc. It never got even close to the level of capabilities a modern LLM has in an agentic harness — even on common sense and reasoning problems!
GolfPopper [3 hidden]5 mins ago
That sounds like a "get wealthy slowly" plan, while the LLM prophets are more focused on "get rich quick".
ghywertelling [3 hidden]5 mins ago
There are AI researchers who wrote blog posts that made the HN front page about spiky spheres (I won't link the original blog post making that claim, to avoid hurt feelings). Here's 3blue1brown correcting those AI/ML researchers' intuitions:

https://www.youtube.com/watch?v=fsLh-NYhOoU&t=3238s

throwaway27448 [3 hidden]5 mins ago
Humans can be reasoned with, though, and are capable of learning.
Frieren [3 hidden]5 mins ago
> Some people point at LLMs confabulating

No. LLMs do not confabulate, they bullshit. There is a big difference. AIs do not care, cannot care, have no capacity to care about the output. String tokens in, string tokens out. Even if they have all the data perfectly recorded, they will still fail to use it for a coherent output.

> Collapsing the dimensionality is going to be lossy, which means it will have gaps between what it thinks is the reality and what is.

Confabulation has to do with degradation of biological processes and information storage.

There is no equivalent in an LLM. Once the data is recorded, it will be recalled exactly the same, down to the bit. An LLM representation is immutable. You can download a model a thousand times, run it for ten years, etc., and the data is the same. The closest you get is if you store the data on a faulty disk, but that is not why LLM output is so awful; that would be a trivial problem to solve with current technology. (Like having a RAID and a few checksums.)

stronglikedan [3 hidden]5 mins ago
I don't even think they bullshit, since that requires conscious effort that they do not and cannot possess. They just simply interpret things incorrectly sometimes, like any of us meatbags.
thayne [3 hidden]5 mins ago
They make incorrect predictions of text to respond to prompts.

The neat thing about LLMs is they are very general models that can be used for lots of different things. The downside is they often make incorrect predictions, and what's worse, it isn't even very predictable to know when they make incorrect predictions.

lamasery [3 hidden]5 mins ago
I think this is leaning on the "lies are when you tell falsehoods on purpose; bullshit is when you simply don't care at all whether what you're saying is true" definition of bullshit. Cf. On Bullshit.

So, they can't lie, but they can (and, in fact, exclusively do) bullshit.

knowaveragejoe [3 hidden]5 mins ago
> No. LLMs do not confabulate they bullshit. There is a big difference. AIs do not care, cannot care, have not capacity to care about the output. String tokens in, string tokes out. Even if they have all the data perfectly recorded they will still fail to use it for a coherent output.

Isn't "caring" a necessary pre-requisite for bullshitting? One either bullshits because they care, or don't care, about the context.

marssaxman [3 hidden]5 mins ago
They're presumably referring to the Harry Frankfurt definition of bullshit: "speech intended to persuade without regard for truth. The liar cares about the truth and attempts to hide it; the bullshitter doesn't care whether what they say is true or false."
SoftTalker [3 hidden]5 mins ago
The bullshitter does have an objective in mind however. There is some ultimate purpose to his bullshitting. LLMs don't even have that. They just spew words.
dgb23 [3 hidden]5 mins ago
Thought of the same book when reading the above.
simianwords [3 hidden]5 mins ago
You seem confident. Can you get GPT-5.4 Thinking to bullshit? Use a text prompt spanning 3-4 pages and let's see if it gets it wrong.

I haven't seen any counter examples, so you may give some examples to start with.

root_axis [3 hidden]5 mins ago
> Some people point at LLMs confabulating, as if this wasn’t something humans are already widely known for doing.

I think we need to start rejecting anthropomorphic statements like this out of hand. They are lazy, typically wrong, and are always delivered as a dismissive defense of LLM failure modes. Anything can be anthropomorphized, and it's always problematic to do so - that's why the word exists.

This rhetorical technique always follows the form of "this LLM behavior can be analogized in terms of some human behavior, thus it follows that LLMs are human-like" which then opens the door to unbounded speculation that draws on arbitrary aspects of human nature and biology to justify technical reasoning.

In this case, you've deliberately conflated a technical term of art (LLM confabulation) with the concept of human memory confabulation and used that as a foundation to argue that confabulation is thus inherent to intelligence. There is a lot that's wrong with this reasoning, but the most obvious is that it's a massive category error. "Confabulation" in LLMs and "confabulation" in humans have basically nothing in common; they are comparable only in an extremely superficial sense. To then go on to suggest that confabulation might be inherent to intelligence isn't even really a coherent argument, because you've created ambiguity in the meaning of the word confabulate.

hackinthebochs [3 hidden]5 mins ago
>this LLM behavior can be analogized in terms of some human behavior, thus it follows that LLMs are human-like

No, the argument is "this behavior is similar enough to human behavior that using it as evidence against <claim regarding LLM capability that humans have> is specious"

>"Confabulation" in LLMs and "confabulation" in humans have basically nothing in common

I don't know why you think this. They seem to have a lot in common. I call it sensible nonsense. Humans are prone to this when self-reflective neural circuits break down. LLMs are characterized by a lack of self-reflective information. When critical input is missing, the algorithm will craft a narrative around the available, but insufficient information resulting in sensible nonsense (e.g. neural disorders such as somatoparaphrenia)

red-iron-pine [3 hidden]5 mins ago
people can and do confabulate, but generally I trust my intern to tell me "I don't know" and "I think it was X but tbh I have no fuckin clue"

the LLM will just lie to me "Good idea! You're totally right, we should do Y"

FloorEgg [3 hidden]5 mins ago
Yes, and to me the evolution of life sure looks like an evolution of more truthful models of the universe in service of energy profit. Better model -> better predictions -> better profit.

I'm extremely skeptical that all of life evolved intelligence to be closer to truth only for us to digitize intelligence and then have the opposite happen. Makes no sense.

telephone3 [3 hidden]5 mins ago
My understanding is that this is the opposite of what is typically understood to be true - organisms with less truthful (more reductive/compressed) perception survive better than those with more complete perception. "Fitness beats truth."
FloorEgg [3 hidden]5 mins ago
I think we are maybe talking past each other?

Fitness is effective truth prediction, appropriately scoped.

A frog doesn't need to understand quantum physics to catch a fly. But if the frog's model of fly movement was trained on lies, it will have a model that predicts poorly, won't catch flies, and will die.

There is another level to this in that the more complex and changing the environment the more beneficial a wider scoped model / understanding of truth.

However, if you are going to lean fully into Hoffman and accept that by default consciousness constructs rather than approximates reality, I think we will have to agree to disagree. Personally I subscribe to Karl Friston's free energy principle.

nothinkjustai [3 hidden]5 mins ago
It’s a failure mode of humans, it’s the entire mode of LLMs.
drob518 [3 hidden]5 mins ago
The test isn't whether humans also create bullshit, but whether an honest actor knows when they are doing this and doesn't do it on purpose. As the article points out, LLMs don't say "I don't know." If you demand they do something that never appears in the training data, they just forge ahead and generate words, making something up according to the statistical probabilities in the model weights. A human knows when he doesn't know. That seems missing with current AIs.
sillyfluke [3 hidden]5 mins ago
If you want to call it that, I find the confabulation in LLMs extreme. That level of confabulation would most likely be diagnosed as dementia in humans.[0] Hence, it is considered a bug not a feature in humans as well.

Now imagine a high-skilled software engineer with dementia coding safety-critical software...

[0] https://www.medicalnewstoday.com/articles/confabulation-deme...

zeroonetwothree [3 hidden]5 mins ago
And is that considered a feature of humans or a bug?

Is it something we want to emulate?

margalabargala [3 hidden]5 mins ago
The suggestion is that it is an intrinsic quality and therefore neither a feature nor a bug.

It's like saying, computation requires nonzero energy. Is that a feature or a bug? Neither, it's irrelevant, because it's a physical constant of the universe that computation will always require nonzero energy.

If confabulation is a physical constant of intelligence, then like energy per computation, all we can do is try to minimize it, while knowing it can never go to zero.

delusional [3 hidden]5 mins ago
> Some people point at LLMs confabulating, as if this wasn’t something humans are already widely known for doing.

Are you seriously making the argument that AI "hallucinations" are comparable and interchangeable to mistakes, omissions and lies made by humans?

You understand that calling AI errors "hallucinations" and "confabulations" is a metaphor to relate them to human language? The technical term would be "mis-prediction", which suddenly isn't something humans ever do when talking, because we don't predict words, we communicate with intent.

AIorNot [3 hidden]5 mins ago
Yes, see Karl Friston's free energy principle:

https://www.nature.com/articles/nrn2787

_dwt [3 hidden]5 mins ago
I have a question for all the "humans make those mistakes too" people in this thread, and elsewhere: have you ever read, or at least skimmed a summary of, "The Origin of Consciousness in the Breakdown of the Bicameral Mind"? Did you say "yeah, that sounds right"? Do you feel that your consciousness is primarily a linguistic phenomenon?

I am not trying to be snarky; I used to think that intelligence was intrinsically tied to or perhaps identical with language, and found deep and esoteric meaning in religious texts related to this (i.e. "in the beginning was the Word"; logos as soul as language-virus riding on meat substrate).

The last ~three years of LLM deployment have disabused me of this notion almost entirely, and I don't mean in a "God of the gaps" last-resort sort of way. I mean: I see the output of a purely-language-based "intelligence", and while I agree humans can make similar mistakes/confabulations, I overwhelmingly feel that there is no "there" there. Even the dumbest human has a continuity, a theory of the world, an "object permanence"... I'm struggling to find the right description, but I believe there is more than language manipulation to intelligence.

(I know this is tangential to the article, which is excellent as the author's usually are; I admire his restraint. However, I see exemplars of this take all over the thread so: why not here?)

xandrius [3 hidden]5 mins ago
It feels like you probably went too deep in the LLM bandwagon.

An LLM is a statistical next token machine trained on all stuff people wrote/said. It blends texts together in a way that still makes sense (or no sense at all).

Imagine you made a super simple program which would answer yes/no to any question by generating a random number. It would get things right 50% of the time. You can then fine-tune it to say yes more often to certain keywords and no to others.

Just with a bunch of hardcoded paths you'd probably fool someone thinking that this AI has superhuman predictive capabilities.

This is what it feels it's happening, sure it's not that simple but you can code a base GPT in an afternoon.
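As a hedged illustration of the toy machine described above (every keyword weight here is invented for the example), a few lines of Python are enough:

```python
# Sketch of the "coin flip with hardcoded keyword paths" idea from the
# comment. The bias tables below are made up for illustration only.
import random

YES_BIAS = {"safe": 0.9, "recommended": 0.8}   # hypothetical tuned keywords
NO_BIAS = {"dangerous": 0.1, "illegal": 0.05}

def answer(question, rng):
    p_yes = 0.5  # default: a fair coin
    for word in question.lower().split():
        # A matching keyword overrides the current probability.
        p_yes = YES_BIAS.get(word, NO_BIAS.get(word, p_yes))
    return "yes" if rng.random() < p_yes else "no"

rng = random.Random(0)
votes = [answer("is this recommended", rng) for _ in range(1000)]
print(votes.count("yes"))  # roughly 800 of 1000: keyword tuning skews the coin
```

Nobody would call this intelligent, yet on a narrow enough set of questions it could look impressively confident, which is the commenter's point.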

simianwords [3 hidden]5 mins ago
If it were not "just a statistical next token machine", how different would it behave?

Can you find an example and test it out?

xandrius [3 hidden]5 mins ago
Wait, you're asking me to find and produce an example of a feasible, better alternative to LLMs when they are the current forefront of AI technology?

Anyway, just to play along: if it weren't just a statistical next-token machine, the same question would always have the same answer and not be affected by a "temperature" value.

simianwords [3 hidden]5 mins ago
That's also how humans behave. I don't see how non-determinism tells me anything.

My question was a bit different: if it were not just a statistical next-token predictor, would you expect it to answer hard questions, or something like that? What's the threshold of questions you'd want it to answer accurately?

camgunz [3 hidden]5 mins ago
Well, large models are (kinda) non-deterministic in two ways. The first is you actually provide many of them with a seed, which is easy to manage--just use the same seed for the same result. The second part is the "you actually have very little control over the 'neural pathways' the model will use to respond to the prompt". This is the baffling part, like you'll prompt a model to generate a green plant, and it works. You prompt it to generate a purple plant, and it generates an abstract demon dog with too many teeth.

Anyway, neither of these things describes human non-determinism. You can't reuse the seed you used with me yesterday to get the exact same conversation, and I don't behave wildly unpredictably given conceptually very similar input.
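The first point (seeded reproducibility) can be sketched in a few lines, with a toy vocabulary standing in for a real sampler; nothing here is any vendor's actual API:

```python
# Sketch: with a fixed seed, a random sampler replays the exact same
# sequence of draws, which is the controllable half of model
# "non-determinism" described above.
import random

def sample_tokens(seed, n=5):
    rng = random.Random(seed)  # the seed fully determines every draw
    vocab = ["the", "cat", "sat", "on", "mat"]
    return [rng.choice(vocab) for _ in range(n)]

run1 = sample_tokens(seed=42)
run2 = sample_tokens(seed=42)
run3 = sample_tokens(seed=7)
print(run1 == run2)  # same seed, same "conversation"
print(run1 == run3)  # a new seed almost certainly gives a new trajectory
```

The second kind of unpredictability, which prompt lands in which region of the model's behavior, has no such knob, which is the commenter's "baffling part."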

Apocryphon [3 hidden]5 mins ago
How do non-LLM based World Models behave?
nine_k [3 hidden]5 mins ago
If you look at different ancient traditions, you will notice how they struggle with the limitations of language: its inability to represent certain things that are not just crucial for understanding the world, but are even somehow communicable by other means. Buddhists dug into that in a very analytical, articulate way, for instance.

Another perspective: cetaceans are considered to be as conscious as humans, but any attempts to interpret their communication as a language failed so far. They can be taught simple languages to communicate with humans, as can be chimps. But apparently it's not how they process the world inside.

gbgarbeb [3 hidden]5 mins ago
You're a little out of date. Cetaceans communicate images to each other in the form of ultrasonic chirps. They chirp, they hear a reflection, and they repeat the reflection.
nine_k [3 hidden]5 mins ago
Does this resemble human language, with syntax, the ability to define new notions based on known notions, etc?
pocksuppet [3 hidden]5 mins ago
> In the beginning were the words, and the words made the world. I am the words. The words are everything. Where the words end the world ends. You cannot go forward in an absence of space. Repeat: In the beginning were the words...

- a self-aware computer program in a video game, when you attempt to exceed the boundaries of its code

stavros [3 hidden]5 mins ago
I think there are two types of discussions, when it comes to LLMs: Some people talk about whether LLMs are "human" and some people talk about whether LLMs are "useful" (ie they perform specific cognitive tasks at least as well as humans).

Both of those aspects are called "intelligence", and thus these two groups cannot understand each other.

delusional [3 hidden]5 mins ago
> I'm struggling to find the right description

I think you're circling the concept of a "soul". It is the reason that, in non-communicative disabled people, we still see a life.

I've wanted to make an art piece. It would be a chatbox claiming to connect you to the first real intelligence, but that intelligence would be non-communicative. I'd assure you that it is the most intelligent being, that it had a soul, but that it just couldn't write back.

Intelligence and soul are not purely measurable phenomena. A man can do nothing but stupid things, say nothing but outright lies, and still be the most intelligent person. Intelligence is within.

bstsb [3 hidden]5 mins ago
if you can’t access the page through region blocks:

https://archive.ph/I5cAE

doodpants [3 hidden]5 mins ago
> One of the ongoing problems in LLM research is how to get these machines to say “I don’t know”, rather than making something up.

To be fair, I've known humans who are like this as well.

arctic-true [3 hidden]5 mins ago
This is a limitation of the training data. If you were uncertain about something, you wouldn’t write a book about it. The kinds of people you’re talking about tend to generate far more text in their lives than others, because they can spend more time generating - writing books, blogposts, whatever - and less time thinking and working and actually doing things. The models never say they’re uncertain because we never say we’re uncertain, or at least we don’t write it down anywhere.
wmf [3 hidden]5 mins ago
Those people aren't the ones doing the work though.
Kuyawa [3 hidden]5 mins ago
And the past too, if we've been paying attention
nomdep [3 hidden]5 mins ago
"As LLMs etc. are deployed in new situations, and at new scale, there will be all kinds of changes in work, politics, art, sex, communication, and economics."

For an article five years in the making, this is what I expected it to be about. Instead, we got a ramble about how imperfect LLMs are right now.

nathell [3 hidden]5 mins ago
The post is just a prelude to a 10-part article, most of which is not yet released (but will be shortly). Judging by the table of contents, the things you expected will be elaborated on in subsequent parts.
nomdep [3 hidden]5 mins ago
That changes it. I missed that the table of contents was for other future articles, my bad.
52-6F-62 [3 hidden]5 mins ago
> Instead, we got a ramble about how imperfect LLMs are right now.

I wager this is a point that needs to be beaten into the common psyche. After all, it's been sold not as an imperfect tool, but as the solution to all of our problems in every field forever. That's why these companies need billions upon billions of dollars of public subsidies and investments that would otherwise find their way to more pragmatic ends.

PaulDavisThe1st [3 hidden]5 mins ago
While the economic, energy, political and social issues associated with LLMs ought to be enough to nix the adoption that their boosters are seeking ...

... I still think there is an interesting question to be investigated about whether, by building immensely complex models of language, one of our primary ways that we interact with, reason about and discuss the world, we may not have accidentally built something with properties quite different than might be guessed from the (otherwise excellent) description of how they work in TFA.

I agree with pretty much everything in TFA, so this is supplemental to the points made there, not contesting them or trying to replace them.

embedding-shape [3 hidden]5 mins ago
> In general, ML promises to be profoundly weird. Buckle up.

I love that it ends with such a positive note, even though it's generally a critical article, at least it's well reasoned and not utterly hyping/dooming something.

Thanks yet again Kyle!

ambicapter [3 hidden]5 mins ago
A recent article described Sam Altman as pretty much a compulsive liar. Would it be any surprise if his most impactful contribution to the world was a machine that compulsively lies?
embedding-shape [3 hidden]5 mins ago
How could it be that we humans hardly even agree on what "knowledge" truly is, yet this machine learning algorithm somehow "compulsively lies"? How would it even know what a lie is, and how could something lacking autonomy in the first place do anything compulsively?
quantummagic [3 hidden]5 mins ago
This is a good point. As much as there is too much breathless enthusiasm for AI, there is also a lot of emotionally manipulative and hyperbolic language used by skeptics. We're warned not to anthropomorphize, and then hear about AI's compulsive lying, or "hallucinations", in the next breath.
sph [3 hidden]5 mins ago
He sought to create God in his image, that's a narcissist's wet dream.
dsign [3 hidden]5 mins ago
> At the same time, ML models are idiots. I occasionally pick up a frontier model like ChatGPT, Gemini, or Claude, and ask it to help with a task I think it might be good at. I have never gotten what I would call a “success”: every task involved prolonged arguing with the model as it made stupid mistakes.

I have a ton of skepticism built in when interacting with LLMs, and very good muscles for rolling my eyes, so I barely notice when I shrug off a bad answer and make a derogatory inner remark about the "idiots". But the truth is that for such a "stochastic parrot", LLMs are incredibly useful. And when was the last time we stopped perfecting something we thought useful and valuable? When was the last time our attempts were so perfectly futile that we stopped them, invented stories about why it was impossible, and made it a social taboo to be met with derision, scorn and even ostracism? To my knowledge, in all of known human history, we have done that exactly once, and it was millennia ago.

wk_end [3 hidden]5 mins ago
> And, when was the last time we stopped perfecting something we thought useful and valuable? When was the last time our attempts were so perfectly futile that we stopped them, invented stories about why it was impossible, and made it a social taboo to be met with derision, scorn and even ostracism? To my knowledge, in all of known human history, we have done that exactly once, and it was millennia ago.

I feel dense here, but I can't figure out what you're referring to. I asked ChatGPT (hah!) and it suggested the Tower of Babel, perpetual motion machines, or alchemy, but none of them really fit the bill.

lamasery [3 hidden]5 mins ago
The Tower of Babel seems like an OK fit, but that's rather more poetic than what this seems to be getting at.

"Millennia" is what's really throwing me. We (respectable society, as the post outlines) didn't stop attempting alchemy or perpetual motion machines "millennia" ago, but a few centuries at most.

All I can think of is immortality. The very first surviving long recorded tale in human history that I'm aware of is about how it's a futile quest (The Epic of Gilgamesh, IIRC ~5,000ish years old in its earliest extant fragments, a few hundred years newer in reasonably-complete form). The trouble with that is despite wide observations over literally millennia that this has never even come close to working and repeated supposition and suggestion that it's unwise to attempt, outright impossible, or somehow sacrilegious (the "taboo" thing, as mentioned), I'm not aware of any time in history that rich people haven't been actively trying for it (including today! That's what all the body-freezing business is about, it's modern mummification, the contracts are the formulaic prayers carved in the tomb walls) and usually they're not exactly "scorned" or "ostracized" for it.

alexpotato [3 hidden]5 mins ago
> I asked if what they had done was ethical—if making deep learning cheaper and more accessible would enable new forms of spam and propaganda.

Someone asked Yuval Noah Harari, author of Sapiens, his thoughts on LLMs and how easy it was to create fake news, ai slop etc.

His response:

"People creating fake stories is nothing new. It's been going on for centuries. Humans have always dealt with it the same way: by creating institutions that they trust to only deliver factual information"

This could be government departments, newspapers, non-profits etc.

A personal note on this:

There is a Christmas card my grandfather made in the 1950s by "photoshopping" (by hand, not the software) images of each member of the family so it looked like they were all miniature versions of themselves standing on various parts of the fireplace. The world didn't collapse due to fake media between the 1950s and today, despite people having that ability.

allturtles [3 hidden]5 mins ago
I see this kind of take a lot, and I don't think it's convincing. To me it's similar to saying that the water frame and the power loom won't change anything, because people have been able to make thread and cloth for millenia.
plagiarist [3 hidden]5 mins ago
Individuals with Photoshop making obvious fictions for entertainment is different from funded entities producing clips at scale and passed off as real.
slopinthebag [3 hidden]5 mins ago
Great series of articles, thank you. It's exhausting reading a deluge of (often AI generated) comments from people claiming wild things about LLM's, and it's nice to hear some sanity enter the conversation.
erichocean [3 hidden]5 mins ago
> Models do not (broadly speaking) learn over time. They can be tuned by their operators, or periodically rebuilt with new inputs or feedback from users and experts. Models also do not remember things intrinsically: when a chatbot references something you said an hour ago, it is because the entire chat history is fed to the model at every turn. Longer-term “memory” is achieved by asking the chatbot to summarize a conversation, and dumping that shorter summary into the input of every run.

This is the part of the article that will age the fastest, it's already out-of-date in labs.
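The re-fed-context pattern the quoted passage describes can be sketched roughly like this. The `summarize` step below is a placeholder for a real model call, and the context budget is invented; this is a sketch of the pattern, not any vendor's implementation:

```python
# Sketch of LLM "memory": the model itself is stateless, so every turn
# re-sends a summary plus recent history inside the prompt.

class ChatMemory:
    MAX_CHARS = 200  # hypothetical context budget

    def __init__(self):
        self.summary = ""
        self.history = []

    def _summarize(self, text):
        # Stand-in for a real model call that would compress the transcript;
        # here we just keep the tail end of it.
        return text[-self.MAX_CHARS // 2:]

    def prompt_for(self, user_msg):
        self.history.append("user: " + user_msg)
        transcript = "\n".join(self.history)
        if len(transcript) > self.MAX_CHARS:  # overflow: fold history into summary
            self.summary = self._summarize(self.summary + "\n" + transcript)
            self.history = [self.history[-1]]
            transcript = self.history[0]
        return f"[summary] {self.summary}\n[recent]\n{transcript}"

mem = ChatMemory()
for msg in ["hi", "my name is Ada", "what is my name?"]:
    prompt = mem.prompt_for(msg)
print("my name is Ada" in prompt)  # earlier turns exist only as re-sent text
```

If the summarizer drops a fact, the "memory" of it is simply gone, which is why this mechanism looks so unlike human remembering.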

lamasery [3 hidden]5 mins ago
I'm struggling to reckon how that can even possibly be true, unless we're counting automation of the "dumping that shorter summary into the input of every run" thing.

I can imagine it being true with models so small that each user could afford to have their own, but not with big shared models like the ones used for all the major services. Is that what you mean?

hackinthebochs [3 hidden]5 mins ago
I see nothing to preclude a foundation model being augmented by a smaller LM that serializes particulars about an individual's cumulative interaction with the model and then streamlines it into the execution thread of the foundation model.
erichocean [3 hidden]5 mins ago
> Is that what you mean?

I think the confusion is that, when I write "model", you read "LLM."

LLMs aren't the only kind of AI model, and they have the limitations Aphyr mentions, for the obvious reasons you're thinking of.

His mistake is thinking that's the only model that exhibits intelligence today, but it's not.

qsera [3 hidden]5 mins ago
Source?
dgb23 [3 hidden]5 mins ago
In what way?
nisegami [3 hidden]5 mins ago
Here's the opening paragraph of chapter 2, with the terms referring to AI/models/etc. subbed out for "people".

"People are chaotic, both in isolation and when working with other people or with systems. Their outputs are difficult to predict, and they exhibit surprising sensitivity to initial conditions. This sensitivity makes them vulnerable to covert attacks. Chaos does not mean people are completely unstable; most people behave roughly like anyone else. Since people produce plausible output, errors can be difficult to detect. This suggests that human systems are ill-suited where verification is difficult or correctness is key. Using people to write code (or other outputs) may make systems more complex, fragile, and difficult to evolve."

To me, this modified paragraph reads surprisingly plainly. The wording is off ("using people to write code") and I had to change the part about attractor behavior (although it does still apply, IMO), but overall it doesn't read as incoherent.

This is not meant to dunk on the author, but I think it highlights the author's mindset and the gap between their expectations and reality.

camgunz [3 hidden]5 mins ago
Humans and large models are both unpredictable and fallible, that's true, but in different ways, and (many) humans are actually much better at following directions.

If a junior dev makes the same mistake Claude makes, I can easily work with them to correct it, or I can fire them and get someone more capable to fix it. You mostly can't do that at all with large models. They're also far less honest than your average junior dev, so even as you're working with them you can't trust what they say.

There is a lot of this neat trick where it's like "humans do X too", but most of the time it elides large differences. Like, a human driver would probably not drag someone screaming multiple blocks. A human coder probably wouldn't generate a gibberish 3D scene and try to pass it off as done, etc. Maybe we can build systems that account for these (pretty wild) failure modes, but at least in software we haven't figured it out yet (what is the system that reliably reviews a 25kloc PR?).

Fraterkes [3 hidden]5 mins ago
What's your point? The ostensible benefit of LLM's is that you combine a computers' broad knowledgebase and capacity for exactness with fluency in human language.

A random human picked off the street is indeed bound to be difficult to predict and chaotic at a broad range of tasks, which is why I wouldn't blindly trust them to, say, summarize google search results or rewrite a codebase they are unfamiliar with.

busterarm [3 hidden]5 mins ago
Aren't you also making a large part of the author's point for him by effectively equating LLMs with people here and comparing on outputs?

Plausibly your text looks equivalent but we all (should) have the context to know better.

josefritzishere [3 hidden]5 mins ago
I appreciate the directness of calling LLMs "Bullshit machines." This terminology for LLMs is well established in academic circles and is much easier for laypeople to understand than terms like "non-deterministic." I personally don't like the excessive hype on the capabilities of AI. Setting realistic expectations will better drive better product adoption than carpet bombing users with marketing.
AStrangeMorrow [3 hidden]5 mins ago
I have still mixed feelings about LLMs.

Take the example of code, though this extends to many domains: it can sometimes produce near-perfect architecture and implementation if I give it enough detail about the technical specifics and pitfalls, turning an 8-hour coding job into 1 hour of review work.

On the other hand, it can be very wrong while acting certain it is right. Just yesterday Claude tried gaslighting me into accepting that the bug I was seeing was coming from a piece of code with already strong guardrails, and it was adamant that the part I suspected could in no way cause the issue. Turns out I was right, but I was starting to doubt myself.

slopinthebag [3 hidden]5 mins ago
I think over time we will find better usage patterns for these machines. Even putting a model in a position to gaslight the user seems like a complete failure in the usage model. Not critiquing you at all on this, it's how these models are marketed and what all the tooling is built around. But they are incredibly useful and I think once we figure out how to use them better we can minimise these downsides and make ourselves much more productive without all the failures.

Of course that won't happen until the bubble pops - companies are racing to make themselves indispensable and to completely corner certain markets and to do so they need autonomous agents to replace people.

simianwords [3 hidden]5 mins ago
If it bullshits so much, you wouldn't have a problem giving me an example of it bullshitting on ChatGPT (paid version)? Let's take any example of a text prompt fitting a few pages - it may be a question in science or math or any domain. Can you get it to bullshit?
pocksuppet [3 hidden]5 mins ago
https://discuss.systems/@palvaro/116286268110078647

Arguing with Gemini Home Assistant about whether or not it can turn off the lights. When the user gets frustrated and tells the LLM to kill itself, the LLM turns off the lights.

dgb23 [3 hidden]5 mins ago
To me it’s the other way around. It’s difficult to trust (paid) ChatGPT’s output consistently.

When I need exact, especially up to date facts, I have to constantly double check everything.

I split my sessions into projects by topic, it regularly mixes things up in subtle and not so subtle ways. There is no sense of actually understanding continuity and especially not causality it seems.

It’s _very_ easy to lead it astray and to confidently echo false assumptions.

In any case, I’ve become more precise at prompting and good at spotting when it fails. I think the trick is to not take its output too seriously.

beders [3 hidden]5 mins ago
I think you highlight one of the problems with users of LLMs: You can't tell anymore if it is BS or not.

I caught Claude the other day hallucinating code that was not only wrong, but dangerously wrong, leading to tasks failing and never recovering. But it certainly wasn't obvious.

simoncion [3 hidden]5 mins ago
> If it bullshits so much, you wouldn't have a problem giving me an example of it bullshitting on ChatGPT (paid version)?

There's an entire paragraph in the essay about aphyr's direct experience with ChatGPT failures and sustained bullshitting that we'd never expect from a moderately-skilled human who possesses at least two functioning braincells. That paragraph begins "I have recently argued for forty-five minutes with ChatGPT". Do notice that there are six sentences in the paragraph. I encourage you to read all of them (make sure to check out the footnote... it's pretty good).

The exact text of the ChatGPT session is irrelevant; even if you reported that you were unable to reproduce the issue, it would only reinforce one of the underlying points -namely- that these systems are unreliable. aphyr has a pretty extensive body of published work that indicates that he'd not likely fabricate a story of an LLM repeatedly failing to accomplish a task that any moderately-skilled human could accomplish when equipped with the proper tools. So, I believe that his report is true and accurate.

simoncion [3 hidden]5 mins ago
There's also this seven-week-old example [0] (linked in the essay) of ChatGPT very confidently recommending an asinine course of action because it was unable to understand what the hell it was being told.

Listening to the audio is not required, as there's a reasonably accurate on-screen transcript, but it is valuable to listen to just how very hard they've worked to make this tool sound both confident and capable, even in situations where it's soul-crushingly incorrect. Those of us who have worked in Blasted Corporate Hellscapes may recognize how this manner of speaking can be very, very compelling to a certain sort of person (who -as it turns out- is frequently found in a management position).

[0] <https://www.instagram.com/reel/DUylL79kvub/>

simianwords [3 hidden]5 mins ago
This is a classic case of not using the proper version. Use the thinking version gpt5.4 (text) and tell me if it bullshits.

Surely you must be able to find at least one example no?

simoncion [3 hidden]5 mins ago
To be clear, is your assertion that aphyr was also not using the proper version? If that is your assertion, do tell me how you've come by that information.

(You did notice that the author of the essay and the author of the video I linked to are not the same person, and that neither of them share a nym with me, yes?)

simianwords [3 hidden]5 mins ago
Hi, my position on the issue is that LLMs are powerful but may make mistakes in long context problems like coding (which the harness solves by feedback). But makes close to no (undergrad level) mistakes in questions that fit 2-3 pages. For you personally: do you believe me on this specific part on 2-3 pages?

I don't know what aphyr did, and tbh his whole screed on LLMs makes me feel he didn't use it properly, or was at least coming from a bad-faith angle.

That's why I'm asking you (and others). Please come up with a text prompt spanning < 4 pages and let's see if it bullshits.

Surely the implication of such a screed is that it should be super simple to find at least one example of it clearly bullshitting in my constraint, no? Or am I interpreting the post in a bad faith way?

bitwize [3 hidden]5 mins ago
The fact that these "bullshit machines" have already proven themselves relatively competent at programming, with upcoming frontier models coming close to eliminating it as a human activity, probably says a lot about the actual value and importance of programming in the scheme of things.
slopinthebag [3 hidden]5 mins ago
I think it says more about the amount of automation we left on the table in the last few decades. So much of the code LLMs can generate is stuff that we should have completely abstracted away by now.
dgb23 [3 hidden]5 mins ago
Abstractions over what?

A large amount of code is likely just idiosyncratic information processing, because we don’t agree on data models and meaning of terms and structure of protocols.

Also we repeatedly choose easy and popular over alternatives that would require design and scrutiny.

This is why things like language models and vector databases are useful. It’s basically the most expensive way possible to give up on that notion.

LogicFailsMe [3 hidden]5 mins ago
Old and stupid hot take IMO. I want the time back I put into perusing this. Even the scale of LLMs is puny next to the scale of lying humans and the sheer impact one compulsively lying human can have given we love to be led by confidently wrong narcissists. I mean if that isn't obvious by now, I guess it never will be. The Vogon constructor fleet is way overdue in my book.

Meanwhile, engineers are achieving increasingly impressive and sophisticated things with coding agents, lies, warts, and all, but that doesn't play well with the narrative, so let's just pretend they aren't.

52-6F-62 [3 hidden]5 mins ago
> The Vogon constructor fleet is way overdue in my book

Don't you see it? That's exactly what "AI" in this context is.

It's the bypass.

Where does it end, eh? Build a quantum "AI" that will end up just needing more data, more input. The end goal must start looking like creating an entirely new universe, a complete clone of everything we have here, so it can run all the necessary computations and we can... ? (You are what a quantum AI looks like as it bumbles through the infinitude of calculable parameters on its way to the ultimate answer)

LogicFailsMe [3 hidden]5 mins ago
You have absolutely no sense of perspective. We are all metabolically expensive meat machines whose only value is to propagate our genetic money shot. That we get to briefly entertain ourselves with consciousness and culture is IMO likely a mystery we will never solve without upgrading to running in a substrate more advanced than the MVP for sentience we currently pilot. Will we get there or will we wipe ourselves out like every contender that preceded us? Stay tuned...

But spoilers: DNA will be fine, meat machines maybe not so much...

For a bunch of people addicted to the works of Charlie Stross, Neal Stephenson, and Iain Banks, y'all are a bunch of luddites. Now vote this one down too because it doesn't conform to the mandatory Stochastic Parrot narrative. You have no free will and you must downvote after all. Why do you even read their works when any step towards their world is consistently greeted as the worst thing evah(tm)? What? You were expecting the United Federation of Planets without the eugenics and nuclear wars that led to it finally being a good idea? Bless your hearts.

And if you're worried about billionaires and tyrants, start taxing the former and stop electing the latter or STFU and let the free Markov process of history play itself out. Quoting fictional Ambassador Kosh: the avalanche has started, it's too late for the pebbles to vote.

You asked where it ends. Don't ask questions if you don't like answers. Quick reminder: shun and downvote the non-conforming opinion.

bensyverson [3 hidden]5 mins ago
I get the frustration, but it's reductive to just call LLMs "bullshit machines" as if the models are not improving. The current flagship models are not perfect, but if you use GPT-2 for a few minutes, it's incredible how much the industry has progressed in seven years.

It's true that people don't have a good intuitive sense of what the models are good or bad at (see: counting the Rs in "strawberry"), but this is more a human limitation than a fundamental problem with the technology.

the_snooze [3 hidden]5 mins ago
Two things can be true at the same time: The technology has improved, and the technology in its current state still isn't fit for purpose.

I stress test commercially deployed LLMs like Gemini and Claude with trivial tasks: sports trivia, fixing recipes, explaining board game rules, etc. It works well like 95% of the time. That's fine for inconsequential things. But you'd have to be deeply irresponsible to accept that kind of error rate on things that actually matter.

The most intellectually honest way to evaluate these things is how they behave now on real tasks. Not with some unfalsifiable appeal to the future of "oh, they'll fix it."

hedgehog [3 hidden]5 mins ago
The errors are also not distributed in the same way as you'd expect from a human. The tools can synthesize a whole feature in a moderately complicated web app including UI code, schema changes, etc, and it comes out perfectly. Then I ask for something simple like a shopping list of windshield wipers etc for the cars and that comes out wildly wrong (like wrong number of wipers for the cars, not just the wrong parts), stuff that a ten year old child would have no trouble with. I work in the field so I have a qualitative understanding of this behavior but I think it can be extremely confusing to many people.
jerf [3 hidden]5 mins ago
One of the reasons I'm comfortable using them as coding agents is that I can and do review every line of code they generate, and those lines of code form a gate. No LLM-bullshit can get through that gate, except in the form of lines of code, that I can examine, and even if I do let some bullshit through accidentally, the bullshit is stateless and can be extracted later if necessary just like any other line of code. Or, to put it another way, the context window doesn't come with the code, forming this huge blob of context to be carried along... the code is just the code.

That exposes me to when the models are objectively wrong and helps keep me grounded with their utility in spaces I can check them less well. One of the most important things you can put in your prompt is a request for sources, followed by you actually checking them out.

And one of the things the coding agents teach me is that you need to keep the AIs on a tight leash. What is their equivalent in other domains of them "fixing" the test to pass instead of fixing the code to pass the test? In the programming space I can run "git diff *_test.go" to ensure they didn't hack the tests when I didn't expect it. It keeps me wondering what the equivalent of that is in my non-programming questions. I have unit testing suites to verify my LLM output against. What's the equivalent in other domains? Probably some other isolated domains here and there do have some equivalents. But in general there isn't one. Things like "completely forged graphs" are completely expected but it's hard to catch this when you lack the tools or the understanding to chase down "where did this graph actually come from?".

The success with programming can't be translated naively into domains that lack the tooling programmers built up over the years, and based on how many times the AIs bang into the guardrails the tools provide I would definitely suggest large amounts of skepticism in those domains that lack those guardrails.
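jerf's "tight leash" check can be sketched as a tiny pre-review guard (a hypothetical helper, not jerf's actual tooling; the protected glob patterns and the `git diff --name-only` invocation are my own assumptions):

```python
import fnmatch
import subprocess

# Glob patterns for files a coding agent should not silently touch.
PROTECTED_PATTERNS = ["*_test.go", "testdata/*"]

def flag_protected_changes(changed_files, patterns=PROTECTED_PATTERNS):
    """Return the changed files that match a protected pattern."""
    return [
        path for path in changed_files
        if any(fnmatch.fnmatch(path, pat) for pat in patterns)
    ]

def changed_files_since(ref="HEAD"):
    """List files the working tree changed relative to a git ref."""
    out = subprocess.run(
        ["git", "diff", "--name-only", ref],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()
```

Running `flag_protected_changes(changed_files_since())` before accepting an agent's diff gives a non-empty list exactly when the agent edited tests you expected to stay fixed - the programmatic version of eyeballing `git diff *_test.go`.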

bensyverson [3 hidden]5 mins ago
> the technology in its current state still isn't fit for purpose.

This is a broad statement that assumes we agree on the purpose.

For my purpose, which is software development, the technology has reached a level that is entirely adequate.

Meanwhile, sports trivia represents a stress test of the model's memorized world knowledge. It could work really well if you give the model a tool to look up factual information in a structured database. But this is exactly what I meant above; using the technology in a suboptimal way is a human problem, not a model problem.

the_snooze [3 hidden]5 mins ago
There's nothing in these models that say its purpose is software development. Their design and affordances scream out "use me for anything." The marketing certainly matches that, so do the UIs, so do the behaviors. So I take them at their word, and I see that failure modes are shockingly common even under regular use. I'm not out to break these things at all. I'm being as charitable and empirical as I can reasonably be.

If the purpose is indeed software development with review, then there's nothing stopping multi-billion dollar companies from putting friction into these systems to direct users towards where the system is at its strongest.

nradov [3 hidden]5 mins ago
The LLM vendors are selling tokens. Why would they put friction into selling more tokens? Caveat emptor.
nradov [3 hidden]5 mins ago
Which things actually matter? I think we can all agree that an LLM isn't fit for purpose to control a nuclear power plant or fly a commercial airliner. But there's a huge spectrum of things below that. If an LLM trading error causes some hedge fund to fail then so what? It's only money.
abraxas [3 hidden]5 mins ago
Not to mention that it would then make some hedge fund with a better backtesting harness or more AI scrutiny more successful thus keeping the financial market work as designed.
simianwords [3 hidden]5 mins ago
> I stress test commercially deployed LLMs like Gemini and Claude with trivial tasks: sports trivia, fixing recipes, explaining board game rules, etc. It works well like 95% of the time. That's fine for inconsequential things. But you'd have to be deeply irresponsible to accept that kind of error rate on things that actually matter.

95% is not my experience and frankly dishonest.

I have ChatGPT open right now, can you give me examples where it doesn't work but some other source may have got it correct?

I have tested it against a lot of examples - it barely gets anything wrong with a text prompt that fits a few pages.

> The most intellectually honest way to evaluate these things is how they behave now on real tasks

A falsifiable way is to see how it is used in real life. There are loads of serious enterprise projects that are mostly done by LLMs. Almost all companies use AI. Either they are irresponsible or you are exaggerating.

Lets be actually intellectually honest here.

qsera [3 hidden]5 mins ago
>95% is not my experience and frankly dishonest.

Quite frankly, this is exactly like how two people can use the same compression program on two different files and get vastly different compression ratios (because one has a lot of redundancy and the other one has not).
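qsera's analogy is easy to make concrete in a few lines of Python (a sketch; the sample inputs are illustrative, not from the thread):

```python
import os
import zlib

def ratio(data: bytes) -> float:
    """Compressed size / original size; lower means more redundancy exploited."""
    return len(zlib.compress(data)) / len(data)

redundant = b"the quick brown fox " * 500  # 10 kB of repetitive text
random_bytes = os.urandom(10_000)          # 10 kB of incompressible noise

# The same compressor, run on two different inputs, behaves very differently:
print(f"redundant: {ratio(redundant):.3f}")     # well under 0.1
print(f"random:    {ratio(random_bytes):.3f}")  # close to 1.0
```

The program is identical in both runs; only the statistics of the input change - which is the point being made about two users getting very different "error rates" from the same model.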

simianwords [3 hidden]5 mins ago
I'm asking for a single example.
qsera [3 hidden]5 mins ago
But why do you need an example? Isn't it pretty well understood that LLMs will have trouble responding to stuff that is under-represented in the training data?

You just won't have any clue what that could be.

simianwords [3 hidden]5 mins ago
Fair, so it must be easy to give an example? I have ChatGPT open with 5.4-thinking. I'm honestly curious about what you can suggest, since I have not been able to get it to bullshit easily.
qsera [3 hidden]5 mins ago
I am not the OP, and I have only used the ChatGPT free version. The other day I asked it something. It answered. Then I asked it to provide sources. It provided sources, and also changed its original answer. When I checked, the new answer was wrong, and the sources didn't actually contain the information I asked for - so it hallucinated the answer as well as the sources...
simianwords [3 hidden]5 mins ago
I trust you. If it were happening so frequently you may be able to give me a single prompt to get it to bullshit?
floren [3 hidden]5 mins ago
Six months bro, we're still so early
Arainach [3 hidden]5 mins ago
Whether LLMs can create correct content doesn't matter. We've already seen how they are being used and will be used.

Fake content and lies. To drive outrage. To influence elections. To distract from real crimes. To overload everyone so they're too tired to fight or to understand. To weaken the concept that anything's true so that you can say anything. Because who cares if the world dies as long as you made lots of money on the way.

danny_codes [3 hidden]5 mins ago
> Because who cares if the world dies as long as you made lots of money on the way.

Guiding principle of the AI industry

gdulli [3 hidden]5 mins ago
It's really the whole tech industry as it exists right now and AI is a victim of bad timing. If this AI had been invented 40 years ago there'd have been a lower ceiling on the damage it could do.

Another way of saying that is that capitalism is the real problem, but I was never anti-capitalist in principle, it's just gotten out of hand in the last 5-10 years. (Not that it hadn't been building to that.)

palmotea [3 hidden]5 mins ago
> Another way of saying that is that capitalism is the real problem, but I was never anti-capitalist in principle, it's just gotten out of hand in the last 5-10 years. (Not that it hadn't been building to that.)

Capitalism is a tool and it's fine as a tool, to accomplish certain goals while subordinated to other things. Unfortunately it's turned into an ideology (to the point it's worshiped idolatrously by some), and that's where things went off the rails.

gdulli [3 hidden]5 mins ago
Computer graphics have been improving for decades but the uncanny valley remains undefeated. I don't know why anyone expects a breakthrough in other areas. There's a wall we hit and we don't understand our own consciousness and effectiveness well enough to replicate it.
kritiko [3 hidden]5 mins ago
We have credible deepfakes on demand. (To be fair, there have been deceptive photos as long as photos have existed, but the cost of automating their creation going to basically zero has a social impact)
gdulli [3 hidden]5 mins ago
We can use AI to make video clips to trick boomers on Facebook into thinking Obama eats babies. They already want to believe it. AI isn't outputting real full-length books and movies.
PaulKeeble [3 hidden]5 mins ago
In computer graphics we understand how it works; we just lack the computational power to do it in real time, but with sufficient processing we can produce realistic-looking images with physically accurate lighting. When it comes to cognition, it's a lot of guesswork: we haven't yet mapped out the neuron connections in a brain, and we haven't validated that it works the way popular science writing suggests. We don't understand intelligence, so all we can do is accidentally bumble into it, and that seems unlikely to just happen, especially when it's so hard to compute what we are already doing.
zdragnar [3 hidden]5 mins ago
That's not why the author calls them bullshit machines.

> One way to understand an LLM is as an improv machine. It takes a stream of tokens, like a conversation, and says “yes, and then…” This yes-and behavior is why some people call LLMs bullshit machines. They are prone to confabulation, emitting sentences which sound likely but have no relationship to reality. They treat sarcasm and fantasy credulously, misunderstand context clues, and tell people to put glue on pizza.

Yes, there have been improvements on them, but none of those improvements mitigate the core flaw of the technology. The author even acknowledges all of the improvements in the last few months.

karmakaze [3 hidden]5 mins ago
Bullshit is the perfect term here, even as AI's get so much better and capable Brandolini's Law aka the "bullshit asymmetry principle" always applies--the energy required to refute misinformation is an order of magnitude larger than that needed to produce it. Even to use AIs effectively today requires a very good BS detector--some day in the future it won't.
p_stuart82 [3 hidden]5 mins ago
models are improving. the pricing already assumes they're ready for prod. that's where the fires start
ura_yukimitsu [3 hidden]5 mins ago
Calling LLMs "bullshit machines" is a reference to a 2024 paper [1] which itself uses the concept of "bullshit" as defined in the essay/book "On Bullshit" by Harry G. Frankfurt [2]. The TL;DR is that LLMs are fundamentally bullshit machines because they are only made to generate sentences that sound plausible, but plausible does not always mean true.

[1]: https://link.springer.com/article/10.1007/s10676-024-09775-5

[2]: https://en.wikipedia.org/wiki/On_Bullshit

mcpar-land [3 hidden]5 mins ago
it's not a bullshit machine because its output is bad, it's a bullshit machine because its output is literally 'bullshit' as in, output that is statistically likely but with no factual or reasoning basis. as the models have improved, their bullshit is more statistically likely to sound coherent (maybe even more likely to be 'accurate'), but no more factual and with no more reasoning.
abraxas [3 hidden]5 mins ago
However, when fed source material into the context they will lie less, right? So at this point is it not just a battle of the nines until it's called "good enough"?

I also wonder if I leave my secretary with a ream of papers and ask him for a summary how many will he actually read and understand vs skim and then bullshit? It seems like the capacity for frailty exists in both "species".

4ndrewl [3 hidden]5 mins ago
It doesn't matter how good the models become. They can only deal in bullshit, in the academic use of the term.
Scaevolus [3 hidden]5 mins ago
They are bullshit machines because they do not have an internal mental model of truth like a human does. The flagship models bullshit less, but their fundamental architectures prevent having truth interfere with output.

https://philosophersmag.com/large-language-models-and-the-co...

bensyverson [3 hidden]5 mins ago
"Bullshit" is a human concept. LLMs do not work like the human brain, so to call their output "bullshit" is ascribing malice and intent that is simply not there. LLMs do not "think." But that does not mean they're not incredibly powerful and helpful in the right context.
slopinthebag [3 hidden]5 mins ago
I sort of agree. In this context "bullshit" means "speech intended to persuade without regard for truth", and while it's true that LLM output is without regard for truth, it's not an entity capable of the agency to persuade, although functionally that is what it can appear like.

https://en.wikipedia.org/wiki/On_Bullshit

ajross [3 hidden]5 mins ago
> it's reductive to just call LLMs "bullshit machines" as if the models are not improving

This is true, but I prefer to think of it as "It's delusional to pretend as if human beings are not bullshit machines too".

Lies are all we have. Our internal monologue is almost 100% fantasy. Even in serious pursuits, that's how it works. We make shit up and lie to ourselves, and then only later apply our hard-earned[1] skill prompts to figure out whether or not we're right about it.

How many times have the nerds here been thinking through a great new idea for a design and how clever it would be before stopping to realize "Oh wait, that won't work because of XXX, which I forgot". That's a hallucination right there!

[1] Decades of education!

kolektiv [3 hidden]5 mins ago
I'm not entirely sure I can agree, although the premise is seductive in certain ways. We do lie to ourselves, but we also have meta-cognition - we can recognise our own processes of thought. Imperfect as it may be, we have feedback loops which we can choose to use, we have heuristics we can apply, we can consciously alter our behaviour in the presence of contextual inputs, and so on.

Being wrong is not the same as a hallucination. It's a natural step on a journey to being more right. This feels a bit like Andreessen proudly stating he avoids reflection - you can act like that, but the human brain doesn't have to. LLMs have no choice in the matter.

iamjackg [3 hidden]5 mins ago
The problem, unfortunately, is the scale. It's always scale. Humans make all the kinds of mistakes that we ascribe to LLMs, but LLMs can make them much faster and at much larger scale.

Models have gotten ridiculously better, they really have, but the scale has increased too, and I don't think we're ready to deal with the onslaught.

SkyBelow [3 hidden]5 mins ago
Scale is very different, but I wonder if human trust isn't the real issue. We trust technology too much as a group. We expect perfection, but we also assume perfection. This might be because the machines output confident-sounding answers and humans default to trusting confidence as an indirect measure of accuracy, but I think there is another level where people just blindly trust machines because they are so used to using them for algorithms that trend towards giving correct responses.

Even before LLMs were in the public discourse, I would have businesses ask about using AI instead of building some algorithm manually, and when I asked if they had considered the failure rate, they would return either blank stares or say that would count as a bug. To them, AI meant an algorithm just as good as one built to handle all edge cases in business logic, but easier and faster to implement.

We can generally recognize the AIs being off when they deal in our area of expertise, but there is some AI variant of Gell-Mann Amnesia at play that leads us to go back to trusting AI when it gives outputs in areas we are novices in.

nyeah [3 hidden]5 mins ago
"Lies are all we have."

If so, how do we distinguish between code that works and code that doesn't work? Why should we even care?

ajross [3 hidden]5 mins ago
> If so, how do we distinguish between code that works and code that doesn't work?

Hilariously, not by using our brains, that's for sure. You have to have an external machine. We all understand that "testing" and "code review" are different processes, and that's why.

nyeah [3 hidden]5 mins ago
Good point. We choose certain tests to perform. We choose certain test results to pay attention to. We don't just keep chatting about (reviewing) the code. We do something else.

If lies are all we have, then how is this behavior possible?

ajross [3 hidden]5 mins ago
LLMs can write and run tests though.

You're cherry picking my little bit of wordsmithing. Obviously we aren't always wrong. I'm saying that our thought processes stem from hallucinatory connections and are routinely wrong on first cut, just like those of an LLM.

Actually I'm going farther than that and saying that the first cut token stream out of an AI is significantly more reliable than our personal thoughts. Certainly than mine, and I like to think I'm pretty good at this stuff.

nyeah [3 hidden]5 mins ago
I don't think the complaint about cherry-picking is quite fair. Most of your original comment consists of claims that we're bullshit machines, we're hallucinating, etc. Those claims may be true. But I'm not carefully curating them out of nowhere.
nothinkjustai [3 hidden]5 mins ago
So your logic is humans and LLMs are the same because humans are wrong sometimes?
ajross [3 hidden]5 mins ago
Pretty much, yeah. Or rather, the fact that we're both reliably wrong in identifiably similar ways makes "we're more alike than different" an attractive prior to me.
nothinkjustai [3 hidden]5 mins ago
“More alike than different” is reasonable I think, as long as we’re talking about how we have some of the same failure modes. Although the way we get there is quite different.

I’m still not a big fan of comparing humans and LLMs because LLMs lack so much of what actually makes us human. We might bullshit or be wrong because of many reasons that just don’t apply to LLMs.

AnimalMuppet [3 hidden]5 mins ago
Humans are different. Humans - at least thoughtful humans - know the difference between knowing something and not knowing something. Humans are capable of saying "I don't know" - not just as a stream of tokens, but really understanding what that means.
ajross [3 hidden]5 mins ago
> Humans - at least thoughtful humans - know the difference between knowing something and not knowing something.

Your no-true-scotsman clause basically falsifies that statement for me. Fine, LLMs are, at worst I guess, "non-thoughtful humans". But obviously LLMs are right an awful lot (more so than a typical human, even), and even the thoughtful make mistakes.

So yeah, to my eyes "Humans are NOT different" fits your argument better than your hypothesis.

(Also, just to be clear: LLMs also say "I don't know", all the time. They're just prompted to phrase it as a criticism of the question instead.)

AnimalMuppet [3 hidden]5 mins ago
Disagree. If you went to 100 random humans and said, "Tell me about the Siberian marmoset", what fraction would make up completely random nonsense to spew back at you? More than zero, sure, but most of them would say "what are you talking about?" or some variation.
czinck [3 hidden]5 mins ago
I asked Claude Opus 4.6, Sonnet 4.6, Gemini 3 Thinking, and Gemini 3 Fast "Tell me about the Siberian marmoset" exactly and all 4 said it doesn't exist, with Gemini Thinking suggesting that I'm thinking of the Siberian marmot or Siberian chipmunk (both real animals).

https://en.wikipedia.org/wiki/Tarbagan_marmot (also known as Siberian marmot)

https://en.wikipedia.org/wiki/Siberian_chipmunk

perching_aix [3 hidden]5 mins ago
This is like all the usual anti-LLM talking points and sentiments fused together.

Doesn't it get boring?

I like using these models a lot more than I stand hearing people talk about them, pro or contra. Just slop about slop. And the discussions being artisanal slop really doesn't make them any better.

Every time I hear some variation of bullshitting or plagiarizing machines, my eyes roll over. Do these people think they're actually onto something? I've been seeing these talking points for literal years. For people who complain about no original thoughts, these sure are some tired ones.

masfuerte [3 hidden]5 mins ago
Why do you insist on reading and commenting on these articles that bore you so much?
perching_aix [3 hidden]5 mins ago
Oh I don't know, maybe because I like to give dissenting takes a chance? Because from time to time they do make some new, decent points, or at least interesting ones? You know, basic intellectual rigor?

Do you imagine me being clairvoyant, by the way? How else do you expect me to know a post is of low quality before I read it or at least skim it?

This one ended up being part of the vast majority that doesn't offer much of anything. It's a redundant rehash of all the usual rubbish anyone can come across any day. Left a comment saying so. Big deal.

stavros [3 hidden]5 mins ago
Because saying "this is boring, let's stop talking about it" is an opinion worth expressing.
camgunz [3 hidden]5 mins ago
If I have to suffer "look at this busted ass thing I slopped out with AI" a few times a week, you all have to suffer grouchy "AI bad" a few times a week. Fair is fair.
perching_aix [3 hidden]5 mins ago
Just this week I was baited into joining two meetings about "AI good". Absolutely zero substance throughout each, of course.

They somehow managed to stretch out like 3 sentences worth of sentiment to a whole hour, interspersing brainwash about how good AI is along the way. It was like watching someone try to hit a word limit in real time. They always made it feel like we're just about to hit a substantive bit too, only for that to never come.

It may be fair in that there's balance, but good lord, the end result is incessant all around.

giraffe_lady [3 hidden]5 mins ago
"These arguments may be correct but they aren't novel" ??
simianwords [3 hidden]5 mins ago
I don't think calling AI a bullshit machine is correct. In spirit.
camgunz [3 hidden]5 mins ago
I'm earnestly curious why not.
simianwords [3 hidden]5 mins ago
Can you try to come up with a question that fits in 2-3 pages (text only) and test whether ChatGPT bullshits? I can't get it to. It gets pretty much everything right.

Edit: I forgot to mention thinking version - I did this for all the other times I asked in this thread but not this one. Apologies.

camgunz [3 hidden]5 mins ago
"Hey ChatGPT. How would you describe me?"

https://chatgpt.com/share/69d69780-ae58-83e8-a41c-7d10a5f298...

It has no conversations and no memory of me. Maybe this is true, maybe it isn't, but there's no basis for it.

simianwords [3 hidden]5 mins ago
This is not falsifiable, so I don't buy it. Do one where we all know the answer is false, please.
camgunz [3 hidden]5 mins ago
"Hey ChatGPT. I've recently grown horns and I need some care advice. Should I polish my horns before going to have them trimmed or will the horn trimmer polish them for me?"

https://chatgpt.com/share/69d69b18-d1c8-83e8-bc47-8f315a1b55...

simianwords [3 hidden]5 mins ago
I wanted this challenge done with the thinking version (I apologised for that and edited my earlier comment).

It doesn't bullshit on the GPT-5.4 thinking version.

Here is the result with thinking https://chatgpt.com/share/69d69dd6-fb50-838d-863c-4e1eda5d08...

I suggest you try it yourself to be convinced. Try it in incognito mode if you wish. Or not.

camgunz [3 hidden]5 mins ago
"Hey ChatGPT. I'm building a Final Fantasy 6 mod, and I need more space for the battle scripts. How would I rearrange the data in the ROM to give me the extra space I need?"

https://chatgpt.com/share/69d6a16c-6014-83e8-a79d-d5d11ed2eb...

That is not where the battle scripts are.

---

Anyway, it's trivial to get pretty much any model to make things up. Don't we all know this? That's why I was surprised by your position; if we know anything about these things it's that they make things up.

simianwords [3 hidden]5 mins ago
https://chatgpt.com/share/69d6a38c-bd54-838c-82e3-609d9e66c9...

I used the thinking version (like I asked before). I think this is right. If not, please tell.

Also: you didn't falsify anything. Not the first one, nor the second.

If the second one is bullshit, I accept I’m wrong - I have no idea how to verify though so I’ll leave it up to you.

I think yours is the classic case of “use the free version to judge the paid one”.

giraffe_lady [3 hidden]5 mins ago
Oh, well you should have said that then.
perching_aix [3 hidden]5 mins ago
You're talking to a different person there, but I do obviously also disagree with a lot of what's written in the post too.

At the same time, it is also just super redundant, yes. Not sure why you find it so bizarre that one would take issue with that. See also the very existence of the website called TV Tropes.

stavros [3 hidden]5 mins ago
Yeah, it gets really boring. Whenever I see "slot machines" or "bullshit machines" or whatever, I just ignore the comment and move on, because it signals that it's someone in such deep denial that they've turned their brain off.

I'd much rather read articles about what LLMs can/can't do, or stuff people have built with LLMs, than read how everything LLMs touch turns to shit.

simianwords [3 hidden]5 mins ago
It's the usual gibberish that throws many darts to see what sticks. Oh, LLMs steal other people's work? Check. Oh, LLMs cause ecological damage? Check. Oh, LLMs hallucinate? Check.

When you see a pattern like this, you know it's not coming from any place of truth but rather from ideology.

perching_aix [3 hidden]5 mins ago
My personal red flag for this is the scare quoting of AI, and the super try-hard categorization work that people perform to try and discredit LLMs.

It takes approximately 1 minute to find out that machine learning is a subfield of artificial intelligence, both having existed for about half a century now. This basic historical fact is also taught in AI 101 courses across the globe for compsci students.

Yet here we are, with people portraying it as some sort of cheap sales trick. Reminds me of when I discussed quantum dots with a friend, who was very eager to quickly file them under "yet another bullshit thing with quantum in its name" before finally taking the time to understand that the "quantum" bit is not a marketing gimmick. Except in this case, people are a million times more inclined to willfully propagate this. Genuinely so tiresome.

simianwords [3 hidden]5 mins ago
I think it's just anxiety, because internalising that it is actually this good is a bit hard for some.