Uber's $1,500/month AI limit is a useful signal for AI tool pricing
https://www.bloomberg.com/news/articles/2026-06-02/uber-caps... (https://archive.ph/ZrwAy)
566 points by pdyc - 690 comments
Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?
Many lower-budget individuals are now moving to Chinese open-weight models like DeepSeek. I wonder if China's really subsidising the providers, or if inference costs are actually much lower, and Anthropic/OpenAI are just making sure no money's left on the table for their eventual IPOs.
I think the frontier labs will need to drop their high per-token prices at least for their low and mid-level models for the reason that several Chinese models (at least Qwen, DeepSeek, Kimi and GLM) are "close enough" that with the right harness they are cost effective alternatives.
They won't necessarily need to close the gap (at least not yet), because these models won't necessarily compete at the same token counts. E.g. at least some of them need to do far more work to solve the same problems.
But, yeah, the prices will come down one way or the other.
At the same time, even the subscriptions for the cheap Chinese models are probably subsidised, and those subscriptions are likely to get less generous over time.
Not only that, China may subsidize AI, but so does the US.
So you have on one end the token revenue trending down, on the other end the training cost going up for the next frontier models, and you need to pay back your 10y debt.
0: https://youtu.be/wGZboZcSGDY?is=64GuKyqBh_4aSjTE
Not necessarily, the bond holders could simply take a massive hair cut and lose shitloads of money. On the topic of bubbles and exuberance, Jeff Bezos made the salient point that there was a massive over-invested biotech boom in the 1990s and tons of sophisticated investors ended up losing lots of money. But humanity still kept the medical advancements made by the boom. Stocks going down didn't un-research drugs, and it won't un-research new GPUs or un-build datacenters.
Drugs cost pennies to manufacture after they are researched and make their way through the approval pipeline. There are many generic drug manufacturers who can work off the existing formulas.
The more apt comparison is that LLMs won't be un-trained. Opus 4.8 now exists. Even if Anthropic somehow went bankrupt, that particular asset could, at the very least, be sold for proverbial pennies on the dollar to a "generic" inference provider.
If a bankrupt AI company maintains enough of a skeleton crew to consolidate and archive its intellectual property it could be sold off to another company, but there are also timelines where it all ends up digital dust in the wind.
Only if that skeleton crew had deep deep pockets. If Anthropic closed their doors tomorrow because the market collectively saw that AI was not profitable and so open sourced everything, there wouldn't be any money to train Opus 5.0... it would then have to fall on governments to put money into the hat (which I can't see happening unless it was Europe)
Hardware fails, and it also becomes less economical to run as more power-efficient, modern hardware turns up. It requires constant investment to keep it useful and cost efficient.
When AI pops, we'll temporarily have some extra compute capacity that will be horrendously uneconomical to run due to the high grid load and low consumer demand, before it gets shut down. There's simply no real use for it at this scale.
It’s really not obvious the infrastructure we are building for AI stuff is something that will benefit humanity over time.
Without talking about the fact that bubbles are extremely destructive. Bezos is obviously someone who came out ok from the dotcom bubble but we are talking about something that destroys a lot of value globally. That has real, direct consequences, not just investors losing some money. The US economy is currently only growing because of the AI bet
Inference is much cheaper than training a new model, so running them just for inference is a completely different thing than having to price in the fact that at the moment all of these companies need to compromise between compute for inference and compute for training new models. If no new models were to be trained, and all the compute was inference only, that would change everything when it comes to the overall compute cost of AI.
Dotcom infra buildup is a bad comparison, in that it wasn't even close to being fully utilized. The infra was completely out of proportion to the day-to-day usage.
If all these other data centers were anywhere near coming on line, that 300mw data center would be a rounding error not a line item as it is right now.
So someone's signed contracts for way more and way larger data centers, someone's purchased billions in hardware for these not yet operational data centers. I'm wondering how depreciation's going to work on all these assets...
Anyhow, I'm not really sure what "max capacity" is here, nor am I really aware when they're going to be delivering the operational assets that are currently levered to their eyeballs and consuming 1/3rd of the memory made on the planet.
As far as inference vs training, have new models gotten radically better than old models or only marginally (at the cost of 10x or more the training costs)?
Very exciting stuff.
With investing timing matters a lot.
Replace servers with regular compute.
These AI "GPUs" are worse for gaming than even the crappiest actual GPUs (with a G as in Graphics). Also, the display drivers won't support them, not officially at least.
The feature being bundled in with GamePass makes it worth it. I used to VPN home and try and run games remotely, but it was honestly a bit of a pain. Just pressing a button and having the game launch is quite nice.
You just run the models and sell the tokens. The demand will still be there even if there is less money in chasing new frontier models.
> GPU are pretty specialized hardware, without AI a data center full of outdated graphics cards isn’t really too valuable.
AI accelerators used in DC are not really "graphic cards" any more, you ain't running gaming on it
I think the lighter 40 series cards like L40 still have OK graphics features. But otherwise yeah, after the Ampere generation graphics features went down the drain. The A100 and A40 cards can do graphics well but it already makes no sense in terms of power-to-performance ratio.
I could imagine something like “inference is done at home or in China, that’s the price to beat” and it’s not worth keeping all those GPUs cool out in Nevada.
The fiber laid during the dotcom bubble never paid back the investors or lenders, but it's still profitably connecting customers all these years later.
Big AI investor tells us that investing in AI is good. Oh, the surprise!
Does that invalidate this point? Yes. Because it makes no sense. The big money is not going to R&D but to build infrastructure that will be outdated in 5 years.
> „[AI vendors are] paying for a fixed cost with a depreciating commodity“
That's just a confusing way to say you don't think future models will be worth the development costs. Because if future models are significantly better, why would the price of tokens to access those models depreciate?
what makes YouTube YouTube is not the video player it’s the servers that can handle petabytes of uploads a day and billions of views. YouTube software wise, is no different from the 100s of porn websites that are coded by small European teams
e.g. an interesting possible canary in this coal mine is that there’s been a 200% increase in the rate of new apps appearing on Apple’s App Store, but it has not been accompanied by a 200% increase in the rate at which people are buying apps.
I don't believe this aligns with the reality of any major company, unless your business is in the literal sense "selling code" your revenue and profit is tangential to the quantity of code you produce. Google is a good example of this: most of their revenue and profit comes from their ad network, which is disconnected from their development productivity and instead heavily reliant on network effects and time in market. If I was a new competitor with infinite AI funds to throw at whatever problem I choose, I can't simply capture their market by developing an exact copy of Google's ad platform. In the same way, Google can't substantially grow their ad network by coding "more" or "better", they still need more customers and consumers to interact with their network to see any increase in revenue.
So it doesn't directly follow that a productivity increase will inherently follow an AI usage increase.
‘uber for my industry’ is not a sensible business strategy
Honestly, if you know guys whose bottleneck is pure software dev — please let me know, I have a good, experienced team in Eastern Europe, we can do wonders in product development. But coming up with sensible business ideas and executing on them in the real world is crazy hard and extremely rare.
That would be half a trillion[1] redirected to regular people just from Google Ads.
[1] snatched my number from here: https://pixis.ai/blog/2025-google-advertising-benchmarks-for...
An AI generated man talking about his product building journey to make a pressure washer hose that didn't need power (in the AI video it didn't even have a water supply connected!) that was going to be banned in a week because it was too powerful so buy now.
I've seen AI slop before and scam ads before but the combination of the two gave me some real tingly spider-sense that things are going to get worse and that some unethical people will make a lot of money from it so be in no hurry to stop it.
You can't consider it in a vacuum. AI takes limited resources. So far it has driven up the cost of nearly every consumer electronics device that runs an OS, and it has driven up the cost of energy, which is used by the entire industry and every single customer.
It's not just the cost of datacenters, it's cost of infrastructure (that given current direction of US govt will just be paid from people's fucking taxes and bills..) and cost of other industries turning outright unprofitable "thanks" to demands of AI
- most tasks do not require the latest frontier models, even if they are a magnitude more intelligent (we don’t actually know if that will be the case). Current Gemini flash is cheap, fast, and pretty capable with good guidance for most tasks
- now that companies pay API costs instead of a subscription they will be setting restrictions on token use to not have their budget explode (like Uber in this submission), that’s a strong incentive to NOT use expensive models, and limit their thinking budget
- there is competitive pressure from China and others who can offer very decent performances at a fraction of the token price
- the price of tokens for the frontier models is likely to go up, but the price to access older models is what depreciates! The overall price per token is going down now that we are in a new world where companies understand that token maxing is one of the stupidest concepts ever created by humankind.
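A rough sketch of what a per-engineer spending cap like the one in the submission could look like in code. The $1,500 figure is from the headline; the class, the method names, and the per-million-token prices are made up for illustration and are not any vendor's actual API or pricing.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Tracks spend against a monthly cap (hypothetical helper)."""
    monthly_cap_usd: float
    spent_usd: float = 0.0

    def cost(self, input_tokens: int, output_tokens: int,
             in_price_per_m: float, out_price_per_m: float) -> float:
        # Prices are assumed to be quoted per million tokens.
        return (input_tokens * in_price_per_m
                + output_tokens * out_price_per_m) / 1_000_000

    def try_spend(self, amount_usd: float) -> bool:
        # Refuse the request if it would push spend past the cap.
        if self.spent_usd + amount_usd > self.monthly_cap_usd:
            return False
        self.spent_usd += amount_usd
        return True


budget = TokenBudget(monthly_cap_usd=1500.0)
# Hypothetical request: 200k input tokens, 20k output tokens,
# at made-up prices of $5 and $25 per million tokens.
request_cost = budget.cost(200_000, 20_000,
                           in_price_per_m=5.0, out_price_per_m=25.0)
allowed = budget.try_spend(request_cost)
```

With a guard like this in the request path, the incentive described above becomes mechanical: an expensive model burns through the cap faster, so cheaper models and smaller thinking budgets get more requests through.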
This is why I'm building role-model, a routing protocol and a router runtime: https://role-model.dev/
The real measure should be cost per ~equivalent task result, not cost per token nor tokens per task.
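To make that metric concrete, here is a minimal sketch. It assumes failed attempts are simply retried and that attempts are independent, so the expected number of attempts is 1 / success rate. All prices, token counts, and success rates below are invented for illustration.

```python
def cost_per_successful_task(price_per_m_tokens: float,
                             tokens_per_attempt: int,
                             success_rate: float) -> float:
    """Expected dollars spent per task that actually succeeds."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = price_per_m_tokens * tokens_per_attempt / 1_000_000
    # With independent retries, expected attempts = 1 / success_rate.
    return cost_per_attempt / success_rate


# Hypothetical "frontier" model: pricey tokens, few of them, high success rate.
frontier = cost_per_successful_task(15.0, 40_000, 0.9)
# Hypothetical "cheap" model: 15x cheaper tokens, 5x more of them, half succeed.
cheap = cost_per_successful_task(1.0, 200_000, 0.5)
```

In this made-up case the cheap model still wins per finished task (about $0.40 vs about $0.67), but the gap is far smaller than the 15x per-token difference suggests, and different inputs can flip the result.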
I think it's only accounting depreciation.
I have been using my laptop for a decade, what is stopping datacenters from using the purchased GPU chips for a decade?
The solder joints are notorious for failing at a high rate too.
They can't run larger modern models. They can't run smaller models as fast as newer servers. So their remaining market is applications where customers are okay with older, smaller models and slower performance.
They have to price the service lower than competitors due to the lower performance. The older GPUs are less efficient so it costs them more to keep them running. They're paid off, but they're taking up valuable power, space, and cooling in a data center.
Eventually there is a tipping point where it's better to replace that space and power budget with something new that has more demand.
The parts are sold off on the open market. There's an equilibrium demand for the parts from other data centers keeping older servers running and from hobby people who are okay with a jet engine sounding toaster of a GPU running in their home.
Why take risk when you can spend money and take no risk
When you have waitlists for many many months for Blackwell GPUs, keeping the old ones around as long as customers are willing to pay for them is great.
If I as a customer have a use case for a machine learning model I developed awhile ago, so an insect identification model, I had an ML researcher/eng develop it back in 2019, and it runs fine on a 2018-era T4 GPU (NVidia 2080 era), why mess with it?
I've seen those vision researchers want to train on H100s at the time and being told no, wait for the T4s.
I've seen T4s running BERT models for document classification.
When there are enough Blackwells in data centers that H100s are useless for inference by your standards (I don't know if we've arrived there or not yet), there will be people who, say, want to run the Taco Bell ordering chatbot on them. There will be people who have applications that are just fine with Qwen 2.5 who will be happy renting them.
There seems to be this crazy consensus that hyperscalers are going to go into their datacenters and throw away their old GPUs. The reality is they have a ton of paying customers for them.
And there may be insect identification apps from 2019 that say "you know what? H100s have gotten cheap enough I can use a VLLM so the user can describe where they saw the insect too", or the McDonald's website support chatbot developers say "Hey, the bigger chips have gotten cheap enough we can upgrade our models to Qwen 2.5".
The frontier level GPUs in e.g. AWS have a huge premium. When the newer generations come out, they will be able to cut prices to a bit of a premium over the operational costs and still make a profit, and there are a ton of down-market customers who will be interested, who aren't willing to try to outbid Anthropic for Blackwells.
Chips do wear out and need to be replaced (entropy do be like that and durability is not a primary concern for chip design) so you'll need to refresh your stock and, even if you don't need cutting edge models, the price of all chips at scale will go up over time. It may feel unintuitive since, when the PS3 was released PS1s were extremely cheap - but if you're struggling to understand this effect from your experiences in the consumer market you're actually looking at the price factor that starts making antiques increase in value since at a certain point they become scarce goods. The market price for an NES is higher today than it was in 2003 because the price had already bottomed out from demand from the general consumer market but the demand remaining (speedrunners and the like) is now fixed or growing while the supply is inevitably shrinking.
When they stray too close to the line ... you get Intel's 13/14th gen chips that wear out after 1-2 years instead of 10-20 years. Intel calls it "Vmin drift" because that doesn't sound scary, but the actual point is that various wear-out mechanisms push the chip outside of its design envelope - increasing the voltage or lowering the clock speed may get it to run for a while longer, but you're living on borrowed time as the various circuits just stop working right and you get unpredictable instruction mis-execution: https://fgiesen.wordpress.com/2025/05/21/oodle-2-9-14-and-in...
This was simply poor design, it took Intel ages to really figure out what went wrong and "resolve" it.
It cost them far more than it made.
Despite no moving parts things broke anyway and, even if it doesn't break, the vendor can make you change the technology just by playing with maintenance cost of the older one, limiting or removing spare parts from the market.
https://s24.q4cdn.com/101481333/files/doc_financials/2026/q3...
"Hard Drive exabyte shipments of 199EB, up 39% YoY, with ~90% shipped to data center customers"
"Data center revenue of $2.5B, up 55% YoY, driven by strengthening cloud and enterprise demand"
And an article: https://www.seagate.com/stories/articles/the-ai-era-doesnt-r...
If you build a 100MW data center with GPU compute and three years later a new data center opens with the same GPU cost and the same electricity cost as yours, but can do twice as much compute, you quickly lose business unless the market is so constrained that customers can't afford to be picky. But the moment there's slack in the market you'll see major migrations off of providers that have the same cost but half, or a quarter, of the performance.
So when you see someone talking about GPUs fully depreciating in value in 1-3 years this is what they're talking about. Right now it's not a big deal because there's no slack in the market. But once there is, the bottom will drop out.
The V100 (2017 -> 9 years old) can be rented from $0.02 to $0.37/h (right now I can find a V100 with a Xeon Gold 6140 and 48GB RAM for $0.165/h). Let's assume the guy you rent it to pins it at its 250W TDP and let's ignore the running costs of CPU/RAM/etc... Then you draw 1/4 kWh for that compute hour. The industrial electricity prices in the US vary between 7.5 and 25 ct per kWh (depending on state, time of day, etc...), so electricity costs you roughly 2 to 6 ct for that hour. At 100% efficiency, assuming nothing ever breaks, and the CPU consumes 0W, you earn about 10 to 14.6 ct/h at the $0.165 rate.
And remember: V100 hours are sometimes sold at 1/10th the price.
If I pick average conditions you need to start thinking of whether it is worth it to rent them out: Usually it isn't unless you have them anyways and just sell idle capacity.
It's barely worth it to run them in a pure "is it profitable" sense; if we also account for the opportunity cost of taking up a slot in your datacenter, it ceases to be worth it really quickly.
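The V100 arithmetic above as a small sketch, using only the figures quoted in the comment ($0.165/h and $0.02/h rental, 250W TDP, 7.5 to 25 ct per kWh). It makes the same simplifications: the GPU runs flat out at TDP, nothing else draws power, nothing breaks. The function name is made up.

```python
def hourly_margin_usd(rental_usd_per_h: float, gpu_watts: float,
                      electricity_usd_per_kwh: float) -> float:
    """Rental income minus electricity for one hour at constant power draw."""
    energy_kwh = gpu_watts / 1000.0  # kWh consumed in one hour
    return rental_usd_per_h - energy_kwh * electricity_usd_per_kwh


# $0.165/h rental, cheapest and most expensive electricity quoted above.
best = hourly_margin_usd(0.165, 250, 0.075)
worst = hourly_margin_usd(0.165, 250, 0.25)
# Bottom of the quoted rental range ($0.02/h) with expensive electricity.
floor_price = hourly_margin_usd(0.02, 250, 0.25)
```

At the $0.165 rate the margin lands between roughly 10 and 14.6 ct/h. At the $0.02 end of the range with expensive electricity it goes negative (about -4 ct/h), which is the "usually it isn't worth it" case before even counting CPU power, cooling, or the datacenter slot.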
In future, we might have fixed cost GPUs but not today.
And yeah, it does feel like GPUs will start losing value more slowly going forward, with Moore's Law having been dead for a while. It used to be that 3-5 year old GPUs were more useful as space heaters than GPUs, but that's much less the case today.
I believe they do, but I too would love to know more details because there are several ways this can happen. Electromigration, package failures, VRAM failures, dielectric breakdown... Hopefully there will be studies soon similar to that old Google paper on HDD failures!
Though, those capabilities are maybe just a few years out, funnily it's taking AI to make it potentially doable.
That's the main issue here.
These were about half the cost of a used GPU that was only used for gaming. By that price, I'd say a GPU kept busy has twice as high a chance of failure after two years of use.
Not great, not terrible.
As for duty cycles, the chips are perfectly happy at 100% operation. Cooling and power components fail, not the chips. But it costs manpower to repair such things and manpower is inconvenient these days. A GPU with any sort of fault just gets dumped.
Isn’t that just more work than logging it yourself?
Automating it has been way better for me than the alternative of breaking my flow whenever I'm switching tasks to chart my time, or logging all my hours for the week in one sitting. Different strokes for different folks I suppose.
Model routers allow this to happen automatically without any more work by the user.
> a shittier model
A ton of tasks don't require the most expensive frontier models, etc.
> I’m not sure why anyone does it
1. Faster solutions from the LLM - also reduces employee costs of having the employee waiting on the LLM
2. Avoiding things like the half-billion dollar per month bill for a single company’s LLM use recently reported in Axios
Saves like $2-3 per session. Same quality code.
> Compounding the problem, labs in China often release dual-use capable models as open-weight. Once a model is open-weight, safeguards that do exist can be removed, making the model available to any state or non-state actor to use for malicious purposes, including the cyber and CBRN misuse those safeguards were built to prevent.
https://www.anthropic.com/research/2028-ai-leadership
And apparently OpenAI and Anthropic think so, too - why else would they try so hard to ban them instead of outcompeting them?
This makes no sense, 99% of the people using Chinese models are using them via Western inference providers who are running them and serving them to people over openrouter or whatever. If anyone is stealing your data it would be an American or European inference provider. A model has no ability to send data anywhere.
China bad by default, right?
Safeguards trained into the model (ie exist in the weights) can’t be removed.
There's a subreddit for people wanting to sex-talk to various models. It just so happens that the same prompt they use to 'jailbreak' SOTA models for sex talks also works if you want to have model write malware, or tell you how to design a highly illegal device.
- Oh, they must have been blocked from entering the Chinese market!
But none of that is true. You could see global brands everywhere here — Tesla, Unilever, KFC, Apple, and so on.
---
Or have you ever actually done cross-border trade? Or any international business collaboration? If you had, you’d definitely realize that what’s really stopping you is U.S. legislation. At least, that was the case with our former U.S. partner
Why even bother with 'forced IP transfer' when you can just take it?
Can anyone expand on this point? I read an article saying that the big AI co's datacentre spend was a bunch of lies because they can't build datacentres at anywhere near the rate they want to.
So it’s not even about datacenters.
Here’s a Reuters article about TSMC: https://www.reuters.com/world/asia-pacific/broadcom-flags-su...
So this is actual committed contracts with all kinds of companies such as Apple, NVidia, AMD.
Also, the whole reason they can’t build data centers faster is precisely because of this.
That was because the supplies the datacentre needed were constrained - supply-constrained, not end-user demand constrained, so would be in agreement with the GP comment (and the article I read didn't imply anything about lying).
A paranoid part of me thinks that these models are all inherently biased and instructed to be pro CCP, with specific gaps in their training data related to undesirable historic events and political ideas.
You'd be surprised how much of bias exists in easily extractable information. Now imagine how much of that happens during training, that you can't easily extract.
So this is largely a moot point. Yes, Chinese models will likely have some weird things injected into them. But so do the US models. Do I care? Not in the slightest. Models are my code monkeys, and if the code leaves my machine, I assume IP is leaked be it a Chinese model that clearly tells me they do use the data, or US models that pinky promise they don't.
Your main audience would be snake oil salesmen trying to prove their AI products are unbiased and not under the thumb of any outside influence. This doesn't address the biases of the model itself, but that's not your business. Your business is selling tokens and security certificates. If you can get the right angel investor, you could maybe have your new standard required for some government applications.
edit: Actually American inference providers are cheaper for Chinese models. There's way more competition here because the Chinese aren't idiots investing every last dollar they have into data centers for LLMs that don't make money.
Also, there is a lot of competition in China. Like a lot. You might know better than me as well, but although the biggest AI labs are based in the USA, the adoption is weirdly global. As a general sense of what's going on: you can see AI-related ads literally everywhere in Tokyo, almost all the time, on every single screen in public.
Of course though they are not necessarily a viable solution for companies with security requirements etc. given it is just a single person project, but they still serve as a proof it can be done.
For deepseek-v4-pro:
- $0.350 in, $0.003000 cache, $0.80 out https://crof.ai/pricing
- $0.435 in, $0.003625 cache, $0.87 out https://api-docs.deepseek.com/quick_start/pricing
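For a sense of what those two price lists mean for a real bill, here is a small sketch. It assumes the figures are USD per million tokens and that "cache" is the price for cached (cache-hit) input tokens; the linked pricing pages are the authority on both. The workload (10M input tokens, 1M output, 80% cache hits) is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per million tokens (assumed unit)."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


# Figures copied from the two lines above.
CROF = Pricing(0.350, 0.003000, 0.80)
DEEPSEEK = Pricing(0.435, 0.003625, 0.87)


def workload_cost(p: Pricing, input_tokens: int, output_tokens: int,
                  cache_hit_rate: float) -> float:
    """Blended cost: cached input is billed at the cheaper cache price."""
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    return (uncached * p.input_per_m
            + cached * p.cached_input_per_m
            + output_tokens * p.output_per_m) / 1_000_000


# Hypothetical workload: 10M input tokens, 1M output tokens, 80% cache hits.
crof_cost = workload_cost(CROF, 10_000_000, 1_000_000, 0.8)
deepseek_cost = workload_cost(DEEPSEEK, 10_000_000, 1_000_000, 0.8)
```

Under those assumptions the workload comes to about $1.52 vs $1.77. The cached-input price is more than 100x lower than the uncached one in both lists, so the cache hit rate moves the bill more than the gap between the two providers does.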
Deepseek shot themselves in the foot because they never intended to serve V4 Pro for $0.80 per million output tokens; that was a promotional price that was meant to expire (and still might). They intended for V4 to cost $4.00 per million, but Western inference providers drove down the price because they can operate at negative margins to try and push competition out. I can assure you they are losing a ton of money at ~80 cents.
My point is, it's Western inference providers that are establishing the floor price of inference. They are willing to operate at a loss in order to put their competition out of business. Chinese providers are typically at or above the prices set by American/Western providers if you go looking on the Chinese internet. You aren't going to get deals from China for inference except through this one instance with Deepseek V4 Pro, which wasn't even supposed to be permanent pricing.
Source: directly involved in these discussions. You can downvote as much as you'd like but you can't ignore the facts.
Can you expand on this?
Just looked into it, seems like at most they have just 3.2, not 4: https://aws.amazon.com/bedrock/pricing/
Looking around their catalogue more, most of their models seem quite outdated, aside from the OpenAI and Anthropic ones (but those get more expensive). I wouldn't willingly pick Bedrock and would instead throw money at OpenRouter, that has both a bunch of providers, as well as almost any model for you to try.
Raise, they are going to raise the prices. We will spend more on AI infrastructure in 2026 and 2027 than the gross sales of the entire global software and services sector. Current pricing is at a major loss for current providers.
I wonder how often the Agent actually follows the guidance. I do see them follow it when I look. But it doesn't seem so every time.
The LLM can easily do this type of stuff, just tell it and it'll happily do it. This is exactly what I mean when I tell people they need to work closer with the AI, tell it how to do things. Don't just tell it what to do and get frustrated when it does it differently than you would.
A good way to achieve this without writing huge prompts is tell it to plan the change first. Just give it some vague low-effort directions. It'll usually get most things right, you tell it what you want different and once you're happy you tell it to go ahead.
Claude 100% of the time even thinks we use Laravel despite the project being some old Lumen codebase, so most of Laravel's features are not available. It also gets the PHP version we are using wrong 100% of the time.
I also think your excuse is bad. "The code is legacy fucked so I'll just legacy fuck it some more because I can't be bothered to make an effort"
You would edit Claude.md to say things like what tech the project is using, because that's the entire point of claude.md. It's literally the solution to the exact problem you're complaining about. Any information you want it to know, you put in there and then it knows it. And you can tell Claude to make or update the file for you.
I'm not one of the people telling you how smart LLMs are. I'm telling you how to use it efficiently, by not expecting it to know everything but rather provide the information that it needs in order to be a more useful tool.
We tend to obsess over software quality when it’s the least important thing for a business. It’s just a means to an end.
- Takes weeks or months to get simple features out the door, and when they're out they're buggy as hell and the bugs never get fixed. Sound familiar?
> I’d never touch any line of code unless I absolutely have to
And this is how legacy code is made. Years of everyone "never touching anything they don't have to" leads to a giant steaming pile of shit.
> unless the business is willing to face some down time
How does a simple refactor cause downtime? I do this kind of stuff all the time and pretty much never cause any downtime. In the very rare cases that prod downtime does occur it's generally not because of some simple code refactor, and we have it back up in no time by just rolling it back. Unless it's not related to the code at all, in which case it also wasn't a refactor that caused it.
Are they even making money off them now ?
I genuinely do not know how prices can get lower from the current major providers in NA without the whole market collapsing. Everyone is spending copious amounts of money to presumably make more money back.
The biggest reason large models are unattainable for local applications is the lack of hardware with large amounts of unified/graphics memory (and the cost of the platforms that have it). Once the memory slog goes back to normal and hardware manufacturers adapt to demand, we may see consumer hardware with large memory capacity effectively opening the door for slow but usable frontier model inference (assuming improvements in model efficiency and compute capacity).
At that point, inference becomes a race to the bottom. The large labs hope they can attain a leap in capability (which is increasingly looking bleak, with an average catch-up of just a few months) or market dominance through integration (integration in platforms and OS, exclusive deals with companies or governments).
For coding agents, I suspect no player will manage to lock in enough of the market to enforce pricing much higher than the true inference cost, and catering to programmers becomes an unsustainable proposition. We will instead be hit further with AI integrated into our other tooling costs, such as GitHub, Microsoft suite, G-suite, with vendors using their market position to force in AI functions as a value-add in the total cost without giving the option to exclude them.
So my question remains the same: How are the players investing 100s of billions in buildout going to hope to make this back? Market capture looks bleak, inference looks like a race to the bottom. End users look like they could be beneficiaries. Where do the big boys go?
Well, they just rent their hardware, so I'm not so sure. But they'll both be public soon and we should get that breakout in their cost structures, somewhat.
I'm not sure about OpenRouter but I wouldn't be surprised if they offer a US-based provider of DeepSeek.
For reference, Cursor has their first own light fork of Kimi that they use as their baseline coding and review model.
V3 pricing from them was right in line with what the commodity providers are charging.
Not everyone using AI is using it to code core value IP.
https://martinalderson.com/posts/no-it-doesnt-cost-anthropic...
There's no way that all AI inference providers are colluding and/or all running at a massive loss, meaning the cheap Chinese model prices must be the real cost it takes to run frontier-class models PLUS their margin.
Look at Deepseek 4 Pro. https://openrouter.ai/deepseek/deepseek-v4-pro/providers Deepseek and Baidu are subsidising prices but they probably train on inputs. I have no model training and ZDR in OpenRouter enabled, and the first provider that shows up there is Deepinfra, significantly more expensive than Deepseek. BUT much cheaper than Sonnet 4.6 and ChatGPT GPT-5.4.
1) Don't ask LLMs for big changes
2) Review everything and point them in the right direction
Large models still suck at big changes, they produce questionable architecture and you still have to review the code, if your project is serious enough.
The codebase quickly becomes a mess if you don't pay enough attention. It does not matter which model.
So why bother with big models, when flash models are 10x cheaper and much faster to iterate under guidance? Large models can be used for security and bug audits. Flash models work almost the same for changes under 300 LOC when you dictate how you want your code to look.
One organization, that is a software company
> which seems to be roughly inline with "normal" consumption for most full-time engineers
My peers are using $20/mo plans, only a handful are using more than $100/mo in tokens. We haven’t had any limits imposed yet.
Uber is not representative of any trend beyond big tech and VC over funded startups.
You want to master your craft, develop "optimal" systems, understand where things are going by utilizing SOTA.
You can call it FOMO, but you get the point.
But this overlooks the other critical part of getting the most out of these things: the harness. I run an autonomous plan/design/code/build/test pipeline with agents using my own orchestrator. Different models are better at different stages, and I use LLMs to judge the output between them. Not everything needs Opus 4.8.
The harness provides both the scaffolding to get the right things into the model, and the right things out. But it also lets you dictate which model does which work.
It's the pipeline, not the model, that gets you quality at a given token budget.
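To make the "which model does which work" idea concrete, here is a minimal sketch of per-stage routing. The stage names follow the plan/design/code/build/test pipeline described above; the model tier names and the per-million-token prices are placeholders I made up for illustration (the 10x gap only mirrors the "flash models are 10x cheaper" claim from earlier in the thread), not real model IDs or real pricing.

```python
# Hypothetical stage -> model tier mapping. Tier names are placeholders,
# not real model identifiers.
STAGE_MODEL = {
    "plan": "large-model",
    "design": "large-model",
    "code": "flash-model",
    "build": "flash-model",
    "test": "flash-model",
}

# Assumed prices in USD per million tokens, for illustration only.
PRICE_PER_MTOK = {"large-model": 15.0, "flash-model": 1.5}


def pick_model(stage: str) -> str:
    """Return the tier assigned to a stage; unknown stages get the cheap tier."""
    return STAGE_MODEL.get(stage, "flash-model")


def pipeline_cost(tokens_by_stage: dict[str, int]) -> float:
    """Estimate the USD cost of one pipeline run from token counts per stage."""
    return sum(
        tokens / 1_000_000 * PRICE_PER_MTOK[pick_model(stage)]
        for stage, tokens in tokens_by_stage.items()
    )


if __name__ == "__main__":
    run = {"plan": 1_000_000, "code": 2_000_000}
    print(pick_model("plan"), pick_model("code"), pipeline_cost(run))
```

A real orchestrator would also need the judging step between stages; this only shows the routing and the budget arithmetic.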
$1,500/mo * 14 months = $21,000.
If local models are 14mo behind as many in HN say it may be profitable to just wait. Maybe just spend a few hundred dollars of your tokens and buy hardware piece by piece.
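A quick sketch of that wait-and-buy arithmetic, in case anyone wants to plug in their own numbers. The $1,500/month cap and the 14-month lag come from the comment above; the hardware price is an assumed input, and the calculation ignores electricity, maintenance and slower local throughput.

```python
def cloud_spend(monthly_usd: float, months: int) -> float:
    """Total metered spend over a period at a flat monthly rate."""
    return monthly_usd * months


def months_to_break_even(hardware_usd: float, monthly_usd: float) -> float:
    """Months of cloud spend needed to equal a one-off hardware purchase."""
    return hardware_usd / monthly_usd


if __name__ == "__main__":
    # $1,500/mo for 14 months, as in the comment above.
    print(cloud_spend(1500, 14))
    # Assumed hardware price of $8,000; replace with a real quote.
    print(months_to_break_even(8000, 1500))
```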
Small models are fine for small coding tasks but I don't see why big ones can't be broken down most of the time.
This one does not have routing, but reasonix is insane, absolutely insane for saving money. I've used 1.3billion tokens at the cost of 4$. (99-100% cache hit)
This sounds like something a harness could do (and might already be doing), with work delegated to subagents running on lower-cost models.
> Review everything and point them in the right direction
Sorry upper management doesn't care. That's an engineering problem that you need to solve.
I believe it can be great for vibe coding, but mundane day work? Hell no, I'd rather work with Haiku. It's too slow, checks too many things, it's annoying as hell.
Probably better to use the fully-loaded cost of the engineer, which is much higher than their compensation package. The fully-loaded cost is the total cost paid for the labor power of the engineer, and it includes big ticket items such as office space, food, equipment, insurance, payroll tax, fringe benefits, recruiting costs.
If the median compensation package is $330k/year then the median fully loaded cost is probably around $450-500k.
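For scale, a rough sketch of what a $1,500/month token cap looks like against those figures. The $330k compensation and the $450-500k fully-loaded range are the numbers quoted above; the function name is mine, and nothing here says whether the spend pays for itself.

```python
def annual_token_share(monthly_tokens_usd: float, annual_cost_usd: float) -> float:
    """Fraction of an engineer's annual cost that a monthly token budget represents."""
    return monthly_tokens_usd * 12 / annual_cost_usd


if __name__ == "__main__":
    # Compensation only, then the low and high fully-loaded estimates.
    for label, cost in [("comp", 330_000), ("loaded low", 450_000), ("loaded high", 500_000)]:
        print(f"{label}: {annual_token_share(1500, cost):.1%}")
```

Under these inputs the cap works out to somewhere between about 3.6% and 5.5% of the annual cost, depending on which denominator you pick.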
But yeah, double is insane. When I saw prices for COBRA from Facebook, it was $3300 a month, and that was god-tier insurance - the insurance benefits were so good they had a custom list of what was covered that was probably way better than anything available on the market (e.g. you want brand name drugs? no problem. You don't want to try both Ambien and trazodone before taking a sleep medication doctors actually recommend? No problem - etc.) - but for my needs it was barely better than COBRA costing way less than half. $3300/mo, or even $1200/mo for an entry level ops worker is a lot of their salary, and probably where the double comes from. At SWE compensation most of it ceases to scale.
The fully loaded costs including proportional management costs isn't relevant to the true marginal engineer, but estimates I've gotten from higher-ups definitely factor into engineering decisions about "should we spend engineering time to save money/make more money - how much will doing this thing cost the company" (opportunity costs are also relevant, but usually less grounded, since most projects don't have concrete benefits like "we will save $x/yr in infra costs")
It's kind of like neuroscientists found the trigger to tell your brain "we're going to do a clean shutdown now, trigger transition to runlevel 0".
Quviviq, Dayvigo, Belsomra. All still on-patent, so they don't have generics and are pretty expensive (like $1000/mo if your insurance doesn't cover them). A lot of doctors won't recommend them in practice because most of their patients won't yet be able to get them covered.
Ask your doctor about them, look them up in your insurance's formulary to see what's required (e.g. if you have tried both Ambien and trazodone and can document it), and see what they can do, before writing it off!
The expectation is Belsomra will lose its patent in 2029 and then generic makers can try to get one approved - so it's not that far off!
My experience was not with pure software houses; we had some labs, measurement and RF equipment, but even without the hardware component the offices, insurance, admin expenses, HR, janitors, conference travel and so on would easily bump the total employee cost to double the salary. My 2c.
If one uses AI minimally and is able to outperform peers who are maxing out AI spend, one might want to use that in salary negotiations.
This is not a good bellwether for the AI industry, including its adherents. Their growth assumed a level of indispensability that’s not being reflected in hard numbers and real costs, which lends credence to the notion that these IPOs being fast-tracked are meant to try and cash out before the bubble really pops in earnest. There’s no way consuming enterprises are going to pay such insane costs for such minimal uplift in the long run, and the AI companies can’t keep offering subsidized tokens via subscription plans at their current pricing.
Right now the AI LLM PRs we're seeing are just introducing more work for other people, while these so-called builders are looking good with their new dashboards and functionality they're demoing.
But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.
It's not built up from the ground with experience from x people taken into account. It's materialized from nothing, with no foundational separation, and barely any abstractions.
No one wants to touch it. The PRs are too large, and the 'authors' of the PRs aren't on call with us.
They get all the glory, but do none of the work.
It's kinda like designing a house and then sending it to an architect and engineer saying: make this work.
You can absolutely do this. It's even right most of the time.
You even have a fair chance of getting a response like that when there isn't anything wrong and the question wasn't rhetorical - which perfectly illustrates the level of the genuine understanding LLMs operate at.
A lot of average people are producing gigantic messes. At least previous to this they were gated by their mediocrity.
I have never seen anywhere in the world people who hate the working class as much as people do in the USA.
In my country the average employee is competent, they do their work and create wealth for the nation.
Again, only in the USA do people think that billionaires are the ones creating value. Total nonsense indoctrination.
I find this varies by individual, but the AI taking care of so much boilerplate and rote work of coding, and me taking the role of architect, test designer, and reviewer, is a lot more productive for me. Checking the code may take the same skill, but it's an order of magnitude less work.
Not sure if that's true or if it might be influencing what you're seeing, but it's a thought.
* EDIT * What's with the downvoting? That's a correct description of what happened. You can't ask an LLM why it did something and expect a coherent response, because there's no thinking chain, and no stored thinking state... At best, you can get a reconstruction of how the context relates to the output (basically a summarization of the context).
> I shouldn’t have said that with confidence
> I got ahead of myself there
> I overstepped, allow me to correct that
It’s wild seeing how often it’s wrong, and I only know it’s wrong because I am an SME or actually reading the sources. Most of my coworkers are not SMEs with what they are asking and do not read the sources.
A huge part of my job now is fixing fuck ups and failures resulting from these slop jockeys who have already moved on to slop up the next task.
There are plenty of valid criticisms or warnings about over-reliance on AI coding, but this is not one of them. Today, I am using a semi-autonomous agentic coding system which has an `interview` functionality built in - when it spits out the PR from the input, if you have questions about the motivation or context for a particular choice, you can start up a clone of the original agent in a sandbox to question it.
Now, you might claim that those responses aren't always reliable, accurate, or consistent, and that claim has a little more weight (though, in my experience, decreasingly so) - but it is _certainly_ not the case that you cannot interview an agent about choices made. I'm literally doing it every day.
I've never worked at a company that didn't have a technical backlog measured in years.
Literally nothing works, all the timers/time counters are different across the pages, constantly commands hardware to do stupid shit, breaks during critical moments/in front of clients.
Eventually mgmt had to institute change freezes for high profile events because the team was breaking too much shit all the time.
The average C suite dipshit doesn't realize that the performance drops off a cliff once your project is more than some fraction of the context window so they will make pretty dashboards all day long but once you need to cover all the edge cases of a real system it all explodes.
AI isn't trained on the type of software style we'll need to create systems using AI, it's trained on how we used to write software. It doesn't reuse code or elegantly structure anything, it just adds more code until the thing builds and passes some fake tests, even if half of it is functionally dead/unused.
Same with the MS Surface(?) tables (not tablets). I saw loads of companies buy into the hype and then discard them.
The Concorde turned out to be a fad (not "useless" - which was your reframing). Touted as the future of travel, each seat cost about $20,000 in today's dollars, but it turned out that even at the high per-passenger prices people and companies were willing to pay, supersonic trans-Atlantic air travel was not economically viable, and it was discontinued.
Bold prediction. :)
I think anyone predicting a drop or near-term flattening is not thinking beyond the online bubbles where these tools are discussed. In a local tech meetup a lot of the normal companies are barely coming online with AI tools at their company, and even then with very low limits.
Think of people who were very strict with variable names. People who pushed for multiple-levels deep of abstractions for a single API logic that’s not going to be reused. People who believed that coding is craft, rather than just a process to get to the end during work hours. This makes most of these people’s points more-or-less moot.
I was in some of those camps, but I’ve seen coding evolve in the last 15 years. So I understand that these priors need to be updated, as most arguments don’t apply to today’s world.
The more things change, the more they stay the same.
I’m not a proponent of AI generating everything without any supervision as of now. But I'm willing to change my mind when it gets better.
Most software engineering jobs are not cutting-edge tech, or research, or solving unsolved problems. Integrations, APIs, figma-to-react pipelines, devops, etc. are what people get hired for. All those can be done much faster at the same or better quality by an experienced person with the supplement of AI. It’s hard to imagine any company would go against the grain and slow things down on purpose.
As far as “boring systems are boring”, I can tell you from experience that I work on a pretty boring system, and AI is not all that meaningful in terms of its impact, and it’s not for a lack of trying.
Can it help me create a migration and add an endpoint and such? Sure. But those aren’t the hard problems. They never were.
It’s funny that you think the idea of slowing down is such a bad one, but it is another well-established truth. Slow is smooth, and smooth is fast. This notion of break/fixing your way to prosperity by way of 10,000 ill-conceived PRs is a fool’s game.
Generally we've modified our timelines heavily, systems are working as intended, company is still making money. There are some AI-authored commits that had mistakes that we didn't catch, but I'm sure this could've been an issue even if all were human-authored. I know first-hand multiple other companies who are doing exactly the same thing.
I agree with "slow is smooth, and smooth is fast" for mission critical systems. But super majority of systems are, indeed, not mission critical.
I have no idea how we can get people motivated to learn these through trial-and-error when AI coding exists though. I remember the days of spending hours on stupid bugs that AI can resolve within a minute. But I recall learning heavily from those experiences. Oh well…
we've got product folks vibing out prototypes (not shippable but clickable) in our main front end in a few minutes to an hour. This would previously have involved 3 people and several weeks, or a ton of figma and documents to fill in the gaps. This saves weeks to months and lets them really experience the items.
Then they hand it off to someone who knows all that stuff who is also using AI and the impl also gets done faster.
The PMs are either moving infinitely faster, or at least 30x faster and not blocked constantly by others.
basically you're not comparing people who don't know much (tech) with those who do, you're comparing them before and after access to AI.
I set up k3s, and tons of what would otherwise be unnecessarily complicated stuff, on my laptop for my side projects with additional home servers, smart house stuff. Otherwise k8s and things like that would have been daunting to learn in theory and without constant professional exposure, etc...
Microservices in Go, Rust, which I didn't have any previous experience with, games in C and other languages. Didn't know anything about low level memory management before. Was just mainly TypeScript person. Just constantly building random fun stuff.
The question is, how quickly does a junior with no experience build intuition without trial and error?
Often that started with the macro recorder. Then you worked out what that "recorded" code/sludge did, removed the crud you didn't need or want, improved the logic and so on. I bought books to understand it better. Now you can ask a (different) LLM "what is this? why is it used? How would I?" etc which is probably a faster learning curve than books, newsgroups and old school personal home pages with good info.
I would have been quite surprised when I first used a VBA macro in anger just how far I would go down the rabbit hole. C, asm, verilog, Linux were no part of what I originally signed up for!
Some people will specialise in the equivalent of recording macros and go no further. And this will be fine for code that gets it done but doesn't matter too much in the other dimensions (security, reliability, usefulness without the authors' support, etc.) Much like VBA utilities inside companies that were useful way back when. Other people will want what they produce to be better, even good, and they will learn about floating point [1] and all the rest, much as I did. Probably learn pretty fast too. [2]
[1] https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.h...
[2] Working out how to write an excel vba webserver and using it to collect and collate summary data from various divisions into reports was seedy as hell, solved the actual business problem (given ridiculous but intractable constraints) and isn't something you can record. We all have stories from a misspent youth that we're simultaneously ashamed and yet somehow proud of.
No, but you do need to know the answer to respond to that 3AM page about prod being down.
Let me ask you this: is any technology worth so much break-neck adoption without first seeing clear evidence of ROI? No. The adoption is irrational.
Judging the ROI of an engineer is hard. Adding AI on top of that makes things worse, I think. I've heard AI makes engineers 3X, 5X, 10X and even 100X.
If I told my CEO that I was 4X more effective with AI, I am doubtful he would be willing to spend even 1X my salary on tokens. Even though he would be making out in the end.
At some point the ROI is pretty much vibes, man.
This means that the average engineer is efficient at (say) identifying the first 10 tasks they should do but there are diminishing returns after that? That seems like a weird pattern. Wouldn't it be more likely that certain tasks have a ROI based on how efficient the task is generated?
Like I'm trying to imagine in my head, if you think an engineer is more efficient with the tool, why deny them more tokens. I guess so they think to use them more efficiently?
So, maybe I conclude that I think your conclusion that there must be $1500 per engineer is flawed. And even if it were true, I don't think the benefit would be evenly distributed. I suspect this is a first pass at figuring how to budget them and there will be a second pass.
While it certainly reeks of motivated reasoning, Jensen Huang's assertion that an expensive engineer should be using at least their salary in tokens feels more logically sound to me (assuming the average engineer is efficient at using tokens; I have a feeling it's a normal distribution).
At my company we can ask for temporary cap limits if it’s justified, which is fairly common.
Completely agree with that.
That was clearly a short-term trend that would obviously get fixed. Doesn't say much about AI coding as a business model.
Probably not worth risking your job for a 200$/month good, but at 5K, I'm sure some folks will be tempted. Especially if companies do stupid things like token usage leaderboards.
everyone making comparisons to the dotcom bubble seems misguided. this is clearly computing 2.0 imo
I have my concerns with current inference pricing in that there's a non-zero possibility for a rug pull in the future for the subscription plans for organizations and individuals that can still use them. For now, it's only companies larger than ~150 users that need to pay per token, but what if that wasn't the case? Not every company can afford over $1k/month/employee to give them access to AI tooling, further making it harder to compete against the behemoths. If we get to a point where an individual can no longer pay $100/month for nearly unlimited usage and instead must pay per token, that's going to be a problem.
Personal computing eventually became an equalizer (until we started centralizing on mainframes again, aka the cloud) because it got cheap. My hope is that inference also gets just as cheap, if not cheaper.
I have high hopes for local AI and open weight models and we will continue the ethos of local, personal computing and not needing to offload everything to OpenAI/Anthropic/Google, etc. to get work done once the hardware and hardware availability catch up.
All companies who make this transition will be more or less at the mercy of model providers.
Most other workers are served fine by $20-30 worth of tokens on a budget model. You don't need Opus to help support write emails.
I'm optimistic that the demand for AI accessibility will drive programmatic interfaces in places where companies were previously reluctant to.
The general thrust that everything would be online was correct, it was just that the market mistimed and misallocated capital by a decade or more. There was massive spending on infrastructure capacity that we wouldn't end up needing until the 2010s. There were hype driven valuations completely disconnected from business fundamentals just because a company was an 'internet' company. Things were going from cutting edge to obsolete in less than a year. There were breathless promises that this was business 2.0! Of course, none of that sounds remotely like what is going on today...
I'm optimistic about AI, but I also don't think that it is going to change everything as fast as promised.
Most directly, human labour. Labour is always a problem for capital. At a certain level of AI competence, businesses don't need to pay humans to complete the work they need doing in order to operate. I don't think anyone would dispute that AI competence is growing steadily.
You update it for them every 3/4 years (if they're lucky).
It probably makes a bit more sense to compare it to existing software subscriptions like Office, or the old-school 'per-seat' licenses per user for software.
NFTs? My company had nothing to do with blockchain but I ended up working on NFT integration regardless.
Because there's not a single piece of evidence that this has improved the quality of the delivered software, or for that matter even the speed of features any of these companies produce, in fact if anything the opposite.
The point of software development, the hint is in the name, is to develop software, not consume tokens. If Uber was now full of 10x engineers the stock price of Uber would be up, not down on a yearly basis. Hilariously enough the only company whose stock price is up appears to be Anthropic.
i.e. I am able to write about 1k lines of code of "acceptable" quality per week. Which means in 1 year, there will be about 50k LoC. I am pretty sure that I would have to spend like 60-80% of my time maintaining the 1st year's code and the rest making new features in the second year, so I would have to hire more people and spend time onboarding them to maintain velocity. All of those are rough estimates, probably overoptimistic and way worse in the 3rd year. Good luck doing such estimates with code agents. Even worse if you already have huge amounts of legacy code.
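That estimate can be written down as a tiny model, which at least makes the assumptions explicit. The 1k LoC/week rate and the 60-80% maintenance share are the commenter's numbers; the 50 working weeks per year is my assumption to match the "about 50k" figure, and treating output as linear in time is a simplification.

```python
def yearly_loc(loc_per_week: float, weeks_per_year: int = 50) -> float:
    """Lines of code produced in a year at a steady weekly rate."""
    return loc_per_week * weeks_per_year


def new_feature_loc(loc_per_week: float, maintenance_share: float,
                    weeks_per_year: int = 50) -> float:
    """New-feature output in a year when a share of time goes to maintenance."""
    return yearly_loc(loc_per_week, weeks_per_year) * (1 - maintenance_share)


if __name__ == "__main__":
    print(yearly_loc(1000))
    # Second year, with 60% and 80% of time spent maintaining year-one code.
    for share in (0.6, 0.8):
        print(share, round(new_feature_loc(1000, share)))
```

So under these inputs, second-year feature output drops to roughly 10k-20k LoC, which is the gap the extra hires would have to fill.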
As for why they got accepted so quickly 1) the industry's long running desperation to deskill computer programming 2) the addictive psychology baked into LLMs "That's an elegant solution! Shall I ... ?"
So there's a huge number of HN posters claiming that the price of tokens will go UP over time rather than down (that's how Moore's Law works, right???) or that code bases that AI contributes to will spontaneously combust, or something.
I mean, Github Copilot's pricing just went up considerably, so I guess they were right?
In the long term, tokens will fall in price. Obviously. (If "tokens" continues to be the unit)
In the short to medium term, for the IPOs to succeed, people have to start actually paying for what they are using, so the price will go up, and is going up, quite a lot. Once their value is set they will slowly fall from that point (or some point maybe halfway, depending on how much the market is willing to continue to subsidise).
I am an AI cynic, but I am now an informed cynic; I am learning agentic tools so I know where they are useful and I know my enemy.
I think the "fad" here is cloud-based, metered AI being a dominant work mode.
Nothing, so far, has suggested to me that any other outcome is likely than edge- to local-scale, on-device, on-laptop, on-prem models getting good enough to the point where people use them by default and use the cloud models only when they need the extra oomph.
I cannot believe that there is anything other than an enormous incentive for companies like Uber to find local, small model and on-premises solutions to their problems, not least while pricing is so changeable and people are getting nasty surprises.
Betting on OpenAI and Anthropic being around over the long term in the form that they are now, that feels like valley hopium. Utility monopolies essentially always derive from physical/geographical limitations, don't they?
While I hope local AI continues to exist, I'm skeptical that it will take over, for the same reason running your own servers hasn't taken over. It's just hard, and involves spending huge sums of money up front.
It's also not really clear how much tokens are being subsidized. The discussion reminds me of Uber. For years people on HN claimed that Uber was going to collapse once they ran out of VC money. Then... that never happened, and everyone just moved on to discussing other things.
Now, that doesn't mean running your own LLM will be easy, but this will mean it's a lot more likely that there will be at least regional LLMs, in my opinion. I.e. there will be Google, whichever (if any) is left standing of OpenAI or Anthropic, and then there will be Chinese hosted LLMs, probably Indian hosted LLMs, European hosted LLMs, plus LLMs hosted on managed services (e.g. Bedrock). For sure I see large banks and the like being able to host the best OSS or even licensed LLMs on their own cloud infrastructure accounts (i.e. at AWS, Azure, etc).
And that's on top of the LLMs running on owned server infrastructure plus actual local, on device LLMs.
If you look at what Uber is spending per developer per month, they clearly have some headroom to consider whether more-local, unmetered AI tools on device, on premises, in private cloud, can be cost-effectively used to cut down how much money they are pouring into Anthropic and OpenAI. Not least because a bit of centralised effort might lead them to distilled models that are better for their purposes. Some of that budget could go into simply putting a bit more capacity on a developer's desk.
Can they do it now for everything? Obviously not. But IMO there is no reason at all for planning and scaffolding tasks to be done with cloud models, and there are many reasons why it might be better to do document processing without leaving the premises.
The incentives are there on the technical, operations and particularly on the business levels, and the relative disruption of the switch is really small, considering that all the tooling can use different models for different tasks already. They must at least be investigating the possibility; it's irresponsible not to.
Not impossible, not unlikely, probably 50-50.
Maybe Microsoft and Nvidia are on to something.
128 GB machines that can run local LLMs are a bargain even if priced $5-8k. Yes, tok/s is not quite there, but that's probably OK since the bottleneck really isn't the code; it's WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction?
I find anything below 50 tps or so entirely unusable...
Regardless, it's apples to oranges anyway: inference is quite cheap for open weight models, it's just that Claude and OpenAI can charge very high margins compared to e.g. DeepSeek or various providers on OpenRouter, since open models are a commodity.
Using local hardware is expensive when it's running a complicated software stack that can break in 10,000 different ways.
These eventual local AI servers will just talk some protocol for AI and sit in the corner and nobody will think about them.
I guess they still might need access to various systems, so idk. Eventually I think someone will offer "AI in a box" though, running the latest open model or whatever.
“AI in a box” sounds a heck of a lot like “the box” from the Silicon Valley TV show. Or the Google search appliance. Or name any other on-premise thing that is equally dinosauric.
The real finding of this article is that AI tokens are direct competitors with offshoring. $1,500/month buys you a whole employee in India.
And this is before AI companies inevitably increase pricing after the conclusion of the growth phase.
For customer facing, production software, it's worth paying a cloud tax to get the reliability guarantee. For tools that are used by engineers for code development, there is no need for such bulletproof guarantees.
This is unlike customer facing systems where, if your database server goes down, you probably can't just use the other one--the whole system is down.
Which category of developer tool has on-premise as the more popular option?
Cloud isn’t about “reliability,” it’s about being able to focus on your core business rather than spending all your time maintaining stuff.
You could probably reach the former figure on a prosumer platform but only for very special workloads. If you spend a lot of time on prefill (which is common for agentic workloads) the outlook is even worse since that's a significant constraint for any on-prem AI.
You can ask the same for the median 330k salary in the US for Uber Engineering... and being a bit snarky, having attended Uber engineers' talks here and there at a few conferences, it looks like they love to (re)invent internal tooling/platforms. That's pretty expensive on its own.
EDIT: I'm not saying that Uber's engineers didn't add value to the company, they absolutely did and handling the scale up they had to handle is not an easy feat. But I do challenge the notion of "what features did they create with that (LLM) spending?" of GP.
People DO.
It's well known that most tech companies are run incompetently. As you say, it's not the engineers' fault.
But most projects and hiring in these companies exist to juice promotion criteria. And, depending on perspective, these companies are either massively overstaffed or massively underproductive.
The comparison to AI spending being wasteful holds up pretty well, these are companies that readily piss away billions in pointless spending.
I think it's a general problem, but in my rare conversations with execs nowadays, they seem rather uninterested in improving their decision making there. The actual performance of the organization does not appear to be all that relevant to them.
I don't know; I'm a Ron Popeil "set it and forget it" kind of guy. Make the dumbest, simplest thing that's going to work with some clear path for scaling. Then go do valuable things instead.
But in Uber's case, they tend to reinvent lower level pieces of platform/infra.
The idea of "if you add intelligence you make more money" is contradicted by the fact companies don't just always hire more people. Why doesn't Google just hire everyone?
Yeah, I bet all labs releasing SOTA models are more than happy to remove the main way they make money and let you run it locally, especially if you're a big spender like Uber who seems very willing to throw money into the sea as an experiment.
Anthropic and OpenAI license to the public clouds. Google reportedly licenses to Apple. licensing to Fortune 100 companies running on their own infra is an obvious next step
it is a race to the bottom and I’m not sure the labs win that race. we’ll see!
If the large, well-funded IT companies in the world believe the current AI cost is too high, then Anthropic, OpenAI and CoPilot have no actual customer base. AI is then relegated to a very profitable niche business, but that can't fund the R&D for the models.
Also, I don't believe you need to spend $1500 a month on a coding agent if you optimize usage at all.
For the employer those employees cost between 2945 - 7736 EUR per month based on https://kalkulatori.lv/lv/algas-kalkulators (income and social taxes).
So on the lower end that's (1500 USD ~ 1300 EUR) close to half the total expenses of such a developer, on the high end here around 15-20%. That's quite significant, depends on whether their productivity also improves (if that's what the orgs care about).
And we’re not even the country with the worst pay out there, but pay the same for tokens, cause regional pricing isn’t a thing!
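The percentages above can be reproduced with a couple of lines. The 1300 EUR token budget and the 2945 - 7736 EUR employer cost range are the figures from this comment; the USD to EUR conversion is taken as given rather than computed from a live rate.

```python
def token_share_of_employer_cost(token_budget_eur: float, employer_cost_eur: float) -> float:
    """Monthly token budget as a fraction of the employer's monthly cost for a developer."""
    return token_budget_eur / employer_cost_eur


if __name__ == "__main__":
    budget = 1300  # ~1500 USD, as approximated in the comment
    for cost in (2945, 7736):
        print(cost, f"{token_share_of_employer_cost(budget, cost):.0%}")
```

That gives about 44% at the low end and about 17% at the high end, consistent with the "close to half" and "15-20%" estimates.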
It probably allowed them to avoid hiring as many people to build a certain amount of software. Even if it didn't increase revenue, it could have lowered human labor costs.
> 128 GB machines that can run local LLMs are a bargain even if priced $5-8k.
Don't forget the energy costs. Searching around, advanced models use an average of 25 Wh/1000Tok.
$1500/month gets you about 150M tokens.
At the aforementioned energy/token, that's 3750kWh.
What are your local office electricity rates/tariffs? (Hint: they are going up because of AI data centers). Even if my price and energy assumptions are wrong above, you probably aren't going to get the rates that the hyperscalers do.
Even at cheap (e.g. Texas) retail electricity rates, that many tokens will probably cost you hundreds per month. In most other electricity markets, probably far more.
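For anyone who wants to plug in their own tariff, here is a rough sketch of the estimate above. The 25 Wh per 1000 tokens and 150M tokens per month come from the comment; the two electricity rates are placeholder assumptions, not quoted tariffs.

```python
WH_PER_1000_TOKENS = 25.0        # commenter's figure for advanced models
TOKENS_PER_MONTH = 150_000_000   # commenter's estimate of what $1500/month buys

def monthly_energy_kwh(tokens: int, wh_per_1000_tokens: float) -> float:
    """Convert a monthly token count into kWh."""
    watt_hours = tokens / 1000 * wh_per_1000_tokens
    return watt_hours / 1000

def monthly_electricity_cost(kwh: float, usd_per_kwh: float) -> float:
    """Electricity bill for the given energy at a flat retail rate."""
    return kwh * usd_per_kwh

energy_kwh = monthly_energy_kwh(TOKENS_PER_MONTH, WH_PER_1000_TOKENS)

# Hypothetical retail rates in USD per kWh; substitute your own.
for label, rate in (("cheap", 0.12), ("expensive", 0.30)):
    cost = monthly_electricity_cost(energy_kwh, rate)
    print(f"{label}: {energy_kwh:.0f} kWh at ${rate:.2f}/kWh is about ${cost:.0f}/month")
```

This covers electricity only; hardware, cooling and idle power are not included.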
Unless they are iteratively replacing expensive vendors and optimizing other headcount costs?
I get that if it's offline the security downside of XP doesn't matter, and I assume XP is free, but being free doesn't really seem that valuable compared to alternatives (free Linux and virtually free OS if buying wholesale).
Even then it makes more sense to rent the bigger GPU and get your answer faster.
I suspect there’s some mass delusion with respect to actual accomplishments as a result of LLM use. Sure, things are moving faster, but does it matter?
Hard tasks require a lot of guidance and code reviewing, unless you are creating another throwaway project where correctness, maintainability and code understanding do not matter.
I have still found the sweet spot for me is using LLMs while I stay in the driver's seat.
Normal people have to produce something of value from that spend. So starting 100 agents and then waking up to something cool but useless just means you spent a few thousand dollars and created nothing of value.
Uber (and quite a few bay area companies and startups) can afford to spend that money. There is no expectation of profit, Uber lost ~62B and growing: https://uberlosses.com/
Its profit margin seems to have stabilized around 10%.
The real economic crime is losing at least $40bn over 10 years scaling a business that ended up having retail profit margins (i.e. low profit margins).
WTF did anyone build with all that spend? Despite all the feel-good anecdotes about how productive folks feel using ai coding tools there's a deafening silence when it comes to actual, demonstrated efficacy. How can we be this far entrenched in these workflows and still not know whether they actually do anything useful?
What would previously be janky internal dashboards or excel sheets are now actually nice to use tools. That said of course the maintenance cost of all that has yet to be discovered, and the ROI is questionable.
OK. I guess that's good, too.
I don't think this would have been possible without having solid engineering culture and processes in place before bringing in ai coding tools.
And I don't want to sugarcoat it, this hasn't been easy, requires continued discipline, and took well over a year to get good at. And we still have to continuously learn, experiment and adapt our training, tooling, and processes.
If you increase your spend -> ship more features -> no correlated increase in revenue, that's just burning money.
If a team of 10 spends 1 extra headcount ($180k/year) and ships features with no corresponding growth in revenue, what does that mean?
There was probably a reason it was on the backlog (because it didn't really have value).
Yes! :)
> There was probably a reason it was on the backlog (because it didn't really have value).
There are definitely things in the backlog with low value. We don't work those items, even if we could now. The additional bandwidth we have now goes to valuable features that drive revenue and retention metrics. The reason they were on the backlog was that we just didn't have the bandwidth to execute on them well and they were somewhat less valuable than the critical path items on the roadmap.
Software engineer quality of life.
There can be an increase in productivity without a corresponding increase in total output. The gains could be captured by software engineers doing a day's work in an hour then fucking off in a variety of ways.
Until companies start hiring 5x fewer engineers than they did before, and, well... we are clearly moving in that direction.
As for building actually complex software, the art of that is not in simply chaining together such scripts. It's the art of using architecture and testing to shape uncertainty, and developing requirements (and extrapolating sensibly from incomplete requirements). I don't think LLMs are great at this, but they aren't terrible either. A lot of the more active users in the space are doing stuff where they've realised they need more detailed specs, which, like, yeah, we knew this already: better-defined problems lead to better software.
Coding faster doesn't really solve that.
Uber makes more money if people buy more rides, order more food, have some breakthrough in autonomous driving. They can save money if they can optimize some ops or spend somewhere. Is there any evidence that with the spend on AI that they achieved any of this? If they did, I'm sure we'd hear about it in some engineering blog.
Uber engineers do not define their revenue stream; the product leadership team does.
$1500/mo of AI spend by engineers does not equate to revenue. They need to figure out revenue first before zeroing in on AI spend.
Claude has allowed me to do refactors in a couple of days that would otherwise have taken weeks. It has, objectively, increased the velocity of the engineering component of greenfield features by 40% in my org. You can put a number value on that and decide if it gives you favorable ROI.
Software engineers like to talk as if business and finance are as easy as pushing code out and refactoring. It's not and never has been.
You’ve gotten a result, but without the work that made you valuable, while deskilling yourself.
It’s a lose/lose situation for…I would say anyone employed as an engineer or programmer. I’m not taking responsibility for AI output, the same way I won’t try to fix auto-generated code: because you just regenerate it.
The only person that wins here is the person who can pay you less because they don’t need you, they just need another “types computer guy”.
I'm pretty pessimistic on AI and don't have access to good agentic workflows, but refactors are exactly the thing where it seems to me like agents could be really strong - once I've refactored something architecturally, I might have hundreds of instances of a thing that needs to be updated in a predictable way, but is complicated enough that it's going to be faster for me to manually update hundreds of instances rather than writing a generalizable find/replace tool.
Absolutely false. Refactors (in my case) can be as simple as dropping old packages for newer packages with slightly different semantics. It can be moving legacy pages from jQuery to Vue.
> You’ve gotten a result, but without the work that made you valuable, while deskilling yourself.
I've 25 years coding, trust me, I don't lose anything by not finding out on my own that the semantics of a jQuery promise changed between major versions.
> The only person that wins here is the person who can pay you less because they don’t need you, they just need another “types computer guy”.
You have no idea what you're talking about. There are entire classes of K8s networking issues that would have taken me a day to debug which Claude solved in minutes, just because it can run 20 diagnostic commands in two minutes and deal with technical minutiae that are time-consuming but ultimately irrelevant to my business goals.
I effectively get to operate at the rate of a small team of engineers - I know that because I've managed small teams of engineers in the past.
I think this is the part I struggle with. The code I write makes me money or is a way of teaching me something, both of which are reasons that I would write the code regardless.
I don’t think I have any projects in mind that I’d be willing to spend half of a car on that I also wouldn’t have written myself.
Obviously just a personal take though. I’m glad you get the usage you want out of it.
I use Gemini/ChatGPT/Claude to do that work and it unblocked the enjoyable parts of the project while taking care of the tedium.
I also find LLMs help me learn faster because they can often take a paper and turn it into working code, which I find to be a very slow process.
It really depends how you use it, if you're using prompts to generate detailed designs, breaking those into lists of tasks, and then feeding those to multiple agents - it's really easy to burn through many thousands.
If you're being more deliberate and using a few agents at a time interactively, having it review PRs/resolve issues, automated clean-ups and performance optimization, etc it could be more like $1500.
If you're just throwing it one-off questions like a better Stack Overflow, that is well under $100.
I've really gotten into /goal, if you can find something verifiable and leave it overnight - it's kinda like christmas morning to see where it landed.
Who's this "we" you're talking about? Are you a software engineer or a temporarily embarrassed billionaire? Do you think the rational thing is to pay the lowest regional salary worldwide?
This kind of race-to-the-bottom logic needs to be rejected: by workers, business culture, and the government.
Unfortunately business culture embraces races to the bottom (for everyone but owners and executives), and uses its lobbying might to push the government into tolerating or even supporting it. And there are a lot of deluded workers who (for some reason) seem to feel smart when they parrot the ideas of people who want to screw them.
Hiring someone vs paying a vendor for a service:
- different level of commitment
- might tie your org to a physical location
- different legal risks
- shows investors a different picture (probably this would even influence a bank loan)
- manager has to fight a different bureaucracy
Not to mention that comparing the cost of a hire by looking at their salary is pretty dumb. ISTR hearing at Google that the overall estimated cost of employing a SWE is like 4X their compensation? Can't remember the exact figures though.
And the obvious question: what's the cost of that revenue? Because it looks huge but ...
I just wanted to take their number at face value. It's not like it needs more real information to make AI a bubble.
https://openai.com/index/codex-for-every-role-tool-workflow/
However, that's an absurd scenario.
1. Why it's not a bold assumption: it's a bit shocking now. But in two years or so, many/most companies will realize this is the cost of doing business. Just like people are ok with using Outlook, or Office 365, or (in the case of Wall Street) Bloomberg terminals, people will realize that developers will need AI coding assistants.
2. Why the conclusion does not follow from the assumption: if the limit is set at $1500/developer/month, it does not mean all developers will use it. Companies will set incentives for people to not be very wasteful. It is more likely that on average developers will consume $100-200 worth of tokens per month, and there will be some outliers who will consume 10, 100, or 1000 times as much, but they'll be few.
An enterprise license for O365 is something like $75 per person per month. Totally different order of magnitude.
And regarding Bloomberg terminals, Bloomberg only has 1 million users (semi random guess).
The reality will be that some places just won't pay for any licenses or will try to set up their own, local LLMs.
Anthropic: https://support.claude.com/en/articles/12883420-view-usage-a...
OpenAI: https://help.openai.com/en/articles/10875114-workspace-analy...
And with a bit of careful routing - there isn't a lot stopping you sending the hard stuff to a cloud model and the average stuff to an on prem model.
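A toy illustration of the routing idea above, assuming a simple keyword-and-length heuristic. Everything here (the backend labels, the keyword list, the length threshold) is hypothetical; a real setup would more likely use a classifier or the task type reported by the agent harness.

```python
from dataclasses import dataclass

@dataclass
class Route:
    backend: str  # "cloud" or "on_prem" (hypothetical labels)
    reason: str

# Hypothetical hints that a request is "the hard stuff".
HARD_KEYWORDS = ("refactor", "architecture", "debug", "design")

def route_request(prompt: str, max_local_chars: int = 2000) -> Route:
    """Send long or hard-looking prompts to the cloud model, the rest on-prem."""
    if len(prompt) > max_local_chars:
        return Route("cloud", "prompt too long for the local model")
    text = prompt.lower()
    if any(keyword in text for keyword in HARD_KEYWORDS):
        return Route("cloud", "looks like a hard task")
    return Route("on_prem", "routine request")

for prompt in ("Rename this variable", "Debug this K8s networking issue"):
    route = route_request(prompt)
    print(f"{prompt!r} -> {route.backend} ({route.reason})")
```

The point of the sketch is only that the decision can live in a thin layer in front of both models, so the split can be tuned without touching the tools developers use.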
I definitely have written a goal file, and then just ran claude in a loop over the goal in order to 'token max'... why not? I'm doing research and have some clear KPIs where research into all kinds of techniques / tuning can improve the results. I can spend my budget on a "experiment with blah blah blah to improve blah blah" or give it a list of things to try that I know will take awhile.
It's no problem hitting hundreds of dollars of API spend while sitting at a computer with 3 monitors and 6 windows of useful Claude Code interactive sessions, working on 2 or 3 projects and using worktrees. It's a little weird when you hit your limit by 2 o'clock and have to wait for token budgets to reset; god forbid I manually edit code... which I did do for the first time in months.
You can also start to generate a lot of token spend if you do something like "hey make me a stylized slide deck using internal skill / agent XYZ based on commits A through C", which, as an engineer, makes building presentations much less painful.
This uber limit is not high compared to the big SV companies.
If you are interested you can try it out at markbase.cloud (disclaimer and all that). I am not charging for it.
It's hedging a bet at this point, but that's why people say there's no moat. If the tools are properly used + maintained, there should be no reason we can't use a new provider even next week (maybe with a little tweaking).
this is how it works: https://help.markbase.cloud/humans/collections/overview
https://arxiv.org/abs/2602.11988
For what it's worth, if you were considering building context out.
Unless you work in some obscure domain, chances are that any general "knowledge" Claude has "learned" is already public data somewhere.
If you don't believe me, launch Codex and immediately start working on the same project(s). You might discover that all the knowledge accumulated means almost nothing.
This isn't something that is public knowledge, in the sense that you mean it.
Just earlier today it asked me if I wanted to create a jira ticket for something I asked it about doing. My prompt mentioned nothing about jira.
If you use Claude Code, you might want to take a look at the "auto memories" files that it creates. See "/memory" for some more information.
Where is the knowledge stored?
All of my knowledge typically gets stored in plans outside of the agent?
And each agent window gets archived regularly, anyways.
Remember that utilization of these huge racks will not be 24/7, and these are usually not GPU-intensive shops that would train models on the spare compute. With prices of 100-200k USD and up, and a lifetime of ~2 years, that would be hard to justify financially.
Self hosting could easily amount to ~1000 USD a month amortized across many developers. In rush hours - there will be hard rate limits.
Would that $1500 - $1000 = $500 per month justify the 10% decrease in "AI productivity"? I guess not, in most cases.
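A back-of-the-envelope version of that comparison, as a sketch only. The 100-200k USD price and ~2 year lifetime are from the comment; the number of developers sharing a rack is a hypothetical input chosen to show how a figure near $1000/month can arise, and power, hosting and staff costs are left out.

```python
def amortized_monthly_cost_per_dev(hardware_usd: float,
                                   lifetime_months: int,
                                   developers: int) -> float:
    """Spread the hardware price evenly over its lifetime and its users."""
    return hardware_usd / lifetime_months / developers

LIFETIME_MONTHS = 24      # "~2 years lifetime" from the comment
DEVELOPERS_PER_RACK = 8   # hypothetical; depends on model size and rate limits
HOSTED_CAP_USD = 1500     # the per-developer monthly cap under discussion

for hardware_usd in (100_000, 200_000):
    self_hosted = amortized_monthly_cost_per_dev(
        hardware_usd, LIFETIME_MONTHS, DEVELOPERS_PER_RACK)
    saving = HOSTED_CAP_USD - self_hosted
    print(f"${hardware_usd:,} rack: ~${self_hosted:.0f}/dev/month, "
          f"saving ~${saving:.0f} vs the cap")
```

Under these assumptions the saving only matters if the productivity hit from weaker or rate-limited models costs less than the difference.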
To everyone who asks me, I'd say that in the short term, unless there's a really good reason to self-host these coding assistant models, the big 2-3 coding assistant providers are the better choice.
No one got fired for licensing Claude Code.
You tried that on a personal machine for yourself once. It's completely different calculation when serving a model to 3000 employees with ever evolving hardware and software requirements. You'll need dedicated hardware in data centers and experts to run them. A company will need to figure out how to manage acquisition, assets and expenses plus 1000 other things, in addition to its actual business. Guess who has figured out all of that already? AWS/Azure/OpenAI etc.
It costs money to maintain the hardware and hire experts to manage the services. For something as common as LLMs, there is absolutely no reason for a company to serve models on its own hardware unless it is paranoid about sending bytes to AWS.
Just looked at my spend for the past 30 days; it didn't even come to $600. 95% of my tokens are from cache. To reach even $1500 I would have to let Claude run unsupervised overnight (and with the amount of mistakes it still makes and guidance it needs, I do not believe we are there yet).
That's still in the ballpark. A modest change in your usage habits or workload could easily get you there.
> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI - which currently costs me just $100 per provider thanks to their generous subsidized plans for individual subscribers.
This whole article seems to me like Multi level marketing "businesses" where 'Diamonds' have made their money by promoting MLM in seminars and telling hopefuls at bottom that "Buying AI subscription now is their one shot to be a winner in life"
Perhaps there is something to MLM vs LLM to create a FOMO effect.
Who do you think would be paying me, and what would they expect in return?
OpenAI or Anthropic would be paying you, like they pay bot farms and other influencers, and they would expect marketing in return, which you provide in boatloads.
Your job is to be an influencer, I'm not sure why anyone would be surprised that this is a possibility.
The reason so many people read my writing and find it useful is that they see me as a credible source of information: in a world full of clickbait and misinformation, I have a reputation for providing an independent voice that occupies that rare middle ground between "AI will kill us all" doomerism and "AI will solve everything" hype.
Credibility is hard to earn and easy to squander. I've been blogging for 24 years now, which has helped me build credibility with a large array of people across many different interest areas.
The modern influencer business model is to grow an audience and then sell things to them, through partnerships and sponsored content. I refuse to do that, because it strikes directly at that credibility. The moment you say "I've partnered with X to tell you about product Y" you're no longer an independent voice.
Nilay Patel of the Verge (and the excellent Decoder podcast) refuses to read ads from sponsors himself, at significant financial cost to his publication. I've adopted the same policy - I will not let anyone else pay me to put words in my mouth, because it strikes directly at the credibility I value so much.
Until a few months ago the only money I made from my blog was an https://ethicalads.io banner which pulled in a few hundred dollars a month (more if I had a high traffic piece). It helped cover some of my hosting costs for my various projects.
That changed in February - https://simonwillison.net/2026/Feb/19/sponsorship/ - when I added a Troy Hunt-style sponsor banner to my site (no cookies, no JavaScript) - currently sold by an agency called Freeman & Forrest. Sponsored slots are sold on a weekly basis and get a mention in my email newsletter in addition to the blog banner.
I'm earning enough from those that I no longer feel the opportunity cost of not going and getting a proper Silicon Valley engineering job.
If I was a publication like the Verge I'd have a complete firewall between editorial and advertising. I don't have a team, but I've tried to replicate that as much as I can by having Freeman & Forrest sort out the sponsors while I stay hands off. I'll veto sponsors if I have to (no prediction markets etc) but thankfully that hasn't been necessary so far.
I maintain a disclosures section on my blog here: https://simonwillison.net/about/#disclosures - which was inspired by Molly White's: https://www.mollywhite.net/crypto-disclosures/
I'm currently considering extending that to more of an ethics statement like this one on the Verge: https://www.theverge.com/ethics-statement
The Verge policy I'm currently not fulfilling is "Our policy against receiving anything of value from companies we cover includes, but is not limited to, things like gifts, meals, discounted services, or paid trips and junkets. Vox Media and The Verge pay for all travel expenses to all events, including transportation, food, and hotels." - I've occasionally accepted flights, dinners, accommodation and some pretty absurd swag (Microsoft just gave me a jacket with my name stitched onto it as part of the GitHub Stars programme, and a bunch of gadgets in a pelican case) which didn't bother me so much when the blog was a side project, but I think I need to start refusing those kind of gifts.
The day after the jacket I wrote a piece about their new models - https://simonwillison.net/2026/Jun/2/microsofts-new-models/ - which I later had to update because I missed some crucial details. Was I subconsciously influenced by the freebies? I don't think so, but the whole point of "subconsciously" is you don't know for sure.
Simon is very fascinated by AI and at times he can be a little too optimistic but he is generally balanced and his perspective evolves over time which can be seen in his writing.
Nerd who loves nerd things a little too much? Sure. Paid shill by Big LLM? Nah.
My ongoing coverage of AI ethical issues: https://simonwillison.net/tags/ai-ethics/ - 308 posts
I've been the loudest voice about the fundamental insecurity of LLMs for several years: https://simonwillison.net/tags/prompt-injection/ - 150 posts
In https://simonwillison.net/2025/Aug/25/agentic-browser-securi... I said "I strongly expect that the entire concept of an agentic browser extension is fatally flawed and cannot be built safely."
The fact that you had to dig to August 2025 to find a single article that's actually a critique of something produced by the AI labs is just further proof.
“I'm finding that coding agents can take me from a vague idea to a working solution, one with tests and documentation and that looks like a carefully considered project evolved over the course of many weeks... in less than an hour.
Even if the code is rock solid, there's a limit to how many projects like that I can sensibly care for - and if they're instantly abandoned, what value was there from creating them in the first place?”
https://simonwillison.net/2026/May/31/the-solution-might-be-...
Here is Simon questioning a fundamental belief held by the pro-LLM lobby. Would a paid shill question that?
Simon is, without question, an enthusiastic pro-LLM person. I disagree with what he says often, the product market fit post was a bad take. But I don’t believe he is shying away from sharing his thoughts when they’re not favorable to the industry.
Note that it's not surprising that he finds his own usage (described in the quote) negative, since his real job is as a blogger, not anything else.
When looking at costs, the numbers make sense. However, for decisions as an org/company/solo founder, costs help you set prices, but to reach profitability you want to model around ROI.
Now the question is: what's the ROI on a $36K investment per engineer, or $90M for the total org?
I bet the ROI is negative.
If we were seeing 3X, 5X etc improvement from individual engineers, that 10% increase in expense would be a fantastic investment (even 3 engineers for the price of 1.1??!). I have a feeling they are just not seeing that much of an improvement.
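The figures in this subthread hang together as simple arithmetic, sketched here. The $36K, $90M and 10% numbers are taken from the comments as given; the implied headcount and per-engineer cost are derived from them, not reported figures.

```python
ANNUAL_AI_SPEND_PER_ENGINEER = 36_000   # from the comment above
TOTAL_ORG_AI_SPEND = 90_000_000         # from the comment above
EXPENSE_INCREASE = 0.10                 # "10% increase in expense"

# Derived, not reported: how many engineers and what cost base these imply.
implied_engineers = TOTAL_ORG_AI_SPEND // ANNUAL_AI_SPEND_PER_ENGINEER
implied_cost_per_engineer = ANNUAL_AI_SPEND_PER_ENGINEER / EXPENSE_INCREASE

def output_per_dollar(productivity_multiplier: float,
                      expense_increase: float = EXPENSE_INCREASE) -> float:
    """Output per dollar relative to an engineer without AI spend (1.0 = break-even)."""
    return productivity_multiplier / (1 + expense_increase)

print(f"implied engineers: {implied_engineers}")
print(f"implied cost per engineer: ${implied_cost_per_engineer:,.0f}/year")
for multiplier in (1.0, 1.1, 3.0):
    print(f"{multiplier}x productivity -> {output_per_dollar(multiplier):.2f}x output per dollar")
```

So the spend pays for itself at anything above a 1.1x productivity gain, and a real 3X would be about 2.7x output per dollar, which is why a cap suggests the observed gains are closer to the break-even end.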
Wait a minute. We didn’t save money by adding AI. We just added an expense.
Now we have to pay for employees AND AI.
I'd guess there should be a few people Uber is basically allocating unlimited AI spending to and a large swath they're giving basically nothing.
1. Their costs are so out of control that they need to impose a blanket cap immediately. Figuring out an allocation mechanism that can be deployed company-wide is time-consuming and they need to staunch the bleeding immediately, despite it being obviously suboptimal.
2. The few people who should have unlimited tokens were given exactly that. No reason to introduce such nuance to a public PR move. The hard-cap limit is a great negotiating posture with token providers.
Maybe a $10k raise would be nice?
If you use stuff like opusplan and /advisor so you use Sonnet for most of the work and only Opus for the really complex stuff then it's quite easy to keep costs low without affecting performance.
The evidence that per-token inference _is not_ subsidized is... a quote or two from Dario and Sam Altman
as far as we know there's no evidence that they can produce any profits at all
The open-weight models will have a steady race to the bottom on inference costs just by dint of competition between providers. They aren’t at the frontier yet, but they are rapidly eating the flash market.
Your other plans are fixed price with rate limits, where you get more tokens than the dollar equivalent you pay monthly. These plans are economical only if the majority of users spend less in tokens (in $) than the plan costs. This subsidizes the gap vs. power users who spend multiple k$ monthly in API tokens.
Or the fixed cost plans reflect the real cost and the people paying API prices give them the profit.
Anyway, none of my customers will let me bill them $1500 more (about $75 per day) because I'm using AI. And what for? I'm not working to move money from the pockets of my customers to the pockets of AI companies.
If plans were at cost and API pricing was marked up that would mean there’s a 90%+ profit margin on tokens and instead of raising money and talking about revenue, Anthropic and OpenAI would be talking about their obscene profits.
[1] the caveat is that the average plan user probably doesn’t use all of their quota, I guess maybe 30% is the average across all users.
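To make the margin argument concrete, here is a sketch under stated assumptions: a $100 plan whose full quota is worth about $1000 at API prices (the ratio quoted earlier in the thread), and the guessed 30% average utilization. None of these are provider-published numbers.

```python
PLAN_PRICE_USD = 100.0           # flat monthly plan price (from the thread)
QUOTA_API_VALUE_USD = 1000.0     # API-price value of a fully used quota (from the thread)
AVERAGE_UTILIZATION = 0.30       # the commenter's guess

def break_even_cost_fraction(plan_price: float,
                             quota_api_value: float,
                             utilization: float) -> float:
    """Highest serving cost, as a fraction of API price, at which the plan breaks even."""
    average_usage_at_api_price = quota_api_value * utilization
    return plan_price / average_usage_at_api_price

full_use = break_even_cost_fraction(PLAN_PRICE_USD, QUOTA_API_VALUE_USD, 1.0)
typical = break_even_cost_fraction(PLAN_PRICE_USD, QUOTA_API_VALUE_USD, AVERAGE_UTILIZATION)

print(f"full quota used: cost must be <= {full_use:.0%} of API price "
      f"(API margin >= {1 - full_use:.0%})")
print(f"30% of quota used: cost must be <= {typical:.0%} of API price "
      f"(API margin >= {1 - typical:.0%})")
```

In other words, the "90%+ margin" conclusion assumes every subscriber maxes out the quota; at 30% average utilization, plans at cost would imply an API margin nearer two thirds.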
The fact that Anthropic is rumoured to have a profitable quarter indicates that their margins on API priced inference are very strong.
One of my most expensive sessions cost me over $100 in token spend in a single evening. I'd just found out that the time tracking & invoicing SaaS I use is increasing their monthly pricing by 2.4x - so I assigned Claude Opus 4.8 to recreate the entire SaaS for myself, and load in 13 years of my historical data. I've only completed a full read-only implementation so far, with adding & editing of records still to come, but I do expect Claude will have fully recreated the entire SaaS for me at an API cost less than a single 1 year seat of continued subscription to their service. And since I'm actually on a Max plan, it didn't actually cost me $200 of tokens at all.
coff i would not buy the Bending Spoons IPO coff saaspocalypse
I could ramble on about where the other $1750 of usage goes, but I imagine it's similar for most heavy Claude / AI users. Interactive coding sessions, a daily personalized podcast, some automated overnight agentic "proactive" sessions, a daemon that wakes up if I send Claude an email or voicetext to check something when I'm out. I've also noticed that if Claude's tool-use goes haywire & Claude gets confused or lost, sometimes a single email reply session that would normally be just $1 of API might spiral to $12 of API while it bangs its head against trying to run a program that's in a different folder to the one it's currently in. Sometimes a simple 'pwd' would save you a lot of headache, Claude....
For example, what if you're a tiny startup and you're considering whether to hire an extra engineer or do all the coding yourself. I would estimate that AI is worth far more than $18,000 a year in that situation where you might reasonably decide to put off hiring an engineer.
higher ups pushed for these last 2 years to be AI focused so I don't think this restriction is a measure of "don't use too much AI" as much as it is a measure of "don't use only 'manual' AI tooling" since we had a dozen more specialized tools in-house running locally or otherwise that didn't count towards the budget
(Cost of an employee is much higher than their salary, it includes things like office space, supporting structures like HR/accounting, insurance, hardware/software, and much more)
They can't say that $0 per employee is the appropriate amount for AI spending. So they capped it, perhaps in order to "send a signal" that is eagerly picked up by the AI boosters.
There is no signal. Uber does not work any better since AI. They still want to promote AI, so they chose the highest number that doesn't bankrupt them so the press and AI promoters pick it up as the new price anchor.
Probably they'll quietly reduce the number more soon.
But yeah, for a company at Uber’s scale, I can see why they would want real engineering discipline around it.
My $100 subscription is not cheap. At the same time our product burns orders of magnitude more tokens.
To the mooooon!
Probably even less because you would spend those 1500 extra per employee also if you just save 10% so 150 per employee that’s 1.5% on salary.
This is imho one of the best ranges we can assume for now. How much would that be across the whole SWE market?
That being said, I do have to wonder why someone as big as, say, Uber doesn't simply roll out an OSS model in the cloud for their team. I'd imagine that would be the cheapest and most flexible option, while also keeping all the data shared with the LLM private.
China will be a major token exporter soon. Mark my words.
Oh that's actually really economical! I wonder if they're doing a lot on locally running models or managing a shared context or knowledge-base in some clever way, maybe just encouraging employees to be efficient and mindful.
...
> each employee
...
> per AI coding tool
...
> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI
What on this godforsaken earth are all you rich idiots doing???
Uber’s COO says it’s getting harder to justify money spent on tokenmaxxing
https://news.ycombinator.com/item?id=48268871
Uber torches 2026 AI budget on Claude Code in four months
https://news.ycombinator.com/item?id=47976415
Corporate America Is Starting to Ration AI as Cost Skyrockets
https://news.ycombinator.com/item?id=48335388
I also misconfigured something in my agent's configuration and a simple web tool request (maybe 4 turns) through OR went to GPT-5.5 accidentally and that cost me ~$0.4.
I have no idea how any business can afford API rates without having a mindset of casually setting money on fire.
Naively you’d expect to always keep paying more - but growth in token usage is what changes the equation. Amortizing debt over an exponentially growing amount of spend across a growing customer base (not per customer) lets the debt be paid off & costs covered even as each individual’s spend stays steady or even goes down - but it only works if there’s growth beyond some threshold that makes the whole thing hang together. No one on the outside knows how much growth that is, and everyone chases maximum growth.
Jevons Paradox ends up being your friend as well as the friend of the inference providers as well as the friend of the inference financiers.
If it’s a strong enough effect, it has potential to cancel out all the circular financing too, and let everyone ride out the bursting of the bubble.
I was recently talking to an HR person from a European company, and she goes: 'We are forcing our developers to use AI coding agents, but they are still kind of hesitant.' This person had never written a single line of code, nor did she know what software engineering is. For these people, using AI coding agents = faster delivery without breaking anything.
Maybe it's just me, but I still find that I really have to "shepherd" the AI and work with it to get the results I want. And I read every line of code added and challenge the model's logic. So that limits my token burning. Maybe these people are just "vibe-coding" without really checking the results?
All the code gets summarized and fed into their manager's agent contexts, probably duplicated several times across levels and departments, with some generated back-and-forth emails pinging around the org chart, eventually generating 2-3 long-winded reports that nobody will read chock full of generated visualizations that can all get consolidated into a generated slide deck that they'll show (maybe, at some point) to a handful of humans with more money than a human brain can conceptualize to demonstrate all of the innovation they're doing.
I am increasingly convinced that many of these companies are dead trees whose only function is to burn money lest it fall into the hands of the peasantry.
Your $100/m plan is likely equivalent to thousands of dollars of API pricing. You are being subsidized by the companies using AI.
The reason I use F# & Clojure is that they target the CLR and the JVM, two popular enterprise stacks.
In my not so humble opinion Lisp (Clojure) still remains the language of AI.
Their wet dream was never automation. It was zero marginal cost labor. And that dream is starting to rot.
That's the most useful signal. Pre OpenAI mafia RAM pricing, that comes out to $250/month.
They are good at searching for things that have been done 10,000 times before, and slightly changing them. This is the majority of all "new" features.
Almost nothing is "new"...
Refactors are not this. If you can't just write a gsub to do the work, they need to essentially break it up into N problems to solve, each of them pretty slow and expensive. Sure, none of these problems individually are "new" - which is why they can do it. But they can't do it as effectively as you'd think.
We see this firsthand building AI Workdeck (open-source AI workspace for legal teams). A single due diligence review might chain 20+ agent calls: OCR -> text extraction -> clause classification -> risk scoring -> evidence chain assembly. The user sees one action, but the backend burns through significant inference.
The interesting thing about vertical tools is the pricing model can be fundamentally different. Horizontal tools charge per seat or per token. But in legal, the value is in the document, not the seat. A lawyer reviewing a 500-page M&A file gets way more value than one reviewing a 2-page NDA.
Self-hosting changes the calculus too. Our users run on their own infra, so the AI cost is whatever their GPU costs. That makes $1,500/month caps less relevant and throughput optimization more important.