> The first is to self host. You buy the machine, run open source models locally, and pay nothing per token after that.
Power is not free.
What I’ve found is that you’re basically paying a premium for privacy, and that’s worth it for me.
warumdarum [3 hidden]5 mins ago
Actually, if you have solar, it kind of is. So prIvAt AI compute gets de facto cheaper during the day?
reactordev [3 hidden]5 mins ago
If you have solar, it is not, because you have battery and equipment degradation from cycle charging, c’mon man…
I would agree with you if you said it was vastly cheaper overall (with the initial equipment investment amortized over time) compared to The Power Company.
In many states, even if you are generating electricity and selling it back to the power company, they're still going to charge you normal usage rates, because greed.
If you go off grid, you have bigger things to worry about than how to power your AI cluster. It’s manageable enough if you have land but that’s in scarce supply.
enraged_camel [3 hidden]5 mins ago
>> Power is not free.
There's actually an interesting thought experiment here: if it takes you a full day to build something that AI would otherwise build in a day, do you end up using more power, or less? What is the break-even point, purely from a power consumption perspective?
dofm [3 hidden]5 mins ago
If an identical task takes a day on both sides, then the human route uses less energy, surely.
Brains are thousands or maybe even millions of times more fuel-efficient than computers and you are alive for the whole day either way, right? You probably eat about the same even.
The reason executives think AI is more efficient is that it is more space-efficient than a human and doesn't demand to be paid or to work only a set number of hours. Everything with computing is more efficient if you resent having to give money to other humans. If they could just not have you be alive when they don't need you, it'd possibly be different.
Even though I think at a typical British freelance rate and a truly unsubsidised token price, the AI is possibly more expensive than me. And as a freelancer, from their perspective I really am not alive until they need me. (This is what it often feels like)
The reality is the human and the AI aren't used to build the same things anyway so it's a comparison you can't really make.
axus [3 hidden]5 mins ago
What would you do for the rest of the day, power off your devices and go for a long bike ride?
enraged_camel [3 hidden]5 mins ago
Speaking personally: yes. That's literally what I'm planning to do this afternoon because it's noon and I'm already done with the coding tasks I had on my plate today.
jacobgold [3 hidden]5 mins ago
"Around $400 a month of plans buys roughly $2800 of API usage at list prices, which is a real bargain right up until you hit the ceiling."
It's more like $200/mo for $4000+/mo in tokens and you can buy additional subscriptions.
There's no sense in running local models or doing anything else as long as VCs (and soon the public markets) are willing to pay your bill.
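To put the two sets of figures side by side, here is a minimal sketch of the implied "leverage" (dollars of list-price API usage per plan dollar). The numbers are only the ones quoted above; the function name is made up for illustration.

```python
def leverage(plan_price, api_equivalent):
    """Dollars of list-price API usage bought by each dollar of plan spend."""
    return api_equivalent / plan_price

# Figures quoted from the article: ~$400/month of plans for ~$2800 of usage.
quoted = leverage(400, 2800)

# Figures claimed in this comment: ~$200/month for $4000+/month in tokens.
claimed = leverage(200, 4000)

print(f"article: {quoted:.0f}x, this comment: {claimed:.0f}x")
```

On those numbers the article implies roughly 7x and this comment roughly 20x, so the disagreement is about a factor of three.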
atreids [3 hidden]5 mins ago
I find just going via Deepseek's platform API directly, using their V4 flash model, and hooking into a harness like Opencode more than acceptable. Think I've spent maybe $10 over a couple of weeks.
I did explore self-hosting models but hardware right now is just too expensive.
vadansky [3 hidden]5 mins ago
Can I run something comparable to Opus 4.6 locally yet? I keep hearing conflicting things. If I can spend 10k to do that I would cancel my subscription. The problem is I don’t wanna spend the money to find out myself.
Catloafdev [3 hidden]5 mins ago
If you want frontier-level, the economically reasonable option is OpenRouter or a direct sub to frontier-of-your-choice.
The reality is that, to protect datacenter margins, vendors do not offer configurations that would let a consumer run that much VRAM in a single setup. Apple used to, and they stopped; those devices are going for ~$20k+ each on eBay now.
You can get very, very capable models on a 3090/4090/5090/6000 series card. But if you want 'frontier level' you are investing ~22k at a bare minimum if you go new. Used you can probably build your own server for much cheaper up-front cost but it's likely going to be 4-6x+ electricity usage.
daemonologist [3 hidden]5 mins ago
There are also significant economies of scale (namely: utilization and batching), which tend to make inference on a shared server more economical even after the operator takes a cut.
grim_io [3 hidden]5 mins ago
10k will not get you anywhere near opus or sonnet.
It's simply not possible for mere mortals currently.
als0 [3 hidden]5 mins ago
> Can I run something comparable to Opus 4.6 locally yet?
Sadly, no. The best comparable thing you can get is about Sonnet 3.7
captaintobs [3 hidden]5 mins ago
I spent 8k and get something close to Sonnet, just 2-3x slower. Running DeepSeek V4 Flash on 2x Spark.
esalman [3 hidden]5 mins ago
For me, investing in hardware seems to be the way to go.
I learned coding nearly 24 years ago and am still learning new stuff all the time. At no point did I have to rely on a subscription model to learn and do new stuff.
If LLMs and agents are the default tools for coding and building software, at least for the next few years, it seems like a no-brainer to invest $2000-3000 in hardware, like a Strix Halo PC.
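A rough way to sanity-check the "no-brainer" is a payback calculation. This is only a sketch: the $100/month subscription is an assumption borrowed from figures mentioned elsewhere in the thread, electricity defaults to zero, and it says nothing about whether the local model is as capable as the subscription one.

```python
import math

def payback_months(hardware_cost, monthly_subscription, monthly_electricity=0.0):
    """Months until the hardware costs less than the subscription it replaces."""
    saving = monthly_subscription - monthly_electricity
    if saving <= 0:
        return math.inf  # running costs eat the whole saving: never pays back
    return hardware_cost / saving

# Hypothetical comparison: $2000-3000 of hardware vs. a $100/month plan.
low = payback_months(2000, 100)
high = payback_months(3000, 100)

print(f"payback: {low:.0f} to {high:.0f} months")
```

Under those assumptions the machine pays for itself in 20 to 30 months; a cheaper plan or a real electricity bill stretches that out.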
CraigJPerry [3 hidden]5 mins ago
I wondered if there might be a no brainer "free" option on discarded hardware.
I have a GTX 1080 Ti, which I think is circa 2018. It's unused, has more than paid for itself over the years, and owes me nothing at this point, so the hardware is free.
It runs Gemma e4b multimodal, qwen 3.5 8b or the qwen 4b embeddings models well enough (40+ t/s for the LLMs).
The machine consumes 350 W at the wall under load (3 W when sleeping, 80 W at idle). Electricity costs me £0.035/kWh, which is cheap for the UK (load shifting via house battery).
That works out to 144k output tokens for around 1 pence (and in theory it takes an hour to do that).
It's only JUST cheaper to use than the far more capable DeepSeek V4 Flash model, despite the free hardware and electricity that is ~10x cheaper than normal.
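The arithmetic behind those figures can be checked with a short sketch. It assumes the quoted numbers (350 W under load, £0.035/kWh, 40 tokens/s sustained) and ignores idle draw and prompt processing. The function name and the £0.35/kWh "normal" tariff (simply 10x the quoted rate) are illustrative assumptions, not measurements.

```python
def cost_per_million_tokens(watts, price_per_kwh, tokens_per_second):
    """Electricity cost, in the currency of price_per_kwh, per 1M output tokens."""
    kwh_per_hour = watts / 1000.0               # energy drawn in one hour of load
    tokens_per_hour = tokens_per_second * 3600  # sustained generation rate
    return kwh_per_hour * price_per_kwh / tokens_per_hour * 1_000_000

tokens_per_hour = 40 * 3600              # tokens generated in one hour at 40 t/s
hourly_cost_gbp = 350 / 1000 * 0.035     # one hour under load at the cheap rate

cheap = cost_per_million_tokens(350, 0.035, 40)   # load-shifted tariff
normal = cost_per_million_tokens(350, 0.35, 40)   # assumed 10x "normal" tariff

print(f"{tokens_per_hour} tokens/hour for about £{hourly_cost_gbp:.4f}")
print(f"per 1M tokens: £{cheap:.3f} (cheap) vs £{normal:.3f} (normal)")
```

So one hour at 40 t/s is 144,000 tokens for a bit over a penny, or roughly 8.5 pence per million output tokens at the cheap rate and about 85 pence at the assumed normal one. Comparing that against a hosted model means plugging in that provider's current list price for output tokens.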
iugtmkbdfil834 [3 hidden]5 mins ago
Yes and no. Hardware does lock you in. Granted, I am happy with my 128gb of shared memory, but I am mildly concerned that it actually is more expensive now than when I bought mine. It does not bode well for the future; not when combined with recent WH admin moves on Anthropic and the reality that next batch of good models may require more than 128gb to run well.
edit: I am not dismissing local. I am one such user ( though I have subs too ), but one has to be clear eyed about the trade-offs.
mwcampbell [3 hidden]5 mins ago
I invested about $4,000 in an NVIDIA DGX Spark several months ago. 128 GB of unified RAM, and the NVIDIA GB10 chip. With the RAM, the several CPU cores, and the 4 TB NVMe SSD, it's a very capable ARM64 Linux computer even without the GPU, and so far I've mostly been using it as such. But I wonder, what's the most capable model, specifically for coding, that can run well on that hardware?
impure [3 hidden]5 mins ago
I recently made an AI Agent and surprisingly coding with DeepSeek V4 Flash is quite cheap. It probably has to do with the aggressive prompt caching. I'm using OpenRouter with Novita AI as the preferred provider.
throwa356262 [3 hidden]5 mins ago
Deepseek v4 via deepseek themselves is significantly cheaper.
Because (1) Huawei collab and (2) vLLM etc. don't implement half of the inference optimisations DeepSeek proposed in their paper.
kagamino [3 hidden]5 mins ago
Same here, DeepSeek V4 Flash on opencode go. It's cheap, fast and good enough to follow my instructions.
2muchtime [3 hidden]5 mins ago
I’m using zen because I have a Claude subscription and just like dabbling with the other models and I was shocked at how little flash cost but it was noticeably not at the level I’d like my model to be.
For me MiniMax 3 has really hit the sweet spot of being very cheap, though more than flash, but it's also very capable.
RomanPushkin [3 hidden]5 mins ago
AI coding at home literally costs $100/month. I'm wondering where $400 is coming from? $100 is more than enough for "coding at home", IMO. I rarely face the limits, and when I do it's just a time for a quick walk anyway.
quickthoughts [3 hidden]5 mins ago
Ha just wrote a post[1] about a sort of 4th option - max out cheap compute to create more tangible things that can be used/run locally.
I think someone could find some way to use the smaller local models to write code. Some kind of framework or harness or language or something. But not too many people are working on that because the big models are pretty cheap and a lot better.
petra [3 hidden]5 mins ago
Maybe one possible path(to make weaker models highly capable) is making the job of the llm as easy as possible.
I wonder if part of the solution is building/finding the right libraries, with the right documentation/language/API(one that plays well with LLM's) and maybe creating some synthetic data around them - to make it very easy for the llm.
And maybe there could be a business model around creating those libraries.
dempedempe [3 hidden]5 mins ago
Did you just copy-and-paste an AI response and post it on your blog?
OutOfHere [3 hidden]5 mins ago
Fixed-price plans ought to be sufficient for most people who actually review their spec and code, for building production-grade software that stands the test of time. A careful spec+review+iteration cycle takes time, during which the usage quota resets. Granted, security audits use tokens too.
If you still need more tokens, odds are that you're vibecoding unmaintainable throwaway trash.
gaigalas [3 hidden]5 mins ago
> The first is to self host. You buy the machine, run open source models locally, and pay nothing per token after that.
In the good ol' days, we bought machines not only to run stuff, but to experiment.
I understand today experiments are limited. Inference is reasonable, fine-tuning is either niche or a stretch, and base training is impossible.
*That is bound to change*, and when it does, there will be an avalanche of hobbyists and amateurs poking at base training. They'll find optimizations no one found before, synthesize data no one ever imagined synthesizing, and when that happens we'll start getting libre models.
So, yeah. Right now, buying the machine doesn't pay off that well, unless you want to pioneer this stuff in severe adverse conditions (hardware prices inflated, etc). Eventually, it will.
zuzululu [3 hidden]5 mins ago
Another update for Codex users: they let you accumulate resets, which greatly adds to the mileage.
I don't think it's feasible to have something comparable to these frontier models when they are increasing usage and lowering token costs.
1: https://news.ycombinator.com/item?id=48519181