Show HN: LocalGPT – A local-first AI assistant in Rust with persistent memory
I built LocalGPT over 4 nights as a Rust reimagining of the OpenClaw assistant pattern (markdown-based persistent memory, autonomous heartbeat tasks, skills system).

It compiles to a single ~27MB binary: no Node.js, Docker, or Python required.

Key features:

- Persistent memory via markdown files (MEMORY, HEARTBEAT, SOUL markdown files), compatible with OpenClaw's format
- Full-text search (SQLite FTS5) + semantic search (local embeddings, no API key needed)
- Autonomous heartbeat runner that checks tasks on a configurable interval
- CLI + web interface + desktop GUI
- Multi-provider: Anthropic, OpenAI, Ollama, etc.
- Apache 2.0

Install: `cargo install localgpt`

I use it daily as a knowledge accumulator, research assistant, and autonomous task runner for my side projects. The memory compounds: every session makes the next one better.

GitHub: https://github.com/localgpt-app/localgpt
Website: https://localgpt.app

Would love feedback on the architecture or feature ideas.
307 points by yi_wang - 144 comments
I do think that local-first will end up being the future long-term though. I built something similar last year (unreleased) also in Rust, but it was also running the model locally (you can see how slow/fast it is here[1], keeping in mind I have a 3080Ti and was running Mistral-Instruct).
I need to re-visit this project and release it, but building in the context of the OS is pretty mindblowing, so kudos to you. I think that the paradigm of how we interact with our devices will fundamentally shift in the next 5-10 years.
[1] https://www.youtube.com/watch?v=tRrKQl0kzvQ
It’s far better for most users to be able to specify an inference server (even on localhost in some cases) because the ecosystem of specialized inference servers and models is a constantly evolving target.
If you write this kind of software, you will not only be reinventing the wheel but also probably disadvantaging your users if you try to integrate your own inference engine instead of focusing on your agentic tooling. Ollama, vllm, hugging face, and others are devoting their focus to the servers, there is no reason to sacrifice the front end tooling effort to duplicate their work.
Besides that, most users will not be able to run the better models on their daily driver, and will have a separate machine for inference or be running inference in private or rented cloud, or even over public API.
Maybe the author should have specified that capability, even though it seems redundant, since local-first implies local capability but also cloud compatibility, or it would be local or local-only.
Privacy-wise, of course, the inference provider sees everything.
Even copy-pasting an API key is probably too much of a hurdle for regular folks, let alone running a local ollama server in a Docker container.
OTOH, it's the most flexible and likely to have some support for what you are doing for a lot of those, especially if you are combining multiple of them in the same process.
Where in the world are you getting that this project is for "normies"? Installation steps are terminal instructions and it's a CLI, clearly meant for technical people already.
If you think copying-pasting an API key is too much, don't you think cloning a git repository, installing the Rust compiler and compiling the project might be too much and hit those normies in the face sooner than the API key?
See here:
https://github.com/localgpt-app/localgpt/blob/main/src%2Fage...
They are in the top of open models, and surpass some closed models.
I've been using Devstral, Codestral and Le Chat exclusively for three months now. All from Mistral's hosted versions. Agentic, as completion, and for day-to-day stuff. It's not perfect, but neither is any other model or product, so good enough for me. Less anecdotal are the various benchmarks that put them surprisingly high in the rankings.
¹https://mistral.ai/news/devstral
The bigger downside, when you compare it to Opus or any other hosted model, is the limited context. You might be able to achieve around 30k. Hosted models often have 128k or more. Opus 4.6 has 200k as its standard and 1M in API beta mode.
You could try letting a model decide, but given my experience with at least OpenAI’s “auto” model router, I’d rather not.
But let's face it. For most people Opus comes at a significant financial cost per token if used more than very casually, so using it for rather trivial or iterative tasks that nevertheless consume a lot of tokens is something to avoid.
Love or hate it, the amount of money being put into AI really is our generation's equivalent of the Apollo program. Over the next few years there are over 100 gigawatt-scale data centres planned to come online.
At least it's a better use than money going into the military industry.
https://www.wsj.com/tech/ai/ai-spending-tech-companies-compa...
https://www.reuters.com/graphics/USA-ECONOMY/AI-INVESTMENT/g...
If I'm running a business and have some number of employees to make use of it, and confidentiality is worth something, sure, but am I really going to rely on anything less than the frontier models for automating critical tasks? Or roll my own on-prem IT to support it when Amazon Bedrock will do it for me?
But there's fierce competition from new or small players (DeepSeek, Mistral etc.), many even open source. And I'm convinced they'll keep the prices low.
A company like OpenAI can only increase subscriptions x10 when they've locked in enough clients, have a monopoly or oligopoly, or their switching costs are multiples of that.
So currently the irony seems to be that the larger the AI company, the more loss they're running at. Size seems to have a negative impact on business. But the smaller operators also prevent companies from raising prices to levels at which they make money.
Got the same feeling when I put on the Hololens for the first time but look what we have now.
I also think it'd be a great starting point for building a private pub/sub network of autonomous agents (e.g. a company that doesn't want to exfil its password files via OpenClaw)
The name, however, is a problem. LocalGPT is misleading in 2 ways. 1. It is not Local, it relies on external LLM providers. 2. It is not a Generative Pretrained Transformer.
I'd highly recommend changing the name to something that more accurately portrays the intent and the method.
Your docs and this post are all written by an LLM, which doesn't reflect much effort.
These plagiarism laundering machines are giving people a brain disease that we haven't even named yet.
Supporters and haters alike, it's getting pretty stupid out there.
For the millionth time, it seems learning basics and fundamentals of software engineering is more important than anything else.
I was also burnt many times where some software docs said one thing and after many hours of debugging I found out that code does something different.
LLMs are so good at creating decent descriptions and keeping them up to date that I believe docs are the number one thing to use them for. Yes, you can tell a human didn't write them, so what? If they are correct I see no issue at all.
Indeed. Are you verifying that they are correct, or are you glancing at the output and seeing something that seems plausible enough and then not really scrutinizing? Because the latter is how LLMs often propagate errors: through humans choosing to trust the fancy predictive text engine, abdicating their own responsibility in the process.
As a consumer of an API, I would much rather have static types and nothing else than incorrect LLM-generated prosaic documentation.
Somehow I doubt at this point in time they can even fail at something so simple.
Like at some point, for some stuff we have to trust LLMs to be correct 99% of the time. I believe summaries, translations, and code docs are in that category.
Can you provide examples in the wild of LLMs creating good descriptions of code?
I think it depends on your expectations. Writing good documentation is not simple.
Good API documentation should explain how to combine the functions of the API to achieve specific goals. It should warn of incorrect assumptions and potential mistakes that might easily happen. It should explain how potentially problematic edge cases are handled.
And second, good API documentation should avoid committing to implementation details. Simply verbalising the code is the opposite of that. Where the function signatures do not formally and exhaustively define everything the API promises, documentation should fill in the gaps.
Yes. Docs it produces are generally very generic, like it could be the docs for anything, with project-specifics sprinkled in, and pieces that are definitely incorrect about how the code works.
> for some stuff we have to trust LLMs to be correct 99% of the time
No. We don’t.
I guess the term "correct" is different for me. I shouldn't be able to nitpick comments out like that. LLMs aside, they basically did not proofread their own docs. Things like "No Python required" are an obvious sign that you started talking about a project (that you {found || built} in Python), wanted to do it in Rust (because it's fast!), and then the LLM put that detail in the docs.
If they did not skim it out, then they did not read their own documentation. There was no love put into it.
Nonetheless, I totally get your point, and the docs are at least descriptive.
> LLMs are so good at creating decent descriptions and keeping them up to date
I totally agree! And now that CC auto-updates memories, it's much easier to keep track of changes. I'm also confident that you're the type of person to at least proofread what it wrote, so I do not doubt the validity of your argument. It just sounds a lot different when you look at this project.
I wish this was an effective deterrent against posting low effort slop, but it isn't. Vibe coders are actively proud of the fact that they don't put any effort into the things they claim to have created.
Professional codependent leveraging anonymity to target others. The internet is a mediocrity factory.
A look at OP's post history, projecting back low-effort meta-analysis of their own uselessness seems apt.
You're using the same memory format (SOUL.md, MEMORY.md, HEARTBEAT.md), similar architecture... but OpenClaw already ships with multi-channel messaging (Telegram, Discord, WhatsApp), voice calls, cron scheduling, browser automation, sub-agents, and a skills ecosystem.
Not trying to be harsh — the AI agent space just feels crowded with "me too" projects lately. What's the unique angle beyond "it's in Rust"?
It tries to do everything, but has no real security architecture.
Exec approvals are a farce.
OC can modify its own permissions and config, and if you limit that you cannot really use it for its strengths.
What is needed is a well thought out security architecture, which allows easy approvals, but doesn't allow OC to do that itself, with credential and API access control (such as by using Wardgate [1], my solution for now), and separation of capabilities into multiple nodes/agents with good boundaries.
Currently OC needs effective root access, can change its own permissions and it's kinda all or nothing.
[1] https://github.com/wardgate/wardgate
Does this mean the inference is remote and only context is local?
The README gives only an Anthropic example, but, judging by the source code [1], you can use other providers, including Ollama, just by changing the syntax of that one config file line.
[1] https://github.com/localgpt-app/localgpt/blob/main/src%2Fage...
It is more like an OpenClaw rusty clone
https://github.com/localgpt-app/localgpt/blob/main/src%2Fage...
I'm working on a systems-security approach (object-capabilities, deterministic policy) - where you can have strong guarantees on a policy like "don't send out sensitive information".
Would love to chat with anyone who wants to use agents but who (rightly) refuses to compromise on security.
I can only think of two ways to address it:
1. Gate all sensitive operations (i.e. all external data flows) through a manual confirmation system, such as an OTP code that the human operator needs to manually approve every time, and also review the content being sent out. Cons: decision fatigue over time, can only feasibly be used if the agent only communicates externally infrequently or if the decision is easy to make by reading the data flowing out (wouldn't work if you need to review a 20-page PDF every time).
2. Design around the lethal trifecta: your agent can only have 2 legs instead of all 3. I believe this is the most robust approach for all use cases that support it. For example, agents that are privately accessed, and can work with private data and untrusted content but cannot externally communicate.
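A minimal sketch of option 2, assuming the three legs are private data access, exposure to untrusted content, and external communication. The type and field names are illustrative and not taken from any project in this thread.

```rust
/// The three "legs" of the lethal trifecta. Field names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Capabilities {
    pub private_data: bool,
    pub untrusted_content: bool,
    pub external_comms: bool,
}

impl Capabilities {
    /// Number of legs this agent configuration holds.
    pub fn legs(&self) -> usize {
        [self.private_data, self.untrusted_content, self.external_comms]
            .iter()
            .filter(|&&leg| leg)
            .count()
    }
}

/// Reject any agent configuration that holds all three legs at once.
pub fn validate(caps: Capabilities) -> Result<Capabilities, String> {
    if caps.legs() == 3 {
        Err("agent holds all three legs of the lethal trifecta".to_string())
    } else {
        Ok(caps)
    }
}
```

Running `validate` once at agent construction keeps the check static: a privately accessed agent with private data and untrusted content passes, and adding external communication to it is rejected.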
I'd be interested to know if you have reached similar conclusions or have a different approach to it?
The third path: fine-grained object-capabilities and attenuation based on data provenance. More simply, the legs narrow based on what the agent has done (e.g., read of sensitive data or untrusted data)
Example: agent reads an email from alice@external.com. After that, it can only send replies to the thread (alice). It still has external communication, but scope is constrained to ensure it doesn't leak sensitive information.
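The email example could look roughly like this. It is only a sketch: `Session`, `SendScope`, and the rule that scopes shrink by intersection are assumptions made for illustration, not the commenter's actual design.

```rust
use std::collections::BTreeSet;

/// Who the agent may currently send to. Names are hypothetical.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SendScope {
    /// Nothing sensitive or untrusted has been read yet.
    Anyone,
    /// Narrowed to an explicit set of recipients.
    Only(BTreeSet<String>),
}

#[derive(Debug)]
pub struct Session {
    scope: SendScope,
}

impl Session {
    pub fn new() -> Self {
        Session { scope: SendScope::Anyone }
    }

    /// Reading an external email narrows the scope to that thread's sender.
    /// Scopes only ever shrink: a later read intersects with the current set.
    pub fn read_email(&mut self, sender: &str) {
        let mut thread = BTreeSet::new();
        thread.insert(sender.to_string());
        self.scope = match &self.scope {
            SendScope::Anyone => SendScope::Only(thread),
            SendScope::Only(current) => {
                SendScope::Only(current.intersection(&thread).cloned().collect())
            }
        };
    }

    /// Policy check performed before any outgoing message.
    pub fn may_send_to(&self, recipient: &str) -> bool {
        match &self.scope {
            SendScope::Anyone => true,
            SendScope::Only(allowed) => allowed.contains(recipient),
        }
    }
}
```

After `read_email("alice@external.com")`, `may_send_to` returns true for alice and false for everyone else, so the agent keeps external communication but only within the thread.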
The basic idea is applying systems security principles (object-capabilities and IFC) to agents. There's a lot more to it -- and it doesn't solve every problem -- but it gets us a lot closer.
Happy to share more details if you're interested.
I suppose I'm thinking of it as a more elegant way of doing something equivalent to top-down agent routing, where the top agent routes to 2-legged agents.
I'd be interested to hear more about how you handle the provenance tracking in practice, especially when the agent chains multiple data sources together. I think my question would be: what's the practical difference between dynamic attenuation and just statically removing the third leg upfront? Is it "just" a more elegant solution, or are there other advantages that I'm missing?
> I'd be interested to hear more about how you handle the provenance tracking in practice, especially when the agent chains multiple data sources together.
When a tool call reads data, the returned values carry taints (provenance). Combine data from A and B, and the result carries both. Policy checks happen at sinks (tool calls that send data).
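A rough sketch of that propagation rule, assuming taints are plain source labels and the sink policy is a simple deny list. Both are simplifications chosen for illustration; `Tainted` and `may_send` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// A value plus the set of sources it was derived from (its taints).
#[derive(Debug, Clone)]
pub struct Tainted {
    pub value: String,
    pub taints: BTreeSet<String>,
}

impl Tainted {
    /// Wrap data returned by a tool call, labeled with its source.
    pub fn from_source(value: &str, source: &str) -> Self {
        let mut taints = BTreeSet::new();
        taints.insert(source.to_string());
        Tainted { value: value.to_string(), taints }
    }

    /// Combining two values yields the union of their taints.
    pub fn combine(&self, other: &Tainted) -> Tainted {
        Tainted {
            value: format!("{}{}", self.value, other.value),
            taints: self.taints.union(&other.taints).cloned().collect(),
        }
    }
}

/// Sink check: allow sending only if no taint is on the deny list.
pub fn may_send(data: &Tainted, denied: &[&str]) -> bool {
    !data.taints.iter().any(|t| denied.contains(&t.as_str()))
}
```

The point of checking at the sink is that a value derived from both a calendar entry and a password file is blocked, while the calendar entry on its own still goes through.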
> what's the practical difference between dynamic attenuation and just statically removing the third leg upfront? Is it "just" a more elegant solution, or are there other advantages that I'm missing?
Really good question. It's about utility: we don't want to limit the agent more than necessary, otherwise we'll block it from legitimate actions.
Static 2-leg: "This agent can never send externally." Secure, but now it can't reply to emails.
Dynamic attenuation: "This agent can send, but only to certain recipients."
(It would help in other cases)
Realistically though, these agents are going to need access to at least SOME of your data in order to work.
Definitely something that can be looked into.
Wardgate is (deliberately) not part of the agent. This means separation, which is good and bad. In this case it would perhaps be hard to track agent sessions in a secure way. You would need to trust the agent not to cache sessions for cross use. Far-fetched right now, but agents already get quite creative in solving their problem within the capabilities of their sandbox. ("I cannot delete this file, but I can use patch to make it empty", "I cannot send it via WhatsApp, so I've started a webserver on your server, which failed, so then I uploaded it to a public file upload site")
With forking of LLM state you can maintain multiple states with different levels of trust and you can choose which leg gets removed depending on what task needs to be accomplished. I see it like a tree - always maintaining an untainted "trunk" that shoots off branches to do operations. Tainted branches are constrained to strict schemas for outputs, focused actions and limited tool sets.
IFC + object-capabilities are the natural generalization of exactly what you're describing.
I feel Elixir and the BEAM would be a perfect fit to write this in. Hanging gateways and context window exhaustion failures can be elegantly modeled and remedied with supervision trees. For tracking thoughts, I can dump a process's mailbox and see what it's working on.
Sounds like exactly this, hot off the presses...
They deliberately only show you a fraction of the thoughts, but charge you for all the secret ones.
"cargo install localgpt" under Linux Mint.
Git clone and change Cargo.toml by adding

```toml
# Desktop GUI
eframe = { version = "0.30", default-features = false, features = [
    "default_fonts",
    "glow",
    "persistence",
    "x11",
] }
```

That is, add "x11" to the eframe features.
Then cargo build --release succeeds.
I am not a Rust programmer.
cd localgpt/
edit Cargo.toml and add "x11" to the eframe features
cargo install --path .
Hey! is that Kai Lentit guy hiring?
Uses MLX for local LLM on Apple silicon. Performance has been pretty good for a basic spec M4 mini.
Nor install the little apps that I don't know what they're doing and reading my chat history and mac system folders.
What I did was create a shortcut on my iphone to write imessages to an iCloud file, which syncs to my mac mini (quick) - and the script loop on the mini to process my messages. It works.
Wonder if others have ideas for how I can iMessage the bot. I'm in iMessage and don't really want to use another app.
curious: when you say compatible with OpenClaw's markdown format, does that mean I could point LocalGPT at an existing OpenClaw workspace and it would just work? or is it more 'inspired by' the format?
the local embeddings for semantic search is smart. I've been using similar for code generation and the thing I kept running into was the embedding model choking on code snippets mixed with prose. did you hit that or does FTS5 + local embeddings just handle it?
also - genuinely asking, not criticizing - when the heartbeat runner executes autonomous tasks, how do you keep the model from doing risky stuff? hitting prod APIs, modifying files outside workspace, etc. do you sandbox or rely on the model being careful?
To solve this I've built Wardgate [1], which removes the need for agents to see any credentials and has access control on a per-endpoint basis. So you can say: yes, you can read all Todoist tasks but you can't delete tasks or see tasks with "secure" in them, or see emails outside Inbox or with OTP codes, or whatever.
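To make the idea concrete, here is a minimal default-deny sketch of per-endpoint rules plus a content filter. This is not Wardgate's actual rule format or API; `Rule`, `Policy`, and the matching logic are assumptions for illustration only.

```rust
/// One allow rule. This shape is hypothetical, not Wardgate's real format.
pub struct Rule {
    pub method: &'static str,
    pub path_prefix: &'static str,
}

pub struct Policy {
    pub allow: Vec<Rule>,
    /// Items containing any of these words are withheld from the agent.
    pub blocked_words: Vec<&'static str>,
}

impl Policy {
    /// Default deny: a request passes only if some rule matches it.
    pub fn allows_request(&self, method: &str, path: &str) -> bool {
        self.allow
            .iter()
            .any(|r| r.method == method && path.starts_with(r.path_prefix))
    }

    /// Content filter applied to each item before it reaches the agent.
    pub fn allows_content(&self, body: &str) -> bool {
        let lower = body.to_lowercase();
        !self
            .blocked_words
            .iter()
            .any(|w| lower.contains(&w.to_lowercase()))
    }
}
```

With a single `GET /tasks` rule and `"secure"` as a blocked word, reading tasks is allowed, deleting them is not, and any task mentioning "secure" is filtered out before the agent sees it.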
Interested in any comments / suggestions.
[1] https://github.com/wardgate/wardgate
and I'm curious about the filtering logic - is it regex on endpoint paths or something more semantic? because the "tasks with secure in them" example makes me think there's some content inspection happening, not just URL filtering.
Ask and ye shall receive. In a reply to another comment you claim it's because you couldn't be bothered writing documentation. It seems you couldn't be bothered writing the article on the project "blog" either[0].
My question then - Why bother at all?
[0]: https://www.pangram.com/history/dd0def3c-bcf9-4836-bfde-a9e9...
We're past euphoria bubble stage, it's now delulu stage. Show them "AI", and they will like any shit.
Big props to the creators! :) Nice to see others not just relying on condensing a single context and striving for more.
The real trifecta of the pseudo singularity.
https://github.com/localgpt-app/localgpt/blob/main/src%2Fage...
Can you explain how that works? The `MEMORY.md` is able to persist session history. But it seems that it's necessary for the user to add to that file manually.
An automated way to achieve this would be awesome.
The author can easily do this by creating a simple memory tool call, announcing it in the prompt to the LLM, and having it call the tool.
I wrote an agent harness for my own use that allows add/remove memories and the AI uses it as you would expect - to keep notes for itself between sessions.
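A minimal sketch of what such add/remove memory tools might do to the file contents, assuming memories are stored as markdown bullets. The function names and the bullet format are assumptions, not LocalGPT's or any harness's actual implementation.

```rust
/// Append a note as a markdown bullet, skipping exact duplicates.
pub fn add_memory(doc: &str, note: &str) -> String {
    let bullet = format!("- {}", note.trim());
    if doc.lines().any(|l| l.trim() == bullet) {
        return doc.to_string();
    }
    let mut out = doc.trim_end().to_string();
    if !out.is_empty() {
        out.push('\n');
    }
    out.push_str(&bullet);
    out.push('\n');
    out
}

/// Remove every bullet whose text matches the note exactly.
pub fn remove_memory(doc: &str, note: &str) -> String {
    let bullet = format!("- {}", note.trim());
    let kept: Vec<&str> = doc.lines().filter(|l| l.trim() != bullet).collect();
    let mut out = kept.join("\n");
    if !out.is_empty() {
        out.push('\n');
    }
    out
}
```

A harness would read `MEMORY.md` with `std::fs::read_to_string`, apply one of these when the model calls the tool, and write the result back with `std::fs::write`. Keeping the edit itself a pure string function makes it easy to test without touching disk.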
https://github.com/localgpt-app/localgpt/blob/main/src%2Fage...
- You can build it into a single binary with no external deps
- The Rust type system + ownership can help you a lot with correctness (e.g. encoding invariants, race conditions)
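As one illustration of encoding an invariant in the type system, here is a small typestate sketch where a command can only be run after it has been approved. All names are hypothetical and the example is not taken from LocalGPT's code.

```rust
use std::marker::PhantomData;

/// Marker types for the approval state. All names here are illustrative.
pub struct Pending;
pub struct Approved;

/// A shell command whose approval state is tracked in the type.
pub struct Command<State> {
    line: String,
    _state: PhantomData<State>,
}

impl Command<Pending> {
    pub fn new(line: &str) -> Self {
        Command { line: line.to_string(), _state: PhantomData }
    }

    /// Consumes the pending command, so it cannot be reused unapproved.
    pub fn approve(self) -> Command<Approved> {
        Command { line: self.line, _state: PhantomData }
    }
}

impl Command<Approved> {
    /// Only approved commands expose `run`; calling it on a pending
    /// command is a compile error rather than a runtime check.
    pub fn run(&self) -> String {
        format!("running: {}", self.line)
    }
}
```

`Command::new("ls -la").approve().run()` compiles, while `Command::new("ls -la").run()` does not, because `run` is only defined for `Command<Approved>`.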
How much should we budget for the LLM? Would "standard" plan suffice?
Or is cost not important because "bro it's still cheaper than hiring Silicon Valley engineer!"
Can it run on these two OS? How to install it in a simple way?
I assume I could just adjust the toml to point to a locally hosted DeepSeek API, right?
It's fast and amazing for generating embeddings and lookups.
See my post above.