Hacker News

Show HN: ChunkHound, a local-first tool for understanding large codebases

ChunkHound’s goal is simple: local-first codebase intelligence that helps you pull deep, core-dev-level insights on demand, generate always-up-to-date docs, and scale from small repos to enterprise monorepos, while staying free + open source and provider-agnostic (VoyageAI / OpenAI / Qwen3, Anthropic / OpenAI / Gemini / Grok, and more). I’d love your feedback, and if you have, thank you for being part of the journey!

79 points by NadavBenItzhak - 25 comments

goda90 5 mins ago

A few years ago I set out to refactor some of my team's code that I wasn't particularly familiar with, but that we wanted to modularize and re-use in more places. The primary file alone was 18k+ lines of TypeScript that was a terrible mess of spaghetti. Most of it had been written in JavaScript but later converted haphazardly. I ended up writing myself a little app that used the TypeScript compiler APIs to help me explore all the many branches of the code and annotate how I would refactor different parts. It helped a bit, but I never got time to add some of the more intelligent features I wanted, like finding every execution path between two points.
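For readers curious what "finding every execution path between two points" could look like, here is a minimal sketch, not goda90's tool. It assumes a call graph has already been extracted into a plain dictionary (the graph and the function names in it are hypothetical) and enumerates every cycle-free path with a depth-first search.

```python
from typing import Dict, Iterator, List


def all_paths(graph: Dict[str, List[str]], start: str, goal: str) -> Iterator[List[str]]:
    """Yield every cycle-free path from start to goal using depth-first search."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            yield path
            continue
        for callee in graph.get(node, []):
            # Skip nodes already on this path so recursion cycles cannot loop forever.
            if callee not in path:
                stack.append((callee, path + [callee]))


# Hypothetical call graph: function name -> functions it calls.
CALL_GRAPH = {
    "main": ["parse", "render"],
    "parse": ["validate", "render"],
    "validate": ["render"],
    "render": [],
}

if __name__ == "__main__":
    for path in all_paths(CALL_GRAPH, "main", "render"):
        print(" -> ".join(path))
```

The number of simple paths can grow exponentially with graph size, so on an 18k-line file a real tool would likely need a depth limit or some pruning.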

henryhale 5 mins ago

give depgraph a try - https://github.com/henryhale/depgraph - i'd like to learn about how i could improve it.

dcreater 5 mins ago

You say "local-first" but have made the Voyage API the default for embeddings (I had to go to the website and dig to find that you can in fact use local embedding models). Please fix.

esafak 5 mins ago

It would be convenient if it could load local SLMs itself, otherwise I'll have to manually start the LLM server before I can use it, and it's not something I leave running all the time.

ofriw 5 mins ago

Thank you, yes the docs are overdue for a refresh. It's in the works

romperstomper 5 mins ago

I don't understand how/why all of this is local-first if all these providers are supported and used. Could you elaborate on what is sent to them?

ofriw 5 mins ago

The DB is stored locally, and any embedding, reranker and LLM will work. It's up to you if you self host these or bring them externally from one SaaS or the other

henryhale 5 mins ago

I have been working on depgraph (https://github.com/henryhale/depgraph) for a while now. It is truly local with several output options (json, mermaid, jsoncanvas). Multiple languages are supported (js, go, c), and I am expanding the list slowly but surely.
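To illustrate what a Mermaid rendering of a dependency graph looks like in general, here is a small sketch. It is not depgraph's code or its actual output format; the input mapping and the `to_mermaid` helper are hypothetical, and it assumes file names contain no double quotes.

```python
from typing import Dict, List


def to_mermaid(deps: Dict[str, List[str]]) -> str:
    """Render a module -> imported-modules mapping as Mermaid flowchart text."""
    ids: Dict[str, str] = {}

    def node_id(name: str) -> str:
        # File paths are awkward as Mermaid node IDs, so use n0, n1, ...
        # and carry the real path in a quoted label instead.
        if name not in ids:
            ids[name] = f"n{len(ids)}"
        return ids[name]

    edges = []
    for module, imports in deps.items():
        source = node_id(module)  # register the module even if it imports nothing
        for imported in imports:
            edges.append(f"    {source} --> {node_id(imported)}")

    labels = [f'    {nid}["{name}"]' for name, nid in ids.items()]
    return "\n".join(["graph TD", *labels, *edges])


if __name__ == "__main__":
    print(to_mermaid({"src/app.js": ["src/util.js"], "src/util.js": []}))
```

The resulting text can be pasted into any Mermaid renderer to get a diagram of which file depends on which.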

Neywiny 5 mins ago

Might give this a try to experiment if it's really free to use (I'll have to read up on that I guess). The QEMU codebase is huge and every contributor seems to solve problems in slightly different ways. Would be nice if this tool could help distill it.

ofriw 5 mins ago

Completely free, MIT licensed. You can fully self host it if you have the hardware to run Qwen3-embedding and reranker models

apgwoz 5 mins ago

Perhaps I am missing something, but this seems to require a Lemon (LLM)? Is the idea that the Lemon is used to help build an index AOT that can be queried locally, after?

I want to figure out how to build advanced tools, potentially by leveraging Lemons to iterate quickly, that allow us all to rely _less_ on Lemons, but still get 10x, 20x, or 30x efficiency gains when building software, without needing to battle the ethics of it all.

ofriw 5 mins ago

ChunkHound does it a bit differently, since at true enterprise scale it's very slow and costly to pass all code chunks through an LLM during indexing time. Instead, ChunkHound implements a customized "deep research" algorithm that's been optimized for code exploration so it can answer, on demand, any deep technical question about the indexed codebase. This research agent can be powered by a lower tier LLM (think Haiku, Codex low, etc) that's already included in your subscription.

conception 5 mins ago

I have ChunkHound in a few projects, and it’s noted in both the agent md file as well as MCP, and Claude never uses it. Ever. Never once.

Is there a prompt special sauce y’all use to get it to use it?

ofriw 5 mins ago

Just add to your prompt something like "use code research", but yes there's a PR in the works that fixes that and optimizes the MCP tools interface - https://github.com/chunkhound/chunkhound/pull/150

dogman123 5 mins ago

Is there a way to have the model inside Codex use ChunkHound instead of its “built in” search/explore functionality with rg? Whenever I spin up a new agent using xhigh thinking, it spins its wheels for a while to get up to speed. I'm wondering if ChunkHound can make this process faster.

esafak 5 mins ago

That's what the MCP is for, if you can get the LLM to use it. Sometimes they just like to do it their own way :)

dmos62 5 mins ago

Will try this out. Was always envious of how Augment was able to do this. Kudos.

bravura 5 mins ago

Can you please expose the functionality as a self-documenting CLI command with machine-readable output? (Or did I misunderstand, and MCP isn't the only way to use it?)

I am curious to try it but do not want to adopt MCP servers.

Telling Claude to call the CLI tool is more efficient.

ofriw 5 mins ago

`chunkhound search <query>`, `chunkhound search --regex <query>` and `chunkhound research <query>` are the main cli entry points that you can already use today
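For anyone who wants to script those three entry points, a minimal wrapper might look like the sketch below. Only the three invocations quoted above come from the thread; the `build_command` and `run_chunkhound` helpers and the mode names are hypothetical, and the output is returned as raw text because its format is not described here.

```python
import shutil
import subprocess
from typing import List


def build_command(mode: str, query: str) -> List[str]:
    """Map a mode name to one of the three invocations quoted in the thread."""
    if mode == "search":
        return ["chunkhound", "search", query]
    if mode == "regex":
        return ["chunkhound", "search", "--regex", query]
    if mode == "research":
        return ["chunkhound", "research", query]
    raise ValueError(f"unknown mode: {mode!r}")


def run_chunkhound(mode: str, query: str) -> str:
    """Run the CLI and return its stdout unparsed (output format is not assumed)."""
    if shutil.which("chunkhound") is None:
        raise RuntimeError("chunkhound executable not found on PATH")
    result = subprocess.run(
        build_command(mode, query),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the query as a separate list element avoids shell quoting issues, so a call such as `run_chunkhound("regex", r"def \w+_handler")` needs no escaping beyond the regex itself.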

dcreater 5 mins ago

Agree. And to make the CLI usage more effective/efficient, if you can publish a skill that would be excellent

esafak 5 mins ago

That's why we're asking for the CLI; so we can write the skills.

blackqueeriroh 5 mins ago

Am I confused or is this not an open-source project on GitHub?

You have every ability to make these modifications yourself; is there a reason you feel the need to require the creator to do so?

from_memory 5 mins ago

I think the term is "Instrumentalism".

CamperBob2 5 mins ago

Looks like the tutorial link is broken.

ofriw 5 mins ago

Fixed, thank you