Show HN: Mysti – Claude, Codex, and Gemini debate your code, then synthesize
Hey HN! I'm Baha, creator of Mysti.

The problem: I pay for Claude Pro, ChatGPT Plus, and Gemini, but only one could help at a time. On tricky architecture decisions, I wanted a second opinion.

The solution: Mysti lets you pick any two AI agents (Claude Code, Codex, Gemini) to collaborate. They each analyze your request, debate approaches, then synthesize the best solution.

Your prompt → Agent 1 analyzes → Agent 2 analyzes → Discussion → Synthesized solution

Why this matters: each model has different training and blind spots. Two perspectives catch edge cases one would miss. It's like pair programming with two senior devs who actually discuss before answering.

What you get:

* Use your existing subscriptions (no new accounts, just your CLI tools)
* 16 personas (Architect, Debugger, Security Expert, etc.)
* Full permission control, from read-only to autonomous
* Unified context when switching agents

Tech: TypeScript, VS Code Extension API; shells out to claude-code/codex-cli/gemini-cli

License: BSL 1.1, free for personal and educational use, converts to MIT in 2030 (would love input on this; does it make sense to just go MIT?)

GitHub: https://github.com/DeepMyst/Mysti

Would love feedback on the brainstorm mode. Is multi-agent collaboration actually useful, or am I just solving my own niche problem?
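Conceptually, the pipeline could look something like this (a minimal sketch, not Mysti's actual code; headless flags like `claude -p` and `codex exec` are assumptions about the CLI tools):

```typescript
import { execFileSync } from "node:child_process";

// Illustrative only: real CLI names/flags may differ per tool version.
function ask(cli: string, args: string[], prompt: string): string {
  return execFileSync(cli, [...args, prompt], { encoding: "utf8" });
}

function debate(task: string): string {
  // 1. Each agent analyzes the task independently.
  const a = ask("claude", ["-p"], `Analyze this task: ${task}`);
  const b = ask("codex", ["exec"], `Analyze this task: ${task}`);

  // 2. Each agent critiques the other's analysis.
  const aCritique = ask("claude", ["-p"], `Another agent proposed:\n${b}\nCritique it.`);
  const bCritique = ask("codex", ["exec"], `Another agent proposed:\n${a}\nCritique it.`);

  // 3. One agent synthesizes the debate into a final answer.
  return ask("claude", ["-p"],
    `Task: ${task}\nProposal A: ${a}\nProposal B: ${b}\n` +
    `Critique of A: ${bCritique}\nCritique of B: ${aCritique}\n` +
    `Synthesize the strongest combined solution.`);
}

console.log(debate("Should we shard this Postgres table or denormalize?"));
```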
149 points by bahaAbunojaim - 124 comments
My repo has other tools that leverage such headless agents. For example, there's a resume [1] feature that provides alternatives to compaction (which is not great, since it always loses valuable context details): the "smart-trim" feature uses a headless agent to find irrelevant long messages for truncation, and the "rollover" feature creates a new session and injects session-lineage links, with customizable extraction of context for the task to be continued.
[1] https://github.com/pchalasani/claude-code-tools?tab=readme-o...
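A rough sketch of the smart-trim idea (not the repo's actual implementation; the `-p` headless flag and the JSON reply contract are assumptions):

```typescript
import { execFileSync } from "node:child_process";

interface Message { id: number; role: string; content: string; }

// Sketch of the concept: ask a headless agent which long messages are no
// longer relevant to the current task, then truncate those specifically,
// instead of blindly compacting the whole history.
function smartTrim(history: Message[], task: string): Message[] {
  const longOnes = history.filter(m => m.content.length > 2000);
  const listing = longOnes
    .map(m => `[${m.id}] ${m.role}: ${m.content.slice(0, 200)}...`)
    .join("\n");

  // Headless call; the "-p" (print mode) flag is an assumption about the CLI.
  const verdict = execFileSync(
    "claude",
    ["-p", `Task: ${task}\nWhich of these messages are irrelevant now? ` +
           `Reply with a JSON array of ids.\n${listing}`],
    { encoding: "utf8" },
  );
  const irrelevant = new Set<number>(JSON.parse(verdict));

  return history.map(m =>
    irrelevant.has(m.id)
      ? { ...m, content: "[trimmed: no longer relevant to current task]" }
      : m,
  );
}
```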
To people asking why would you want Claude to call Codex or Gemini, it’s because of orchestration. We have an architect skill we feed the first agent. That agent can call subagents or even use tmux and feed in the builder skill. The architect is harnessed to a CRUD application just keeping track of what features were built already so the builder is focused on building only.
0. https://github.com/coder/agentapi
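A hypothetical sketch of that architect/builder split, with the "CRUD app" reduced to a JSON ledger of already-built features (agent CLIs and flags are assumptions):

```typescript
import { execFileSync } from "node:child_process";
import * as fs from "node:fs";

// The ledger keeps the architect honest about what's done, so the
// builder never re-plans or repeats work.
const LEDGER = "features.json";

function builtFeatures(): string[] {
  return fs.existsSync(LEDGER) ? JSON.parse(fs.readFileSync(LEDGER, "utf8")) : [];
}

function markBuilt(feature: string): void {
  fs.writeFileSync(LEDGER, JSON.stringify([...builtFeatures(), feature], null, 2));
}

function orchestrate(spec: string): void {
  const done = builtFeatures();
  // Architect agent plans only what is NOT in the ledger ("-p" flag assumed).
  const plan = execFileSync("claude",
    ["-p", `Spec: ${spec}\nAlready built: ${done.join(", ") || "nothing"}\n` +
           `List the next single feature to build, one line.`],
    { encoding: "utf8" }).trim();

  // Builder agent focuses purely on building that one feature.
  execFileSync("codex", ["exec", `Implement this feature: ${plan}`],
    { stdio: "inherit" });
  markBuilt(plan);
}
```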
Contributions will be highly appreciated and credited
I think my only real takeaway from all of it was that Claude is probably the best at prototyping code, while Codex makes a very strong (but pedantic) code reviewer. Gemini was all over the place, sometimes inspired, sometimes idiotic.
0: https://github.com/pjlsergeant/captive-wifi-tool/tree/main
But the “playing house” approach of experts is somewhere between pointless and actively harmful. It was all the rage in June and I thought people abandoned that later in the summer.
If you want the model to e.g. review code instead of fixing things, or document code without suggesting improvements (for writing docs), that's useful. But there's no need for all these personas.
Honest question. How is Mysti better than a simple Claude skill that does the same work?
I guess it depends?
You can usually count on Claude Code, Codex, or Gemini CLI to support their model's features best, but sometimes having a consistent UI across all of them is also nice: be it another CLI tool like OpenCode (which was a bit buggy for me when it came to copying text), or Cline/RooCode/KiloCode inside of VS Code, so you don't have to install a custom editor like Cursor but can use your pre-existing VS Code setup.
It's nice to be able to work on some context and then switch between different models inline: "Hey Sonnet, please look at the work of the previous model up until this point and validate its findings about the cause of this bug."
I'd also love it if I could hook up some of those models (especially what Cerebras Code offers) with autocomplete so I wouldn't need Copilot either, but most of the plugins that try to do that are pretty buggy or broken (e.g. Continue.dev). KiloCode also added autocomplete, but it doesn't seem to work with BYOK.
Will definitely try to add those features in a future release as well
Often, revolutions take longer to happen than we think they will, and then they happen faster than we think they will. And when the tipping point is finally reached, we find more people pushing back than we thought there would be.
On the other hand, agentic teams will overtake solo agents.
Will they?
There was a paper about hive-mind behavior in LLMs: they all tend to produce similar outputs when asked open-ended questions.
But it's actually hosted on https://www.deepmyst.com/ with no forwarding from the apex domain to www, so it looks like the website is down.
Otherwise I'm excited to take a deep dive into this, as it's a variant of how we do development, and it seems to work great when the AIs fight each other.
Update:
I've already found a solution based on a comment, and modified it a bit.
Inside Claude Code I've made a new agent that uses Gemini via MCP, through https://github.com/raine/consult-llm-mcp. This seems to work!
Claude Code:
Now let me launch the Gemini MCP specialist to build the backend monitoring server:
gemini-mcp-specialist(Build monitoring backend server) ⎿ Running PreToolUse hook…
@gemini could you review the code and then provide a summary to @claude?
@claude can you write the classes based on an architectural review by @codex
What do you think? Does that make sense?
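A sketch of how that @-mention syntax might be routed (hypothetical parsing rules; the CLI flags are assumptions):

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical routing for the proposed @-mention syntax: the leading
// mention picks the agent; later mentions stay in the prompt so the
// orchestrator can chain a follow-up hand-off.
const CLIS: Record<string, [string, string[]]> = {
  claude: ["claude", ["-p"]],   // flags are assumptions
  codex:  ["codex", ["exec"]],
  gemini: ["gemini", ["-p"]],
};

function route(message: string): string {
  const match = /^@(\w+)\s+(.*)$/s.exec(message);
  if (!match) throw new Error("message must start with an @agent mention");
  const [, agent, prompt] = match;
  const entry = CLIS[agent];
  if (!entry) throw new Error(`unknown agent: @${agent}`);
  const [cli, args] = entry;
  return execFileSync(cli, [...args, prompt], { encoding: "utf8" });
}

// e.g. "@gemini review the code, then provide a summary to @claude"
const review = route("@gemini review src/ and summarize the issues");
const summary = route(`@claude summarize this review for the team:\n${review}`);
console.log(summary);
```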
“If”, oh, idk, just the tool 90% of potential users will have installed.
https://github.com/BeehiveInnovations/pal-mcp-server
Don’t quote me, but I think the other methods rely on passing general detail/commands and file paths to Gemini to avoid the context overhead you’re thinking about.
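That idea might look something like this (a sketch of the general pattern, not any specific tool's hand-off format):

```typescript
// Pass the task plus file *paths*, not file contents, so the consulted
// model's context stays small and it reads files on demand.
interface Consultation {
  task: string;
  paths: string[];   // references, not contents
}

function buildHandoff(task: string, paths: string[]): string {
  const c: Consultation = { task, paths };
  return [
    `Task: ${c.task}`,
    `Relevant files (read only the ones you need):`,
    ...c.paths.map(p => `  - ${p}`),
  ].join("\n");
}

console.log(buildHandoff(
  "Review the session-rollover logic for off-by-one errors",
  ["src/session.ts", "src/rollover.ts"],
));
```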
Personally, I wouldn't use the personas. Some people like to try out different modes and slash commands and whatnot - but I am quite happy using the defaults and would rather (let it) write more code than tinker with settings or personas.
So I can well imagine that this sort of approach could work very well, although I agree with your sentiment that measurement would be good.
I often write with Claude, and at work we have Gemini code reviews on GitHub; these two definitely catch different things. I'd be excited to have them working together in parallel in a nice interface.
If our ops team gives this a thumbs-up security-wise, I'll be excited to try it out when back at work.
You may want to study [1] - this is the latest thinking on agent collaboration from Google.
[1] https://www.linkedin.com/posts/shubhamsaboo_we-just-ran-the-...
AutoGen from Microsoft was an early attempt at this, and it was fun to play with, but it was too early (the models themselves kinda crapped out after a few conversations). This would work much better today, with how long agents can stay on track.
There was also a finding earlier this year, I believe from the SWE-bench folks (or HF?), where they saw better scores when alternating between GPT-5/Sonnet 4 after each call during an execution flow. The scores from alternating were higher than either model individually. Found that interesting at the time.
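The alternation itself is trivial to sketch (illustrative only; CLI flags are assumptions):

```typescript
import { execFileSync } from "node:child_process";

// Per-step model alternation: each call in the execution flow goes to
// the other agent, so every step is continued by a model with different
// blind spots than the one that produced the previous step.
const agents: Array<[string, string[]]> = [
  ["claude", ["-p"]],
  ["codex", ["exec"]],
];

function runAlternating(steps: string[]): string[] {
  return steps.map((step, i) => {
    const [cli, args] = agents[i % agents.length];
    return execFileSync(cli, [...args, step], { encoding: "utf8" });
  });
}

runAlternating([
  "Write a failing test for the reported bug",
  "Make the test pass with the smallest change",
  "Refactor and explain the root cause",
]);
```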
Can you also include Cursor CLI for the brainstorming? That would let someone unlock brainstorming with just one CLI, since it allows using multiple models.
This turned me off as well. Especially with no published pricing and a link to a site that is not about this product.
At minimum, publish pricing.
If it's solving even your own niche problem, it is actually useful, right? Kind of a "yes or yes" question.
That may solve the original problem of paying for three different models.
I was thinking of making the model choice more dynamic per agent, such that you can use any model with any agent and have one single payment for all, so you don't pay separately for three or more different tools. Is that in line with what you are saying?
Can you comment on that?
I find it helpful to even change the persona (the prompt) of the same agent, or the model the agent is using. These variations always help, but I found that having multiple different agents with different LLMs in the backend works better.
I personally have moved to a pattern where I use mastra-agents in my projects to achieve this. I've slowly shifted the bulk of the code research and web research to my internal tools (built with small TypeScript agents). I can now really easily bounce between different tools such as Claude, Codex, and OpenCode, and my coding tools spend more time orchestrating work than doing the work themselves.
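Roughly this shape, though this is a generic sketch rather than Mastra's actual API (tool names and CLI flags are assumptions):

```typescript
import { execFileSync } from "node:child_process";

// A tiny internal "agent" that orchestrates rather than codes,
// delegating research and implementation to whichever CLI fits the step.
type Tool = (input: string) => string;

const tools: Record<string, Tool> = {
  codeResearch: q =>
    execFileSync("claude", ["-p", `Research in this repo: ${q}`], { encoding: "utf8" }),
  implement: q =>
    execFileSync("codex", ["exec", q], { encoding: "utf8" }),
};

function orchestrate(goal: string): string {
  const findings = tools.codeResearch(goal);
  return tools.implement(`Goal: ${goal}\nFindings:\n${findings}`);
}
```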
Note that this functionality is not yet integrated with Mysti, but we are planning to add it in the near future and are happy to accelerate.
I think token optimization will help with larger projects, longer context, and avoiding compaction.
I think once I add Cursor and Cline, I'll also try to make it work with any number of agents.
https://github.com/karpathy/llm-council