This is the wrong way to do it. As software architects, you need to learn the appropriate use of algorithms versus AI. Using AI to build everything is not just a waste of tokens, it is also an exercise in futility.
Here is how I solved this problem:
1. There is already a knowledge base of almost all APIs (the ones that are useful to the average Joe, anyway) in either Swagger.json or Postman.json format. Which format you prefer is entirely up to you.
2. Write a generator (I use Elixir) to infer which format the spec from step 1 uses and generate your API modules with a code generator. There are plenty available, or you can even write your own with a simple File.write!
3. In the rare occurrence that you come across a shitty API whose only documentation is scattered across outdated static pages online, only then use an LLM + browser automation to write it into one of the formats listed in step 1 (Swagger.json or Postman.json).
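A minimal sketch of step 2's generator, written in Python for illustration (the commenter uses Elixir). The fallback naming scheme and the flat `requests`-based output are assumptions, not a description of any particular generator:

```python
import json

def generate_client(spec_path: str, out_path: str) -> None:
    """Read a Swagger/OpenAPI JSON spec and write a thin API client module."""
    with open(spec_path) as f:
        spec = json.load(f)

    lines = ["import requests", ""]
    for path, ops in spec.get("paths", {}).items():
        for method, op in ops.items():
            # Prefer the spec's operationId; fall back to method + path.
            name = op.get("operationId") or f"{method}_{path.strip('/').replace('/', '_')}"
            lines.append(f"def {name}(base_url, **params):")
            lines.append(
                f"    return requests.request({method!r}, base_url + {path!r}, params=params)"
            )
            lines.append("")

    with open(out_path, "w") as f:
        f.write("\n".join(lines))
```

The same structure maps directly to Elixir: parse the JSON, build one function per path/verb pair, and dump the module with File.write!.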
Throwing an LLM at everything is just inefficient, lazy work.
Falimonda
Define "it" in the context of "doing it wrong".
The post provides a lot of good food for thought based on experience, which is exactly what the title conveys.
gchamonlive
> There are two obvious approaches: start with lots of guardrails, or start with very few and learn what the models actually do.
> We chose the second because we didn’t want to overfit our assumptions.
> Some of it went better than expected.
> But they also broke in very unexpected ways, sometimes spectacularly.
You clearly missed the whole point of the article, which is to experiment with agents and explore the limits of having them run wild.
Efficient use of tokens and which tasks to delegate is secondary to the experiment. Optimizing these is in any case premature if you don't understand the limits of the models.
neya
> which is to experiment with agents
I think you completely missed the point - they built a product purely using agents and deployed it to production for others to use. Read what the product actually does first.
gchamonlive
Why shouldn't they ship it to production if the experiment was a success? You say the only way to code is to learn the appropriate use of algorithms and AI, which for you means writing a generator and only using "dumb" generators to produce code. That's fine, but they just showed that for 20 bucks and a few minutes you can get very far, so their evidence is just stronger than yours.
neya
> their evidence is just stronger than yours.
What evidence? There is zero evidence. It's deployed to production, but that doesn't mean it works fine or is free of bugs, which is exactly my point and why you use algorithms for these types of things. They're testable, repeatable, and scalable.
With LLM slop, it's just that: slop.
gchamonlive
Have you seen the code to write it off as slop?
groby_b
Pardon me if I misread, but wouldn't that be better served by a ready-made library (with, if you must use AI, some futzing to account for call signatures)?
What is the value add of having the AI rebuild code over and over, individually for each project using it?
bilekas
I don't know, maybe I'm misunderstanding too, but they basically just asked an agent to interface with an API. It seems the agent will create new code each time...
I hope this isn't their business model.
j16sdiz
In my experience, most "SDKs" we have today are just thin wrappers around HTTP calls, generated from OpenAPI/Swagger specs.
It takes lots of reading and testing before integrating one into your project.
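As a hypothetical illustration of such a thin wrapper (the class name and endpoint are invented, and Python stands in for whatever language the SDK targets), each method is little more than a single HTTP call:

```python
import requests

class PetsClient:
    """A hand-rolled 'SDK': each method is a one-line wrapper over HTTP."""

    def __init__(self, base_url: str, token: str):
        self.base_url = base_url.rstrip("/")
        self.session = requests.Session()
        self.session.headers["Authorization"] = f"Bearer {token}"

    def list_pets(self, **params):
        resp = self.session.get(f"{self.base_url}/pets", params=params)
        resp.raise_for_status()  # surface HTTP errors to the caller
        return resp.json()
```

The wrapper itself is trivial; the reading and testing effort goes into confirming auth, pagination, and error semantics for each API.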
rguldener
Author here: the build happens alongside building your app.
Once built, the code executes deterministically at runtime.
The news here is the AI reading the API docs, assembling requests, and iterating on them until they work as expected.
This sounds simple, but it is time-consuming and error-prone for humans to do.
skybrian
I think the question is: why is integrating with, say, Google Calendar different for each customer? How much is custom versus potentially reusable code?
mellosouls
Nango claims to be fully open source, but the documentation seems to imply the self-hosted version is a small subset:
https://nango.dev/docs/guides/platform/free-self-hosting/con...
Of course, that may well be my misreading, but it seems important in the context of the claim and the analysis using OpenCode.
Perhaps they could clarify and/or revisit the docs.
yojo
The TL;DR does not seem to match the rest of the article.
They claim the agents reliably generated a week’s worth of dev work for $20 in tokens, then go on to list all the failure modes and debugging they had to do to get it to work, and conclude with “Agents are not ready to autonomously ship every integration end-to-end.”
Generally a good write-up that matches my experience (experts can build systems that guide agents to do useful work, with review), but the first section is pretty misleading.
cpursley
If you're using Elixir (or don't mind running a separate Elixir service), we've built what is effectively a clone of the OAuth part of Nango (formerly Pizzly). Drop it into any Elixir project and get full OAuth management out of the box, and it's compatible with all of the Nango provider strategies:
https://github.com/agoodway/tango
A lot of these smell like skill issues with the model. Many are complete non-issues if you're using Claude Opus 4.5+.
The idea of assigning a code-owner agent per directory is really interesting. A2A (read: message passing and self-updating AGENTS.md files) might really shine there in some way.