Hacker News

Automating myself out of development

91 points by nisabek - 54 comments

noelwelsh [3 hidden]5 mins ago

I wish people would describe in more detail the tasks they use LLMs to code. My experience is that simple components in an existing architecture are fine, but anything requiring architectural considerations quickly becomes a mess. On my projects (e.g. a ui framework), running multiple agents in parallel would just increase the speed at which it can stuff up the project.

girvo [3 hidden]5 mins ago

I'm currently using it to do a large migration from one Relay environment to another, but this is possible because

1. We've done it by hand for another route already, which the LLM uses as reference

2. There's a strong validation harness I've set up for it, with Storybook stories and component tests

3. It's a _mostly_ mechanical transform. Not entirely, as the two environments/APIs are not 1:1, but it's close enough

But! My team and I are still reviewing everything *shrug*. It is "faster" because I get to have this running while I'm in meetings planning other, more interesting projects

And this isn't really that many agents in parallel. Yeah, plenty of fan-out subagents, but that IMO doesn't count/isn't really the same as what others are talking about

germanptr [3 hidden]5 mins ago

I get this question a lot, and I found it hard to answer briefly, so I ended up writing a longer post about how I work:

https://www.trigosec.com/insights/mob-programming-for-one/

The short version is that I don’t let AI agents work unsupervised on my code. I treat them like participants in a mob programming session instead of autonomous developers. Different agents get different roles (implementer, reviewer, architect, security reviewer, etc.), and I stay involved throughout the process.

I also agree with your point about architecture. Generating isolated components is relatively easy; preserving and evolving the architectural boundaries across a larger codebase is much harder.

We’re still missing a good way to express and measure architectural quality. Until then, architecture-heavy work requires much closer supervision than implementation-heavy work.

Swizec [3 hidden]5 mins ago

> We’re still missing a good way to express and measure architectural quality

Architectural complexity[1]! There are several really good papers on this.

Unfortunately it never caught on and we don’t have great automated tools to spit out a number. Also the majority of people just don’t care enough. Research in this field kinda died out when we invented microservices and started treating those as a silver bullet to The Architecture Problem (it’s not [2])

[1] https://swizec.com/blog/why-taming-architectural-complexity-...

[2] https://youtu.be/y8OnoxKotPQ
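As a rough illustration of what "spitting out a number" could look like: one metric that appears in the architectural complexity literature is propagation cost, the share of module pairs connected by a direct or transitive dependency. The sketch below is a minimal version under stated assumptions. The dependency maps are hypothetical, each module is counted as reaching itself (conventions differ on this), and it is not presented as the metric the linked post uses.

```python
def propagation_cost(deps: dict[str, set[str]]) -> float:
    """Fraction of (module, module) pairs linked directly or transitively.

    `deps` maps each module to the modules it depends on (hypothetical input).
    """
    modules = sorted(set(deps) | {d for targets in deps.values() for d in targets})
    if not modules:
        return 0.0
    reachable_pairs = 0
    for start in modules:
        seen = {start}  # assumption: a module counts as reaching itself
        stack = [start]
        while stack:  # depth-first walk over dependency edges
            node = stack.pop()
            for nxt in deps.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        reachable_pairs += len(seen)
    return reachable_pairs / (len(modules) ** 2)


# Hypothetical three-module systems: one layered, one with a dependency cycle.
layered = {"ui": {"api"}, "api": {"db"}, "db": set()}
tangled = {"ui": {"api"}, "api": {"db"}, "db": {"ui"}}
```

A lower value means a change in one module can reach fewer others. The cyclic example scores the maximum of 1.0 because every module can reach every other.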

iot_devs [3 hidden]5 mins ago

> Also the majority of people just don’t care enough.

Yet! It is the next frontier, and we will need it for agents as described in the post to really work

Swizec [3 hidden]5 mins ago

> Yet! It is the next frontier

While researching my book I read papers from the 80’s saying this. If you get a good enough spec and define the contracts and architecture, you then just hand off implementation to juniors/offshore/etc

So far has not worked. Maybe this time!

fbrchps [3 hidden]5 mins ago

Didn't even need to click the YouTube link, I knew it would be KRAZAM.

vslira [3 hidden]5 mins ago

> The short version is that I don’t let AI agents work unsupervised on my code. I treat them like participants in a mob programming session instead of autonomous developers.

I wonder if OS maintainers would have a leg up in defining workflows to better leverage this. Of course, OS contributors are autonomous developers, but maybe a trick or two might transfer across

zem [3 hidden]5 mins ago

i've been running claude in what the blog calls phase 0 for the last 6-7 months. i'm perfectly happy with it, my development velocity has increased while i still have a good grasp of the entire app, and i've actually been making decent progress with web development for a personal project, which is something i've bounced off several times in the past. also i do not get stuck as often on stuff like "how do i get django to statically serve up a js bundle with relative imports" which is more about knowing specific APIs of specific frameworks than any feature of my code or architecture.

i would not want to go down the "take myself out of the loop" path because yes, i do have to micromanage the claude session, often course-correcting every commit and then doing large scale refactoring every so often. but i'm perfectly happy doing that - i see claude as more of a tool than a coder i can hand work off to.

davidcann [3 hidden]5 mins ago

I built this with 94% written by coding agents: https://buildermark.dev/

The complete log of all prompts and commits is here: https://demo.buildermark.dev/projects/u020uhEFtuWwPei6z6nbN

MonstraG [3 hidden]5 mins ago

It seems that pages 2-5 on

https://demo.buildermark.dev/projects/u020uhEFtuWwPei6z6nbN/...

still show content of page 1

davidcann [3 hidden]5 mins ago

Thanks for the report. I messed up the CDN settings. It looks fixed now.

amelius [3 hidden]5 mins ago

I personally limit LLMs to single files only at the moment. Self-contained components.

Using LLMs in a larger scope can sometimes work, but it has the real risk of turning a project into a mess after which you will have to undo the work and lose a lot of time.

Also, using LLMs this way with less clear boundaries will make reading and maintaining the code more cumbersome.

rootusrootus [3 hidden]5 mins ago

I use this strategy, too. I liken it to limiting the blast radius. If the LLM truly fouls things up it’s easier to pick up the pieces if you keep the scope limited.
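The "blast radius" idea above can be checked mechanically. The following is a minimal sketch, assuming the list of changed paths comes from something like `git diff --name-only`. The function name and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def outside_scope(changed_paths, allowed_prefixes):
    """Return the changed paths that fall outside every allowed file or directory."""
    offenders = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # is_relative_to is True for the path itself and for anything beneath it
        if not any(path.is_relative_to(prefix) for prefix in allowed_prefixes):
            offenders.append(raw)
    return offenders


# Hypothetical session: the agent was asked to touch a single component file.
changed = ["src/widgets/button.py", "src/app.py"]
offenders = outside_scope(changed, ["src/widgets/button.py"])
```

If `offenders` is non-empty, the change went beyond the agreed scope and can be rejected before anyone spends time reviewing it.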

pjmlp [3 hidden]5 mins ago

Me when not trying to meet management expectations: only as smarter code completion, code formatting, basic code analysis, and help copy-pasting code examples between languages.

Me when meeting management expectations: agent orchestration tools like Boomi and Workato calling into tools, doing with AI what a few years ago would have been done with BPEL.

TheBigSalad [3 hidden]5 mins ago

You have to make those architectural decisions and feed them to the agents. Be very specific. That's been my experience.

warumdarum [3 hidden]5 mins ago

The true test challenge should be how far an AI can shrink a given fucked up codebase while keeping full functionality.

I also think that rewriting large codebases into a sort of functional transformer tree, as an information compression stage, would let them reason more easily about large codebases by giving them a large lossless overview with minimal token usage.

properbrew [3 hidden]5 mins ago

I used LLMs to develop Whistle Enterprise (https://whistle-enterprise.com) from the ground up, from scratch.

It's taken _a lot_ of time and effort, but this is an example of what can be developed using LLMs alone.

You have to have dedication and a goal to reach, but you can absolutely build anything if you're building with the right foundations in mind.

ryanackley [3 hidden]5 mins ago

I think the relevant question isn’t what can be built but the amount of effort in comparison to doing this the old fashioned way.

What do you think the productivity gain was from using an LLM? This question assumes you’re already an experienced developer.

properbrew [3 hidden]5 mins ago

Thank you for the assumption, I'm actually not a developer at all.

I'm from a hardware / networking / infrastructure background. I've had extensive exposure to (web) application development as I'm working closely with development teams and I do have the bash/powershell scripting knowledge.

But honestly, if I tried this "the old fashioned way" it probably would have taken me about 6 to 7 years to develop that application, that's an optimistic estimate. You really do have to have a passion for what you're building, I didn't know that voice transcription and local LLMs would be such a driving force for me, but it's all I think about, so much that I find it hard to go to sleep sometimes.

andai [3 hidden]5 mins ago

n=1, but a friend of mine spent the last few months working on an experimental piece of music software with Claude. What he built is amazing and far beyond my abilities (I have been programming for 20 years). He doesn't know any programming.

In fact, it's far beyond what I would even attempt, because I've just spent two decades building up a data bank of how hard things are supposed to be.

He doesn't know it's supposed to be hard, so he just does it.

dmortin [3 hidden]5 mins ago

Is his code maintainable, though? Or is it just a pile of code which happens to work? What if he wants to change something? Does he generate again the whole thing from scratch? Or does he tell Claude to make the changes and doesn't even know when something breaks when a new thing is added? (Assuming the software is complex, having multiple non trivial features.)

motoroco [3 hidden]5 mins ago

There’s no free lunch, it takes time and effort still. And expertise if you need it to be robust.

In terms of velocity, let me offer some numbers. In 6 months I generated >150k lines of code and merged 10k PRs to ship and iterate on https://plotalong.app

I follow best practices and isolate agents to continuously deployed dev environments, semi-manually review PRs and gate the release process between multiple protected envs. The project is getting close to 500 end-to-end tests in Playwright.

That’s just working nights and weekends. Before AI, it took my team at the office 4 years to produce this much work. There are some qualitative differences but the speed and results are real

leguy [3 hidden]5 mins ago

neat. I saw the "no bot joins the call". Is it obvious to others in the virtual meeting that you are using this tool?

properbrew [3 hidden]5 mins ago

Thank you! No, they cannot tell. It is your responsibility, per the laws of your country, to notify the other party if you're going to use it.

pipes [3 hidden]5 mins ago

I found that this guy's stuff has really helped me:

https://youtu.be/-QFHIoCo-Ko?is=FYYdukWluYX3vdQL

Worth a watch.

nullbio [3 hidden]5 mins ago

It's great for people who are just maintaining something. Less so for someone building something from scratch, in the earlier phases.

Npovview [3 hidden]5 mins ago

There are hour long youtube videos where people explain the process by using a complex toy project. Search for them.

2001zhaozhao [3 hidden]5 mins ago

Good writeup. I think the main difference in my workflow is that I skipped the sandboxing part and accepted the coding agent having access to the entire 24/7 dev machine, so I'm still running on worktrees. Also, the "idea enrich" steps in my workflow are less formal - I tend to write most details in a feature spec myself. I also do my workflow on my own self-hosted custom interface which comes with a kanban board for project tracking, so I don't need Github. The rest of the workflow looks pretty similar.

gnunicorn [3 hidden]5 mins ago

Interestingly, despite it being much more detailed and a lot more process and procedure than what I currently do - which is more akin to the version 0 described, but in parallel - we end up at the same final problem: reviews and quality assurance.

I sign off on the code I merge, partly because of company policy but also just to be sure it is actually decent. But reviewing has become the real draining bottleneck: even with stacked PRs, if they total 5-6k lines, it is not a 5-minute job. Even if I brainstormed and set the plan, that's really the part that doesn't scale right now for me in this. But the author is very shy about that: either the changes aren't that big in the end or they trust the process enough to review in a more casual manner. Being equally untrusting, I can't do that ...
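One way to keep review size visible is to measure it before a PR is opened. The sketch below assumes the input is the text produced by `git diff --numstat` (tab-separated added count, deleted count, and path, with `-` for binary files). The 400-line budget and the sample text are hypothetical.

```python
REVIEW_BUDGET = 400  # hypothetical per-PR limit on changed lines


def changed_lines(numstat_output: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` style text."""
    total = 0
    for line in numstat_output.splitlines():
        if not line.strip():
            continue
        added, deleted, _path = line.split("\t", 2)
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        total += int(added) + int(deleted)
    return total


# Hypothetical numstat text for a three-file change, one of them binary.
sample = "10\t2\tsrc/a.py\n-\t-\tassets/logo.png\n300\t50\tsrc/b.py\n"
total = changed_lines(sample)
within_budget = total <= REVIEW_BUDGET
```

A check like this does not make review faster, but it gives an agent (or a CI job) a concrete signal to split the work into smaller stacked changes.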

philbo [3 hidden]5 mins ago

For decades, engineers understood that large code reviews are harder than small ones. Out of both politeness and a desire to receive better code reviews, we learned to break our large changes into smaller chunks. Some engineers took things even further and replaced code reviews with pair programming. But then LLMs showed up and everyone seems to have forgotten those lessons.

They can still be applied now using coding agents, if you're willing to push back against the default setup and change your mode of thinking a little bit. Of course it doesn't help that an entire industry is dedicated to persuading us that maximizing token spend is the only way to get shit done.

I appreciate this probably seems like an extremist take, but I wrote some more about it here in case there's anybody out there who identifies with it:

https://philbooth.me/blog/agentic-coding-and-mental-models

girvo [3 hidden]5 mins ago

> They can still be applied now using coding agents, if you're willing to push back against the default setup and change your mode of thinking a little bit. Of course it doesn't help that an entire industry is dedicated to persuading us that maximizing token spend is the only way to get shit done.

Yeah the problem is the executives and managers around us are demanding we ship massive features as quickly as possible, and I like having a job and dread having to find a new one in this market...

aocallaghan17 [3 hidden]5 mins ago

Agree with this completely. This push for more autonomy is, I think, the completely wrong direction for how to use LLMs.

I want less code to maintain, not more code that I don't even fully understand.

I think research and very supervised coding with lots of guardrails is the way to actually gain productivity from these tools.

firegodjr [3 hidden]5 mins ago

I think that's reasonable. My only gripe is that making small sets of changes is often faster to do by hand than waiting on llm reasoning, so I've found it amounts to very little speedup.

strogonoff [3 hidden]5 mins ago

Proper review should take longer than writing it yourself, because you need to know the correct solution, understand the proposed solution, and evaluate the difference between the two. When designing it yourself, you just need to know the correct solution and write it, and with modern high-level languages and IDEs with autocomplete writing it is hardly a bottleneck.

minihat [3 hidden]5 mins ago

It is harder to solve a sudoku than verify a solution's correctness. I find similar benefits occasionally when coding with LLMs.
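The asymmetry in the Sudoku comparison can be made concrete: checking a finished grid takes a few lines, while solving one requires search. A minimal verifier sketch follows. The function name is hypothetical, and the sample grid is generated from a well-known shifting pattern rather than taken from the thread.

```python
def is_valid_sudoku_solution(grid):
    """Check that every row, column, and 3x3 box holds the digits 1-9 exactly once."""
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return all(group == digits for group in rows + cols + boxes)


# A valid grid built from a shifting pattern, and a copy with two cells swapped.
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
broken = [row[:] for row in solved]
broken[0][0], broken[0][1] = broken[0][1], broken[0][0]
```

The swapped copy still has valid rows, so only the column and box checks catch it, which is a small reminder that a verifier is only as good as the constraints it encodes.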

layer8 [3 hidden]5 mins ago

I disagree under the following circumstances, which in my experience is the common case: You don’t know from the outset all relevant considerations that go into implementing something. Coding yourself is an exploration process of those considerations. Being shown a finished solution doesn’t let you see and understand all the considerations and the possible options that you’d have contemplated when implementing it yourself. When reviewing, you still have to do that exploratory thinking to weigh the possible options. And the fact that you have to do that exploration purely mentally rather than in a process of working with code arguably makes it harder (similar to contemplating alternative solutions to a Sudoku purely mentally, actually).

There rarely is a single correct way of implementing some requirement or feature. It’s a trade-off between compromises, not binary correct or incorrect like a Sudoku puzzle. The insights that the exploration give you may even lead you to implement something significantly different from what you originally set out to.

strogonoff [3 hidden]5 mins ago

Imagine sudoku with hundreds of subtle, sometimes mutually exclusive rules, and no single valid solution.

This is not about LLMs, by the way. It’s about reviewing any code, including by a fellow human. It’s just that many people mistakenly feel like with LLMs they can lower their guard and accept even if they have not gone through the steps of themselves coming up with their solution and comparing it to the one suggested by the LLM.

The reason is that many correctly see proper review as duplicate work, and while it is justified with another human (because it is (A) instructive and (B) reducing bus factor) with LLMs most people simply can’t be bothered. If you personally can, you are a minority.

skydhash [3 hidden]5 mins ago

Sudoku’s constraints are known and easy to build a harness for. Software has a more malleable structure. A harness is hard to build, and the test cases for the constraints can be numerous.

nisabek [3 hidden]5 mins ago

If I'm attentive during spec/plan creation I sort of build this "expectation" of what the actual PR will look like, the mental model of it. Then it's somewhat easier to review. But the mental load is brutal tbh, and still not sure if it's "worth it"

brcmthrowaway [3 hidden]5 mins ago

More Yegge tier psychosis.

pydry [3 hidden]5 mins ago

>Automating myself out of development

>I want to start by saying that I’m neither an AI-fanatic

Kind of like saying you are a fanatic before saying you aren't.

I don't think there's too much here (e.g. "spec driven development") I haven't seen elsewhere.

general1465 [3 hidden]5 mins ago

I am completely calm regarding AI and development.

First, nobody sane wants to give their domain IP to OpenAI/Anthropic. That's why local AI will eventually prevail and flourish: people who actually have some IP will have no problem buying a 10k+ EUR machine to run some pretty good models on it. However, if your main job is just doing CRUD stuff, then you are screwed.

Secondly, hallucination is really the Achilles' heel of every LLM. Sure, you can recreate an application which exists in thousands of variations on the internet, but the moment you try to go deeper into domain knowledge you will start struggling more and more.

Try to make a CAN driver for ESP32: easy, it is probably going to work. Try to make a CAN driver for STM32F7xx: now the AI will start having problems, but will probably be able to produce something that works after a lot of debugging. Now let's make a CAN driver for MPC5555: the AI will start writing fairy tales about registers which do not exist. All of the processors above have reference manuals, and sometimes example git repositories, available on the open internet.

duggan [3 hidden]5 mins ago

> First, nobody sane wants to give their domain IP to OpenAI/Anthropic. That's why local AI will eventually prevail and flourish: people who actually have some IP will have no problem buying a 10k+ EUR machine to run some pretty good models on it. However, if your main job is just doing CRUD stuff, then you are screwed.

Replace OpenAI/Anthropic with AWS and this is not too dissimilar to the arguments in 2009 about cloud providers.

It’s not that there's nobody for whom this is true, it’s just that there’s enough of everyone else to build an empire with.

bonoboTP [3 hidden]5 mins ago

Did you try this by giving it access to the materials? Human programmers also don't memorize all this stuff. If this is the reason for your calmness it's quite shortsighted.

There are problems when you rely too much on AI generated code, but these shallow dismissals are quite annoying.

abletonlive [3 hidden]5 mins ago

> All of the processors above have reference manuals, and sometimes example git repositories, available on the open internet.

Okay? Then give it those reference manuals and git repositories? I haven't heard of something LLMs can't get around and figure out that way.

yieldcrv [3 hidden]5 mins ago

I don't know if I’m overly critical but there’s gotta be a middle ground between totally AI pilled people that otherwise have no talents, and control freak veteran developers who can't let go

My current process is also using Github projects in a normal scrum style way, with many tickets written or fleshed out and state managed by the LLM, and it doubling as the memory system

Completely leapfrogging all these other open and closed source concoctions and being more effective

But it's effective enough that I don’t need OP’s final form state of still approving everything

Auto-mode is fine. Worktrees are built into Claude Code now. I just tell it to classify tickets as sequential or parallel possible and spawn subagents to tackle all of the tickets in the todo list

They all get their own context window; it's pretty perfect now

In the meantime I work in a couple of tabs of Claude Design for different flows of any client side app. My philosophy has been that devs could pick up graphic and UI/UX design easily; it's just still a full time job to make variations of layouts and portray their states.

UI/UX is not a full time job anymore.

And I use Claude chat to flesh out aspects of the overall idea

I think you may be overcomplicating your workflow in the concluding state.

Overall I agree that planning and intention is now most of the time, before a 10 subagent precision strike is initiated
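Classifying tickets as sequential or parallel, as described above, amounts to grouping a dependency graph into waves. Here is a minimal sketch using the standard library's `graphlib`. The ticket IDs and their dependencies are hypothetical, and this is not how Claude Code schedules subagents internally.

```python
from graphlib import TopologicalSorter


def plan_batches(dependencies):
    """Group tickets into waves; tickets within a wave can be worked in parallel.

    `dependencies` maps each ticket to the set of tickets that must finish first.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every ticket whose prerequisites are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical backlog: T3 needs T1 and T2, and T4 needs T3.
tickets = {"T1": set(), "T2": set(), "T3": {"T1", "T2"}, "T4": {"T3"}}
batches = plan_batches(tickets)
```

Each inner list is a set of tickets that could go to separate subagents at the same time, and the outer order is the sequential part.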

bluefirebrand [3 hidden]5 mins ago

> control freak veteran developers who can't let go

It is not control freak behavior to want to be in control when you are the one accountable for it if it breaks.

nisabek [3 hidden]5 mins ago

Could be (the overcomplicating part), I'm just not yet comfortable losing the mental model of the final application. At least not in all types of tickets. Are you not seeing that?..

yieldcrv [3 hidden]5 mins ago

I focus on one side project at a time, alongside work applications

Both are giving me skillsets to excel in the other domain

I watch the subagents, push back on some choices, look at commits and glance at pull requests

thi2 [3 hidden]5 mins ago

There are tons of people, those are just not as vocal.

ai_fry_ur_brain [3 hidden]5 mins ago

All these people saying UI/UX is dead, then I see their designs and they're absolutely the worst (but they're always swearing by how incredible it is).

Sorry, access to an LLM (even if it could center a div reliably and make responsive designs, which it can't) does not give you taste or intuition, or make you good at building user interfaces. You people/sloppers have no idea the amount of sweat that gets poured into great UX.

It's insulting when you people say these things, and I'm not even a designer or frontend dev.

I actually think UI/UX designers and devs will be the last to fall. I will want beautiful products that were built by beautiful minds; that's how you will set yourself apart from the slop. And fortunately it will be even easier when 80% of everything is half-assed UI cranked out by LLM design tools. The contrast is already glaring.

yieldcrv [3 hidden]5 mins ago

I’ve seen that slop but

Claude Design has barely been out for a month

And it’s fulfilled my needs better than v0, lovable, playwright via LLM or just iterating in the coding LLM. I’ve worked with graphic designers my whole career and have also contracted design agencies to do style guides and collaborate on branding and layouts. I’ve gotten the output that I’m looking for with Claude Design

Eventually you’ll see examples, but it's not in my purview to publicly link any of my projects as being vibe coded

ai_fry_ur_brain [3 hidden]5 mins ago

Lmao claude design sucks ass. You have low standards.