I start with a high-level design doc in Markdown, which an AI helps write. Then I ask another AI - either the same model without the context, or a different model - to critique it and spot bugs, gaps and omissions. It always finds obvious-in-hindsight stuff. So I ask it to summarize its findings, paste that into the first AI, and ask for its opinion. We agree on a change, make it, and carry on this adversarial round robin until no model can suggest anything that seems weighty.
I then ask the AI to make a plan. And I round robin that through a bunch of AIs adversarially as well. In the end, the plan looks solid.
Then the end-to-end test plan, and so on.
By the end of the first day or week or month - depending on the scale of the system - we are ready to code.
And as code gets made, I paste it into other AIs with the spec and plan and ask them to spot bugs, omissions and gaps too, and so on. Continually using other AIs to check on the main one implementing.
And of course you have to go read the code yourself, because I have found that the AI misses polish.
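A minimal sketch of that round robin, in case it helps (illustrative Python only; the `ask` callable, the prompts, and every name here are hypothetical stand-ins for whatever LLM API you actually use):

```python
def round_robin(design, ask, author, critics, max_rounds=5):
    """Adversarial review loop. `ask(model, prompt) -> str` is a
    placeholder for any LLM call; `author` drafts, `critics` attack."""
    for _ in range(max_rounds):
        findings = []
        for critic in critics:
            # Each critic sees only the document, not the author's chat history.
            report = ask(critic, "Critique this design; list bugs, gaps, omissions:\n" + design)
            if report.strip():
                findings.append(report)
        if not findings:
            break  # no model can suggest anything weighty: stop iterating
        # Feed the summarized critiques back to the authoring model.
        design = ask(author, "Revise the design to address:\n" + "\n".join(findings))
    return design
```

The same loop applies unchanged to the implementation plan and the end-to-end test plan; only the document being passed around changes.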
gen220
The discourse around AI is that we’ve unlocked a whole new unsupervised paradigm of development; but you’re basically describing how Google has built code for a decade, just with humans of different levels of trust instead of AI.
And I’m not saying that to poke fun at you (my workflow is essentially identical to yours), or at Google, but rather to say that there’s nothing new :)
AI is a fantastic accelerator of effective and ineffective workflows alike. It’s showing us which are effective and ineffective on way shorter timescales / in realtime!
wood_spirit
That is actually reassuring. I used to try to work this way with people, but the culture where I work didn't align, and I found it easier to work this way alone by trying to put myself into critique mode and so on. Now it's much better to get AIs to do it. And I find the more I polish the plan, the less expensive the AI needed to implement it.
> And of course you have to go read the code because I have found that the AI misses polish
Since you mentioned using other agents, do you get mileage out of code reviews with another agent polishing the unpolished bits? My colleagues swear by it, though I personally remain skeptical about its value without a human reviewer.
How much faster/slower are you with that process compared to writing code yourself?
pbowyer
Developer of 20+ years here, can't give you an accurate multiplier but I am faster.
Because spotting holes in specs has never been one of my strengths. And working without technical colleagues much of the time, it's a boon to be able to "rubber-duck" my ideas with something that is at least more intelligent than plastic.
Grabbing multipliers from thin air, the coding bit may only be 2x faster with a poorer-quality outcome, but working out what's needed is a good 5x faster.
And yes, I'm using the same adversarial AI MO as @wood_spirit, combined with Matt Pocock's excellent /grill-me and /grill-with-docs skills [1] and Plannotator [2] to review the plans.
I actually use LLMs a lot to rubber duck my problems and help develop plans. Then I manually code, to ensure my skills don't deteriorate. I feel like I'm a lot faster, with few of the downsides. Do you have any thoughts on this process?
pbowyer
If you can type code fast and accurately, it sounds a great process to use. You're using LLMs for the bit where they bring great value, and yourself as a higher quality coding agent :)
sn9
Have you considered incorporating formal modelling?
Only at the "hmm that seems an interesting idea" level.
Thanks for the links, going to have a read and see if I can apply any to my work.
SkyPuncher
Thanks for sharing those. They look interesting.
tracker1
Can't speak for GP or OP, but I see about 10x the output and 2-4x the value of what I would be able to produce by hand. The gap between 2-4x and 10x is really a lot of design documents, user/dev documentation and testing that I would not have produced to nearly the extent that I do when using AI.
I haven't been using multiple AIs adversarially as OP, but might consider giving it a try with Codex and Opus. That said, my AI workflow has been pretty similar... lots of iterations on just design, then iterations on documentation, testing, etc... then iterations on implementation, testing, validation and human review in the mix.
My analogy is that it's really close to working with a foreign dev team, but your turnaround is in minutes instead of days, where it's much more interactive.
nomel
I'm seeing the same, with the gains coming largely from documentation.
I feel strange making "dev" documentation, though, since it seems a bit redundant/superfluous. I fully suspect nobody is going to read it at this point.
tracker1
Fair... but the AI will (or may) read it, as you use agents for dealing with issues/bugs, etc.
SkyPuncher
For me, sometimes faster/sometimes slower, but there are a lot of other benefits besides speed:
* I can work in code I'm not familiar with much easier
* LLMs often identify confusion or uncertainty upfront, so I can address it earlier.
* I'm much less mentally taxed so I can go for longer at my top end.
* Meetings, disruptions, end of day is WAY less critical since I can lean on the LLM to get back into things.
* I can do something else productive while the LLM is running. Bug fixes, documentation, PR reviews, etc.
alfalfasprout
Having tried something similar, the perceived speedup does not, in the steady state, last.
To get a quality, lasting result, you ultimately have to carefully study everything; otherwise you quickly accumulate cognitive debt, and the speedup soon shrinks as you're constantly having to revisit the initial approaches.
tibbar
Reviewing 22,000 lines of code, even from antirez, with this complex of a feature set and minimal PR description sounds like a nightmare. One starts to see why major open-source software like Postgres tends to be developed on a mailing list, with intermediate design decisions discussed by the community, separate patches for different related features, incremental review, and then a spaced release cadence.
antirez
The code is 5000 lines of code in total, comments included:
2000 lines the sparse array.
2000 lines the t_array commands and upper layer implementation.
~500 lines of AOF / RDB code.
All the other stuff is tests, JSON command descriptions, TRE library under "deps".
SkyPuncher
I might be the outlier, but this PR feels like heaven to review. It's a complete, all encompassing PR that I can work through with the entire context right in front of me.
If the initial development bar is relatively high, it's far, far easier to identify flaws and gaps when you have the whole thing in front of you all at once.
fancy_pantser
I think the point GP is making is that this PR smells like a solo dev working on their own project, not like how a community-driven project adds major new functionality. I'm sure docs and descriptions (or at least a discussion of tradeoffs and design decisions, if not ADRs) exist somewhere, but they're not linked handily from the PR. There is a lot of explanation in the blog post and PR, but it looks unilateral.
c.f. valkey and others
antirez
Redis was completely built this way since the start. I believe this is a better way to create software. Compromise in design is, in my opinion, something to avoid: feedback is important, but often a single person who has studied the problem a lot and has design taste can come up with a great solution. Mediating between solutions, even between two stellar solutions A and B, will not produce a C solution that is better, since you can't produce such a solution by interpolation. It is simpler to damage A and B. And it is rare that in a big set of people everyone has stellar ideas, so you often have to mediate with people having poor ideas as well. Not worth the effort, for the way I'm wired. What works better for me is to provide hints about what I'm doing; then I receive feedback, and sometimes there are really great ideas in this feedback, and I incorporate the parts I like.
fancy_pantser
Thanks, I think I'm all caught up now. The timeline is like this if I understand correctly: your successors (Yossi Gottlieb and Oran Agra) explicitly announced a new governance model in 2020, saying the project had "outgrown the BDFL-style of management" and that they wanted to "promote more teamwork and structure". With the relicensing in 2024, however, external contributors with five or more commits to Redis dropped to zero in the first six months (basically, community contribution collapsed). In late 2024, you came back in the role of "Redis evangelist" and a year ago there was an additional licensing change, adding AGPLv3 as an option (8.0's tri-license). So now redis has your steady hand on the wheel again.
I was confused because the last time I checked on things, it was still about fostering community input and advancement but not necessarily consensus. Things have tipped back in the original direction since then. I don't think "Redis was completely built in this way since the start" is completely accurate, but also the community effort under the new governance model never got very deeply entrenched while you were away.
tibbar
First of all, redis is amazing, and your 4 month development process speaks to the fact that you've already designed and verified correctness super thoroughly.
... just speaking as someone who sometimes has to review very long PRs, though, I feel like 25% is a roughly normal level of "signal to noise." 5,000 lines of core logic is a LOT, and the tests and dependencies do still need to be read.
EDIT: I feel like the problem, as a reviewer, is processing 4 months of intensive research/development and providing useful feedback. By that point, there's probably not much major input you can have into the core architecture or strategy, so you're probably not providing much more than a bugbot.
derefr
> At that point, there's probably not much major input you can have into the core architecture or strategy
Sure you can? In this concrete case, Redis is very "flat" — there's the data structure implementations, and there's the commands that use them. 1+N. You could have feedback about the data structure (i.e. whether it's optimal for the use-cases); or about any of the commands (i.e. not just their impls, but also whether they're the best core API surface to lock in long-term, or even whether they're worth including at all.)
Any given feedback would necessitate fairly limited rework to address, as you're either modifying the data structure (and its tests) or a command (and its tests and docs.)
tibbar
Fair point that there might be some functional changes you could suggest, but I continue to suspect that by the time this PR hit GitHub, all the most important decisions had already been finalized.
fancy_pantser
I think where we went wrong in understanding this PR is in the assumption that it's designed to invite review because that's how a lot of other team- or community-driven projects work.
epolanski
Postgres and Redis are dramatically different projects with radically different stories, contributions and development team.
Virtually all major Redis features are a solo job of the post author.
By the way reviewers are paid good money for this and know the setup.
tibbar
Oh wow, I didn't realize that Redis is still mostly just authored by antirez! (My understanding is that he had left for some time and then returned to the project.) That is, honestly, pretty amazing. Well, redis is great and clearly it's worked out.
SuperV1234
Closely matches my own experiences with current SOTA AI. Extremely useful collaborator, far from being a replacement for human intelligence and creativity.
foobarian
I like to say, AI is the rubber-duck programming duck I always wanted
bonesss
LLMs are the insensitive Asimovian robots I've always wanted, who translate and do the hardest part of my job: ensuring my emails are polite and none of my true thoughts or feelings are revealed…
Now I just need a way to protect my chats from any potential discovery, and <pew pew> business’ll be easy.
genghisjahn
I occasionally type into slack "Future lawyers, the previous conversation is a joke. No one is doing cocaine to get through writing requirements docs."
imadethis
We have a “don’t get the slack subpoenaed” emoji that gets frequent use. Incidentally, a lawyer doing discovery in the future could just search for uses of that emoji to find what they’re looking for.
antirez
There are projects that I develop mostly without looking at the code, owning the concepts, algorithms and ideas, asking questions and giving hints, and especially owning the product. But not for Redis, not yet at least. When, in the future, this becomes possible, server software as it is developed today will be over. I bet there will still be projects and repositories, as the accumulation of features, fixes and experience will still be worth it, but the role of programmers will be very similar to what Linus has done so far for the kernel. And for certain projects I'm developing, like the DeepSeek v4 inference engine, I'm already working like that.
gurgeous
Thanks for adding this. Excited about array/regex, also very interested in your experience using LLMs to stretch your abilities. There are many of us laboring quietly on various projects attempting the same. "Vibe coding" (and the backlash) doesn't really capture how we work.
tracker1
I definitely don't consider how I've used agents as vibe coding at all... I'm much too involved and validate/verify/review everything.
epolanski
The problem with "vibe coding" is that the author who coined the term gave it a very specific definition (after all, it's his term): writing software without looking at the code, just "vibing".
Then it quickly lost its original meaning as people started using it for virtually all forms of AI-assisted coding.
sylvinus
Thanks for the write up. Always interesting to see how very senior developers interact with AI these days.
@antirez: Introducing a regex feature that late in the project, for a seemingly unrelated feature, feels a bit weird? Can you explain your rationale for that? Thanks!
antirez
Once I realized arrays were a great fit for text files, many use cases I could conceive were always limited by the fact that we need to grep files. So I thought: what is the AROP equivalent for files? ARGREP. Then I made sure to add both fast exact matching and regexp matching, so that the best tool can be used depending on the use case. I then discovered that for many OR-ed strings, regexps could be the faster way if well optimized. And then I specialized TRE a bit.
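The OR-ed strings point can be illustrated outside of C (plain Python here, not TRE/ARGREP; the log lines and needles are made up): a single alternation pattern finds the same matches as walking each line once per needle, but in one traversal.

```python
import re

lines = ["error: disk full", "all ok", "warn: retrying", "fatal: oom", "done"]
needles = ["error", "warn", "fatal"]

# N separate substring scans: each line is traversed once per needle.
multi_scan = [ln for ln in lines if any(n in ln for n in needles)]

# One alternation pass: a single compiled pattern, one traversal per line.
pattern = re.compile("|".join(map(re.escape, needles)))
one_pass = [ln for ln in lines if pattern.search(ln)]

assert multi_scan == one_pass
```

Whether the single pass is actually faster depends on the engine: a backtracking engine like Python's `re` gives no guarantee, while an automaton-based engine (which is what a specialized TRE can approach) can match all alternatives in one sweep.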
simonw
Are there other existing Redis data types and features that might benefit from integrating TRE?
antirez
KEYS comes immediately to mind :)
ericpauley
Couldn't some of the use cases presented for this be accomplished with ZSETs? I get the performance angle, but it seems that this could have been accomplished without the new API surface by selectively optimizing ZSET storage for dense values (in the same way that Arrays selectively use sparse representations).
The RE component is interesting, but as commentary here has noted it seems orthogonal to the array data structure (i.e., usable on others as well). Does this not make more sense to accomplish with Lua scripting? Or if performance of Lua is an issue perhaps abstracting OP to be composable on top of any command that returns a range of values.
I say this with reverence for Antirez as the expert in this space, but some of this new feature set feels like the sort of solution that I tend to see arise from LLM-driven development; namely creation of new functionality instead of enhancement of existing, plus overcomplicating features when composition with others might be more effective.
antirez
Unfortunately not; sorted sets are actually a bit on the other side of the spectrum: they are semantically sound, but absolutely wasteful because of the combined skiplist + array. Also, if the underlying representation is not an array, range queries and ring buffers will never be as efficient and compact as they should be. In theory you can do everything with everything, but segmenting what each API can do lets you exploit the use case to provide the best underlying implementation.
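The compactness point can be made concrete with a toy ring buffer over a flat array (illustrative Python only, not Redis internals): every operation is O(1) index arithmetic on one contiguous allocation, with no per-element pointer overhead, which a skiplist-backed structure can't match.

```python
class Ring:
    """Fixed-capacity ring buffer over one contiguous array."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.start = 0          # index of the oldest element
        self.size = 0

    def push(self, item):
        end = (self.start + self.size) % len(self.buf)
        self.buf[end] = item
        if self.size < len(self.buf):
            self.size += 1
        else:
            # Full: overwrite the oldest element and advance the start.
            self.start = (self.start + 1) % len(self.buf)

    def __getitem__(self, i):   # i = 0 is the oldest surviving element
        if not 0 <= i < self.size:
            raise IndexError(i)
        return self.buf[(self.start + i) % len(self.buf)]

    def to_list(self):
        return [self[i] for i in range(self.size)]
```

Range queries fall out for free: any `[i, j)` slice of the logical order is just index arithmetic into the same flat buffer.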
ardline
Solid work. The devil's in the operational complexity, but this looks manageable.
localhoster
Let's make it very clear - this is the original creator of redis, or one of them.
He is not "your avg dev" and it took him 4 months with llm.
This is not a seal of approval for you to go and command all your developers to move to Claude code/codex/any other ai coding tool fully.
I'm looking at you - any avg CEO of a startup.
simonw
It's a pretty strong endorsement for the idea that coding agents, used skillfully by experienced developers, can further amplify their expertise.
zozbot234
Sure but the OP suggests that these were minor gains, and that this limited scope for gains was necessary in order to preserve the quality standard that's long been expected in that FLOSS community. We aren't talking about either a 10x productivity gain or one-shotting entire new features from scratch.
This is arguably a key quote: "Then, it was time to read all the code, line by line. ... I found many small inefficiencies or design errors ... so I started a process of manual and AI-assisted rewrite of many modules." We should not underestimate that step: reading code line by line might easily require more time than writing it from scratch.
nl
The author said "You know what was the biggest realization of all that?"
> For high quality system programming tasks you have to still be fully involved, but I ventured to a level of complexity that I would have otherwise skipped. AI provided the safety net for two things: certain massive tasks that are very tiring (like the 32 bit support that was added and tested later), and at the same time the virtual work force required to make sure there are no obvious bugs in complicated algorithms.
simonw
Right, and those of us who advocate for a sensible approach to agentic engineering don't talk about 10x productivity gains or one-shotting entire new (production-ready) features from scratch either.
I remain unconvinced by the "faster to write it by hand than read it" arguments though. My experience throughout my career is that most people, myself included, top out at a couple of hundred lines of tested, production-ready code per day. I can productively review a couple of thousand.
FEELmyAGI
"top out at a couple of hundred lines of tested, production-ready code per day" + " productively review a couple of thousand." + LLM agents that write code for you = apparent contradiction with your first paragraph.
zozbot234
Right, I don't think you can "productively review a couple thousand" lines of code per day. That would imply that the review step for this very patch only took a couple days in total (since the core code is described as 5k lines) which is rather implausible to say the least.
skybrian
Both Simon Willison and Antirez said that using LLMs helped them, so it's kind of perverse to read them and conclude the opposite.
In particular, doing direct comparisons between metrics like that doesn't work. "Lines of code" isn't a good way to measure complexity of the code, and the amount of time it takes to review the code will vary quite a bit based on the use case.
There's a lot of diversity in what kind of code people write and just because it worked for someone else doesn't mean it will work for the kinds of problems you solve. It's anecdotal evidence that someone else found it useful, your mileage may vary.
zozbot234
The relevant question is whether it helped them 10x or anywhere close to what AI is now being sold as (supposedly even replacing software developers' jobs altogether and one-shotting complete products from a single prompt), or it's just acting as a kind of glorified autocomplete. So far we're clearly seeing the latter based on what both Simon Willison and Antirez are referencing.
oulipo2
Simon often says that LLMs help him "write productive code", but most of the code he shows is Python libs doing menial tasks. That's fine for tooling, etc., which is sometimes useful.
It would absolutely NOT work for production-code with critical concurrency / embedded / real-time stuff
nl
Antirez wrote Redis. That is "production-code with critical concurrency"
To quote another of his posts:
> I fixed transient failures in the Redis test. This is very annoying work, timing related issues, TCP deadlock conditions, and so forth. Claude Code iterated for all the time needed to reproduce it, inspected the state of the processes to understand what was happening, and fixed the bugs.
...
> In the past weeks I operated changes to Redis Streams internals. I had a design document for the work I did. I tried to give it to Claude Code and it reproduced my work in, like, 20 minutes or less (mostly because I'm slow at checking and authorizing to run the commands needed).
BTW, just yesterday I played with letting Claude fix the simple things all by itself. Sadly we are on GitLab, so I needed to tell it to use the glab CLI, and setup took a little more time than it would on GitHub (why do they not support GitLab or other code forges…).
However, it is definitely a time saver for these 1-3 line changes. My workflow basically was:
Let the LLM cook, doing the issues one by one. In the meantime I could start reviewing them: checkout, running, reading.
It was definitely faster, since it also correctly linked everything, etc. Of course, once a change goes beyond that scope, it probably stops working.
However, I really think a good idea would be something that watches for such work, implements it according to the issue description, and updates the MR whenever the description changes, at least as long as the MR is 1-3 lines. And even if it does not work, I can just discard it.
(A lot of these problems are often typos that do not even need a checkout; they come in through bigger MRs that should not be blocked because of them.)
gfody
> Sure but the OP suggests that these were minor gains
When antirez says 'I ventured to a level of complexity that I would have otherwise skipped,' I don't think you can call that a minor gain. The alternative is likely something 'good enough' that leaves the community dissatisfied for months, and then after initial design mistakes become load-bearing the ideal implementation can never be realized.
zozbot234
He writes that right after saying "For high quality system programming tasks you have to still be fully involved". He's just saying that AI was useful to him for tedious special-case tasks (citing the addition of 32-bit support and fishing out bugs in new low-level implementations), that this required starting from a "huge specification" (not a one-shotted prompt!) and that he still had to go over everything with a fine-toothed comb afterwards. That's the farthest thing from the 10x silver bullet AI is now being sold as.
oulipo2
Exactly. LLMs are good at "code inpainting": you give them the structures / constraints / specs, and they write the boilerplate.
Then you need a senior to go find the 100 mistakes it made, fix them, and iterate, which is why you can't replace "natural intelligence".
And there are real mathematical reasons why computers won't be able to break through "mathematical reasoning" on their own (undecidability, etc.)
oulipo2
...which is VERY, VERY, FAR from "LLMs can automate coding" that people like to say, which is completely false
nl
That makes the title of another of his posts very ironic then:
> He is not "your avg dev" and it took him 4 months with llm.
To clarify, from TFA:
> even before LLMs the implementation was likely something I could do in four months. What changed is that in the same time span, I was able to do a lot more
The initial timeframe was 4 months, he was able to do more work within the same timeframe with LLMs.
tracker1
I would add that the output was likely more as well.. ex: more thorough tests, documentation, etc.
I've been working on a database adapter for a couple of months using an LLM... I've got a couple of minor refactors to do still, then getting the "publish" to JSR/npm working... I've mostly held off as I haven't actually done a full review of the code... I've reviewed the tests and confirmed they're working, though. The hard part is that there are some features I really want when talking from Windows to a Windows SQL Server instance that aren't available in Linux/containers. I don't think I'll ever choose SQL again, but at least I can use/access a good API with Windows direct auth and FILESTREAM access in Deno/Bun/Node.
FWIW: My final implementation landed on ODBC via Rust+FFI, so after I get the mssql driver out, I'll strip a few bits in a fork and publish a more generic ODBC client adapter, with using/dispose and async iterators as first-class features in the driver.
mlmonkey
This _is_ the original creator of Redis, and one of the best C coders out there, who writes impeccable C code.
slig
>He is not "your avg dev" and it took him 4 months with llm.
He's not, but his work is obviously not average.
Average dev work is plumbing and CRUDs.
jareklupinski
it's honest work
slig
It is, and LLMs help me a lot doing honest work.
artyom
Antirez is 100% the creator of Redis. And not only that, he's the kind of mind that you probably only get "a handful of each generation".
oulipo2
Well that's mostly my point: LLMs are mostly useful now as "code inpainting" / "boilerplate writing" when you have a defined spec
I'm doing my work mostly the same way as Antirez: writing a detailed spec (which is actually 80% of the hard work, even without LLMs), then where I would have written the "boring stuff" I use the LLM to "autocomplete", and then I see all the mistakes (which require being a senior to see/fix), correct, and iterate.
It makes the work "feel" easier because we mostly skip writing the boilerplate, but it still doesn't replace coders. And companies that think they can skip training juniors (in order to later replace seniors) and still have seniors on board are making a huge mistake.
thallavajhula
Salvatore really wants to popularize the term Automatic Programming/Coding it seems. (https://antirez.com/news/159)
beyti
I keep finding myself trying to minimize the words needed to describe the same thing as well, since we find ourselves doing "that" operation more and more over time.
maybe shortening the term to "auto-code" would help tho.
zozbot234
https://en.wikipedia.org/wiki/Automatic_programming It's an acknowledged term in computer science, describing any mechanism of auto-generating code from a description at a higher level of abstraction. Of course LLMs are highly unusual in being non-deterministic and having a surprisingly broad scope, but this does not make the term inapplicable.
jdw64
It feels like Redis is becoming a small database, which seems to make it more convenient to use. Could you add more examples that clarify where the boundary should be?
antirez
Well, Redis is a data structures server, and it already has very complicated, edge-case-heavy data structures like the HyperLogLog, so I have very little doubt that a fundamental data type like the array will fit :) Also, the actual complexity added is mostly two C files that are well commented and understandable.
Sure, there are also the AOF / RDB glue, the tests, and the vendored TRE library for ARGREP. But all in all it's self-contained complexity with little interaction with the rest of the server.
A quick note: if we focus only on that part of the implementation, skipping tests and persistence code (which is not huge), 4075 lines in 4 months is an average of 33 lines per day, which is quite low.
jdw64
I’m a big fan of your work, and I honestly didn’t expect to receive a reply from you. Thank you.
Also, thank you for pointing out exactly where I was misunderstanding the issue.
In the past, I used Redis for temperature measurements in a smart farm project. I used Hashes back then, but it seems like Array would fit that use case much better.
This looks like a very useful feature. Thank you again for the reply.
antirez: I'm curious, with the final code, have you experimented with effectively one-shotting the final result? I wonder if we can get there with GEPA, and maybe there's something we can learn about how to elicit/prompt these models to get what we want.
Or maybe the conclusion is that model providers need to clean up their training data!
nitwit005
The use of C stdlib localization functions (toupper, mbrtowc, etc), makes me suspect if there will be some regex behavior differences between systems or locales.
antirez
Redis sets the locale at startup to avoid issues, so it should be OK, but we will document that, for instance, è will not match È when nocase matching is used.
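The è/È caveat is easy to demonstrate (Python here, purely to illustrate the general point; TRE's actual behavior depends on how Redis configures the locale): whether nocase folds accented characters depends on whether matching is Unicode-aware or byte-level.

```python
import re

# Unicode-aware matching: IGNORECASE folds beyond ASCII, so 'è' matches 'È'.
assert re.fullmatch("è", "È", re.IGNORECASE) is not None

# Byte-level matching (closer to a C regex engine under the "C" locale):
# IGNORECASE only folds ASCII a-z/A-Z, so the UTF-8 bytes of 'è' (c3 a8)
# do not match those of 'È' (c3 88)...
assert re.fullmatch(b"\xc3\xa8", b"\xc3\x88", re.IGNORECASE) is None

# ...while plain ASCII still folds fine at the byte level.
assert re.fullmatch(b"abc", b"ABC", re.IGNORECASE) is not None
```

This is the behavior difference worth documenting: exact bytes match everywhere, but case-insensitive matching of non-ASCII text is where systems and locales diverge.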
srinikhilr
Anyone know how to get the specification mentioned in the blog post? Don't see one in the linked PR.
ok123456
Is this an apologia since the PR is +22,212 -34?
antirez
Haha, ~5000 LOC with comments. The rest is tests + TRE code + TRE tests.
gbalduzzi
Is it possible to see the specification file you created and used for AI assisted development?
Very cool anyway! Can I expect a youtube video about this soon?
antirez
Yep, I will release it. It is a bit out of sync at this point, but I will do an updating pass and then release it.
nojvek
It’s always a great HN thread when an author of a widely used lib/app engages on a technical level.
antirez - you inspire a generation of devs. Thanks for all you do.
dsecurity49
AI is a fantastic co-pilot, but you still need to know how to fly the plane when the edge cases start hitting the fan.
leetrout
On Safari mobile it's a page with the title header and a footer. There's no content rendering.
antirez
Checking, thanks. EDIT: works very well on my iPhone, so without being able to reproduce it, it's not easy to fix.
tobr
Same here, I need to turn off content blockers for the article content to load.
antirez
I should probably remove the Adsense JS which I don't use anyway...
leetrout
Oh shoot. Sorry I didn't even think about having a content blocker running on my phone. Sorry for the distraction.
epolanski [3 hidden]5 mins ago
Got few questions:
- the project essentially spans almost 3 different (albeit minor) generations of LLMs. Have you noticed major differences in their personas, behavior, or output for this specific use case?
- when using AI for feedback, have you ever considered giving it different "personalities"? I have a few skills that role-play as very different reviewers with their own (by design conflicting) personalities. I found this improves the output, but it is also extremely tiring and often has a high noise ratio.
- when did you, if ever, feel that AI was slowing you down massively compared to just doing it yourself (e.g. some specific bug, performance, or design fix)? Are there recurring patterns?
- conversely, how often did the AI have moments where it genuinely gave you feedback or ideas that wouldn't have come to you?
- last: do you have specific prompts, skills, setups, etc. for working on specific repositories?
antirez [3 hidden]5 mins ago
1. The huge jump was from Opus to GPT 5.3. Game changer. GPT 5.4 and 5.5 were better, but only incrementally.
2. Nope, I don't give them personalities much, but I use subtle prompt differences to maximize certain responses I want, to make the model focus on a given detail or act with a specific kind of engineering mindset.
3. It never happened that the AI slowed me down, since I always had the full context and code detail of what was happening in mind. I believe this happens more when you don't have a clear idea. Also, GPT >= 5.3/5.4 is not the past generation of models; it is very hard to trap it in a situation where it seems unable to understand what you mean.
4. A few times the AI provided fresh insights that I really liked. Most of the time it was the other way around. Certain implementations were written by the AI at a very impressive level of quality.
5. I don't use general skills; I build skills with deep search when needed for specific projects, and build an AGENT.md that works as a knowledge base as I work with the AI. One thing I use a lot, when there is a very complex problem, is to tell GPT that I have a friend called Machiavelli who is an incredible computer scientist, and to write him an email in /tmp/letter.md with the problem we are facing; I'll try to get a reply. Then I ask GPT 5.5 Pro on the web with extensive reasoning turned on. It will sometimes take 30 minutes or more to reply. Often, after I feed back the reply, the agent is able to see things a lot more clearly.
epolanski [3 hidden]5 mins ago
Thanks a lot for the insights. I like the Machiavelli thing.
> Then I ask GPT 5.5 Pro on the web with extensive reasoning set on. It will take sometimes 30 minutes or more to reply.
Any reason why Codex can't do that?
antirez [3 hidden]5 mins ago
If Pro is the same model (hard to tell, I'm not sure), it has a token budget for thinking (test-time scaling) which is huge compared to the Codex endpoint.
jaunt7 [3 hidden]5 mins ago
In short, Redis can't be trusted any more.
Who is going to do an LLM free fork?
dontdoxxme [3 hidden]5 mins ago
Your comment is not constructive, why can't it be trusted?
If every user of an LLM took this much care and attention, many people would have fewer issues with LLM assisted coding. In this case the author has demonstrated they can write plenty of code without an LLM, so why not use it carefully to benefit their productivity?
I start with a high-level design md doc which an AI helps write. Then I ask another AI - whether the same model without the context, or another model - to critique it and spot bugs, gaps, and omissions. It always finds obvious-in-hindsight stuff. So I ask it to summarize its findings, paste that into the first AI, and ask its opinion. We form an agreed change, make it, and carry on this adversarial round robin until no model can suggest anything that seems weighty.
I then ask the AI to make a plan. And I round robin that through a bunch of AIs adversarially as well. In the end, the plan looks solid.
Then the end to end test cases plan and so on.
By the end of the first day or week or month - depending on the scale of the system - we are ready to code.
And as code gets made I paste that into other AIs with the spec and plan and ask them to spot bugs, omissions and gaps too and so on. Continually using other AI to check on the main one implementing.
And of course you have to go read the code, because I have found that the AI misses polish.
And I’m not saying that to poke fun at you (my workflow is essentially identical to yours), or at Google, but rather to say that there’s nothing new :)
AI is a fantastic accelerator of effective and ineffective workflows alike. It’s showing us which are effective and ineffective on way shorter timescales / in realtime!
> And of course you have to go read the code because I have found that the AI misses polish
Since you mentioned using other agents, do you get mileage out of code reviews with another agent polishing the unpolished bits? My colleagues swear by it, though I personally remain skeptical about its value without a human reviewer.
> Then I ask another AI
Maybe synthesis-antithesis-thesis works better in applied computer science... https://en.wikipedia.org/wiki/Dialectic#Criticisms
Because spotting holes in specs has never been one of my strengths. And working without technical colleagues much of the time, it's a boon to be able to "rubber-duck" my ideas with something that is at least more intelligent than plastic.
Grabbing multipliers from thin air, the coding bit may only be 2x faster with a poorer-quality outcome, but working out what's needed is a good 5x faster.
And yes, I'm using the same adversarial AI MO as @wood_spirit, combined with Matt Pocock's excellent /grill-me and /grill-with-docs skills [1] and Plannotator [2] to review the plans.
1. https://github.com/mattpocock/skills
2. https://github.com/backnotprop/plannotator
Like:
[0] https://csci1710.github.io/2026/ and https://forge-fm.github.io/book/2026/
[1] https://elliotswart.github.io/pragmaticformalmodeling/
[2] https://quint.sh/
Thanks for the links, going to have a read and see if I can apply any to my work.
I haven't been using multiple AIs adversarially like the OP, but I might consider giving it a try with Codex and Opus. That said, my AI workflow has been pretty similar: lots of iterations on just design, then iterations on documentation, testing, etc., then iterations on implementation, testing, and validation, with human review in the mix.
My analogy is that it's really close to working with a foreign dev team, but your turnaround is in minutes instead of days, where it's much more interactive.
I feel strange making "dev" documentation though, since it seems a bit redundant/superfluous. I fully suspect nobody is going to read it at this point.
* I can work in code I'm not familiar with much easier
* LLMs often identify confusion or uncertainty upfront, so I can address it earlier.
* I'm much less mentally taxed so I can go for longer at my top end.
* Meetings, disruptions, end of day is WAY less critical since I can lean on the LLM to get back into things.
* I can do something else productive while the LLM is running. Bug fixes, documentation, PR reviews, etc.
To get a quality, lasting result you ultimately have to carefully study everything; otherwise you quickly accumulate cognitive debt, and the speedup soon shrinks as you constantly have to revisit the initial approaches.
2000 lines for the sparse array.
2000 lines for the t_array commands and upper-layer implementation.
~500 lines of AOF / RDB code.
All the other stuff is tests, JSON command descriptions, and the TRE library under "deps".
If the initial development bar is relatively high, it's far, far easier to identify flaws and gaps when you have the whole thing in front of you all at once.
cf. Valkey and others
I was confused because the last time I checked on things, it was still about fostering community input and advancement but not necessarily consensus. Things have tipped back in the original direction since then. I don't think "Redis was completely built in this way since the start" is completely accurate, but also the community effort under the new governance model never got very deeply entrenched while you were away.
... just speaking as someone who sometimes has to review very long PRs, though, I feel like 25% is a roughly normal level of "signal to noise." 5,000 lines of core logic is a LOT, and the tests and dependencies do still need to be read.
EDIT: I feel like the problem, as a reviewer, is processing 4 months of intensive research/development and providing useful feedback. At that point, there's probably not much major input you can have into the core architecture or strategy, so you're probably not providing much more than a bugbot.
Sure you can? In this concrete case, Redis is very "flat" — there's the data structure implementations, and there's the commands that use them. 1+N. You could have feedback about the data structure (i.e. whether it's optimal for the use-cases); or about any of the commands (i.e. not just their impls, but also whether they're the best core API surface to lock in long-term, or even whether they're worth including at all.)
Any given feedback would necessitate fairly limited rework to address, as you're either modifying the data structure (and its tests) or a command (and its tests and docs.)
Virtually all major Redis features are a solo job of the post author.
By the way reviewers are paid good money for this and know the setup.
Now I just need a way to protect my chats from any potential discovery, and <pew pew> business’ll be easy.
Then it quickly lost its original meaning as people started using it for virtually all forms of AI-assisted coding.
@antirez: Introducing a regex capability that late in the project, for a seemingly unrelated feature, feels a bit weird? Can you explain more of your rationale on that? Thanks!
The RE component is interesting, but as commentary here has noted it seems orthogonal to the array data structure (i.e., usable on others as well). Does this not make more sense to accomplish with Lua scripting? Or if performance of Lua is an issue perhaps abstracting OP to be composable on top of any command that returns a range of values.
I say this with reverence for Antirez as the expert in this space, but some of this new feature set feels like the sort of solution that I tend to see arise from LLM-driven development; namely creation of new functionality instead of enhancement of existing, plus overcomplicating features when composition with others might be more effective.
He is not "your avg dev", and it took him 4 months with an LLM.
This is not a seal of approval for you to go and command all your developers to move to Claude code/codex/any other ai coding tool fully.
I'm looking at you - any avg CEO of a startup.
This is arguably a key quote: "Then, it was time to read all the code, line by line. ... I found many small inefficiencies or design errors ... so I started a process of manual and AI-assisted rewrite of many modules." We should not underestimate that step: reading code line by line might easily require more time than writing it from scratch.
> For high quality system programming tasks you have to still be fully involved, but I ventured to a level of complexity that I would have otherwise skipped. AI provided the safety net for two things: certain massive tasks that are very tiring (like the 32 bit support that was added and tested later), and at the same time the virtual work force required to make sure there are no obvious bugs in complicated algorithms.
I remain unconvinced by the "faster to write it by hand than read it" arguments though. My experience throughout my career is that most people, myself included, top out at a couple of hundred lines of tested, production-ready code per day. I can productively review a couple of thousand.
In particular, doing direct comparisons between metrics like that doesn't work. "Lines of code" isn't a good way to measure complexity of the code, and the amount of time it takes to review the code will vary quite a bit based on the use case.
There's a lot of diversity in what kind of code people write and just because it worked for someone else doesn't mean it will work for the kinds of problems you solve. It's anecdotal evidence that someone else found it useful, your mileage may vary.
It would absolutely NOT work for production code with critical concurrency / embedded / real-time stuff
To quote another of his posts:
> I fixed transient failures in the Redis test. This is very annoying work, timing related issues, TCP deadlock conditions, and so forth. Claude Code iterated for all the time needed to reproduce it, inspected the state of the processes to understand what was happening, and fixed the bugs.
...
> In the past weeks I operated changes to Redis Streams internals. I had a design document for the work I did. I tried to give it to Claude Code and it reproduced my work in, like, 20 minutes or less (mostly because I'm slow at checking and authorizing to run the commands needed).
From "Don't fall into the anti-AI hype" https://antirez.com/news/158
Let the LLM cook by doing the issues one by one. In the meantime I could start reviewing them: checkout, running, reading. It was definitely faster, since it also correctly linked everything, etc. Of course, once the change goes beyond that, it probably doesn't work. However, I really thought a good idea would be to check for that work and implement it according to the issue description, and update an MR once the description changes, at least as long as the MR is 1-3 lines. And even if it doesn't work, I can just discard it.
(A lot of these problems are often typos that don't even need a checkout; they come in through bigger MRs that shouldn't be blocked because of them.)
When antirez says 'I ventured to a level of complexity that I would have otherwise skipped,' I don't think you can call that a minor gain. The alternative is likely something 'good enough' that leaves the community dissatisfied for months, and then after initial design mistakes become load-bearing the ideal implementation can never be realized.
Then you need a senior to go find the 100 mistakes it made, fix them, and iterate, which is why you can't replace "natural intelligence"
And there are real mathematical reasons why computers won't be able to break through "mathematical reasoning" on their own (undecidability, etc.)
"Automatic programming"
https://antirez.com/news/159
To clarify, from TFA:
> even before LLMs the implementation was likely something I could do in four months. What changed is that in the same time span, I was able to do a lot more
The initial timeframe was 4 months, he was able to do more work within the same timeframe with LLMs.
I've been working on a database adapter for a couple of months using an LLM... I've got a couple of minor refactors still to do, then getting the "publish" to jsr/npm working... I've mostly held off as I haven't actually done a full review of the code... I've reviewed the tests and confirmed they're working, though. The hard part is that there are some features I really want when connecting from Windows to a Windows SQL Server instance that aren't available in Linux/containers. I don't think I'll ever choose SQL again, but at least I can use/access a good API with Windows direct auth and FILESTREAM access in Deno/Bun/Node.
FWIW: my final implementation landed on ODBC via Rust + FFI, so after I get the mssql driver out, I'll strip a few bits in a fork and publish a more generic ODBC client adapter, with using/dispose and async iterators as first-class features in the driver.
He's not, but his work is obviously not average.
Average dev work is plumbing and CRUDs.
I'm doing my work mostly the same way antirez does: writing a detailed spec (which is actually 80% of the hard work, even without LLMs), then where I would have written the "boring stuff" I use the LLM to "autocomplete", and then I see all the mistakes (which requires being a senior to see/fix), correct, and iterate.
It makes the work "feel" easier because we mostly skip writing the boilerplate, but it still doesn't replace coders. And companies that think they can skip training juniors (in order to later replace seniors) and still have seniors on board are making a huge mistake.
Maybe shortening the term to "auto-code" would help, though.
A quick note: if we focus only on that part of the implementation, skipping tests and persistence code (which is not huge), 4075 lines in 4 months is an average of about 33 lines per day, which is quite low.
This looks like a very useful feature. Thank you again for the reply.