How the Software Is Made — bennherrera.dev

I wrote this for a team of very smart but very young colleagues at a startup where I worked. The postscript, added in February 2026, addresses its direct relevance to current developments regarding AI.

Brief

In any software project at any stage of development the single attribute most correlated with velocity is the speed with which constructive changes can be made. It does not matter what condition the code base is in. It does not matter how far it needs to go. If your team can iterate quickly they can get it done. The inverse is also true. Regardless of all other factors, high barriers to iteration lead to inescapably low velocity.

Terms and Conditions Apply

Velocity

Velocity is the rate at which high quality features, improvements, and fixes can be incorporated into a project and delivered to its consumers. High velocity allows a team to capitalize on an opportunity within the window of maximum advantage. Low velocity leads to repeated “near miss” and “too little, too late” outcomes.

Constructive Change

A constructive change has three qualities:

1. Makes at least one thing better
2. Doesn’t make other things worse
3. Can reasonably demonstrate that 1 and 2 are not dirty lies

Iteration

Development iteration is a nested loop. The inner loop is the IC’s local development work. The middle loop delivers encapsulated chunks of that work to the project at large. The outer loop delivers the project to its consumers.

The Inner Loop

The inner loop is the developer’s local modify → build^† → debug cycle plus whatever local runs of the automated tests they do to validate their work.

The Middle Loop

The middle loop is the process required for getting a body of changes off of a developer’s machine to a context where it affects other people. That means it will be used or built on by others in the course of their work. In most of our projects this is done by merging a PR from a working branch to main, with initial exposure being our internal ‘develop’ environment.

The Outer Loop

The outer loop is the process required for hardening, validation (internal and possibly 1st party), and publication of software for use by the target audience.

And Ever the Twain Must Meet

When balancing speed vs. quality the consequences of error must be weighed against those of tardiness. This is true at each stage where iteration occurs, from a developer’s local edit to shipping a release candidate. Most companies’ ultimate delivery risks fall within the realm of bugs leading to aggravated clients and delays resulting in missed business opportunities. When it comes to catching windows of opportunity, speed is life.

Inner Loop Speed

Human working memory has a limited life span. When the inner loop has waits that exceed it people lose their train of thought. Repeated loss and re-acquisition of context is lethal to velocity. Any repeated machine-executed step that takes over 60 seconds is bad. Complete unit & integration test suites may take several minutes, but there must always be a way to isolate specific tests germane to the current task.

Some points of reference from the engine project:

- It takes ~60 seconds for a complete scratch bootstrap & build from a fresh clone, ~40 seconds for a complete code rebuild, and 10-25 seconds for an incremental build depending on what changed^‡.
- The full integration test suite takes 4 to 5 minutes, but any single test can be run and debugged in isolation.
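
As an illustration of isolating the tests germane to the current task, here is a minimal sketch using Python's standard `unittest` module. The test classes and the `select_tests` / `run_isolated` helpers are hypothetical and are not taken from the engine project; they only show the shape of the idea.

```python
import unittest


class ParserTests(unittest.TestCase):
    """Hypothetical tests standing in for one subsystem."""

    def test_parses_empty_input(self):
        self.assertEqual("".split(), [])

    def test_parses_two_tokens(self):
        self.assertEqual("a b".split(), ["a", "b"])


class RendererTests(unittest.TestCase):
    """Hypothetical tests standing in for another subsystem."""

    def test_renders_uppercase(self):
        self.assertEqual("abc".upper(), "ABC")


def iter_tests(suite):
    """Flatten a possibly nested TestSuite into individual test cases."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def select_tests(pattern, *cases):
    """Collect only the tests whose 'Class.method' name contains pattern."""
    loader = unittest.TestLoader()
    selected = unittest.TestSuite()
    for case in cases:
        for test in iter_tests(loader.loadTestsFromTestCase(case)):
            short_name = ".".join(test.id().split(".")[-2:])
            if pattern in short_name:
                selected.addTest(test)
    return selected


def run_isolated(pattern, *cases):
    """Run the matching subset and return the unittest result object."""
    suite = select_tests(pattern, *cases)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_isolated("renders", ParserTests, RendererTests)` runs one test instead of three. In practice most runners already provide this; for example, `python -m unittest -k renders` filters by name pattern without any custom code.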

Middle and Outer Loop Speed

These processes must be efficient, reliable, and flexible in the face of exigent circumstances.

CI Testing

One of the best contributors to greater velocity and reduced stress is the nice, warm security blanket of high test coverage running on CI. Unit tests are generally not enough. Reaching 100% coverage in a manner that is also 100% consistent with deployed or integrated usage is a very high bar that leads to expensive diminishing returns. Integration tests are a great accelerator of reliability. Nothing reals[sic] quite like reality. Testing under running conditions allows for internal validation checks to provide the additional service of driving automated test results as well as deployed runtime safety.

When the tests are comprehensive, PR reviews are much easier. The code is proven to work, so the rest is conversation.
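
To make the point about internal validation checks concrete, here is a small sketch, assuming a hypothetical `Inventory` component. The same `_check_invariants` call that protects a deployed run also fails an integration-style scenario the moment the state goes bad, so the scenario needs few assertions of its own.

```python
class InvariantError(RuntimeError):
    """Raised when an internal validation check fails."""


class Inventory:
    """Hypothetical component that validates its own state after every change."""

    def __init__(self):
        self._stock = {}

    def _check_invariants(self):
        # Runs in deployed code and under test alike.
        for item, count in self._stock.items():
            if count < 0:
                raise InvariantError(f"negative stock for {item!r}: {count}")

    def add(self, item, count):
        self._stock[item] = self._stock.get(item, 0) + count
        self._check_invariants()

    def remove(self, item, count):
        # Deliberately naive: no guard against removing more than is held,
        # so the invariant check is what catches the mistake.
        self._stock[item] = self._stock.get(item, 0) - count
        self._check_invariants()

    def count(self, item):
        return self._stock.get(item, 0)


def run_scenario(operations):
    """Replay (action, item, count) steps the way a running system would."""
    inventory = Inventory()
    for action, item, count in operations:
        getattr(inventory, action)(item, count)
    return inventory
```

A scenario such as `run_scenario([("add", "bolt", 1), ("remove", "bolt", 2)])` raises `InvariantError` without the test author having predicted that specific failure.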

Qualifying “Constructive”

It hardly needs to be said that rapid production of changes that are not fully constructive is wheel spinning at best and self destruction at worst.

Makes at least one thing better
1. Does it do what it says on the tin?
2. Is it a direct improvement or does it set one up?
Doesn’t make other things worse
1. Regressions and side effects are bad
2. Opacity and excessive complexity are bad - impenetrable code is tech debt
  1. There must be some flexibility here, particularly for new systems or features.
  2. When working in a living code base there has to be a reasonable bar for “good enough for this step” to allow continuous improvement.
  3. Improving and refactoring a system over a series of PRs is natural. Requiring that it be maximally right on the first try is highly counterproductive.
3. Excess surface area for the current development context is bad - no system refactors on release candidates
4. Insufficient surface area for the current development context is bad - do not do a targeted hack on a developer-facing branch and move on. Duct tape is bad. Fix the real problem.
5. Uncommunicated breaking changes are bad - busting other people’s workflows is a great path to pariah status.
Can reasonably demonstrate that 1 and 2 are not dirty lies
1. Does it have test coverage?
2. If the project is in early development or high in tech debt^§, does the change include a step toward improving testing & testability?
3. Optionally, can the developer provide a list of steps taken to verify the work?
  1. At varying stages of a project it may make much more sense to satisfy #3 by listing the manual tests done in a PR. This would be true for a highly exploratory phase where solidifying rapidly changing state in tests would be wasteful.
  2. At other stages it would be grossly negligent. This would be the case for a mature, complex project with lots of subtle gotchas.

“Constructive” is contextual and does not mean “optimal.” When evaluating whether a change qualifies as constructive there is one crucial question: “How does the cost of amending the change compare to the cost of delaying the change?”

Examples:

A PR targeting a release candidate
- High cost of amending the change
  - High net clock/calendar time to cycle a release candidate
  - Requires many person hours, may include steps not fully under internal control (e.g. 1st party submission & acceptance)
  - Late-breaking changes may require special handling upstream, resulting in reputation loss and organizational ‘karmic debt’
  - Could require rollback, delaying feature delivery until the next release cycle
- Lower cost of delaying the change
  - Delays can be up to a significant fraction of the net release cycle turnaround time and still be cheaper
  - During PR review only a developer and a reviewer are involved - much cheaper than spinning a new release candidate.
- Net result: high stringency results in higher velocity and better outcomes
A PR targeting a develop branch or an internal tool
- Low cost of amending the change
  - “You broke it you fix it” - ideally, the developer who broke it amends the error
  - Internal delivery process entirely within the control of a single team
  - Any reputational costs are internal and apply to one developer or team
- High cost of delaying the change
  - It’s tautological that forward movement is a sequence of steps. If you stop marching you don’t get anywhere.
  - Low velocity leads to late delivery^‖ of features, fixes, and maintenance.
  - Late features lead to missed windows of business opportunity.
- Net result: looser stringency results in higher velocity and better outcomes

It is critical to match the level of stringency to the delivery context in order to maximize velocity.
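
The amend-versus-delay question can be sketched as simple arithmetic. Everything below is an illustrative assumption: the `DeliveryContext` type, the hour figures, the defect probability, and the simplification that extra review would catch the defect entirely. It is a way to frame the comparison, not a calibrated model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryContext:
    """Hypothetical description of where a PR is headed."""
    name: str
    amend_cost_hours: float          # cost of fixing a bad change after it lands
    delay_cost_hours_per_day: float  # cost of each day the change is held back


def expected_amend_cost(ctx, defect_probability):
    """Expected cost of merging now and fixing later if it turns out broken."""
    return defect_probability * ctx.amend_cost_hours


def delay_cost(ctx, extra_review_days):
    """Cost of holding the change for additional review."""
    return extra_review_days * ctx.delay_cost_hours_per_day


def recommend(ctx, defect_probability, extra_review_days):
    """Pick the cheaper option under the stated simplifications."""
    if expected_amend_cost(ctx, defect_probability) > delay_cost(ctx, extra_review_days):
        return "stricter review"
    return "merge and iterate"


# Made-up numbers chosen only to mirror the two examples above.
release_candidate = DeliveryContext("release candidate", 200.0, 4.0)
develop_branch = DeliveryContext("develop branch", 4.0, 8.0)

print(recommend(release_candidate, 0.2, 2))  # 40 expected hours vs 8 hours of delay
print(recommend(develop_branch, 0.2, 2))     # 0.8 expected hours vs 16 hours of delay
```

With these figures the release candidate favors stricter review and the develop branch favors merging and iterating, which matches the two net results listed above.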

PR Process and Code Review

The purpose of code review is bidirectional knowledge transfer. Proffering a PR is a contract, and the biggest stipulation is “if you make a change, you own the consequences.” Owning the consequences means being on the hook for dropping everything to fix it if you broke it. It also means being responsible for continuing to improve things that start out as functional but rough.

The purpose of a code review is not enforcement or dissection. There should never be a hard technological lockout preventing a developer from merging changes. Stated policy should be enough to guide behavior. There’s an old leadership adage that says “if you have to use your title as a reason you have failed.” The same principle applies to hard merge lockouts. If a team must compel compliance via technology something is wrong.

It is reasonable to be concerned about:

A pattern of errors after merging
- Does test coverage and automation need to be prioritized?
Consistent disagreements on whether a change set should be merged
- Does everyone fully understand the consequences of the go/no-go options and the risks thereof?
Lack of follow-up on refinements
- Is the IC overtasked?
Making out-of-policy self merges
- Was it an emergency hot fix?

These indicate a problem outside of the PR process. It may be related to infrastructure, workloads, team cohesion, or IC performance. Hampering iteration as a remedy is a counterproductive bandaid.

The general outline of an efficient PR process:

Contributor does some work
They test their work locally, including running the same tests CI will run
- Expand test coverage if necessary or ensure existing work is already covered
- If the project is in early stages or has high debt the testing automation may be less rigorous than optimal, but they will be on the hook to improve that over time
Contributor opens a PR and documents changes in description
- The project should have a PR description template with a merge requirements checklist
- There should be a ticket associated with the PR and linked in the description
PR gets reviewed
- Reviewer seeks to understand the shape of the changes
- Reviewer calls out hazards, maybe provides additional information
- Reviewer may request changes
  - If a PR is one of a series working on the same topic most changes should be deferred to a subsequent PR.

This process keeps development work flowing, minimizing back and forth with re-reviews. As it is not burdensome, there is much less of a tendency for reviews to be delayed. Additionally, tech debt can be ameliorated incrementally in tandem with feature work in a “clean up as you go along” fashion.

Without going into exhaustive examples, this process is how things are done on the engine project.

All contributors have unrestricted access for pushing their changes.
- Emergencies can be handled by a single developer
Reviews are efficient
- They don’t get put off and piled up
Velocity is high
- Tech debt is managed
- Delivery times correspond primarily to task complexity
The product is solid
- Consistent 4 star consumer ratings

Investing in Speed

Any technical effort invested in improving speed moves an enormous lever. Improving tooling, acquiring utility software, improving network performance, or anything else that lowers round trip time adds up quickly and pays off handsomely. When it comes to investing in speed, it is expensive to be cheap.

Summary

I have three core beliefs. The first two involve the sanctity of human rights and the value of critical thinking. The third is that iteration speed is life. Software is infinitely malleable. If you can change it quickly, you can win with intentionality by making it into what you need in time to capitalize on opportunities. If you can’t change it quickly you can only win by blind luck.

Post Script, February 2026

A great deal has happened since this document was written. The rapid adoption of AI coding has been a flash flood down the canyons of all industries that touch on software. First as “auto-complete on steroids”, then as basic vibe coding tools, then full-blown agentic coding systems. Expectations of productivity gains measured in orders of magnitude drove huge RIFs or allowed pent-up desire for staff reductions to be framed as AI-related. Many companies that were originally tepid on AI adoption saw competitors who appeared to be keeping 100% capability for 25% of the cost. They felt enormous pressure to cut or risk getting left behind to die. It wasn’t just FOMO (fear of missing out), it was FOLD (fear of life or death). Most everyone placed the same bet at the same time.

At first the bets appeared to be paying off. Agentic systems could produce unprecedented volumes of code. They seemed capable of writing their own tests to verify it, and all that was needed of the remaining senior engineers was to give smart directives and babysit. Code was free; patience for new features was a yesterday problem. But the stats started telling a different story. Audits of machine-produced code showed disturbing signs of unhealthy practices: repeated revisions of reviewed & merged PRs for post-hoc fixups, and porous test cases. Our industry has spent the last year accumulating tech debt at machine speed after the eager RIFs alienated if not outright incinerated large fractions of irreplaceable institutional knowledge. The expected order of magnitude productivity gain did not materialize. In its place we got a hissing bomb.

This document I wrote for a brilliant but young group of engineers is highly relevant to how we got here. The section titled Qualifying “Constructive” explains it in the first sentence: “It hardly needs to be said that rapid production of changes that are not fully constructive is wheel spinning at best and self destruction at worst.”

Agentic coding as most commonly practiced has made it not just easy, but inevitable to produce a rapid series of changes that do not qualify as fully constructive.

Makes at least one thing better

- Gets the most attention and excitement
- Easiest to check, this requirement is usually satisfied

Doesn’t make other things worse

- Judging this requires understanding what shapes “worse” could take
- LGTM reviews from engineers forced to speed read generated code
- Strong signal to merge as many PRs as possible (don’t slow the magic machine!)
- This requirement is a frequent miss

Can reasonably demonstrate that 1 and 2 are not dirty lies

- Hard to prove, gets less attention (unit and integration tests are peak unsexy)
- The conversation that could resolve this is NOT a conversation - it’s a prompt loop with all of the hazards entailed in AI interactions.
- Frequently passes process while actually being itself a dirty lie.

Meat Loaf may have sung “Two Out of Three Ain’t Bad”, but even he’d agree one out of three is awful.

Not only do the constructive change criteria go unmet - the implicit contract of the proffered PR becomes a blank document. There’s no one with real visibility of the consequences on the hook for the failure cases. Agents don’t hear from angry product managers or feel the warning message of getting put on a PIP when customer complaints related to their changes pile up.

As an industry, we’re in a hole, and we are still digging. There have been predictions (see Sources) of a coming “Reckoning”, and that’s less prophecy than arithmetic. That will be the moment when enough things go enough wrong that everyone collectively realizes their bets lost. Then comes the attempt at a recovery, and to make it through that we’re going to need at least these three things:

1. Get all the talent that was sent packing back between keyboards and chairs.
2. Repair the engineering / management trust rift.
3. Ironically enough, AI coding systems.

The first two are hilariously pure examples of “Simple? Yes. Easy? No.” Let’s address these in order at a little more length.

Recovering the talent and institutional knowledge. Much of the knowledge is simply lost to the organizations forever. When the hiring frenzy starts very few will be able to be tempted back to the company that RIFed them barring extraordinary inducements. And those inducements still leave item 2 to deal with. There will be a lot of musical chairs. There will not be enough people capable of and willing to do the recovery work. We’ll touch on this again in a minute.

Repair the trust rift. The scale of this problem easily matches that of the code crisis itself. You can induce someone to work for you if you crank up the comp package enough. But that can never make them trust you. This is going to require organizational leadership to take demonstrable action rather than make performative gestures. Like employment contracts with guaranteed minimum terms and buyout clauses for early termination. Like representation at the board level where decisions for 50% RIFs get made. Among most engineers, hearing a sincere admission of error would get more mileage than adding another 10K to the offer. On the engineering side there will need to be people willing to honestly engage in the repair instead of primarily seeking payback. And in the middle there are going to be the unsung heroes of many a company - the technical managers. Technical managers with reputations for integrity, who engender trust, are going to be worth their weight in platinum.

The fact is that the mountain of shifted dirt that needs to go back in the hole is too massive to be moved by organizations split into armed camps and too important to neglect. Imagine trying to pull off the massive, international Y2K remediation project but instead of having well-regarded engineers with a sense of mission you had to get it done with a crew of terminally suspicious, aggrieved mercenaries. We need cohesion to successfully navigate the Recovery. Full stop, no notes.

AI coding systems. We’re going to need AI. Even if every engineer could be magically slotted back where they were with no hard feelings, we’d need AI. Too much that is too novel has been created in too short a time. We’re going to have to reshape our use of AI not only during the recovery, but going forward.

We need to use it for what it is good for and have the engineers do what they are good at. There’s been a popular, accurate complaint about how AI gets used: “I want a robot that will do my dishes and laundry so I have more time to make art, not the other way around.” It turns out that for all the perfect style mimicry of generative AI, it is actually terrible at capital A Art (see Psychology Side Bar below).

Unlike household applications where “doing my dishes and laundry” requires solutions to outrageously difficult and expensive physical problems (hats off, Boston Dynamics), the software engineering equivalent is already possible. There are three things that engineers do out of discipline, not love, and AI can absolutely accelerate them.

Reading reams of novel code
- Generating logic flow and layer diagrams
- Producing natural language descriptions of subsystems
Generating tests and test data
- Taking definitions of risk modes and coverage patterns and grinding out test case code
- Generating test data both perfect and subtly, specifically flawed
- Writing “fuzzer” tests
Updating documentation
- Re-running code reading passes with previous context and new diffs to keep human-facing doc up to date
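
As a sketch of the kind of test grinding described above, here is a small example with valid test data, subtly flawed test data, and a seeded fuzz loop. The `parse_version` function and the generators are hypothetical stand-ins chosen for brevity; the flaws listed are examples, not an exhaustive risk model.

```python
import random


def parse_version(text):
    """Hypothetical function under test: parse 'MAJOR.MINOR.PATCH' into ints."""
    parts = text.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected three components: {text!r}")
    numbers = []
    for part in parts:
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"non-numeric component in {text!r}")
        numbers.append(int(part))
    return tuple(numbers)


def make_valid_version(rng):
    """Generate well-formed input."""
    return ".".join(str(rng.randint(0, 99)) for _ in range(3))


def make_flawed_version(rng):
    """Generate input that is almost right, with one specific flaw."""
    major, minor, patch = (str(rng.randint(0, 99)) for _ in range(3))
    flaws = [
        f"{major}.{minor}",           # missing component
        f"{major}..{patch}",          # empty component
        f"{major},{minor}.{patch}",   # wrong separator
        f" {major}.{minor}.{patch}",  # leading whitespace
        f"-{major}.{minor}.{patch}",  # negative number
        f"{major}.{minor}.{patch}a",  # trailing junk
    ]
    return rng.choice(flaws)


def fuzz_parse_version(rng, iterations):
    """Throw random strings at the parser; only ValueError is an acceptable failure."""
    alphabet = "0123456789.,- a"
    accepted = rejected = 0
    for _ in range(iterations):
        text = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 8)))
        try:
            result = parse_version(text)
        except ValueError:
            rejected += 1
            continue
        assert len(result) == 3 and all(n >= 0 for n in result), text
        accepted += 1
    return accepted, rejected
```

Seeding the generator, for example with `random.Random(0)`, keeps a failing case reproducible. The engineer's part is deciding which risk modes matter; producing hundreds of cases per mode is the part that can be delegated.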

What this boils down to is using AI for the things all engineers should be doing more of but don’t. It’s more than simple aversion. They frequently are under time pressure to focus on the product work and skimp on the invisible stuff.

In the case of navigating the Recovery, we need AI for these things desperately. There’s too much code to read. Getting overviews will massively accelerate architecture comprehension so engineers can zero in on where the problems are and apply direct code reading where it matters the most. Once the problems are identified and fixed we will want watertight tests to prevent regression, and we’re going to need a lot of them. Finally, we’re going to need to leave documentation for subsequent humans to avoid a repeat of the institutional knowledge loss catastrophe.

The Reckoning is coming. And we have to survive the Recovery. To do that we’re going to need to be on our collective A game. It’s going to take people, it’s going to take processes, it’s going to take cohesive teams. As Tank famously said in The Matrix “We’ve got a lot to do. We gotta get to it.”

Sources

AI Tech Debt
- 4x Velocity, 10x Vulnerabilities: AI Coding Assistants Are Shipping More Risks — Apiiro
- GenAI Code Security Report — Veracode
- AI-Generated Code Leading to Expanded Technical Security Debt — Dark Reading
- AI-Generated Code Creates New Wave of Technical Debt — InfoQ
Reckoning and Recovery
- AI-Led Job Disruption Will Escalate, While Fears Of A Job Apocalypse Are Overstated — Forrester
- Half of AI-Driven Layoffs Will Reverse by 2027 — Gartner (via Metaintro)
- AI Layoffs to Backfire: Half Quietly Rehired at Lower Pay — The Register
- Companies Are Laying Off Workers Because of AI’s Potential—Not Its Performance — Harvard Business Review
- Global Tech-Sector Layoffs Surpass 244,000 in 2025 — Network World
- Tech Layoffs 2025: A Running List — TechCrunch

Psychology Side Bar

The current workflows for using agentic systems absolutely suck for the motivational patterns and reward loops that drive talented engineers. Transitioning engineers from powerful makers (positive energy cycle) to anal-retentive micromanagers of fast but untrustworthy agents (soul crushing drain) is a setup for failure.

The use of agentic coding systems to do the building of interesting things is exactly the ‘robots doing art so we can do the chores’ mistake. We don’t need robots to build the cool stuff so we have more time to read someone (something) else’s code and hunt for errors. No engineer likes reading code. It’s a necessary discipline, but it is strictly a toll on the road to entering the glorious cyberspace in our skulls where we manipulate lattices of crystalline logic. Babysitting AI generated code is like taking laps around the toll booth. It’s pulling through, getting a glimpse of the shining city at the other end of the bridge, then turning around to go back and pay the toll again.

Footnotes

† Don’t be smug now, Python people.

‡ Times given are for Windows/i9 x86_64. On macOS/arm64 it’s around twice as fast.

§ You can’t control the past when joining a project.

‖ Or non-delivery.