Writing Code That Works Is Easy -- Keeping It Working Is the Hard Part
The Mars Climate Orbiter was destroyed in the Martian atmosphere in 1999 because one team's software produced thrust data in metric units while another's expected imperial. Nobody caught the mismatch. Knight Capital lost $440 million in 45 minutes in 2012 because a deployment error activated dead code that flooded the market with unintended orders at machine speed. Healthcare.gov collapsed on its 2013 launch day because nobody had load-tested it at realistic traffic levels. Therac-25, a radiation therapy machine, killed three patients and injured three more in the 1980s because of a race condition in its software that occurred only when operators typed commands quickly.
Every one of these disasters was caused by code that "worked" in some environment. It compiled. It passed whatever tests existed. Someone signed off on it. But it failed catastrophically in production because writing code that works once on your machine is not the same discipline as building software that works reliably, at scale, over time, maintained by a team of people who were not the original authors.
That distinction -- between code and software engineering -- is what this topic is about. Software engineering is the discipline of building software that does not fail, or fails gracefully when it does.
Code vs. Software Engineering
A script you write at 2 AM to rename 500 files is code. Gmail is software engineering. The difference is not complexity -- it is everything around the code itself.
Code solves a problem right now. Software engineering solves it for the next five years, ten developers, and one hundred times the traffic. The additional concerns that turn code into engineering include: testing (does it still work after the last change?), documentation (can a new developer understand it without asking the author?), monitoring (is it running correctly in production right now?), deployment (can you ship a new version without breaking the old one?), security (is it safe from malicious input?), and maintenance (can you fix a bug in the payment system without accidentally breaking the search feature?).
None of these concerns produce visible features. Users never see your test suite, your monitoring dashboard, or your deployment pipeline. They see the result: an application that works reliably, updates without downtime, recovers from errors without losing data, and does not get hacked. The engineering is invisible. Its absence is extremely visible.
Google's internal definition: "Software engineering is programming integrated over time." A line of code might be correct today, but software engineering asks: will it still be correct when the database has 100 million rows instead of 100? When a third developer modifies it without reading the original author's context? When the third-party API it depends on changes its response format? Engineering means building for the future you cannot fully predict.
Agile: How Modern Software Teams Work
Before Agile, the dominant methodology was Waterfall: gather all requirements upfront, design the complete system, build it, test it, ship it. Each phase finishes before the next begins. The problem: in software, you cannot specify everything upfront. Requirements change. Users do not know what they want until they see a working prototype. Technology shifts mid-project. By the time Waterfall delivers a finished product, the requirements may have changed so much that the product is obsolete on arrival.
The Agile Manifesto, written in 2001 by seventeen software developers, proposed a different approach: deliver working software in small increments, get feedback after each increment, and adjust course constantly. Instead of a 12-month plan executed blindly, Agile teams work in short cycles, each producing something usable.
Waterfall. Planning: everything defined upfront in detailed specification documents. Flexibility: changes are expensive and resisted -- they require renegotiating the entire plan. Delivery: one big release after months or years of development. Risk: high -- you discover problems at the end, when fixing them is most expensive. Feedback: from users only after the entire product is complete. Best for: fixed-scope projects with stable, well-understood requirements (bridge construction, regulatory compliance systems).
Agile. Planning: high-level vision upfront, detailed planning one sprint at a time. Flexibility: changes are expected and welcomed -- they are incorporated into the next sprint. Delivery: working software every sprint, typically every two weeks. Risk: lower -- problems surface early, when they are cheap to fix. Feedback: from users after every sprint. Best for: software products where requirements evolve, user needs are discovered iteratively, and market conditions change.
The Sprint Cycle
The most common Agile implementation is Scrum, which organizes work into sprints -- fixed time periods, usually two weeks. Each sprint follows the same cycle.
Sprint Planning (2-4 hours): the team selects work items from a prioritized backlog -- a ranked list of everything the product needs. Each item is written as a user story: "As a customer, I want to filter search results by price, so that I can find products within my budget." User stories keep the focus on user value, not technical implementation.
Daily Standups (15 minutes): every morning, each team member answers three questions. What did I complete yesterday? What will I work on today? What is blocking me? The purpose is not reporting -- it is unblocking. If someone is stuck, the team can help immediately instead of discovering the blockage at the end of the sprint.
Sprint Review (1-2 hours): the team demonstrates the completed work to stakeholders. This is where feedback happens. "The search filter works, but users want to combine multiple filters." That feedback becomes a new item in the backlog for a future sprint.
Retrospective (1-2 hours): the team asks three questions. What went well? What could be improved? What will we commit to changing? This is how the process itself evolves. If code reviews are taking too long, the team might agree to smaller pull requests. If deployments are scary, they might invest in automated testing.
Testing: The Safety Net That Catches What You Missed
Untested code is a liability. Every time you change untested code, you are gambling that your change did not break something. Testing removes the gamble. A well-tested codebase can be modified with confidence because the test suite will scream immediately if something breaks.
Tests are organized into a pyramid, with many small fast tests at the bottom and few large slow tests at the top.
Unit tests test a single function in isolation. Given specific inputs, does the function produce the expected outputs? A unit test for a calculateTax function would verify that a $100 item with 8% tax returns $108. Unit tests are fast (thousands run in seconds), reliable (they test pure logic with no external dependencies), and cheap to write. A project should have hundreds or thousands of them.
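A sketch of such a unit test (the `calculate_tax` implementation, its rounding behavior, and the plain-assert test style are illustrative assumptions, not a real codebase):

```python
# Hypothetical tax helper and its unit test; names and rounding behavior
# are illustrative.

def calculate_tax(price: float, rate: float) -> float:
    """Return the price with tax applied, rounded to cents."""
    return round(price * (1 + rate), 2)

def test_calculate_tax():
    assert calculate_tax(100, 0.08) == 108.00   # $100 item, 8% tax -> $108
    assert calculate_tax(0, 0.08) == 0          # edge case: free item
    assert calculate_tax(100, 0) == 100         # edge case: no tax

test_calculate_tax()  # a runner like pytest would discover and run this automatically
```

Because the function is pure logic with no external dependencies, thousands of tests like this run in seconds.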
Integration tests test how multiple components work together. Does the API endpoint correctly query the database, apply business logic, and return the right response? Integration tests are slower because they involve real databases, network calls, or file systems. They catch bugs that unit tests miss -- like when two correctly implemented components communicate incorrectly.
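A minimal sketch of the idea, using an in-memory SQLite database so the query function is exercised against a real database rather than a mock (the schema and function are invented for illustration):

```python
# Integration-test sketch: the function under test and the database must
# agree on schema and SQL -- exactly the kind of mismatch that a unit
# test with mocked dependencies would miss.
import sqlite3

def find_products_under(conn, max_price):
    cur = conn.execute(
        "SELECT name FROM products WHERE price <= ? ORDER BY price", (max_price,)
    )
    return [row[0] for row in cur.fetchall()]

def test_find_products_under():
    conn = sqlite3.connect(":memory:")  # a real, if temporary, database
    conn.execute("CREATE TABLE products (name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [("pen", 2.5), ("notebook", 8.0), ("monitor", 199.0)],
    )
    assert find_products_under(conn, 10) == ["pen", "notebook"]

test_find_products_under()
```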
End-to-end (E2E) tests test the entire application as a user would experience it. A browser automation tool like Playwright or Cypress clicks buttons, fills forms, and verifies that the right things appear on screen. E2E tests catch real user-facing bugs but are slow (minutes per test), brittle (they break when the UI changes), and expensive to maintain. You should have relatively few of them, covering only the most critical user flows -- signup, purchase, core feature usage.
Test-Driven Development (TDD)
TDD inverts the normal workflow: instead of writing code first and tests later, you write the test first, watch it fail, then write the minimum code to make it pass. The cycle is Red (failing test) -> Green (passing test) -> Refactor (clean up the code while keeping tests green). TDD forces you to think about what the code should do before thinking about how to implement it. Not every team practices TDD strictly, but the most disciplined engineers use it for complex business logic where correctness is critical.
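The cycle can be sketched with a hypothetical `slugify` helper (the function and its test are invented for illustration):

```python
# TDD sketch: the test is written first (Red), then just enough code to
# pass it (Green), then cleanup while the test stays passing (Refactor).
import re

# Red: this test exists before the implementation does, and fails.
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Already--clean  ") == "already-clean"

# Green: the minimum implementation that makes the test pass.
def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

test_slugify()  # Refactor: restructure freely while this stays green
```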
When the Ariane 5 rocket exploded 37 seconds after launch in 1996, the cause was a software integer overflow -- a 64-bit floating-point number was converted to a 16-bit integer, and the number was too large to fit. The specific code had been reused from Ariane 4 without retesting against Ariane 5's different flight trajectory. A single unit test verifying the conversion's range limits would have caught it. The explosion destroyed $370 million worth of satellite payload. Testing is not bureaucracy. It is the difference between software that works and software that explodes.
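The failure mode can be sketched in a few lines (the velocity values are illustrative, not actual flight data):

```python
# A 64-bit float narrowed to a 16-bit signed integer overflows unless
# the range is checked first.

INT16_MIN, INT16_MAX = -32768, 32767

def to_int16_checked(value: float) -> int:
    """Narrow to a signed 16-bit integer, raising instead of silently wrapping."""
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return result

to_int16_checked(30000.0)    # within range: fine
# to_int16_checked(64000.0)  # raises OverflowError -- the range check the reused code lacked
```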
Code Review: Catching What Tests Miss
Automated tests catch mechanical errors -- wrong calculations, broken logic, failed validations. Code review catches architectural errors, readability problems, security vulnerabilities, and knowledge gaps that no test can detect.
The mechanism is the pull request (PR). A developer finishes a feature on a separate branch, pushes the code, and creates a PR requesting that it be merged into the main codebase. One or more teammates review the changes: reading the diff (the line-by-line differences), checking the logic, suggesting improvements, and eventually approving or requesting changes. Only after approval does the code get merged.
Code review serves four purposes beyond catching bugs. First, knowledge sharing: every reviewer learns how a new part of the system works. If the original author leaves the company, others can maintain their code. Second, consistency: reviewers enforce team conventions -- naming patterns, architectural patterns, error handling approaches. Third, mentorship: senior developers teach junior developers through review comments, explaining not just "change this" but "change this because." Fourth, security: a second pair of eyes catches vulnerabilities that the author was too close to see -- SQL injection, missing authentication checks, leaked secrets.
The most important principle of code review: reviews are collaboration, not criticism. "This function is doing too much -- could we split it into two smaller functions?" is productive. "This code is bad" is not. Teams that treat reviews as adversarial end up with developers who avoid submitting code for review, which defeats the purpose entirely.
DevOps and CI/CD: From Push to Production
In the old world, "development" and "operations" were separate departments. Developers wrote code and threw it over the wall to operations, who figured out how to deploy and run it. DevOps merges these responsibilities: the people who build the software also deploy, monitor, and operate it. "You build it, you run it" is the DevOps philosophy.
The technical implementation of DevOps is the CI/CD pipeline -- an automated sequence of steps that takes code from a developer's push to production without manual intervention.
Continuous Integration (CI) means every developer merges their code into the main branch frequently -- at least once a day. Each merge triggers automated tests. If tests fail, the merge is blocked. This prevents the nightmare scenario of five developers working in isolation for three weeks, then spending another week resolving merge conflicts and broken integrations.
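One common shape for such a pipeline, sketched as a GitHub Actions workflow (the file path, job names, and commands are illustrative; GitLab CI, Jenkins, and others express the same idea differently):

```yaml
# Illustrative CI workflow; would live at .github/workflows/ci.yml.
name: CI
on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: pytest   # a failing test here blocks the merge
```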
Continuous Deployment (CD) means that code which passes all automated checks is automatically deployed to production. No manual "deploy" button. No scheduled release windows. As far back as 2011, Amazon reported deploying new code to production every 11.7 seconds on average. This sounds risky, but the logic is sound: small, frequent changes are safer than large, infrequent ones. If you deploy one small change and something breaks, you know exactly what caused it. If you deploy 200 changes at once, finding the broken one is a nightmare.
The mantra that captures this philosophy: "If deploying feels scary, you are not deploying often enough."
Technical Debt: Shortcuts That Compound
Technical debt is the accumulated cost of shortcuts and compromises in a codebase. Like financial debt, it is sometimes worth taking intentionally, but it always accrues interest. The interest comes in the form of slower development speed, more bugs, and harder maintenance.
Examples of technical debt: hard-coding values that should be configurable. Copying and pasting code instead of creating a reusable function. Skipping tests to meet a deadline. Using a quick hack instead of a proper solution. Each of these saves time now but costs more time later -- every developer who touches that code in the future must work around the shortcut, or must fix it before they can make their own change.
The dangerous thing about technical debt is that it is invisible to non-engineers. The product looks the same. Features still ship. But velocity gradually slows as developers spend more time navigating around accumulated hacks and less time building new features. A codebase with heavy technical debt feels like running through mud -- every step takes twice the effort.
Refactoring is the process of paying down technical debt: restructuring existing code without changing its behavior. Renaming a confusingly named variable. Splitting a 500-line function into smaller, focused functions. Replacing a brittle hack with a proper implementation. The "boy scout rule" says: leave the code cleaner than you found it. If every developer improves one small thing in every file they touch, the codebase improves continuously without dedicated "refactoring sprints" that never get prioritized.
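A small sketch of a behavior-preserving refactor (both functions are invented for illustration):

```python
# Before: one function mixes filtering, summing, and formatting.
def report(xs):
    t = 0
    for x in xs:
        if x >= 0:
            t += x
    return "total: " + str(t)

# After: small, focused functions with descriptive names. Behavior is
# unchanged, so an existing test suite would stay green.
def sum_non_negative(values):
    return sum(v for v in values if v >= 0)

def format_report(total):
    return f"total: {total}"

def report_refactored(values):
    return format_report(sum_non_negative(values))

assert report([3, -1, 4]) == report_refactored([3, -1, 4])  # behavior preserved
```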
Architecture Decisions: The Choices That Shape Everything
Some decisions in software engineering are easily reversible -- choosing a CSS framework, picking a state management library, structuring a folder layout. Others are extremely expensive to reverse. These are architecture decisions, and they deserve deliberate thought and documentation.
Monolith vs. microservices. Relational database vs. document store. Synchronous communication vs. event-driven architecture. Build a feature in-house vs. buy a third-party service. Each of these decisions has cascading consequences for years. Choosing a document database for data that turns out to be heavily relational means either living with inefficient queries forever or migrating the entire data layer -- a project that can take months.
Architecture Decision Records (ADRs) are short documents that record what was decided, what alternatives were considered, and why the chosen option won. They are not bureaucracy -- they are context preservation. When a developer in two years asks "why did we build this ourselves instead of using a third-party service?", the ADR explains the reasoning. Without it, the only answer is "nobody remembers," which often leads to relitigating the same decision.
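A minimal ADR might look like this (the number, date, and decision are invented for illustration):

```text
ADR-007: Build search on PostgreSQL full-text search

Status: Accepted (2024-03-12)

Context: Search must run over customer data that cannot leave our VPC
for compliance reasons; the expected index size is under 10 GB.

Decision: Use PostgreSQL full-text search rather than a separate
search system.

Alternatives considered: Hosted search service (rejected: data
residency); self-hosted Elasticsearch (rejected: operational cost for
a small index).

Consequences: No new infrastructure to operate; we accept weaker
relevance ranking and will revisit if the index grows past 10 GB.
```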
Answers to Questions People Actually Ask
Is software engineering the same as programming? Programming is a core skill within software engineering, like swinging a hammer is a core skill within carpentry. But carpentry also involves reading blueprints, selecting materials, estimating timelines, and ensuring the structure meets building codes. Software engineering similarly encompasses testing, architecture, deployment, collaboration, and maintenance. You can be a programmer without being a software engineer. You cannot be a software engineer without being a programmer.
Do I need a computer science degree? For software engineering roles at most companies: no. Self-taught developers, bootcamp graduates, and career changers fill engineering roles at companies of every size, including major tech companies. What you need is demonstrable skill: a portfolio of projects, the ability to pass technical interviews, and ideally some open-source contributions or professional experience. A CS degree helps (it provides theory that self-taught developers often lack), but it is neither necessary nor sufficient. Some of the strongest engineers have no degree, and some degree holders cannot build anything.
How do teams decide what to build next? In well-run teams, a product manager maintains a prioritized backlog based on user research, business metrics, and strategic goals. Engineers provide input on technical feasibility and effort estimation. The final priority is typically based on impact (how many users benefit or how much revenue is generated) divided by effort (how many developer-weeks the feature requires). Features with high impact and low effort ship first. Features with low impact and high effort often never ship -- and that is the right decision.
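The impact-over-effort arithmetic can be made concrete (the feature names and scores are invented for illustration):

```python
# Toy sketch of impact-over-effort prioritization.
features = [
    {"name": "price filter",  "impact": 8, "effort_weeks": 2},   # ratio 4.0
    {"name": "dark mode",     "impact": 3, "effort_weeks": 1},   # ratio 3.0
    {"name": "full redesign", "impact": 6, "effort_weeks": 12},  # ratio 0.5
]

# Highest impact per developer-week ships first.
ranked = sorted(features, key=lambda f: f["impact"] / f["effort_weeks"], reverse=True)
print([f["name"] for f in ranked])  # price filter first, full redesign last
```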
What is a "10x engineer," and is it real? The myth is that some engineers are ten times more productive than average. The reality is more nuanced. Individual coding speed varies by 2-3x at most. But some engineers make the entire team more productive: they design systems that prevent bugs, they write documentation that saves everyone time, they review code in ways that level up junior developers, and they make architecture decisions that avoid months of wasted work. The "10x engineer" is real, but the multiplier comes from leverage, not personal speed.
The takeaway: Software engineering is the difference between code that works on your laptop and systems that work reliably in production for millions of users. Its practices -- Agile planning, automated testing, code review, CI/CD, architecture documentation, and technical debt management -- exist because the industry learned from decades of catastrophic failures what happens without them. These practices are not overhead. They are the reason modern software is as reliable as it is. Learning to write code takes months. Learning to engineer software takes years. The investment pays for your entire career.
