Module 09 · Delivery · All tracks

Build · CI/CD · DevOps

DevOps is a culture before it's a toolchain: shrink the distance between writing code and running it safely in production. This module covers both the philosophy and the machinery.

⏱ 75 min deep read 🎯 11 sections 📊 3 diagrams

By the end you'll be able to explain, with conviction:

CI vs Continuous Delivery vs Continuous Deployment, and why each shortens the feedback loop.
Deployment strategies that ship without downtime or big-bang risk.
What Docker and Kubernetes actually solve, and the three pillars of observability.

1 · What CI and CD really mean

Three acronyms, often blurred. Keeping them distinct is the first signal of fluency.

Continuous Integration — developers merge small changes to a shared main branch frequently, and every merge triggers an automated build and test run. The goal is catching integration problems within minutes, not at a painful end-of-sprint merge. CI is the foundation everything else stands on.

Continuous Delivery — every change that passes CI is automatically prepared for release, so you could deploy at any moment with a button press. Continuous Deployment goes one step further: passing changes deploy to production automatically, no human gate. The throughline is the same as Agile (Module 02): shrink the feedback loop between writing code and learning whether it works in production.
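The difference between the two CDs comes down to one gate, which can be sketched in a few lines. This is an illustrative policy only; `Mode` and `should_deploy` are hypothetical names, not part of any CI product.

```python
from enum import Enum


class Mode(Enum):
    CONTINUOUS_DELIVERY = "delivery"      # always releasable, a human presses the button
    CONTINUOUS_DEPLOYMENT = "deployment"  # no human gate


def should_deploy(ci_passed: bool, mode: Mode, human_approved: bool = False) -> bool:
    """Decide whether a change goes to production (illustrative policy only)."""
    if not ci_passed:
        # CI is the foundation: a red build never ships in either mode.
        return False
    if mode is Mode.CONTINUOUS_DEPLOYMENT:
        return True
    # Continuous Delivery: the change is ready, but waits for a person.
    return human_approved
```

In both modes a failed CI run blocks the release; only the final approval step differs.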

💬 Interview angle

"CI is automatically building and testing every merge so integration issues surface in minutes. Continuous Delivery keeps the app always releasable with one click; Continuous Deployment removes even that click. It's all one idea — shorten the loop from commit to running software."

2 · Build tools per stack

A build tool turns source code into a runnable, distributable artifact — compiling, resolving dependencies, running tests, and packaging. You should be able to name the standard one per ecosystem rather than detail any: Maven/Gradle (Java), npm/yarn/pnpm + Vite/webpack (JavaScript), pip/Poetry (Python), Cargo (Rust), go build (Go).

The conceptual point that matters: a build must be reproducible. Pinned dependency versions (a lockfile), together with a fixed toolchain and build environment, mean the same source produces the same artifact on any machine, which is what makes CI trustworthy. "Works on my machine" is precisely the disease reproducible builds cure.
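One way to make the pinning idea concrete is a small check a pipeline could run. The sketch below assumes pip-style requirement lines (`name==version`); the function name is hypothetical, and real lockfiles usually record hashes as well as versions.

```python
import re

# Matches "name==1.2.3", optionally with extras such as "name[extra]==1.2.3".
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==[^=\s]+$")


def unpinned_requirements(lines):
    """Return requirement lines that are not pinned to one exact version."""
    loose = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if not _PINNED.match(line):
            loose.append(line)
    return loose
```

A CI step could fail the build whenever this list is non-empty, so a floating version never reaches an artifact.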

3 · Anatomy of a pipeline

A CI/CD pipeline is an automated assembly line from commit to production. Each stage is a gate — fail one, and the change stops before it can do harm.

flowchart LR
  C[Commit] --> B[Build]
  B --> T[Unit tests]
  T --> Q[Lint · SAST · scan]
  Q --> A[Package artifact]
  A --> ST[Deploy to staging]
  ST --> IT[Integration tests]
  IT --> PR[Deploy to prod]

Cheap, fast checks run first; expensive ones later — so failures surface as early and cheaply as possible.

The ordering principle is "fail fast and cheap": run the quick unit tests and linters before the slow integration tests and deploys, so a broken commit is rejected in seconds, not after a 20-minute deploy. Security scanning (SAST, dependency checks) belongs in the pipeline too — "shift left" means catching problems early, where they're cheapest to fix.
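The gate behaviour can be sketched as a loop that stops at the first failing stage. This is a minimal model, not how any particular CI system is implemented; `run_pipeline` and the stage names are hypothetical.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]  # (stage name, check that returns True on success)


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first one that fails.

    Returns (succeeded, names of the stages that actually ran).
    """
    ran = []
    for name, check in stages:
        ran.append(name)
        if not check():
            return False, ran  # gate closed: later stages never run
    return True, ran
```

Because the list is ordered cheapest first, a failing unit test means the slow staging deploy is never started.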

4 · Artifact repositories

Once the pipeline builds something — a JAR, a Docker image, an npm package — that artifact needs a versioned home so the exact same build can be deployed everywhere. That home is an artifact repository: Docker registries (Docker Hub, ECR), or tools like Artifactory and Nexus for language packages.

The principle is build once, deploy many: you build the artifact a single time, then promote that identical binary through staging to production. You never rebuild per environment — rebuilding risks subtle differences, defeating the whole point. The artifact you tested is the artifact you ship.
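A sketch of "the artifact you tested is the artifact you ship": record a content digest at build time and refuse to deploy anything that does not match it. The `Promotion` class is hypothetical; real registries apply the same idea with image digests.

```python
import hashlib


def digest(artifact: bytes) -> str:
    """Content address of an artifact: the SHA-256 of its bytes."""
    return hashlib.sha256(artifact).hexdigest()


class Promotion:
    """Tracks one built artifact as it moves through environments."""

    def __init__(self, artifact: bytes):
        self.expected = digest(artifact)  # recorded once, at build time
        self.deployed_to = []

    def deploy(self, environment: str, artifact: bytes) -> None:
        # Refuse anything that is not byte-for-byte the tested build.
        if digest(artifact) != self.expected:
            raise ValueError(f"artifact for {environment} differs from the tested build")
        self.deployed_to.append(environment)
```

A rebuild for production would produce different bytes in all but perfectly deterministic builds, and this check would reject it.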

5 · Deployment strategies

How you release decides your risk and downtime. The three to know cold:

Rolling — replace instances in batches; no downtime, but two versions run side by side briefly, and rollback is slow.
Blue-Green — run two identical environments; deploy to the idle one (green), test it, then flip all traffic over. Instant rollback (flip back), but you pay for double the infrastructure.
Canary — release to a small slice of users first, watch metrics, then gradually ramp to 100%. Lowest blast radius — a bad release hurts 1% before you catch and abort it.

flowchart LR
  LB[Load Balancer] -->|95%| V1[v1 stable]
  LB -->|5% canary| V2[v2 new]
  V2 -.metrics ok? ramp.-> LB

Canary: a small percentage hits the new version while you watch metrics, then you ramp or roll back.
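The canary loop in the diagram can be sketched as two pieces: stable routing of a percentage of users, and a ramp-or-abort decision. The doubling schedule and the 1% error threshold are assumptions chosen for illustration, and all function names are hypothetical.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user to a stable bucket in 0..99 (same user, same bucket)."""
    h = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(h[:4], "big") % 100


def route(user_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent of users to v2, the rest to v1."""
    return "v2" if bucket(user_id) < canary_percent else "v1"


def next_canary_percent(current: int, error_rate: float, max_error_rate: float = 0.01) -> int:
    """Ramp when healthy, drop to 0 when the canary looks bad."""
    if error_rate > max_error_rate:
        return 0  # abort: all traffic returns to v1
    return min(100, max(1, current * 2))
```

Hashing the user ID keeps each user on one version for the whole rollout, which avoids a user flipping between v1 and v2 on successive requests.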

💬 Interview angle

"I match the strategy to risk. Rolling for routine low-risk changes, blue-green when I want instant rollback and can afford duplicate infra, canary for risky changes — release to 1–5% of users, watch error and latency metrics, then ramp or abort. The goal is a small blast radius."

6 · Feature flags

Feature flags decouple deployment from release. Code ships to production behind an off switch, then you turn it on for some or all users without another deploy. This is what makes trunk-based development (Module 08) safe — unfinished features hide behind a flag while their code is continuously integrated.

The benefits compound: instant kill switch if something breaks, gradual rollout (the application-level cousin of canary), A/B testing, and decoupling a marketing launch date from an engineering deploy. The cost is flag debt — stale flags must be cleaned up, or the codebase rots into a maze of dead conditionals.
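A minimal in-memory sketch of a flag check with a kill switch and a per-user allowlist. `Flag` and `FlagStore` are hypothetical; production systems usually fetch flag state from a service so it can change without a deploy.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Flag:
    enabled: bool = False  # master switch; setting it to False is the kill switch
    allowed_users: Set[str] = field(default_factory=set)  # empty set means everyone


class FlagStore:
    def __init__(self) -> None:
        self._flags: Dict[str, Flag] = {}

    def set(self, name: str, flag: Flag) -> None:
        self._flags[name] = flag

    def is_on(self, name: str, user_id: str) -> bool:
        flag = self._flags.get(name)
        if flag is None or not flag.enabled:
            return False  # unknown or switched-off flags fail closed
        return not flag.allowed_users or user_id in flag.allowed_users
```

Application code then wraps the new path in `if store.is_on("new_checkout", user_id):`, and every such conditional is a branch that has to be deleted once the rollout is complete.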

⚠ Common trap

Flags are not free — every flag is a branch in your code and your testing matrix. The mature habit is treating them as temporary and removing them once a feature is fully rolled out. "We have hundreds of permanent flags" is a smell, not a brag.

7 · Infrastructure as Code

IaC means defining your infrastructure (servers, networks, databases, permissions) in version-controlled code rather than clicking through a console. Tools like Terraform (declarative, multi-cloud) and CloudFormation (declarative, AWS-specific) let you apply a definition and get that infrastructure, repeatably.

The wins are enormous: environments are reproducible (staging truly matches prod), changes are reviewed and audited through Git like any code, and disaster recovery becomes "re-run the script." A key idea is declarative + idempotent — you describe the desired end state, and the tool figures out the diff to get there, so re-running is safe. This is the antidote to fragile, hand-built "snowflake" servers nobody can reproduce.
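The "describe the end state, let the tool compute the diff" idea can be shown with a toy model. This is not how Terraform is implemented; `plan` and `apply` are hypothetical functions over plain dictionaries, used only to illustrate declarative and idempotent behaviour.

```python
from typing import Dict, List, Tuple

Resources = Dict[str, dict]  # resource name -> its configuration


def plan(desired: Resources, actual: Resources) -> List[Tuple[str, str]]:
    """Compute the actions that turn `actual` into `desired`."""
    actions = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != desired[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply(desired: Resources, actual: Resources) -> Resources:
    """Carry out the plan and return the resulting state."""
    state = dict(actual)
    for action, name in plan(desired, actual):
        if action == "delete":
            del state[name]
        else:  # create and update both end with the desired configuration
            state[name] = dict(desired[name])
    return state
```

Running `plan` again after `apply` returns an empty list, which is what idempotent means in practice: a second run has nothing left to do.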

💬 Interview angle

"IaC puts infrastructure in version control — declarative and idempotent, so I describe the desired state and Terraform reconciles to it. That gives me reproducible environments, peer-reviewed infra changes, and disaster recovery that's just re-running the code instead of remembering what someone clicked."

8 · Docker: the mental model

A container packages an application with all its dependencies into one portable unit that runs identically anywhere. It's the definitive cure for "works on my machine" — the machine comes with the app.

The crucial distinction from VMs: a VM virtualises hardware and runs a full guest OS (heavy, typically tens of seconds to minutes to boot); a container virtualises the OS and shares the host kernel (lightweight, typically starting in about a second or less). That efficiency is a large part of why containers became the standard unit of modern deployment, often running on top of VMs in the cloud. A Dockerfile declares how to build an image (the template); a running instance of an image is a container.

💬 Interview angle

"A container bundles the app with its dependencies so it runs identically everywhere. Unlike a VM it virtualises the OS and shares the host kernel, so it's far lighter and typically starts in about a second or less. The image is the template; the container is a running instance of it."

9 · Kubernetes: what it solves

One container is easy; running hundreds across many machines is not. Kubernetes is a container orchestrator — it solves the problems that appear at scale: scheduling containers onto nodes, restarting crashed ones (self-healing), scaling replicas up and down with load, rolling out new versions, service discovery, and load balancing between them.

flowchart TB
  U["Desired state: 5 replicas"] --> CP[Control Plane]
  CP --> N1[Node · pods]
  CP --> N2[Node · pods]
  CP -.watches and reconciles.-> CP

You declare the desired state; the control plane continuously reconciles reality toward it.

The core idea worth landing is declarative, self-healing reconciliation: you tell Kubernetes "I want 5 healthy replicas," and it constantly works to make that true — replacing failures, rescheduling, scaling. It's the same declarative philosophy as IaC, applied to running workloads. The honest caveat: it's powerful but complex, and plenty of teams don't need it.
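One pass of a reconciliation loop can be sketched as follows. This is a toy model of the idea, not the Kubernetes API; pods are plain dictionaries and `reconcile` and `pod_names` are hypothetical helpers.

```python
import itertools
from typing import Dict, Iterator, List

Pod = Dict[str, object]  # {"name": str, "healthy": bool}


def pod_names(start: int = 0) -> Iterator[str]:
    """Endless supply of fresh pod names: pod-<start>, pod-<start+1>, ..."""
    for i in itertools.count(start):
        yield f"pod-{i}"


def reconcile(desired_replicas: int, pods: List[Pod], names: Iterator[str]) -> List[Pod]:
    """One pass of a control loop: return the pod list after converging."""
    healthy = [p for p in pods if p["healthy"]]  # failed pods are discarded
    while len(healthy) < desired_replicas:       # too few: start replacements
        healthy.append({"name": next(names), "healthy": True})
    return healthy[:desired_replicas]            # too many: scale down
```

Calling `reconcile` on an already converged list changes nothing, which is the same idempotent property described for IaC.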

10 · Observability: logging, metrics, tracing

You can't operate what you can't see. Observability rests on three pillars, and naming them precisely is a strong signal:

Logs — timestamped records of discrete events. Great for "what exactly happened here?" Structured (JSON) logs are queryable.
Metrics — numeric time-series (CPU, request rate, error rate, latency). Great for dashboards and alerting on trends.
Traces — the path of a single request across services, with timing at each hop. Essential in microservices for finding where latency lives.
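To show how the pillars connect, here is a small sketch of a structured log line that carries a trace ID, so a log entry can be joined to the trace of the request that produced it. The field names and `log_event` are assumptions for illustration, not a standard schema.

```python
import json
import time


def log_event(level: str, message: str, *, trace_id: str, **fields) -> str:
    """Build one structured log line as JSON (one event per line)."""
    record = {
        "ts": time.time(),     # event timestamp, seconds since the epoch
        "level": level,
        "message": message,
        "trace_id": trace_id,  # links this event to a request's trace
    }
    record.update(fields)      # extra queryable context, e.g. order_id
    return json.dumps(record, sort_keys=True)
```

A call such as `log_event("error", "payment failed", trace_id="abc123", order_id=42)` yields a line a log store can filter by `trace_id` or `order_id` without text parsing.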

The distinction that matters: monitoring tells you that something is wrong (known questions, dashboards, alerts); observability lets you ask why — investigating novel problems you didn't predict. Add the four "golden signals" (latency, traffic, errors, saturation) and you sound like someone who's been on-call.
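The four golden signals can be computed from a window of request samples. This is a simplified sketch: the input shape, the nearest-rank 95th percentile, and the `in_use / capacity` definition of saturation are assumptions made for the example.

```python
from typing import List, Tuple


def golden_signals(requests: List[Tuple[float, bool]], window_seconds: float,
                   in_use: int, capacity: int) -> dict:
    """Summarise one time window. Each request is (latency_ms, succeeded)."""
    latencies = sorted(lat for lat, _ in requests)
    errors = sum(1 for _, ok in requests if not ok)
    n = len(latencies)
    # Nearest-rank 95th percentile using integer arithmetic: rank = ceil(0.95 * n).
    p95 = latencies[(95 * n + 99) // 100 - 1] if n else 0.0
    return {
        "latency_p95_ms": p95,
        "traffic_rps": n / window_seconds,
        "error_rate": errors / n if n else 0.0,
        "saturation": in_use / capacity,
    }
```

Alerting on a percentile instead of the mean keeps a handful of very slow requests from hiding behind many fast ones.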

11 · Incident response & post-mortems

How a team handles outages reveals its maturity. During an incident: detect (alerts fire), declare and assign roles (an incident commander coordinates), mitigate first (stop the bleeding — roll back, flip a flag — before root-causing), then resolve. Speed of mitigation matters more than elegance.

Afterward comes the blameless post-mortem: a written analysis of what happened, the timeline, the root cause, and concrete action items — focused on systems and process, not individuals. The premise is that good people make mistakes in flawed systems, so you fix the system. Two key metrics frame reliability: MTTR (mean time to recovery) and MTBF (mean time between failures).
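Both metrics are simple averages over an incident log. The sketch below uses one common pair of definitions (MTTR as mean outage duration, MTBF as total uptime divided by the number of failures); teams differ on the exact start and end points, and the data shape here is hypothetical.

```python
from typing import List, Tuple

Incident = Tuple[float, float]  # (start_hour, recovered_hour) on one shared clock


def mttr(incidents: List[Incident]) -> float:
    """Mean time to recovery: average outage duration, in hours."""
    return sum(end - start for start, end in incidents) / len(incidents)


def mtbf(incidents: List[Incident], period_hours: float) -> float:
    """Mean time between failures: total uptime divided by number of failures."""
    downtime = sum(end - start for start, end in incidents)
    return (period_hours - downtime) / len(incidents)
```

For three outages of 1, 0.5 and 1.5 hours in a 720-hour month, MTTR is 3 / 3 = 1 hour and MTBF is (720 - 3) / 3 = 239 hours.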

💬 Interview angle

"In an incident I mitigate before I root-cause — roll back or flip a flag to stop the bleeding, then investigate. Afterward I'd run a blameless post-mortem that fixes the system, not the person, with concrete action items. The goal is lowering MTTR, because failure is inevitable; slow recovery isn't."

Recap — what you can now teach

CI = auto build/test every merge; CD keeps it always releasable (delivery) or auto-deploys (deployment).
Pipelines fail fast and cheap; build once, deploy the same artifact many times.
Rolling / blue-green / canary trade off infrastructure cost, rollback speed, and blast radius.
Feature flags decouple deploy from release; IaC is declarative, idempotent, version-controlled infra.
Containers virtualise the OS (light); Kubernetes does self-healing declarative orchestration.
Observability = logs, metrics, traces; incidents → mitigate first, then a blameless post-mortem.

Self-check

Say each answer out loud before revealing it.

Continuous Delivery vs Continuous Deployment?

Why "build once, deploy many"?

How does a container differ from a VM?

What core problem does Kubernetes solve?

Name the three pillars of observability.

Next module → 10 · Cloud Fundamentals