Module 11 · Quality · All tracks

Testing

Testing isn't about chasing 100% coverage — it's about buying confidence to change code fast. This module frames testing as risk management, which is how strong engineers talk about it.

⏱ 60 min deep read 🎯 10 sections 📊 1 diagram

By the end you'll be able to explain, with conviction:

The test pyramid and why its shape matters.
TDD, test doubles, and the right tool for each test level.
What "good coverage" really means — and why the number lies.

1 · The test pyramid

A simple shape that encodes a deep truth about where to invest testing effort.

flowchart TB
  E2E[End-to-End · few · slow · expensive] --- INT[Integration · some]
  INT --- UNIT[Unit · many · fast · cheap]

Many fast unit tests at the base, fewer slow end-to-end tests at the top.

The pyramid says: have many unit tests (fast, cheap, isolated), fewer integration tests, and few end-to-end tests (slow, brittle, expensive). The reasoning is economic — unit tests give you the most confidence per second of runtime, so they should dominate, while e2e tests are precious but costly and flaky, so you use them sparingly for critical paths.

⚠ Common trap

The "ice-cream cone" anti-pattern (mostly slow e2e tests and few unit tests) gives a suite that's slow, flaky, and painful to maintain. If your CI takes 40 minutes and tests fail randomly, you've probably inverted the pyramid.

2 · Unit vs integration vs e2e

The levels differ in how much they exercise at once:

Unit — one piece (a function/class) in isolation, dependencies faked. Pinpoints exactly what broke; milliseconds to run.
Integration — several components together (e.g. service + real database), verifying they collaborate correctly. Catches the wiring bugs unit tests can't.
End-to-end — the whole system through the user's eyes (UI → API → DB). Highest confidence that a real journey works, but slowest and most fragile.

The senior insight is the tradeoff: as you climb, confidence rises but speed, isolation, and stability fall. Each level catches bugs the others miss, so you want a deliberate mix — not all of one kind.
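To make the first two levels concrete, here is a minimal sketch of the same logic tested at the unit level and at the integration level. Every name in it (`greeting`, `InMemoryUserStore`, `SqliteUserStore`) is hypothetical, and an in-memory SQLite database stands in for "the real database" so the example stays self-contained.

```python
import sqlite3


class InMemoryUserStore:
    """Hypothetical stand-in used by the unit test: a dict instead of a database."""

    def __init__(self):
        self._names = {}

    def add(self, user_id, name):
        self._names[user_id] = name

    def get_name(self, user_id):
        return self._names.get(user_id)


class SqliteUserStore:
    """Hypothetical store used by the integration test: real SQL runs here."""

    def __init__(self, conn):
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
        )

    def add(self, user_id, name):
        self._conn.execute(
            "INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name)
        )

    def get_name(self, user_id):
        row = self._conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


def greeting(store, user_id):
    """Unit under test: pure logic on top of whatever store it is given."""
    name = store.get_name(user_id)
    return f"Hello, {name}!" if name else "Hello, guest!"


def test_greeting_unit():
    # Unit level: the dependency is replaced, so a failure points at greeting().
    store = InMemoryUserStore()
    store.add(1, "Ada")
    assert greeting(store, 1) == "Hello, Ada!"
    assert greeting(store, 2) == "Hello, guest!"


def test_greeting_integration():
    # Integration level: schema and query mistakes would surface here.
    conn = sqlite3.connect(":memory:")
    try:
        store = SqliteUserStore(conn)
        store.add(1, "Ada")
        assert greeting(store, 1) == "Hello, Ada!"
        assert greeting(store, 2) == "Hello, guest!"
    finally:
        conn.close()
```

The two tests make the same assertions. The difference is how much real machinery runs underneath them, and so which class of bug each one can catch.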

💬 Interview angle

"Each level trades speed for realism. Unit tests are fast and pinpoint failures; integration tests catch wiring bugs between components; e2e tests prove a real user journey but are slow and flaky. I weight toward unit tests and reserve e2e for the critical happy paths."

3 · TDD & BDD

Test-Driven Development inverts the usual order: write a failing test first, write the minimum code to pass it, then refactor — the red-green-refactor loop. The benefits aren't just coverage: writing the test first forces you to design the interface from the caller's perspective and guarantees the code is testable. The discipline produces a comprehensive safety net as a byproduct.
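One turn of the red-green-refactor loop might look like the sketch below. The `slugify` function is a made-up example, not something from this module. The point is the order: the test exists first and fixes the interface from the caller's side.

```python
# Step 1 (red): the test is written first. Run at this point, it fails
# with a NameError because slugify() does not exist yet.
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Trim   me  ") == "trim-me"
    assert slugify("") == ""


# Step 2 (green): the minimum code that makes the test pass.
def slugify(title):
    # split() with no arguments drops leading, trailing, and repeated spaces.
    return "-".join(title.lower().split())


# Step 3 (refactor): with the test green, the implementation can be
# reshaped freely; rerunning test_slugify() confirms nothing regressed.
```

Calling `test_slugify()` after step 2 passes silently, and that is the safety net the refactor step relies on.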

Behaviour-Driven Development raises TDD to the language of behaviour, using Given-When-Then scenarios that non-engineers can read (tools like Cucumber). The point is shared understanding: each scenario also serves as a specification that the whole team, including product, agrees on. TDD is about how you build; BDD is about aligning on what to build.
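Tools like Cucumber keep the scenario in plain language and bind it to code. As a rough illustration of the Given-When-Then shape only, here is a plain Python test with the scenario written as comments. The `Cart` class and the `SAVE10` code are assumptions made up for the example.

```python
class Cart:
    """Hypothetical shopping cart; prices are integer cents to avoid float rounding."""

    def __init__(self):
        self._prices_cents = []
        self._discount_percent = 0

    def add_item(self, price_cents):
        self._prices_cents.append(price_cents)

    def apply_code(self, code):
        # Assumed rule for the example: only SAVE10 exists, and it takes 10% off.
        if code == "SAVE10":
            self._discount_percent = 10

    def total_cents(self):
        subtotal = sum(self._prices_cents)
        return subtotal * (100 - self._discount_percent) // 100


def test_discount_code_reduces_total():
    # Given a cart containing one item priced at 20.00
    cart = Cart()
    cart.add_item(2000)
    # When the customer applies the code SAVE10
    cart.apply_code("SAVE10")
    # Then the total is 18.00
    assert cart.total_cents() == 1800
```

The three comments are the part product can read and agree on; the code underneath is the engineering detail.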

💬 Interview angle

"TDD's real value isn't coverage — it's that writing the test first forces a clean, testable interface and gives me a safety net to refactor fearlessly. BDD lifts that into Given-When-Then language so product and engineering agree on the behaviour before I build it."

4 · Test doubles: mocking, stubbing, faking

To test a unit in isolation, you replace its real dependencies with test doubles. The distinctions are a favourite precision check:

Stub — returns canned answers to calls (e.g. "this method always returns user 42"). Used to control state.
Mock — a stub that also verifies interactions ("was save() called exactly once?"). Used to assert behaviour.
Fake — a lightweight working implementation (e.g. an in-memory database) — real logic, just not production-grade.
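The three roles can be sketched with Python's standard `unittest.mock`. Note that the library's single `Mock` class can play either stub or mock; what differs is how the test uses it. The `register` function and `FakeUserRepo` are hypothetical names invented for the example.

```python
from unittest.mock import Mock


def register(user_repo, mailer, email):
    """Hypothetical unit under test: register an email unless it is taken."""
    if user_repo.find_by_email(email) is not None:
        return False
    user_repo.save(email)
    mailer.send_welcome(email)
    return True


class FakeUserRepo:
    """Fake: a working in-memory implementation, just not production-grade."""

    def __init__(self):
        self._emails = set()

    def find_by_email(self, email):
        return email if email in self._emails else None

    def save(self, email):
        self._emails.add(email)


def test_with_stub():
    # Stub: a canned answer controls state; nothing about calls is verified.
    repo = Mock()
    repo.find_by_email.return_value = "taken@example.com"
    assert register(repo, Mock(), "taken@example.com") is False


def test_with_mock():
    # Mock: the test asserts on the interactions themselves.
    repo = Mock()
    repo.find_by_email.return_value = None
    mailer = Mock()
    assert register(repo, mailer, "new@example.com") is True
    repo.save.assert_called_once_with("new@example.com")
    mailer.send_welcome.assert_called_once_with("new@example.com")


def test_with_fake():
    # Fake: real logic runs, so a second registration is rejected naturally.
    repo = FakeUserRepo()
    assert register(repo, Mock(), "a@example.com") is True
    assert register(repo, Mock(), "a@example.com") is False
```

The fake-based test reads closest to observable behaviour, while the mock-based test is the one most likely to break when `register` is refactored.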

⚠ Common trap

Over-mocking makes tests assert how code works instead of what it produces, so they break on every refactor and stop catching real bugs — they just mirror the implementation. Prefer testing observable behaviour; mock only true external boundaries (network, time, third parties).

5 · UI automation: Selenium vs Playwright vs Cypress

These automate a real browser to test the UI end-to-end. Know them at a glance: Selenium is the veteran — broad language and browser support, but older and more flake-prone. Cypress runs inside the browser with a great developer experience and time-travel debugging, but is more frontend-focused. Playwright (Microsoft) is the modern favourite — fast, cross-browser, with auto-waiting that kills much of the flakiness.

The deeper point transcends the tool: UI tests are inherently the most brittle, so you keep them few (the pyramid), target stable selectors (data-testid, not CSS classes), and lean on auto-waiting over hard-coded sleeps. Showing you know why UI tests flake — and how to reduce it — matters more than naming a favourite.
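The idea behind auto-waiting can be shown without any browser: poll for a condition until a deadline instead of sleeping for a fixed time. The helper below is a minimal sketch of that idea under assumed names (`wait_until`, `timeout`, `interval`). It is not how Playwright or any other tool actually implements waiting.

```python
import time


def wait_until(condition, timeout=2.0, interval=0.05):
    """Poll condition() until it returns True or the timeout elapses.

    Returns True as soon as the condition holds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical stand-in for "is the element visible yet?": ready on the 3rd check.
checks = {"count": 0}


def element_is_ready():
    checks["count"] += 1
    return checks["count"] >= 3
```

A hard-coded `time.sleep(2)` always costs two seconds and still fails if the page needs three. Polling returns as soon as the condition holds and only pays the full timeout when something is genuinely wrong.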

6 · API testing

API tests sit in the sweet spot: more realistic than unit tests, and far faster and more stable than UI tests, because they skip the brittle browser layer and hit the contract directly. You verify status codes, response shape, error handling, auth, and edge cases at the HTTP boundary, using tools like Postman for exploration and code (REST-assured, supertest, requests) in CI.
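As a rough illustration of testing at the HTTP boundary, the sketch below starts a tiny throwaway server from the Python standard library and checks a status code, the response shape, and an error path. The `/users/<id>` endpoint and its data are invented for the example. In a real project the test would target your own service, typically through a client library such as requests.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

USERS = {"42": {"id": 42, "name": "Ada"}}  # hypothetical data
PREFIX = "/users/"


class Handler(BaseHTTPRequestHandler):
    """Hypothetical API: GET /users/<id> returns a user or a 404 error body."""

    def do_GET(self):
        user = USERS.get(self.path[len(PREFIX):]) if self.path.startswith(PREFIX) else None
        status, body = (200, user) if user else (404, {"error": "not found"})
        payload = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep test output quiet


def call(base_url, path):
    """Return (status, parsed JSON body) for a GET, including error responses."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass proxies
    try:
        with opener.open(base_url + path, timeout=5) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read())


def run_api_tests():
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base_url = f"http://127.0.0.1:{server.server_address[1]}"
    try:
        status, body = call(base_url, "/users/42")
        assert status == 200
        assert set(body) == {"id", "name"}  # response shape

        status, body = call(base_url, "/users/999")
        assert status == 404  # error handling
        assert "error" in body
    finally:
        server.shutdown()
        server.server_close()
    return True
```

No browser is involved, so there are no selectors to break and no rendering to wait for. That is where the speed and stability come from.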

Because they exercise real integration without UI fragility, API/service tests are where a lot of teams concentrate their integration coverage — a pragmatic refinement of the pyramid. Mentioning contract testing (Pact) — verifying that a producer and consumer agree on the API shape — is a sharp, senior addition for microservice contexts.

7 · Performance testing

Performance testing measures behaviour under load, and the vocabulary distinctions are the interview substance:

Load testing — expected traffic; does it meet latency/throughput targets?
Stress testing — push past limits to find the breaking point and see how it fails.
Soak testing — sustained load over hours to expose leaks and slow degradation.
Spike testing — sudden surges, e.g. a flash sale.

Report results in percentiles, not averages — p95/p99 latency reveals the tail of slow requests that an average hides. "Average response time looks fine but p99 is 4 seconds" is exactly the kind of insight that signals you understand real user experience.
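A small worked example shows how an average hides the tail. The latency samples are invented (98 fast requests and 2 very slow ones), and the percentile uses the simple nearest-rank definition. Monitoring tools may interpolate differently.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 98 fast requests, 2 very slow ones.
latencies_ms = [100] * 98 + [4000] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)  # 178.0
p95_ms = percentile(latencies_ms, 95)            # 100
p99_ms = percentile(latencies_ms, 99)            # 4000
```

The mean of 178 ms looks healthy and p95 is still 100 ms. Only p99 exposes that one request in fifty takes four seconds.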

💬 Interview angle

"I distinguish load, stress, soak, and spike testing, and I always look at p95/p99 latency rather than averages — the tail is where real users feel pain, and averages hide it completely."

8 · Security testing: SAST & DAST

Two complementary approaches, tied back to Module 05. SAST (Static Application Security Testing) analyses source code without running it, catching issues like injection patterns or hardcoded secrets early, right in the pipeline ("shift left"). DAST (Dynamic Application Security Testing) tests the running application from the outside, like an attacker, probing for exploitable vulnerabilities the code alone won't reveal.

The clean framing: SAST reads the code; DAST attacks the app — you want both, at different pipeline stages. Add SCA (Software Composition Analysis — scanning dependencies for known CVEs) and you've covered the three automated security checks every mature pipeline runs.
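To give a feel for what "reads the code" means, here is a toy static check that parses Python source without executing it and flags string literals assigned to secret-looking names. It is a deliberately naive sketch: the name list is an assumption, and real SAST tools apply far richer rules and data-flow analysis.

```python
import ast

# Assumed list of name fragments that suggest a secret.
SUSPICIOUS_NAMES = ("password", "secret", "api_key", "token")


def find_hardcoded_secrets(source):
    """Return (line, name) pairs where a suspicious name is assigned a string literal."""
    findings = []
    for node in ast.walk(ast.parse(source)):  # parse only; nothing is executed
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        if not (isinstance(value, ast.Constant) and isinstance(value.value, str)):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and any(
                fragment in target.id.lower() for fragment in SUSPICIOUS_NAMES
            ):
                findings.append((node.lineno, target.id))
    return findings


# Hypothetical file contents to scan.
SAMPLE = (
    'DB_PASSWORD = "hunter2"\n'
    "timeout = 30\n"
    'api_key = os.environ["API_KEY"]\n'
)
```

Scanning `SAMPLE` flags only line 1. The `api_key` assignment reads from the environment instead of a literal, so it passes.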

9 · Bug lifecycle, severity vs priority

A bug flows through a lifecycle: New → Assigned → In Progress → Fixed → In Test → Verified → Closed (or Reopened). Knowing it shows you've worked a real tracker (Module 02). The distinction that trips people up is the important one:

Severity — technical impact of the bug (crash vs cosmetic). Set by engineering/QA.
Priority — business urgency to fix. Set by product.

They're independent: a typo in the company name on the homepage is low severity but high priority; an obscure crash in a feature nobody uses is high severity, low priority. Being able to give that cross-example is the senior tell.
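One way to see the independence is to model the two as separate fields and sort a backlog by priority first. The sketch below uses invented names (`Bug`, `triage_order`) and a three-level scale that is an assumption, not a standard.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Technical impact, set by engineering/QA."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Priority(IntEnum):
    """Business urgency, set by product."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Bug:
    title: str
    severity: Severity
    priority: Priority


def triage_order(bugs):
    """Most urgent first; severity only breaks ties between equal priorities."""
    return sorted(bugs, key=lambda b: (b.priority, b.severity), reverse=True)


BUGS = [
    Bug("Crash in a feature nobody uses", Severity.HIGH, Priority.LOW),
    Bug("Company name misspelled on homepage", Severity.LOW, Priority.HIGH),
]
```

With this ordering the homepage typo is fixed before the obscure crash, even though the crash is technically the worse defect.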

💬 Interview angle

"Severity is technical impact; priority is business urgency, and they're independent. A misspelled brand name on the homepage is low severity but high priority — cosmetic, yet it must ship today. That decoupling is how teams triage sensibly."

10 · What "good test coverage" means

Code coverage measures the percentage of code executed by tests — useful as a signal, dangerous as a target. The trap: 100% coverage proves every line ran, not that anything was asserted correctly. You can have 100% coverage with tests that check nothing.

Good coverage is about testing the right things: critical business logic, edge cases, error paths, and boundaries — not trivial getters chased for a number. The mature stance: coverage is a conversation-starter ("this critical module has 20% — why?"), and you'd rather have 70% coverage of meaningful behaviour than 95% of trivia. The real goal is confidence to change code safely, which is what testing actually buys you.

⚠ Common trap

Mandating a coverage number (e.g. "90% or the build fails") drives gaming: developers write assertion-free tests to hit the gate. Coverage is a flashlight for finding untested risk, never the goal itself (Goodhart's law again: when a measure becomes a target, it stops being a good measure).

Recap — what you can now teach

The pyramid: many unit, fewer integration, few e2e — maximise confidence per second.
TDD forces testable design and fearless refactoring; BDD aligns the team on behaviour.
Stub for state, mock for interaction, fake for a working stand-in — but don't over-mock.
UI tests are brittle (keep few, stable selectors); API tests are the stable integration sweet spot.
Report performance in p95/p99; SAST reads code, DAST attacks the app, SCA scans deps.
Severity ≠ priority; coverage is a signal, not a target — the goal is confidence to change.

Self-check

Say each answer out loud before checking it against the sections above.

Why is the test pyramid shaped that way?

Mock vs stub?

Why report p99 rather than average latency?

Give an example where severity and priority diverge.

Why is 100% coverage not the goal?
