A prompt that produces working code is engineered, not improvised. The improvised prompt feels faster in the moment and costs you twice the time over the next thirty minutes, because the agent will produce something plausible-but-wrong, and you will spend the saved minutes wrestling its output into a shape your codebase actually accepts.
The core claim of this page is opinionated and worth stating up front: prompt engineering for code is a real engineering discipline, not a vibe and not a folk art. It has structure. It has patterns that work and patterns that fail. It has measurable trade-offs between specificity and effort. Treating it as a craft -- the same way you would treat writing tests or designing schemas -- is the difference between a coding agent that compounds your output and a coding agent that produces a stream of near-misses you have to clean up by hand.
This page lays out the discipline in concrete terms: what a good code prompt is composed of, why specificity beats brevity, when to add examples, how to iterate without losing the thread, how to decompose work into steps the agent can actually execute, what the recurring failure modes look like, and a handful of templates worth memorizing. The whole page is built around prompts you would actually send to Claude Code, Cursor, Copilot, or Codex CLI in a working session, not theoretical formulations.
What Prompt Engineering for Code Actually Means
Prompt engineering for code is not magic phrases. It is not the "you are an expert senior engineer with twenty years of experience" preamble, which does nothing for current models and reads as folklore from 2023. It is not the search for the one secret prompt that unlocks the agent. The agent does not have a hidden setting that the right phrase activates. The agent has the same capability whether you call it an expert or not, and the productivity difference comes from somewhere else entirely.
The actual definition: prompt engineering for code is the discipline of compressing context, intent, and constraints into instructions an agent can act on. It is engineering in the literal sense -- you are constructing a working artifact (the prompt) from materials (your codebase knowledge, the goal, the rules) under constraints (the agent's context window, the cost of iteration, the standards of your project). The artifact either produces useful output or it does not, and the difference is in how it was constructed, not in the magic incantations sprinkled on top.
Every prompt that produces working code, in any tool, has four components. They do not all need to be long, but they all need to be present. If one is missing, the agent will fill the gap with its own guess, and the guess is statistically median. Median is not the same as good.
Context. What language and framework, what conventions are in use, where the relevant files live, what shape the code is in right now. The agent needs enough context to know it is working inside your project rather than scaffolding a generic example. In tools like Claude Code or Cursor in agent mode, the agent reads files itself, but it still needs you to point at the right ones. Without context, the agent guesses.
Intent. The actual goal of the work. Not "build a feature" but "add a contact form that posts to /api/contact and shows a success state on submit." The "why" matters because it disambiguates trade-offs. "Make this faster because the page hits a 600ms cold start" leads to different code than "make this faster because the dev server reload is slow." The agent picks better solutions when it knows the constraint that motivates the change.
Constraints. Tech you do or do not want to use, conventions to follow, things to avoid. "Use the existing Form primitives, do not introduce a new validation library, no new dependencies." The constraint section is where you anchor the output to your codebase rather than to the median of the training data. Skip this and the agent will reach for the most popular pattern in the language, which is rarely the most popular pattern in your project.
Expected output. File paths to touch, the shape of the commit, whether tests are required, whether you want a summary at the end. "Edit src/components/ContactForm.tsx and src/app/api/contact/route.ts; add a test in __tests__/ContactForm.test.tsx; do not touch unrelated files." This is the contract for what "done" means, and it heads off the failure where the agent does roughly the right thing in roughly the wrong place.
The discipline is constructing those four parts efficiently, not necessarily verbosely. A good code prompt for a small change can be five lines. A good code prompt for a large change might be two paragraphs. The size scales with the work, not with the engineer's anxiety, and the practitioners who get good at this learn to write the smallest prompt that contains all four parts.
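For calibration, here is what the five-line version can look like for a small change -- the file names are hypothetical, the four parts are not:
Context: Next.js app; the date helpers live in src/lib/dates.ts.
Intent: formatRelative prints "1 days ago" for singular values; fix the pluralization.
Constraints: no new dependencies; keep the existing function signature.
Expected output: edit src/lib/dates.ts only; update the existing test in __tests__/dates.test.ts.
Four lines of prompt, four components, nothing wasted.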
Specificity Beats Brevity
The most common mistake in prompt engineering for code is treating brevity as a virtue. Short prompts feel professional, the way a senior engineer might give a one-line instruction to a teammate who already has the context. But the agent does not have the context. The agent has whatever context you fed it in this session and whatever context it can read from the files. A short prompt with no context produces output that is short on specifics and long on generic patterns.
The trade-off is not "short vs long." It is "vague vs specific." A long vague prompt is worse than a short specific one. A short specific prompt is better than a long vague one. The win is in the specifics, regardless of length. And specifics come from naming things directly -- file paths, function names, exact behaviors, exact constraints -- not from adding adjectives.
"Build me a login form. It should be modern and have good validation."
The agent has no idea what framework you are using, where the file should go, what auth backend exists, what validation library is in use, or what "modern" means in your context. The output will be a generic React component using a popular form library that is probably not the one you use, posting to an endpoint that does not exist, with styles that do not match your design system. You will spend twenty minutes converting it.
"Add a sign-in form at src/app/(auth)/sign-in/page.tsx. Use the existing useAuth hook in src/lib/auth.ts (it expects email and password and returns a Promise). Match the styling pattern in src/app/(auth)/sign-up/page.tsx. Email and password are required; show inline errors with the existing FormError component. On success, router.push to /dashboard. No new dependencies. Add a test in __tests__/sign-in.test.tsx that covers the empty-state and error cases."
The agent has the file path, the existing helpers, the styling reference, the validation expectations, the success behavior, and the test contract. The output is correct on the first turn, and you spend two minutes reviewing instead of twenty minutes converting.
The reason specificity wins is mechanical. Models do not invent the unspecified parts; they fill them with the statistically most common option from training. For most decisions in software, the statistically most common option is "what a beginner tutorial would do." That is sometimes correct and often wrong. By naming the specifics, you replace a generic guess with a specific instruction, and the output stops being a generic example and starts being your codebase's next file.
A useful internal test, before you send a prompt: read it back and ask "if a competent stranger received this prompt with no other context, would they produce something that fits this codebase?" If the answer is no, the prompt is under-specified, and the agent's output will be wrong in the same way the stranger's would be. The agent is not better than the stranger at filling in your local context. It just types faster.
# A real prompt that ships working code on the first turn.
Goal: add a "delete account" flow.
Context:
- Next.js App Router app, files at src/app/.
- Auth via NextAuth, helper at src/lib/auth.ts.
- Existing settings page at src/app/(account)/settings/page.tsx.
- DB via Prisma; user model in prisma/schema.prisma.
Intent:
- Add a "Delete account" section at the bottom of the settings page.
- On confirm, call POST /api/account/delete, which removes the user and signs them out.
Constraints:
- Reuse the existing Button and Modal primitives from src/components/ui.
- Use the confirm-modal pattern from src/app/(account)/profile/page.tsx (the avatar removal flow).
- No new dependencies. No client-side toast library beyond the existing useToast.
- Wrap the DB delete in a transaction; soft-delete is fine, set deletedAt.
Expected output:
- Edit src/app/(account)/settings/page.tsx.
- Add src/app/api/account/delete/route.ts.
- Update prisma/schema.prisma if soft-delete column is missing.
- Migration filename: yyyymmdd_add_user_deleted_at.
- Test in __tests__/account-delete.test.ts covering success and unauthenticated.
- Summarize changed files at the end.
That prompt is roughly twenty lines, and it produces a working multi-file change in one turn. The same task with the prompt "add a delete account flow to my Next app" produces a sprawling mess that touches files it should not touch and skips files it should touch. The twenty lines are not overhead. They are the engineering.
The Role of Constraints
Constraints are the part of a prompt that most engineers under-invest in, and they have the highest payoff per word. Stating what NOT to do is often more important than stating what to do, because the "what to do" tends to be obvious from the goal, while the "what not to do" carries all the local information about your codebase that the agent cannot infer.
A constraint section answers questions the agent would otherwise have to guess at. Which library should I use? Which file should I avoid? Which pattern is forbidden in this project? What conventions does the existing code follow that I should follow too? Without explicit constraints, the agent uses its defaults, and its defaults reflect the median of every codebase it has been trained on. Your codebase is not the median. Your codebase has specific choices, made for specific reasons, and those choices need to be conveyed.
The kinds of constraints worth naming, and the patterns that work for each:
Library constraints. "Do not introduce a new validation library; use the existing Zod schemas in src/lib/validators.ts." The agent's default is to reach for whatever is most popular for the task. If you have already picked, say so explicitly, or you will get a second library installed.
Convention constraints. "Follow the conventions in src/lib/db.ts -- query functions return a tuple of [result, error] rather than throwing." If your codebase has a non-default pattern, name it; a sketch of this particular pattern follows the list. The agent will follow conventions if you name them, and ignore them if you do not.
Scope constraints. "Touch only src/app/(billing) files. Do not modify any imports in components outside that directory." Without this, the agent occasionally decides to refactor unrelated code "for cleanliness" and you end up with a sprawling diff.
Negative constraints -- the most underrated kind. "Do not catch and re-throw errors with a different message; let them propagate." "Do not add comments explaining what the code does; the code should be self-explanatory." "Do not write 'as any' to silence type errors; if a type does not match, fix the type or tell me." Negative constraints head off the patterns the agent reaches for when it does not know any better.
Dependency constraints. "No new dependencies." This one carries so much weight it deserves to be in nearly every prompt for a non-trivial change. The agent's instinct is to install something, because most tutorials install something. In a real codebase, every new dependency has a cost, and most of the time the existing toolkit is enough.
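To make the convention constraint concrete, here is a minimal sketch of the tuple-returning pattern it names. The file, helper, and Prisma model are hypothetical stand-ins; the shape is the point.

// src/lib/db.ts (hypothetical) -- query helpers return [result, error]
// instead of throwing, so failures are handled at the call site.
import { prisma } from "./prisma";
import type { User } from "@prisma/client";

export async function getUserById(
  id: string
): Promise<[User | null, Error | null]> {
  try {
    const user = await prisma.user.findUnique({ where: { id } });
    return [user, null];
  } catch (err) {
    return [null, err instanceof Error ? err : new Error(String(err))];
  }
}

// Call site: const [user, err] = await getUserById(id); if (err) { ... }

A constraint as short as "match the shape of getUserById in src/lib/db.ts" is enough for the agent to replicate the pattern in every new query helper it writes.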
A 30-word constraint section saves 300 words of cleanup later. The constraints carry the local information the agent cannot infer from the codebase alone, and adding them costs you 30 seconds of writing in exchange for not having to roll back a wrong-shaped diff. The asymmetry is large enough that omitting constraints from a prompt is almost always a net loss in time, even when it feels faster in the moment.
One pattern worth naming explicitly: the "follow the patterns already in the codebase" constraint. This single phrase, dropped into a prompt, makes the agent read existing files and match their style. It is not a substitute for specific constraints, but it is a useful default for the smaller decisions you do not want to enumerate. Combined with a pointer to a specific file ("look at src/components/Button.tsx for the convention"), it gives the agent a working reference and saves you from listing every minor pattern.
The rule of thumb that shipping practitioners settle into: spend at least as many words on what to avoid as on what to do, when the work is non-trivial. The constraint section is the part of the prompt that prevents the agent's defaults from quietly degrading your codebase, and the prevention is cheaper than the cleanup every time.
Examples in Prompts -- One-Shot, Few-Shot, Anti-Examples
When constraints are not enough to pin down what you want, examples close the gap. The agent is excellent at imitating a pattern it can see, and providing the pattern is often the shortest path from "here is what I want" to "here is the working code." The skill is in choosing the right kind of example for the situation, and there are three kinds worth naming.
One-shot. You provide a single example of the kind of file you want, and ask the agent to produce a new one in the same shape. "Here is how we structure an API route at src/app/api/users/route.ts; do the new src/app/api/posts/route.ts the same way." This is the highest-payoff form of example for code, because most file types in a project follow a repeated pattern, and showing the pattern once eliminates ambiguity. One-shot works when the pattern is clear and the new file is structurally similar.
Few-shot. You provide two or three examples to show variation. This is the right move when one example would not communicate the full pattern -- for instance, when there are different shapes the file might take depending on conditions, and the agent needs to see the conditions to pick the right shape. "Here are two existing endpoints, one read-only and one mutating; the new endpoint is mutating, follow the second pattern." Few-shot is more verbose but disambiguates between options.
Anti-examples. You provide an example of what NOT to do, with a reason. "Do not write it like this -- the project moved away from that pattern because it leaks the internal model into the API response. Use the new pattern in src/app/api/v2/." Anti-examples are useful when the agent's default would land on the wrong shape, and the most efficient way to redirect is to show the wrong shape and label it. They take longer to write but save iterations later.
The choice between these is mostly mechanical. If the pattern is clear and there is one obvious shape, point at one file. If there are conditions that change the shape, show the relevant variants. If the agent's default would be wrong, show the wrong default and explain why. The skill is in noticing which case you are in, and the noticing comes from a few cycles of getting wrong outputs and tracing them back to which kind of guidance was missing.
One mistake worth avoiding: do not paste the example inline if you are working in a tool that can read your files. If you are using Claude Code or Cursor in agent mode, "look at src/components/Button.tsx for the pattern" is more efficient than copying the file contents into the prompt. The agent reads the file, picks up the patterns, and operates on the live file rather than your possibly-stale paste. Inline examples are useful for chat-style interactions and for tools without file access. In agentic tools, file paths are usually the cleaner choice.
The other mistake: examples that are not actually examples of what you want. The agent imitates what you show. If you point it at a file that has a known issue ("we are migrating away from this pattern"), the new file will have the same issue. Either fix the example before referencing it, or pair the reference with an anti-example noting what to change. Otherwise, the agent does what you implicitly told it to do, which is replicate the wrong shape.
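In practice, the pairing is one sentence of reference and one sentence of redirection -- paths hypothetical:
Use src/app/api/v2/users/route.ts as the structural reference for the new
endpoint, with one exception: that file still maps the DB row directly into
the response, a pattern we are migrating away from. Do not copy that part.
Return only id, name, and createdAt.
The reference gives the agent the shape; the labeled exception keeps it from inheriting the flaw.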
Iterating Without Losing Context
Most non-trivial work happens over many turns rather than a single prompt. The skill of running an iterative session well is its own discipline, and it has a few patterns that show up across the practitioners who get a lot done. The shape of an effective session is not "send prompts and accept results"; it is a managed conversation where context is maintained, errors are handled, and the agent's understanding stays close to yours.
The session approach. You stay in one conversation, and you build up shared context as you go. The first prompt orients the agent to the codebase. Subsequent prompts assume the orientation. The agent gets better as the session progresses because it has read more of your code, learned more of your conventions, and seen what you accept and what you reject. This works well for sessions of two or three hours on a focused area. The downside is that very long sessions can accumulate noise -- corrections that cancel out, dead ends that no longer apply -- and the agent's reasoning quality can drift.
The reset approach. When the session has accumulated too much noise, you start fresh. You write a short summary of what is done and what remains, paste it into a new session, and the agent re-orients with a clean working memory. This is useful after a debugging marathon, after a wrong path that took several turns to recover from, or any time the session feels muddled. The reset is fast and almost always improves output quality. Beginners avoid it because it feels wasteful; experienced practitioners do it routinely because they know the cost of carrying confusion forward is higher than the cost of re-orienting.
The summary approach. Before a complex turn, you ask the agent to summarize the relevant context. "Before we start the next change, summarize the auth flow in this codebase as you currently understand it." This serves two purposes: it tells you whether the agent's model matches yours (if not, correct it before the next turn), and it concentrates the relevant context for the agent itself, since summarization tends to surface the parts that matter for the next step. Summary is cheap and surprisingly effective for catching misunderstandings before they become wrong code.
The "write it, then critique it" pattern. After the agent produces code, you ask the agent to critique its own output. "Now review the code you just wrote and list the three places it is most likely to be wrong." This sounds gimmicky and is genuinely useful. Models do better at evaluation than generation in many cases, and self-critique surfaces issues that the original generation glossed over. You can then ask for the fixes, and the second-pass output is usually meaningfully better than the first.
# A real session shape, lightly compressed.
> Read src/lib/auth.ts and src/app/(auth)/sign-up/page.tsx, then summarize
how auth currently works in this project.
[agent reads files, produces summary]
> Good, that matches my model. Now: add a sign-in page at
src/app/(auth)/sign-in/page.tsx, following the sign-up patterns.
Same form library, same error component, same toast hook.
[agent produces page]
> Review the page you just wrote -- list three places it might be
wrong or fragile, and one place it could be simpler.
[agent self-critiques, surfaces a real issue]
> Fix the first issue. Leave the rest, I'll review them after running tests.
[agent fixes]
> Summarize what we changed in this session, in 3 bullets, for the
commit message.
[agent produces summary]
The shape above is not exotic. It is what a productive Claude Code session looks like for a moderate task. The patterns -- summarize, build, critique, fix, summarize -- compound into output quality over the session. The alternative ("write it; ship it") tends to produce code that needs more cleanup downstream than it would have taken to do the iteration in the first place.
One small habit that pays off: when you correct the agent on something it got wrong, name the kind of correction. "That hook should not be called from a server component" is more useful than "wrong" because it tells the agent the rule, not just the verdict. The agent will apply the rule to the next turn. If you only say "wrong," the agent will guess at what was wrong and may guess differently than you intended. Naming the correction shapes the rest of the session.
Multi-Step Task Decomposition
"Build the whole feature" prompts are the single largest category of failure in prompt engineering for code. They produce sprawling diffs touching files they should not touch, missing files they should touch, and combining decisions you would have made differently if you had been asked one at a time. The fix is decomposition -- breaking the work into reviewable slices and prompting for one slice at a time.
The reason "build the whole feature" fails is structural. The agent generates output token by token. The longer the generation, the more compounding decisions it has to make without your input, and the more likely the trajectory drifts away from what you wanted. By the time you read the output, fifteen judgment calls have been baked in, and rolling back any of them means rolling back the work that came after. Decomposition keeps the judgment calls reviewable while they are still cheap to change.
The decomposition pattern: identify the seams in the work, prompt for one slice, review it, prompt for the next. The seams are usually obvious -- the database change is one slice, the API endpoint is another, the UI is a third, the tests are a fourth. Each slice is a few hundred lines at most. Each slice produces a review-shaped output. Each slice can be rolled back independently if it is wrong.
Find the seams. Look at the work and ask "what are the natural boundaries here?" For a feature with a database change, the seams are usually schema, data access layer, API, UI, tests. For a refactor, the seams are usually one module at a time. The seams are where changes can be reviewed independently. If you cannot name the seams, the work is too vague to start, and the prompt should be a planning prompt, not an implementation prompt.
Plan first. Ask the agent for a plan before any code. "Lay out the steps for adding a comments feature to the post detail page; do not write code yet." The plan is fast to produce, fast to read, and easy to correct. A wrong plan corrected in 30 seconds saves you from a wrong implementation that takes 30 minutes to unwind. This step is almost always worth it for non-trivial work, and almost always skipped by people new to the practice; a sketch of the planning prompt follows these steps.
Execute one slice at a time. Take the plan and prompt for the first step. Wait for the output. Review it. If it is wrong, correct before moving on, because errors compound across slices. If it is right, move to the next slice. The temptation to batch slices together is real -- "while you're at it, do step 2 also" -- and it is the temptation that destroys decomposition's benefits. Resist it.
Run an integration pass. After the slices are done, ask the agent to walk through the integration. "All four steps are done -- summarize how they connect, run the tests, and flag anything that looks inconsistent across the slices." This catches the small mismatches that creep in when slices were built without seeing each other's final shape. The integration pass is fast and almost always finds at least one minor inconsistency worth fixing.
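The planning step deserves its own sketch, because it is the step most often skipped. A planning prompt for the comments feature might be no more than this -- paths hypothetical:
Plan only -- do not write code yet.
Goal: add comments to the post detail page at src/app/posts/[id]/page.tsx.
Lay out the slices: schema change, data access, API route, UI, tests.
For each slice, name the files you would touch and any decision I need to
make before you start. Stop after the plan.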
One concrete heuristic: if a single prompt would produce more than about 300 lines of code, it is too large. Three hundred lines is roughly the limit at which careful review remains feasible without losing focus. Beyond that, you start scanning instead of reading, and the agent's mistakes pass through unnoticed. Splitting into sub-300-line slices keeps the review honest.
The other concrete heuristic: when the agent produces code for a slice, do not move on until the slice would compile and run. The temptation is to "finish the whole thing first, then run the tests." It does not work, because the second slice often depends on assumptions that the first slice failed to satisfy, and you find out at the end with a stack of broken pieces instead of finding out turn by turn. Verify each slice before continuing. The discipline of incremental verification is what makes decomposition pay off.
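The verification prompt between slices is short and mechanical. A version for a TypeScript project with the usual npm scripts -- adjust the commands to your own toolchain -- might read:
The schema slice looks good. Before we start the API slice: run
npx tsc --noEmit and the tests for this slice, fix anything that fails,
and show me the output. Do not start the next slice until both are clean.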
Common Failure Modes and Recovery Patterns
The agent fails in predictable ways. The failures are not random; they cluster into a small number of recurring patterns, and recognizing the pattern is the first step in recovery. The practitioners who get good at this build up a small mental library of "this is the X failure -- here is the fix" reflexes, and the library gets you out of dead ends quickly instead of grinding through the same kind of issue every time.
The hallucinated import. The agent confidently uses an API that does not exist -- a function from a library that was never installed, a method on an object that does not have it, an export from a file that does not export it. The signal is usually that the code looks plausible but tests fail with "X is not a function" or the type checker complains about an unknown export. The fix is to point the agent at the actual surface area: "the @auth/core package only exports A, B, and C; rewrite using those." The agent will accept the correction and rewrite. The prevention is to name your dependencies in the constraint section and to point the agent at the file containing the helper you actually want it to use.
The wrong-file edit. The agent edits a file that is similar to the one you wanted but is not the right one. Often happens in monorepos where multiple files have the same name, or in projects where two similar components exist for different contexts. The signal is that the changes show up in the wrong package, or in the legacy version of a file rather than the new version. The fix is to specify the file path in absolute or near-absolute terms next time, and to undo the edit and redirect. The prevention is to put exact file paths in the expected-output section of every non-trivial prompt.
The misread intent. The agent does something that is reasonable for "what you said" but not for "what you meant." You asked for a search box; the agent built a search box that searches across all entities in the database. You wanted it to search just one entity. The signal is that the output is more elaborate than the goal called for, or solves an adjacent problem rather than the actual one. The fix is to be more specific about scope on the next try. The prevention is to include the boundary explicitly: "search posts only; do not include comments or users; we'll add those later if needed."
The unstated-constraint break. The agent produces code that is technically correct but breaks an unstated rule. The auth function bypasses the rate limiter you built. The new endpoint does not pass the tenant ID through. The new query does not respect the soft-delete column. The signal is that the code looks fine but causes a subtle issue downstream. The fix is to add the missing constraint and ask for a corrected version. The prevention -- and this is the one that pays back the most -- is to maintain a project-level instruction file that names the constraints once, so they apply automatically to every prompt in the session.
Hallucinated import: code uses an API that does not exist. Recovery: point at the real surface area; rewrite using the actual exports.
Wrong-file edit: changes land in a similar but incorrect file. Recovery: specify file paths exactly; undo and redirect to the correct path.
Misread intent: solves an adjacent problem, not the actual one. Recovery: restate the goal with explicit scope boundaries; ask for a focused fix.
Unstated-constraint break: technically correct, breaks a hidden rule. Recovery: add the missing constraint; ask for a corrected version; document it.
One pattern across all four: when the agent fails, do not just say "that's wrong." Name the failure type and the correct rule. "You used a function that does not exist; the package only exports X, Y, Z" is more useful than "that's broken." The named correction sticks for the rest of the session. Unnamed corrections do not, and you end up correcting the same kind of failure three times.
The deeper recovery pattern, when a session has accumulated multiple failures and is going sideways: reset. Open a new session. Write a one-paragraph summary of the goal, the relevant files, the constraints that have come up, and the part that is currently working. Continue from the new session. The reset is faster than trying to recover a confused conversation, and the output quality usually jumps. Knowing when to reset is one of the underrated skills in this practice.
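The handoff summary can be one short block. A sketch, reusing the delete-account example from earlier -- details hypothetical:
Fresh session, continuing earlier work. Goal: a delete-account flow.
Done and working: the settings-page section and the confirm modal.
Remaining: the POST /api/account/delete route and its test.
Constraints that came up: soft-delete via deletedAt, wrap the delete in a
transaction, no new dependencies.
Start by reading src/app/(account)/settings/page.tsx, then propose the
route handler.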
Prompt Templates Worth Memorizing
The patterns above can be packaged into templates. The templates are not magic, and they are not the whole job -- you still need to fill them with specifics for your codebase -- but they save you from re-discovering the structure of a good prompt every time. Memorize a small number of templates, adapt them to your project, and the cost of writing the next prompt drops to near zero.
Four templates carry most of the weight. They cover feature additions, bug fixes, refactors, and code reviews. Each one bakes in the four components from earlier (context, intent, constraints, expected output) and a small amount of structure that is specific to the kind of work being done.
The "feature add" template
Goal: [one-sentence goal of the feature]
Context:
- Project: [framework, language, app router or pages, etc.]
- Relevant files: [paths the agent should read first]
- Existing helpers to reuse: [auth helper, db helper, ui primitives, etc.]
Intent:
- [what the user-visible behavior should be]
- [what the underlying mechanism should be]
Constraints:
- Reuse [specific helper] -- do not introduce a new [auth/db/state] approach
- Follow conventions in [reference file]
- No new dependencies
- [any other constraint specific to this project]
Expected output:
- Edit: [specific file paths]
- Add: [new file paths]
- Test: [test file path and what it should cover]
- Summary: list changed files at the end
The "bug fix" template
Bug fixes have a different shape because the goal is "make the broken thing work" rather than "build a new thing." The template puts more weight on the symptom and the suspected cause, less on file paths -- the agent will discover the file paths by reading. The full debugging workflow has its own topic in this curriculum; here is the prompt-shape for handing a bug to the agent.
Bug: [one-sentence description of the broken behavior]
Symptoms:
- Expected: [what should happen]
- Actual: [what is happening]
- Where it shows up: [page, endpoint, function]
Suspected cause: [your best guess, or "unknown"]
Reproduction:
- [steps or test that shows the bug]
Constraints:
- Fix the root cause, not the symptom
- Do not silence the error or wrap it in a try/catch unless that's the actual fix
- Add a test that fails on the bug and passes after the fix
- Do not modify unrelated code
Expected output:
- The fix
- The new test
- A 2-3 line explanation of the actual cause
The "refactor" template
Refactors are the most dangerous prompts to send because the agent's instinct is to rewrite more than you wanted. The template is heavy on what NOT to change. The discipline of stating the boundary is more important than stating the goal.
Refactor: [what is being refactored]
Current shape: [how the code is structured now]
Target shape: [how the code should be structured after]
Reason: [why the refactor is happening]
Constraints:
- Behavior must not change. All existing tests must still pass without modification.
- Do not refactor anything outside [specific scope].
- Do not add or remove dependencies.
- Do not "improve" things that are unrelated to this refactor.
- If you find a bug while refactoring, leave a comment with TODO and tell me; do not fix it in this PR.
Expected output:
- The refactored files
- Confirmation that existing tests still pass
- List of any TODOs added
- Brief diff summary
The "code review" template
Asking the agent to review code is one of the most undervalued uses of a coding agent, and one that does not involve writing code at all. The agent is a competent reviewer, especially for the categories of issues that are pattern-recognizable -- security mistakes, performance issues, common bugs, style inconsistencies. The template is short because the agent does most of the work.
Review: src/[file path] (or PR diff pasted below).
Check for:
- Security issues (injection, missing auth checks, sensitive data in logs)
- Performance issues (N+1 queries, unnecessary re-renders, missing indexes)
- Correctness issues (off-by-ones, race conditions, missing error handling)
- Style issues vs the conventions in [reference file]
- Anything that should be tested but is not
For each finding:
- Severity: high / medium / low
- Where: file and line
- Why it matters
- Suggested fix
Do not rewrite the code in this turn. Just produce the review.
The thirty minutes you spend writing your first solid template saves hours over the next thirty features. The templates are not a substitute for engineering judgment, but they are a substitute for re-thinking the structure of a good prompt every time. Adapt them to your project, save them in a notes file, and copy-paste-modify when you start work. The discipline pays off the way every well-built tool pays off -- you stop noticing it because it just works.
One adaptation worth doing: rewrite each template with your project's specifics filled in, and save the project-specific versions. The generic template is "reuse the existing auth helper"; the project-specific version is "use the useAuth hook from src/lib/auth.ts which returns { user, signIn, signOut, session }." The project-specific version is the one you actually use, and it shrinks every prompt you write by a few lines because the agent already has the specifics.
A Few Higher-Order Habits
The templates and patterns above cover the mechanics. A few habits sit on top of the mechanics and matter more over the course of months than any single prompt does. They are the difference between a practitioner who has the moves memorized and one who has internalized the discipline.
Maintain a project instruction file. Most coding agents support a project-level file that gets loaded into every session -- CLAUDE.md for Claude Code, .cursorrules for Cursor, similar files for the others. Use it. Put the conventions, the architectural decisions, the constraints, the file-path map, and anything else the agent needs to know about your project. The agent reads it for free at the start of every session, and the cost of one well-written file is paid back across hundreds of subsequent prompts. This is one of the highest-return things you can do in this practice, and it is the place where most beginners under-invest.
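A minimal sketch of such a file, with the conventions borrowed from the examples on this page -- every specific here is a stand-in for your project's own:
# CLAUDE.md
Stack: Next.js App Router, TypeScript strict mode, Prisma, NextAuth.
Conventions:
- Query helpers in src/lib/db.ts return [result, error]; never throw from them.
- Reuse the ui primitives in src/components/ui before creating new ones.
Constraints:
- No new dependencies without asking first.
- Never silence type errors with "as any"; fix the type or flag it.
- Deletes are soft: set deletedAt, never remove rows.
Map: auth in src/lib/auth.ts, validators in src/lib/validators.ts, tests in __tests__/.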
Keep prompt notes. When you write a prompt that worked unusually well, save it. When you discover a phrase that fixes a recurring failure, save it. When you find an anti-example that prevents a bad default, save it. Over a few weeks, the notes turn into a personalized cookbook that you can reach for. The cookbook is more useful than any general guide because it is calibrated to your codebase and your workflow.
Read the agent's output. This sounds obvious and is the single rule that gets violated the most. Read every diff. Read the function bodies, not just the summaries. Read the test file the agent wrote, not just the headline that it wrote one. The agent's output looks plausible by default; the work of catching the implausible parts is review work, and review work that does not happen is review work that happens later when the bug shows up in production. The thirty seconds you save by skimming a diff is paid back at a steep multiplier when the un-skimmed code breaks.
Push back on the first answer when it does not feel right. The agent's first answer is often correct, and it is sometimes the median answer when a better answer exists. If you read the output and something feels off but you cannot articulate why, ask the agent for two alternative implementations and a comparison. The exercise often surfaces the better path. The agent will produce the alternatives without complaint, and the comparison reveals the trade-offs that the first answer hid. This is one of the cheapest ways to escape a "first thing the agent reaches for" trap.
Treat the prompt as code. Save the prompts that work. Version them when you change them. Notice when a prompt that used to work stops working, the same way you would notice a flaky test, and figure out what changed -- in the codebase, in the model, in your conventions. Prompts decay as the project evolves. Keeping them tracked means you notice the decay early and update them rather than discovering they no longer work in the middle of a feature.
The exact proportions shift by task, but the headline holds: review takes more time than writing the prompt. If your session has the proportions reversed -- ten minutes writing the prompt, two minutes scanning the output -- you are under-reviewing. Recalibrate. The prompt is the input; the output is what you ship. The output deserves more attention than the input.
A Note on Models and Tools
The patterns above hold across coding agents. Claude Code, Cursor in agent mode, GitHub Copilot in agent mode, OpenAI Codex CLI -- the prompt structure that works in one works in the others, with minor adjustments for the tool's interaction model. The discipline is more transferable than the specific tool.
That said, the choice of model matters for prompt engineering specifically. Larger context windows let you include more code in the prompt, which means less manual orientation work. Better instruction-following means constraints get respected more reliably, which means you spend less time correcting drift. Better self-critique means the "review your own work" pattern produces sharper feedback. The differences between top-tier models are not huge, but they compound across a long session.
The recommendation, calibrated to mid-2026: Claude Code with a current Claude model is the protagonist of this kind of work. The 200K context window holds the whole codebase for most projects. The instruction-following is strong, which means constraints stick. The agent runs in your terminal, reads your files, runs your tests, and iterates without constant babysitting. The interaction model rewards the discipline this page describes. Cursor users ship serious software too, and the editor-integrated workflow suits people who think in editor windows. So do Copilot users, especially in GitHub-heavy workflows. The differences between the best tools are smaller than the difference between using any of them well and using any of them poorly.
The model choice within Claude itself: Sonnet is the workhorse for most prompt-engineered code work; Opus for the heaviest reasoning, planning, and architectural work; Haiku for the small, fast, well-specified tasks where you want a quick turnaround. Most sessions live on Sonnet. The few-times-a-week reach for Opus is for when the problem is hard and the cost of a wrong answer is high. The few-times-a-day reach for Haiku is for the small jobs where speed matters more than depth. The switching is cheap and the returns on matching model to task are real.
The Compounding Argument
Closing on the claim that opens the page: prompt engineering for code is a craft, and the craft compounds. Every well-written prompt teaches you something about what works. Every bad output traces back to a missing piece in the prompt structure, and the missed piece becomes a thing you remember to include next time. The templates get sharper. The constraint sections get tighter. The decomposition gets more natural. The session hygiene gets cleaner. The total time spent per feature drops, not because the typing is faster, but because the prompt is more efficient and the iteration is shorter.
The improvised prompt is fast right now and slow over a project. It feels like progress because the prompt is short, but the output requires more cleanup, more correction, more retry. Over a sprint, the cost adds up. The engineered prompt is slow right now and fast over a project. It feels like overhead because writing the constraints takes time, but the output ships closer to correct on the first turn. Over a sprint, the engineering pays back many times over.
The discipline is teachable. The templates are reusable. The patterns repeat across projects. The investment is front-loaded and the return is back-loaded, which is the shape of every craft worth practicing. If you spend the next thirty minutes writing your first solid feature-add template, calibrated to your project, you will save hours over the next thirty features. That is not a hype claim. It is the standard arithmetic of writing tools that you keep using, and the writing of better prompts is exactly that kind of tool.
The rest of this curriculum goes deep on the surrounding skills. Context engineering -- the art of giving the model what it needs and nothing more. Agent instruction files and how to write them well. Debugging when the agent has gone off the rails. Tool selection and model selection in detail. Each one builds on the discipline this page introduces. Prompt engineering for code is the foundation; the surrounding topics are the rooms built on top of it. Learn this one well, and the rest follow.
