AI Pair Programming

In 2026, AI pair programming is no longer a question of "will this be useful." Most engineers using these tools daily would not return to working without them. The remaining questions are *which* tool, *how* to use it, and *where* it falls down.

This page is the working engineer's view: which tools are available, how they differ, and the workflow patterns that produce real productivity instead of theatre.

The tool landscape

| Tool | Strength | Weakness | Best for |
|---|---|---|---|
| **GitHub Copilot** | Inline completion in IDE; ubiquitous; integrated everywhere | Tab-completion mindset; less powerful for multi-file changes | Augmented typing |
| **Cursor** | AI-first IDE built on VS Code; agent mode for multi-file work | Subscription cost; AI judgment in your IDE moves fast | Mid-complexity edits across files |
| **Claude Code** | Terminal-native; extensive tool use; agent harness | Less GUI-friendly; opinionated | Repository-scale tasks, autonomous work |
| **Aider** | Open source, terminal, git-aware | Less polished; smaller feature set | Self-hosted, model-flexible workflows |
| **Continue.dev** | Open source IDE plugin | Newer; growing | Self-hosted with open-weights models |
| **Cody** (Sourcegraph) | Strong code search + LLM | Less of an advantage in small codebases | Enterprise, large monorepos |

The sub-categories are converging: Copilot has agent mode, Cursor has tab completion. Choose based on your workflow preferences, not feature checklists.

What "AI pair programming" actually buys you

Honest accounting from observed practice:

- **Boilerplate reduction**: roughly 80% of typing eliminated for routine code (tests, model classes, CRUD endpoints, parsing), and closer to 95% for pattern-following code.

- **Refactoring across files**: rename, extract, restructure — 5-10× faster than manual.

- **Documentation generation**: docstrings, README updates, change descriptions. The model knows the code; let it write about it.

- **Translating ideas to syntax**: "implement X using Y library" — the model handles the API surface; you handle the design.

- **Test generation**: covers happy path well; misses edge cases; useful as a starting point.

- **Debugging assistance**: paste the error, the code, the stack trace. Often points to the bug in seconds.

- **Onboarding to unfamiliar code**: "explain this codebase / file / function" works.

What it doesn't buy you:

- **Architecture decisions.** Models can articulate tradeoffs but won't make the right call without your context.

- **Domain-specific judgement.** "Should this be one service or two" — the model doesn't know your team, your scale, your constraints.

- **Subtle correctness.** Off-by-one errors, race conditions, security vulnerabilities — models miss these regularly.

- **Replacement for thinking.** Engineers who code-by-completion produce worse code than ones who think first and use AI to type faster.

The workflows that work

"AI as autocomplete on steroids"

The Copilot baseline. Type a function signature, get a suggestion, accept or modify. Best for:

- Highly patterned code where you know what you want; the model just types it.

- Test boilerplate.

- Small fills (writing a JSON parser, a regex, a formatter).

When it goes wrong: the model writes plausible-looking code that does the wrong thing because the function signature was ambiguous. Always read what was suggested before accepting.
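
A small illustration of the ambiguity problem, assuming a hypothetical de-duplication helper (neither function name comes from a real codebase). The first signature says nothing about ordering, so a completion built on `set` looks plausible and silently loses order. The second pins the intended behaviour through its name, type hints, and docstring.

```python
from collections.abc import Hashable, Iterable
from typing import TypeVar

T = TypeVar("T", bound=Hashable)


# Ambiguous: nothing states whether order matters, so a completion such as
# this one looks right while discarding the original ordering.
def dedupe(items):
    return list(set(items))


# Clearer: the name, type hints, and docstring tell both the model and the
# reviewer what "correct" means.
def dedupe_preserving_order(items: Iterable[T]) -> list[T]:
    """Return items with duplicates removed, keeping first occurrences in order."""
    seen: set[T] = set()
    result: list[T] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```

Reading the suggestion still matters with the clearer signature; it just narrows the space of plausible wrong answers.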

"Conversation-driven implementation"

Cursor's chat mode, Claude Code's basic interaction. You describe what you want; the model implements; you iterate.

```
You: Add a function to parse semver strings; handle pre-release and build metadata.
AI: Here's the implementation... [presents code with explanation]
You: Looks good but pre-release shouldn't accept leading zeros per spec.
AI: Updated. [revised code]
```

Strong for medium-complexity tasks. Weakness: requires you to know enough to spot what the model got wrong.
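
For concreteness, here is a minimal sketch of where the exchange above might end up. It is not the output of any particular tool; the names `SemVer` and `parse_semver` are illustrative. It follows the Semantic Versioning 2.0.0 grammar: numeric identifiers (including numeric pre-release identifiers) may not have leading zeros, while build metadata identifiers may.

```python
import re
from dataclasses import dataclass

# Building blocks of the SemVer 2.0.0 grammar.
_NUM = r"(?:0|[1-9][0-9]*)"                               # no leading zeros
_PRE_ID = rf"(?:{_NUM}|[0-9]*[A-Za-z-][0-9A-Za-z-]*)"     # numeric or alphanumeric
_BUILD_ID = r"[0-9A-Za-z-]+"                              # leading zeros allowed

_SEMVER = re.compile(
    rf"(?P<major>{_NUM})\.(?P<minor>{_NUM})\.(?P<patch>{_NUM})"
    rf"(?:-(?P<pre>{_PRE_ID}(?:\.{_PRE_ID})*))?"
    rf"(?:\+(?P<build>{_BUILD_ID}(?:\.{_BUILD_ID})*))?"
)


@dataclass(frozen=True)
class SemVer:
    major: int
    minor: int
    patch: int
    prerelease: tuple[str, ...] = ()
    build: tuple[str, ...] = ()


def parse_semver(text: str) -> SemVer:
    """Parse a version string, raising ValueError if it is not valid SemVer."""
    match = _SEMVER.fullmatch(text)  # fullmatch rejects trailing junk and newlines
    if match is None:
        raise ValueError(f"not a valid semantic version: {text!r}")
    pre = match.group("pre")
    build = match.group("build")
    return SemVer(
        major=int(match.group("major")),
        minor=int(match.group("minor")),
        patch=int(match.group("patch")),
        prerelease=tuple(pre.split(".")) if pre else (),
        build=tuple(build.split(".")) if build else (),
    )
```

With this sketch, `parse_semver("1.2.3-alpha.1+build.5")` succeeds and `parse_semver("1.2.3-01")` raises `ValueError`, which is the leading-zero case the reviewer in the transcript had to know to ask about.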

"Agent-driven autonomous work"

Claude Code, Cursor agent mode, Aider. You describe a task at higher level; the model plans, makes changes across files, runs tests, iterates.

```
You: Add OAuth login to the users module. Match our existing auth pattern.
[AI reads existing auth code, plans the change, writes the new code,
modifies routes, adds tests, runs the test suite, fixes failures]
You: [reviews the diff, approves or asks for changes]
```

Strongest for moderate-scope features. The "moderate scope" part is critical — autonomous work on small tasks is overkill; on large tasks it produces sprawling changes that are hard to review.
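
The propose, apply, test, iterate cycle described above can be sketched as a bounded loop. This is a schematic under stated assumptions, not how any of the named tools is implemented: `propose_patch`, `apply_patch`, and `run_tests` are hypothetical callables you would supply, and `max_iterations` is an illustrative cap.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SuiteResult:
    passed: bool
    output: str  # failure text fed back to the model on the next round


def run_bounded_agent(
    task: str,
    propose_patch: Callable[[str, str], str],  # hypothetical: (task, feedback) -> patch
    apply_patch: Callable[[str], None],        # hypothetical: writes the patch to the repo
    run_tests: Callable[[], SuiteResult],      # hypothetical: runs the project's test suite
    max_iterations: int = 5,
) -> bool:
    """Iterate until the tests pass or the iteration cap is reached."""
    feedback = ""
    for _ in range(max_iterations):
        patch = propose_patch(task, feedback)
        apply_patch(patch)
        result = run_tests()
        if result.passed:
            return True  # hand the diff to a human for review
        feedback = result.output
    return False  # cap exhausted: stop and escalate rather than keep going
```

The cap is the design point: a loop that cannot stop on its own is what turns a moderate-scope task into a sprawling diff.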

"AI as code reviewer"

Run the model against your changes before submitting a PR:

```
You: Review my last commit. Find bugs, missed edge cases, style issues.
AI: [structured feedback]
```

Catches a meaningful fraction of bugs, especially in unfamiliar areas. See [AiForCodeReview]().

What good prompting looks like

Vague: "Fix this bug." Useless without context.

Better: "This function should return ascending sorted dates. Sometimes it's returning descending. The bug appeared after the recent timezone refactor. Test in `test_dates.py:test_sort_orders` is failing."

The pattern: (a) what the code should do, (b) what it's doing, (c) what changed, (d) where the failing test is. Pretend you're emailing a colleague who knows the codebase but not this issue.
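
If a team wants to make the four-part pattern habitual, one option is a tiny template that refuses to render until every part is filled in. This is only a sketch; the `BugReport` class and its field names are invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class BugReport:
    expected: str       # (a) what the code should do
    actual: str         # (b) what it is doing
    recent_change: str  # (c) what changed
    failing_test: str   # (d) where the failing test is

    def __post_init__(self) -> None:
        # A blank field means the prompt is back to "fix this bug".
        for f in fields(self):
            if not getattr(self, f.name).strip():
                raise ValueError(f"missing context: {f.name}")

    def to_prompt(self) -> str:
        return "\n".join([
            f"Expected behaviour: {self.expected}",
            f"Actual behaviour: {self.actual}",
            f"Recent change: {self.recent_change}",
            f"Failing test: {self.failing_test}",
        ])


report = BugReport(
    expected="This function should return dates sorted ascending.",
    actual="Sometimes it returns them descending.",
    recent_change="The bug appeared after the recent timezone refactor.",
    failing_test="test_dates.py:test_sort_orders",
)
print(report.to_prompt())
```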

For larger work:

- **Specify the abstractions you want.** "Use a strategy pattern with three concrete implementations" — the model needs the structural decision; you make it.

- **Specify the existing patterns.** "Match the style of our other repository classes." Show one as a template.

- **Specify the constraints.** "Don't add new dependencies." "This must work in Python 3.10+."

- **Specify what done looks like.** "All tests pass; coverage doesn't drop; lint clean."

Anti-patterns

- **Accepting the first suggestion.** Models are confidently wrong. Read what they wrote.

- **"Implement everything."** Vague large requests produce sprawling unfocused changes. Decompose first.

- **No tests.** Without tests, AI-generated code can appear to work while quietly failing. Tests catch regressions; they matter more when AI moves fast.

- **Skipping code review.** AI-written code needs review, possibly more careful review than human-written code, because its failure modes are different.

- **Letting AI fix tests by changing assertions.** Common pattern: test fails → AI "fixes" by relaxing the assertion. Watch for this.

- **Running agents on production credentials.** Until permission scoping is more mature, don't.

- **Multi-step autonomous work without supervision.** Set a budget (in time or in cost) so that the agent is bounded.

Productivity, honestly

Studies (GitHub's Copilot study, 2023; subsequent academic work) report productivity gains of roughly 20-50% on controlled coding tasks. Real-world experience varies more:

- **Junior engineers** see large gains on routine work, smaller gains on complex work; risk of producing code they don't understand.

- **Senior engineers** see meaningful gains on typing-heavy work; smaller relative gains where their bottleneck was thinking, not typing.

- **Across the board**, the gains compound when AI is used for documentation, refactoring, and exploration alongside straight code generation.

The common pattern after a year of using AI tooling: engineers report doing more work in the same time, with less of that time spent on parts of the job they don't enjoy (boilerplate, mechanical refactoring, finding the syntax for a library).

When AI pair programming gets bad reviews

Often it's because the team:

- Skipped review and shipped bugs.

- Used the tool on tasks where it can't help (architectural design).

- Didn't invest in good prompting habits.

- Tried full autonomous mode on tasks that need supervision.

- Optimised for speed over correctness; got incidents.

These aren't tool failures; they're tool-misuse failures. A power tool can build a house faster or take off a finger faster, and the workflow decides which.

A pragmatic adoption pattern

For a team adopting AI pair programming:

1. **Start with autocomplete.** Copilot or equivalent. Low risk; high leverage; shallow learning curve.

2. **Add chat for medium work.** Cursor or Claude Code or equivalent for iterative work.

3. **Add agents for repetitive features.** "Implement this CRUD" or "add this column everywhere it appears" — bounded autonomous work.

4. **Add AI code review.** Before PRs. Catches a chunk of issues early.

5. **Don't lose code review discipline.** Human review still matters.

6. **Track outcomes.** Bug rate, deploy frequency, time to merge. AI should improve these; if it doesn't, examine your workflow.
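
Step 6 only works if the numbers are cheap to compute. As a minimal sketch, assuming you can export merged pull requests and deploy timestamps from your own tooling (the `MergedPR` record, its `caused_bug` flag, and both function names are hypothetical), the three metrics reduce to a few lines:

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class MergedPR:
    opened_at: datetime
    merged_at: datetime
    caused_bug: bool  # hypothetical: set when a later bug ticket links back to this PR


def summarise(prs: list[MergedPR]) -> dict[str, float]:
    """Median time to merge (hours) and the share of PRs linked to a bug."""
    if not prs:
        raise ValueError("no merged PRs in this period")
    hours = [(pr.merged_at - pr.opened_at).total_seconds() / 3600 for pr in prs]
    return {
        "median_hours_to_merge": median(hours),
        "bug_rate": sum(pr.caused_bug for pr in prs) / len(prs),
    }


def deploys_per_week(deploys: list[datetime], period_days: int) -> float:
    """Average number of deploys per week over a period of the given length."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    return len(deploys) / (period_days / 7)
```

Compute the same summary for a period before adoption and a period after it. A comparison like this is indicative at best, since team size, project phase, and release pressure move the same numbers.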

Six months in, most teams find AI tooling indispensable. The tooling will have changed by then; the workflow patterns above will have evolved less.

Further reading

- [AiForCodeReview]() — AI-assisted review specifically

- [AiForSoftwareTesting]() — AI in test workflows

- [AiAugmentedWorkflows]() — broader AI-augmented work patterns

- [AgenticArchitecture]() — when the AI is the agent, not the tool

- [AcceleratingAiLearning]() — building competence with AI tooling