AI Pair Programming
In 2026, AI pair programming is no longer a question of "will this be useful." Most engineers using these tools daily would not return to working without them. The remaining questions are *which* tool, *how* to use it, and *where* it falls down.
This page is the working engineer's view: what's available, how they differ, and the workflow patterns that produce real productivity instead of theatre.
The tool landscape
| Tool | Strength | Weakness | Best for |
|---|---|---|---|
| **GitHub Copilot** | Inline completion in IDE; ubiquitous; integrated everywhere | Tab-completion mindset; less powerful for multi-file changes | Augmented typing |
| **Cursor** | AI-first IDE built on VS Code; agent mode for multi-file work | Subscription cost; AI judgment in your IDE moves fast | Mid-complexity edits across files |
| **Claude Code** | Terminal-native; extensive tool use; agent harness | Less GUI-friendly; opinionated | Repository-scale tasks, autonomous work |
| **Aider** | Open source, terminal, git-aware | Less polished; smaller feature set | Self-hosted, model-flexible workflows |
| **Continue.dev** | Open source IDE plugin | Newer; growing | Self-hosted with open-weights models |
| **Cody** (Sourcegraph) | Strong code search + LLM | Stronger in large codebases | Enterprise, large monorepos |
The sub-categories are converging: Copilot has agent mode, Cursor has tab completion. Choose based on your workflow preferences, not feature checklists.
What "AI pair programming" actually buys you
Honest accounting from observed practice:
- **Boilerplate reduction**: 80% of typing eliminated for routine code (tests, model classes, CRUD endpoints, parsing). Closer to 95% for pattern-following code.
- **Refactoring across files**: rename, extract, restructure — 5-10× faster than manual.
- **Documentation generation**: docstrings, README updates, change descriptions. The model knows the code; let it write about it.
- **Translating ideas to syntax**: "implement X using Y library" — the model handles the API surface; you handle the design.
- **Test generation**: covers happy path well; misses edge cases; useful as a starting point.
- **Debugging assistance**: paste the error, the code, the stack trace. Often points to the bug in seconds.
- **Onboarding to unfamiliar code**: "explain this codebase / file / function" works.
What it doesn't buy you:
- **Architecture decisions.** Models can articulate tradeoffs but won't make the right call without your context.
- **Domain-specific judgement.** "Should this be one service or two" — the model doesn't know your team, your scale, your constraints.
- **Subtle correctness.** Off-by-one errors, race conditions, security vulnerabilities — models miss these regularly.
- **Replacement for thinking.** Engineers who code-by-completion produce worse code than ones who think first and use AI to type faster.
The workflows that work
"AI as autocomplete on steroids"
The Copilot baseline. Type a function signature, get a suggestion, accept or modify. Best for:
- Highly patterned code where you know what you want; the model just types it.
- Test boilerplate.
- Small fills (writing a JSON parser, a regex, a formatter).
When it goes wrong: the model writes plausible-looking code that does the wrong thing because the function signature was ambiguous. Always read what was suggested before accepting.
"Conversation-driven implementation"
Cursor's chat mode, Claude Code's basic interaction. You describe what you want; the model implements; you iterate.
```
You: Add a function to parse semver strings; handle pre-release and build metadata.
AI: Here's the implementation... [presents code with explanation]
You: Looks good but pre-release shouldn't accept leading zeros per spec.
AI: Updated. [revised code]
```
Strong for medium-complexity tasks. Weakness: requires you to know enough to spot what the model got wrong.
"Agent-driven autonomous work"
Claude Code, Cursor agent mode, Aider. You describe a task at higher level; the model plans, makes changes across files, runs tests, iterates.
```
You: Add OAuth login to the users module. Match our existing auth pattern.
[AI reads existing auth code, plans the change, writes the new code,
modifies routes, adds tests, runs the test suite, fixes failures]
You: [reviews the diff, approves or asks for changes]
```
Strongest for moderate-scope features. The "moderate scope" part is critical — autonomous work on small tasks is overkill; on large tasks it produces sprawling changes that are hard to review.
"AI as code reviewer"
Run the model against your changes before submitting a PR:
```
You: Review my last commit. Find bugs, missed edge cases, style issues.
AI: [structured feedback]
```
Catches a meaningful fraction of bugs, especially in unfamiliar areas. See [AiForCodeReview]().
What good prompting looks like
Vague: "Fix this bug." Useless without context.
Better: "This function should return ascending sorted dates. Sometimes it's returning descending. The bug appeared after the recent timezone refactor. Test in `test_dates.py:test_sort_orders` is failing."
The pattern: (a) what the code should do, (b) what it's doing, (c) what changed, (d) where the failing test is. Pretend you're emailing a colleague who knows the codebase but not this issue.
For larger work:
- **Specify the abstractions you want.** "Use a strategy pattern with three concrete implementations" — the model needs the structural decision; you make it.
- **Specify the existing patterns.** "Match the style of our other repository classes." Show one as a template.
- **Specify the constraints.** "Don't add new dependencies." "This must work in Python 3.10+."
- **Specify what done looks like.** "All tests pass; coverage doesn't drop; lint clean."
Anti-patterns
- **Accepting the first suggestion.** Models are confidently wrong. Read what they wrote.
- **"Implement everything."** Vague large requests produce sprawling unfocused changes. Decompose first.
- **No tests.** Without tests, AI-generated code might appear to work and quietly not. Tests catch regressions; they're more important when AI moves fast.
- **Skipping code review.** AI-written code needs review; possibly more carefully than human-written code (different failure modes).
- **Letting AI fix tests by changing assertions.** Common pattern: test fails → AI "fixes" by relaxing the assertion. Watch for this.
- **Running agents on production credentials.** Until permission scoping is more mature, don't.
- **Multi-step autonomous work without supervision.** Set a budget (in time or in cost); the agent is bounded.
Productivity, honestly
Studies (GitHub's Copilot study, 2023; subsequent academic work) show 20-50% productivity gains on coding tasks. Real-world experience varies more:
- **Junior engineers** see large gains on routine work, smaller gains on complex work; risk of producing code they don't understand.
- **Senior engineers** see meaningful gains on typing-heavy work; smaller relative gains where their bottleneck was thinking, not typing.
- **Across the board**, the gains compound when AI is used for documentation, refactoring, and exploration alongside straight code generation.
The common pattern after a year of using AI tooling: engineers report doing more work in the same time, with less of that time spent on parts of the job they don't enjoy (boilerplate, mechanical refactoring, finding the syntax for a library).
When AI pair programming gets bad reviews
Often it's because the team:
- Skipped review and shipped bugs.
- Used the tool on tasks where it can't help (architectural design).
- Didn't invest in good prompting habits.
- Tried full autonomous mode on tasks that need supervision.
- Optimised for speed over correctness; got incidents.
These aren't tool failures; they're tool-misuse failures. The same way a power tool can build a house faster or take off a finger faster, the workflow matters.
A pragmatic adoption pattern
For a team adopting AI pair programming:
1. **Start with autocomplete.** Copilot or equivalent. Low risk; high leverage; low learning curve.
2. **Add chat for medium work.** Cursor or Claude Code or equivalent for iterative work.
3. **Add agents for repetitive features.** "Implement this CRUD" or "add this column everywhere it appears" — bounded autonomous work.
4. **Add AI code review.** Before PRs. Catches a chunk of issues early.
5. **Don't lose code review discipline.** Human review still matters.
6. **Track outcomes.** Bug rate, deploy frequency, time to merge. AI should improve these; if it doesn't, examine your workflow.
Six months in, most teams find AI tooling indispensable. The tooling will have changed by then; the workflow patterns above will have evolved less.
Further reading
- [AiForCodeReview]() — AI-assisted review specifically
- [AiForSoftwareTesting]() — AI in test workflows
- [AiAugmentedWorkflows]() — broader AI-augmented work patterns
- [AgenticArchitecture]() — when the AI is the agent, not the tool
- [AcceleratingAiLearning]() — building competence with AI tooling