Batch API Design: The Mathematics of Bulk Data Transfer

Calling a REST API once per item is perfectly acceptable for a small number of entities ($N$). However, as $N$ grows large, or as network latency increases, the overhead of individual HTTP requests will catastrophically degrade system performance.

The batch API design problem is deceptively complex. It is not simply a matter of allowing an array in a JSON payload. Architects must solve for partial failures, atomic transactional boundaries, HTTP semantic mapping, and idempotency guarantees. When a client sends a request to process $100K worth of transactions in a single batch, and the connection drops halfway through, the API design determines whether the system gracefully recovers or double-charges the users.

This guide explores the foundational "why" of batching, the structural patterns that actually work in production, and the dangerous anti-patterns that plague junior implementations.

1. The "Why": The Physics of Network Overhead

To understand why batching is mandatory at scale, one must look at the physical overhead of an HTTP request.

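The Protocol Tax: Every API call pays for at least one network round trip, HTTP header parsing, and authentication validation (e.g., verifying a JWT signature). A call that opens a new connection additionally pays for DNS resolution, the TCP handshake (SYN, SYN-ACK, ACK), and TLS negotiation (which involves expensive cryptographic operations). Connection reuse (HTTP keep-alive, HTTP/2) removes the setup cost for later calls, but not the per-request costs.
The Database Tax: On the server side, a single API request typically acquires a database connection (usually from a pool), begins a transaction, executes a query, commits, and releases the connection.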

If a client needs to create 1,000 users, making 1,000 separate POST /users requests forces the client and server to pay this "tax" 1,000 times. If the network round-trip time is 50ms, the theoretical floor for completing this task serially is 50 seconds, not including server processing time.
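
Written out, the estimate above is a simple latency model: serial time is roughly N × (RTT + per-item server time), while one batch costs roughly RTT + N × per-item server time. The sketch below encodes that model; it assumes strictly sequential calls, one round trip per request, and a fixed per-item server cost, and the function names and the `server_ms_per_item` parameter are illustrative, not measurements.

```python
def serial_time_ms(n_items: int, rtt_ms: float, server_ms_per_item: float = 0.0) -> float:
    """Estimated wall-clock time for N sequential single-item requests.

    Each request pays one full network round trip plus its own server-side work.
    """
    return n_items * (rtt_ms + server_ms_per_item)


def batched_time_ms(n_items: int, rtt_ms: float, server_ms_per_item: float = 0.0) -> float:
    """Estimated wall-clock time for one batch request carrying all N items.

    The round trip is paid once; server-side work still scales with N.
    Payload transfer time (which grows with batch size) is ignored.
    """
    return rtt_ms + n_items * server_ms_per_item


if __name__ == "__main__":
    # 1,000 items over a 50 ms round trip, ignoring server processing time.
    print(serial_time_ms(1_000, 50))   # 50000.0 (ms), i.e. 50 seconds
    print(batched_time_ms(1_000, 50))  # 50.0 (ms)
```

The model makes the trade-off visible: batching removes the N × RTT term, but it cannot remove the per-item server work, which is why the database side matters as much as the network side.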

The Batching Solution: By combining all 1,000 users into a single POST /users/batch request, the round trip and any connection setup are paid exactly once. Furthermore, the server can open a single database transaction, use a prepared SQL statement (INSERT INTO users ...), and execute a bulk insert. A task that took more than 50 seconds serially can then complete in a fraction of a second (on the order of 500 milliseconds, depending on payload size and database speed).
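
A minimal sketch of the server side of that bulk insert, using Python's built-in `sqlite3` module as a stand-in for the production database. The `users` table and its columns are hypothetical. `executemany` reuses one parameterized statement for every row, and the `with conn:` block groups all inserts into a single transaction that commits once, or rolls back if any row fails.

```python
import sqlite3


def bulk_insert_users(conn: sqlite3.Connection, users: list[dict]) -> int:
    """Insert all users with one parameterized statement inside one transaction."""
    rows = [(u["email"], u["name"]) for u in users]
    # `with conn` commits on success and rolls back if any insert raises.
    with conn:
        conn.executemany("INSERT INTO users (email, name) VALUES (?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE, name TEXT)")
    batch = [{"email": f"user{i}@example.com", "name": f"User {i}"} for i in range(1_000)]
    print(bulk_insert_users(conn, batch))  # 1000
```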

2. Request and Response Shapes

The most common structural pattern for a batch API mirrors the request payload in the response payload.

2.1 The Request Structure

A well-designed batch request accepts an array of operations. It is critical to enforce a maximum batch size. Without hard limits (e.g., maximum 1,000 items or 5MB payload size), a malicious or buggy client can send an arbitrarily large JSON array, causing the server to run out of memory (OOM) and crash while attempting to parse the payload.

POST /api/orders/batch
{
    "operations": [
        { "id": "req-1", "amount": 100.00 },
        { "id": "req-2", "amount": 250.50 }
    ]
}
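
One way to enforce the limits described above, sketched in Python under the assumption that the framework hands the handler the raw request body as bytes. `MAX_BATCH_BYTES` and `MAX_BATCH_ITEMS` mirror the example limits; `BatchRejected` and its error strings are hypothetical. In a real server the byte cap should also be applied while reading the body (for example by rejecting an oversized Content-Length up front), because checking `len(raw)` only helps once the bytes are already in memory.

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # 5 MB payload cap
MAX_BATCH_ITEMS = 1_000            # maximum operations per batch


class BatchRejected(Exception):
    """Hypothetical error raised when the batch envelope itself is unacceptable."""


def parse_batch(raw: bytes) -> list[dict]:
    # Check the size before parsing so an oversized body never reaches the JSON parser.
    if len(raw) > MAX_BATCH_BYTES:
        raise BatchRejected("payload_too_large")
    try:
        body = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError):
        raise BatchRejected("malformed_json") from None
    ops = body.get("operations") if isinstance(body, dict) else None
    if not isinstance(ops, list):
        raise BatchRejected("operations_must_be_an_array")
    # Reject the whole batch rather than silently truncating it.
    if len(ops) > MAX_BATCH_ITEMS:
        raise BatchRejected("too_many_operations")
    return ops
```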

2.2 The HTTP Status Code Dilemma

How do you return an HTTP status code if item 1 succeeds but item 2 fails?

The Standard Approach: The most common convention is to return HTTP 200 OK. The 200 signifies only that the batch request was successfully received, parsed, and processed by the server; the success or failure of each individual item is reported in the JSON response body.
Why not 400 Bad Request? If you return a 400 when some items fail, naive HTTP clients will assume the entire request was rejected and will retry the entire batch, causing massive double-processing issues.
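What about 207 Multi-Status? WebDAV (RFC 4918) defines 207 Multi-Status for exactly this scenario, but its specification assumes an XML multistatus response body, and generic HTTP clients (like Axios or Fetch) simply treat it as another 2xx success without interpreting the per-item results. Because it offers little practical benefit over 200 for JSON APIs, it is generally avoided in modern REST APIs.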
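
The convention can be sketched as a handler that reserves 4xx codes for problems with the batch envelope and answers 200 whenever the batch was processed, even if individual items failed. The `process_item` callback, the `ItemError` exception, and the `(status, body)` return shape are hypothetical; the result fields follow the response example in section 3.2.

```python
from typing import Callable


class ItemError(Exception):
    """Hypothetical per-item failure carrying a machine-readable code."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def handle_batch(body: object, process_item: Callable[[dict], str]) -> tuple[int, dict]:
    # Envelope problems mean the batch was never processed: reject the whole request.
    if not isinstance(body, dict) or not isinstance(body.get("operations"), list):
        return 400, {"error": "operations_must_be_an_array"}

    results = []
    for op in body["operations"]:
        try:
            created_id = process_item(op)
            results.append({"id": op["id"], "status": "ok", "created_id": created_id})
        except ItemError as exc:
            # Item-level failures are reported in the body, not in the status code.
            results.append({"id": op["id"], "status": "error", "error_code": exc.code})
    # The batch was received and processed, so the status is 200 regardless of item outcomes.
    return 200, {"results": results}
```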

3. Handling Partial Failures (The Hard Part)

The defining characteristic of a professional batch API is how it handles partial failures.

3.1 All-or-Nothing (Atomic Batches)

In an "All-or-Nothing" design, the server wraps the entire batch in a single database transaction. If 999 items succeed and the 1,000th item fails validation, the server rolls back the transaction. Nothing is saved.

The "Why": This is conceptually the simplest pattern to reason about. The client simply fixes the one broken item and resubmits the entire batch.
The Caveat: At high scale, long-running database transactions hold locks (on rows or entire tables, depending on the database and the statements involved) and degrade concurrent performance. Furthermore, if the batch involves third-party APIs (e.g., calling Stripe to process 1,000 credit cards), true atomicity is impossible: you cannot "roll back" a charged credit card if the next card fails; at best you can issue a compensating refund.
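
A minimal all-or-nothing sketch, again using `sqlite3` as a stand-in for the real database. The `orders` table and its positive-amount CHECK constraint are hypothetical validation rules; the point is that a failure on the last item undoes every earlier insert, so nothing from the batch is persisted.

```python
import sqlite3

ORDERS_SCHEMA = """
CREATE TABLE IF NOT EXISTS orders (
    request_id TEXT PRIMARY KEY,
    amount     REAL NOT NULL CHECK (amount > 0)  -- hypothetical validation rule
)
"""


def insert_orders_atomically(conn: sqlite3.Connection, operations: list[dict]) -> None:
    """Persist every order in the batch, or none of them."""
    rows = [(op["id"], op["amount"]) for op in operations]
    # `with conn` commits if the block completes and rolls back if it raises,
    # so a constraint violation on the last row also undoes the earlier rows.
    with conn:
        conn.executemany("INSERT INTO orders (request_id, amount) VALUES (?, ?)", rows)
```

With 999 valid orders followed by one with a negative amount, the call raises `sqlite3.IntegrityError` and the table is left empty; once the client fixes that item and resubmits, all 1,000 rows are committed together.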

3.2 Independent Item Processing (The Standard)

In most modern APIs, items are processed independently. The response explicitly maps the outcome of each requested item.

{
    "results": [
        { "id": "req-1", "status": "ok", "created_id": "ord_999" },
        { "id": "req-2", "status": "error", "error_code": "INSUFFICIENT_FUNDS" }
    ]
}

Client Responsibility: The client is responsible for parsing this array, picking out the results whose status is "error", fixing the corresponding operations, and retrying only those items.
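
On the client, the selection step can be as small as collecting the failed ids and mapping them back to the original operations. This sketch assumes the response shape shown above, with each result echoing the request's `id`; how a failed operation gets fixed is left to the caller.

```python
def operations_to_retry(operations: list[dict], response: dict) -> list[dict]:
    """Return the original operations whose results came back with status "error"."""
    failed_ids = {r["id"] for r in response["results"] if r["status"] == "error"}
    # Preserve the original request order and resend only the failed items.
    return [op for op in operations if op["id"] in failed_ids]
```

Echoing the client-supplied `id` in every result is what makes this mapping possible; a response that relied only on array position would break as soon as the server reordered or skipped items.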

4. Idempotency and Retries

When dealing with financial or critical data, idempotency is non-negotiable. If a client sends a batch of 1,000 payments, and the server processes them successfully but the Wi-Fi connection drops before the client receives the 200 OK, the client is in a blind state. Did it work?

If the client blindly retries the exact same request, it might charge the users twice.

4.1 Item-Level Idempotency Keys

To solve this, the API must mandate that the client provides a unique "Idempotency Key" for every single item in the batch (often a UUID generated by the client).

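The Mechanism: When the server receives an item, it checks the database to see if an operation with that exact idempotency key has already been successfully executed. If it has, the server skips the processing and returns the result it recorded the first time (for example, the original created_id). The check and the recording of the key must be atomic (e.g., enforced by a unique constraint); otherwise two concurrent retries can both pass the check and both execute.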
The Value: This allows clients to aggressively and safely retry batches during network instability without fear of duplicating data.
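
A minimal sketch of item-level idempotency, assuming a hypothetical `idempotency_keys` table whose primary key is the client-supplied key. The key and the item's result are written in the same transaction as the item's own database writes, and the primary-key constraint rejects a second writer. `execute` is a hypothetical callback standing in for the real work; it is assumed to perform only writes on `conn`, since a third-party call could not be rolled back with this transaction.

```python
import json
import sqlite3
from typing import Callable

IDEMPOTENCY_SCHEMA = """
CREATE TABLE IF NOT EXISTS idempotency_keys (
    key    TEXT PRIMARY KEY,  -- client-generated, e.g. a UUID
    result TEXT NOT NULL      -- JSON of the result originally returned
)
"""


def process_once(conn: sqlite3.Connection, key: str, op: dict,
                 execute: Callable[[sqlite3.Connection, dict], dict]) -> dict:
    """Run `execute` at most once per idempotency key and replay its result afterwards."""
    row = conn.execute("SELECT result FROM idempotency_keys WHERE key = ?", (key,)).fetchone()
    if row is not None:
        # Already processed: skip the work and return the original result.
        return json.loads(row[0])
    with conn:
        result = execute(conn, op)
        # If another request recorded the same key first, this raises IntegrityError
        # and the transaction, including execute's writes, is rolled back.
        conn.execute("INSERT INTO idempotency_keys (key, result) VALUES (?, ?)",
                     (key, json.dumps(result)))
    return result
```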

5. Architectural Patterns for High-Volume Batching

Different use cases demand entirely different architectural implementations.

5.1 Synchronous Batching (Low Latency)

For small, fast operations (e.g., inserting 500 telemetry logs into a database), the server processes the batch synchronously. The client holds the HTTP connection open, and the server returns the results within a few seconds.

5.2 Asynchronous Polling (Heavy Workloads)

If a batch contains 50,000 items, or if processing each item requires slow third-party API calls, the synchronous model breaks. The HTTP connection will time out, because API gateways and load balancers commonly cap request duration at roughly 30 to 60 seconds.

The Pattern: The client sends the payload. The server instantly replies with a 202 Accepted and a job_id.
The Follow-up: The client must then poll a secondary endpoint (GET /api/batch/jobs/{job_id}) every few seconds to check the status. Once the job is marked COMPLETED, the client downloads the results array.
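
The submit-then-poll flow can be sketched with an in-memory job registry and a thread pool. The `JobStore` class, its method names, and the status values other than COMPLETED are hypothetical; a production system would persist jobs in a database or queue so they survive restarts, and `submit` and `poll` would sit behind the batch POST handler and GET /api/batch/jobs/{job_id}.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class JobStore:
    """In-memory stand-in for a persistent job table."""

    def __init__(self, worker: Callable[[list[dict]], list[dict]]):
        self._worker = worker
        self._jobs: dict[str, dict] = {}
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=4)

    def submit(self, operations: list[dict]) -> tuple[int, dict]:
        """Batch POST handler: enqueue the work and answer 202 right away."""
        job_id = str(uuid.uuid4())
        with self._lock:
            self._jobs[job_id] = {"status": "PENDING", "results": None}
        self._pool.submit(self._run, job_id, operations)
        return 202, {"job_id": job_id}

    def _run(self, job_id: str, operations: list[dict]) -> None:
        with self._lock:
            self._jobs[job_id]["status"] = "RUNNING"
        try:
            # The slow part happens here, off the request path.
            job = {"status": "COMPLETED", "results": self._worker(operations)}
        except Exception as exc:
            job = {"status": "FAILED", "results": None, "error": str(exc)}
        with self._lock:
            self._jobs[job_id] = job

    def poll(self, job_id: str) -> tuple[int, dict]:
        """Handler for GET /api/batch/jobs/{job_id}."""
        with self._lock:
            job = self._jobs.get(job_id)
            if job is None:
                return 404, {"error": "unknown_job"}
            return 200, {"job_id": job_id, **job}
```

Catching worker exceptions and recording a FAILED status matters: without it, a crashed job would report RUNNING forever and the client would poll indefinitely.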

5.3 Webhook Callbacks (Event-Driven)

Polling wastes requests for as long as the job is still running. For long-running enterprise batch jobs (for example, bulk data exports that take hours), the client can instead provide a webhook_url in the initial request. When the server finally finishes processing the batch, possibly hours later, it sends an HTTP POST request to the client's webhook URL containing the final results.
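
Delivering the callback amounts to an outbound POST with a JSON body. This sketch only builds the request with Python's standard `urllib.request`, so it can be inspected without a network; the payload fields and the example URL are hypothetical. A real sender would also need retries with backoff, because the client's endpoint may be unavailable when the job finishes.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, job_id: str, results: list[dict]) -> urllib.request.Request:
    """Build (but do not send) the completion callback for a finished batch job."""
    payload = json.dumps({"job_id": job_id, "status": "COMPLETED", "results": results}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending is a separate step, e.g.:
#     req = build_webhook_request("https://client.example.com/hooks/batch", job_id, results)
#     with urllib.request.urlopen(req, timeout=10) as resp:
#         delivered = 200 <= resp.status < 300
```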

6. Common Anti-Patterns to Avoid

The Fake Batch API: A junior developer implements a /batch endpoint, but the backend simply runs a for loop over the array, making 1,000 separate database queries. This saves network round trips but completely fails to optimize the database layer. A true batch API uses bulk database operations (e.g., INSERT INTO ... VALUES (), (), ()).
Implicit Ordering: Assuming that the array order matters (e.g., Item 1 is a parent object, Item 2 is a child object that references Item 1). Servers frequently parallelize batch processing for speed. If order matters, it must be explicitly defined in the API contract, or the client will experience non-deterministic race conditions.
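
If the contract does allow references between items, one option is to make the dependencies explicit and let the server derive a safe processing order from them. This sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the `depends_on` field is a hypothetical addition to the request schema, not part of the examples above. Items within the same wave have no dependencies on each other and remain free to run in parallel.

```python
from graphlib import TopologicalSorter


def processing_waves(operations: list[dict]) -> list[list[str]]:
    """Group operation ids into waves; every item's dependencies sit in earlier waves."""
    ids = {op["id"] for op in operations}
    sorter = TopologicalSorter()
    for op in operations:
        deps = op.get("depends_on", [])
        unknown = set(deps) - ids
        if unknown:
            raise ValueError(f"{op['id']} depends on unknown items: {sorted(unknown)}")
        sorter.add(op["id"], *deps)
    # Raises graphlib.CycleError (a ValueError subclass) on circular dependencies.
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # items in one wave can run concurrently
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For a batch containing a parent, a child with `"depends_on": ["parent"]`, and an unrelated item, the parent and the unrelated item form the first wave and the child runs only after they finish.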

Conclusion

Batch API design forces architects to confront the chaotic reality of distributed systems. Networks fail, databases lock, and clients send malformed data. By embracing item-level idempotency, clear partial-failure contracts, and explicit asynchronous patterns for heavy workloads, engineering teams can build resilient bulk-transfer systems that scale gracefully.