The Dual Response Pattern: Scaling LLM-Powered Reports

Part 2 of "Scaling LLM-Based Reporting Past the Demo"

November 28th, 2025

TLDR

MCP's resource_link, outputSchema, and structuredContent primitives combine into a complete reporting architecture. This post specifies the dual response schema, REST endpoints for paginated retrieval, resource lifecycle management with expiration and pinning, multi-tenant security considerations, and the prompt engineering that makes LLMs work reliably with the pattern.

In Part 1, we established the problem: LLMs excel at translating natural language into database queries, but choke when you try to push full result sets through their context window. The solution is dual response - give the LLM a sample to validate its work, and give your app a handle to fetch the complete dataset out-of-band.

This post specifies the pattern in detail. We'll walk through the schema, show how it builds on MCP's native primitives, and cover the lifecycle management that keeps your server from drowning in abandoned queries.

Everything here is designed to work with the Model Context Protocol as it exists today - we're not inventing new protocol features, just using existing ones in a coordinated way.

MCP's Building Blocks

Before we dive into dual response, let's be clear about what MCP already gives us. The protocol has evolved significantly over the past year, and the 2025-06-18 and 2025-11-25 spec releases added primitives that make this pattern much cleaner than it would have been before.

Tool responses can include resource links. When your tool returns results, you're not limited to text. The resource_link content type lets you return a URI reference that the client can fetch separately:

{
  "type": "resource_link",
  "uri": "resource://query/abc123",
  "name": "Institution Search Results",
  "mimeType": "application/json"
}

The LLM sees this link. The host application can fetch the full data via that URI without it ever touching the context window.

Tools can declare output schemas. The outputSchema field lets you define exactly what structure your tool returns. Clients can validate against it, and - critically - the LLM knows what to expect:

{
  "name": "search_institutions",
  "outputSchema": {
    "type": "object",
    "properties": {
      "results": { "type": "array" },
      "total_count": { "type": "number" },
      "resource_uri": { "type": "string" }
    }
  }
}

Structured content provides typed responses. The structuredContent field in tool responses gives you a proper JSON object that machines can parse, alongside the text content the LLM reasons over.

Dual response coordinates these primitives into a pattern optimized for reporting.

The Dual Response Schema

Here's what a query tool returns:

interface DualResponse {
  // Sample rows for LLM validation (10-50 typical)
  results: Record<string, any>[];
  
  // Reference to complete dataset
  resource: {
    uri: string;           // e.g., "resource://query/abc123"
    name: string;          // Human-readable identifier
    mimeType: string;      // "application/json" for tabular data
  };
  
  // Query context
  metadata: {
    total_count: number;   // Full result set size
    columns: ColumnDef[];  // Schema information
    executed_at: string;   // ISO 8601 timestamp
    expires_at?: string;   // When resource becomes unavailable
  };
}

interface ColumnDef {
  name: string;
  type: "string" | "number" | "boolean" | "date";
  description?: string;
}

And here's what the actual MCP tool response looks like:

{
  "content": [
    {
      "type": "text",
      "text": "Found 847 institutions matching criteria. Sample results and full dataset link below."
    },
    {
      "type": "resource_link",
      "uri": "resource://query/abc123",
      "name": "Institution Search Results",
      "mimeType": "application/json"
    }
  ],
  "structuredContent": {
    "results": [
      {"name": "MIT", "state": "MA", "enrollment": 11934},
      {"name": "Stanford", "state": "CA", "enrollment": 17534},
      {"name": "Carnegie Mellon", "state": "PA", "enrollment": 15818}
    ],
    "resource": {
      "uri": "resource://query/abc123",
      "name": "Institution Search Results", 
      "mimeType": "application/json"
    },
    "metadata": {
      "total_count": 847,
      "columns": [
        {"name": "name", "type": "string"},
        {"name": "state", "type": "string"},
        {"name": "enrollment", "type": "number"}
      ],
      "executed_at": "2025-01-15T10:30:00Z",
      "expires_at": "2025-01-15T10:45:00Z"
    }
  }
}

The LLM gets three things:

  1. Text summary it can incorporate into its response
  2. The resource link it can reference when telling the user where to get full results
  3. Structured data with sample rows it can validate against

The host application gets:

  1. The resource URI to fetch complete data
  2. Metadata to set up the UI (column definitions, total count for pagination)
  3. Lifecycle hints (expiration time)

Why Samples Matter: Iteration in Practice

Let's walk through a real query cycle using our IPEDS example.

User: "Show me enrollment trends for computer science programs"

LLM generates first query: Builds a query joining completions and institutions, filtering on CIP code 11.07 (computer science).

Server returns dual response:

{
  "results": [
    {"year": 2023, "program": "Certificate", "completions": 2847},
    {"year": 2023, "program": "Certificate", "completions": 1923},
    {"year": 2022, "program": "Associate", "completions": 4521}
  ],
  "metadata": {"total_count": 12847}
}

LLM notices the problem: The sample shows certificate and associate programs. The user probably meant bachelor's degrees when they said "computer science programs." The LLM refines the query to filter for award level = "Bachelor's".

Server returns refined dual response:

{
  "results": [
    {"year": 2023, "institution": "MIT", "completions": 1087},
    {"year": 2023, "institution": "Stanford", "completions": 892},
    {"year": 2022, "institution": "MIT", "completions": 1024}
  ],
  "metadata": {"total_count": 23451}
}

LLM validates: Data looks right - bachelor's degrees, reasonable institutions. Returns the resource link to the user.

This iteration cycle happened with ~30 sample rows total in context. Without dual response, those two query attempts would have pushed 36,000+ rows through the context window.

Sampling Strategy

The sample needs to be representative, not random. If the user asks for results ordered by enrollment descending, your sample should be the top N rows, not a random slice. The LLM needs to see what the user will actually see.

The server should apply the full query specification - all filters, joins, ordering - then limit to sample size. Think of it as:

-- Full query the user requested
SELECT i.name, c.year, SUM(c.completions) as total
FROM completions c
JOIN institutions i ON c.unit_id = i.unit_id
WHERE c.cip_code LIKE '11.07%' AND c.award_level = 4
GROUP BY i.name, c.year
ORDER BY total DESC

-- Server applies sample limit
LIMIT 15

The sample shows the LLM exactly what shape the final results will have.
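
On the server side, this can be as simple as running the generated query twice - once wrapped in a COUNT for the metadata, once with the sample limit appended. A minimal sketch, assuming the same kind of db.execute helper used in the multi-tenancy examples later in this post:

async function executeWithSample(
  sql: string,
  db: { execute: (q: string) => Promise<Record<string, any>[]> }
) {
  const SAMPLE_SIZE = 15;

  // Full result-set size for metadata.total_count - no data rows transferred
  const [{ n }] = await db.execute(`SELECT COUNT(*) AS n FROM (${sql}) AS q`);

  // The sample keeps every filter, join, and ORDER BY from the generated
  // query, so the LLM sees exactly the shape the user will see
  // (assumes the generated SQL doesn't already include a LIMIT)
  const results = await db.execute(`${sql} LIMIT ${SAMPLE_SIZE}`);

  return { results, total_count: n };
}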

Out-of-Band Retrieval

The resource link points to data the host application fetches directly - it never goes through the LLM's context. This is where MCP and REST work together.

MCP handles the conversation: tool discovery, query generation, sample validation. REST handles the bulk data transfer.

Why REST?

MCP is optimized for the LLM ↔ tool interaction pattern: request-response, JSON-RPC, relatively small payloads. It's not designed for streaming gigabytes of query results with pagination, sorting, and filtering.

REST is. Your host application already knows how to make HTTP requests, handle pagination, render tables. We're not inventing new protocol features - we're using the right tool for each job.

Endpoint Design

Keep it simple. Two endpoints handle most needs:

GET /resources/{id} - Metadata only

{
  "total_count": 847,
  "columns": [...],
  "executed_at": "2025-01-15T10:30:00Z",
  "expires_at": "2025-01-15T10:45:00Z",
  "status": "ready"
}

POST /resources/{id} - Paginated data retrieval

// Request
{
  "offset": 0,
  "limit": 100,
  "sort": {"field": "enrollment", "order": "desc"}
}

// Response
{
  "data": [...],
  "total_count": 847,
  "returned_count": 100,
  "offset": 0,
  "pagination": {
    "has_next": true,
    "next_offset": 100
  }
}

Using POST for data retrieval lets clients specify complex parameters without URL length limits. The sort parameter is a client-requested re-ordering of the already-filtered result set - the server re-sorts what it has rather than re-executing the query.
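
On the host application side, retrieval is an ordinary HTTP loop. Here's a sketch using the standard fetch API - the base URL and bearer token are assumptions, covered in the Discovery and Multi-Tenant sections below:

async function fetchAllPages(baseUrl: string, resourceId: string, token: string) {
  const rows: Record<string, any>[] = [];
  let offset = 0;

  while (true) {
    const res = await fetch(`${baseUrl}/${resourceId}`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${token}`,
      },
      body: JSON.stringify({ offset, limit: 100 }),
    });
    if (!res.ok) throw new Error(`Resource fetch failed: ${res.status}`);

    const page = await res.json();
    rows.push(...page.data);

    if (!page.pagination.has_next) break;
    offset = page.pagination.next_offset;
  }

  return rows;
}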

Discovery

How does the client know where to find the REST endpoints? MCP's capability negotiation handles this during initialization.

When your server starts up and a client connects, the initialization handshake includes capability declarations:

{
  "protocolVersion": "2025-11-25",
  "capabilities": {
    "tools": {
      "listChanged": true
    },
    "resources": {
      "subscribe": false,
      "listChanged": true
    }
  },
  "serverInfo": {
    "name": "ipeds-reporting",
    "version": "1.0.0"
  }
}

For dual response, we extend this with custom metadata that tells clients where to find the REST layer:

{
  "capabilities": {
    "tools": {},
    "experimental": {
      "dualResponse": {
        "enabled": true,
        "baseUrl": "https://api.example.com/resources",
        "defaultExpiration": 900
      }
    }
  }
}

The experimental namespace is intentional - MCP encourages using it for capabilities not yet in the core spec. If this pattern gains adoption, it could become a formal MCP extension.

The client constructs the full URL by appending the resource ID from the resource_link URI to the base URL. Resource URI resource://query/abc123 + base URL https://api.example.com/resources = https://api.example.com/resources/abc123. Simple, predictable, no magic.
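
In code, that mapping is a one-liner (names here are illustrative):

// resource://query/abc123 + https://api.example.com/resources
//   -> https://api.example.com/resources/abc123
const resourceUrl = (baseUrl: string, resourceUri: string) =>
  `${baseUrl}/${resourceUri.split("/").pop()}`;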

Resource Lifecycle

Here's something easy to miss: LLMs are speculative. They explore. They try queries, look at samples, refine, try again. A single user request might trigger a dozen query tool invocations as the LLM iterates toward the right answer.

Each invocation creates a resource. Without lifecycle management, you're accumulating query results indefinitely.

Resource States

A resource moves through predictable states:

created → ready → [accessed] → expired → deleted
                ↘ pinned → [accessed indefinitely]

The metadata endpoint reflects current state:

{
  "status": "ready",
  "total_count": 847,
  "expires_at": "2025-01-15T10:45:00Z",
  "access_count": 3,
  "last_accessed": "2025-01-15T10:32:00Z"
}

For failed queries, the status tells clients what happened:

{
  "status": "failed",
  "error": {
    "code": "QUERY_TIMEOUT",
    "message": "Query exceeded 30 second timeout"
  }
}
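
If you want the type system to keep these straight, the full set of states fits in one union (the exact names are up to your implementation):

type ResourceStatus =
  | "created"   // query accepted, execution in progress
  | "ready"     // results available, expiration clock running
  | "pinned"    // expiration removed, persists until deleted
  | "failed"    // execution error; metadata carries the details
  | "expired"   // past expires_at, awaiting cleanup
  | "deleted";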

Expiration by Default

Every resource gets an expiration timestamp. The default should be aggressive - 15 minutes works for most interactive use cases. This handles the 90% case: exploratory queries the user never explicitly saves.

{
  "metadata": {
    "expires_at": "2025-01-15T10:45:00Z"
  }
}

Your server runs a cleanup job that purges expired resources. Simple, automatic, prevents unbounded growth.

Consider extending expiration on access - if a user is actively paginating through results, you don't want the resource to expire mid-session. A simple approach: reset the expiration clock on each data retrieval request.
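
Here's a sketch of both pieces - the cleanup sweep and the sliding-expiration touch. The storage object and its list/delete methods are assumptions; any key-value or relational store works:

const DEFAULT_TTL_MS = 15 * 60 * 1000; // 15-minute default window

interface StoredResource {
  id: string;
  pinned: boolean;
  expires_at?: string; // ISO 8601; absent once pinned
}

// Hypothetical store - swap in whatever persistence layer you use
declare const storage: {
  list(): Promise<StoredResource[]>;
  delete(id: string): Promise<void>;
};

// Sliding expiration: reset the clock whenever the resource is retrieved,
// so an active pagination session never expires mid-way
function touch(resource: StoredResource) {
  if (!resource.pinned) {
    resource.expires_at = new Date(Date.now() + DEFAULT_TTL_MS).toISOString();
  }
}

// Cleanup job: purge anything unpinned and past its expiration
setInterval(async () => {
  const now = new Date().toISOString();
  for (const r of await storage.list()) {
    if (!r.pinned && r.expires_at && r.expires_at < now) {
      await storage.delete(r.id);
    }
  }
}, 60_000);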

Pinning for Persistence

When users want to keep a query - for a dashboard, a recurring report, a shared artifact - they pin it:

PUT /resources/{id}

This removes the expiration timestamp. The resource persists until explicitly deleted:

DELETE /resources/{id}

The LLM can tell users about this: "I've found 847 institutions matching your criteria. The results are available for the next 15 minutes, or you can save them for later."
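
Both endpoints are short. An Express-style sketch - the framework and the storage helpers are assumptions here, and the tenant check from the Token Binding section below is omitted for brevity:

import express from "express";

const app = express();

// Hypothetical store with update/delete, mirroring the storage object
// used in the Token Binding example below
declare const storage: {
  update(id: string, fields: object): Promise<void>;
  delete(id: string): Promise<void>;
};

// PUT /resources/:id - pin: clear the expiration so the resource persists
app.put("/resources/:id", async (req, res) => {
  await storage.update(req.params.id, { pinned: true, expires_at: null });
  res.sendStatus(204);
});

// DELETE /resources/:id - remove the resource explicitly
app.delete("/resources/:id", async (req, res) => {
  await storage.delete(req.params.id);
  res.sendStatus(204);
});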

Storage Strategies

You have two options for what gets stored:

Store query results: Faster retrieval, consistent snapshots, but storage grows with result size. Good for datasets where freshness isn't critical.

Store query definitions: Re-execute on retrieval, always fresh data, minimal storage. Good for frequently-changing data where users expect current values.

The right choice depends on your data characteristics. For IPEDS - annual survey data that doesn't change frequently - storing results makes sense. For real-time metrics dashboards, store the query and re-execute.

Connection to MCP Tasks

The 2025-11-25 MCP spec introduced Tasks - a new primitive for tracking long-running work. Tasks have lifecycle states (working, completed, failed, cancelled) and can be polled for status. As of this writing, Tasks are marked as an experimental capability, meaning they're part of the spec but still being refined based on production feedback.

For most reporting queries that execute in milliseconds, you don't need Tasks. But for expensive analytical queries - the kind that aggregate millions of rows or join across multiple data sources - Tasks provide a natural fit:

  1. Tool invocation returns immediately with a Task ID
  2. Client polls task status
  3. When complete, task result includes the dual response structure
  4. Resource lifecycle proceeds as normal

This is an optional enhancement. The core dual response pattern works without Tasks for synchronous queries.

Prompt Integration

The LLM needs to understand how to work with dual responses. This goes in your system prompt:

When you invoke query tools, you'll receive:
- A sample of results (typically 10-15 rows)  
- A total_count indicating full result set size
- A resource link for complete data access

Decision logic:
- If total_count ≤ sample size: Answer directly from results
- If total_count > sample size: Provide the resource link for full access
- Always validate sample data before returning results to user

When presenting results:
- Summarize what you found: "Found 847 institutions matching your criteria"
- Reference the resource link for users who need complete data
- Note any limitations or refinements needed

The LLM learns to check sample data for issues (wrong columns, unexpected nulls, mixed data types) before committing to a response. This catches query errors before they reach the user.

Works for Both Tool Styles

Remember from Part 1: there are two ways to expose data to LLMs.

API-style tools have structured parameters:

search_institutions({ state: "NJ", control: "public", program_cip: "11.07" })

SQL-style tools accept raw queries:

query("SELECT * FROM institutions WHERE state = 'NJ' AND control = 1")

Dual response works identically for both. The server receives either structured parameters or SQL text, executes the query, and returns the same dual response structure. The only difference is what gets stored for later retrieval.

When the client fetches via the resource link, the server re-executes from stored parameters or SQL. Same result, different input format.
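
Here's a sketch of what that stored record might look like in each case - field names are illustrative, and the REST layer treats both identically:

// API-style tool: persist the structured parameters
const apiStyleResource = {
  id: "abc123",
  query: { kind: "parameters", value: { state: "NJ", control: "public", program_cip: "11.07" } },
};

// SQL-style tool: persist the generated SQL text
const sqlStyleResource = {
  id: "abc123",
  query: { kind: "sql", value: "SELECT * FROM institutions WHERE state = 'NJ' AND control = 1" },
};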

Multi-Tenant Considerations

If your data has access controls - and most enterprise data does - the dual response pattern needs to respect them consistently.

The authentication context from the original tool invocation must carry through to REST retrieval. The simplest approach: OAuth tokens that work for both MCP and REST endpoints, with tenant filtering injected at query execution time.

async function executeQuery(query: string, auth: AuthContext) {
  // Inject tenant filter regardless of query source
  const filtered = injectTenantFilter(query, auth.tenantId);
  return db.execute(filtered);
}

This applies to both the sample generation (during tool invocation) and full retrieval (during REST access). The user should see exactly the same rows through both paths.

Token Binding

Here's a subtle security issue: what prevents User A from taking a resource link returned to User B and accessing their data?

The answer is token binding. When a resource is created, associate it with the authenticated user or tenant:

const resource = {
  id: generateId(),
  query: query,
  tenantId: auth.tenantId,
  userId: auth.userId,
  createdAt: new Date()
};

On retrieval, validate that the requesting user matches:

async function getResourceData(resourceId: string, auth: AuthContext) {
  const resource = await storage.get(resourceId);
  
  if (resource.tenantId !== auth.tenantId) {
    throw new ForbiddenError("Access denied");
  }
  
  return executeQuery(resource.query, auth);
}

This ensures that even if resource IDs leak, unauthorized users can't access the data.

Authorization with MCP

The 2025-11-25 MCP spec significantly improved authorization support. MCP servers are now classified as OAuth Resource Servers, and clients are required to use Resource Indicators (RFC 8707) to bind tokens to specific servers.

For dual response, this means:

  1. The MCP tool invocation is protected by OAuth
  2. The REST endpoint shares the same authorization server
  3. Tokens are scoped to your specific server, not reusable elsewhere

If you're building for enterprise, look at the new Client ID Metadata Documents flow - it solves the "how do I register with every MCP server" problem without requiring Dynamic Client Registration.

For public datasets like IPEDS, multi-tenancy is less critical - but the pattern matters when you're building for enterprise deployment.

Error Handling

Queries fail. Connections drop. Data sources go down. The dual response pattern needs to handle errors gracefully at multiple levels.

Tool-Level Errors

When query execution fails during tool invocation, use MCP's standard error response with isError: true:

{
  "content": [
    {
      "type": "text",
      "text": "Query failed: Unable to connect to database. Please try again."
    }
  ],
  "isError": true
}

The LLM sees this and can respond appropriately - maybe retry, maybe ask the user to wait, maybe try a simpler query.

Resource-Level Errors

Errors during out-of-band retrieval are standard HTTP: 404 for a resource that has expired or never existed, 403 when the requester fails the tenant check.

The metadata endpoint should reflect error states for resources that failed during creation:

{
  "status": "failed",
  "error": {
    "code": "QUERY_TIMEOUT",
    "message": "Query exceeded 30 second timeout",
    "details": {
      "query_time_ms": 30000,
      "estimated_rows": 5000000
    }
  }
}

Validation Errors

For SQL-style tools where the LLM writes queries directly, validation errors are common:

{
  "content": [
    {
      "type": "text", 
      "text": "Query validation failed: Unknown column 'enrollmnt' in field list. Did you mean 'enrollment'?"
    }
  ],
  "isError": true,
  "structuredContent": {
    "error": {
      "type": "VALIDATION_ERROR",
      "column": "enrollmnt",
      "suggestions": ["enrollment"]
    }
  }
}

The structured error content helps the LLM understand what went wrong and potentially fix it automatically.

Putting It Together

Let's trace a complete flow:

  1. User asks: "Show me CS programs at research universities"

  2. LLM invokes tool: search_programs({ cip: "11.07", carnegie: "Doctoral" })

  3. Server executes query, returns dual response:

    • Sample: 15 rows
    • Resource URI: resource://query/abc123
    • Metadata: total_count: 847, expires_at: +15min

  4. LLM validates sample: Columns look right, data makes sense

  5. LLM responds to user: "Found 847 computer science programs at doctoral research universities. [View Full Results]"

  6. Host application: Extracts resource URI, fetches from REST endpoint, renders sortable table

  7. 15 minutes later: Server cleans up expired resource (unless user pinned it)

The LLM saw ~1,000 tokens of tool output. The user got access to 847 complete records. The server didn't accumulate unbounded query results.

That's the pattern.


In Part 3, I'll walk through a complete Node.js implementation - the MCP server, the REST endpoints, the lifecycle management. We'll use the IPEDS dataset as our example and test it with OpenAI's SDK. All the code is available at github.com/freezer333/mcp-rlp.

For the formal specification of these patterns, I've written them up in a paper to be presented at ACDSA 2026.


Previous: Part 1 - Why Your LLM Can't Be Your Reporting Layer

Next: Part 3 - Building a Dual Response MCP Server