CAG Guide

Context-Augmented Generation guide: inject tour documentation directly into AI prompts for fast, deterministic responses

domidex01 · Published

Context-Augmented Generation (CAG) is the simplest way to make an AI assistant tour-aware. Every chat request includes the relevant tour documentation in the system prompt, so the model can answer accurately without a vector database, embedding pipeline, or retrieval step. This page covers when CAG is the right call, how to wire it up, the failure modes, and when to migrate to RAG.

CAG vs RAG — the decision in one table

| Signal | Pick CAG | Pick RAG |
| --- | --- | --- |
| Total documentation size | Under ~50 KB / ~12k tokens | Over 50 KB |
| Document count | < 20 docs / tour steps | > 20 docs |
| Update cadence | Infrequent (every release) | Frequent (daily content updates) |
| Infrastructure budget | Zero: no DB, no embeddings | OK with vector store + embedding job |
| Cost sensitivity | OK with 3–8k tokens per request | Need to cap at retrieved-chunk size |
| Determinism | Important: same context every time | OK with retrieval variance |

Rule of thumb: if your entire docs corpus fits inside the model's context window (gpt-4o-mini = 128k, Claude Sonnet = 200k) AND total tokens stay under your per-request cost ceiling, CAG is the right call. Migrate to RAG when either constraint breaks.
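
The rule of thumb can be turned into a quick pre-flight check. The sketch below is illustrative only: `estimateTokens` and `checkCagFit` are hypothetical helpers, not part of @tour-kit/ai, and the 4-characters-per-token ratio is a rough heuristic for English text rather than a real tokenizer count.

```typescript
// Hypothetical helpers for illustration; not part of @tour-kit/ai.
interface Doc {
  id: string
  content: string
}

interface CagCheck {
  totalTokens: number
  fitsWindow: boolean
  fitsBudget: boolean
  useCag: boolean
}

// Rough heuristic: about 4 characters per token for English text.
// Use the model's real tokenizer when you need an exact count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// contextWindow: the model's max input tokens (e.g. 128000 for gpt-4o-mini).
// perRequestCeiling: the most doc tokens you are willing to pay for per request.
function checkCagFit(docs: Doc[], contextWindow: number, perRequestCeiling: number): CagCheck {
  const totalTokens = docs.reduce((sum, d) => sum + estimateTokens(d.content), 0)
  const fitsWindow = totalTokens < contextWindow
  const fitsBudget = totalTokens <= perRequestCeiling
  // CAG is the right call only while BOTH constraints hold.
  return { totalTokens, fitsWindow, fitsBudget, useCag: fitsWindow && fitsBudget }
}
```

Running this in CI against your documents array gives an early warning before either constraint breaks in production.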

When to use CAG

  • A focused onboarding flow under ~20 steps
  • A single product with stable documentation
  • Quick prototypes — you can ship CAG in 15 minutes and migrate later
  • Strict determinism requirements (legal review wants the same context every request)
  • Air-gapped environments where you can't run a vector DB

When NOT to use CAG

  • Documentation > 50 KB (token cost dominates per-request bill)
  • Multi-product or multi-tenant setups (sending all products' docs leaks context across tenants)
  • Frequently changing knowledge bases, since CAG requires a redeploy whenever the docs change
  • Cases where response quality depends on free context: the model has less room for reasoning when half the window is stuffed with documentation

Setup

Client configuration

Wrap your tree with AiChatProvider and enable tourContext so the current tour state is forwarded with every request:

import { AiChatProvider } from '@tour-kit/ai'

function App() {
  return (
    <AiChatProvider
      config={{
        endpoint: '/api/chat',
        tourContext: true,
      }}
    >
      <YourApp />
    </AiChatProvider>
  )
}

Server configuration

Use createChatRouteHandler with strategy: 'context-stuffing'. Provide your documents inline — they are injected into the system prompt on every request:

// app/api/chat/route.ts
import { createChatRouteHandler } from '@tour-kit/ai/server'
import { openai } from '@ai-sdk/openai'

const { POST } = createChatRouteHandler({
  model: openai('gpt-4o-mini'),
  context: {
    strategy: 'context-stuffing',
    documents: [
      {
        id: 'onboarding-overview',
        content:
          'The Welcome tour introduces three concepts: workspaces, projects, and the activity feed. ' +
          'Workspaces are top-level containers shared with your team. Projects live inside a workspace. ' +
          'The activity feed shows realtime updates from every project you have access to.',
      },
      {
        id: 'billing-faq',
        content:
          'Plans: Free (1 workspace, 3 projects), Pro ($12/user/mo, unlimited), Enterprise (contact sales). ' +
          'Upgrading is instant; downgrading takes effect at the next billing cycle.',
      },
    ],
  },
  instructions: {
    productName: 'Acme App',
    tone: 'friendly',
    boundaries: [
      'Only answer questions about Acme App onboarding and billing.',
      'If asked about competitors, decline politely.',
      'Never invent feature names. If unsure, suggest contacting support.',
    ],
  },
})

export { POST }

How CAG works under the hood

   User asks: "What is a workspace?"
                    │
                    ▼
   ┌──────────────────────────────────┐
   │ AiChatProvider (client)          │
   │  · collects tour state           │
   │  · POSTs {messages, tourContext} │
   └──────────────────────────────────┘
                    │
                    ▼
   ┌──────────────────────────────────┐
   │ createChatRouteHandler (server)  │
   │  1. Build system prompt:         │
   │     instructions + ALL docs[]    │
   │     + tourContext                │
   │  2. Forward to model             │
   │  3. Stream response back         │
   └──────────────────────────────────┘
                    │
                    ▼
   Model sees the full corpus in its context window
   and answers with grounded responses.

The server rebuilds the same system prompt on every request. There is no caching, no retrieval, no chunking — the trade-off for simplicity is that every request pays for every document's tokens.
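
To make the "same system prompt on every request" idea concrete, here is a minimal sketch of how such a prompt could be assembled. This is not the library's actual implementation: `buildSystemPrompt`, the prompt layout, and the shape of `tourContext` are all assumptions made for illustration.

```typescript
// Illustrative only; the real prompt format used by createChatRouteHandler may differ.
interface Doc {
  id: string
  content: string
}

interface Instructions {
  productName: string
  tone: string
  boundaries: string[]
}

// tourContext is typed loosely because its real shape is not documented here.
function buildSystemPrompt(
  instructions: Instructions,
  documents: Doc[],
  tourContext?: Record<string, unknown>,
): string {
  const parts: string[] = [
    `You are the assistant for ${instructions.productName}. Tone: ${instructions.tone}.`,
    'Rules:',
    ...instructions.boundaries.map((rule) => `- ${rule}`),
    'Documentation:',
    // Every document is included on every request: no retrieval, no chunking.
    ...documents.map((doc) => `[${doc.id}]\n${doc.content}`),
  ]
  if (tourContext) {
    parts.push('Current tour state:', JSON.stringify(tourContext))
  }
  return parts.join('\n')
}
```

Because the function is pure, identical inputs always produce an identical prompt, which is the determinism property the comparison table refers to.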

Token-cost math

Estimating a CAG bill is straightforward:

input_tokens  = system_prompt_tokens
              + sum(all documents)
              + user_message_tokens
output_tokens = model_response_tokens

per_request_cost = input_tokens * input_price_per_token
                 + output_tokens * output_price_per_token
monthly_cost ≈ per_request_cost * requests_per_month

Worked example with 4 KB of docs (~1k tokens), a 0.5k system prompt, 100-token user messages, 300-token responses, gpt-4o-mini at $0.15/1M input + $0.60/1M output:

  • Per request: 1,600 input + 300 output = ~$0.00042
  • 10,000 requests/month = ~$4.20/month

Now scale to 20 KB of docs (~5k tokens), same traffic:

  • Per request: 5,600 input + 300 output = ~$0.00102
  • 10,000 requests/month = ~$10.20/month

That linear scaling is the headline reason to migrate to RAG once your corpus gets large — RAG only sends retrieved chunks (typically 1–3 KB) regardless of total docs.
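
The two worked examples above can be reproduced with a small calculator. This is a sketch for budgeting, assuming single-turn requests (no conversation history) and the gpt-4o-mini prices quoted above; the type and function names are hypothetical.

```typescript
// Hypothetical budgeting helpers; prices are in USD per 1M tokens.
interface Pricing {
  inputPerMillion: number
  outputPerMillion: number
}

interface RequestShape {
  docTokens: number
  systemPromptTokens: number
  userTokens: number
  responseTokens: number
}

function costPerRequest(r: RequestShape, p: Pricing): number {
  // Documents, system prompt, and user message are all billed as input.
  const inputTokens = r.docTokens + r.systemPromptTokens + r.userTokens
  return (inputTokens * p.inputPerMillion + r.responseTokens * p.outputPerMillion) / 1e6
}

function monthlyCost(r: RequestShape, p: Pricing, requestsPerMonth: number): number {
  return costPerRequest(r, p) * requestsPerMonth
}

const gpt4oMini: Pricing = { inputPerMillion: 0.15, outputPerMillion: 0.6 }
const smallCorpus: RequestShape = {
  docTokens: 1000,
  systemPromptTokens: 500,
  userTokens: 100,
  responseTokens: 300,
}
const largeCorpus: RequestShape = { ...smallCorpus, docTokens: 5000 }

console.log(monthlyCost(smallCorpus, gpt4oMini, 10000).toFixed(2)) // "4.20"
console.log(monthlyCost(largeCorpus, gpt4oMini, 10000).toFixed(2)) // "10.20"
```

Only `docTokens` changes between the two cases, so the difference in the bill is entirely the cost of resending the corpus.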

Common failure modes

  • Context drift. When you update a document, redeploy the server route. CAG has no live data source — the docs are baked into the deployed bundle.
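  • Token budget exceeded. A request that exceeds the model's context window is either rejected by the provider or has its earliest context truncated, depending on the provider and SDK settings. Check the model's max input tokens against sum(documents) + system_prompt + conversation_history.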
  • Cross-tenant leakage. If your route is shared across customers, each request sees every customer's documents. Either scope documents per request (pull from a database in the route) or use RAG.
  • Stale answers after a release. A user on an old client may still ask about removed features. Add a version line to one of your documents or to instructions.boundaries so the model knows which release the docs describe.
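
Two of these failure modes, cross-tenant leakage and an exceeded token budget, can be guarded against before the documents reach the route handler. The sketch below is one possible approach under stated assumptions: the `tenantId` field, both helper names, and the in-memory array standing in for a database are hypothetical, and the token count uses a rough 4-characters-per-token estimate.

```typescript
// Hypothetical guards; not part of @tour-kit/ai.
interface TenantDoc {
  id: string
  tenantId: string
  content: string
}

// Stand-in for a database query: return only the requesting tenant's documents.
function documentsForTenant(all: TenantDoc[], tenantId: string): TenantDoc[] {
  return all.filter((doc) => doc.tenantId === tenantId)
}

// reservedTokens covers the system prompt, conversation history, and response.
// Throws instead of letting an oversized request reach the model.
function assertWithinBudget(
  docs: { content: string }[],
  reservedTokens: number,
  maxInputTokens: number,
): number {
  // Rough estimate: about 4 characters per token.
  const docTokens = docs.reduce((sum, d) => sum + Math.ceil(d.content.length / 4), 0)
  const total = docTokens + reservedTokens
  if (total > maxInputTokens) {
    throw new Error(`Estimated ${total} tokens exceeds the limit of ${maxInputTokens}`)
  }
  return total
}
```

Failing loudly at build or request time is a deliberate choice here: a thrown error is easier to notice than a model that quietly answers from partial context.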

Migration path to RAG

When you outgrow CAG, the migration is mostly server-side:

  1. Set up a vector store (Pinecone, pgvector, Chroma — see RAG Guide)
  2. Embed your existing documents[] and load them into the store
  3. Swap strategy: 'context-stuffing' → strategy: 'rag' in createChatRouteHandler
  4. Provide the retriever / store reference instead of inline documents

The client code (AiChatProvider, useAiChat, your chat UI) does not change.
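
Before step 2, documents are usually split into smaller chunks so that retrieval can return only the relevant pieces. The sketch below shows one simple sentence-based approach; `chunkDocuments` is a hypothetical helper, the 2000-character default is an assumption loosely based on the 1–3 KB chunk size mentioned earlier, and the embedding and store calls are left out because they depend on the vector store you pick.

```typescript
// Hypothetical pre-processing step; embedding and storage are not shown.
interface Doc {
  id: string
  content: string
}

interface Chunk {
  id: string
  docId: string
  text: string
}

// Greedily packs whole sentences into chunks of at most maxChars characters.
// A single sentence longer than maxChars is kept as one oversized chunk.
function chunkDocuments(docs: Doc[], maxChars = 2000): Chunk[] {
  const chunks: Chunk[] = []
  for (const doc of docs) {
    // Split after sentence-ending punctuation followed by whitespace.
    const sentences = doc.content.split(/(?<=[.!?])\s+/)
    let current = ''
    let index = 0
    const flush = (): void => {
      if (current.length > 0) {
        chunks.push({ id: `${doc.id}#${index}`, docId: doc.id, text: current })
        index += 1
        current = ''
      }
    }
    for (const sentence of sentences) {
      if (current.length > 0 && current.length + 1 + sentence.length > maxChars) {
        flush()
      }
      current = current.length > 0 ? `${current} ${sentence}` : sentence
    }
    flush()
  }
  return chunks
}
```

Keeping `docId` on every chunk lets the retriever report which original document an answer came from.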
