AI DOCX editing with the Vercel AI SDK

Tutorial: wire an LLM to a DOCX editor with the Vercel AI SDK. The AI reads paragraphs, adds comments, and suggests tracked changes live in the browser.

What you'll build

A <DocxEditor> with an assistant panel on the right. The user asks a question, the model reads document text through tools, adds comments or tracked changes in the editor, and finishes with a text reply. The package includes chat components, a tool-call timeline, and the panel shell; you write one API route and one page.

This is the same wiring as examples/agent-chat-demo/ in the repo, which you can run locally.

How it works

Three facts drive the wiring:

Tools execute client-side, in the browser. The server route declares the tool schemas but no execute functions. The AI SDK forwards each tool call to the client, where useDocxAgentTools runs it against the live editor.
The document never leaves the browser. Your /api/chat route sees chat messages and tool-call text (tool names, arguments, results). It never receives the DOCX file. read_document runs locally and its output goes up as a tool result, so the model sees document text only when it asks for it.
Anchors survive concurrent edits. Locate tools return paragraphs tagged with paraId (Word's w14:paraId). A ParaIdAllocatorExtension in the editor assigns fresh ids on Enter, paste, and split, so the user typing mid-conversation does not desync the agent's anchors.
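
To see why id-based anchors hold up better than positional ones, here is a minimal sketch. It is not the package's implementation: the Paragraph type, splitParagraph, and freshParaId are hypothetical stand-ins for the editor's document model and ParaIdAllocatorExtension. Only the idea (an existing paragraph keeps its paraId, a new one gets a fresh id) comes from the text above.

```typescript
// Hypothetical, simplified document model: NOT the editor's real types.
type Paragraph = { paraId: string; text: string };

let counter = 0;
// Stand-in for the allocator: hands out an id no existing paragraph uses.
function freshParaId(): string {
  counter += 1;
  return `NEW${counter}`;
}

// Simulates the user pressing Enter inside paragraph `index` at `offset`.
// The first half keeps its paraId; the second half gets a fresh one.
function splitParagraph(doc: Paragraph[], index: number, offset: number): Paragraph[] {
  const target = doc[index];
  const head: Paragraph = { paraId: target.paraId, text: target.text.slice(0, offset) };
  const tail: Paragraph = { paraId: freshParaId(), text: target.text.slice(offset) };
  return [...doc.slice(0, index), head, tail, ...doc.slice(index + 1)];
}

// Resolve an anchor by id rather than by position.
function findByParaId(doc: Paragraph[], paraId: string): Paragraph | undefined {
  return doc.filter((p) => p.paraId === paraId)[0];
}

const before: Paragraph[] = [
  { paraId: '4A1F3B', text: 'Intro paragraph.' },
  { paraId: '7C22E0', text: 'We did this in order to be able to ship.' },
];

// The agent located the second paragraph (index 1, id 7C22E0).
// Meanwhile the user splits the first paragraph.
const after = splitParagraph(before, 0, 5);

// A positional anchor (index 1) now points at the newly created paragraph,
// while the id anchor still resolves to the paragraph the agent read.
const byIndex = after[1];
const byId = findByParaId(after, '7C22E0');
```

The positional lookup lands on the new second half of the split paragraph, whereas the id lookup is unaffected by the insertion.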

One tool round: the client sends the user message, the model streams a tool call, the client executes it against the editor, and the tool result is sent back to the model. Repeat until the model writes text instead of another tool call.
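
The round described above can be sketched as a plain loop. This is a simulation with a scripted fake model, not the AI SDK's transport: ModelTurn, scriptedModel, executeLocally, and runLoop are hypothetical names, and in the real wiring the loop is split between useChat on the client and streamText on the server.

```typescript
// Hypothetical shapes for illustration; the AI SDK's real message types are richer.
type ToolCall = { toolName: string; input: Record<string, unknown> };
type ModelTurn = { kind: 'tool-call'; call: ToolCall } | { kind: 'text'; text: string };
type HistoryEntry =
  | { role: 'user'; text: string }
  | { role: 'tool'; toolName: string; output: string }
  | { role: 'assistant'; text: string };

// Fake model: asks to read the document once, then answers in text.
function scriptedModel(history: HistoryEntry[]): ModelTurn {
  const hasToolResult = history.filter((h) => h.role === 'tool').length > 0;
  return hasToolResult
    ? { kind: 'text', text: 'The document has 2 paragraphs.' }
    : { kind: 'tool-call', call: { toolName: 'read_document', input: {} } };
}

// Fake client-side executor standing in for executeToolCall.
function executeLocally(call: ToolCall): string {
  if (call.toolName === 'read_document') return '[4A1F3B] Intro\n[7C22E0] Body';
  return `Tool '${call.toolName}' is not enabled.`;
}

// One conversation: repeat rounds until the model writes text or the cap is hit.
function runLoop(userText: string, maxSteps: number): HistoryEntry[] {
  const history: HistoryEntry[] = [{ role: 'user', text: userText }];
  for (let step = 0; step < maxSteps; step += 1) {
    const turn = scriptedModel(history);
    if (turn.kind === 'text') {
      history.push({ role: 'assistant', text: turn.text });
      break;
    }
    // Tool call: run it in the "browser" and feed the result back to the model.
    history.push({
      role: 'tool',
      toolName: turn.call.toolName,
      output: executeLocally(turn.call),
    });
  }
  return history;
}

const transcript = runLoop('How long is this document?', 12);
const capped = runLoop('How long is this document?', 1);
```

With a cap of 12 the transcript ends on an assistant text entry. With a cap of 1 it ends on the tool result and no reply is produced, which is what a step limit that is too low looks like.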

Setup

Install the editor, the agent toolkit, and the AI SDK:

npm install @docx-editor.dev/react @docx-editor.dev/agents \
            ai @ai-sdk/react @ai-sdk/openai

Set OPENAI_API_KEY in your environment (or swap @ai-sdk/openai for any AI SDK provider).

The server route

// app/api/chat/route.ts
import { streamText, convertToModelMessages, stepCountIs, type UIMessage } from 'ai';
import { openai } from '@ai-sdk/openai';
import { type AgentContextSnapshot } from '@docx-editor.dev/agents/server';
import { getAiSdkTools } from '@docx-editor.dev/agents/ai-sdk/server';

// No `execute` on these tools: the AI SDK forwards every call to the
// client's useChat({ onToolCall }), which runs it against the live editor.
const tools = getAiSdkTools();

export async function POST(req: Request) {
  const { messages, context } = (await req.json()) as {
    messages: UIMessage[];
    context?: AgentContextSnapshot;
  };

  const result = streamText({
    model: openai('gpt-4o'),
    system:
      'You are a careful document assistant. Locate paragraphs with ' +
      'read_document or find_text before commenting or suggesting changes.' +
      (context?.selection?.paraId
        ? `\nThe user's cursor is in paragraph ${context.selection.paraId}.`
        : ''),
    messages: await convertToModelMessages(messages),
    tools,
    // AI SDK stops after a single step by default. Without stopWhen the
    // model never reads its own tool results and never writes a final reply.
    stopWhen: stepCountIs(12),
  });

  return result.toUIMessageStreamResponse();
}

stopWhen is not optional

streamText defaults to one step. With that default, the model calls read_document, the step ends, and the user never gets an answer. Setting stopWhen: stepCountIs(12) lets the model read, comment, and summarize across multiple tool steps while still capping the run at 12 steps.

The context field carries getContext()'s snapshot of the user's selection and page, so the model knows what the user is looking at without spending a tool call on read_selection.
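
If the system prompt grows, the inline string in the route can be pulled into a small pure function that is easy to unit test. A minimal sketch: ContextLike is a hypothetical local type that mirrors only the selection.paraId field the route reads (the real AgentContextSnapshot may carry more), and buildSystemPrompt is not part of the package.

```typescript
// Hypothetical minimal shape; only selection.paraId is taken from the route above.
type ContextLike = { selection?: { paraId?: string } };

const BASE_PROMPT =
  'You are a careful document assistant. Locate paragraphs with ' +
  'read_document or find_text before commenting or suggesting changes.';

// Appends the cursor hint only when the client actually sent a paraId.
function buildSystemPrompt(context?: ContextLike): string {
  const paraId = context?.selection?.paraId;
  return paraId
    ? `${BASE_PROMPT}\nThe user's cursor is in paragraph ${paraId}.`
    : BASE_PROMPT;
}
```

In the route, this would be passed as system: buildSystemPrompt(context).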

The client

'use client';

import { useMemo, useRef, useState } from 'react';
import dynamic from 'next/dynamic';
import { useChat } from '@ai-sdk/react';
import { DefaultChatTransport, lastAssistantMessageIsCompleteWithToolCalls } from 'ai';
import { type DocxEditorRef } from '@docx-editor.dev/react';
import '@docx-editor.dev/react/styles.css';
import {
  AgentChatLog,
  AgentComposer,
  useDocxAgentTools,
  getToolDisplayName,
  type EditorRefLike,
} from '@docx-editor.dev/agents/react';
import { toAgentMessages } from '@docx-editor.dev/agents/ai-sdk/react';

// Client-only import; see /docs/1.x/installation for the SSR recipe.
const DocxEditor = dynamic(
  () => import('@docx-editor.dev/react').then((m) => ({ default: m.DocxEditor })),
  { ssr: false }
);

export default function Page() {
  const editorRef = useRef<DocxEditorRef>(null);
  const [file, setFile] = useState<File | null>(null);
  const [input, setInput] = useState('');

  // The hook owns the bridge to the live editor: a tool executor plus a
  // context snapshot for the system prompt.
  const { executeToolCall, getContext } = useDocxAgentTools({
    // RefObject is invariant; DocxEditorRef satisfies EditorRefLike, so cast at the boundary.
    editorRef: editorRef as React.RefObject<EditorRefLike | null>,
    author: 'Assistant',
  });

  // `chat` is not defined yet inside onToolCall, so route the tool result
  // back through a ref that is set after useChat returns.
  const chatRef = useRef<{ addToolResult: (args: unknown) => Promise<void> } | null>(null);
  const chat = useChat({
    transport: new DefaultChatTransport({
      api: '/api/chat',
      prepareSendMessagesRequest: ({ messages }) => ({
        body: { messages, context: getContext() },
      }),
    }),
    // Re-send the conversation after each tool result so the model can read
    // its own output and either call another tool or write the final reply.
    sendAutomaticallyWhen: lastAssistantMessageIsCompleteWithToolCalls,
    onToolCall: ({ toolCall }) => {
      const result = executeToolCall(
        toolCall.toolName,
        (toolCall.input ?? {}) as Record<string, unknown>
      );
      const output =
        typeof result.data === 'string'
          ? result.data
          : (result.error ?? JSON.stringify(result.data));
      void chatRef.current?.addToolResult({
        tool: toolCall.toolName,
        toolCallId: toolCall.toolCallId,
        output,
      });
    },
  });
  // useChat's return type is wider than the minimal addToolResult shape the ref declares.
  chatRef.current = chat as unknown as typeof chatRef.current;

  const messages = useMemo(
    () => toAgentMessages(chat.messages, chat.status),
    [chat.messages, chat.status]
  );
  const loading = chat.status === 'streaming' || chat.status === 'submitted';

  return (
    <div style={{ height: '100vh', display: 'flex', flexDirection: 'column' }}>
      <div style={{ padding: 8 }}>
        <input
          type="file"
          accept=".docx"
          onChange={(event) => setFile(event.target.files?.[0] ?? null)}
        />
      </div>
      <DocxEditor
        ref={editorRef}
        documentBuffer={file}
        agentPanel={{
          title: 'Assistant',
          render: () => (
            <>
              <AgentChatLog
                messages={messages}
                loading={loading}
                error={chat.error?.message}
                humanizeToolName={getToolDisplayName}
              />
              <AgentComposer
                value={input}
                onChange={setInput}
                onSubmit={() => {
                  if (!input.trim() || loading) return;
                  chat.sendMessage({ text: input });
                  setInput('');
                }}
                disabled={loading}
              />
            </>
          ),
        }}
      />
    </div>
  );
}

Notes on this code:

agentPanel mounts the right-hand panel and adds a toggle button to the toolbar. The render prop is yours; AgentChatLog and AgentComposer are optional conveniences, not requirements.
toAgentMessages(chat.messages, chat.status) converts the AI SDK's UIMessage[] into the flat AgentMessage[] shape <AgentChatLog> renders, including the collapsible tool-call timeline.
sendAutomaticallyWhen: lastAssistantMessageIsCompleteWithToolCalls is the second half of the loop. addToolResult alone commits the result to history; this option re-sends the conversation so the model keeps going.
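
The result-to-string logic inside onToolCall can also live in a helper so it can be tested without a browser. A sketch under assumptions: ToolResultLike is a hypothetical type covering only the data and error fields the handler above reads, and toToolOutput is not a package export. Unlike the inline version, it serializes missing data as "null" so the return value is always a string.

```typescript
// Hypothetical shape: only the fields the onToolCall handler reads.
type ToolResultLike = { data?: unknown; error?: string };

// Same precedence as the handler: string data as-is, then the error message,
// then whatever data remains, serialized as JSON.
function toToolOutput(result: ToolResultLike): string {
  if (typeof result.data === 'string') return result.data;
  // `?? null` guards against JSON.stringify(undefined), which returns undefined.
  return result.error ?? JSON.stringify(result.data ?? null);
}
```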

Example tool loop

After choosing a .docx, the user types "tighten the wordiest paragraph" and hits send.

Client → server. useChat POSTs the messages plus context: getContext() (cursor in paragraph 4A1F3B, page 2 of 5).
Model → client. The model streams a read_document call. onToolCall fires; executeToolCall('read_document', {}) runs in the browser and returns the document as [paraId] text lines. The timeline in the panel shows "Reading document" with a spinner, then a check.
Client → server. addToolResult commits the text; sendAutomaticallyWhen re-sends the conversation.
Model → client. The model picks paragraph 7C22E0 and streams suggest_change with { paraId: '7C22E0', search: 'in order to be able to', replaceWith: 'to' }. The tracked change appears in the document immediately, attributed to "Assistant".
Client → server. Result: Replacement proposed: "in order to be able to" → "to" on 7C22E0. The conversation re-sends.
Model → client. No more tool calls. The model streams its final text reply ("I suggested one change; accept it from the change card"). stopWhen never triggers because the model stops on its own at step 3 of 12.

The user accepts or rejects the change in the editor UI. The agent cannot do that part; accept and reject are deliberately not tools.
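
When debugging a run like this, it can help to turn read_document output back into structured data. A minimal sketch, assuming each line has exactly the [paraId] text form mentioned in the walkthrough and that ids are hex strings; the real output may include details this parser does not handle. parseReadDocument is a hypothetical helper, not a package export.

```typescript
type ParsedParagraph = { paraId: string; text: string };

// Assumed line format: "[7C22E0] paragraph text". Lines that do not match are skipped.
const LINE_PATTERN = /^\[([0-9A-Fa-f]+)\]\s?(.*)$/;

function parseReadDocument(output: string): ParsedParagraph[] {
  const paragraphs: ParsedParagraph[] = [];
  output.split('\n').forEach((line) => {
    const match = LINE_PATTERN.exec(line);
    if (match) paragraphs.push({ paraId: match[1], text: match[2] });
  });
  return paragraphs;
}

const parsed = parseReadDocument(
  '[4A1F3B] Intro paragraph.\n[7C22E0] We did this in order to be able to ship.'
);
```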

Controlling what the agent can do

useDocxAgentTools takes include (allow) and exclude (block) lists for the built-in tools:

// Read-only agent: can look, cannot touch.
useDocxAgentTools({
  editorRef,
  include: ['read_document', 'read_selection', 'find_text', 'read_comments', 'read_changes'],
});

// Comment-only reviewer: reads plus comments, no text edits.
useDocxAgentTools({
  editorRef,
  include: ['read_document', 'find_text', 'add_comment', 'reply_comment'],
});

// Everything except direct edits.
useDocxAgentTools({
  editorRef,
  exclude: ['suggest_change', 'apply_formatting', 'set_paragraph_style'],
});

executeToolCall enforces the filter at execution time. If the model calls a filtered tool, it gets the result "Tool 'x' is not enabled." back. Filter the server side too, so the model does not receive schemas for tools it cannot use:

const REVIEW_TOOLS = ['read_document', 'find_text', 'add_comment'];
const all = getAiSdkTools();
const tools = Object.fromEntries(
  Object.entries(all).filter(([name]) => REVIEW_TOOLS.includes(name))
);

Custom tools are added after the built-ins and are not filtered by include or exclude. A custom tool with a built-in's name replaces the built-in:

import type { AgentToolDefinition } from '@docx-editor.dev/agents/react';

const fetchClause: AgentToolDefinition<{ name: string }> = {
  name: 'fetch_clause_template',
  displayName: 'Fetching template',
  description: 'Fetch a clause from the template library by name.',
  inputSchema: { type: 'object', properties: { name: { type: 'string' } }, required: ['name'] },
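  // fetchTemplateSync is assumed to be your own synchronous lookup; it is not imported from the package.
  handler: (input) => ({ success: true, data: fetchTemplateSync(input.name) }),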
};

useDocxAgentTools({ editorRef, tools: { fetch_clause_template: fetchClause } });
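
The precedence rules for custom tools can be sketched as a merge over name-keyed maps. resolveTools and the ToolDef shape are hypothetical simplifications, not the hook's internals; only the three rules (built-ins are filtered, custom tools are not, a same-name custom tool wins) come from the text above.

```typescript
// Hypothetical, trimmed-down tool definition for illustration.
type ToolDef = { name: string; description: string };
type ToolMap = { [name: string]: ToolDef };

function resolveTools(builtIns: ToolMap, custom: ToolMap, include?: string[]): ToolMap {
  const merged: ToolMap = {};
  // 1. Built-ins pass through the include filter (if one is given).
  Object.keys(builtIns).forEach((name) => {
    if (!include || include.indexOf(name) !== -1) merged[name] = builtIns[name];
  });
  // 2. Custom tools are added afterwards, unfiltered, so a matching
  //    name overwrites the built-in.
  Object.keys(custom).forEach((name) => {
    merged[name] = custom[name];
  });
  return merged;
}

const builtIns: ToolMap = {
  read_document: { name: 'read_document', description: 'built-in reader' },
  suggest_change: { name: 'suggest_change', description: 'built-in suggester' },
};
const custom: ToolMap = {
  read_document: { name: 'read_document', description: 'custom reader' },
  fetch_clause_template: { name: 'fetch_clause_template', description: 'custom fetcher' },
};

// include keeps only read_document among the built-ins; both custom tools survive.
const resolved = resolveTools(builtIns, custom, ['read_document']);
```

Here suggest_change is dropped by the filter, fetch_clause_template is kept even though it is not in the include list, and read_document resolves to the custom definition.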

Vue

The Vue adapter shares the same bridge contract (EditorRefLike), the same tool catalog, and the same AgentMessage[] chat shape:

import {
  useAgentBridge,
  AgentPanel,
  AgentChatLog,
  AgentComposer,
} from '@docx-editor.dev/agents/vue';
import { toAgentMessages } from '@docx-editor.dev/agents/ai-sdk/vue';

const { executeToolCall, toolSchemas } = useAgentBridge({ editorRef, author: 'Assistant' });

There are two differences from React as of 1.4. First, the agentPanel prop on <DocxEditor> is React-only, so in Vue you mount the AgentPanel component next to the editor yourself. Second, useAgentBridge exposes executeToolCall plus toolSchemas without the include/exclude filtering of useDocxAgentTools, so filter toolSchemas before sending them to your route and guard tool names in your own dispatch.
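
For the Vue case, the guard in your own dispatch might look like the sketch below. The executor signature (name, input) mirrors how executeToolCall is called in the React example and is assumed to be the same for useAgentBridge; withAllowList and the ToolResult shape are hypothetical. How you filter toolSchemas depends on its shape, which is not shown here.

```typescript
// Assumed result shape, based on the fields used elsewhere in this tutorial.
type ToolResult = { success?: boolean; data?: unknown; error?: string };
type Executor = (name: string, input: Record<string, unknown>) => ToolResult;

// Returns an executor that refuses any tool not on the allow list.
function withAllowList(execute: Executor, allowed: string[]): Executor {
  return (name, input) =>
    allowed.indexOf(name) === -1
      ? { success: false, error: `Tool '${name}' is not enabled.` }
      : execute(name, input);
}

// Stand-in for the executeToolCall returned by useAgentBridge.
const fakeExecute: Executor = (name) => ({ success: true, data: `ran ${name}` });
const readOnly = withAllowList(fakeExecute, ['read_document', 'find_text']);

const allowedResult = readOnly('read_document', {});
const blockedResult = readOnly('add_comment', {});
```

Wrapping the executor once keeps the allow list in a single place instead of scattering name checks through the chat handler.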

Next steps

Live editor bridge reference: the full hook and panel API
AI redlining: restrict the agent to tracked changes and comments
Tools reference: every parameter and return shape
Bring your own agent: the same loop without the AI SDK
