Engineering · April 29, 2026 · 4 min read

Agent toolkit for DOCX documents

An overview of docx-editor-agents: a toolkit for letting AI agents read, comment on, and edit .docx documents. Works live in the browser, headless on the server, or over MCP.

@eigenpal/docx-editor-agents is the integration layer between AI agents and .docx documents. Fourteen tools (read, find, comment, suggest, format, scroll) in OpenAI function-calling format, anchored to the stable Word paraId so coordinates survive multi-step loops. One catalog, three transports: live inside a React <DocxEditor>, headless against a parsed buffer in Node, or exposed over the Model Context Protocol.

The rest of this post walks through the deployments developers are building on top of it, the addressing model that makes multi-step loops reliable, and the framework adapters that ship in the box.

What developers can build with it

The integration surface is the same regardless of agent role; what differs is which tools the deployment exposes and what the system prompt tells the agent to do.

  • Contract review — read + add_comment + suggest_change. The agent leaves a tracked redline; an attorney accepts or rejects it in Word, Google Docs, or the embedded editor.
  • Compliance scans — read-only (include: ['read_document', 'find_text', 'read_comments']). The agent flags PII, policy violations, or missing clauses as comments anchored to the offending phrase.
  • Document copilots — full catalog. A chat panel next to the user's draft, scoped via include / exclude to whatever subset of the surface fits the product.
  • Word add-in alternative: for developers already on Office.js. The toolkit mirrors Range.insertComment, comment.reply, body.search, and range.scrollIntoView for the web; most call sites port across without rewriting.
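Porting an Office.js call site then mostly means routing each call to the matching tool. The facade below is a hypothetical sketch: the RangeShim class, the pairing of insertComment with add_comment and scrollIntoView with scroll, and the argument names (paraId, search, text) are assumptions, not the package's API. Real Office.js calls are also batched and resolved with context.sync(), which this sketch ignores.

```typescript
// Assumed result shape, matching how this post's examples read executeToolCall's return value.
type ToolResult = { success: boolean; data?: unknown; error?: string };
type Execute = (toolName: string, input: Record<string, unknown>) => ToolResult;

// Hypothetical Office.js-style Range backed by toolkit tool calls.
class RangeShim {
  private readonly exec: Execute;
  readonly paraId: string;
  readonly text: string;

  constructor(exec: Execute, paraId: string, text: string) {
    this.exec = exec;
    this.paraId = paraId;
    this.text = text;
  }

  // Range.insertComment(body) -> add_comment anchored to this range's phrase.
  insertComment(body: string): ToolResult {
    return this.exec("add_comment", { paraId: this.paraId, search: this.text, text: body });
  }

  // Range.scrollIntoView() -> scroll (live transport only).
  scrollIntoView(): ToolResult {
    return this.exec("scroll", { paraId: this.paraId });
  }
}

// Recording executor for illustration; a real app would pass something like the hook's executeToolCall.
const calls: Array<[string, Record<string, unknown>]> = [];
const recordingExec: Execute = (toolName, input) => {
  calls.push([toolName, input]);
  return { success: true, data: "ok" };
};

const range = new RangeShim(recordingExec, "1A2B3C4D", "within 30 days");
range.insertComment("Business days or calendar days?");
range.scrollIntoView();
```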

The tools

All fourteen are exported both as raw definitions and as ready-to-use schemas in OpenAI function-calling format. Anthropic tool use, the Vercel AI SDK, and anything else that takes OpenAI-shape tools accept the schemas as-is.
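For readers new to the format, this is roughly what one entry looks like in OpenAI function-calling shape. It is a hypothetical illustration: the parameter names (paraId, search, text) and the descriptions are assumptions drawn from the Mutate row of the tool table, not the package's published schema.

```typescript
// Generic OpenAI function-calling tool shape.
interface FunctionTool {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: {
      type: "object";
      properties: Record<string, { type: string; description: string }>;
      required: string[];
    };
  };
}

// Hypothetical add_comment definition; field names are illustrative only.
const addCommentTool: FunctionTool = {
  type: "function",
  function: {
    name: "add_comment",
    description: "Anchor a comment to a paragraph, optionally to a phrase inside it.",
    parameters: {
      type: "object",
      properties: {
        paraId: { type: "string", description: "Stable Word paraId of the target paragraph." },
        search: { type: "string", description: "Optional phrase inside the paragraph to anchor to." },
        text: { type: "string", description: "Comment body." },
      },
      required: ["paraId", "text"], // search is optional
    },
  },
};
```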

| Group | Tools | Purpose |
|---|---|---|
| Locate | read_document, read_selection, read_page, read_pages, find_text, read_comments, read_changes | Return paragraphs tagged with paraId. |
| Mutate | add_comment, suggest_change, apply_formatting, set_paragraph_style, reply_comment, resolve_comment | Take a paraId and an optional search phrase. |
| Navigate | scroll | Live transport only. |

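Locate tools return paragraphs keyed by their stable Word paraId, and mutate tools take that same paraId. Because a paraId identifies the paragraph itself rather than its position, inserting or deleting content elsewhere does not shift it. Coordinates therefore don't go stale across a multi-step loop, and the read view, the search view, and the anchor view all agree on what the document looks like.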
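A toy model makes the difference between id-based and position-based anchors concrete. This is a minimal sketch under assumptions: the Paragraph type, the resolveAnchor helper, and the sample paraIds are hypothetical, and the real toolkit resolves anchors against the editor's document model rather than a plain array.

```typescript
// Hypothetical in-memory model: each paragraph carries a stable paraId.
interface Paragraph {
  paraId: string;
  text: string;
}

// Resolve (paraId, optional search phrase) to a character range inside one paragraph.
function resolveAnchor(
  doc: Paragraph[],
  paraId: string,
  search?: string,
): { paraId: string; start: number; end: number } | null {
  const para = doc.find((p) => p.paraId === paraId);
  if (!para) return null; // paragraph no longer exists
  if (search === undefined) return { paraId, start: 0, end: para.text.length };
  const start = para.text.indexOf(search);
  if (start === -1) return null; // phrase not found in this paragraph
  return { paraId, start, end: start + search.length };
}

const doc: Paragraph[] = [
  { paraId: "1A2B3C4D", text: "The Supplier shall deliver within 30 days." },
  { paraId: "5E6F7A8B", text: "Payment is due on receipt." },
];

// A read step records the same target two ways: by paraId and by index.
const idAnchor = "5E6F7A8B";
const indexAnchor = 1;

// A later mutation inserts a paragraph at the top of the document.
doc.unshift({ paraId: "0C0C0C0C", text: "Definitions." });

// The index now points at a different paragraph; the paraId still resolves correctly.
console.log(doc[indexAnchor].text); // prints: The Supplier shall deliver within 30 days.
console.log(resolveAnchor(doc, idAnchor, "on receipt")); // start 15, end 25
```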

Building a custom agent

A custom agent picks the subset of the catalog the deployment needs, adds whatever domain-specific tools it owns, and writes the system prompt that drives the loop. The wiring is a single React hook plus a streaming chat client.

The Roastmaster demo embedded below is a worked example. It is a brutally honest editor that reads the document, picks the worst three to five offenders, and leaves a witty comment anchored to each. Read + comment only — no destructive tools, no edits.

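Step 1: scope the tools. Roastmaster never edits text, so useDocxAgentTools is given an include allow-list of read tools, scroll, and the three comment tools (add_comment, reply_comment, resolve_comment). The text mutators suggest_change, apply_formatting, and set_paragraph_style are left out: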

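import { useRef } from "react";
import { useDocxAgentTools } from "@eigenpal/docx-editor-agents/react";

// Inside a client component. editorRef is the ref attached to <DocxEditor ref={editorRef}>.
const editorRef = useRef(null);
const { tools, executeToolCall, getContext } = useDocxAgentTools({
  editorRef,
  author: "Roastmaster",
  // Allow-list: read tools, scroll, and the comment tools. No text mutations.
  include: [
    "read_document", "read_selection", "find_text",
    "read_comments", "read_changes", "scroll",
    "add_comment", "reply_comment", "resolve_comment",
  ],
});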

The same include mechanism turns the toolkit into a compliance scanner (read-only), a copilot (full catalog), or a redliner (read + add_comment + suggest_change). Custom tools merge in the same way — pass them under tools: and they appear in the catalog alongside the built-ins.
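Presets like these come down to set filtering over the fourteen tool names. The sketch assumes include acts as an allow-list and exclude as a deny-list applied after it; scopeTools and the preset names are hypothetical, and the package's own resolution order may differ.

```typescript
// The fourteen tool names from the catalog described in this post.
const CATALOG = [
  "read_document", "read_selection", "read_page", "read_pages",
  "find_text", "read_comments", "read_changes",
  "add_comment", "suggest_change", "apply_formatting", "set_paragraph_style",
  "reply_comment", "resolve_comment",
  "scroll",
] as const;

type ToolName = (typeof CATALOG)[number];

// Hypothetical scoping helper: keep the allow-list (or everything), then drop exclusions.
function scopeTools(options: { include?: ToolName[]; exclude?: ToolName[] } = {}): ToolName[] {
  const allowed = options.include ? new Set<ToolName>(options.include) : null;
  const denied = new Set<ToolName>(options.exclude ?? []);
  return CATALOG.filter((name) => (allowed === null || allowed.has(name)) && !denied.has(name));
}

const READ_TOOLS: ToolName[] = [
  "read_document", "read_selection", "read_page", "read_pages",
  "find_text", "read_comments", "read_changes",
];

// Deployment presets matching the use cases in this post.
const complianceScanner = scopeTools({ include: ["read_document", "find_text", "read_comments"] });
const redliner = scopeTools({ include: [...READ_TOOLS, "add_comment", "suggest_change"] });
const headlessCopilot = scopeTools({ exclude: ["scroll"] }); // scroll needs the live editor
const copilot = scopeTools(); // full catalog
```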

Step 2: write the system prompt. The prompt is what makes one agent different from another. Roastmaster's prompt is opinionated and short — it caps the agent at five comments per turn, tells it to anchor each to a unique phrase from the paragraph, and gives it a tone (witty, specific, never mean). The full prompt is in src/app/api/agent-chat/route.ts on the docs-site repo.

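Step 3: run the loop. useChat from @ai-sdk/react handles the streaming, and executeToolCall from the hook runs each tool call client-side against the live editor. The server route only supplies the tool schemas and streams the model's replies; it never touches the .docx itself, only the text that tool results carry back in the message history.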
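The client-side half can be pictured as a name-to-handler dispatcher that reports failures as results instead of throwing into the chat loop. This is a hypothetical sketch, not the hook's implementation: makeExecutor, the handler signatures, and the in-memory paragraphs map are assumptions; only the { success, data, error } result shape is taken from how this post's examples read executeToolCall's return value.

```typescript
// Result shape inferred from the examples (result.success, result.data, result.error).
type ToolResult =
  | { success: true; data: unknown }
  | { success: false; error?: string };

type Handler = (input: Record<string, unknown>) => unknown;

// Hypothetical dispatcher: route a tool call by name to a local handler.
function makeExecutor(handlers: Record<string, Handler>) {
  return (toolName: string, input: unknown): ToolResult => {
    const handler = handlers[toolName];
    if (!handler) return { success: false, error: `Unknown tool: ${toolName}` };
    try {
      return { success: true, data: handler((input ?? {}) as Record<string, unknown>) };
    } catch (err) {
      // Surface handler errors to the model as a failed result.
      return { success: false, error: err instanceof Error ? err.message : String(err) };
    }
  };
}

// Stand-in for the live editor: one paragraph keyed by paraId, plus a comment list.
const paragraphs = new Map([["1A2B3C4D", "The Supplier shall deliver within 30 days."]]);
const comments: { paraId: string; text: string }[] = [];

const execute = makeExecutor({
  read_document: () =>
    Array.from(paragraphs).map(([id, text]) => `[${id}] ${text}`).join("\n"),
  add_comment: ({ paraId, text }) => {
    const id = String(paraId);
    if (!paragraphs.has(id)) throw new Error(`No paragraph ${id}`);
    comments.push({ paraId: id, text: String(text) });
    return "ok";
  },
});
```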

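import { useRef } from "react";
import { useChat } from "@ai-sdk/react";
import { DefaultChatTransport, lastAssistantMessageIsCompleteWithToolCalls } from "ai";

// Holds the useChat return value so onToolCall can reach addToolResult.
const chatRef = useRef<ReturnType<typeof useChat> | null>(null);

const chat = useChat({
  transport: new DefaultChatTransport({
    api: "/api/agent-chat",
    // Send the user's current selection and page alongside the messages.
    prepareSendMessagesRequest: ({ messages }) => ({
      body: { messages, context: getContext() },
    }),
  }),
  sendAutomaticallyWhen: lastAssistantMessageIsCompleteWithToolCalls,
  onToolCall: ({ toolCall }) => {
    // Run the tool client-side against the live editor.
    const result = executeToolCall(toolCall.toolName, toolCall.input);
    // Not awaited, to avoid potential deadlocks inside the callback.
    void chatRef.current?.addToolResult({
      tool: toolCall.toolName,
      toolCallId: toolCall.toolCallId,
      output: result.success ? String(result.data) : (result.error ?? ""),
    });
  },
});
chatRef.current = chat;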

getContext() returns the user's current selection and page; piping it through prepareSendMessagesRequest lets the system prompt know what the user is looking at without an extra tool round-trip. sendAutomaticallyWhen: lastAssistantMessageIsCompleteWithToolCalls keeps the loop running until the model writes a non-tool reply.
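Stripped of streaming, the loop is: rebuild the request with fresh context, let the model answer, run any tool calls locally, and go again until a turn comes back without tool calls. The toy, synchronous sketch that follows uses a scripted model; every type and function in it is hypothetical, the system-prompt wording is invented, and maxSteps is a safety cap this sketch adds rather than something the post's snippet shows.

```typescript
// Hypothetical types for a toy agent loop.
interface ToolCall { toolCallId: string; toolName: string; input: unknown }
interface AssistantTurn { text: string; toolCalls: ToolCall[] }
interface ViewContext { selection: string | null; page: number }
type Model = (system: string, history: string[]) => AssistantTurn;

// Fold the user's current view into the system prompt so no extra read call is needed.
function systemPrompt(base: string, ctx: ViewContext): string {
  const sel = ctx.selection ? `Selected text: "${ctx.selection}"` : "No selection.";
  return `${base}\nUser is on page ${ctx.page}. ${sel}`;
}

// Keep sending while the last assistant turn ended in tool calls; stop at the first plain reply.
function runLoop(
  model: Model,
  execute: (name: string, input: unknown) => string,
  getContext: () => ViewContext,
  maxSteps = 10,
): { reply: string; steps: number } {
  const history: string[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    // Context is re-read on every request, as prepareSendMessagesRequest does.
    const turn = model(systemPrompt("You are Roastmaster.", getContext()), history);
    if (turn.toolCalls.length === 0) return { reply: turn.text, steps: step };
    for (const call of turn.toolCalls) {
      history.push(`${call.toolName} -> ${execute(call.toolName, call.input)}`);
    }
  }
  return { reply: "(step limit reached)", steps: maxSteps };
}

// Scripted model: one read, one comment, then a final text reply.
const script: AssistantTurn[] = [
  { text: "", toolCalls: [{ toolCallId: "c1", toolName: "read_document", input: {} }] },
  { text: "", toolCalls: [{ toolCallId: "c2", toolName: "add_comment", input: { paraId: "1A2B3C4D" } }] },
  { text: "Left one comment.", toolCalls: [] },
];
const seenPrompts: string[] = [];
const fakeModel: Model = (system, history) => {
  seenPrompts.push(system);
  return script[history.length];
};
```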

The full source for the complete EditorWithAgent component below, including the matching server route, is on the Live editor docs page.

"use client";
 
import { useMemo, useRef, useState } from "react";
import { DocxEditor, AgentChatLog, AgentComposer } from "@eigenpal/docx-js-editor";
import { useDocxAgentTools, getToolDisplayName } from "@eigenpal/docx-editor-agents/react";
import { toAgentMessages } from "@eigenpal/docx-editor-agents/ai-sdk/react";
import { useChat } from "@ai-sdk/react";
import { DefaultChatTransport, lastAssistantMessageIsCompleteWithToolCalls } from "ai";
 
export function EditorWithAgent({ buffer }: { buffer: ArrayBuffer }) {
  const editorRef = useRef(null);
  const { executeToolCall, getContext } = useDocxAgentTools({ editorRef, author: "Agent" });
  const chatRef = useRef<ReturnType<typeof useChat> | null>(null);
 
  const chat = useChat({
    transport: new DefaultChatTransport({
      api: "/api/chat",
      prepareSendMessagesRequest: ({ messages }) => ({
        body: { messages, context: getContext() },
      }),
    }),
    sendAutomaticallyWhen: lastAssistantMessageIsCompleteWithToolCalls,
    onToolCall: ({ toolCall }) => {
      const result = executeToolCall(toolCall.toolName, toolCall.input);
      void chatRef.current?.addToolResult({
        tool: toolCall.toolName,
        toolCallId: toolCall.toolCallId,
        output: result.success ? String(result.data) : (result.error ?? ""),
      });
    },
  });
  chatRef.current = chat;
 
  const messages = useMemo(
    () => toAgentMessages(chat.messages, chat.status),
    [chat.messages, chat.status], // depend on the fields, not the whole chat object
  );
  const [input, setInput] = useState("");
 
  return (
    <DocxEditor
      ref={editorRef}
      documentBuffer={buffer}
      agentPanel={{
        title: "Agent",
        render: () => (
          <>
            <AgentChatLog messages={messages} humanizeToolName={getToolDisplayName} />
            <AgentComposer
              value={input}
              onChange={setInput}
              onSubmit={() => {
                if (!input.trim()) return; // ignore empty submissions
                chat.sendMessage({ text: input });
                setInput("");
              }}
            />
          </>
        ),
      }}
    />
  );
}

The matching server route uses getAiSdkTools() from /ai-sdk/server and Vercel AI SDK's streamText({ tools }). Full source on the Live editor docs page.

Try it

The demo below is the same component, hosted inside this docs site. The agent is a brutally honest editor named Roastmaster (read + comment only, no destructive tools). Pick a suggestion or type a prompt to see the agent operate on the sample document.

Scope

  • Tracked-change acceptance is human-only. No accept_change / reject_change tools — the human keeps the final commit on every revision.
  • Formatting verbs. apply_formatting covers bold, italic, underline, strike, color, highlight, font size, and font family. Paragraph-level mutators (alignment, spacing) are not yet wired through the toolkit.
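One way to keep an agent inside that boundary is to validate apply_formatting arguments before they reach the editor. This is a hypothetical sketch: the camelCase property names (fontSize, fontFamily) and the validateFormatting helper are assumptions, not the toolkit's schema; only the list of eight supported properties comes from the scope note above.

```typescript
// Hypothetical argument type for apply_formatting, covering the eight supported properties.
type FormattingArgs = {
  bold?: boolean;
  italic?: boolean;
  underline?: boolean;
  strike?: boolean;
  color?: string; // e.g. "#C00000"
  highlight?: string; // e.g. "yellow"
  fontSize?: number; // points
  fontFamily?: string;
};

const SUPPORTED = new Set<string>([
  "bold", "italic", "underline", "strike", "color", "highlight", "fontSize", "fontFamily",
]);

// Reject any key outside the supported set, e.g. paragraph-level alignment or spacing.
function validateFormatting(
  args: Record<string, unknown>,
): { ok: true; args: FormattingArgs } | { ok: false; error: string } {
  const unsupported = Object.keys(args).filter((k) => !SUPPORTED.has(k));
  if (unsupported.length > 0) {
    return { ok: false, error: `Unsupported formatting: ${unsupported.join(", ")}` };
  }
  return { ok: true, args: args as FormattingArgs };
}
```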

Get started

The source is on GitHub. Issues and pull requests are welcome.

Related reading: Track Changes in a React DOCX Editor and Real-time Collaboration in docx-js-editor.