title: How to build a PDF Autofiller Agent?
tags: [agent, pdf]
categories: [agent]
date: [2026-01-15 17:15:00]
index_img: /img/agent.png
cover: /img/agent.png
thumbnail: /img/agent.png
excerpt: Notes

How to build a PDF Autofiller Agent?

Requirements:

Design a Copilot chatbox with the following functionality: the user uploads a PDF form with fillable fields and gives the chatbox commands to fill in some of them. The AI interprets each command, identifies the target fields and the values to write, fills those fields with the values the user specified, and returns the completed form to the user.

Existing Tools

Tool | Printed Select | Printed Edit | Scanned Select | Scanned Edit | Comments
Adobe Acrobat PDF | | | | | Needs a Pro subscription to edit
ABBYY FineReader PDF | | | | | Can’t install on Mac
PDFfiller | | | | |
LuminPDF | | | | | Needs a Pro subscription to edit

Workflow Overview

  1. PDF form template
  2. PDF form parsing
  3. PDF manifest generation
  4. LLM API
  5. PDF filling engine
  6. Generate filled PDF

Tech Stack (Client-side, Serverless)

PDF Form Parsing

Input: PDF raw data

12 0 obj
<<
/Type /Annot
/Subtype /Widget
/FT /Tx % Field Type: Text
/T (Age_Field) % Field Name (Key)
/V ( ) % Value: Empty
/Rect [100 100 200 120] % Position on page
/AP << /N 13 0 R >> % Appearance Stream (how it looks)
>>
endobj

Inputs and labels are not connected in the PDF data structure: unlike HTML, where a label can point to its input by id, nothing in the PDF source links the two. The only general way to pair a label with its input is to compare their coordinates.

Web Browser

pdf.js: parses the raw PDF in the browser.

Output: return a map between each object (input or label) and its coordinates.

// annot is one entry from page.getAnnotations(); its rect is [x, y, xMax, yMax]
const [x, y, xMax, yMax] = annot.rect;

const entry = {
  id: annot.fieldName || "unknown_id", // The internal ID (e.g., "txt_01")
  type: annot.subtype,                 // "Widget" (usually)
  inputType: annot.fieldType,          // "Tx" (Text), "Btn" (Button/Checkbox)
  rect: {
    x: Math.round(x),
    y: Math.round(y),
    width: Math.round(xMax - x),
    height: Math.round(yMax - y)
  }
};

PDF Manifest Generation

Given the coordinates of each object, pair the ones that belong together: for every input field, find its matching label. Return the resulting relationships as JSON.

A nested for-loop that computes the Euclidean distance between every input and every label, and keeps the nearest label for each input, is enough for this.

LLM Agent

Use the Vercel AI SDK to orchestrate the “Reasoning-Action” loop. The LLM does not modify the file directly; it acts as a router that decides which client-side tool to call.

  • Framework: Next.js App Router + ai (Vercel AI SDK).
  • Tool Calling: Define a tool schema (using Zod) that describes the form fields. The LLM outputs structured JSON matching this schema instead of plain text.
  • Client-Side Execution: Use the useChat hook to intercept the LLM’s tool call. When the LLM requests fill_fields, the browser executes the JavaScript logic to update the PDF.

PDF Filling Engine

  • Library: pdf-lib (Client-side JavaScript).
  • Logic:
    1. Load the PDF Uint8Array in memory.
    2. Locate fields using the IDs provided by the LLM tool call.
    3. Execute Write:
      • form.getTextField(id).setText(value)
      • form.getCheckBox(id).check()
    4. Update Appearance: Run form.updateFieldAppearances() to ensure text is rendered visibly (generating the /AP stream).
    5. Output: Generate a new Blob for user download.

Tech Stack (Server-side)

PDF Form Parsing

Input: PDF raw data (bytes)

As in the JS version, inputs (widgets) and visual labels (text) are disconnected in the PDF structure, so we need to extract them separately.

Tool: PyMuPDF (import fitz)

  • Why: Faster and more accurate coordinate extraction than other Python libraries.

Output: A map between each object and its coordinates.

# Extracted using page.widgets() and page.get_text("words")
{
    "id": widget.field_name,           # Internal ID (e.g., "txt_01")
    "type": widget.field_type_string,  # "Text", "CheckBox", etc. (field_type itself is an int code)
    "rect": [x0, y0, x1, y1],          # Bounding box coordinates
}
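As a rough sketch of how that map could be collected with PyMuPDF, the snippet below walks every page and records widgets and words with their bounding boxes. The function names `word_to_entry` and `extract_layout` and the per-page dictionary layout are illustrative assumptions, not part of any library.

```python
from typing import Any


def word_to_entry(word: tuple) -> dict[str, Any]:
    """Convert one tuple from page.get_text("words") into a label entry.

    PyMuPDF yields (x0, y0, x1, y1, text, block_no, line_no, word_no).
    """
    x0, y0, x1, y1, text = word[:5]
    return {"text": text, "rect": [x0, y0, x1, y1]}


def extract_layout(pdf_bytes: bytes) -> list[dict[str, Any]]:
    """Return the widgets and words of each page with their coordinates."""
    import fitz  # PyMuPDF; imported lazily so word_to_entry works without it

    pages = []
    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        for page in doc:
            widgets = [
                {
                    "id": w.field_name,
                    "type": w.field_type_string,
                    "rect": [w.rect.x0, w.rect.y0, w.rect.x1, w.rect.y1],
                }
                for w in page.widgets()
            ]
            words = [word_to_entry(w) for w in page.get_text("words")]
            pages.append({"page": page.number, "widgets": widgets, "words": words})
    return pages
```

Note that `get_text("words")` returns single words, so multi-word labels such as "Date of Birth" still have to be grouped (for example by line) before matching.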

PDF Manifest Generation

Logic: Spatial Matching (Euclidean Distance). Given the coordinates of widgets and text blocks, find the matching pair.

  • Algorithm: For each input field, calculate distance to all text blocks. Find the text that is closest (Top/Left priority) to the input field.
  • Result: A clean JSON list linking field_id to label_text (e.g., {"id": "t1", "label": "Date of Birth"}).
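The matching step could look roughly like the sketch below. It assumes rectangles in `[x0, y0, x1, y1]` form with the origin at the top-left of the page (as PyMuPDF reports them), labels already grouped into phrases, and a distance measured between the nearest edges of the two boxes. The `penalty` factor that implements the Top/Left priority is an arbitrary illustrative value, and all function names are hypothetical.

```python
import math

Rect = list[float]  # [x0, y0, x1, y1], origin at the top-left of the page


def gap(a: Rect, b: Rect) -> float:
    """Euclidean distance between the nearest edges of two rectangles."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return math.hypot(dx, dy)


def label_score(field: Rect, label: Rect, penalty: float = 3.0) -> float:
    """Lower is better; labels to the right of or below the field are penalized."""
    is_left = label[2] <= field[0]   # label ends before the field starts
    is_above = label[3] <= field[1]  # label ends above the field's top edge
    distance = gap(field, label)
    return distance if (is_left or is_above) else distance * penalty


def build_manifest(widgets: list[dict], labels: list[dict]) -> list[dict]:
    """Link every widget to its best-scoring label (None if there are no labels)."""
    manifest = []
    for widget in widgets:
        best = min(
            labels,
            key=lambda lab: label_score(widget["rect"], lab["rect"]),
            default=None,
        )
        manifest.append({"id": widget["id"], "label": best["text"] if best else None})
    return manifest


widgets = [{"id": "t1", "rect": [150, 100, 300, 120]}]
labels = [
    {"text": "Date of Birth", "rect": [50, 102, 140, 116]},  # left of the field
    {"text": "Signature", "rect": [150, 125, 220, 138]},     # below, but nearer
]
print(build_manifest(widgets, labels))
```

In this example "Signature" is physically closer to the field, but because it sits below it the penalty makes "Date of Birth" win.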

LLM Agent

Tool: LangChain + Pydantic

Use Pydantic to define a strict schema for the LLM output (structured output), which removes the need to parse raw text responses.

Workflow:

  1. Context Injection: Inject the PDF Manifest JSON directly into the System Prompt.
  2. Reasoning: LLM maps User Command -> Field IDs.
  3. Output: LLM returns a Pydantic object (JSON) containing the fill plan.

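from pydantic import BaseModel

class FieldUpdate(BaseModel):
    field_id: str
    value: str

class FillPlan(BaseModel):
    updates: list[FieldUpdate]  # one entry per field to fill

# The LLM is constrained to return this structure.
# chat_model is assumed to be an already-configured LangChain chat model.
structured_llm = chat_model.with_structured_output(FillPlan)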
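The context-injection step and a sanity check on the returned plan might be sketched as follows. The prompt wording, the helper names `build_system_prompt` and `validate_plan`, and the plain-dict representation of the plan are all assumptions made for illustration. The check exists because a model can still return a field ID that is not in the manifest, even with a structured schema.

```python
import json

# Hypothetical prompt text; adjust the wording to the model in use.
SYSTEM_TEMPLATE = (
    "You fill PDF forms. Use only the field IDs listed in the manifest.\n"
    "Manifest:\n{manifest}"
)


def build_system_prompt(manifest: list[dict]) -> str:
    """Embed the PDF manifest JSON directly in the system prompt."""
    return SYSTEM_TEMPLATE.format(manifest=json.dumps(manifest, indent=2))


def validate_plan(
    plan: list[dict], manifest: list[dict]
) -> tuple[dict[str, str], list[str]]:
    """Split a fill plan into accepted {field_id: value} pairs and unknown IDs."""
    known = {entry["id"] for entry in manifest}
    accepted: dict[str, str] = {}
    rejected: list[str] = []
    for update in plan:
        if update["field_id"] in known:
            accepted[update["field_id"]] = update["value"]
        else:
            rejected.append(update["field_id"])
    return accepted, rejected


manifest = [
    {"id": "t1", "label": "Date of Birth"},
    {"id": "t2", "label": "Age"},
]
prompt = build_system_prompt(manifest)
accepted, rejected = validate_plan(
    [{"field_id": "t2", "value": "31"}, {"field_id": "t9", "value": "x"}],
    manifest,
)
```

Rejected IDs can be reported back to the user or fed into a retry instead of being written to the PDF.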

PDF Filling Engine

Tool: pypdf

  • Why: Robust support for writing AcroForms and updating appearance streams.

Action:

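  1. Load the PDF bytes with PdfReader and copy the document into a PdfWriter (for example with writer.append(reader)).

  2. Map the LLM’s Pydantic output to a dictionary: { "field_id": "value" }.

  3. Execute filling (repeat the call for every page that contains fields):

    writer.update_page_form_field_values(
        writer.pages[0],
        fields_dict,
        auto_regenerate=True,  # Crucial for visible text: sets /NeedAppearances so viewers rebuild the /AP stream
    )

  4. Write the result to a BytesIO stream and return it to the user.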
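Put together, the four steps above could be sketched like this. The `FieldUpdate` dataclass is only a stand-in for the Pydantic model so the snippet stays self-contained, and `to_fields_dict` and `fill_pdf` are hypothetical helper names. The pypdf calls follow the pattern from its form-filling documentation, but treat the whole block as a sketch rather than a tested pipeline.

```python
from dataclasses import dataclass
from io import BytesIO


@dataclass
class FieldUpdate:
    """Stand-in for the Pydantic model; only these two attributes are used."""
    field_id: str
    value: str


def to_fields_dict(updates: list[FieldUpdate]) -> dict[str, str]:
    """Collapse the fill plan into {field_id: value}; later updates win."""
    return {u.field_id: u.value for u in updates}


def fill_pdf(pdf_bytes: bytes, updates: list[FieldUpdate]) -> BytesIO:
    """Fill the AcroForm fields named in updates and return the new PDF."""
    from pypdf import PdfReader, PdfWriter  # lazy import: the helper above needs no pypdf

    reader = PdfReader(BytesIO(pdf_bytes))
    writer = PdfWriter()
    writer.append(reader)  # copies the pages together with the form

    fields = to_fields_dict(updates)
    for page in writer.pages:
        if "/Annots" in page:  # skip pages that carry no widgets
            writer.update_page_form_field_values(page, fields, auto_regenerate=True)

    output = BytesIO()
    writer.write(output)
    output.seek(0)  # rewind so the caller can read from the start
    return output
```

Looping over all pages avoids the assumption that every field lives on the first page.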

Author

John Doe

Posted on

2026-01-13

Updated on

2026-02-02
