title: How to build a PDF Autofiller Agent?
tags: [agent, pdf]
categories: [agent]
date: [2026-01-15 17:15:00]
index_img: /img/agent.png
cover: /img/agent.png
thumbnail: /img/agent.png
excerpt: Notes
How to build a PDF Autofiller Agent?
Requirements:
Design a Copilot chatbox with the following functionality: the user uploads a PDF file containing fillable fields and gives the chatbox commands describing what to fill in. The AI interprets each command, identifies the target fields and their values, fills those fields with the values the user specified, and returns the completed form to the user.
Existing Tools
| Tool | Printed Select | Printed Edit | Scanned Select | Scanned Edit | Comments |
|---|---|---|---|---|---|
| Adobe Acrobat PDF | ✅ | ❓ | ✅ | ❓ | Needs Pro subscription to edit |
| ABBYY FineReader PDF | | | | | Can’t install on Mac |
| PDFfiller | ✅ | ✅ | ❌ | ❌ | |
| LuminPDF | ✅ | ❓ | ✅ | ❓ | Needs Pro subscription to edit |
Workflow Overview
PDF form Template
Tech Stack (Client-side, Serverless)
PDF Form Parsing
Input: PDF raw data
12 0 obj
Inputs and labels are not connected at the data-structure level: nothing in the PDF source links them, unlike HTML, where a label and an input can be linked by id. The only way to identify related labels and inputs is to compare their coordinates.
Web Browser
pdf.js: parse raw PDF in browser.
Output: return a map between each object (input or label) and its coordinates.
{
PDF Manifest Generation
Given the coordinates of each object, find the matching pairs. In particular, find the matching label for every input field, and return the relationships as JSON.
A nested for-loop that computes the Euclidean distance between each pair of coordinates is enough.
LLM Agent
Use the Vercel AI SDK to orchestrate the “Reasoning-Action” loop. The LLM does not modify the file directly; it acts as a router that decides which client-side tool to call.
- Framework: Next.js App Router + `ai` (Vercel AI SDK).
- Tool Calling: Define a tool schema (using Zod) that describes the form fields. The LLM outputs structured JSON matching this schema instead of plain text.
- Client-Side Execution: Use the `useChat` hook to intercept the LLM’s tool call. When the LLM requests `fill_fields`, the browser executes the JavaScript logic to update the PDF.
PDF Filling Engine
- Library: `pdf-lib` (Client-side JavaScript).
- Logic:
  - Load the PDF `Uint8Array` in memory.
  - Locate fields using the IDs provided by the LLM tool call.
  - Execute Write: `form.getTextField(id).setText(value)`, `form.getCheckBox(id).check()`
  - Update Appearance: Run `form.updateFieldAppearances()` to ensure text is rendered visibly (generating the `/AP` stream).
- Output: Generate a new Blob for user download.
Tech Stack (Server-side)
PDF Form Parsing
Input: PDF raw data (bytes)
Similar to the JS version, inputs (Widgets) and visual labels (Text) are disconnected in the PDF structure. We need to extract them separately.
Tool: PyMuPDF (import fitz)
- Why: Faster and more accurate coordinate extraction than other Python libraries.
Output: A map between each object and its coordinates.
```python
# Extracted using page.widgets() and page.get_text("words")
```
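A minimal sketch of this extraction step, assuming PyMuPDF is installed. The record shape (`kind`, `name`, `page`, `bbox`) and the helper names `to_record` and `extract_objects` are hypothetical and chosen only for illustration; the calls `page.widgets()` and `page.get_text("words")` are the ones named above.

```python
def to_record(kind, name, page, bbox):
    """Normalize one page object into a plain dict (hypothetical record shape)."""
    return {
        "kind": kind,                          # "input" or "label"
        "name": name,                          # field name or word text
        "page": page,                          # zero-based page index
        "bbox": [round(v, 2) for v in bbox],   # x0, y0, x1, y1
    }


def extract_objects(pdf_bytes):
    """Collect form widgets and words, with their coordinates, from every page."""
    import fitz  # PyMuPDF; imported here so to_record works without it

    records = []
    doc = fitz.open(stream=pdf_bytes, filetype="pdf")
    try:
        for page in doc:
            # Form inputs: each widget exposes its field name and rectangle.
            for widget in page.widgets():
                r = widget.rect
                records.append(
                    to_record("input", widget.field_name, page.number,
                              (r.x0, r.y0, r.x1, r.y1))
                )
            # Visible text: each word tuple starts with x0, y0, x1, y1, text.
            for x0, y0, x1, y1, text, *_ in page.get_text("words"):
                records.append(
                    to_record("label", text, page.number, (x0, y0, x1, y1))
                )
    finally:
        doc.close()
    return records
```

Note that `get_text("words")` returns single words, so a multi-word label such as "Date of Birth" arrives as three records and would need to be grouped before matching.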
PDF Manifest Generation
Logic: Spatial Matching (Euclidean Distance). Given the coordinates of widgets and text blocks, find the matching pair.
- Algorithm: For each input field, calculate distance to all text blocks. Find the text that is closest (Top/Left priority) to the input field.
- Result: A clean JSON list linking `field_id` to `label_text` (e.g., `{"id": "t1", "label": "Date of Birth"}`).
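The matching rule above can be sketched with the standard library alone. This is one possible reading of "Top/Left priority", not a definitive one: a label that lies entirely to the right of or below the input has its distance multiplied by a `penalty` factor. The factor, the function names, and the input shapes are assumptions, and the code assumes a top-left origin with y growing downward and labels already grouped into phrases.

```python
import json
import math


def center(bbox):
    """Return the center point of an (x0, y0, x1, y1) box."""
    x0, y0, x1, y1 = bbox
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def match_labels(inputs, labels, penalty=2.0):
    """Pair each input with its nearest label, favoring labels above or to the left."""
    manifest = []
    for field in inputs:
        fx, fy = center(field["bbox"])
        best_text, best_score = None, math.inf
        for label in labels:
            lx, ly = center(label["bbox"])
            score = math.hypot(fx - lx, fy - ly)  # Euclidean distance between centers
            # Penalize labels that start right of the field's right edge
            # or below its bottom edge.
            right_of = label["bbox"][0] >= field["bbox"][2]
            below = label["bbox"][1] >= field["bbox"][3]
            if right_of or below:
                score *= penalty
            if score < best_score:
                best_text, best_score = label["text"], score
        manifest.append({"id": field["id"], "label": best_text})
    return manifest


inputs = [{"id": "t1", "bbox": (150, 100, 300, 120)}]
labels = [
    {"text": "Date of Birth", "bbox": (40, 100, 140, 120)},
    {"text": "Signature", "bbox": (310, 100, 370, 120)},
]
print(json.dumps(match_labels(inputs, labels)))
# prints [{"id": "t1", "label": "Date of Birth"}]
```

In this example "Signature" is geometrically closer (115 vs. 135 units), so plain Euclidean distance would pick it; the penalty is what makes the label on the left win.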
LLM Agent
Tool: LangChain + Pydantic
Use Pydantic to define a strict schema for the LLM output (Structured Output), so the response does not have to be parsed from raw text.
Workflow:
- Context Injection: Inject the PDF Manifest JSON directly into the System Prompt.
- Reasoning: LLM maps User Command -> Field IDs.
- Output: LLM returns a Pydantic object (JSON) containing the fill plan.
```python
class FieldUpdate(BaseModel):
    ...
```
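The context-injection and output steps of this workflow can be sketched without any framework. The prompt wording, the `updates` / `field_id` / `value` keys, and both function names are assumptions made for illustration; in the stack described here, Pydantic would handle the structural validation, while a check like this one only guards against field IDs that are not in the manifest.

```python
import json

# Hypothetical prompt template; the doubled braces are literal braces for str.format.
SYSTEM_TEMPLATE = (
    "You fill PDF forms. Use only the field ids listed in the manifest.\n"
    "Manifest:\n{manifest}\n"
    'Reply with JSON: {{"updates": [{{"field_id": "...", "value": "..."}}]}}'
)


def build_system_prompt(manifest):
    """Inject the PDF manifest JSON directly into the system prompt."""
    return SYSTEM_TEMPLATE.format(manifest=json.dumps(manifest, indent=2))


def check_fill_plan(raw_reply, manifest):
    """Parse the model's JSON reply and separate known field ids from unknown ones."""
    known_ids = {item["id"] for item in manifest}
    plan = json.loads(raw_reply)
    accepted, rejected = {}, []
    for update in plan["updates"]:
        if update["field_id"] in known_ids:
            accepted[update["field_id"]] = update["value"]
        else:
            rejected.append(update["field_id"])  # id not present in the manifest
    return accepted, rejected


manifest = [{"id": "t1", "label": "Date of Birth"}]
reply = '{"updates": [{"field_id": "t1", "value": "1990-01-01"}]}'
accepted, rejected = check_fill_plan(reply, manifest)
```

The `accepted` dictionary already has the `{ "field_id": "value" }` shape that the filling step expects.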
PDF Filling Engine
Tool: pypdf
- Why: Robust support for writing AcroForms and updating appearance streams.
Action:
- Load PDF bytes using `PdfReader` and copy them into a `PdfWriter`.
- Map the LLM’s Pydantic output to a dictionary: `{ "field_id": "value" }`.
- Execute filling:

```python
writer.update_page_form_field_values(
    writer.pages[0],
    fields_dict,
    auto_regenerate=True,  # Crucial for visible text (/AP Stream)
)
```

- Return the `BytesIO` stream to the user.
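Putting the steps of this action together, a minimal end-to-end sketch might look as follows, assuming pypdf is installed. The function names and the `field_id` / `value` keys are hypothetical. Unlike the single-page call in the text, this version loops over every page that carries annotations, which is a design choice of the sketch rather than something the notes prescribe.

```python
from io import BytesIO


def plan_to_fields(updates):
    """Turn a list of {"field_id", "value"} items into the dict pypdf expects."""
    return {u["field_id"]: str(u["value"]) for u in updates}


def fill_pdf(pdf_bytes, fields):
    """Fill AcroForm fields and return the result as an in-memory stream."""
    from pypdf import PdfReader, PdfWriter  # lazy import: only this function needs pypdf

    reader = PdfReader(BytesIO(pdf_bytes))
    writer = PdfWriter()
    writer.append(reader)  # copy the pages together with the form

    for page in writer.pages:
        if "/Annots" in page:  # only pages that carry form widgets
            writer.update_page_form_field_values(
                page, fields, auto_regenerate=True
            )

    out = BytesIO()
    writer.write(out)
    out.seek(0)  # rewind so the caller can read from the start
    return out
```

A caller would chain the two helpers, for example `fill_pdf(pdf_bytes, plan_to_fields(updates))`, and stream the returned `BytesIO` back to the user.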

