
Python SDK

Use the NextPDF Python Software Development Kit (SDK) when your Python application, asyncio service, AI agent, or terminal workflow needs PDF extraction with provenance. The SDK returns structured blocks with citation anchors: page index, confidence, optional bounding box, and a semantic node identifier. You can trace every extracted value back to its source location.
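
To make the shape of a citation anchor concrete, here is a minimal sketch using plain dataclasses. The class and field names (CitationAnchor, CitedBlock, page_index, node_id, bbox) and the 0 to 1 confidence scale are illustrative assumptions, not the SDK's actual Pydantic models; see the API reference for the real names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CitationAnchor:
    """Hypothetical anchor shape; field names are assumptions."""

    page_index: int
    confidence: float  # assumed to lie in [0.0, 1.0]
    node_id: str  # semantic node identifier
    bbox: tuple[float, float, float, float] | None = None  # optional bounding box


@dataclass(frozen=True)
class CitedBlock:
    """Hypothetical extracted block: text plus the anchor it came from."""

    text: str
    anchor: CitationAnchor


def describe_source(block: CitedBlock) -> str:
    """Format a human-readable pointer back to the block's source location."""
    a = block.anchor
    where = f"page index {a.page_index}, node {a.node_id}"
    if a.bbox is not None:
        where += f", bbox {a.bbox}"
    return f"{block.text!r} <- {where} (confidence {a.confidence:.2f})"


def confident_blocks(
    blocks: list[CitedBlock], threshold: float = 0.9
) -> list[CitedBlock]:
    """Keep only blocks whose anchor confidence meets the threshold."""
    return [b for b in blocks if b.anchor.confidence >= threshold]
```

A downstream pipeline could use a filter like confident_blocks to route low-confidence values to human review while keeping the anchor attached to every value it accepts.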

The package includes a synchronous NextPDF client for scripts and notebooks, an asynchronous AsyncNextPDF client for asyncio runtimes, a nextpdf command-line interface (CLI) for streaming extraction from large files, and an optional Model Context Protocol (MCP) server that lets AI agents call extraction tools directly. All four paths use the same Abstract Syntax Tree (AST) surface through a NextPDF Connect endpoint.

You need Python 3.10 or newer and, for production extraction, a NextPDF Connect endpoint. Install the SDK with pip install nextpdf. For the agent server, use pip install "nextpdf[mcp]"; the quotes keep shells such as zsh from treating the square brackets as a glob pattern.

| Page | Use it for |
| --- | --- |
| Overview | What the SDK provides, which backend to choose, and where the limits are. |
| Quickstart | Install the SDK and extract cited text with page-level provenance. |
| API reference | Clients, AST method chains, Pydantic models, CLI commands, and exceptions. |
| Developer guide | Architecture boundaries, runtime lifecycle, async batching, and failure handling. |
| CLI | Run citation-aware extraction from the terminal and stream large documents. |
| MCP server | Expose extraction tools to AI agents that support MCP. |

| Symbol | Role |
| --- | --- |
| NextPDF | Synchronous client for scripts, batch jobs, and notebooks. |
| AsyncNextPDF | Asynchronous client and async context manager for asyncio runtimes. |
| client.ast.get_document_ast() | Builds the full Semantic AST from PDF bytes. |
| client.ast.extract_cited_text() | Extracts text blocks with citation anchors. |
| client.ast.extract_cited_tables() | Extracts tables with cell-level citation anchors. |
| client.ast.search_ast_nodes() | Finds nodes by type, page, or text query. |
| client.ast.get_ast_diff() | Compares two PDF versions structurally. |
| nextpdf | Command-line interface for terminal and pipeline extraction. |
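
As a rough illustration of how the client.ast methods might be called, the sketch below wraps client.ast.extract_cited_text() in a small helper. It is written against a stand-in object rather than the real SDK, so it runs without a NextPDF Connect endpoint. Two things are assumptions this page does not confirm: that the method accepts raw PDF bytes, and that each returned block exposes a page_index attribute. The helper name cited_text_on_page is hypothetical.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def cited_text_on_page(client: Any, pdf_bytes: bytes, page_index: int) -> list[Any]:
    """Call client.ast.extract_cited_text and keep the blocks from one page.

    Assumptions (not confirmed by the SDK docs): the method takes raw PDF
    bytes and returns an iterable of blocks that expose `page_index`.
    """
    blocks: Iterable[Any] = client.ast.extract_cited_text(pdf_bytes)
    return [b for b in blocks if getattr(b, "page_index", None) == page_index]


# Stand-in client and blocks so the sketch runs without the real SDK.
fake_blocks = [
    SimpleNamespace(text="Invoice 1001", page_index=0),
    SimpleNamespace(text="Terms", page_index=1),
]
fake_client = SimpleNamespace(
    ast=SimpleNamespace(extract_cited_text=lambda pdf_bytes: fake_blocks)
)

on_first_page = cited_text_on_page(fake_client, b"%PDF-1.7", page_index=0)
print([b.text for b in on_first_page])  # prints ['Invoice 1001']
```

Because the helper only relies on the client.ast.extract_cited_text attribute path, the same function should work with a real NextPDF instance if the assumed signature holds; check the API reference before relying on it.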