Python SDK

At a glance

Use the NextPDF Python Software Development Kit (SDK) when your Python application, asyncio service, AI agent, or terminal workflow needs PDF extraction with provenance. The SDK returns structured blocks with citation anchors: page index, confidence, optional bounding box, and a semantic node identifier. You can trace every extracted value back to its source location.
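As a rough illustration of what a citation anchor carries, the sketch below models the four fields named above with standard-library dataclasses. The class names, field names, and the `describe_source` helper are assumptions made for this example; they are not the SDK's actual models, which the API reference describes as Pydantic models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoundingBox:
    # Hypothetical layout: two corners of the cited region on the page.
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class CitationAnchor:
    # Field names are illustrative, not taken from the SDK.
    page_index: int                     # page the value was extracted from
    confidence: float                   # extraction confidence score
    node_id: str                        # semantic node identifier
    bbox: Optional[BoundingBox] = None  # bounding box is optional


@dataclass(frozen=True)
class CitedBlock:
    text: str
    anchor: CitationAnchor


def describe_source(block: CitedBlock) -> str:
    """Render a readable pointer from a block back to its source location."""
    a = block.anchor
    where = f"page index {a.page_index}, node {a.node_id}"
    if a.bbox is not None:
        where += f", bbox ({a.bbox.x0}, {a.bbox.y0}, {a.bbox.x1}, {a.bbox.y1})"
    return f"{block.text!r} <- {where} (confidence {a.confidence:.2f})"


if __name__ == "__main__":
    block = CitedBlock("Total: 42", CitationAnchor(2, 0.97, "n-17"))
    print(describe_source(block))
```

Keeping the anchor separate from the text means a downstream consumer, such as an AI agent, can pass the value along while retaining enough information to point back at the page and node it came from.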

The package includes a synchronous `NextPDF` client for scripts and notebooks, an asynchronous `AsyncNextPDF` client for asyncio runtimes, a `nextpdf` command-line interface (CLI) for streaming extraction from large files, and an optional Model Context Protocol (MCP) server that lets AI agents call extraction tools directly. All four paths use the same Abstract Syntax Tree (AST) surface through a NextPDF Connect endpoint.

You need Python 3.10 or newer and, for production extraction, a NextPDF Connect endpoint. Install the SDK with `pip install nextpdf`. For the agent server, use `pip install "nextpdf[mcp]"` (the quotes keep shells such as zsh from treating the brackets as a glob pattern).

Section map

Page	Use it for
Overview	What the SDK provides, which backend to choose, and where the limits are.
Quickstart	Install the SDK and extract cited text with page-level provenance.
API reference	Clients, AST method chains, Pydantic models, CLI commands, and exceptions.
Developer guide	Architecture boundaries, runtime lifecycle, async batching, and failure handling.
CLI	Run citation-aware extraction from the terminal and stream large documents.
MCP server	Expose extraction tools to AI agents that support MCP.

Primary APIs

Symbol	Role
`NextPDF`	Synchronous client for scripts, batch jobs, and notebooks.
`AsyncNextPDF`	Asynchronous client and async context manager for asyncio runtimes.
`client.ast.get_document_ast()`	Builds the full Semantic AST from PDF bytes.
`client.ast.extract_cited_text()`	Extracts text blocks with citation anchors.
`client.ast.extract_cited_tables()`	Extracts tables with cell-level citation anchors.
`client.ast.search_ast_nodes()`	Finds nodes by type, page, or text query.
`client.ast.get_ast_diff()`	Compares two PDF versions structurally.
`nextpdf`	Command-line interface for terminal and pipeline extraction.
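The sketch below shows the general shape of using an async context manager client with a `client.ast.extract_cited_text()` style method chain to process several documents concurrently. It does not import the SDK: `StubAsyncClient`, `_StubAst`, `CitedText`, and `extract_many` are stand-ins invented for this example, and the assumption that the method accepts PDF bytes and returns a list is not confirmed here. Check the API reference for the real signatures.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class CitedText:
    # Illustrative result type, not the SDK's model.
    text: str
    page_index: int


class _StubAst:
    """Stand-in for a `client.ast` namespace; performs no real extraction."""

    async def extract_cited_text(self, pdf_bytes: bytes) -> list[CitedText]:
        await asyncio.sleep(0)  # yield to the event loop, as a network call would
        return [CitedText(text=f"{len(pdf_bytes)} bytes received", page_index=0)]


class StubAsyncClient:
    """Stand-in client that can be used as an async context manager."""

    def __init__(self) -> None:
        self.ast = _StubAst()
        self.closed = False

    async def __aenter__(self) -> "StubAsyncClient":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.closed = True  # a real client would release its connections here


async def extract_many(documents: list[bytes]) -> list[list[CitedText]]:
    """Extract from all documents concurrently, sharing one client."""
    async with StubAsyncClient() as client:
        return await asyncio.gather(
            *(client.ast.extract_cited_text(doc) for doc in documents)
        )


if __name__ == "__main__":
    for blocks in asyncio.run(extract_many([b"%PDF-1.7 a", b"%PDF-1.7 bb"])):
        print(blocks)
```

Sharing one client across the batch and awaiting the calls with `asyncio.gather` keeps the results in the same order as the input documents.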
