HTML single-pass streaming constraints (ADR-001)
At a glance
NextPDF renders HyperText Markup Language (HTML) in one forward pass and keeps no element tree in memory. ADR-001 records that choice and the constraints it places on every Cascading Style Sheets (CSS) feature.
Install
composer require nextpdf/core:^3
Conceptual overview
The HTML subsystem renders HTML and CSS to Portable Document Format (PDF) in a single streaming pass. ADR-001 (“Stream-based Rendering Pipeline Retention”, accepted 2026-04-06) is the architecture decision that defines this model. This page explains the model, its boundaries, and the constraints contributors must honor.
In this model, the tokenizer (HtmlTokenizer) reads the input once and produces a flat token list. HtmlParser::processTokens() walks that list from left to right and writes PDF content-stream operators into a string buffer as it reaches each element. The engine builds no persistent element graph between calls. Any state that must survive a handler call moves through a snapshot value object (HtmlBlockCursor), not through shared nodes. Style inheritance uses a push-and-pop stack of flat HtmlStyleState instances, not a parent-pointer tree.
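The walk described above can be sketched in a few lines. This is a minimal illustration, not the engine's code: `SketchStyle`, `renderSinglePass()`, and the `['open'|'close'|'text', value]` token shape are hypothetical names invented for the example, and the output is a tagged string rather than PDF operators. It shows the two properties that matter: tokens are visited once, left to right, and style inheritance is a push-and-pop stack of flat frames with no parent pointers.

```php
<?php
declare(strict_types=1);

/** Hypothetical flat style frame for this sketch; not the real HtmlStyleState. */
final class SketchStyle
{
    public function __construct(
        public readonly bool $bold = false,
        public readonly bool $italic = false,
    ) {}
}

/**
 * Walk a flat token list once and append output as each token is reached.
 *
 * @param list<array{0: string, 1: string}> $tokens ['open'|'close'|'text', value]
 */
function renderSinglePass(array $tokens): string
{
    $out = '';
    $stack = [new SketchStyle()]; // root frame

    foreach ($tokens as [$kind, $value]) {
        $current = $stack[count($stack) - 1];

        if ($kind === 'open') {
            // Push a new flat frame copied from the current one.
            $stack[] = new SketchStyle(
                bold: $current->bold || $value === 'b',
                italic: $current->italic || $value === 'i',
            );
        } elseif ($kind === 'close') {
            // Pop restores the enclosing style; the root frame stays.
            if (count($stack) > 1) {
                array_pop($stack);
            }
        } else {
            // Text is written immediately and never revisited.
            $flags = ($current->bold ? 'B' : '-') . ($current->italic ? 'I' : '-');
            $out .= "[{$flags}]{$value}";
        }
    }

    return $out;
}

$tokens = [
    ['open', 'p'], ['text', 'plain '],
    ['open', 'b'], ['text', 'bold '],
    ['open', 'i'], ['text', 'both'],
    ['close', 'i'], ['close', 'b'],
    ['text', ' plain'], ['close', 'p'],
];

echo renderSinglePass($tokens), PHP_EOL;
// prints "[--]plain [B-]bold [BI]both[--] plain"
```

Memory in this sketch grows with the stack depth, not with the number of tokens already written, which is the property the streaming model relies on.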
This is not a retained-document model. The engine does not hold a document tree, does not re-lay out content it has already written, and does not let the input mutate after the parse starts. The boundary is plain: NextPDF streams end to end. A retained renderer builds the whole document in memory first; NextPDF does not.
Two operations need limited lookahead. Both are explicit, bounded exceptions. Table column sizing scans every row before it places a cell. It buffers those rows in an ephemeral table buffer inside TableParser, an exception ADR-001 acknowledges by name. The :has() relational selector and the :last-child and :last-of-type selectors use a bounded pre-scan over the flat token list, not a tree walk. ADR-001 records both exceptions and their bounds.
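To make the idea of a bounded pre-scan concrete, the sketch below answers a `:last-child`-style question with a forward scan over a flat token list and no tree. It is an illustration under stated assumptions: `isLastChild()`, the token shape, and the `$maxLookahead` default are hypothetical, void elements are not handled, and the real bound and algorithm are whatever ADR-001 and the source define.

```php
<?php
declare(strict_types=1);

/**
 * Decide whether the element opened at $index is its parent's last element
 * child, using only a forward scan over the flat token list.
 *
 * @param list<array{0: string, 1: string}> $tokens ['open'|'close'|'text', value]
 * @param int $maxLookahead Hypothetical cap on tokens inspected after $index.
 * @return bool|null true or false when decided; null when the bound was hit
 *                   or the input ended before the parent closed.
 */
function isLastChild(array $tokens, int $index, int $maxLookahead = 10_000): ?bool
{
    $depth = 0;
    $limit = min(count($tokens), $index + 1 + $maxLookahead);

    for ($i = $index; $i < $limit; $i++) {
        [$kind] = $tokens[$i];

        if ($kind === 'open') {
            if ($depth === 0 && $i > $index) {
                return false; // a later sibling opens at the same level
            }
            $depth++;
        } elseif ($kind === 'close') {
            $depth--;
            if ($depth < 0) {
                return true; // the parent closed before any later sibling
            }
        }
    }

    return null;
}

$list = [
    ['open', 'ul'],
    ['open', 'li'], ['text', 'a'], ['close', 'li'],
    ['open', 'li'], ['text', 'b'], ['close', 'li'],
    ['close', 'ul'],
];

var_dump(isLastChild($list, 1)); // bool(false)
var_dump(isLastChild($list, 4)); // bool(true)
```

Returning `null` when the bound is reached keeps the scan cost capped; a caller in this sketch would treat an undecided result as "no match".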
The model is worker-safe. HtmlParser is constructed once per request, never as a singleton. HtmlParser::parse() resets every field at the start of each call. No static mutable state exists in the render path, so RoadRunner, Swoole, and Laravel Octane can reuse the process without state leaking between documents.
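The reset-on-entry pattern is what makes reuse safe, and a small sketch shows why. `SketchParser` below is a hypothetical class, not `HtmlParser`: it only counts opening tags with a regular expression. The point is that every field is cleared at the top of `parse()`, so a second call on the same instance cannot see the first call's state.

```php
<?php
declare(strict_types=1);

/** Hypothetical parser-shaped class for this sketch; it only counts opening tags. */
final class SketchParser
{
    private int $elementCount = 0;

    public function parse(string $html): int
    {
        // Reset every field first, so a reused instance starts clean.
        $this->elementCount = 0;

        // Accumulate into the field, as a stand-in for real parsing work.
        $this->elementCount += (int) preg_match_all('/<[a-zA-Z][^>]*>/', $html);

        return $this->elementCount;
    }
}

$parser = new SketchParser();
echo $parser->parse('<p><b>x</b></p>'), PHP_EOL; // prints 2
echo $parser->parse('<p>y</p>'), PHP_EOL;        // prints 1, not 3
```

Without the reset line, the second call would return 3, which is the kind of cross-document leak a long-lived worker process would expose.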
API surface
The symbols below enforce these constraints. Verify each one against src/Html/.
| Symbol | Location | Role |
|---|---|---|
| HtmlParser::parse(string $html): HtmlRenderResult | src/Html/HtmlParser.php | Entry point. Resets all state, then runs the single pass. |
| HtmlParser::MAX_ELEMENT_COUNT (50_000) | src/Html/HtmlParser.php | Hard cap on processed elements. |
| HtmlParser::MAX_NESTING_DEPTH (100) | src/Html/HtmlParser.php | Hard cap on nesting depth. |
| HtmlBlockCursor | src/Html/HtmlBlockCursor.php | Cursor snapshot. The only shared-state mechanism. |
| HtmlStyleState | src/Html/HtmlStyleState.php | Stack-pushed style frame. No parent pointer. |
| TableParser::reset() | src/Html/TableParser.php | Mandatory reset of the ephemeral table buffer between tables. |
Code sample — Quick start
You do not manage the streaming model directly. One call renders any supported document.
<?php
declare(strict_types=1);
require_once __DIR__ . '/../vendor/autoload.php';
use NextPDF\Core\Document;
$doc = Document::createStandalone();
$doc->setTitle('Streaming render');
$doc->addPage();
$doc->writeHtml('<h1>One forward pass</h1><p>No retained tree.</p>');
$doc->save(__DIR__ . '/output/streaming.pdf');
Code sample — Production
Render a large document within a fixed memory budget. The element cap is the safety boundary, so size the input before the call.
<?php
declare(strict_types=1);
require_once __DIR__ . '/../vendor/autoload.php';
use NextPDF\Core\Document;
use NextPDF\Exception\HtmlParsingException;

/**
 * Render trusted HTML, surfacing the streaming-model limits as typed errors.
 *
 * @param non-empty-string $html
 */
function renderReport(string $html, string $out): void
{
    $doc = Document::createStandalone();
    $doc->addPage();

    try {
        $doc->writeHtml($html);
    } catch (HtmlParsingException $e) {
        // Thrown on the 10 MB input cap, the 50,000-element cap,
        // or the 100-level nesting cap. These are model boundaries,
        // not transient faults, so do not retry.
        throw $e;
    }

    $doc->save($out);
}
Edge cases & gotchas
- Element cap is a hard stop. The engine throws HtmlParsingException at MAX_ELEMENT_COUNT = 50_000. Split very large reports across multiple writeHtml() calls or multiple documents.
- Nesting cap is a hard stop. Depth above MAX_NESTING_DEPTH = 100 throws. Deeply nested wrappers usually cause this.
- Input-size cap. HtmlParser::parse() rejects input larger than 10 MB before tokenizing.
- :has() is gated. The :has() pre-scan runs only when the css.has experimental feature is active. Without it, :has() selectors do not match.
- Table buffering is the one ephemeral tree. A single very wide or very tall table holds its rows in memory until render(). TableParser bounds this buffer per table and resets it between tables; it is not a document-wide tree.
- No re-layout. Content already written is never moved. A late style cannot retroactively change earlier output.
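Because the size and element caps are hard stops, it can help to estimate them before calling writeHtml(). The sketch below is one possible pre-flight check, not part of NextPDF: `preflightHtml()` and the `SKETCH_*` constants are hypothetical, the byte value used for "10 MB" is an assumption, and the regular expression only approximates the element count (comments, script bodies, and malformed markup can skew it). Nesting depth is not estimated here.

```php
<?php
declare(strict_types=1);

// Limits quoted on this page. The exact byte value of "10 MB" is an assumption.
const SKETCH_MAX_BYTES = 10 * 1024 * 1024;
const SKETCH_MAX_ELEMENTS = 50_000;

/**
 * Rough pre-flight estimate of whether input is likely to hit a cap.
 *
 * @return list<string> Human-readable problems; empty when none were found.
 */
function preflightHtml(string $html): array
{
    $problems = [];

    if (strlen($html) > SKETCH_MAX_BYTES) {
        $problems[] = 'input exceeds the size cap';
    }

    // Count opening tags as an approximation of the element count.
    $elements = (int) preg_match_all('/<[a-zA-Z][^>]*>/', $html);
    if ($elements > SKETCH_MAX_ELEMENTS) {
        $problems[] = "about {$elements} elements exceeds the element cap";
    }

    return $problems;
}

var_dump(preflightHtml('<p>ok</p>') === []); // bool(true)
```

An estimate like this only decides whether to split a report up front; the engine's own checks remain the authority, so the HtmlParsingException handler is still needed.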
Performance
The streaming model holds at most one HtmlStyleState per nesting level, bounded by MAX_NESTING_DEPTH = 100, plus the active cursor fields. Style-state and cursor memory are O(depth), not O(element count). ADR-001 records the design intent that this stays well below a retained object graph for the same input. The controlled 50,000-element peak resident set size (RSS) benchmark is the empirical validation target named in ADR-001. The HTML render-pipeline performance benchmark tracks it with a 5% regression gate (merged work, PR #564). Treat the per-page performance_budget (wall_ms: 1500, peak_mb: 64) as the operational ceiling.
Security notes
The caps on this page also act as denial-of-service controls. DefaultHtmlSecurityPolicy enforces a 10 MB input ceiling and the 100-level nesting ceiling independently of the parser, so a hostile document cannot exhaust memory through depth or size. The streaming model itself bounds memory by construction: there is no element graph for an attacker to inflate. See the HTML module security model and the layer contracts for the full policy surface.
Conformance
This page cites no external standard. These constraints come from ADR-001 and from the enforcing source symbols listed under API surface. Behavioral CSS-spec mappings are documented on css-resolver, not here.
Commercial context
Enterprise capability. The streaming architecture is identical in Core and Premium. Premium broadens CSS coverage; it does not change the single-pass model or relax these caps. See the CSS support matrix.