HTML rendering pipeline
At a glance
When you call writeHtml(), it runs one forward pass over HyperText Markup Language (HTML): sanitize the input, extract @page rules and style blocks, tokenize, lay out content, and paint Portable Document Format (PDF) operators. It does not retain an element tree between stages.
Install
composer require nextpdf/core:^3
Conceptual overview
The HTML rendering pipeline converts HTML and Cascading Style Sheets (CSS) into PDF content-stream operators in one forward pass. It does not build a retained document tree. The stages below reflect HtmlParser::parse() on main.
Stage 1 — Sanitize and normalize. HtmlParser::parse() rejects input over 10 MB, strips control characters, and normalizes line endings: both CRLF and bare CR become LF, matching the HTML standard's line-ending normalization. It then resets every instance field, so state from an earlier call cannot carry forward.
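The Stage 1 checks can be sketched in a few lines. This is a minimal illustration, not the NextPDF implementation: sanitizeHtmlInput() is a hypothetical name, the cap is assumed to mean 10 × 1024 × 1024 bytes, and "control characters" is assumed to mean C0 controls other than tab and LF, plus DEL.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the Stage 1 checks; not the NextPDF implementation.
 */
function sanitizeHtmlInput(string $html, int $maxBytes = 10 * 1024 * 1024): string
{
    // Reject oversized input before doing any other work.
    if (strlen($html) > $maxBytes) {
        throw new LengthException('HTML input exceeds the size cap.');
    }

    // Normalize line endings: CRLF first, then any remaining bare CR.
    $html = str_replace(["\r\n", "\r"], "\n", $html);

    // Assumption: strip C0 controls except tab and LF, plus DEL.
    return (string) preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $html);
}

$clean = sanitizeHtmlInput("<p>a\r\nb\rc\x00</p>");
```

Replacing CRLF before bare CR matters: doing it the other way round would turn each CRLF into two line feeds.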
Stage 2 — Extract @page and style blocks. The parser first extracts <style> blocks, then applies discovered @page rules to reconfigure page geometry. It does this before processing any token, because page size affects every later layout decision.
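The extract-first ordering can be illustrated with a small regex-based sketch. Both helpers below are hypothetical and far simpler than a real CSS parser: they assume unnested rules and ignore @page pseudo-selectors such as :first.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: removes <style> blocks from the markup and returns
 * the remaining HTML plus the concatenated CSS.
 *
 * @return array{html: string, css: string}
 */
function extractStyleBlocks(string $html): array
{
    $css = [];
    $rest = preg_replace_callback(
        '~<style\b[^>]*>(.*?)</style>~is',
        static function (array $m) use (&$css): string {
            $css[] = $m[1];   // keep the stylesheet text
            return '';        // drop the block from the markup
        },
        $html
    );

    return ['html' => (string) $rest, 'css' => implode("\n", $css)];
}

/**
 * Hypothetical helper: returns the declarations of the first plain @page rule, or null.
 */
function findPageRule(string $css): ?string
{
    return preg_match('~@page\s*\{([^}]*)\}~i', $css, $m) === 1 ? trim($m[1]) : null;
}

// The <style> block comes after the content, yet its @page rule is found first.
$parts = extractStyleBlocks('<h1>Report</h1><style>@page { margin: 20mm; } h1 { color: #1E3A8A; }</style>');
$pageRule = findPageRule($parts['css']);
```

Because extraction runs over the whole input before any token exists, the position of the style block in the markup does not affect page geometry.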
Stage 3 — Tokenize. HtmlTokenizer::cleanHtml() normalizes whitespace while preserving <pre> content. tokenize() then produces a flat list<HtmlToken>. This is a token list, not a node graph. The pipeline discards whitespace-only text tokens immediately. HtmlChildScanner::scan() builds index maps (child counts, tag counts, emptiness) over the flat list, so structural selectors do not need a tree.
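To see how index maps can replace a tree, consider the sketch below. It is an assumption-laden illustration: the array token shape and scanChildren() are hypothetical, not the HtmlToken or HtmlChildScanner API, and only child counts and tag counts are computed.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical index-map scan over a flat token list.
 *
 * @param list<array{type: string, tag?: string}> $tokens
 * @return array{childCount: array<int, int>, tagCount: array<string, int>}
 */
function scanChildren(array $tokens): array
{
    $childCount = [];   // open-token index => number of direct element children
    $tagCount = [];     // tag name => number of occurrences
    $open = [];         // indexes of the currently open elements

    foreach ($tokens as $i => $token) {
        if ($token['type'] === 'open') {
            $tag = $token['tag'] ?? '';
            $tagCount[$tag] = ($tagCount[$tag] ?? 0) + 1;
            $childCount[$i] = 0;
            if ($open !== []) {
                // The innermost open element gains one direct child.
                $childCount[$open[array_key_last($open)]]++;
            }
            $open[] = $i;
        } elseif ($token['type'] === 'close') {
            array_pop($open);
        }
    }

    return ['childCount' => $childCount, 'tagCount' => $tagCount];
}

$tokens = [
    ['type' => 'open', 'tag' => 'ul'],   // index 0
    ['type' => 'open', 'tag' => 'li'],   // index 1
    ['type' => 'text'],                  // index 2
    ['type' => 'close', 'tag' => 'li'],  // index 3
    ['type' => 'open', 'tag' => 'li'],   // index 4
    ['type' => 'close', 'tag' => 'li'],  // index 5
    ['type' => 'close', 'tag' => 'ul'],  // index 6
];
$maps = scanChildren($tokens);
```

A structural selector can then be answered by a lookup keyed on the token index instead of a walk over child nodes.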
Stage 4 — Optional :has() pre-scan. When you enable the css.has experimental feature, CssResolver::resolveHasSelectors() runs one bounded pre-scan over the token list to resolve the relational selector. This documented, bounded step is the exception to the single-pass rule.
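A relational selector needs to know about descendants before the element itself is styled, which is why a pre-scan is required. The sketch below shows one way such a pass could work for the simplest case, subject:has(target) with plain tag names. resolveHasDescendant() and the token shape are hypothetical, and the input is assumed to be well formed, with a close token for every open token.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical pre-scan: returns the open-token indexes of every $subject
 * element that contains at least one $target descendant.
 *
 * @param list<array{type: string, tag: string}> $tokens
 * @return list<int>
 */
function resolveHasDescendant(array $tokens, string $subject, string $target): array
{
    $matched = [];
    $stack = [];   // each frame: [openIndex, tag, containsTarget]

    foreach ($tokens as $i => $token) {
        if ($token['type'] === 'open') {
            $stack[] = [$i, $token['tag'], false];
        } elseif ($token['type'] === 'close' && $stack !== []) {
            [$openIndex, $tag, $containsTarget] = array_pop($stack);
            if ($tag === $subject && $containsTarget) {
                $matched[] = $openIndex;
            }
            // Propagate "a target lives below here" to the parent frame.
            if ($stack !== [] && ($tag === $target || $containsTarget)) {
                $stack[array_key_last($stack)][2] = true;
            }
        }
    }

    sort($matched);
    return $matched;
}

$tokens = [
    ['type' => 'open', 'tag' => 'div'],    // index 0
    ['type' => 'open', 'tag' => 'p'],      // index 1
    ['type' => 'open', 'tag' => 'span'],   // index 2
    ['type' => 'close', 'tag' => 'span'],
    ['type' => 'close', 'tag' => 'p'],
    ['type' => 'close', 'tag' => 'div'],
    ['type' => 'open', 'tag' => 'div'],    // index 6
    ['type' => 'close', 'tag' => 'div'],
];
$matches = resolveHasDescendant($tokens, 'div', 'span');
```

Each token is visited once and the stack never grows beyond the nesting depth, so the extra pass stays bounded in the same way as the main walk.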
Stage 5 — Process tokens (style, layout, paint). HtmlParser::processTokens() walks the token list once. For each element, it resolves the cascade (Layer 1 applicators write HtmlStyleState), computes geometry (Layer 3 layout), and emits PDF operators (Layer 4 paint). Style inheritance uses a push-and-pop HtmlStyleState stack. The cursor (x, y, margins, stream offset) moves between handlers through HtmlBlockCursor snapshots.
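The push-and-pop inheritance model can be sketched as follows. This is an illustration under stated assumptions: walkTokens(), the token arrays, and the bracketed output lines are hypothetical placeholders, not HtmlStyleState, HtmlBlockCursor, or real PDF operators.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical single-pass walker: a style stack plus an append-only buffer.
 *
 * @param list<array{type: string, style?: array<string, string>, text?: string}> $tokens
 * @return array{stream: string, peakDepth: int}
 */
function walkTokens(array $tokens): array
{
    $stack = [['color' => 'black']];  // root style; the top frame is the active style
    $stream = '';                     // append-only: earlier output is never rewritten
    $peakDepth = 0;

    foreach ($tokens as $token) {
        switch ($token['type']) {
            case 'open':
                // The child inherits the parent's style, then applies its own declarations.
                $parent = $stack[array_key_last($stack)];
                $stack[] = array_merge($parent, $token['style'] ?? []);
                $peakDepth = max($peakDepth, count($stack) - 1);
                break;
            case 'text':
                $style = $stack[array_key_last($stack)];
                $stream .= sprintf("[%s] %s\n", $style['color'], $token['text'] ?? '');
                break;
            case 'close':
                // Never pop the root frame, even on unbalanced input.
                if (count($stack) > 1) {
                    array_pop($stack);
                }
                break;
        }
    }

    return ['stream' => $stream, 'peakDepth' => $peakDepth];
}

$result = walkTokens([
    ['type' => 'open', 'style' => ['color' => 'blue']],
    ['type' => 'text', 'text' => 'Heading'],
    ['type' => 'open'],
    ['type' => 'text', 'text' => 'nested'],
    ['type' => 'close'],
    ['type' => 'close'],
    ['type' => 'text', 'text' => 'after'],
]);
```

The nested element inherits blue without declaring it, and the text after both closes falls back to the root style. The stack held at most two element frames for seven tokens, which is the property that keeps memory tied to nesting depth rather than element count.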
Stage 6 — Return the result. parse() returns an immutable HtmlRenderResult with the emitted content stream, the end cursor position, and the used font keys. The caller (writeHtml()) passes the cursor back to the page coordinate frame.
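An immutable result of this shape could look like the sketch below. RenderResultSketch is a hypothetical stand-in: the property names follow the fields this page lists for HtmlRenderResult, but the property types and the use of readonly properties (PHP 8.1 or later) are assumptions.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for an immutable render result.
 * Property types are assumptions; requires PHP 8.1+ for readonly properties.
 */
final class RenderResultSketch
{
    /** @param list<string> $usedFontKeys */
    public function __construct(
        public readonly string $stream,
        public readonly float $endX,
        public readonly float $endY,
        public readonly array $usedFontKeys,
    ) {
    }
}

$result = new RenderResultSketch("BT ET\n", 15.0, 42.5, ['helvetica']);

$mutated = false;
try {
    $result->endY = 0.0;   // readonly properties reject writes after construction
    $mutated = true;
} catch (Error $e) {
    // Expected path: the write is refused and the value is unchanged.
}
```

Returning a value object instead of mutating shared state is what lets the caller decide how to apply the end cursor.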
For the four-layer separation inside Stage 5, see the layer contracts page. For the no-retained-tree property and its caps, see the streaming constraints page.
API surface
| Symbol | Location | Stage |
|---|---|---|
| Document::writeHtml(string $html): static | src/Core/Concerns/HasTextOutput.php | Public entry |
| HtmlParser::parse(string $html): HtmlRenderResult | src/Html/HtmlParser.php | Orchestrates all stages |
| HtmlTokenizer::cleanHtml() / tokenize() | src/Html/HtmlTokenizer.php | Stage 3 |
| HtmlChildScanner::scan() | src/Html/HtmlChildScanner.php | Stage 3 index maps |
| CssResolver::resolveHasSelectors() | src/Html/CssResolver.php | Stage 4 (gated) |
| HtmlRenderResult (stream, endX, endY, usedFontKeys) | src/Html/HtmlRenderResult.php | Stage 6 |
Code sample — Quick start
Sourced from examples/08-html-basic.php.
<?php
declare(strict_types=1);
require_once __DIR__ . '/../vendor/autoload.php';
use NextPDF\Core\Document;
$doc = Document::createStandalone();
$doc->setTitle('HTML Basic');
$doc->addPage();
$doc->writeHtml('<h1 style="color:#1E3A8A;">HTML Rendering</h1><p>One pass.</p>');
$doc->save(__DIR__ . '/output/08-html-basic.pdf');
Code sample — Production
Render a styled report with an embedded <style> block. The pipeline extracts and applies the style block before processing any token.
<?php
declare(strict_types=1);
require_once __DIR__ . '/../vendor/autoload.php';
use NextPDF\Core\Document;
use NextPDF\Exception\HtmlParsingException;

function renderInvoice(string $bodyHtml, string $out): void
{
    $doc = Document::createStandalone();
    $doc->setTitle('Invoice');
    $doc->addPage();

    $html = '<style>@page { margin: 20mm; } '
        . 'h1 { color: #1E3A8A; } '
        . 'table { width: 100%; }</style>'
        . $bodyHtml;

    try {
        $doc->writeHtml($html);
    } catch (HtmlParsingException $e) {
        // Sanitize/cap failures surface here. Do not retry.
        throw $e;
    }

    $doc->save($out);
}
Edge cases & gotchas
- @page is read before tokens. A @page rule after content still applies, because style extraction precedes tokenization. Page geometry is fixed before Stage 5.
- <pre> whitespace is preserved. cleanHtml() protects <pre> content; the pipeline collapses whitespace elsewhere.
- :has() is gated. If you do not enable the css.has experimental feature, Stage 4 does not run and :has() selectors do not match.
- One stream buffer. The pipeline writes to one string buffer. It never moves content already written. There is no re-layout.
- Caps apply mid-pass. The element and nesting caps throw during Stage 5, not before. A document can fail partway.
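The last point is easier to reason about with a sketch of a cap check that runs during the walk. Everything here is hypothetical: walkWithCaps(), the cap values, and the use of OverflowException stand in for whatever the library does internally (the production sample above catches HtmlParsingException).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical cap check performed while walking, not before.
 *
 * @param list<array{type: string}> $tokens
 * @return int number of elements seen when the walk completes
 */
function walkWithCaps(array $tokens, int $maxElements, int $maxDepth): int
{
    $elements = 0;
    $depth = 0;

    foreach ($tokens as $token) {
        if ($token['type'] === 'open') {
            $elements++;
            $depth++;
            if ($elements > $maxElements) {
                throw new OverflowException("Element cap exceeded at element {$elements}.");
            }
            if ($depth > $maxDepth) {
                throw new OverflowException("Nesting cap exceeded at depth {$depth}.");
            }
        } elseif ($token['type'] === 'close') {
            $depth = max(0, $depth - 1);
        }
    }

    return $elements;
}

$tokens = [
    ['type' => 'open'], ['type' => 'open'], ['type' => 'open'],
    ['type' => 'close'], ['type' => 'close'], ['type' => 'close'],
];

$message = null;
try {
    walkWithCaps($tokens, 100, 2);   // the third nested element breaches the depth cap
} catch (OverflowException $e) {
    $message = $e->getMessage();
}
```

In this sketch the first two elements are processed before the failure, so a caller should treat output produced before the exception as incomplete.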
Performance
Traversal time is O(token count). Table column sizing adds a bounded per-table row scan (Stage 5, TableParser). When enabled, the :has() pre-scan adds one bounded token-list pass (Stage 4). Memory for the style stack is O(nesting depth), not O(element count); see streaming constraints. The HTML render-pipeline performance benchmark guards against regressions with a 5% gate (merged work, PR #564). The per-page performance_budget (wall_ms: 1500, peak_mb: 64) is the operational ceiling.
Security notes
Stage 1 is the first security boundary: the 10 MB input cap, control-character stripping, and line-ending normalization all run before tokenization. During Stage 5, DefaultHtmlSecurityPolicy gates allowed tags, attributes, CSS properties, and URL schemes. See the HTML module security model.
Conformance
Line-ending normalization follows the HTML standard’s line-ending handling: CRLF and bare CR become LF. Per-property CSS conformance is documented in the CSS support matrix, and cascade behavior is documented on css-resolver. This page does not restate per-property support.
Commercial context
Enterprise capability. Premium widens CSS coverage on the same pipeline. The six-stage sequence does not change between editions. See the CSS support matrix.