HTML rendering pipeline

A call to writeHtml() runs one forward pass over the HyperText Markup Language (HTML) input: it tokenizes the input, resolves @page and styles, lays out content, and paints Portable Document Format (PDF) operators. The pipeline does not retain an element tree between stages.

composer require nextpdf/core:^3

The HTML rendering pipeline converts HTML plus Cascading Style Sheets (CSS) into PDF content-stream operators in one forward pass. It does not build a retained document tree. The stages below reflect HtmlParser::parse() on the main branch.

Stage 1 — Sanitize and normalize. HtmlParser::parse() rejects input over 10 MB, strips control characters, and normalizes line endings: both CRLF and bare CR become LF, following the HTML standard's line-ending handling. It then resets every instance field, so state from an earlier call cannot carry forward.
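The Stage 1 order of operations can be sketched in plain PHP. This is a minimal illustration, not the HtmlParser code: the function name sanitizeHtmlInput(), the constant, the exception type, and the exact set of stripped control characters are all assumptions made for the example.

```php
<?php
declare(strict_types=1);

// Hypothetical helper, not part of NextPDF. "10 MB" is read here as
// 10 * 1024 * 1024 bytes, which is an assumption.
const SKETCH_MAX_HTML_BYTES = 10 * 1024 * 1024;

function sanitizeHtmlInput(string $html): string
{
    // 1. Reject oversized input before doing any other work.
    if (strlen($html) > SKETCH_MAX_HTML_BYTES) {
        throw new LengthException('HTML input exceeds the size cap.');
    }

    // 2. Strip C0 control characters and DEL, but keep TAB, LF, and CR
    //    so that line endings survive until step 3. The exact character
    //    set the real parser strips is assumed.
    $html = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $html) ?? '';

    // 3. Normalize line endings: CRLF first, then any remaining bare CR.
    return str_replace(["\r\n", "\r"], "\n", $html);
}
```

Replacing CRLF before bare CR matters: the reverse order would turn each CRLF into two line feeds.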

Stage 2 — Extract @page and style blocks. The parser first extracts <style> blocks, then applies discovered @page rules to reconfigure page geometry. It does this before processing any token, because page size affects every later layout decision.
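To make the ordering concrete, the sketch below pulls every <style> block out of the markup and collects @page rule bodies before any token exists. It is an illustration under stated assumptions: extractStyles() is a hypothetical name, the regular expressions are simplifications (nested margin-box rules inside @page are not handled), and whether the real parser removes the blocks from the markup is not stated on this page.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of style extraction, not the NextPDF implementation.
 *
 * @return array{html: string, css: string, pageRules: list<string>}
 */
function extractStyles(string $html): array
{
    $css = '';

    // Collect the body of every <style> block and drop it from the markup.
    $stripped = preg_replace_callback(
        '~<style\b[^>]*>(.*?)</style\s*>~is',
        static function (array $m) use (&$css): string {
            $css .= $m[1] . "\n";
            return '';
        },
        $html
    );

    // Pull out the declaration body of each flat @page rule.
    preg_match_all('~@page\b[^{]*\{([^}]*)\}~i', $css, $matches);

    return [
        'html' => $stripped ?? $html,
        'css' => $css,
        'pageRules' => array_map('trim', $matches[1]),
    ];
}
```

Because extraction scans the whole input first, a <style> block placed after the content it styles is still seen before layout begins.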

Stage 3 — Tokenize. HtmlTokenizer::cleanHtml() normalizes whitespace while preserving <pre> content. tokenize() then produces a flat list<HtmlToken>. This is a token list, not a node graph. The pipeline discards whitespace-only text tokens immediately. HtmlChildScanner::scan() builds index maps (child counts, tag counts, emptiness) over the flat list, so structural selectors do not need a tree.
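The idea of index maps over a flat list can be sketched as follows. The SketchToken class and countChildren() function are hypothetical stand-ins (the fields of the real HtmlToken and the output of HtmlChildScanner::scan() are not shown on this page), and the example assumes PHP 8.1 or later and well-formed input without void elements.

```php
<?php
declare(strict_types=1);

// Hypothetical token shape used only for this sketch.
final class SketchToken
{
    public function __construct(
        public readonly string $type,  // 'open', 'close', or 'text'
        public readonly string $value, // tag name or text content
    ) {
    }
}

/**
 * One pass over a flat token list that records, for each opening token,
 * how many direct element children it has. No tree is built.
 *
 * @param list<SketchToken> $tokens
 * @return array<int, int> index of opening token => direct child count
 */
function countChildren(array $tokens): array
{
    $counts = [];
    $open = []; // stack of indexes of currently open elements

    foreach ($tokens as $i => $token) {
        if ($token->type === 'open') {
            if ($open !== []) {
                // The innermost open element gains one child.
                $counts[$open[array_key_last($open)]]++;
            }
            $counts[$i] = 0;
            $open[] = $i;
        } elseif ($token->type === 'close') {
            array_pop($open);
        }
    }

    return $counts;
}
```

A selector such as :empty or :nth-child() can then be answered by looking up a token index in the map, which is why a parent-and-child node graph is not required.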

Stage 4 — Optional :has() pre-scan. When you enable the css.has experimental feature, CssResolver::resolveHasSelectors() runs one bounded pre-scan over the token list to resolve the relational selector. This documented, bounded step is the exception to the single-pass rule.

Stage 5 — Process tokens (style, layout, paint). HtmlParser::processTokens() walks the token list once. For each element, it resolves the cascade (Layer 1 applicators write HtmlStyleState), computes geometry (Layer 3 layout), and emits PDF operators (Layer 4 paint). Style inheritance uses a push-and-pop HtmlStyleState stack. The cursor (x, y, margins, stream offset) moves between handlers through HtmlBlockCursor snapshots.
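The push-and-pop inheritance described above can be illustrated with a small stack. StyleStackSketch is a hypothetical stand-in for the HtmlStyleState stack, and it simplifies by treating every property as inherited, which real CSS does not do.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch, not the NextPDF HtmlStyleState API.
final class StyleStackSketch
{
    /** @var list<array<string, string>> one frame per open element */
    private array $frames = [['color' => '#000000', 'font-size' => '12pt']];

    /** Enter an element: copy the parent frame, then overlay its declarations. */
    public function push(array $declared): void
    {
        $this->frames[] = array_merge($this->current(), $declared);
    }

    /** Leave an element: the parent frame becomes current again. */
    public function pop(): void
    {
        if (count($this->frames) > 1) {
            array_pop($this->frames);
        }
    }

    /** @return array<string, string> */
    public function current(): array
    {
        return $this->frames[array_key_last($this->frames)];
    }

    public function depth(): int
    {
        return count($this->frames);
    }
}
```

The stack holds one frame per open element, so its size tracks nesting depth rather than the total number of elements in the document.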

Stage 6 — Return the result. parse() returns an immutable HtmlRenderResult with the emitted content stream, the end cursor position, and the used font keys. The caller (writeHtml()) passes the cursor back to the page coordinate frame.

For the four-layer separation inside Stage 5, see the layer contracts page. For the no-retained-tree property and its caps, see the streaming constraints page.

| Symbol | Location | Stage |
| --- | --- | --- |
| Document::writeHtml(string $html): static | src/Core/Concerns/HasTextOutput.php | Public entry |
| HtmlParser::parse(string $html): HtmlRenderResult | src/Html/HtmlParser.php | Orchestrates all stages |
| HtmlTokenizer::cleanHtml() / tokenize() | src/Html/HtmlTokenizer.php | Stage 3 |
| HtmlChildScanner::scan() | src/Html/HtmlChildScanner.php | Stage 3 index maps |
| CssResolver::resolveHasSelectors() | src/Html/CssResolver.php | Stage 4 (gated) |
| HtmlRenderResult (stream, endX, endY, usedFontKeys) | src/Html/HtmlRenderResult.php | Stage 6 |

Sourced from examples/08-html-basic.php.

<?php
declare(strict_types=1);
require_once __DIR__ . '/../vendor/autoload.php';
use NextPDF\Core\Document;
$doc = Document::createStandalone();
$doc->setTitle('HTML Basic');
$doc->addPage();
$doc->writeHtml('<h1 style="color:#1E3A8A;">HTML Rendering</h1><p>One pass.</p>');
$doc->save(__DIR__ . '/output/08-html-basic.pdf');

Render a styled report with an embedded <style> block. The pipeline extracts and applies the style block before processing any token.

<?php
declare(strict_types=1);
require_once __DIR__ . '/../vendor/autoload.php';
use NextPDF\Core\Document;
use NextPDF\Exception\HtmlParsingException;
function renderInvoice(string $bodyHtml, string $out): void
{
    $doc = Document::createStandalone();
    $doc->setTitle('Invoice');
    $doc->addPage();
    $html = '<style>@page { margin: 20mm; } '
        . 'h1 { color: #1E3A8A; } '
        . 'table { width: 100%; }</style>'
        . $bodyHtml;
    try {
        $doc->writeHtml($html);
    } catch (HtmlParsingException $e) {
        // Sanitize/cap failures surface here. Do not retry.
        throw $e;
    }
    $doc->save($out);
}
  • @page is read before tokens. A @page rule after content still applies, because style extraction precedes tokenization. Page geometry is fixed before Stage 5.
  • <pre> whitespace is preserved. cleanHtml() protects <pre> content; the pipeline collapses whitespace elsewhere.
  • :has() is gated. If you do not enable the css.has experimental feature, Stage 4 does not run and :has() selectors do not match.
  • One stream buffer. The pipeline writes to one string buffer. It never moves content already written. There is no re-layout.
  • Caps apply mid-pass. The element and nesting caps throw during Stage 5, not before. A document can fail partway.
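The mid-pass failure mode in the last item can be sketched with a toy walker. Everything here is hypothetical: walkWithCaps(), the event strings, the exception type, and the cap values passed by the caller are illustrative, since this page does not state the real limits.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical walker: counts elements and tracks nesting depth, and throws
 * as soon as either cap is crossed. Tokens before that point have already
 * been processed, which is what "fail partway" means.
 *
 * @param list<string> $events each entry is 'open', 'close', or 'text'
 */
function walkWithCaps(array $events, int $maxElements, int $maxDepth): int
{
    $elements = 0;
    $depth = 0;

    foreach ($events as $i => $event) {
        if ($event === 'open') {
            $elements++;
            $depth++;
            if ($elements > $maxElements || $depth > $maxDepth) {
                throw new OverflowException(
                    "Cap exceeded at token {$i}; earlier tokens were already processed."
                );
            }
        } elseif ($event === 'close') {
            $depth = max(0, $depth - 1);
        }
    }

    return $elements;
}
```

In practice this suggests saving the document only after writeHtml() returns, as renderInvoice() does above, so a partial render never reaches disk.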

Traversal time is O(token count). Table column sizing adds a bounded per-table row scan (Stage 5, TableParser). When enabled, the :has() pre-scan adds one bounded token-list pass (Stage 4). Memory is O(nesting depth) for the style stack, not O(element count); see streaming constraints. The HTML render-pipeline performance benchmark guards against regressions with a 5% gate (merged work, PR #564). The per-page performance_budget (wall_ms: 1500, peak_mb: 64) is the operational ceiling.
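A rough way to compare one render against the budget figures above is to time a callable and read peak memory afterwards. This is a sketch only: measureAgainstBudget() is a hypothetical helper, it is not how the project's benchmark gate works, and memory_get_peak_usage() reports the process-wide peak rather than the peak of the measured call.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: runs $render once and compares wall time and peak
 * memory against the budget values quoted on this page.
 *
 * @return array{wall_ms: float, peak_mb: float, within_budget: bool}
 */
function measureAgainstBudget(
    callable $render,
    float $maxWallMs = 1500.0,
    float $maxPeakMb = 64.0
): array {
    $start = hrtime(true); // monotonic clock, nanoseconds
    $render();
    $wallMs = (hrtime(true) - $start) / 1e6;

    // Process-wide peak since startup, so treat this as an upper bound.
    $peakMb = memory_get_peak_usage(true) / (1024 * 1024);

    return [
        'wall_ms' => $wallMs,
        'peak_mb' => $peakMb,
        'within_budget' => $wallMs <= $maxWallMs && $peakMb <= $maxPeakMb,
    ];
}
```

A caller could wrap a single-page writeHtml() call in the callable and log the result, keeping in mind that one sample on one machine is only indicative.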

Stage 1 is the first security boundary: the 10 MB input cap, control-character stripping, and line-ending normalization all run before tokenization. During Stage 5, DefaultHtmlSecurityPolicy gates allowed tags, attributes, CSS properties, and URL schemes. See the HTML module security model.

Line-ending normalization follows the HTML standard’s line-ending handling: CRLF and bare CR become LF. Per-property CSS conformance is documented in the CSS support matrix, and cascade behavior is documented on css-resolver. This page does not restate per-property support.

Enterprise capability. Premium widens CSS coverage on the same pipeline. The six-stage sequence does not change between editions. See the CSS support matrix.