HTML rendering pipeline

At a glance

When you call writeHtml(), the pipeline makes one forward pass over the HyperText Markup Language (HTML) input: it sanitizes the input, extracts @page rules and styles, tokenizes, lays out content, and paints Portable Document Format (PDF) operators. It does not retain an element tree between stages.

Install

composer require nextpdf/core:^3

Conceptual overview

The HTML rendering pipeline converts HTML and Cascading Style Sheets (CSS) into PDF content-stream operators in one forward pass. It does not build a retained document tree. The stages below reflect HtmlParser::parse() on main.

Stage 1 — Sanitize and normalize. HtmlParser::parse() rejects input over 10 MB, strips control characters, and normalizes line endings: both CRLF and bare CR become LF, matching the HTML standard's line-ending normalization. It then resets every instance field, so state from an earlier call cannot carry forward.
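
The Stage 1 checks can be sketched as a standalone function. This is a minimal illustration, not the code in HtmlParser::parse(): the function name, the exception type, the exact set of stripped control characters, and reading "10 MB" as 10 × 1024 × 1024 bytes are all assumptions.

```php
<?php

declare(strict_types=1);

// Assumption: the 10 MB cap is read here as 10 MiB.
const MAX_HTML_BYTES = 10 * 1024 * 1024;

/**
 * Hypothetical helper mirroring the order described above:
 * size cap, control-character stripping, line-ending normalization.
 */
function sanitizeHtmlInput(string $html): string
{
    if (strlen($html) > MAX_HTML_BYTES) {
        throw new LengthException('HTML input exceeds the 10 MB cap.');
    }

    // Strip C0 control bytes and DEL, but keep tab, LF, and CR.
    // CR must survive this step so that a bare CR can still become LF below.
    $stripped = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $html) ?? $html;

    // CRLF first, then any remaining bare CR, become LF.
    return str_replace(["\r\n", "\r"], "\n", $stripped);
}
```

Replacing CRLF before bare CR matters: doing it the other way round would turn each CRLF into two line feeds.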

Stage 2 — Extract @page and style blocks. The parser first extracts <style> blocks, then applies discovered @page rules to reconfigure page geometry. It does this before processing any token, because page size affects every later layout decision.
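
To illustrate why extraction can run before any token exists, here is a rough sketch that pulls <style> blocks out of the markup with regular expressions and collects the bodies of simple @page rules. The function name and return shape are hypothetical, and the real parser may handle cases this sketch ignores (nested braces, comments, page selectors with margin boxes).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: separate <style> content from markup and list @page bodies.
 *
 * @return array{html: string, css: string, pageRules: list<string>}
 */
function extractStyles(string $html): array
{
    $css = '';

    // Remove each <style>...</style> block and accumulate its contents.
    $stripped = preg_replace_callback(
        '~<style\b[^>]*>(.*?)</style\s*>~is',
        static function (array $m) use (&$css): string {
            $css .= $m[1] . "\n";
            return '';
        },
        $html
    ) ?? $html;

    // Collect the declaration body of every simple @page rule.
    preg_match_all('~@page\b[^{]*\{([^}]*)\}~i', $css, $matches);
    $pageRules = array_map('trim', $matches[1]);

    return ['html' => $stripped, 'css' => $css, 'pageRules' => $pageRules];
}
```

Because the scan works on the raw string, the position of the <style> block relative to the content does not affect which @page rules are found.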

Stage 3 — Tokenize. HtmlTokenizer::cleanHtml() normalizes whitespace while preserving <pre> content. tokenize() then produces a flat list<HtmlToken>. This is a token list, not a node graph. The pipeline discards whitespace-only text tokens immediately. HtmlChildScanner::scan() builds index maps (child counts, tag counts, emptiness) over the flat list, so structural selectors do not need a tree.
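
The idea of index maps over a flat list can be shown with a toy tokenizer and a child counter. Both functions below are hypothetical stand-ins, not HtmlTokenizer or HtmlChildScanner: the token shape is an assumption, and attributes, comments, and void elements are not handled.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical flat tokenizer.
 *
 * @return list<array{type: string, value: string}>
 */
function tokenizeFlat(string $html): array
{
    $parts = preg_split('~(<[^>]+>)~', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $tokens = [];
    foreach ($parts ?: [] as $part) {
        if (preg_match('~^<(/?)([a-zA-Z][a-zA-Z0-9]*)~', $part, $m) === 1) {
            $tokens[] = ['type' => $m[1] === '/' ? 'close' : 'open', 'value' => strtolower($m[2])];
        } elseif (trim($part) !== '') {
            // Whitespace-only text is dropped immediately.
            $tokens[] = ['type' => 'text', 'value' => $part];
        }
    }
    return $tokens;
}

/**
 * Count direct element children per opening-token index using a depth stack.
 *
 * @param list<array{type: string, value: string}> $tokens
 * @return array<int, int> open-token index => direct child element count
 */
function countChildren(array $tokens): array
{
    $counts = [];
    $stack = []; // indexes of currently open elements
    foreach ($tokens as $i => $token) {
        if ($token['type'] === 'open') {
            if ($stack !== []) {
                $counts[end($stack)]++;
            }
            $counts[$i] = 0;
            $stack[] = $i;
        } elseif ($token['type'] === 'close') {
            array_pop($stack);
        }
    }
    return $counts;
}
```

The map is keyed by token index, so a later handler can answer "how many children does this element have?" with an array lookup instead of walking a tree.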

Stage 4 — Optional :has() pre-scan. When you enable the css.has experimental feature, CssResolver::resolveHasSelectors() runs one bounded pre-scan over the token list to resolve the relational selector. This documented, bounded step is the exception to the single-pass rule.
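
A relational selector needs to know about descendants before the element's own styles are resolved, which is why a pre-scan is required. The sketch below handles only the form outer:has(inner) over a flat token list. It is an illustration under assumed names and an assumed token shape, not CssResolver::resolveHasSelectors().

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical pre-scan for "outer:has(inner)".
 *
 * @param list<array{type: string, value: string}> $tokens
 * @return list<int> indexes of opening $outer tokens that contain an $inner descendant
 */
function prescanHas(array $tokens, string $outer, string $inner): array
{
    $open = [];    // stack of open-token indexes
    $matched = []; // set of matching $outer indexes

    foreach ($tokens as $i => $token) {
        if ($token['type'] === 'open') {
            if ($token['value'] === $inner) {
                // Every enclosing $outer element now satisfies :has($inner).
                foreach ($open as $ancestor) {
                    if ($tokens[$ancestor]['value'] === $outer) {
                        $matched[$ancestor] = true;
                    }
                }
            }
            $open[] = $i;
        } elseif ($token['type'] === 'close') {
            array_pop($open);
        }
    }

    $indexes = array_keys($matched);
    sort($indexes);
    return $indexes;
}
```

In this sketch the work per token is proportional to the current nesting depth, so a nesting cap also bounds the cost of the pre-scan.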

Stage 5 — Process tokens (style, layout, paint). HtmlParser::processTokens() walks the token list once. For each element, it resolves the cascade (Layer 1 applicators write HtmlStyleState), computes geometry (Layer 3 layout), and emits PDF operators (Layer 4 paint). Style inheritance uses a push-and-pop HtmlStyleState stack. The cursor (x, y, margins, stream offset) moves between handlers through HtmlBlockCursor snapshots.
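
The push-and-pop discipline can be sketched with a small stack class. StyleStack and its methods are hypothetical names, not the HtmlStyleState API, and the sketch treats every property as inherited, which real CSS does not.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for a push-and-pop style stack.
 * Only the stack discipline comes from the text above.
 */
final class StyleStack
{
    /** @var list<array<string, string>> */
    private array $frames = [['color' => '#000000', 'font-size' => '12pt']];

    /** @param array<string, string> $declared */
    public function push(array $declared): void
    {
        // Declared values override the inherited frame.
        $this->frames[] = array_merge($this->current(), $declared);
    }

    public function pop(): void
    {
        // Never pop the root frame.
        if (count($this->frames) > 1) {
            array_pop($this->frames);
        }
    }

    /** @return array<string, string> */
    public function current(): array
    {
        return $this->frames[count($this->frames) - 1];
    }

    public function depth(): int
    {
        return count($this->frames);
    }
}
```

An opening token pushes a frame and the matching closing token pops it, so the stack holds one frame per open ancestor rather than one per element in the document.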

Stage 6 — Return the result. parse() returns an immutable HtmlRenderResult with the emitted content stream, the end cursor position, and the used font keys. The caller (writeHtml()) maps the end cursor back into the page coordinate frame.

For the four-layer separation inside Stage 5, see the layer contracts page. For the no-retained-tree property and its caps, see the streaming constraints page.

API surface

Symbol	Location	Stage
`Document::writeHtml(string $html): static`	`src/Core/Concerns/HasTextOutput.php`	Public entry
`HtmlParser::parse(string $html): HtmlRenderResult`	`src/Html/HtmlParser.php`	Orchestrates all stages
`HtmlTokenizer::cleanHtml()` / `tokenize()`	`src/Html/HtmlTokenizer.php`	Stage 3
`HtmlChildScanner::scan()`	`src/Html/HtmlChildScanner.php`	Stage 3 index maps
`CssResolver::resolveHasSelectors()`	`src/Html/CssResolver.php`	Stage 4 (gated)
`HtmlRenderResult` (`stream`, `endX`, `endY`, `usedFontKeys`)	`src/Html/HtmlRenderResult.php`	Stage 6

Code sample — Quick start

Sourced from examples/08-html-basic.php.

<?php

declare(strict_types=1);

require_once __DIR__ . '/../vendor/autoload.php';

use NextPDF\Core\Document;

$doc = Document::createStandalone();
$doc->setTitle('HTML Basic');
$doc->addPage();
$doc->writeHtml('<h1 style="color:#1E3A8A;">HTML Rendering</h1><p>One pass.</p>');
$doc->save(__DIR__ . '/output/08-html-basic.pdf');

Code sample — Production

Render a styled report with an embedded <style> block. The pipeline extracts and applies the style block before processing any token.

<?php

declare(strict_types=1);

require_once __DIR__ . '/../vendor/autoload.php';

use NextPDF\Core\Document;
use NextPDF\Exception\HtmlParsingException;

function renderInvoice(string $bodyHtml, string $out): void
{
    $doc = Document::createStandalone();
    $doc->setTitle('Invoice');
    $doc->addPage();

    $html = '<style>@page { margin: 20mm; } '
          . 'h1 { color: #1E3A8A; } '
          . 'table { width: 100%; }</style>'
          . $bodyHtml;

    try {
        $doc->writeHtml($html);
    } catch (HtmlParsingException $e) {
        // Sanitize and cap failures (input size, element count, nesting depth)
        // surface here. Do not retry: the same input fails the same way.
        throw $e;
    }

    $doc->save($out);
}

Edge cases & gotchas

@page is read before tokens. A @page rule after content still applies, because style extraction precedes tokenization. Page geometry is fixed before Stage 5.
<pre> whitespace is preserved. cleanHtml() protects <pre> content; the pipeline collapses whitespace elsewhere.
:has() is gated. If you do not enable the css.has experimental feature, Stage 4 does not run and :has() selectors do not match.
One stream buffer. The pipeline writes to one string buffer. It never moves content already written. There is no re-layout.
Caps apply mid-pass. The element and nesting caps throw during Stage 5, not before. A document can fail partway.
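
The last point, that caps throw mid-pass, can be illustrated with a toy walk that checks nesting depth as it goes. The function, the token shape, and the exception type are assumptions for illustration. NextPDF's own caps and exception classes may differ.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical walk with a nesting cap.
 *
 * @param list<array{type: string, value: string}> $tokens
 */
function walkWithNestingCap(array $tokens, int $maxDepth, string &$buffer): void
{
    $depth = 0;
    foreach ($tokens as $token) {
        if ($token['type'] === 'open') {
            if (++$depth > $maxDepth) {
                // The cap is checked during the walk, so $buffer already holds earlier output.
                throw new OverflowException("Nesting depth exceeds {$maxDepth}.");
            }
        } elseif ($token['type'] === 'close') {
            $depth = max(0, $depth - 1);
        } else {
            $buffer .= $token['value'];
        }
    }
}
```

Because output written before the failure is not rolled back in a design like this, a caller is safer treating a failed render as a failed document and saving only after the call returns normally.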

Performance

Traversal time is O(token count). Table column sizing adds a bounded per-table row scan (Stage 5, TableParser). When enabled, the :has() pre-scan adds one bounded token-list pass (Stage 4). Memory is O(nesting depth) for the style stack, not O(element count); see streaming constraints. The HTML render-pipeline performance benchmark guards against regressions with a 5% gate (merged work, PR #564). The per-page performance_budget (wall_ms: 1500, peak_mb: 64) is the operational ceiling.
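
To check a render against the per-page budget in your own environment, a small timing wrapper is enough. The helper below is a hypothetical sketch, not part of NextPDF. Its default limits copy the wall_ms and peak_mb values quoted above, and it uses only the standard hrtime() and memory_get_peak_usage() functions.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical budget check around any render callable.
 *
 * @return array{wall_ms: float|int, peak_mb: float|int, within_budget: bool}
 */
function measureAgainstBudget(
    callable $render,
    float $wallMsLimit = 1500.0,
    float $peakMbLimit = 64.0
): array {
    $start = hrtime(true); // monotonic clock, nanoseconds
    $render();
    $wallMs = (hrtime(true) - $start) / 1_000_000;

    // Peak usage is process-wide, so measure one page per process for a fair reading.
    $peakMb = memory_get_peak_usage(true) / (1024 * 1024);

    return [
        'wall_ms' => $wallMs,
        'peak_mb' => $peakMb,
        'within_budget' => $wallMs <= $wallMsLimit && $peakMb <= $peakMbLimit,
    ];
}
```

In practice the callable would wrap a single addPage() and writeHtml() sequence, so that the reading corresponds to one page.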

Security notes

Stage 1 is the first security boundary: the 10 MB input cap, control-character stripping, and line-ending normalization all run before tokenization. During Stage 5, DefaultHtmlSecurityPolicy gates allowed tags, attributes, CSS properties, and URL schemes. See the HTML module security model.
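
As one example of the kind of gate a security policy applies, the sketch below checks a URL's scheme against an allow-list. It is not DefaultHtmlSecurityPolicy: the function name, the default allow-list, and the decision to reject relative URLs are assumptions made for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical URL scheme gate using an allow-list.
 *
 * @param list<string> $allowed lowercase scheme names
 */
function isUrlSchemeAllowed(string $url, array $allowed = ['https', 'http']): bool
{
    // Drop whitespace and control bytes that could hide a scheme, e.g. "java\tscript:".
    $compact = preg_replace('/[\x00-\x20\x7F]+/', '', $url) ?? '';

    if (preg_match('/^([a-zA-Z][a-zA-Z0-9+.\-]*):/', $compact, $m) !== 1) {
        // No explicit scheme: this sketch rejects relative URLs outright.
        return false;
    }

    return in_array(strtolower($m[1]), $allowed, true);
}
```

An allow-list fails closed: a scheme the policy has never heard of is rejected without needing a matching deny rule.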

Conformance

Line-ending normalization follows the HTML standard’s line-ending handling: CRLF and bare CR become LF. Per-property CSS conformance is documented in the CSS support matrix, and cascade behavior is documented on css-resolver. This page does not restate per-property support.

Commercial context

Enterprise capability. Premium widens CSS coverage on the same pipeline. The six-stage sequence does not change between editions. See the CSS support matrix.