HTML: HTML+CSS to PDF rendering subsystem
At a glance
The HTML subsystem converts HyperText Markup Language (HTML) and Cascading Style Sheets (CSS) into Portable Document Format (PDF) content streams in one forward pass. It is the engine’s largest and highest-risk subsystem, with 324 files under src/Html/.
Install
composer require nextpdf/core:^3
Conceptual overview
The HTML subsystem is a single-pass streaming HTML+CSS-to-PDF renderer. Its public surface is one method: Document::writeHtml(). Internally, HtmlParser tokenizes the input, resolves styles, computes layout, and emits PDF operators in one forward pass, without retaining a document tree.
Be clear about the scope. This subsystem is not a retained-document renderer. It does not hold an element graph, re-lay-out content already written, or let input mutate after parsing starts. It implements a curated subset of CSS at fixed specification pins. Two Architecture Decision Records (ADRs) govern it. ADR-001 defines the single-pass streaming model and its caps. ADR-010 defines the four-layer contract (CSS parsing, style state, layout, paint), plus the paged-media and measurement adjuncts.
HtmlParser is rated critical risk in the module manifest. Five files carry documented danger-zone annotations: the HtmlParser orchestrator (streaming tokenizer, 1000+ lines of code (LOC)), HtmlStyleState (100+ CSS property fields with a stack inheritance model), HtmlBlockHandler (block dispatch coupled to style state), FlexLayoutEngine (full flex measurement and layout), and TableParser (colspan/rowspan pagination across page breaks). Treat changes here as plan-mode work.
Use this page as the entry point. See pipeline for the stage sequence, css-resolver for cascade and specificity, layer-contracts-adr010 for the layer boundaries, and streaming-constraints-adr001 for the no-retained-tree model and its caps.
Right-to-left and bidirectional text
writeHtml() renders right-to-left (RTL) content. Set the CSS direction: rtl property on the body, a table, or any element. The engine resolves the visual order with the Unicode Bidirectional Algorithm (UAX #9) through the typography layer’s bidirectional engine; see Typography for the BidiEngine details. Mixed Latin, Arabic, and numeric content orders correctly, and a number after Arabic keeps its digits left to right.
Arabic also takes contextual shaping: the engine selects the initial, medial, final, or isolated form of each letter and applies the Lam-Alef ligature. Shaping needs a registered font whose character map covers the Arabic Presentation Forms-B block; a Latin-only face, including the standard-14 fonts, cannot draw Arabic. In tables, each cell is reordered and shaped on its own and aligns to the start (right) edge under direction: rtl. RTL applies to Arabic, Hebrew, Persian, and Urdu; Hebrew is reordered but not shaped.
Set direction with the CSS direction property — the HTML dir attribute does not map to it. Horizontal alignment of non-table block and inline text, and text-align: justify, are not yet applied. For a runnable Arabic invoice and the full list of current limitations, see Render right-to-left Arabic HTML.
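The rule above (direction comes from CSS, not from the dir attribute) can be sketched as a small fragment builder. This is an illustrative sketch only: the helper name buildRtlTable() and the label/value row shape are hypothetical and not part of the library; only the direction: rtl rule and the table attributes are taken from this page.

```php
<?php
declare(strict_types=1);

/**
 * Builds an RTL table fragment (hypothetical helper, not a NextPDF API).
 *
 * @param list<array{0: string, 1: string}> $rows label/value pairs
 */
function buildRtlTable(array $rows): string
{
    // Direction is set through CSS; the HTML dir attribute is not used.
    $html = '<table style="direction: rtl; width: 100%;" border="1" cellpadding="5">';

    foreach ($rows as [$label, $value]) {
        // Escape cell text so row data cannot inject markup.
        $html .= '<tr><td>' . htmlspecialchars($label, ENT_QUOTES, 'UTF-8') . '</td>'
            . '<td>' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '</td></tr>';
    }

    return $html . '</table>';
}

$fragment = buildRtlTable([['الإجمالي', '1250'], ['الضريبة', '75']]);
```

The resulting string would be passed to Document::writeHtml(). Arabic glyphs still need a registered font that covers the Arabic Presentation Forms-B block; font registration is not shown here because this page does not document it.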
API surface
| Symbol | Location | Role |
|---|---|---|
| Document::writeHtml(string $html): static | src/Core/Concerns/HasTextOutput.php | Public entry point. Renders HTML at the current cursor. |
| Document::createStandalone(): self | src/Core/Document.php | Construct a standalone document. |
| HtmlParser::parse(string $html): HtmlRenderResult | src/Html/HtmlParser.php | Internal orchestrator. |
| HtmlRenderResult | src/Html/HtmlRenderResult.php | Immutable result: stream, end cursor, and used fonts. |
| DefaultHtmlSecurityPolicy | src/Html/DefaultHtmlSecurityPolicy.php | Default tag, attribute, CSS, and Uniform Resource Locator (URL) policy. |
| HtmlSecurityPolicyInterface | src/Contracts/HtmlSecurityPolicyInterface.php | Policy contract for custom policies. |
Code sample — Quick start
Source: examples/08-html-basic.php.
<?php
declare(strict_types=1);
require_once __DIR__ . '/../vendor/autoload.php';
use NextPDF\Core\Document;
$doc = Document::createStandalone();
$doc->setTitle('HTML Basic');
$doc->addPage();
$doc->writeHtml('<h1 style="color:#1E3A8A;">HTML Rendering</h1><p>Direct to PDF.</p>');
$doc->save(__DIR__ . '/output/08-html-basic.pdf');
Code sample — Production
This sample shows a table report with an embedded style block, modeled on examples/09-html-table.php.
<?php
declare(strict_types=1);
require_once __DIR__ . '/../vendor/autoload.php';
use NextPDF\Core\Document;
use NextPDF\Exception\HtmlParsingException;

function renderInventory(string $rowsHtml, string $out): void
{
    $doc = Document::createStandalone();
    $doc->setTitle('Inventory');
    $doc->addPage();

    // The embedded style block and the table are rendered in one writeHtml() call.
    $html = '<style>table { width: 100%; } '
        . 'th { background-color: #1E3A8A; color: #FFFFFF; }</style>'
        . '<table border="1" cellpadding="5">' . $rowsHtml . '</table>';

    try {
        $doc->writeHtml($html);
    } catch (HtmlParsingException $e) {
        // Input cap, element cap (50,000), or nesting cap (100). Do not retry.
        throw $e;
    }

    $doc->save($out);
}
Edge cases & gotchas
- Curated CSS subset. Support is pinned per module. Check the CSS support matrix before you rely on a property.
- Hard caps throw. The 10 MB input cap, the 50,000-element cap, and the 100-level nesting cap each throw HtmlParsingException. See streaming constraints.
- No re-layout. The renderer writes output once in document order; late styles cannot change earlier output.
- :has() is gated behind the css.has experimental feature.
- Critical-risk subsystem. Five files are marked as danger zones. Use plan mode for changes under src/Html/.
Single-pass streaming constraints (ADR-001)
The renderer keeps no document tree and runs one forward pass. The element, nesting, and input caps are hard limits. For full detail and the worker-safety contract, see streaming constraints (ADR-001).
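To make the no-retained-tree idea concrete, here is a toy single-pass scan that keeps only a stack of open tag names and enforces element and nesting limits as it goes. It is a sketch under stated assumptions, not the library's HtmlParser: the function name scanOnce(), the regex tokenizer, the use of LengthException, and the "greater than" comparison against each cap are all illustrative choices.

```php
<?php
declare(strict_types=1);

/**
 * Toy forward-only scan (hypothetical; not how HtmlParser is implemented).
 * State is a stack of open tag names, so memory grows with nesting depth,
 * not with the number of elements seen.
 *
 * @return array{elements: int, maxDepth: int}
 */
function scanOnce(string $html, int $maxElements = 50000, int $maxDepth = 100): array
{
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'source', 'track', 'wbr'];
    $stack = [];
    $elements = 0;
    $deepest = 0;
    $offset = 0;

    // Consume one tag per iteration, always moving forward through the input.
    while (preg_match('~<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>~', $html, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        $offset = $m[0][1] + strlen($m[0][0]);
        $name = strtolower($m[2][0]);

        if ($m[1][0] === '/') {
            // Closing tag: pop only when it matches the innermost open tag.
            if ($stack !== [] && end($stack) === $name) {
                array_pop($stack);
            }
            continue;
        }

        if (++$elements > $maxElements) {
            throw new LengthException('element cap exceeded');
        }
        if ($m[3][0] === '/' || in_array($name, $void, true)) {
            continue; // self-closing and void elements never open a level
        }

        $stack[] = $name;
        $deepest = max($deepest, count($stack));
        if (count($stack) > $maxDepth) {
            throw new LengthException('nesting cap exceeded');
        }
    }

    return ['elements' => $elements, 'maxDepth' => $deepest];
}

$stats = scanOnce('<div><p>a<br>b</p><p>c</p></div>');
// $stats is ['elements' => 4, 'maxDepth' => 2]
```

Because nothing already scanned is kept, a limit violation can only abort the pass; it cannot be repaired by revisiting earlier input, which is why the real caps are described as hard limits.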
Layer contracts (ADR-010)
CSS parsing, style state, layout, and paint are separated into four layers with one-directional contracts, plus paged-media and measurement adjuncts. For full detail, see layer contracts (ADR-010).
Memory budget for large documents
Style-state and cursor memory is O(nesting depth), not O(element count). The per-page performance_budget is peak_mb: 64. The 50,000-element cap is the hard ceiling; split larger inputs across multiple writeHtml() calls. For details, see streaming constraints.
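One way to split a large table across several writeHtml() calls is to batch its rows so that each call stays under the element cap. The sketch below is an assumption-laden illustration: the helper batchRows() is hypothetical, and the cost model (one element per table, per row, and per cell) is a guess at how elements are counted, so leave headroom below the real cap.

```php
<?php
declare(strict_types=1);

/**
 * Groups pre-rendered <tr> strings into table fragments that each stay
 * within an element budget (hypothetical helper, not a NextPDF API).
 *
 * @param list<string> $rowsHtml one '<tr>...</tr>' string per row
 * @return list<string> one '<table>...</table>' fragment per batch
 */
function batchRows(array $rowsHtml, int $cellsPerRow, int $elementBudget = 50000): array
{
    // Assumed cost: one <tr> plus its cells per row, plus one <table> per batch.
    $perRow = 1 + $cellsPerRow;
    $rowsPerBatch = intdiv($elementBudget - 1, $perRow);
    if ($rowsPerBatch < 1) {
        throw new InvalidArgumentException('element budget is too small for one row');
    }

    return array_map(
        static fn (array $batch): string =>
            '<table border="1" cellpadding="5">' . implode('', $batch) . '</table>',
        array_chunk($rowsHtml, $rowsPerBatch)
    );
}

$row = '<tr><td>a</td><td>b</td></tr>';
$batches = batchRows([$row, $row, $row, $row, $row], 2, 7);
// Each element of $batches would go to its own $doc->writeHtml() call.
```

Each batch is a separate table, so column widths may be computed per batch rather than for the whole data set; fixed column widths in the style block are one way to keep batches visually aligned.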
Performance
Traversal is O(token count). Table column sizing adds a bounded per-table row scan. The optional :has() pre-scan adds one bounded token-list pass. The HTML render-pipeline performance benchmark enforces a 5% regression gate (merged work, pull request (PR) #564). The per-page performance_budget (wall_ms: 1500, peak_mb: 64) is the operational ceiling.
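A caller who wants to compare a render against the budget figures above could time the call and check the result. This is a minimal sketch, assuming the two keys wall_ms and peak_mb quoted on this page; the helpers measure() and budgetViolations() are hypothetical, and the library's own benchmark may measure differently (for example, memory_get_peak_usage() reports the process-wide peak, not a per-page value).

```php
<?php
declare(strict_types=1);

/**
 * Runs a callable and reports wall time and process peak memory
 * (hypothetical helper, not a NextPDF API).
 *
 * @return array{wall_ms: float, peak_mb: float}
 */
function measure(callable $work): array
{
    $start = hrtime(true); // monotonic clock, nanoseconds
    $work();

    return [
        'wall_ms' => (hrtime(true) - $start) / 1_000_000,
        'peak_mb' => memory_get_peak_usage(true) / (1024 * 1024),
    ];
}

/**
 * Returns the budget keys that the measurement exceeds.
 *
 * @param array<string, float|int> $measured
 * @param array<string, float|int> $budget
 * @return list<string>
 */
function budgetViolations(array $measured, array $budget = ['wall_ms' => 1500, 'peak_mb' => 64]): array
{
    $over = [];
    foreach ($budget as $key => $limit) {
        if (($measured[$key] ?? 0) > $limit) {
            $over[] = $key;
        }
    }

    return $over;
}

$sample = measure(static fn () => str_repeat('x', 1024));
$violations = budgetViolations(['wall_ms' => 1600.0, 'peak_mb' => 10.0]);
```

In practice the callable would wrap one page's writeHtml() call, and a non-empty result from budgetViolations() would be logged or failed in a test rather than acted on at runtime.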
Security notes
DefaultHtmlSecurityPolicy enforces an allowlist of tags, attributes, CSS properties, and URL schemes, plus a 10 MB input ceiling and a 100-level nesting ceiling, independently of the parser. The CSS property allowlist is the security ceiling. The runtime support table is a separate capability ceiling. Implement HtmlSecurityPolicyInterface to supply a stricter policy. DefaultExternalResourcePolicy governs external resource fetching separately.
In href and image src values, the URL allowlist also rejects backslash-rooted (\…) and Universal Naming Convention (UNC) (\\host\share) paths, alongside the existing protocol-relative (//) rejection and the http(s)-or-relative-only allowlist. Backslashes are normalized to forward slashes before the check, so a Windows absolute-path local-file include or a Server Message Block (SMB) share fetch cannot fall through the “no scheme, therefore relative” branch. Neither path carries a Uniform Resource Identifier (URI) scheme.
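The URL rules described above can be illustrated with a standalone check. This is a sketch of the stated behavior, not the code in DefaultHtmlSecurityPolicy: the function isUrlAllowed() is hypothetical, and rejecting empty values and accepting root-relative paths such as /img/a.png are assumptions that this page does not confirm.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative allowlist check for href and image src values
 * (hypothetical; the real policy may differ in detail).
 */
function isUrlAllowed(string $url): bool
{
    $url = trim($url);
    if ($url === '') {
        return false; // assumption: empty values are rejected
    }

    // Backslash-rooted (\path) and UNC (\\host\share) paths carry no scheme,
    // so they are rejected before the "relative" branch can accept them.
    if ($url[0] === '\\') {
        return false;
    }

    // Normalize remaining backslashes so mixed forms such as "/\host" are caught.
    $normalized = str_replace('\\', '/', $url);
    if (str_starts_with($normalized, '//')) {
        return false; // protocol-relative
    }

    // A leading "name:" is treated as a scheme; only http and https pass.
    // This also rejects Windows drive paths such as C:/Windows/win.ini.
    if (preg_match('~^([a-zA-Z][a-zA-Z0-9+.-]*):~', $normalized, $m) === 1) {
        return in_array(strtolower($m[1]), ['http', 'https'], true);
    }

    return true; // no scheme: treated as a relative reference
}
```

For example, isUrlAllowed('https://example.com/logo.png') and isUrlAllowed('images/logo.png') return true, while a UNC path, a drive-letter path, a protocol-relative URL, and a javascript: URL all return false.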
CSS support matrix excerpt (Verified-only rows)
This page does not restate per-property support. The CSS support matrix is the single authority for verified status per World Wide Web Consortium (W3C) module, including which modules are Verified versus Claimed.
Conformance
The subsystem implements a curated CSS subset at fixed specification pins. Behavioral spec mappings for the cascade are documented with clause and chunk identifiers on css-resolver. Per-module conformance status appears in the CSS support matrix.
Commercial context
Enterprise capability. Premium widens CSS coverage (advanced print and additional modules) on the identical single-pass pipeline. The architecture, caps, and layer contracts stay the same across editions. See the CSS support matrix.