Author a tagged PDF for accessibility from HTML
At a glance
Use this recipe to create a tagged PDF from semantic HTML. NextPDF writes the logical structure tree, the catalog language, and the PDF/UA-2 markers. A checker decides conformance. This recipe follows `examples/31-pdfua2-tagged.php`.
Prerequisites
- Install Core: `composer require nextpdf/core:^3`.
- Use a PDF/UA validator for verification. The example uses veraPDF.
Recipe
1. Create the document.
2. Call `enableTaggedPdf()` with a Best Current Practice (BCP) 47 language tag before `writeHtml()`. The HTML pipeline detects tagged mode when it builds the parser, then connects the tagged-content emitter.
3. Set the document metadata (title, language). You can safely call `setLanguage()` alongside `enableTaggedPdf()` because the call is idempotent.
4. Write semantic HTML. The parser maps each block element to a structure element: `h1` to `H1`, `p` to `P`, `ul`/`li` to `L`/`LI`, and `table` to `Table`/`TR`/`TD`.
5. Save, then validate with a PDF/UA checker.
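To make the element mapping in step 4 concrete, here is a minimal sketch that walks an HTML fragment with PHP's built-in `DOMDocument` and prints the structure type each element would map to. It is an illustration only, not NextPDF's parser: the `STRUCTURE_ROLES` table and the `outlineStructure()` function are hypothetical names, and the table lists only the mappings named above, so any other element is skipped.

```php
<?php
declare(strict_types=1);

// Illustrative only: restates the mapping from the recipe.
// This is NOT NextPDF's internal table; both names are hypothetical.
const STRUCTURE_ROLES = [
    'h1' => 'H1',
    'p' => 'P',
    'ul' => 'L',
    'li' => 'LI',
    'table' => 'Table',
    'tr' => 'TR',
    'td' => 'TD',
];

/**
 * Returns one indented line per mapped element, in document order.
 *
 * @return list<string>
 */
function outlineStructure(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $lines = [];
    $walk = function (DOMNode $node, int $depth) use (&$walk, &$lines): void {
        foreach ($node->childNodes as $child) {
            if (!($child instanceof DOMElement)) {
                continue; // skip text nodes, comments, and the doctype
            }
            $role = STRUCTURE_ROLES[strtolower($child->nodeName)] ?? null;
            if ($role !== null) {
                $lines[] = str_repeat('  ', $depth) . $role;
            }
            // Unmapped wrappers (html, body, ...) do not add a nesting level.
            $walk($child, $role !== null ? $depth + 1 : $depth);
        }
    };
    $walk($dom, 0);

    return $lines;
}

$fragment = '<h1>Report</h1><p>Intro</p><ul><li>One</li><li>Two</li></ul>';
echo implode("\n", outlineStructure($fragment)), "\n";
```

The indentation in the printed outline mirrors nesting, so list items appear one level under their list. Comparing such an outline with the tag tree that a PDF/UA checker reports can help spot HTML that is less semantic than intended.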
Full example
```php
<?php
declare(strict_types=1);

require_once __DIR__ . '/vendor/autoload.php';

use NextPDF\Core\Document;

$doc = Document::createStandalone();

// Step 2: enable tagged mode BEFORE writeHtml(). The lang argument
// drives the catalog /Lang entry and the structure tree root language.
$doc->enableTaggedPdf(lang: 'en');

// Step 3: metadata. setLanguage() restates intent; it is idempotent here.
$doc->setTitle('Accessible Report');
$doc->setLanguage('en');

$doc->addPage();

// Step 4: semantic HTML. Each block element becomes a StructElem; text
// runs are wrapped in BDC/EMC operators with stable MCIDs.
$html = <<<'HTML'
<h1>Quarterly Accessibility Report</h1>
<p>This document opts into tagged PDF so assistive technology can expose
a meaningful reading order.</p>
<h2>Findings</h2>
<ul>
  <li>Headings carry semantic roles.</li>
  <li>Lists keep their item structure.</li>
</ul>
HTML;

$doc->writeHtml($html);

// Step 5: save, then validate the file with a PDF/UA checker.
$doc->save(__DIR__ . '/accessible.pdf');

echo "Wrote accessible.pdf — validate with: verapdf --flavour ua2 accessible.pdf\n";
```

Expected output

```
Wrote accessible.pdf — validate with: verapdf --flavour ua2 accessible.pdf
```

The generated PDF includes a structure tree, `/MarkInfo << /Marked true >>`, the catalog `/Lang` entry, and the `pdfuaid:part` XMP marker. Run the checker to confirm that these items are present and internally consistent.
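Before running a full checker, a quick smoke test can show whether the markers listed above are visible in the file at all. The sketch below is a hypothetical helper (`findTaggedPdfMarkers()` is not part of NextPDF) that searches the raw bytes for each marker string. It assumes the catalog and XMP packet are stored uncompressed; if they sit inside a compressed object stream, a marker can be reported as not visible even though it is present, so a negative result is inconclusive.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: reports which marker strings occur in the raw bytes.
 * A false value is inconclusive, because the catalog may be stored inside
 * a compressed object stream that a plain byte search cannot see.
 *
 * @return array<string, bool>
 */
function findTaggedPdfMarkers(string $pdfBytes): array
{
    return [
        'StructTreeRoot' => str_contains($pdfBytes, '/StructTreeRoot'),
        // Matches "/MarkInfo << ... /Marked true" within one dictionary.
        'MarkInfo' => preg_match('~/MarkInfo\s*<<[^>]*?/Marked\s+true~', $pdfBytes) === 1,
        'Lang' => str_contains($pdfBytes, '/Lang'),
        'pdfuaid:part' => str_contains($pdfBytes, 'pdfuaid:part'),
    ];
}

// Assumes the file written by the full example sits next to this script.
$path = __DIR__ . '/accessible.pdf';
if (is_file($path)) {
    $markers = findTaggedPdfMarkers((string) file_get_contents($path));
    foreach ($markers as $name => $found) {
        echo str_pad($name, 16), $found ? 'found' : 'not visible', "\n";
    }
}
```

This check only looks for strings. It says nothing about whether the structure tree is complete or consistent, which is what the PDF/UA checker is for.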
Edge cases
- Call order. Calling `enableTaggedPdf()` after `writeHtml()` does not retroactively tag content that was already written. Enable tagged mode first.
- Language tag. Pass `ConformancePolicy::strictUa2()` as the second argument of `enableTaggedPdf()` to reject a malformed BCP 47 tag at the API boundary instead of dropping it silently at write time.
- Idempotent re-enable. If you call `enableTaggedPdf()` twice, NextPDF updates the language without rebuilding the structure tree you have already populated.
- Manual tagging. For non-HTML content, wrap items with `beginTag()`/`endTag()`. Container types (`Table`, `TR`, `L`, `LI`) become grouping elements with no marked content. Leaf types (`P`, `H1`–`H6`, `TD`) get MCIDs.
- Support is not conformance. NextPDF emits the structural metadata that PDF/UA-2 needs. It also surfaces a degraded-parity warning: for production sign-off, external oracle validation is the caller’s responsibility. Run a checker before you state that a file conforms.
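For the language-tag edge case, it can help to catch obviously malformed tags in your own code before they reach the library. The sketch below is a hypothetical pre-check (`looksLikeSimpleLanguageTag()` is not a NextPDF function) that accepts only a narrow subset of the BCP 47 grammar: a 2 to 3 letter language, then an optional script, region, and variants. It deliberately rejects extended language subtags, extensions, and private-use tags that the full grammar allows, so treat it as a first filter and not as a replacement for the library's own validation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pre-check, deliberately narrower than the full BCP 47 grammar:
 * language[-Script][-REGION][-variant...] only.
 */
function looksLikeSimpleLanguageTag(string $tag): bool
{
    $pattern = '/\A[a-z]{2,3}'                        // primary language, e.g. "en"
        . '(?:-[a-z]{4})?'                            // optional script, e.g. "Hant"
        . '(?:-(?:[a-z]{2}|\d{3}))?'                  // optional region, e.g. "US" or "419"
        . '(?:-(?:[a-z0-9]{5,8}|\d[a-z0-9]{3}))*'     // optional variants, e.g. "1996"
        . '\z/i';

    return preg_match($pattern, $tag) === 1;
}

foreach (['en', 'en-US', 'zh-Hant-TW', 'en_US', ''] as $tag) {
    printf(
        "%-14s %s\n",
        var_export($tag, true),
        looksLikeSimpleLanguageTag($tag) ? 'accepted' : 'rejected'
    );
}
```

A check like this fails fast on common mistakes such as an underscore separator (`en_US`) or an empty string. The strict policy described above remains the authority on what NextPDF accepts.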
Conformance
| Statement | Spec | Clause | reference_id |
|---|---|---|---|
| Real content requires a logical structure. | ISO 14289-2 | §8.2.2 | |
| Structure elements follow a defined nesting and reading order. | ISO 14289-2 | §8.2.3 | |
This recipe shows you how to produce the tagged structure that supports accessible authoring. It does not assert PDF/UA-2 conformance; a checker decides that.