Author a tagged PDF for accessibility from HTML

Use this recipe to create a tagged PDF from semantic HTML. NextPDF writes the logical structure tree, the catalog language, and the PDF/UA-2 markers. A checker decides conformance. This recipe follows examples/31-pdfua2-tagged.php.

  • Install Core: composer require nextpdf/core:^3.
  • Use a PDF/UA validator for verification. The example uses veraPDF.
  1. Create the document.
  2. Call enableTaggedPdf() with a Best Current Practice (BCP) 47 language tag before writeHtml(). The HTML pipeline detects tagged mode when it builds the parser, then connects the tagged-content emitter.
  3. Set the document metadata (title, language). Calling setLanguage() in addition to enableTaggedPdf() is safe because the call is idempotent.
  4. Write semantic HTML. The parser maps each block element to a structure element: h1 to H1, p to P, ul/li to L/LI, and table to Table/TR/TD.
  5. Save, then validate with a PDF/UA checker.
<?php
declare(strict_types=1);
require_once __DIR__ . '/vendor/autoload.php';
use NextPDF\Core\Document;
$doc = Document::createStandalone();
// Step 2 — enable tagged mode BEFORE writeHtml(). The lang argument
// drives the catalog /Lang entry and the structure tree root language.
$doc->enableTaggedPdf(lang: 'en');
// Step 3 — metadata. setLanguage() restates intent; it is idempotent here.
$doc->setTitle('Accessible Report');
$doc->setLanguage('en');
$doc->addPage();
// Step 4 — semantic HTML. Each block element becomes a StructElem; text
// runs are wrapped in BDC/EMC operators with stable MCIDs.
$html = <<<'HTML'
<h1>Quarterly Accessibility Report</h1>
<p>This document opts into tagged PDF so assistive technology can expose
a meaningful reading order.</p>
<h2>Findings</h2>
<ul>
<li>Headings carry semantic roles.</li>
<li>Lists keep their item structure.</li>
</ul>
HTML;
$doc->writeHtml($html);
$doc->save(__DIR__ . '/accessible.pdf');
echo "Wrote accessible.pdf — validate with: verapdf --flavour ua2 accessible.pdf\n";
Wrote accessible.pdf — validate with: verapdf --flavour ua2 accessible.pdf

The output includes a structure tree, /MarkInfo << /Marked true >>, the catalog /Lang entry, and the pdfuaid:part XMP marker. Run the checker to confirm that these items are present and internally consistent.
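
A small wrapper can make the checker run repeatable from a build script. The sketch below assumes the verapdf binary is on PATH and accepts the --flavour ua2 invocation shown in the example output. buildVerapdfCommand() and runChecker() are hypothetical helper names, not part of NextPDF. The sketch does not interpret the exit code; read the report and consult the veraPDF documentation for its meaning.

```php
<?php
declare(strict_types=1);

/**
 * Build the veraPDF command line quoted in the recipe.
 * Hypothetical helper: not part of NextPDF or veraPDF.
 */
function buildVerapdfCommand(string $pdfPath, string $binary = 'verapdf'): string
{
    // escapeshellarg() protects paths that contain spaces or shell metacharacters.
    return sprintf(
        '%s --flavour ua2 %s',
        escapeshellcmd($binary),
        escapeshellarg($pdfPath)
    );
}

/**
 * Run the checker and return its exit code with the raw report lines.
 * Assumption: the verapdf binary is installed and reachable on PATH.
 *
 * @return array{exitCode: int, report: list<string>}
 */
function runChecker(string $pdfPath): array
{
    $report = [];
    $exitCode = 0;
    // 2>&1 folds error output into the captured report.
    exec(buildVerapdfCommand($pdfPath) . ' 2>&1', $report, $exitCode);

    return ['exitCode' => $exitCode, 'report' => $report];
}
```

Calling runChecker(__DIR__ . '/accessible.pdf') after save() returns the report for review. Keeping the command builder separate from the runner lets you log or test the exact command without invoking the checker.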

  • Call order. Calling enableTaggedPdf() after writeHtml() does not retroactively tag content that was already written. Enable tagged mode first.
  • Language tag. Pass ConformancePolicy::strictUa2() as the second argument to reject a malformed BCP 47 tag at the API boundary instead of dropping it silently at write time.
  • Idempotent re-enable. If you call enableTaggedPdf() twice, NextPDF updates the language without rebuilding the structure tree you have already populated.
  • Manual tagging. For non-HTML content, wrap items with beginTag() / endTag(). Container types (Table, TR, L, LI) become grouping elements with no marked content. Leaf types (P, H1 through H6, TD) get MCIDs.
  • Support is not conformance. NextPDF emits the structural metadata that PDF/UA-2 needs. It also surfaces a degraded-parity warning: for production sign-off, validation by an external oracle (a checker such as veraPDF) is the caller’s responsibility. Run a checker before you state that a file conforms.
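
The language tag note above suggests catching malformed tags early. As a rough illustration, caller code could pre-check a tag before passing it to enableTaggedPdf(). The sketch below is an assumption-laden simplification: looksLikeSimpleLanguageTag() is a hypothetical helper, it is not NextPDF's validation, and it accepts only the language[-script][-region] subset of BCP 47. Valid tags that use variants, extensions, or private-use subtags are rejected by this check.

```php
<?php
declare(strict_types=1);

/**
 * Rough well-formedness check for a simple BCP 47 tag.
 * Hypothetical helper covering only: language[-script][-region].
 */
function looksLikeSimpleLanguageTag(string $tag): bool
{
    // language: 2-3 letters; script: 4 letters; region: 2 letters or 3 digits.
    // \A and \z anchor the whole string, so a trailing newline is rejected.
    $pattern = '/\A[a-z]{2,3}(?:-[a-z]{4})?(?:-(?:[a-z]{2}|\d{3}))?\z/i';

    return preg_match($pattern, $tag) === 1;
}

// Example inputs: hyphen-separated subtags pass, underscores do not.
$candidates = ['en', 'en-US', 'zh-Hant-TW', 'en_US', ''];
foreach ($candidates as $candidate) {
    printf("%-12s %s\n", var_export($candidate, true),
        looksLikeSimpleLanguageTag($candidate) ? 'accepted' : 'rejected');
}
```

A pre-check like this only filters obvious mistakes such as en_US. The strict policy described above remains the authoritative gate.
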
Statement | Spec | Clause | reference_id
Real content requires a logical structure. | ISO 14289-2 | §8.2.2 |
Structure elements follow a defined nesting and reading order. | ISO 14289-2 | §8.2.3 |
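
To make the nesting requirement and the container/leaf distinction from the manual tagging note concrete, here is a toy model in plain PHP. TagRecorder is hypothetical and unrelated to NextPDF's internals. Its beginTag() and endTag() methods only borrow the names from the text, and their signatures are assumptions. The model treats every type outside the container list as a leaf and uses a single MCID counter, whereas MCIDs in a real PDF are scoped per page.

```php
<?php
declare(strict_types=1);

/**
 * Toy model of the container/leaf rule. Not NextPDF code.
 */
final class TagRecorder
{
    /** Container types named in the recipe: grouping only, no marked content. */
    private const CONTAINERS = ['Table', 'TR', 'L', 'LI'];

    /** @var list<string> Stack of currently open element types. */
    private array $stack = [];

    /** @var list<array{type: string, depth: int, mcid: ?int}> */
    private array $log = [];

    private int $nextMcid = 0;

    public function beginTag(string $type): void
    {
        $isContainer = in_array($type, self::CONTAINERS, true);
        $this->log[] = [
            'type'  => $type,
            'depth' => count($this->stack),
            // Containers get no MCID; every other type is treated as a leaf.
            'mcid'  => $isContainer ? null : $this->nextMcid++,
        ];
        $this->stack[] = $type;
    }

    public function endTag(): void
    {
        if ($this->stack === []) {
            throw new LogicException('endTag() without a matching beginTag()');
        }
        array_pop($this->stack);
    }

    /** True when every opened element has been closed. */
    public function isBalanced(): bool
    {
        return $this->stack === [];
    }

    /** @return list<array{type: string, depth: int, mcid: ?int}> */
    public function log(): array
    {
        return $this->log;
    }
}

// Usage: a one-item list, L > LI > P, closed in reverse order.
$rec = new TagRecorder();
$rec->beginTag('L');
$rec->beginTag('LI');
$rec->beginTag('P');
$rec->endTag(); // closes P
$rec->endTag(); // closes LI
$rec->endTag(); // closes L
```

In this run only the P entry carries an MCID, and isBalanced() reports whether the open and close calls pair up. Unbalanced pairs are the kind of nesting defect a checker would flag.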

This recipe shows you how to produce the tagged structure that supports accessible authoring. It does not assert PDF/UA-2 conformance; a checker decides that.