
Accessibility: tagging primitives and the PDF/UA-2 structure model

NextPDF Core provides primitives for accessible authoring: a logical structure tree, standard role mapping, marked-content tagging, and Best Current Practice (BCP) 47 language attributes. These align with the structure-tree model in ISO 14289-2 (PDF/UA-2) and ISO 32000-2 §14.7. A produced file conforms only when the final document and the author’s content choices meet the standard and an external checker confirms that result. The library does not make that guarantee for you.

```shell
composer require nextpdf/core
```

A tagged Portable Document Format (PDF) file includes a logical structure tree whose root holds a single Document structure element. Assistive technology reads that tree to determine a meaningful reading order that does not depend on the visual layout (ISO 32000-2 §14.7.2; ISO 14289-2 §8.2.5.2). NextPDF models this with three cooperating types in the NextPDF\Accessibility namespace.

StructureTree owns the hierarchy. It allocates marked-content identifiers for each page, tracks parent and child nesting, and serializes the structure-tree root, structure elements, the parent tree, the role map, and the PDF 2.0 standard-structure namespace per ISO 32000-2 §14.7. createRoot() seeds the mandatory single Document element with a language attribute. addElement() attaches typed children. hasRoot() and rootHasChildren() report whether the tree exists and whether it has descendants.
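To make per-page identifier allocation and the difference between hasRoot() and rootHasChildren() concrete, here is a minimal, self-contained sketch. ToyStructureTree, mcidOf(), and mayAdvertisePdfUa2() are hypothetical names, not NextPDF APIs. The model is simplified: every non-root element receives exactly one marked-content identifier (MCID), which real documents do not require.

```php
<?php
declare(strict_types=1);

// Hypothetical toy model, NOT the NextPDF StructureTree API.
final class ToyStructureTree
{
    /** @var list<array{type: string, parent: ?int, mcid: ?int, lang: ?string}> */
    private array $elements = [];

    /** @var array<int, int> next free MCID, keyed by page index */
    private array $nextMcid = [];

    public function createRoot(string $lang = 'en'): int
    {
        if ($this->elements !== []) {
            throw new LogicException('The tree already has its single Document root.');
        }
        // The single Document root carries the document language.
        $this->elements[] = ['type' => 'Document', 'parent' => null, 'mcid' => null, 'lang' => $lang];
        return 0;
    }

    public function addElement(int $parentIndex, string $type, int $pageIndex): int
    {
        if (!isset($this->elements[$parentIndex])) {
            throw new OutOfRangeException("Unknown parent index {$parentIndex}.");
        }
        // MCIDs only need to be unique within one page, so each page counts from 0.
        $mcid = $this->nextMcid[$pageIndex] ?? 0;
        $this->nextMcid[$pageIndex] = $mcid + 1;
        $this->elements[] = ['type' => $type, 'parent' => $parentIndex, 'mcid' => $mcid, 'lang' => null];
        return count($this->elements) - 1;
    }

    public function mcidOf(int $index): ?int
    {
        return $this->elements[$index]['mcid'] ?? null;
    }

    public function hasRoot(): bool
    {
        return $this->elements !== [];
    }

    public function rootHasChildren(): bool
    {
        foreach ($this->elements as $element) {
            if ($element['parent'] === 0) {
                return true;
            }
        }
        return false;
    }
}

// Gate: only a tree with real descendants may claim PDF/UA-2 in metadata.
function mayAdvertisePdfUa2(ToyStructureTree $tree): bool
{
    return $tree->hasRoot() && $tree->rootHasChildren();
}

$tree = new ToyStructureTree();
$root = $tree->createRoot('en');
$before = mayAdvertisePdfUa2($tree);      // false: a root but no descendants
$h1 = $tree->addElement($root, 'H1', 0);  // MCID 0 on page 0
$p1 = $tree->addElement($root, 'P', 0);   // MCID 1 on page 0
$p2 = $tree->addElement($root, 'P', 1);   // MCID 0 on page 1
$after = mayAdvertisePdfUa2($tree);       // true
```

The gate function reflects the publication rule described under the notes on empty structure trees: having a root alone is not enough.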

StructureElement is the value object for one structure-element dictionary. It stores the standard structure type (names from ISO 32000-2 Table 368, such as H1 through H6, P, L, LI, Table, Figure, and Link), its marked-content identifier entries, and optional accessibility attributes for alternative text, replacement text, title, and language. A single element can span multiple pages: it accumulates one identifier entry per page, so its kids array can reference marked content on every page it spans.
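A minimal sketch of how one element can reference marked content on two pages, assuming the element writes marked-content reference (MCR) dictionaries into its /K (kids) array. ToySpanningElement and the page object numbers are hypothetical. The real StructureElement also carries the accessibility attributes listed above, which this sketch omits.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch, NOT the NextPDF StructureElement API.
final class ToySpanningElement
{
    /** @var list<array{pageObj: int, mcid: int}> */
    private array $kids = [];

    public function __construct(public readonly string $type)
    {
    }

    // Record one marked-content sequence on a given page object.
    public function addMarkedContent(int $pageObjectNumber, int $mcid): void
    {
        $this->kids[] = ['pageObj' => $pageObjectNumber, 'mcid' => $mcid];
    }

    public function kidsArray(): string
    {
        $parts = [];
        foreach ($this->kids as $kid) {
            // Each MCR names its page explicitly (/Pg), so one element can
            // point into the content streams of different pages.
            $parts[] = sprintf('<< /Type /MCR /Pg %d 0 R /MCID %d >>', $kid['pageObj'], $kid['mcid']);
        }
        return '[' . implode(' ', $parts) . ']';
    }
}

// A paragraph that starts on page object 5 and continues on page object 9.
$para = new ToySpanningElement('P');
$para->addMarkedContent(5, 3);
$para->addMarkedContent(9, 0);
echo $para->kidsArray(), PHP_EOL;
// [<< /Type /MCR /Pg 5 0 R /MCID 3 >> << /Type /MCR /Pg 9 0 R /MCID 0 >>]
```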

TaggedContentEmitter connects the Hypertext Markup Language (HTML) pipeline to the structure tree. When tagged output is enabled through Document::enableTaggedPdf(), the HTML renderer wires in the emitter so that each block-level element produces a paired marked-content operator sequence and a matching structure-element node. HtmlToStructureMap provides the table-driven mapping from HTML tags to PDF structure types (ISO 14289-2 §8). The emitter marks decorative running content, such as the HTML header and footer regions, as artifacts, which keeps it out of the reading order.
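The sketch below shows the general shape of table-driven mapping combined with marked-content emission. The map is a hand-written excerpt, not the actual HtmlToStructureMap table; emitTagged() is a hypothetical helper rather than part of TaggedContentEmitter; and the Div fallback for unmapped tags is an assumption. Tagged content is wrapped in BDC/EMC with an MCID, while header and footer content becomes an /Artifact and consumes no MCID.

```php
<?php
declare(strict_types=1);

// Illustrative excerpt only; not the real HtmlToStructureMap contents.
const HTML_TO_STRUCTURE = [
    'h1' => 'H1', 'h2' => 'H2', 'h3' => 'H3',
    'h4' => 'H4', 'h5' => 'H5', 'h6' => 'H6',
    'p' => 'P', 'li' => 'LI', 'table' => 'Table',
    'figure' => 'Figure', 'a' => 'Link',
];

// Running header/footer regions are decorative and become artifacts.
const ARTIFACT_TAGS = ['header', 'footer'];

/**
 * Hypothetical helper: wraps content-stream operators for one HTML element.
 * Returns [content-stream fragment, structure type or null for an artifact].
 *
 * @return array{0: string, 1: ?string}
 */
function emitTagged(string $htmlTag, string $operators, int &$nextMcid): array
{
    $tag = strtolower($htmlTag);
    if (in_array($tag, ARTIFACT_TAGS, true)) {
        // No MCID and no structure element: stays out of the reading order.
        return ["/Artifact BMC\n{$operators}\nEMC", null];
    }
    // Assumed fallback for tags missing from the excerpt above.
    $type = HTML_TO_STRUCTURE[$tag] ?? 'Div';
    $mcid = $nextMcid++;
    return ["/{$type} <</MCID {$mcid}>> BDC\n{$operators}\nEMC", $type];
}

$nextMcid = 0;
[$h1Ops, $h1Type] = emitTagged('h1', 'BT /F1 24 Tf 72 720 Td (Document title) Tj ET', $nextMcid);
[$footerOps, $footerType] = emitTagged('footer', 'BT /F1 9 Tf 72 36 Td (Page 1) Tj ET', $nextMcid);
[$pOps, $pType] = emitTagged('p', 'BT /F1 12 Tf 72 690 Td (Body paragraph.) Tj ET', $nextMcid);
// Prints a /P sequence with MCID 1, because the footer consumed no MCID.
echo $pOps, PHP_EOL;
```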

Bcp47Validator validates language tags against Request for Comments (RFC) 5646. It provides a syntactic well-formedness check and a registry-backed validity check. Strict mode (ConformancePolicy::strictUa2()) rejects malformed tags at the application programming interface (API) boundary instead of silently dropping them at write time. This behavior supports the ISO 14289-2 §8.4.4 requirement that the catalog language entry resolve to a specific language.
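For intuition about what a syntactic check does, here is a deliberately small sketch that covers only the language[-script][-region][-variant]* subset of RFC 5646, plus the case conventions from RFC 5646 §2.1.1. It omits extended language subtags, extensions, private use, grandfathered tags, and any registry lookup, so it is not a substitute for Bcp47Validator. The function names are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical subset check; NOT Bcp47Validator::isWellFormed().
function isWellFormedSubset(string $tag): bool
{
    $pattern = '/^[A-Za-z]{2,3}'                              // primary language
        . '(-[A-Za-z]{4})?'                                   // script, e.g. Latn
        . '(-(?:[A-Za-z]{2}|[0-9]{3}))?'                      // region, e.g. US or 419
        . '(-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*$/D';  // variants; /D: no trailing newline
    return preg_match($pattern, $tag) === 1;
}

// Recommended casing: language lowercase, script title case, region
// uppercase, everything else lowercase (valid for the subset above).
function normaliseCaseSubset(string $tag): string
{
    $subtags = explode('-', $tag);
    foreach ($subtags as $i => $subtag) {
        if ($i === 0) {
            $subtags[$i] = strtolower($subtag);
        } elseif (preg_match('/^[A-Za-z]{4}$/D', $subtag) === 1) {
            $subtags[$i] = ucfirst(strtolower($subtag));
        } elseif (preg_match('/^[A-Za-z]{2}$/D', $subtag) === 1) {
            $subtags[$i] = strtoupper($subtag);
        } else {
            $subtags[$i] = strtolower($subtag);
        }
    }
    return implode('-', $subtags);
}

$ok = isWellFormedSubset('sr-Latn-RS');            // true
$bad = isWellFormedSubset('en_US');                // false: "_" is not a separator
$canonical = normaliseCaseSubset('zh-hant-tw');    // 'zh-Hant-TW'
```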

| Symbol | Kind | Summary |
| --- | --- | --- |
| Document::enableTaggedPdf(string $lang = 'en', ?ConformancePolicy $policy = null): static | method | Activate the structure tree and HTML bridge; set the mark-info and catalog language entries. |
| Document::setLanguage(string $lang): static | method | Set the document-level natural language (BCP 47). |
| Document::isTaggedPdfEnabled(): bool | method | Report whether the active conformance mode mandates structural tagging. |
| StructureTree::createRoot(string $lang = 'en'): int | method | Create the mandatory single Document root element. |
| StructureTree::addElement(int $parentIndex, string $type, int $pageIndex, ...): int | method | Attach a typed child structure element. |
| StructureTree::hasRoot(): bool and rootHasChildren(): bool | method | Report whether the tree exists and whether it has descendants. |
| StructureElement | final class | Value object for one structure element (alternative text, replacement text, title, language, identifiers). |
| RoleMap::standard(): array<string,string> | static method | Return the standard structure-type vocabulary (ISO 32000-2 Table 368 plus PDF 2.0 types). |
| Bcp47Validator::isWellFormed / isValid / validate / normalise | method | Validate RFC 5646 language tags with syntactic and registry-backed checks. |
| AccessibilityAutoFixerRegistry | final class | Opt-in registry in the style of PHP Standards Recommendation 11 (PSR-11) for heuristic structure fixers. |
```php
<?php
declare(strict_types=1);

use NextPDF\Core\Document;

$doc = Document::createStandalone();

// The BCP 47 tag drives the catalog language entry and the
// structure-tree root language attribute.
$doc->enableTaggedPdf(lang: 'en');
$doc->setTitle('Tagged accessibility demo');
$doc->addPage();

// Semantic HTML maps to structure elements: h1 to /H1, p to /P,
// ul and li to /L plus /LI. Text runs are wrapped in
// marked-content operators with stable identifiers.
$doc->writeHtml('<h1>Document title</h1><p>Body paragraph.</p>');
$doc->save(__DIR__ . '/output/tagged.pdf');
```
```php
<?php
declare(strict_types=1);

use NextPDF\Conformance\ConformancePolicy;
use NextPDF\Core\Document;
use NextPDF\Exception\InvalidConfigException;
use Psr\Log\LoggerInterface;

final class AccessibleReportWriter
{
    public function __construct(private readonly LoggerInterface $logger)
    {
    }

    public function render(string $html, string $bcp47Lang, string $outPath): void
    {
        $doc = Document::createStandalone();

        try {
            // strictUa2() rejects malformed BCP 47 tags at the API
            // boundary (ISO 14289-2 §8.4.4) instead of dropping them silently.
            $doc->enableTaggedPdf($bcp47Lang, ConformancePolicy::strictUa2());
        } catch (InvalidConfigException $e) {
            $this->logger->error('Rejected language tag for tagged PDF', [
                'lang' => $bcp47Lang,
                'reason' => $e->getMessage(),
            ]);
            throw $e;
        }

        $doc->setTitle('Quarterly accessibility report')
            ->setLanguage($bcp47Lang)
            ->addPage();
        $doc->writeHtml($html);

        // The engine emits a Degraded / ComplianceRisk advisory directing
        // the caller to validate externally; surface it to operators
        // rather than treating tagged output as certified.
        foreach ($doc->getWarnings() as $warning) {
            $this->logger->warning('Tagged-PDF advisory', [
                'code' => $warning->code->value,
                'message' => $warning->message,
            ]);
        }

        $doc->save($outPath);
    }
}
```
  • Order of calls. Call enableTaggedPdf() before writeHtml(). The HTML pipeline checks the conformance mode when the parser is constructed and does not retroactively wire the emitter for content that has already rendered.
  • Empty structure tree. A document with enableTaggedPdf() but no attached structure descendants does not advertise PDF/UA-2 in its metadata. The publication gate is rootHasChildren(), not hasRoot(), because validators reject a file that claims PDF/UA-2 with an empty structure tree (ISO 14289-2 §5; verified by EmptyTaggedPdfDoesNotAdvertisePdfUa2Test).
  • Conformance-mode collapse. When you call enablePdfA() and enableTaggedPdf() on the same document, the single-valued conformance discriminator keeps only the mode set by the last call (last wins). Side effects such as the structure tree and mark-info remain additive, and NextPDF emits a CONFORMANCE_MODE_CLOBBERED warning so that the collapse is observable.
  • Auto-fixers are not automatic. Built-in fixers (EmptyTagStripper, LegacyLangNormaliser, RootLangFallback) ship under NextPDF\Accessibility\AutoFixer\* but are never auto-registered. You must register them explicitly on AccessibilityAutoFixerRegistry.
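The conformance-mode collapse can be modeled in a few lines: one single-valued field that the last call overwrites, next to an additive set of side effects that is never cleared. This is a hypothetical illustration. ToyConformanceState, the mode strings, and the PDF/A side-effect labels are invented for the sketch; only the CONFORMANCE_MODE_CLOBBERED code and the structure-tree and mark-info effects come from the notes above.

```php
<?php
declare(strict_types=1);

// Hypothetical model of "last wins" plus additive side effects.
final class ToyConformanceState
{
    public ?string $mode = null;   // single-valued discriminator

    /** @var array<string, true> additive side effects, never removed */
    public array $sideEffects = [];

    /** @var list<string> */
    public array $warnings = [];

    public function enable(string $mode, string ...$effects): void
    {
        if ($this->mode !== null && $this->mode !== $mode) {
            // Make the collapse observable instead of silent.
            $this->warnings[] = 'CONFORMANCE_MODE_CLOBBERED';
        }
        $this->mode = $mode; // the last call wins
        foreach ($effects as $effect) {
            $this->sideEffects[$effect] = true;
        }
    }
}

$state = new ToyConformanceState();
// The PDF/A effect labels below are placeholders, not NextPDF internals.
$state->enable('pdfa', 'pdfa-output-intent', 'pdfa-metadata');
$state->enable('pdfua2', 'structure-tree', 'mark-info');
// $state->mode is now 'pdfua2'; all four side effects remain; one warning.
```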

NextPDF emits structure consistent with the PDF/UA-2 structure-tree model, but it cannot create semantics that it has no way to infer. Supply markup or attributes for the following yourself:

  • alternative text for images and other non-text content;
  • table header scope and header-to-cell associations beyond what the HTML markup expresses;
  • link purpose text when the visible link text is not self-describing;
  • list semantics for content that is visually laid out as a list but lacks list markup;
  • corrected reading order when the source order differs from the intended reading order;
  • decorative-versus-meaningful classification for ambiguous content.

NextPDF performs no end-to-end PDF/UA-2 verification. At runtime, it emits a Degraded / ComplianceRisk advisory (PDFUA2_FOUNDATIONAL) that directs the caller to validate the output with an external PDF/UA checker, such as veraPDF, before production sign-off. NextPDF does not assert conformance on your behalf: final-document conformance depends on your authoring choices and the checker's result, not on calling the API.

Structure-tree construction is linear in the number of structure elements. Identifier allocation is amortized constant time per marked-content sequence. Serialization is a single linear pass over the element set. For HTML-driven tagging, the dominant cost is the HTML pipeline itself, not tag emission. The per-recipe cap declared in performance_budget (1500 ms wall time, 64 MB peak) applies to a typical multi-page semantic document. Large documents scale linearly with element count rather than page count.

Language tags and accessibility attributes flow into PDF name and string objects. NextPDF escapes them through PdfStringEscaper, so malformed or hostile language, alternative-text, replacement-text, and title values cannot break out of their PDF object context. Strict mode also rejects unregistered BCP 47 tags at the API boundary, which narrows the input surface before values reach the writer. Because accessibility attributes can carry author-supplied free text that assistive technology presents to readers, treat that text as untrusted and review it as you would other document content. See the Conformance module for profile-checker behavior.
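The kind of escaping that keeps a value inside its object context can be sketched as follows. These functions are hypothetical and are not PdfStringEscaper. They assume the input is already in the byte encoding the target object expects; text strings with non-ASCII characters also need a Unicode encoding, which this sketch does not handle.

```php
<?php
declare(strict_types=1);

// Literal string: escape the backslash and both parentheses so that a hostile
// value cannot close the string early; escape CR and LF so that end-of-line
// normalization inside literal strings cannot change the stored value.
function escapePdfLiteralString(string $value): string
{
    return '(' . strtr($value, [
        '\\' => '\\\\',
        '(' => '\\(',
        ')' => '\\)',
        "\r" => '\\r',
        "\n" => '\\n',
    ]) . ')';
}

// Name object: write bytes outside the printable range, '#', and PDF
// delimiter characters as #xx hexadecimal escapes.
function escapePdfName(string $raw): string
{
    if (str_contains($raw, "\0")) {
        throw new InvalidArgumentException('PDF names cannot contain NUL bytes.');
    }
    $out = '/';
    $length = strlen($raw);
    for ($i = 0; $i < $length; $i++) {
        $char = $raw[$i];
        $byte = ord($char);
        $regular = $byte >= 0x21 && $byte <= 0x7E && strpos('#()<>[]{}/%', $char) === false;
        $out .= $regular ? $char : sprintf('#%02X', $byte);
    }
    return $out;
}

echo escapePdfLiteralString('Chart (Q3) final'), PHP_EOL; // (Chart \(Q3\) final)
echo escapePdfName('Custom Role'), PHP_EOL;               // /Custom#20Role
```

Escaping every parenthesis, rather than only unbalanced ones, keeps the rule simple and cannot be defeated by crafted nesting.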

This page maps library behavior to clause identifiers. It does not assert that your output conforms. The cited clauses are paraphrased, never quoted. See the PDF/UA-2 specification mapping for the provision-level table and explicit non-coverage. Citation chunk hashes are recorded in docs/public/modules/core/_normative-evidence-a11y.md.