Compliance: PDF/R-1 validator, Arlington grammar, lifecycle tools
At a glance
NextPDF\Compliance ships byte-stream validators and a grammar cross-check
that read a finished Portable Document Format (PDF) file and report where it
differs from a normative contract. A validator that returns zero findings has
checked the file only against the clauses it implements. The result is not a
blanket conformance certificate.
Install
```shell
composer require nextpdf/core:^3
```
Conceptual overview
The module has three parts.
PdfRValidator validates a candidate ISO 23504-1:2020 (PDF/R-1) byte
stream. It operates on the raw bytes, not on the writer's internal state, so it
catches drift between what the writer intends to emit and what the specification
requires. It is the final check. The implemented clause set is the v5.1.0 cluster:

- §5 version-identification comment
- §6.2.2/§6.2.3 header allowlist
- §6.2.4 generation-0 requirement and object-stream prohibition
- §6.5.7 content-stream operator allowlist (q, Q, cm, Do only)
- §6.6.1 image XObject key allowlist
- §6.4.3 Info-dictionary key allowlist
- §6.3 Catalog key allowlist

§6.7 incremental updates and §6.8 encryption are explicitly out of scope for the
initial cluster and are declared as such in claims.json. The validator does not
stop at the first finding. It collects every divergence in one pass so you can
see the full diff.
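The operator allowlist is the easiest of these clauses to picture. The sketch below is a hypothetical, self-contained illustration, not PdfRValidator's code: findDisallowedOperators() is an invented name, and it assumes an already-decoded content stream that contains no string literals or inline images. Like the validator, it reports every offending operator instead of stopping at the first one.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of a §6.5.7-style operator allowlist check.
 * Returns each disallowed operator once, in first-occurrence order.
 * Assumes a decoded content stream without string literals or inline images.
 *
 * @param list<string> $allowed
 * @return list<string>
 */
function findDisallowedOperators(string $content, array $allowed = ['q', 'Q', 'cm', 'Do']): array
{
    // Drop comments (% to end of line) so their words are not read as operators.
    $content = preg_replace('/%[^\r\n]*/', '', $content) ?? '';

    // Tokens are either names (/Im0) or runs of regular characters (numbers, operators).
    preg_match_all('~/[^\s()<>\[\]{}/%]*|[^\s()<>\[\]{}/%]+~', $content, $m);

    $bad = [];
    foreach ($m[0] as $token) {
        $isName = $token[0] === '/';
        $isNumber = preg_match('/^[+-]?(?:\d+\.?\d*|\.\d+)$/', $token) === 1;
        $isKeyword = in_array($token, ['true', 'false', 'null'], true);
        if ($isName || $isNumber || $isKeyword) {
            continue; // operands, not operators
        }
        // Collect every divergence rather than failing on the first.
        if (!in_array($token, $allowed, true) && !in_array($token, $bad, true)) {
            $bad[] = $token;
        }
    }
    return $bad;
}
```

A text-drawing stream such as "q BT /F1 12 Tf ET Q" yields BT, Tf and ET in one call, which is the "full diff" behavior described above.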
PdfRConformancePolicy is an immutable policy for the
recommended-but-informative bands around PDF/R-1. The normative §6 floor is
never configurable. The policy controls only the §A.5 implementation-limit
recommendations, the §6.6.1 multi-strip discouragement, and the §A.6
Extensible Metadata Platform (XMP) requirement for downstream PDF/A
re-classification.
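To make the split between the fixed floor and the configurable bands concrete, here is a hypothetical stand-in, PolicySketch, that mirrors the three documented constructor flags and their defaults. The band labels and the error-versus-warning mapping are assumptions for illustration; the real PdfRConformancePolicy may express this differently.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for PdfRConformancePolicy: same flags and defaults
// as the documented constructor, but NOT the library's implementation.
final class PolicySketch
{
    public function __construct(
        public readonly bool $enforceA5ImplementationLimits = true,
        public readonly bool $rejectMultiStripPages = false,
        public readonly bool $requireXmpForA6Compatibility = false,
    ) {}

    // Severity a finding in the given (hypothetical) band would carry.
    // The normative band is always an error, whatever the flags say.
    public function severityFor(string $band): string
    {
        $enforced = match ($band) {
            'normative' => true,
            'A.5' => $this->enforceA5ImplementationLimits,
            '6.6.1-multistrip' => $this->rejectMultiStripPages,
            'A.6-xmp' => $this->requireXmpForA6Compatibility,
            default => throw new InvalidArgumentException("Unknown band: {$band}"),
        };
        return $enforced ? 'error' : 'warning';
    }
}
```

Because the properties are readonly, a stricter policy is a new instance, for example new PolicySketch(rejectMultiStripPages: true), rather than a mutation of a shared one.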
ArlingtonValidator runs the upstream PDF Association Arlington PDF Model
in report-only mode. It is advisory throughout the current cycle:
validateReportOnly() never throws. It falls back through three modes:

- When the reference checker binary is available, it parses the checker's structured findings.
- When only the pinned grammar is available, it emits one info finding that proves the grammar pin loaded.
- When the grammar is unavailable, it returns an empty list.

A WaiverRegistry lets the orchestrator suppress known-acceptable
disagreements while preserving the audit trail.
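The three-mode fallback can be sketched as one function. This is an illustration under stated assumptions, not ArlingtonValidator's implementation: $checker stands in for the reference-checker adapter, $grammarRules for the loaded grammar, and the 'grammar-loaded' rule id is invented.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of the report-only fallback.
 *
 * @param (callable(): list<array<string, string>>)|null $checker null when the binary is absent
 * @param list<string>|null $grammarRules null when the pinned grammar is unavailable
 * @return list<array<string, string>>
 */
function reportOnly(?callable $checker, ?array $grammarRules, string $grammarSha): array
{
    try {
        if ($checker !== null) {
            // Mode 1: structured findings from the reference checker.
            return $checker();
        }
        if ($grammarRules !== null) {
            // Mode 2: a single info finding proving the grammar pin loaded.
            return [[
                'ruleId' => 'grammar-loaded',
                'severity' => 'info',
                'grammarSha' => $grammarSha,
                'message' => count($grammarRules) . ' grammar rules loaded',
            ]];
        }
        // Mode 3: no grammar, no findings.
        return [];
    } catch (Throwable) {
        // Report-only contract: a checker failure becomes "no findings", never an exception.
        return [];
    }
}
```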
The module follows the same honesty rule as the Cascading Style Sheets (CSS)
support matrix and the Conformance module. A clause is Verified only when a
passing test exists and the normative clause is cited. A clause that is
implemented but lacks a dedicated passing fixture is Claimed. Out-of-scope
clauses are declared explicitly rather than left ambiguous. A zero-finding
PdfRValidator result asserts only the clauses it checks. It makes no claim
about §6.7 or §6.8, which it does not implement.
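One way to encode the rule, sketched with a hypothetical ClaimStatus enum and classifyClause() function (neither is part of NextPDF; the labels reuse the wording of the Conformance table):

```php
<?php
declare(strict_types=1);

// Hypothetical encoding of the honesty rule; names are illustrative only.
enum ClaimStatus: string
{
    case Verified = 'Verified';
    case Claimed = 'Claimed';
    case OutOfScope = 'Explicit non-coverage';
}

function classifyClause(bool $implemented, bool $hasPassingTest, bool $citesNormativeClause): ClaimStatus
{
    if (!$implemented) {
        // Not implemented: declared out of scope, never left ambiguous.
        return ClaimStatus::OutOfScope;
    }
    // Verified needs both a passing test and a cited clause; anything less is Claimed.
    return ($hasPassingTest && $citesNormativeClause)
        ? ClaimStatus::Verified
        : ClaimStatus::Claimed;
}
```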
API surface
| Type | Kind | Key members |
|---|---|---|
| `NextPDF\Compliance\Validator\PdfRValidator` | final class | `validate(string $pdfBytes): list<PdfRValidationFinding>` |
| `NextPDF\Compliance\Validator\PdfRValidationFinding` | final readonly class | `string $clause`, `'error'\|'warning'\|'info' $severity`, `string $message` |
| `NextPDF\Compliance\Profile\PdfRConformancePolicy` | final readonly class | `__construct(bool $enforceA5ImplementationLimits = true, bool $rejectMultiStripPages = false, bool $requireXmpForA6Compatibility = false)`; `lax()`, `strictArchival()` |
| `NextPDF\Compliance\Validator\ArlingtonValidator` | final class | `validateReportOnly(string $pdfPath): list<ArlingtonFinding>` |
| `NextPDF\Compliance\Validator\WaiverRegistry` | final class | `isWaived(string $validator, string $ruleId, string $scopeKey): bool` |
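Callers usually reduce the finding list to a summary. The sketch below uses a hypothetical FindingStub class with the same three public properties as PdfRValidationFinding so it runs without NextPDF installed. Treating any 'error' as a failing exit code is an assumed local policy, not documented NextPDF behavior.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in with the same public shape as PdfRValidationFinding.
final class FindingStub
{
    public function __construct(
        public readonly string $clause,
        public readonly string $severity, // 'error' | 'warning' | 'info'
        public readonly string $message,
    ) {}
}

/**
 * Counts findings per severity and derives an exit code (1 if any error).
 *
 * @param list<FindingStub> $findings
 * @return array{counts: array<string, int>, exitCode: int}
 */
function summarizeFindings(array $findings): array
{
    $counts = ['error' => 0, 'warning' => 0, 'info' => 0];
    foreach ($findings as $f) {
        $counts[$f->severity] = ($counts[$f->severity] ?? 0) + 1;
    }
    return ['counts' => $counts, 'exitCode' => $counts['error'] > 0 ? 1 : 0];
}
```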
Code sample — Quick start
```php
<?php
declare(strict_types=1);

use NextPDF\Compliance\Validator\PdfRValidator;

require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

// file_get_contents() returns false on failure; strict_types would reject it.
$bytes = file_get_contents('candidate.pdf');
if ($bytes === false) {
    throw new RuntimeException('Cannot read candidate.pdf');
}

$validator = new PdfRValidator();
$findings = $validator->validate($bytes);

if ($findings === []) {
    // Zero divergences from the §6 clauses PdfRValidator implements.
    // This is NOT a PDF/R-1 certificate: §6.7 and §6.8 are not checked.
    echo "No PDF/R-1 §6 divergences detected (implemented clause set).\n";
} else {
    foreach ($findings as $f) {
        echo "[{$f->severity}] §{$f->clause}: {$f->message}\n";
    }
}
```
Code sample — Production
```php
<?php
declare(strict_types=1);

use NextPDF\Compliance\Validator\ArlingtonValidator;
use NextPDF\Compliance\Validator\ArlingtonGrammarLoader;
use NextPDF\Compliance\Validator\WaiverRegistry;

require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

$validator = new ArlingtonValidator(
    waivers: new WaiverRegistry(/* loaded waiver entries */),
    grammar: new ArlingtonGrammarLoader(/* pinned submodule path */),
    adapter: null, // grammar-only mode when the reference checker is absent
);

// Advisory by contract: never throws on findings.
$findings = $validator->validateReportOnly('artifact.pdf');

foreach ($findings as $finding) {
    // Each finding pins the Arlington grammar commit SHA for provenance.
    // logger() stands for your application's logger helper.
    logger()->info('arlington', [
        'rule' => $finding->ruleId,
        'severity' => $finding->severity,
        'grammarSha' => $finding->grammarSha,
    ]);
}
```
Edge cases & gotchas
- PdfRValidator is regex-based, not a full parser. It targets the deterministic output of NextPDF\Writer\PdfRWriter. Use it as a drift detector for that writer, not as a general PDF parser.
- Zero findings ≠ full PDF/R-1 conformance. §6.7 (incremental updates) and §6.8 (encryption) are not implemented in the v5.1.0 cluster and are declared out of scope in claims.json. Treat a clean result as “no divergence on the implemented clause set”, and nothing more.
- Arlington is advisory. In the current cycle, validateReportOnly() never fails the build. Continuous integration (CI) consumes the artifact but does not gate on it.
- PDF/A International Color Consortium (ICC) validation is not here. The ISO 19005-4:2020 §6.2.2 OutputIntent ICC validation lives in the Enterprise PdfAManager (nextpdf/pro), not in Core's Compliance module. Core's PDF/A surface is the ConformanceMode discriminator only.
- Waivers preserve the audit trail. A waived rule is suppressed from the finding list, but the waiver entry remains the record of why it was suppressed.
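The waiver behavior can be sketched as follows. WaiverSketch is hypothetical: it mirrors the documented isWaived() signature, but the entry shape, the reason field and applyWaivers() are assumptions, and treating scopeKey as a file path is only one plausible reading.

```php
<?php
declare(strict_types=1);

// Hypothetical waiver store; WaiverRegistry's real storage is not documented here.
final class WaiverSketch
{
    /** @param list<array{validator: string, ruleId: string, scopeKey: string, reason: string}> $entries */
    public function __construct(private readonly array $entries) {}

    // Same signature as the documented WaiverRegistry::isWaived().
    public function isWaived(string $validator, string $ruleId, string $scopeKey): bool
    {
        return $this->reasonFor($validator, $ruleId, $scopeKey) !== null;
    }

    public function reasonFor(string $validator, string $ruleId, string $scopeKey): ?string
    {
        foreach ($this->entries as $e) {
            if ($e['validator'] === $validator && $e['ruleId'] === $ruleId && $e['scopeKey'] === $scopeKey) {
                return $e['reason'];
            }
        }
        return null;
    }
}

/**
 * Splits findings into reported and waived; waived ones keep the waiver reason
 * so the audit trail survives suppression.
 *
 * @param list<array{ruleId: string}> $findings
 */
function applyWaivers(WaiverSketch $waivers, string $validator, string $scopeKey, array $findings): array
{
    $reported = [];
    $waived = [];
    foreach ($findings as $f) {
        $reason = $waivers->reasonFor($validator, $f['ruleId'], $scopeKey);
        if ($reason === null) {
            $reported[] = $f;
        } else {
            $waived[] = $f + ['waiverReason' => $reason];
        }
    }
    return ['reported' => $reported, 'waived' => $waived];
}
```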
Performance
PdfRValidator::validate() is a single linear pass with bounded regex walks
over the byte stream. Cost scales with document size and stays well inside the
module budget. In grammar-only mode, ArlingtonValidator is
O(n) in the number of grammar rules for the load-proof finding. The
reference-checker path runs as a subprocess whose cost is bounded by the
upstream tool, not by NextPDF, and it runs as an out-of-band CI step.
Security notes
These validators read untrusted PDF bytes. PdfRValidator strips
parenthesized and hex literals before key extraction, so a crafted Creator
string cannot inject a false /Name key (ISO 32000-1:2008 §7.3.4.2 escape
handling). The Arlington adapter runs the upstream checker as a bounded
subprocess. It treats a timeout or execution error as “no findings” rather
than trusting partial output. See the project threat model for the
PDF-parsing attack surface.
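The literal-stripping step is what defeats the injection. The hypothetical stripStringLiterals() below shows the idea: it honors backslash escapes and nested parentheses inside literal strings (ISO 32000-1:2008 §7.3.4.2), blanks hexadecimal strings, and keeps dictionary delimiters intact. It is not PdfRValidator's code, and it does not handle binary stream data.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: replace every string literal with an empty one.
function stripStringLiterals(string $pdf): string
{
    $out = '';
    $len = strlen($pdf);
    $i = 0;
    while ($i < $len) {
        $c = $pdf[$i];
        if ($c === '(') {
            // Literal string: track nesting; a backslash escapes the next byte,
            // so "\)" and "\(" do not change the depth.
            $depth = 1;
            $i++;
            while ($i < $len && $depth > 0) {
                $ch = $pdf[$i];
                if ($ch === '\\') {
                    $i += 2;
                    continue;
                }
                if ($ch === '(') {
                    $depth++;
                } elseif ($ch === ')') {
                    $depth--;
                }
                $i++;
            }
            $out .= '()';
            continue;
        }
        if ($c === '<' && ($pdf[$i + 1] ?? '') !== '<') {
            // Hexadecimal string: skip to the closing '>'.
            $end = strpos($pdf, '>', $i + 1);
            $i = $end === false ? $len : $end + 1;
            $out .= '<>';
            continue;
        }
        if ($c === '<') {
            // Dictionary opener "<<": copy both bytes unchanged.
            $out .= '<<';
            $i += 2;
            continue;
        }
        $out .= $c;
        $i++;
    }
    return $out;
}

/** @return list<string> Name tokens found after literals are stripped. */
function extractNames(string $dict): array
{
    preg_match_all('~/([^\s()<>\[\]{}/%]+)~', stripStringLiterals($dict), $m);
    return array_values(array_unique($m[1]));
}
```

On a dictionary such as << /Creator (a \) /Injected (b) c) /Producer <2F4576696C> >>, extractNames() reports only Creator and Producer; run on the raw bytes, the same regex would also pick up Injected from inside the Creator string.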
Conformance
| Standard | Clause | What the Compliance module does | Status |
|---|---|---|---|
| ISO 23504-1:2020 (PDF/R-1) | §6.5.7 | PdfRValidator enforces the {q,Q,cm,Do} content-stream operator allowlist | Verified (unit + standards-profile + integration tests pass) |
| ISO 23504-1:2020 (PDF/R-1) | §6.4.3 | PdfRValidator enforces the Info-dictionary key allowlist | Verified (test-backed) |
| ISO 23504-1:2020 (PDF/R-1) | §6.7, §6.8 | Not implemented in the v5.1.0 cluster | Explicit non-coverage (declared in claims.json) |
| ISO 32000-2:2020 (PDF 2.0) | §7.5.2 | Catalog-key allowlist walk | Claimed (regex walk; structural) |
| ISO 19005-4:2020 (PDF/A-4) | §6.7.3 | Identification-schema awareness via the Conformance module | Cross-reference (see /specifications/pdfa4/) |
Support is not conformance. A PdfRValidator run that returns no findings
proves only that the input did not diverge from the §6 clauses that the validator
implements. It does not assert that the file is a conforming PDF/R-1 file:
§6.7 and §6.8 are not checked. The Arlington cross-check is advisory and never
asserts conformance. For PDF/A-4, veraPDF is the authoritative validator and
runs out of band. See the conformance module for
the veraPDF oracle and its opt-in gating.
Citations are paraphrased from the NextPDF compliance corpus. The full
64-character reference_id digests are recorded in the page front-matter and
in _normative-evidence-conf.md.
See also
- Conformance module: ConformanceMode routing and the veraPDF oracle
- Audit module: audit trail and attestation surface
- PDF/A-4 specification mapping: ISO 19005-4 coverage and non-coverage
- Security module: PDF-parsing threat model