Data handling, PII, and telemetry
At a glance
This page describes how the core engine handles data: what it reads, what it keeps in process memory, what it writes, the deterministic personally identifiable information (PII)-scrubbing transform it applies to audit bundles, and the opt-in telemetry path.
Boundary. This page describes library behavior. Deployment-level data residency covers which jurisdiction processes your documents, where temporary files are stored, how long output is retained, and which telemetry backend, if any, receives spans. Those choices are the integrator’s responsibility, not the library’s. The engine gives you fail-closed defaults and a scrubbing transform. It cannot make a data-residency or lawful-basis decision for you.
Install
composer require nextpdf/core:^3
The PII scrubber and telemetry interceptor are part of the core package. The telemetry path stays inert unless an OpenTelemetry SDK is present and the caller wires the interceptor.
Conceptual overview
The engine acts as a processor for the data you hand it, in the ISO/IEC
29100 sense (iso_iec_29100#3.x56): it works on document content under the
integrator’s instruction. It does not phone home, persist content beyond the
output you request, or transmit document content to any NextPDF-operated
endpoint.
Three data surfaces matter:
- Document input/output (I/O). The engine reads input from the path or stream you provide and writes output to the path or stream you provide. Intermediate buffers stay in process memory for the duration of the render and are released when it completes.
- Audit bundles. When auditing is enabled, the engine can emit a diagnostic bundle. Before serialization, that bundle passes through a deterministic PII scrubber.
- Telemetry. An optional OpenTelemetry interceptor can emit spans and metrics. It remains off unless the SDK is installed and the interceptor is constructed; span attributes pass through an attribute sanitizer.
The privacy posture follows the GDPR Art. 32 principle that pseudonymisation
and minimization are example safeguards. Applying those safeguards is the
controller’s responsibility (eu_gdpr#x50). The library supplies the
scrubbing mechanism. The controller decides the lawful basis, retention, and
residency.
API surface
This page does not re-document the audit or telemetry APIs (see
/modules/core/audit/). The trust-relevant components are the default PII
sanitizer applied to audit bundles and the OpenTelemetry interceptor’s
attribute sanitizer. The sections below describe their effects, not their
signatures.
Code sample — Quick start
By default, nothing leaves the process unless you ask it to. You write no code to disable network egress; a plain open-and-save is already the no-egress default:
<?php
declare(strict_types=1);
require_once __DIR__ . '/vendor/autoload.php';
use NextPDF\Core\Document;
// Input read from disk, output written to disk. No telemetry SDK loaded,
// so the telemetry path completes in sub-microsecond no-ops. No content
// is transmitted anywhere.
$doc = Document::open('input.pdf');
$doc->save('output.pdf');
Code sample — Production
When an audit bundle is produced, the deterministic PII scrubber masks common categories before serialization. The transform is pure (no clocks, no randomness), so a bundle is byte-stable for a given input:
<?php
declare(strict_types=1);
require_once __DIR__ . '/vendor/autoload.php';
use NextPDF\Audit\DefaultPiiSanitiser;
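$scrubber = new DefaultPiiSanitiser();
// Illustrative raw field; in practice this value comes from the audit bundle.
$rawAuditField = 'Contact jane@example.com from 203.0.113.7';
// E-mail → [EMAIL], IPv4 → [IPV4], IPv6 → [IPV6], X.500 DN attributes
// beyond CN → keyword preserved, value [REDACTED]. Deterministic.
$safe = $scrubber->sanitise($rawAuditField);
Edge cases & gotchas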
- The scrubber is best-effort, not a guarantee. DefaultPiiSanitiser masks the categories it knows: RFC 5321 e-mail, IPv4/IPv6, and several RFC 4514 distinguished name (DN) attributes. A free-text field that contains a name or an identifier outside those patterns is not masked. Treat the scrubber as a defense-in-depth layer, not a compliance control that removes the operator’s review duty.
- Temporary files are the deployment’s concern. The engine uses secure temporary-file handling. Where your TMPDIR resides, and whether TMPDIR is on encrypted storage in the right jurisdiction, are deployment decisions. The library cannot enforce data residency.
- Telemetry is opt-in and sanitized, not risk-free. When wired, the OpenTelemetry interceptor passes span attributes through an attribute sanitizer that enforces a zero-trust data policy. The backend you export to, and its retention and location, are entirely the integrator’s choice.
- Lawful basis is not a library decision. The controller determines whether processing a given document is lawful, and on what basis, under GDPR / local law (eu_gdpr#x50); the library has no visibility into it.
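The best-effort limit in the first item above can be illustrated with a minimal sketch. This is not the implementation of DefaultPiiSanitiser: the function name sketchScrub and both patterns are assumptions made for illustration, and the patterns are much looser than RFC 5321 or a real IPv4 validator. It shows only the general shape of a deterministic, pattern-based mask and why a personal name in free text passes through unmasked.

```php
<?php
declare(strict_types=1);

// Hypothetical illustration only: NOT NextPDF's DefaultPiiSanitiser.
// Both patterns are deliberately simplified assumptions.
function sketchScrub(string $field): string
{
    $rules = [
        // Simplified e-mail shape; far looser than RFC 5321.
        '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/' => '[EMAIL]',
        // Dotted-quad IPv4; octet ranges are not checked.
        '/\b(?:\d{1,3}\.){3}\d{1,3}\b/' => '[IPV4]',
    ];

    // Rules run in a fixed order with no clock or randomness, so the
    // same input always produces the same output.
    foreach ($rules as $pattern => $mask) {
        $field = preg_replace($pattern, $mask, $field) ?? $field;
    }

    return $field;
}

$raw  = 'Signed by Jane Doe <jane@example.com> from 203.0.113.7';
$safe = sketchScrub($raw);

// The address and IP are masked; the name "Jane Doe" matches no
// pattern and therefore survives.
echo $safe, PHP_EOL;
```

The surviving name is the point of the example: pattern-based masking reduces exposure, but someone still has to review free-text fields.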
Performance
The PII scrubber uses pure regex transforms with no I/O. The telemetry interceptor checks for SDK presence once at construction and caches the result. When no SDK is installed, every telemetry call completes in sub-microsecond time, so the privacy-preserving default (telemetry off) is also the zero-overhead default.
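The check-once-and-cache pattern described above might look roughly like the following sketch. The class SketchTelemetryGate, its methods, and the default marker class name are assumptions for illustration; they are not the interceptor's real API, and the counter stands in for handing a span to an SDK.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch; names are illustrative, not NextPDF's interceptor API.
final class SketchTelemetryGate
{
    private bool $sdkPresent;
    private int $emitted = 0;

    // $sdkMarker is an assumed class name used to detect an installed SDK.
    public function __construct(string $sdkMarker = 'OpenTelemetry\\SDK\\Trace\\TracerProvider')
    {
        // Presence is checked exactly once; later calls reuse the result.
        $this->sdkPresent = class_exists($sdkMarker);
    }

    public function record(string $spanName): void
    {
        if (!$this->sdkPresent) {
            return; // No SDK: one boolean test, nothing is emitted.
        }
        $this->emitted++; // Stand-in for passing the span to the SDK.
    }

    public function emittedCount(): int
    {
        return $this->emitted;
    }
}

// With a marker class that does not exist, every call is a no-op.
$off = new SketchTelemetryGate('No\\Such\\Sdk\\Marker');
$off->record('render');

// stdClass always exists, so it simulates an installed SDK here.
$on = new SketchTelemetryGate('stdClass');
$on->record('render');
```

Caching the result in the constructor keeps the per-call cost to a single property read when telemetry is off, which is consistent with the zero-overhead default described above.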
Security notes
For reviewers, the data-handling boundary rules are:
- No covert egress. The engine transmits no document content to any NextPDF-operated endpoint. Outbound network access happens only on explicitly enabled, scheme-restricted resource fetches and on configured time-stamp authority (TSA), Online Certificate Status Protocol (OCSP), and certificate revocation list (CRL) endpoints, each behind the server-side request forgery (SSRF) guard.
- Deterministic, bounded scrubbing. The audit-bundle PII transform is deterministic and runs before serialization. It is a minimization aid in the spirit of GDPR Art. 32 (eu_gdpr#x50), not a certification of anonymization.
- Residency is the integrator’s. Inventory and mapping of where data is processed are organizational activities per the NIST Privacy Framework (nist_privacy_framework_1_1#x9.x1.p3); the library exposes the controls, and the integrator performs the mapping.
- Roles are external. Whether the deployment is a controller or a processor, and which obligations follow, is an ISO/IEC 29100 role determination (iso_iec_29100#3.x56) the library cannot make.
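To make the "scheme-restricted" wording in the first rule concrete, here is a minimal fail-closed pre-flight check. It is a sketch under stated assumptions, not the engine's SSRF guard: the function name sketchIsFetchAllowed and the https-only default are hypothetical, and the check deliberately skips DNS resolution, which a real guard would need so that a hostname cannot resolve to an internal address.

```php
<?php
declare(strict_types=1);

// Hypothetical pre-flight check; NOT NextPDF's SSRF guard.
function sketchIsFetchAllowed(string $url, array $allowedSchemes = ['https']): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // Fail closed on anything unparsable or host-less.
    }

    if (!in_array(strtolower($parts['scheme']), $allowedSchemes, true)) {
        return false; // Scheme is not on the allowlist.
    }

    $host = trim($parts['host'], '[]'); // Strip IPv6 literal brackets.
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        // Literal IP: reject private and reserved ranges.
        return filter_var(
            $host,
            FILTER_VALIDATE_IP,
            FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE
        ) !== false;
    }

    // Hostname: a real guard would also resolve it and check every
    // resolved address; that step is omitted in this sketch.
    return true;
}
```

A caller would run a check of this kind before any outbound request and treat a false result as a hard stop, in line with the fail-closed defaults described on this page.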
Conformance
This is not a conformance profile. This page references GDPR Art. 32, ISO/IEC 29100, and the NIST Privacy Framework to locate the boundary between library behavior and controller responsibility. It does not assert GDPR compliance, ISO/IEC 29100 conformance, or any privacy certification. The data controller, not the library, makes those determinations at the deployment level.