
Data handling, PII, and telemetry

This page describes how the core engine handles data: what it reads, what it keeps in process memory, what it writes, the deterministic scrubbing transform it applies to personally identifiable information (PII) in audit bundles, and the opt-in telemetry path.

Boundary. This page describes library behavior. Deployment-level data residency covers which jurisdiction processes your documents, where temporary files are stored, how long output is retained, and which telemetry backend, if any, receives spans. Those choices are the integrator’s responsibility, not the library’s. The engine gives you fail-closed defaults and a scrubbing transform. It cannot make a data-residency or lawful-basis decision for you.

composer require nextpdf/core:^3

The PII scrubber and telemetry interceptor are part of the core package. The telemetry path stays inert unless an OpenTelemetry SDK is present and the caller wires the interceptor.

The engine acts as a processor for the data you hand it, in the ISO/IEC 29100 sense (iso_iec_29100#3.x56): it works on document content under the integrator’s instruction. It does not phone home, persist content beyond the output you request, or transmit document content to any NextPDF-operated endpoint.

Three data surfaces matter:

  1. Document input/output (I/O). The engine reads input from the path or stream you provide and writes output to the path or stream you provide. Intermediate buffers stay in process memory for the duration of the render and are released when it completes.
  2. Audit bundles. When auditing is enabled, the engine can emit a diagnostic bundle. Before serialization, that bundle passes through a deterministic PII scrubber.
  3. Telemetry. An optional OpenTelemetry interceptor can emit spans and metrics. It remains off unless the SDK is installed and the interceptor is constructed; span attributes pass through an attribute sanitizer.

The privacy posture follows GDPR Art. 32, which names pseudonymisation (alongside encryption) as an example of an appropriate technical measure. Selecting and applying such measures is the controller’s responsibility (eu_gdpr#x50). The library supplies the scrubbing mechanism; the controller decides the lawful basis, retention, and residency.

This page does not re-document the audit or telemetry APIs (see /modules/core/audit/). The trust-relevant components are the default PII sanitizer applied to audit bundles and the OpenTelemetry interceptor’s attribute sanitizer. The sections below describe their effects, not their signatures.

By default, nothing leaves the process unless you ask it to. Network egress happens only when your code explicitly enables it, so a minimal program transmits nothing:

<?php
declare(strict_types=1);
require_once __DIR__ . '/vendor/autoload.php';
use NextPDF\Core\Document;
// Input is read from disk and output is written to disk. No telemetry SDK
// is loaded, so every telemetry call is a sub-microsecond no-op. No content
// is transmitted anywhere.
$doc = Document::open('input.pdf');
$doc->save('output.pdf');

When an audit bundle is produced, the deterministic PII scrubber masks common categories before serialization. The transform is pure (no clocks, no randomness), so a bundle is byte-stable for a given input:

<?php
declare(strict_types=1);
require_once __DIR__ . '/vendor/autoload.php';
use NextPDF\Audit\DefaultPiiSanitiser;
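$scrubber = new DefaultPiiSanitiser();
// Example input; in practice this is a field taken from the audit bundle.
$rawAuditField = 'Signed by alice@example.com from 203.0.113.7';
// E-mail → [EMAIL], IPv4 → [IPV4], IPv6 → [IPV6], X.500 DN attributes
// beyond CN → keyword preserved, value [REDACTED]. Deterministic.
$safe = $scrubber->sanitise($rawAuditField);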
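
The determinism claim follows from the transform being built only from regular expressions. The sketch below illustrates that property. It is a hypothetical class, not the source of DefaultPiiSanitiser, and its patterns are deliberately simplified assumptions: the e-mail pattern is far looser than RFC 5321, IPv6 is omitted, the DN keyword list is invented for illustration, and escaped commas inside DN values are not handled.

```php
<?php
declare(strict_types=1);

// Hypothetical illustration only: this is NOT DefaultPiiSanitiser's source.
// A regex-only transform reads no clock, no random source, and no files,
// so equal input always yields byte-identical output.
final class SketchPiiScrubber
{
    // Assumed DN keywords whose values are masked; the real list may differ.
    // CN is intentionally absent, so CN values pass through unchanged here.
    private const DN_KEYWORDS = ['O', 'OU', 'L', 'ST', 'C', 'STREET'];

    public function scrub(string $text): string
    {
        // Simplified e-mail pattern; much looser than full RFC 5321 syntax.
        $text = preg_replace('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', '[EMAIL]', $text);

        // Dotted-quad IPv4, each octet restricted to 0-255.
        $octet = '(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)';
        $text = preg_replace("/\\b{$octet}(?:\\.{$octet}){3}\\b/", '[IPV4]', $text);

        // DN attributes: keep the keyword, mask the value up to the next comma.
        $keywords = implode('|', self::DN_KEYWORDS);
        $text = preg_replace('/\b(' . $keywords . ')=[^,]*/i', '$1=[REDACTED]', $text);

        return $text;
    }
}
```

Because nothing in `scrub()` depends on external state, calling it twice on the same input returns identical bytes, which is what makes scrubbed bundles comparable across runs. The sketch also shows the best-effort limitation: any text that matches none of the patterns passes through unchanged.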
  • The scrubber is best-effort, not a guarantee. DefaultPiiSanitiser masks the categories it knows: RFC 5321 e-mail, IPv4/IPv6, and several RFC 4514 distinguished name (DN) attributes. A free-text field that contains a name or an identifier outside those patterns is not masked. Treat the scrubber as a defense-in-depth layer, not a compliance control that removes the operator’s review duty.
  • Temporary files are the deployment’s concern. The engine uses secure temporary-file handling. Where your TMPDIR resides, and whether TMPDIR is on encrypted storage in the right jurisdiction, are deployment decisions. The library cannot enforce data residency.
  • Telemetry is opt-in and sanitized, not risk-free. When wired, the OpenTelemetry interceptor passes span attributes through an attribute sanitizer that enforces a zero-trust data policy. The export backend, its retention, and its location are entirely the integrator’s choice.
  • Lawful basis is not a library decision. The controller determines whether processing a given document is lawful, and on what basis, under the GDPR or applicable local law (eu_gdpr#x50); the library has no visibility into that decision.
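
The telemetry bullet describes the attribute sanitizer’s zero-trust policy without detailing its rules. One common way to express such a policy is a deny-by-default allowlist. The sketch below assumes that design; the class name and the attribute keys are hypothetical and are not the interceptor’s actual configuration. It uses constructor property promotion with `readonly`, so it needs PHP 8.1 or later.

```php
<?php
declare(strict_types=1);

// Hypothetical allowlist-style attribute filter illustrating a zero-trust
// attribute policy. It is NOT the interceptor's actual sanitizer.
final class SketchAttributeFilter
{
    /** @param list<string> $allowedKeys attribute keys reviewed as safe to export */
    public function __construct(private readonly array $allowedKeys)
    {
    }

    /**
     * @param array<string, scalar> $attributes
     * @return array<string, scalar>
     */
    public function filter(array $attributes): array
    {
        // Deny by default: any key not explicitly allowed is dropped, so a
        // newly added attribute cannot leak content until it is reviewed.
        return array_intersect_key($attributes, array_flip($this->allowedKeys));
    }
}
```

With an allowlist, exporting a new span attribute requires an explicit review step; a denylist would instead let any unanticipated attribute, such as a document title, through.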

The PII scrubber uses pure regex transforms with no I/O. The telemetry interceptor checks for SDK presence once at construction and caches the result. When no SDK is installed, every telemetry call completes in sub-microsecond time, so the privacy-preserving default (telemetry off) is also the zero-overhead default.
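
The “probe once, cache, then no-op” behavior described above can be sketched as a small gate object. This is a hypothetical illustration, not NextPDF’s interceptor: the SDK class to probe is passed in as an assumption rather than naming the class the real interceptor checks, and the span counter stands in for actual export. It requires PHP 8.1 or later for the `readonly` property.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of a cached SDK-presence check. Not NextPDF's code.
final class SketchTelemetryGate
{
    private readonly bool $enabled;
    private int $emitted = 0;

    public function __construct(string $sdkClass)
    {
        // A single class_exists() probe at construction; the result is cached.
        $this->enabled = class_exists($sdkClass);
    }

    public function isEnabled(): bool
    {
        return $this->enabled;
    }

    /** @param array<string, scalar> $attributes */
    public function recordSpan(string $name, array $attributes = []): void
    {
        if (!$this->enabled) {
            return; // No SDK: the whole call is a property read and a return.
        }
        // A real implementation would hand the span to the SDK here.
        $this->emitted++;
    }

    public function emittedCount(): int
    {
        return $this->emitted;
    }
}
```

Because the probe never repeats, the disabled path does no class lookup or allocation per call, which is how the telemetry-off default can also be the near-zero-overhead default.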

For reviewers, the data-handling boundary rules are:

  1. No covert egress. The engine transmits no document content to any NextPDF-operated endpoint. Outbound network access occurs only for explicitly enabled, scheme-restricted resource fetches and for requests to configured time-stamp authority (TSA), Online Certificate Status Protocol (OCSP), and certificate revocation list (CRL) endpoints, each behind the server-side request forgery (SSRF) guard.
  2. Deterministic, bounded scrubbing. The audit-bundle PII transform is deterministic and runs before serialization. It is a minimization aid in the spirit of GDPR Art. 32 (eu_gdpr#x50), not a certification of anonymization.
  3. Residency is the integrator’s. Inventory and mapping of where data is processed are organizational activities per the NIST Privacy Framework (nist_privacy_framework_1_1#x9.x1.p3); the library exposes the controls, and the integrator performs the mapping.
  4. Roles are external. Whether the deployment is a controller or a processor, and which obligations follow, is an ISO/IEC 29100 role determination (iso_iec_29100#3.x56) the library cannot make.
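
Rule 1 places every outbound fetch behind an SSRF guard. The sketch below shows the kind of pre-fetch checks such a guard typically performs: a scheme allowlist and rejection of private or reserved literal IP addresses. It is a hypothetical function, not NextPDF’s guard. The https-only default is an assumption (OCSP and CRL endpoints are commonly served over plain HTTP, so a real policy may differ), and hostname resolution is deliberately left out.

```php
<?php
declare(strict_types=1);

// Hypothetical pre-fetch check in the spirit of an SSRF guard. NOT
// NextPDF's guard: it inspects only the URL and literal IP hosts.
/** @param list<string> $allowedSchemes lowercase schemes permitted to fetch */
function sketchIsFetchAllowed(string $url, array $allowedSchemes = ['https']): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // Unparseable or host-less URLs (e.g. file:///) fail closed.
    }
    if (!in_array(strtolower($parts['scheme']), $allowedSchemes, true)) {
        return false; // Schemes outside the allowlist are refused.
    }
    $host = trim($parts['host'], '[]'); // Strip IPv6 literal brackets.
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        // Literal IPs must be public: no private or reserved ranges.
        return filter_var(
            $host,
            FILTER_VALIDATE_IP,
            FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE
        ) !== false;
    }
    // Hostnames pass this sketch; a real guard must resolve them and
    // re-check every resolved address before connecting.
    return true;
}
```

A production guard also has to connect to the address it checked, so that a DNS rebinding response cannot redirect the request to an internal host after validation.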

This is not a conformance profile. This page references GDPR Art. 32, ISO/IEC 29100, and the NIST Privacy Framework to locate the boundary between library behavior and controller responsibility. It does not assert GDPR compliance, ISO/IEC 29100 conformance, or any privacy certification. The data controller, not the library, makes those determinations at the deployment level.