Memory and streaming
Spec: ISO 32000-2, §7.5.4. Evidence: Mixed evidence.
At a glance
A large PDF should not require a large heap. This page explains how NextPDF keeps process memory bounded as a document grows, where it streams to disk instead of accumulating in memory, and what a “performance budget” means here: a checked contract, not a headline figure.
Why this matters
The PDF format does not force a generator to use a large heap. Its cross-reference table records a byte offset for every indirect object, so a reader needs random access into the file, not the whole file in memory. A generator can follow that model: emit objects as they are finished and remember only where they landed. If instead the whole document stays in the heap until the final write, memory grows linearly with page count, and a report that is fine at a hundred pages fails the process at fifty thousand.
For batch and worker workloads this is the difference between a stable service and one that fails unpredictably under load. Bounded memory is a design property to be engineered, not a number to be hoped for.
The short version
- The streaming writer is built so memory stays bounded per document: each page is written to the output as soon as it is finalised, and its buffer is then released.
- The bookkeeping that would otherwise grow with object count (the cross-reference offsets and the page-tree Kids references) is written to temporary streams opened with php://temp/maxmemory:0, which spill to disk immediately rather than filling the PHP heap.
- The design goal is O(1) heap per page: holding the document does not cost more as pages are added. That is the engineering target the writer is shaped around.
- A performance budget is a real, structured concept in the documentation system: a wall-clock cap and a peak-memory cap, expressed as a checked contract. It states an obligation. It is not a benchmark result.
- Concrete numbers are treated as a living signal, measured under a stated method, not frozen into prose where they could quietly go stale.
How NextPDF approaches it
The streaming writer is built around one rule: never hold what you can emit.
- Start page: a single active cursor; no document-wide page graph in memory.
- Finalise page: page content and the page object are written straight to the output stream.
- Release buffer: the finalised page buffer is dropped, and the heap returns to baseline.
- Record offset to disk: xref and Kids entries go to php://temp/maxmemory:0, which spills to disk immediately.
- Close: the Pages-tree root, Catalog, and trailer are written once, at the end.
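The five steps can be sketched as a toy writer. This is a minimal illustration under stated assumptions, not NextPDF's StreamingPdfWriter: the class ToyStreamingWriter, its methods, and the fixed numbering (object 1 is the Catalog, object 2 the Pages root) are hypothetical. Each addPage() call writes the page's content stream and page object immediately, records their offsets in a fixed-width spool on php://temp/maxmemory:0, and keeps nothing page-sized; close() writes the Pages root, Catalog, xref table, and trailer.

```php
<?php
declare(strict_types=1);

// Hypothetical toy writer illustrating emit-then-release; not the NextPDF API.
final class ToyStreamingWriter
{
    private const REC = 10;   // bytes per offset record ("%010d")
    private int $written = 0; // bytes emitted so far, i.e. the next object's offset
    private int $nextObj = 3; // 1 = Catalog and 2 = Pages root are written last
    private int $pageCount = 0;
    /** @var resource offsets indexed by object number, spilled to disk */
    private $xref;
    /** @var resource "N 0 R " entries for the Pages /Kids array, spilled to disk */
    private $kids;

    /** @param resource $out writable output stream; it need not be seekable */
    public function __construct(private $out)
    {
        $this->xref = fopen('php://temp/maxmemory:0', 'w+b');
        $this->kids = fopen('php://temp/maxmemory:0', 'w+b');
        fwrite($this->xref, str_repeat('0', 2 * self::REC)); // placeholders for objects 1 and 2
        $this->write("%PDF-1.7\n");
    }

    public function addPage(string $content): void
    {
        $contentNum = $this->nextObj++;
        $pageNum = $this->nextObj++;
        $this->emit($contentNum, '<< /Length ' . strlen($content) . " >>\nstream\n{$content}\nendstream");
        $this->emit($pageNum, "<< /Type /Page /Parent 2 0 R /Resources << >> "
            . "/MediaBox [0 0 595 842] /Contents {$contentNum} 0 R >>");
        fwrite($this->kids, "{$pageNum} 0 R "); // Kids bookkeeping goes to disk
        $this->pageCount++;
        // The writer keeps no reference to $content once this method returns.
    }

    public function close(): void
    {
        $this->record(2);
        $this->write("2 0 obj\n<< /Type /Pages /Count {$this->pageCount} /Kids [ ");
        rewind($this->kids);
        // Copied stream-to-stream, never loaded into one string.
        $this->written += (int) stream_copy_to_stream($this->kids, $this->out);
        $this->write("] >>\nendobj\n");
        $this->emit(1, '<< /Type /Catalog /Pages 2 0 R >>');

        $xrefAt = $this->written;
        $this->write("xref\n0 {$this->nextObj}\n0000000000 65535 f\r\n");
        rewind($this->xref);
        while (strlen($rec = (string) fread($this->xref, self::REC)) === self::REC) {
            $this->write("{$rec} 00000 n\r\n"); // one 20-byte entry per object
        }
        $this->write("trailer\n<< /Size {$this->nextObj} /Root 1 0 R >>\nstartxref\n{$xrefAt}\n%%EOF\n");
        fclose($this->xref);
        fclose($this->kids);
    }

    private function record(int $num): void
    {
        // Fixed-width records: object N's offset lives at byte (N - 1) * REC.
        fseek($this->xref, ($num - 1) * self::REC);
        fwrite($this->xref, sprintf('%010d', $this->written));
    }

    private function emit(int $num, string $body): void
    {
        $this->record($num);
        $this->write("{$num} 0 obj\n{$body}\nendobj\n");
    }

    private function write(string $bytes): void
    {
        if (fwrite($this->out, $bytes) !== strlen($bytes)) {
            throw new RuntimeException('short write to output stream');
        }
        $this->written += strlen($bytes);
    }
}
```

Counting written bytes instead of calling ftell() lets the output be a pipe or socket. The real writer's numbering, object layout, and error handling are not shown here and may differ.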
The disk-spill detail is the load-bearing one. PHP’s php://temp stream keeps
data in memory until it exceeds a threshold (2 MB by default) and only then
spills to a temporary file. The writer opens its temporary streams with the
maxmemory:0 option, which sets that threshold to zero and forces the spill
immediately. The practical effect is that the per-object bookkeeping, which by
definition grows with the document, never accumulates in the heap. It
accumulates on disk, where size is not the constraint. Without that option the
default in-memory window would have to fill before spilling, which would
defeat the bounded-memory goal exactly when it matters most.
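The difference can be observed from PHP itself. The sketch below writes 1 MiB into each kind of temp stream and reports how much the PHP heap grew while the stream was still open. The function name heapGrowthWhileOpen is illustrative, and the experiment assumes a standard build in which memory_get_usage() reports the Zend allocator (it does not count data held in a disk-backed temporary file).

```php
<?php
declare(strict_types=1);

// Illustrative helper: heap growth caused by writing $bytes to a stream at $uri.
function heapGrowthWhileOpen(string $uri, int $bytes): int
{
    $chunk = str_repeat('x', 64 * 1024); // allocated before the baseline is taken
    $stream = fopen($uri, 'w+b');
    $before = memory_get_usage();
    for ($written = 0; $written < $bytes; $written += strlen($chunk)) {
        fwrite($stream, $chunk);
    }
    $growth = memory_get_usage() - $before; // measured while the data is still held
    fclose($stream);

    return $growth;
}

$oneMiB = 1024 * 1024;
// 1 MiB is below php://temp's 2 MB default, so the first stream stays in the heap.
printf("php://temp:             %d bytes of heap growth\n", heapGrowthWhileOpen('php://temp', $oneMiB));
printf("php://temp/maxmemory:0: %d bytes of heap growth\n", heapGrowthWhileOpen('php://temp/maxmemory:0', $oneMiB));
```

Expect roughly the full megabyte for plain php://temp and close to nothing for the maxmemory:0 stream; exact figures vary by PHP version and platform.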
The performance budget is the other half of the story. It is a documentation-system contract rather than a marketing claim. The schema defines a budget as two bounded integers: a wall-clock cap in milliseconds and a peak resident-memory cap in mebibytes. A recipe that declares a budget declares an obligation that can be checked, the same way a typed signature declares an obligation a compiler can check. The value of a budget is that it is stated and enforced, not that it is small.
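What "checkable" means can be sketched in a few lines. This is a hedged illustration, not the documentation system's tooling: checkBudget and its return shape are hypothetical, and memory_get_peak_usage() reports the PHP heap rather than resident set size, so it stands in for the schema's peak-resident cap only approximately.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical checker for a { wall_ms, peak_mb } budget.
 *
 * @return array{wall_ms: float, peak_mb: float, within: bool}
 */
function checkBudget(callable $work, int $wallMs, int $peakMb): array
{
    if (function_exists('memory_reset_peak_usage')) {
        memory_reset_peak_usage(); // PHP >= 8.2: measure this run, not the whole process
    }
    $start = hrtime(true);                           // monotonic clock, nanoseconds
    $work();
    $wall = (hrtime(true) - $start) / 1e6;           // nanoseconds -> milliseconds
    $peak = memory_get_peak_usage() / (1024 * 1024); // bytes -> MiB (heap proxy)

    return [
        'wall_ms' => $wall,
        'peak_mb' => $peak,
        'within'  => $wall <= $wallMs && $peak <= $peakMb,
    ];
}

$result = checkBudget(fn () => strlen(str_repeat('x', 1024)), wallMs: 1000, peakMb: 64);
printf("%.2f ms, %.2f MiB, within budget: %s\n",
    $result['wall_ms'], $result['peak_mb'], $result['within'] ? 'yes' : 'no');
```

The point is the shape: the caps are declared up front and a run either meets them or fails, regardless of how small the caps are.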
What the evidence says
This page is rated Evidence: Mixed evidence, and the mix is deliberate: the evidence is genuinely of three kinds.
- Code-backed mechanism. The streaming writer in src/Writer/Streaming/StreamingPdfWriter.php documents and implements the per-page emit-then-release cycle, and opens its xref and Kids streams with php://temp/maxmemory:0 to force immediate disk spill so that “PHP memory stays bounded regardless of object count.” The streaming, single-cursor, no-retained-tree design is also the architectural decision recorded in ADR-001 (the rendering pipeline holds at most O(depth) state, not O(n) nodes).
- Design-principle budget. The performance_budget field is a real, optional part of the documentation schema, defined as { wall_ms, peak_mb } with explicit upper bounds. It is an enforceable contract by design.
- Benchmark as a living signal. ADR-001 is explicit that the controlled large-document peak-memory and wall-clock figures are an empirical target to be collected and recorded under a stated method, not a number to be asserted in prose. This page therefore states the mechanism and the contract, and points concrete figures at the place that measures them.
The format makes the goal reasonable rather than aspirational. Because the cross-reference table is a per-object offset index (ISO 32000-2, §7.5.4), a generator can write objects as it finishes them and keep only their offsets. Bounded memory is consistent with the file format, not a fight against it.
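The fixed-width entry format is what makes that index cheap to write and to read. In a classic cross-reference section every entry is exactly 20 bytes: a 10-digit byte offset, a space, a 5-digit generation number, a space, the keyword n or f, and a 2-byte end-of-line. The sketch below (function names xrefSection and offsetOf are illustrative, not NextPDF API) builds a single-subsection table from a list of offsets, then finds an object's entry by arithmetic alone, the same calculation a reader would make with fseek().

```php
<?php
declare(strict_types=1);

/**
 * Build a single-subsection xref table for objects 1..n, all generation 0.
 *
 * @param list<int> $offsets byte offset of object 1, object 2, ...
 */
function xrefSection(array $offsets): string
{
    $xref = "xref\n0 " . (count($offsets) + 1) . "\n";
    $xref .= "0000000000 65535 f\r\n"; // object 0: head of the free list
    foreach ($offsets as $offset) {
        $xref .= sprintf("%010d 00000 n\r\n", $offset); // exactly 20 bytes
    }
    return $xref;
}

/** Look up object $num's offset without scanning: first entry + $num * 20. */
function offsetOf(string $xref, int $num): int
{
    $firstEntry = (int) strpos($xref, "\n", 5) + 1; // skip "xref\n" and the "0 N\n" line
    return (int) substr($xref, $firstEntry + $num * 20, 10);
}

$xref = xrefSection([15, 79, 160]);
echo offsetOf($xref, 2), "\n";
```

Because the generator only needs the list of offsets, it can produce this table at the end without ever holding the objects themselves.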
Practical example
Bounded memory is a property of how you generate, not a flag you set. A batch loop that finalises and releases each document keeps the heap flat across the run:
<?php
declare(strict_types=1);

require_once __DIR__ . '/vendor/autoload.php';

use NextPDF\Core\DocumentFactory;
use NextPDF\Core\PdfFactory;
use NextPDF\Graphics\ImageRegistry;
use NextPDF\Typography\FontRegistry;

// Process-lifetime, shared once.
$factory = PdfFactory::new()
    ->withCompress(true)
    ->withDocumentFactory(new DocumentFactory(
        new FontRegistry(),
        new ImageRegistry(maxCacheBytes: 50 * 1024 * 1024),
    ));

// Per-document, created and released each iteration.
// $invoiceBatch is supplied by your application: any iterable of invoice
// objects exposing toHtml() and outputPath().
foreach ($invoiceBatch as $invoice) {
    $doc = $factory->create();
    $doc->addPage();
    $doc->writeHtml($invoice->toHtml());
    $doc->save($invoice->outputPath());
    unset($doc); // the document model is not carried into the next iteration
}

The registries are shared because parsing fonts and images once is the point of a worker. The document is not shared, and it is released each pass, which keeps batch memory bounded by one document, not by the batch.
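Whether a loop like this really stays flat is worth checking rather than assuming. A hedged sketch using only PHP built-ins: heapDrift is a hypothetical helper, memory_get_usage() sees only the PHP heap (not native memory an extension might allocate), and the tolerance you accept is your own choice.

```php
<?php
declare(strict_types=1);

/**
 * Run $iteration $runs times and return how far the PHP heap drifted between
 * the end of the first run and the end of the last. A bounded batch drifts by
 * roughly zero; one that retains state grows with every pass.
 */
function heapDrift(callable $iteration, int $runs): int
{
    $iteration();                 // first run: one-time, lazy allocations settle here
    $first = memory_get_usage();
    for ($i = 1; $i < $runs; $i++) {
        $iteration();
    }
    return memory_get_usage() - $first;
}

// Stand-in for one document: a large buffer that is built and then released.
$drift = heapDrift(function (): void {
    $doc = str_repeat('x', 1024 * 1024);
    unset($doc);
}, 20);
printf("heap drift over 20 runs: %d bytes\n", $drift);
```

To apply it to the loop above, pass a closure that performs one document's create, write, save, and unset; drift that scales with the number of runs points at something being retained across iterations.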
Common misconception
The most common misconception is treating “bounded memory” as a benchmark claim, expecting a megabyte number to quote. That inverts what is being said. The guarantee here is structural: the writer is built so that holding a document does not cost more as pages are added. A specific peak figure depends on page content, fonts, and images and is only meaningful with its measurement method attached, which is why it belongs to a benchmark, not to this sentence.
A second trap: assuming php://temp already protects you. It does, but only
after its default in-memory window (2 MB) fills. The maxmemory:0 option is
what makes the spill immediate, so that one option is the mechanism. Without
it the property would not hold for exactly the large documents it exists for.
Limits and boundaries
This page explains the streaming mechanism and the meaning of a performance
budget. It does not state measured peak-memory or throughput figures.
Those are produced by the benchmarking discipline under a declared method,
and ADR-001 explicitly defers the empirical numbers to that measurement.

Bounded “per document” does not mean constant regardless of a single
document’s content: a page with many large embedded images still costs what
those images cost. What does not grow is the heap cost of per-page bookkeeping
and of a retained page graph.

Not every generation path goes through the streaming writer. Which paths
stream and which buffer is determined by the code and the pipeline shape, not
by this overview. The mechanism described is accurate as of this page’s
review date; the authoritative sources are src/Writer/Streaming/ and ADR-001
in the core repository.
The streaming and bounded-memory design is a Core property. Editions do not change it:
| Edition | Availability |
|---|---|
| Core | Core provides the streaming, disk-spilling writer design. |
| Pro | Pro inherits the same bounded-memory writer; it adds features, not a different memory model. |
| Enterprise | Enterprise inherits the same bounded-memory writer; it adds features, not a different memory model. |
Related docs
- The pipeline model: where the writer stage sits in the document flow.
- Honest benchmarking: how NextPDF reports the numbers this page deliberately does not assert.
- High-volume document generation: the batch scenario this mechanism is built for.
Glossary
- Bounded memory: a design property where holding the document does not consume more heap as pages are added (the O(1)-per-page goal).
- Streaming writer: the writer that emits each page to the output and releases its buffer instead of retaining the whole document.
- php://temp/maxmemory:0: a PHP temporary stream forced to spill to disk immediately, used for the growing per-object bookkeeping.
- Performance budget: a structured documentation contract made of a wall-clock cap and a peak-memory cap, stated and checkable.
- Living signal: a measured value reported with its method under stated conditions, rather than a fixed number embedded in prose.