Streams and filters
ISO 32000-2 §7.4 Evidence: Standard-backed
At a glance
Most of a real PDF’s bytes are inside streams: page content, fonts, images, and the cross-reference stream itself. Almost none of those bytes are stored raw; they pass through one or more filters first. This page covers which filters you meet, what each is for, where they cause trouble, and why NextPDF pins its compression so the same input always produces the same bytes.
Why this matters
A stream and its filter are a contract: “these bytes were deflate-compressed,
then base-85 encoded; undo those steps in reverse order to get the real data.” If the
/Filter entry disagrees with what the bytes actually are, or the
/Length is wrong, or two filters are listed in the wrong order, the stream
is undecodable and the object it carried is lost. A reader does not
heuristically guess; it does what the dictionary tells it.
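A minimal sketch of how two of those mismatches surface, using PHP's built-in zlib functions rather than any NextPDF API. The variable names are illustrative, and `@` only silences the warning zlib raises so the `false` return value can be inspected.

```php
<?php
declare(strict_types=1);

// Original data and its zlib-wrapped deflate encoding (what /FlateDecode expects).
$original = str_repeat("BT /F1 12 Tf (Hello) Tj ET\n", 20);
$encoded  = gzcompress($original, 9);

// Correct contract: the full byte run, decoded with the declared filter.
$ok = gzuncompress($encoded);

// Wrong /Length: the reader stops ten bytes early, so the deflate stream is cut short.
$truncated = @gzuncompress(substr($encoded, 0, strlen($encoded) - 10));

// Wrong /Filter: hex-encoded bytes declared as /FlateDecode are not a zlib stream.
$wrongFilter = @gzuncompress(bin2hex($encoded));

var_dump($ok === $original);       // bool(true)
var_dump($truncated === false);    // bool(true)
var_dump($wrongFilter === false);  // bool(true)
```

In both failure cases the decoder has nothing to fall back on: the bytes are intact on disk, but the dictionary no longer describes them.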
There is a second, quieter cost. If a library’s compressor is nondeterministic — different zlib build, different level, different internal block boundaries — then two runs that should produce an identical PDF produce two different files. That breaks byte-level reproducibility. Broken reproducibility then breaks golden-file tests, signed-build verification, and any pipeline that diffs output. Filters determine both whether the PDF is correct and whether the PDF is the same.
The short version
- A stream object is a dictionary plus a block of bytes, wrapped in stream…endstream, with a /Length and usually a /Filter.
- The /Filter entry names the decode filter, or an array of filters applied as a pipeline, in order.
- The filters split into two families: compression (FlateDecode, LZWDecode, RunLengthDecode, DCTDecode, JPXDecode, JBIG2Decode) and ASCII transport (ASCIIHexDecode, ASCII85Decode), plus the special Crypt filter for encryption.
- The one you will see most is FlateDecode — zlib/deflate. It is the default for content, fonts, and the cross-reference stream.
- NextPDF pins its Flate output to a fixed level and format so the same input bytes always compress to the same output bytes.
How NextPDF approaches it
NextPDF emits stream objects through one buffer helper and compresses through one pinned compressor, on purpose.
BinaryBuffer::writeStream() (src/Support/BinaryBuffer.php) wraps stream
content in its dictionary, always writing a /Length equal to the actual
byte length and merging in any extra entries the caller supplies, such as
/Filter. There is no path where the declared length can disagree with the
bytes written, because the length is taken from the content string itself.
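The idea of deriving /Length from the bytes can be sketched as follows. This is not the code of BinaryBuffer::writeStream(); the function name `buildStreamObject`, its parameters, and the serialisation layout are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not NextPDF's BinaryBuffer::writeStream():
 * serialises one stream object, deriving /Length from the bytes themselves.
 *
 * @param array<string, string> $extra Extra dictionary entries, e.g. ['Filter' => '/FlateDecode'].
 */
function buildStreamObject(int $objectNumber, string $bytes, array $extra = []): string
{
    // The length is computed here, never accepted from the caller.
    $dict = '/Length ' . strlen($bytes);
    foreach ($extra as $key => $value) {
        $dict .= " /{$key} {$value}";
    }

    return "{$objectNumber} 0 obj\n<< {$dict} >>\nstream\n{$bytes}\nendstream\nendobj\n";
}

$compressed = gzcompress("BT (Hello) Tj ET\n", 9);
$object     = buildStreamObject(4, $compressed, ['Filter' => '/FlateDecode']);
```

Because the caller has no way to pass a length, a mismatch between /Length and the stream body cannot be introduced at the call site.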
Compression goes through PinnedZlibCompressor
(src/Writer/PinnedZlibCompressor.php). This class exists for one reason.
gzcompress without an explicit level defers to the zlib runtime default,
which has historically varied across builds. The 2-byte zlib header even
encodes the level indirectly, so “the default” is not a stable output. The
compressor pins the level to the zlib maximum and always emits
zlib-wrapped deflate (RFC 1950 header + Adler-32 trailer), which is exactly
what /Filter /FlateDecode expects. A hard failure from zlib becomes a
typed exception rather than a silent fallback to uncompressed output — a
stream is never quietly emitted raw.
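A minimal sketch of the same idea with PHP's zlib extension. The function name `pinnedCompress`, the choice of `RuntimeException`, and level 9 as "the maximum" are assumptions for illustration, not NextPDF's actual implementation.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative stand-in for a pinned compressor; the name and the
 * exception type are assumptions, not NextPDF's actual code.
 */
function pinnedCompress(string $data): string
{
    // Level 9 and ZLIB_ENCODING_DEFLATE (the RFC 1950 wrapper) are stated
    // explicitly, so nothing is left to the runtime default.
    $compressed = @gzcompress($data, 9, ZLIB_ENCODING_DEFLATE);
    if ($compressed === false) {
        throw new RuntimeException('zlib compression failed');
    }

    return $compressed;
}

$content = "BT /F1 12 Tf 72 712 Td (Hello) Tj ET\n";
$first   = pinnedCompress($content);
$second  = pinnedCompress($content);

// Same input and same pinned settings: the two results are compared byte for byte.
var_dump($first === $second);

// The two header bytes, read as a 16-bit number, are a multiple of 31 (RFC 1950),
// and the top two bits of the second byte (FLEVEL) hint at the level used.
$header = (ord($first[0]) << 8) | ord($first[1]);
var_dump($header % 31 === 0);
printf("FLEVEL=%d\n", ord($first[1]) >> 6);

// A different level decodes to the same content but need not be the same file.
$fast = gzcompress($content, 1, ZLIB_ENCODING_DEFLATE);
var_dump($fast === $first, gzuncompress($fast) === gzuncompress($first));
```

Throwing instead of returning the input unchanged keeps the failure visible: a caller cannot end up with a stream whose /Filter says FlateDecode while the body is raw.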
The cross-reference stream itself is a worked example of all of this:
CrossReferenceStream (src/Core/CrossReferenceStream.php) builds a binary
table, compresses it, and emits it as a stream object with
/Type /XRef, a /W field-width array, and /Filter /FlateDecode. The
index that lets a reader find every object is, itself, a filtered stream.
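As a rough illustration of that shape, the sketch below packs a few made-up entries into fixed-width binary rows and compresses them. The entry values and the /W widths [1 4 2] are assumptions chosen for the example, not values taken from CrossReferenceStream.

```php
<?php
declare(strict_types=1);

// Illustrative entries: [type, field 2, field 3]. Type 1 is an in-use object
// at a byte offset with a generation number; type 0 is a free entry.
// The offsets are made up.
$entries = [
    [0, 0, 65535],
    [1, 15, 0],
    [1, 234, 0],
];

// /W [1 4 2]: one byte for the type, four for the offset, two for the
// generation, all big-endian. pack() codes: C = 1 byte, N = 4 bytes, n = 2 bytes.
$table = '';
foreach ($entries as [$type, $field2, $field3]) {
    $table .= pack('CNn', $type, $field2, $field3);
}

$compressed = gzcompress($table, 9);

// Trailer keys such as /Root are omitted from this sketch.
$dictionary = sprintf(
    '<< /Type /XRef /Size %d /W [1 4 2] /Filter /FlateDecode /Length %d >>',
    count($entries),
    strlen($compressed)
);
```

Each row is seven bytes wide here, so a reader that knows /W can index straight into the decoded table without any delimiters.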
| Filter | Family | What it is for | Where it goes wrong |
|---|---|---|---|
| FlateDecode | Compression | zlib/deflate; the default for content, fonts, xref streams | A non-deterministic zlib build makes “identical” PDFs differ byte-for-byte |
| LZWDecode | Compression | Older Lempel–Ziv–Welch compression | Legacy; superseded by Flate, occasionally still seen in old files |
| DCTDecode | Compression | JPEG-encoded colour/grayscale images | Lossy — re-encoding an already-DCT image degrades it again |
| JPXDecode | Compression | JPEG 2000 wavelet image data | Not permitted by some archival profiles; reader support is uneven |
| JBIG2Decode | Compression | Bilevel (1-bit) image compression | Must not be used with inline images; lossy modes can alter scans |
| RunLengthDecode | Compression | Byte-oriented run-length | Only helps data with long single-byte runs; can grow other data |
| ASCIIHexDecode | Transport | Binary as hex digits | Doubles size; only for 7-bit-safe channels, never for size |
| ASCII85Decode | Transport | Binary as base-85 ASCII | ~25% overhead; a transport convenience, not compression |
| Crypt | Security | Applies the document’s security handler | A cross-reference stream must not use a Crypt filter |
The PDF standard filter set, by family, with the failure each one is associated with. NextPDF writes FlateDecode for content, fonts, and the cross-reference stream; the ASCII transport filters are for 7-bit channels, never for reducing size.
What the evidence says
The filter mechanism is defined by ISO 32000-2, §7.4. A stream dictionary names its filters through /Filter.
When the entry lists more than one filter, those filters form a decode
pipeline and are applied in sequence. A writer encodes a stream to compress it
or to make it 7-bit-safe. A reader invokes the corresponding decode filters to
recover the original data.
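The decode pipeline can be sketched as a loop over the /Filter array. This is an illustration only: `decodeStream` is a hypothetical name, just two filters are wired up, and the hex decoder skips the whitespace handling a full ASCIIHexDecode needs.

```php
<?php
declare(strict_types=1);

/**
 * Applies decode filters in the order the /Filter array lists them.
 * Hypothetical helper covering two filters only.
 *
 * @param list<string> $filters Filter names without the leading slash.
 */
function decodeStream(string $bytes, array $filters): string
{
    foreach ($filters as $filter) {
        $decoded = match ($filter) {
            // Strip the '>' end-of-data marker, then turn hex digits back into bytes.
            'ASCIIHexDecode' => @hex2bin(rtrim($bytes, '>')),
            // zlib-wrapped deflate, as FlateDecode expects.
            'FlateDecode'    => @gzuncompress($bytes),
            default          => throw new InvalidArgumentException("Unsupported filter: {$filter}"),
        };
        if ($decoded === false) {
            throw new RuntimeException("{$filter} could not decode the stream");
        }
        $bytes = $decoded;
    }

    return $bytes;
}

$original = "BT /F1 12 Tf 72 712 Td (Hello) Tj ET\n";

// Writer: compress first, then hex-encode for a 7-bit channel.
$encoded = bin2hex(gzcompress($original, 9)) . '>';

// Reader: /Filter [/ASCIIHexDecode /FlateDecode] undoes the last encoding first.
$decoded = decodeStream($encoded, ['ASCIIHexDecode', 'FlateDecode']);

var_dump($decoded === $original); // bool(true)
```

Listing the same two filters in the opposite order makes the first step fail, because hex text is not a valid zlib stream.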
The standard’s filter table classifies each filter. FlateDecode decompresses zlib/deflate-encoded data, reproducing the original text or binary data. DCTDecode reproduces image samples that approximate the original via JPEG — the word “approximate” is the standard telling you it is lossy. LZWDecode, RunLengthDecode, JBIG2Decode, JPXDecode, and the Crypt filter are each defined there too, with JBIG2 explicitly barred from inline images.
The cross-reference stream applies the format’s own machinery to itself: it
is a stream object (/Type /XRef,
ISO 32000-2, §7.5.8) whose /W array
states the byte width of each entry field in the decoded stream. The
standard requires that it is not encrypted and does not use a Crypt filter.
NextPDF’s CrossReferenceStream follows this exactly — FlateDecode,
explicit /W, no encryption.
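To show what /W buys a reader, here is a sketch that splits a decoded cross-reference stream into rows. `readXrefRows` is a hypothetical helper; it assumes every width is at least 1 and that the data is a whole number of rows, which a real parser would have to check.

```php
<?php
declare(strict_types=1);

/**
 * Splits a decoded cross-reference stream into rows using the /W widths.
 * Hypothetical helper; each field is read as a big-endian unsigned integer.
 *
 * @param list<int> $widths The three /W values.
 * @return list<list<int>>
 */
function readXrefRows(string $decoded, array $widths): array
{
    $rows = [];
    foreach (str_split($decoded, array_sum($widths)) as $row) {
        $fields = [];
        $pos    = 0;
        foreach ($widths as $width) {
            $value = 0;
            for ($i = 0; $i < $width; $i++) {
                $value = ($value << 8) | ord($row[$pos + $i]);
            }
            $fields[] = $value;
            $pos += $width;
        }
        $rows[] = $fields;
    }

    return $rows;
}

// Two made-up in-use entries, packed with /W [1 4 2] and passed through Flate.
$stored  = gzcompress(pack('CNn', 1, 234, 0) . pack('CNn', 1, 5120, 0), 9);
$decoded = gzuncompress($stored);

$rows = readXrefRows($decoded, [1, 4, 2]);
// $rows is [[1, 234, 0], [1, 5120, 0]]
```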
Practical example
A page content stream, compressed with Flate. This is the overwhelmingly
common shape: a dictionary with /Length and /Filter, then the
compressed bytes between stream and endstream.
<?php
declare(strict_types=1);

use NextPDF\Writer\PinnedZlibCompressor;

// The marking operators a page content stream carries, uncompressed.
$content = "BT /F1 12 Tf 72 712 Td (Hello) Tj ET\n";

// NextPDF compresses through the pinned compressor: fixed level,
// fixed zlib-wrapped format. The same $content always yields the
// same $compressed bytes, on any supported PHP/zlib build.
$compressed = PinnedZlibCompressor::compress($content);

// Emitted as a stream object. /Length is the real byte length of
// $compressed; /Filter names the decode the reader must apply.
// N 0 obj
// << /Length <strlen($compressed)> /Filter /FlateDecode >>
// stream
// <$compressed bytes>
// endstream
// endobj

A reader does the inverse: read /Length bytes, run them through
FlateDecode because /Filter says so, and get the original operators back.
Pin the compressor and that round trip is not only correct. It is
identical every time, which is what golden-file and signed-build checks
rely on.
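The reader's side of that round trip can be sketched with plain string functions. This is a simplification under stated assumptions: `gzcompress` stands in for the pinned compressor, object number 4 is arbitrary, and /Length is pulled out with a regular expression rather than a real dictionary parser.

```php
<?php
declare(strict_types=1);

// Writer side: a stand-in for the emitted object (object number 4 is arbitrary).
$content    = "BT /F1 12 Tf 72 712 Td (Hello) Tj ET\n";
$compressed = gzcompress($content, 9);
$object     = "4 0 obj\n<< /Length " . strlen($compressed)
            . " /Filter /FlateDecode >>\nstream\n" . $compressed . "\nendstream\nendobj\n";

// Reader side: take /Length from the dictionary, then read exactly that many
// bytes after the "stream" keyword and its end-of-line marker.
preg_match('~/Length (\d+)~', $object, $m);
$length = (int) $m[1];
$start  = strpos($object, "stream\n") + strlen("stream\n");
$bytes  = substr($object, $start, $length);

// /Filter says FlateDecode, so inflate the zlib-wrapped data.
$decoded = gzuncompress($bytes);

var_dump($decoded === $content); // bool(true)
```

The reader never scans for endstream to find the end of the data; /Length alone decides how many bytes go into the decoder.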
Common misconception
The trap is treating the ASCII filters as compression. ASCIIHexDecode and ASCII85Decode make a stream larger: roughly double for hex, and roughly 25% larger for base-85. They exist to move binary data through a channel that is only safe for 7-bit text, not to save space. Choosing ASCII85 to “shrink” a PDF does the opposite.

The second half of the same misconception is believing FlateDecode is lossless for images “for free”. Flate is lossless, but it cannot restore what an earlier lossy step removed. If the image was already DCT (JPEG) encoded, decoding it and re-encoding it through a lossy filter degrades it again, regardless of what Flate does around it. The filter pipeline preserves exactly what you feed it, including a re-compression artifact you fed it by accident.
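The size cost of the transport filters is easy to measure. The sketch below uses `bin2hex` for the hex form and a small hand-written base-85 encoder, because PHP ships none; `ascii85Encode` is an illustrative helper that assumes 64-bit PHP integers, not part of NextPDF.

```php
<?php
declare(strict_types=1);

/** Minimal ASCII85 encoder for illustration (assumes 64-bit PHP integers). */
function ascii85Encode(string $data): string
{
    $out = '';
    $len = strlen($data);
    for ($i = 0; $i < $len; $i += 4) {
        $chunk = substr($data, $i, 4);
        $n     = strlen($chunk);
        // A short final group is padded with zero bytes before conversion.
        $value = unpack('N', str_pad($chunk, 4, "\0"))[1];
        if ($n === 4 && $value === 0) {
            $out .= 'z'; // shorthand for a full group of four zero bytes
            continue;
        }
        $digits = '';
        for ($k = 0; $k < 5; $k++) {
            $digits = chr($value % 85 + 33) . $digits;
            $value  = intdiv($value, 85);
        }
        // n input bytes produce n + 1 output characters.
        $out .= substr($digits, 0, $n + 1);
    }

    return $out . '~>'; // end-of-data marker
}

$binary = str_repeat("\x01\x02\x03\x04", 250);

printf("raw:     %d\n", strlen($binary));                 // raw:     1000
printf("hex:     %d\n", strlen(bin2hex($binary) . '>'));  // hex:     2001
printf("ascii85: %d\n", strlen(ascii85Encode($binary)));  // ascii85: 1252
```

Neither encoding looks at the content of the data, so neither can make it smaller; they only change the alphabet it is written in.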
Limits and boundaries
This page covers how filters are declared and applied, not the bit-level algorithm inside each one. The determinism guarantee is specifically about NextPDF’s Flate output for the streams it writes. It holds across PHP minor versions and standards-conforming zlib builds, but the deflate specification (RFC 1951) leaves block boundaries to the encoder, so byte-identical output across genuinely different zlib implementations (for example a stock zlib versus zlib-ng) is not promised. The build environment is pinned for that reason.
NextPDF chooses FlateDecode and the ASCII transport filters for the data it emits. It is not an image transcoder. It does not promise to re-pack an arbitrary inbound JPEG 2000 or JBIG2 stream, and lossy image trade-offs are a property of the source data, not something a writer can undo.
Mini-FAQ
Why is FlateDecode everywhere?
It is lossless, general-purpose, well-supported, and a good fit for the text-and-operators content of most PDFs. It is the safe default for content streams, embedded fonts, and the cross-reference stream.
Can I turn compression off?
You can omit /Filter and store raw bytes, and a reader will accept it. The
file gets larger and nothing else improves; there is rarely a reason outside
debugging.
Why pin the compression level at all?
So the output is reproducible. An unpinned level (or zlib build) can change the compressed bytes without changing the decompressed content. The result is correct but not identical, which defeats byte-level verification.
Related docs
- What a PDF actually is: the object model the streams in this page live inside.
- Fonts: the hard part: embedded font programs are filtered streams, with their own failure modes.
- PDF 2.0: what changed: how the 2.0 baseline treats streams and the cross-reference stream NextPDF defaults to.
Glossary
- Stream object: a dictionary plus a block of bytes between stream and endstream, carrying a /Length and usually a /Filter.
- Filter: a named decoding transformation a reader applies to a stream’s bytes (for example FlateDecode).
- Filter pipeline: an array of filters applied in sequence; the array order is the decode order.
- FlateDecode: the zlib/deflate filter; the default compression for content, fonts, and cross-reference streams.
- DCTDecode: the JPEG image filter; lossy, so re-encoding degrades the image again.
- ASCII transport filter: ASCIIHexDecode / ASCII85Decode; makes data 7-bit-safe at the cost of size, and is not compression.
- Deterministic compression: producing byte-identical compressed output for identical input, achieved by pinning the compressor’s level and format.