ContentStream: PDF content-stream emitter
At a glance
The ContentStream module emits Portable Document Format (PDF) marked-content operators. It opens and closes structure tags and artifacts, tracks nesting depth, and returns the operator buffer.
Install
composer require nextpdf/core:^3
Conceptual overview
ContentStreamBuilder is the module’s single class. It builds the
marked-content layer of a page content stream. A content stream encodes page
content as a sequence of operators — ISO 32000-2 §8.
The builder then emits marked-content operators around that content.
append() adds raw operator bytes verbatim. The builder does not escape this
input. You own its validity. Use this boundary when the HTML pipeline and the
Graphics module need to interleave their own operators.
beginTag() opens a structure-tagged sequence. It emits a BDC operator with
an MCID property list, per ISO 32000-2 §14.6.
endTag() emits the matching EMC
operator. The builder counts nesting depth. If you call endTag() with no open
sequence, it throws PageLayoutException instead of writing an unbalanced
EMC.
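The balance rule above can be sketched with a small stand-in class. This is not the NextPDF implementation: the class name MarkedContentSketch, its depth() and buffer() accessors, the exact byte layout of the BDC line, and the use of LogicException in place of PageLayoutException are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

// Illustrative stand-in, NOT the NextPDF ContentStreamBuilder.
// It mirrors only the documented behavior: BDC on open, EMC on close,
// and an exception instead of an unbalanced EMC.
final class MarkedContentSketch
{
    private string $buffer = '';
    private int $depth = 0;

    public function beginTag(string $structType, int $mcid): void
    {
        // Assumed byte layout: tag name, MCID property list, BDC operator.
        $this->buffer .= sprintf("/%s <</MCID %d>> BDC\n", $structType, $mcid);
        $this->depth++;
    }

    public function append(string $raw): void
    {
        $this->buffer .= $raw; // verbatim, no escaping
    }

    public function endTag(): void
    {
        if ($this->depth === 0) {
            // The real builder throws PageLayoutException; LogicException stands in here.
            throw new LogicException('endTag() called with no open marked-content sequence.');
        }
        $this->buffer .= "EMC\n";
        $this->depth--;
    }

    public function depth(): int { return $this->depth; }
    public function buffer(): string { return $this->buffer; }
}

$sketch = new MarkedContentSketch();
$sketch->beginTag('P', 0);
$sketch->append("BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n");
$sketch->endTag();

$underflowCaught = false;
try {
    $sketch->endTag(); // nothing is open any more
} catch (LogicException $e) {
    $underflowCaught = true;
}
```

The second endTag() call fails before any bytes are written, so the buffer still holds exactly one BDC/EMC pair.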
beginArtifact() opens an artifact sequence. Use artifacts for pagination
decoration — headers, footers, page numbers, and rules — that must stay out of
the structure tree, per ISO 32000-2 §14.8.2.2.
The subtype is one of four ISO values:
Pagination, Layout, Page, or Background. Prefer the typed
ArtifactSubtype enum. The string overload is validated against the enum, so a
non-standard value fails immediately.
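One way to picture the string validation is a backed enum plus tryFrom(). The sketch below is a guess at the shape, not the library's code: SketchArtifactSubtype, its backing strings, the normalizeArtifactSubtype() helper, and the choice of InvalidArgumentException are all hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the library's ArtifactSubtype enum.
// The four cases are the values listed above; the backing strings are assumed.
enum SketchArtifactSubtype: string
{
    case Pagination = 'Pagination';
    case Layout = 'Layout';
    case Page = 'Page';
    case Background = 'Background';
}

// Hypothetical helper: accept the enum or a string, reject anything else early.
function normalizeArtifactSubtype(SketchArtifactSubtype|string $type): SketchArtifactSubtype
{
    if ($type instanceof SketchArtifactSubtype) {
        return $type;
    }
    // tryFrom() returns null for a string that matches no case.
    $case = SketchArtifactSubtype::tryFrom($type);
    if ($case === null) {
        throw new InvalidArgumentException("Unknown artifact subtype: {$type}");
    }
    return $case;
}

$fromEnum = normalizeArtifactSubtype(SketchArtifactSubtype::Pagination);
$fromString = normalizeArtifactSubtype('Background');

$rejected = false;
try {
    normalizeArtifactSubtype('Watermark'); // not one of the four values
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
```

Passing the enum case skips the lookup entirely, which is one reason to prefer the typed form.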
relabelTag() rewrites a previously emitted tag in place. finish() returns
the full buffer and throws if marked content is unbalanced. drain() returns
the buffer so far without the balance check, for incremental streaming.
peek() returns the buffer without consuming it. reset() clears the state.
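The difference between finish(), drain(), and peek() may be easier to see in a reduced model. The sketch below rests on assumptions that the text does not confirm: that drain() empties the buffer while leaving the depth counter alone, and that an unbalanced finish() throws LogicException. BufferSketch and its open() and close() methods are hypothetical names, not part of NextPDF.

```php
<?php
declare(strict_types=1);

// Illustrative stand-in, NOT the NextPDF class. Assumptions: drain() empties
// the buffer but keeps the depth counter, and finish() throws LogicException.
final class BufferSketch
{
    private string $buffer = '';
    private int $depth = 0;

    public function open(string $operator): void
    {
        $this->buffer .= $operator;
        $this->depth++;
    }

    public function close(): void
    {
        $this->buffer .= "EMC\n";
        $this->depth--;
    }

    public function peek(): string
    {
        return $this->buffer; // read only, nothing is consumed
    }

    public function drain(): string
    {
        $chunk = $this->buffer; // no balance check: a sequence may still be open
        $this->buffer = '';
        return $chunk;
    }

    public function finish(): string
    {
        if ($this->depth !== 0) {
            throw new LogicException("Unbalanced marked content: depth {$this->depth}.");
        }
        return $this->buffer;
    }
}

$sketch = new BufferSketch();
$sketch->open("/P <</MCID 0>> BDC\n");

$peeked = $sketch->peek();      // buffer stays in place
$firstChunk = $sketch->drain(); // allowed in the middle of a sequence

$finishThrew = false;
try {
    $sketch->finish();          // depth is still 1
} catch (LogicException $e) {
    $finishThrew = true;
}

$sketch->close();
$rest = $sketch->finish();      // balanced now
```

Under these assumptions, a caller that streams with drain() still needs its own depth check before the final flush.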
API surface
| Method | Signature | Role |
|---|---|---|
| append() | append(string $raw): void | Adds raw operator bytes verbatim (no escaping) |
| beginTag() | beginTag(string $structType, int $mcid): void | Opens a BDC structure sequence |
| endTag() | endTag(): void | Closes the innermost sequence with EMC |
| beginArtifact() | beginArtifact(ArtifactSubtype\|string $type): void | Opens an artifact sequence |
| endArtifact() | endArtifact(): void | Closes the innermost artifact |
| getMarkedContentDepth() | getMarkedContentDepth(): int | Returns current nesting depth |
| relabelTag() | relabelTag(string $old, string $new, int $mcid): void | Rewrites an emitted tag in place |
| finish() | finish(): string | Returns the full buffer; throws if unbalanced |
| drain() | drain(): string | Returns the buffer without the balance check |
| peek() | peek(): string | Returns the buffer without consuming it |
| reset() | reset(): void | Clears all state |
Run composer docs:generate-api-php -- --module=ContentStream to generate the
full PHPDoc table.
Code sample — Quick start
<?php
declare(strict_types=1);
require_once __DIR__ . '/../vendor/autoload.php';
use NextPDF\ContentStream\ContentStreamBuilder;
$builder = new ContentStreamBuilder();
// Open a paragraph tag, add the text operators verbatim, then close the tag.
$builder->beginTag('P', mcid: 0);
$builder->append("BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n");
$builder->endTag();
// finish() returns the buffer and throws if a sequence is still open.
$pageContent = $builder->finish();
Code sample — Production
Use this pattern to wrap a heading in a structure tag and a footer in an
artifact. The pattern checks the nesting depth, then flushes the buffer incrementally with drain().
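<?php
declare(strict_types=1);
require_once __DIR__ . '/../vendor/autoload.php';
use NextPDF\Accessibility\ArtifactSubtype;
use NextPDF\ContentStream\ContentStreamBuilder;
// Placeholder operators for illustration; in production these come from your layout code.
$titleOperators = "BT /F1 18 Tf 72 720 Td (Report) Tj ET\n";
$footerOperators = "BT /F1 9 Tf 72 36 Td (Page 1) Tj ET\n";
$builder = new ContentStreamBuilder();
// Structure-tagged heading.
$builder->beginTag('H1', mcid: 0);
$builder->append($titleOperators);
$builder->endTag();
// Footer as a pagination artifact, kept out of the structure tree.
$builder->beginArtifact(ArtifactSubtype::Pagination);
$builder->append($footerOperators);
$builder->endArtifact();
// drain() skips the balance check, so verify the depth first.
if ($builder->getMarkedContentDepth() !== 0) {
    throw new RuntimeException('Unbalanced marked content before flush.');
}
$chunk = $builder->drain();
Edge cases & gotchas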
- append() does not escape input. Pass only valid operator bytes. The builder trusts the caller.
- endTag() and endArtifact() throw on underflow. Never close a sequence that is not open.
- finish() checks balance and throws when depth is not zero. drain() does not check. Use drain() only for incremental streaming.
- The depth counter does not distinguish tags from artifacts. EMC closes the innermost sequence of either kind. Nest sequences in strict order.
- The string overload of beginArtifact() is validated against the enum. A non-standard subtype fails at the call, not in the output.
- relabelTag() rewrites an emitted tag. Use the same mcid you used to emit it.
Performance
Each operation is an O(1) string append, except relabelTag(), which performs
an O(buffer) rewrite. The module holds one string buffer and one integer depth
counter. It performs no parsing and allocates only the buffer. The reference
workload budget is 1500 ms wall-clock time and 64 MB peak memory. This module
stays far below it.
Security notes
append() is the trust boundary. The builder writes bytes verbatim, so
upstream code must escape any string that reaches a literal-string operator. The
canonical escaper is PdfStringEscaper::escapeLiteral() (ADR-015). Never pass
unescaped user text through append(). The balance checks in endTag(),
endArtifact(), and finish() prevent a malformed marked-content tree from
reaching the Writer. See /modules/core/security/ for the document threat
model.
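To show what escaping before append() involves, here is a minimal sketch of a literal-string escaper. It is not PdfStringEscaper::escapeLiteral(): the function name escapeLiteralSketch() is hypothetical, and it handles only the backslash and the two parentheses, which are the characters that can end or corrupt a PDF literal string. In real code, use the library's escaper.

```php
<?php
declare(strict_types=1);

// Hypothetical helper, NOT PdfStringEscaper::escapeLiteral().
// It covers only the backslash and the two parentheses.
function escapeLiteralSketch(string $text): string
{
    // strtr() with an array replaces in a single pass, so the backslashes it
    // inserts are never escaped a second time.
    return strtr($text, [
        '\\' => '\\\\',
        '(' => '\\(',
        ')' => '\\)',
    ]);
}

// Untrusted text containing characters that are special inside ( ... ).
$userText = 'Total (net) \\ gross';

// Escape first, then build the operator bytes that would go to append().
$operator = sprintf(
    "BT /F1 12 Tf 72 720 Td (%s) Tj ET\n",
    escapeLiteralSketch($userText)
);
```

Without the escaping step, an unbalanced parenthesis in the user text could close the literal string early and let the remaining bytes be read as operators.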
Conformance
The module emits marked-content operator structures consistent with ISO
32000-2: BDC/EMC pairs with an MCID property list per §14.6,
and artifact sequences per §14.8.2.2.
These are implementation facts. The evidence is src/ContentStream/ContentStreamBuilder.php,
the src/Accessibility/ArtifactSubtype.php enum, and
tests/Unit/ContentStream/ContentStreamBuilderMarkedContentBalanceCoverageTest
plus ContentStreamBuilderRelabelTagInvariantTest. They are not a claim of
end-to-end PDF/UA-2 or PDF 2.0 conformance. An external oracle validates the
tagged-PDF structure that these operators participate in:
tests/Integration/Accessibility/VeraPdfUa2GoldenTest checks a generated
fixture against veraPDF for the PDF/UA-2 profile. That oracle test skips when
the veraPDF binary is absent, so it is an opt-in gate. State that this module
“produces marked-content structures; PDF/UA-2 conformance is validated by
veraPDF” rather than asserting unqualified conformance.