Archive format boundary

Intent

This page defines the high-level on-disk structure of a crushr archive.

It describes the archive boundary in structural terms only. It does not define recovery policy, extraction workflow, or user-facing command semantics beyond what the format must support.

Guarantees

  • Verified data is never silently corrupted or misrepresented
  • Unverifiable data is never presented as valid
  • Degraded or partial results are explicitly labeled and structured
  • Archive processing fails closed when required truth cannot be established
  • Filesystem writes are constrained and cannot escape intended boundaries

Behavior

Layout

[header] [extent blocks...] [dictionary blocks...] [tail frame]

Field     Description
magic     format identifier
version   format version
flags     global flags
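
As a rough illustration of how a reader might check these fields before touching anything else, the sketch below parses a fixed-size header and fails closed on any mismatch. The field widths, byte order, magic value `CRSH`, and supported version set are all assumptions made for the example; this page does not specify them.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout, for illustration only: 4-byte magic, u16 version,
# u16 flags, little-endian. The real widths are not defined on this page.
HEADER_STRUCT = struct.Struct("<4sHH")
ASSUMED_MAGIC = b"CRSH"      # placeholder, not the real identifier
SUPPORTED_VERSIONS = {1}     # placeholder


@dataclass(frozen=True)
class Header:
    magic: bytes
    version: int
    flags: int


def parse_header(buf: bytes) -> Header:
    """Return a fully validated header or raise; never a partial result."""
    if len(buf) < HEADER_STRUCT.size:
        raise ValueError("truncated header")
    magic, version, flags = HEADER_STRUCT.unpack_from(buf, 0)
    if magic != ASSUMED_MAGIC:
        raise ValueError("bad magic")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported version {version}")
    return Header(magic, version, flags)
```

Raising instead of returning a default header mirrors the fail-closed guarantee: a caller cannot accidentally continue with a header that was never established.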

Extent block

[extent_identity] [compressed_payload]

Extent blocks bind payload data to a deterministic extent identity. Their purpose is to keep payload truth verifiable independently of higher-level metadata state.
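
One way to picture this binding is an identity derived only from the payload bytes, so that it does not change with compression settings or naming. The sketch below assumes SHA-256 over the uncompressed payload as the identity and zlib as the codec; both choices, and the function names, are hypothetical and not taken from the crushr format.

```python
import hashlib
import zlib
from typing import Tuple


def extent_identity(payload: bytes) -> bytes:
    # Assumption: identity is a digest of the uncompressed payload.
    return hashlib.sha256(payload).digest()


def build_extent(payload: bytes, level: int = 6) -> Tuple[bytes, bytes]:
    """Return (extent_identity, compressed_payload)."""
    return extent_identity(payload), zlib.compress(payload, level)


def open_extent(identity: bytes, compressed: bytes) -> bytes:
    """Return the payload only if it decodes and matches its identity."""
    try:
        payload = zlib.decompress(compressed)
    except zlib.error as exc:
        raise ValueError("extent payload does not decode") from exc
    if extent_identity(payload) != identity:
        raise ValueError("extent identity mismatch")
    return payload
```

Because the identity here depends only on the payload, two writers using different compression levels would produce the same identity for the same data, and a reader needs no metadata to check it.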

Dictionary block

[dict_id] [entries] [checksum]

Dictionary blocks preserve identity-bearing metadata such as naming and path relationships. Dictionary validity affects metadata completeness, not payload integrity.
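
A minimal sketch of that separation follows. It assumes a CRC-32 checksum over the entries and a tab-separated `extent_id<TAB>path` line encoding; neither is specified on this page, and `load_names` is a hypothetical helper.

```python
import zlib
from typing import Dict, Optional


def load_names(entries: bytes, checksum: int) -> Optional[Dict[str, str]]:
    """Return {extent_id: path} if the block validates, else None.

    None means "naming metadata is unavailable". It says nothing about
    whether the extents themselves are intact.
    """
    if zlib.crc32(entries) != checksum:
        return None  # unverifiable metadata is not presented as valid
    names: Dict[str, str] = {}
    try:
        for line in entries.decode("utf-8").splitlines():
            extent_id, path = line.split("\t", 1)
            names[extent_id] = path
    except (UnicodeDecodeError, ValueError):
        return None  # malformed entries are treated like a failed checksum
    return names
```

A caller that receives `None` can still verify and emit extent payloads, but would have to label them as unnamed instead of guessing paths.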

Tail frame

Contains:

  • dictionary index
  • extent index
  • integrity markers

Structural model

No single structure is treated as the sole authority for all archive truth.

Payload integrity and metadata completeness are separable. Recovery behavior depends on what surviving archive evidence can still be validated or verified.
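
To make the separation concrete, the following sketch classifies a single extent from two independent facts: whether its payload verified, and whether a validated name is available. The label names are placeholders invented for the example and are not a crushr vocabulary.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    # Placeholder labels, chosen only for this illustration.
    VERIFIED = "verified"   # payload verified and name validated
    DEGRADED = "degraded"   # payload verified, name unavailable
    REJECTED = "rejected"   # payload could not be verified


def classify(payload_verified: bool, name: Optional[str]) -> Outcome:
    if not payload_verified:
        # A surviving name cannot rescue an unverifiable payload.
        return Outcome.REJECTED
    if name is None:
        # Payload truth stands on its own; the result is labeled, not hidden.
        return Outcome.DEGRADED
    return Outcome.VERIFIED
```

The payload check comes first so that metadata can only ever add information to a verified payload, never substitute for verification.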

Boundaries / Non-goals

This page does not describe repair behavior, guessed reconstruction, or alternate trust vocabularies.

Non-goals:

  • No best-effort reconstruction
  • No hidden failure smoothing
  • No compression-first tradeoffs
  • No external decode dependencies

Constraints

  • Extents must be independently readable
  • No central manifest dependency may become a single point of total truth
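
The sketch below illustrates what "independently readable" could look like in practice: a scan over the extent region that checks each record using only its own bytes and consults no index or manifest. The record framing (32-byte SHA-256 identity, u32 little-endian length, zlib payload) is an assumption made for the example, not the crushr encoding.

```python
import hashlib
import struct
import zlib
from typing import Iterator, Tuple

# Hypothetical framing: 32-byte identity, then u32 compressed length.
RECORD_HEAD = struct.Struct("<32sI")


def pack_extent(payload: bytes) -> bytes:
    body = zlib.compress(payload)
    return RECORD_HEAD.pack(hashlib.sha256(payload).digest(), len(body)) + body


def scan_extents(region: bytes) -> Iterator[Tuple[int, bool, bytes]]:
    """Yield (offset, verified, payload) per record, using no index."""
    pos = 0
    while pos + RECORD_HEAD.size <= len(region):
        identity, length = RECORD_HEAD.unpack_from(region, pos)
        start = pos + RECORD_HEAD.size
        end = start + length
        if end > len(region):
            break  # truncated record: stop instead of guessing
        try:
            payload = zlib.decompress(region[start:end])
            ok = hashlib.sha256(payload).digest() == identity
        except zlib.error:
            payload, ok = b"", False
        # Unverified payload bytes are withheld, not passed along.
        yield pos, ok, payload if ok else b""
        pos = end
```

In this sketch a damaged record is reported as unverified and the scan moves on, so one bad extent does not take its neighbours with it. A corrupted length field would still end the scan early; handling that would need a resynchronization scheme that this page does not describe.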