Dictionary system

Intent

This page defines how crushr stores and validates identity-bearing metadata, such as names and paths, without making any one copy of that metadata the sole source of truth for the archive.

The dictionary system exists to preserve metadata completeness while remaining independent of payload verification.

Guarantees

  • Verified data is never silently corrupted or misrepresented
  • Unverifiable data is never presented as valid
  • Degraded or partial results are explicitly labeled and structured
  • Archive processing fails closed when required truth cannot be established
  • Filesystem writes are constrained and cannot escape intended boundaries
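As an illustration of the last guarantee, a write-confinement check could look like the sketch below. `confined_target` and its parameters are hypothetical names, not crushr's API. The sketch assumes that archive paths use `/` separators, that extraction happens under a single root directory, and that Python 3.9 or later is available (for `Path.is_relative_to`).

```python
from pathlib import Path, PurePosixPath


def confined_target(root: Path, archive_path: str) -> Path:
    """Map an archive path to a location under root, or raise ValueError."""
    rel = PurePosixPath(archive_path)
    # Reject empty paths, absolute paths, and parent-directory components up front.
    if not rel.parts or rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"unsafe archive path: {archive_path!r}")
    resolved_root = root.resolve()
    target = resolved_root.joinpath(*rel.parts).resolve()
    # Check again after resolution, which follows symlinks that already exist.
    if not target.is_relative_to(resolved_root):
        raise ValueError(f"path escapes extraction root: {archive_path!r}")
    return target
```

The function fails closed: a path that cannot be shown to stay under the root raises an error instead of being adjusted into something writable.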

Behavior

crushr uses mirrored dictionaries so that metadata survival does not depend on one central authoritative copy.

Structure

| Field | Description |
| --- | --- |
| dict_id | unique dictionary identifier |
| entries | mapping of extent to filename or path metadata |
| checksum | BLAKE3 over dictionary content |
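The fields above could be modelled as follows. This is a minimal sketch, not crushr's on-disk format: the JSON serialization, the choice to cover both `dict_id` and `entries` with the checksum, and the helper names `content_bytes`, `digest`, and `build` are all assumptions. BLAKE2b from the Python standard library stands in for BLAKE3, which the standard library does not provide.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Dictionary:
    dict_id: str             # unique dictionary identifier
    entries: dict[str, str]  # extent id -> filename or path metadata
    checksum: bytes          # digest over the serialized dictionary content


def content_bytes(dict_id: str, entries: dict[str, str]) -> bytes:
    # Hypothetical canonical form: compact JSON with sorted keys, so that
    # equal content always serializes to equal bytes.
    return json.dumps(
        {"dict_id": dict_id, "entries": entries},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")


def digest(data: bytes) -> bytes:
    # Stand-in for BLAKE3 (assumption): 32-byte BLAKE2b digest.
    return hashlib.blake2b(data, digest_size=32).digest()


def build(dict_id: str, entries: dict[str, str]) -> Dictionary:
    return Dictionary(dict_id, dict(entries), digest(content_bytes(dict_id, entries)))
```

Serializing with sorted keys means the checksum depends only on the content, not on the order in which entries were inserted.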

Mirroring model

  • dictionaries are duplicated across archive segments
  • no single primary dictionary exists
  • any valid dictionary may contribute metadata completeness
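One way to picture the mirroring model is a scan that treats every copy the same way, wherever it sits in the archive. The sketch below is illustrative only: `DictCopy`, `collect_valid_copies`, and the representation of segments as a list are hypothetical, and BLAKE2b stands in for BLAKE3.

```python
import hashlib
from typing import NamedTuple, Optional


class DictCopy(NamedTuple):
    content: bytes   # serialized dictionary content
    checksum: bytes  # digest stored alongside the content


def stand_in_hash(data: bytes) -> bytes:
    # Assumption: BLAKE2b replaces BLAKE3 for this sketch.
    return hashlib.blake2b(data, digest_size=32).digest()


def collect_valid_copies(
    segments: list[Optional[DictCopy]],
) -> list[tuple[int, DictCopy]]:
    """Return (segment_index, copy) for every copy whose checksum verifies.

    Segment order carries no authority: no copy is treated as primary.
    """
    valid = []
    for index, copy in enumerate(segments):
        if copy is None:
            continue  # this segment holds no dictionary copy
        if stand_in_hash(copy.content) == copy.checksum:
            valid.append((index, copy))
    return valid
```

For example, with a damaged copy in segment 0, no copy in segment 1, and an intact copy in segment 2, only the copy from segment 2 is returned, and it is not ranked below the copy that came first.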

Validation

```text
if blake3(dict_bytes) != checksum:
    reject dictionary
```

Dictionary validation affects whether metadata can be trusted. It does not redefine payload integrity.
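The validation rule can be sketched in runnable form. The function name and the pluggable `hash_fn` parameter are assumptions made for illustration. The default hash is BLAKE2b from the standard library, used only as a stand-in; a real check would pass a BLAKE3 implementation instead.

```python
import hashlib
import hmac
from typing import Callable


def stand_in_hash(data: bytes) -> bytes:
    # Assumption: BLAKE2b replaces BLAKE3, which is not in the standard library.
    return hashlib.blake2b(data, digest_size=32).digest()


def validate_dictionary(
    dict_bytes: bytes,
    checksum: bytes,
    hash_fn: Callable[[bytes], bytes] = stand_in_hash,
) -> bool:
    """Return True only when the digest of dict_bytes equals checksum."""
    # compare_digest also returns False when the two lengths differ.
    return hmac.compare_digest(hash_fn(dict_bytes), checksum)
```

The result describes the dictionary alone. A `False` here means the metadata in that copy is rejected; it says nothing about whether any payload is intact.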

Failure behavior

| Condition | Result |
| --- | --- |
| one valid dictionary | metadata completeness may be preserved |
| multiple valid dictionaries | consistency must be established before metadata is trusted |
| no valid dictionary | recovery may degrade to named or anonymous output depending on surviving evidence |
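The three conditions can be sketched as a single decision function. The page does not define what "consistency" means, so the sketch makes an assumption: valid dictionaries are consistent when no extent is mapped to two different names, and inconsistent dictionaries yield no trusted metadata at all. `Outcome` and `resolve_metadata` are hypothetical names.

```python
from enum import Enum


class Outcome(Enum):
    TRUSTED = "trusted"              # metadata may be used for naming
    INCONSISTENT = "inconsistent"    # valid copies disagree; nothing is trusted
    NO_DICTIONARY = "no_dictionary"  # recovery degrades; naming needs other evidence


def resolve_metadata(
    valid_dictionaries: list[dict[str, str]],
) -> tuple[Outcome, dict[str, str]]:
    """Combine already-validated dictionaries without preferring any one copy."""
    if not valid_dictionaries:
        return Outcome.NO_DICTIONARY, {}
    merged: dict[str, str] = {}
    for entries in valid_dictionaries:
        for extent, name in entries.items():
            # setdefault stores the first name seen and returns the stored one.
            if merged.setdefault(extent, name) != name:
                return Outcome.INCONSISTENT, {}
    return Outcome.TRUSTED, merged
```

Under this assumption, two valid dictionaries that cover different extents both contribute to the merged result, while a single conflicting entry withholds all names instead of picking a winner.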

Architectural consequence

Payload recovery must not depend on dictionary survival. Dictionary loss may degrade identity, but it must not be treated as equivalent to payload corruption.
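A small sketch shows how this separation might appear in a recovery result: payload status and identity are recorded in separate fields, and losing the dictionary changes only the second. `RecoveredExtent` and `describe` are hypothetical names, not part of crushr.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecoveredExtent:
    extent_id: str
    payload_verified: bool  # set by payload verification only
    name: Optional[str]     # None when no trusted metadata names this extent


def describe(
    extent_id: str,
    payload_verified: bool,
    trusted_names: Optional[dict[str, str]],
) -> RecoveredExtent:
    # Dictionary loss (trusted_names is None or empty) only removes the name;
    # it never changes the payload verdict.
    name = trusted_names.get(extent_id) if trusted_names else None
    return RecoveredExtent(extent_id, payload_verified, name)
```

With no trusted dictionary, a verified extent is still reported as verified and is simply left anonymous.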

Boundaries / Non-goals

This page does not authorize guessed names, repair behavior, or heuristic metadata synthesis.

Non-goals:

  • No best-effort reconstruction
  • No hidden failure smoothing
  • No compression-first tradeoffs
  • No external decode dependencies

Constraints

  • Dictionaries must remain small relative to payload
  • No cross-dependency between mirrored dictionaries may create a hidden central authority