Format Evolution¶

crushr is moving quickly, so this page is intended as a living map rather than a frozen history.

The short version is:

crushr started as an integrity-first archive format and is evolving into a salvage-first archive design where the strongest truth lives with the payload rather than in a fragile metadata hub.

This page exists to help readers understand how the format changed, what the harness proved, and why the current direction looks the way it does.

The original assumption¶

The early design assumption was familiar to anyone who has worked with archives:

archive integrity depends on metadata integrity
metadata corruption causes recovery failure

That assumption is reasonable because it describes how many traditional archive containers behave in practice.

The original goal of crushr was therefore framed around perfect integrity in the conventional sense.

The turning point¶

Once the corruption harness was in place, the project stopped being driven only by theory.

Instead, crushr began evolving through repeated destructive tests:

build deterministic archive variants
corrupt them in controlled ways
measure what still survives
keep what helps
cut what does not

That process changed the architecture.

Evolution timeline¶

flowchart TD
    A["Initial integrity-first archive idea"] --> B["Corruption harness introduced"]
    B --> C["FORMAT-05 self-identifying payload blocks"]
    C --> D["FORMAT-06 file manifests"]
    D --> E["FORMAT-07 graph-aware salvage reasoning"]
    E --> F["FORMAT-08 metadata placement comparison"]
    F --> G["FORMAT-09 metadata survivability audit"]
    G --> H["FORMAT-10 pruning / simplification direction"]

Phase-by-phase evolution¶

Early phase — integrity-first container thinking¶

At the start, crushr looked more like a traditional archival design problem:

protect metadata
duplicate metadata
make the container robust
recover files from preserved structural information

This phase was important because it established the baseline assumptions that the harness later challenged.

FORMAT-05 — self-identifying payload blocks¶

This was the major architectural shift.

Payload blocks gained enough local truth to be independently identified and verified.

That meant recovery no longer had to start from a surviving index. Salvage could begin with the payload itself.

What changed conceptually

old model:
metadata -> file structure -> payload

new model:
payload truth -> reconstruction -> metadata as supporting evidence

What the harness showed

This phase produced the first major recovery improvement. It was the point where crushr stopped looking like “archive with better metadata” and started looking like “archive where the data can still explain itself after damage.”

FORMAT-06 — file manifests¶

This phase added file-level truth so the system could better reason about:

file completeness
expected extent count
stronger recovery confidence

What changed conceptually

FORMAT-06 did not replace FORMAT-05. It layered file-level structure on top of block-level truth.

What the harness showed

FORMAT-06 improved confidence and verification detail more than it improved top-line recovery counts. That was still useful: not every phase needs to create dramatic jumps if it sharpens what the system can prove.

FORMAT-07 — graph-aware salvage reasoning¶

This phase changed the reasoning model.

Instead of flat metadata checks, salvage began reasoning over surviving verified relationships:

block belongs to extent
extent belongs to file
file links to name/path if that truth survives

What changed conceptually

Recovery classes became more explicit and defensible. The system could explain not just what was recovered, but why that level of recovery was justified.

FORMAT-08 — metadata placement comparison¶

This phase tested three metadata placement strategies:

fixed_spread
hash_spread
golden_spread

The expectation was that better placement might improve metadata survivability.

What the harness showed

All three strategies produced effectively identical results.

That suggested placement was not the real issue.

FORMAT-09 — metadata survivability and necessity audit¶

This phase asked the blunt question:

Do the extra metadata layers actually survive enough to matter?

What the harness showed

The answer was stark:

manifest checkpoint survival stayed at or near zero
path checkpoint survival stayed at or near zero
verified metadata node count stayed at or near zero

This strongly suggested that the current metadata layers were not driving resilience.

What the project learned¶

The experiments so far point to a simple but important conclusion:

traditional metadata is not the main resilience mechanism in crushr
self-identifying payload truth is

That does not mean metadata is useless.

It means metadata appears to be:

optional support
helpful for confidence and naming when it survives
not the primary source of recoverability

Current direction¶

The format is now moving toward a more minimal, evidence-driven architecture.

The working model looks like this:

minimal container framing
+ self-identifying payload blocks
+ salvage reasoning
+ only the metadata that proves its worth

This is a major refinement of the original mission.

The original goal was perfect integrity.

The newer, more realistic formulation is:

Preserve perfect integrity for whatever survives, and recover only what can still be proven.

Why this matters¶

This direction is unusual for archive formats.

Traditional archive design usually assumes that metadata must remain authoritative.

crushr is increasingly exploring a different principle:

data should carry enough truth to survive structural failure.

That is the core reason the project is interesting.

What comes next¶

The next major question is no longer “how do we add more metadata?”

It is:

which metadata layers actually matter?
which ones can be pruned?
how small and elegant can the format become without losing recovery capability?

That is why the post-FORMAT-09 direction is likely to emphasize simplification rather than growth.

Reading this page later¶

This page will need regular updates. The format is evolving quickly, and some current conclusions may be refined or overturned by later data.

That is expected.

The important thing is that crushr is not being shaped by attachment to earlier assumptions. It is being shaped by what survives the harness.

Format Evolution¶

The original assumption¶

The turning point¶

Evolution timeline¶

Phase-by-phase evolution¶

Early phase — integrity-first container thinking¶

FORMAT-05 — self-identifying payload blocks¶

FORMAT-06 — file manifests¶

FORMAT-07 — graph-aware salvage reasoning¶

FORMAT-08 — metadata placement comparison¶

FORMAT-09 — metadata survivability and necessity audit¶

What the project learned¶

Current direction¶

Why this matters¶

What comes next¶

Reading this page later¶

Read next¶