Benchmark contract (v0.4.17)¶
Intent¶
This page defines the locked benchmark methodology for crushr.
It describes the measurement contract used to evaluate compression, packing, and extraction behavior. It does not redefine product guarantees, recovery vocabulary, or operator-facing trust classes.
Guarantees¶
- Benchmark runs are attributable to dataset, command, comparator, and environment
- Preservation profile is explicit for every crushr benchmark run
- Results are recorded exactly as observed
- Missing metrics are reported explicitly rather than silently omitted
- Benchmark methodology remains reproducible and reviewable
Behavior¶
Principles¶
- Benchmarks must be reproducible.
- Benchmark runs must be attributable to dataset, tool/profile, command, and environment.
- Preservation profile must be explicit for every crushr run.
- Results are recorded exactly as observed; no silent exclusions.
Dataset classes¶
Deterministic datasets are generated through the canonical harness entrypoint:
scripts/benchmark/harness.py datasets
Legacy direct script paths (generate_datasets.py, run_benchmarks.py) are compatibility shims; use harness.py for reproducible operations and documentation parity.
Generated under .bench/datasets/:
small_mixed_tree:
- hundreds to low thousands of files
- mixed file sizes
- empty directories
- symlinks
- xattrs when supported by host/filesystem
medium_realistic_tree:
- tens of thousands of files
- mixed text/binary content
- nested project-like trees
large_stress_tree:
- high file count plus repeated large binaries
- designed to surface scaling behavior and memory pressure
Determinism controls:
- fixed generation seed
- deterministic payload bytes from digest expansion
- fixed mtime for generated files/directories
- emitted dataset_manifest.json with counts and byte totals
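The determinism controls above can be illustrated with a minimal sketch. It is not the harness's generator: the seed value, the timestamp, and the function names are hypothetical, and SHA-256 in counter mode is only one plausible reading of "digest expansion".

```python
import hashlib
import os
import tempfile

FIXED_SEED = b"crushr-bench-seed"  # hypothetical seed; the real value is not given here
FIXED_MTIME = 0                    # hypothetical fixed timestamp (epoch seconds)


def expand_digest(seed: bytes, rel_path: str, size: int) -> bytes:
    """Expand seed + path into `size` deterministic bytes (SHA-256, counter mode)."""
    out = bytearray()
    counter = 0
    while len(out) < size:
        block = hashlib.sha256(
            seed + rel_path.encode("utf-8") + counter.to_bytes(8, "big")
        ).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:size])


def write_file(root: str, rel_path: str, size: int) -> str:
    """Write one deterministic file under `root` and pin its mtime."""
    path = os.path.join(root, rel_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(expand_digest(FIXED_SEED, rel_path, size))
    os.utime(path, (FIXED_MTIME, FIXED_MTIME))  # (atime, mtime)
    return path
```

Because the payload depends only on the seed, the relative path, and the requested size, two hosts running the same generator produce byte-identical files, which is what lets the manifest's counts and byte totals serve as an identity check.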
Comparison set and commands¶
The benchmark harness executes a centralized comparator set from scripts/benchmark/contract.py.
Baseline (always):
- tar + zstd (zstd -3)
- tar + xz (xz -3)
- crushr pack --preservation full --level 3
- crushr pack --preservation basic --level 3
Optional experiment comparators (enabled via harness flags):
- tar + zstd with a deterministic trained dictionary (tar_zstd_dict)
- tar + zstd level variants (controlled by --zstd-levels)
- tar + zstd strategy variants (controlled by --zstd-strategies)
- deterministic file-ordering/locality variants for tar comparators (controlled by --ordering-strategies)
- deterministic lightweight content-class clustering for tar comparators (controlled by --content-class-strategy)
The comparator set, zstd level and strategy experiment model, dataset names, dictionary experiment model, and content-class experiment model are centralized and used by both run orchestration and benchmark assumptions fingerprinting.
Ordering experiments are centralized in the same model (lexical, size_ascending, size_descending, extension_grouped, kind_then_extension) and are applied only to tar comparators.
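As a rough illustration, the five ordering strategies can be modeled as sort keys. The strategy names come from this page; the `Entry` type, the meaning of `kind`, and the tie-breaking rule are assumptions made for the sketch.

```python
import os
from typing import Callable, NamedTuple


class Entry(NamedTuple):
    path: str  # path relative to the dataset root
    size: int  # size in bytes
    kind: str  # hypothetical label such as "dir", "file", or "symlink"


def _ext(entry: Entry) -> str:
    """Lower-cased file extension, or an empty string when there is none."""
    return os.path.splitext(entry.path)[1].lower()


# Every key ends with the path so that ties are broken deterministically.
ORDERINGS: dict[str, Callable[[Entry], tuple]] = {
    "lexical": lambda e: (e.path,),
    "size_ascending": lambda e: (e.size, e.path),
    "size_descending": lambda e: (-e.size, e.path),
    "extension_grouped": lambda e: (_ext(e), e.path),
    "kind_then_extension": lambda e: (e.kind, _ext(e), e.path),
}


def order_entries(entries: list[Entry], strategy: str) -> list[str]:
    """Return paths in the order they would be written to the tar input list."""
    return [e.path for e in sorted(entries, key=ORDERINGS[strategy])]
```

Ending each key with the path means the result does not depend on the order in which the filesystem enumerated the entries.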
Content-class clustering is benchmark-only and applied only to tar comparators. lightweight_v1 uses file extension plus a small leading-byte sample to classify files into structured_text_like, text_like, binary_like, or unknown_mixed, then keeps deterministic ordering inside each class.
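A classifier of this shape might look like the sketch below. Only the four class names are taken from this page; the extension tables, the sample size, the text test, and the class ordering are all hypothetical stand-ins for whatever lightweight_v1 actually uses.

```python
import codecs
import os

# Hypothetical extension tables; the real lightweight_v1 tables are not listed here.
STRUCTURED_EXTS = {".json", ".xml", ".yaml", ".yml", ".toml", ".csv"}
TEXT_EXTS = {".txt", ".md", ".rs", ".py", ".c", ".h"}
BINARY_EXTS = {".bin", ".png", ".jpg", ".zip", ".so"}
SAMPLE_BYTES = 512  # assumed size of the leading-byte sample
CLASS_ORDER = ["structured_text_like", "text_like", "binary_like", "unknown_mixed"]


def looks_like_text(sample: bytes) -> bool:
    """Text-like means: no NUL bytes and valid UTF-8 (a cut-off final character is allowed)."""
    if b"\x00" in sample:
        return False
    try:
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True


def classify(path: str, sample: bytes) -> str:
    """Combine the extension with the sample; disagreement falls back to unknown_mixed."""
    ext = os.path.splitext(path)[1].lower()
    textual = looks_like_text(sample)
    if ext in STRUCTURED_EXTS and textual:
        return "structured_text_like"
    if ext in TEXT_EXTS and textual:
        return "text_like"
    if ext in BINARY_EXTS and not textual:
        return "binary_like"
    return "unknown_mixed"


def cluster(files: dict[str, bytes]) -> list[str]:
    """Group paths by class, keeping lexical order inside each class."""
    def key(path: str) -> tuple:
        return (CLASS_ORDER.index(classify(path, files[path][:SAMPLE_BYTES])), path)
    return sorted(files, key=key)
```

Routing every disagreement between extension and content to unknown_mixed keeps the heuristic cheap and avoids guessing, at the cost of a larger catch-all class.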
Canonical command forms used by the harness:
tar --sort=name --mtime=@0 --owner=0 --group=0 --numeric-owner --pax-option=delete=atime,delete=ctime --no-recursion --verbatim-files-from -T <ordered_inputs.txt> -I 'zstd -3' -cf <archive.tar.zst>
tar --sort=name --mtime=@0 --owner=0 --group=0 --numeric-owner --pax-option=delete=atime,delete=ctime --no-recursion --verbatim-files-from -T <ordered_inputs.txt> -I 'xz -3' -cf <archive.tar.xz>
crushr pack <dataset> -o <archive.crs> --level 3 --preservation <full|basic> --silent
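For readers scripting against these forms, the pack commands can be assembled as argument lists. The sketch below copies the flags from the command forms above; the function names are hypothetical and are not taken from scripts/benchmark/contract.py.

```python
import shlex


def tar_pack_argv(compressor: str, inputs_file: str, archive: str) -> list[str]:
    """Build the tar pack command; `compressor` is e.g. 'zstd -3' or 'xz -3'."""
    return [
        "tar", "--sort=name", "--mtime=@0", "--owner=0", "--group=0",
        "--numeric-owner", "--pax-option=delete=atime,delete=ctime",
        "--no-recursion", "--verbatim-files-from",
        "-T", inputs_file, "-I", compressor, "-cf", archive,
    ]


def crushr_pack_argv(dataset: str, archive: str, preservation: str) -> list[str]:
    """Build the crushr pack command for one of the two baseline profiles."""
    if preservation not in ("full", "basic"):
        raise ValueError(f"unexpected preservation profile: {preservation!r}")
    return [
        "crushr", "pack", dataset, "-o", archive,
        "--level", "3", "--preservation", preservation, "--silent",
    ]


def as_shell(argv: list[str]) -> str:
    """Render an argument list as a shell-quoted string for logs and reports."""
    return shlex.join(argv)
```

Keeping the command as a list and quoting it only for display avoids shell-quoting mistakes around the `-I 'zstd -3'` argument, and the rendered string can be stored alongside each result so the run stays attributable to its exact command.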
Extraction command forms:
tar -xf <archive.tar.zst|archive.tar.xz> -C <out_dir>
crushr extract <archive.crs> -o <out_dir> --all --overwrite --silent
Metrics¶
Required:
- archive_size_bytes
- pack_time_ms (wall clock)
- extract_time_ms (wall clock)
- peak memory (pack_peak_rss_kb, extract_peak_rss_kb)
Optional (captured when available):
- CPU timings (*_user_time_ms, *_sys_time_ms)
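One way to honor the "missing metrics are reported explicitly" guarantee is to emit every metric key on every record. This is a sketch under assumptions: the `pack_`/`extract_` prefixes on the CPU fields are an assumed expansion of the `*_` wildcard, and the `missing_required`/`missing_optional` fields are hypothetical names, not part of the harness's result schema.

```python
REQUIRED = (
    "archive_size_bytes",
    "pack_time_ms",
    "extract_time_ms",
    "pack_peak_rss_kb",
    "extract_peak_rss_kb",
)
# Assumed expansion of *_user_time_ms / *_sys_time_ms.
OPTIONAL = (
    "pack_user_time_ms",
    "pack_sys_time_ms",
    "extract_user_time_ms",
    "extract_sys_time_ms",
)


def build_record(observed: dict) -> dict:
    """Return a record that carries every metric key, using None for unobserved values."""
    record = {name: observed.get(name) for name in REQUIRED + OPTIONAL}
    # Hypothetical bookkeeping fields that make the gaps visible to reviewers.
    record["missing_required"] = [n for n in REQUIRED if record[n] is None]
    record["missing_optional"] = [n for n in OPTIONAL if record[n] is None]
    return record
```

Observed values pass through unchanged, so the record stays an exact transcript of the run; a missing required metric is surfaced in the record rather than causing the run to be dropped.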
Reproducibility steps¶
From repo root:
cargo build --release -p crushr
python3 scripts/benchmark/harness.py full \
--clean \
--datasets .bench/datasets \
--crushr-bin target/release/crushr \
--content-class-strategy lightweight_v1 \
--output .bench/results/benchmark_results.json
Environment assumptions:
- Linux host
- GNU tar with --sort/--pax-option
- zstd, xz, and time available in PATH
- filesystem with symlink support
- xattrs are disabled by default (--xattrs off) for host-independent dataset identity
- optional xattr-inclusive runs must set --xattrs on, which changes dataset identity and must not be mixed with default results
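The tool assumptions above can be checked before a run with a small preflight script. This is a sketch, not part of the harness: the function names are hypothetical, and GNU tar is detected only by looking for the string "GNU tar" in the `tar --version` output.

```python
import shutil
import subprocess

REQUIRED_TOOLS = ("tar", "zstd", "xz", "time")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which) -> list:
    """Return the tools that cannot be found in PATH."""
    return [tool for tool in tools if which(tool) is None]


def is_gnu_tar(version_text: str) -> bool:
    """GNU tar identifies itself as 'tar (GNU tar) <version>' in --version output."""
    return "GNU tar" in version_text


def preflight() -> list:
    """Collect human-readable problems instead of failing on the first one."""
    problems = [f"{tool} not found in PATH" for tool in missing_tools()]
    if shutil.which("tar") is not None:
        result = subprocess.run(
            ["tar", "--version"], capture_output=True, text=True, check=False
        )
        if not is_gnu_tar(result.stdout):
            problems.append("tar is not GNU tar; --sort and --pax-option may be unavailable")
    return problems
```

Note that `shutil.which("time")` only finds a standalone `time` binary, not the shell builtin, which matches the requirement that `time` be available in PATH.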
Pack phase attribution (v0.4.17+)¶
crushr pack supports explicit pack-phase timing output with --profile-pack.
This is attribution-only instrumentation for local investigation. It is not a benchmark-score mode and is never enabled by default.
Expected output shape¶
--profile-pack appends a deterministic phase table after normal pack completion:
Pack phases
discovery <ms>
metadata <ms>
hashing <ms>
compression <ms>
emission <ms>
finalization <ms>
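For local investigation, the phase table can be read back into a dictionary. The parser below is a sketch that assumes each `<ms>` value is a plain integer or decimal number, optionally followed by `ms`, and that rows may be indented; the exact formatting of real crushr output is not specified on this page.

```python
import re

PHASES = ("discovery", "metadata", "hashing", "compression", "emission", "finalization")
# Assumed row shape: "<phase> <number>" with an optional "ms" suffix.
_ROW = re.compile(r"^\s*(\w+)\s+(\d+(?:\.\d+)?)\s*(?:ms)?\s*$")


def parse_pack_phases(output: str) -> dict:
    """Extract phase timings (milliseconds) that follow the 'Pack phases' header."""
    lines = output.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.strip() == "Pack phases"), None
    )
    if start is None:
        return {}  # --profile-pack was not enabled, or the shape differs
    phases = {}
    for line in lines[start + 1:]:
        match = _ROW.match(line)
        if match is None or match.group(1) not in PHASES:
            break  # stop at the first line that is not a known phase row
        phases[match.group(1)] = float(match.group(2))
    return phases
```

Summing the parsed values gives a quick sanity check against the overall pack time, but because this instrumentation is attribution-only, those sums should not be reported as benchmark scores.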
Limitations¶
- Peak RSS and CPU fields depend on the time implementation on the host
- xattr coverage is best-effort: it is available only on supporting filesystems and may be partial even there
- Raw benchmark outputs are not product claims until reviewed comparatively
Boundaries / Non-goals¶
This page does not define recovery semantics, extraction trust classes, or product identity.
Non-goals:
- No best-effort reconstruction
- No hidden failure smoothing
- No compression-first tradeoffs
- No external decode dependencies