PRD: Deterministic Simulation, Single RNG, and Differential Replay Validation
Last updated: 2026-02-07
Context
We are porting the original game with high fidelity. The original uses a single RNG stream; the rewrite historically separated simulation/presentation RNG and is migrating to single-stream behavior. We need an architecture that supports:
- deterministic interactive play
- deterministic replay playback
- deterministic headless verification
- eventual differential testing against captures from the original game
- future multiplayer (LAN first, then networked lockstep improvements)
- highly studyable code that reads like an executable reference spec
Problem Statement
Current deterministic infrastructure is strong but still has split-brain risk:
- RNG stream split (sim vs presentation) can diverge from original behavior.
- Live, replay, and headless modes are closer now, but not yet fully unified around a single-stream model.
- Differential testing against original captures needs first-class tick contracts and stable diagnostics.
Product Goals
- Use one authoritative RNG stream per world/session, matching original behavior.
- Make headless simulation a first-class runtime mode that emits deterministic presentation commands.
- Make renderer/audio consumers of commands only (no RNG, no gameplay mutations).
- Verify deterministic parity via replay + sidecar checkpoints and command hashes.
- Enable differential testing against original memory-hook captures and first-divergence investigation.
- Keep architecture compatible with future multiplayer lockstep.
- Organize gameplay features into small intent-focused modules (for example crimson/bonuses/fire_bullets.py) with explicit hook entrypoints.
Non-Goals (for this PRD scope)
- Implement full multiplayer networking stack.
- Fully redesign rendering assets or UI systems.
- Solve every remaining gameplay parity gap unrelated to deterministic pipeline.
Guiding Principles
- Simulation is authoritative.
- Input stream is the only external driver of simulation.
- Presentation commands are deterministic outputs of simulation tick processing.
- Rendering/audio are side-effect consumers, not simulation participants.
- Determinism diagnostics must identify first divergence quickly.
- Code should prioritize readability and traceability over clever abstraction.
Target Architecture
Core Tick Contract
Single entrypoint (conceptual):
tick(input_frame) -> TickResult(state_delta, presentation_commands, diagnostics)
Where:
- input_frame is per-player input for one tick.
- presentation_commands includes render/sfx/music intent data only.
- diagnostics includes hashes and RNG markers.
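The contract above can be sketched in a few lines. This is a minimal illustration, not the project's actual API: `InputFrame`, `World`, the move-based state, and the `footstep` command are all hypothetical, and the world is passed explicitly to `tick` so the example stays self-contained.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InputFrame:
    # One (dx, dy) move per player, in player-index order (hypothetical layout).
    moves: tuple


@dataclass(frozen=True)
class TickResult:
    state_delta: dict
    presentation_commands: tuple
    diagnostics: dict


@dataclass
class World:
    tick_index: int = 0
    positions: list = field(default_factory=lambda: [[0, 0]])


def tick(world: World, input_frame: InputFrame) -> TickResult:
    """Advance the world by one tick and report what changed."""
    state_delta = {}
    commands = []
    for player_index, (dx, dy) in enumerate(input_frame.moves):
        position = world.positions[player_index]
        position[0] += dx
        position[1] += dy
        state_delta[player_index] = (position[0], position[1])
        if dx or dy:
            # Intent data only: no asset handles, no renderer objects.
            commands.append(("sfx", "footstep"))
    world.tick_index += 1
    state_bytes = repr((world.tick_index, world.positions)).encode("utf-8")
    diagnostics = {
        "tick_index": world.tick_index,
        "state_hash": hashlib.sha256(state_bytes).hexdigest(),
    }
    return TickResult(state_delta, tuple(commands), diagnostics)


# Two fresh worlds fed the same input frame yield equal results.
first = tick(World(), InputFrame(moves=((1, 0),)))
second = tick(World(), InputFrame(moves=((1, 0),)))
```

Because the result is a plain value, equality between `first` and `second` is the whole determinism check at this level.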
Runtime Modes (all call same tick contract)
- Interactive mode:
  - live inputs -> tick -> consume presentation commands.
- Replay playback mode:
  - replay inputs -> tick -> consume presentation commands.
- Headless verification mode:
  - replay inputs -> tick -> discard presentation commands, compare diagnostics/checkpoints.
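One way to picture the three modes is a single session loop where only the command consumer differs. The sketch below is illustrative: the toy `tick`, `run_session`, and the integer input frames are assumptions made for the example, not names from the codebase.

```python
import hashlib
from typing import Callable, Iterable, List


def tick(state: dict, input_frame: int):
    """Toy tick: mixes the input into the state and returns (commands, state_hash)."""
    state["value"] = (state["value"] * 31 + input_frame) % 1_000_003
    commands = [f"draw:{state['value']}"]
    state_hash = hashlib.sha256(str(state["value"]).encode("utf-8")).hexdigest()
    return commands, state_hash


def run_session(inputs: Iterable[int], consume: Callable[[List[str]], None]) -> List[str]:
    """Drive the shared tick; the mode is defined only by what `consume` does."""
    state = {"value": 1}
    hashes = []
    for input_frame in inputs:
        commands, state_hash = tick(state, input_frame)
        consume(commands)
        hashes.append(state_hash)
    return hashes


replay_inputs = [3, 1, 4, 1, 5]

# Playback mode: commands are handed to a consumer (here, just recorded).
drawn: List[str] = []
playback_hashes = run_session(replay_inputs, drawn.extend)

# Headless verification mode: commands are discarded, hashes are kept.
headless_hashes = run_session(replay_inputs, lambda commands: None)
```

Interactive mode would use the same loop with live inputs in place of `replay_inputs`.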
RNG Model
- One RNG stream owned by gameplay state.
- Presentation planning consumes that same stream in deterministic order.
- RNG diagnostics are phase-labeled for drift localization.
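A rough sketch of a single stream with phase-labeled marks follows. The `MarkedRng` class, the phase names, and the linear congruential constants are illustrative assumptions; the original game's generator is not specified in this document.

```python
from dataclasses import dataclass, field


@dataclass
class MarkedRng:
    state: int
    draws: int = 0
    # Each mark records (phase label, draws so far, RNG state at phase start).
    marks: list = field(default_factory=list)

    def begin_phase(self, phase: str) -> None:
        self.marks.append((phase, self.draws, self.state))

    def rand(self) -> int:
        # Illustrative LCG constants only; not claimed to match the original.
        self.state = (self.state * 214013 + 2531011) & 0xFFFFFFFF
        self.draws += 1
        return (self.state >> 16) & 0x7FFF


def run_tick(rng: MarkedRng):
    """Simulation and presentation planning draw from the same stream, in order."""
    rng.begin_phase("sim")
    spawn_roll = rng.rand() % 100
    rng.begin_phase("presentation")
    decal_variant = rng.rand() % 4
    return spawn_roll, decal_variant


rng_a = MarkedRng(state=1234)
rng_b = MarkedRng(state=1234)
result_a = run_tick(rng_a)
result_b = run_tick(rng_b)
```

When two runs drift, comparing `marks` pairwise shows the first phase whose starting state or draw count differs, which narrows the search to one phase of one tick.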
Presentation Layer Boundaries
Simulation emits data commands (no texture handles, no raylib objects). Renderer/audio layer maps commands to concrete assets/effects.
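Keeping commands as plain data also makes them easy to hash. The sketch below assumes a hypothetical `DecalCommand` shape, a canonical-JSON hashing scheme, and a made-up asset table; the real command schema and hash function may differ.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecalCommand:
    # Data only: identifiers and coordinates, never texture handles.
    kind: str
    decal_id: str
    x: int
    y: int


def command_hash(commands) -> str:
    """Hash commands via a canonical JSON encoding (order-sensitive on purpose)."""
    payload = json.dumps(
        [asdict(command) for command in commands],
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Renderer-side mapping from command ids to concrete assets (hypothetical paths).
ASSET_TABLE = {
    "scorch_small": "decals/scorch_small.png",
    "scorch_large": "decals/scorch_large.png",
}


def resolve_asset(command: DecalCommand) -> str:
    return ASSET_TABLE[command.decal_id]


small = DecalCommand("decal", "scorch_small", 10, 20)
large = DecalCommand("decal", "scorch_large", 30, 40)
hash_forward = command_hash([small, large])
hash_reversed = command_hash([large, small])
```

Only `resolve_asset` knows about files; the simulation side never sees the asset table.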
Feature Module Topology and Hooks
Simulation behavior should be split into small feature modules with clear ownership.
Examples:
- crimson/bonuses/fire_bullets.py
- crimson/bonuses/freeze.py
- crimson/perks/hot_tempered.py
- crimson/weapons/rocket_launcher.py
Each feature module should contain:
- behavior: deterministic update/apply logic
- intent/spec notes: short docstring/comments describing original intent/parity caveats
- config/constants: local constants close to logic
- hook functions: explicit entrypoints used by the main tick pipeline
The tick pipeline should dispatch through registries or ordered hook lists rather than large mode-specific god functions.
Functional Requirements
- Single RNG stream is used for both gameplay and presentation command planning.
- No renderer/audio code is allowed to consume simulation RNG.
- Replay checkpoints include:
  - state_hash
  - command_hash
  - rng_state
  - selected rng_marks
- Replay verification compares in this order:
  - command hash
  - state hash
  - detailed field diffs
- Differential runner can compare rewrite outputs against original-capture sidecars and report first mismatch tick.
- Input contract supports N players with deterministic ordering by player index.
- Feature hooks are deterministic and side-effect-bounded (no hidden renderer/audio coupling).
- Core tick orchestration module remains small and delegates feature logic to dedicated modules.
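The comparison order in the replay verification requirement can be sketched as below. The `Checkpoint` shape, the `fields` dictionary used for detailed diffs, and the message format are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    tick_index: int
    state_hash: str
    command_hash: str
    rng_state: int
    fields: dict  # selected world fields kept for detailed diffs


def compare_checkpoint(expected: Checkpoint, actual: Checkpoint) -> Optional[str]:
    """Return a mismatch description, or None when the checkpoints agree."""
    # 1. Command hash first: cheap, and fails fast on presentation drift.
    if expected.command_hash != actual.command_hash:
        return f"tick {expected.tick_index}: command_hash mismatch"
    # 2. State hash second.
    if expected.state_hash != actual.state_hash:
        # 3. Only on a state mismatch, pay for the detailed field diff.
        keys = expected.fields.keys() | actual.fields.keys()
        differing = sorted(
            key for key in keys if expected.fields.get(key) != actual.fields.get(key)
        )
        return f"tick {expected.tick_index}: state_hash mismatch, fields {differing}"
    return None


good = Checkpoint(60, "s1", "c1", 42, {"score": 100, "creatures": 7})
bad_state = Checkpoint(60, "s2", "c1", 42, {"score": 100, "creatures": 8})
bad_commands = Checkpoint(60, "s2", "c9", 42, {"score": 100, "creatures": 8})
```

Here `compare_checkpoint(good, bad_commands)` reports the command mismatch even though the state also differs, matching the fast-fail behavior described in Phase 0.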
Data Contracts
Replay/Checkpoint Sidecar (Rewrite)
Per sampled tick:
- tick index
- state hash
- command hash
- rng state
- rng marks
- event summary
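As a rough illustration of the per-tick record, the sketch below stores one JSON object per line. The `SidecarRecord` field types, the JSON Lines layout, and the helper names are assumptions; the actual on-disk format in src/crimson/replay/checkpoints.py may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SidecarRecord:
    tick_index: int
    state_hash: str
    command_hash: str
    rng_state: int
    rng_marks: dict      # phase label -> RNG state at phase start (hypothetical)
    event_summary: dict  # event name -> count (hypothetical)


def dump_sidecar(records) -> str:
    """Serialize records as one sorted-key JSON object per line."""
    return "\n".join(json.dumps(asdict(record), sort_keys=True) for record in records)


def load_sidecar(text: str) -> list:
    """Parse the text produced by dump_sidecar back into records."""
    return [SidecarRecord(**json.loads(line)) for line in text.splitlines() if line.strip()]


records = [
    SidecarRecord(0, "s0", "c0", 1234, {"sim": 1234}, {"spawn": 1}),
    SidecarRecord(60, "s1", "c1", 98765, {"sim": 555, "presentation": 777}, {"hit": 3}),
]
text = dump_sidecar(records)
restored = load_sidecar(text)
```

Sorted keys keep the file byte-stable across runs, which makes sidecars diffable with ordinary text tools.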
Original Capture Sidecar (Target)
Per sampled tick (minimum viable set):
- tick index
- input snapshot (optional in sidecar if already in replay)
- command/event summary compatible with rewrite command hashing
- rng state (if capturable)
- selected world fields for debugging
Rollout Plan and Progress
Phase 0: Foundation (Done)
- Shared deterministic tick pipeline introduced (sim/step_pipeline.py).
- Live runtime, replay runners, and replay playback wired to shared deterministic step path.
- Checkpoints now carry command_hash.
- Replay verify fast-fails on command_hash mismatch.
- Optional RNG trace mode added (--trace-rng).
- Live-vs-headless parity tests for Survival/Rush added.
Phase 1: Single RNG Stream Migration (In Progress)
- Remove separate presentation RNG usage from runtime/replay code paths.
- Feed presentation planning from the authoritative simulation RNG.
- Preserve deterministic command order and existing parity behavior.
- Update RNG tests to assert single-stream invariants.
- Document any intentional fidelity deviations discovered during migration (none introduced in this migration slice; parity guarded by deterministic command/state hash tests).
Phase 2: Headless-First Runtime API
- Define explicit headless session API for stepping ticks and collecting outputs.
- Refactor interactive/replay entrypoints to use headless session adapter directly.
- Ensure render/audio layers consume commands only.
- Add smoke tests for all three runtime modes calling the same tick API.
Current scope for the checked items is Survival/Rush deterministic loops (interactive + playback + replay verification).
Phase 2.5: Studyability-First Module Refactor
- Define module conventions for feature files (behavior, intent/spec, config/constants, hooks).
- Introduce hook registries per subsystem (perks world-step hooks, bonus pickup FX hooks, presentation projectile-decal hooks).
- Migrate high-churn features first (Fire Bullets impact decals, Freeze pickup/presentation helpers, Reflex Boosted and Final Revenge perk hooks) into dedicated modules.
- Add lightweight architecture tests/checks that prevent growth of monolithic tick functions.
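One possible shape for such an architecture check is a function-length limit computed from the syntax tree. The helper name `oversized_functions`, the sample source, and the line threshold are illustrative assumptions, not an agreed project rule.

```python
import ast


def oversized_functions(source: str, max_lines: int) -> list:
    """Return sorted (name, line_count) pairs for functions longer than max_lines."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on AST nodes since Python 3.8.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                offenders.append((node.name, length))
    return sorted(offenders)


SAMPLE_SOURCE = (
    "def small():\n"
    "    return 1\n"
    "\n"
    "def big():\n"
    "    a = 1\n"
    "    b = 2\n"
    "    c = 3\n"
    "    return a + b + c\n"
)

offenders = oversized_functions(SAMPLE_SOURCE, max_lines=3)
```

In a test suite, the same helper could read the tick orchestration module's source and assert that the offender list is empty.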
Phase 3: Differential Testing with Original Captures
- Define original capture schema and conversion pipeline.
- Add comparator that reports first divergence tick with command/state/rng context.
- Add tooling command to run diff quickly on replay + sidecar pairs.
- Add at least one golden differential fixture from original capture.
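A first-divergence comparator along these lines could be sketched as follows. The dictionary record layout and the rule of skipping fields the original capture lacks are assumptions; the capture schema is still to be defined in this phase.

```python
from typing import Optional

COMPARE_KEYS = ("command_hash", "state_hash", "rng_state")


def first_divergence(rewrite: list, original: list) -> Optional[dict]:
    """Return context for the earliest mismatching tick, or None if all agree."""
    original_by_tick = {record["tick_index"]: record for record in original}
    for record in sorted(rewrite, key=lambda item: item["tick_index"]):
        other = original_by_tick.get(record["tick_index"])
        if other is None:
            continue  # tick not sampled in the original capture
        for key in COMPARE_KEYS:
            if key not in other:
                continue  # field not capturable from the original
            if other[key] != record.get(key):
                return {
                    "tick_index": record["tick_index"],
                    "field": key,
                    "rewrite": record.get(key),
                    "original": other[key],
                }
    return None


rewrite_sidecar = [
    {"tick_index": 0, "command_hash": "c0", "state_hash": "s0", "rng_state": 1},
    {"tick_index": 60, "command_hash": "c1", "state_hash": "s1", "rng_state": 2},
    {"tick_index": 120, "command_hash": "c2", "state_hash": "s2", "rng_state": 3},
]
original_sidecar = [
    {"tick_index": 0, "command_hash": "c0", "state_hash": "s0", "rng_state": 1},
    {"tick_index": 60, "command_hash": "c1", "state_hash": "sX"},
    {"tick_index": 120, "command_hash": "cX", "state_hash": "sY"},
]
divergence = first_divergence(rewrite_sidecar, original_sidecar)
```

The later mismatch at tick 120 is never reported, since everything after the first divergence is usually a consequence of it.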
Phase 4: Multiplayer Determinism Readiness
- Promote per-player input frame contract as first-class (deterministic ordering).
- Add lockstep-oriented tick validation tests for multi-player input streams.
- Ensure replay and checkpoint formats remain compatible with multiplayer expansion.
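Deterministic ordering by player index can be illustrated with a small encoder. The byte layout, the `(move_x, move_y, fire)` input tuple, and the function name are hypothetical; the real replay format is not specified here.

```python
import struct


def encode_input_frame(tick_index: int, inputs_by_player: dict) -> bytes:
    """Encode one tick of inputs, always iterating players in index order."""
    out = struct.pack("<IB", tick_index, len(inputs_by_player))
    for player_index in sorted(inputs_by_player):
        move_x, move_y, fire = inputs_by_player[player_index]
        # Unsigned player index, signed move axes, fire flag as 0 or 1.
        out += struct.pack("<BbbB", player_index, move_x, move_y, int(fire))
    return out


# The same inputs arriving in different orders (for example, from the network).
arrival_a = {0: (1, 0, True), 1: (-1, 1, False)}
arrival_b = {1: (-1, 1, False), 0: (1, 0, True)}

frame_a = encode_input_frame(7, arrival_a)
frame_b = encode_input_frame(7, arrival_b)
```

Sorting at the encoding boundary means arrival order never leaks into the replay stream or into the tick.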
Acceptance Criteria
- Given same replay input stream, interactive, playback, and headless modes produce identical command_hash and state_hash at sampled ticks.
- Only one RNG state exists for simulation tick processing and presentation planning.
- Diff tool reports first divergence tick and includes enough context to debug without manual binary tracing.
- Multiplayer input schema is deterministic and replay-compatible.
- Feature behavior is discoverable by file path and hook name (engineers can locate behavior quickly without tracing giant modules).
Risks and Mitigations
- Risk: single-stream migration changes existing behavior unexpectedly. Mitigation: migrate behind focused parity tests and checkpoint comparisons.
- Risk: presentation command schema grows too engine-specific. Mitigation: enforce data-only command contract and keep renderer mapping separate.
- Risk: original capture mapping lacks one-to-one fields early. Mitigation: compare command/event summaries first, then deepen field coverage iteratively.
Open Questions
- Which original-memory fields are cheapest/highest-value for early sidecar capture?
- Do we need tick-level floating-point normalization for capture comparisons?
- Should RNG marks be persisted in all checkpoints or only debug mode to keep files smaller?
References
- docs/rewrite/deterministic-step-pipeline.md
- src/crimson/sim/step_pipeline.py
- src/crimson/replay/checkpoints.py
- src/crimson/cli.py