feat(fuzz): add egfx_surface_state state-machine oracle#1337
Open
Greg Lamberson (glamberson) wants to merge 3 commits into
Open
feat(fuzz): add egfx_surface_state state-machine oracle#1337Greg Lamberson (glamberson) wants to merge 3 commits into
Greg Lamberson (glamberson) wants to merge 3 commits into
Conversation
…types Extends the Arbitrary cascade from PR Devolutions#1272 (ironrdp-pdu) and PR Devolutions#1284 (extension to pdu types in ironrdp-pdu's egfx-related modules) to the ironrdp-egfx crate's own public type surface. Adds the arbitrary feature to ironrdp-egfx and adds the standard #[cfg_attr(feature = "arbitrary", derive(arbitrary::Arbitrary))] gate to every public PDU type in crates/ironrdp-egfx/src/pdu/: - common.rs: Point, Color, PixelFormat - avc.rs: Avc420BitmapStream, Avc444BitmapStream, Avc420Region - cmd.rs: GfxPdu plus all 23 variant types, RawCapabilitySet, CapabilitySet, CapabilityVersion, Codec1Type, Codec2Type, QueueDepth, CacheEntryMetadata, and the six CapabilitiesV*Flags bitflags structs. Three types use a manual Arbitrary impl rather than the derive because their fields have implicit wire-bit-width constraints the derive cannot express. derive(Arbitrary) on these types generates values in the full underlying type range, which the encoder then panics on when packing into the narrower wire range. Manual impls mask each field to its spec-defined width: - Timestamp (cmd.rs): milliseconds and hours into 10 bits each, seconds and minutes into 6 bits each. The encoder packs all four into a single u32 via set_bits. - QuantQuality (avc.rs): quantization_parameter into 6 bits. Encoder packs into a u8 via set_bits(0..6). - Encoding bitflag (avc.rs): masked to 2 bits. The bitflag's `const _ = !0` clause otherwise accepts any u8 value, but the encoder for Avc444BitmapStream packs encoding.bits() into bits 30..32 of the stream-info field. The arbitrary feature cascades through ironrdp-pdu/arbitrary so types in egfx's variant payloads (ExclusiveRectangle, etc.) pick up their existing Arbitrary impls. bitflags/arbitrary is activated so the CapabilitiesV*Flags bitflag types get Arbitrary via the bitflags crate's own derive support. Adds the ironrdp-egfx/arbitrary case to the xtask feature matrix (xtask/src/features.rs), mirroring the ironrdp-pdu/arbitrary entry, so CI verifies the feature builds cleanly. Default build (no arbitrary feature) is unchanged. Existing tests pass. cargo xtask check features --case ironrdp-egfx/arbitrary clean. This is the prerequisite cascade for the egfx_multi_frame fuzz target (target 5 under Devolutions#1316), which will use Arbitrary-derived Vec<GfxPdu> to drive a single decoder + surface-cache pair across each fuzz iteration. The target itself follows in a separate PR. Per the Devolutions#1122 guidance ("either implement Arbitrary manually and reject invalid values, or generate a raw value and then sanitize it"), the three manual impls here follow the sanitize-via-mask shape so generated values always round-trip through Encode. Refs Devolutions#1316.
Implements target 5 of Devolutions#1316 (egfx fuzz-coverage umbrella): a multi-frame oracle that drives a sequence of Arbitrary-derived GfxPdus through a single GraphicsPipelineClient + surface-cache pair across each fuzz iteration, exposing cross-PDU state corruption that single-shot fuzzers cannot reach. Harness shape (per the design note posted as issuecomment-4559459227 on Devolutions#1316). The oracle: 1. Constructs one GraphicsPipelineClient (no-op handler, no H.264 decoder) at iteration start. The surface cache, state machine, and ZGFX decompressor history are owned by this single instance. 2. Initialises the channel by calling DvcProcessor::start so the client moves into its post-start state. 3. Uses arbitrary::Unstructured to derive Vec<GfxPdu> from the raw fuzz input. Each GfxPdu variant is Arbitrary via the cascade in PR Devolutions#1334. 4. For each PDU: encodes back to wire bytes via encode_vec, wraps in a single uncompressed ZGFX segment via wrap_uncompressed, and feeds the wire bytes to the client's public DvcProcessor::process entry point. Errors and panics propagate to libFuzzer naturally. The harness deliberately routes through the public process() entry rather than the private handle_pdu so the dispatch path exercises the same ZGFX-decompress + GfxPdu-decode + per-variant-dispatch + state machine + surface cache flow that production traffic takes. The uncompressed ZGFX wrapper bypasses the ZGFX decoder layer (already covered by egfx_zgfx_decompress in a separate PR) and concentrates fuzz pressure on the dispatch + state machine. What this catches: panics or sanitizer reports along the dispatch + state machine path under adversarially-ordered PDU sequences; inconsistent surface-cache state under attacker-controlled CreateSurface / DeleteSurface / Map* orderings; corrupted frame-id state from interleaved StartFrame / EndFrame / FrameAcknowledge sequences; ZGFX-wrapper integration bugs separate from the standalone ZGFX coverage. What this does NOT catch: cross-frame H.264 decoder state corruption. The client is constructed with h264_decoder: None, so H264-bearing PDUs (WireToSurface1 with AVC codecs) skip the H.264 decoder. The standalone egfx_avc420_decode and egfx_avc444_decode targets cover the H.264 wrapper. Wiring a real (or mock) H.264 decoder into this harness can be a follow-up if frame-to-frame H.264 state coverage surfaces as a gap. Bug discovery during development. The first smoke fuzz on the in-progress harness surfaced a bit_field::set_bits panic in Timestamp::encode where derive(Arbitrary) generated u8/u16 values exceeding the encoder's 6-bit / 10-bit pack widths. The fix landed in PR Devolutions#1334 (the cascade prerequisite) as manual Arbitrary impls for Timestamp, QuantQuality, and the Encoding bitflag, mapping each field to its wire-allowed range. This is the "sanitize via mask" shape from Devolutions#1122's body. Validation. 15-minute rigorous libFuzzer + ASan smoke fuzz on the final SHA: 3,659,157 iterations in 901 seconds (~4,061 exec/s sustained), peak RSS 96 MB, zero panics, zero sanitizer reports, zero OOMs. Corpus grew to ~4,558 entries via libFuzzer's coverage discovery. The regression-replay test check_egfx_multi_frame passes against the seed-empty corpus entry. Depends on PR Devolutions#1334 (Arbitrary cascade across ironrdp-egfx public PDU types). CI on this PR's branch reds until Devolutions#1334 merges or this PR's branch rebases onto master post-merge. That's the correct mechanical signal of the dependency per the no-exclude-then-follow-up rule. Refs Devolutions#1316.
Implements target 6 of Devolutions#1316 (egfx fuzz-coverage umbrella): the egfx_surface_state oracle. Sibling of egfx_multi_frame (target 5, PR Devolutions#1336) with two distinct properties: 1. The PDU stream is narrowed to a SurfaceLifecyclePdu enum that contains only the seven state-affecting variants: ResetGraphics, CreateSurface, DeleteSurface, MapSurfaceToOutput, StartFrame, EndFrame, FrameAcknowledge. Each iteration concentrates fuzz pressure on the surface and frame state machine rather than spreading across all 23 GfxPdu variants. 2. The oracle maintains a parallel ExpectedState model alongside the client. For each successfully-encoded PDU the model is updated to mirror the client's documented state transition (CreateSurface inserts a surface unless width or height is zero; DeleteSurface removes; MapSurfaceToOutput updates is_mapped and origin; ResetGraphics clears all surfaces; EndFrame increments the running frame counter). After dispatch the client's observable state via the public get_surface and total_frames_decoded getters is asserted against the model. This catches a different bug class from egfx_multi_frame's panic / sanitizer oracle: logic bugs in state transitions that produce wrong but non-crashing observable state. Examples covered: client retains a surface after DeleteSurface, client clobbers is_mapped on a surface that did not receive MapSurfaceToOutput, client fails to clear all surfaces on ResetGraphics, client increments total_frames_decoded on a PDU other than EndFrame. The model encodes the implementation's current documented behavior (e.g., zero-dimension CreateSurface is silently skipped per handle_create_surface), so the oracle catches drift from that behavior. Spec-compliance verification is a separate concern that would require a parallel spec-only model. What this does NOT catch: state transitions that exist only in private fields without a public getter (current_frame_id, frames_queued). Exposing observable access to those fields is a separate egfx-public-API change, out of scope here. Validation. 15-minute rigorous libFuzzer + ASan smoke fuzz on the final SHA: 2,770,751 iterations in 901 seconds (~3,075 exec/s sustained), peak RSS 102 MB, zero panics, zero sanitizer reports, zero state-model assertion failures. Corpus grew to ~1,840 entries via libFuzzer's coverage discovery. The check_egfx_surface_state regression-replay test passes against the seed-empty corpus entry. Stack dependency. This PR is stacked on PR Devolutions#1336 (egfx_multi_frame, target 5) which is stacked on PR Devolutions#1334 (Arbitrary cascade). CI on this branch will red until both ancestors merge or this branch rebases onto master post-merge. With this PR the Devolutions#1316 umbrella has its initial six target slots filled (1 merged, 2+2b+3+5+6 open). Refs Devolutions#1316. Depends on Devolutions#1334 and Devolutions#1336.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
egfx_surface_statestate-machine oracle. Sibling ofegfx_multi_frame(PR feat(fuzz): add egfx_multi_frame oracle and target #1336) with two distinct properties.SurfaceLifecyclePduenum that contains only the seven state-affecting variants (ResetGraphics,CreateSurface,DeleteSurface,MapSurfaceToOutput,StartFrame,EndFrame,FrameAcknowledge). Each iteration concentrates fuzz pressure on the surface and frame state machine rather than spreading across all 23GfxPduvariants.ExpectedStatestruct mirrors the client's documented per-surface fields (id,width,height,pixel_format,is_mapped,output_origin_x/y) plus thetotal_frames_decodedcounter. For each successfully-encoded PDU the model is updated to mirror the client'shandle_*transitions; after dispatch the client's observable state via the publicget_surfaceandtotal_frames_decodedgetters is asserted against the model.What this catches
A different bug class from
egfx_multi_frame's panic / sanitizer oracle: logic bugs in state transitions that produce wrong but non-crashing observable state. Examples:DeleteSurface(model removed it, client did not).is_mappedon a surface that did not receiveMapSurfaceToOutput.ResetGraphics.total_frames_decodedon a PDU other thanEndFrame.The model encodes the implementation's current documented behavior (e.g.,
CreateSurfacewithwidth == 0orheight == 0is silently skipped perhandle_create_surface,total_frames_decodedis incremented unconditionally byhandle_end_frameeven when no matchingStartFramepreceded it). The oracle catches drift from that behavior. Spec-compliance verification is a separate concern.What this does NOT catch
State transitions that exist only in private fields without a public getter (
current_frame_id,frames_queued). Wiring observable access to those fields is a separate egfx-public-API change, out of scope here.Dependencies
This PR is stacked on PR #1336 (
egfx_multi_frame, target 5) which is stacked on PR #1334 (Arbitrary cascade). CI on this branch reds until both ancestors merge or this branch rebases onto master post-merge. That is the correct mechanical signal of the dependency per the no-exclude-then-follow-up rule.Validation
cargo xtask ciclean locally on1.89.0: fmt, lints, tests, typos, locks all green.check_egfx_surface_stateregression-replay test passes against the seed-empty corpus entry.Notes
SurfaceLifecyclePduenum lives oracle-private inside the function (not exported fromironrdp-egfx), keeping the egfx public surface clean per the fallback shape from the design discussion on egfx: extend fuzz coverage to OpenH264-wrapper, ZGFX, multi-frame state, and encoder round-trip #1316.as _imports forArbitraryandDvcProcessorper clippy'sunused_trait_nameslint, matching theegfx_multi_framestyle.#[expect(clippy::panic, reason = ...)]and# Panicsdoc section match the precedent frombulk_round_tripfor assertion-based oracles.With this PR the #1316 umbrella has its initial six target slots filled (1 merged, 2+2b+3+5+6 open).
Refs #1316. Depends on #1334 and #1336.