🐛 Fix RL Training and Improve Structure#573
Conversation
|
It looks like #572 has created some conflicts here. Since the |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
📝 WalkthroughSummary by CodeRabbit
WalkthroughThis pull request updates action validation in the reinforcement learning module by changing the Changes
Sequence Diagram(s)sequenceDiagram
participant Agent as RL Agent
participant Env as PredictorEnv
participant Circuit as QuantumCircuit
participant Layout as Layout Object
participant CouplingMap as Device CouplingMap
Agent->>Env: Request valid actions for current state
Env->>Circuit: Check circuit properties
Env->>Layout: Retrieve layout information
Env->>Env: is_circuit_laid_out(circuit, layout)
rect rgba(100, 150, 200, 0.5)
Note over Env: Verify all logical qubits<br/>mapped to physical qubits
end
Env->>CouplingMap: Get device coupling map
Env->>Env: is_circuit_routed(circuit, coupling_map)
rect rgba(100, 150, 200, 0.5)
Note over Env: Verify directed 2-qubit gates<br/>respect device topology
end
Env->>Env: determine_valid_actions_for_state()
rect rgba(100, 150, 200, 0.5)
Note over Env: Synthesize valid actions based on<br/>synthesis, layout, routing, mapping states
end
Env->>Agent: Return list of valid action indices
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/mqt/predictor/rl/predictorenv.py`:
- Around line 435-447: The current is_circuit_laid_out only checks qubits that
appear in instructions and misses idle logical qubits; update
is_circuit_laid_out (and its TranspileLayout handling) to validate every logical
qubit from the circuit (e.g., iterate circuit.qubits or
range(circuit.num_qubits)) against layout.get_virtual_bits() instead of
iterating only instr.qubits, and return False if any circuit qubit is not
present in v2p; keep the existing fallback to layout.final_layout or
layout.initial_layout and handle missing/None layout gracefully.
- Around line 360-363: Replace the eager string concatenation in the VF2Layout
check with logger-style formatting: instead of adding
pm.property_set["VF2Layout_stop_reason"] to the message, call logger.warning
with a format string and pass pm.property_set["VF2Layout_stop_reason"] as a
separate argument (referencing VF2Layout_stop_reason, VF2LayoutStopReason,
pm.property_set and logger.warning in predictorenv.py) so the message is
constructed lazily and satisfies Ruff G003.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
src/mqt/predictor/rl/actions.pysrc/mqt/predictor/rl/predictor.pysrc/mqt/predictor/rl/predictorenv.pytests/compilation/test_integration_further_SDKs.py
| def is_circuit_laid_out(self, circuit: QuantumCircuit, layout: TranspileLayout | Layout) -> bool: | ||
| """True if every logical qubit in the circuit has a physical assignment.""" | ||
| if isinstance(layout, TranspileLayout): | ||
| # Use final_layout if available; otherwise fallback to initial_layout | ||
| layout = layout.final_layout or layout.initial_layout | ||
|
|
||
| if not only_nat_gates: | ||
| actions = self.actions_synthesis_indices + self.actions_opt_indices | ||
| if self.layout is not None: | ||
| actions += self.actions_routing_indices | ||
| return actions | ||
| v2p = layout.get_virtual_bits() | ||
| for instr in circuit.data: | ||
| for q in instr.qubits: | ||
| if q not in v2p: | ||
| # Logical qubit not assigned | ||
| return False | ||
| return True |
There was a problem hiding this comment.
Validate all logical qubits, not just those appearing in instructions.
The docstring says “every logical qubit”; idle qubits are currently skipped.
🔧 Suggested fix
- for instr in circuit.data:
- for q in instr.qubits:
- if q not in v2p:
- # Logical qubit not assigned
- return False
+ for q in circuit.qubits:
+ if q not in v2p:
+ # Logical qubit not assigned
+ return False🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/mqt/predictor/rl/predictorenv.py` around lines 435 - 447, The current
is_circuit_laid_out only checks qubits that appear in instructions and misses
idle logical qubits; update is_circuit_laid_out (and its TranspileLayout
handling) to validate every logical qubit from the circuit (e.g., iterate
circuit.qubits or range(circuit.num_qubits)) against layout.get_virtual_bits()
instead of iterating only instr.qubits, and return False if any circuit qubit is
not present in v2p; keep the existing fallback to layout.final_layout or
layout.initial_layout and handle missing/None layout gracefully.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Patrick Hopf <81010725+flowerthrower@users.noreply.github.com>
ProblemAfter a Qiskit layout pass (e.g., DenseLayout), the MDP state became:
However, Consequence in MaskablePPO
Applying softmax to these equal values results in a uniform distribution, meaning every action — including During training with 1000 timesteps, the probability that This never came up in tests that only executed 100 steps. Crash MechanismWhen the agent sampled
|
🧪 CI InsightsHere's what we observed from your CI run for 07e7338. 🟢 All jobs passed!But CI Insights is watching 👀 |
|
This PR will be merged into the qce-experiments branch with failing windows tests. The rational here is that all other platforms are succeeding and we want to make fast progress on the experiments branch. Furthermore, we have a strong suspicion that the problem will be resolved by subsequent experiment PRs anyway. |
There was a problem hiding this comment.
Pull request overview
This PR updates the reinforcement-learning compilation pipeline to better track layout/routing state, improves Qiskit/TKET interop behavior, and refines training/reward calculation logic to support more robust RL-based compilation.
Changes:
- Extend
PredictorEnvwith anmdpmode and new synthesized/layouted/routed state checks to drive valid-action selection. - Update TKET→Qiskit conversions to explicitly replace implicit swaps (env + integration test).
- Adjust RL training defaults (esp. for tests) and simplify reward qubit indexing via
QuantumCircuit.find_bit().
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/mqt/predictor/rl/predictorenv.py |
Adds mdp mode + new action-validity logic; updates layout/routing handling and unitary decomposition behavior. |
src/mqt/predictor/rl/predictor.py |
Tweaks test training hyperparameters and makes seeding deterministic in non-test training. |
src/mqt/predictor/rl/actions.py |
Updates DeviceDependentAction.do_while typing to match Qiskit PropertySet. |
src/mqt/predictor/reward.py |
Uses find_bit() for qubit indices and adds a clearer error message for missing 2Q gate calibration entries. |
tests/compilation/test_predictor_rl.py |
Increases RL training timesteps used in the integration-style test that generates models. |
tests/compilation/test_integration_further_SDKs.py |
Updates TKET routing test conversion to replace implicit swaps. |
Comments suppressed due to low confidence (1)
src/mqt/predictor/rl/predictor.py:112
set_random_seed(0)is now called unconditionally in the non-test training path, which forces deterministic training for all normal usage. This is a behavioral change that may be surprising for users expecting stochastic training runs. Consider making the seed configurable (parameter/env var) or limiting the hard-coded seed totest=Trueonly.
else:
set_random_seed(0)
# default PPO values
n_steps = 2048
n_epochs = 10
batch_size = 64
progress_bar = True
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| logger.info("MDP: " + mdp) | ||
| self.mdp = mdp | ||
|
|
There was a problem hiding this comment.
mdp is accepted as an arbitrary string, but later logic assumes only "paper" or "flexible". If any other value is passed, determine_valid_actions_for_state() can return an empty list and step() will raise RuntimeError("No valid actions left."). Consider validating mdp in __init__ (e.g., Literal/Enum + ValueError on unknown values) to fail fast with a clear error.
| for instr in circuit.data: | ||
| for q in instr.qubits: | ||
| if q not in v2p: | ||
| # Logical qubit not assigned | ||
| return False |
There was a problem hiding this comment.
is_circuit_laid_out() claims to verify that every logical qubit has a physical assignment, but it only checks qubits that appear in instructions (for instr in circuit.data). This can return True even if some idle logical qubits in circuit.qubits are not present in the layout. If you really want “every logical qubit”, iterate over circuit.qubits (or otherwise ensure the layout covers all qubits), not just those used by operations.
| for instr in circuit.data: | |
| for q in instr.qubits: | |
| if q not in v2p: | |
| # Logical qubit not assigned | |
| return False | |
| for q in circuit.qubits: | |
| if q not in v2p: | |
| # Logical qubit not assigned | |
| return False |
| # Initial state | ||
| if not synthesized and not laid_out and not routed: | ||
| if self.mdp == "flexible": | ||
| actions.extend(self.actions_synthesis_indices) | ||
| actions.extend(self.actions_mapping_indices) | ||
| actions.extend(self.actions_layout_indices) | ||
| actions.extend(self.actions_opt_indices) | ||
| if self.mdp == "paper": |
There was a problem hiding this comment.
With the new mdp modes, the initial-state policy in determine_valid_actions_for_state() can include mapping/layout actions (for mdp == "flexible"), but reset() still initializes self.valid_actions to opt + synthesis only. This means the action mask at the first decision step ignores the mdp setting. Consider setting self.valid_actions = self.determine_valid_actions_for_state() during reset() so the first action is consistent with the selected MDP.
|
|
||
| predictor.train_model( | ||
| timesteps=100, | ||
| timesteps=1000, |
There was a problem hiding this comment.
This test now trains with timesteps=1000 (previously 100). Since train_model() runs PPO rollouts and is executed during the test suite, this can significantly increase CI runtime and make the suite flaky/time out. Consider keeping timesteps minimal for CI (and/or gating longer training behind a pytest.mark.slow or an env flag) while still asserting the intended behavior.
| timesteps=1000, | |
| timesteps=100, |
Description
This PR addresses critical bugs in the RL training process with the following key changes:
Structure Improvements:
Redesigned action validation logic (
predictorenv.py): Rewrotedetermine_valid_actions_for_state()with a more structured (but equivalent) state machine that explicitly tracks three circuit states (synthesized, laid_out, routed) and handles 6 different state combinations.- Added helper methods
is_circuit_laid_out()andis_circuit_routed()to replace the buggyCheckMappass with more reliable state checking. The new logic supports both the original restricted MDP and a flexible general MDP mode.Fixed type annotation (
actions.py): Correcteddo_whileparameter type fromdict[str, Circuit]toPropertySetand added missing import for Qiskit'sPropertySet.Added reproducibility (
predictor.py): Set random seed for non-test training runs to ensure reproducible results.Improved VF2Layout error handling (
predictorenv.py): Replaced assertion failures with warning logs when VF2Layout doesn't find a solution, preventing crashes during training.Test Updates: