feat: add registers extraction task for PAC generation #1
ozongzi wants to merge 1 commit into akiselev:master
Pull request overview
Adds a new registers extraction task intended to produce a structured peripheral/register/field JSON output that can be used for Rust PAC generation workflows (e.g., svd2rust / chiptool).
Changes:
- Added a new `ExtractTask::Registers` CLI task wired to a new prompt spec.
- Introduced a new registers extraction prompt (`prompts/extract-registers.md`) with detailed anti-hallucination and output requirements.
- Added a JSON schema for the registers output shape in `src/prompts.rs`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/prompts.rs | Adds PROMPT_REGISTERS and a new registers() prompt spec + JSON schema. |
| src/extract.rs | Adds the Registers task variant and routes it to prompts::registers(). |
| prompts/extract-registers.md | New LLM prompt describing how to extract register maps and the expected JSON output. |
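The routing in `src/extract.rs` amounts to one more enum variant and one more match arm. The sketch below shows that shape under a simplified `PromptSpec`. Only `ExtractTask::Registers`, `prompts::registers()` and the prompt path come from this PR; the `Pinout` variant, `prompts::pinout()`, its path, the struct fields, and the `FromStr`-based CLI parsing are hypothetical placeholders, not the crate's actual code.

```rust
use std::str::FromStr;

/// Hypothetical, simplified stand-in for the crate's prompt spec type.
#[derive(Debug)]
pub struct PromptSpec {
    pub task_name: &'static str,
    pub prompt_path: &'static str,
}

mod prompts {
    use super::PromptSpec;

    /// Mirrors the new `prompts::registers()` constructor (body is illustrative).
    pub fn registers() -> PromptSpec {
        PromptSpec { task_name: "registers", prompt_path: "prompts/extract-registers.md" }
    }

    /// Hypothetical placeholder for one of the crate's pre-existing prompt specs.
    pub fn pinout() -> PromptSpec {
        PromptSpec { task_name: "pinout", prompt_path: "prompts/extract-pinout.md" }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExtractTask {
    /// Hypothetical placeholder for the crate's existing tasks.
    Pinout,
    /// The task added by this PR.
    Registers,
}

impl FromStr for ExtractTask {
    type Err = String;

    /// Map a CLI argument to a task (assumed lowercase task names).
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "pinout" => Ok(ExtractTask::Pinout),
            "registers" => Ok(ExtractTask::Registers),
            other => Err(format!("unknown extraction task: {other}")),
        }
    }
}

impl ExtractTask {
    /// Route each task to its prompt spec, as the PR does for `Registers`.
    pub fn prompt_spec(self) -> PromptSpec {
        match self {
            ExtractTask::Pinout => prompts::pinout(),
            ExtractTask::Registers => prompts::registers(),
        }
    }
}
```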
```markdown
| Field | Requirement |
|-------|-------------|
| `name` | EXACT register name as shown (e.g., `CR1`, `SR`, `DR`) |
| `description` | Brief description from datasheet |
| `offset` | Byte offset from peripheral base address (hex string, e.g., `"0x00"`) |
| `size` | Register width in bits (typically 32) |
| `reset_value` | Reset/default value as hex string (e.g., `"0x00000000"`), or null if not specified |
| `access` | `"read-write"`, `"read-only"`, `"write-only"`, or `"read-writeOnce"` |
```
Copilot review comment:
The markdown tables in this prompt use || at the start of the header/separator rows (e.g. || Field | Requirement |), which renders as an extra empty column and can reduce clarity for the model. Update these to standard | Field | Requirement | / | --- | --- | formatting.
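The prompt asks for `offset` (and, as in the PR's example output, `base_address`) as hex strings such as `"0x00"` or `"0x40013000"`, so a consumer has to parse them before computing absolute register addresses. A minimal std-only sketch; `parse_hex` and `register_address` are hypothetical helpers, not part of the PR, and `u32` assumes a 32-bit address space.

```rust
/// Parse a hex string such as "0x40013000" or "0X1C" into a u32.
/// Hypothetical helper; the PR itself only defines the JSON shape.
pub fn parse_hex(s: &str) -> Result<u32, String> {
    let trimmed = s.trim();
    let digits = trimmed
        .strip_prefix("0x")
        .or_else(|| trimmed.strip_prefix("0X"))
        .ok_or_else(|| format!("missing 0x prefix: {s:?}"))?;
    u32::from_str_radix(digits, 16).map_err(|e| format!("invalid hex {s:?}: {e}"))
}

/// Absolute register address = peripheral base address + byte offset.
/// Returns an error instead of wrapping if the sum overflows 32 bits.
pub fn register_address(base_address: &str, offset: &str) -> Result<u32, String> {
    let base = parse_hex(base_address)?;
    let off = parse_hex(offset)?;
    base.checked_add(off)
        .ok_or_else(|| format!("address overflow: {base_address} + {offset}"))
}
```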
```markdown
| Field | Requirement |
|-------|-------------|
| `name` | EXACT field name (e.g., `SPE`, `RXNE`, `BR`) |
| `description` | EXACT description from datasheet |
| `bit_offset` | LSB position (0-indexed, e.g., `6` for bit 6) |
| `bit_width` | Number of bits (e.g., `1` for single bit, `3` for 3-bit field) |
| `access` | `"read-write"`, `"read-only"`, `"write-only"` — inherit from register if not specified |
| `enumerated_values` | Array of named values if datasheet defines them, else `[]` |

### Step 4: Handle Special Cases

**Reserved bits:**
- DO NOT include reserved bits as fields
- They will be inferred from gaps in bit coverage

**Write-clear / Read-clear flags:**
- Set `access` to `"read-writeOnce"` for write-1-to-clear flags
- Set `access` to `"read-only"` for hardware-set status flags
```
Copilot review comment:
In the field extraction section, the allowed access values list doesn't include read-writeOnce, but later the prompt instructs using read-writeOnce for write-1-to-clear flags. This is internally inconsistent; include read-writeOnce in the field access options (or clarify that it is only allowed at the register level).
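Because reserved bits are omitted and left to be "inferred from gaps in bit coverage", that inference is plain bit arithmetic: a field spans the mask `((1 << bit_width) - 1) << bit_offset`, and every register bit outside all field masks is reserved. Below is a hypothetical std-only sketch (type and function names are illustrative, not from the PR) that also reports fields which overlap an earlier field or do not fit in the register, both plausible symptoms of a mis-extracted table.

```rust
/// Minimal view of an extracted field (hypothetical type).
pub struct FieldBits {
    pub name: &'static str,
    pub bit_offset: u32,
    pub bit_width: u32,
}

/// Mask covering `bit_width` bits starting at `bit_offset`, or None if the
/// field is empty or does not fit inside a register of `size` bits (size <= 64).
pub fn field_mask(bit_offset: u32, bit_width: u32, size: u32) -> Option<u64> {
    if bit_width == 0 || size > 64 || bit_offset.checked_add(bit_width)? > size {
        return None;
    }
    // Avoid the overflowing shift `1 << 64` for a full-width field.
    let ones = if bit_width == 64 { u64::MAX } else { (1u64 << bit_width) - 1 };
    Some(ones << bit_offset)
}

/// Returns (reserved-bit mask, names of fields that overlap an earlier field
/// or fall outside the register).
pub fn reserved_bits(fields: &[FieldBits], size: u32) -> (u64, Vec<&'static str>) {
    let full = if size >= 64 { u64::MAX } else { (1u64 << size) - 1 };
    let mut covered = 0u64;
    let mut problems = Vec::new();
    for f in fields {
        match field_mask(f.bit_offset, f.bit_width, size) {
            // Accept the field only if none of its bits are already claimed.
            Some(m) if (covered & m) == 0 => covered |= m,
            _ => problems.push(f.name),
        }
    }
    // Every bit of the register not claimed by a field is reserved.
    (full & !covered, problems)
}
```

For the PR's example field `SPE` (`bit_offset` 6, `bit_width` 1), the mask is `0x40`.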
```rust
spec.schema = serde_json::from_str(r#"{
  "type": "object",
  "required": ["part_details", "peripherals"],
  "properties": {
    "part_details": {
```
Copilot review comment:
The prompt specifies an explicit error JSON shape when no register map is found, but the responseJsonSchema here only allows {part_details, peripherals}. With schema enforcement, the model can't return the documented error object and may be forced to fabricate data to satisfy the schema. Consider updating the schema to oneOf the success shape vs an {error, part_number, pages_searched} shape (or remove the error-response instruction from the prompt).
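On the consumer side, the two shapes described in this comment fit a Rust enum, which forces callers to handle the "no register map found" reply explicitly rather than receive a map the model was pushed to invent. A hypothetical std-only sketch: the type names are illustrative, the error variant's keys follow the shape named in the comment (`error`, `part_number`, `pages_searched`, with `pages_searched` assumed here to be a list of page numbers), and JSON deserialization (for example via serde's untagged enums) is omitted.

```rust
/// Minimal success payload (hypothetical; the real output carries much more).
#[derive(Debug, PartialEq)]
pub struct RegisterMap {
    pub part_number: String,
    pub peripheral_names: Vec<String>,
}

/// The two response shapes a `oneOf` schema would admit.
#[derive(Debug, PartialEq)]
pub enum RegistersResponse {
    Map(RegisterMap),
    NotFound {
        error: String,
        part_number: String,
        /// Assumed to be a list of page numbers.
        pages_searched: Vec<u32>,
    },
}

impl RegistersResponse {
    /// Convert into a Result so callers must handle the not-found case.
    pub fn into_result(self) -> Result<RegisterMap, String> {
        match self {
            RegistersResponse::Map(map) => Ok(map),
            RegistersResponse::NotFound { error, part_number, pages_searched } => Err(format!(
                "{}: {} (searched {} pages)",
                part_number,
                error,
                pages_searched.len()
            )),
        }
    }
}
```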
```rust
      "type": "object",
      "required": ["part_number"],
      "properties": {
        "part_number": {"type": "string"},
        "datasheet_revision": {"type": ["string", "null"]},
        "description": {"type": "string"}
      }
    },
    "peripherals": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["name", "base_address", "registers"],
        "properties": {
          "name": {"type": "string"},
          "description": {"type": "string"},
          "base_address": {"type": "string"},
          "source_page": {"type": "integer"},
          "incomplete": {"type": "boolean"},
          "registers": {
            "type": "array",
            "items": {
              "type": "object",
              "required": ["name", "offset", "fields"],
              "properties": {
                "name": {"type": "string"},
                "description": {"type": "string"},
                "offset": {"type": "string"},
                "size": {"type": "integer"},
                "reset_value": {"type": ["string", "null"]},
                "access": {"type": "string", "enum": ["read-write", "read-only", "write-only", "read-writeOnce"]},
                "fields": {
                  "type": "array",
                  "items": {
                    "type": "object",
                    "required": ["name", "bit_offset", "bit_width"],
                    "properties": {
                      "name": {"type": "string"},
                      "description": {"type": "string"},
                      "bit_offset": {"type": "integer"},
                      "bit_width": {"type": "integer"},
                      "access": {"type": "string"},
                      "enumerated_values": {
                        "type": "array",
                        "items": {
                          "type": "object",
                          "required": ["name", "value"],
                          "properties": {
                            "name": {"type": "string"},
                            "value": {"type": "integer"},
                            "description": {"type": "string"}
                          }
                        }
                      }
                    }
                  }
                }
              }
```
Copilot review comment:
The schema currently does not require several fields that the prompt marks as mandatory (e.g., part_details.datasheet_revision/description, register access/size/reset_value, field enumerated_values). If the JSON schema is meant to enforce output completeness, these should be added to the relevant required lists (and enumerated_values should be required with type: array, even if empty).
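Whether or not the `required` lists are tightened, a cheap post-parse pass can flag keys the prompt treats as mandatory and apply the prompt's rule that a field without its own `access` inherits the register's. The sketch below is hypothetical: type and function names are illustrative, schema-optional keys are modelled as `Option`, and it accepts `read-writeOnce` at field level, which is one way to resolve the inconsistency noted earlier. One caveat worth checking before relying on that value: in CMSIS-SVD, `read-writeOnce` means only the first write after reset takes effect, while write-1-to-clear behaviour is normally described with `modifiedWriteValues` (`oneToClear`).

```rust
use std::str::FromStr;

/// The four access strings the prompt allows.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    ReadWrite,
    ReadOnly,
    WriteOnly,
    /// In CMSIS-SVD: readable, and only the first write after reset takes effect.
    ReadWriteOnce,
}

impl FromStr for Access {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "read-write" => Ok(Access::ReadWrite),
            "read-only" => Ok(Access::ReadOnly),
            "write-only" => Ok(Access::WriteOnly),
            "read-writeOnce" => Ok(Access::ReadWriteOnce),
            other => Err(format!("unknown access value: {other}")),
        }
    }
}

/// A register as deserialized, schema-optional keys as `Option` (hypothetical type).
pub struct RawRegister {
    pub name: String,
    pub size: Option<u32>,
    pub access: Option<String>,
    pub fields: Vec<RawField>,
}

/// A field as deserialized (hypothetical type); enumerated values as (name, value) pairs.
pub struct RawField {
    pub name: String,
    pub access: Option<String>,
    pub enumerated_values: Option<Vec<(String, u64)>>,
}

/// Dotted paths of keys the prompt marks mandatory but the schema does not require.
pub fn missing_keys(reg: &RawRegister) -> Vec<String> {
    let mut missing = Vec::new();
    if reg.size.is_none() {
        missing.push(format!("{}.size", reg.name));
    }
    if reg.access.is_none() {
        missing.push(format!("{}.access", reg.name));
    }
    for f in &reg.fields {
        if f.enumerated_values.is_none() {
            missing.push(format!("{}.{}.enumerated_values", reg.name, f.name));
        }
    }
    missing
}

/// Effective field access: the field's own value if present, else the register's.
pub fn field_access(field: &RawField, register: Access) -> Result<Access, String> {
    match &field.access {
        Some(s) => s.parse(),
        None => Ok(register),
    }
}
```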
```rust
            }
          }
        }
}"#).expect("registers schema is valid JSON");
```
Copilot review comment:
Unlike the other prompt specs, this schema is parsed from a raw JSON string and uses expect(...), which would panic at runtime if edited incorrectly. For consistency and safer refactors, consider expressing this with json!({...}) (or at least .context(...) + ?) to avoid panics and keep the style consistent across prompt specs.
Suggested change:
```diff
-}"#).expect("registers schema is valid JSON");
+}"#).unwrap_or_else(|e| panic!("registers schema JSON is invalid: {e}"));
```
Have you tested this with microcontroller reference manuals? Any idea how it performs? I'm hesitant to add this as an explicit extraction task because LLMs have a hard time with exhaustiveness in cases like this, so it would be even less reliable than extractions already can be. Let me test it out with a project I'm working on.
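One way to quantify the exhaustiveness concern, and the validation against existing `stm32-metapac` data mentioned in the PR description, is to compare the extracted register names of each peripheral with a reference list and report what is missing. A hypothetical std-only sketch: loading the reference names (for example from an SVD file or a metapac dump) is out of scope, and `coverage` and `Coverage` are illustrative names.

```rust
use std::collections::BTreeSet;

/// Coverage of one peripheral's registers against a reference list.
#[derive(Debug, PartialEq)]
pub struct Coverage {
    /// Reference registers the extraction did not produce.
    pub missing: Vec<String>,
    /// Extracted registers absent from the reference (possible hallucinations).
    pub unexpected: Vec<String>,
    /// Fraction of reference registers extracted (1.0 if the reference is empty).
    pub recall: f64,
}

pub fn coverage(extracted: &[&str], reference: &[&str]) -> Coverage {
    // Sets deduplicate names and give sorted, deterministic output.
    let got: BTreeSet<&str> = extracted.iter().copied().collect();
    let want: BTreeSet<&str> = reference.iter().copied().collect();
    let missing: Vec<String> = want.difference(&got).map(|s| s.to_string()).collect();
    let unexpected: Vec<String> = got.difference(&want).map(|s| s.to_string()).collect();
    let recall = if want.is_empty() {
        1.0
    } else {
        (want.len() - missing.len()) as f64 / want.len() as f64
    };
    Coverage { missing, unexpected, recall }
}
```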
Can I switch the upstream PDF crate to `pdfium-render`? Otherwise it won't compile on my Mac. I plan to integrate `svd2rust` / `chiptool` to build a complete datasheet -> PAC toolchain, but I'm not sure whether to implement that in this PR (the change may be too big, and perhaps I should create a new crate instead).
Summary
Adds a new `registers` extraction task that uses Gemini to extract a complete peripheral register map from a microcontroller datasheet, outputting structured JSON suitable for generating Rust PAC (Peripheral Access Crate) code via `svd2rust` or `chiptool`.
Output Format
The output JSON models peripherals, registers, fields, and enumerated values, matching the data model used by `svd2rust` and embassy's `chiptool`:
```json
{
  "part_details": { "part_number": "STM32F407VG", ... },
  "peripherals": [
    {
      "name": "SPI1",
      "base_address": "0x40013000",
      "registers": [
        {
          "name": "CR1",
          "offset": "0x00",
          "access": "read-write",
          "fields": [
            { "name": "SPE", "bit_offset": 6, "bit_width": 1, ... }
          ]
        }
      ]
    }
  ]
}
```
Motivation
Existing PAC generation relies on SVD files provided by chip vendors, which are often inaccurate, incomplete, or nonexistent (especially for domestic Chinese MCUs like GD32, WCH, etc.). This task enables generating PAC data directly from datasheets, with the output validated against existing `stm32-rs`/`stm32-metapac` data.
Build Note
`mupdf-sys` currently fails to compile on macOS 26.x (Xcode 17, clang 17) due to an `fdopen` macro conflict in bundled zlib. This is an existing issue unrelated to this PR. Consider `pdfium-render` as a cross-platform alternative.