151 changes: 151 additions & 0 deletions prompts/extract-registers.md
@@ -0,0 +1,151 @@
**CRITICAL REQUIREMENT:** You MUST analyze the actual PDF document provided. DO NOT hallucinate, guess, or use prior knowledge. ONLY extract information explicitly present in THIS document.

**Role:** Act as a Senior Embedded Systems Engineer and Register Map Architect.

**Objective:** Extract a complete, machine-readable register map from the attached microcontroller or peripheral datasheet. The output will be used to generate Rust PAC (Peripheral Access Crate) code via svd2rust or chiptool.

**Context:** This data will feed automated PAC generation tools. Accuracy is critical — a wrong address or bit range causes silent hardware bugs.

---

## ANTI-HALLUCINATION VERIFICATION (MANDATORY)

Before generating ANY output, you MUST:
1. Confirm you can read the PDF document
2. Extract the EXACT part number from the document title or first pages
3. Extract the EXACT datasheet revision/date if present
4. Include these in `part_details` as proof of document reading
5. If you cannot read the PDF or find register information, return an error response instead of guessing

---

## EXTRACTION INSTRUCTIONS

### Step 1: Identify All Peripherals
- Locate sections titled "Register Map", "Register Description", "Memory Map", or similar
- List every peripheral that has a register table (e.g., GPIO, SPI, UART, ADC, TIM, RCC, DMA)
- Record the base address of each peripheral from the memory map table

### Step 2: For Each Peripheral, Extract All Registers

For EVERY register in EACH peripheral:

| Field | Requirement |
|-------|-------------|
| `name` | EXACT register name as shown (e.g., `CR1`, `SR`, `DR`) |
| `description` | Brief description from datasheet |
| `offset` | Byte offset from peripheral base address (hex string, e.g., `"0x00"`) |
| `size` | Register width in bits (typically 32) |
| `reset_value` | Reset/default value as hex string (e.g., `"0x00000000"`), or null if not specified |
| `access` | `"read-write"`, `"read-only"`, `"write-only"`, or `"read-writeOnce"` |

Comment on lines +33 to +41
Copilot AI Mar 27, 2026

The markdown tables in this prompt use `||` at the start of the header/separator rows (e.g. `|| Field | Requirement |`), which renders as an extra empty column and can reduce clarity for the model. Update these to standard `| Field | Requirement |` / `| --- | --- |` formatting.

Copilot uses AI. Check for mistakes.
### Step 3: For Each Register, Extract All Fields (Bit Fields)

For EVERY field in EACH register:

| Field | Requirement |
|-------|-------------|
| `name` | EXACT field name (e.g., `SPE`, `RXNE`, `BR`) |
| `description` | EXACT description from datasheet |
| `bit_offset` | LSB position (0-indexed, e.g., `6` for bit 6) |
| `bit_width` | Number of bits (e.g., `1` for single bit, `3` for 3-bit field) |
| `access` | `"read-write"`, `"read-only"`, `"write-only"` — inherit from register if not specified |
| `enumerated_values` | Array of named values if datasheet defines them, else `[]` |

### Step 4: Handle Special Cases

**Reserved bits:**
- DO NOT include reserved bits as fields
- They will be inferred from gaps in bit coverage
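
For illustration, the gap inference described above could look like the following on the consuming side. This is a minimal sketch, not code from this PR: the `Field` struct and the `reserved_ranges` function are hypothetical names, and registers are assumed to be at most 64 bits wide.

```rust
/// A bit field as described in the prompt: LSB position plus width.
/// (Hypothetical type, not part of this PR.)
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub bit_offset: u32,
    pub bit_width: u32,
}

/// Returns the reserved bit ranges of a register as inclusive `(lsb, msb)`
/// pairs, i.e. every bit in `0..size` that no field covers.
pub fn reserved_ranges(size: u32, fields: &[Field]) -> Vec<(u32, u32)> {
    // Assumption: the coverage mask is a u64, so wider registers are rejected.
    assert!(size >= 1 && size <= 64, "register size must be 1..=64 bits");

    // Mark every bit claimed by some field, ignoring bits beyond `size`.
    let mut covered: u64 = 0;
    for f in fields {
        let end = f.bit_offset.saturating_add(f.bit_width).min(size);
        for bit in f.bit_offset..end {
            covered |= 1u64 << bit;
        }
    }

    // Walk the bits and collect maximal runs of uncovered bits.
    let mut gaps = Vec::new();
    let mut start: Option<u32> = None;
    for bit in 0..size {
        let is_covered = covered & (1u64 << bit) != 0;
        match (is_covered, start) {
            (false, None) => start = Some(bit),
            (true, Some(s)) => {
                gaps.push((s, bit - 1));
                start = None;
            }
            _ => {}
        }
    }
    if let Some(s) = start {
        gaps.push((s, size - 1));
    }
    gaps
}
```

With only `SPE` (bit 6) and `BIDIMODE` (bit 15) extracted from a 16-bit register, this would report bits 0 to 5 and 7 to 14 as reserved, which is also a cheap way to spot fields the extraction missed.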

**Write-clear / Read-clear flags:**
- Set `access` to `"read-writeOnce"` for write-1-to-clear flags
- Set `access` to `"read-only"` for hardware-set status flags

Comment on lines +46 to +64
Copilot AI Mar 27, 2026

In the field extraction section, the allowed access values list doesn't include `read-writeOnce`, but later the prompt instructs using `read-writeOnce` for write-1-to-clear flags. This is internally inconsistent; include `read-writeOnce` in the field access options (or clarify that it is only allowed at the register level).

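
One way to keep the register-level and field-level access values from drifting apart is to parse both through a single type. The sketch below is an assumption about how a consumer might do that, not code from this PR; `Access` and `field_access` are hypothetical names, and the accepted strings are the four listed in the prompt.

```rust
use std::str::FromStr;

/// Access values shared by registers and fields (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    ReadWrite,
    ReadOnly,
    WriteOnly,
    ReadWriteOnce,
}

impl FromStr for Access {
    type Err = String;

    /// Accepts exactly the strings the prompt allows; anything else is an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "read-write" => Ok(Access::ReadWrite),
            "read-only" => Ok(Access::ReadOnly),
            "write-only" => Ok(Access::WriteOnly),
            "read-writeOnce" => Ok(Access::ReadWriteOnce),
            other => Err(format!("unknown access value: {other:?}")),
        }
    }
}

/// Resolves a field's effective access: its own value if present,
/// otherwise the value inherited from the enclosing register.
pub fn field_access(field: Option<&str>, register: Access) -> Result<Access, String> {
    match field {
        Some(s) => s.parse(),
        None => Ok(register),
    }
}
```

Because fields and registers go through the same `FromStr` implementation, a value such as `read-writeOnce` is either valid at both levels or at neither.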
**Shared register names across peripherals:**
- Each peripheral gets its own register list — do not deduplicate

---

## CONSISTENCY REQUIREMENTS

1. **Addresses:** All addresses as lowercase hex strings with `0x` prefix
2. **Completeness:** Extract ALL peripherals with register tables. Missing peripherals = missing PAC coverage.
3. **Exactness:** Use EXACT names from datasheet. Do not normalize or abbreviate.
4. **Arrays:** `enumerated_values` MUST be an array, even if empty (`[]`)

---

## IF DATA NOT FOUND

- If the document has NO register tables: Return `{"error": "No register map found in document", "part_number": "...", "pages_searched": [...]}`
- If a peripheral has incomplete register data: Include partial data with `"incomplete": true` flag
- If reset value is not specified: Use `null`

---

## OUTPUT SCHEMA

Provide a SINGLE valid JSON object.

```json
{
  "part_details": {
    "part_number": "EXACT part number from document",
    "datasheet_revision": "Revision/date string or null",
    "description": "Brief component description"
  },
  "peripherals": [
    {
      "name": "SPI1",
      "description": "Serial Peripheral Interface 1",
      "base_address": "0x40013000",
      "source_page": 42,
      "incomplete": false,
      "registers": [
        {
          "name": "CR1",
          "description": "SPI control register 1",
          "offset": "0x00",
          "size": 32,
          "reset_value": "0x00000000",
          "access": "read-write",
          "fields": [
            {
              "name": "BIDIMODE",
              "description": "Bidirectional data mode enable",
              "bit_offset": 15,
              "bit_width": 1,
              "access": "read-write",
              "enumerated_values": [
                {"name": "Unidirectional", "value": 0, "description": "2-line unidirectional data mode selected"},
                {"name": "Bidirectional", "value": 1, "description": "1-line bidirectional data mode selected"}
              ]
            },
            {
              "name": "SPE",
              "description": "SPI enable",
              "bit_offset": 6,
              "bit_width": 1,
              "access": "read-write",
              "enumerated_values": []
            }
          ]
        }
      ]
    }
  ]
}
```

---

## FINAL CHECKLIST

Before submitting, verify:
- [ ] `part_details.part_number` matches the document exactly
- [ ] Every peripheral with a register table has an entry
- [ ] Every register in each peripheral is included
- [ ] Every field has `bit_offset` and `bit_width` (not just a description)
- [ ] All addresses are hex strings with `0x` prefix
- [ ] `enumerated_values` is always an array
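
The address items in this checklist can also be verified mechanically after extraction. The following is a minimal sketch under stated assumptions: `parse_hex` and `register_address` are hypothetical helpers (not part of this PR), and the lowercase rule from the consistency requirements is applied to every hex string passed in.

```rust
/// Parses a hex string such as "0x40013000" into a number, enforcing the
/// `0x` prefix and lowercase hex digits. (Hypothetical helper.)
pub fn parse_hex(s: &str) -> Result<u64, String> {
    let digits = s
        .strip_prefix("0x")
        .ok_or_else(|| format!("{s:?} is missing the 0x prefix"))?;
    if digits.is_empty() {
        return Err(format!("{s:?} has no digits after 0x"));
    }
    if !digits.chars().all(|c| matches!(c, '0'..='9' | 'a'..='f')) {
        return Err(format!("{s:?} contains characters other than lowercase hex digits"));
    }
    u64::from_str_radix(digits, 16).map_err(|e| format!("{s:?}: {e}"))
}

/// Absolute register address = peripheral base address + register byte offset.
pub fn register_address(base_address: &str, offset: &str) -> Result<u64, String> {
    let base = parse_hex(base_address)?;
    let off = parse_hex(offset)?;
    base.checked_add(off)
        .ok_or_else(|| "base address + offset overflows u64".to_string())
}
```

For the `SPI1` example above, `register_address("0x40013000", "0x00")` would resolve `CR1` to the peripheral base address itself, while a string such as `"40013000"` or `"0x4001300C"` would be rejected before it reaches code generation.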
2 changes: 2 additions & 0 deletions src/extract.rs
@@ -84,6 +84,7 @@ pub enum ExtractTask {
    Pinout,
    Power,
    ReferenceDesign,
    Registers,
}

impl ExtractTask {
@@ -101,6 +102,7 @@ impl ExtractTask {
            ExtractTask::Pinout => prompts::pinout(),
            ExtractTask::Power => prompts::power(),
            ExtractTask::ReferenceDesign => prompts::reference_design(),
            ExtractTask::Registers => prompts::registers(),
        }
    }

80 changes: 80 additions & 0 deletions src/prompts.rs
@@ -36,6 +36,7 @@ const PROMPT_LAYOUT_CONSTRAINTS: &str = include_str!("../prompts/extract-layout-
const PROMPT_PINOUT: &str = include_str!("../prompts/extract-pinout.md");
const PROMPT_POWER: &str = include_str!("../prompts/extract-power.md");
const PROMPT_REFERENCE_DESIGN: &str = include_str!("../prompts/extract-reference-design.md");
const PROMPT_REGISTERS: &str = include_str!("../prompts/extract-registers.md");

pub fn application_circuit() -> PromptSpec {
    let mut spec = PromptSpec::new(
@@ -329,6 +330,85 @@ pub fn power() -> PromptSpec {
    spec
}

pub fn registers() -> PromptSpec {
    let mut spec = PromptSpec::new(
        "registers",
        "Peripheral register map extraction for PAC generation",
        PROMPT_REGISTERS,
    );
    spec.schema = serde_json::from_str(r#"{
        "type": "object",
        "required": ["part_details", "peripherals"],
        "properties": {
            "part_details": {
Comment on lines +339 to +343
Copilot AI Mar 27, 2026

The prompt specifies an explicit error JSON shape when no register map is found, but the `responseJsonSchema` here only allows `{part_details, peripherals}`. With schema enforcement, the model can't return the documented error object and may be forced to fabricate data to satisfy the schema. Consider updating the schema to `oneOf` the success shape vs an `{error, part_number, pages_searched}` shape (or remove the error-response instruction from the prompt).

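
To make the `oneOf` suggestion concrete, here is a rough sketch of the intended semantics on the consuming side: a response must match exactly one of the two shapes. It is deliberately simplified and is not code from this PR. `ResponseShape` and `classify` are hypothetical names, only top-level keys are inspected, and the error shape is recognized by its `error` key alone.

```rust
/// The two response shapes the prompt documents (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum ResponseShape {
    /// `{part_details, peripherals}`
    Success,
    /// `{error, part_number, pages_searched}`
    Error,
}

/// Classifies a response by its top-level keys with `oneOf`-style semantics:
/// exactly one shape must match; zero or two matches are rejected.
pub fn classify(keys: &[&str]) -> Result<ResponseShape, String> {
    let has = |k: &str| keys.iter().any(|key| *key == k);
    let is_success = has("part_details") && has("peripherals");
    let is_error = has("error");
    match (is_success, is_error) {
        (true, false) => Ok(ResponseShape::Success),
        (false, true) => Ok(ResponseShape::Error),
        (true, true) => Err("response matches both shapes".to_string()),
        (false, false) => Err("response matches neither shape".to_string()),
    }
}
```

The point of the exclusive match is that a response carrying both an `error` and fabricated `peripherals` is rejected rather than silently treated as a success.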
                "type": "object",
                "required": ["part_number"],
                "properties": {
                    "part_number": {"type": "string"},
                    "datasheet_revision": {"type": ["string", "null"]},
                    "description": {"type": "string"}
                }
            },
            "peripherals": {
                "type": "array",
                "items": {
                    "type": "object",
                    "required": ["name", "base_address", "registers"],
                    "properties": {
                        "name": {"type": "string"},
                        "description": {"type": "string"},
                        "base_address": {"type": "string"},
                        "source_page": {"type": "integer"},
                        "incomplete": {"type": "boolean"},
                        "registers": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "required": ["name", "offset", "fields"],
                                "properties": {
                                    "name": {"type": "string"},
                                    "description": {"type": "string"},
                                    "offset": {"type": "string"},
                                    "size": {"type": "integer"},
                                    "reset_value": {"type": ["string", "null"]},
                                    "access": {"type": "string", "enum": ["read-write", "read-only", "write-only", "read-writeOnce"]},
                                    "fields": {
                                        "type": "array",
                                        "items": {
                                            "type": "object",
                                            "required": ["name", "bit_offset", "bit_width"],
                                            "properties": {
                                                "name": {"type": "string"},
                                                "description": {"type": "string"},
                                                "bit_offset": {"type": "integer"},
                                                "bit_width": {"type": "integer"},
                                                "access": {"type": "string"},
                                                "enumerated_values": {
                                                    "type": "array",
                                                    "items": {
                                                        "type": "object",
                                                        "required": ["name", "value"],
                                                        "properties": {
                                                            "name": {"type": "string"},
                                                            "value": {"type": "integer"},
                                                            "description": {"type": "string"}
                                                        }
                                                    }
                                                }
                                            }
                                        }
                                    }
                                }
Comment on lines +344 to +401
Copilot AI Mar 27, 2026

The schema currently does not require several fields that the prompt marks as mandatory (e.g., `part_details.datasheet_revision`/`description`, register `access`/`size`/`reset_value`, field `enumerated_values`). If the JSON schema is meant to enforce output completeness, these should be added to the relevant `required` lists (and `enumerated_values` should be required with `type: array`, even if empty).

                            }
                        }
                    }
                }
            }
        }
    }"#).expect("registers schema is valid JSON");
Copilot AI Mar 27, 2026

Unlike the other prompt specs, this schema is parsed from a raw JSON string and uses `expect(...)`, which would panic at runtime if edited incorrectly. For consistency and safer refactors, consider expressing this with `json!({...})` (or at least `.context(...)` + `?`) to avoid panics and keep the style consistent across prompt specs.

Suggested change
- }"#).expect("registers schema is valid JSON");
+ }"#).unwrap_or_else(|e| panic!("registers schema JSON is invalid: {e}"));

    spec
}

pub fn reference_design() -> PromptSpec {
    let mut spec = PromptSpec::new(
        "reference-design",