
feat(signature): first US for ExpectationSignature (#206) #257

Open
Kakudou wants to merge 19 commits into main from feat/us-int-1-signature-lifecycle

Kakudou (Member) commented May 27, 2026

This is the first user story (US) introducing ExpectationSignature for the injectors.

Proposed changes

SignatureManager

Unified signature lifecycle for OpenAEV injectors: compile pre-execution signatures, merge post-execution results, and ship structured output to the backend.

Architecture
flowchart LR
    subgraph pyoaev/signatures
        SM[SignatureManager]
        M[models.py]
        CFG["InjectorConfig\n(Network / Cloud / External)"]
    end
    subgraph pyoaev/apis
        API[SignatureApiManager]
    end
    subgraph Backend
        CB["/api/injects/{id}/callback"]
    end

    CFG -->|typed input| SM
    SM -->|compile_pre/post| M
    SM -->|send_signatures| API
    API -->|callback\nretry + chunk| CB

Injector configs (models.py) are the typed contract: one config = one signature row.
SignatureManager owns the domain logic (compile, merge, resolve IP).
SignatureApiManager owns the transport (validation, chunking, retry).

Quick Start
from pyoaev import OpenAEV
from pyoaev.signatures import (
    SignatureManager,
    NetworkInjectorConfig,
    build_network_configs,
)

client = OpenAEV(url="https://openaev.example.com", token="my-token")
sm = SignatureManager(client)

# 1. Build typed injector configs (one per distinct target asset)
configs = build_network_configs(["10.0.0.1", "2001:db8::1", "target.example.com"])
# or hand-build them: NetworkInjectorConfig(target_ipv4="10.0.0.1")
# or build a list:
# [NetworkInjectorConfig(target_ipv4="10.0.0.1"),
#  NetworkInjectorConfig(target_ipv6="2001:db8::1"),
#  NetworkInjectorConfig(target_hostname="target.example.com")]

# 2. Compile pre-execution signatures (category is carried by the config type)
pre = sm.compile_pre_execution_signatures(config=configs)

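# 3. Run your tool and capture its outcome as a dict, e.g. {} on success,
#    {"status": "partial"}, or {"error_info": {"exit_code": 1}}
tool_output = {}

# 4. Compile post-execution signatures
post = sm.compile_post_execution_signatures(pre, tool_output)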

# 5. Build the wire payload
payload = sm.build_payload(post, expectation_types=["DETECTION"])

# 6. Send to backend
sm.send_signatures(inject_id="abc-123", phase="execution_complete", signatures=payload)
Injector configs

The category is encoded in the config type. Pass a single config for a single-target inject,
or a homogeneous list for a multi-target inject. Mixing config types in a single call is rejected.

| Config | Required fields | Optional fields | Use case |
|---|---|---|---|
| NetworkInjectorConfig | exactly one of target_ipv4 / target_ipv6 / target_hostname | (none) | Nuclei, Nmap, NetExec |
| CloudInjectorConfig | cloud_provider, cloud_account_id, cloud_region | target_service | Prowler, Stratus |
| ExternalInjectorConfig | query | target_ipv4, target_hostname | Shodan |

InjectorConfig is the union type: NetworkInjectorConfig | CloudInjectorConfig | ExternalInjectorConfig.
SignatureManager adds start_time automatically (plus source_ipv4 / source_ipv6 for network).

Network
from pyoaev.signatures import NetworkInjectorConfig

# One distinct asset per config, never mix identities on the same target
cfg = NetworkInjectorConfig(target_ipv4="10.0.0.1")
cfg = NetworkInjectorConfig(target_ipv6="2001:db8::1")
cfg = NetworkInjectorConfig(target_hostname="api.example.com")

# Multi-target inject
configs = [
    NetworkInjectorConfig(target_ipv4="10.0.0.1"),
    NetworkInjectorConfig(target_hostname="api.example.com"),
]
pre = sm.compile_pre_execution_signatures(config=configs)
# -> list of dicts, one per target, all sharing the same source_ipv4 / start_time

Network builder

build_network_configs(targets) turns a heterogeneous list of strings, dicts, or already-typed
NetworkInjectorConfig instances into a clean list of typed configs. Strings are auto-classified as
IPv4, IPv6, or hostname via the stdlib ipaddress module. Each input is treated as one distinct
asset; a target never mixes identities.

from pyoaev.signatures import build_network_configs

build_network_configs(["10.0.0.1", "2001:db8::1", "web.example.com"])
# -> [NetworkInjectorConfig(target_ipv4="10.0.0.1"),
#     NetworkInjectorConfig(target_ipv6="2001:db8::1"),
#     NetworkInjectorConfig(target_hostname="web.example.com")]

# dicts also work and are validated
build_network_configs([{"target_ipv4": "10.0.0.1"}])
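The following is a minimal, dependency-free sketch of the classification rule described above. It assumes the only inputs are strings, dicts, or already-typed configs. `NetworkTarget`, `classify_target`, and `build_targets` are hypothetical names. A plain dataclass stands in for the pydantic NetworkInjectorConfig, and its `__post_init__` enforces the exactly-one-identity rule.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class NetworkTarget:
    """Hypothetical stand-in for NetworkInjectorConfig (a pydantic model in the PR)."""

    target_ipv4: Optional[str] = None
    target_ipv6: Optional[str] = None
    target_hostname: Optional[str] = None

    def __post_init__(self) -> None:
        # Exactly one identity field per target: one config = one distinct asset
        identities = (self.target_ipv4, self.target_ipv6, self.target_hostname)
        if sum(value is not None for value in identities) != 1:
            raise ValueError("exactly one of target_ipv4, target_ipv6, target_hostname is required")


def classify_target(raw: str) -> NetworkTarget:
    """Classify a raw string as IPv4, IPv6, or (fallback) hostname."""
    value = raw.strip()
    try:
        ip = ipaddress.ip_address(value)
    except ValueError:
        # Not an IP literal, so treat it as a hostname
        return NetworkTarget(target_hostname=value)
    if ip.version == 4:
        return NetworkTarget(target_ipv4=str(ip))
    return NetworkTarget(target_ipv6=str(ip))


def build_targets(items: list[Union[str, dict, NetworkTarget]]) -> list[NetworkTarget]:
    """Hypothetical analogue of build_network_configs."""
    targets: list[NetworkTarget] = []
    for item in items:
        if isinstance(item, NetworkTarget):
            targets.append(item)
        elif isinstance(item, str):
            targets.append(classify_target(item))
        elif isinstance(item, dict):
            targets.append(NetworkTarget(**item))  # __post_init__ validates the dict
        else:
            raise TypeError(f"unsupported target type: {type(item).__name__}")
    return targets
```

Using `ipaddress.ip_address` keeps the IPv4/IPv6 split in the stdlib. Anything that fails to parse falls through to hostname, so this sketch does no hostname syntax validation.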
Cloud
from pyoaev.signatures import CloudInjectorConfig

cfg = CloudInjectorConfig(
    cloud_provider="aws",
    cloud_account_id="123456789012",
    cloud_region="eu-west-1",
    target_service="ec2",  # optional
)

# Multi-region: one config per region
configs = [
    CloudInjectorConfig(cloud_provider="aws", cloud_account_id="123456789012", cloud_region=r)
    for r in ("us-east-1", "eu-west-1", "ap-southeast-1")
]
pre = sm.compile_pre_execution_signatures(config=configs)
External
from pyoaev.signatures import ExternalInjectorConfig

cfg = ExternalInjectorConfig(
    query="port:22 os:linux",
    target_ipv4="203.0.113.5",      # optional
    target_hostname="ssh.example.com",  # optional
)
pre = sm.compile_pre_execution_signatures(config=cfg)
Compiled output shapes

compile_pre_execution_signatures returns a single flat dict for one config, or a list of dicts
for a list of configs. None fields are stripped.

# Network single target
{
    "start_time": "2024-06-26T06:06:06Z",
    "source_ipv4": "172.17.0.2",
    "target_ipv4": "10.0.0.1",
}

# Cloud single region
{
    "start_time": "2024-06-26T06:06:06Z",
    "cloud_provider": "aws",
    "cloud_account_id": "123456789012",
    "cloud_region": "eu-west-1",
    "target_service": "ec2",
}

# External
{
    "start_time": "2024-06-26T06:06:06Z",
    "target_ipv4": "203.0.113.5",
    "query": "port:22 os:linux",
}

compile_post_execution_signatures(pre, tool_output) preserves the input shape (dict in, dict out;
list in, list out) and adds end_time, execution_status, and optional partial_results.

# tool_output examples → execution_status
{}                                                  # -> "success"
{"status": "partial"}                               # -> "partial"
{"error_info": {"exit_code": 1}}                    # -> "failed"
{"timeout_info": {"partial_results": ["host-a"]}}   # -> "timeout"
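The sketch below shows how the tool_output examples above could map to execution_status, along with the shape-preserving merge. The precedence (timeout, then error, then an explicit status) is an assumption for illustration, as are the names `derive_execution_status` and `compile_post`; this is not the pyoaev implementation. A TypeError stands in for OpenAEVError.

```python
from datetime import datetime, timezone
from typing import Any, Union

Signature = dict[str, Any]


def derive_execution_status(tool_output: dict[str, Any]) -> str:
    """Map tool_output to execution_status (the precedence here is assumed)."""
    if not isinstance(tool_output, dict):
        # Stand-in for the OpenAEVError raised on malformed tool_output
        raise TypeError("tool_output must be a dict")
    if "timeout_info" in tool_output:
        return "timeout"
    if "error_info" in tool_output:
        return "failed"
    return tool_output.get("status", "success")


def compile_post(
    pre: Union[Signature, list[Signature]], tool_output: dict[str, Any]
) -> Union[Signature, list[Signature]]:
    """Dict in, dict out; list in, list out. Inputs are copied, not mutated."""
    status = derive_execution_status(tool_output)
    end_time = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    partial = (tool_output.get("timeout_info") or {}).get("partial_results")

    def merge(signature: Signature) -> Signature:
        merged = {**signature, "end_time": end_time, "execution_status": status}
        if partial:
            merged["partial_results"] = partial
        return merged

    if isinstance(pre, list):
        return [merge(signature) for signature in pre]
    return merge(pre)
```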

Anything in tool_output["extra_signatures"] is merged into the final dict verbatim, useful for
injector-specific fields like parent_process_name or custom signal types.

Failure modes
| Trigger | Result |
|---|---|
| Empty list passed to compile_pre_execution_signatures | ValueError |
| List mixing config types (e.g. Network + Cloud) | ValueError |
| NetworkInjectorConfig with zero or more than one identity field | ValidationError |
| build_network_configs item that is not a str, dict, or NetworkInjectorConfig | TypeError |
| Malformed tool_output in post-execution | OpenAEVError |
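The hypothetical compile_pre sketch below makes the shape rules and the two ValueError cases concrete. It uses dataclass stand-ins for the three configs; `NetworkCfg`, `CloudCfg`, and `ExternalCfg` are illustrative names. The fixed `source_ipv4` default replaces the real container IP resolution, which is out of scope here.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Optional, Union


@dataclass
class NetworkCfg:
    target_ipv4: Optional[str] = None
    target_ipv6: Optional[str] = None
    target_hostname: Optional[str] = None


@dataclass
class CloudCfg:
    cloud_provider: str
    cloud_account_id: str
    cloud_region: str
    target_service: Optional[str] = None


@dataclass
class ExternalCfg:
    query: str
    target_ipv4: Optional[str] = None
    target_hostname: Optional[str] = None


Config = Union[NetworkCfg, CloudCfg, ExternalCfg]


def compile_pre(
    config: Union[Config, list[Config]], source_ipv4: str = "172.17.0.2"
) -> Union[dict[str, Any], list[dict[str, Any]]]:
    """Hypothetical sketch of the pre-execution shape and failure rules."""
    configs = config if isinstance(config, list) else [config]
    if not configs:
        raise ValueError("at least one config is required")
    if len({type(c) for c in configs}) > 1:
        raise ValueError("cannot mix config types in a single call")
    # All targets of one inject share the same start_time
    start_time = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    compiled = []
    for cfg in configs:
        signature: dict[str, Any] = {"start_time": start_time}
        if isinstance(cfg, NetworkCfg):
            signature["source_ipv4"] = source_ipv4
        # Strip None fields so only populated values reach the payload
        signature.update({k: v for k, v in asdict(cfg).items() if v is not None})
        compiled.append(signature)
    return compiled if isinstance(config, list) else compiled[0]
```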
Lifecycle Flow
sequenceDiagram
    participant Injector
    participant SM as SignatureManager
    participant API as SignatureApiManager
    participant Backend

    Injector->>SM: compile_pre_execution_signatures(config)
    SM-->>Injector: pre_signatures dict/list

    Note over Injector: Tool executes...

    Injector->>SM: compile_post_execution_signatures(pre, tool_output)
    SM-->>Injector: merged signatures

    Injector->>SM: build_payload(post, expectation_types)
    SM-->>Injector: nested wire payload

    Injector->>SM: send_signatures(inject_id, phase, signatures)
    SM->>API: send_signatures(inject_id, phase, signatures)
    API->>API: validate + normalize + chunk if needed
    API->>Backend: POST /api/injects/{id}/callback
    Backend-->>API: 200/202
Transport Behaviour
  • Auto-chunking: payloads exceeding max_payload_size (default 1 MiB) are split by target and sent sequentially with chunk_index / total_chunks metadata.
  • Retry: 5xx errors trigger up to 3 retries with exponential backoff (1s, 2s, 4s).
  • No retry on 4xx: client errors raise SignatureTransmissionError immediately.
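The transport rules above can be sketched as two small functions. `chunk_by_target`, `send_with_retry`, and the injected `post`/`sleep` callables are hypothetical. Only the thresholds come from the description: 1 MiB, three retries at 1s, 2s, 4s, and no retry on 4xx. Placing chunk_index/total_chunks at the top level of each chunk is an assumption, and `SignatureTransmissionError` here is a local stand-in.

```python
import json
import time
from typing import Any, Callable


class SignatureTransmissionError(Exception):
    """Local stand-in for the error raised by the transport layer."""


def chunk_by_target(payload: dict[str, Any], max_payload_size: int = 1024 * 1024) -> list[dict[str, Any]]:
    """Greedily pack targets into chunks whose JSON size stays under the limit."""
    if len(json.dumps(payload).encode()) <= max_payload_size:
        return [payload]

    def with_targets(targets: list[Any]) -> dict[str, Any]:
        return {**payload, "expectation_signature": {"targets": targets}}

    chunks: list[list[Any]] = [[]]
    for target in payload["expectation_signature"]["targets"]:
        candidate = chunks[-1] + [target]
        # Size check ignores the few bytes added later by chunk_index/total_chunks;
        # a single target larger than the limit still ships alone
        if chunks[-1] and len(json.dumps(with_targets(candidate)).encode()) > max_payload_size:
            chunks.append([target])
        else:
            chunks[-1] = candidate
    total = len(chunks)
    return [
        {**with_targets(targets), "chunk_index": index, "total_chunks": total}
        for index, targets in enumerate(chunks)
    ]


def send_with_retry(
    post: Callable[[dict[str, Any]], int],
    body: dict[str, Any],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send once, retrying 5xx up to max_retries times (delays 1s, 2s, 4s by default)."""
    for attempt in range(max_retries + 1):
        status = post(body)
        if status < 400:
            return status
        if status < 500:
            # 4xx: the request itself is wrong, so retrying cannot help
            raise SignatureTransmissionError(f"client error {status}")
        if attempt < max_retries:
            sleep(base_delay * 2**attempt)
    raise SignatureTransmissionError(f"server error {status} after {max_retries} retries")
```

Injecting `sleep` and `post` keeps the retry loop testable without real delays or HTTP calls.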
Wire Format

Payloads follow the nested schema expected by the callback endpoint:

{
  "phase": "execution_complete",
  "expectation_signature": {
    "targets": [
      {
        "signature_values": [
          {
            "expectation_type": "DETECTION",
            "values": [
              { "signature_type": "source_ipv4", "signature_value": "172.17.0.2" },
              { "signature_type": "target_ipv4", "signature_value": "10.0.0.1" },
              { "signature_type": "start_time", "signature_value": "2024-06-26T06:06:06Z" },
              { "signature_type": "end_time", "signature_value": "2024-06-26T06:06:09Z" },
              { "signature_type": "execution_status", "signature_value": "success" }
            ]
          }
        ]
      }
    ]
  }
}
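The sketch below flattens post-execution dicts into this nested shape. It assumes one signature_values group per requested expectation type, each repeating the same observed values. `to_wire_payload` is a hypothetical helper, and it omits signature_target and chunk metadata.

```python
from typing import Any, Union


def to_wire_payload(
    post: Union[dict[str, Any], list[dict[str, Any]]],
    expectation_types: list[str],
    phase: str = "execution_complete",
) -> dict[str, Any]:
    """Hypothetical helper: flat post-execution dicts -> nested callback payload."""
    rows = post if isinstance(post, list) else [post]
    targets = []
    for row in rows:
        groups = [
            {
                "expectation_type": expectation_type.upper(),
                # Fresh list per group so later per-type edits cannot alias each other
                "values": [
                    {"signature_type": key, "signature_value": value}
                    for key, value in row.items()
                ],
            }
            for expectation_type in expectation_types
        ]
        targets.append({"signature_values": groups})
    return {"phase": phase, "expectation_signature": {"targets": targets}}
```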

Known signature_type labels live in pyoaev.signatures.SignatureTypes
(source_ipv4_address, target_hostname_address, cloud_region, query, ...). The wire format
itself accepts any string, so injectors are free to add custom types via tool_output.extra_signatures.

Utility
ip = sm.resolve_container_ip()  # "172.17.0.2" or "unknown" with a warning

Resolution strategy: CONTAINER_IP env var > socket.gethostbyname > hostname -i > "unknown".
The result is cached for the lifetime of the manager and IPv6 is sniffed best-effort alongside.
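Here is a sketch of that fallback chain with per-instance caching. `ContainerIpResolver` is a hypothetical name. Skipping loopback answers from `gethostbyname` is an assumption, since on some hosts the hostname resolves to a loopback address. The IPv6 sniffing mentioned above is left out.

```python
import logging
import os
import socket
import subprocess
from typing import Optional

logger = logging.getLogger(__name__)


class ContainerIpResolver:
    """Hypothetical sketch of the resolution order above; caches per instance."""

    def __init__(self) -> None:
        self._cached: Optional[str] = None

    def resolve(self) -> str:
        if self._cached is None:
            self._cached = self._lookup()
        return self._cached

    @staticmethod
    def _lookup() -> str:
        # 1. An explicit override wins
        env_ip = os.environ.get("CONTAINER_IP")
        if env_ip:
            return env_ip
        # 2. Resolve our own hostname (assumption: ignore loopback answers)
        try:
            ip = socket.gethostbyname(socket.gethostname())
            if not ip.startswith("127."):
                return ip
        except OSError:
            pass
        # 3. Shell out to `hostname -i` and take the first address printed
        try:
            result = subprocess.run(
                ["hostname", "-i"], capture_output=True, text=True, timeout=2, check=True
            )
            addresses = result.stdout.split()
            if addresses:
                return addresses[0]
        except (OSError, subprocess.SubprocessError):
            pass
        logger.warning("Could not resolve container IP, falling back to 'unknown'")
        return "unknown"
```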

Related issues

Checklist

  • I consider the submitted work as finished
  • I tested the code for its functionality
  • I wrote test cases for the relevant use cases
  • I added/updated the relevant documentation (either on github or on notion)
  • Where necessary I refactored code to improve the overall quality
  • For bug fix -> I implemented a test that covers the bug

Further comments

[Screenshot from 2026-05-25 17-44-17]

github-actions (bot) added the "filigran team" label (use to identify PR from the Filigran team) on May 27, 2026
Kakudou linked an issue on May 27, 2026 that may be closed by this pull request
Kakudou changed the title from "[ExpectationSignature] Add new ContractOutputType: ExpectationSignature #206" to "[ExpectationSignature] feat(ContractOutputType): ExpectationSignature( #206)" on May 27, 2026
Kakudou changed the title from "[ExpectationSignature] feat(ContractOutputType): ExpectationSignature( #206)" to "[client-python] feat(ContractOutputType): ExpectationSignature( #206)" on May 27, 2026
Kakudou changed the title from "[client-python] feat(ContractOutputType): ExpectationSignature( #206)" to "[client-python] feat(signature): ExpectationSignature( #206)" on May 27, 2026
Kakudou changed the title from "[client-python] feat(signature): ExpectationSignature( #206)" to "[client-python] feat(signature): first US for ExpectationSignature( #206)" on May 27, 2026
Kakudou changed the title from "[client-python] feat(signature): first US for ExpectationSignature( #206)" to "[client-python] feat(signature): first US for ExpectationSignature (#206)" on May 27, 2026
Kakudou (Member, Author) commented May 27, 2026

I edited the initial message to reflect the new behavior based on reviews and usage.
The way we defined inject_config and the category was tedious to use, so instead I've created 3 new models:
NetworkInjectorConfig, CloudInjectorConfig and ExternalInjectorConfig. Which one you use (they can't be mixed) defines the type of injector.

For NetworkInjectorConfig, a builder can also quickly create configs from a list of targets (for example from Targets.extract_targets().targets) or from dicts:

build_network_configs(["10.0.0.1", "2001:db8::1", "web.example.com"])
build_network_configs([{"target_ipv4": "10.0.0.1"}])

guzmud (Member) commented May 28, 2026

Quick question @Kakudou about adc201b: wouldn't it make sense to update main in a separate PR (CI-oriented, using the dev requirements as you suggested) and rebase this one? (Now that we are in rolling release, we can use main like that.)

Review thread on pyoaev/signatures/models.py (outdated)

class NetworkInjectorConfig(BaseModel):
    """A single network target. Exactly one of ``target_ipv4``, ``target_ipv6``, or ``target_hostname``."""

Reviewer (Member) commented:

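Should we use a pydantic model_validator (possibly with mode="before") to ensure exactly one of those three options is set?

Something like:

@model_validator(mode="before")
@classmethod
def check_one(cls, data):
    # Exactly one identity field must be set; raise rather than assert, since python -O strips asserts
    identity_fields = ("target_ipv4", "target_ipv6", "target_hostname")
    if isinstance(data, dict) and sum(data.get(key) is not None for key in identity_fields) != 1:
        raise ValueError("exactly one of target_ipv4, target_ipv6, target_hostname is required")
    return data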

guzmud: this comment was marked as outdated.

guzmud (Member) commented Jun 2, 2026

@Kakudou to keep you updated:

When you talk about send_signatures, it seems to be limited to a single expectation type, cf. your example:

{
  "phase": "execution_complete",
  "expectation_signature": {
    "targets": [
      {
        "signature_target": { "agent": "...", "asset": "...", "asset_group": "..." },
        "signature_values": [
          {
            "expectation_type": "DETECTION",
            "values": [
              { "signature_type": "source_ipv4", "signature_value": "172.17.0.2" },
              { "signature_type": "target_ipv4", "signature_value": "10.0.0.1" },
              { "signature_type": "start_time", "signature_value": "2024-06-26T06:06:06Z" },
              { "signature_type": "end_time", "signature_value": "2024-06-26T06:06:09Z" },
              { "signature_type": "execution_status", "signature_value": "success" }
            ]
          }
        ]
      }
    ]
  }
}

Yet according to the engineering done beforehand (as in US-5), multiple expectation types can appear there:

  "expectation_signatures": {
    "DETECTION": {
      "source_ipv4": "172.18.0.5",
      "target_ipv4": "192.168.1.10",
      "target_hostname": "webserver.corp.local",
      "start_time": "2026-04-20T10:00:00Z",
      "end_time": "2026-04-20T10:05:30Z"
    },
    "PREVENTION": {
      "source_ipv4": "172.18.0.5",
      "target_ipv4": "192.168.1.10",
      "start_time": "2026-04-20T10:00:00Z",
      "end_time": "2026-04-20T10:05:30Z"
    },
    "VULNERABILITY": {
      "cves_tested": ["CVE-2023-1234"],
      "cves_found_vulnerable": ["CVE-2023-1234"],
      "target_ipv4": "192.168.1.10"
    }
  },

You anticipated that properly in your models: SignaturePayload does expect a list of TargetSignatures, and inside each of those, signature_values is a list of ExpectationSignatureGroup:

class TargetSignatures(BaseModel):
  """A target plus everything observed about it, grouped by expectation."""

  model_config = ConfigDict(extra="allow")

  signature_target: SignatureTarget
  signature_values: list[ExpectationSignatureGroup]


class SignaturePayload(BaseModel):
  """Inner ``signatures`` body: a list of targets, nothing else."""

  model_config = ConfigDict(extra="allow")

  targets: list[TargetSignatures]

But this doesn't carry over to your build_payload function, where expectation_type is a single value:

    def build_payload(
        self,
        post_signatures: dict[str, Any] | list[dict[str, Any]],
        targets_meta: dict[str, str] | list[dict[str, str]],
        expectation_type: str = "DETECTION",
    ) -> dict[str, Any]:

which leads to a single ExpectationSignatureGroup per signature target:

            targets.append(
                TargetSignatures(
                    signature_target=SignatureTarget(**meta),
                    signature_values=[
                        ExpectationSignatureGroup(
                            expectation_type=expectation_type, values=values
                        )
                    ],
                )
            )

I've made a small commit 8b7848f in order to allow for multiple expectation types in the build_payload.

cc @Megafredo @mariot

guzmud force-pushed the feat/us-int-1-signature-lifecycle branch from 8b7848f to 9af20a8 on June 2, 2026 15:03
Review thread on pyoaev/signatures/models.py (outdated)
guzmud force-pushed the feat/us-int-1-signature-lifecycle branch from d162a92 to 68a7c63 on June 4, 2026 08:09
guzmud changed the title from "[client-python] feat(signature): first US for ExpectationSignature (#206)" to "feat(signature): first US for ExpectationSignature (#206)" on Jun 4, 2026
codecov (bot) commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 85.60000% with 54 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.04%. Comparing base (181cffd) to head (a76467d).

| File with missing lines | Patch % | Lines |
|---|---|---|
| pyoaev/signatures/signature_manager.py | 76.98% | 29 Missing ⚠️ |
| pyoaev/apis/signature.py | 88.46% | 15 Missing ⚠️ |
| pyoaev/signatures/models.py | 90.09% | 10 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #257      +/-   ##
==========================================
+ Coverage   70.34%   73.04%   +2.69%     
==========================================
  Files          49       53       +4     
  Lines        1966     2341     +375     
==========================================
+ Hits         1383     1710     +327     
- Misses        583      631      +48     
| Flag | Coverage Δ |
|---|---|
| connectors | 73.04% <85.60%> (+2.69%) ⬆️ |

Flags with carried forward coverage won't be shown.


Review thread on pyoaev/signatures/signature_manager.py (outdated)
post_signatures = [post_signatures]

# Validate expectation types
expectation_types_valid = [
guzmud (Member) commented Jun 5, 2026

I see your point @Megafredo and it makes sense: if an expectation type isn't valid, this will raise an error (which is the objective 👍).

But when we iterate over it later on line 214 (for expectation_type in expectation_types_valid:), we use it for two things:

  1. extra_signatures[expectation_type] (meaning the key of the dict is ExpectationType.DETECTION; usually we prefer str as keys afaik 🤔)
  2. expectation_type.value (so the expectation_type value originally provided as input to the function, if it passes the check)

So I've got two suggestions that seem a bit better, if you're OK with them.

Option 1

  • On line 200, we just check without keeping the value: [ExpectationType(expectation_type.upper()) for expectation_type in expectation_types]. This raises an error if a type is invalid, so we're good.
  • On line 229, it's just expectation_type without the .value
  • the input dict extra_signature would have strings rather than ExpectationType as keys

Option 2

Basically, we do the heavy lifting outside of signature_manager.py

  • We add a new model in pyoaev.signatures.models, something like ExtraSignaturesData
  • We either hardcode parameters in the model, such as detection: dict[str, JsonValue], prevention: dict[str, JsonValue], vulnerabilities: dict[str, JsonValue]
  • or we use a trick I suggested on the collector side (see below) that checks the key names on the fly in the pydantic model
  • in signature_manager.py, we just type extra_signature as the new ExtraSignaturesData rather than a dict
  • you don't have to check here whether the keys match ExpectationType (the model does it, one way or another), and developers get a model to refer to, which is easier than reading this function

guzmud's opinion

On my side, I prefer option 2 or something like that because it splits the issue of formatting/serializing properly the input (and it's done models side, plus it's easier to document and so forth) and the issue of using the data (which is the point of the current function).

on-the-fly checking

The trick in question is to check on the fly that the parameters are named after signature types. In our current case, they would be ExpectationType values, so we'd need to change the definition of _allowed_values a bit, but here's the rough idea:

from typing import Any, ClassVar

from pydantic import BaseModel, Field, model_validator

from pyoaev.signatures import SignatureTypes  # import path assumed for the collector side


class OAEVData(BaseModel, extra="allow"):
    """
    Source-side version of OAEV formatted data.
    Apart from context, the allowed fields are signature types (e.g. parent_process_name)
    """

    __pydantic_extra__: dict[str, str] = Field(default_factory=dict)
    _allowed_values: ClassVar[frozenset[str]] = frozenset(
        [sig.value for sig in SignatureTypes]
    )

    @model_validator(mode="before")
    @classmethod
    def check_field_names(cls, data: Any) -> Any:
        """Check whether the fields provided through extra are actually signature types"""
        if isinstance(data, dict):
            for key in data:
                if key not in cls._allowed_values:
                    raise ValueError(f"Only signature types are allowed, got {key!r}")
        return data

Note that this trick may confuse IDE and could make auto-documentation a bit harder, so hardcoding the parameters (prevention, detection, etc.) could be just simpler and better.

guzmud (Member) commented Jun 5, 2026

To be more straightforward,

in pyoaev.signatures.models,

from pydantic import BaseModel, JsonValue, Field
class ExtraSignatureData(BaseModel):
    detection: dict[str, JsonValue] | None = Field(default_factory=dict)
    prevention: dict[str, JsonValue] | None = Field(default_factory=dict)
    vulnerability: dict[str, JsonValue] | None = Field(default_factory=dict)

and in pyoaev.signatures.signature_manager,

def build_payload(
    self,
    post_signatures: dict[str, Any] | list[dict[str, Any]],
    expectation_types: list[str],
    extra_signature: ExtraSignatureData | None = None,  # typed as the new model instead of a dict
) -> dict[str, Any]:
    ...

The question is how to easily access the value once inside the function. We can't do signature_data.update(extra_signatures[expectation_type]) since extra_signatures is not a dict anymore.

We have multiple options, but after a bit of thought, this seems to be the easiest one for the developer experience (although it requires us to maintain a few more hardcoded things).

from pydantic import BaseModel, Field, JsonValue


class ExtraSignatureData(BaseModel):
    detection: dict[str, JsonValue] | None = Field(default_factory=dict)
    prevention: dict[str, JsonValue] | None = Field(default_factory=dict)
    vulnerability: dict[str, JsonValue] | None = Field(default_factory=dict)

    def get_extra(self, expectation_type: str) -> dict[str, JsonValue] | None:
        # Normalize so "DETECTION" (the ExpectationType value) also matches the lowercase field
        field_name = expectation_type.lower()
        if field_name == "detection":
            return self.detection
        if field_name == "prevention":
            return self.prevention
        if field_name == "vulnerability":
            return self.vulnerability
        raise ValueError(
            "Expectation type should be one of the available parameters: "
            f"{list(type(self).model_fields)}"  # pydantic v2 replacement for __fields__
        )

guzmud force-pushed the feat/us-int-1-signature-lifecycle branch from f5d832c to fb74062 on June 5, 2026 15:08
guzmud (Member) commented Jun 5, 2026

Nota bene for @mariot and @Kakudou!

To get better control over extra_signatures and to be able to distribute them according to expectations, @Megafredo and I:

  • moved extra_signatures out of OutputTool (which now focuses on serializing metadata about a tool run)
  • changed the design of extra signatures so they are attached to one or more expectation types (since some extras only make sense for specific expectation types)
  • created a new ExtraSignatureData model to match this new design (with ExtraSignatureData.detection, ExtraSignatureData.prevention, ExtraSignatureData.vulnerabilities)
  • added this new model as an input for the build_payload function
  • and used pydantic.JsonValue rather than str in a couple of places to better match the specs

This gives us the ability to spread the extra signatures into the various expectation types while providing a stronger typing/definition for the dev-experience.
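Here is a dependency-free sketch of spreading extras per expectation type. It assumes extras are keyed by lowercase expectation name, as in the ExtraSignatureData proposal. `ExtraSignatures` (a dataclass stand-in for that pydantic model) and `build_groups` are hypothetical. The point is that each group starts from a fresh copy of the base signatures, so VULNERABILITY-only fields never leak into DETECTION.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ExtraSignatures:
    """Hypothetical dataclass stand-in for the proposed ExtraSignatureData model."""

    detection: dict[str, Any] = field(default_factory=dict)
    prevention: dict[str, Any] = field(default_factory=dict)
    vulnerability: dict[str, Any] = field(default_factory=dict)

    def get_extra(self, expectation_type: str) -> dict[str, Any]:
        name = expectation_type.lower()
        if name not in ("detection", "prevention", "vulnerability"):
            raise ValueError(f"unknown expectation type: {expectation_type}")
        return getattr(self, name)


def build_groups(
    post: dict[str, Any],
    expectation_types: list[str],
    extra: Optional[ExtraSignatures] = None,
) -> list[dict[str, Any]]:
    """One group per expectation type, each with only its own extras merged in."""
    groups = []
    for expectation_type in expectation_types:
        signature_data = dict(post)  # copy so extras never leak across types
        if extra is not None:
            signature_data.update(extra.get_extra(expectation_type))
        groups.append(
            {
                "expectation_type": expectation_type.upper(),
                "values": [
                    {"signature_type": key, "signature_value": value}
                    for key, value in signature_data.items()
                ],
            }
        )
    return groups
```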

…tation type through the build_payload (#206)

rather than handling global extra signatures through tool output in
post_exec_compile
guzmud force-pushed the feat/us-int-1-signature-lifecycle branch from fb74062 to a76467d on June 5, 2026 15:17

Labels

filigran team use to identify PR from the Filigran team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add new ContractOutputType: ExpectationSignature

4 participants