Skip to content

feat: add language variant support#286

Open
userlaurin wants to merge 13 commits intospcl:masterfrom
userlaurin:feature/language-variants
Open

feat: add language variant support#286
userlaurin wants to merge 13 commits intospcl:masterfrom
userlaurin:feature/language-variants

Conversation

@userlaurin
Copy link
Contributor

@userlaurin userlaurin commented Feb 27, 2026

  • Add --language-variant CLI flag to local start and experiment invoke
  • Select variant-tagged Docker images (run.local.lang.variant.ver)
  • Apply per-variant patch files (benchmarks/bm/lang/variant/patch.diff) during code packaging
  • Extend cache keys to include variant; store variant in FunctionConfig
  • Add build_docker_images.py support for variant image tags
  • Added requirement for language variant patches "patch-ng" that can apply a patch for a variant to the default

Summary by CodeRabbit

  • New Features

    • Added language variants support (Bun, LLRT, PyPy) with --language-variant CLI option
    • Enhanced caching and container management for variant-aware deployments
  • Documentation

    • Added comprehensive Language Variants deployment and integration guide

- Add --language-variant CLI flag to local start and experiment invoke
- Select variant-tagged Docker images (run.local.<lang>.<variant>.<ver>)
- Apply per-variant patch files (benchmarks/<bm>/<lang>/<variant>/patch.diff)
  during code packaging
- Extend cache keys to include variant; store variant in FunctionConfig
- Add build_docker_images.py support for variant image tags
@coderabbitai
Copy link

coderabbitai bot commented Feb 27, 2026

Warning

Rate limit exceeded

@userlaurin has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 19 minutes and 55 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 400b02e1-447d-4726-a5f3-9ae7a936c8e7

📥 Commits

Reviewing files that changed from the base of the PR and between e66a5c6 and 3d64355.

📒 Files selected for processing (5)
  • config/nodejs.json
  • config/storage.json
  • docs/build.md
  • sebs/cpp_dependencies.py
  • sebs/faas/function.py
📝 Walkthrough

Walkthrough

Introduces language variant support enabling multiple runtime implementations (Bun, LLRT, PyPy) per language. Adds variant configuration, deserialization, caching, and deployment logic across the benchmarking framework. Removes deprecated C++ dependencies module and storage configuration.

Changes

Cohort / File(s) Summary
Variant Core Infrastructure
sebs/benchmark.py, sebs/faas/function.py, sebs/config.py
New LanguageSpec class models languages with variants. BenchmarkConfig replaced languages with language_specs and added supported_variants() and supports() methods. Benchmark extended with language_variant and cache_language_key properties; hash_directory() now accepts variant parameter. New Variant and Architecture classes in function.py with per-language enum namespaces. Runtime dataclass updated to carry variant field with language-specific defaults.
Cache and Configuration Updates
sebs/cache.py, sebs/config.py
Cache layer updated to use cache_language_key instead of language_name in code package operations. SeBSConfig.supported_variants() method added to retrieve variant lists per deployment and language.
CLI and Dependencies
sebs.py, requirements.txt
Added --language-variant CLI parameter and propagated through config as experiments.runtime.variant. Added patch-ng dependency for variant patch application.
Provider Implementations
sebs/aws/aws.py, sebs/local/local.py
Minor formatting in AWS provider. Local provider updated to extend container names with variant suffix when non-default.
Documentation
docs/build.md
New Language Variants section covering patch/overlay variant strategies, registration in systems.json, Dockerfile placement, image tagging, cache isolation, and variant addition checklist with examples.
Removals
sebs/cpp_dependencies.py, config/storage.json
Deleted entire C++ dependencies module with CppDependencyConfig, CppDependencies enum, and Docker/CMake generation utilities. Removed storage configuration (minio and scylladb sections).

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI (sebs.py)
    participant BenchConfig as BenchmarkConfig
    participant Benchmark as Benchmark
    participant Runtime as Runtime
    participant FunctionConfig as FunctionConfig
    participant Variant as Variant Resolution

    CLI->>BenchConfig: parse language_variant param
    CLI->>BenchConfig: deserialize language_specs
    BenchConfig->>Variant: validate supports(language, variant)
    Variant-->>BenchConfig: variant supported
    BenchConfig->>Benchmark: initialize with config
    Benchmark->>Benchmark: set language_variant from config.runtime.variant
    Benchmark->>Benchmark: compute cache_language_key (language+variant)
    Benchmark->>Benchmark: hash_directory(dir, deploy, language, variant)
    Benchmark->>Benchmark: apply patch or overlay based on variant
    FunctionConfig->>Runtime: construct Runtime with variant
    Runtime->>Variant: for_language(language, variant_value)
    Variant-->>Runtime: assign language-specific enum
    Runtime-->>FunctionConfig: Runtime with variant field
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • mcopik

Poem

🐰 Variants hop in, patch by patch,
Each language finds its perfect match!
PyPy, Bun, and LLRT flow,
Cache keys jingle, fast they go!
A rabbit's blessing on this quest—
Runtime choices, truly blessed! 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: add language variant support' accurately and concisely summarizes the main change—introducing language variant support across the codebase.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

laurin and others added 3 commits February 27, 2026 15:26
- Replace flat Variant enum with nested namespace class (Variant.Python,
  Variant.NodeJS) keyed by Language in Variant._LANG_MAP; new languages
  add one nested Enum + one map entry
- Fix Runtime.__post_init__ to skip variant resolution when language is
  None (used as sentinel in ExperimentConfig)
- Fix FunctionConfig._from_benchmark to carry variant through to Runtime
- Remove FIXME comments and unnecessary forward-reference quotes now that
  benchmark.py has from __future__ import annotations
- Document patch-based vs overlay-based variant formats in benchmark.py
- Simplify build_variant_language to delegate to build() so SeBS version
  suffix and future push support are handled consistently
@userlaurin userlaurin marked this pull request as ready for review March 2, 2026 14:45
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tools/build_docker_images.py (1)

47-55: ⚠️ Potential issue | 🔴 Critical

Critical: Image tag mismatch with local.py will cause container startup failures.

The build script appends the SeBS version suffix to image tags (line 55), producing tags like:

run.local.nodejs.bun.18-1.2.0

However, sebs/local/local.py lines 149-160 construct container names without this suffix:

# For variant != "default"
container_name = "{}:run.local.{}.{}.{}".format(
    self._system_config.docker_repository(),
    code_package.language_name,
    variant,
    code_package.language_version,
)

This produces: run.local.nodejs.bun.18

Docker will fail to find the images when starting containers because the tags don't match. Either the build script must stop appending the SeBS version suffix to tags, or local.py must be updated to include the suffix when referencing built images.

🧹 Nitpick comments (2)
tools/build_docker_images.py (1)

102-115: Consider extracting shared config iteration logic.

The config iteration logic (lines 104-112) duplicates build_language (lines 84-92). While acceptable for a small script, extracting this could reduce maintenance burden.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/build_docker_images.py` around lines 102 - 115, The loop that builds
the version/base_image list is duplicated between build_variant_language and
build_language; extract that logic into a helper (e.g., get_language_configs or
iterate_language_configs) that accepts language_config and args.language_version
and returns the list of (version, base_image) pairs (handling the "base_images"
-> "x64" branch and the fallback [None, None]); then replace the duplicated
loops in build_variant_language and build_language to call this helper and
iterate its result before calling build("run", ...) so behavior remains
identical.
docs/build.md (1)

117-119: Add language specifiers to fenced code blocks.

Static analysis flagged several code blocks showing directory structures without a language specifier. Consider using text or plaintext for consistency:

  • Lines 117-119
  • Lines 128-135
  • Lines 147-154
  • Lines 195-197
  • Lines 201-205
  • Lines 209-211
Example fix
-```
+```text
 benchmarks/<id>/<language>/<variant>/
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/build.md around lines 117 - 119, Several fenced code blocks (e.g., the
block containing the literal "benchmarks////") are
missing language specifiers; update each of the listed fenced code blocks to use
a language token such as text or plaintext (for example change totext)
so static analysis stops flagging them; locate the exact fence instances by
searching for the literal snippets shown in the diff (e.g.,
"benchmarks////") and apply the same fix to the other
blocks mentioned in the comment.


</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @docs/build.md:

  • Around line 170-178: sebs/config.py currently uses the wrong key when reading
    language variants: update the SeBSConfig.supported_variants() (or the method
    that returns supported variants) to read "variant_images" from the languages
    entry instead of "supported_variants" so it matches docs and
    tools/build_docker_images.py; change the get call that currently does
    languages.get(language_name, {}).get("supported_variants", ["default"]) to use
    "variant_images" with the same default (["default"]) so configured
    variant_images are respected throughout the codebase.

In @sebs/aws/aws.py:

  • Around line 135-136: The comparison and dict lookup are using language.value
    (a string) against Language enum members and CONFIG_FILES keys (which are enum
    members), causing mismatches and KeyError; change the condition to compare the
    enum directly (e.g., use language == Language.PYTHON or language in
    (Language.PYTHON, Language.NODEJS)) and use the enum language (not
    language.value) when indexing CONFIG_FILES to retrieve package_config;
    alternatively, if CONFIG_FILES must use strings, normalize keys consistently but
    prefer the simpler fix of comparing and indexing with the Language enum
    (symbols: language, Language, CONFIG_FILES, package_config).
  • Line 155: The comparison uses language.value (a string) against an enum member
    (Language.CPP), causing mismatches; update the check in the same block (the elif
    that currently reads language.value == Language.CPP) to compare like-for-like —
    either compare the enum instance (language == Language.CPP) or compare the
    string value to the enum's value (language.value == Language.CPP.value) so the
    condition correctly detects the C++ branch.
  • Around line 177-182: The comparisons are incorrectly using language.value (a
    string) against Language enum members; update the conditional checks to compare
    the enum itself (e.g., change checks to language == Language.NODEJS, language ==
    Language.CPP, and language in [Language.PYTHON]) and keep using language.value
    when building return strings (e.g., f"{language.value}{language_version}" or
    f"{language.value}{language_version}.x" as appropriate) so the logic in aws.py
    matches enum semantics.

In @sebs/benchmark.py:

  • Around line 264-266: The hash calculation is passing a string
    (self.language_name) into Benchmark.hash_directory which expects a Language enum
    to index FILES and WRAPPERS, causing a KeyError; change the call to pass the
    Language enum (e.g., self.language or convert the name to the enum via
    Language[self.language_name] or Language(self.language_name) depending on how
    Language is defined) so Benchmark.hash_directory receives a Language enum rather
    than a string; ensure any other callers also pass the enum and update the call
    site that currently uses self.language_name accordingly.
  • Around line 53-61: LanguageSpec.deserialize currently imports Language from
    sebs.faas.function (wrong enum), causing missing CPP support; remove the local
    import in deserialize and use the module-level Language (sebs.types.Language)
    already imported at module scope so LanguageSpec.deserialize(...) resolves CPP
    and other language variants correctly, and ensure both branches (str and dict)
    call Language.deserialize from the module-level enum.

In @sebs/config.py:

  • Around line 54-63: The supported_variants method in sebs/config.py is reading
    the wrong key from the systems config; update the lookup inside
    supported_variants (the languages variable usage) to read "variant_images"
    instead of "supported_variants" so it matches tools/build_docker_images.py and
    docs, while preserving the fallback of ["default"]; adjust the get call on
    languages.get(language_name).get(...) to use "variant_images" and leave behavior
    otherwise unchanged.

In @sebs/faas/function.py:

  • Around line 335-339: The error message in deserialize(val: str) for
    Architecture is misleading because it logs the loop variable member instead of
    the invalid input; update the exception to include the original val (e.g., raise
    Exception(f"Unknown architecture type {val}")) so the error reports the actual
    bad config value; locate the deserialize function and replace the exception
    message that currently references member with one referencing val.
  • Around line 271-324: Remove the locally defined Language enum so the module
    uses the imported sebs.types.Language (avoids enum identity mismatches), update
    Variant._LANG_MAP to include a mapping for Language.CPP -> Variant., and ensure Variant.for_language continues
    to use the mapped enum class for lookups; also fix Architecture.deserialize to
    reference the input parameter (val) in its error message instead of the
    undefined variable member.

In @sebs/local/local.py:

  • Around line 147-160: The container image tag is missing the SeBS version
    suffix; retrieve the SeBS version from system config (e.g., sebs_version =
    self._system_config.sebs_version()) and include it in the formatted
    container_name strings so the tag matches build_docker_images.py output. Update
    both branches that set container_name (the two format calls that use
    self._system_config.docker_repository(), code_package.language_name, variant,
    and code_package.language_version) to insert the sebs_version token (e.g., as
    .{sebs_version} before the language_version) so both default and variant paths
    produce identical SeBS-versioned tags.

Nitpick comments:
In @docs/build.md:

  • Around line 117-119: Several fenced code blocks (e.g., the block containing
    the literal "benchmarks////") are missing language
    specifiers; update each of the listed fenced code blocks to use a language token
    such as text or plaintext (for example change totext) so static analysis
    stops flagging them; locate the exact fence instances by searching for the
    literal snippets shown in the diff (e.g.,
    "benchmarks////") and apply the same fix to the other
    blocks mentioned in the comment.

In @tools/build_docker_images.py:

  • Around line 102-115: The loop that builds the version/base_image list is
    duplicated between build_variant_language and build_language; extract that logic
    into a helper (e.g., get_language_configs or iterate_language_configs) that
    accepts language_config and args.language_version and returns the list of
    (version, base_image) pairs (handling the "base_images" -> "x64" branch and the
    fallback [None, None]); then replace the duplicated loops in
    build_variant_language and build_language to call this helper and iterate its
    result before calling build("run", ...) so behavior remains identical.

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: defaults

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `89084e4e-de4a-4e55-a9fb-905420aca9aa`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between ce1dacbbbad71f3fce877f7adcd15f131b6928e9 and 2aba9eacf9d7af2ab1863e769a87ddb1bb9d3488.

</details>

<details>
<summary>📒 Files selected for processing (10)</summary>

* `docs/build.md`
* `requirements.txt`
* `sebs.py`
* `sebs/aws/aws.py`
* `sebs/benchmark.py`
* `sebs/cache.py`
* `sebs/config.py`
* `sebs/faas/function.py`
* `sebs/local/local.py`
* `tools/build_docker_images.py`

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

laurin added 2 commits March 4, 2026 14:54
Language was defined twice - once in sebs/types.py (str, Enum) and again
in sebs/faas/function.py (plain Enum, missing CPP). This caused identity
comparisons like `language == Language.NODEJS` to silently fail, requiring
the workaround of comparing `.value` instead which was removed again in this commit.

Remove the duplicate definition and update all import sites in benchmark.py
to use the canonical Language from sebs/types.py.
…) to get back to the current handling of the master repo
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
sebs/faas/function.py (2)

323-327: ⚠️ Potential issue | 🟡 Minor

Use val in the invalid-architecture error message.

Line 327 reports {member} instead of the bad input, so failures show the wrong value.

Proposed fix
-        raise Exception(f"Unknown architecture type {member}")
+        raise Exception(f"Unknown architecture type {val}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/faas/function.py` around lines 323 - 327, In deserialize(val: str) in
function.py the raised Exception reports the last loop variable member instead
of the invalid input; change the error to include the provided val (e.g., raise
Exception(f"Unknown architecture type {val}")) so the message shows the actual
bad input when Architecture parsing fails; keep function and Architecture
references unchanged.

309-312: ⚠️ Potential issue | 🔴 Critical

Add C++ default variant mapping to prevent runtime breakage.

Line 309-312 omits Language.CPP, so Variant.for_language(...) fails for C++ in both Runtime.deserialize (Line 355) and FunctionConfig._from_benchmark (Line 374).

Proposed fix
 class Variant:
@@
     class NodeJS(Enum):
         DEFAULT = "default"
         BUN = "bun"
         LLRT = "llrt"
+
+    class CPP(Enum):
+        DEFAULT = "default"
@@
 Variant._LANG_MAP: Dict[Language, Type[Enum]] = {
     Language.PYTHON: Variant.Python,
     Language.NODEJS: Variant.NodeJS,
+    Language.CPP: Variant.CPP,
 }

Also applies to: 351-356, 371-375

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/faas/function.py` around lines 309 - 312, The mapping Variant._LANG_MAP
is missing an entry for C++ which causes Variant.for_language(...) to fail in
Runtime.deserialize and FunctionConfig._from_benchmark; add Language.CPP:
Variant.Cpp (or the actual C++ Variant class name used) to Variant._LANG_MAP
alongside Language.PYTHON and Language.NODEJS so Variant.for_language can
resolve C++ variants correctly, and update any other similar maps around
Runtime.deserialize and FunctionConfig._from_benchmark if duplicated.
sebs/benchmark.py (1)

262-264: ⚠️ Potential issue | 🔴 Critical

Normalize language key type usage (Enum vs string) in hashing and overlay copy paths.

Line 263 passes self.language_name (string), while Line 355 and Line 376 index enum-keyed maps; Line 475 also switches to string-key indexing. This inconsistency can break variant hashing/copy behavior.

Proposed fix
-        self._hash_value = Benchmark.hash_directory(
-            path, self._deployment_name, self.language_name, self._language_variant
-        )
+        self._hash_value = Benchmark.hash_directory(
+            path, self._deployment_name, self.language, self._language_variant
+        )
@@
-    def hash_directory(directory: str, deployment: str, language: str, variant: str = "default"):
+    def hash_directory(
+        directory: str, deployment: str, language: Language, variant: str = "default"
+    ):
@@
             wrappers = project_absolute_path(
-                "benchmarks", "wrappers", deployment, language, wrapper
+                "benchmarks", "wrappers", deployment, language.value, wrapper
             )
@@
-                for file_type in FILES[self.language_name]:
+                for file_type in FILES[self.language]:

Also applies to: 341-355, 377-379, 475-476

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/benchmark.py` around lines 262 - 264, The code mixes enum keys and
string keys when building hashes and accessing language-indexed maps: normalize
to a single key type by converting language_name to the Enum (or vice versa)
before use; update Benchmark.hash_directory calls and any places that index
language-keyed dicts (e.g., places that use self.language_name, overlay copy
logic, and any map lookups referenced around hash computation and copy paths) to
consistently use the Enum member (or consistently use its .name/.value) across
Benchmark.hash_directory, the overlay copy routines, and all map accesses so the
same key type is used everywhere.
🧹 Nitpick comments (1)
sebs/benchmark.py (1)

50-51: Return a copy from LanguageSpec.variants to protect internal state.

Exposing the backing list allows callers to mutate benchmark config state implicitly.

Proposed refactor
     `@property`
     def variants(self) -> List[str]:
-        return self._variants
+        return list(self._variants)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/benchmark.py` around lines 50 - 51, LanguageSpec.variants currently
returns the internal list self._variants directly, exposing mutable internal
state; change the method to return a shallow copy (e.g., return
list(self._variants) or self._variants.copy()) so callers cannot mutate the
LanguageSpec._variants backing list, and ensure the return type remains
List[str] to preserve the signature.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@sebs/faas/function.py`:
- Line 315: A new local enum class named Architecture (class Architecture(Enum))
conflicts with an imported Architecture symbol (imported at the top of the
file), causing an F811 redefinition; fix by either removing the imported
Architecture from the imports and updating any external references accordingly,
or rename the local enum (e.g., LocalArchitecture or BackendArchitecture) and
update all local references in this file (constructors, type hints, comparisons,
and any usages of Architecture in sebs/faas/function.py) to the new name so
there is no name collision.

---

Duplicate comments:
In `@sebs/benchmark.py`:
- Around line 262-264: The code mixes enum keys and string keys when building
hashes and accessing language-indexed maps: normalize to a single key type by
converting language_name to the Enum (or vice versa) before use; update
Benchmark.hash_directory calls and any places that index language-keyed dicts
(e.g., places that use self.language_name, overlay copy logic, and any map
lookups referenced around hash computation and copy paths) to consistently use
the Enum member (or consistently use its .name/.value) across
Benchmark.hash_directory, the overlay copy routines, and all map accesses so the
same key type is used everywhere.

In `@sebs/faas/function.py`:
- Around line 323-327: In deserialize(val: str) in function.py the raised
Exception reports the last loop variable member instead of the invalid input;
change the error to include the provided val (e.g., raise Exception(f"Unknown
architecture type {val}")) so the message shows the actual bad input when
Architecture parsing fails; keep function and Architecture references unchanged.
- Around line 309-312: The mapping Variant._LANG_MAP is missing an entry for C++
which causes Variant.for_language(...) to fail in Runtime.deserialize and
FunctionConfig._from_benchmark; add Language.CPP: Variant.Cpp (or the actual C++
Variant class name used) to Variant._LANG_MAP alongside Language.PYTHON and
Language.NODEJS so Variant.for_language can resolve C++ variants correctly, and
update any other similar maps around Runtime.deserialize and
FunctionConfig._from_benchmark if duplicated.

---

Nitpick comments:
In `@sebs/benchmark.py`:
- Around line 50-51: LanguageSpec.variants currently returns the internal list
self._variants directly, exposing mutable internal state; change the method to
return a shallow copy (e.g., return list(self._variants) or
self._variants.copy()) so callers cannot mutate the LanguageSpec._variants
backing list, and ensure the return type remains List[str] to preserve the
signature.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 399467d7-c174-4ceb-b76c-b7fdbd23409b

📥 Commits

Reviewing files that changed from the base of the PR and between 2aba9ea and 44b54b4.

📒 Files selected for processing (3)
  • sebs/aws/aws.py
  • sebs/benchmark.py
  • sebs/faas/function.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • sebs/aws/aws.py

@userlaurin userlaurin requested a review from mcopik March 4, 2026 20:49
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
sebs/benchmark.py (1)

633-635: ⚠️ Potential issue | 🟠 Major

Variant cache hash misses versioned Node package files.

package.json.<version> is used in overlay variants (Line [789]-Line [793]), but the hash input patterns (Line [634], Line [659]) exclude it. Cache entries can stay “valid” after variant dependency changes.

Proposed fix
         FILES = {
             Language.PYTHON: ["*.py", "requirements.txt*"],
-            Language.NODEJS: ["*.js", "package.json"],
+            Language.NODEJS: ["*.js", "package.json*"],
             Language.CPP: ["*.cpp", "*.hpp", "dependencies.json"],
         }

Also applies to: 649-662, 789-793

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/benchmark.py` around lines 633 - 635, The NodeJS hash input patterns
omit versioned package files (e.g., package.json.<version>) so variant cache
hashes miss changes; update the Language.NODEJS pattern lists used for hashing
to include versioned package files by adding "package.json.*" (or
"package.json*") alongside "package.json" wherever the hash input patterns are
defined (references: Language.NODEJS and the lists around the existing ["*.js",
"package.json"] entries and the later overlay/variant handling that uses
package.json.<version>); apply the same change to each occurrence of the NodeJS
pattern lists so variant dependency changes invalidate the cache correctly.
♻️ Duplicate comments (3)
sebs/faas/function.py (3)

567-570: ⚠️ Potential issue | 🔴 Critical

C++ flows break because Variant has no Language.CPP mapping.

Line [607] and Line [678] both rely on Variant.for_language(...); with the current map, any C++ benchmark can fail with ValueError("No variants defined for language ...").

Proposed fix
 class Variant:
@@
     class NodeJS(Enum):
         DEFAULT = "default"
         BUN = "bun"
         LLRT = "llrt"
+
+    class Cpp(Enum):
+        DEFAULT = "default"
@@
 Variant._LANG_MAP: Dict[Language, Type[Enum]] = {
     Language.PYTHON: Variant.Python,
     Language.NODEJS: Variant.NodeJS,
+    Language.CPP: Variant.Cpp,
 }

Also applies to: 605-607, 675-679

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/faas/function.py` around lines 567 - 570, The _LANG_MAP in Variant is
missing an entry for Language.CPP causing Variant.for_language(...) to raise
ValueError for C++ benchmarks; add a mapping for Language.CPP to the appropriate
enum class (e.g., Variant.Cpp or the correct C++ variant Enum) inside
Variant._LANG_MAP so that Variant.for_language(Language.CPP) succeeds and C++
flows on lines referencing Variant.for_language no longer fail.

581-585: ⚠️ Potential issue | 🟡 Minor

Use the invalid input value in architecture deserialization errors.

Line [585] currently reports member instead of the actual invalid val, which makes debugging harder.

Proposed fix
-        raise Exception(f"Unknown architecture type {member}")
+        raise Exception(f"Unknown architecture type {val}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/faas/function.py` around lines 581 - 585, The deserialize function in
sebs/faas/function.py returns an error that references the loop variable member
instead of the invalid input; update the exception in deserialize(val: str) to
include the actual invalid input val (e.g., raise Exception(f"Unknown
architecture type {val}")) so errors correctly report the value that failed to
deserialize; ensure the change is made in the deserialize function and preserves
the existing loop over Architecture members.

26-26: ⚠️ Potential issue | 🟠 Major

Resolve Architecture redefinition (Ruff F811).

Line [26] imports Architecture, then Line [573] redefines it in the same module. This is an avoidable redefinition error and makes the active type ambiguous.

Proposed fix
-from sebs.types import Language, Architecture
+from sebs.types import Language

Also applies to: 573-573

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/faas/function.py` at line 26, The module currently imports Architecture
from sebs.types and also defines a local Architecture (redefinition causing Ruff
F811); pick one source and remove the other: if the file should use the shared
type, delete the local Architecture definition (the one around the later
redefinition) and update any local references to use the imported Architecture;
if the local definition is intentional and distinct, rename that local symbol
(e.g., LocalArchitecture or FunctionArchitecture) and update all references in
this file to that new name so there is no name collision with the imported
Architecture.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/build.md`:
- Around line 118-120: Several fenced code blocks in docs/build.md are unlabeled
(MD040); update each triple-backtick block that currently contains examples like
"benchmarks/<id>/<language>/<variant>/", the longer benchmark trees (e.g.,
"benchmarks/100.webapps/110.dynamic-html/..."), the dockerfile paths
("dockerfiles/<system>/<language>/<variant>/Dockerfile.run",
"dockerfiles/local/nodejs/bun/Dockerfile.run", etc.), and the image tag example
("<docker_repository>:run.<system>.<language>.<variant>.<version>-<SeBS_version>")
to include an explicit language after the opening ``` (use text for plain
paths/trees, bash for shell examples, json where appropriate) so each fenced
block is labeled and satisfies markdownlint MD040; locate the blocks by their
content strings and add the appropriate language identifiers.

In `@sebs/benchmark.py`:
- Around line 785-787: FILES is keyed by the Language enum but the loop uses
self.language_name (a string), causing a KeyError for overlay variants; change
the lookup to use the enum value (e.g., self.language or convert the string to
Language) when iterating FILES in the block that copies files from variant_dir
to output_dir so the key types match (ensure you reference FILES,
self.language_name, and Language enum when making the fix).

---

Outside diff comments:
In `@sebs/benchmark.py`:
- Around line 633-635: The NodeJS hash input patterns omit versioned package
files (e.g., package.json.<version>) so variant cache hashes miss changes;
update the Language.NODEJS pattern lists used for hashing to include versioned
package files by adding "package.json.*" (or "package.json*") alongside
"package.json" wherever the hash input patterns are defined (references:
Language.NODEJS and the lists around the existing ["*.js", "package.json"]
entries and the later overlay/variant handling that uses
package.json.<version>); apply the same change to each occurrence of the NodeJS
pattern lists so variant dependency changes invalidate the cache correctly.

---

Duplicate comments:
In `@sebs/faas/function.py`:
- Around line 567-570: The _LANG_MAP in Variant is missing an entry for
Language.CPP causing Variant.for_language(...) to raise ValueError for C++
benchmarks; add a mapping for Language.CPP to the appropriate enum class (e.g.,
Variant.Cpp or the correct C++ variant Enum) inside Variant._LANG_MAP so that
Variant.for_language(Language.CPP) succeeds and C++ flows on lines referencing
Variant.for_language no longer fail.
- Around line 581-585: The deserialize function in sebs/faas/function.py returns
an error that references the loop variable member instead of the invalid input;
update the exception in deserialize(val: str) to include the actual invalid
input val (e.g., raise Exception(f"Unknown architecture type {val}")) so errors
correctly report the value that failed to deserialize; ensure the change is made
in the deserialize function and preserves the existing loop over Architecture
members.
- Line 26: The module currently imports Architecture from sebs.types and also
defines a local Architecture (redefinition causing Ruff F811); pick one source
and remove the other: if the file should use the shared type, delete the local
Architecture definition (the one around the later redefinition) and update any
local references to use the imported Architecture; if the local definition is
intentional and distinct, rename that local symbol (e.g., LocalArchitecture or
FunctionArchitecture) and update all references in this file to that new name so
there is no name collision with the imported Architecture.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8e024faa-2760-43e6-8f3b-83acb18a2858

📥 Commits

Reviewing files that changed from the base of the PR and between 44b54b4 and 762ba1d.

📒 Files selected for processing (8)
  • docs/build.md
  • requirements.txt
  • sebs/aws/aws.py
  • sebs/benchmark.py
  • sebs/cache.py
  • sebs/config.py
  • sebs/faas/function.py
  • sebs/local/local.py
✅ Files skipped from review due to trivial changes (1)
  • requirements.txt
🚧 Files skipped from review as they are similar to previous changes (3)
  • sebs/aws/aws.py
  • sebs/config.py
  • sebs/local/local.py

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
sebs/benchmark.py (1)

799-801: ⚠️ Potential issue | 🔴 Critical

Critical: FILES dict keyed by enum, but indexed with string.

FILES is keyed by Language enum members (Lines 745-750), but Line 799 uses self.language_name (a string like "python") to index it. This will raise KeyError when applying overlay variants.

🐛 Proposed fix
             else:
                 # Overlay-based variant...
-                for file_type in FILES[self.language_name]:
+                for file_type in FILES[self.language]:
                     for f in glob.glob(os.path.join(variant_dir, file_type)):
                         shutil.copy2(f, output_dir)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/benchmark.py` around lines 799 - 801, The loop indexes FILES with a
string (self.language_name) but FILES is keyed by Language enum members; change
the lookup to convert the string to the enum first (e.g. resolve lang_enum =
Language(self.language_name) or Language[self.language_name.upper()] depending
on how Language is defined) and then use FILES[lang_enum] when iterating files
in variant_dir and copying to output_dir so the dict is accessed by the correct
enum key.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@sebs/benchmark.py`:
- Around line 644-649: The code incorrectly constructs file paths by doing path
= os.path.join(directory, f) even though glob.glob(os.path.join(directory,
file_type)) returns full paths, causing FileNotFoundError; update the loop to
use the glob result directly (use f as the path when opening and hashing) —
i.e., replace the os.path.join(directory, f) usage so that selected_files
iteration uses the full path from glob (refer to selected_files, FILES,
NON_LANG_FILES, path, and hash_sum.update).

---

Duplicate comments:
In `@sebs/benchmark.py`:
- Around line 799-801: The loop indexes FILES with a string (self.language_name)
but FILES is keyed by Language enum members; change the lookup to convert the
string to the enum first (e.g. resolve lang_enum = Language(self.language_name)
or Language[self.language_name.upper()] depending on how Language is defined)
and then use FILES[lang_enum] when iterating files in variant_dir and copying to
output_dir so the dict is accessed by the correct enum key.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fc68df97-13dd-4998-809e-abc067f76b2e

📥 Commits

Reviewing files that changed from the base of the PR and between 762ba1d and e66a5c6.

📒 Files selected for processing (9)
  • config/storage.json
  • requirements.txt
  • sebs.py
  • sebs/aws/aws.py
  • sebs/benchmark.py
  • sebs/cache.py
  • sebs/config.py
  • sebs/cpp_dependencies.py
  • sebs/local/local.py
💤 Files with no reviewable changes (2)
  • config/storage.json
  • sebs/cpp_dependencies.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • requirements.txt
  • sebs.py
  • sebs/local/local.py

Comment on lines 644 to +649
selected_files = FILES[language] + NON_LANG_FILES
for file_type in selected_files:
for f in glob.glob(os.path.join(directory, "**", file_type), recursive=True):
if os.path.isfile(f):
path = os.path.join(directory, f)
with open(path, "rb") as opened_file:
hash_sum.update(opened_file.read())
for f in glob.glob(os.path.join(directory, file_type)):
path = os.path.join(directory, f)
with open(path, "rb") as opened_file:
hash_sum.update(opened_file.read())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Incorrect path construction will cause FileNotFoundError.

The glob result f already contains the full path (e.g., "benchmarks/001/python/handler.py"), so joining it again with directory creates an invalid path like "benchmarks/001/python/benchmarks/001/python/handler.py".

🐛 Proposed fix
         for file_type in selected_files:
             for f in glob.glob(os.path.join(directory, file_type)):
-                path = os.path.join(directory, f)
-                with open(path, "rb") as opened_file:
+                with open(f, "rb") as opened_file:
                     hash_sum.update(opened_file.read())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sebs/benchmark.py` around lines 644 - 649, The code incorrectly constructs
file paths by doing path = os.path.join(directory, f) even though
glob.glob(os.path.join(directory, file_type)) returns full paths, causing
FileNotFoundError; update the loop to use the glob result directly (use f as the
path when opening and hashing) — i.e., replace the os.path.join(directory, f)
usage so that selected_files iteration uses the full path from glob (refer to
selected_files, FILES, NON_LANG_FILES, path, and hash_sum.update).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants