Skip to content

Add same-metrics dataset for time series UI test scenarios#7

Open
kpatticha wants to merge 8 commits intosimianhacker:mainfrom
kpatticha:add-same-metric-dataset
Open

Add same-metrics dataset for time series UI test scenarios#7
kpatticha wants to merge 8 commits intosimianhacker:mainfrom
kpatticha:add-same-metric-dataset

Conversation

@kpatticha
Copy link
Contributor

@kpatticha kpatticha commented Feb 26, 2026

Add same-metrics and edge-cases datasets for time series UI test scenarios

Summary

Introduces two new datasets for testing the Metrics experience in Discover:

  1. same-metrics -- 9 scenario pairs (15 data streams) where each pair shares the metric name request_duration but differs along one axis (dimensions, ES metric type, field type, histogram variant, or unit).
  2. edge-cases -- 6 data streams that each undergo a mid-backfill rollover to change their mapping, simulating real-world metric type, field type, or unit updates over time.

Build first

npm run build

Dataset: same-metrics

  • CLI: ./forge --dataset same-metrics [--interval 1m] [--backfill now-1h] [--purge]
  • --count is ignored; all 15 streams are always populated.
# Scenario Stream A Stream B What differs
1 Identical (baseline) same-metric-identical-a same-metric-identical-b Nothing (gauge, double, ms, host.name)
2 Different dimension keys same-metric-different-dims-host-region same-metric-different-dims-host-env dimension_keys (2 vs 3, sharing host.name + region)
3 Different ES metric type same-metric-different-estype-gauge same-metric-different-estype-counter metric type + field type
4 Different ES metric type (histogram) same-metric-different-estype-histogram same-metric-different-estype-gauge (shared) metric type
5 Different field type same-metric-different-fieldtype-long same-metric-different-fieldtype-double field type
6 Different histogram (tdigest vs exponential) same-metric-different-histogram-tdigest same-metric-different-histogram-exponential field type
7 Different histogram (tdigest vs legacy) same-metric-different-histogram-tdigest (shared) same-metric-different-histogram-legacy field type
8 Different unit (null vs ms) same-metric-different-unit-null same-metric-different-unit-ms unit
9 Different unit (ms vs s) same-metric-different-unit-ms (shared) same-metric-different-unit-seconds unit

Shared metric: metric.name = request_duration, field request_duration. Each document includes test.scenario and test.data_stream for filtering in Discover.

Dataset: edge-cases

  • CLI: ./forge --dataset edge-cases [--interval 1m] [--purge]
  • --count and --backfill are ignored; the dataset generates a fixed 10-minute window (5 min per phase).

Each stream starts with one mapping (phase 1), then the index template is updated, the data stream is rolled over, and data continues with the new mapping (phase 2). Each stream uses its stream name as both metric.name and the metric field name.

Stream Phase 1 Phase 2 (after rollover)
edge-case-gauge-to-counter gauge / long counter / double
edge-case-histogram-to-gauge histogram (legacy) gauge / double
edge-case-histogram-to-tdigest histogram (legacy) tdigest
edge-case-tdigest-to-exponential tdigest exponential_histogram
edge-case-unit-null-to-ms unit: null unit: ms
edge-case-unit-ms-to-s unit: ms unit: s

Other changes

  • Extracted shared histogram value builders (buildHistogramValue, buildTdigestValue, buildExponentialHistogramValue) into src/formatters/histogram-helpers.ts, used by both datasets.
  • Made runBackfillStream and runRealTimeStream in BaseSimulator protected with optional start/end params so subclasses can implement multi-phase backfill.

This PR was prepared with AI assistance.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added same-metrics and edge-cases as new --dataset CLI options for data generation
    • Introduced "Same Metrics" dataset with 15 scenario-pair streams for comparative testing
    • Introduced "Edge Cases" dataset with 6 streams featuring mid-backfill metric and field changes across two phases
  • Documentation

    • Updated README with comprehensive usage guides for both new datasets, including example commands and identification guidance

- Introduced a new dataset for generating time series test scenarios with the same metric in two streams, supporting different units and dimensions.
- Implemented the SameMetricsSimulator, SameMetricsMetricsGenerator, and SameMetricsFormatter to handle the new dataset.
- Added configuration and template generation for same metrics data streams.
- Updated README.md to include details about the new dataset and usage examples.
@coderabbitai
Copy link

coderabbitai bot commented Feb 26, 2026

📝 Walkthrough

Walkthrough

Adds two new datasets ("same-metrics" and "edge-cases") with types, config/metrics generators, formatters, template builders, and simulators; integrates dataset support into the main CLI/initialization and updates README usage/docs.

Changes

Cohort / File(s) Summary
Documentation
README.md
Adds "same-metrics" and "edge-cases" docs, usage examples, CLI --dataset updates, formatting fixes, and example commands.
Main entry & CLI
src/index.ts
Registers new datasets, expands dataset validation, purging logic for multiple datastreams, and branches to instantiate the new simulators; minor CLI/formatting adjustments and tracing span handling.
Same-metrics types
src/types/same-metrics-types.ts
New exported types/interfaces: scenario union, config/metrics/document shapes for same-metrics.
Same-metrics simulators & generators
src/simulators/same-metrics-simulator.ts, src/simulators/same-metrics-config-generator.ts, src/simulators/same-metrics-metrics-generator.ts
New SameMetricsSimulator, config generator, and deterministic metrics generator; simulator sets up ES templates and builds create ops.
Same-metrics template builder
src/simulators/same-metrics-template-builder.ts
Exports STREAM_TEMPLATE_CONFIGS, DATA_STREAM_IDS and buildIndexTemplate(...) to construct time-series index templates and metric mappings.
Same-metrics formatter
src/formatters/same-metrics-formatter.ts
New SameMetricsFormatter converting metrics to documents, applying field typing and optional dimensions.
Edge-cases types
src/types/edge-cases-types.ts
New EdgeCasesPhase, config/metrics/document types for edge-cases dataset.
Edge-cases simulators & generators
src/simulators/edge-cases-simulator.ts, src/simulators/edge-cases-config-generator.ts, src/simulators/edge-cases-metrics-generator.ts
New EdgeCasesSimulator with two-phase setup/backfill/rollover flow, config generator, and midpoint-aware metrics generator.
Edge-cases template builder
src/simulators/edge-cases-template-builder.ts
Defines per-stream two-phase configs, EDGE_CASE_STREAM_IDS, getStreamConfig/getPhaseConfig, and buildIndexTemplate(...) including phase-specific mappings.
Edge-cases formatter
src/formatters/edge-cases-formatter.ts
New EdgeCasesFormatter mapping phase-configured field types to documents and merging dimensions.
Histogram/helpers
src/formatters/histogram-helpers.ts
Adds helpers for numeric histogram, tdigest, and exponential histogram value shapes and formatValueForFieldType(...) dispatcher.
Base simulator visibility
src/simulators/base-simulator.ts
Changed runBackfillStream from private→protected and added optional startTime/endTime params; runRealTimeStream made protected.

Sequence Diagram

sequenceDiagram
    participant CLI as CLI/User
    participant Simulator as Simulator (SameMetrics / EdgeCases)
    participant ConfigGen as ConfigGenerator
    participant MetricsGen as MetricsGenerator
    participant Formatter as Formatter
    participant ES as Elasticsearch

    CLI->>Simulator: start(dataset, options)
    Simulator->>ConfigGen: generateConfig(entityId)
    ConfigGen-->>Simulator: Config
    loop per-timestamp
      Simulator->>MetricsGen: generateMetrics(config, timestamp)
      MetricsGen-->>Simulator: Metrics
      Simulator->>Formatter: formatMetrics(metrics)
      Formatter-->>Simulator: FormatterResult[docs]
      Simulator->>ES: setupElasticsearchTemplates() / putIndexTemplate
      ES-->>Simulator: ack
      Simulator->>ES: bulk index documents
      ES-->>Simulator: ack
    end
    Simulator-->>CLI: stop/complete
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐇 I hopped through configs, templates, and streams,
I counted each minute and chased sleepy dreams,
I formatted values with histogram art,
Two datasets now dance — I stitched them with heart,
Hop, hop — same-metrics and edge-cases depart!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 26.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding a new 'same-metrics' dataset for time series UI testing, which is the primary focus of the PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/types/same-metrics-types.ts (1)

11-15: Consider making dimension fields optional for flexibility.

As noted in the config generator, SameMetricsDimensions is used with partial data (e.g., only host.name for some streams). Making these fields optional would align with actual usage:

 export interface SameMetricsDimensions {
-  'host.name': string;
-  region: string;
-  environment: string;
+  'host.name'?: string;
+  region?: string;
+  environment?: string;
 }

This mirrors the optional fields already defined in SameMetricsDocument.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/types/same-metrics-types.ts` around lines 11 - 15, Make the three
properties on the SameMetricsDimensions interface optional to match actual usage
and the existing SameMetricsDocument shape: update the SameMetricsDimensions
declaration so 'host.name', region, and environment are optional (e.g., mark
them as optional properties) to allow partial dimension objects when only some
fields (like host.name) are present.
src/formatters/same-metrics-formatter.ts (1)

18-22: Type-safe dimension assignment possible.

Since SameMetricsDimensions has fixed keys (host.name, region, environment) that match the optional fields in SameMetricsDocument, you could avoid the double cast by spreading or directly assigning:

-    if (metrics.dimensions) {
-      for (const [key, value] of Object.entries(metrics.dimensions)) {
-        (doc as unknown as Record<string, unknown>)[key] = value;
-      }
-    }
+    if (metrics.dimensions) {
+      Object.assign(doc, metrics.dimensions);
+    }

The current implementation works correctly, so this is purely a readability refinement.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/formatters/same-metrics-formatter.ts` around lines 18 - 22, The loop that
assigns metrics.dimensions into the document uses a double-cast and can be
simplified for type-safety and readability: replace the manual for-loop that
iterates over metrics.dimensions (in same-metrics-formatter.ts) with a direct,
type-safe assignment or spread so SameMetricsDimensions (keys: "host.name",
"region", "environment") are merged into the SameMetricsDocument without
casting; specifically, merge metrics.dimensions into the existing doc object
(e.g., Object.assign(doc, metrics.dimensions) or doc = { ...doc,
...metrics.dimensions }) ensuring types align with
SameMetricsDocument/SameMetricsDimensions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/simulators/same-metrics-config-generator.ts`:
- Around line 25-31: The function buildDimensionsFromNames constructs a partial
dimensions object but unsafely casts it to SameMetricsDimensions; change the
types or the casting so callers can't assume all fields exist: either update
SameMetricsDimensions to make fields optional (host.name, region, environment)
and adjust SameMetricsConfig.dimensions accordingly, or keep
SameMetricsDimensions strict and change buildDimensionsFromNames to return
Partial<SameMetricsDimensions> (and update any consumer types/usages). Locate
buildDimensionsFromNames, DEFAULT_DIMENSION_VALUES, and
SameMetricsConfig/SameMetricsDimensions in the types/diff and ensure the
returned type and downstream type annotations are updated consistently to avoid
the unsafe cast.

In `@src/simulators/same-metrics-simulator.ts`:
- Around line 104-107: Replace the invalid "body" parameter passed to the
Elasticsearch client's indices.putIndexTemplate calls with the flattened
"template" parameter expected by `@elastic/elasticsearch` v9.0.2; locate the
putIndexTemplate invocations in same-metrics-simulator (the call using
dataStream and template), and make the argument { name: dataStream, template:
template } instead of { name: dataStream, body: template }, and apply the same
change to the identical putIndexTemplate calls in weather-simulator.ts and
histograms-simulator.ts so all simulators use the "template" parameter.

---

Nitpick comments:
In `@src/formatters/same-metrics-formatter.ts`:
- Around line 18-22: The loop that assigns metrics.dimensions into the document
uses a double-cast and can be simplified for type-safety and readability:
replace the manual for-loop that iterates over metrics.dimensions (in
same-metrics-formatter.ts) with a direct, type-safe assignment or spread so
SameMetricsDimensions (keys: "host.name", "region", "environment") are merged
into the SameMetricsDocument without casting; specifically, merge
metrics.dimensions into the existing doc object (e.g., Object.assign(doc,
metrics.dimensions) or doc = { ...doc, ...metrics.dimensions }) ensuring types
align with SameMetricsDocument/SameMetricsDimensions.

In `@src/types/same-metrics-types.ts`:
- Around line 11-15: Make the three properties on the SameMetricsDimensions
interface optional to match actual usage and the existing SameMetricsDocument
shape: update the SameMetricsDimensions declaration so 'host.name', region, and
environment are optional (e.g., mark them as optional properties) to allow
partial dimension objects when only some fields (like host.name) are present.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6bd1080 and 33df8e4.

📒 Files selected for processing (8)
  • README.md
  • src/formatters/same-metrics-formatter.ts
  • src/index.ts
  • src/simulators/same-metrics-config-generator.ts
  • src/simulators/same-metrics-metrics-generator.ts
  • src/simulators/same-metrics-simulator.ts
  • src/simulators/same-metrics-template-builder.ts
  • src/types/same-metrics-types.ts

Comment on lines +104 to +107
await this.elasticsearchClient.indices.putIndexTemplate({
name: dataStream,
body: template
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat package.json | jq '.dependencies["@elastic/elasticsearch"], .devDependencies["@elastic/elasticsearch"]'

Repository: simianhacker/simian-forge

Length of output: 82


🏁 Script executed:

rg -n "putIndexTemplate" --type ts -C 3

Repository: simianhacker/simian-forge

Length of output: 1642


🌐 Web query:

@elastic/elasticsearch 9.0.2 putIndexTemplate API documentation

💡 Result:

For @elastic/elasticsearch v9.0.2, the Node.js client method is:

  • client.indices.putIndexTemplate({ name, index_patterns, composed_of, template, data_stream, priority, version, _meta, ... }) (creates/updates a composable index template). [1]

Official docs:

  • JavaScript client API referenceclient.indices.putIndexTemplate (arguments + behavior). [1]
  • Elasticsearch v9 REST API → “Create or update an index template” (PUT /_index_template/{name}) with request/response examples. [2]

Example (Node.js):

await client.indices.putIndexTemplate({
  name: 'template_1',
  index_patterns: ['template*'],
  priority: 1,
  template: { settings: { number_of_shards: 2 } }
})

[1][2]


Use template parameter instead of body for v9.0.2 client compatibility.

The @elastic/elasticsearch v9.0.2 client uses flattened parameters. The body parameter is not valid; use template instead:

await this.elasticsearchClient.indices.putIndexTemplate({
  name: dataStream,
  template: template
});

This same issue appears in weather-simulator.ts (lines 82-85) and histograms-simulator.ts (lines 104-107) and should be fixed consistently across all simulators.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulators/same-metrics-simulator.ts` around lines 104 - 107, Replace the
invalid "body" parameter passed to the Elasticsearch client's
indices.putIndexTemplate calls with the flattened "template" parameter expected
by `@elastic/elasticsearch` v9.0.2; locate the putIndexTemplate invocations in
same-metrics-simulator (the call using dataStream and template), and make the
argument { name: dataStream, template: template } instead of { name: dataStream,
body: template }, and apply the same change to the identical putIndexTemplate
calls in weather-simulator.ts and histograms-simulator.ts so all simulators use
the "template" parameter.

@kpatticha kpatticha marked this pull request as draft March 10, 2026 15:26
@kpatticha
Copy link
Contributor Author

Moving it back to draft so I can refine the data and create a more realistic environment (ex now I just generate total_cost for all indices :D )

- Introduced a new dataset for generating time series data with edge cases, simulating real-world scenarios where metric types, field types, or units change mid-backfill.
- Implemented EdgeCasesSimulator, EdgeCasesMetricsGenerator, and EdgeCasesFormatter to manage the new dataset.
- Added configuration and template generation for edge case data streams.
- Updated README.md to include details about the new edge cases dataset and usage examples.
@kpatticha kpatticha marked this pull request as ready for review March 26, 2026 16:05
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (4)
src/formatters/same-metrics-formatter.ts (1)

15-16: Inconsistent error handling: formatter silently defaults while template builder throws.

When metrics.dataStream is not found in STREAM_TEMPLATE_CONFIGS, this code silently falls back to "double". However, buildIndexTemplate in same-metrics-template-builder.ts (lines 145-147) throws an error for unknown data streams. This asymmetry could mask bugs if a new stream is added to the config but not properly exported.

Consider making the formatter fail explicitly for unknown streams:

♻️ Proposed fix for consistency
   formatMetrics(
     metrics: SameMetricsMetrics,
   ): FormatterResult<SameMetricsDocument>[] {
     const streamConfig = STREAM_TEMPLATE_CONFIGS[metrics.dataStream];
-    const fieldType = streamConfig?.metricFieldType ?? "double";
+    if (!streamConfig) {
+      throw new Error(`Unknown same-metrics data stream: ${metrics.dataStream}`);
+    }
+    const fieldType = streamConfig.metricFieldType;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/formatters/same-metrics-formatter.ts` around lines 15 - 16, The formatter
currently defaults to "double" when STREAM_TEMPLATE_CONFIGS[metrics.dataStream]
is missing, creating an asymmetry with buildIndexTemplate which throws for
unknown streams; update the formatter to check streamConfig (from
STREAM_TEMPLATE_CONFIGS and metrics.dataStream) and throw a clear Error if
undefined (including the metrics.dataStream value) instead of silently using the
fallback, then derive fieldType from streamConfig.metricFieldType only after the
existence check; reference STREAM_TEMPLATE_CONFIGS, metrics.dataStream,
streamConfig, fieldType and the buildIndexTemplate behavior to keep the handling
consistent.
src/simulators/edge-cases-config-generator.ts (1)

5-15: Silent fallback to empty string may cause issues if dimensions expand.

Currently SHARED_DIMENSIONS only contains "host.name", which is covered by DEFAULT_DIMENSION_VALUES. However, if SHARED_DIMENSIONS is later extended with additional dimension names not present in DEFAULT_DIMENSION_VALUES, they will silently receive empty string values, which could produce invalid or confusing test data.

Consider logging a warning or throwing an error for unmapped dimensions:

♻️ Proposed defensive check
 function buildDimensionsFromNames(dimNames: string[]): EdgeCasesDimensions {
   const result: Record<string, string> = {};
   for (const name of dimNames) {
+    if (!(name in DEFAULT_DIMENSION_VALUES)) {
+      console.warn(`No default value for dimension "${name}", using empty string`);
+    }
     result[name] = DEFAULT_DIMENSION_VALUES[name] ?? "";
   }
   return result;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulators/edge-cases-config-generator.ts` around lines 5 - 15, The
buildDimensionsFromNames function silently uses "" for any dimNames not in
DEFAULT_DIMENSION_VALUES; update it to detect unmapped names (using
DEFAULT_DIMENSION_VALUES) and either throw an Error or log a clear warning so
missing defaults don't produce invalid test data; modify
buildDimensionsFromNames to iterate dimNames, and for each name if
DEFAULT_DIMENSION_VALUES[name] is undefined then call console.warn or throw
(choose consistent behavior for the module) with a message referencing the
missing dimension name and SHARED_DIMENSIONS context, otherwise assign the
default value from DEFAULT_DIMENSION_VALUES.
src/simulators/edge-cases-template-builder.ts (1)

131-186: Consider whether test.scenario field is needed for parity with same-metrics.

The same-metrics formatter includes both test.scenario and test.data_stream fields (as documented in README Line 339), but this template only defines test.data_stream. While the README (Line 368) indicates metric.name and test.data_stream are sufficient for filtering edge cases, you may want to confirm this asymmetry is intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulators/edge-cases-template-builder.ts` around lines 131 - 186, The
index template built by buildIndexTemplate currently defines "test.data_stream"
in the mappings but omits "test.scenario", causing asymmetry with the
same-metrics formatter; add a "test.scenario" property to the mappings
(properties object) with the same mapping as "test.data_stream" (e.g., { type:
"keyword" }) so edge-case documents have parity with same-metrics, and ensure
any consumers filtering on test.scenario will work consistently with the
existing metric.name and test.data_stream fields.
src/simulators/edge-cases-simulator.ts (1)

32-37: Unsafe double cast through unknown bypasses type checking.

The cast this.metricsGenerator as unknown as EdgeCasesMetricsGenerator loses compile-time safety. If the base class or createMetricsGenerator() changes, this would silently break.

Consider storing the generator instance before calling super() or adding a type guard.

Proposed refactor
 export class EdgeCasesSimulator extends BaseSimulator<
   EdgeCasesConfig,
   EdgeCasesMetrics,
   EdgeCasesDocument
 > {
   private formatter: EdgeCasesFormatter;
-  private edgeCasesMetricsGenerator: EdgeCasesMetricsGenerator;
+  private readonly edgeCasesMetricsGenerator: EdgeCasesMetricsGenerator;

   constructor(options: BaseSimulatorOptions) {
+    const metricsGen = new EdgeCasesMetricsGenerator();
     super(options);
     this.formatter = new EdgeCasesFormatter();
-    this.edgeCasesMetricsGenerator =
-      this.metricsGenerator as unknown as EdgeCasesMetricsGenerator;
+    this.edgeCasesMetricsGenerator = metricsGen;
   }

+  protected createMetricsGenerator(): MetricsGenerator<
+    EdgeCasesConfig,
+    EdgeCasesMetrics
+  > {
+    // Return the instance we already created, or create a new one if called before constructor completes
+    return this.edgeCasesMetricsGenerator ?? new EdgeCasesMetricsGenerator();
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulators/edge-cases-simulator.ts` around lines 32 - 37, The constructor
is using an unsafe double-cast this.metricsGenerator as unknown as
EdgeCasesMetricsGenerator which bypasses type checking; fix by ensuring a
properly typed metrics generator is created or validated before assignment:
either accept a typed generator in options or call/create the generator (via
createMetricsGenerator or similar) and store it in a local variable, perform a
runtime type guard/assertion that it implements EdgeCasesMetricsGenerator, then
assign that validated instance to this.edgeCasesMetricsGenerator; update the
constructor (and any factory that produces metricsGenerator) to avoid the
unknown cast and use a safe typed conversion or guard.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Line 195: Update the --dataset help text to include the missing "edge-cases"
option so the list matches the rest of the README and the code; specifically
edit the help string that currently ends with "same-metrics (default:
\"hosts\")" to read something like "same-metrics, edge-cases (default:
\"hosts\")" so the flag documentation includes the edge-cases dataset.

In `@src/formatters/histogram-helpers.ts`:
- Around line 27-46: The exponential histogram builder
buildExponentialHistogramValue computes base as Math.abs(value % 1000) which can
be 0 for value === 0 or multiples of 1000; change the base calculation to
include the same fallback used in buildHistogramValue/buildTdigestValue (e.g.,
use Math.abs(value % 1000) || 1) so sum, min, and max are never zero and the
histogram isn't degenerate.

In `@src/simulators/edge-cases-metrics-generator.ts`:
- Around line 9-13: counterValueForStream currently always reads
getStreamConfig(dataStream).phase1.baseValue so phase2 metrics will be wrong if
base values differ; change counterValueForStream to accept a phase parameter
(e.g., phase: 'phase1' | 'phase2') and obtain the base value via
getPhaseConfig(dataStream, phase).baseValue (or similar helper) instead of
hardcoding phase1, and then update the call site(s) in generateMetrics to pass
the current phase into counterValueForStream so the correct phase-specific
baseValue is used.

In `@src/types/same-metrics-types.ts`:
- Around line 34-41: SameMetricsDocument's request_duration is typed too
narrowly as number; update its type to allow the histogram/tdigest shapes
returned by formatValueForFieldType used by SameMetricsFormatter (e.g. change
request_duration to a union like number | Histogram | TDigest or a more
permissive unknown/object union) so callers of SameMetricsFormatter and any
consumers of SameMetricsDocument accept histogram objects returned by
formatValueForFieldType.

---

Nitpick comments:
In `@src/formatters/same-metrics-formatter.ts`:
- Around line 15-16: The formatter currently defaults to "double" when
STREAM_TEMPLATE_CONFIGS[metrics.dataStream] is missing, creating an asymmetry
with buildIndexTemplate which throws for unknown streams; update the formatter
to check streamConfig (from STREAM_TEMPLATE_CONFIGS and metrics.dataStream) and
throw a clear Error if undefined (including the metrics.dataStream value)
instead of silently using the fallback, then derive fieldType from
streamConfig.metricFieldType only after the existence check; reference
STREAM_TEMPLATE_CONFIGS, metrics.dataStream, streamConfig, fieldType and the
buildIndexTemplate behavior to keep the handling consistent.

In `@src/simulators/edge-cases-config-generator.ts`:
- Around line 5-15: The buildDimensionsFromNames function silently uses "" for
any dimNames not in DEFAULT_DIMENSION_VALUES; update it to detect unmapped names
(using DEFAULT_DIMENSION_VALUES) and either throw an Error or log a clear
warning so missing defaults don't produce invalid test data; modify
buildDimensionsFromNames to iterate dimNames, and for each name if
DEFAULT_DIMENSION_VALUES[name] is undefined then call console.warn or throw
(choose consistent behavior for the module) with a message referencing the
missing dimension name and SHARED_DIMENSIONS context, otherwise assign the
default value from DEFAULT_DIMENSION_VALUES.

In `@src/simulators/edge-cases-simulator.ts`:
- Around line 32-37: The constructor is using an unsafe double-cast
this.metricsGenerator as unknown as EdgeCasesMetricsGenerator which bypasses
type checking; fix by ensuring a properly typed metrics generator is created or
validated before assignment: either accept a typed generator in options or
call/create the generator (via createMetricsGenerator or similar) and store it
in a local variable, perform a runtime type guard/assertion that it implements
EdgeCasesMetricsGenerator, then assign that validated instance to
this.edgeCasesMetricsGenerator; update the constructor (and any factory that
produces metricsGenerator) to avoid the unknown cast and use a safe typed
conversion or guard.

In `@src/simulators/edge-cases-template-builder.ts`:
- Around line 131-186: The index template built by buildIndexTemplate currently
defines "test.data_stream" in the mappings but omits "test.scenario", causing
asymmetry with the same-metrics formatter; add a "test.scenario" property to the
mappings (properties object) with the same mapping as "test.data_stream" (e.g.,
{ type: "keyword" }) so edge-case documents have parity with same-metrics, and
ensure any consumers filtering on test.scenario will work consistently with the
existing metric.name and test.data_stream fields.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 06cd9dd4-dbdf-45dc-8925-1e1dceff9c07

📥 Commits

Reviewing files that changed from the base of the PR and between 33df8e4 and cbfeedd.

📒 Files selected for processing (16)
  • README.md
  • src/formatters/edge-cases-formatter.ts
  • src/formatters/histogram-helpers.ts
  • src/formatters/same-metrics-formatter.ts
  • src/index.ts
  • src/simulators/base-simulator.ts
  • src/simulators/edge-cases-config-generator.ts
  • src/simulators/edge-cases-metrics-generator.ts
  • src/simulators/edge-cases-simulator.ts
  • src/simulators/edge-cases-template-builder.ts
  • src/simulators/same-metrics-config-generator.ts
  • src/simulators/same-metrics-metrics-generator.ts
  • src/simulators/same-metrics-simulator.ts
  • src/simulators/same-metrics-template-builder.ts
  • src/types/edge-cases-types.ts
  • src/types/same-metrics-types.ts
✅ Files skipped from review due to trivial changes (1)
  • src/types/edge-cases-types.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/simulators/same-metrics-config-generator.ts
  • src/simulators/same-metrics-simulator.ts

--backfill <value> How far back to backfill data (e.g., now-1h) (default: "now-5m")
--count <number> Number of entities to generate (default: "10")
--dataset <name> Name of the dataset: hosts, weather, unique-metrics, histograms (default: "hosts")
--dataset <name> Name of the dataset: hosts, weather, unique-metrics, histograms, same-metrics (default: "hosts")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

The --dataset help text is missing edge-cases.

Line 195 shows same-metrics but omits edge-cases from the dataset list, though it's documented elsewhere in the README and supported by the code.

📝 Proposed fix
-  --dataset <name>                Name of the dataset: hosts, weather, unique-metrics, histograms, same-metrics (default: "hosts")
+  --dataset <name>                Name of the dataset: hosts, weather, unique-metrics, histograms, same-metrics, edge-cases (default: "hosts")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
--dataset <name> Name of the dataset: hosts, weather, unique-metrics, histograms, same-metrics (default: "hosts")
--dataset <name> Name of the dataset: hosts, weather, unique-metrics, histograms, same-metrics, edge-cases (default: "hosts")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` at line 195, Update the --dataset help text to include the missing
"edge-cases" option so the list matches the rest of the README and the code;
specifically edit the help string that currently ends with "same-metrics
(default: \"hosts\")" to read something like "same-metrics, edge-cases (default:
\"hosts\")" so the flag documentation includes the edge-cases dataset.

Comment on lines +27 to +46
export function buildExponentialHistogramValue(
value: number,
): Record<string, unknown> {
const base = Math.abs(value % 1000);
return {
scale: 3,
sum: base * 10,
min: base * 0.5,
max: base * 1.5,
zero: { threshold: 0, count: 0 },
positive: {
indices: [0, 1, 2, 3, 4],
counts: [1, 2, 4, 2, 1],
},
negative: {
indices: [],
counts: [],
},
};
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Missing || 1 fallback causes degenerate histogram when value is 0 or multiple of 1000.

Line 30 uses Math.abs(value % 1000) without the || 1 fallback present in buildHistogramValue (line 9) and buildTdigestValue (line 20). When value is 0 or any multiple of 1000, base becomes 0, resulting in min: 0, max: 0, sum: 0.

Proposed fix
 export function buildExponentialHistogramValue(
   value: number,
 ): Record<string, unknown> {
-  const base = Math.abs(value % 1000);
+  const base = Math.abs(value % 1000) || 1;
   return {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export function buildExponentialHistogramValue(
value: number,
): Record<string, unknown> {
const base = Math.abs(value % 1000);
return {
scale: 3,
sum: base * 10,
min: base * 0.5,
max: base * 1.5,
zero: { threshold: 0, count: 0 },
positive: {
indices: [0, 1, 2, 3, 4],
counts: [1, 2, 4, 2, 1],
},
negative: {
indices: [],
counts: [],
},
};
}
export function buildExponentialHistogramValue(
value: number,
): Record<string, unknown> {
const base = Math.abs(value % 1000) || 1;
return {
scale: 3,
sum: base * 10,
min: base * 0.5,
max: base * 1.5,
zero: { threshold: 0, count: 0 },
positive: {
indices: [0, 1, 2, 3, 4],
counts: [1, 2, 4, 2, 1],
},
negative: {
indices: [],
counts: [],
},
};
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/formatters/histogram-helpers.ts` around lines 27 - 46, The exponential
histogram builder buildExponentialHistogramValue computes base as Math.abs(value
% 1000) which can be 0 for value === 0 or multiples of 1000; change the base
calculation to include the same fallback used in
buildHistogramValue/buildTdigestValue (e.g., use Math.abs(value % 1000) || 1) so
sum, min, and max are never zero and the histogram isn't degenerate.

Comment on lines +9 to +13
function counterValueForStream(dataStream: string, timestamp: Date): number {
const base = getStreamConfig(dataStream).phase1.baseValue;
const minutes = Math.floor(timestamp.getTime() / (60 * 1000));
return base + minutes * 5;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

counterValueForStream always uses phase1.baseValue, ignoring the current phase.

The function reads phase1.baseValue unconditionally, but the configuration defines separate baseValue entries for both phases. While all current configurations have identical phase1/phase2 base values, this creates an inconsistency: if base values ever diverge, phase 2 metrics will incorrectly use phase 1's base value.

Consider accepting the phase as a parameter and using getPhaseConfig:

🐛 Proposed fix to use phase-specific baseValue
-function counterValueForStream(dataStream: string, timestamp: Date): number {
-  const base = getStreamConfig(dataStream).phase1.baseValue;
+import { getPhaseConfig } from "./edge-cases-template-builder";
+
+function counterValueForStream(dataStream: string, timestamp: Date, phase: EdgeCasesPhase): number {
+  const base = getPhaseConfig(dataStream, phase).baseValue;
   const minutes = Math.floor(timestamp.getTime() / (60 * 1000));
   return base + minutes * 5;
 }

Then update the call site in generateMetrics:

-      counterValue: counterValueForStream(config.dataStream, timestamp),
+      counterValue: counterValueForStream(config.dataStream, timestamp, phase),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/simulators/edge-cases-metrics-generator.ts` around lines 9 - 13,
counterValueForStream currently always reads
getStreamConfig(dataStream).phase1.baseValue so phase2 metrics will be wrong if
base values differ; change counterValueForStream to accept a phase parameter
(e.g., phase: 'phase1' | 'phase2') and obtain the base value via
getPhaseConfig(dataStream, phase).baseValue (or similar helper) instead of
hardcoding phase1, and then update the call site(s) in generateMetrics to pass
the current phase into counterValueForStream so the correct phase-specific
baseValue is used.

Comment on lines +34 to +41
export interface SameMetricsDocument {
"@timestamp": string;
"metric.name": string;
request_duration: number;
"test.scenario": string;
"test.data_stream": string;
[key: string]: unknown;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Type mismatch: request_duration can be a histogram object, not just a number.

The SameMetricsFormatter uses formatValueForFieldType() which returns histogram/tdigest objects for non-numeric field types. The current type request_duration: number is too narrow.

Proposed fix
 export interface SameMetricsDocument {
   "@timestamp": string;
   "metric.name": string;
-  request_duration: number;
+  request_duration: unknown;
   "test.scenario": string;
   "test.data_stream": string;
   [key: string]: unknown;
 }

Alternatively, define a union type for the histogram shapes if stricter typing is desired.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export interface SameMetricsDocument {
"@timestamp": string;
"metric.name": string;
request_duration: number;
"test.scenario": string;
"test.data_stream": string;
[key: string]: unknown;
}
export interface SameMetricsDocument {
"@timestamp": string;
"metric.name": string;
request_duration: unknown;
"test.scenario": string;
"test.data_stream": string;
[key: string]: unknown;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/types/same-metrics-types.ts` around lines 34 - 41, SameMetricsDocument's
request_duration is typed too narrowly as number; update its type to allow the
histogram/tdigest shapes returned by formatValueForFieldType used by
SameMetricsFormatter (e.g. change request_duration to a union like number |
Histogram | TDigest or a more permissive unknown/object union) so callers of
SameMetricsFormatter and any consumers of SameMetricsDocument accept histogram
objects returned by formatValueForFieldType.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant