[Story] Cost Explorer (CE) AWS-accuracy audit: bring services/ce to parity #1767

@agbishop

Tracks gaps between services/ce (AWS Cost Explorer) and real AWS. Most query and recommendation operations are stubs that return empty payloads, so the service cannot power realistic UI/SDK exercises. This issue inventories what is missing.


1. GetCostAndUsage Returns Empty ResultsByTime

AWS: Bucketizes cost/usage records into TimePeriods at HOURLY/DAILY/MONTHLY granularity, applies Filter expression, optionally groups by 1-2 GroupBy keys; each ResultsByTime entry contains Total{metric:{Amount,Unit}} and per-group Groups[].Keys + Metrics.
Current: handler.go:870-878 returns {ResultsByTime:[], DimensionValueAttributes:[]} regardless of input.
Plan: Build a synthetic cost ledger (per service/region/usage-type/account/linked-account); aggregate over TimePeriod; honor Granularity, Metrics, GroupBy, Filter; emit ResultsByTime with proper Estimated flag for current period.

2. No Synthetic Cost Data Generator

AWS: Real CE aggregates the account's billing line items (the data behind CUR), which can number in the billions for large accounts.
Current: Backend only stores costCategories, anomalyMonitors, anomalySubscriptions, anomalies (backend.go:120-129); no cost ledger at all.
Plan: Add CostLedger keyed by (date, service, region, usageType, account, tags); seed with realistic distributions (EC2 ~40%, S3 ~15%, RDS ~10%, etc.); expose admin API to inject custom line items; persist via Snapshot/Restore.
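
A minimal sketch of what the ledger and its time bucketing could look like. All names here (`LineItem`, `CostLedger`, `bucketStart`, `Totals`) are hypothetical and not present in services/ce today. Dates are kept as `YYYY-MM-DD` strings so bucketing reduces to string truncation, HOURLY is left out, and the TimePeriod end is treated as exclusive as in AWS.

```go
package main

import (
	"fmt"
	"sort"
)

// LineItem is a hypothetical ledger row; field names are illustrative only.
type LineItem struct {
	Date      string // usage date, YYYY-MM-DD (UTC)
	Service   string
	Region    string
	UsageType string
	Account   string
	Tags      map[string]string
	Cost      float64 // unblended cost in USD
}

// CostLedger is a flat, append-only list; a real backend would index it.
type CostLedger struct{ items []LineItem }

func (l *CostLedger) Add(item LineItem) { l.items = append(l.items, item) }

// bucketStart maps a date to the start of its DAILY or MONTHLY bucket.
func bucketStart(date, granularity string) (string, error) {
	if len(date) != len("2006-01-02") {
		return "", fmt.Errorf("bad date %q", date)
	}
	switch granularity {
	case "DAILY":
		return date, nil
	case "MONTHLY":
		return date[:7] + "-01", nil
	}
	return "", fmt.Errorf("unsupported granularity %q", granularity)
}

// Totals sums Cost per bucket for items with start <= Date < end.
func (l *CostLedger) Totals(start, end, granularity string) (map[string]float64, error) {
	out := map[string]float64{}
	for _, it := range l.items {
		if it.Date < start || it.Date >= end {
			continue
		}
		key, err := bucketStart(it.Date, granularity)
		if err != nil {
			return nil, err
		}
		out[key] += it.Cost
	}
	return out, nil
}

func main() {
	var ledger CostLedger
	ledger.Add(LineItem{Date: "2024-03-30", Service: "Amazon EC2", Cost: 1.5})
	ledger.Add(LineItem{Date: "2024-03-31", Service: "Amazon S3", Cost: 2.5})
	ledger.Add(LineItem{Date: "2024-04-01", Service: "Amazon EC2", Cost: 4})

	totals, err := ledger.Totals("2024-03-01", "2024-05-01", "MONTHLY")
	if err != nil {
		panic(err)
	}
	keys := make([]string, 0, len(totals))
	for k := range totals {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s %.2f USD\n", k, totals[k])
	}
}
```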

3. Filter Expression Not Implemented

AWS: Expression { And, Or, Not, Dimensions{Key, Values, MatchOptions}, Tags{Key, Values, MatchOptions}, CostCategories{Key, Values, MatchOptions} } — recursive, used in GetCostAndUsage, GetCostForecast, GetReservation*, anomaly monitor specs.
Current: Only getCostAndUsageWithResources has a Filter field (typed any); no evaluator anywhere.
Plan: Implement an Expression type + recursive evaluator; share between cost queries, forecasts, anomaly MonitorSpecification, cost category Rules, anomaly subscription ThresholdExpression.
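
One possible shape for the shared evaluator, as a sketch only. `Expression`, `DimensionValues`, `LineItem` and `Matches` are hypothetical names; MatchOptions and CostCategories are omitted, and every leaf is treated as an exact-match (EQUALS) test.

```go
package main

import "fmt"

// DimensionValues is a simplified leaf: Key must equal one of Values.
type DimensionValues struct {
	Key    string
	Values []string
}

// Expression mirrors the recursive AWS shape; one field is set per node.
type Expression struct {
	And        []Expression
	Or         []Expression
	Not        *Expression
	Dimensions *DimensionValues
	Tags       *DimensionValues
}

// LineItem is a hypothetical ledger row.
type LineItem struct {
	Dimensions map[string]string // e.g. "SERVICE" -> "Amazon EC2"
	Tags       map[string]string
}

func matchAny(got string, present bool, want []string) bool {
	if !present {
		return false
	}
	for _, v := range want {
		if v == got {
			return true
		}
	}
	return false
}

// Matches reports whether item satisfies e. A nil or empty expression
// matches every item.
func (e *Expression) Matches(item LineItem) bool {
	switch {
	case e == nil:
		return true
	case len(e.And) > 0:
		for i := range e.And {
			if !e.And[i].Matches(item) {
				return false
			}
		}
		return true
	case len(e.Or) > 0:
		for i := range e.Or {
			if e.Or[i].Matches(item) {
				return true
			}
		}
		return false
	case e.Not != nil:
		return !e.Not.Matches(item)
	case e.Dimensions != nil:
		v, ok := item.Dimensions[e.Dimensions.Key]
		return matchAny(v, ok, e.Dimensions.Values)
	case e.Tags != nil:
		v, ok := item.Tags[e.Tags.Key]
		return matchAny(v, ok, e.Tags.Values)
	}
	return true
}

func main() {
	// SERVICE = Amazon EC2 AND NOT (tag env = dev)
	filter := &Expression{And: []Expression{
		{Dimensions: &DimensionValues{Key: "SERVICE", Values: []string{"Amazon EC2"}}},
		{Not: &Expression{Tags: &DimensionValues{Key: "env", Values: []string{"dev"}}}},
	}}
	item := LineItem{
		Dimensions: map[string]string{"SERVICE": "Amazon EC2"},
		Tags:       map[string]string{"env": "prod"},
	}
	fmt.Println(filter.Matches(item))
}
```

Because the evaluator only needs a `LineItem`-like view, the same type could back cost queries, MonitorSpecification and cost category Rules.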

4. GroupBy Ignored

AWS: Up to 2 GroupBy[{Type:DIMENSION|TAG|COST_CATEGORY, Key}] produce nested grouped rows.
Current: getCostAndUsageInput (handler.go:859-863) lacks GroupBy entirely.
Plan: Add GroupBy to input; emit nested Groups[] with Keys array; validate max 2 groupings.

5. Metrics Validation Missing

AWS: Valid metrics: BlendedCost, UnblendedCost, AmortizedCost, NetAmortizedCost, NetUnblendedCost, UsageQuantity, NormalizedUsageAmount. Different metrics yield different Units.
Current: Metrics []string accepted blindly; no per-metric Unit logic.
Plan: Validate metric names; produce per-metric Amount/Unit (USD, Hrs, GB-Mo); reject UsageQuantity with GroupBy:USAGE_TYPE mismatches as AWS does.
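
A small sketch of the name check and unit selection. `validateMetrics` and `unitFor` are hypothetical helpers; the "N/A" fallback for usage metrics without a single unit is an assumption made here, not confirmed behavior.

```go
package main

import "fmt"

// Cost metrics are reported in a currency; usage metrics carry the unit
// of the underlying usage type instead.
var costMetrics = map[string]bool{
	"BlendedCost": true, "UnblendedCost": true, "AmortizedCost": true,
	"NetAmortizedCost": true, "NetUnblendedCost": true,
}

var usageMetrics = map[string]bool{
	"UsageQuantity": true, "NormalizedUsageAmount": true,
}

// validateMetrics rejects any metric name outside the two sets above.
func validateMetrics(metrics []string) error {
	for _, m := range metrics {
		if !costMetrics[m] && !usageMetrics[m] {
			return fmt.Errorf("unknown metric %q", m)
		}
	}
	return nil
}

// unitFor picks the Unit for one metric. usageUnit is the unit of the
// aggregated usage type (for example "Hrs" or "GB-Mo"); an empty value
// falls back to "N/A" (assumption).
func unitFor(metric, usageUnit string) string {
	if costMetrics[metric] {
		return "USD"
	}
	if usageUnit == "" {
		return "N/A"
	}
	return usageUnit
}

func main() {
	if err := validateMetrics([]string{"UnblendedCost", "UsageQuantity"}); err != nil {
		panic(err)
	}
	fmt.Println(unitFor("UnblendedCost", "Hrs"), unitFor("UsageQuantity", "Hrs"))
	fmt.Println(validateMetrics([]string{"Bogus"}))
}
```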

6. GetCostForecast Returns Hardcoded "0"

AWS: Runs ETS-style forecast on historical cost; returns MeanValue plus PredictionIntervalLowerBound/UpperBound keyed by the chosen PredictionIntervalLevel (51-99). Per-time results in ForecastResultsByTime.
Current: handler.go:1019-1027 returns MeanValue:"0", empty list.
Plan: Run rolling-window mean + std-dev over the synthetic ledger; emit per-bucket forecast with confidence bounds derived from PredictionIntervalLevel (default 80).

7. GetUsageForecast Stubbed Identically

Current: handler.go:1041-1049 mirrors GetCostForecast with hardcoded zeros.
Plan: Same algorithm against UsageQuantity / NormalizedUsageAmount metrics with valid Units (Hrs, GB-Mo).

8. PredictionIntervalLevel Not Honored

AWS: 51-99 inclusive controls forecast confidence band width.
Current: Field absent from getCostForecastInput (handler.go:1000-1004).
Plan: Accept and validate (51-99); narrow/widen bounds via inverse normal CDF.
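
A sketch of how the bounds could be derived, assuming forecast errors are treated as normally distributed around the historical mean (a simplification, not AWS's model). `zForLevel`, `meanStd` and `forecast` are hypothetical names. For a two-sided interval at level p, the quantile is z = sqrt(2) * erfinv(p/100).

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// zForLevel converts a two-sided prediction interval level (51-99) into a
// standard-normal quantile via the inverse error function.
func zForLevel(level int) (float64, error) {
	if level < 51 || level > 99 {
		return 0, errors.New("PredictionIntervalLevel must be between 51 and 99")
	}
	return math.Sqrt2 * math.Erfinv(float64(level)/100), nil
}

// meanStd returns the mean and population standard deviation of xs.
// xs must be non-empty.
func meanStd(xs []float64) (mean, std float64) {
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	for _, x := range xs {
		std += (x - mean) * (x - mean)
	}
	return mean, math.Sqrt(std / float64(len(xs)))
}

// forecast returns the mean plus lower and upper bounds at the given level.
func forecast(history []float64, level int) (mean, lower, upper float64, err error) {
	if len(history) == 0 {
		return 0, 0, 0, errors.New("no history to forecast from")
	}
	z, err := zForLevel(level)
	if err != nil {
		return 0, 0, 0, err
	}
	mean, std := meanStd(history)
	return mean, mean - z*std, mean + z*std, nil
}

func main() {
	history := []float64{10, 12, 14}
	for _, level := range []int{80, 95} {
		mean, lo, hi, err := forecast(history, level)
		if err != nil {
			panic(err)
		}
		fmt.Printf("level=%d mean=%.2f lower=%.2f upper=%.2f\n", level, mean, lo, hi)
	}
}
```

A higher level yields a larger z and therefore a wider band, which is the behavior the field is meant to control.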

9. GetDimensionValues Returns Empty

AWS: Dimensions: AZ, INSTANCE_TYPE, LINKED_ACCOUNT, OPERATION, PURCHASE_TYPE, REGION, SERVICE, USAGE_TYPE, USAGE_TYPE_GROUP, RECORD_TYPE, OPERATING_SYSTEM, TENANCY, SCOPE, PLATFORM, SUBSCRIPTION_ID, LEGAL_ENTITY_NAME, DEPLOYMENT_OPTION, DATABASE_ENGINE, CACHE_ENGINE, INSTANCE_TYPE_FAMILY, BILLING_ENTITY, RESERVATION_ID, RESOURCE_ID, RIGHTSIZING_TYPE, SAVINGS_PLANS_TYPE, SAVINGS_PLAN_ARN, PAYMENT_OPTION, AGREEMENT_END_DATE_TIME_AFTER, AGREEMENT_END_DATE_TIME_BEFORE, INVOICING_ENTITY, ANOMALY_TOTAL_IMPACT_ABSOLUTE, ANOMALY_TOTAL_IMPACT_PERCENTAGE.
Current: handler.go:893-902 always returns [].
Plan: Maintain a per-dimension value catalog populated from the synthetic ledger; honor SearchString, Context (COST_AND_USAGE | RESERVATIONS | SAVINGS_PLANS), pagination via NextPageToken.

10. GetTags Returns Empty

AWS: Returns distinct tag keys (when TagKey empty) or values for a key, scoped to TimePeriod. Honors SortBy, SearchString, Filter.
Current: handler.go:917-926 returns [].
Plan: Track tag keys/values from the cost ledger and from cost-allocation tags; implement search/sort/pagination.

11. CostCategory Rule Expression Missing

AWS: Rule { Value, Rule:Expression, InheritedValue, Type:REGULAR|INHERITED_VALUE }. The Expression carries Dimensions/Tags/CostCategories AND/OR/NOT semantics.
Current: costCategoryRule is {Value string} only (handler.go:379-381); no Rule expression, no Type, no InheritedValue.
Plan: Extend CostCategoryRule with Type, Rule Expression, InheritedValue{DimensionName, DimensionKey}; evaluate during cost queries to attribute spend to category values.

12. SplitChargeRules Parameters Missing

AWS: SplitChargeRule { Source, Targets, Method:FIXED|PROPORTIONAL|EVEN, Parameters:[{Type:ALLOCATION_PERCENTAGES, Values}] }.
Current: splitChargeRule lacks Parameters and method validation (handler.go:383-387).
Plan: Add Parameters; validate Method enum and Parameters requirement (FIXED requires ALLOCATION_PERCENTAGES totalling 100); apply during attribution.

13. RuleVersion Not Validated

AWS: Currently only CostCategoryExpression.v1; required field.
Current: Accepted blindly in CreateCostCategoryDefinition (backend.go:210-246).
Plan: Reject unknown values with InvalidParameterException.

14. EffectiveStart Format and Semantics Wrong

AWS: Truncated to first of month (YYYY-MM-01T00:00:00Z) but caller may pass past or current-month value; effective end on Delete is set to start of next month.
Current: effectiveStart() (backend.go:203-207) always uses today's month; ignores caller-provided EffectiveStart; Delete effectiveEnd uses same value (handler.go:445).
Plan: Honor caller's EffectiveStart (validate first-of-month); on Delete compute effective end as next-month-start.
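
The date handling could be sketched as follows. `resolveEffectiveStart`, `effectiveEndOnDelete`, `firstOfMonth` and `mustParse` are hypothetical helpers; whether future months should be rejected is left open here.

```go
package main

import (
	"fmt"
	"time"
)

// firstOfMonth truncates t to midnight UTC on the first day of its month.
func firstOfMonth(t time.Time) time.Time {
	t = t.UTC()
	return time.Date(t.Year(), t.Month(), 1, 0, 0, 0, 0, time.UTC)
}

// resolveEffectiveStart honors a caller-supplied value when it is a
// first-of-month UTC timestamp and defaults to the current month when empty.
func resolveEffectiveStart(requested string, now time.Time) (time.Time, error) {
	if requested == "" {
		return firstOfMonth(now), nil
	}
	t, err := time.Parse(time.RFC3339, requested)
	if err != nil {
		return time.Time{}, fmt.Errorf("EffectiveStart: %w", err)
	}
	if !t.Equal(firstOfMonth(t)) {
		return time.Time{}, fmt.Errorf("EffectiveStart %q is not the first of a month", requested)
	}
	return t.UTC(), nil
}

// effectiveEndOnDelete returns the start of the month after now.
func effectiveEndOnDelete(now time.Time) time.Time {
	return firstOfMonth(now).AddDate(0, 1, 0)
}

// mustParse is a demo helper for building fixed timestamps.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	now := mustParse("2024-12-15T10:30:00Z")
	start, err := resolveEffectiveStart("2024-10-01T00:00:00Z", now)
	fmt.Println(start.Format(time.RFC3339), err)
	fmt.Println(effectiveEndOnDelete(now).Format(time.RFC3339))
}
```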

15. CostCategory History (Effective Periods) Missing

AWS: Updates create new effective period; Describe with EffectiveOn returns the rule set valid at that date.
Current: Update overwrites in place; EffectiveOn field accepted but ignored (handler.go:449-452).
Plan: Store an effective-history list per category; DescribeCostCategoryDefinition resolves by EffectiveOn.
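
A sketch of the per-category history, assuming versions are keyed by an EffectiveStart string in a single fixed RFC 3339 UTC format so that string comparison matches chronological order. `categoryVersion`, `categoryHistory`, `Put` and `Resolve` are hypothetical names, and `Rules` is a placeholder for the real rule set.

```go
package main

import (
	"fmt"
	"sort"
)

type categoryVersion struct {
	EffectiveStart string   // e.g. "2024-03-01T00:00:00Z"
	Rules          []string // placeholder for the real rule set
}

// categoryHistory keeps versions sorted by EffectiveStart.
type categoryHistory struct{ versions []categoryVersion }

// Put records the rule set effective from start, replacing a version that
// already begins at the same instant.
func (h *categoryHistory) Put(start string, rules []string) {
	i := sort.Search(len(h.versions), func(i int) bool {
		return h.versions[i].EffectiveStart >= start
	})
	if i < len(h.versions) && h.versions[i].EffectiveStart == start {
		h.versions[i].Rules = rules
		return
	}
	h.versions = append(h.versions, categoryVersion{})
	copy(h.versions[i+1:], h.versions[i:])
	h.versions[i] = categoryVersion{EffectiveStart: start, Rules: rules}
}

// Resolve returns the version in force at effectiveOn: the latest one whose
// EffectiveStart is not after it.
func (h *categoryHistory) Resolve(effectiveOn string) (categoryVersion, bool) {
	i := sort.Search(len(h.versions), func(i int) bool {
		return h.versions[i].EffectiveStart > effectiveOn
	})
	if i == 0 {
		return categoryVersion{}, false
	}
	return h.versions[i-1], true
}

func main() {
	var h categoryHistory
	h.Put("2024-01-01T00:00:00Z", []string{"team=a"})
	h.Put("2024-03-01T00:00:00Z", []string{"team=a", "team=b"})

	v, ok := h.Resolve("2024-02-15T00:00:00Z")
	fmt.Println(v.EffectiveStart, v.Rules, ok)
}
```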

16. AnomalyMonitor MonitorSpecification Missing

AWS: CUSTOM monitors require MonitorSpecification (Expression). DIMENSIONAL requires MonitorDimension=SERVICE.
Current: AnomalyMonitor (backend.go:77-84) lacks MonitorSpecification; no enforcement of MonitorDimension=SERVICE for DIMENSIONAL or MonitorSpecification for CUSTOM.
Plan: Add MonitorSpecification; validate type-specific requirements; reject mismatched combinations.

17. AnomalySubscription ThresholdExpression Not Supported

AWS: Threshold is deprecated; modern subs use ThresholdExpression (Expression on ANOMALY_TOTAL_IMPACT_ABSOLUTE / ANOMALY_TOTAL_IMPACT_PERCENTAGE).
Current: Only flat Threshold float64 (backend.go:88-96); UI sends ThresholdExpression but backend ignores it.
Plan: Add ThresholdExpression Expression; require one of {Threshold, ThresholdExpression}; emit deprecation note when Threshold used.

18. Anomaly Detection Not Computed

AWS: ML model continuously evaluates services per monitor; emits anomalies with score 0-1 and RootCauses[]{Service, Region, LinkedAccount, UsageType}.
Current: Anomalies only inserted via test helper AddAnomaly (backend.go:716-730); no detector.
Plan: Background detector that scans synthetic ledger per active monitor (sliding window mean + std-dev); emit Anomaly with score + RootCauses; respect monitor MonitorSpecification.

19. Anomaly RootCauses Field Missing

AWS: RootCauses[]{Service, Region, LinkedAccount, LinkedAccountName, UsageType} returned from GetAnomalies.
Current: Anomaly struct (backend.go:99-110) has no RootCauses; anomalySummary (handler.go:1067-1077) omits them.
Plan: Add field; populate from detector; serialize in GetAnomalies.

20. AnomalyScore Should Be {MaxScore, CurrentScore}

AWS: AnomalyScore { MaxScore, CurrentScore } object, not scalar; UI uses a.AnomalyScore?.MaxScore (see +page.svelte:672) which currently breaks.
Current: AnomalyScore float64 (backend.go:99-110); serialized as scalar (handler.go:1075).
Plan: Change to nested struct; populate Max/Current.

21. Impact Should Be {MaxImpact, TotalImpact, TotalActualSpend, TotalExpectedSpend, TotalImpactPercentage}

AWS: Impact is an object; UI uses a.Impact?.TotalActualSpend (see +page.svelte:683).
Current: anomalySummary.Impact float64 (handler.go:1077).
Plan: Replace with object containing all five fields.
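
Items 20 and 21 together imply a response shape along these lines. This is a sketch: the struct definitions, `newImpact` and `encode` are illustrative, and deriving TotalImpact as actual minus expected spend is an assumption about how the mock would populate the fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type AnomalyScore struct {
	MaxScore     float64 `json:"MaxScore"`
	CurrentScore float64 `json:"CurrentScore"`
}

type Impact struct {
	MaxImpact             float64 `json:"MaxImpact"`
	TotalImpact           float64 `json:"TotalImpact"`
	TotalActualSpend      float64 `json:"TotalActualSpend"`
	TotalExpectedSpend    float64 `json:"TotalExpectedSpend"`
	TotalImpactPercentage float64 `json:"TotalImpactPercentage"`
}

type anomalySummary struct {
	AnomalyId    string       `json:"AnomalyId"`
	AnomalyScore AnomalyScore `json:"AnomalyScore"`
	Impact       Impact       `json:"Impact"`
}

// newImpact derives the total and percentage from actual vs expected spend.
func newImpact(actual, expected, maxImpact float64) Impact {
	imp := Impact{
		MaxImpact:          maxImpact,
		TotalImpact:        actual - expected,
		TotalActualSpend:   actual,
		TotalExpectedSpend: expected,
	}
	if expected != 0 {
		imp.TotalImpactPercentage = imp.TotalImpact / expected * 100
	}
	return imp
}

func encode(s anomalySummary) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(encode(anomalySummary{
		AnomalyId:    "a-1",
		AnomalyScore: AnomalyScore{MaxScore: 0.9, CurrentScore: 0.4},
		Impact:       newImpact(150, 100, 30),
	}))
}
```

With this shape, the UI accessors `a.AnomalyScore?.MaxScore` and `a.Impact?.TotalActualSpend` would resolve to numbers instead of undefined.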

22. ProvideAnomalyFeedback Doesn't Persist

Current: handler.go:1579-1586 is a no-op echoing AnomalyId; doesn't update the stored Anomaly.
Plan: Look up anomaly by ID and set FeedbackType; validate enum (YES|NO|PLANNED_ACTIVITY); 404 on missing.

23. GetAnomalies Filters and Date Interval Ignored

AWS: Filters by DateInterval, MonitorArn, Feedback, TotalImpact { NumericOperator:GREATER_THAN_OR_EQUAL|..., StartValue, EndValue }; supports pagination.
Current: handler.go:1084-1106 honors only MonitorArn + Feedback; ignores DateInterval and TotalImpact entirely; no pagination.
Plan: Add date range + impact filtering; implement NextPageToken (offset/cursor).
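
For the offset variant of NextPageToken, a sketch could look like this. `page`, `encodeToken` and `decodeToken` are hypothetical helpers, and the token format (base64 of a decimal offset) is an arbitrary choice for the mock, not what AWS uses.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strconv"
)

var errInvalidToken = errors.New("InvalidNextTokenException: the pagination token is invalid")

func encodeToken(offset int) string {
	return base64.StdEncoding.EncodeToString([]byte(strconv.Itoa(offset)))
}

// decodeToken returns the offset carried by token; "" means the first page.
func decodeToken(token string) (int, error) {
	if token == "" {
		return 0, nil
	}
	raw, err := base64.StdEncoding.DecodeString(token)
	if err != nil {
		return 0, errInvalidToken
	}
	offset, err := strconv.Atoi(string(raw))
	if err != nil || offset < 0 {
		return 0, errInvalidToken
	}
	return offset, nil
}

// page returns the half-open index range [start, end) for one page and the
// token for the next page ("" on the last page).
func page(total, pageSize int, token string) (start, end int, next string, err error) {
	if pageSize <= 0 {
		return 0, 0, "", errors.New("page size must be positive")
	}
	start, err = decodeToken(token)
	if err != nil || start > total {
		return 0, 0, "", errInvalidToken
	}
	end = start + pageSize
	if end >= total {
		return start, total, "", nil
	}
	return start, end, encodeToken(end), nil
}

func main() {
	anomalies := []string{"a-1", "a-2", "a-3", "a-4", "a-5"}
	token := ""
	for {
		start, end, next, err := page(len(anomalies), 2, token)
		if err != nil {
			panic(err)
		}
		fmt.Println(anomalies[start:end], "more:", next != "")
		if next == "" {
			break
		}
		token = next
	}
}
```

An offset token is only stable if the result order is deterministic, so the filtered anomalies would need a fixed sort before slicing.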

24. GetReservationCoverage / Utilization Stubbed

AWS: Returns per-time coverage (% of usage covered by RIs) and utilization (% of RI capacity used) with grouping (AZ, INSTANCE_TYPE, LINKED_ACCOUNT, PLATFORM, REGION, TENANCY, SUBSCRIPTION_ID) and filter.
Current: handler.go:1266-1273, 1314-1321 return empty arrays.
Plan: Synthesize from RI inventory (cross-link ec2 reservations) and ledger; produce coverage/utilization buckets.

25. GetReservationPurchaseRecommendation Stubbed

AWS: Recommends RI purchases based on lookback window; returns RecommendationDetails[] with EstimatedMonthlySavings, RecommendedNumberOfInstancesToPurchase, etc.
Current: handler.go:1292-1299 returns empty list; UI table iterates empty results.
Plan: Generate recommendations from synthetic ledger top instance types; honor LookbackPeriodInDays, TermInYears, PaymentOption.

26. GetSavingsPlans* Family Stubbed

Current: GetSavingsPlansCoverage, GetSavingsPlansPurchaseRecommendation, GetSavingsPlansUtilization, GetSavingsPlansUtilizationDetails, GetSavingsPlanPurchaseRecommendationDetails (handler.go:1379-1455, 1358-1363) all return empty objects.
Plan: Synthesize SP coverage/utilization/recommendations; cross-link with savingsplans service if present; honor SavingsPlansType (COMPUTE_SP|EC2_INSTANCE_SP|SAGEMAKER_SP).

27. GetRightsizingRecommendation Stubbed

AWS: Returns Modify/Terminate recommendations per resource with current+target instance and projected savings; supports Configuration{RecommendationTarget:SAME_INSTANCE_FAMILY|CROSS_INSTANCE_FAMILY, BenefitsConsidered}.
Current: handler.go:1338-1345 returns empty list.
Plan: Generate from ledger top resources + utilization heuristics; populate Summary.

28. GetCostAndUsageWithResources Identical to Stub

AWS: Like GetCostAndUsage but at resource granularity (each row carries a resource ARN); requires Filter with at least one Service dimension; only available for past 14 days.
Current: handler.go:1191-1199 returns empty.
Plan: Constrain to ≤14-day window; require Service filter (else BillExpirationException / DataUnavailableException); produce per-resource ResultsByTime.

29. ListCostAllocationTags / UpdateCostAllocationTagsStatus Stubbed

AWS: Returns user-defined and AWS-generated tags with Status:Active|Inactive, Type:AWSGenerated|UserDefined, LastUpdatedDate, LastUsedDate.
Current: handler.go:1514-1521, 1651-1657 return empty / no-op; no validation of TagKeys, no enum enforcement.
Plan: Maintain a tag registry; honor Status, Type, TagKeys filters; persist Active/Inactive on update; reject unknown TagKeys with Errors[] populated.

30. ListCostCategoryResourceAssociations / Backfill / CommitmentPurchaseAnalysis Stubbed

Current: Listed in supported ops but all are no-ops returning empty payloads (handler.go:1469-1605).
Plan: At minimum implement realistic empty-state shapes that match AWS field nesting; track backfill jobs with state machine (CREATED|IN_PROGRESS|SUCCEEDED|FAILED) and commitment analyses with AnalysisStatus.

31. errInvalidRequest Maps to 400 but Omits __type

AWS: All CE errors return {__type, message} with the AWS exception class name.
Current: handleError returns either service.JSONErrorResponse{Type, Message} for known sentinels or {message:...} for invalid request paths (handler.go:295-327); the latter omits __type.
Plan: Always return {__type, message}; add specific exception classes (DataUnavailableException, RequestChangedException, BillExpirationException, UnresolvableUsageUnitException, RefundExpiredException, GenerationExistsException).
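
A sketch of a single error writer that always emits both fields. `ceError`, `encodeError` and `writeError` are hypothetical stand-ins for whatever handleError and service.JSONErrorResponse end up looking like; the Content-Type value assumes CE speaks the AWS JSON 1.1 protocol.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// ceError is the wire shape every error response should share.
type ceError struct {
	Type    string `json:"__type"`
	Message string `json:"message"`
}

func encodeError(typ, msg string) string {
	b, err := json.Marshal(ceError{Type: typ, Message: msg})
	if err != nil {
		panic(err) // cannot happen for a struct of two strings
	}
	return string(b)
}

// writeError sends status with a {__type, message} body.
func writeError(w http.ResponseWriter, status int, typ, msg string) {
	w.Header().Set("Content-Type", "application/x-amz-json-1.1") // assumption
	w.WriteHeader(status)
	fmt.Fprint(w, encodeError(typ, msg))
}

func main() {
	rec := httptest.NewRecorder()
	writeError(rec, http.StatusBadRequest, "DataUnavailableException", "no data for the requested period")
	fmt.Println(rec.Code, rec.Body.String())
}
```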

32. UpdateAnomalySubscription Threshold>0 Guard Drops 0

Current: backend.go:678-680 only updates threshold when > 0; AWS allows clearing/setting to 0.
Plan: Use a pointer or sentinel; allow explicit 0; validate non-negative.
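
The pointer approach could be sketched like this. `updateSubscriptionInput` and `applyUpdate` are hypothetical names; the point is that a nil pointer means "field not sent" while a pointer to 0 means "set to 0".

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// updateSubscriptionInput uses a pointer so an absent field and an explicit
// 0 can be told apart after decoding.
type updateSubscriptionInput struct {
	Threshold *float64 `json:"Threshold"`
}

// applyUpdate returns the threshold to store after applying the request body.
func applyUpdate(current float64, body string) (float64, error) {
	var in updateSubscriptionInput
	if err := json.Unmarshal([]byte(body), &in); err != nil {
		return current, err
	}
	if in.Threshold == nil {
		return current, nil // field omitted: keep the stored value
	}
	if *in.Threshold < 0 {
		return current, errors.New("Threshold must be non-negative")
	}
	return *in.Threshold, nil
}

func main() {
	for _, body := range []string{`{"Threshold":0}`, `{}`, `{"Threshold":-1}`} {
		got, err := applyUpdate(10, body)
		fmt.Println(body, "->", got, err)
	}
}
```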

33. anomalyTTL Eviction Loses User Data on Restart Without Snapshot

Current: backend.go:163-175 evicts purely on CreationDate, even if user marked anomaly with feedback.
Plan: Keep anomalies with non-empty FeedbackType; configurable retention per feedback state.

34. Janitor Goroutine Lacks WaitGroup / Shutdown Sync

Current: StartJanitor (backend.go:147-161) returns immediately after launching goroutine; nothing observes the goroutine's exit. On rapid Reset/Shutdown a leaked goroutine may operate on a zeroed map.
Plan: Return a stopped channel or expose a Done() channel; document lifecycle.
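
One way the lifecycle could look, as a sketch. The `Janitor` type and this `StartJanitor` signature are hypothetical and differ from the current backend.go; `runOnce` is a demo helper. `Stop` cancels the loop and blocks until the goroutine has exited, so no sweep can run after it returns.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Janitor owns one background sweep loop.
type Janitor struct {
	cancel context.CancelFunc
	done   chan struct{}
}

// StartJanitor launches the loop; sweep runs on every tick until the
// context is cancelled or Stop is called.
func StartJanitor(parent context.Context, interval time.Duration, sweep func()) *Janitor {
	ctx, cancel := context.WithCancel(parent)
	j := &Janitor{cancel: cancel, done: make(chan struct{})}
	go func() {
		defer close(j.done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				sweep()
			}
		}
	}()
	return j
}

// Done is closed once the goroutine has exited.
func (j *Janitor) Done() <-chan struct{} { return j.done }

// Stop cancels the loop and waits for it to finish.
func (j *Janitor) Stop() {
	j.cancel()
	<-j.done
}

// runOnce waits for one sweep, stops the janitor, and reports whether the
// goroutine has exited by the time Stop returns.
func runOnce() bool {
	swept := make(chan struct{}, 1)
	j := StartJanitor(context.Background(), time.Millisecond, func() {
		select {
		case swept <- struct{}{}:
		default:
		}
	})
	<-swept
	j.Stop()
	select {
	case <-j.Done():
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println("janitor exited:", runOnce())
}
```

Reset/Shutdown could call `Stop` before zeroing the maps, which removes the window in which a late tick touches cleared state.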


Optimizations / leaks

  • Holding the global lockmetrics.RWMutex for the entire query path (once a ledger exists) will serialize all CE calls. Plan: shard by date or use a copy-on-write snapshot per query.
  • No query-result caching: GetCostAndUsage/GetCostForecast are pure functions of (TimePeriod, Filter, GroupBy); add an LRU keyed by canonical request hash.
  • GetAnomalies allocates a slice with len(anomalies) capacity then sorts; even with TTL eviction the map can grow to thousands of entries, so switch to a sorted index per (MonitorARN, CreationDate).
  • evictExpiredAnomalies walks the entire map each tick — use a min-heap keyed on CreationDate.
  • Snapshot marshals the entire backend under read lock — for large ledgers, snapshot in chunks under copy-on-write.
  • Anomaly detector (when added) must use a bounded worker pool; otherwise per-monitor goroutines can leak under heavy churn.
  • mapToResourceTags allocates per call; consider a sync.Pool for tag slices in hot paths.
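
For the min-heap eviction idea above, a sketch using container/heap. `expiry`, `expiryHeap`, `store`, `add` and `evictBefore` are hypothetical names, and timestamps are plain Unix seconds for brevity. Each tick then pops only the expired entries instead of walking the whole map.

```go
package main

import (
	"container/heap"
	"fmt"
)

type expiry struct {
	id      string
	created int64 // Unix seconds of CreationDate
}

// expiryHeap is a min-heap ordered by creation time.
type expiryHeap []expiry

func (h expiryHeap) Len() int           { return len(h) }
func (h expiryHeap) Less(i, j int) bool { return h[i].created < h[j].created }
func (h expiryHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *expiryHeap) Push(x any)        { *h = append(*h, x.(expiry)) }
func (h *expiryHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

type store struct {
	byID map[string]int64 // id -> creation time
	heap expiryHeap
}

func newStore() *store { return &store{byID: map[string]int64{}} }

func (s *store) add(id string, created int64) {
	s.byID[id] = created
	heap.Push(&s.heap, expiry{id: id, created: created})
}

// evictBefore removes entries created before cutoff and returns how many
// were dropped. Heap entries whose id was already deleted or re-added with
// a different time are skipped as stale.
func (s *store) evictBefore(cutoff int64) int {
	evicted := 0
	for s.heap.Len() > 0 && s.heap[0].created < cutoff {
		e := heap.Pop(&s.heap).(expiry)
		if created, ok := s.byID[e.id]; ok && created == e.created {
			delete(s.byID, e.id)
			evicted++
		}
	}
	return evicted
}

func main() {
	s := newStore()
	s.add("a", 100)
	s.add("b", 300)
	s.add("c", 200)
	fmt.Println(s.evictBefore(250), len(s.byID))
}
```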

UI gaps (ui/src/routes/costexplorer/+page.svelte)

  • No GetCostAndUsage view at all — the headline CE feature ("cost by service, region, account") is absent.
  • No stacked bar / line chart of cost over time (needs charting lib + GetCostAndUsage with GroupBy:SERVICE).
  • Forecast tab shows only a single Total number; no line chart with prediction interval band, no per-bucket ForecastResultsByTime.
  • Cost Category form has no rule editor (no Expression builder, no Value+Rule pairs, no SplitChargeRules editor, no DefaultValue field).
  • Anomaly Subscriptions form has no Subscribers editor (Email/SNS targets) and no MonitorArnList selector — created subs are useless.
  • Anomaly Monitor form lacks MonitorSpecification editor for CUSTOM monitors and MonitorDimension dropdown.
  • Anomalies tab uses hardcoded DateInterval:{2020-01-01..2099-12-31}; no date pickers, no TotalImpact filter, no RootCauses display, no per-anomaly feedback action buttons (YES/NO/PLANNED_ACTIVITY).
  • No GetDimensionValues / GetTags explorer (would let users browse SERVICE/REGION/ACCOUNT and tag values).
  • No Reservation/Savings-Plans coverage or utilization charts; only purchase-recommendation table that lists empty rows.
  • No Rightsizing recommendation view.
  • No Cost Allocation Tag manager (Active/Inactive toggle, AWSGenerated vs UserDefined).
  • No CSV/JSON export of cost data, forecasts, or anomalies.
  • No "estimated month-to-date" widget (DAILY granularity sum for current month).
  • No comparison view (GetCostAndUsageComparisons / GetCostComparisonDrivers wired, but no UI).
