[WIP] Fix GetCostAndUsage to return valid results by time#2001

Closed
Claude wants to merge 1 commit into main from claude/fix-cost-explorer-query-results

@Claude Claude AI commented May 26, 2026

Contributor

Thanks for asking me to work on this. I will get started on it and keep this PR's description up to date as I form a plan and make progress.


This section details the original issue you should resolve.

<issue_title>[Story] Cost Explorer (CE) AWS-accuracy audit: bring services/ce to parity</issue_title>
<issue_description>Tracks gaps between services/ce (AWS Cost Explorer) and real AWS. Most query and recommendation operations are stubs that return empty payloads — making the service unable to power realistic UI/SDK exercises. This issue inventories what is missing.


1. GetCostAndUsage Returns Empty ResultsByTime

AWS: Bucketizes cost/usage records into TimePeriods at HOURLY/DAILY/MONTHLY granularity, applies Filter expression, optionally groups by 1-2 GroupBy keys; each ResultsByTime entry contains Total{metric:{Amount,Unit}} and per-group Groups[].Keys + Metrics.
Current: handler.go:870-878 returns {ResultsByTime:[], DimensionValueAttributes:[]} regardless of input.
Plan: Build a synthetic cost ledger (per service/region/usage-type/account/linked-account); aggregate over TimePeriod; honor Granularity, Metrics, GroupBy, Filter; emit ResultsByTime with proper Estimated flag for current period.
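
A minimal sketch of the bucketing step in the plan above, assuming a hypothetical `LineItem` ledger row and a `TimePeriod` already aligned to bucket boundaries. `LineItem`, `Bucket`, `Aggregate`, and the `day` helper are illustrative names, not existing code in services/ce; HOURLY, Filter, and GroupBy are left out.

```go
package main

import (
	"fmt"
	"time"
)

const dateFmt = "2006-01-02"

// LineItem is a hypothetical ledger row; the real ledger shape is not defined yet.
type LineItem struct {
	Date    time.Time
	Service string
	Amount  float64 // cost in USD
}

// Bucket is one ResultsByTime-style entry with a single total.
type Bucket struct {
	Start, End string // YYYY-MM-DD, End is exclusive
	Total      float64
	Estimated  bool
}

// day parses a YYYY-MM-DD date in UTC (illustrative helper; panics on bad input).
func day(s string) time.Time {
	t, err := time.Parse(dateFmt, s)
	if err != nil {
		panic(err)
	}
	return t
}

// bucketStart truncates t to the start of its DAILY or MONTHLY bucket in UTC.
func bucketStart(t time.Time, granularity string) (time.Time, error) {
	t = t.UTC()
	switch granularity {
	case "DAILY":
		return time.Date(t.Year(), t.Month(), t.Day(), 0, 0, 0, 0, time.UTC), nil
	case "MONTHLY":
		return time.Date(t.Year(), t.Month(), 1, 0, 0, 0, 0, time.UTC), nil
	}
	return time.Time{}, fmt.Errorf("unsupported granularity %q", granularity)
}

// bucketEnd returns the exclusive end of the bucket that begins at start.
func bucketEnd(start time.Time, granularity string) time.Time {
	if granularity == "MONTHLY" {
		return start.AddDate(0, 1, 0)
	}
	return start.AddDate(0, 0, 1)
}

// Aggregate sums items that fall in [start, end) into one bucket per period.
// Buckets with no items are still emitted with a zero total. A bucket whose
// end lies after now is flagged Estimated.
func Aggregate(items []LineItem, start, end, now time.Time, granularity string) ([]Bucket, error) {
	cur, err := bucketStart(start, granularity)
	if err != nil {
		return nil, err
	}
	totals := map[string]float64{}
	for _, it := range items {
		if it.Date.Before(start) || !it.Date.Before(end) {
			continue // outside the half-open TimePeriod
		}
		b, _ := bucketStart(it.Date, granularity) // granularity validated above
		totals[b.Format(dateFmt)] += it.Amount
	}
	var out []Bucket
	for ; cur.Before(end); cur = bucketEnd(cur, granularity) {
		next := bucketEnd(cur, granularity)
		out = append(out, Bucket{
			Start:     cur.Format(dateFmt),
			End:       next.Format(dateFmt),
			Total:     totals[cur.Format(dateFmt)],
			Estimated: next.After(now),
		})
	}
	return out, nil
}
```

Keying the totals by the formatted bucket start keeps the map independent of `time.Time` equality quirks, at the cost of one `Format` call per item.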

2. No Synthetic Cost Data Generator

AWS: Real CE has billions of CUR line items per account.
Current: Backend only stores costCategories, anomalyMonitors, anomalySubscriptions, anomalies (backend.go:120-129); no cost ledger at all.
Plan: Add CostLedger keyed by (date, service, region, usageType, account, tags); seed with realistic distributions (EC2 ~40%, S3 ~15%, RDS ~10%, etc.); expose admin API to inject custom line items; persist via Snapshot/Restore.

3. Filter Expression Not Implemented

AWS: Expression { And, Or, Not, Dimensions{Key, Values, MatchOptions}, Tags{Key, Values, MatchOptions}, CostCategories{Key, Values, MatchOptions} } — recursive, used in GetCostAndUsage, GetCostForecast, GetReservation*, anomaly monitor specs.
Current: Only getCostAndUsageWithResources has Filter any field; no evaluator anywhere.
Plan: Implement an Expression type + recursive evaluator; share between cost queries, forecasts, anomaly MonitorSpecification, cost category Rules, anomaly subscription ThresholdExpression.
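
One possible shape for the recursive evaluator, as a sketch only. `Record` is a hypothetical ledger row, `MatchOptions` and `CostCategories` are omitted, and values are matched by exact string equality, which is an assumption rather than confirmed AWS behavior.

```go
package main

// DimensionValues mirrors the {Key, Values} part of the AWS shape.
// MatchOptions is intentionally left out of this sketch.
type DimensionValues struct {
	Key    string
	Values []string
}

// Expression is a recursive filter node. Exactly one field is expected
// to be set per node; an empty node imposes no constraint.
type Expression struct {
	And        []Expression
	Or         []Expression
	Not        *Expression
	Dimensions *DimensionValues
	Tags       *DimensionValues
}

// Record is a hypothetical ledger row exposing dimensions and tags.
type Record struct {
	Dimensions map[string]string
	Tags       map[string]string
}

// matchValues reports whether got (if present) equals any wanted value.
func matchValues(got string, present bool, want []string) bool {
	if !present {
		return false
	}
	for _, w := range want {
		if w == got {
			return true
		}
	}
	return false
}

// Matches evaluates the expression tree against r.
// A nil expression matches every record.
func (e *Expression) Matches(r Record) bool {
	switch {
	case e == nil:
		return true
	case len(e.And) > 0:
		for i := range e.And {
			if !e.And[i].Matches(r) {
				return false
			}
		}
		return true
	case len(e.Or) > 0:
		for i := range e.Or {
			if e.Or[i].Matches(r) {
				return true
			}
		}
		return false
	case e.Not != nil:
		return !e.Not.Matches(r)
	case e.Dimensions != nil:
		v, ok := r.Dimensions[e.Dimensions.Key]
		return matchValues(v, ok, e.Dimensions.Values)
	case e.Tags != nil:
		v, ok := r.Tags[e.Tags.Key]
		return matchValues(v, ok, e.Tags.Values)
	}
	return true
}
```

Because the evaluator only depends on `Record`, the same type could back cost queries, monitor specifications, and cost category rules, as the plan suggests.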

4. GroupBy Ignored

AWS: Up to 2 GroupBy[{Type:DIMENSION|TAG|COST_CATEGORY, Key}] produce nested grouped rows.
Current: getCostAndUsageInput (handler.go:859-863) lacks GroupBy entirely.
Plan: Add GroupBy to input; emit nested Groups[] with Keys array; validate max 2 groupings.

5. Metrics Validation Missing

AWS: Valid metrics: BlendedCost, UnblendedCost, AmortizedCost, NetAmortizedCost, NetUnblendedCost, UsageQuantity, NormalizedUsageAmount. Different metrics yield different Units.
Current: Metrics []string accepted blindly; no per-metric Unit logic.
Plan: Validate metric names; produce per-metric Amount/Unit (USD, Hrs, GB-Mo); reject UsageQuantity with GroupBy:USAGE_TYPE mismatches as AWS does.

6. GetCostForecast Returns Hardcoded "0"

AWS: Runs ETS-style forecast on historical cost; returns MeanValue plus PredictionIntervalLowerBound/UpperBound keyed by the chosen PredictionIntervalLevel (51-99). Per-time results in ForecastResultsByTime.
Current: handler.go:1019-1027 returns MeanValue:"0", empty list.
Plan: Run rolling-window mean + std-dev over the synthetic ledger; emit per-bucket forecast with confidence bounds derived from PredictionIntervalLevel (default 80).

7. GetUsageForecast Stubbed Identically

Current: handler.go:1041-1049 mirrors GetCostForecast with hardcoded zeros.
Plan: Same algorithm against UsageQuantity / NormalizedUsageAmount metrics with valid Units (Hrs, GB-Mo).

8. PredictionIntervalLevel Not Honored

AWS: 51-99 inclusive controls forecast confidence band width.
Current: Field absent from getCostForecastInput (handler.go:1000-1004).
Plan: Accept and validate (51-99); narrow/widen bounds via inverse normal CDF.
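
Items 6 and 8 together amount to a mean plus a band scaled by the inverse normal CDF. The sketch below assumes normally distributed history and a two-sided interval, so the z-score is `sqrt(2) * erfinv(level/100)`. `ForecastBounds` is a hypothetical name, and real AWS forecasting is more involved than this.

```go
package main

import (
	"fmt"
	"math"
)

// ForecastBounds returns the sample mean of history and a symmetric
// two-sided prediction interval at the given level (percent, 51-99).
// It assumes the history is roughly normal, which is a simplification.
func ForecastBounds(history []float64, level int) (mean, lower, upper float64, err error) {
	if level < 51 || level > 99 {
		return 0, 0, 0, fmt.Errorf("PredictionIntervalLevel must be in [51, 99], got %d", level)
	}
	if len(history) < 2 {
		return 0, 0, 0, fmt.Errorf("need at least 2 data points, got %d", len(history))
	}
	for _, v := range history {
		mean += v
	}
	mean /= float64(len(history))

	// Sample standard deviation (n-1 in the denominator).
	var ss float64
	for _, v := range history {
		d := v - mean
		ss += d * d
	}
	std := math.Sqrt(ss / float64(len(history)-1))

	// For a two-sided interval with coverage p, z = Phi^-1((1+p)/2),
	// which equals sqrt(2) * erfinv(p).
	z := math.Sqrt2 * math.Erfinv(float64(level)/100)
	return mean, mean - z*std, mean + z*std, nil
}
```

A higher level yields a larger z and therefore a wider band; at level 95 the z-score is roughly 1.96.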

9. GetDimensionValues Returns Empty

AWS: Dimensions: AZ, INSTANCE_TYPE, LINKED_ACCOUNT, OPERATION, PURCHASE_TYPE, REGION, SERVICE, USAGE_TYPE, USAGE_TYPE_GROUP, RECORD_TYPE, OPERATING_SYSTEM, TENANCY, SCOPE, PLATFORM, SUBSCRIPTION_ID, LEGAL_ENTITY_NAME, DEPLOYMENT_OPTION, DATABASE_ENGINE, CACHE_ENGINE, INSTANCE_TYPE_FAMILY, BILLING_ENTITY, RESERVATION_ID, RESOURCE_ID, RIGHTSIZING_TYPE, SAVINGS_PLANS_TYPE, SAVINGS_PLAN_ARN, PAYMENT_OPTION, AGREEMENT_END_DATE_TIME_AFTER, AGREEMENT_END_DATE_TIME_BEFORE, INVOICING_ENTITY, ANOMALY_TOTAL_IMPACT_ABSOLUTE, ANOMALY_TOTAL_IMPACT_PERCENTAGE.
Current: handler.go:893-902 always returns [].
Plan: Maintain a per-dimension value catalog populated from the synthetic ledger; honor SearchString, Context (COST_AND_USAGE | RESERVATIONS | SAVINGS_PLANS), pagination via NextPageToken.

10. GetTags Returns Empty

AWS: Returns distinct tag keys (when TagKey empty) or values for a key, scoped to TimePeriod. Honors SortBy, SearchString, Filter.
Current: handler.go:917-926 returns [].
Plan: Track tag keys/values from the cost ledger and from cost-allocation tags; implement search/sort/pagination.

11. CostCategory Rule Expression Missing

AWS: Rule { Value, Rule:Expression, InheritedValue, Type:REGULAR|INHERITED_VALUE }. The Expression carries Dimensions/Tags/CostCategories AND/OR/NOT semantics.
Current: costCategoryRule is {Value string} only (handler.go:379-381); no Rule expression, no Type, no InheritedValue.
Plan: Extend CostCategoryRule with Type, Rule Expression, InheritedValue{DimensionName, DimensionKey}; evaluate during cost queries to attribute spend to category values.

12. SplitChargeRules Parameters Missing

AWS: SplitChargeRule { Source, Targets, Method:FIXED|PROPORTIONAL|EVEN, Parameters:[{Type:ALLOCATION_PERCENTAGES, Values}] }.
Current: splitChargeRule lacks Parameters and method validation (handler.go:383-387).
Plan: Add Parameters; validate Method enum and Parameters requirement (FIXED requires ALLOCATION_PERCENTAGES totalling 100); apply during attribution.
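
The validation in this plan could look roughly like the following. The struct shapes follow the AWS fields quoted above; `ValidateSplitChargeRule` and the small floating-point tolerance are assumptions of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
)

// SplitChargeParameter mirrors {Type, Values} from the AWS shape.
type SplitChargeParameter struct {
	Type   string
	Values []string
}

// SplitChargeRule mirrors {Source, Targets, Method, Parameters}.
type SplitChargeRule struct {
	Source     string
	Targets    []string
	Method     string
	Parameters []SplitChargeParameter
}

// ValidateSplitChargeRule checks the Method enum and, for FIXED, that an
// ALLOCATION_PERCENTAGES parameter is present and its values sum to 100.
func ValidateSplitChargeRule(r SplitChargeRule) error {
	switch r.Method {
	case "FIXED", "PROPORTIONAL", "EVEN":
	default:
		return fmt.Errorf("invalid Method %q", r.Method)
	}
	if r.Method != "FIXED" {
		return nil
	}
	for _, p := range r.Parameters {
		if p.Type != "ALLOCATION_PERCENTAGES" {
			continue
		}
		var sum float64
		for _, raw := range p.Values {
			v, err := strconv.ParseFloat(raw, 64)
			if err != nil {
				return fmt.Errorf("invalid percentage %q: %w", raw, err)
			}
			sum += v
		}
		// Tolerance is an assumption to absorb float rounding.
		if math.Abs(sum-100) > 1e-6 {
			return fmt.Errorf("ALLOCATION_PERCENTAGES must total 100, got %g", sum)
		}
		return nil
	}
	return errors.New("FIXED method requires ALLOCATION_PERCENTAGES parameter")
}
```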

13. RuleVersion Not Validated

AWS: Currently only CostCategoryExpression.v1; required field.
Current: Accepted blindly in CreateCostCategoryDefinition (backend.go:210-246).
Plan: Reject unknown values with InvalidParameterException.

14. EffectiveStart Format and Semantics Wrong

AWS: Truncated to first of month (YYYY-MM-01T00:00:00Z) but caller may pass past or current-month value; effective end on Delete is set to start of next month.
Current: effectiveStart() (backend.go:203-207) always uses today's month; ignores caller-provided EffectiveStart; Delete effectiveEnd uses same value (handler.go:445).
Plan: Honor caller's EffectiveStart (validate first-of-month); on Delete compute effective end as next-month-start.
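
A small sketch of the two date rules in this plan, assuming RFC 3339 input in the `YYYY-MM-01T00:00:00Z` form quoted above. `ParseEffectiveStart` and `NextMonthStart` are hypothetical helper names; defaulting an empty value to the current month is left to the caller.

```go
package main

import (
	"fmt"
	"time"
)

// ParseEffectiveStart parses a caller-provided EffectiveStart and requires
// it to be the first instant of a month in UTC.
func ParseEffectiveStart(s string) (time.Time, error) {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		return time.Time{}, fmt.Errorf("invalid EffectiveStart: %w", err)
	}
	t = t.UTC()
	if t.Day() != 1 || t.Hour() != 0 || t.Minute() != 0 || t.Second() != 0 || t.Nanosecond() != 0 {
		return time.Time{}, fmt.Errorf("EffectiveStart %q is not the first of a month", s)
	}
	return t, nil
}

// NextMonthStart returns the first instant of the month after t, in UTC.
// time.Date normalizes month 13 to January of the following year.
func NextMonthStart(t time.Time) time.Time {
	t = t.UTC()
	return time.Date(t.Year(), t.Month()+1, 1, 0, 0, 0, 0, time.UTC)
}
```

On Delete, the effective end would then be `NextMonthStart(now)` rather than the value used for the start.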

15. CostCategory History (Effective Periods) Missing

AWS: Updates create new effective period; Describe with EffectiveOn returns the rule set valid at that date.
Current: Update overwrites in place; EffectiveOn field accepted but ignored (handler.go:449-452).
Plan: Store an effective-history list per category; DescribeCostCategoryDefinition resolves by EffectiveOn.

16. AnomalyMonitor MonitorSpecification Missing

AWS: CUSTOM monitors require MonitorSpecification (Expression). DIMENSIONAL requires MonitorDimension=SERVICE.
Current: AnomalyMonitor (backend.go:77-84) lacks MonitorSpecification; no enforcement of MonitorDimension=SERVICE for DIMENSIONAL or MonitorSpecification for CUSTOM.
Plan: Add MonitorSpecification; validate type-specific requirements; reject mismatched combinations.

17. AnomalySubscription ThresholdExpression Not Supported

AWS: Threshold is deprecated; modern subs use ThresholdExpression (Expression on ANOMALY_TOTAL_IMPACT_ABSOLUTE / ANOMALY_TOTAL_IMPACT_PERCENTAGE).
Current: Only flat Threshold float64 (backend.go:88-96); UI sends ThresholdExpression but backend ignores it.
Plan: Add ThresholdExpression Expression; require one of {Threshold, ThresholdExpression}; emit deprecation note when Threshold used.

18. Anomaly Detection Not Computed

AWS: ML model continuously evaluates services per monitor; emits anomalies with score 0-1 and RootCauses[]{Service, Region, LinkedAccount, UsageType}.
Current: Anomalies only inserted via test helper AddAnomaly (backend.go:716-730); no detector.
Plan: Background detector that scans synthetic ledger per active monitor (sliding window mean + std-dev); emit Anomaly with score + RootCauses; respect monitor MonitorSpecification.

19. Anomaly RootCauses Field Missing

AWS: RootCauses[]{Service, Region, LinkedAccount, LinkedAccountName, UsageType} returned from GetAnomalies.
Current: Anomaly struct (backend.go:99-110) has no RootCauses; anomalySummary (handler.go:1067-1077) omits them.
Plan: Add field; populate from detector; serialize in GetAnomalies.

20. AnomalyScore Should Be {MaxScore, CurrentScore}

AWS: AnomalyScore { MaxScore, CurrentScore } object, not scalar; UI uses a.AnomalyScore?.MaxScore (see +page.svelte:672) which currently breaks.
Current: AnomalyScore float64 (backend.go:99-110); serialized as scalar (handler.go:1075).
Plan: Change to nested struct; populate Max/Current.

21. Impact Should Be {MaxImpact, TotalImpact, TotalActualSpend, TotalExpectedSpend, TotalImpactPercentage}

AWS: Impact is an object; UI uses a.Impact?.TotalActualSpend (see +page.svelte:683).
Current: anomalySummary.Impact float64 (handler.go:1077).
Plan: Replace with object containing all five fields.
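
For items 20 and 21, the nested shapes could be modeled as below. The field names come from the AWS shapes quoted above; `AnomalySummary` here is a trimmed, illustrative version of the real struct, and `EncodeSummary` is a hypothetical helper that only exists to show the serialized form.

```go
package main

import "encoding/json"

// AnomalyScore replaces the scalar float64 with the nested AWS shape.
type AnomalyScore struct {
	MaxScore     float64
	CurrentScore float64
}

// Impact replaces the scalar float64 with the five-field AWS shape.
type Impact struct {
	MaxImpact             float64
	TotalImpact           float64
	TotalActualSpend      float64
	TotalExpectedSpend    float64
	TotalImpactPercentage float64
}

// AnomalySummary is a trimmed, illustrative response item.
type AnomalySummary struct {
	AnomalyId    string
	AnomalyScore AnomalyScore
	Impact       Impact
}

// EncodeSummary serializes a summary. Without struct tags, encoding/json
// uses the Go field names as JSON keys, in declaration order.
func EncodeSummary(s AnomalySummary) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}
```

With this shape, UI accessors such as `a.AnomalyScore?.MaxScore` and `a.Impact?.TotalActualSpend` resolve to numbers instead of `undefined`.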

22. ProvideAnomalyFeedback Doesn't Persist

Current: handler.go:1579-1586 is a no-op echoing AnomalyId; doesn't update the stored Anomaly.
Plan: Look up anomaly by ID and set FeedbackType; validate enum (YES|NO|PLANNED_ACTIVITY); 404 on missing.
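
A sketch of the persistence step, using a hypothetical in-memory `Store`; the real backend types and error mapping (for example, how the not-found sentinel becomes a 404) are not shown.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrAnomalyNotFound is a hypothetical sentinel the handler could map to 404.
var ErrAnomalyNotFound = errors.New("anomaly not found")

// Anomaly is a trimmed, illustrative record.
type Anomaly struct {
	ID       string
	Feedback string
}

// Store is a hypothetical stand-in for the CE backend.
type Store struct {
	mu        sync.Mutex
	anomalies map[string]*Anomaly
}

// ProvideFeedback validates the enum, then records the feedback on the
// stored anomaly instead of just echoing the ID back.
func (s *Store) ProvideFeedback(id, feedback string) error {
	switch feedback {
	case "YES", "NO", "PLANNED_ACTIVITY":
	default:
		return fmt.Errorf("invalid Feedback %q", feedback)
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	a, ok := s.anomalies[id]
	if !ok {
		return ErrAnomalyNotFound
	}
	a.Feedback = feedback
	return nil
}
```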

23. GetAnomalies Filters and Date Interval Ignored

AWS: Filters by DateInterval, MonitorArn, Feedback, TotalImpact { NumericOperator:GREATER_THAN_OR_EQUAL|..., StartValue, EndValue }; supports pagination.
Current: handler.go:1084-1106 honors only MonitorArn + Feedback; ignores DateInterval and TotalImpact entirely; no pagination.
Plan: Add date range + impact filtering; implement NextPageToken (offset/cursor).
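
The filtering and offset-style paging in this plan might be sketched as follows. Dates are compared as `YYYY-MM-DD` strings, only a GREATER_THAN_OR_EQUAL impact check is modeled, and the token is a plain decimal offset; all of these are simplifying assumptions, and `PageAnomalies` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strconv"
)

// Anomaly is a trimmed, illustrative record.
type Anomaly struct {
	ID          string
	StartDate   string // YYYY-MM-DD, so string comparison orders by date
	TotalImpact float64
}

// PageAnomalies filters by an inclusive date interval and a minimum total
// impact, then returns one page plus the token for the next page ("" when
// there are no more results).
func PageAnomalies(all []Anomaly, from, to string, minImpact float64, token string, pageSize int) (page []Anomaly, next string, err error) {
	if pageSize <= 0 {
		return nil, "", fmt.Errorf("pageSize must be positive, got %d", pageSize)
	}
	offset := 0
	if token != "" {
		offset, err = strconv.Atoi(token)
		if err != nil || offset < 0 {
			return nil, "", fmt.Errorf("invalid NextPageToken %q", token)
		}
	}
	var matched []Anomaly
	for _, a := range all {
		if a.StartDate < from || a.StartDate > to || a.TotalImpact < minImpact {
			continue
		}
		matched = append(matched, a)
	}
	if offset > len(matched) {
		offset = len(matched)
	}
	end := offset + pageSize
	if end >= len(matched) {
		return matched[offset:], "", nil
	}
	return matched[offset:end], strconv.Itoa(end), nil
}
```

An offset token is simple but shifts if anomalies are evicted between calls; a cursor keyed on the last returned anomaly would be more stable.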

24. GetReservationCoverage / Utilization Stubbed

AWS: Returns per-time coverage (% of usage covered by RIs) and utilization (% of RI capacity used) with grouping (AZ, INSTANCE_TYPE, LINKED_ACCOUNT, PLATFORM, REGION, TENANCY, SUBSCRIPTION_ID) and filter.
Current: handler.go:1266-1273, 1314-1321 return empty arrays.
Plan: Synthesize from RI inventory (cross-link ec2 reservations) and ledger; produce coverage/utilization buckets.

25. GetReservationPurchaseRecommendation Stubbed

AWS: Recommends RI purchases based on lookback window; returns RecommendationDetails[] with EstimatedMonthlySavings, RecommendedNumberOfInstancesToPurchase, etc.
Current: handler.go:1292-1299 returns empty list; UI table iterates empty results.
Plan: Generate recommendations from synthetic ledger top instance types; honor LookbackPeriodInDays, TermInYears, PaymentOption.

26. GetSavingsPlans* Family Stubbed

Current: GetSavingsPlansCoverage, GetSavingsPlansPurchaseRecommendation, GetSavingsPlansUtilization, GetSavingsPlansUtilizationDetails, GetSavingsPlanPurchaseRecommendationDetails (handler.go:1379-1455, 1358-1363) all return empty objects.
Plan: Synthesize SP coverage/utilization/recommendations; cross-link with savingsplans service if present; honor SavingsPlansType (COMPUTE_SP|EC2_INSTANCE_SP|SAGEMAKER_SP).

27. GetRightsizingRecommendation Stubbed

AWS: Returns Modify/Terminate recommendations per resource with current+target instance and projected savings; supports Configuration{RecommendationTarget:SAME_INSTANCE_FAMILY|CROSS_INSTANCE_FAMILY, BenefitsConsidered}.
Current: handler.go:1338-1345 returns empty list.
Plan: Generate from ledger top resources + utilization heuristics; populate Summary.

28. GetCostAndUsageWithResources Identical to Stub

AWS: Like GetCostAndUsage but at resource granularity (each row carries a resource ARN); requires Filter with at least one Service dimension; only available for past 14 days.
Current: handler.go:1191-1199 returns empty.
Plan: Constrain to ≤14-day window; require Service filter (else BillExpirationException / DataUnavailableException); produce per-resource ResultsByTime.

29. ListCostAllocationTags / UpdateCostAllocationTagsStatus Stubbed

AWS: Returns user-defined and AWS-generated tags with Status:Active|Inactive, Type:AWSGenerated|UserDefined, LastUpdatedDate, LastUsedDate.
Current: handler.go:1514-1521, 1651-1657 return empty / no-op; no validation of TagKeys, no enum enforcement.
Plan: Maintain a tag registry; honor Status, Type, TagKeys filters; persist Active/Inactive on update; reject unknown TagKeys with Errors[] populated.

30. ListCostCategoryResourceAssociations / Backfill / CommitmentPurchaseAnalysis Stubbed

Current: Listed in supported ops but all are no-ops returning empty payloads (handler.go:1469-1605).
Plan: At minimum implement realistic empty-state shapes that match AWS field nesting; track backfill jobs with state machine (CREATED|IN_PROGRESS|SUCCEEDED|FAILED) and commitment analyses with AnalysisStatus.

31. errInvalidRequest Maps to 400 But Response Omits __type

AWS: All CE errors return {__type, message} with the AWS exception class name.
Current: handleError returns either service.JSONErrorResponse{Type, Message} for known sentinels or {message:...} for invalid request paths (handler.go:295-327); the latter omits __type.
Plan: Always return {__type, message}; add specific exception classes (DataUnavailableException, RequestChangedException, BillExpirationException, UnresolvableUsageUnitException, RefundExpiredException, GenerationExistsException).

32. UpdateAnomalySubscription Threshold>0 Guard Drops 0

Current: backend.go:678-680 only updates threshold when > 0; AWS allows clearing/setting to 0.
Plan: Use a pointer or sentinel; allow explicit 0; validate non-negative.
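
The pointer approach could look like this. `UpdateInput`, `Subscription`, and `ApplyUpdate` are hypothetical, trimmed names; the point is that a nil pointer means "field absent" while a pointer to 0 means "set to zero".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// UpdateInput uses a pointer so an absent Threshold can be told apart
// from an explicit 0.
type UpdateInput struct {
	Threshold *float64 `json:"Threshold,omitempty"`
}

// Subscription is a trimmed, illustrative stored record.
type Subscription struct {
	Threshold float64
}

// ApplyUpdate decodes the request body and updates the threshold only
// when the field was actually supplied. Negative values are rejected.
func ApplyUpdate(sub *Subscription, body []byte) error {
	var in UpdateInput
	if err := json.Unmarshal(body, &in); err != nil {
		return fmt.Errorf("decode request: %w", err)
	}
	if in.Threshold == nil {
		return nil // field absent: leave the stored value untouched
	}
	if *in.Threshold < 0 {
		return fmt.Errorf("Threshold must be non-negative, got %g", *in.Threshold)
	}
	sub.Threshold = *in.Threshold
	return nil
}
```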

33. anomalyTTL Eviction Loses User Data on Restart Without Snapshot

Current: backend.go:163-175 evicts purely on CreationDate, even if user marked anomaly with feedback.
Plan: Keep anomalies with non-empty FeedbackType; configurable retention per feedback state.

34. Janitor Goroutine Lacks WaitGroup / Shutdown Sync

Current: StartJanitor (backend.go:147-161) returns immediately after launching goroutine; nothing observes the goroutine's exit. On rapid Reset/Shutdown a leaked goroutine may operate on a zeroed map.
Plan: Return a stopped channel or expose a Done() channel; document lifecycle.
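
One way to make the janitor's exit observable, sketched with a hypothetical `Janitor` type rather than the existing `StartJanitor` signature. `Stop` blocks until the goroutine has returned, so a following Reset cannot race with a sweep that is still running.

```go
package main

import (
	"sync"
	"time"
)

// Janitor owns a background sweep loop whose exit can be observed.
type Janitor struct {
	stop chan struct{}
	done chan struct{}
	once sync.Once
}

// StartJanitor launches the loop and returns a handle. interval must be
// positive because time.NewTicker panics otherwise.
func StartJanitor(interval time.Duration, sweep func()) *Janitor {
	j := &Janitor{stop: make(chan struct{}), done: make(chan struct{})}
	go func() {
		defer close(j.done) // signals that the goroutine has fully exited
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-j.stop:
				return
			case <-ticker.C:
				sweep()
			}
		}
	}()
	return j
}

// Done is closed once the janitor goroutine has returned.
func (j *Janitor) Done() <-chan struct{} { return j.done }

// Stop requests shutdown and waits for the goroutine to exit.
// It is safe to call more than once.
func (j *Janitor) Stop() {
	j.once.Do(func() { close(j.stop) })
	<-j.done
}
```

A caller that resets the backend would call `Stop()` first and only then clear the maps.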


Optimizations / leaks

  • Holding the global lockmetrics.RWMutex for the entire query path (once a ledger exists) will serialize all CE calls. Plan: shard by date or use a copy-on-write snapshot per query.
  • No query-result caching: GetCostAndUsage/GetCostForecast are pure functions of (TimePeriod, Filter, GroupBy); add an LRU keyed by canonical request hash.
  • GetAnomalies allocates a slice with len(anomalies) capacity then sorts; with TTL eviction the map can grow to thousands — switch to a sorted index per (MonitorARN, CreationDate).
  • evictExpiredAnomalies walks the entire map each tick — use a min-heap keyed on CreationDate.
  • Snapshot marshals the entire backend under read lock — for large ledgers, snapshot in chunks under copy-on-write.
  • Anomaly detector (when added) must use a bounded worker pool; otherwise per-monitor goroutines can leak under heavy churn.
  • mapToResourceTags allocates per call; consider a sync.Pool for tag slices in hot paths.

UI gaps (ui/src/routes/costexplorer/+page.svelte)

  • No GetCostAndUsage view at all — the headline CE feature ("cost by service, region, account") is absent.
  • No stacked bar / line chart of cost over time (needs charting lib + GetCostAndUsage with GroupBy:SERVICE).
  • Forecast tab shows only a single Total number; no line chart with prediction interval band, no per-bucket ForecastResultsByTime.
  • Cost Category form has no rule editor (no Expression builder, no Value+Rule pairs, no SplitChargeRules editor, no DefaultValue field).
  • Anomaly Subscriptions form has no Subscribers editor (Email/SNS targets) and no MonitorArnList selector — created subs are useless.
  • Anomaly Monitor form lacks MonitorSpecification editor for CUSTOM monitors and MonitorDimension dropdown.
  • Anomalies tab uses hardcoded DateInterval:{2020-01-01..2099-12-31}; no date pickers, no TotalImpact filter, no RootCauses display, no per-anomaly feedback action buttons (YES/NO/PLANNED_ACTIVITY).
  • No GetDimensionValues / GetTags explorer (would let users browse SERVICE/REGION/ACCOUNT and tag values).
  • No Reservation/Savings-Plans coverage or utilization charts; only purchase-recommendation table that lists empty rows.
  • No Rightsizing recommendation view.
  • No Cost Allocation Tag manager (Active/Inactive toggle, AWSGenerated vs UserDefined).
  • No CSV/JSON export of cost data, forecasts, or anomalies.
  • No "estimated month-to-date" widget (DAILY granularity sum for current month).
  • No comparison view (GetCostAndUsageComparisons / GetCostComparisonDrivers wired, but no UI).
    </issue_description>

Comments on the Issue (you are @claude[agent] in this section)

@Claude Claude AI linked an issue May 26, 2026 that may be closed by this pull request
Copilot stopped work on behalf of agbishop due to an error May 26, 2026 03:05
@Claude Claude AI requested a review from agbishop May 26, 2026 03:05
@agbishop agbishop closed this May 26, 2026

Development

Successfully merging this pull request may close these issues.

[Story] Cost Explorer (CE) AWS-accuracy audit: bring services/ce to parity

2 participants