Agenda
- Discuss draft testing plan
- Is this a good plan? What can make it better? What am I missing?
- The main "thing" to do to move this along is to write the testing helpers. Do you think such helpers would make this process easier, or just different?
- What should be the "correct" place to put a new test?
@ActivitySim/engineering
Notes
Jeff presented a GitHub issue outlining a proposed plan to improve ActivitySim's testing structure. The group reviewed it together and offered initial reactions, with the intent to continue refining the approach asynchronously.
Current Problems Identified
1. Over-reliance on integration tests. Most existing tests run the full model end-to-end (load data → run 20–30 components → compare final trip tables). These are slow (can take an hour+), give poor diagnostic signal when something breaks, and produce cascading failures from trivial changes (e.g., a capitalization fix).
2. Disorganized test structure. Tests are scattered across the repository with no clear guidance on where a new test should live. Contributors adding features or fixing bugs have no obvious location or format to follow.
3. High setup burden for component tests. Writing a test for even a single component (e.g., trip destination choice) currently requires assembling a large set of boilerplate config files. In one recent example, writing the test took significantly more effort than the underlying bug fix itself.
Proposed Approach
Two-tier testing structure:
- Fast tier (unit/component tests): Small, targeted tests covering individual features or bug fixes. Should run in 1–4 minutes on every commit. This is the primary feedback loop for developers.
- Slow tier (integration tests): Full model runs retained as a final safety net before pull requests are merged — not triggered on every commit.
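As one way to make the tier split concrete (not yet an agreed convention), pytest markers could keep both tiers in the same suite while letting CI select between them. The marker name `slow` and the test names below are placeholders, and the marker would still need to be registered in `pytest.ini` or `pyproject.toml`:

```python
# Illustrative sketch of the two-tier split using a pytest marker.
# Marker and test names are placeholders, not an agreed standard.
import pytest


def test_component_fast_example():
    """Fast tier: small, targeted check that runs on every commit."""
    assert sorted([3, 1, 2]) == [1, 2, 3]  # stand-in for a real component assertion


@pytest.mark.slow
def test_full_model_integration_example():
    """Slow tier: full end-to-end model run, executed only before a merge."""
    ...
```

CI would then run `pytest -m "not slow"` on every commit and `pytest -m slow` (or the full suite) before merging a pull request.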
"Boy Scout" rule: Every new feature, bug fix, or non-trivial code change should include an accompanying test. Exceptions (e.g., documentation typos) should be rare and deliberate.
Testing submodule: Jeff proposed creating a dedicated testing submodule with reusable helper functions and setup utilities, reducing the boilerplate burden when writing new component tests.
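As a rough illustration of what such a helper might look like (the module location, helper name, and default settings below are all hypothetical; the actual interface is exactly what still needs to be designed):

```python
# Hypothetical contents of a testing submodule (e.g. activitysim/testing/helpers.py).
# Names and defaults are illustrative. The point is that boilerplate config assembly
# lives in one place instead of being copied into every component test.
from pathlib import Path

import yaml


def write_minimal_configs(workspace: Path, overrides: dict | None = None) -> Path:
    """Create a bare-bones configs directory for a single-component test.

    `overrides` lets a test state only the settings it actually cares about,
    keeping the test body itself small and readable.
    """
    configs = workspace / "configs"
    configs.mkdir(parents=True, exist_ok=True)
    settings = {"households_sample_size": 100, "chunk_size": 0}
    settings.update(overrides or {})
    (configs / "settings.yaml").write_text(yaml.safe_dump(settings))
    return configs
```

A component test would then call the helper with only the settings it cares about, rather than carrying its own copy of the full config set.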
Clear contributor expectations: Document what a well-formed pull request should include with respect to testing.
Key Discussion Points
Self-contained vs. config-file-based tests. David highlighted a tension between tests that embed all settings inline (transparent, self-contained, but not reusable) vs. tests that load shared config files (reusable, but harder to debug when something breaks). No consensus reached — considered an open design question.
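A hypothetical side-by-side sketch of the two styles (settings keys and fixture name are placeholders, and the "component" is reduced to reading the settings back):

```python
from pathlib import Path

import pytest
import yaml


# Style A: settings embedded inline. Transparent and self-contained, but every
# test carries its own copy of the configuration.
def test_sample_size_inline(tmp_path: Path):
    settings = {"households_sample_size": 100, "trace_hh_id": None}
    (tmp_path / "settings.yaml").write_text(yaml.safe_dump(settings))
    # Stand-in for running the component: just read the settings back.
    loaded = yaml.safe_load((tmp_path / "settings.yaml").read_text())
    assert loaded["households_sample_size"] == 100


# Style B: settings loaded from a shared config directory. Reusable across tests,
# but a failure means opening the shared files to see what the test ran with.
@pytest.fixture
def shared_configs_dir(tmp_path: Path) -> Path:
    (tmp_path / "settings.yaml").write_text(
        yaml.safe_dump({"households_sample_size": 100, "trace_hh_id": None})
    )
    return tmp_path


def test_sample_size_from_configs(shared_configs_dir: Path):
    loaded = yaml.safe_load((shared_configs_dir / "settings.yaml").read_text())
    assert loaded["households_sample_size"] == 100
```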
Chain-effect testing. Sijia noted that some features (e.g., global household skipping on failure) require testing cross-component behavior, making pure unit tests insufficient. Some tests will inevitably need a broader model state.
ActivitySim's structural challenge. Because components are tightly coupled and depend on file-based configuration, breaking dependency chains for isolated testing is non-trivial. Jeff acknowledged there may not be a clean solution.
Test maintenance burden. David raised concern that a large, varied test suite can itself become expensive to maintain — especially during library upgrades (e.g., Pandas 3 is already causing cascading failures). Sijia echoed this from experience with Network Wrangler, noting tests are often the first thing to break during refactoring.
Scope of coverage. Whether the goal is full unit-test coverage of the existing codebase vs. incremental improvement going forward is a consortium-level policy question, not an engineering one. Jeff estimated achieving full coverage could require roughly a year of dedicated funding — valuable but difficult to justify to individual agencies in the near term.
Performance testing. David raised the lack of any systematic performance benchmarking. Jeff noted prior discussions about dedicated cloud-based reference machines for this purpose, which were never acted on. The group agreed this is worth revisiting, potentially as a third testing tier or as a regular reporting mechanism.
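Purely as an illustration of the "regular reporting mechanism" idea, and assuming a dedicated reference machine exists, the hook could be as simple as timing a standard run and appending the result to a history file; `run_fn` stands in for whatever reference invocation the group would agree to time.

```python
# Hypothetical benchmark hook for a dedicated reference machine.
import csv
import time
from datetime import datetime, timezone
from pathlib import Path


def record_benchmark(log_path: Path, label: str, run_fn) -> float:
    """Time one reference run and append the result to a CSV history."""
    start = time.perf_counter()
    run_fn()
    elapsed = time.perf_counter() - start
    is_new = not log_path.exists()
    with log_path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp_utc", "label", "seconds"])
        writer.writerow([datetime.now(timezone.utc).isoformat(), label, f"{elapsed:.1f}"])
    return elapsed
```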
AI-assisted test writing. David and Jeff discussed using AI tools to accelerate test generation — but only after a consistent test format and a set of 6–12 canonical examples are established. Pointing AI at the current repository without that scaffolding would likely make things worse.
External software developers. David advised against bringing in outside developers unfamiliar with the domain, citing past experience where a 2-year ramp-up still yielded contributions that missed important context.
Action Items
| Owner | Action |
| --- | --- |
| Jeff Newman | Convert the GitHub issue into a shared Google Doc for async commenting and collaboration. |
| Jeff Newman | Propose named patterns/standards for the two primary test approaches discussed. |
| Sijia Wang | Share Network Wrangler repository and test structure as a reference example. |
| David Hensle | Share the Pandas contributing guide as another reference for testing philosophy. |
| All | Review the draft testing strategy doc and add comments before the next meeting. |
Next Meeting: Continue testing strategy discussion. Jeff will circulate the Google Doc in advance.