[caching] save/load source hierarchy and extracted log statements #20

tstack · 2025-10-07T18:25:24Z

Scanning the source hierarchy and extracting log statements is costly, so we need to cache what we found between runs. This change writes a separate cache file for every source root. Each file starts with a header, a single JSON line containing metadata that a user might find helpful. The body of the file is a bincode serialization of the SourceTree object. On startup, the CLI tries to load from the cache, checks for any changes in the source tree, and then updates the cache if necessary.
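To illustrate the header-then-body layout described above, here is a minimal std-only sketch. It is not the code from this PR: the function names are hypothetical, the header is passed in as an already-formatted JSON string, and the body is treated as opaque bytes standing in for the bincode output.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Read, Write};
use std::path::Path;

/// Write a cache entry: one JSON header line, a newline, then the raw body.
/// `body` is opaque here; in the PR it would be the bincode-serialized tree.
pub fn write_entry(path: &Path, header_json: &str, body: &[u8]) -> io::Result<()> {
    // The header must stay on one line so it can be read back on its own.
    if header_json.contains('\n') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "header must be a single line",
        ));
    }
    let mut file = File::create(path)?;
    file.write_all(header_json.as_bytes())?;
    file.write_all(b"\n")?;
    file.write_all(body)?;
    file.flush()
}

/// Read only the header line; the body is never loaded into memory.
pub fn read_header(path: &Path) -> io::Result<String> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line.trim_end_matches('\n').to_string())
}

/// Read the header line and then everything after it as the body.
pub fn read_entry(path: &Path) -> io::Result<(String, Vec<u8>)> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut line = String::new();
    reader.read_line(&mut line)?;
    let mut body = Vec::new();
    reader.read_to_end(&mut body)?;
    Ok((line.trim_end_matches('\n').to_string(), body))
}
```

A caller could use `read_header` to decide whether an entry is still valid and only call `read_entry` when it is. Because reading stops at the first newline, the body may contain arbitrary bytes, including newlines.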

I originally looked at using SQLite for the cache, but it's hard to write blobs of unknown size to a table. The file-based approach seems to work since the header can easily be read separately from the main content to determine whether the file is valid. I also tried the postcard serializer, but had trouble getting it to work. The bincode one seems to just work.

Files:

* Cargo.toml: Add some more dependencies:
  - bincode for serialization
  - directories for finding the user's platform-specific cache directory
  - serde_regex for serializing/deserializing Regex
  - sha2 for producing a hash used for the cache file names
  - tempfile for creating the cache file
  - walkdir for walking the cache dir in the tests
* cache-header-v1.json: The schema for the cache entry header
* lib.rs: Add cache stuff
* main.rs: Add a footer to the help to mention where the cache is. Load the statements from the cache and save for future use.
* source_hier.rs: Derive `Deserialize` on various structs.
* common_settings/mod.rs: Move helper module to its own directory.
* source_ref.rs: Serialize/deserialize the Regex directly so the deserialize can fail if the regex string is invalid. Add separate `pattern_str` to cache the string version of the pattern.
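The dependency list above says tempfile is used for creating the cache file. One common reason to go through a temporary file is to write the full contents first and then rename it into place, so a reader never sees a half-written entry. Whether the PR does exactly this is an assumption. The sketch below shows the pattern with the standard library only, not the tempfile crate, and the function names and the `.<pid>.tmp` suffix are hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write the bytes to `tmp_path`, then move the finished file onto `dest`.
fn write_and_rename(tmp_path: &Path, dest: &Path, bytes: &[u8]) -> io::Result<()> {
    let mut tmp = File::create(tmp_path)?;
    tmp.write_all(bytes)?;
    tmp.sync_all()?;
    // Close the handle before renaming; some platforms refuse to rename open files.
    drop(tmp);
    fs::rename(tmp_path, dest)
}

/// Replace `dest` with `bytes` by way of a sibling temporary file.
pub fn write_via_temp(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    let file_name = dest.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "destination has no file name")
    })?;
    // Hypothetical naming scheme: "<name>.<pid>.tmp" next to the destination,
    // so the rename stays on the same filesystem.
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(format!(".{}.tmp", std::process::id()));
    let tmp_path = dest.with_file_name(tmp_name);

    let result = write_and_rename(&tmp_path, dest, bytes);
    if result.is_err() {
        // Best-effort cleanup; the original error is what gets reported.
        let _ = fs::remove_file(&tmp_path);
    }
    result
}
```

Keeping the temporary file in the same directory as the destination matters because a rename across filesystems can fail or fall back to a non-atomic copy.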


If running with "-v", log that the cache will not be used

ttiimm · 2025-10-29T16:56:16Z

tests/common_settings/mod.rs

+        cmd.env("XDG_CONFIG_HOME", self.location.path());
+        cmd.env("USERPROFILE", self.location.path());
+        cmd.env("LOCALAPPDATA", self.location.path());
+        cmd.env("APPDATA", self.location.path());


Does ProjectDirs need TEMP or TMP set to work on Windows? That was one of the coding assistant's recommendations for fixing this...

tstack added 7 commits October 7, 2025 11:22

[tests] fix expected out

a0e9eae

[tests] try tweak for win

2abe8d2

[tests] windows...

01bb01f

[tests] maybe the cache dir is not found on win

5174865

If running with "-v", log that the cache will not be used

[tests] exclude the corruption test from windows for now

c6079c2

[tasks] checkoff a couple more tasks

ea17eb4

ttiimm approved these changes Oct 29, 2025
