Skip to content

Conversation

@tstack
Copy link
Contributor

@tstack tstack commented Sep 24, 2025

Tested with https://github.com/nicolargo/glances since it is a relatively complex pyton program that generates a log.

Files:

  • Cargo.toml: Add python tree-sitter crate.
  • Tasks.md: Use task markers so we can check them off.
  • lib.rs: Refactor the placeholder regex and add python to SourceLanguage.
  • source_ref.rs: The escape_ignore_newlines() needs to convert octal escapes to hex. Also, handle raw strings since the backslashes need to be escaped.

Tested with https://github.com/nicolargo/glances since
it is a relatively complex pyton program that generates
a log.

Files:
* Cargo.toml: Add python tree-sitter crate.
* Tasks.md: Use task markers so we can check them off.
* lib.rs: Refactor the placeholder regex and add
  python to SourceLanguage.
* source_ref.rs: The `escape_ignore_newlines()` needs
  to convert octal escapes to hex.  Also, handle
  raw strings since the backslashes need to be escaped.
Copy link
Owner

@ttiimm ttiimm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you have time to write the raw string test case coverage? Otherwise, the changes look fine.

Comment on lines +4 to +6
- TSS: Doesn't this work already? I echo
the body of the log message into log2src
and it can find the message.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I tried it recently, but there was a panic when something was using the log format. Could we remove the log formats from the test code if they're unneeded by the test case?

Comment on lines +155 to +203
fn escape_ignore_newlines(raw: bool, segment: &str) -> String {
const HEX_CHARS: [char; 16] = [
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F',
];

let mut result = String::with_capacity(segment.len() * 2);
for c in segment.chars() {
match c {
'\n' => result.push_str(r"\n"), // Use actual newline in regex
'\r' => result.push_str(r"\r"), // Handle carriage returns too
// Escape regex special chars
'.' | '+' | '*' | '?' | '^' | '$' | '(' | ')' | '[' | ']' | '{' | '}' | '|' => {
let mut last_end = 0;
let regex = if raw {
&RAW_ESCAPE_REGEX
} else {
&ESCAPE_REGEX
};
for cap in regex.captures_iter(segment) {
let overall_range = cap.get(0).unwrap().range();
result.push_str(segment[last_end..overall_range.start].as_ref());
last_end = overall_range.end;
if let Some(c) = cap.get(1) {
result.push('\\');
result.push_str(c.as_str());
} else if let Some(c) = cap.get(2) {
match c.as_str() {
"\n" => result.push_str("\\n"),
"\r" => result.push_str("\\r"),
"\t" => result.push_str("\\t"),
_ => unreachable!(),
}
} else if let Some(c) = cap.get(3) {
if raw {
result.push('\\');
result.push_str(c.as_str());
} else {
let c = c.as_str();
let c = &c[1..];
let c = u8::from_str_radix(c, 8).unwrap();
result.push('\\');
result.push(c);
result.push('x');
result.push(HEX_CHARS[(c >> 4) as usize]);
result.push(HEX_CHARS[(c & 0xf) as usize]);
}
_ => result.push(c),
} else if let Some(_c) = cap.get(4) {
// XXX This is the fancy Python "\N{...}" escape sequence. Ideally, we'd interpret the
// name of the escape, but that seems like a lot of work. So, we'll just match any
// character.
result.push_str("\\w");
} else {
unreachable!();
}
}
result.push_str(segment[last_end..].as_ref());
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any additional test coverage for the raw strings. Would you add some?

@@ -0,0 +1,52 @@
---
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about adding test coverage here vs in the root test directory. I guess this tests the library pretty thoroughly end to end, so maybe those tests should focus more on the CLI features. Does that make sense to you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add tests in lib.rs so that I can use the debugger. The stuff under tests spawns a separate process and I don't think I was able to attach a debugger.

Comment on lines +145 to +153
static ESCAPE_REGEX: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(r#"([.*+?^${}()|\[\]])|([\n\r\t])|(\\[0-7]{3}|\\0)|(\\N\{[^}]+})"#).unwrap()
});

// A regex for raw strings that doesn't try to interpret escape sequences.
static RAW_ESCAPE_REGEX: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(r#"([.*+?^${}()|\[\]])|([\n\r\t])|(\\)"#).unwrap()
});

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These regexes, and the placeholder ones, are hurting my brain. I wish there was a more expressive way to write them. Any ideas?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add some comments for now.

@tstack
Copy link
Contributor Author

tstack commented Sep 25, 2025

Would you have time to write the raw string test case coverage? Otherwise, the changes look fine.

The test_basic_python() in lib.rs has a raw string. Is there a case in particular you're thinking of?

@ttiimm
Copy link
Owner

ttiimm commented Sep 25, 2025

Would you have time to write the raw string test case coverage? Otherwise, the changes look fine.

The test_basic_python() in lib.rs has a raw string. Is there a case in particular you're thinking of?

I was looking for a case in the source_ref module and didn't see any.

@ttiimm ttiimm merged commit 414b7f1 into ttiimm:main Sep 25, 2025
7 checks passed
@ttiimm
Copy link
Owner

ttiimm commented Sep 25, 2025

Is there a case in particular you're thinking of?

Does this work with Rust raw strings?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants