[lang] add support for python #15

tstack · 2025-09-24T19:45:48Z

Tested with https://github.com/nicolargo/glances since it is a relatively complex pyton program that generates a log.

Files:

Cargo.toml: Add python tree-sitter crate.
Tasks.md: Use task markers so we can check them off.
lib.rs: Refactor the placeholder regex and add python to SourceLanguage.
source_ref.rs: The escape_ignore_newlines() needs to convert octal escapes to hex. Also, handle raw strings since the backslashes need to be escaped.

Tested with https://github.com/nicolargo/glances since it is a relatively complex pyton program that generates a log. Files: * Cargo.toml: Add python tree-sitter crate. * Tasks.md: Use task markers so we can check them off. * lib.rs: Refactor the placeholder regex and add python to SourceLanguage. * source_ref.rs: The `escape_ignore_newlines()` needs to convert octal escapes to hex. Also, handle raw strings since the backslashes need to be escaped.

ttiimm

Would you have time to write the raw string test case coverage? Otherwise, the changes look fine.

ttiimm · 2025-09-25T15:45:47Z

docs/Tasks.md

+  - TSS: Doesn't this work already?  I echo
+    the body of the log message into log2src
+    and it can find the message.


I thought I tried it recently, but there was a panic when something was using the log format. Could we remove the log formats from the test code if they're unneeded by the test case?

ttiimm · 2025-09-25T16:12:41Z

src/source_ref.rs

+fn escape_ignore_newlines(raw: bool, segment: &str) -> String {
+    const HEX_CHARS: [char; 16] = [
+        '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F',
+    ];
+
    let mut result = String::with_capacity(segment.len() * 2);
-    for c in segment.chars() {
-        match c {
-            '\n' => result.push_str(r"\n"), // Use actual newline in regex
-            '\r' => result.push_str(r"\r"), // Handle carriage returns too
-            // Escape regex special chars
-            '.' | '+' | '*' | '?' | '^' | '$' | '(' | ')' | '[' | ']' | '{' | '}' | '|' => {
+    let mut last_end = 0;
+    let regex = if raw {
+        &RAW_ESCAPE_REGEX
+    } else {
+        &ESCAPE_REGEX
+    };
+    for cap in regex.captures_iter(segment) {
+        let overall_range = cap.get(0).unwrap().range();
+        result.push_str(segment[last_end..overall_range.start].as_ref());
+        last_end = overall_range.end;
+        if let Some(c) = cap.get(1) {
+            result.push('\\');
+            result.push_str(c.as_str());
+        } else if let Some(c) = cap.get(2) {
+            match c.as_str() {
+                "\n" => result.push_str("\\n"),
+                "\r" => result.push_str("\\r"),
+                "\t" => result.push_str("\\t"),
+                _ => unreachable!(),
+            }
+        } else if let Some(c) = cap.get(3) {
+            if raw {
+                result.push('\\');
+                    result.push_str(c.as_str());
+            } else {
+                let c = c.as_str();
+                let c = &c[1..];
+                let c = u8::from_str_radix(c, 8).unwrap();
                result.push('\\');
-                result.push(c);
+                result.push('x');
+                result.push(HEX_CHARS[(c >> 4) as usize]);
+                result.push(HEX_CHARS[(c & 0xf) as usize]);
            }
-            _ => result.push(c),
+        } else if let Some(_c) = cap.get(4) {
+            // XXX This is the fancy Python "\N{...}" escape sequence.  Ideally, we'd interpret the
+            // name of the escape, but that seems like a lot of work.  So, we'll just match any
+            // character.
+            result.push_str("\\w");
+        } else {
+            unreachable!();
        }
    }
+    result.push_str(segment[last_end..].as_ref());


I don't see any additional test coverage for the raw strings. Would you add some?

ttiimm · 2025-09-25T16:15:50Z

src/snapshots/log2src__tests__basic_python.snap

@@ -0,0 +1,52 @@
+---


What do you think about adding test coverage here vs in the root test directory. I guess this tests the library pretty thoroughly end to end, so maybe those tests should focus more on the CLI features. Does that make sense to you?

I add tests in lib.rs so that I can use the debugger. The stuff under tests spawns a separate process and I don't think I was able to attach a debugger.

ttiimm · 2025-09-25T16:23:44Z

src/source_ref.rs

+static ESCAPE_REGEX: LazyLock<Regex> = LazyLock::new(|| {
+    Regex::new(r#"([.*+?^${}()|\[\]])|([\n\r\t])|(\\[0-7]{3}|\\0)|(\\N\{[^}]+})"#).unwrap()
+});
+
+// A regex for raw strings that doesn't try to interpret escape sequences.
+static RAW_ESCAPE_REGEX: LazyLock<Regex> = LazyLock::new(|| {
+    Regex::new(r#"([.*+?^${}()|\[\]])|([\n\r\t])|(\\)"#).unwrap()
+});
+


These regexes, and the placeholder ones, are hurting my brain. I wish there was a more expressive way to write them. Any ideas?

I'll add some comments for now.

tstack · 2025-09-25T17:13:15Z

Would you have time to write the raw string test case coverage? Otherwise, the changes look fine.

The test_basic_python() in lib.rs has a raw string. Is there a case in particular you're thinking of?

ttiimm · 2025-09-25T17:52:47Z

Would you have time to write the raw string test case coverage? Otherwise, the changes look fine.

The test_basic_python() in lib.rs has a raw string. Is there a case in particular you're thinking of?

I was looking for a case in the source_ref module and didn't see any.

ttiimm · 2025-09-25T17:56:18Z

Is there a case in particular you're thinking of?

Does this work with Rust raw strings?

tstack added 2 commits September 24, 2025 12:45

[build] missed snap file

d004ed6

ttiimm reviewed Sep 25, 2025

View reviewed changes

ttiimm merged commit 414b7f1 into ttiimm:main Sep 25, 2025
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[lang] add support for python #15

[lang] add support for python #15

Uh oh!

tstack commented Sep 24, 2025

Uh oh!

ttiimm left a comment

Uh oh!

ttiimm Sep 25, 2025

Uh oh!

ttiimm Sep 25, 2025

Uh oh!

ttiimm Sep 25, 2025

Uh oh!

tstack Sep 25, 2025

Uh oh!

ttiimm Sep 25, 2025

Uh oh!

tstack Sep 25, 2025

Uh oh!

tstack commented Sep 25, 2025

Uh oh!

ttiimm commented Sep 25, 2025

Uh oh!

Uh oh!

ttiimm commented Sep 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[lang] add support for python #15

[lang] add support for python #15

Uh oh!

Conversation

tstack commented Sep 24, 2025

Uh oh!

ttiimm left a comment

Choose a reason for hiding this comment

Uh oh!

ttiimm Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

ttiimm Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

ttiimm Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

tstack Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

ttiimm Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

tstack Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

tstack commented Sep 25, 2025

Uh oh!

ttiimm commented Sep 25, 2025

Uh oh!

Uh oh!

ttiimm commented Sep 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants