-
Notifications
You must be signed in to change notification settings - Fork 2
[lang] add support for python #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tested with https://github.com/nicolargo/glances since it is a relatively complex pyton program that generates a log. Files: * Cargo.toml: Add python tree-sitter crate. * Tasks.md: Use task markers so we can check them off. * lib.rs: Refactor the placeholder regex and add python to SourceLanguage. * source_ref.rs: The `escape_ignore_newlines()` needs to convert octal escapes to hex. Also, handle raw strings since the backslashes need to be escaped.
ttiimm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you have time to write the raw string test case coverage? Otherwise, the changes look fine.
| - TSS: Doesn't this work already? I echo | ||
| the body of the log message into log2src | ||
| and it can find the message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I tried it recently, but there was a panic when something was using the log format. Could we remove the log formats from the test code if they're unneeded by the test case?
| fn escape_ignore_newlines(raw: bool, segment: &str) -> String { | ||
| const HEX_CHARS: [char; 16] = [ | ||
| '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F', | ||
| ]; | ||
|
|
||
| let mut result = String::with_capacity(segment.len() * 2); | ||
| for c in segment.chars() { | ||
| match c { | ||
| '\n' => result.push_str(r"\n"), // Use actual newline in regex | ||
| '\r' => result.push_str(r"\r"), // Handle carriage returns too | ||
| // Escape regex special chars | ||
| '.' | '+' | '*' | '?' | '^' | '$' | '(' | ')' | '[' | ']' | '{' | '}' | '|' => { | ||
| let mut last_end = 0; | ||
| let regex = if raw { | ||
| &RAW_ESCAPE_REGEX | ||
| } else { | ||
| &ESCAPE_REGEX | ||
| }; | ||
| for cap in regex.captures_iter(segment) { | ||
| let overall_range = cap.get(0).unwrap().range(); | ||
| result.push_str(segment[last_end..overall_range.start].as_ref()); | ||
| last_end = overall_range.end; | ||
| if let Some(c) = cap.get(1) { | ||
| result.push('\\'); | ||
| result.push_str(c.as_str()); | ||
| } else if let Some(c) = cap.get(2) { | ||
| match c.as_str() { | ||
| "\n" => result.push_str("\\n"), | ||
| "\r" => result.push_str("\\r"), | ||
| "\t" => result.push_str("\\t"), | ||
| _ => unreachable!(), | ||
| } | ||
| } else if let Some(c) = cap.get(3) { | ||
| if raw { | ||
| result.push('\\'); | ||
| result.push_str(c.as_str()); | ||
| } else { | ||
| let c = c.as_str(); | ||
| let c = &c[1..]; | ||
| let c = u8::from_str_radix(c, 8).unwrap(); | ||
| result.push('\\'); | ||
| result.push(c); | ||
| result.push('x'); | ||
| result.push(HEX_CHARS[(c >> 4) as usize]); | ||
| result.push(HEX_CHARS[(c & 0xf) as usize]); | ||
| } | ||
| _ => result.push(c), | ||
| } else if let Some(_c) = cap.get(4) { | ||
| // XXX This is the fancy Python "\N{...}" escape sequence. Ideally, we'd interpret the | ||
| // name of the escape, but that seems like a lot of work. So, we'll just match any | ||
| // character. | ||
| result.push_str("\\w"); | ||
| } else { | ||
| unreachable!(); | ||
| } | ||
| } | ||
| result.push_str(segment[last_end..].as_ref()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any additional test coverage for the raw strings. Would you add some?
| @@ -0,0 +1,52 @@ | |||
| --- | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about adding test coverage here vs in the root test directory. I guess this tests the library pretty thoroughly end to end, so maybe those tests should focus more on the CLI features. Does that make sense to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add tests in lib.rs so that I can use the debugger. The stuff under tests spawns a separate process and I don't think I was able to attach a debugger.
| static ESCAPE_REGEX: LazyLock<Regex> = LazyLock::new(|| { | ||
| Regex::new(r#"([.*+?^${}()|\[\]])|([\n\r\t])|(\\[0-7]{3}|\\0)|(\\N\{[^}]+})"#).unwrap() | ||
| }); | ||
|
|
||
| // A regex for raw strings that doesn't try to interpret escape sequences. | ||
| static RAW_ESCAPE_REGEX: LazyLock<Regex> = LazyLock::new(|| { | ||
| Regex::new(r#"([.*+?^${}()|\[\]])|([\n\r\t])|(\\)"#).unwrap() | ||
| }); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These regexes, and the placeholder ones, are hurting my brain. I wish there was a more expressive way to write them. Any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add some comments for now.
The |
I was looking for a case in the source_ref module and didn't see any. |
Does this work with Rust raw strings? |
Tested with https://github.com/nicolargo/glances since it is a relatively complex pyton program that generates a log.
Files:
escape_ignore_newlines()needs to convert octal escapes to hex. Also, handle raw strings since the backslashes need to be escaped.