CHORE: Exclude diagnostic LOG statements from coverage metrics#556
CHORE: Exclude diagnostic LOG statements from coverage metrics#556gargsaumya wants to merge 1 commit into
Conversation
📊 Code Coverage Report
Diff CoverageDiff: main...HEAD, staged and unstaged changesNo lines with coverage information in this diff. 📋 Files Needing Attention📉 Files with overall lowest coverage (click to expand)mssql_python.pybind.build._deps.simdutf-src.src.haswell.implementation.cpp: 0.4%
mssql_python.pybind.build._deps.simdutf-src.src.implementation.cpp: 6.7%
mssql_python.pybind.build._deps.simdutf-src.include.simdutf.implementation.h: 10.4%
mssql_python.pybind.build._deps.simdutf-src.include.simdutf.scalar.utf16_to_utf8.utf16_to_utf8.h: 25.3%
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 59.7%
mssql_python.pybind.build._deps.simdutf-src.include.simdutf.internal.isadetection.h: 65.3%
mssql_python.row.py: 70.5%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.__init__.py: 77.3%🔗 Quick Links
|
bewithgaurav
left a comment
There was a problem hiding this comment.
A revamped take on the LOG-coverage gap
I called this exact gap out in the logging framework PR a few months ago: #312 (comment). at the time I rejected the LCOV markers approach for the same reasons that show up here: 150+ manual changes, clutters the codebase, hides executable code paths. coming back to it now I think there's a third option we didn't consider then that gets the same metric lift without any of those costs.
the 600 LCOV_EXCL_LINE markers and the python filter in generate_codecov.sh can be replaced with one tiny pre-build step plus one extra lcov flag. same coverage number, cleaner source tree, no helper scripts.
ran it end-to-end on macos arm64. full pytest suite including the logging integration tests:
PR's machinery: 3916/5079 = 77.10%
proposed flow: 3912/5052 = 77.43%
1733/1733 tests pass. logging still works at runtime, LoggerBridge gets normal coverage, nothing about behavior changes.
the proposal
- add a small script (
scripts/join_logs_for_coverage.py, ~50 lines) that joins multi-line LOG calls onto single lines. only used in the codecov build, never committed source changes. - in
mssql_python/pybind/build.shcodecov mode, snapshot the source, run the join, build, restore on EXIT viatrap(~12 lines). - in
generate_codecov.sh, swap the embedded python filter for--omit-lines '\bLOG[A-Z_]*\('on the existinglcov -aline.
joining preserves runtime behavior because adjacent string literals concatenate at compile time. the .so built from joined source is bit-for-bit equivalent to one built from the original.
what gets deleted
- all 600
// LCOV_EXCL_LINEmarkers from the cpp files add_lcov_exclusions.pyandfix_multiline_log_exclusions.pyat repo root- the embedded ~90-line
PYTHON_FILTERblock ingenerate_codecov.sh
tests/test_021_coverage_edge_cases.py and .gitignore additions stay.
small bonus
the helper scripts use 'LOG(' in line which misses LOG_ERROR( and LOG_WARNING(. 11 sites currently unmarked (9 LOG_ERROR + 1 LOG_WARNING + their continuations). the proposed regex catches them. that's the +0.33pp.
Replace manual LCOV_EXCL_LINE markers with lcov's built-in --omit-lines flag.
This approach is cleaner, more maintainable, and catches all LOG variants.
Changes:
- Add eng/scripts/join_logs_for_coverage.py to join multi-line LOG calls during coverage builds
- Modify build.sh to temporarily join LOG statements in codecov mode
- Replace Python filter in generate_codecov.sh with --omit-lines '\bLOG[A-Z_]*\('
- Update .gitignore to exclude local development scripts
Benefits:
- No source code clutter (600+ markers removed)
- Catches LOG_ERROR, LOG_WARNING, and all LOG variants
- Cleaner, more maintainable approach
- Source files remain unchanged in repository
Addresses review feedback from @bewithgaurav on PR #556
b662826 to
d9f3988
Compare
There was a problem hiding this comment.
Pull request overview
The PR is described as adding // LCOV_EXCL_LINE markers to ~284 LOG() statements across the C++ sources, but what is actually committed is a different mechanism: during coverage builds, source files are temporarily rewritten to collapse multi-line LOG(...) calls onto a single line, then lcov --omit-lines is used with a \bLOG[A-Z_]*\( regex to drop those lines from the merged report. Local helper scripts (including add_lcov_exclusions.py) and coverage artifacts are also added to .gitignore.
Changes:
build.sh(coverage mode): backs up.cpp/.hppfiles, invokeseng/scripts/join_logs_for_coverage.pyto collapse multi-lineLOGcalls, and restores originals via anEXITtrap.generate_codecov.sh: passes--omit-lines '\bLOG[A-Z_]*\('tolcovduring the merge step and tweaks the surrounding comments.- New
eng/scripts/join_logs_for_coverage.pythat walks*.cpp/*.hppand joins multi-lineLOGmacro invocations using naive paren counting. .gitignoreupdated with coverage outputs (*.profraw,*.profdata,*.info,htmlcov/, etc.) and several local-only helper scripts.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| mssql_python/pybind/build.sh | Adds coverage-mode pre-step that backs up/rewrites C++ source and restores it on exit. |
| generate_codecov.sh | Adds --omit-lines regex to lcov merge and updates comments. |
| eng/scripts/join_logs_for_coverage.py | New helper that joins multi-line LOG() calls onto one line. |
| .gitignore | Ignores coverage artifacts and several local experimental scripts. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Backup all .cpp and .hpp files | ||
| find . -maxdepth 2 -type f \( -name "*.cpp" -o -name "*.hpp" \) -exec cp {} "$BACKUP_DIR/" \; | ||
|
|
||
| # Set trap to restore source files on exit (success or failure) | ||
| trap 'echo "[CLEANUP] Restoring original source files"; cp -f "$BACKUP_DIR"/* "$ORIGINAL_DIR/" 2>/dev/null || true; rm -rf "$BACKUP_DIR"' EXIT |
| # Set trap to restore source files on exit (success or failure) | ||
| trap 'echo "[CLEANUP] Restoring original source files"; cp -f "$BACKUP_DIR"/* "$ORIGINAL_DIR/" 2>/dev/null || true; rm -rf "$BACKUP_DIR"' EXIT | ||
|
|
||
| # Join LOG statements using the helper script | ||
| SCRIPT_PATH="${ORIGINAL_DIR}/../../eng/scripts/join_logs_for_coverage.py" | ||
| if [[ -f "$SCRIPT_PATH" ]]; then | ||
| python3 "$SCRIPT_PATH" "$ORIGINAL_DIR" | ||
| echo "[SUCCESS] LOG statements joined for coverage build" | ||
| else | ||
| echo "[WARNING] join_logs_for_coverage.py not found at $SCRIPT_PATH" | ||
| echo "[WARNING] Continuing with original source (LOG filtering may be incomplete)" | ||
| fi |
| def join_log_statements(content: str) -> str: | ||
| """Join multi-line LOG macro calls onto a single line.""" | ||
| lines = content.split('\n') | ||
| result = [] | ||
| i = 0 | ||
|
|
||
| while i < len(lines): | ||
| line = lines[i] | ||
|
|
||
| # Check if this line contains a LOG macro start | ||
| if re.search(r'\bLOG[A-Z_]*\s*\(', line): | ||
| # Start collecting the full statement | ||
| full_statement = line | ||
| paren_depth = line.count('(') - line.count(')') | ||
| i += 1 | ||
|
|
||
| # Continue collecting until we close all parentheses | ||
| while i < len(lines) and paren_depth > 0: | ||
| next_line = lines[i] | ||
| full_statement += ' ' + next_line.strip() | ||
| paren_depth += next_line.count('(') - next_line.count(')') | ||
| i += 1 | ||
|
|
||
| # Add the joined statement | ||
| result.append(full_statement) | ||
| else: | ||
| result.append(line) | ||
| i += 1 | ||
|
|
Replace manual LCOV_EXCL_LINE markers with cleaner built-in lcov filtering.
This approach uses lcov's native exclusion mechanism and is more maintainable.
Changes:
- Add eng/scripts/join_logs_for_coverage.py to join multi-line LOG calls during coverage builds
- Modify build.sh to temporarily join LOG statements in codecov mode with automatic restore
- Use lcov --rc lcov_excl_line='\bLOG[A-Z_]*\s*\(' to exclude LOG macros from coverage
- Add llvm-cov ignore pattern for build/_deps/ (vendored simdutf sources from PR #526)
- Add lcov --remove for build/_deps/ as defense-in-depth (from PR #579)
- Update .gitignore to exclude local development scripts
Benefits:
- No source code clutter (600+ markers not needed)
- Catches all LOG variants (LOG_ERROR, LOG_WARNING, etc.)
- Excludes vendored third-party dependencies from coverage metrics
- Cleaner, more maintainable approach using lcov native features
- Source files remain unchanged in repository
Addresses review feedback from @bewithgaurav on PR #556
Includes changes from PR #579 to fix simdutf coverage pollution
2b284f0 to
af93e0d
Compare
Work Item / Issue Reference
Summary
This pull request introduces a new workflow for improving code coverage reporting by automatically joining multi-line
LOG()macro calls into single lines before coverage analysis. This makes it easier to exclude logging statements from LCOV coverage results without modifying the original source code. The changes include a new helper script, updates to the build script to use this helper during coverage builds, and adjustments to the coverage merge process to excludeLOGstatements.Coverage build improvements:
eng/scripts/join_logs_for_coverage.pythat joins multi-lineLOG()macro calls into single lines in.cppand.hppfiles, simplifying LCOV filtering for logging statements. This script operates on temporary copies of the source code and does not modify the originals.mssql_python/pybind/build.shto invoke the new helper script during coverage builds. The script backs up original source files, processes them to joinLOGstatements, and restores the originals after the build, ensuring the source code remains unchanged.Coverage reporting adjustments:
generate_codecov.shto use LCOV's--omit-linesoption with a regex that matches allLOGmacro calls (e.g.,LOG,LOG_ERROR,LOG_WARNING) during the coverage merge step, ensuring these statements are excluded from the final coverage report.