Skip to content

CHORE: Exclude diagnostic LOG statements from coverage metrics#556

Open
gargsaumya wants to merge 1 commit into
mainfrom
saumya/code-coverage
Open

CHORE: Exclude diagnostic LOG statements from coverage metrics#556
gargsaumya wants to merge 1 commit into
mainfrom
saumya/code-coverage

Conversation

@gargsaumya
Copy link
Copy Markdown
Contributor

@gargsaumya gargsaumya commented May 5, 2026

Work Item / Issue Reference

AB#44903

GitHub Issue: #<ISSUE_NUMBER>


Summary

This pull request introduces a new workflow for improving code coverage reporting by automatically joining multi-line LOG() macro calls into single lines before coverage analysis. This makes it easier to exclude logging statements from LCOV coverage results without modifying the original source code. The changes include a new helper script, updates to the build script to use this helper during coverage builds, and adjustments to the coverage merge process to exclude LOG statements.

Coverage build improvements:

  • Added a new script eng/scripts/join_logs_for_coverage.py that joins multi-line LOG() macro calls into single lines in .cpp and .hpp files, simplifying LCOV filtering for logging statements. This script operates on temporary copies of the source code and does not modify the originals.
  • Updated mssql_python/pybind/build.sh to invoke the new helper script during coverage builds. The script backs up original source files, processes them to join LOG statements, and restores the originals after the build, ensuring the source code remains unchanged.

Coverage reporting adjustments:

  • Modified generate_codecov.sh to use LCOV's --omit-lines option with a regex that matches all LOG macro calls (e.g., LOG, LOG_ERROR, LOG_WARNING) during the coverage merge step, ensuring these statements are excluded from the final coverage report.

@github-actions github-actions Bot added the pr-size: large Substantial code update label May 5, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 5, 2026

📊 Code Coverage Report

🔥 Diff Coverage

100%


🎯 Overall Coverage

25%


📈 Total Lines Covered: 6734 out of 26527
📁 Project: mssql-python


Diff Coverage

Diff: main...HEAD, staged and unstaged changes

No lines with coverage information in this diff.


📋 Files Needing Attention

📉 Files with overall lowest coverage (click to expand)
mssql_python.pybind.build._deps.simdutf-src.src.haswell.implementation.cpp: 0.4%
mssql_python.pybind.build._deps.simdutf-src.src.implementation.cpp: 6.7%
mssql_python.pybind.build._deps.simdutf-src.include.simdutf.implementation.h: 10.4%
mssql_python.pybind.build._deps.simdutf-src.include.simdutf.scalar.utf16_to_utf8.utf16_to_utf8.h: 25.3%
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.pybind.ddbc_bindings.h: 59.7%
mssql_python.pybind.build._deps.simdutf-src.include.simdutf.internal.isadetection.h: 65.3%
mssql_python.row.py: 70.5%
mssql_python.pybind.logger_bridge.hpp: 70.8%
mssql_python.__init__.py: 77.3%

🔗 Quick Links

⚙️ Build Summary 📋 Coverage Details

View Azure DevOps Build

Browse Full Coverage Report

Copy link
Copy Markdown
Collaborator

@bewithgaurav bewithgaurav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A revamped take on the LOG-coverage gap

I called this exact gap out in the logging framework PR a few months ago: #312 (comment). at the time I rejected the LCOV markers approach for the same reasons that show up here: 150+ manual changes, clutters the codebase, hides executable code paths. coming back to it now I think there's a third option we didn't consider then that gets the same metric lift without any of those costs.

the 600 LCOV_EXCL_LINE markers and the python filter in generate_codecov.sh can be replaced with one tiny pre-build step plus one extra lcov flag. same coverage number, cleaner source tree, no helper scripts.

ran it end-to-end on macos arm64. full pytest suite including the logging integration tests:

PR's machinery:    3916/5079 = 77.10%
proposed flow:     3912/5052 = 77.43%

1733/1733 tests pass. logging still works at runtime, LoggerBridge gets normal coverage, nothing about behavior changes.

the proposal

  1. add a small script (scripts/join_logs_for_coverage.py, ~50 lines) that joins multi-line LOG calls onto single lines. only used in the codecov build, never committed source changes.
  2. in mssql_python/pybind/build.sh codecov mode, snapshot the source, run the join, build, restore on EXIT via trap (~12 lines).
  3. in generate_codecov.sh, swap the embedded python filter for --omit-lines '\bLOG[A-Z_]*\(' on the existing lcov -a line.

joining preserves runtime behavior because adjacent string literals concatenate at compile time. the .so built from joined source is bit-for-bit equivalent to one built from the original.

what gets deleted

  • all 600 // LCOV_EXCL_LINE markers from the cpp files
  • add_lcov_exclusions.py and fix_multiline_log_exclusions.py at repo root
  • the embedded ~90-line PYTHON_FILTER block in generate_codecov.sh

tests/test_021_coverage_edge_cases.py and .gitignore additions stay.

small bonus

the helper scripts use 'LOG(' in line which misses LOG_ERROR( and LOG_WARNING(. 11 sites currently unmarked (9 LOG_ERROR + 1 LOG_WARNING + their continuations). the proposed regex catches them. that's the +0.33pp.

gargsaumya added a commit that referenced this pull request May 19, 2026
Replace manual LCOV_EXCL_LINE markers with lcov's built-in --omit-lines flag.
This approach is cleaner, more maintainable, and catches all LOG variants.

Changes:
- Add eng/scripts/join_logs_for_coverage.py to join multi-line LOG calls during coverage builds
- Modify build.sh to temporarily join LOG statements in codecov mode
- Replace Python filter in generate_codecov.sh with --omit-lines '\bLOG[A-Z_]*\('
- Update .gitignore to exclude local development scripts

Benefits:
- No source code clutter (600+ markers removed)
- Catches LOG_ERROR, LOG_WARNING, and all LOG variants
- Cleaner, more maintainable approach
- Source files remain unchanged in repository

Addresses review feedback from @bewithgaurav on PR #556
@gargsaumya gargsaumya force-pushed the saumya/code-coverage branch from b662826 to d9f3988 Compare May 19, 2026 03:41
@github-actions github-actions Bot added pr-size: medium Moderate update size and removed pr-size: large Substantial code update labels May 19, 2026
@gargsaumya gargsaumya marked this pull request as ready for review May 19, 2026 03:42
Copilot AI review requested due to automatic review settings May 19, 2026 03:42
@gargsaumya gargsaumya changed the title CHORE: Add LCOV_EXCL_LINE markers to LOG() statements for coverage analysis CHORE: Exclude diagnostic LOG statements from coverage metrics May 19, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

The PR is described as adding // LCOV_EXCL_LINE markers to ~284 LOG() statements across the C++ sources, but what is actually committed is a different mechanism: during coverage builds, source files are temporarily rewritten to collapse multi-line LOG(...) calls onto a single line, then lcov --omit-lines is used with a \bLOG[A-Z_]*\( regex to drop those lines from the merged report. Local helper scripts (including add_lcov_exclusions.py) and coverage artifacts are also added to .gitignore.

Changes:

  • build.sh (coverage mode): backs up .cpp/.hpp files, invokes eng/scripts/join_logs_for_coverage.py to collapse multi-line LOG calls, and restores originals via an EXIT trap.
  • generate_codecov.sh: passes --omit-lines '\bLOG[A-Z_]*\(' to lcov during the merge step and tweaks the surrounding comments.
  • New eng/scripts/join_logs_for_coverage.py that walks *.cpp/*.hpp and joins multi-line LOG macro invocations using naive paren counting.
  • .gitignore updated with coverage outputs (*.profraw, *.profdata, *.info, htmlcov/, etc.) and several local-only helper scripts.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.

File Description
mssql_python/pybind/build.sh Adds coverage-mode pre-step that backs up/rewrites C++ source and restores it on exit.
generate_codecov.sh Adds --omit-lines regex to lcov merge and updates comments.
eng/scripts/join_logs_for_coverage.py New helper that joins multi-line LOG() calls onto one line.
.gitignore Ignores coverage artifacts and several local experimental scripts.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +47 to +51
# Backup all .cpp and .hpp files
find . -maxdepth 2 -type f \( -name "*.cpp" -o -name "*.hpp" \) -exec cp {} "$BACKUP_DIR/" \;

# Set trap to restore source files on exit (success or failure)
trap 'echo "[CLEANUP] Restoring original source files"; cp -f "$BACKUP_DIR"/* "$ORIGINAL_DIR/" 2>/dev/null || true; rm -rf "$BACKUP_DIR"' EXIT
Comment on lines +50 to +61
# Set trap to restore source files on exit (success or failure)
trap 'echo "[CLEANUP] Restoring original source files"; cp -f "$BACKUP_DIR"/* "$ORIGINAL_DIR/" 2>/dev/null || true; rm -rf "$BACKUP_DIR"' EXIT

# Join LOG statements using the helper script
SCRIPT_PATH="${ORIGINAL_DIR}/../../eng/scripts/join_logs_for_coverage.py"
if [[ -f "$SCRIPT_PATH" ]]; then
python3 "$SCRIPT_PATH" "$ORIGINAL_DIR"
echo "[SUCCESS] LOG statements joined for coverage build"
else
echo "[WARNING] join_logs_for_coverage.py not found at $SCRIPT_PATH"
echo "[WARNING] Continuing with original source (LOG filtering may be incomplete)"
fi
Comment on lines +15 to +43
def join_log_statements(content: str) -> str:
"""Join multi-line LOG macro calls onto a single line."""
lines = content.split('\n')
result = []
i = 0

while i < len(lines):
line = lines[i]

# Check if this line contains a LOG macro start
if re.search(r'\bLOG[A-Z_]*\s*\(', line):
# Start collecting the full statement
full_statement = line
paren_depth = line.count('(') - line.count(')')
i += 1

# Continue collecting until we close all parentheses
while i < len(lines) and paren_depth > 0:
next_line = lines[i]
full_statement += ' ' + next_line.strip()
paren_depth += next_line.count('(') - next_line.count(')')
i += 1

# Add the joined statement
result.append(full_statement)
else:
result.append(line)
i += 1

Replace manual LCOV_EXCL_LINE markers with cleaner built-in lcov filtering.
This approach uses lcov's native exclusion mechanism and is more maintainable.

Changes:
- Add eng/scripts/join_logs_for_coverage.py to join multi-line LOG calls during coverage builds
- Modify build.sh to temporarily join LOG statements in codecov mode with automatic restore
- Use lcov --rc lcov_excl_line='\bLOG[A-Z_]*\s*\(' to exclude LOG macros from coverage
- Add llvm-cov ignore pattern for build/_deps/ (vendored simdutf sources from PR #526)
- Add lcov --remove for build/_deps/ as defense-in-depth (from PR #579)
- Update .gitignore to exclude local development scripts

Benefits:
- No source code clutter (600+ markers not needed)
- Catches all LOG variants (LOG_ERROR, LOG_WARNING, etc.)
- Excludes vendored third-party dependencies from coverage metrics
- Cleaner, more maintainable approach using lcov native features
- Source files remain unchanged in repository

Addresses review feedback from @bewithgaurav on PR #556
Includes changes from PR #579 to fix simdutf coverage pollution
@gargsaumya gargsaumya force-pushed the saumya/code-coverage branch from 2b284f0 to af93e0d Compare May 19, 2026 06:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-size: medium Moderate update size

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants