fix: tolerate non-standard Excel formats (issue #136) #137

Open
IDKHowToCodeFR wants to merge 1 commit into harmonydata:main from IDKHowToCodeFR:fix/issue-136-tolerant-excel-parsing

Conversation

IDKHowToCodeFR commented Jun 11, 2026

  • Drop fully blank rows before column assignment
  • Add semantic column detection across header row
  • Fall back to longest-string heuristic when no keyword match
  • Add regression test with attached wellbeing-scales-list.xlsx
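
The detection order in the bullets above (keyword match on the header row first, longest-string heuristic second) could look roughly like the sketch below. It is an illustration only, not the code in excel_parser.py: the function name `detect_question_column`, the keyword list, and the list-of-rows input are assumptions, and "longest-string" is read here as "the column with the greatest total text length", which the PR may define differently.

```python
from typing import List, Optional, Sequence

# Hypothetical keyword list; the PR does not state which keywords it matches.
QUESTION_KEYWORDS = ("question", "item", "text")


def detect_question_column(
    header: Sequence[Optional[str]],
    rows: List[Sequence[Optional[str]]],
) -> int:
    """Return the index of the column most likely to hold question text."""
    # Step 1: look for a keyword anywhere in the header row.
    for idx, cell in enumerate(header):
        label = str(cell or "").strip().lower()
        if any(keyword in label for keyword in QUESTION_KEYWORDS):
            return idx

    # Step 2: no keyword matched, so fall back to the column with the most text.
    n_cols = max([len(header)] + [len(row) for row in rows])
    if n_cols == 0:
        raise ValueError("no columns to choose from")

    def total_length(col: int) -> int:
        # Sum the stripped lengths of all non-empty cells in this column.
        return sum(
            len(str(row[col]).strip())
            for row in rows
            if col < len(row) and row[col] is not None
        )

    return max(range(n_cols), key=total_length)


# Made-up data: none of the headers contains a keyword, so the fallback runs.
header = ["ID", "Wording", "Scale"]
rows = [
    ["1", "I feel optimistic about the future", "Scale A"],
    ["2", "I feel useful", "Scale A"],
]
print(detect_question_column(header, rows))  # 1
```

Trying keywords before the length heuristic means a clearly labelled header always wins, and the fallback only runs on sheets that carry no recognisable label.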

Description

⚠️ Please check which files you are pushing! If any file contains only whitespace changes, or changes such as " to ', please remove it from your pull request. Limiting the files you modify to what is strictly necessary makes the project's edits much simpler to track, and makes your changes easier to merge when two people work on the project simultaneously and their changes have to be combined.

Excel files with non-standard column headers or leading blank rows failed to parse, returning empty instruments or incorrect column mappings. The fix adds blank row stripping and keyword-based column detection before falling back to the existing positional logic. No new dependencies are introduced.
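
As a rough illustration of the blank-row step: leading empty rows presumably push the real header down, so position-based logic ends up reading an empty row as the header. The sketch below assumes the sheet has already been read into a list of rows. The helper names `is_blank` and `drop_blank_rows` are hypothetical and are not taken from this PR.

```python
from typing import Any, List, Sequence


def is_blank(cell: Any) -> bool:
    """Treat None, NaN and whitespace-only strings as blank."""
    if cell is None:
        return True
    if isinstance(cell, float) and cell != cell:  # NaN is not equal to itself
        return True
    return str(cell).strip() == ""


def drop_blank_rows(rows: List[Sequence[Any]]) -> List[Sequence[Any]]:
    """Remove rows in which every cell is blank, keeping the original order."""
    return [row for row in rows if not all(is_blank(cell) for cell in row)]


# Made-up sheet with two leading blank rows and one blank row in the middle.
raw = [
    [None, None],
    ["", "  "],
    ["Item", "Response options"],
    ["I feel calm", "Never / Sometimes / Often"],
    [None, None],
    ["I feel rested", "Never / Sometimes / Often"],
]
cleaned = drop_blank_rows(raw)
header, body = cleaned[0], cleaned[1:]
```

After stripping, the first remaining row is the real header, so both keyword detection and positional logic start from the right place.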

Files changed:

  • src/harmony/parsing/excel_parser.py — core fix
  • tests/test_excel_tolerant.py — regression test
  • tests/wellbeing-scales-list.xlsx — test fixture from the issue

Fixes #136

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Testing

Added regression test tests/test_excel_tolerant.py using the exact file attached in issue #136 (wellbeing-scales-list.xlsx).

To reproduce:

uv venv
uv pip install -e ".[dev]"
uv run pytest tests/test_excel_tolerant.py -v

Full suite run:

uv run pytest --ignore=tests/test_convert_pdf.py --ignore=tests/test_url_loader.py
Result: 117 passed, 1 skipped

test_convert_pdf.py and test_url_loader.py were excluded because they already fail on this machine: both require Java/Tika, which is not installed. Neither failure is related to this change.

I did not run the harmonyapi tests locally, as they also require Java/Tika. The change is limited to the column detection logic in excel_parser.py and does not alter any function signatures, return types, or the Instrument schema.

  • Test A: uv run pytest tests/test_excel_tolerant.py -v — PASSED

Test Configuration

  • Library version: 1.0.7
  • OS: Windows 11
  • Toolchain: Python 3.12.13, uv, pytest 9.0.3

Checklist

  • My PR is for one issue, rather than for multiple unrelated fixes.
  • My code follows the style guidelines of this project. I have applied a linter (recommended: PyCharm's code formatter) to make my whitespace consistent with the rest of the project.
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings
  • The Harmony API is not broken by my change to the Harmony Python library
  • I add third-party dependencies only when necessary. If I changed the requirements, I changed them in requirements.txt, in pyproject.toml, and in the requirements.txt in the API repo
  • If I introduced a new feature, I documented it (e.g. by adding a script example to the script examples repository so that people will know how to use it).

discordapp.com/users/ifdkhowtochatfr

Closes harmonydata#136
@IDKHowToCodeFR

Author

Umm, I made a mistake in my Discord username. It's
ifdkhowtochatfr

Development

Successfully merging this pull request may close these issues.

Tolerate different formats of Excel
