Add parsers to extract only relevant data for complexes table in new PDBePISA #51
The XML->JSON parsers and the post-processing JSON->JSON parsers may as well share the same save logic, including compression.
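To make the suggestion concrete, here is a minimal sketch of a single gzip-aware save/load pair that both parser families could call. The names `save_json` and `load_json` are hypothetical and are not taken from the PR's `file_io.py`; the suffix-based detection is likewise an assumption.

```python
import gzip
import json
from pathlib import Path


def save_json(data, path: Path, compress: bool = False) -> Path:
    """Write data as JSON; gzip it (appending .gz) when compress is True."""
    if compress and path.suffix != ".gz":
        path = path.with_name(path.name + ".gz")
    # gzip.open and open accept the same text-mode arguments here.
    opener = gzip.open if compress else open
    with opener(path, "wt", encoding="utf-8") as handle:
        json.dump(data, handle)
    return path


def load_json(path: Path):
    """Read JSON from a plain or gzip-compressed file, chosen by suffix."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

With one helper like this, the compression decision lives in a single place rather than being repeated in each parser.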
@mihaitodor I still need to add some tests, but I wanted to get your thoughts on the implementation first.
Joseph-Ellaway
left a comment
Some comments on the justification for features
mihaitodor
left a comment
LGTM, just a few small nits
Pull request overview
This PR introduces a new post-processing layer that converts the existing “polished” assemblies JSON into a minimal JSON tailored for the new PDBePISA “Complexes” table, while also refactoring previously monolithic utilities into focused modules (CLI tools, file I/O helpers, and field handlers).
Changes:
- Added post-process models + a `PostProcessComplexTable` parser to emit a minimal `complex_table.json` derived from `assemblies.json`.
- Refactored `pisa_utils/utils.py` into `cli_tools.py`, `file_io.py`, and `field_handlers.py`, updating imports accordingly.
- Added tests and expected-output fixtures for the new post-processing output.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/test_utils.py | Updates imports to match the new utility module split. |
| tests/parsers/test_post_process_parsers.py | Adds an integration-style test for the new complex-table post-processor. |
| tests/models/test_post_process_models.py | Adds model validation tests for the new post-process schema. |
| tests/data/expected_output/post_processed_jsons/3hax_assembly_multi_asmset_post_proc.json | Adds expected minimal JSON output fixture for complex-table extraction. |
| pisa_utils/utils.py | Removes the old utils module in favor of dedicated modules. |
| pisa_utils/run.py | Wires the new post-processing step into the service pipeline and updates CLI imports. |
| pisa_utils/run_pisa.py | Updates imports after moving file/config helpers to file_io.py. |
| pisa_utils/post_process_parsers.py | Adds the new post-processing parser implementation. |
| pisa_utils/parsers.py | Switches JSON saving/opening to shared file I/O helpers and moves field helpers to field_handlers.py. |
| pisa_utils/models/post_process_models.py | Adds Pydantic models for the minimal complex-table JSON. |
| pisa_utils/models/labels.py | Adds new label(s) and renames/adjusts interface-energy and interface-total label text. |
| pisa_utils/models/data_models.py | Reuses centralized Field helper definitions for complex-related fields. |
| pisa_utils/models/data_fields.py | Introduces centralized Pydantic Field helper functions for reuse across models. |
| pisa_utils/file_io.py | New shared helpers for gzip-aware open/save, XML parsing, config creation, etc. |
| pisa_utils/field_handlers.py | New module containing extracted field/identifier helpers and UniProt CIF lookup logic. |
| pisa_utils/dictionaries.py | Updates imports for read_uniprot_info after refactor. |
| pisa_utils/cli_tools.py | New module containing CLI arg parsing + validation formerly in utils.py. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Adds a new model and parser that convert the polished JSON containing complex data into a "minimal" JSON, with only the data needed by the Complexes tab of the new PDBePISA UI.
This saves the front end (FE) from having to do lots of parsing each time and also allows us to add a "Save as CSV" or "Copy to clipboard" button anywhere we want.
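As a rough illustration of the idea (not the PR's actual schema), the sketch below trims a polished record down to a few table columns and then serialises the same rows to CSV. The `complexes` key and the field names in `KEEP_FIELDS` are hypothetical, and plain dicts stand in for the Pydantic models used in the PR.

```python
import csv
import io

# Hypothetical column names; the real ones are defined by the PR's models.
KEEP_FIELDS = ("complex_id", "size", "stable")


def to_minimal(assemblies: dict) -> list:
    """Keep only the columns the table needs from each complex record."""
    return [
        {key: record.get(key) for key in KEEP_FIELDS}
        for record in assemblies.get("complexes", [])
    ]


def to_csv(rows: list) -> str:
    """Serialise the minimal rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=KEEP_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Because the minimal rows are already flat, a "Save as CSV" or "Copy to clipboard" action reduces to a call like `to_csv(to_minimal(polished))`.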
Abstract base classes (ABCs) are used because I'm planning to extend this into a new set of models/parsers, so that the API need only deliver the data the new PDBePISA website needs. The existing polished JSONs and XMLs will remain unchanged, and the endpoints that fetch them can still be supplied to the user.
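A minimal sketch of how such an ABC could be laid out, assuming a shared `parse` flow with a subclass-specific `extract` step. The class names, method names, and the `complexes` key are hypothetical and do not mirror the PR's actual classes.

```python
from abc import ABC, abstractmethod


class PostProcessParser(ABC):
    """Hypothetical base class: shared flow, subclass-specific extraction."""

    def __init__(self, polished: dict):
        self.polished = polished

    @abstractmethod
    def extract(self) -> dict:
        """Return only the data one UI view needs."""

    def parse(self) -> dict:
        # Shared wrapper that every concrete parser inherits unchanged.
        return {"data": self.extract()}


class ComplexTableParser(PostProcessParser):
    """Hypothetical concrete parser for a complexes table view."""

    def extract(self) -> dict:
        return {"n_complexes": len(self.polished.get("complexes", []))}
```

Adding a parser for another view would then mean writing one new subclass with its own `extract`, while saving and wrapping stay in the base class.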