feat: Add HTML representation #2236
I feel like lazy loading might become more common (especially as datasets and number of modalities grow larger). So, I decided to take a pragmatic approach for the HTML repr rather than showing no category information at all. The trade-off: For
Why not avoid all loading? Showing just "(50 categories)" is much less useful than seeing the actual category names with color swatches. The cost of reading a few category strings is small compared to the value of the preview.
Implementation: We access
Visual examples: See tests 8b (partial loading) and 8c (metadata-only mode) in the live demo.
Rich HTML representation for AnnData
Summary
Implements rich HTML representation (`_repr_html_`) for AnnData objects in Jupyter notebooks. Builds on previous draft PRs (#784, #694, #521, #346) with a complete, production-ready implementation.
Live Demo | Reviewer's Guide (technical details, design decisions, extensibility examples)
Screenshot
Features
Interactive Display
- `.raw` section showing unprocessed data (Report `n_vars` of `.raw` in `__repr__` #349)
Visual Indicators
- `uns` palettes (e.g., `cell_type_colors`)
- `uns` values
- `uns["README"]`)
Serialization Warnings
Proactively warns about data that won't serialize:
- `/` (deprecated)
Compatibility
- `.anndata-repr` prevents style conflicts
- `read_lazy()` (categories, colors)
Extensibility
Three extension mechanisms for ecosystem packages (MuData, SpatialData, TreeData):
- `obst`/`vart`, `mod`)
See the Reviewer's Guide for examples and API documentation.
Testing
`python tests/visual_inspect_repr_html.py`
Related
- `scipy` inheritance #1927 (sparse scipy changes), feat: array-api compatibility #2063 (Array-API)
Acknowledgments
Thanks to @selmanozleyen (#784), @gtca (#694), @VolkerH (#521), @ivirshup (#346, #675), and @Zethson (#675) for prior work and discussions.
Technical Notes and Edits
Lazy Loading
Constants are in `_repr_constants.py` (outside `_repr/`) to prevent loading ~6K lines on `import anndata`. The full module loads only when `_repr_html_()` is called.
Config Changes
`pyproject.toml`: Added `vart` to codespell ignore list (TreeData section name).
Edit (Dec 27, 2024)
To simplify review and reduce the diff, I've merged settylab/anndata#3 into this PR. That PR was originally created as a follow-up to explore additional features based on the discussion with @Zethson about SpatialData/MuData extensibility.
What changed:
- `.raw` section - Expandable row showing unprocessed data (Report `n_vars` of `.raw` in `__repr__` #349)
Edit (Jan 4, 2025)
Moved detailed implementation documentation (architecture, design decisions, extensibility examples, configuration reference) to the Reviewer's Guide to keep this PR description focused on features.
Code refactoring:
- `html.py` into focused modules for maintainability
- `components.py` (badges, buttons, icons)
- `sections.py` (obs/var, mapping, uns, raw)
- `core.py` (avoids circular imports)
- `utils.py`
- `FormatterContext` consolidates all 6 rendering settings (read once at entry, propagated via context)
- `html.py` reduced from ~2100 to ~740 lines, clean import hierarchy
New features:
- `read_lazy()` AnnData objects (experimental) - indicates when obs/var are xarray-backed
- `(lazy)` indicator on columns
Bug fixes:
- `adata-text-muted` class for uniform appearance
Related issue discovered:
- `read_lazy()` returns index values as byte-representation strings (e.g., `"b'cell_0'"` instead of `"cell_0"`) - see `ISSUE_READ_LAZY_INDEX.md`
Edit (Jan 6, 2025)
Smart partial loading for `read_lazy()` AnnData:
Previously, lazy AnnData showed no category previews to avoid disk I/O. Now we do minimal, configurable loading to get richer visualization cheaply: only the first N category labels and their colors are read from storage (not the full column data). New setting `repr_html_max_lazy_categories` (default: 100, set to 0 for metadata-only mode).
Visual tests reorganized: 8 (Dask), 8b (lazy categories), 8c (metadata-only), 9 (backed).
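The partial-loading idea can be illustrated with a small sketch. This is not the PR's implementation: `CountingStore` and `preview_categories` are hypothetical names, and the store is an in-memory stand-in for an on-disk category array, assumed to support `len()` and slicing. It only shows how a cap such as `repr_html_max_lazy_categories` could bound the number of labels read, with 0 meaning metadata-only.

```python
class CountingStore:
    """Hypothetical stand-in for an on-disk category array; counts elements read."""

    def __init__(self, labels):
        self._labels = list(labels)
        self.elements_read = 0

    def __len__(self):
        # Length is treated as metadata: no element reads are counted.
        return len(self._labels)

    def __getitem__(self, key):
        out = self._labels[key]
        self.elements_read += len(out) if isinstance(key, slice) else 1
        return out


def preview_categories(store, max_lazy_categories=100):
    """Return (labels_for_preview, total_count), reading at most the cap."""
    n_total = len(store)
    if max_lazy_categories <= 0:
        # Metadata-only mode: report the count, read no labels.
        return [], n_total
    n = min(max_lazy_categories, n_total)
    return list(store[:n]), n_total


store = CountingStore(f"type_{i}" for i in range(500))
labels, total = preview_categories(store, max_lazy_categories=3)
print(labels, total, store.elements_read)
```

With a cap of 3, only three labels are fetched even though the column has 500 categories; the total count still comes from metadata, so the repr can show how many categories were left out.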
Edit (Jan 6, 2025 - continued)
FormattedOutput API and architecture:
Clean separation between formatters and renderers - formatters inspect data and produce complete `FormattedOutput`, renderers only receive `FormattedOutput` (never the original data).
The `FormattedOutput` dataclass fields were renamed to be self-documenting:
- `meta_content` → `preview` (text) or `preview_html` (HTML)
- `html_content` + `is_expandable=True` → `expanded_html`
- `html_content` + `is_expandable=False` → `preview_html`
- `is_expandable` → `expanded_html is not None`
- `type_html` `type_name` visually)
Naming convention: `*_html` suffix indicates raw HTML (caller responsible for escaping), plain text fields are auto-escaped.
UI/UX improvements:
- `▼`/`▲` arrows instead of `⋯`/`▲` for consistency
Edit (Jan 7, 2025)
Test architecture overhaul:
Tests reorganized from a single file into 10 focused modules for maintainability and parallel execution:
- `test_repr_core.py`
- `test_repr_sections.py`
- `test_repr_formatters.py`
- `test_repr_ui.py`
- `test_repr_warnings.py`
- `test_repr_registry.py`
- `test_repr_lazy.py`
- `test_html_validator.py`
`HTMLValidator` class (`conftest.py`) provides structured HTML assertions:
Key features: regex-based (no dependencies), section-aware matching, exact attribute matching to avoid "obs" matching "obsm".
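To make the exact-matching point concrete, here is a minimal sketch of a regex check that requires the whole attribute value to match. It is not the PR's `HTMLValidator`: the helper name `has_section` and the `data-section` attribute are assumptions chosen for illustration.

```python
import re


def has_section(html: str, name: str, attr: str = "data-section") -> bool:
    """True if some tag carries attr="name" with exactly that value.

    The closing quote is part of the pattern, so "obs" cannot match
    inside "obsm". The attribute name is a hypothetical example.
    """
    pattern = rf'{re.escape(attr)}\s*=\s*(["\']){re.escape(name)}\1'
    return re.search(pattern, html) is not None


snippet = '<div class="anndata-repr"><div data-section="obsm"></div></div>'
print(has_section(snippet, "obs"))         # exact match on the value
print('data-section="obs' in snippet)      # naive prefix check, for contrast
```

The backreference `\1` forces the closing quote to be the same character as the opening one, which is what prevents the prefix match that a plain substring test would accept.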
Optional strict validation when dependencies available:
- `validate_html5()` - W3C HTML5 + ARIA (requires `vnu`)
- `validate_js()` - JavaScript syntax (requires `esprima`)
Jupyter Notebook/Lab compatibility tests (13 new tests in `TestJupyterNotebookCompatibility`):
Validates CSS scoping, JavaScript isolation, unique IDs across multiple cells, and Jupyter dark mode support.
Bug fix:
`readme-modal-title` ID is now unique per container to prevent ID collisions when multiple AnnData objects are displayed in the same notebook.
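One way to get per-container IDs is to namespace every element ID with a container ID generated at render time. The sketch below is an assumption about how that could look, not the code in this PR: `render_readme_modal` and the `anndata-repr-<hex>` ID scheme are hypothetical, and only the `.anndata-repr` class and the `readme-modal-title` name come from the description above.

```python
import html
import uuid


def render_readme_modal(title: str) -> str:
    """Render a modal whose element IDs are prefixed by a per-render container ID."""
    # Hypothetical scheme: a fresh random ID for every rendered object, so two
    # reprs in the same notebook never share an element ID.
    container_id = f"anndata-repr-{uuid.uuid4().hex}"
    title_id = f"{container_id}-readme-modal-title"
    return (
        f'<div id="{container_id}" class="anndata-repr">'
        f'<div role="dialog" aria-labelledby="{title_id}">'
        f'<h2 id="{title_id}">{html.escape(title)}</h2>'
        "</div></div>"
    )


first = render_readme_modal("README for dataset A")
second = render_readme_modal("README for dataset B")
print(first != second)
```

A random ID rather than a process-wide counter is used here because saved notebook outputs from earlier sessions can sit next to freshly rendered ones, and a counter would restart at the same value.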