@katosh katosh commented Nov 29, 2025

Rich HTML representation for AnnData

Summary

Implements a rich HTML representation (_repr_html_) for AnnData objects in Jupyter notebooks. It builds on previous draft PRs (#784, #694, #521, #346) with a complete, production-ready implementation.

Live Demo | Reviewer's Guide (technical details, design decisions, extensibility examples)

Features

Interactive Display

  • Foldable sections with auto-collapse for large datasets
  • Search/filter with regex and case-sensitive toggles
  • Copy-to-clipboard for field names
  • Nested AnnData expansion with configurable depth
  • .raw section showing unprocessed data (see #349, "Report n_vars of .raw in __repr__")
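
The search box itself runs in the browser, but the effect of the two toggles can be sketched in Python. This is only an illustration: `filter_names` and its parameters are hypothetical names, and returning no matches for an invalid pattern is an assumption rather than documented behavior.

```python
import re


def filter_names(names, query, *, use_regex=False, case_sensitive=False):
    """Return the names matching `query` under the two toggles."""
    if not query:
        return list(names)
    flags = 0 if case_sensitive else re.IGNORECASE
    # Without the regex toggle, treat the query as a literal substring.
    pattern = query if use_regex else re.escape(query)
    try:
        compiled = re.compile(pattern, flags)
    except re.error:
        # Assumption: an invalid pattern matches nothing instead of raising.
        return []
    return [name for name in names if compiled.search(name)]


fields = ["cell_type", "batch", "n_genes", "X_pca", "X_umap"]
print(filter_names(fields, "x_"))                          # ['X_pca', 'X_umap']
print(filter_names(fields, "x_", case_sensitive=True))     # []
print(filter_names(fields, r"^n_|type$", use_regex=True))  # ['cell_type', 'n_genes']
```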

Visual Indicators

  • Category colors from uns palettes (e.g., cell_type_colors)
  • Type badges for views, backed mode, sparse matrices, Dask arrays
  • Serialization warnings for data that won't write to H5AD/Zarr
  • Value previews for simple uns values
  • README support via modal (renders markdown from uns["README"])
  • Memory info in footer
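
As a rough sketch of the category color lookup: colors for a column `cell_type` come from `uns["cell_type_colors"]`, matched to categories by position. The helper name `category_swatches` and the fallback rule (ignore a palette that is shorter than the category list) are assumptions made for illustration.

```python
from html import escape


def category_swatches(key, categories, uns):
    """Pair each category with its color from uns[f"{key}_colors"], if any."""
    palette = uns.get(f"{key}_colors")
    # Assumption: a palette is used only when it covers every category.
    if palette is None or len(palette) < len(categories):
        palette = [None] * len(categories)
    parts = []
    for name, color in zip(categories, palette):
        if color is None:
            parts.append(f"<span>{escape(name)}</span>")
        else:
            parts.append(
                f'<span><span style="background:{escape(color)}">&nbsp;</span>'
                f"{escape(name)}</span>"
            )
    return "".join(parts)


uns = {"cell_type_colors": ["#1f77b4", "#ff7f0e"]}
print(category_swatches("cell_type", ["B cell", "T cell"], uns))
print(category_swatches("batch", ["a"], uns))  # <span>a</span>
```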

Serialization Warnings

Proactively warns about data that won't serialize:

| Level | Issue | Related |
|---|---|---|
| 🔴 Error | datetime64/timedelta64 in obs/var | #455, #2238 |
| 🔴 Error | Non-string keys | #321 |
| 🔴 Error | Object columns with dicts/lists/custom objects | #1923, #567, #636 |
| 🔴 Error | Non-serializable types in uns | |
| 🟡 Warning | Keys with / (deprecated) | #1447, #2099 |
| 🟡 Warning | String→categorical auto-conversion | #534, #926 |
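
Two of the key-related checks can be sketched in a few lines. The function name `check_keys` and the tuple layout are hypothetical; only the rules themselves (non-string keys are errors, keys containing `/` are deprecated) come from the table.

```python
def check_keys(mapping):
    """Return (level, key, message) tuples for keys that may not serialize cleanly."""
    issues = []
    for key in mapping:
        if not isinstance(key, str):
            issues.append(("error", key, "non-string key"))
        elif "/" in key:
            issues.append(("warning", key, "key contains '/' (deprecated)"))
    return issues


for level, key, message in check_keys({"ok": 1, 3: 2, "a/b": 3}):
    print(level, repr(key), message)
```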

Compatibility

  • Dark mode auto-detection (JupyterLab/VS Code)
  • No-JS fallback with graceful degradation
  • JupyterLab safe - CSS scoped to .anndata-repr prevents style conflicts
  • Lazy-loading safe - configurable partial loading for read_lazy() (categories, colors)
  • Zero dependencies added

Extensibility

Three extension mechanisms for ecosystem packages (MuData, SpatialData, TreeData):

  1. TypeFormatter - Custom visualization for value types
  2. SectionFormatter - Add new sections (e.g., obst/vart, mod)
  3. Building blocks - CSS/JS/helpers for packages needing full control

See the Reviewer's Guide for examples and API documentation.
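
For readers who want a feel for the first mechanism before opening the guide, here is a generic registry pattern of the kind a TypeFormatter hook usually follows. Everything below (the `can_format`/`format` methods, `priority`, and the `Registry` class) is an assumption for illustration; the actual API is the one documented in the Reviewer's Guide.

```python
class TypeFormatter:
    """Hypothetical base class: decides whether it handles a value, then formats it."""

    priority = 0

    def can_format(self, value):
        raise NotImplementedError

    def format(self, value):
        raise NotImplementedError


class Registry:
    def __init__(self):
        self._formatters = []

    def register(self, formatter):
        self._formatters.append(formatter)
        # Higher-priority formatters are tried first.
        self._formatters.sort(key=lambda f: f.priority, reverse=True)

    def format(self, value):
        for formatter in self._formatters:
            if formatter.can_format(value):
                return formatter.format(value)
        return type(value).__name__  # fallback: plain type name


class ListFormatter(TypeFormatter):
    priority = 10

    def can_format(self, value):
        return isinstance(value, list)

    def format(self, value):
        return f"list ({len(value)} items)"


registry = Registry()
registry.register(ListFormatter())
print(registry.format([1, 2, 3]))  # list (3 items)
print(registry.format(3.5))        # float
```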

Testing

  • 457 unit tests organized by responsibility (core, sections, formatters, UI, warnings, registry, lazy, Jupyter compatibility)
  • HTMLValidator for structured HTML assertions (section-aware, no external dependencies)
  • Visual test cases: python tests/visual_inspect_repr_html.py

Acknowledgments

Thanks to @selmanozleyen (#784), @gtca (#694), @VolkerH (#521), @ivirshup (#346, #675), and @Zethson (#675) for prior work and discussions.


Technical Notes and Edits

Lazy Loading

Constants are in _repr_constants.py (outside _repr/) to prevent loading ~6K lines on import anndata. The full module loads only when _repr_html_() is called.
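
The deferred-import pattern behind this can be sketched as follows. It is a minimal illustration, with the standard-library `html` module standing in for the large rendering package and `Table` as a hypothetical class.

```python
class Table:
    def __repr__(self):
        return "Table <demo>"

    def _repr_html_(self):
        # Deferred import: the cost is paid on first display, not when the
        # defining module is imported. `html` is a stand-in for the heavy package.
        from html import escape

        return f'<div class="anndata-repr">{escape(repr(self))}</div>'


print(Table()._repr_html_())
```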

Config Changes

pyproject.toml: Added vart to codespell ignore list (TreeData section name).


Edit (Dec 27, 2024)

To simplify review and reduce the diff, I've merged settylab/anndata#3 into this PR. That PR was originally created as a follow-up to explore additional features based on the discussion with @Zethson about SpatialData/MuData extensibility.

What changed:

  • Exported building blocks - CSS, JavaScript, and rendering helpers for external packages to build custom reprs while reusing anndata's styling
  • .raw section - Expandable row showing unprocessed data (see #349, "Report n_vars of .raw in __repr__")
  • Enhanced serialization warnings - Extended to cover datetime64, non-string keys, slashes in keys, and all sections
  • Regex search - Case-sensitive and regex toggles for filtering
  • Robust error handling - Failed sections show visible error indicators instead of being silently hidden
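
The error-handling behavior in the last bullet amounts to wrapping each section renderer. A minimal sketch, assuming a hypothetical `render_section` helper and a made-up `section-error` CSS class:

```python
from html import escape


def render_section(name, render):
    """Render one section; on failure, return a visible error row instead."""
    try:
        return render()
    except Exception as exc:  # deliberately broad: a repr should never raise
        message = escape(f"{type(exc).__name__}: {exc}")
        return f'<div class="section-error">{escape(name)}: {message}</div>'


def broken():
    raise ValueError("boom <x>")


print(render_section("obs", lambda: "<div>ok</div>"))
print(render_section("uns", broken))
```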

Edit (Jan 4, 2025)

Moved detailed implementation documentation (architecture, design decisions, extensibility examples, configuration reference) to the Reviewer's Guide to keep this PR description focused on features.

Code refactoring:

  • Split html.py into focused modules for maintainability
  • UI components extracted to components.py (badges, buttons, icons)
  • Section renderers moved to sections.py (obs/var, mapping, uns, raw)
  • Shared rendering primitives extracted to core.py (avoids circular imports)
  • Preview utilities moved to utils.py
  • FormatterContext consolidates all 6 rendering settings (read once at entry, propagated via context)
  • Result: html.py reduced from ~2100 to ~740 lines, clean import hierarchy
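
The "read once, propagate via context" idea can be sketched with a frozen dataclass. The field names `max_depth` and `fold_threshold` and the `from_settings`/`child` methods are hypothetical; only the general shape is taken from the description above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FormatterContext:
    max_depth: int            # hypothetical setting
    fold_threshold: int       # hypothetical setting
    max_lazy_categories: int  # mirrors repr_html_max_lazy_categories
    depth: int = 0

    @classmethod
    def from_settings(cls, settings):
        # Read every setting once, at the entry point.
        return cls(
            max_depth=settings["max_depth"],
            fold_threshold=settings["fold_threshold"],
            max_lazy_categories=settings["max_lazy_categories"],
        )

    def child(self):
        # Nested objects get a copy with the depth bumped; settings are not re-read.
        return replace(self, depth=self.depth + 1)


ctx = FormatterContext.from_settings(
    {"max_depth": 3, "fold_threshold": 5, "max_lazy_categories": 100}
)
print(ctx.child().depth)  # 1
```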

New features:

  • "Lazy" badge for read_lazy() AnnData objects (experimental) - indicates when obs/var are xarray-backed
  • Visual test for lazy AnnData (9b) - demonstrates lazy loading with (lazy) indicator on columns

Bug fixes:

  • Consistent meta column styling - all meta column text now uses adata-text-muted class for uniform appearance
  • Bytes index decoding - properly decode bytes values in index previews

Related issue discovered:

  • read_lazy() returns index values as byte-representation strings (e.g., "b'cell_0'" instead of "cell_0") - see ISSUE_READ_LAZY_INDEX.md
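
The difference between decoding bytes and stringifying them is plain Python behavior and easy to reproduce. `decode_index_value` is a hypothetical helper name, not the function used in the PR.

```python
def decode_index_value(value, encoding="utf-8"):
    """Return a display string, decoding bytes instead of calling str() on them."""
    if isinstance(value, bytes):
        return value.decode(encoding, errors="replace")
    return str(value)


print(str(b"cell_0"))                 # b'cell_0'  (the byte-representation string)
print(decode_index_value(b"cell_0"))  # cell_0
```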

Edit (Jan 6, 2025)

Smart partial loading for read_lazy() AnnData:

Previously, lazy AnnData showed no category previews to avoid disk I/O. Now we do minimal, configurable loading to get richer visualization cheaply: only the first N category labels and their colors are read from storage (not the full column data). New setting repr_html_max_lazy_categories (default: 100, set to 0 for metadata-only mode).
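
A storage-agnostic sketch of that rule: read at most N labels, and read nothing when N is 0. `preview_categories` and the `read_labels` callable are hypothetical; the real code reads from the on-disk store rather than from a Python generator.

```python
from itertools import islice


def preview_categories(n_categories, read_labels, max_lazy_categories=100):
    """Build a preview string, reading at most `max_lazy_categories` labels.

    `read_labels` is a hypothetical callable returning an iterator over labels
    in storage order; it is not called at all in metadata-only mode.
    """
    if max_lazy_categories == 0:
        return f"({n_categories} categories)"
    labels = list(islice(read_labels(), max_lazy_categories))
    suffix = ", ..." if n_categories > len(labels) else ""
    return ", ".join(labels) + suffix


reads = []


def read_labels():
    for i in range(50):
        reads.append(i)  # record every label actually pulled from "storage"
        yield f"type_{i}"


print(preview_categories(50, read_labels, max_lazy_categories=0))  # (50 categories)
print(preview_categories(50, read_labels, max_lazy_categories=2))  # type_0, type_1, ...
```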

Visual tests reorganized: 8 (Dask), 8b (lazy categories), 8c (metadata-only), 9 (backed).


Edit (Jan 6, 2025 - continued)

FormattedOutput API and architecture:

Clean separation between formatters and renderers - formatters inspect data and produce complete FormattedOutput, renderers only receive FormattedOutput (never the original data).

The FormattedOutput dataclass fields were renamed to be self-documenting:

| Old Field | New Field | Purpose |
|---|---|---|
| meta_content | preview (text) or preview_html (HTML) | Preview column content |
| html_content + is_expandable=True | expanded_html | Collapsible content below row |
| html_content + is_expandable=False | preview_html | Inline preview in preview column |
| is_expandable | Removed | Use expanded_html is not None |
| (new) | type_html | Custom HTML for type column (replaces type_name visually) |

Naming convention: the *_html suffix indicates raw HTML (the caller is responsible for escaping); plain text fields are auto-escaped.
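
To make the escaping contract concrete, here is a sketch of the dataclass and a renderer that honors it. The field names come from the table above; the defaults, the `render_row` function, and the row markup are assumptions.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class FormattedOutput:
    type_name: str
    preview: Optional[str] = None        # plain text, escaped by the renderer
    preview_html: Optional[str] = None   # raw HTML, already escaped by the caller
    expanded_html: Optional[str] = None  # collapsible content below the row
    type_html: Optional[str] = None      # replaces type_name visually


def render_row(name, out):
    """Hypothetical renderer: sees only FormattedOutput, never the original data."""
    type_cell = out.type_html if out.type_html is not None else escape(out.type_name)
    if out.preview_html is not None:
        preview_cell = out.preview_html
    else:
        preview_cell = escape(out.preview or "")
    row = f"<tr><td>{escape(name)}</td><td>{type_cell}</td><td>{preview_cell}</td></tr>"
    if out.expanded_html is not None:  # replaces the old is_expandable flag
        row += f'<tr><td colspan="3">{out.expanded_html}</td></tr>'
    return row


print(render_row("note", FormattedOutput(type_name="str", preview="<b>hi</b>")))
print(render_row("note", FormattedOutput(type_name="str", preview_html="<b>hi</b>")))
```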

UI/UX improvements:

  • Zebra striping for section rows (alternating background colors)
  • Expand buttons now use arrow icons instead of the previous symbols, for consistency
  • No borders between entries within sections (cleaner look)
  • Fixed button alignment - Expand and wrap buttons now align properly
  • Category list styling - explicit muted color ensures consistent appearance in nested contexts

Edit (Jan 7, 2025)

Test architecture overhaul:

Tests reorganized from a single file into 10 focused modules for maintainability and parallel execution:

| File | Focus |
|---|---|
| test_repr_core.py | HTML validation, settings, badges |
| test_repr_sections.py | Section rendering (obs, var, uns, etc.) |
| test_repr_formatters.py | Type-specific formatters |
| test_repr_ui.py | Folding, colors, search, clipboard |
| test_repr_warnings.py | Serialization warnings |
| test_repr_registry.py | Plugin registry |
| test_repr_lazy.py | Lazy AnnData support |
| test_html_validator.py | HTMLValidator tests + Jupyter compatibility |

HTMLValidator class (conftest.py) provides structured HTML assertions:

```python
v = validate_html(html)
v.assert_section_exists("obs")
v.assert_section_contains_entry("obs", "batch")
v.assert_section_initially_collapsed("obs")  # or _not_initially_collapsed
```

Key features: regex-based (no dependencies), section-aware matching, exact attribute matching to avoid "obs" matching "obsm".
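
The exact-match point is worth a small illustration. The sketch below assumes sections are marked with a `data-section` attribute, which is a guess at the markup; `section_exists` is a hypothetical stand-in for the validator's internals.

```python
import re


def section_exists(html, name):
    """True if a section with exactly this name is present.

    Assumes sections carry a data-section attribute (hypothetical markup).
    """
    # Anchoring on the closing quote keeps "obs" from matching "obsm".
    pattern = rf'data-section="{re.escape(name)}"'
    return re.search(pattern, html) is not None


doc = '<div data-section="obsm"></div><div data-section="var"></div>'
print("obs" in doc)                 # True  (naive substring check is fooled)
print(section_exists(doc, "obs"))   # False
print(section_exists(doc, "obsm"))  # True
```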

Optional strict validation when dependencies available:

  • validate_html5() - W3C HTML5 + ARIA (requires vnu)
  • validate_js() - JavaScript syntax (requires esprima)

Jupyter Notebook/Lab compatibility tests (13 new tests in TestJupyterNotebookCompatibility):

Validates CSS scoping, JavaScript isolation, unique IDs across multiple cells, and Jupyter dark mode support.

Bug fix: readme-modal-title ID is now unique per container to prevent ID collisions when multiple AnnData objects are displayed in the same notebook.
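
One common way to get per-container IDs is to derive every element ID from a random container ID. This is a sketch of that approach with hypothetical helper names, not necessarily how the PR generates its IDs.

```python
import uuid


def new_container_id(prefix="anndata-repr"):
    """Return an ID that is unique per rendered object."""
    return f"{prefix}-{uuid.uuid4().hex[:8]}"


def readme_title_id(container_id):
    # Derive element IDs from the container ID so two outputs never share one.
    return f"{container_id}-readme-modal-title"


first, second = new_container_id(), new_container_id()
print(readme_title_id(first))
print(readme_title_id(second))
```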

@flying-sheep flying-sheep changed the title Add HTML representation feat: Add HTML representation Jan 5, 2026

katosh commented Jan 6, 2026

I expect lazy loading to become more common, especially as datasets and the number of modalities grow. So I decided to take a pragmatic approach for the HTML repr rather than showing no category information at all.

The trade-off: For read_lazy() AnnData, we now do minimal, configurable partial loading to get richer category previews:

  • Only the first N category labels are read from storage (not the full column data or codes)
  • Only the corresponding N colors from .uns are loaded
  • Controlled via ad.settings.repr_html_max_lazy_categories (default: 100, set to 0 for zero disk I/O)

Why not avoid all loading? Showing just "(50 categories)" is much less useful than seeing the actual category names with color swatches. The cost of reading a few category strings is small compared to the value of the preview.

Implementation: We access CategoricalArray._categories directly and use read_elem_partial() to read only what we need. This bypasses the @cached_property that would load all categories. See the design decision in the reviewer's guide for details.
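
The bypass can be illustrated with a toy class. `LazyCategorical` below is not anndata's `CategoricalArray`, and plain list slicing stands in for `read_elem_partial()`; the point is only that reading from the backing attribute leaves the cached property untouched.

```python
from functools import cached_property


class LazyCategorical:
    """Toy stand-in for a lazily loaded categorical column (not anndata's class)."""

    def __init__(self, stored):
        self._categories = stored  # a handle to on-disk labels in the real case
        self.full_loads = 0

    @cached_property
    def categories(self):
        # Loads every label and caches the result on first access.
        self.full_loads += 1
        return list(self._categories)


def first_labels(column, n):
    # Slice the backing store directly so the cached property never fires.
    return list(column._categories[:n])


col = LazyCategorical([f"c{i}" for i in range(1000)])
print(first_labels(col, 3))  # ['c0', 'c1', 'c2']
print(col.full_loads)        # 0
```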

Visual examples: See tests 8b (partial loading) and 8c (metadata-only mode) in the live demo.
