feat: Add HTML representation #2236

katosh · 2025-11-29T20:01:57Z

Rich HTML representation for AnnData

Closes HTML Repr #675
Tests added
Release note added

Summary

Implements rich HTML representation (_repr_html_) for AnnData objects in Jupyter notebooks. Builds on previous draft PRs (#784, #694, #521, #346) with a complete, production-ready implementation.

Live Demo | Reviewer's Guide (technical details, design decisions, extensibility examples)

Screenshot

Features

Interactive Display

Foldable sections with auto-collapse for large datasets
Search/filter with regex and case-sensitive toggles
Copy-to-clipboard for field names
Nested AnnData expansion with configurable depth
.raw section showing unprocessed data (Report n_vars of .raw in __repr__ #349)
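
The search toggles can be read as the following filter semantics. This is a minimal Python sketch of the behavior only: the shipped filter runs in the browser, and `filter_names` and its parameters are hypothetical names chosen for illustration.

```python
import re


def filter_names(names, query, *, use_regex=False, case_sensitive=False):
    """Return the names matching the query under the two toggles (hypothetical helper)."""
    if not query:
        return list(names)
    flags = 0 if case_sensitive else re.IGNORECASE
    # With the regex toggle off, the query is treated as a literal substring.
    pattern = re.compile(query if use_regex else re.escape(query), flags)
    return [name for name in names if pattern.search(name)]


fields = ["cell_type", "batch", "n_genes", "X_pca", "X_umap"]
print(filter_names(fields, "x_"))                          # ['X_pca', 'X_umap']
print(filter_names(fields, "x_", case_sensitive=True))     # []
print(filter_names(fields, r"^n_|type$", use_regex=True))  # ['cell_type', 'n_genes']
```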

Visual Indicators

Category colors from uns palettes (e.g., cell_type_colors)
Type badges for views, backed mode, sparse matrices, Dask arrays
Serialization warnings for data that won't write to H5AD/Zarr
Value previews for simple uns values
README support via modal (renders markdown from uns["README"])
Memory info in footer
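
As a rough illustration of the category color lookup, the sketch below pairs category labels with a palette stored under `<column>_colors` in a plain dict standing in for `uns`. The function name and the fallback for a palette that is too short are assumptions, not the PR's actual code.

```python
def pair_categories_with_colors(categories, uns, column):
    """Pair each category with its palette color, or None if no usable palette exists."""
    palette = uns.get(f"{column}_colors")
    # Assumption: a missing or too-short palette means "show no swatches".
    if palette is None or len(palette) < len(categories):
        return [(category, None) for category in categories]
    return list(zip(categories, palette))


uns = {"cell_type_colors": ["#1f77b4", "#ff7f0e"]}
print(pair_categories_with_colors(["B", "T"], uns, "cell_type"))
print(pair_categories_with_colors(["B", "T", "NK"], uns, "cell_type"))
```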

Serialization Warnings

Proactively warns about data that won't serialize:

| Level | Issue | Related |
| --- | --- | --- |
| 🔴 Error | datetime64/timedelta64 in obs/var | #455, #2238 |
| 🔴 Error | Non-string keys | #321 |
| 🔴 Error | Object columns with dicts/lists/custom objects | #1923, #567, #636 |
| 🔴 Error | Non-serializable types in uns | |
| 🟡 Warning | Keys with `/` (deprecated) | #1447, #2099 |
| 🟡 Warning | String→categorical auto-conversion | #534, #926 |
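
Two of the checks above (non-string keys and keys containing `/`) are simple enough to sketch on a plain dict. The helper below is hypothetical and covers only those two rows; the dtype-based checks would need pandas or NumPy and are left out.

```python
def check_keys(mapping):
    """Return (level, key, message) tuples for keys that would not serialize cleanly."""
    issues = []
    for key in mapping:
        if not isinstance(key, str):
            issues.append(("error", repr(key), "non-string key"))
        elif "/" in key:
            issues.append(("warning", key, "key contains '/' (deprecated)"))
    return issues


uns_like = {"neighbors": {}, 42: "answer", "pca/params": {}}
for level, key, message in check_keys(uns_like):
    print(f"{level}: {key}: {message}")
```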

Compatibility

Dark mode auto-detection (Jupyter Lab/VS Code)
No-JS fallback with graceful degradation
JupyterLab safe - CSS scoped to .anndata-repr prevents style conflicts
Lazy-loading safe - configurable partial loading for read_lazy() (categories, colors)
Zero dependencies added

Extensibility

Three extension mechanisms for ecosystem packages (MuData, SpatialData, TreeData):

TypeFormatter - Custom visualization for value types
SectionFormatter - Add new sections (e.g., obst/vart, mod)
Building blocks - CSS/JS/helpers for packages needing full control

See the Reviewer's Guide for examples and API documentation.
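
To give a feel for the registry idea behind type formatters, here is a generic, self-contained sketch. It is not the PR's `TypeFormatter` API (that is documented in the Reviewer's Guide); `register_formatter` and `describe` are hypothetical names, and giving later registrations priority is an assumption.

```python
from typing import Any, Callable

# Each entry pairs a predicate with a function producing a short description.
_FORMATTERS: list[tuple[Callable[[Any], bool], Callable[[Any], str]]] = []


def register_formatter(can_format, fmt):
    """Register a formatter; later registrations are tried first (assumption)."""
    _FORMATTERS.insert(0, (can_format, fmt))


def describe(value):
    """Use the first matching formatter, falling back to the type name."""
    for can_format, fmt in _FORMATTERS:
        if can_format(value):
            return fmt(value)
    return type(value).__name__


register_formatter(lambda v: isinstance(v, dict), lambda v: f"dict ({len(v)} keys)")
print(describe({"a": 1}))  # dict (1 keys)
print(describe(3))         # int
```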

Testing

457 unit tests organized by responsibility (core, sections, formatters, UI, warnings, registry, lazy, Jupyter compatibility)
HTMLValidator for structured HTML assertions (section-aware, no external dependencies)
Visual test cases: python tests/visual_inspect_repr_html.py

Related

Supersedes the previous drafts: #784 (Draft for AnnData html repr), #694 (Initial draft of AnnData HTML repr), #521 (WIP: add _repr_html_() method to AnnData for nicer rendering in Jupyter), and #346 ([draft] html repr)
Compatible with #1927 (feat: remove sparse data scipy inheritance) and #2063 (feat: array-api compatibility)
Fully backward compatible

Acknowledgments

Thanks to @selmanozleyen (#784), @gtca (#694), @VolkerH (#521), @ivirshup (#346, #675), and @Zethson (#675) for prior work and discussions.

Technical Notes and Edits

Lazy Loading

Constants are in _repr_constants.py (outside _repr/) to prevent loading ~6K lines on import anndata. The full module loads only when _repr_html_() is called.
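
The deferred-loading pattern can be sketched with a function-level import. The example uses the standard-library `html` module as a stand-in for the heavy `_repr` package, and the `Record` class is hypothetical.

```python
class Record:
    """Toy object whose rich repr pulls in its rendering code only on demand."""

    def __init__(self, fields):
        self.fields = dict(fields)

    def _repr_html_(self):
        # Deferred import: the rendering dependency is resolved only when a
        # notebook frontend actually asks for the HTML representation.
        from html import escape

        rows = "".join(
            f"<li>{escape(str(key))}: {escape(str(value))}</li>"
            for key, value in self.fields.items()
        )
        return f"<ul>{rows}</ul>"


print(Record({"n_obs": 3})._repr_html_())  # <ul><li>n_obs: 3</li></ul>
```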

Config Changes

pyproject.toml: Added vart to codespell ignore list (TreeData section name).

Edit (Dec 27, 2025)

To simplify review and reduce the diff, I've merged settylab/anndata#3 into this PR. That PR was originally created as a follow-up to explore additional features based on the discussion with @Zethson about SpatialData/MuData extensibility.

What changed:

Exported building blocks - CSS, JavaScript, and rendering helpers for external packages to build custom reprs while reusing anndata's styling
.raw section - Expandable row showing unprocessed data (Report n_vars of .raw in __repr__ #349)
Enhanced serialization warnings - Extended to cover datetime64, non-string keys, slashes in keys, and all sections
Regex search - Case-sensitive and regex toggles for filtering
Robust error handling - Failed sections show visible error indicators instead of being silently hidden
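
The error-handling behavior can be pictured as a loop that isolates each section. This is a sketch under assumptions: `render_sections` and the `section-error` class name are made up for illustration and are not taken from the PR.

```python
from html import escape


def render_sections(renderers):
    """Render each section; a failing section yields a visible error row instead of vanishing."""
    parts = []
    for name, render in renderers.items():
        try:
            parts.append(render())
        except Exception as exc:  # keep the rest of the repr usable
            parts.append(
                f'<div class="section-error">{escape(name)}: '
                f"failed to render ({escape(type(exc).__name__)})</div>"
            )
    return "\n".join(parts)


def broken_section():
    raise ValueError("bad data")


print(render_sections({"obs": lambda: "<div>obs ok</div>", "uns": broken_section}))
```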

Edit (Jan 4, 2026)

Moved detailed implementation documentation (architecture, design decisions, extensibility examples, configuration reference) to the Reviewer's Guide to keep this PR description focused on features.

Code refactoring:

Split html.py into focused modules for maintainability
UI components extracted to components.py (badges, buttons, icons)
Section renderers moved to sections.py (obs/var, mapping, uns, raw)
Shared rendering primitives extracted to core.py (avoids circular imports)
Preview utilities moved to utils.py
FormatterContext consolidates all 6 rendering settings (read once at entry, propagated via context)
Result: html.py reduced from ~2100 to ~740 lines, clean import hierarchy
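
The "read once at entry, propagated via context" idea might look like the sketch below. `RenderContext` and its two fields are hypothetical stand-ins; the six settings that `FormatterContext` actually carries are not listed in this description.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RenderContext:
    """Settings captured once at the entry point (hypothetical fields)."""

    max_depth: int = 3
    depth: int = 0

    def child(self):
        # Nested rendering receives a copy with the depth advanced by one.
        return replace(self, depth=self.depth + 1)


def summarize(obj, ctx):
    """Summarize nested dicts, stopping at the configured depth."""
    if not isinstance(obj, dict):
        return type(obj).__name__
    if ctx.depth >= ctx.max_depth:
        return "dict (not expanded)"
    inner = ", ".join(f"{key}: {summarize(value, ctx.child())}" for key, value in obj.items())
    return "{" + inner + "}"


print(summarize({"a": {"b": {"c": 1}}}, RenderContext(max_depth=2)))
# {a: {b: dict (not expanded)}}
```

Passing an immutable context down the call chain means no renderer re-reads global settings midway through, so one repr call sees one consistent configuration.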

New features:

"Lazy" badge for read_lazy() AnnData objects (experimental) - indicates when obs/var are xarray-backed
Visual test for lazy AnnData (9b) - demonstrates lazy loading with (lazy) indicator on columns

Bug fixes:

Consistent meta column styling - all meta column text now uses adata-text-muted class for uniform appearance
Bytes index decoding - properly decode bytes values in index previews
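
A minimal sketch of the bytes-decoding fix, assuming UTF-8 and a lenient error policy (both are assumptions; `decode_label` is a hypothetical helper name):

```python
def decode_label(value):
    """Return a display string, decoding bytes instead of showing their repr."""
    if isinstance(value, (bytes, bytearray)):
        # Assumption: index labels are UTF-8; undecodable bytes are replaced.
        return bytes(value).decode("utf-8", errors="replace")
    return str(value)


print([decode_label(v) for v in [b"cell_0", "cell_1", 2]])  # ['cell_0', 'cell_1', '2']
```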

Related issue discovered:

read_lazy() returns index values as byte-representation strings (e.g., "b'cell_0'" instead of "cell_0") - see ISSUE_READ_LAZY_INDEX.md

Edit (Jan 6, 2026)

Smart partial loading for read_lazy() AnnData:

Previously, lazy AnnData showed no category previews to avoid disk I/O. Now we do minimal, configurable loading to get richer visualization cheaply: only the first N category labels and their colors are read from storage (not the full column data). New setting repr_html_max_lazy_categories (default: 100, set to 0 for metadata-only mode).
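
The "first N labels only" idea can be illustrated with a lazy iterator standing in for on-disk storage. This is a conceptual sketch: the generator and `preview_categories` are hypothetical, and the sketch does not use anndata's storage layer.

```python
from itertools import islice


def stored_categories(n, read_log):
    """Stand-in for on-disk categories; records every element that is actually read."""
    for i in range(n):
        read_log.append(i)
        yield f"type_{i}"


def preview_categories(category_source, max_categories):
    """Read at most max_categories labels; 0 means metadata-only (no reads)."""
    if max_categories <= 0:
        return []
    return list(islice(category_source, max_categories))


reads = []
print(preview_categories(stored_categories(50, reads), 3))  # ['type_0', 'type_1', 'type_2']
print(len(reads))                                           # 3
```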

Visual tests reorganized: 8 (Dask), 8b (lazy categories), 8c (metadata-only), 9 (backed).

Edit (Jan 6, 2026 - continued)

FormattedOutput API and architecture:

Clean separation between formatters and renderers - formatters inspect data and produce complete FormattedOutput, renderers only receive FormattedOutput (never the original data).

The FormattedOutput dataclass fields were renamed to be self-documenting:

| Old Field | New Field | Purpose |
| --- | --- | --- |
| `meta_content` | `preview` (text) or `preview_html` (HTML) | Preview column content |
| `html_content` + `is_expandable=True` | `expanded_html` | Collapsible content below row |
| `html_content` + `is_expandable=False` | `preview_html` | Inline preview in preview column |
| `is_expandable` | Removed | Use `expanded_html is not None` |
| (new) | `type_html` | Custom HTML for type column (replaces `type_name` visually) |

Naming convention: *_html suffix indicates raw HTML (caller responsible for escaping), plain text fields are auto-escaped.
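
The naming convention can be made concrete with a small stand-in dataclass and renderer. The field names follow the table above, but the class name, the defaults, the precedence of `preview_html` over `preview`, and the table markup are all assumptions of this sketch.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class FormattedOutputSketch:
    """Illustrative stand-in, not the PR's FormattedOutput."""

    type_name: str
    preview: Optional[str] = None        # plain text, escaped by the renderer
    preview_html: Optional[str] = None   # raw HTML, caller already escaped it
    expanded_html: Optional[str] = None  # raw HTML shown below the row
    type_html: Optional[str] = None      # raw HTML replacing type_name visually


def render_row(name, out):
    """The renderer sees only the formatted output, never the original data."""
    type_cell = out.type_html if out.type_html is not None else escape(out.type_name)
    if out.preview_html is not None:
        preview_cell = out.preview_html
    else:
        preview_cell = escape(out.preview or "")
    row = f"<tr><td>{escape(name)}</td><td>{type_cell}</td><td>{preview_cell}</td></tr>"
    if out.expanded_html is not None:  # replaces the old is_expandable flag
        row += f'<tr><td colspan="3">{out.expanded_html}</td></tr>'
    return row


print(render_row("note", FormattedOutputSketch(type_name="str", preview="<b>hi</b>")))
# <tr><td>note</td><td>str</td><td>&lt;b&gt;hi&lt;/b&gt;</td></tr>
```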

UI/UX improvements:

Zebra striping for section rows (alternating background colors)
Expand buttons now use ▼/▲ arrows instead of ⋯/▲ for consistency
No borders between entries within sections (cleaner look)
Fixed button alignment - Expand and wrap buttons now align properly
Category list styling - explicit muted color ensures consistent appearance in nested contexts

Edit (Jan 7, 2026)

Test architecture overhaul:

Tests reorganized from a single file into 10 focused modules for maintainability and parallel execution:

| File | Focus |
| --- | --- |
| `test_repr_core.py` | HTML validation, settings, badges |
| `test_repr_sections.py` | Section rendering (obs, var, uns, etc.) |
| `test_repr_formatters.py` | Type-specific formatters |
| `test_repr_ui.py` | Folding, colors, search, clipboard |
| `test_repr_warnings.py` | Serialization warnings |
| `test_repr_registry.py` | Plugin registry |
| `test_repr_lazy.py` | Lazy AnnData support |
| `test_html_validator.py` | HTMLValidator tests + Jupyter compatibility |

HTMLValidator class (conftest.py) provides structured HTML assertions:

v = validate_html(html)
v.assert_section_exists("obs")
v.assert_section_contains_entry("obs", "batch")
v.assert_section_initially_collapsed("obs")  # or _not_initially_collapsed

Key features: regex-based (no dependencies), section-aware matching, exact attribute matching to avoid "obs" matching "obsm".
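
Why exact attribute matching matters can be shown with a short regex sketch. The `data-section` attribute name and the `has_section` helper are assumptions for illustration; the real markup and the HTMLValidator internals may differ.

```python
import re


def has_section(html, name):
    """Match the full quoted attribute value so "obs" cannot match "obsm"."""
    pattern = re.compile(r'data-section="' + re.escape(name) + r'"')
    return pattern.search(html) is not None


html = '<div class="anndata-repr"><div data-section="obsm"></div></div>'
print("obs" in html)              # True: a naive substring check is fooled
print(has_section(html, "obs"))   # False
print(has_section(html, "obsm"))  # True
```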

Optional strict validation when dependencies available:

validate_html5() - W3C HTML5 + ARIA (requires vnu)
validate_js() - JavaScript syntax (requires esprima)

Jupyter Notebook/Lab compatibility tests (13 new tests in TestJupyterNotebookCompatibility):

Validates CSS scoping, JavaScript isolation, unique IDs across multiple cells, and Jupyter dark mode support.

Bug fix: readme-modal-title ID is now unique per container to prevent ID collisions when multiple AnnData objects are displayed in the same notebook.
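
One way to get per-container IDs is to derive every element ID from a random container ID. The sketch below assumes a UUID-based scheme and made-up markup; the PR's actual ID generation is not described here.

```python
import uuid
from html import escape


def render_container(title):
    """Give each rendered container, and the modal title inside it, its own ID."""
    container_id = f"anndata-repr-{uuid.uuid4().hex}"
    title_id = f"{container_id}-readme-modal-title"
    return (
        f'<div class="anndata-repr" id="{container_id}">'
        f'<h2 id="{title_id}">{escape(title)}</h2></div>'
    )


first = render_container("README")
second = render_container("README")
print(first != second)  # two objects in one notebook do not share IDs
```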

https://htmlpreview.github.io/?https://gist.githubusercontent.com/katosh/4a2399d1472c733b041ef8dfd5b489b9/raw/repr_html_visual_test.html

katosh · 2026-01-06T01:39:54Z

I feel like lazy loading might become more common (especially as datasets and number of modalities grow larger). So, I decided to take a pragmatic approach for the HTML repr rather than showing no category information at all.

The trade-off: For read_lazy() AnnData, we now do minimal, configurable partial loading to get richer category previews:

Only the first N category labels are read from storage (not the full column data or codes)
Only the corresponding N colors from .uns are loaded
Controlled via ad.settings.repr_html_max_lazy_categories (default: 100, set to 0 for zero disk I/O)

Why not avoid all loading? Showing just "(50 categories)" is much less useful than seeing the actual category names with color swatches. The cost of reading a few category strings is small compared to the value of the preview.

Implementation: We access CategoricalArray._categories directly and use read_elem_partial() to read only what we need. This bypasses the @cached_property that would load all categories. See the design decision in the reviewer's guide for details.

Visual examples: See tests 8b (partial loading) and 8c (metadata-only mode) in the live demo.

katosh and others added 30 commits November 28, 2025 11:55

implement html representation

30a1e71

visual inspection testing

5ce0afb

fix dark mode and nesting of html rep

9da45fe

handle disabled script in html rep

28292b9

more compact html rep

774c942

show categories in html rep

42ec6e6

dark mode and stability

73f0c5d

make max_cats configurable in html rep

292b4fc

test many cat and no JS for html rep

181b4d4

center folding icon in html rep

1dd4f18

max rows for counting n-unique in html rep

3db23cd

header coloring in html rep

d5974f6

max 20 categories in html rep

5cd1dd5

update many cats viz test of html rep

ef178c5

robust html rep for ad blocker

11949af

more testing of html rep

139f94d

future proof html rep

8a14312

html rep documentation

a64de45

show backed path inline in html rep

e7461f8

add custom uns rendering for html rep

966bb54

customizable section html rep

27b83f6

fix some html rep previews

bfe8221

better multi line categories in html rep

065eb2c

increase html rep testing

9505b63

format and style of html rep

8983df3

reduce complexity of html rep

bfe31eb

add "vart" to codespell's ignore-words-list

4b37eca

failed formatter warnings in html

07632cf

explicit cleanup in html rep test

4c7ab7c

html rep aesthetics and formatting

c65952c

katosh added 5 commits January 4, 2026 15:16

consolidate html rep settings in FormatterContext

43a34e1

call nunique only once in html rep

aa4ff38

"is lazy" badge for html rep and example

2ed985e

see "9b. Lazy AnnData (Experimental)" in

73bb56c

https://htmlpreview.github.io/?https://gist.githubusercontent.com/katosh/4a2399d1472c733b041ef8dfd5b489b9/raw/repr_html_visual_test.html

toc in html rep visual test

2ee5f12

flying-sheep changed the title from "Add HTML representation" to "feat: Add HTML representation" on Jan 5, 2026

katosh added 2 commits January 5, 2026 17:46

issue wrong decoding in lazy example by using newer numpy

5465ee8

richer html rep for fully lazy data

a9164e2

katosh added 11 commits January 6, 2026 02:45

cleanup html rep for lazy

8fe997d

move last column definition to formatters

0ffdcf0

refactor FormattedOutput for clarity

298727a

style and zebra stripe section

a141ac2

rep module consolidations

bf93caf

fix ruff issues

3b9eb40

more features in 1st html rep viz example

ee000f2

no data object for renderers

e081319

consolidate renderers

c951807

more typing friendliness

3d694da

ruff formatting

aef9083

katosh mentioned this pull request Jan 6, 2026

feat: Efficient category count and partial loading for lazy AnnData #2283

Open

katosh added 8 commits January 6, 2026 20:32

add more serialization warning examples to test 23

2f0f4e1

test lazy _rep module loading and serialization warnings

ed80c5e

formatting

0330df9

reintroduce the number of columns in obsm/varm preview

87a011e

remove redundant test and re-trigger ci

70530cc

split html rep test and validate html

55efd3f

fix: unique ID for readme modal

3217630

explicitly test JupyterLab compatibility

34856ab

katosh mentioned this pull request Jan 7, 2026

feat: add get_categories and n_categories to CategoricalArray #2285

Draft

3 tasks
