Make .data() binding work with iterable protocol by bsidhom · Pull Request #321 · d3/d3-selection

bsidhom · 2024-12-18T03:19:15Z

The current data binding implementation relies on random array-style access, but only ever accesses data sequentially. Some data structures (such as linked lists) cannot easily expose random access. Moreover, since this specifically uses bracket indexing, it only accepts Arrays or first-class array-like objects. Making a custom data structure indexable this way requires either an expensive Proxy (indexes have to be round-tripped from integers to strings on every access) or else requires the custom class to add explicit integer properties for each contained item; this adds memory bloat linear in the size of the data structure itself and also requires unnecessary bookkeeping.

If random access were required, it might be reasonable to require some cheaply-implemented but customizable accessor method (see, for example Array.at()). However, since this is not the case, it makes more sense to instead only require that input data types be iterable. This opens up efficient implementations for structures such as singly-linked lists and other purely applicative structures. Backward compatibility can be maintained by adding a cheap wrapper to implement iteration in terms of sequential indexing.

The current data binding implementation relies on random array-style access, but only ever accesses data sequentially. Some data structures (such as linked lists) cannot easily expose random access. Moreover, since this specifically uses bracket indexing, it only accepts `Array`s or first-class array-like objects. Making a custom data structure indexable this way requires either an expensive `Proxy` (indexes have to be round-tripped from integers to strings on every access) or else requires the custom class to add explicit integer properties for each contained item; this adds memory bloat linear in the size of the data structure itself and also requires unnecessary bookkeeping. If random access were required, it might be reasonable to require some cheaply-implemented but customizable accessor method (see, for example Array.at()). However, since this is not the case, it makes more sense to instead only require that input data types be iterable. This opens up efficient implementations for structures such as singly-linked lists and other purely applicative structures. Backward compatibility can be maintained by adding a cheap wrapper to implement iteration in terms of sequential indexing.

bsidhom · 2024-12-18T03:22:28Z

Note that this is just a sketch of the suggested functionality. I have not added the shim required to make this backward compatible with code that implements random bracketed access but is not iterable.

There's another question as to whether .data() should also require a .length method or whether it's sufficient to compute that dynamically during iteration. As far as I can tell, this is essentially used as a proxy to infer whether we're dealing with an array-like object. Ideally, this method would only check for the Symbol.iterator property and use that for iteration; sadly, that breaks compatibility.

curran · 2024-12-18T14:54:47Z

I'm curious, what limitation of Arrays are you facing?

Is there something wrong with converting the data to an Array first?

bsidhom · 2024-12-28T23:44:57Z

The main issue is just the expense of repeatedly converting to arrays at each render step. I'm working on some visualizations either using fully persistent data structures (which heavily use linking and structural sharing) or of those same data structures. It turns out that visualizing the internals of the data structures themselves is fine because any given element should only have $O(1)$ links. However, it becomes quite expensive to convert the entire data structure to a temporary array every render/update. That work is then thrown out because the temporary array isn't even retained, but discarded once the underlying enter/update/exit arrays, etc. are created.

On the other hand, I'm starting to realize that using the d3 join verbs is not necessarily the best way to spell this given that I want to use incremental updates with structural updates. It might make sense to use d3 for display only and roll my own diff/incremental update operation that aligns more closely with my data structures.

In any case, switching data binding to use iteration rather than random access in the interface more closely aligns with what d3 is already doing under the hood, requires a weaker contract from callers, and makes it easier to adapt for custom data structures, with minimal computational and storage overhead.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make .data() binding work with iterable protocol#321

Make .data() binding work with iterable protocol#321
bsidhom wants to merge 1 commit intod3:mainfrom
bsidhom:iterable

bsidhom commented Dec 18, 2024

Uh oh!

bsidhom commented Dec 18, 2024

Uh oh!

curran commented Dec 18, 2024

Uh oh!

bsidhom commented Dec 28, 2024

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Conversation

bsidhom commented Dec 18, 2024

Uh oh!

bsidhom commented Dec 18, 2024

Uh oh!

curran commented Dec 18, 2024

Uh oh!

bsidhom commented Dec 28, 2024

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants