docs: add migration guides and tutorial #999
Changes from all commits
d91b322
a4e274b
5288076
6136b9b
28321a5
e58b689
feea3f9
e382094
05a3aa0
| Original file line number | Diff line number | Diff line change | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,236 @@ | ||||||||||||||||
| (migration-guide)= | ||||||||||||||||
|
|
||||||||||||||||
| # Migration Guide | ||||||||||||||||
|
|
||||||||||||||||
| This page is meant to help migrate your codebase to an Array API compliant | ||||||||||||||||
| implementation. The guide is divided into two parts and, depending on your | ||||||||||||||||
|
Member
I think this should be changed since there are actually three parts, the first being |
||||||||||||||||
| exact use-case, you should look thoroughly into at least one of them. | ||||||||||||||||
|
Comment on lines
+6
to
+7
Member
nit, seems unnecessary to me
Suggested change
|
||||||||||||||||
|
|
||||||||||||||||
| The first part is dedicated to {ref}`array-producers`. If your library | ||||||||||||||||
| mimics, for example, NumPy's or Dask's functionality, then you can find in | ||||||||||||||||
|
Member
minor nit, Dask is probably a strange choice here since itself is quite firmly mimicking NumPy. Maybe PyTorch would be a better example of where the standard took influence to differ from (historical) NumPy |
||||||||||||||||
| the first part additional instructions and guidance on how to ensure | ||||||||||||||||
| downstream users can easily pick your solution as an array provider for | ||||||||||||||||
| their system/algorithm. | ||||||||||||||||
|
|
||||||||||||||||
| The second part delves into details for Array API compatibility for | ||||||||||||||||
| {ref}`array-consumers`. This pertains to any software that performs | ||||||||||||||||
| multidimensional array manipulation in Python, such as may be found in | ||||||||||||||||
| scikit-learn, SciPy, or statsmodels. If your software relies on a certain | ||||||||||||||||
| array producing library, such as NumPy or JAX, then you can use the second | ||||||||||||||||
| part to learn how to make it library agnostic and interchange array | ||||||||||||||||
| namespaces with significantly less friction. | ||||||||||||||||
|
Comment on lines
+20
to
+21
Member
is "interchange array namespaces" really what the second part is helping with? To me it seems focused on making functions agnostic |
||||||||||||||||
|
|
||||||||||||||||
| ## Ecosystem | ||||||||||||||||
|
|
||||||||||||||||
| Apart from the documented standard, the Array API ecosystem also provides | ||||||||||||||||
| a set of tools and packages to help you with the migration process: | ||||||||||||||||
|
|
||||||||||||||||
|
|
||||||||||||||||
| (array-api-compat)= | ||||||||||||||||
|
|
||||||||||||||||
| ### Array API Compat | ||||||||||||||||
|
Member
I think this is the first I have seen where we refer to these names with capitalisation and spaces? https://data-apis.org/array-api-extra/ for example deliberately keeps the styling lower-case and with hyphens. Especially since these libraries are developer-facing, I think it makes sense to keep the style of names consistent with how they are distributed as packages. |
||||||||||||||||
|
|
||||||||||||||||
| GitHub: [array-api-compat](https://github.com/data-apis/array-api-compat) | ||||||||||||||||
|
|
||||||||||||||||
| User group: Array Consumers | ||||||||||||||||
|
|
||||||||||||||||
| Although NumPy, Dask, CuPy, and PyTorch support the Array API Standard, there | ||||||||||||||||
| are still some corner cases where their behavior diverges from the standard. | ||||||||||||||||
| `array-api-compat` provides a compatibility layer to cover these cases. | ||||||||||||||||
|
Comment on lines
+37
to
+39
Member
I think this is false and conflicts with https://data-apis.org/array-api/draft/purpose_and_scope.html#conformance. Sure, it's unreasonable to require 100% conformance to say that a library 'supports the standard', but especially in the cases of Dask and PyTorch there are places where array-api-compat still has to do significant work to add missing support |
||||||||||||||||
| This is also accompanied by a few utility functions for easier introspection | ||||||||||||||||
| into array objects. As an array consumer, you can still rely on the original | ||||||||||||||||
| API while having access to the standard compatible one. | ||||||||||||||||
|
|
||||||||||||||||
|
|
||||||||||||||||
| (array-api-strict)= | ||||||||||||||||
|
|
||||||||||||||||
| ### Array API Strict | ||||||||||||||||
|
|
||||||||||||||||
| GitHub: [array-api-strict](https://github.com/data-apis/array-api-strict) | ||||||||||||||||
|
|
||||||||||||||||
| User group: Array Consumers, Array Producers (for testing) | ||||||||||||||||
|
Member
see other comment on the inclusion of array producers here |
||||||||||||||||
|
|
||||||||||||||||
| `array-api-strict` is a library that provides a strict and minimal | ||||||||||||||||
| implementation of the Array API Standard. For array producers, it is designed | ||||||||||||||||
| to be used as a reference implementation for testing and development purposes. | ||||||||||||||||
| You can compare your API calls with `array-api-strict` counterparts and | ||||||||||||||||
| ensure that your library is fully compliant with the standard and can | ||||||||||||||||
| serve as a reliable reference for other developers in the ecosystem. | ||||||||||||||||
|
Comment on lines
+54
to
+58
Member
I would probably omit this part. The primary source of truth is the standard, with the second being array-api-tests. I don't think array-api-strict provides a particularly useful way to "ensure that your library is fully compliant", aside from comparing every permutation of argument combinations on all functions. Do any of the libraries adopting the standard actually have infrastructure that compares results against array-api-strict? I was under the assumption that they all just use array-api-tests. (The exception to that is libraries which are inherently both consumers and producers, like https://mdhaber.github.io/marray/intro.html, but I don't think such libraries are the target audience of this doc.) EDIT: ahh, I see now below the section on testing against array-api-strict for producers. Yeah, especially since that is marked as not the recommended way, I think it is best to omit it in this section here. Makes sense to point out to producers in their section that it is something that might be worth doing, but I don't think it should be included in the description of array-api-strict's purpose. |
||||||||||||||||
| For consumers, you can use `array-api-strict` during development as an | ||||||||||||||||
| array provider to ensure your code uses APIs compliant with the standard. | ||||||||||||||||
|
Comment on lines
+59
to
+60
Member
I would probably be explicit about parametrising tests with it as an array namespace, see https://lucascolley.github.io/talks/pydata-paris-25-array-api/#/5/10. 'using during development' is maybe a bit opaque for someone who is new to the standard. |
||||||||||||||||
|
|
||||||||||||||||
|
|
||||||||||||||||
| (array-api-tests)= | ||||||||||||||||
|
|
||||||||||||||||
| ### Array API Tests | ||||||||||||||||
|
Member
Suggested change
|
||||||||||||||||
|
|
||||||||||||||||
| GitHub: [array-api-tests](https://github.com/data-apis/array-api-tests) | ||||||||||||||||
|
|
||||||||||||||||
| User group: Array Producers | ||||||||||||||||
|
|
||||||||||||||||
| `array-api-tests` is a collection of tests that can be used to verify the | ||||||||||||||||
| compliance of your library with the Array API Standard. It includes tests | ||||||||||||||||
| for array producers, covering a wide range of functionalities and use cases. | ||||||||||||||||
| By running these tests, you can ensure that your library adheres to the | ||||||||||||||||
| standard and can be used with compatible array consumer libraries. | ||||||||||||||||
|
Comment on lines
+74
to
+75
Member
yeah this is good! But it sounds strange that we're mentioning multiple different ways to ensure conformance, hence why I think we should remove the mention for array-api-strict. Ensuring once should be enough! (of course, you can never be 100% sure, but I assume we're writing assuming an ideal state of usage and maintenance of the core packages) |
||||||||||||||||
|
|
||||||||||||||||
|
|
||||||||||||||||
| (array-api-extra)= | ||||||||||||||||
|
|
||||||||||||||||
| ### Array API Extra | ||||||||||||||||
|
Member
This does not mention tools for lazy backends (https://lucascolley.github.io/talks/pydata-paris-25-array-api/#/5/13) and |
||||||||||||||||
|
|
||||||||||||||||
| GitHub: [array-api-extra](https://github.com/data-apis/array-api-extra) | ||||||||||||||||
|
|
||||||||||||||||
| User group: Array Consumers | ||||||||||||||||
|
|
||||||||||||||||
| `array-api-extra` is a collection of additional utilities and tools that are | ||||||||||||||||
| missing from the Array API Standard but can be useful for compliant array | ||||||||||||||||
|
Member
'missing' perhaps sounds a bit negative, I'm not sure what I would say instead but this seems to imply that, in an ideal state, everything in array-api-extra would be in the standard itself. I don't think that is true. |
||||||||||||||||
| consumers. It includes additional array manipulation and statistical functions. | ||||||||||||||||
| It is already used by SciPy and scikit-learn. | ||||||||||||||||
|
|
||||||||||||||||
| The sections below mention when and how to use them. | ||||||||||||||||
|
Member
I would move this up above the individual library sections otherwise it looks like this is under the array-api-extra section |
||||||||||||||||
|
|
||||||||||||||||
|
|
||||||||||||||||
| (array-producers)= | ||||||||||||||||
|
|
||||||||||||||||
| ## Array Producers | ||||||||||||||||
|
|
||||||||||||||||
| For array producers, the central task during the development/migration process | ||||||||||||||||
| is ensuring that the user-facing API adheres to the Array API Standard. | ||||||||||||||||
|
|
||||||||||||||||
| The complete API of the standard is documented in the | ||||||||||||||||
| [API specification](https://data-apis.org/array-api/latest/API_specification/index.html). | ||||||||||||||||
|
|
||||||||||||||||
| There, each function, constant, and object is described with details | ||||||||||||||||
| on parameters, return values, and special cases. | ||||||||||||||||
|
|
||||||||||||||||
| ### Testing against Array API | ||||||||||||||||
|
|
||||||||||||||||
| There are two main ways to test your API for compliance: either using the | ||||||||||||||||
| `array-api-tests` suite or testing your API manually against the | ||||||||||||||||
| `array-api-strict` reference implementation. | ||||||||||||||||
|
|
||||||||||||||||
| #### Array API Test suite (Recommended) | ||||||||||||||||
|
|
||||||||||||||||
| {ref}`array-api-tests` is a test suite which verifies that your API | ||||||||||||||||
| adheres to the standard. For each function or method, it confirms | ||||||||||||||||
| it's importable, verifies the signature, generates multiple test | ||||||||||||||||
| cases with the [hypothesis](https://hypothesis.readthedocs.io/en/latest/) | ||||||||||||||||
| package, and runs assertions on the outputs. | ||||||||||||||||
|
|
||||||||||||||||
| The setup details are documented in the GitHub repository, so here we | ||||||||||||||||
| cover only the minimal workflow: | ||||||||||||||||
|
|
||||||||||||||||
| 1. Install your package (e.g., in editable mode). | ||||||||||||||||
| 2. Clone `array-api-tests`, and set the `ARRAY_API_TESTS_MODULE` environment | ||||||||||||||||
| variable to your package import name. | ||||||||||||||||
| 3. Inside the `array-api-tests` directory, run `pytest`. The test suite provides | ||||||||||||||||
| several useful options. A few worth mentioning: | ||||||||||||||||
| - `--max-examples=1000` - maximal number of test cases to generate when using | ||||||||||||||||
| hypothesis. This allows you to balance between execution time of the test | ||||||||||||||||
ev-br marked this conversation as resolved.
|
||||||||||||||||
| suite and thoroughness of the testing. It's advised to use as many examples | ||||||||||||||||
| as the time budget allows. Each test case is a random combination of | ||||||||||||||||
| possible inputs: the more cases, the higher chance of finding an | ||||||||||||||||
| unsupported edge case. | ||||||||||||||||
| - With the `--xfails-file` option, you can describe which tests are expected | ||||||||||||||||
| to fail. It's impossible to get the whole API perfectly implemented on a | ||||||||||||||||
| first try, so tracking what still fails gives you more control over the | ||||||||||||||||
| state of your API. | ||||||||||||||||
| - `-o xfail_strict=<bool>` is often used with the previous option. If a test | ||||||||||||||||
| expected to fail actually passes (`XPASS`), then you can decide whether | ||||||||||||||||
| to ignore that fact or raise it as an error. | ||||||||||||||||
| - `--skips-file` for skipping tests. At times, some failing tests might stall | ||||||||||||||||
| the test suite. In that case, the most convenient | ||||||||||||||||
| option is to skip these for the time being. | ||||||||||||||||
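The workflow above can also be driven from Python, for example from a local helper script. The following is a minimal sketch and not part of `array-api-tests`: `build_pytest_command` and `run_suite` are hypothetical helper names, and any file names passed in are placeholders. It uses only the options listed above and assumes pytest is run from inside the cloned repository.

```python
import os
import subprocess
import sys


def build_pytest_command(max_examples=1000, xfails_file=None,
                         skips_file=None, xfail_strict=None):
    """Assemble a pytest invocation from the options described above."""
    cmd = [sys.executable, "-m", "pytest", f"--max-examples={max_examples}"]
    if xfails_file is not None:
        cmd += ["--xfails-file", xfails_file]
    if skips_file is not None:
        cmd += ["--skips-file", skips_file]
    if xfail_strict is not None:
        # pytest ini-style booleans are written as true/false.
        cmd += ["-o", f"xfail_strict={str(xfail_strict).lower()}"]
    return cmd


def run_suite(module_name, tests_dir, **options):
    """Run the suite from `tests_dir` (the clone) against `module_name`."""
    # The suite reads the package under test from this environment variable.
    env = dict(os.environ, ARRAY_API_TESTS_MODULE=module_name)
    return subprocess.run(build_pytest_command(**options), cwd=tests_dir, env=env)
```

A call such as `run_suite("my_array_lib", "path/to/array-api-tests", max_examples=200, xfails_file="xfails.txt")` would then start a run; the package name, path, and file name are placeholders.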
|
|
||||||||||||||||
| We strongly advise you to embed this setup in your CI as well. This will allow | ||||||||||||||||
| you to continuously monitor Array API coverage, and make sure new changes don't break existing | ||||||||||||||||
| APIs. As a reference, see [NumPy's Array API Tests CI setup](https://github.com/numpy/numpy/blob/581d10f43b539a189a2d37856e5130464de9e5f6/.github/workflows/linux.yml#L296). | ||||||||||||||||
|
|
||||||||||||||||
|
Comment on lines
+146
to
+149
Member
If I was a library developer reading this, I would be wishing there was also a link to how to set this up within a Pixi workspace, like https://github.com/mdhaber/mparray/blob/0ef47e008fef92c605f73907436d4c6617419161/pixi.toml#L119-L179 |
||||||||||||||||
|
|
||||||||||||||||
| #### Array API Strict | ||||||||||||||||
|
|
||||||||||||||||
| A simpler, and more manual, way of testing Array API coverage is to | ||||||||||||||||
| run your API calls along with the {ref}`array-api-strict` Python implementation. | ||||||||||||||||
|
|
||||||||||||||||
| This way, you can ensure that the outputs coming from your API match the minimal | ||||||||||||||||
| reference implementation. Bear in mind, however, that you need to write | ||||||||||||||||
| the test cases yourself, so you also need to account for any applicable edge | ||||||||||||||||
| cases. | ||||||||||||||||
|
|
||||||||||||||||
|
|
||||||||||||||||
| (array-consumers)= | ||||||||||||||||
|
|
||||||||||||||||
| ## Array Consumers | ||||||||||||||||
|
|
||||||||||||||||
| For array consumers, the key principle is that your **array | ||||||||||||||||
| manipulation operations should not be tied to a particular array-producing | ||||||||||||||||
| library**. For instance, if you use NumPy for arrays, then your code could | ||||||||||||||||
| contain: | ||||||||||||||||
|
Comment on lines
+166
to
+169
Member
Suggested change
|
||||||||||||||||
|
|
||||||||||||||||
| ```python | ||||||||||||||||
| import numpy as np | ||||||||||||||||
|
|
||||||||||||||||
| # ... | ||||||||||||||||
| b = np.full(shape, val, dtype=dtype) @ a | ||||||||||||||||
| c = np.mean(a, axis=0) | ||||||||||||||||
| return np.dot(c, b) | ||||||||||||||||
|
Member
Nit, optional: are you deliberately showing the transformation from
Contributor
Author
Yes, I added a note that
Contributor
Why tensordot and not vecdot?
Contributor
Author
In general I wanted to show a case where a given function isn't present in the standard and we need to switch to something else. The code sample is small enough that you can't tell which one is the right choice: assuming that If we want to be strict in this code example I can make it unambiguous.
Contributor
May be worth more explicitly spelling what you describe above out.
Contributor
Author
Sure - I'll make an assumption about |
||||||||||||||||
| ``` | ||||||||||||||||
|
|
||||||||||||||||
| The first step should be as simple as assigning the `np` namespace to a dedicated | ||||||||||||||||
| namespace variable. The convention used in the ecosystem is to name it `xp`. Then, | ||||||||||||||||
| it is vital to ensure that each method and function call is something that the Array API | ||||||||||||||||
| supports. For example, `dot` is present in NumPy's API, but the standard | ||||||||||||||||
| doesn't support it. For the sake of simplicity, let's assume both `c` and `b` | ||||||||||||||||
| are `ndim=2`; therefore, we select `tensordot` instead, as both NumPy and the | ||||||||||||||||
| standard define it: | ||||||||||||||||
|
|
||||||||||||||||
| ```python | ||||||||||||||||
| import numpy as np | ||||||||||||||||
|
|
||||||||||||||||
| xp = np | ||||||||||||||||
|
|
||||||||||||||||
| # ... | ||||||||||||||||
| b = xp.full(shape, val, dtype=dtype) @ a | ||||||||||||||||
| c = xp.mean(a, axis=0) | ||||||||||||||||
| return xp.tensordot(c, b, axes=1) | ||||||||||||||||
| ``` | ||||||||||||||||
|
|
||||||||||||||||
| At this point, replacing one backend with another one should only require providing a different | ||||||||||||||||
| namespace, such as `xp = torch` (e.g., via an environment variable). This can be useful | ||||||||||||||||
| if you're writing a script or in your custom software. The other alternatives are: | ||||||||||||||||
|
|
||||||||||||||||
| - If you are building a library where the backend is determined by input arrays, | ||||||||||||||||
| and your function accepts array arguments, then a recommended way is to ask | ||||||||||||||||
| your input arrays for a namespace to use: `xp = arr.__array_namespace__()`. | ||||||||||||||||
| If the given library doesn't have it, then [`array_api_compat.array_namespace()`](https://data-apis.org/array-api-compat/helper-functions.html#array_api_compat.array_namespace) | ||||||||||||||||
| should be used instead: | ||||||||||||||||
|
Comment on lines
+203
to
+207
Member
I think we should skip the idealism and just recommend |
||||||||||||||||
| ```python | ||||||||||||||||
| def func(array1, scalar1, scalar2): | ||||||||||||||||
| xp = array1.__array_namespace__() # or array_namespace(array1) | ||||||||||||||||
| return xp.arange(scalar1, scalar2) @ array1 | ||||||||||||||||
| ``` | ||||||||||||||||
| - For a function that accepts scalars and returns arrays, use namespace `xp` as | ||||||||||||||||
| a parameter in the signature. Enforcing objects to have the same type as the | ||||||||||||||||
| provided backend can then be achieved with `arg1 = xp.asarray(arg1)` for each input: | ||||||||||||||||
| ```python | ||||||||||||||||
| def func(s1, s2, xp): | ||||||||||||||||
| return xp.arange(s1, s2) | ||||||||||||||||
|
Comment on lines
+217
to
+218
Member
this example isn't particularly compelling, since it is something that is already achievable more easily on the user-side. I think it is worth a sentence stating when this may be worth it (e.g. there may be significant computation that you want to happen native to the array library before returning, https://docs.scipy.org/doc/scipy/reference/generated/scipy.fft.fftfreq.html) versus when it probably isn't (e.g. just trivially wrapping a value with |
||||||||||||||||
| ``` | ||||||||||||||||
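For the script scenario mentioned above, the namespace can be chosen at start-up from an environment variable. This is a minimal sketch, assuming a variable named `ARRAY_BACKEND` and a helper called `load_namespace`; both names are hypothetical and are not defined by the standard or by any library.

```python
import importlib
import os


def load_namespace(env_var="ARRAY_BACKEND", default="numpy"):
    """Import and return the array namespace named by an environment variable.

    `env_var` and `default` are illustrative choices; the default module is
    only imported when the variable is unset.
    """
    module_name = os.environ.get(env_var, default)
    # import_module accepts dotted names, so sub-packages work as well.
    return importlib.import_module(module_name)
```

With `xp = load_namespace()` at the top of a script, running it with `ARRAY_BACKEND=torch` set would bind `xp` to `torch` without touching the rest of the code. Because dotted names are accepted, pointing the variable at a wrapped namespace such as `array_api_compat.numpy` should work the same way.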
|
|
||||||||||||||||
| If you're relying on NumPy, CuPy, PyTorch, Dask, or JAX, then | ||||||||||||||||
| {ref}`array-api-compat` can come in handy for the transition. The compat layer | ||||||||||||||||
| allows you to still rely on your preferred array-producing library, while | ||||||||||||||||
| making sure you're already using a standard-compatible API. Additionally, it | ||||||||||||||||
| offers a set of useful utility functions, such as: | ||||||||||||||||
|
|
||||||||||||||||
| - [array_namespace()](https://data-apis.org/array-api-compat/helper-functions.html#array_api_compat.array_namespace) | ||||||||||||||||
| for fetching the namespace based on input arrays. | ||||||||||||||||
| - [is_array_api_obj()](https://data-apis.org/array-api-compat/helper-functions.html#array_api_compat.is_array_api_obj) | ||||||||||||||||
| for inspecting whether a given object is Array API compatible. | ||||||||||||||||
| - [device()](https://data-apis.org/array-api-compat/helper-functions.html#array_api_compat.device) | ||||||||||||||||
| for retrieving the device on which an array resides. | ||||||||||||||||
|
|
||||||||||||||||
| For now, the migration from a specific library (e.g., NumPy) to a standard | ||||||||||||||||
| compatible setup requires manual intervention for each failing API call, | ||||||||||||||||
| but, in the future, we're hoping to provide tools for automating the migration process. | ||||||||||||||||
here and throughout (maybe dropping 'standard' makes sense in some cases, but see #778)