

@ghukill commented Nov 26, 2024

Purpose and background context

This PR scaffolds the project as an installable Python library. It does very little beyond that at this point.

The repository is called timdex-dataset-api (dashes), but the importable package that gets installed is timdex_dataset_api (underscores), because dashes are not allowed in Python module names.
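The split works because the two names play different roles. Installers such as pip and pipenv compare distribution names after PEP 503 normalization, so dashes and underscores are interchangeable there, while an import statement needs a valid Python identifier. A small standard-library sketch of both checks (the helper names are illustrative, not part of the library):

```python
import keyword
import re


def is_importable_name(name: str) -> bool:
    """True if `name` is usable as a module name in an import statement."""
    return name.isidentifier() and not keyword.iskeyword(name)


def normalize_dist_name(name: str) -> str:
    """PEP 503 normalization, which installers use to compare distribution names."""
    return re.sub(r"[-_.]+", "-", name).lower()


for name in ("timdex-dataset-api", "timdex_dataset_api"):
    print(f"{name}: importable={is_importable_name(name)}, "
          f"normalized={normalize_dist_name(name)}")
# timdex-dataset-api: importable=False, normalized=timdex-dataset-api
# timdex_dataset_api: importable=True, normalized=timdex-dataset-api
```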

This library will be used by Transmogrifier, the pipeline lambdas, and TIM when interacting with the TIMDEX Parquet dataset in S3. Other uses may emerge, but these are the ones known at this time.

This library will support basic operations like:

  • loading a dataset
  • writing to a dataset
  • reading from a dataset

For more details on why this is a standalone Python library, please see the associated parent ticket: https://mitlibraries.atlassian.net/browse/TIMX-408.

How can a reviewer manually see the effects of these changes?

Local development of library

Clone the repo, then follow the usual workflow: make test and make lint.

Install as library for another project

1- create a new, temporary Python project

cd /tmp
mkdir my-app
cd my-app
pipenv install --python 3.12

2- install the library from GitHub, pinned to this PR's commit

pipenv install git+https://github.com/MITLibraries/timdex-dataset-api.git@fd64746

Note that the Pipfile will contain something like this:

[packages]
timdex-dataset-api = {ref = "fd64746", git = "git+https://github.com/MITLibraries/timdex-dataset-api.git"}

This demonstrates two things:

  1. the library is installable directly from GitHub
  2. we can install a particular version via a commit SHA, release tag, etc.

3- open an IPython shell and confirm the library is importable

pipenv run ipython
from timdex_dataset_api import __version__

print(__version__)
# 0.1.0

This demonstrates that the version field in pyproject.toml (project.version) is what sets the version of the installed package.

Includes new or updated dependencies?

YES

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed or provided examples verified
  • New dependencies are appropriate or there were no changes

Why these changes are being introduced:

This Python library is designed to be installed as a dependency in other
applications. This scaffolding supports installation directly from the
GitHub repository.

How this addresses that need:
* Creates project files that support local installation and development
* Creates project files that support installation from GitHub as a library
* Establishes the library name when imported and used as 'timdex_dataset_api'

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-414

@ehanson8 left a comment


Looks good!


@jonavellecuerdo left a comment


Looking good! I just have two minor change requests. 🤓

Why these changes are being introduced:

Formerly, the Python library version number was set via the
pyproject.toml file. This worked, as the package itself could
provide the version by reading the installed package metadata.
However, some best practices discourage this, as it relies on
packaging and on reading that metadata at runtime.

We may choose to revisit this approach as we get farther along,
particularly if installing via a GitHub tag or release is
important.

How this addresses that need:
* Version is set in timdex_dataset_api.__init__.__version__,
allowing it to be read directly from the library
* Packaging via pyproject.toml uses this version number via a
"dynamic" metadata field

Side effects of this change:
* None at this time

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/TIMX-414
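With the version living in timdex_dataset_api/__init__.py, the build backend has to read it to fill the "dynamic" version field. For example, setuptools supports declaring dynamic = ["version"] under [project] and pointing version = {attr = "timdex_dataset_api.__version__"} under [tool.setuptools.dynamic]; which backend this repository actually uses is not shown here, so treat that as an assumption. The sketch below illustrates the underlying idea of reading __version__ statically with ast, without importing the package; read_version and the sample source are illustrative, not code from the repository.

```python
import ast


def read_version(init_source: str) -> str:
    """Find the literal assigned to __version__ without importing the module."""
    tree = ast.parse(init_source)
    for node in tree.body:
        # Only plain top-level assignments such as: __version__ = "0.1.0"
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__version__":
                    return ast.literal_eval(node.value)
    raise ValueError("no top-level __version__ assignment found")


# Hypothetical contents of timdex_dataset_api/__init__.py
INIT_SOURCE = '''
"""timdex_dataset_api: interact with the TIMDEX Parquet dataset."""

__version__ = "0.1.0"
'''

print(read_version(INIT_SOURCE))  # 0.1.0
```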
@ghukill force-pushed the TIMX-414-scaffold-library-project branch from 5139b78 to 3339f13 on December 2, 2024 18:52
@ghukill merged commit d5f3549 into main Dec 2, 2024
2 checks passed
@ghukill mentioned this pull request Jan 2, 2025

4 participants