Open
Labels
documentation, enhancement
Description
Objective
Establish a standardized folder structure and file naming convention for new data ingest processes, ensuring compatibility with the latest release schema and efficient storage/validation practices.
Requirements
- Create a new ingest folder in the repository.
- Within the ingest folder, create a subfolder for each data provider.
- All ingests must support the latest release schema.
- Depending on total data size, files should be split to limit each to ~25 MB.
- Do not split records between files: each file must contain only complete records so that validation can be performed independently.
- All data files are to be formatted as JSON lists (enclosed in brackets). Consider https://jsonlines.org/ as an alternative approach if more appropriate for downstream usage.
- File naming convention:
  <data provider>_<5-digit zero-padded number>.json (e.g., emsl_00001.json).
- Future: explore the JSON Lines format.
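The splitting and naming requirements above could be sketched as follows. This is only an illustration under stated assumptions: the function names `chunk_records` and `write_ingest_files`, the `ingest` root path, and the reading of "~25 MB" as 25 MiB are hypothetical and not part of this issue.

```python
import json
from pathlib import Path

# Assumption: "~25 MB" is treated as 25 MiB; adjust if the team decides otherwise.
MAX_BYTES = 25 * 1024 * 1024


def chunk_records(records, max_bytes=MAX_BYTES):
    """Yield lists of complete records whose JSON-list encoding fits max_bytes.

    A record is never split. A single record larger than max_bytes is
    emitted alone in its own chunk.
    """
    chunk, size = [], 2  # 2 bytes for the enclosing "[" and "]"
    for record in records:
        encoded = len(json.dumps(record).encode("utf-8"))
        extra = encoded + (2 if chunk else 0)  # ", " separator between items
        if chunk and size + extra > max_bytes:
            yield chunk
            chunk, size = [], 2
            extra = encoded
        chunk.append(record)
        size += extra
    if chunk:
        yield chunk


def write_ingest_files(records, provider, root="ingest", max_bytes=MAX_BYTES):
    """Write records to <root>/<provider>/<provider>_<NNNNN>.json files."""
    out_dir = Path(root) / provider
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, chunk in enumerate(chunk_records(records, max_bytes), start=1):
        path = out_dir / f"{provider}_{index:05d}.json"
        path.write_text(json.dumps(chunk), encoding="utf-8")
        paths.append(path)
    return paths
```

Because each file is a complete JSON list of whole records, any file can be loaded and validated without reading its neighbors.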
Acceptance Criteria
- New ingest folder structure is documented and implemented.
- Each data provider has its own subfolder.
- All files conform to the current release schema.
- No file exceeds ~25 MB; splitting strategy is documented.
- No records are split between files; all files independently valid.
- Naming convention is followed for all new files.
- JSON format (list or dict) is clearly specified and documented.
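Several of these criteria can be checked mechanically. The following is a minimal sketch, assuming each provider subfolder is named exactly like the file prefix and that "~25 MB" means 25 MiB; `check_ingest_file` is a hypothetical helper. Conformance to the release schema is not checked here, since the schema is not specified in this issue.

```python
import json
import re
from pathlib import Path

# Assumption: "~25 MB" is treated as 25 MiB.
MAX_BYTES = 25 * 1024 * 1024


def check_ingest_file(path, max_bytes=MAX_BYTES):
    """Return a list of problems found in one ingest file (empty if none)."""
    path = Path(path)
    problems = []

    # Assumption: the parent folder name is the data provider name.
    provider = path.parent.name
    pattern = rf"{re.escape(provider)}_\d{{5}}\.json"
    if not re.fullmatch(pattern, path.name):
        problems.append("name does not match <data provider>_<5 digits>.json")

    if path.stat().st_size > max_bytes:
        problems.append(f"file is larger than {max_bytes} bytes")

    # Each file must parse on its own and hold a top-level JSON list.
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        problems.append(f"not valid JSON: {exc}")
    else:
        if not isinstance(data, list):
            problems.append("top-level JSON value is not a list")

    return problems
```

A file that parses independently as a JSON list cannot contain a record split across files, so the parse step also covers that criterion.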