Systematic quality evaluation suite for AI/ML datasets: 103 ego datasets audited, aligned with ISO/IEC 5259-2.
Updated Apr 21, 2026 - Python
KALOS: Evaluate the quality of computer vision datasets
Official repository for paper "Enhancing 3D Point Cloud Classification with ModelNet-R and Point-SkipNet"
Evaluation QA harness for misinformation datasets: stress tests evidence quality, shortcuts, ambiguity, and ranking fragility.
(WIP) 'Aporia' (ἀπορία) is Greek for 'impasse' or 'puzzle'. A Python library that detects and fixes dataset issues using both rule-based methods and ML models. It evaluates dataset quality across multiple metrics, including missing values, duplicates, outliers, class imbalance, and label consistency, and suggests fixes based on the resulting metric scores.
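The metrics listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Aporia's API: rows are assumed to be dicts of hashable scalar values with `None` marking a missing cell, outliers use the common 1.5 × IQR rule, class imbalance is taken as the ratio of the largest to the smallest class count, and label consistency is measured as the share of distinct feature vectors that appear with more than one label. All function names are hypothetical.

```python
from collections import Counter
from statistics import quantiles


# Hypothetical helpers; rows are assumed to be dicts of hashable scalars.
def missing_rate(rows):
    """Fraction of all cells that are None."""
    cells = [v for row in rows for v in row.values()]
    return sum(v is None for v in cells) / len(cells) if cells else 0.0


def duplicate_rate(rows):
    """Fraction of rows that exactly repeat an earlier row."""
    seen, dups = set(), 0
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent row signature
        if key in seen:
            dups += 1
        else:
            seen.add(key)
    return dups / len(rows) if rows else 0.0


def iqr_outlier_rate(values, k=1.5):
    """Fraction of non-missing values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    vals = [v for v in values if v is not None]
    if len(vals) < 2:
        return 0.0  # quartiles are not meaningful for fewer than two points
    q1, _, q3 = quantiles(vals, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return sum(v < lo or v > hi for v in vals) / len(vals)


def imbalance_ratio(labels):
    """Count of the most frequent class divided by the least frequent one."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values()) if counts else 1.0


def label_conflict_rate(rows, label_key="label"):
    """Fraction of distinct feature vectors seen with more than one label."""
    labels_by_features = {}
    for row in rows:
        features = tuple(sorted((k, v) for k, v in row.items() if k != label_key))
        labels_by_features.setdefault(features, set()).add(row[label_key])
    if not labels_by_features:
        return 0.0
    conflicts = sum(len(s) > 1 for s in labels_by_features.values())
    return conflicts / len(labels_by_features)
```

The sketch sticks to the standard library so it runs without dependencies; a production tool would more likely compute the same quantities over pandas DataFrames and feed them into its fix suggestions.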
A traceable data collection and delivery tool for research, competition, and academic-paper scenarios.
CV Dataset Quality Inspector: a React-based tool for detecting quality issues in computer vision annotation datasets. It auto-detects bounding-box errors, visualizes class imbalance, and exports quality reports. Built for AV/CV ML pipelines.
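A minimal sketch of the kind of bounding-box checks such a tool might run follows. It is not the Inspector's actual logic and is written in Python purely for illustration. Assumptions: boxes are axis-aligned in absolute pixel coordinates as `[x_min, y_min, x_max, y_max]` (COCO, by contrast, stores `[x, y, width, height]`), and the names `Box` and `find_bbox_issues` as well as the thresholds `min_area=4.0` and `dup_iou=0.9` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    # Hypothetical annotation record: axis-aligned box in pixel coordinates.
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    iy = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    inter = ix * iy
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def find_bbox_issues(boxes, width, height, min_area=4.0, dup_iou=0.9):
    """Return (box_index, issue) pairs for one image's annotations."""
    issues = []
    for i, b in enumerate(boxes):
        # Zero or negative extent: the box cannot be valid at all.
        if b.x_max <= b.x_min or b.y_max <= b.y_min:
            issues.append((i, "degenerate"))
            continue
        if b.x_min < 0 or b.y_min < 0 or b.x_max > width or b.y_max > height:
            issues.append((i, "out_of_bounds"))
        if b.area() < min_area:
            issues.append((i, "tiny"))
    # Same-class boxes that overlap almost completely are likely double labels;
    # the later box of each pair is flagged.
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if boxes[i].label == boxes[j].label and iou(boxes[i], boxes[j]) >= dup_iou:
                issues.append((j, "near_duplicate"))
    return issues
```

The pairwise duplicate scan is quadratic in the number of boxes per image, which is usually acceptable because images rarely carry more than a few hundred annotations.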
How much labeled data do you actually need to deploy a parking occupancy system at a never-before-seen lot? A supervision study spanning CLIP zero-shot → ResNet-18 few-shot → full supervision on 432k parking-space crops, including the discovery of annotation errors in the dataset. Models were trained on NVIDIA A100 GPUs on Indiana University's Big Red 200 supercomputer.
Agentic data intelligence tool using LangChain & Pandas for automated dataset cleaning, governance, and quality analysis.
Practical lessons on prompt engineering for code-generation datasets used to train LLMs. Patterns and failure modes from real task audits.
The Dataset Quality Scoring Engine (DQS) evaluates the quality of any dataset using automated, model-agnostic metrics. It processes user-uploaded datasets, computes embeddings, analyzes statistical and semantic properties, and outputs a standardized quality score.
Lightweight toolkit for multimodal data curation and quality triage
LLM Code Trainer & Dataset Quality Reviewer at Revelo. Prompt engineering, multi-language code review (Python, TS/JS, C, C++). Remote, EN/PT.