This document describes the PyPI validation feature that helps detect hallucinated or invalid Python package imports generated by the software engineer agent.
When the software engineer agent generates Python code, it may sometimes include imports for packages that don't actually exist on PyPI (hallucinated packages) or use incorrect import names. This validation system:
- Validates imports against PyPI - Checks if each imported package actually exists on PyPI
- Resolves import names to distribution names - Converts import names (e.g., `sklearn`) to the correct PyPI distribution names (e.g., `scikit-learn`)
- Web search fallback - Uses Tavily web search to find correct package names when PyPI validation fails
- LLM-powered candidate extraction - Intelligently parses web search results to identify valid package candidates
- Provides feedback to the software engineer - Reports invalid imports back to the agent so it can fix them
```mermaid
flowchart TD
    A[Import Name] --> B{Check Cache}
    B -->|Cached| C[Return Cached Result]
    B -->|Not Cached| D{Check Known Aliases}
    D -->|Found| E[Validate Alias on PyPI]
    D -->|Not Found| F[Generate Candidate Names]
    F --> G[Try: original, hyphenated, no-separator]
    G --> H{Query PyPI API}
    H -->|Found| I[✓ Valid Package]
    H -->|Not Found| J{Web Search Enabled?}
    J -->|No| K[✗ Invalid Package]
    J -->|Yes| L[Tavily Web Search]
    L --> M[LLM Extracts Candidates]
    M --> N[Parse pip install commands]
    M --> O[Parse PyPI URLs]
    M --> P[Analyze content]
    N --> Q[Validate Each Candidate]
    O --> Q
    P --> Q
    Q -->|Found Valid| R[✓ Valid Package via Web Search]
    Q -->|None Valid| S[✗ Invalid with Search Context]
    E --> I
    I --> T[Cache Result]
    K --> T
    R --> T
    S --> T
    T --> U[Return Result]

    classDef default fill:#E8E8E8,stroke:#333,stroke-width:2px,color:#000
    classDef success fill:#90EE90,stroke:#2D5016,stroke-width:2px,color:#000
    classDef failure fill:#FFB6C1,stroke:#8B0000,stroke-width:2px,color:#000
    classDef search fill:#87CEEB,stroke:#104E8B,stroke-width:2px,color:#000
    classDef llm fill:#DDA0DD,stroke:#8B008B,stroke-width:2px,color:#000
    class I,R success
    class K,S failure
    class L search
    class M llm
```
A standalone module that validates Python imports against PyPI using the PyPI JSON API and web search.
Key features:
- Checks if a distribution exists on PyPI
- Generates candidate distribution names from import names (handles underscores, hyphens, etc.)
- Resolves import names to correct distribution names
- Supports known aliases (e.g., `sklearn` → `scikit-learn`, `tavily` → `tavily-python`)
- Web search integration - Uses Tavily to search for package information when PyPI validation fails
- LLM-powered extraction - Uses Claude to intelligently parse search results
- Provides detailed package information from PyPI
- Caches results for performance
Example usage:

```python
from pypi_validator import PyPIValidator

# Basic validation
validator = PyPIValidator()
is_valid, dist_name, error_msg = validator.validate_import('sklearn', {'sklearn': 'scikit-learn'})
if is_valid:
    print(f"Package found: {dist_name}")
else:
    print(f"Invalid import: {error_msg}")

# With web search enabled
validator = PyPIValidator(
    enable_web_search=True,
    model="anthropic/claude-sonnet-4-5-20250929"
)
is_valid, dist_name, error_msg = validator.validate_import('tavily_websearch')
# Web search will find 'tavily-python' as the correct package!
```

Handles web search integration for package discovery.
Key methods:
- `_extract_package_candidates_with_llm()` - Uses the LLM to parse search results and extract package names
- `_extract_package_candidates_regex()` - Fallback regex-based extraction
- `search_and_validate_package()` - Searches the web and validates candidates against PyPI
How it works:
- Searches Tavily with the query: `"Python package PyPI import {import_name} pip install"`
- Sends search results to the LLM with a structured prompt
- LLM returns JSON: `{"candidates": ["pkg1", "pkg2"], "reasoning": "explanation"}`
- Validates each candidate against PyPI
- Returns first valid package found
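The parse-then-fallback step can be sketched as below. The function name is illustrative; the real code lives in `_extract_package_candidates_with_llm()` and `_extract_package_candidates_regex()`:

```python
import json
import re

def extract_candidates_from_llm_reply(reply_text, search_text=""):
    """Parse the LLM's JSON reply; fall back to regex extraction on bad JSON."""
    try:
        data = json.loads(reply_text)
        candidates = [c.strip() for c in data.get("candidates", []) if c.strip()]
        if candidates:
            return candidates, data.get("reasoning", "")
    except (json.JSONDecodeError, TypeError, AttributeError):
        pass
    # Fallback: pull names out of 'pip install <pkg>' commands and PyPI URLs.
    pip_names = re.findall(r"pip install\s+([A-Za-z0-9._-]+)", search_text)
    url_names = re.findall(r"pypi\.org/project/([A-Za-z0-9._-]+)", search_text)
    seen, ordered = set(), []
    for name in pip_names + url_names:
        if name.lower() not in seen:
            seen.add(name.lower())
            ordered.append(name)
    return ordered, "regex fallback"
```

Because the fallback never raises, a malformed LLM reply degrades to regex extraction rather than failing the validation run.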
The PythonPackageAnalyzer class now:
- Uses `PyPIValidator` to validate all detected imports
- Supports web search via the `enable_web_search` parameter
- Accepts a `model` parameter for LLM-based search parsing
- Tracks invalid imports in `self.invalid_imports`
- Only includes validated packages in `requirements.txt`
- Provides a `get_invalid_imports()` method to retrieve validation errors
Initialization:

```python
analyzer = PythonPackageAnalyzer(
    src_dir="src",
    enable_web_search=True,  # Enable Tavily search
    model="anthropic/claude-sonnet-4-5-20250929"  # LLM for parsing
)
```

New CLI flag:
- `--search true|false` - Enable/disable Tavily web search (default: `true`)

Environment variables:
- `TAVILY_API_KEY` - Required when `--search true`; loaded from the `.env` file automatically

Startup validation:
- Checks for `TAVILY_API_KEY` when web search is enabled
- Prompts the user to continue without the key or exit
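The startup check can be sketched as follows; `check_web_search_config` and its injectable `env`/`ask` parameters are illustrative names, not the actual implementation:

```python
import os

def check_web_search_config(enable_web_search, env=os.environ, ask=input):
    """Startup check: can web search actually run with the current config?"""
    if not enable_web_search:
        return False
    if env.get("TAVILY_API_KEY"):
        return True
    # No key found: ask whether to continue with PyPI-only validation.
    answer = ask("TAVILY_API_KEY not set. Continue without web search? [y/N] ")
    if answer.strip().lower() != "y":
        raise SystemExit("Set TAVILY_API_KEY in .env or pass --search false.")
    return False
```

Injecting `env` and `ask` keeps the prompt logic testable without touching real environment variables or stdin.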
The TestEngineerAgent now:
- Passes `enable_web_search` to `PythonPackageAnalyzer`
- Passes the `model` parameter for LLM parsing
- Retrieves invalid imports from the analyzer
- Includes invalid imports in the test results dictionary
- Reports invalid imports in the compressed feedback to the software engineer
- Provides actionable suggestions for fixing invalid imports
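A minimal sketch of how the feedback step might render invalid imports for the software engineer. The record keys (`import_name`, `error`, `suggestion`) are assumed for illustration, not taken from the actual implementation:

```python
def format_invalid_import_feedback(invalid_imports):
    """Render invalid-import records as actionable feedback lines."""
    lines = []
    for item in invalid_imports:
        line = f"Invalid import '{item['import_name']}': {item['error']}"
        if item.get("suggestion"):
            # Suggest the distribution name found via alias lookup or web search.
            line += f" Did you mean '{item['suggestion']}'?"
        lines.append(line)
    return "\n".join(lines)
```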
When validating an import like `tavily_websearch`:
- Check cache - See if we've already validated this import
- Check known aliases - Look up the alias dictionary (e.g., `tavily` → `tavily-python`)
- Generate candidates - Create variations: `tavily_websearch` (original), `tavily-websearch` (underscores to hyphens), `tavilywebsearch` (no separators)
- Query PyPI - For each candidate, call `https://pypi.org/pypi/{candidate}/json`
- Web search (if enabled and PyPI fails):
  - Search Tavily for package information
  - LLM analyzes results and extracts candidate package names
  - Validate each candidate against PyPI
  - Return the first valid package found
- Return result:
  - If found via PyPI or web search: `(True, distribution_name, None)`
  - If not found: `(False, None, error_message_with_search_results)`
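The candidate-generation step above can be sketched as a small pure function. The method name `generate_candidate_names()` comes from the module; this body is an illustrative reconstruction:

```python
def generate_candidate_names(import_name):
    """Candidate distribution names, tried in order:
    original, underscores-to-hyphens, separators removed."""
    candidates = [import_name]
    if "_" in import_name:
        candidates.append(import_name.replace("_", "-"))
        candidates.append(import_name.replace("_", ""))
    # De-duplicate while preserving the original try order.
    seen = set()
    return [c for c in candidates if not (c in seen or seen.add(c))]
```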
```mermaid
%%{init: {'theme':'base', 'themeVariables': { 'actorBkg':'#1657AD','actorBorder':'#aaa','actorTextColor':'#aaa','actorLineColor':'#aaa','signalColor':'#aaa','signalTextColor':'#aaa','labelBoxBkgColor':'#B0C4DE','labelBoxBorderColor':'#aaa','loopTextColor':'#aaa','activationBorderColor':'#aaa','activationBkgColor':'#B0C4DE','sequenceNumberColor':'#aaa','noteBorderColor':'#aaa'}}}%%
sequenceDiagram
    participant SE as Software Engineer
    participant TE as Test Engineer
    participant PA as Package Analyzer
    participant PV as PyPI Validator
    participant TS as Tavily Search
    participant LLM as LLM
    participant PyPI as PyPI API
    SE->>TE: Generated code with imports
    TE->>PA: Analyze imports
    PA->>PV: Validate 'tavily_websearch'
    PV->>PyPI: Check candidates
    PyPI-->>PV: Not found
    PV->>TS: Search web for package info
    TS-->>PV: Search results (URLs, content)
    PV->>LLM: Parse results, extract candidates
    LLM-->>PV: {"candidates": ["tavily-python"], "reasoning": "..."}
    PV->>PyPI: Validate 'tavily-python'
    PyPI-->>PV: Found!
    PV-->>PA: Valid: 'tavily-python'
    PA-->>TE: Valid packages + invalid imports
    TE->>SE: Feedback with corrections
```
```
Software Engineer generates code
        ↓
TestEngineer extracts imports
        ↓
PythonPackageAnalyzer.analyze()
        ↓
PyPIValidator validates each import
        ↓
    [PyPI Check]
        ↓
Found? → Valid imports → requirements.txt
        ↓
Not Found & Web Search Enabled?
        ↓
  [Tavily Web Search]
        ↓
[LLM Candidate Extraction]
        ↓
[Validate Candidates on PyPI]
        ↓
Found? → Valid imports → requirements.txt
        ↓
Still Not Found → invalid_imports list
        ↓
Test results include invalid_imports
        ↓
Compressed feedback sent to Software Engineer
        ↓
Software Engineer fixes invalid imports
```
Run the test script to see the validator in action:

```shell
python test_pypi_validator.py
```

This will demonstrate:
- Validating individual imports
- Resolving aliases (sklearn → scikit-learn, tavily → tavily-python)
- Detecting hallucinated packages
- Web search finding correct packages (tavily_websearch → tavily-python)
- LLM reasoning for candidate selection
- Batch validation
- Error reporting
Example output:

```
tavily_websearch (Possibly hallucinated package)
Searching web for information about 'tavily_websearch'...
LLM reasoning: Both packages are directly related to Tavily search functionality...
✓ Web search found valid package: tavily-python
✓ VALID: tavily-python
Summary: Python wrapper for the Tavily API
```
The system includes these built-in aliases:
| Import Name | Distribution Name |
|---|---|
| `sklearn` | `scikit-learn` |
| `dotenv` | `python-dotenv` |
| `tavily` | `tavily-python` |
| `bs4` | `beautifulsoup4` |
You can add more aliases by modifying the `package_aliases` dictionary in `PythonPackageAnalyzer.__init__()`.
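A sketch of how that lookup works and how to extend it. The dictionary contents mirror the table above; the `PIL` → `Pillow` entry and the `resolve_alias` helper are illustrative additions:

```python
# Built-in aliases; the real dictionary lives in PythonPackageAnalyzer.__init__().
package_aliases = {
    "sklearn": "scikit-learn",
    "dotenv": "python-dotenv",
    "tavily": "tavily-python",
    "bs4": "beautifulsoup4",
}

# Adding a custom alias, e.g. Pillow's import name:
package_aliases["PIL"] = "Pillow"

def resolve_alias(import_name, aliases=package_aliases):
    """Map an import name to its PyPI distribution name, if an alias exists."""
    return aliases.get(import_name, import_name)
```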
```shell
# Required for web search
export TAVILY_API_KEY="tvly-your-api-key-here"

# Or use .env file
echo "TAVILY_API_KEY=tvly-your-api-key-here" > .env
```

```shell
# Enable web search (default)
python agents_from_scratch_docker.py --search true

# Disable web search
python agents_from_scratch_docker.py --search false

# Specify model for LLM parsing
python agents_from_scratch_docker.py --model anthropic/claude-sonnet-4-5-20250929
```

```python
# PyPI Validator
validator = PyPIValidator(
    enable_web_search=True,  # Enable Tavily search
    model="anthropic/claude-sonnet-4-5-20250929"  # LLM model
)

# Package Analyzer
analyzer = PythonPackageAnalyzer(
    src_dir="src",
    enable_web_search=True,
    model="anthropic/claude-sonnet-4-5-20250929"
)
```

Configurable parameters:
- Timeout: change `self.timeout` in `PyPIValidator.__init__()` (default: 5 seconds)
- Aliases: add to `self.package_aliases` in `PythonPackageAnalyzer.__init__()`
- Cache size: modify the `@lru_cache(maxsize=256)` decorator
- Candidate generation: customize the `generate_candidate_names()` method
- Web search model: change the `model` parameter (default: Claude Sonnet 4.5)
- Search depth: modify `search_depth` in `search_and_validate_package()` (basic/advanced)
The validator gracefully handles:
- Network timeouts (5-second timeout per request)
- HTTP errors (treats as package not found)
- PyPI API unavailability (treats as unknown, allows to proceed)
- Rate limiting (uses caching to minimize requests)
- Missing TAVILY_API_KEY (prompts user or falls back to PyPI-only validation)
- Tavily API failures (falls back to regex-based extraction)
- LLM parsing errors (falls back to regex-based extraction)
- Invalid JSON from LLM (gracefully handles and retries with fallback)
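The PyPI error-handling rules above can be sketched as a three-valued check. The function and its injectable `fetch` parameter are illustrative, not the module's actual API:

```python
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

def check_distribution(name, fetch=None, timeout=5):
    """Return True/False when PyPI answers, or None when PyPI is unreachable.

    `fetch` is injectable so the error paths can be exercised without a network.
    """
    if fetch is None:
        def fetch(url):  # default: hit the PyPI JSON API
            with urlopen(url, timeout=timeout) as resp:
                return resp.status
    try:
        return fetch(f"https://pypi.org/pypi/{name}/json") == 200
    except HTTPError:
        return False  # HTTP errors: treat as package not found
    except (URLError, TimeoutError):
        return None   # PyPI unavailable: unknown, allow to proceed
```

The `None` result is what lets the system proceed with unvalidated packages when PyPI itself is down, rather than flagging every import as invalid.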
- Caching: results are cached using `@lru_cache(maxsize=256)` and instance-level caching
- Timeouts: 5-second timeout per PyPI request, 10-second timeout for web searches
- LLM calls: Only invoked when PyPI validation fails and web search is enabled
- Smart fallbacks: Regex extraction if LLM fails, PyPI-only if web search unavailable
- Candidate validation: Stops at first valid package found
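The caching behavior can be demonstrated with a toy resolver; the counter and the trivial resolution body are purely illustrative:

```python
from functools import lru_cache

lookups = {"count": 0}  # illustrative counter to show the cache working

@lru_cache(maxsize=256)
def resolve_distribution(import_name):
    lookups["count"] += 1
    # ...real work would go here: alias lookup, candidate generation, PyPI queries...
    return import_name.replace("_", "-")

resolve_distribution("tavily_websearch")
resolve_distribution("tavily_websearch")  # cache hit: no second lookup
```

Repeated validations of the same import name cost a dictionary lookup instead of a network round trip.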
```
Input: tavily_websearch
  → PyPI check fails
  → Web search finds: "pip install tavily-python"
  → LLM extracts: "tavily-python"
  → Validates on PyPI: ✓
Result: tavily-python
```

```
Input: this_is_definitely_fake_12345
  → PyPI check fails
  → Web search finds: Generic Python packaging info
  → LLM analysis: "No packages related to this import"
Result: Invalid with helpful context
```

```
Input: tavily-websear5ch (typo)
  → PyPI check fails
  → Web search finds: tavily-python documentation
  → LLM reasoning: "Appears to be a typo of 'tavily'"
  → Validates: tavily-python ✓
Result: tavily-python
```
Potential improvements:
- Parallel validation - Validate multiple packages concurrently
- Local database - Maintain a local cache of import → distribution mappings
- Fuzzy matching - Suggest similar package names for typos (beyond web search)
- Version checking - Validate specific version requirements
- Dependency analysis - Check if dependencies are also valid
- Custom PyPI mirrors - Support private PyPI repositories
- Search result caching - Cache web search results to reduce API calls
- Multi-LLM support - Allow different models for different use cases
Validator reports false negatives (valid packages marked invalid):
- Add the import → distribution mapping to `package_aliases`
- Enable web search with `--search true`
- Check if the package name uses non-standard separators
Web search not finding packages:
- Verify `TAVILY_API_KEY` is set correctly
- Check network connectivity to the Tavily API
- Run the test script to verify search is working
LLM extraction failing:
- Check litellm configuration and API keys
- Verify model name is correct
- System will fall back to regex extraction automatically
Validator is too slow:
- Results are cached, so subsequent runs should be faster
- Disable web search with `--search false` if not needed
- Consider increasing the cache size or implementing persistent caching
PyPI requests timing out:
- Check network connectivity
- Increase the timeout in `PyPIValidator.__init__()`
- The system will continue with unvalidated packages if PyPI is unavailable
TAVILY_API_KEY warnings:
- Set the key in the `.env` file or environment
- Or disable web search with `--search false`
- The system will prompt before continuing without the key
Required:
- `requests` - PyPI API calls
- `litellm` - LLM integration

Optional (for web search):
- `tavily-python` - Tavily web search API
- `python-dotenv` - Environment variable management
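Taken together, the corresponding `requirements.txt` might look like this (a sketch; pin versions as your project requires):

```
requests
litellm
tavily-python
python-dotenv
```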
Install all dependencies:
```shell
pip install -r requirements.txt
```