
PyPI Import Validation

This document describes the PyPI validation feature that helps detect hallucinated or invalid Python package imports generated by the software engineer agent.

Overview

When the software engineer agent generates Python code, it may sometimes include imports for packages that don't actually exist on PyPI (hallucinated packages) or use incorrect import names. This validation system:

  1. Validates imports against PyPI - checks whether each imported package actually exists on PyPI
  2. Resolves import names to distribution names - converts import names (e.g., sklearn) to the correct PyPI distribution names (e.g., scikit-learn)
  3. Falls back to web search - uses Tavily web search to find correct package names when PyPI validation fails
  4. Extracts candidates with an LLM - intelligently parses web search results to identify valid package candidates
  5. Provides feedback to the software engineer - reports invalid imports back to the agent so it can fix them
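Conceptually, these steps compose into a single resolution function. The sketch below is illustrative only (the function name, signature, and tuple contract are assumptions; the real API lives in pypi_validator.py):

```python
def validate(import_name, known_aliases, pypi_exists, web_search=None):
    """Return (is_valid, distribution_name, error_message)."""
    # 1-2. Resolve known aliases first (e.g. sklearn -> scikit-learn)
    name = known_aliases.get(import_name, import_name)
    # 3-4. Try candidate distribution names against PyPI
    for candidate in (name, name.replace("_", "-"), name.replace("_", "")):
        if pypi_exists(candidate):
            return True, candidate, None
    # 5. Fall back to web search (Tavily + LLM) when PyPI turns up nothing
    if web_search is not None:
        found = web_search(import_name)
        if found:
            return True, found, None
    return False, None, f"No PyPI distribution found for '{import_name}'"
```

Here pypi_exists and web_search are injected callables, which keeps the sketch testable without network access.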

Architecture Diagram

flowchart TD
    A[Import Name] --> B{Check Cache}
    B -->|Cached| C[Return Cached Result]
    B -->|Not Cached| D{Check Known Aliases}
    D -->|Found| E[Validate Alias on PyPI]
    D -->|Not Found| F[Generate Candidate Names]
    F --> G[Try: original, hyphenated, no-separator]
    G --> H{Query PyPI API}
    H -->|Found| I[✓ Valid Package]
    H -->|Not Found| J{Web Search Enabled?}
    J -->|No| K[✗ Invalid Package]
    J -->|Yes| L[Tavily Web Search]
    L --> M[LLM Extracts Candidates]
    M --> N[Parse pip install commands]
    M --> O[Parse PyPI URLs]
    M --> P[Analyze content]
    N --> Q[Validate Each Candidate]
    O --> Q
    P --> Q
    Q -->|Found Valid| R[✓ Valid Package via Web Search]
    Q -->|None Valid| S[✗ Invalid with Search Context]

    E --> I
    I --> T[Cache Result]
    K --> T
    R --> T
    S --> T
    T --> U[Return Result]

    classDef default fill:#E8E8E8,stroke:#333,stroke-width:2px,color:#000
    classDef success fill:#90EE90,stroke:#2D5016,stroke-width:2px,color:#000
    classDef failure fill:#FFB6C1,stroke:#8B0000,stroke-width:2px,color:#000
    classDef search fill:#87CEEB,stroke:#104E8B,stroke-width:2px,color:#000
    classDef llm fill:#DDA0DD,stroke:#8B008B,stroke-width:2px,color:#000

    class I,R success
    class K,S failure
    class L search
    class M llm

Components

1. pypi_validator.py

A standalone module that validates Python imports against PyPI using the PyPI JSON API and web search.

Key features:

  • Checks if a distribution exists on PyPI
  • Generates candidate distribution names from import names (handles underscores, hyphens, etc.)
  • Resolves import names to correct distribution names
  • Supports known aliases (e.g., sklearn → scikit-learn, tavily → tavily-python)
  • Web search integration - Uses Tavily to search for package information when PyPI validation fails
  • LLM-powered extraction - Uses Claude to intelligently parse search results
  • Provides detailed package information from PyPI
  • Caches results for performance

Example usage:

from pypi_validator import PyPIValidator

# Basic validation
validator = PyPIValidator()
is_valid, dist_name, error_msg = validator.validate_import('sklearn', {'sklearn': 'scikit-learn'})

if is_valid:
    print(f"Package found: {dist_name}")
else:
    print(f"Invalid import: {error_msg}")

# With web search enabled
validator = PyPIValidator(
    enable_web_search=True,
    model="anthropic/claude-sonnet-4-5-20250929"
)
is_valid, dist_name, error_msg = validator.validate_import('tavily_websearch')
# Web search will find 'tavily-python' as the correct package!

2. TavilySearchHelper Class

Handles web search integration for package discovery.

Key methods:

  • _extract_package_candidates_with_llm() - Uses LLM to parse search results and extract package names
  • _extract_package_candidates_regex() - Fallback regex-based extraction
  • search_and_validate_package() - Searches web and validates candidates against PyPI

How it works:

  1. Searches Tavily with query: "Python package PyPI import {import_name} pip install"
  2. Sends search results to LLM with structured prompt
  3. LLM returns JSON: {"candidates": ["pkg1", "pkg2"], "reasoning": "explanation"}
  4. Validates each candidate against PyPI
  5. Returns first valid package found
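The JSON-first parsing with a regex fallback (steps 2–3 above) can be sketched as a standalone helper. This is a hypothetical simplification of the class's _extract_package_candidates_with_llm() and _extract_package_candidates_regex() methods:

```python
import json
import re

def extract_candidates(llm_reply):
    """Prefer the LLM's structured JSON reply; fall back to scraping
    `pip install <name>` commands out of the raw text."""
    try:
        data = json.loads(llm_reply)
        return [c for c in data.get("candidates", []) if isinstance(c, str)]
    except (json.JSONDecodeError, TypeError, AttributeError):
        # Fallback: regex over pip install commands in the raw text
        return re.findall(r"pip install\s+([A-Za-z0-9._\-]+)", llm_reply)
```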

3. Updates to docker_test.py

The PythonPackageAnalyzer class now:

  • Uses PyPIValidator to validate all detected imports
  • Supports web search via enable_web_search parameter
  • Accepts model parameter for LLM-based search parsing
  • Tracks invalid imports in self.invalid_imports
  • Only includes validated packages in requirements.txt
  • Provides a get_invalid_imports() method to retrieve validation errors

Initialization:

analyzer = PythonPackageAnalyzer(
    src_dir="src",
    enable_web_search=True,  # Enable Tavily search
    model="anthropic/claude-sonnet-4-5-20250929"  # LLM for parsing
)

4. Updates to agents_from_scratch_docker.py

New CLI flag:

  • --search true|false - Enable/disable Tavily web search (default: true)

Environment variables:

  • TAVILY_API_KEY - Required when --search true
  • Loaded from .env file automatically

Startup validation:

  • Checks for TAVILY_API_KEY when web search is enabled
  • Prompts user to continue without key or exit

The TestEngineerAgent now:

  • Passes enable_web_search to PythonPackageAnalyzer
  • Passes model parameter for LLM parsing
  • Retrieves invalid imports from the analyzer
  • Includes invalid imports in the test results dictionary
  • Reports invalid imports in the compressed feedback to the software engineer
  • Provides actionable suggestions for fixing invalid imports

How It Works

Validation Algorithm

When validating an import like tavily_websearch:

  1. Check cache - See if we've already validated this import
  2. Check known aliases - Look up in the alias dictionary (e.g., tavily → tavily-python)
  3. Generate candidates - Create variations:
    • tavily_websearch (original)
    • tavily-websearch (underscores to hyphens)
    • tavilywebsearch (no separators)
  4. Query PyPI - For each candidate, call https://pypi.org/pypi/{candidate}/json
  5. Web Search (if enabled and PyPI fails):
    • Search Tavily for package information
    • LLM analyzes results and extracts candidate package names
    • Validate each candidate against PyPI
    • Return first valid package found
  6. Return result:
    • If found via PyPI or web search: (True, distribution_name, None)
    • If not found: (False, None, error_message_with_search_results)
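Steps 3 and 4 above can be sketched as follows (a simplified stand-in for the validator's generate_candidate_names(); the real method may handle more separator variants):

```python
def generate_candidate_names(import_name):
    """Most likely PyPI distribution names first, de-duplicated in order."""
    raw = [
        import_name,                    # original, e.g. tavily_websearch
        import_name.replace("_", "-"),  # underscores to hyphens
        import_name.replace("_", ""),   # no separators
    ]
    seen = set()
    return [c for c in raw if not (c in seen or seen.add(c))]
```

Each candidate is then checked against https://pypi.org/pypi/{candidate}/json in order, stopping at the first hit.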

Web Search Enhanced Flow

%%{init: {'theme':'base', 'themeVariables': { 'actorBkg':'#1657AD','actorBorder':'#aaa','actorTextColor':'#aaa','actorLineColor':'#aaa','signalColor':'#aaa','signalTextColor':'#aaa','labelBoxBkgColor':'#B0C4DE','labelBoxBorderColor':'#aaa','loopTextColor':'#aaa','activationBorderColor':'#aaa','activationBkgColor':'#B0C4DE','sequenceNumberColor':'#aaa','noteBorderColor':'#aaa'}}}%%
sequenceDiagram
    participant SE as Software Engineer
    participant TE as Test Engineer
    participant PA as Package Analyzer
    participant PV as PyPI Validator
    participant TS as Tavily Search
    participant LLM as LLM
    participant PyPI as PyPI API

    SE->>TE: Generated code with imports
    TE->>PA: Analyze imports
    PA->>PV: Validate 'tavily_websearch'
    PV->>PyPI: Check candidates
    PyPI-->>PV: Not found
    PV->>TS: Search web for package info
    TS-->>PV: Search results (URLs, content)
    PV->>LLM: Parse results, extract candidates
    LLM-->>PV: {"candidates": ["tavily-python"], "reasoning": "..."}
    PV->>PyPI: Validate 'tavily-python'
    PyPI-->>PV: Found!
    PV-->>PA: Valid: 'tavily-python'
    PA-->>TE: Valid packages + invalid imports
    TE->>SE: Feedback with corrections

Integration Flow

Software Engineer generates code
         ↓
TestEngineer extracts imports
         ↓
PythonPackageAnalyzer.analyze()
         ↓
PyPIValidator validates each import
         ↓
    [PyPI Check]
         ↓
    Found? → Valid imports → requirements.txt
         ↓
    Not Found & Web Search Enabled?
         ↓
    [Tavily Web Search]
         ↓
    [LLM Candidate Extraction]
         ↓
    [Validate Candidates on PyPI]
         ↓
    Found? → Valid imports → requirements.txt
         ↓
    Still Not Found → invalid_imports list
         ↓
Test results include invalid_imports
         ↓
Compressed feedback sent to Software Engineer
         ↓
Software Engineer fixes invalid imports

Testing

Run the test script to see the validator in action:

python test_pypi_validator.py

This will demonstrate:

  • Validating individual imports
  • Resolving aliases (sklearn → scikit-learn, tavily → tavily-python)
  • Detecting hallucinated packages
  • Web search finding correct packages (tavily_websearch → tavily-python)
  • LLM reasoning for candidate selection
  • Batch validation
  • Error reporting

Example output:

tavily_websearch               (Possibly hallucinated package)
Searching web for information about 'tavily_websearch'...
   LLM reasoning: Both packages are directly related to Tavily search functionality...
✓ Web search found valid package: tavily-python
  ✓ VALID: tavily-python
  Summary: Python wrapper for the Tavily API

Common Aliases

The system includes these built-in aliases:

Import Name    Distribution Name
sklearn        scikit-learn
dotenv         python-dotenv
tavily         tavily-python
bs4            beautifulsoup4

You can add more aliases by modifying the package_aliases dictionary in PythonPackageAnalyzer.__init__().
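As a sketch of that extension point (the cv2 → opencv-python entry below is an illustrative custom addition, not one of the documented built-ins):

```python
# Built-in aliases as documented above
package_aliases = {
    "sklearn": "scikit-learn",
    "dotenv": "python-dotenv",
    "tavily": "tavily-python",
    "bs4": "beautifulsoup4",
}

# Adding a custom mapping, e.g. OpenCV's import name
package_aliases["cv2"] = "opencv-python"
```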

Configuration

Environment Variables

# Required for web search
export TAVILY_API_KEY="tvly-your-api-key-here"

# Or use .env file
echo "TAVILY_API_KEY=tvly-your-api-key-here" > .env

CLI Arguments

# Enable web search (default)
python agents_from_scratch_docker.py --search true

# Disable web search
python agents_from_scratch_docker.py --search false

# Specify model for LLM parsing
python agents_from_scratch_docker.py --model anthropic/claude-sonnet-4-5-20250929

Code Configuration

# PyPI Validator
validator = PyPIValidator(
    enable_web_search=True,  # Enable Tavily search
    model="anthropic/claude-sonnet-4-5-20250929"  # LLM model
)

# Package Analyzer
analyzer = PythonPackageAnalyzer(
    src_dir="src",
    enable_web_search=True,
    model="anthropic/claude-sonnet-4-5-20250929"
)

Configurable parameters:

  1. Timeout: Change self.timeout in PyPIValidator.__init__() (default: 5 seconds)
  2. Aliases: Add to self.package_aliases in PythonPackageAnalyzer.__init__()
  3. Cache size: Modify @lru_cache(maxsize=256) decorator
  4. Candidate generation: Customize generate_candidate_names() method
  5. Web search model: Change model parameter (default: Claude Sonnet 4.5)
  6. Search depth: Modify search_depth in search_and_validate_package() (basic/advanced)

Error Handling

The validator gracefully handles:

  • Network timeouts (5-second timeout per request)
  • HTTP errors (treats as package not found)
  • PyPI API unavailability (treats the package as unknown and allows validation to proceed)
  • Rate limiting (uses caching to minimize requests)
  • Missing TAVILY_API_KEY (prompts user or falls back to PyPI-only validation)
  • Tavily API failures (falls back to regex-based extraction)
  • LLM parsing errors (falls back to regex-based extraction)
  • Invalid JSON from LLM (gracefully handles and retries with fallback)
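The distinction between HTTP errors (not found) and API unavailability (unknown) can be sketched like this. The tri-state return value and the fetch parameter are assumptions for illustration, not the module's actual internals:

```python
def classify_pypi_response(fetch, name):
    """True = exists, False = not found, None = unknown (allow to proceed)."""
    try:
        status = fetch(name)       # stand-in for GET pypi.org/pypi/{name}/json
    except (TimeoutError, OSError):
        return None                # PyPI unreachable: don't block the build
    return True if status == 200 else False  # 404 and other HTTP errors
```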

Performance

  • Caching: Results are cached using @lru_cache(maxsize=256) and instance-level caching
  • Timeouts: 5-second timeout per PyPI request, 10-second timeout for web searches
  • LLM calls: Only invoked when PyPI validation fails and web search is enabled
  • Smart fallbacks: Regex extraction if LLM fails, PyPI-only if web search unavailable
  • Candidate validation: Stops at first valid package found
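The caching behaviour can be illustrated with functools.lru_cache. This is a toy resolver that only counts lookups; the real validator caches full validation results:

```python
from functools import lru_cache

lookups = {"count": 0}

@lru_cache(maxsize=256)
def resolve(import_name):
    lookups["count"] += 1          # stands in for an expensive PyPI request
    return import_name.replace("_", "-")

resolve("tavily_websearch")
resolve("tavily_websearch")        # repeat call is served from the cache
```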

Web Search Intelligence Examples

Example 1: Typo Detection

Input: tavily_websearch
→ PyPI check fails
→ Web search finds: "pip install tavily-python"
→ LLM extracts: "tavily-python"
→ Validates on PyPI: ✓
Result: tavily-python

Example 2: Hallucinated Package

Input: this_is_definitely_fake_12345
→ PyPI check fails
→ Web search finds: Generic Python packaging info
→ LLM analysis: "No packages related to this import"
→ Result: Invalid with helpful context

Example 3: Alternative Names

Input: tavily-websear5ch (typo)
→ PyPI check fails
→ Web search finds: tavily-python documentation
→ LLM reasoning: "Appears to be a typo of 'tavily'"
→ Validates: tavily-python ✓
Result: tavily-python

Future Enhancements

Potential improvements:

  1. Parallel validation - Validate multiple packages concurrently
  2. Local database - Maintain a local cache of import → distribution mappings
  3. Fuzzy matching - Suggest similar package names for typos (beyond web search)
  4. Version checking - Validate specific version requirements
  5. Dependency analysis - Check if dependencies are also valid
  6. Custom PyPI mirrors - Support private PyPI repositories
  7. Search result caching - Cache web search results to reduce API calls
  8. Multi-LLM support - Allow different models for different use cases
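Enhancement 1 could be sketched with a thread pool, since each PyPI lookup is independent I/O. Here check is a stand-in for the per-package lookup:

```python
from concurrent.futures import ThreadPoolExecutor

def validate_all(import_names, check):
    """Run the per-import check concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return dict(zip(import_names, pool.map(check, import_names)))
```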

Troubleshooting

Validator reports false negatives (valid packages marked invalid):

  • Add the import → distribution mapping to package_aliases
  • Enable web search with --search true
  • Check if the package name uses non-standard separators

Web search not finding packages:

  • Verify TAVILY_API_KEY is set correctly
  • Check network connectivity to Tavily API
  • Try running the test script to verify search is working

LLM extraction failing:

  • Check litellm configuration and API keys
  • Verify model name is correct
  • System will fall back to regex extraction automatically

Validator is too slow:

  • Results are cached, so subsequent runs should be faster
  • Disable web search with --search false if not needed
  • Consider increasing cache size or implementing persistent caching

PyPI requests timing out:

  • Check network connectivity
  • Increase timeout in PyPIValidator.__init__()
  • The system will continue with unvalidated packages if PyPI is unavailable

TAVILY_API_KEY warnings:

  • Set the key in .env file or environment
  • Or disable web search with --search false
  • System will prompt before continuing without the key

Dependencies

Required:

  • requests - PyPI API calls
  • litellm - LLM integration

Optional (for web search):

  • tavily-python - Tavily web search API
  • python-dotenv - Environment variable management

Install all dependencies:

pip install -r requirements.txt