The cli_models module defines the data models and configuration structures for the CodeWiki CLI. It serves as the bridge between user-facing CLI commands and the backend documentation generation system, providing:
- Persistent Configuration Management: User settings stored in
~/.codewiki/config.json - Job Lifecycle Tracking: Complete representation of documentation generation jobs
- Configuration Bridge: Conversion from CLI settings to backend configuration
- Type-Safe Data Handling: Dataclass-based models for type safety and serialization
This module is essential for maintaining separation of concerns between the CLI layer and the backend documentation generation system.
Custom instructions for the documentation agent that allow fine-grained control over documentation generation.
Key Responsibilities:
- Define file filtering patterns (include/exclude)
- Specify module focus areas
- Set documentation type preferences
- Provide custom LLM instructions
Key Methods:
to_dict()/from_dict(): Serialization for persistent storageis_empty(): Check if any instructions are definedget_prompt_addition(): Generate LLM prompt additions
Configuration Options:
- include_patterns: List[str] # e.g., ["*.cs", "*.py"]
- exclude_patterns: List[str] # e.g., ["*Tests*", "*Specs*"]
- focus_modules: List[str] # e.g., ["src/core", "src/api"]
- doc_type: str # api, architecture, user-guide, developer
- custom_instructions: str # Free-form instructionsPersistent CodeWiki CLI configuration stored in ~/.codewiki/config.json.
Key Responsibilities:
- Store LLM provider settings (base URL, models, API keys)
- Manage token limits and generation parameters
- Maintain default output directory
- Validate all configuration fields
- Convert to backend
Configfor documentation generation
Key Attributes:
- LLM Settings:
base_url,main_model,cluster_model,fallback_model,provider - Provider Specifics:
aws_region,api_version,azure_deployment - Token Limits:
max_tokens,max_token_per_module,max_token_per_leaf_module - Generation Parameters:
max_depth,default_output - Custom Instructions:
agent_instructions(AgentInstructions)
Key Methods:
validate(): Validates all fields based on provider typeis_complete(): Checks if required fields are setto_backend_config(): Converts CLI config to backend Config for a specific job
Provider Support:
- OpenAI-compatible (default)
- Azure OpenAI
- AWS Bedrock
- Anthropic (claude-code, codex - subscription mode)
Enumeration tracking the lifecycle of a documentation generation job.
States:
PENDING: Job created, awaiting executionRUNNING: Job actively generating documentationCOMPLETED: Job finished successfullyFAILED: Job encountered an error
Options controlling how documentation is generated for a specific job.
Options:
create_branch: Create a git branch for generated docsgithub_pages: Generate GitHub Pages compatible documentationno_cache: Skip caching, force regenerationcustom_output: Override default output directory
Metrics collected during documentation generation.
Metrics:
total_files_analyzed: Number of source files processedleaf_nodes: Number of leaf modules identifiedmax_depth: Maximum depth of module hierarchytotal_tokens_used: Total tokens consumed by LLM
Snapshot of LLM configuration used for a specific job.
Captured Settings:
main_model: Primary model usedcluster_model: Model used for module clusteringbase_url: LLM API endpoint
Complete representation of a documentation generation job from creation to completion.
Core Attributes:
- Identification:
job_id(UUID),repository_name,repository_path - Git Context:
commit_hash,branch_name - Output:
output_directory,files_generated - Execution:
status,timestamp_start,timestamp_end - Error Handling:
error_message - Configuration:
generation_options,llm_config - Results:
module_count,statistics
Key Methods:
start(): Mark job as running, record start timecomplete(): Mark job as completed, record end timefail(): Mark job as failed, record error messageto_dict()/from_dict(): Serialization for storage/retrievalto_json(): Generate JSON representation
Lifecycle:
Created (PENDING) → Start → Running → Complete/Fail → COMPLETED/FAILED
The module provides a critical bridge between CLI configuration and backend execution:
Persistent Configuration (~/.codewiki/config.json)
↓
Configuration
├── base_url, models, provider settings
├── token limits, generation parameters
└── agent_instructions (AgentInstructions)
↓
to_backend_config() [with runtime parameters]
↓
Backend Config (from shared_config_and_utils)
├── repo_path, output_dir
├── api_key, llm_base_url
├── model selections, token limits
└── merged agent_instructions
↓
Documentation Generation Backend
└── llm_backends, documentation_generation
Job Creation
↓ (status = PENDING)
Job Start
↓ (status = RUNNING, timestamp_start set)
Documentation Generation
├── LLM generates documentation
└── Metrics accumulated
↓
Success or Failure
├── complete() → COMPLETED
└── fail() → FAILED
Configuration Model
├── AgentInstructions
│ ├── include_patterns, exclude_patterns
│ ├── focus_modules, doc_type
│ └── custom_instructions
└── Conversion Method: to_backend_config()
└── Creates Backend Config with
├── Merged instructions
├── Repository context
└── LLM credentials
DocumentationJob
├── JobStatus (enum)
├── GenerationOptions
├── LLMConfig (snapshot)
├── JobStatistics
└── Serialization Methods
├── to_dict(), from_dict()
└── to_json()
Configuration
├── AgentInstructions (custom filters and instructions)
├── LLM Settings (base_url, models, provider)
├── Token Limits (max_tokens, max_token_per_module)
├── Generation Parameters (max_depth, default_output)
└── Serialization Methods
├── to_dict() → JSON for ~/.codewiki/config.json
├── from_dict() → Load from config file
└── to_backend_config() → Convert to Backend Config
PENDING (created)
↓ job.start()
RUNNING (executing)
├─ job.complete() → COMPLETED (success)
└─ job.fail(message) → FAILED (error)
DocumentationJob contains:
- JobStatus (state enum)
- LLMConfig (model snapshot)
- GenerationOptions (job settings)
- JobStatistics (execution metrics)
cli_models Module
├── Configuration ──→ cli_core/ConfigManager
│ (reads/writes ~/.codewiki/config.json)
│
├── DocumentationJob ──→ cli_core/CLIDocumentationGenerator
│ cli_utils/CLILogger, ProgressTracker
│
└── to_backend_config() ──→ Backend System
(documentation generation)
Input:
- Persistent Configuration (from ~/.codewiki/config.json)
- Runtime Parameters (repo path, output directory)
- API Key (from keyring/environment)
Processing:
1. Load Configuration
2. Validate fields (provider-specific)
3. Merge agent instructions (runtime overrides persistent)
4. Inject credentials
5. Create Backend Config
Output:
- Backend Config (ready for documentation generation)
- Preserves API key securely
- Maintains audit trail
- CLI
Configurationpersists user settings independently to_backend_config()creates a runtime configuration with:- Job-specific parameters (repo path, output directory)
- Merged agent instructions (runtime + persistent)
- Validated credentials and settings
- Complete state machine representation
- Immutable fields at creation time
- Mutable fields during execution
- Comprehensive serialization for audit trails
- Dataclass-based models for type hints
- Field validation in
Configuration.validate() - Provider-specific validation logic
- Safe enum-based status representation
- CLI models independent of backend implementation
- Job metadata separate from documentation content
- Configuration separate from execution context
- Statistics separate from core job tracking
- ConfigManager: Reads/writes
Configurationfrom/to~/.codewiki/config.json - CLIDocumentationGenerator: Uses
Configurationto createDocumentationJob - GitManager: Updates
DocumentationJobwith git context
- Config: Receives converted
Configurationviato_backend_config() - Backend System: Uses converted
Configfor actual generation
- LLMBackend: Receives
LLMConfigsnapshot for initialization - Provider Initialization: Uses provider-specific fields from
Configuration
- CLILogger: Logs job status transitions
- ProgressTracker: Tracks job execution progress
from codewiki.cli.config_manager import ConfigManager
from codewiki.cli.models.config import Configuration
# Load from ~/.codewiki/config.json
config_mgr = ConfigManager()
config: Configuration = config_mgr.load()
# Validate before use
config.validate()# Create job with runtime overrides
job = DocumentationJob(
repository_path="/path/to/repo",
repository_name="my-repo",
output_directory="docs"
)
# Convert CLI config to backend config
backend_config = config.to_backend_config(
repo_path=job.repository_path,
output_dir=job.output_directory,
api_key=api_key_from_keyring,
runtime_instructions=None # or custom AgentInstructions
)
# backend_config now ready for documentation generation# Create and start job
job = DocumentationJob(
repository_path="/path/to/repo",
repository_name="my-repo",
output_directory="docs"
)
job.start() # Sets status to RUNNING, timestamp_start
# During generation
job.files_generated.append("README.md")
job.statistics.total_files_analyzed = 42
# On completion
job.complete() # Sets status to COMPLETED, timestamp_end
# Or on failure
job.fail("Insufficient tokens for module clustering")
# Serialize for storage
job_json = job.to_json()from codewiki.cli.models.config import AgentInstructions
instructions = AgentInstructions(
doc_type="api",
focus_modules=["src/api", "src/models"],
exclude_patterns=["*_test.py", "*Tests*"],
custom_instructions="Emphasize error handling patterns"
)
# Generate prompt addition for LLM
prompt_addition = instructions.get_prompt_addition()
# → "Focus on API documentation: endpoints, parameters...\n
# Pay special attention to: src/api, src/models\n
# ..."{
"base_url": "https://api.example.com/v1",
"main_model": "gpt-4-turbo",
"cluster_model": "gpt-4",
"fallback_model": "glm-4p5",
"default_output": "docs",
"provider": "openai-compatible",
"aws_region": "us-east-1",
"api_version": "2024-12-01-preview",
"azure_deployment": "",
"max_tokens": 32768,
"max_token_per_module": 36369,
"max_token_per_leaf_module": 16000,
"max_depth": 2,
"agent_instructions": {
"include_patterns": ["*.py", "*.ts"],
"exclude_patterns": ["*test*", "*__pycache__*"],
"focus_modules": ["src/core"],
"doc_type": "architecture",
"custom_instructions": "Focus on system design"
}
}{
"job_id": "550e8400-e29b-41d4-a716-446655440000",
"repository_path": "/home/user/my-repo",
"repository_name": "my-repo",
"output_directory": "/home/user/my-repo/docs",
"commit_hash": "abc123def456",
"branch_name": "main",
"timestamp_start": "2024-05-23T10:30:00.000000",
"timestamp_end": "2024-05-23T11:45:30.000000",
"status": "completed",
"error_message": null,
"files_generated": ["README.md", "ARCHITECTURE.md", "API.md"],
"module_count": 8,
"generation_options": {
"create_branch": false,
"github_pages": true,
"no_cache": false,
"custom_output": null
},
"llm_config": {
"main_model": "gpt-4-turbo",
"cluster_model": "gpt-4",
"base_url": "https://api.example.com/v1"
},
"statistics": {
"total_files_analyzed": 156,
"leaf_nodes": 12,
"max_depth": 3,
"total_tokens_used": 125000
}
}Configuration(
base_url="https://api.openai.com/v1",
main_model="gpt-4-turbo",
cluster_model="gpt-4",
fallback_model="gpt-3.5-turbo",
provider="openai-compatible"
)Configuration(
base_url="https://{resource}.openai.azure.com/",
main_model="gpt-4-deployment",
cluster_model="gpt-4-deployment",
provider="azure-openai",
api_version="2024-12-01-preview",
azure_deployment="my-deployment"
)Configuration(
main_model="anthropic.claude-3-sonnet-20240229-v1:0",
cluster_model="anthropic.claude-3-haiku-20240307-v1:0",
provider="bedrock",
aws_region="us-east-1"
)Configuration(
main_model="claude-code", # or "codex"
provider="anthropic"
# No base_url, cluster_model, or fallback_model required
)The module handles several validation scenarios:
- Provider-specific field requirements
- Model name format validation
- URL format validation for API endpoints
- Required field presence checks
- State machine enforcement (PENDING → RUNNING → COMPLETED/FAILED)
- Error recording with descriptive messages
- Timestamp consistency validation
- Subscription-mode providers (claude-code, codex) require only
main_model - Traditional API providers require
base_url,cluster_model,fallback_model - Validation adapts based on
providerfield value
- Runtime instructions override persistent settings
- Merging only occurs when explicitly requested
- Efficient to_dict() conversion for serialization
- JSON format optimized for storage and transmission
- UUID generation is lazy (on-demand)
- Statistics accumulated incrementally during execution
- Dataclass instances are lightweight
- No persistent connections or resources held
- Serialization creates copies only when needed
-
API Key Management
- Configuration stores
base_urlbut NOT the API key - API key retrieved from secure keyring at runtime
- Passed to
to_backend_config()only when needed
- Configuration stores
-
Credential Handling
- Job serialization never includes credentials
- LLMConfig stores only configuration, not keys
- Backend
Confighandles credential injection
-
File Patterns
- Include/exclude patterns in AgentInstructions for filtering sensitive paths
- Patterns processed before repository analysis
@pytest.fixture
def mock_config():
return Configuration(
base_url="http://localhost:8000",
main_model="mock-model",
cluster_model="mock-cluster",
fallback_model="mock-fallback"
)def test_job_lifecycle():
job = DocumentationJob(
repository_path="/test/repo",
repository_name="test-repo"
)
assert job.status == JobStatus.PENDING
job.start()
assert job.status == JobStatus.RUNNING
job.complete()
assert job.status == JobStatus.COMPLETEDdef test_job_serialization():
job = DocumentationJob(...)
json_str = job.to_json()
restored = DocumentationJob.from_dict(json.loads(json_str))
assert job.job_id == restored.job_id- cli_core.md - CLI command execution and orchestration
- shared_config_and_utils.md - Backend configuration system
- llm_backends.md - LLM provider implementation
- documentation_generation.md - Core generation engine