Shared Config and Utils Module Documentation

Module Overview

The Shared Config and Utils module is the foundational layer of CodeWiki. It provides centralized configuration management and shared file I/O utilities, so that the CLI, backend services, and frontend web application all handle configuration in the same way.

Purpose

This module addresses three core responsibilities:

Centralized Configuration Management: Unified configuration object used throughout the system
Context Detection: Distinguish between CLI and web application execution contexts
File I/O Operations: Standardized file handling for JSON and text persistence

Key Characteristics

Framework-Agnostic: Works seamlessly with both CLI and web applications
Provider-Flexible: Supports multiple LLM providers (OpenAI-compatible, Anthropic, AWS Bedrock, Azure OpenAI)
Extensible: Agent instructions allow runtime customization of behavior
Type-Safe: Declares the configuration schema as a Python dataclass with type-annotated fields

Architecture Overview

graph TB
    Config["Config<br/>(Central Configuration)"]
    FM["FileManager<br/>(File I/O Utilities)"]
    
    CTX["Context Detection<br/>CLI vs Web App"]
    
    Config -->|Manages| AGI["Agent Instructions<br/>(Custom Behaviors)"]
    Config -->|Manages| LLM["LLM Configuration<br/>(Models & Providers)"]
    Config -->|Manages| PAT["Paths & Directories<br/>(Output Structure)"]
    Config -->|Manages| TOKEN["Token Settings<br/>(Rate Limiting)"]
    
    CTX -->|Controls| Config
    
    FM -->|Provides| JSON["JSON I/O"]
    FM -->|Provides| TEXT["Text I/O"]
    
    style Config fill:#4A90E2,stroke:#2E5C8A,color:#fff
    style FM fill:#7B68EE,stroke:#5A4CB8,color:#fff
    style CTX fill:#F5A623,stroke:#C17E1F,color:#fff

Module Dependencies

graph LR
    ConfigUtils["shared_config_and_utils<br/>(This Module)"]
    
    CLICore["cli_core<br/>(ConfigManager, DocGenerator)"]
    CLIModels["cli_models<br/>(Configuration, JobStatus)"]
    LLMBackends["llm_backends<br/>(LLMBackend, CawBackend)"]
    Frontend["frontend_web_app<br/>(WebAppConfig, CacheManager)"]
    DependencyAnalyzer["dependency_analyzer<br/>(AnalysisService)"]
    DocumentationGen["documentation_generation<br/>(DocumentationGenerator)"]
    
    ConfigUtils -->|Used by| CLICore
    ConfigUtils -->|Used by| CLIModels
    ConfigUtils -->|Used by| LLMBackends
    ConfigUtils -->|Used by| Frontend
    ConfigUtils -->|Used by| DependencyAnalyzer
    ConfigUtils -->|Used by| DocumentationGen
    
    style ConfigUtils fill:#FF6B6B,stroke:#C92A2A,color:#fff
    style CLICore fill:#4ECDC4,stroke:#2A9B8A,color:#fff
    style CLIModels fill:#4ECDC4,stroke:#2A9B8A,color:#fff
    style LLMBackends fill:#95E1D3,stroke:#6B9B8C,color:#fff
    style Frontend fill:#95E1D3,stroke:#6B9B8C,color:#fff
    style DependencyAnalyzer fill:#F8D47C,stroke:#C4A550,color:#000
    style DocumentationGen fill:#F8D47C,stroke:#C4A550,color:#000

Core Components

1. Config Class

The Config class serves as the central configuration object for the entire CodeWiki system. It encapsulates all runtime configuration parameters and provides type-safe access to settings.

Class Structure

@dataclass
class Config:
    """Configuration class for CodeWiki."""

Core Attributes

Paths & Directories

repo_path: str                  # Source repository path
output_dir: str                 # Base output directory
dependency_graph_dir: str       # Dependency graph storage
docs_dir: str                   # Documentation output directory

LLM Configuration

llm_base_url: str              # API endpoint URL
llm_api_key: str               # Authentication key
main_model: str                # Primary model (default: claude-sonnet-4)
cluster_model: str             # Model for clustering operations
fallback_model: str            # Backup model (default: glm-4p5)
provider: str                  # Provider type: openai-compatible, anthropic, bedrock, azure-openai

Provider-Specific Settings

aws_region: str                # AWS region for Bedrock (default: us-east-1)
api_version: str               # Azure OpenAI API version (default: 2024-12-01-preview)
azure_deployment: str          # Azure OpenAI deployment name

Token Management

max_tokens: int                # Max response tokens (default: 32,768)
max_token_per_module: int      # Per-module limit (default: 36,369)
max_token_per_leaf_module: int # Leaf module limit (default: 16,000)

Customization

max_depth: int                 # Hierarchical decomposition depth
agent_instructions: Optional[Dict[str, Any]]  # Custom behavior directives

Properties

The Config class provides convenient properties for accessing agent instructions:

@property
def include_patterns(self) -> Optional[List[str]]:
    """Get file include patterns from agent instructions."""
    
@property
def exclude_patterns(self) -> Optional[List[str]]:
    """Get file exclude patterns from agent instructions."""
    
@property
def focus_modules(self) -> Optional[List[str]]:
    """Get focus modules from agent instructions."""
    
@property
def doc_type(self) -> Optional[str]:
    """Get documentation type from agent instructions."""
    # Values: 'api', 'architecture', 'user-guide', 'developer'
    
@property
def custom_instructions(self) -> Optional[str]:
    """Get custom instructions from agent instructions."""

Factory Methods

1. from_args() - CLI Argument Parsing

@classmethod
def from_args(cls, args: argparse.Namespace) -> 'Config':
    """Create configuration from parsed CLI arguments."""

Creates a Config instance from parsed CLI arguments, filling the remaining fields with default values. The repository name is sanitized so that it can be used safely in output directory names.
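The sanitization step is not shown in this document. As a rough sketch of what it might look like, assuming the name is taken from the last path component and that anything outside letters, digits, dot, underscore, and hyphen is replaced (`sanitize_repo_name` and `docs_dir_for` are hypothetical helpers, and the directory layout is an assumption):

```python
import os
import re


def sanitize_repo_name(repo_path: str) -> str:
    """Hypothetical helper: derive a directory-safe name from a repo path."""
    # normpath drops a trailing separator so basename is not empty.
    name = os.path.basename(os.path.normpath(repo_path))
    # Keep letters, digits, dot, underscore and hyphen; replace the rest.
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)


def docs_dir_for(repo_path: str, base_dir: str = "output") -> str:
    # Assumed layout: <base_dir>/docs/<sanitized repo name>.
    return os.path.join(base_dir, "docs", sanitize_repo_name(repo_path))
```

The real from_args() may use a different rule or directory layout; the point is only that the name is normalized before it becomes part of a path.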

2. from_cli() - Explicit Parameter Construction

@classmethod
def from_cli(
    cls,
    repo_path: str,
    output_dir: str,
    llm_base_url: str,
    llm_api_key: str,
    main_model: str,
    cluster_model: str,
    fallback_model: str = FALLBACK_MODEL_1,
    provider: str = "openai-compatible",
    aws_region: str = "us-east-1",
    api_version: str = "2024-12-01-preview",
    azure_deployment: str = "",
    max_tokens: int = DEFAULT_MAX_TOKENS,
    max_token_per_module: int = DEFAULT_MAX_TOKEN_PER_MODULE,
    max_token_per_leaf_module: int = DEFAULT_MAX_TOKEN_PER_LEAF_MODULE,
    max_depth: int = MAX_DEPTH,
    agent_instructions: Optional[Dict[str, Any]] = None
) -> 'Config':
    """Create configuration for CLI context with explicit parameters."""

Provides full control over all configuration parameters with sensible defaults. Used when configuration is loaded from configuration files.

Methods

get_prompt_addition()

def get_prompt_addition(self) -> str:
    """Generate prompt additions based on agent instructions."""

Builds dynamic prompt modifications based on configuration settings. Returns a formatted string with:

Documentation type guidance (API, architecture, user-guide, developer)
Focus module emphasis
Custom instructions

Usage Example:

config = Config.from_cli(...)
if config.agent_instructions:
    prompt += config.get_prompt_addition()
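The body of get_prompt_addition() is not reproduced here. The following is a minimal sketch of how the three pieces listed above could be assembled, written as a standalone function. The guidance strings in `DOC_TYPE_GUIDANCE` and the function name `build_prompt_addition` are assumptions made for illustration, not the wording CodeWiki uses:

```python
from typing import Any, Dict, List, Optional

# Hypothetical guidance text; the real wording is not shown in this document.
DOC_TYPE_GUIDANCE = {
    "api": "Focus on public interfaces, parameters, and return values.",
    "architecture": "Focus on components and how they interact.",
    "user-guide": "Focus on how to use the software.",
    "developer": "Focus on internals and contribution workflow.",
}


def build_prompt_addition(instructions: Optional[Dict[str, Any]]) -> str:
    """Assemble extra prompt text from agent instructions (illustrative)."""
    if not instructions:
        # None and {} both yield an empty addition.
        return ""
    parts: List[str] = []
    doc_type = instructions.get("doc_type")
    if doc_type in DOC_TYPE_GUIDANCE:
        parts.append(DOC_TYPE_GUIDANCE[doc_type])
    focus = instructions.get("focus_modules")
    if focus:
        parts.append("Pay particular attention to: " + ", ".join(focus))
    custom = instructions.get("custom_instructions")
    if custom:
        parts.append("Additional instructions: " + custom)
    return "\n".join(parts)
```

Keeping each piece optional means a configuration that sets only one key still produces a well-formed addition.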

Context Detection System

Two global functions manage execution context:

def set_cli_context(enabled: bool = True):
    """Set whether we're running in CLI context (vs web app)."""
    global _CLI_CONTEXT
    _CLI_CONTEXT = enabled

def is_cli_context() -> bool:
    """Check if running in CLI context."""
    return _CLI_CONTEXT

Purpose: Allows different behaviors based on execution environment:

CLI Context: Load credentials from keyring, use local config files
Web Context: Use environment variables for sensitive data
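To illustrate how calling code might branch on the context flag, here is a small sketch. It assumes the flag defaults to False, and it takes the keyring accessor as a parameter rather than importing a keyring library; `resolve_api_key` is a hypothetical helper, not part of the module:

```python
import os
from typing import Callable, Mapping, Optional

_CLI_CONTEXT = False  # assumed default: web context unless set otherwise


def set_cli_context(enabled: bool = True) -> None:
    global _CLI_CONTEXT
    _CLI_CONTEXT = enabled


def is_cli_context() -> bool:
    return _CLI_CONTEXT


def resolve_api_key(
    keyring_lookup: Callable[[], Optional[str]],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Hypothetical helper: pick the credential source for the current context."""
    if is_cli_context():
        # CLI: delegate to whatever keyring accessor the caller supplies.
        return keyring_lookup()
    # Web: read the key from the environment.
    return env.get("LLM_API_KEY")
```

Passing the lookup in as a callable keeps the sketch free of third-party imports and makes both branches easy to exercise in tests.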

Default Constants

OUTPUT_BASE_DIR = 'output'                           # Root output directory
DEPENDENCY_GRAPHS_DIR = 'dependency_graphs'          # Dependency graph storage
DOCS_DIR = 'docs'                                    # Documentation directory
MAX_DEPTH = 2                                        # Module hierarchy depth

# Token limits
DEFAULT_MAX_TOKENS = 32_768                          # Total response tokens
DEFAULT_MAX_TOKEN_PER_MODULE = 36_369                # Per-module budget
DEFAULT_MAX_TOKEN_PER_LEAF_MODULE = 16_000           # Leaf-module budget

# LLM Models (loaded from environment or defaults)
MAIN_MODEL = 'claude-sonnet-4'
FALLBACK_MODEL_1 = 'glm-4p5'
CLUSTER_MODEL = MAIN_MODEL

2. FileManager Class

The FileManager class provides a simple, standardized interface for file I/O operations. It abstracts low-level file handling and ensures consistent error handling across the system.

Class Structure

class FileManager:
    """Handles file I/O operations."""

All methods are static, making FileManager a utility class that doesn't require instantiation.

Methods

Directory Operations

@staticmethod
def ensure_directory(path: str) -> None:
    """Create directory if it doesn't exist."""
    os.makedirs(path, exist_ok=True)

Creates the directory together with any missing parent directories. Because it passes exist_ok=True to os.makedirs, it is safe to call multiple times on the same path.
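The behaviour can be checked with a short standalone script. This sketch reuses the same one-line body in a plain function and runs it against a temporary directory; it also shows one case that exist_ok=True does not cover, namely a regular file already occupying the path:

```python
import os
import tempfile


def ensure_directory(path: str) -> None:
    """Same one-line body as FileManager.ensure_directory above."""
    os.makedirs(path, exist_ok=True)


with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "output", "docs", "module_a")
    ensure_directory(nested)   # creates all missing parents
    ensure_directory(nested)   # second call is a no-op
    created = os.path.isdir(nested)

    # A regular file in the way still raises, even with exist_ok=True.
    blocker = os.path.join(root, "blocker")
    open(blocker, "w").close()
    try:
        ensure_directory(blocker)
        clash_raised = False
    except FileExistsError:
        clash_raised = True
```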

JSON Operations

@staticmethod
def save_json(data: Any, filepath: str) -> None:
    """Save data as JSON to file with 4-space indentation."""
    
@staticmethod
def load_json(filepath: str) -> Optional[Dict[str, Any]]:
    """Load JSON from file, return None if file doesn't exist."""

Handles JSON serialization/deserialization with graceful fallback for missing files. Data is saved with 4-space indentation for readability.
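The method bodies are not listed here, so the following is only a sketch of one way to get the described behaviour using the standard json module. The explicit UTF-8 encoding is an assumption:

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


def save_json(data: Any, filepath: str) -> None:
    # indent=4 matches the 4-space indentation described above.
    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=4)


def load_json(filepath: str) -> Optional[Dict[str, Any]]:
    # A missing file is reported as None instead of raising.
    if not os.path.exists(filepath):
        return None
    with open(filepath, "r", encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "analysis.json")
    missing = load_json(path)                      # file not written yet
    save_json({"modules": ["auth", "api"]}, path)
    round_trip = load_json(path)
    with open(path, encoding="utf-8") as f:
        second_line = f.read().splitlines()[1]     # first indented line
```

In this sketch only a missing file maps to None. A file that exists but contains malformed JSON would still raise json.JSONDecodeError, so callers that read untrusted files may want to handle that separately.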

Text Operations

@staticmethod
def save_text(content: str, filepath: str) -> None:
    """Save text content to file."""
    
@staticmethod
def load_text(filepath: str) -> str:
    """Load text content from file."""

Simple UTF-8 text file operations. load_text() raises an exception if the file doesn't exist (unlike load_json()).
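As a sketch of the asymmetry with load_json(), the text helpers can be written with plain open() calls, which naturally raise FileNotFoundError. The wrapper `load_text_or_default` is a hypothetical addition for callers that would rather receive a fallback value:

```python
import os
import tempfile


def save_text(content: str, filepath: str) -> None:
    with open(filepath, "w", encoding="utf-8") as f:
        f.write(content)


def load_text(filepath: str) -> str:
    # No existence check: open() raises FileNotFoundError for a missing file.
    with open(filepath, "r", encoding="utf-8") as f:
        return f.read()


def load_text_or_default(filepath: str, default: str = "") -> str:
    """Hypothetical wrapper for callers that prefer a fallback value."""
    try:
        return load_text(filepath)
    except FileNotFoundError:
        return default


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.md")
    fallback = load_text_or_default(path, default="(empty)")
    save_text("# Título\n", path)   # non-ASCII survives the UTF-8 round trip
    restored = load_text(path)
```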

Singleton Instance

file_manager = FileManager()

A module-level instance is provided for convenient access across the codebase:

from codewiki.src.utils import file_manager

file_manager.ensure_directory('./output')
file_manager.save_json(data, './output/data.json')

Data Flow

Configuration Initialization Flow

sequenceDiagram
    participant CLI as CLI Entry Point
    participant CM as ConfigManager
    participant Config as Config Class
    participant Env as Environment/Keyring
    
    CLI->>CM: parse_arguments()
    CM->>Env: load config from ~/.codewiki/config.json
    Env-->>CM: configuration dict
    CM->>Config: Config.from_cli(...)
    Config->>Config: Setup paths & directories
    Config->>Config: Load LLM provider settings
    Config-->>CM: Config instance
    CM-->>CLI: Ready for processing

File I/O Usage Pattern

graph TB
    Component["System Component<br/>(DocumentationGenerator, etc.)"]
    FileManager["FileManager<br/>(Utility Class)"]
    Filesystem["File System"]
    
    Component -->|ensure_directory| FileManager
    Component -->|save_json| FileManager
    Component -->|load_json| FileManager
    Component -->|save_text| FileManager
    Component -->|load_text| FileManager
    
    FileManager -->|Create/Read/Write| Filesystem
    
    style Component fill:#4A90E2,stroke:#2E5C8A,color:#fff
    style FileManager fill:#7B68EE,stroke:#5A4CB8,color:#fff
    style Filesystem fill:#95E1D3,stroke:#6B9B8C,color:#fff

Integration Points

1. CLI Core (ConfigManager)

The ConfigManager reads configuration from:

Command-line arguments
~/.codewiki/config.json file
System environment variables
Keyring for credentials

It then creates a Config instance for use throughout the CLI application.

Reference: See cli_core.md for detailed integration.

2. LLM Backends

All LLM backend implementations (llm_backends.md) use Config settings:

llm_base_url and llm_api_key for API connections
provider type for endpoint selection
main_model, fallback_model for model selection
Token limits for response generation

# Example from backend implementation
def __init__(self, config: Config):
    self.base_url = config.llm_base_url
    self.api_key = config.llm_api_key
    self.provider = config.provider
    self.main_model = config.main_model
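How a backend uses fallback_model is not described in this document. One plausible reading is "try the main model first, then the fallback", which could be sketched as below. `generate_with_fallback` and `fake_call` are hypothetical names, and the real backends may handle retries and errors differently:

```python
from typing import Callable, Optional, Sequence


def generate_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> str:
    """Return the first successful response, trying models in order."""
    if not models:
        raise ValueError("at least one model name is required")
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return call_model(model, prompt)
        except Exception as exc:  # broad on purpose for the sketch
            last_error = exc
    raise RuntimeError("all models failed") from last_error


def fake_call(model: str, prompt: str) -> str:
    # Stand-in for a real API client: the primary model always fails here.
    if model == "claude-sonnet-4":
        raise TimeoutError("simulated outage")
    return f"{model}: {prompt}"


answer = generate_with_fallback("hi", ["claude-sonnet-4", "glm-4p5"], fake_call)
```

In real use the model list would come from config.main_model and config.fallback_model, and call_model would wrap the provider client.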

3. Documentation Generation

The DocumentationGenerator (documentation_generation.md) uses:

max_tokens for LLM prompts
max_token_per_module for hierarchical budgeting
agent_instructions for custom behavior
output_dir and docs_dir for file output
FileManager for persistent storage

4. Frontend Web App

The frontend (frontend_web_app.md) uses:

WebAppConfig derived from environment variables
FileManager for caching and data persistence
Different context (web vs CLI) for credential handling

Usage Examples

Example 1: Creating Configuration in CLI

from codewiki.src.config import Config, set_cli_context

# Mark that we're in CLI context
set_cli_context(True)

# Create config with explicit parameters
config = Config.from_cli(
    repo_path="/path/to/repo",
    output_dir="/path/to/output",
    llm_base_url="https://api.anthropic.com/v1",
    llm_api_key="sk-xxxx",
    main_model="claude-opus",
    cluster_model="claude-opus",
    provider="anthropic",
    agent_instructions={
        'doc_type': 'architecture',
        'focus_modules': ['auth', 'database'],
        'custom_instructions': 'Emphasize security aspects'
    }
)

# Access configuration properties
print(config.main_model)           # claude-opus
print(config.doc_type)              # architecture
print(config.get_prompt_addition()) # Dynamic prompt additions

Example 2: File Operations

from codewiki.src.utils import file_manager

# Create output directory structure
file_manager.ensure_directory('./output/docs')
file_manager.ensure_directory('./output/dependency_graphs')

# Save analysis results
analysis_results = {
    'modules': ['auth', 'database', 'api'],
    'total_dependencies': 45,
    'complexity_score': 8.2
}
file_manager.save_json(analysis_results, './output/analysis.json')

# Save documentation
doc_content = "# Architecture Documentation\n..."
file_manager.save_text(doc_content, './output/docs/architecture.md')

# Load data back
loaded_results = file_manager.load_json('./output/analysis.json')
doc = file_manager.load_text('./output/docs/architecture.md')

Example 3: Using Agent Instructions

config = Config.from_cli(
    repo_path="/repo",
    output_dir="/output",
    llm_base_url="...",
    llm_api_key="...",
    main_model="claude-opus",
    cluster_model="claude-opus",
    agent_instructions={
        'include_patterns': ['src/**/*.py', 'lib/**/*.ts'],
        'exclude_patterns': ['**/test/**', '**/node_modules/**'],
        'focus_modules': ['auth', 'payment'],
        'doc_type': 'api',
        'custom_instructions': 'Include code examples for all endpoints'
    }
)

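# Check if specific filters are configured
if config.include_patterns:
    # Apply file inclusion filters.
    # Note: apply_patterns and repo are placeholders for your own filtering
    # helper and repository handle; they are not part of the API documented here.
    files = apply_patterns(repo, config.include_patterns, config.exclude_patterns)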

# Build documentation with custom guidance
prompt = f"""
Generate comprehensive documentation for the following modules:

{', '.join(config.focus_modules)}

{config.get_prompt_addition()}
"""

Example 4: Configuration for Different LLM Providers

# OpenAI Compatible API
config_openai = Config.from_cli(
    repo_path="/repo",
    output_dir="/output",
    llm_base_url="http://localhost:8000/v1",
    llm_api_key="sk-xxxx",
    main_model="gpt-4",
    cluster_model="gpt-3.5-turbo",
    provider="openai-compatible"
)

# AWS Bedrock
config_bedrock = Config.from_cli(
    repo_path="/repo",
    output_dir="/output",
    llm_base_url="https://bedrock-runtime.us-west-2.amazonaws.com",
    llm_api_key="aws_credentials",
    main_model="anthropic.claude-3-sonnet-20240229-v1:0",
    cluster_model="anthropic.claude-3-haiku-20240307-v1:0",
    provider="bedrock",
    aws_region="us-west-2"
)

# Azure OpenAI
config_azure = Config.from_cli(
    repo_path="/repo",
    output_dir="/output",
    llm_base_url="https://myresource.openai.azure.com",
    llm_api_key="azure_api_key",
    main_model="gpt-4-turbo",
    cluster_model="gpt-35-turbo",
    provider="azure-openai",
    api_version="2024-12-01-preview",
    azure_deployment="my-deployment"
)

Token Management Strategy

The module implements a hierarchical token budgeting system to control LLM costs and prevent runaway token consumption:

graph TD
    Total["Total Response Budget<br/>32,768 tokens"]
    
    Total -->|Split across| PerModule["Per-Module Budget<br/>36,369 tokens"]
    PerModule -->|Split across| PerLeaf["Per-Leaf Module<br/>16,000 tokens"]
    
    PerLeaf -->|Used for| GenDoc["Documentation<br/>Generation"]
    PerLeaf -->|Used for| Analysis["Dependency<br/>Analysis"]
    
    style Total fill:#FF6B6B,stroke:#C92A2A,color:#fff
    style PerModule fill:#FFA94D,stroke:#CC5500,color:#fff
    style PerLeaf fill:#4ECDC4,stroke:#2A9B8A,color:#fff

Hierarchy Levels:

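Max Tokens: Overall response limit per LLM call (default: 32,768)
Per-Module Budget: Tokens allocated to documenting a single module (default: 36,369)
Per-Leaf Budget: Tokens allocated to leaf modules, which get a smaller allocation (default: 16,000)

Note that the default per-module budget is larger than the per-call response limit, so the three values act as separate caps rather than as strict subdivisions of a single total.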

This ensures:

No single operation exceeds reasonable token limits
Larger modules get more budget for comprehensive documentation
Smaller modules are efficiently documented without excess
System remains cost-effective
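A small sketch of how the per-module and per-leaf limits might be applied when deciding whether a module fits its budget. The helper names are hypothetical, and the token count is assumed to come from whatever tokenizer the caller uses:

```python
DEFAULT_MAX_TOKEN_PER_MODULE = 36_369
DEFAULT_MAX_TOKEN_PER_LEAF_MODULE = 16_000


def module_budget(
    is_leaf: bool,
    max_token_per_module: int = DEFAULT_MAX_TOKEN_PER_MODULE,
    max_token_per_leaf_module: int = DEFAULT_MAX_TOKEN_PER_LEAF_MODULE,
) -> int:
    """Pick the budget that applies to a module (hypothetical helper)."""
    return max_token_per_leaf_module if is_leaf else max_token_per_module


def exceeds_budget(token_count: int, is_leaf: bool) -> bool:
    # token_count is assumed to be measured by the caller's tokenizer.
    return token_count > module_budget(is_leaf)
```

With the defaults, a 20,000-token module exceeds the leaf budget but fits the regular per-module budget. What the system does when a budget is exceeded is not covered by this sketch.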

Context Detection System

The module provides context awareness for different execution environments:

graph TB
    Start["Application Start"]
    
    Start -->|CLI Mode| SetCLI["set_cli_context True"]
    Start -->|Web Mode| SetWeb["set_cli_context False"]
    
    SetCLI -->|ConfigManager| LoadFile["Load ~/.codewiki/config.json"]
    SetCLI -->|ConfigManager| LoadKeyring["Load credentials from keyring"]
    
    SetWeb -->|WebAppConfig| LoadEnv["Load from environment variables"]
    SetWeb -->|WebAppConfig| LoadEnvKeys["Use env API keys"]
    
    style Start fill:#4A90E2,stroke:#2E5C8A,color:#fff
    style SetCLI fill:#7B68EE,stroke:#5A4CB8,color:#fff
    style SetWeb fill:#F5A623,stroke:#C17E1F,color:#fff

Benefits:

CLI: Secure credential storage in system keyring
Web: Simple environment variable configuration
Flexibility: Same code works in both contexts with different configuration sources

Environment Variables

The module reads its default values from the following environment variables, falling back to the defaults shown when a variable is not set:

Variable	Default	Purpose
`MAIN_MODEL`	`claude-sonnet-4`	Primary LLM model
`FALLBACK_MODEL_1`	`glm-4p5`	Fallback LLM model
`CLUSTER_MODEL`	`MAIN_MODEL`	Model for clustering operations
`LLM_BASE_URL`	`http://0.0.0.0:4000/`	LLM API endpoint
`LLM_API_KEY`	`sk-1234`	API authentication key

Note: These defaults are suitable for development. Production deployments must override via environment variables or configuration files.
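As a sketch of the lookup pattern, the table above can be expressed with plain mapping lookups. `load_llm_defaults` is a hypothetical function, and having CLUSTER_MODEL fall back to the resolved MAIN_MODEL value is an assumption based on the table:

```python
import os
from typing import Dict, Mapping


def load_llm_defaults(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Resolve LLM settings from the environment (illustrative only)."""
    main_model = env.get("MAIN_MODEL", "claude-sonnet-4")
    return {
        "main_model": main_model,
        "fallback_model": env.get("FALLBACK_MODEL_1", "glm-4p5"),
        # Assumed: CLUSTER_MODEL defaults to whatever MAIN_MODEL resolved to.
        "cluster_model": env.get("CLUSTER_MODEL", main_model),
        "llm_base_url": env.get("LLM_BASE_URL", "http://0.0.0.0:4000/"),
        "llm_api_key": env.get("LLM_API_KEY", "sk-1234"),
    }
```

Accepting the environment as a parameter makes it possible to test overrides without mutating os.environ.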

Error Handling & Edge Cases

FileManager

Missing Files

# load_json() returns None for missing files (graceful)
data = file_manager.load_json('./nonexistent.json')
assert data is None

# load_text() raises FileNotFoundError
content = file_manager.load_text('./nonexistent.txt')  # Raises FileNotFoundError

Directory Creation

# Safe to call multiple times
file_manager.ensure_directory('./output')
file_manager.ensure_directory('./output')  # No error

Config

Missing Agent Instructions

config = Config.from_cli(..., agent_instructions=None)
print(config.doc_type)  # Returns None
print(config.get_prompt_addition())  # Returns empty string

Empty Agent Instructions

config = Config.from_cli(..., agent_instructions={})
print(config.focus_modules)  # Returns None
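Both edge cases are consistent with properties that do a guarded dictionary lookup. A minimal sketch, assuming that is how the properties work (`ConfigSketch` is a hypothetical stand-in that models only the agent_instructions field):

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for Config; only agent_instructions is modelled."""

    agent_instructions: Optional[Dict[str, Any]] = None

    def _instruction(self, key: str) -> Any:
        # Treat both None and {} as "no instructions" and return None.
        if not self.agent_instructions:
            return None
        return self.agent_instructions.get(key)

    @property
    def focus_modules(self) -> Optional[List[str]]:
        return self._instruction("focus_modules")

    @property
    def doc_type(self) -> Optional[str]:
        return self._instruction("doc_type")


empty = ConfigSketch()
custom = ConfigSketch(agent_instructions={"doc_type": "api"})
```

Because a missing key also yields None, callers should check a property for None before iterating over it or joining it into a string.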

Related Modules

cli_core.md: ConfigManager integration
cli_models.md: Configuration model definitions
llm_backends.md: LLM provider integration
documentation_generation.md: Uses Config for generation settings
frontend_web_app.md: Web context configuration

Summary

The Shared Config and Utils module provides:

✅ Centralized Configuration: Single source of truth for all system settings
✅ Multiple LLM Providers: Support for OpenAI, Anthropic, AWS Bedrock, Azure OpenAI
✅ Flexible Customization: Agent instructions enable runtime behavior customization
✅ Token Management: Hierarchical budgeting prevents runaway costs
✅ Context Awareness: Seamlessly works in CLI and web app contexts
✅ File I/O Utilities: Standardized file operations across the system
✅ Type Safety: Dataclass-based configuration prevents errors

This foundational module enables the rest of CodeWiki to operate with consistent, flexible, and manageable configuration.
