Summary
When built with OpenAI embedding support and run with CODEGRAPH_EMBEDDING_PROVIDER=openai, indexing logs a critical warning that it is falling back to random hash-based embeddings, but the final indexing summary still reports successful OpenAI embeddings.
This makes a degraded index look valid.
Related
Clean Reproduction
Fresh clone:
git clone --depth 1 https://github.com/Jakedismo/codegraph-rust codegraph-rust-clean-openai
cd codegraph-rust-clean-openai
git rev-parse HEAD
# ce5bf27 Merge pull request #59 from Jakedismo/wip-indexing-analyzers-doc
Build:
cargo build --release -p codegraph-mcp-server --bin codegraph \
--no-default-features \
--features ai-enhanced,embeddings-openai,codegraph-ai/openai-llm
Index command:
CODEGRAPH_SURREALDB_URL=ws://127.0.0.1:51068 \
CODEGRAPH_SURREALDB_NAMESPACE=codegraph \
CODEGRAPH_SURREALDB_DATABASE=main \
CODEGRAPH_SURREALDB_USERNAME=root \
CODEGRAPH_SURREALDB_PASSWORD=root \
CODEGRAPH_EMBEDDING_PROVIDER=openai \
CODEGRAPH_EMBEDDING_MODEL=text-embedding-3-small \
CODEGRAPH_EMBEDDING_DIMENSION=1536 \
CODEGRAPH_LLM_PROVIDER=openai \
CODEGRAPH_MODEL=gpt-5.4-mini \
CODEGRAPH_CONTEXT_WINDOW=200000 \
RUST_LOG=info \
target/release/codegraph --config /path/to/codegraph-rust.toml \
index src --recursive --index-tier fast --force --batch-size 64 --max-concurrent 2
Observed during indexing:
CRITICAL: Falling back to random hash-based embeddings
Observed final summary:
Embeddings: chunks 2157 | dim 1536 | provider openai
Database state:
chunks: 2157
embedding_1536 populated: 2157
Expected
One of:
- Real OpenAI embeddings are generated.
- Indexing fails hard with an actionable OpenAI provider error.
- The final summary clearly reports fallback/random embeddings and marks the index unsuitable for semantic retrieval.
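The second and third options could be sketched roughly as follows. This is a minimal illustration, not codegraph's actual code: `EmbeddingSource`, `EmbeddingSummary`, `finish_indexing`, the `allow_fallback` switch, and the "provider init failed" reason string are all hypothetical names made up for this example. The idea is that the summary is rendered from the source that actually produced the vectors, rather than from the configured provider name.

```rust
/// Which source actually produced the stored vectors (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum EmbeddingSource {
    /// The configured provider answered, e.g. "openai".
    Provider(String),
    /// The provider was requested but random hash-based vectors were used instead.
    RandomFallback { requested: String, reason: String },
}

/// Hypothetical end-of-run record for the embedding stage.
#[derive(Debug, Clone)]
struct EmbeddingSummary {
    chunks: usize,
    dim: usize,
    source: EmbeddingSource,
}

impl EmbeddingSummary {
    /// True when the vectors did not come from the requested provider.
    fn is_degraded(&self) -> bool {
        matches!(self.source, EmbeddingSource::RandomFallback { .. })
    }

    /// Renders the summary line from the effective source, not the configured one.
    fn render(&self) -> String {
        let head = format!("Embeddings: chunks {} | dim {}", self.chunks, self.dim);
        match &self.source {
            EmbeddingSource::Provider(name) => format!("{head} | provider {name}"),
            EmbeddingSource::RandomFallback { requested, reason } => format!(
                "{head} | provider FALLBACK (random hash; requested {requested}; reason: {reason}) | NOT suitable for semantic retrieval"
            ),
        }
    }
}

/// Returns the summary line, or an error when a fallback occurred and
/// `allow_fallback` is false (the "fail hard" option).
fn finish_indexing(summary: &EmbeddingSummary, allow_fallback: bool) -> Result<String, String> {
    if summary.is_degraded() && !allow_fallback {
        return Err(format!(
            "embedding provider failed; refusing to finish with fallback vectors: {}",
            summary.render()
        ));
    }
    Ok(summary.render())
}

fn main() {
    let degraded = EmbeddingSummary {
        chunks: 2157,
        dim: 1536,
        source: EmbeddingSource::RandomFallback {
            requested: "openai".to_string(),
            reason: "provider init failed".to_string(), // placeholder reason
        },
    };
    match finish_indexing(&degraded, false) {
        Ok(line) => println!("{line}"),
        Err(err) => eprintln!("error: {err}"),
    }
}
```

With something along these lines, the run above would either exit with an error or print a summary that cannot be mistaken for a successful OpenAI run.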
Actual
The index completes and reports OpenAI embeddings even though the log says random hash-based embeddings were used.
Impact
Downstream semantic search depends on embedding quality. A silent random fallback can invalidate retrieval results while the CLI summary claims the OpenAI embedding path succeeded.
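To illustrate why such vectors are unusable for retrieval, here is a small sketch of a generic hash-seeded embedding. It is an assumption about what "random hash-based" means in general, not a copy of codegraph's fallback; `hash_embedding`, `splitmix64`, and `cosine` are names invented for this example. Because each vector is derived only from a hash of the text, two closely related strings get unrelated vectors, and their cosine similarity carries no semantic signal.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// SplitMix64 step: advances the state and returns the next pseudo-random u64.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Illustrative stand-in for a hash-based fallback embedding (hypothetical,
/// not codegraph's implementation): seed a PRNG from the text's hash and
/// emit a unit-length vector of pseudo-random components.
fn hash_embedding(text: &str, dim: usize) -> Vec<f32> {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    let mut state = hasher.finish();

    let mut v: Vec<f32> = (0..dim)
        .map(|_| {
            // Take the top 24 bits and map them to a float in [-1, 1).
            let bits = (splitmix64(&mut state) >> 40) as f32;
            bits / 8_388_608.0 - 1.0
        })
        .collect();

    // Normalize to unit length so the dot product equals cosine similarity.
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
    v
}

/// Cosine similarity for unit-length vectors (plain dot product).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let dim = 1536;
    let a = hash_embedding("parse a config file", dim);
    let b = hash_embedding("parse a configuration file", dim);
    let c = hash_embedding("render a PNG thumbnail", dim);

    // Related and unrelated pairs are indistinguishable: both hover near zero.
    println!("related pair:   {:.4}", cosine(&a, &b));
    println!("unrelated pair: {:.4}", cosine(&a, &c));
}
```

An index populated this way still has every embedding_1536 field filled, which is why the database state alone does not reveal the problem.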