Code hallucination #38
Open
adaamko wants to merge 20 commits into
Remove GLiNER2 schema-head detail from docs/taxonomy.md (too implementation-specific for now); keep the modality-unification and typed-output rationale. Ruff formatting on sample_assembler.
Introduce lettucedetect/generation/ with composable, source-agnostic primitives for building hallucination-detection datasets:
- injection.py: universal, taxonomy-driven injector (sync + async) that corrupts a correct answer into a hallucinated one with exact character spans, modality-aware (code/tool_output/markdown/prose) across all 13 subtypes.
- answers.py: grounded correct-answer generation (sync + async).
- runner.py: batched (asyncio.gather), resumable, failure-logging orchestration reused by every source adapter.
The code-hallucination Phase 6 injector now delegates edit application, span location, and validation to this engine, keeping its own code prompt and native labels. Verified byte-identical against the released dataset (2000/2000 entries reproduced through the shared path).
Add the squeez tool-output adapter (scripts/generate_squeez_hallucinations.py) and taxonomy category/subtype definitions. Document the taxonomy and generation pipeline in the site nav.
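To illustrate the runner described above, here is a minimal sketch of batched, resumable, failure-logging orchestration built on asyncio.gather. It is not the code in runner.py: the function name run_batched, its parameters, and the failure-log layout are assumptions made for the example.

```python
import asyncio
import json
from pathlib import Path


async def run_batched(items, process, out_path, fail_path, item_key, record_key, batch_size=8):
    """Process items in concurrent batches, skipping those already in out_path.

    All names here are hypothetical. `process` is an async callable that
    returns a JSON-serializable record; `item_key` and `record_key` map an
    input item and an output record to the same resumability key.
    """
    done = set()
    if out_path.exists():
        with out_path.open(encoding="utf-8") as f:
            done = {record_key(json.loads(line)) for line in f if line.strip()}

    todo = [item for item in items if item_key(item) not in done]
    written = 0
    with out_path.open("a", encoding="utf-8") as out, fail_path.open("a", encoding="utf-8") as fails:
        for i in range(0, len(todo), batch_size):
            batch = todo[i : i + batch_size]
            # return_exceptions=True keeps one failure from cancelling the batch.
            results = await asyncio.gather(*(process(item) for item in batch), return_exceptions=True)
            for item, result in zip(batch, results):
                if isinstance(result, Exception):
                    fails.write(json.dumps({"key": item_key(item), "error": repr(result)}) + "\n")
                else:
                    out.write(json.dumps(result) + "\n")
                    written += 1
            out.flush()  # a crash loses at most the batch in flight
    return written
```

Because failed items never reach the output file, a rerun retries them automatically while skipping everything that already succeeded.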
The batched runner no longer writes a synthetic key field into output records — resumability keys are derived from each record via a record_key callable, so samples are written verbatim in the final schema. Add lettucedetect/generation/assembly.py with balance_hallucination_ratio, a reusable primitive that trims clean samples to a target hallucinated ratio at assembly/upload time (no per-source munging scripts).
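As an illustration of trimming clean samples to a target hallucinated ratio, the sketch below keeps every hallucinated sample and down-samples the clean ones. The function name trim_clean_to_ratio, the "labels" field, and the seeded sampling are assumptions for the example; the actual balance_hallucination_ratio signature is not shown in this PR text.

```python
import random


def trim_clean_to_ratio(samples, target_ratio, seed=0):
    """Drop clean samples until hallucinated / total >= target_ratio.

    Hypothetical sketch: a sample counts as hallucinated when its
    "labels" list (an assumed field) is non-empty.
    """
    if not 0 < target_ratio <= 1:
        raise ValueError("target_ratio must be in (0, 1]")

    hallucinated = [s for s in samples if s["labels"]]
    clean = [s for s in samples if not s["labels"]]

    # H / (H + C) >= r  implies  C <= H * (1 - r) / r
    # The small epsilon guards against float results like 9.999999.
    max_clean = int(len(hallucinated) * (1 - target_ratio) / target_ratio + 1e-9)
    if len(clean) > max_clean:
        clean = random.Random(seed).sample(clean, max_clean)

    # Note: this returns hallucinated samples first, not the input order.
    return hallucinated + clean
```

Only clean samples are ever dropped, so when the input is already above the target ratio it passes through unchanged apart from ordering.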
Add lettucedetect/generation/questions.py — generates typed, self-contained questions answerable from a document, driven by an 18-type taxonomy (adapted from the acl-verbatim QA generator). Multi-part types are flagged as omission candidates. Needed by the doc-only markdown sources (wiki, READMEs). Extract the chat-completion-with-retries loop into generation/_completion.py (sync + async, with a transform callback) and route answers, injection, and questions through it, removing the duplicated retry boilerplate. Injection output is unchanged (verified byte-identical against the released dataset).
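The retry loop with a transform callback could look roughly like the following synchronous sketch. The name complete_with_retries and its parameters are hypothetical, and the model call is abstracted as a zero-argument callable so the example runs without an API client.

```python
import json
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def complete_with_retries(
    call: Callable[[], str],
    transform: Callable[[str], T],
    max_retries: int = 3,
    base_delay: float = 0.0,
) -> T:
    """Call the model and parse its output, retrying if either step fails.

    Hypothetical helper: `call` returns the raw completion text and
    `transform` parses or validates it, raising on bad output.
    """
    last_error = None
    for attempt in range(max_retries):
        try:
            return transform(call())
        except Exception as exc:  # a sketch; real code may narrow this
            last_error = exc
            if attempt < max_retries - 1:
                time.sleep(base_delay * 2**attempt)  # exponential backoff
    raise RuntimeError(f"completion failed after {max_retries} attempts") from last_error


# Usage with a fake client that returns invalid JSON once, then valid JSON.
_responses = iter(["not json", '{"question": "What does foo() return?"}'])
parsed = complete_with_retries(lambda: next(_responses), json.loads)
print(parsed["question"])
```

Treating a parse failure like a transport failure is the point of the transform callback: answers, injection, and questions differ only in how they validate the raw text.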
Add the ACL adapter (scripts/generate_acl_hallucinations.py): group acl-verbatim-spans by question, take the top-5 retrieved/gold chunks as markdown context, generate a grounded answer, and inject a paper-specific hallucination detectable against the excerpts.
Shared additions, reused by future markdown sources:
- per-edit hallucination types in apply_changes_to_answer (each edit labelled with its own type; falls back to the passed type, so existing sources are byte-identical)
- inject_menu / inject_menu_async: menu-mode injection where the model picks the fitting types, mapped to the taxonomy per source
- PAPER_MAP in taxonomy.py (NUMERICAL/ENTITY/RELATIONAL/METHODOLOGICAL/CITATIONAL -> unified categories)
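The per-edit typing with a fallback can be sketched as below, together with the character-span bookkeeping it relies on. This is an illustration under assumptions, not apply_changes_to_answer itself: the function name, the edit dictionary keys ("original", "replacement", "type"), and the first-occurrence lookup are all hypothetical.

```python
def apply_typed_edits(answer, edits, default_type):
    """Apply edits to `answer`; return the corrupted text and labelled spans.

    Hypothetical sketch. Each edit is a dict with "original" and
    "replacement" keys and an optional "type"; edits without a type
    fall back to `default_type`.
    """
    located = []
    for edit in edits:
        pos = answer.find(edit["original"])  # first occurrence only
        if pos == -1:
            raise ValueError(f"edit target not found: {edit['original']!r}")
        located.append((pos, edit))
    located.sort(key=lambda pair: pair[0])

    pieces, spans = [], []
    cursor = 0   # position in the original answer
    out_len = 0  # length of the corrupted answer built so far
    for pos, edit in located:
        if pos < cursor:
            raise ValueError("overlapping edits")
        pieces.append(answer[cursor:pos])
        out_len += pos - cursor
        start = out_len
        pieces.append(edit["replacement"])
        out_len += len(edit["replacement"])
        spans.append({
            "start": start,
            "end": out_len,
            "text": edit["replacement"],
            "type": edit.get("type") or default_type,
        })
        cursor = pos + len(edit["original"])
    pieces.append(answer[cursor:])
    return "".join(pieces), spans
```

Span offsets are computed against the corrupted answer, so `corrupted[span["start"]:span["end"]]` always equals the injected text regardless of how earlier edits changed the length.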
Reusable tool that searches popular repos across languages via the GitHub REST API, fetches and filters their READMEs (substantial, structured), and writes a resumable JSONL corpus for the README markdown source. Needs GITHUB_TOKEN for a usable rate limit; skips repos already collected.
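A rough sketch of such a collector follows, using only the standard library and the public GitHub REST endpoints for repository search and README retrieval. The function names, the JSONL field names, and the "substantial, structured" thresholds are assumptions for illustration, not the values used by the actual tool.

```python
import json
import os
import urllib.parse
import urllib.request
from pathlib import Path

API = "https://api.github.com"


def _get(url, accept):
    """GET a GitHub API URL, sending GITHUB_TOKEN when it is set."""
    headers = {"Accept": accept, "User-Agent": "readme-collector-sketch"}
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


def search_repos(language, per_page=10):
    """Return full names of the most-starred repos for one language."""
    query = urllib.parse.urlencode(
        {"q": f"language:{language}", "sort": "stars", "order": "desc", "per_page": per_page}
    )
    data = json.loads(_get(f"{API}/search/repositories?{query}", "application/vnd.github+json"))
    return [item["full_name"] for item in data["items"]]


def fetch_readme(full_name):
    """Fetch the raw README text of a repository."""
    return _get(f"{API}/repos/{full_name}/readme", "application/vnd.github.raw+json")


def is_substantial(readme, min_chars=2000, min_headings=3):
    """Heuristic filter; thresholds are illustrative assumptions."""
    headings = sum(1 for line in readme.splitlines() if line.startswith("#"))
    return len(readme) >= min_chars and headings >= min_headings


def already_collected(path):
    """Read repo names from an existing JSONL corpus so reruns can skip them."""
    if not path.exists():
        return set()
    with path.open(encoding="utf-8") as f:
        return {json.loads(line)["repo"] for line in f if line.strip()}


def collect(languages, out_path):
    done = already_collected(out_path)
    with out_path.open("a", encoding="utf-8") as out:
        for language in languages:
            for repo in search_repos(language):
                if repo in done:
                    continue
                readme = fetch_readme(repo)
                if is_substantial(readme):
                    out.write(json.dumps({"repo": repo, "language": language, "readme": readme}) + "\n")
                    done.add(repo)
```

Appending one JSON object per line is what makes the corpus resumable: an interrupted run leaves a valid file, and the next run rebuilds the skip set from it.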
Add generation/doc_source.py: the shared document-based flow (chunk by heading -> typed question -> grounded answer -> menu injection -> assemble), batched and resumable, with a generic factual markdown injection prompt as the default. READMEs and (upcoming) Wikipedia are thin configs over it. Add the README adapter (reads the collected README corpus, repo-level train/dev/test split, developer-style question subset). The generic factual injection suits heterogeneous README content far better than a dev-doc schema. Repurpose MARKDOWN_MAP to the generic factual types.
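The first stage of that flow, chunking by heading, might be sketched as follows. The function name and the chunk dictionary shape are assumptions, and only ATX-style headings (lines starting with #) are handled.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_heading(markdown):
    """Split markdown into chunks, one per heading section.

    Hypothetical sketch: text before the first heading becomes a chunk
    with an empty heading, and '#' lines inside fenced code blocks are
    not treated as headings.
    """
    chunks = []
    title, lines = "", []
    in_fence = False

    def flush():
        if any(line.strip() for line in lines):
            chunks.append({"heading": title, "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks
```

Tracking fence state matters for READMEs in particular, where shell comments inside code blocks would otherwise be mistaken for headings.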
Stream the English open-wikipedia-markdown parquet shards (the dataset script is broken, so load parquet directly), sample substantial articles, and run the shared doc-source pipeline with factual question types and the generic factual injection. Add a shared hash_split helper for document-level train/dev/test splitting (README now uses it too).
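A document-level hash split can be sketched as below. The signature and the default fractions are assumptions, not the actual hash_split helper; the idea is that the split depends only on the document id, so every chunk of a document lands in the same split and the assignment is stable across runs.

```python
import hashlib


def hash_split_sketch(doc_id, dev_frac=0.1, test_frac=0.1):
    """Deterministically map a document id to train/dev/test.

    Hypothetical sketch: hashes the id to a number in [0, 1) and
    compares it against cumulative split fractions.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + dev_frac:
        return "dev"
    return "train"
```

A cryptographic hash is used instead of Python's built-in hash() because the latter is salted per process for strings and would reshuffle the splits on every run.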
The source acl-verbatim test config has only ~17 answerable questions, so the ACL test split was tiny (3 hallucinated). Pool all source questions and assign train/dev/test by hashing the paper id (paper-separated, no leakage), giving a real test set (~440 questions / ~117 hallucinated).
Update the generation pipeline doc to list the five sources now built (code, tool-output, ACL, README, Wikipedia) with their modality, question source, and injection mode.
Add classify.py: an LLM that types an already-annotated span into the unified taxonomy, for sources that ship untyped spans (inverse of the injector). Use it to fold PsiloQA (natural, multilingual hallucinations) into the taxonomy via classify_psiloqa_spans.py; RAGTruth maps mechanically. Add build_hf_dataset.py, a reusable assembler that merges data/v2 sources into a DatasetDict and pushes it (dev->validation, metadata dict serialized to a JSON string).
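The assembly step could be sketched as follows. The record fields ("split", "metadata") and function names are assumptions; the only library calls used are Dataset.from_list and DatasetDict from the Hugging Face datasets package, imported lazily so the preparation logic runs without it.

```python
import json

# dev is renamed to the split name conventionally used on the Hub.
SPLIT_RENAMES = {"dev": "validation"}


def prepare_record(record):
    """Copy a record, serializing its metadata dict to a JSON string."""
    out = dict(record)
    out["metadata"] = json.dumps(record.get("metadata", {}), sort_keys=True, ensure_ascii=False)
    return out


def prepare_splits(records):
    """Group prepared records by (renamed) split name."""
    splits = {}
    for record in records:
        name = SPLIT_RENAMES.get(record["split"], record["split"])
        splits.setdefault(name, []).append(prepare_record(record))
    return splits


def to_dataset_dict(splits):
    """Build a DatasetDict; requires the `datasets` package."""
    from datasets import Dataset, DatasetDict

    return DatasetDict({name: Dataset.from_list(rows) for name, rows in splits.items()})
```

Serializing metadata to a string is a plausible design choice here because each source carries different metadata keys, and a single string column avoids having to reconcile them into one nested schema. Pushing would then be a call to `push_to_hub` on the returned DatasetDict.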
The baked prompt put the user request last ("...User request: {q}"), where
truncation=only_first clips it on long inputs, and never exposed context or
question separately. Add context+question fields to HallucinationSample, a shared
format_prompt() that builds the prompt question-first (truncation-safe), and wire
every adapter to emit context/question. Add canonicalize_prompts.py to backfill
existing data/v2 in place (idempotent, no LLM). Pack Wikipedia heading sections
into larger chunks so contexts are no longer too short.
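The effect of question-first ordering can be shown with a small sketch. The two formatter names are hypothetical, and clip() is a character-level stand-in for tokenizer truncation, which in practice cuts tokens from the end of the first sequence.

```python
def format_question_first(context, question):
    """Question-first layout, following the template in this PR."""
    return f"User request: {question}\n\n{context}"


def format_context_first(context, question):
    """The earlier layout, with the request placed after the context."""
    return f"{context}\n\nUser request: {question}"


def clip(text, max_chars):
    """Stand-in for truncation: keep only the first max_chars characters."""
    return text[:max_chars]


context = "line of tool output\n" * 500
question = "Which test failed?"

# The question survives clipping only when it comes first.
print(question in clip(format_question_first(context, question), 256))
print(question in clip(format_context_first(context, question), 256))
```

With a long context, the question-first prompt still contains the request after clipping, while the context-first prompt loses it entirely.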
a265c45 to
1080d52
These were early monolithic prototypes (Groq/Kimi) and one-off helpers superseded by the modular scripts/code_hallucination/ pipeline and lettucedetect/generation/. None were referenced by code, docs, or CI.
- validator.py now imports _extract_code_regions / _span_is_in_code / _max_allowed_coverage from the canonical injector instead of keeping drifted copies (behavior preserved: long-answer cap stays 0.30)
- config.set_output_dir was missing 'global INJECTION_FAILURES_PATH', so redirecting the output dir silently left that one path at the default
- drop a dead local variable in the injector's sequential path
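The set_output_dir bug is a general Python scoping pitfall, reproduced in the sketch below. The module layout, the second path name, and the file names are hypothetical; only INJECTION_FAILURES_PATH and the missing global declaration come from the commit message.

```python
# Module-level configuration (values are illustrative).
OUTPUT_DIR = "out"
INJECTION_FAILURES_PATH = "out/injection_failures.jsonl"


def set_output_dir_buggy(path):
    global OUTPUT_DIR
    OUTPUT_DIR = path
    # Without a `global` declaration this assignment creates a local
    # variable, so the module-level path silently keeps its old value.
    INJECTION_FAILURES_PATH = f"{path}/injection_failures.jsonl"


def set_output_dir_fixed(path):
    global OUTPUT_DIR, INJECTION_FAILURES_PATH
    OUTPUT_DIR = path
    INJECTION_FAILURES_PATH = f"{path}/injection_failures.jsonl"


def current_paths():
    """Return the module-level values as other code would see them."""
    return OUTPUT_DIR, INJECTION_FAILURES_PATH
```

After the buggy call, OUTPUT_DIR points at the new directory while the failures path still points at the default, which matches the silent behavior described in the commit.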
Hallucination detection for agentic coding workflows
Builds a span-level hallucination-detection benchmark for agentic coding workflows: given the grounded artifacts an assistant sees at inference (repository source, tool output, retrieved docs), localize the unsupported spans in its answer. Spans across every source map into one unified taxonomy, and two public prose datasets are folded in as a complementary collection.
What's here
Unified taxonomy (lettucedetect/datasets/taxonomy.py)
- Hallucination categories (contradiction, unsupported_addition, fabricated_reference) + supported + document-level omission; 13 subtypes. Every source maps into one label space.
Shared generation pipeline (lettucedetect/generation/)
- classify.py: an LLM that types an already-annotated span into the taxonomy, for sources that ship untyped spans (the inverse of the injector).
Datasets
- KRLabsOrg/lettucedetect-code-hallucination (79,591 samples)
- KRLabsOrg/lettucedetect-prose-hallucination (87,834): PsiloQA (natural, 14 languages) + RAGTruth, classified into the same taxonomy.
Prompt format
- Samples expose context and question separately, and the prompt places the request first (User request: {question}\n\n{context}) so it is never lost when a long context is truncated. Backfilled existing data in place and updated all adapters.
Tooling
- scripts/build_hf_dataset.py: reusable assembler that merges data/v2 sources into a DatasetDict and pushes it (metadata serialized to a JSON string).
Repo hygiene
- validator.py now imports the canonical span/coverage helpers from the injector instead of keeping drifted copies.
- Fixed config.set_output_dir (missing global INJECTION_FAILURES_PATH).
Not in scope yet