feat: Add Telegram file/image support with vision, OCR, and document-to-context pipeline #31
Draft
Copilot AI changed the title from "[WIP] Add support for processing files and images from Telegram" to "feat: Add Telegram file/image support with vision, OCR, and document-to-context pipeline" on Mar 25, 2026
Owner
@copilot use ocrs (https://github.com/robertknight/ocrs) for image OCR instead of Tesseract
…rs OCR
Co-authored-by: chinkan <16433287+chinkan@users.noreply.github.com>
Agent-Logs-Url: https://github.com/chinkan/RustFox/sessions/9c502a20-dd1e-47bb-8b4d-34cf2f26c7f1
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add MessageContent enum supporting Text(String) and Parts(Vec<ContentPart>) to replace Option<String> in ChatMessage.content. Backwards-compatible: Text variant serializes as plain JSON string, Parts as JSON array.
- Add ContentPart, ImageUrlContent, MessageContent types to llm.rs
- Update ChatMessage.content: Option<String> → Option<MessageContent>
- Fix all usage sites in agent.rs, memory/conversations.rs, memory/query_rewriter.rs, memory/summarizer.rs, memory/rag.rs
- DB storage continues to use extracted plain text via as_text()
- All 113 tests pass
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
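To illustrate the shape this commit describes, here is a minimal std-only sketch of a `MessageContent` type with an `as_text()` helper. The variant layout of `ContentPart`, the `url` field on `ImageUrlContent`, and the newline join in `as_text()` are assumptions made for illustration; the serde attributes that would produce the string-or-array JSON form are omitted, so this is not the PR's actual code.

```rust
/// One part of a multi-modal message.
/// Variant and field names are assumptions based on the commit message.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentPart {
    Text { text: String },
    ImageUrl { image_url: ImageUrlContent },
}

/// Image reference; for inline images this would typically hold a
/// `data:<mime>;base64,...` URL (assumed layout).
#[derive(Debug, Clone, PartialEq)]
pub struct ImageUrlContent {
    pub url: String,
}

/// Either a plain string or a list of parts.
#[derive(Debug, Clone, PartialEq)]
pub enum MessageContent {
    Text(String),
    Parts(Vec<ContentPart>),
}

impl MessageContent {
    /// Flatten to plain text, skipping image parts.
    /// Joining text parts with a newline is an assumption.
    pub fn as_text(&self) -> String {
        match self {
            MessageContent::Text(s) => s.clone(),
            MessageContent::Parts(parts) => parts
                .iter()
                .filter_map(|p| match p {
                    ContentPart::Text { text } => Some(text.as_str()),
                    ContentPart::ImageUrl { .. } => None,
                })
                .collect::<Vec<_>>()
                .join("\n"),
        }
    }
}
```

A flattening helper of this kind lets storage code keep working with plain strings while the LLM request path carries image parts.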
- Add `supports_vision` bool field to `OpenRouterConfig` (default false)
- Add `OcrConfig` struct with `model_dir` (default $HOME/.cache/ocrs)
- Add `ocr` field to `Config` with serde default
- Add 3 tests: supports_vision defaults false, parses true, ocr default dir
- Update config.example.toml with comments for new fields
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
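The defaults listed in this commit could be sketched as follows, using only the standard library. The helper `default_model_dir` and the fallback to the current directory when `HOME` is unset are hypothetical; the real code uses serde defaults, which are left out here.

```rust
use std::env;
use std::path::PathBuf;

/// OCR settings; `model_dir` is where the ocrs model files are cached.
#[derive(Debug, Clone, PartialEq)]
pub struct OcrConfig {
    pub model_dir: PathBuf,
}

/// Hypothetical helper: build `<home>/.cache/ocrs`.
/// Falling back to "." when no home directory is known is an assumption.
pub fn default_model_dir(home: Option<&str>) -> PathBuf {
    let base = home.map(PathBuf::from).unwrap_or_else(|| PathBuf::from("."));
    base.join(".cache").join("ocrs")
}

impl Default for OcrConfig {
    fn default() -> Self {
        let home = env::var("HOME").ok();
        OcrConfig {
            model_dir: default_model_dir(home.as_deref()),
        }
    }
}

/// Only the new field is shown; the real struct has more fields.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct OpenRouterConfig {
    /// `bool::default()` is `false`, matching the stated default.
    pub supports_vision: bool,
}
```

Keeping the path construction in a separate function makes the default testable without touching the process environment.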
- Add src/file_processor/mod.rs with process_attachments() entry point
- Image handling: base64 vision parts (supports_vision=true) or OCR via ocrs
- OCR: auto-downloads text-detection/recognition .rten models on first use
- PDF: text extraction via pdf_extract::extract_text_from_mem
- DOCX: text extraction by traversing docx_rs document children
- Long context (>6000 chars): chunk+store in MemoryStore, RAG-retrieve
- Unit tests for chunk_text (4 cases)
- Register module via mod file_processor; in src/main.rs
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
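The long-context step in this commit relies on a `chunk_text` helper whose body is not shown on this page. The sketch below is one plausible way to write it: the signature, the whitespace-based splitting, and counting the 6000 limit in characters rather than bytes are all assumptions.

```rust
/// Threshold taken from the commit message (">6000 chars").
pub const LONG_CONTEXT_THRESHOLD: usize = 6000;

/// True when extracted text is long enough to be chunked and stored
/// instead of being injected directly (character count is an assumption).
pub fn needs_chunking(text: &str) -> bool {
    text.chars().count() > LONG_CONTEXT_THRESHOLD
}

/// Split `text` into chunks of at most `max_chars` characters, breaking
/// at whitespace where possible. Hypothetical signature and strategy.
pub fn chunk_text(text: &str, max_chars: usize) -> Vec<String> {
    assert!(max_chars > 0, "max_chars must be positive");
    let mut chunks: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut current_len = 0usize; // length in chars, not bytes

    for word in text.split_whitespace() {
        let chars: Vec<char> = word.chars().collect();
        // Words longer than `max_chars` are hard-split into pieces.
        for piece in chars.chunks(max_chars) {
            let sep = if current.is_empty() { 0 } else { 1 };
            if current_len + sep + piece.len() > max_chars {
                // Current chunk is full: emit it and start a new one.
                chunks.push(std::mem::take(&mut current));
                current_len = 0;
            }
            if !current.is_empty() {
                current.push(' ');
                current_len += 1;
            }
            current.extend(piece.iter());
            current_len += piece.len();
        }
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Note that this version normalizes runs of whitespace to single spaces, which is usually acceptable for text that is only embedded and retrieved.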
…comment on pdf unwrap_or_default
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Replace early-return for non-text messages with caption fallback
- Add download_telegram_file helper (creates temp dir, fetches via Telegram API, detects MIME with infer crate)
- Add classify_attachment_kind and mime_from_extension helpers
- Collect photo and document attachments into Vec<Attachment>
- Pass attachments to IncomingMessage instead of empty vec
- Clean up temp dir after agent processing completes
- Add 5 unit tests for classify_attachment_kind
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
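The two helpers named in this commit might look roughly like the sketch below. The function names come from the commit message, but the signatures, the `AttachmentKind` enum name, and the exact set of extensions and MIME types handled are assumptions.

```rust
use std::path::Path;

/// Attachment categories, mirroring the kinds listed in the PR
/// description (Image, Pdf, Docx, Other). Enum name is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AttachmentKind {
    Image,
    Pdf,
    Docx,
    Other,
}

const DOCX_MIME: &str =
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

/// Fallback MIME guess from the file extension, for cases where
/// content sniffing gives no answer. The extension list is illustrative.
pub fn mime_from_extension(path: &Path) -> Option<&'static str> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "jpg" | "jpeg" => Some("image/jpeg"),
        "png" => Some("image/png"),
        "gif" => Some("image/gif"),
        "webp" => Some("image/webp"),
        "pdf" => Some("application/pdf"),
        "docx" => Some(DOCX_MIME),
        _ => None,
    }
}

/// Map a MIME type to the processing route it should take.
pub fn classify_attachment_kind(mime_type: &str) -> AttachmentKind {
    match mime_type {
        m if m.starts_with("image/") => AttachmentKind::Image,
        "application/pdf" => AttachmentKind::Pdf,
        DOCX_MIME => AttachmentKind::Docx,
        _ => AttachmentKind::Other,
    }
}
```

Classifying on the MIME type rather than the file name means a mislabeled upload is routed by what it actually contains, provided the MIME type came from content detection.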
…match
Co-authored-by: chinkan <16433287+chinkan@users.noreply.github.com>
Agent-Logs-Url: https://github.com/chinkan/RustFox/sessions/9c502a20-dd1e-47bb-8b4d-34cf2f26c7f1
Contributor
Author
Switched OCR from tesseract to
Warning: Firewall rules blocked me from connecting to one or more addresses.
I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
Previously, the Telegram integration supported only text messages. This PR adds handling for photos and file attachments (PDF, DOCX, images), routing them through a vision, OCR, or document-extraction pipeline before injecting the result into the LLM context.
Platform Layer
- `IncomingMessage` gains `attachments: Vec<Attachment>` (kind: `Image|Pdf|Docx|Other`, `path`, `mime_type`)
- `telegram.rs` now handles `msg.photo()` (highest-res variant) and `msg.document()`, downloading to per-request temp dirs that are cleaned up after processing
LLM Message Model (`llm.rs`)
- `ChatMessage.content` promoted from `Option<String>` to `MessageContent`: either a plain string or `Vec<ContentPart>` (text/image_url)
File Processor (`src/file_processor/`)
- Images: if `openrouter.supports_vision = true`, base64-encode into multi-modal content parts; otherwise run OCR via `ocrs` (pure Rust, neural-network-based OCR engine with no system library dependencies) and inject the result as text
- PDF: text extraction via `pdf-extract`
- DOCX: text extraction via `docx-rs`
- Long context (>6000 chars): text is chunked and stored in the `EmbeddingEngine` + `sqlite-vec` knowledge store, and top-K chunks are RAG-retrieved per user query; no new DB required
OCR
Uses `ocrs`, a pure Rust neural-network OCR engine backed by the `rten` inference runtime. Models (text detection + recognition) are downloaded automatically from S3 on first use and cached in `~/.cache/ocrs/`. No native system library (e.g. Tesseract) is required.
Config
New Dependencies
`ocrs`, `rten` (`ocrs` model files), `image`, `pdf-extract`, `docx-rs`, `infer`, `base64`
Original prompt