| 1 | +# Media Recovery System — Agent Reference Guide |
| 2 | + |
| 3 | +> **Branch:** `fix/media-key-jsonb-updateMediaMessage` |
| 4 | +> **4 commits** on top of `release/2.3.7` — all production-tested. |
| 5 | +
|
| 6 | +## Overview |
| 7 | + |
| 8 | +This patch adds three new endpoints and two bug fixes to Evolution API (EA), making it **self-sufficient** for WhatsApp media recovery without external orchestrators. |
| 9 | + |
| 10 | +### Problem Solved |
| 11 | + |
| 12 | +WhatsApp CDN URLs expire after ~7 days. Once expired, media files (SOR documents, images, etc.) become permanently inaccessible unless you: |
| 13 | +1. Have the original `mediaKey` + `directPath` (stored in EA's Message table) |
| 14 | +2. Can trigger WhatsApp's `updateMediaMessage` protocol (asks sender to re-upload) |
| 15 | +3. Can request historical messages via `fetchMessageHistory` (on-demand history sync) |
| 16 | + |
| 17 | +Previously, only OwnPilot had these capabilities. Now EA has them natively. |
| 18 | + |
| 19 | +--- |
| 20 | + |
| 21 | +## Endpoints |
| 22 | + |
| 23 | +### 1. `POST /chat/retryMediaFromMetadata/{instance}` |
| 24 | + |
| 25 | +**Purpose:** Download media using caller-supplied metadata. Does NOT require the message to exist in EA's DB. |
| 26 | + |
| 27 | +**When to use:** |
| 28 | +- You have `mediaKey` + `directPath` from an external source (e.g., OwnPilot DB) |
| 29 | +- The message exists in EA but `getBase64FromMediaMessage` fails (DB lookup issue) |
| 30 | + |
| 31 | +**Request:** |
| 32 | +```json |
| 33 | +{ |
| 34 | + "messageId": "3EB0D228037ED522E72774", |
| 35 | + "remoteJid": "120363423491841999@g.us", |
| 36 | + "participant": "119365882089638@lid", |
| 37 | + "fromMe": false, |
| 38 | + "mediaKey": "base64-encoded-key", |
| 39 | + "directPath": "/v/t62.7119-24/...", |
| 40 | + "url": "https://mmg.whatsapp.net/...", |
| 41 | + "mimeType": "application/octet-stream", |
| 42 | + "filename": "2314CP_82_V1.SOR", |
| 43 | + "fileLength": 20973, |
| 44 | + "convertToMp4": false |
| 45 | +} |
| 46 | +``` |
| 47 | + |
| 48 | +**Response:** |
| 49 | +```json |
| 50 | +{ |
| 51 | + "base64": "TWFwAMgAfA...", |
| 52 | + "mimetype": "application/octet-stream", |
| 53 | + "filename": "2314CP_82_V1.SOR" |
| 54 | +} |
| 55 | +``` |
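A client call to this endpoint might look like the sketch below. It only builds the request and decodes the response shown above. The `apikey` header, the base URL, and both function names are assumptions made for illustration and are not taken from the patch.

```python
import base64
import json
from typing import Any, Dict, Tuple
from urllib.request import Request


def build_retry_request(base_url: str, instance: str, api_key: str,
                        metadata: Dict[str, Any]) -> Request:
    """Build (but do not send) the POST request for retryMediaFromMetadata."""
    url = f"{base_url.rstrip('/')}/chat/retryMediaFromMetadata/{instance}"
    headers = {
        "Content-Type": "application/json",
        "apikey": api_key,  # assumption: header-based API key auth
    }
    body = json.dumps(metadata).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")


def decode_media_response(payload: Dict[str, Any]) -> Tuple[str, bytes]:
    """Return (filename, raw bytes) from the endpoint's JSON response."""
    return payload["filename"], base64.b64decode(payload["base64"])
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and parsing the body with `json.load` before handing it to `decode_media_response`.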
| 56 | + |
| 57 | +**Algorithm:** |
| 58 | +1. Reconstruct minimal WAMessage proto from provided metadata |
| 59 | +2. Try direct `downloadMediaMessage` (fast-path — CDN still valid) |
| 60 | +3. On failure → explicit `updateMediaMessage` with 30s timeout (Baileys RC9 workaround) |
| 61 | +4. Retry download with refreshed URL |
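The four steps above can be sketched as a small control-flow illustration. This is Python, not the service's TypeScript: `download` and `refresh` are hypothetical callables standing in for `downloadMediaMessage` and `updateMediaMessage`, and `MediaRecoveryError` stands in for the `BadRequestException` raised on timeout.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable


class MediaRecoveryError(Exception):
    """Raised when the re-upload request does not complete in time."""


def download_with_refresh(download: Callable[[], bytes],
                          refresh: Callable[[], None],
                          refresh_timeout_s: float = 30.0) -> bytes:
    # Fast path: the CDN URL may still be valid.
    try:
        return download()
    except Exception:
        pass  # fall through to the explicit refresh

    # Slow path: ask the sender to re-upload, but give up after the timeout.
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        pool.submit(refresh).result(timeout=refresh_timeout_s)
    except FutureTimeout as exc:
        raise MediaRecoveryError(
            f"media refresh timed out after {refresh_timeout_s}s") from exc
    finally:
        pool.shutdown(wait=False)

    # Retry once with the refreshed URL; a second failure propagates.
    return download()
```

The refresh is attempted only after the fast path fails, so messages whose CDN URL is still valid never pay the 30s worst case.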
| 62 | + |
| 63 | +**Edge Cases:** |
| 64 | +- `mediaKey` from PostgreSQL JSONB may be stored as a `{0: 123, 1: 45, ...}` object instead of a Uint8Array. The code handles both formats via the numeric-sort fix (commit `262c9300`) |
| 65 | +- `updateMediaMessage` times out after 30s if sender is permanently offline — throws `BadRequestException` |
| 66 | +- Audio files with `convertToMp4: true` are processed via `processAudioMp4` |
| 67 | + |
| 68 | +--- |
| 69 | + |
| 70 | +### 2. `POST /chat/fetchGroupHistory/{instance}` |
| 71 | + |
| 72 | +**Purpose:** Trigger WhatsApp on-demand history sync for a group. WhatsApp responds with old message protos containing **fresh mediaKey + directPath**. |
| 73 | + |
| 74 | +**When to use:** |
| 75 | +- Messages are missing from EA's DB (were sent before EA was connected) |
| 76 | +- You need fresh mediaKeys for messages whose CDN URLs expired |
| 77 | + |
| 78 | +**Request:** |
| 79 | +```json |
| 80 | +{ |
| 81 | + "groupJid": "120363423491841999@g.us", |
| 82 | + "count": 50, |
| 83 | + "anchorMessageId": "3EB0DCCA32F22B9AA2A3B4", |
| 84 | + "anchorTimestamp": 1765216930, |
| 85 | + "anchorFromMe": false, |
| 86 | + "anchorParticipant": "90383560261829@lid" |
| 87 | +} |
| 88 | +``` |
| 89 | + |
| 90 | +**Response (immediate — 202-style):** |
| 91 | +```json |
| 92 | +{ |
| 93 | + "sessionId": "3EB006B411C1B0933F9410", |
| 94 | + "groupJid": "120363423491841999@g.us", |
| 95 | + "count": 50, |
| 96 | + "message": "History sync requested. WhatsApp will deliver messages via messaging-history.set event (async)." |
| 97 | +} |
| 98 | +``` |
| 99 | + |
| 100 | +**Algorithm:** |
| 101 | +1. Validate groupJid ends with `@g.us` |
| 102 | +2. Rate-limit check (1 call per 30 seconds) |
| 103 | +3. Call `sock.fetchMessageHistory(count, anchorKey, anchorTimestamp)` |
| 104 | +4. WhatsApp delivers messages asynchronously via `messaging-history.set` event |
| 105 | +5. Messages are stored in DB if `DATABASE_SAVE_DATA_HISTORIC=true` |
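Steps 1 and 2 (JID validation and the 30-second rate limit) could look roughly like this. It is a sketch only: the class and function names are hypothetical, `ValueError` stands in for `BadRequestException`, and a single global limiter is assumed because the guide does not say whether the limit applies per instance or per group.

```python
import time
from typing import Callable, Optional


def validate_group_jid(group_jid: str) -> None:
    """Reject anything that is not a group JID."""
    if not group_jid.endswith("@g.us"):
        raise ValueError(f"groupJid must end with @g.us: {group_jid!r}")


class HistoryRateLimiter:
    """Allow one call per `min_interval_s` seconds."""

    def __init__(self, min_interval_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_call: Optional[float] = None

    def check(self) -> None:
        now = self._clock()
        if self._last_call is not None:
            remaining = self._min_interval_s - (now - self._last_call)
            if remaining > 0:
                # Report the wait time, as the endpoint does.
                raise ValueError(f"Rate limited: wait {remaining:.0f}s")
        # Only accepted calls move the window forward.
        self._last_call = now
```

Injecting the clock keeps the limiter testable without real waiting.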
| 106 | + |
| 107 | +**CRITICAL Prerequisites:** |
| 108 | +- `DATABASE_SAVE_DATA_HISTORIC=true` must be set in env — otherwise messages arrive but are NOT saved to DB |
| 109 | +- `daysLimitImportMessages` in Chatwoot config should be high (e.g., 1000) — otherwise old messages are filtered out |
| 110 | +- EA must be the **sole linked device** on the WhatsApp number — if another client (e.g., OwnPilot) is connected, WhatsApp may route the response to that client instead |
| 111 | + |
| 112 | +**Edge Cases:** |
| 113 | +- Rate limited: 1 call per 30 seconds. Calling faster throws `BadRequestException` with wait time |
| 114 | +- Empty anchor (`anchorMessageId: ""`) — WhatsApp may not respond at all |
| 115 | +- WhatsApp returns messages OLDER than the anchor (backward direction only) |
| 116 | +- Duplicate messages are handled by `messagesRepository.has(m.key.id)` check — no duplicates in DB |
| 117 | +- Max 50 messages per call (WhatsApp protocol limit) |
| 118 | +- Response is async — poll DB count or check logs to verify delivery |
| 119 | + |
| 120 | +**Iterative Fetching Pattern:** |
| 121 | +``` |
| 122 | +1. Find oldest message in DB → use as anchor |
| 123 | +2. Call fetchGroupHistory |
| 124 | +3. Wait 35s (30s rate-limit + 5s buffer) |
| 125 | +4. Check if DB count increased |
| 126 | +5. If increased → repeat from step 1 (new oldest message = new anchor) |
| 127 | +6. If no increase → reached beginning of history |
| 128 | +``` |
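The loop above might be driven from a client like this. It is a minimal sketch: the three callables (`get_oldest_anchor`, `request_history`, `count_messages`) are hypothetical hooks the caller supplies, for example a DB query and a call to `fetchGroupHistory`, and `max_rounds` is an added safety cap that the original pattern does not include.

```python
import time
from typing import Any, Callable, Dict, Optional


def backfill_history(get_oldest_anchor: Callable[[], Optional[Dict[str, Any]]],
                     request_history: Callable[[Dict[str, Any]], None],
                     count_messages: Callable[[], int],
                     wait_s: float = 35.0,
                     max_rounds: int = 100,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Repeat history requests until the DB stops growing; return rounds made."""
    rounds = 0
    while rounds < max_rounds:
        anchor = get_oldest_anchor()      # step 1: oldest message as anchor
        if anchor is None:
            break                         # nothing to anchor on
        before = count_messages()
        request_history(anchor)           # step 2: call fetchGroupHistory
        rounds += 1
        sleep(wait_s)                     # step 3: 30s rate limit + 5s buffer
        if count_messages() <= before:    # steps 4-6: stop when nothing new
            break
    return rounds
```

Because delivery is asynchronous, a round with no increase can also mean the response has not arrived yet, so a longer `wait_s` may be worth trying before concluding that the history is exhausted.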
| 129 | + |
| 130 | +--- |
| 131 | + |
| 132 | +### 3. `POST /chat/batchRecoverMedia/{instance}` |
| 133 | + |
| 134 | +**Purpose:** End-to-end batch recovery pipeline. For each message: DB lookup → download → MinIO upload → media record → mediaUrl update. |
| 135 | + |
| 136 | +**When to use:** |
| 137 | +- You have message IDs in EA's DB with expired CDN URLs |
| 138 | +- You want to permanently store media in MinIO (S3) and update DB references |
| 139 | + |
| 140 | +**Request:** |
| 141 | +```json |
| 142 | +{ |
| 143 | + "messageIds": ["3EB0D228037ED522E72774", "3EB0DCCA32F22B9AA2A3B4"], |
| 144 | + "continueOnError": true, |
| 145 | + "storeToMinIO": true |
| 146 | +} |
| 147 | +``` |
| 148 | + |
| 149 | +**Response:** |
| 150 | +```json |
| 151 | +{ |
| 152 | + "total": 2, |
| 153 | + "ok": 1, |
| 154 | + "skip": 1, |
| 155 | + "error": 0, |
| 156 | + "results": [ |
| 157 | + { |
| 158 | + "messageId": "3EB0D228037ED522E72774", |
| 159 | + "status": "ok", |
| 160 | + "mediaUrl": "http://minio:9000/evolution-media/..." |
| 161 | + }, |
| 162 | + { |
| 163 | + "messageId": "3EB0DCCA32F22B9AA2A3B4", |
| 164 | + "status": "skip", |
| 165 | + "error": "Already stored in MinIO" |
| 166 | + } |
| 167 | + ] |
| 168 | +} |
| 169 | +``` |
| 170 | + |
| 171 | +**Algorithm per message:** |
| 172 | +1. Fetch message from DB by `key.id` + `instanceId` |
| 173 | +2. Extract media metadata from `documentMessage | imageMessage | videoMessage | audioMessage | stickerMessage` |
| 174 | +3. Skip if no `mediaKey`/`directPath` |
| 175 | +4. Skip if `mediaUrl` already points to non-WhatsApp URL (already in MinIO) |
| 176 | +5. Handle JSONB mediaKey format: `Object.keys().sort((a,b)=>parseInt(a)-parseInt(b))` for numeric key ordering |
| 177 | +6. Call `retryMediaFromMetadata` with `getBuffer=true` |
| 178 | +7. Upload buffer to MinIO via `s3Service.uploadFile` |
| 179 | +8. Upsert `Media` record in DB |
| 180 | +9. Update `message.mediaUrl` in the document message content |
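The skip decisions in steps 2 to 4 can be illustrated with the sketch below. The record layout (a `message` dict holding the typed media content plus `mediaUrl`), the `whatsapp.net` host test, and every reason string except "Already stored in MinIO" are assumptions made for illustration.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse

MEDIA_TYPES = ("documentMessage", "imageMessage", "videoMessage",
               "audioMessage", "stickerMessage")


def skip_reason(record: Dict[str, Any]) -> Optional[str]:
    """Return why a message should be skipped, or None if it should be recovered."""
    message = record.get("message") or {}

    # Step 2: find the first media payload among the supported types.
    media = next((message[t] for t in MEDIA_TYPES if message.get(t)), None)
    if media is None:
        return "No media content"

    # Step 3: without mediaKey and directPath the file cannot be decrypted.
    if not media.get("mediaKey") or not media.get("directPath"):
        return "Missing mediaKey/directPath"

    # Step 4: a mediaUrl outside WhatsApp's CDN means it was already stored.
    host = urlparse(message.get("mediaUrl") or "").hostname or ""
    if host and not host.endswith("whatsapp.net"):
        return "Already stored in MinIO"
    return None
```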
| 181 | + |
| 182 | +**Edge Cases:** |
| 183 | +- JSONB mediaKey sort: PostgreSQL stores `{0:x, 1:y, 10:z, 2:w}` — lexicographic sort gives wrong byte order. Numeric sort fix applied. |
| 184 | +- `continueOnError: false` — stops at first failure, returns partial results |
| 185 | +- `storeToMinIO: false` — downloads but doesn't upload (useful for testing) |
| 186 | +- S3 not enabled — downloads and reports size but doesn't upload |
| 187 | +- Message not found in DB → `status: "skip"` |
| 188 | +- Empty buffer after download → `status: "error"` |
| 189 | +- Presigned URLs in `mediaUrl` expire after 7 days — but the object persists in MinIO. Generate new presigned URL via `s3Service.getObjectUrl()` |
| 190 | + |
| 191 | +**Batch Processing Pattern:** |
| 192 | +```python |
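| 193 | +import time, requests  # message_ids, base_url, instance, headers are supplied by the caller |
| 194 | +for i in range(0, len(message_ids), 10):  # recommended: 10 per batch |
| 195 | +    response = requests.post(f"{base_url}/chat/batchRecoverMedia/{instance}", json={"messageIds": message_ids[i:i + 10]}, headers=headers) |
| 196 | +    time.sleep(2)  # 1-2s delay between batches; each batch takes ~10-30s depending on CDN/updateMediaMessage |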
| 197 | +``` |
| 198 | + |
| 199 | +--- |
| 200 | + |
| 201 | +## Bug Fixes (included in this patch) |
| 202 | + |
| 203 | +### Fix 1: JSONB mediaKey Sort (commit `262c9300`) |
| 204 | + |
| 205 | +**Problem:** PostgreSQL stores a Uint8Array as a JSONB object `{0: 182, 1: 45, 10: 67, 2: 99, ...}`. Reading its keys in lexicographic (string) order, `["0", "1", "10", "2", ...]`, yields the wrong byte sequence, so HKDF decryption fails. |
| 206 | + |
| 207 | +**Fix:** `Object.keys(mediaKey).sort((a, b) => parseInt(a) - parseInt(b)).map(k => mediaKey[k])` |
| 208 | + |
| 209 | +**Affected:** `getBase64FromMediaMessage` + `batchRecoverMedia` |
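For callers that read `mediaKey` straight from the database, the same numeric ordering can be applied on the client side. The sketch below is a Python analogue of the fix, not the patched TypeScript. The function name is hypothetical, and accepting base64 strings and plain byte lists is an added convenience.

```python
import base64
from typing import Any, Mapping


def media_key_to_bytes(media_key: Any) -> bytes:
    """Normalize a mediaKey from any of its stored shapes into raw bytes."""
    if isinstance(media_key, (bytes, bytearray)):
        return bytes(media_key)
    if isinstance(media_key, str):
        # Request payloads carry the key base64-encoded.
        return base64.b64decode(media_key)
    if isinstance(media_key, Mapping):
        # JSONB shape {"0": 182, "1": 45, "10": 67, "2": 99}:
        # order keys by numeric value, not as strings.
        ordered = sorted(media_key, key=int)
        return bytes(media_key[k] for k in ordered)
    # Fallback: a plain sequence of byte values.
    return bytes(media_key)
```

With the example key from the problem statement, numeric ordering yields bytes `182, 45, 99, 67`, whereas a plain string sort would yield `182, 45, 67, 99`.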
| 210 | + |
| 211 | +### Fix 2: Baileys RC9 `reuploadRequest` Dead Code (commit `f268571b`) |
| 212 | + |
| 213 | +**Problem:** Baileys 7.0.0-rc.9 wires `reuploadRequest` callback in download options, but the catch block checks `error.status` while the actual error has `output.statusCode` — callback never triggers on 410/404. |
| 214 | + |
| 215 | +**Fix:** Explicit `updateMediaMessage()` call in the catch block with 30s timeout, bypassing Baileys' broken internal retry. |
| 216 | + |
| 217 | +**Affected:** `getBase64FromMediaMessage` + `retryMediaFromMetadata` |
| 218 | + |
| 219 | +--- |
| 220 | + |
| 221 | +## Environment Configuration |
| 222 | + |
| 223 | +**Required for history sync to work:** |
| 224 | +```env |
| 225 | +DATABASE_SAVE_DATA_HISTORIC=true # MUST be set — otherwise messaging-history.set messages are dropped |
| 226 | +``` |
| 227 | + |
| 228 | +**Required for old message import:** |
| 229 | +```sql |
| 230 | +-- In Chatwoot table, increase daysLimitImportMessages (default: 3 days) |
| 231 | +UPDATE "Chatwoot" SET "daysLimitImportMessages" = 1000 |
| 232 | +WHERE "instanceId" = '<your-instance-id>'; |
| 233 | +``` |
| 234 | + |
| 235 | +**Required for MinIO storage:** |
| 236 | +```env |
| 237 | +S3_ENABLED=true |
| 238 | +S3_BUCKET=evolution-media |
| 239 | +S3_PORT=9000 |
| 240 | +S3_ENDPOINT=minio |
| 241 | +S3_ACCESS_KEY=<key> |
| 242 | +S3_SECRET_KEY=<secret> |
| 243 | +``` |
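A quick preflight check over these variables can catch a misconfigured deployment before a long recovery run. This is a sketch under the assumption that the settings are read from the process environment. The function name and the message wording are hypothetical.

```python
from typing import List, Mapping

REQUIRED_S3_VARS = ("S3_BUCKET", "S3_PORT", "S3_ENDPOINT",
                    "S3_ACCESS_KEY", "S3_SECRET_KEY")


def preflight(env: Mapping[str, str]) -> List[str]:
    """Return a list of configuration problems (empty when all is set)."""
    problems = []
    if env.get("DATABASE_SAVE_DATA_HISTORIC", "").lower() != "true":
        problems.append("DATABASE_SAVE_DATA_HISTORIC must be true "
                        "or history-sync messages are not saved")
    if env.get("S3_ENABLED", "").lower() != "true":
        problems.append("S3_ENABLED is not true: media will not be uploaded")
    else:
        # Only check the connection settings when S3 is switched on.
        for name in REQUIRED_S3_VARS:
            if not env.get(name):
                problems.append(f"{name} is not set")
    return problems
```

Calling `preflight(os.environ)` at startup and logging the returned list is one way to use it.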
| 244 | + |
| 245 | +--- |
| 246 | + |
| 247 | +## Production Results |
| 248 | + |
| 249 | +Tested on GoConnectIT WhatsApp instance (Euronet SOR documents): |
| 250 | + |
| 251 | +| Metric | Before | After | |
| 252 | +|--------|--------|-------| |
| 253 | +| Total messages | 1646 | 1870 (+224) | |
| 254 | +| Oldest message | Dec 8, 2025 | Nov 10, 2025 | |
| 255 | +| SOR files in MinIO | 0 | 1132/1137 (99.6%) | |
| 256 | +| Irrecoverable | — | 5 (sender permanently offline) | |
| 257 | + |
| 258 | +--- |
| 259 | + |
| 260 | +## File Changes |
| 261 | + |
| 262 | +| File | Changes | |
| 263 | +|------|---------| |
| 264 | +| `src/api/dto/chat.dto.ts` | +3 DTOs: `RetryMediaFromMetadataDto`, `FetchGroupHistoryDto`, `BatchRecoverMediaDto` | |
| 265 | +| `src/api/controllers/chat.controller.ts` | +3 controller methods | |
| 266 | +| `src/api/routes/chat.router.ts` | +3 route registrations | |
| 267 | +| `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts` | +3 service methods, 2 bug fixes | |