Commit f6348f6

CyPack, claude authored and committed
docs: add comprehensive Media Recovery System guide
Agent-friendly reference for the 3 new endpoints + 2 bug fixes:
- retryMediaFromMetadata: metadata-based download without DB lookup
- fetchGroupHistory: WhatsApp on-demand history sync trigger
- batchRecoverMedia: end-to-end batch recovery pipeline (download → MinIO → DB)

Includes: algorithm details, edge cases, JSONB mediaKey sort fix, Baileys RC9 reuploadRequest workaround, env prerequisites, iterative fetching pattern, and production results (1132/1137 SOR recovered).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 28fc185 commit f6348f6

1 file changed: +267 -0 lines changed


docs/MEDIA_RECOVERY_GUIDE.md

Lines changed: 267 additions & 0 deletions
@@ -0,0 +1,267 @@
# Media Recovery System: Agent Reference Guide

> **Branch:** `fix/media-key-jsonb-updateMediaMessage`
> **4 commits** on top of `release/2.3.7`, all production-tested.

## Overview

This patch adds 3 new endpoints and 2 bug fixes to Evolution API (EA), making it **self-sufficient** for WhatsApp media recovery without external orchestrators.

### Problem Solved

WhatsApp CDN URLs expire after ~7 days. Once expired, media files (SOR documents, images, etc.) become permanently inaccessible unless you:
1. Have the original `mediaKey` + `directPath` (stored in EA's Message table)
2. Can trigger WhatsApp's `updateMediaMessage` protocol (asks the sender to re-upload)
3. Can request historical messages via `fetchMessageHistory` (on-demand history sync)

Previously, only OwnPilot had these capabilities. Now EA has them natively.

---

## Endpoints

### 1. `POST /chat/retryMediaFromMetadata/{instance}`

**Purpose:** Download media using caller-supplied metadata. The message does NOT need to exist in EA's DB.

**When to use:**
- You have `mediaKey` + `directPath` from an external source (e.g., OwnPilot DB)
- The message exists in EA but `getBase64FromMediaMessage` fails (DB lookup issue)

**Request:**
```json
{
  "messageId": "3EB0D228037ED522E72774",
  "remoteJid": "120363423491841999@g.us",
  "participant": "119365882089638@lid",
  "fromMe": false,
  "mediaKey": "base64-encoded-key",
  "directPath": "/v/t62.7119-24/...",
  "url": "https://mmg.whatsapp.net/...",
  "mimeType": "application/octet-stream",
  "filename": "2314CP_82_V1.SOR",
  "fileLength": 20973,
  "convertToMp4": false
}
```

**Response:**
```json
{
  "base64": "TWFwAMgAfA...",
  "mimetype": "application/octet-stream",
  "filename": "2314CP_82_V1.SOR"
}
```
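
A minimal client-side sketch of calling this endpoint with Python's standard library. The base URL, the `apikey` header name, and the helper names `build_retry_request` and `save_media` are assumptions for illustration and are not part of the patch.

```python
import base64
import json
import urllib.request
from pathlib import Path


def build_retry_request(base_url: str, instance: str, api_key: str,
                        metadata: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for retryMediaFromMetadata."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/retryMediaFromMetadata/{instance}",
        data=json.dumps(metadata).encode("utf-8"),
        # Assumption: the deployment authenticates with an `apikey` header.
        headers={"Content-Type": "application/json", "apikey": api_key},
        method="POST",
    )


def save_media(response_body: dict, target_dir: Path) -> Path:
    """Decode the `base64` field of the response and write it under target_dir."""
    payload = base64.b64decode(response_body["base64"])
    # Keep only the final path component so a crafted filename cannot escape target_dir.
    name = Path(response_body.get("filename") or "media.bin").name
    target = target_dir / name
    target.write_bytes(payload)
    return target
```

Sending the request is then `json.load(urllib.request.urlopen(request, timeout=60))`. A client timeout above 30s leaves room for the server-side `updateMediaMessage` fallback.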

**Algorithm:**
1. Reconstruct a minimal `WAMessage` proto from the provided metadata
2. Try a direct `downloadMediaMessage` (fast path, works while the CDN URL is still valid)
3. On failure → explicit `updateMediaMessage` with a 30s timeout (Baileys RC9 workaround)
4. Retry the download with the refreshed URL

**Edge Cases:**
- `mediaKey` from PostgreSQL JSONB may be stored as a `{0: 123, 1: 45, ...}` object instead of a Uint8Array. The code handles both formats and orders the object's keys numerically (commit `262c9300`)
- `updateMediaMessage` times out after 30s if the sender is permanently offline and throws `BadRequestException`
- Audio files with `convertToMp4: true` are processed via `processAudioMp4`
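
The download, refresh, retry flow from the algorithm above can be sketched in Python as follows. This is an illustration of the control flow only, not the service's TypeScript code: `download` and `refresh` are hypothetical callables standing in for `downloadMediaMessage` and `updateMediaMessage`, and `MediaUnavailable` is a made-up exception name.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable


class MediaUnavailable(Exception):
    """Raised when the refresh step does not finish within the timeout."""


def download_with_refresh(
    download: Callable[[], bytes],
    refresh: Callable[[], None],
    refresh_timeout_s: float = 30.0,
) -> bytes:
    """Try a direct download first; on failure refresh the URL once and retry."""
    try:
        return download()  # fast path: CDN URL still valid
    except Exception:
        pass  # fall through to the explicit refresh

    pool = ThreadPoolExecutor(max_workers=1)
    try:
        # Bound the refresh so an offline sender cannot block the caller forever.
        pool.submit(refresh).result(timeout=refresh_timeout_s)
    except FutureTimeout as exc:
        raise MediaUnavailable(f"refresh timed out after {refresh_timeout_s}s") from exc
    finally:
        # Do not wait here for a refresh that is still running.
        pool.shutdown(wait=False)

    return download()  # retry with the refreshed URL
```

The refresh is attempted at most once, so a message whose sender never re-uploads fails after roughly the timeout instead of looping.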

---

### 2. `POST /chat/fetchGroupHistory/{instance}`

**Purpose:** Trigger WhatsApp on-demand history sync for a group. WhatsApp responds with old message protos containing **fresh mediaKey + directPath**.

**When to use:**
- Messages are missing from EA's DB (they were sent before EA was connected)
- You need fresh mediaKeys for messages whose CDN URLs expired

**Request:**
```json
{
  "groupJid": "120363423491841999@g.us",
  "count": 50,
  "anchorMessageId": "3EB0DCCA32F22B9AA2A3B4",
  "anchorTimestamp": 1765216930,
  "anchorFromMe": false,
  "anchorParticipant": "90383560261829@lid"
}
```

**Response (immediate, 202-style):**
```json
{
  "sessionId": "3EB006B411C1B0933F9410",
  "groupJid": "120363423491841999@g.us",
  "count": 50,
  "message": "History sync requested. WhatsApp will deliver messages via messaging-history.set event (async)."
}
```

**Algorithm:**
1. Validate that `groupJid` ends with `@g.us`
2. Rate-limit check (1 call per 30 seconds)
3. Call `sock.fetchMessageHistory(count, anchorKey, anchorTimestamp)`
4. WhatsApp delivers messages asynchronously via the `messaging-history.set` event
5. Messages are stored in DB if `DATABASE_SAVE_DATA_HISTORIC=true`
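
Steps 1 and 2 of the algorithm above amount to a suffix check and a minimum-interval limiter. The Python sketch below shows one way to express them. The class and function names are hypothetical, and keying the limiter per caller-chosen string is an assumption, since the guide does not say whether the 30-second limit is global or per instance.

```python
import time
from typing import Callable, Dict


class RateLimitExceeded(Exception):
    def __init__(self, wait_seconds: float):
        super().__init__(f"Rate limited, retry in {wait_seconds:.0f}s")
        self.wait_seconds = wait_seconds


class MinIntervalLimiter:
    """Allow at most one accepted call per `interval_s` seconds for each key."""

    def __init__(self, interval_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._interval_s = interval_s
        self._clock = clock
        self._last_call: Dict[str, float] = {}

    def check(self, key: str) -> None:
        now = self._clock()
        last = self._last_call.get(key)
        if last is not None and now - last < self._interval_s:
            # Report how long the caller still has to wait.
            raise RateLimitExceeded(self._interval_s - (now - last))
        self._last_call[key] = now  # only accepted calls reset the window


def validate_group_jid(group_jid: str) -> str:
    """Reject anything that is not a group JID (must end with @g.us)."""
    if not group_jid.endswith("@g.us"):
        raise ValueError(f"not a group JID: {group_jid!r}")
    return group_jid
```

Rejected calls do not reset the window here, so a client that retries too early is not penalized further.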

**CRITICAL Prerequisites:**
- `DATABASE_SAVE_DATA_HISTORIC=true` must be set in env. Otherwise messages arrive but are NOT saved to DB
- `daysLimitImportMessages` in Chatwoot config should be high (e.g., 1000). Otherwise old messages are filtered out
- EA must be the **sole linked device** on the WhatsApp number. If another client (e.g., OwnPilot) is connected, WhatsApp may route the response to that client instead

**Edge Cases:**
- Rate limited: 1 call per 30 seconds. Calling faster throws `BadRequestException` with the wait time
- Empty anchor (`anchorMessageId: ""`): WhatsApp may not respond at all
- WhatsApp returns messages OLDER than the anchor (backward direction only)
- Duplicate messages are handled by the `messagesRepository.has(m.key.id)` check, so no duplicates reach the DB
- Max 50 messages per call (WhatsApp protocol limit)
- Response is async: poll the DB count or check logs to verify delivery

**Iterative Fetching Pattern:**
```
1. Find oldest message in DB → use as anchor
2. Call fetchGroupHistory
3. Wait 35s (30s rate-limit + 5s buffer)
4. Check if DB count increased
5. If increased → repeat from step 1 (new oldest message = new anchor)
6. If no increase → reached beginning of history
```
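
The iterative pattern above can be written as a small driver loop. In this hedged Python sketch the three callables (`get_oldest_anchor`, `count_messages`, `request_history`) are hypothetical hooks that the caller wires to the database and to `fetchGroupHistory`. The `max_rounds` cap is an added safety assumption, not something the guide prescribes.

```python
import time
from typing import Callable, Optional


def backfill_history(
    get_oldest_anchor: Callable[[], Optional[dict]],
    count_messages: Callable[[], int],
    request_history: Callable[[dict], None],
    wait_s: float = 35.0,
    max_rounds: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Request history until the DB stops growing; return the rounds that added messages."""
    productive_rounds = 0
    for _ in range(max_rounds):
        anchor = get_oldest_anchor()      # step 1: oldest stored message
        if anchor is None:
            break                         # nothing to anchor on
        before = count_messages()
        request_history(anchor)           # step 2: call fetchGroupHistory
        sleep(wait_s)                     # step 3: 30s rate limit + 5s buffer
        if count_messages() <= before:    # steps 4 and 6: no growth, stop
            break
        productive_rounds += 1            # step 5: next round uses the new oldest message
    return productive_rounds
```

Injecting `sleep` keeps the loop testable without real waiting. A single round with no growth ends the loop, so a delayed delivery from WhatsApp would also stop it early.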

---

### 3. `POST /chat/batchRecoverMedia/{instance}`

**Purpose:** End-to-end batch recovery pipeline. For each message: DB lookup → download → MinIO upload → media record → mediaUrl update.

**When to use:**
- You have message IDs in EA's DB with expired CDN URLs
- You want to permanently store media in MinIO (S3) and update DB references

**Request:**
```json
{
  "messageIds": ["3EB0D228037ED522E72774", "3EB0DCCA32F22B9AA2A3B4"],
  "continueOnError": true,
  "storeToMinIO": true
}
```

**Response:**
```json
{
  "total": 2,
  "ok": 1,
  "skip": 1,
  "error": 0,
  "results": [
    {
      "messageId": "3EB0D228037ED522E72774",
      "status": "ok",
      "mediaUrl": "http://minio:9000/evolution-media/..."
    },
    {
      "messageId": "3EB0DCCA32F22B9AA2A3B4",
      "status": "skip",
      "error": "Already stored in MinIO"
    }
  ]
}
```

**Algorithm per message:**
1. Fetch the message from DB by `key.id` + `instanceId`
2. Extract media metadata from `documentMessage | imageMessage | videoMessage | audioMessage | stickerMessage`
3. Skip if there is no `mediaKey`/`directPath`
4. Skip if `mediaUrl` already points to a non-WhatsApp URL (already in MinIO)
5. Handle the JSONB mediaKey format: `Object.keys().sort((a,b)=>parseInt(a)-parseInt(b))` for numeric key ordering
6. Call `retryMediaFromMetadata` with `getBuffer=true`
7. Upload the buffer to MinIO via `s3Service.uploadFile`
8. Upsert the `Media` record in DB
9. Update `message.mediaUrl` in the document message content
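
Steps 2 to 4 of the per-message algorithm decide whether a message is worth recovering. A rough Python sketch of that decision follows. It assumes the message content is a plain dict with `mediaUrl` stored next to the media sub-message, and that WhatsApp CDN hosts end in `whatsapp.net`. Both are assumptions, as are the helper names.

```python
from typing import Optional
from urllib.parse import urlparse

MEDIA_TYPES = ("documentMessage", "imageMessage", "videoMessage",
               "audioMessage", "stickerMessage")


def extract_media(message: dict) -> Optional[dict]:
    """Return the first media sub-message found in the message content, if any."""
    for media_type in MEDIA_TYPES:
        media = message.get(media_type)
        if isinstance(media, dict):
            return media
    return None


def is_whatsapp_url(url: str) -> bool:
    # Assumption: WhatsApp CDN hosts end with "whatsapp.net" (e.g. mmg.whatsapp.net).
    host = urlparse(url).hostname or ""
    return host == "whatsapp.net" or host.endswith(".whatsapp.net")


def skip_reason(message: dict) -> Optional[str]:
    """Return why a message should be skipped, or None if it should be recovered."""
    media = extract_media(message)
    if media is None or not media.get("mediaKey") or not media.get("directPath"):
        return "No mediaKey/directPath"
    media_url = message.get("mediaUrl")
    if media_url and not is_whatsapp_url(media_url):
        return "Already stored in MinIO"
    return None
```

Comparing the parsed hostname rather than searching the whole URL string avoids treating a look-alike host such as `evilwhatsapp.net` as a CDN URL.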

**Edge Cases:**
- JSONB mediaKey sort: PostgreSQL stores `{0:x, 1:y, 10:z, 2:w}`, and lexicographic key order gives the wrong byte order, so a numeric sort is applied.
- `continueOnError: false`: stops at the first failure and returns partial results
- `storeToMinIO: false`: downloads but doesn't upload (useful for testing)
- S3 not enabled: downloads and reports size but doesn't upload
- Message not found in DB → `status: "skip"`
- Empty buffer after download → `status: "error"`
- Presigned URLs in `mediaUrl` expire after 7 days, but the object persists in MinIO. Generate a new presigned URL via `s3Service.getObjectUrl()`

**Batch Processing Pattern:**
```python
# Recommended: 10 per batch, 1-2s delay between batches; base_url, instance, headers, message_ids come from the caller
import time
import requests
for i in range(0, len(message_ids), 10):
    # Each batch takes ~10-30s depending on CDN/updateMediaMessage
    response = requests.post(f"{base_url}/chat/batchRecoverMedia/{instance}",
                             json={"messageIds": message_ids[i:i + 10]}, headers=headers, timeout=300)
    time.sleep(2)
```
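
When a run spans many batches, the per-batch responses can be folded into one summary. The following sketch assumes each response has the shape shown in the example response for this endpoint. `merge_batch_results` and the `failedIds` field are hypothetical names introduced here.

```python
from typing import Iterable, List


def merge_batch_results(responses: Iterable[dict]) -> dict:
    """Sum the per-batch counters and collect the IDs that ended in error."""
    summary = {"total": 0, "ok": 0, "skip": 0, "error": 0}
    failed_ids: List[str] = []
    for response in responses:
        for counter in summary:
            summary[counter] += response.get(counter, 0)
        failed_ids.extend(
            item["messageId"]
            for item in response.get("results", [])
            if item.get("status") == "error"
        )
    return {**summary, "failedIds": failed_ids}
```

The collected `failedIds` can be fed back into a later run, for example once an offline sender has reconnected.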

---

## Bug Fixes (included in this patch)

### Fix 1: JSONB mediaKey Sort (commit `262c9300`)

**Problem:** PostgreSQL stores the Uint8Array as a JSONB object `{0: 182, 1: 45, 10: 67, 2: 99, ...}`. When the keys are read in lexicographic order (`["0", "1", "10", "2", ...]`), the bytes come out in the wrong sequence → HKDF decryption fails.

**Fix:** `Object.keys(mediaKey).sort((a, b) => parseInt(a) - parseInt(b)).map(k => mediaKey[k])`

**Affected:** `getBase64FromMediaMessage` + `batchRecoverMedia`
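
The same ordering issue appears in any external script that reads the JSONB `mediaKey` and needs to pass it on as base64. Below is a Python analogue of the fix, offered as a sketch. `media_key_to_bytes` is a hypothetical helper, and treating a string input as base64 is an assumption based on the request example for `retryMediaFromMetadata`.

```python
import base64
from typing import Mapping, Union


def media_key_to_bytes(media_key: Union[bytes, str, Mapping[str, int]]) -> bytes:
    """Normalize a mediaKey to raw bytes.

    Accepts raw bytes, a base64 string, or the JSONB index-to-byte object.
    """
    if isinstance(media_key, (bytes, bytearray)):
        return bytes(media_key)
    if isinstance(media_key, str):
        return base64.b64decode(media_key)
    # Sort keys by numeric value: sorting the string keys directly would
    # give "0", "1", "10", "2", ... and scramble the byte order.
    ordered_keys = sorted(media_key, key=int)
    return bytes(media_key[k] for k in ordered_keys)
```

`base64.b64encode(media_key_to_bytes(value)).decode()` then yields a string suitable for the `mediaKey` request field.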

### Fix 2: Baileys RC9 `reuploadRequest` Dead Code (commit `f268571b`)

**Problem:** Baileys 7.0.0-rc.9 wires the `reuploadRequest` callback into the download options, but the catch block checks `error.status` while the actual error carries `output.statusCode`, so the callback never triggers on 410/404.

**Fix:** Explicit `updateMediaMessage()` call in the catch block with a 30s timeout, bypassing Baileys' broken internal retry.

**Affected:** `getBase64FromMediaMessage` + `retryMediaFromMetadata`

---

## Environment Configuration

**Required for history sync to work:**
```env
# MUST be set, otherwise messaging-history.set messages are dropped
DATABASE_SAVE_DATA_HISTORIC=true
```

**Required for old message import:**
```sql
-- In Chatwoot table, increase daysLimitImportMessages (default: 3 days)
UPDATE "Chatwoot" SET "daysLimitImportMessages" = 1000
WHERE "instanceId" = '<your-instance-id>';
```

**Required for MinIO storage:**
```env
S3_ENABLED=true
S3_BUCKET=evolution-media
S3_PORT=9000
S3_ENDPOINT=minio
S3_ACCESS_KEY=<key>
S3_SECRET_KEY=<secret>
```
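
Before starting a recovery run it can help to check these settings up front. The sketch below is a hypothetical preflight helper (`config_problems` is not part of the patch) that inspects only the variables listed above and reports what is missing.

```python
from typing import List, Mapping

# Variable names are taken from the env blocks in this guide.
S3_VALUES = ("S3_BUCKET", "S3_PORT", "S3_ENDPOINT", "S3_ACCESS_KEY", "S3_SECRET_KEY")


def config_problems(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the settings look complete."""
    problems = []
    if env.get("DATABASE_SAVE_DATA_HISTORIC", "").strip().lower() != "true":
        problems.append("DATABASE_SAVE_DATA_HISTORIC must be true")
    if env.get("S3_ENABLED", "").strip().lower() != "true":
        problems.append("S3_ENABLED must be true")
    problems.extend(f"{name} is not set" for name in S3_VALUES if not env.get(name))
    return problems
```

Calling `config_problems(os.environ)` inside the container checks the live environment. The Chatwoot `daysLimitImportMessages` value lives in the database and is not covered by this check.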

---

## Production Results

Tested on GoConnectIT WhatsApp instance (Euronet SOR documents):

| Metric | Before | After |
|--------|--------|-------|
| Total messages | 1646 | 1870 (+224) |
| Oldest message | Dec 8, 2025 | Nov 10, 2025 |
| SOR files in MinIO | 0 | 1132/1137 (99.6%) |
| Irrecoverable | n/a | 5 (sender permanently offline) |

---

## File Changes

| File | Changes |
|------|---------|
| `src/api/dto/chat.dto.ts` | +3 DTOs: `RetryMediaFromMetadataDto`, `FetchGroupHistoryDto`, `BatchRecoverMediaDto` |
| `src/api/controllers/chat.controller.ts` | +3 controller methods |
| `src/api/routes/chat.router.ts` | +3 route registrations |
| `src/api/integrations/channel/whatsapp/whatsapp.baileys.service.ts` | +3 service methods, 2 bug fixes |
