Auto-translated map_item updates from 4CAT (bootstrap, 13 datasources) by 4cat-to-zeeschuimer-automation-pr[bot] · Pull Request #90 · digitalmethodsinitiative/zeeschuimer

4cat-to-zeeschuimer-automation-pr · 2026-06-10T13:50:59Z

🤖 This PR was auto-generated by the 4CAT map_item sync workflow. The JavaScript was produced by an LLM and requires human review before merging — including manual fixes for any lint warnings flagged below.

Generation parameters

Model: gpt-oss-120b (provider: litellm)
Total LLM time: 381.15s
Trigger: manual workflow_dispatch with bootstrap=true (initial sync of all Zeeschuimer datasources).

Summary

✅ 13 translated
⚠️ 6 translated with lint warnings (require manual fix)
❌ 1 failed
❔ 0 skipped

| Datasource | Module | Time | Warnings |
| --- | --- | --- | --- |
| `datasources/douyin/search_douyin.py` | `modules/douyin.js` | 42.77s | ⚠️ 1 |
| `datasources/gab/search_gab.py` | `modules/gab.js` | 27.53s | none |
| `datasources/imgur/search_imgur.py` | `modules/imgur.js` | 12.93s | none |
| `datasources/instagram/search_instagram.py` | `modules/instagram.js` | 52.96s | ⚠️ 1 |
| `datasources/linkedin/search_linkedin.py` | `modules/linkedin.js` | 52.03s | ⚠️ 1 |
| `datasources/ninegag/search_9gag.py` | `modules/9gag.js` | 14.75s | none |
| `datasources/pinterest/search_pinterest.py` | `modules/pinterest.js` | 25.6s | none |
| `datasources/threads/search_threads.py` | `modules/threads.js` | 25.11s | ⚠️ 2 |
| `datasources/tiktok/search_tiktok.py` | `modules/tiktok.js` | 18.64s | ⚠️ 2 |
| `datasources/tiktok_comments/search_tiktok_comments.py` | `modules/tiktok-comments.js` | 13.58s | none |
| `datasources/truth/search_truth.py` | `modules/truth.js` | 19.59s | none |
| `datasources/xiaohongshu/search_rednote.py` | `modules/rednote.js` | 24.17s | ⚠️ 4 |
| `datasources/xiaohongshu_comments/search_rednote_comments.py` | `modules/rednote-comments.js` | 13.12s | none |

⚠️ Lint warnings — fix before merging

The following datasources translated successfully but the static lint flagged issues that need human fixes. The auto-generated code was spliced into the JS module as-is; please patch the file directly in this PR.

datasources/douyin/search_douyin.py -> modules/douyin.js

[helpers_to_add[0]] Regex detected. The current LLM translates regex unreliably (escapes, character classes, flags) — please verify the regex behavior against the Python original by hand.
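One concrete way regex behaviour can drift between the two languages: in Python 3, `\w` applied to a `str` matches Unicode word characters by default, while JavaScript's `\w` only covers `[A-Za-z0-9_]`. The sketch below is illustrative and is not taken from `modules/douyin.js`; the sample text and variable names are invented, and the Unicode character class is only an approximation of Python's `\w`.

```javascript
// Illustrative only: the sample text is invented.
const text = "tag #音乐 and #music";

// Naive port of Python's r"#(\w+)": JS \w is ASCII-only, so "音乐" is missed.
const asciiOnly = [...text.matchAll(/#(\w+)/g)].map((m) => m[1]);

// Closer to Python 3's default for str patterns. Unicode property escapes
// need the "u" flag; letters, marks, digits and "_" approximate Python's \w.
const unicodeAware = [...text.matchAll(/#([\p{L}\p{M}\p{N}_]+)/gu)].map((m) => m[1]);

console.log(asciiOnly);    // contains only "music"
console.log(unicodeAware); // contains "音乐" and "music"
```

Note that `matchAll` throws a `TypeError` if the regex lacks the `g` flag, which is another detail worth checking in translated code.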

datasources/instagram/search_instagram.py -> modules/instagram.js

[helpers_to_add[0]] Literal newline inside a string literal — JS strings can't span lines without escape ("\n") or template literals (`\n`).
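As a small illustration of the two fixes this warning suggests (not taken from `modules/instagram.js`; the strings are invented), either an escape sequence inside a quoted string or a template literal produces a valid multi-line value:

```javascript
// A quoted string must use the \n escape; a raw line break inside "..." is a syntax error.
const escaped = "line one\nline two";

// A template literal (backticks) may contain an actual line break.
const template = `line one
line two`;

console.log(escaped === template); // true
```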

datasources/linkedin/search_linkedin.py -> modules/linkedin.js

[helpers_to_add[1]] Regex detected. The current LLM translates regex unreliably (escapes, character classes, flags) — please verify the regex behavior against the Python original by hand.

datasources/threads/search_threads.py -> modules/threads.js

[map_item_function] Literal newline inside a string literal — JS strings can't span lines without escape ("\n") or template literals (`\n`).
[map_item_function] Regex detected. The current LLM translates regex unreliably (escapes, character classes, flags) — please verify the regex behavior against the Python original by hand.

datasources/tiktok/search_tiktok.py -> modules/tiktok.js

[map_item_function] Even without an f prefix, "text {var}" / 'text {var}' are literal text in JavaScript — no interpolation happens. Whenever the original Python used an f-string, the JS must use a template literal (backticks).
[map_item_function] .get( call found. Python dict.get(k[, default]) does not exist in JavaScript — use [k] / [k] ?? default. NOTE: this check is a plain substring match, so it also flags legitimate JS .get() on Map, URLSearchParams, Headers, etc. — ignore the warning if the receiver is one of those.
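The two TikTok warnings can be illustrated with a minimal sketch. The `item` object, its field names, and the `getOr` helper are hypothetical and do not come from `modules/tiktok.js`. The sketch also shows one caveat of the suggested `??` replacement: it falls back on `null` as well as on missing keys, which Python's `dict.get` does not.

```javascript
// Hypothetical item; field names are for illustration only.
const item = { author: "alice", likes: null };

// Python: f"https://example.com/@{item['author']}"
const plain = "https://example.com/@{item.author}"; // quoted string: no interpolation
const url = `https://example.com/@${item.author}`;  // template literal: interpolates

// Python: item.get("shares", 0) returns 0 because the key is missing.
const shares = item.shares ?? 0;

// Caveat: Python's item.get("likes", 0) returns None here (the key exists),
// whereas ?? also replaces null, so the JS result is 0.
const likes = item.likes ?? 0;

// A stricter, hypothetical equivalent that only falls back for missing keys.
const getOr = (obj, key, fallback) =>
  Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : fallback;

console.log(url);                     // https://example.com/@alice
console.log(getOr(item, "likes", 0)); // null
```

Whether the looser `??` behaviour matters depends on whether the raw data can contain explicit nulls for that field.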

datasources/xiaohongshu/search_rednote.py -> modules/rednote.js

[helpers_to_add[0]] Literal newline inside a string literal — JS strings can't span lines without escape ("\n") or template literals (`\n`).
[helpers_to_add[0]] Regex detected. The current LLM translates regex unreliably (escapes, character classes, flags) — please verify the regex behavior against the Python original by hand.
[helpers_to_add[1]] Literal newline inside a string literal — JS strings can't span lines without escape ("\n") or template literals (`\n`).
[helpers_to_add[1]] Regex detected. The current LLM translates regex unreliably (escapes, character classes, flags) — please verify the regex behavior against the Python original by hand.

`datasources/douyin/search_douyin.py` -> `modules/douyin.js` (42.77s) ⚠️

LLM commentary:

The JS version mirrors the Python logic closely. Helper getChineseNumber implements the same conversion of Chinese number strings (e.g. "1.2万" → 12000). All .get calls are replaced with optional chaining or explicit fallback (??). MissingMappedField objects are instantiated where the Python code used a default. The timestamp is formatted with the global formatUtcTimestamp helper, which expects a Unix‑seconds value. Keys that differ between embedded and non‑embedded payloads are set in the appropriate branch. The function assumes the global helpers (MappedItem, MissingMappedField, normalize_url_encoding, formatUtcTimestamp) are already loaded by Zeeschuimer. No additional imports are required.
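For reviewers checking the number conversion by hand, here is a minimal sketch of the kind of logic the commentary describes. It is not the generated helper: the function name `parseChineseNumber`, the suffix table, and the rounding choice are assumptions, and the Python original may handle more cases.

```javascript
// Hypothetical stand-in for the getChineseNumber helper described above.
// Assumption: only the suffixes 万 (10^4) and 亿 (10^8) need handling.
const MULTIPLIERS = { "万": 1e4, "亿": 1e8 };

function parseChineseNumber(value) {
  if (typeof value === "number") return value;
  const text = String(value).trim();
  const suffix = text.slice(-1);
  if (suffix in MULTIPLIERS) {
    // Math.round guards against floating-point residue after multiplying.
    return Math.round(parseFloat(text.slice(0, -1)) * MULTIPLIERS[suffix]);
  }
  return parseInt(text, 10);
}

console.log(parseChineseNumber("1.2万")); // 12000
console.log(parseChineseNumber("3亿"));   // 300000000
console.log(parseChineseNumber("456"));   // 456
```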

`datasources/gab/search_gab.py` -> `modules/gab.js` (27.53s)

LLM commentary:

The JS version mirrors the Python map_item logic. All dict.get calls are replaced with optional chaining and nullish coalescing. Timestamp handling uses the global formatUtcTimestamp helper (expects seconds). Media handling reproduces the unknown-data collection and filters out missing URLs. Group, author, and card objects may be null; fields fall back to null when the container is absent, matching Python's conditional expressions. MissingMappedField is instantiated when the import metadata lacks a timestamp. No additional imports or helper functions were needed beyond the globals provided by Zeeschuimer.

`datasources/imgur/search_imgur.py` -> `modules/imgur.js` (12.93s)

LLM commentary:

The JS version mirrors the Python map_item exactly. Timestamp parsing uses Date.parse on the ISO string and then formatUtcTimestamp (global helper) to get the same "YYYY-MM-DD HH:MM:SS" format. Optional chaining (?.) guards against missing nested objects (e.g., cover). Boolean fields are converted to "yes"/"no" strings. All field names match the Python output. No extra imports or helper functions are required because Zeeschuimer provides MappedItem, normalize_url_encoding, formatUtcTimestamp, and MapItemException globally.
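The timestamp path described here can be sketched as follows. `formatUtcSeconds` is a hypothetical stand-in written for this example; the real `formatUtcTimestamp` global is not shown in this PR and may be implemented differently. The ISO string is an invented sample.

```javascript
// Hypothetical stand-in for Zeeschuimer's formatUtcTimestamp.
// Takes Unix seconds and returns "YYYY-MM-DD HH:MM:SS" in UTC.
function formatUtcSeconds(seconds) {
  return new Date(seconds * 1000).toISOString().slice(0, 19).replace("T", " ");
}

// Date.parse returns milliseconds, so divide by 1000 before formatting.
const iso = "2024-01-02T03:04:05Z"; // invented sample value
const seconds = Math.floor(Date.parse(iso) / 1000);

console.log(formatUtcSeconds(seconds)); // 2024-01-02 03:04:05
```

The main thing to verify in review is the unit: passing milliseconds to a helper that expects seconds yields a date far in the future rather than an error.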

`datasources/instagram/search_instagram.py` -> `modules/instagram.js` (52.96s) ⚠️

LLM commentary:

Implemented map_item as a thin dispatcher that mirrors the Python logic. Added three parser helpers (Polaris, Graph, Itemlist) plus a small extractHashtags helper and the required constants. All Python constructs have been converted to JavaScript equivalents:

Used optional chaining and nullish coalescing instead of dict.get.

Replaced Python regex findall with matchAll to capture hashtag groups.

Used the global formatUtcTimestamp helper for timestamp formatting.

Wrapped all MissingMappedField, MappedItem and MapItemException calls with new.

Preserved field names exactly as in the Python version.

Throwing MapItemException for ads and for unexpected format changes matches the original behaviour.
No external imports are needed because Zeeschuimer provides the required globals. The code assumes those globals exist at runtime.

`datasources/linkedin/search_linkedin.py` -> `modules/linkedin.js` (52.03s) ⚠️

LLM commentary:

Implemented the Python map_item logic in JavaScript, preserving field names exactly. Added two helper functions (getAuthor and parseTimeAgo) because they are used internally. Used global helpers MappedItem, normalize_url_encoding, and formatUtcTimestamp. Replaced Python's datetime formatting with formatUtcTimestamp, and used regex test for is_promoted logic. All optional property accesses use optional chaining and nullish coalescing to avoid TypeErrors. Image and video thumbnail extraction follows the same priority rules as the Python version. No external imports are required; helpers are added to the helpers_to_add array.

`datasources/ninegag/search_9gag.py` -> `modules/9gag.js` (14.75s)

LLM commentary:

The JS version mirrors the Python map_item logic. Timestamps are handled with the global formatUtcTimestamp helper, which expects a Unix‑seconds value (post.creationTs). Image and video selection replicate the Python sorting by pixel count, and video URL priority follows the same order (av1, h265, vp9, vp8). Anonymous posts are given the placeholder creator object. All optional fields use optional chaining and nullish coalescing to avoid undefined values. No extra imports or helper functions are required because Zeeschuimer already provides MappedItem, normalize_url_encoding, and formatUtcTimestamp as globals.
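A rough sketch of the two selection rules mentioned above, under assumed data shapes. The `images` and `video` objects and the key names in `CODEC_KEYS` are invented for illustration and may not match the real 9gag payload or `modules/9gag.js`.

```javascript
// Hypothetical media shape; the real payload may differ.
const images = {
  image460: { width: 460, height: 300, url: "https://example.com/460.jpg" },
  image700: { width: 700, height: 456, url: "https://example.com/700.jpg" },
};
const video = {
  vp9Url: "https://example.com/v.webm",
  h265Url: "https://example.com/v.mp4",
};

// Largest image by pixel count: sort descending on width * height, take the first.
const largest = Object.values(images)
  .sort((a, b) => b.width * b.height - a.width * a.height)[0];

// First available video URL in the stated priority order (av1, h265, vp9, vp8).
const CODEC_KEYS = ["av1Url", "h265Url", "vp9Url", "vp8Url"]; // key names are assumptions
const videoUrl = CODEC_KEYS.map((key) => video[key]).find(Boolean) ?? "";

console.log(largest.url); // https://example.com/700.jpg
console.log(videoUrl);    // https://example.com/v.mp4
```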

`datasources/pinterest/search_pinterest.py` -> `modules/pinterest.js` (25.6s)

LLM commentary:

Implemented map_item with two internal helpers mirroring the Python logic. Timestamp parsing uses Date and the global formatUtcTimestamp helper; on failure MissingMappedField objects are returned to match Python's MissingMappedField(''). All field names are preserved exactly. Optional chaining and nullish coalescing replace Python dict.get calls. Image URL extraction follows the same precedence as the Python version. No external imports are required because Zeeschuimer provides MappedItem, MissingMappedField, normalize_url_encoding, and formatUtcTimestamp globally.

`datasources/threads/search_threads.py` -> `modules/threads.js` (25.11s) ⚠️

LLM commentary:

Implemented the Python map_item logic in JavaScript. Used formatUtcTimestamp (global) for the timestamp string, URL and URLSearchParams to extract the optional 'u' query parameter, and matchAll with a captured‑group regex to build the hashtags list. Optional chaining and nullish coalescing guard against missing fields. Empty containers are joined with commas to match the Python ','.join behaviour. All fields from the Python MappedItem are reproduced with identical names and comparable values. No extra imports are required because all helpers are globals.
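The three techniques named in this commentary can be sketched in a few lines. The sample values, variable names, and the fallback to the original link are assumptions made for this example and are not taken from `modules/threads.js`.

```javascript
// Invented sample values.
const link = "https://l.example.com/?u=https%3A%2F%2Fexample.org%2Fpage&e=abc";
const caption = "Hello #one and #two";

// URLSearchParams.get is a legitimate JS .get(); it returns null when the
// parameter is absent and percent-decodes the value when present.
const target = new URL(link).searchParams.get("u") ?? link;

// Python: re.findall(r"#(\w+)", caption) returns the group 1 captures.
const hashtags = [...caption.matchAll(/#(\w+)/g)].map((m) => m[1]);

// Python: ",".join(hashtags); an empty array joins to "" in JS as well.
const joined = hashtags.join(",");

console.log(target); // https://example.org/page
console.log(joined); // one,two
```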

`datasources/tiktok/search_tiktok.py` -> `modules/tiktok.js` (18.64s) ⚠️

LLM commentary:

Implemented the Python map_item logic in JavaScript, preserving field names exactly. Used global helpers MappedItem, normalize_url_encoding, and formatUtcTimestamp. Handled thumbnail selection by extracting the last shareCover entry (if an array) and filtering by the x-expires query parameter. All optional fields default to empty strings or appropriate fallbacks to match Python's .get(..., "") behavior. No additional imports or helper functions were needed.

`datasources/tiktok_comments/search_tiktok_comments.py` -> `modules/tiktok-comments.js` (13.58s)

LLM commentary:

The JS version mirrors the Python map_item logic. Timestamp formatting uses the global formatUtcTimestamp helper to match Python's '%Y-%m-%d %H:%M:%S'. Nested fields are accessed safely with optional chaining and nullish coalescing, providing null defaults where Python would have used .get(..., default). Boolean checks are converted to !!value and ternary expressions to produce "yes"/"no" strings. No additional imports or helper functions are required because all needed utilities (MappedItem, normalize_url_encoding, formatUtcTimestamp) are globals provided by Zeeschuimer.
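One thing to watch with `!!value` conversions: Python treats empty lists and dicts as falsy, while JavaScript treats `[]` and `{}` as truthy. The sketch below shows the difference; `pythonTruthy` and `yesNo` are hypothetical helpers written for this example, not part of Zeeschuimer.

```javascript
// Hypothetical helper that mimics Python truthiness for common JSON values.
function pythonTruthy(value) {
  if (Array.isArray(value)) return value.length > 0;
  if (value !== null && typeof value === "object") return Object.keys(value).length > 0;
  return !!value; // "", 0, null and undefined are falsy in both languages
}

const yesNo = (value) => (pythonTruthy(value) ? "yes" : "no");

console.log(!!{} ? "yes" : "no"); // yes (plain JS truthiness)
console.log(yesNo({}));           // no  (matches Python's bool({}))
console.log(yesNo([]));           // no
console.log(yesNo(1));            // yes
```

For fields that are always booleans or numbers in the raw data, plain `!!value` gives the same result as Python.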

`datasources/truth/search_truth.py` -> `modules/truth.js` (19.59s)

LLM commentary:

The JS version mirrors the Python logic exactly: media handling, group fallback, thread ID resolution (including nested replies), and collection of mentions/hashtags. Dates are parsed with the native Date constructor and formatted manually to match Python's "%Y-%m-%d %H:%M:%S" output. All optional fields use null or default values so the resulting JSON retains keys (undefined would be dropped). Errors are accumulated and passed as the second argument to MappedItem, matching the Python MappedItem(message=...). No external imports are required because Zeeschuimer provides MappedItem and normalize_url_encoding globally.
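The point about `undefined` being dropped can be checked quickly. The `post` object below is an invented example, not real Truth Social data:

```javascript
const post = { id: "1" }; // invented item without a "group" field

// Optional chaining yields undefined, and JSON.stringify omits that key.
const withUndefined = { id: post.id, group: post.group?.name };

// Falling back to null keeps the key in the serialized output.
const withNull = { id: post.id, group: post.group?.name ?? null };

console.log(JSON.stringify(withUndefined)); // {"id":"1"}
console.log(JSON.stringify(withNull));      // {"id":"1","group":null}
```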

`datasources/xiaohongshu/search_rednote.py` -> `modules/rednote.js` (24.17s) ⚠️

LLM commentary:

Implemented the three mapping branches (API explore, embedded JSON, HTML) as separate helper functions. All Python constructs have been converted to JavaScript equivalents:

f-strings become template literals.

dict.get() replaced with optional chaining and nullish coalescing.

Regex findall with capture groups uses matchAll and extracts group 1.

MissingMappedField instances are created with new MissingMappedField("") to mirror the Python placeholder.

Timestamps (ms) are converted to seconds before calling the global formatUtcTimestamp helper.

Image handling mirrors the original logic, returning a MissingMappedField when no images are found.

The tile‑stub rejection mirrors the Python condition and throws a MapItemException.
No external imports are required; all helpers (MappedItem, MissingMappedField, normalize_url_encoding, formatUtcTimestamp, MapItemException) are globals provided by Zeeschuimer. The helper functions are supplied in helpers_to_add so they are available when map_item is executed.

`datasources/xiaohongshu_comments/search_rednote_comments.py` -> `modules/rednote-comments.js` (13.12s)

LLM commentary:

The JS version mirrors the Python map_item exactly. Timestamp conversion uses the global formatUtcTimestamp helper, which expects a Unix timestamp in seconds, matching the Python datetime.fromtimestamp(ms/1000). The collected_from_url field uses optional chaining and nullish coalescing to safely access the nested __import_meta object. ip_location follows the Python logic: if the value is truthy it is kept, otherwise a new MissingMappedField('') is inserted. All other fields are direct property accesses, with safe fallbacks for optional fields (content, user_info). No additional imports or helper functions are required because all needed utilities are provided globally by Zeeschuimer.
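A minimal sketch of the two conversions described here, under stated assumptions: `MissingField` is a hypothetical stand-in for the `MissingMappedField` global, and the `comment` object and its field names are invented.

```javascript
// Hypothetical stand-in for Zeeschuimer's MissingMappedField global.
class MissingField {
  constructor(value) {
    this.value = value;
  }
}

const comment = { create_time: 1700000000123, ip_location: "" }; // invented sample

// Python divides the millisecond value by 1000; here we keep whole Unix seconds.
const seconds = Math.floor(comment.create_time / 1000);

// Python: value if value else MissingMappedField("").
// || (not ??) is used on purpose so that "" also triggers the fallback.
const ipLocation = comment.ip_location || new MissingField("");

console.log(seconds);                            // 1700000000
console.log(ipLocation instanceof MissingField); // true
```

Using `??` in the last step would keep the empty string, which would not match a Python truthiness check.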

Failures

datasources/twitter-import/search_twitter.py (after 38.29s): could not parse a JSON object from the model reply

dale-wahl · 2026-06-11T13:38:06Z

Worked through my tests. And we're clean for this:

Test Suites: 1 passed, 1 total
Tests:       1439 passed, 1439 total
Snapshots:   0 total
Time:        79.915 s
Ran all test suites.

=== map_item compare summary ===
  ✓ PASS  pinterest        5daeba72a2dfbb5ed8c855f824a61570  — 110/110 items match
  ✓ PASS  instagram        945bc3cd29726d676419339e3c8feeb9  — 124/124 items match
  ✓ PASS  9gag             92cb4f4865cd259ab868846b6e007000  — 48/48 items match
  ✓ PASS  douyin           9e355eb06a266576aebcd6ef3ab8f1c3  — 70/70 items match
  ✓ PASS  gab              05f95dfd6f827a9025a3d1f0828feb09  — 43/43 items match
  ✓ PASS  imgur            64d532cdae21835ee2313bc2bb9a060f  — 238/238 items match
  ✓ PASS  threads          474112f148078b2a5d350ab3d80e93ce  — 54/54 items match
  ✓ PASS  rednote          8e5f759a145555b8134f70afaea57109  — 171/171 items match
  ✓ PASS  tiktok-comments  f4644cb7ad521b0fed483a3172460a92  — 141/141 items match
  ✓ PASS  tiktok           c4ce8cafddc41555e9aed2901e183e53  — 120/120 items match
  ✓ PASS  truth            474f560c67d914cfc316a948a6994cb9  — 137/137 items match
  ✓ PASS  rednote          a74cc6406430422e866bce7443224a19  — 171/171 items match
12 datasource(s): 12 passed, 0 failed, 0 skipped

Looks like I am missing some datasources (and randomly have two rednotes).
I cannot get rednote-comments, and linkedin is dead (right?). Twitter/X failed to translate, so I will try that again in a separate PR.

dale-wahl · 2026-06-11T13:43:15Z

@stijn-uva these datasets were all pretty recent ones I collected to have something to test against, and they are not necessarily comprehensive. Also, I was not sure how to test all of this together, so I ended up merging the map_item_testing_actual_tests branch into this one (and then made updates to the tests themselves here). Thus it is messier than I would have liked. But you should be able to test it out and download mapped CSVs from Zeeschuimer!

Perhaps we should delete the linkedin and rednote-comments mappers until we can test them properly.

dale-wahl added 30 commits May 5, 2026 17:38

minimal changes for direct from 4CAT mapping

491f51b

give me some standard helper functions

b06805f

fix csv export

f9a2405

another to CSV fix

2f084b9

revert tiktok (mistaken test result commited)

d787042

clean up UI (make download menu button)

a9fba9a

testing is hard in JS

0980a56

add fixtures folder and README.md to explain what I did

46b96c7

add MapItemException

487b5b6

make a warning pop up

b6f487d

add MapItemException

f28e310

Merge branch 'master' into map_item_testing_actual_tests

7dedad7

Merge branch 'map_item_testing' into map_item_testing_actual_tests

f8c47d7

add env variables for tests (to connect to 4CAT)

5baff31

mirror 4CAT API missing value

6a8ce38

test the 4cat API endpoint

0c31403

update docs and packages

be2f308

some mapping for odd datasource names

caf1c7f

update existing map_item tests and add helper

f10fc49

comparison testing for datasources

3633cde

list common translation errors

7d97a0f

package.json fix

6ad4c13

rm other test doc

11ffffb

map_item.test.js verify modules import and map_item exists only

6cc6100

remove old fixtures and 4cat probe

a090675

update lib.js note on new endpoint

c62a7e7

update tests/.env.example (comments and dataset keys)

234f1ce

note on _loader.js for wrap_for_map_item

e0d0fb8

fix my test environment; scripts vs libraries

f2341d6

update map_item_compare.test.js for new 4CAT endpoints

e39ad42

dale-wahl and others added 14 commits June 3, 2026 15:48

fast_fail OR --all for tests

d7fcb4c

use headers for datasource

4f9e69c

add the --all instead of just fail_fail

8b918d4

map_item_compare.test.js: compare based on mapped id field not raw …

00f0369

…`id`

map_item_compare.test.js: still show errors on failed id matches

c7bb9ac

chore: sync map_item for bootstrap from 4CAT 888f0a126ea70404034f265f…

53c2b6f

…efc1468568c73d8d

Merge branch 'map_item_testing_actual_tests' into auto/4cat-map-item-…

dd62a4e

…sync-bootstrap

instagram.js: fix {} is truthy, location_city null vs ""

ce9ba39

douyin: "" vs null and Missing vs null

a5d981c

gab: key lost if undefined in JS

25ab435

map_item_test: fix order issue on id comparison

a83ebe8

map_item_compare.test: summarize datasources that pass/fail

3b5d157

threads.js fix some regex

8118d81

map_item_compare.test: loosely test URLs (not byte for byte)

743af85

4cat-to-zeeschuimer-automation-pr[bot] wants to merge 44 commits into master from auto/4cat-map-item-sync-bootstrap
