Skip to content

feat: create post www dedup#3946

Open
capJavert wants to merge 2 commits into
mainfrom
create-post-www-dedup
Open

feat: create post www dedup#3946
capJavert wants to merge 2 commits into
mainfrom
create-post-www-dedup

Conversation

@capJavert

Copy link
Copy Markdown
Contributor

No description provided.

@capJavert capJavert requested a review from idoshamun June 17, 2026 16:51
@capJavert capJavert self-assigned this Jun 17, 2026
@pulumi

pulumi Bot commented Jun 17, 2026

Copy link
Copy Markdown

🍹 The Update (preview) for dailydotdev/api/prod (at 04ea9c6) was successful.

✨ Neo Code Review

Routine deployment of the www/non-www URL deduplication feature with accompanying DB and ClickHouse migrations; no stateful resources are being replaced and no data loss risk exists. ✅ Low Risk

This is a standard application deployment rolling out commit 04ea9c6 (previously bdc7956). The code change adds www/non-www URL deduplication logic for posts — when a post is submitted with http://example.com/path, the system now checks if http://www.example.com/path already exists (and vice versa) before creating a duplicate. This affects post creation via the RPC endpoint, the getExistingPost and findPostByUrl helpers, and the CDC worker that processes squad post moderation approvals.

Both DB and ClickHouse migration jobs are being cycled (old commit-tagged jobs deleted, new ones created), which is the normal per-deploy migration pattern. The migration jobs will run before or alongside the rolling deployment of all seven deployments and their associated cron jobs.

🔵 Info — The dedup logic (getWwwVariantPost) performs an extra DB query on every post creation request via the RPC path. This is a new synchronous lookup against ArticlePost by URL that did not exist before; at high ingest rates this adds marginal DB load, but it's a simple indexed findOneBy so impact should be negligible.

Resource Changes

    Name                                                       Type                           Operation
~   vpc-native-update-highlighted-views-cron                   kubernetes:batch/v1:CronJob    update
~   vpc-native-squad-posts-analytics-refresh-cron              kubernetes:batch/v1:CronJob    update
~   vpc-native-rotate-daily-quests-cron                        kubernetes:batch/v1:CronJob    update
~   vpc-native-private-deployment                              kubernetes:apps/v1:Deployment  update
~   vpc-native-rotate-weekly-quests-cron                       kubernetes:batch/v1:CronJob    update
~   vpc-native-user-profile-analytics-history-clickhouse-cron  kubernetes:batch/v1:CronJob    update
~   vpc-native-worker-job-deployment                           kubernetes:apps/v1:Deployment  update
-   vpc-native-api-clickhouse-migration-b02d2849               kubernetes:batch/v1:Job        delete
~   vpc-native-user-profile-analytics-clickhouse-cron          kubernetes:batch/v1:CronJob    update
~   vpc-native-update-achievement-rarity-cron                  kubernetes:batch/v1:CronJob    update
~   vpc-native-generate-search-invites-cron                    kubernetes:batch/v1:CronJob    update
~   vpc-native-user-posts-analytics-refresh-cron               kubernetes:batch/v1:CronJob    update
~   vpc-native-ws-deployment                                   kubernetes:apps/v1:Deployment  update
~   vpc-native-clean-stale-user-transactions-cron              kubernetes:batch/v1:CronJob    update
+   vpc-native-api-clickhouse-migration-0eec750d               kubernetes:batch/v1:Job        create
~   vpc-native-clean-expired-better-auth-sessions-cron         kubernetes:batch/v1:CronJob    update
~   vpc-native-post-analytics-achievements-cron                kubernetes:batch/v1:CronJob    update
~   vpc-native-sync-subscription-with-cio-cron                 kubernetes:batch/v1:CronJob    update
~   vpc-native-user-profile-updated-sync-cron                  kubernetes:batch/v1:CronJob    update
~   vpc-native-calculate-top-readers-cron                      kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-zombie-users-cron                         kubernetes:batch/v1:CronJob    update
~   vpc-native-update-tag-materialized-views-cron              kubernetes:batch/v1:CronJob    update
~   vpc-native-materialize-yearly-best-post-archives-cron      kubernetes:batch/v1:CronJob    update
-   vpc-native-api-db-migration-b02d2849                       kubernetes:batch/v1:Job        delete
~   vpc-native-update-trending-cron                            kubernetes:batch/v1:CronJob    update
+   vpc-native-api-db-migration-0eec750d                       kubernetes:batch/v1:Job        create
~   vpc-native-personalized-digest-cron                        kubernetes:batch/v1:CronJob    update
~   vpc-native-update-current-streak-cron                      kubernetes:batch/v1:CronJob    update
~   vpc-native-update-source-public-threshold-cron             kubernetes:batch/v1:CronJob    update
~   vpc-native-expire-super-agent-trial-cron                   kubernetes:batch/v1:CronJob    update
~   vpc-native-deployment                                      kubernetes:apps/v1:Deployment  update
~   vpc-native-personalized-digest-deployment                  kubernetes:apps/v1:Deployment  update
~   vpc-native-clean-old-notifications-cron                    kubernetes:batch/v1:CronJob    update
~   vpc-native-temporal-deployment                             kubernetes:apps/v1:Deployment  update
~   vpc-native-clean-gifted-plus-cron                          kubernetes:batch/v1:CronJob    update
~   vpc-native-hourly-notification-cron                        kubernetes:batch/v1:CronJob    update
~   vpc-native-update-tags-str-cron                            kubernetes:batch/v1:CronJob    update
~   vpc-native-check-analytics-report-cron                     kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-zombie-user-companies-cron                kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-zombie-images-cron                        kubernetes:batch/v1:CronJob    update
~   vpc-native-clean-channel-highlights-cron                   kubernetes:batch/v1:CronJob    update
... and 14 other changes

@capJavert

Copy link
Copy Markdown
Contributor Author

@claude review

@claude

claude Bot commented Jun 17, 2026

Copy link
Copy Markdown

Claude finished @capJavert's task in 59s —— View job


Code Review: www / non-www dedup

I reviewed the diff (src/routes/private/rpc.ts, __tests__/routes/private/rpc.ts). The approach is sound and the tests cover the three key cases (www→non-www, non-www→www, unrelated subdomain). A few notes:

🟡 Race condition (TOCTOU)getWwwVariantPost is a read-only check with no DB-level guard. The exact-URL path is protected by the unique index (insert fails → getDuplicatePost), but the variant has no unique index covering it (per the comment). So two concurrent create calls — one www, one non-www for the same article — can both pass this check and both insert, producing the duplicate this is meant to prevent. Given ingestion can fan in from multiple sources, this is worth being aware of. It's best-effort dedup, which may be acceptable, but it won't be airtight under concurrency.

🟡 Two lookups where one would do — the create path now does the variant findOneBy, then relies on the insert failing for the exact match. You could collapse both into a single pre-check and avoid the insert-then-catch on exact dupes:

const existing = await con.getRepository(ArticlePost).findOneBy({
  url: In([req.url, variantUrl]),
});

Not required, but it makes the dedup intent explicit and saves a round-trip. Minor.

🟢 Regex correctness/^(https?:\/\/)(www\.)?(.*)$/i handles the standardized URLs correctly (scheme preserved, www. toggled, path/query in rest). Because standardizeURL already ran, the rest segments line up for exact string equality. Edge hosts like www.www.example.com would map to www.example.com (different host), but that's contrived and low-risk.

🟢 Ordering — running this after standardizeURL + validateCreatePostRequest is correct; the variant is computed from the normalized URL.

🔵 Nit — JSDoc noise — the auto-generated block comment (lines 50–63) largely restates the TS signature (@param {{ req: CreatePostRequest; con: DataSource }}, @return {*}). CLAUDE.md asks to avoid comments that restate the code. The one-line "Unique index only covers exact match" is the valuable part; the rest could be trimmed.

Overall: clean, well-tested change. The main thing to weigh is whether the lack of a DB-level guarantee for the variant matters for your ingestion concurrency, or if best-effort dedup is sufficient here.
create-post-www-dedup

@capJavert

Copy link
Copy Markdown
Contributor Author

@claude both are fine we want to keep minimal changes, i evaluated index based check but it would require cleanup of existing www duplicates which is more work and out of scope for now

@claude

claude Bot commented Jun 17, 2026

Copy link
Copy Markdown

Claude finished @capJavert's task in 18s —— View job


Re: www / non-www dedup

Makes sense — best-effort dedup at the application layer is a reasonable scope boundary here. Both points I raised were "consider," not "must fix":

  • TOCTOU / no DB guarantee — agreed this is the trade-off. A unique index covering both variants would be the airtight fix, but as you note it requires backfilling/cleanup of existing www duplicates, which is genuinely more work and out of scope for this PR. Best-effort is fine for now; worth a follow-up ticket if duplicate volume ever becomes a problem.
  • Single combined lookup — purely cosmetic; the current two-step approach works correctly and keeps the diff minimal, which is your stated goal.

No blockers from my side. The change is clean and well-tested. 👍
create-post-www-dedup

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants