[WIP] Attempt Wikidata POIs #580
Draft
migurski wants to merge 11 commits into migurski/continue-overture from
…verture data Summarize your findings up to this point in WIKIDATA.md and commit it Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ndings Summarize this exploration into WIKIDATA.md and commit Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…istic wins Compare and contrast your two proposed disambiguation approaches; Try that combined approach Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…untime lookup chain Update WIKIDATA.md with these findings. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Propose a script that will generate a fresh copy of wikidata-website-qid.csv.gz when it is run on a schedule; yes, and if it does update WIKIDATA.md and commit both Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hold on, let's have scripts live here in tiles/ and resulting data live under data/sources/ with others Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add WebsiteQidDb: domain→QID lookup parsed from a gzipped CSV
(wikidata-website-qid-2026-03.csv.gz). Overture places features have no
native wikidata field, but often carry website URLs. This enables a
two-hop lookup: websites[0] → domain → Q-ID → QRank score → min_zoom.
- WebsiteQidDb.java: HashMap<String,Long> backed, fromCsv uses
lastIndexOf(',') to handle domain values containing commas; getQid()
strips protocol/www/path before lookup
- Basemap.java: download + load websiteQidDb after qrankDb; pass to Pois
- Pois.java: add websiteQidDb field; fallback website→QID lookup in
processOverture when wikidata tag is absent; add zoo/college/museum
qrankGrading entries; recalibrate aerodrome/university thresholds so
Oakland Airport→zoom 11, Oakland Zoo→zoom 12, UCB→zoom 13, OMCA→zoom 14
- Tests: WebsiteQidDbTest (9 tests), 4 new PoisOvertureTest cases with
real Overture UUIDs (f66024a2 airport, a74a40ae zoo, 67e4f788 UCB,
474b271e OMCA), LayerTest fixture expanded with all four Q-IDs
Prompt: "Implement the following plan: WebsiteQidDb + QRank-based
Overture POI Zoom [...] when you add unit tests concerning Overture
features, always include their full UUID so we can trace them back to
the original dataset [...] just use CLI duckdb, we already have it"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
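The domain→QID lookup described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's `WebsiteQidDb.java`: the class name `WebsiteQidSketch`, the method `fromCsvLines`, and the assumed row layout (`domain,Q123`) are hypothetical, and the real class reads a gzipped CSV rather than an in-memory list.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch only; the PR's WebsiteQidDb.java may differ. */
final class WebsiteQidSketch {
  private final Map<String, Long> qidByDomain = new HashMap<>();

  /** Parses rows assumed to look like "domain,Q123", splitting on the LAST comma. */
  static WebsiteQidSketch fromCsvLines(List<String> lines) {
    WebsiteQidSketch db = new WebsiteQidSketch();
    for (String line : lines) {
      // lastIndexOf keeps any commas inside the domain value intact.
      int comma = line.lastIndexOf(',');
      if (comma <= 0) continue; // no comma, or empty domain
      String domain = line.substring(0, comma).trim().toLowerCase(Locale.ROOT);
      String qid = line.substring(comma + 1).trim();
      // Rows without a Q-prefixed numeric ID (such as a header) are skipped.
      if (!qid.matches("Q\\d{1,18}")) continue;
      db.qidByDomain.put(domain, Long.parseLong(qid.substring(1)));
    }
    return db;
  }

  /** Strips protocol, a leading "www.", and any path, query, or fragment. */
  static String normalizeDomain(String url) {
    String s = url.trim().toLowerCase(Locale.ROOT);
    int scheme = s.indexOf("://");
    if (scheme >= 0) s = s.substring(scheme + 3);
    if (s.startsWith("www.")) s = s.substring(4);
    int end = s.length();
    for (char c : new char[] {'/', '?', '#'}) {
      int i = s.indexOf(c);
      if (i >= 0 && i < end) end = i;
    }
    return s.substring(0, end);
  }

  /** Returns the numeric part of the Q-ID for a URL, or null when unknown. */
  Long getQid(String url) {
    if (url == null || url.trim().isEmpty()) return null;
    return qidByDomain.get(normalizeDomain(url));
  }
}
```

With a row such as `jetblue.com,Q161086`, calling `getQid("https://www.jetblue.com/flights")` would return `161086`; the QRank score and min_zoom hops from the commit message are outside this sketch.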
Spotless reformatted the markdown table during make lint; committed separately since it was missed from the previous commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ence
Two guards prevent brand websites from inflating POI zoom levels:
1. Category allowlist: only apply website→QID when basic_category is an
   institution-level feature (airport, zoo, museum, college_university,
   etc.). This excludes air_transport_facility_service, travel_service,
   transportation_location, etc., where the website resolves to a brand
   entity (e.g. jetblue.com → Q161086 JetBlue Airways) rather than the
   specific place.
2. Confidence threshold (0.9): low-confidence features are often brand
   counters or services miscategorised as the institution. Real airports,
   zoos, etc. cluster at 0.90+; junk like JetBlue-as-airport appears at
   0.32.
Tests: websiteQid_ineligibleCategory_noEarlyZoom (category guard) and
websiteQid_lowConfidence_noEarlyZoom (confidence guard), using real
Overture UUIDs e67dea74 / 8b6a937e for JetBlue features at OAK.
Prompt: "Do option B [...] Comment about why they are eligible in the
code [...] and test [...] I still see JetBlue appearing at z12 or even
z11, why? [...] good yes and test"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
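A minimal sketch of how the two guards might combine into one eligibility check. The class and method names are hypothetical. The allowlist holds only the four categories the commit message names (the real list is longer, per its "etc."). Treating 0.9 as an inclusive bound is an assumption.

```java
import java.util.Set;

/** Illustrative sketch of the category and confidence guards; not the code in Pois.java. */
final class WebsiteQidGuards {
  // Assumed subset of the allowlist: only the categories named in the commit message.
  static final Set<String> ELIGIBLE_CATEGORIES =
      Set.of("airport", "zoo", "museum", "college_university");

  // The commit message gives 0.9; inclusive comparison is an assumption here.
  static final double MIN_WEBSITE_QID_CONFIDENCE = 0.9;

  /** True only when both guards pass, so a website→QID lookup may be attempted. */
  static boolean mayUseWebsiteQid(String basicCategory, double confidence) {
    return basicCategory != null
        && ELIGIBLE_CATEGORIES.contains(basicCategory)
        && confidence >= MIN_WEBSITE_QID_CONFIDENCE;
  }
}
```

Under these assumptions, a JetBlue counter tagged `air_transport_facility_service` fails the category guard regardless of confidence, and a JetBlue-as-airport feature at confidence 0.32 fails the confidence guard even though `airport` is allowlisted.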
Drop features below confidence 0.65 (junk tier: ~127k features dominated
by real estate listings, beauty salons, ATMs from uncertain sources).
Within the remaining features, use confidence to break sort key ties so
higher-confidence POIs win label collision resolution at the same zoom.
Sort key: minZoom * 1000 - (int)(confidence * 100), so confidence=0.99
subtracts 99 from the key while confidence=0.65 subtracts only 65; the
lower key gets higher priority.
Tests updated: websiteQid_ineligibleCategory_dropped and
websiteQid_lowConfidence_dropped now correctly expect zero features.
kind_nationalPark_fromBasicCategory switched to Pinnacles National Park
(4d619bc0, confidence=0.917) since the previous Alcatraz fixture
(814b8a78, confidence=0.639) falls below the new cutoff.
Prompt: "Let's bring more Overture confidence into POI rendering: make
higher-confidence POIs higher rendering priority, and simply omit ones
below 0.65 (junk tier)"
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
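The cutoff and sort key from this commit can be sketched as below. The formula is the one quoted in the commit message; the class name, method names, and the inclusive treatment of 0.65 are assumptions made for illustration.

```java
/** Illustrative sketch of the confidence cutoff and sort key; not the code in Pois.java. */
final class PoiConfidenceSketch {
  static final double MIN_CONFIDENCE = 0.65;

  /** Features below the cutoff are dropped entirely (0.65 itself is assumed to be kept). */
  static boolean keep(double confidence) {
    return confidence >= MIN_CONFIDENCE;
  }

  /**
   * Lower keys mean higher priority. Zoom dominates in steps of 1000;
   * confidence subtracts at most 100, so it only breaks ties within a zoom.
   */
  static int sortKey(int minZoom, double confidence) {
    return minZoom * 1000 - (int) (confidence * 100);
  }
}
```

Because the confidence term stays below 1000, a zoom-11 POI always outranks a zoom-12 POI no matter how confident the latter is. The `(int)` cast truncates rather than rounds, so floating-point representation can put some confidences one point lower than the decimal value suggests.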
… conflicts
Kept HEAD (full WebsiteQidDb machinery) in Pois.java; the only conflict
was a trivial comment difference on the QRank block. In PoisTest.java,
kept HEAD's full test suite (both JetBlue drop tests + all four
website→QID tests) over the cherry-pick's slimmed-down version.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>