fix: translate Swedish content before removing data-translate markers #850

Draft
Copilot wants to merge 5 commits into main from copilot/eliminate-data-translate-markers

Copilot AI commented Mar 5, 2026

  • Change translatePhrase() return type to string | null — return null when no dictionary match (instead of returning the original Swedish text)
  • Update translateSwedishContent() — only strip the data-translate marker when a translation was found (non-null); leave the original span intact when no match
  • Update tests — flip "no match" translatePhrase test to expect null; flip translateSwedishContent "no match" test to expect marker kept
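
The contract described in these three bullets can be sketched as follows. This is a minimal illustration, not the code in scripts/translation-dictionary.ts: the one-entry `DICTIONARY`, the two-member `Language` union, and the regular expression are assumptions made for the example, and `lang` attribute handling is left out.

```typescript
// Hypothetical, heavily reduced stand-in for the real dictionary.
type Language = "en" | "da";

const DICTIONARY: Record<string, Partial<Record<Language, string>>> = {
  arbetsmarknadsutskottet: {
    en: "Committee on Labour Market Affairs",
    da: "Arbejdsmarkedsudvalget",
  },
};

// Returns the translation, or null when the dictionary has no match.
function translatePhrase(swedish: string, lang: Language): string | null {
  const entry = DICTIONARY[swedish.trim().toLowerCase()];
  return entry?.[lang] ?? null;
}

// Strips the marker only when a translation was found; otherwise the
// original span (marker included) is returned unchanged.
function translateSwedishContent(html: string, lang: Language): string {
  const markedSpan = /<span data-translate="true"([^>]*)>([^<]+)<\/span>/g;
  return html.replace(markedSpan, (whole: string, rest: string, text: string) => {
    const translated = translatePhrase(text, lang);
    return translated === null ? whole : `<span${rest}>${translated}</span>`;
  });
}
```

With this shape, an unmatched term such as Interpellationssvar stays wrapped and marked, so a later validation run can still find it.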
Original prompt

This section details the original issue you should resolve

<issue_title>🔧 Eliminate remaining data-translate markers across all non-Swedish news articles</issue_title>
<issue_description>## 📋 Issue Type
Bug Fix - Translation Completeness

🎯 Objective

Remove all remaining data-translate="true" markers from non-Swedish news articles by translating the enclosed Swedish content into the target language.

📊 Current State

Approximately 121 articles still contain data-translate="true" markers indicating untranslated Swedish content:

| Language | Articles with markers |
| --- | --- |
| da (Danish) | 10 |
| no (Norwegian) | 10 |
| fi (Finnish) | 10 |
| de (German) | 10 |
| fr (French) | 10 |
| es (Spanish) | 11 |
| nl (Dutch) | 10 |
| ar (Arabic) | 10 |
| he (Hebrew) | 10 |
| ja (Japanese) | 10 |
| ko (Korean) | 10 |
| zh (Chinese) | 10 |
| Total | ~121 |

What data-translate="true" Means

These markers are added by the article generation system when Swedish parliamentary terms (committee names, document types, legislative vocabulary) could not be translated by the scripts/translation-dictionary.ts at generation time. The markers wrap Swedish text that should have been translated:

<span data-translate="true">arbetsmarknadsutskottet</span>

Should be:

Committee on Labour Market Affairs  (for English)
Arbejdsmarkedsudvalget  (for Danish)

🚀 Desired State

  • Zero data-translate="true" markers in any non-Swedish article
  • All Swedish parliamentary terms translated using the translation-dictionary.ts
  • Missing translations added to the dictionary for future articles
  • Validation passes: npx tsx scripts/validate-news-translations.ts

🔧 Implementation Approach

Step 1: Identify Missing Dictionary Entries

# Extract all unique Swedish terms wrapped in data-translate
grep -ohP 'data-translate="true"[^>]*>\K[^<]+' news/*-{da,no,fi,de,fr,es,nl,ar,he,ja,ko,zh}.html | sort -u
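
The same extraction could be done in TypeScript, for instance inside the batch script. The sketch below is only an illustration: `extractMarkedTerms` is a hypothetical helper, and the regular expression assumes each marked element contains plain text with no nested tags.

```typescript
// Collects the unique text of every data-translate="true" element in one document.
function extractMarkedTerms(html: string): string[] {
  const pattern = /<[a-z0-9]+\b[^>]*\bdata-translate="true"[^>]*>([^<]+)</gi;
  const terms = new Set<string>();
  for (const match of html.matchAll(pattern)) {
    terms.add(match[1]!.trim());
  }
  // Default sort orders by UTF-16 code unit, which is enough for de-duplicated review lists.
  return [...terms].sort();
}
```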

Step 2: Expand Translation Dictionary

Update scripts/translation-dictionary.ts with translations for any terms not currently covered. The dictionary already has comprehensive coverage of:

  • Committee names (16 committees)
  • Document type prefixes
  • Parliamentary institution names
  • Common legislative vocabulary

Add any missing terms identified in Step 1.

Step 3: Batch Process Articles

Use the existing translateSwedishContent() function from scripts/translation-dictionary.ts:

import { translateSwedishContent } from './scripts/translation-dictionary.js';
// For each article file:
const translated = translateSwedishContent(html, lang);
// Write back to file

Or create a batch script similar to scripts/fix-old-articles-branding.ts that:

  1. Reads all non-SV articles with data-translate markers
  2. Applies translateSwedishContent()
  3. Writes updated files
  4. Reports statistics
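
The four steps above can be sketched as a small batch function. This is a hedged outline rather than the script added in this PR: `fixMarkers`, `BatchStats`, and the injected `translate` callback are names invented for the example, and the real script would pass `translateSwedishContent` from scripts/translation-dictionary.ts.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical callback type; stands in for translateSwedishContent().
type Translate = (html: string, lang: string) => string;

interface BatchStats {
  filesModified: number;
  markersBefore: number;
  markersAfter: number;
}

const MARKER = /data-translate="true"/g;
const countMarkers = (html: string): number => (html.match(MARKER) ?? []).length;

// Applies `translate` to every *-<lang>.html file in newsDir except Swedish ones.
function fixMarkers(newsDir: string, translate: Translate): BatchStats {
  const stats: BatchStats = { filesModified: 0, markersBefore: 0, markersAfter: 0 };
  for (const name of fs.readdirSync(newsDir)) {
    const lang = /-([a-z]{2})\.html$/.exec(name)?.[1];
    if (!lang || lang === "sv") continue;

    const file = path.join(newsDir, name);
    const original = fs.readFileSync(file, "utf-8");
    const before = countMarkers(original);
    if (before === 0) continue; // nothing to translate in this article

    const updated = translate(original, lang);
    stats.markersBefore += before;
    stats.markersAfter += countMarkers(updated);
    if (updated !== original) {
      fs.writeFileSync(file, updated, "utf-8");
      stats.filesModified += 1;
    }
  }
  return stats;
}
```

Injecting the translator keeps the file handling testable on its own, and a non-zero `markersAfter` points at terms still missing from the dictionary.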

Step 4: Validate

npx tsx scripts/validate-news-translations.ts

🤖 Recommended Agent

agent:news-journalist — Expert in the translation dictionary system and translateSwedishContent() pipeline. Can expand the dictionary and batch-process all affected articles.

✅ Acceptance Criteria

  • All data-translate="true" markers removed from non-SV articles
  • Translation dictionary expanded with any missing terms
  • npx tsx scripts/validate-news-translations.ts passes with zero failures
  • HTML structure preserved (no broken tags)
  • RTL text direction maintained for ar/he articles
  • CJK characters display correctly in ja/ko/zh articles

📚 References

🏷️ Labels

type:bug, component:i18n, component:news, translation, priority-high, component:content</issue_description>

<agent_instructions>Eliminate all data-translate="true" markers from ~121 non-Swedish news articles. First extract all unique untranslated Swedish terms, then expand translation-dictionary.ts with any missing entries. Use translateSwedishContent() to batch-process all affected files. Create a script similar to fix-old-articles-branding.ts for the batch operation. Run validate-news-translations.ts to verify zero markers remain.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits March 5, 2026 23:51
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…hrases

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix remaining data-translate markers in non-Swedish articles" to "fix: eliminate all data-translate="true" markers from non-Swedish news articles" on Mar 5, 2026
pethers requested a review from Copilot on March 6, 2026 00:37

Copilot AI left a comment

Pull request overview

This PR aims to eliminate remaining data-translate="true" markers in non-Swedish news/*.html articles by batch-applying the existing translateSwedishContent() post-processor, and committing the resulting HTML updates so translation validation passes.

Changes:

  • Added a batch fixer script to rewrite news/*.html files by removing data-translate="true" markers via translateSwedishContent().
  • Updated many non-Swedish news HTML files to remove data-translate="true" markers on Swedish spans.
  • Committed regenerated article content across multiple languages (month-ahead / committee-reports / government-propositions / week-ahead).

Reviewed changes

Copilot reviewed 50 out of 106 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| scripts/fix-data-translate-markers.ts | New batch script to remove data-translate="true" markers from non-Swedish news articles. |
| news/2026-02-27-week-ahead-en.html | Removes translation markers in EN article content. |
| news/2026-02-22-month-ahead-zh.html | Removes translation markers in ZH article content. |
| news/2026-02-22-month-ahead-no.html | Removes translation markers in NO article content. |
| news/2026-02-22-month-ahead-nl.html | Removes translation markers in NL article content. |
| news/2026-02-22-month-ahead-ko.html | Removes translation markers in KO article content. |
| news/2026-02-22-month-ahead-he.html | Removes translation markers in HE article content. |
| news/2026-02-22-month-ahead-fr.html | Removes translation markers in FR article content. |
| news/2026-02-22-month-ahead-es.html | Removes translation markers in ES article content. |
| news/2026-02-22-month-ahead-en.html | Removes translation markers in EN article content. |
| news/2026-02-22-month-ahead-de.html | Removes translation markers in DE article content. |
| news/2026-02-22-month-ahead-ar.html | Removes translation markers in AR article content. |
| news/2026-02-22-government-propositions-zh.html | Removes translation markers in ZH propositions article content. |
| news/2026-02-22-government-propositions-no.html | Removes translation markers in NO propositions article content. |
| news/2026-02-22-government-propositions-nl.html | Removes translation markers in NL propositions article content. |
| news/2026-02-22-government-propositions-ko.html | Removes translation markers in KO propositions article content. |
| news/2026-02-22-government-propositions-ja.html | Removes translation markers in JA propositions article content. |
| news/2026-02-22-government-propositions-he.html | Removes translation markers in HE propositions article content. |
| news/2026-02-22-government-propositions-fr.html | Removes translation markers in FR propositions article content. |
| news/2026-02-22-government-propositions-fi.html | Removes translation markers in FI propositions article content. |
| news/2026-02-22-government-propositions-es.html | Removes translation markers in ES propositions article content. |
| news/2026-02-22-government-propositions-en.html | Removes translation markers in EN propositions article content. |
| news/2026-02-22-government-propositions-de.html | Removes translation markers in DE propositions article content. |
| news/2026-02-22-government-propositions-da.html | Removes translation markers in DA propositions article content. |
| news/2026-02-22-government-propositions-ar.html | Removes translation markers in AR propositions article content. |
| news/2026-02-22-committee-reports-zh.html | Removes translation markers in ZH committee-reports article content. |
| news/2026-02-22-committee-reports-no.html | Removes translation markers in NO committee-reports article content. |
| news/2026-02-22-committee-reports-nl.html | Removes translation markers in NL committee-reports article content. |
| news/2026-02-22-committee-reports-ko.html | Removes translation markers in KO committee-reports article content. |
| news/2026-02-22-committee-reports-ja.html | Removes translation markers in JA committee-reports article content. |
| news/2026-02-22-committee-reports-he.html | Removes translation markers in HE committee-reports article content. |
| news/2026-02-22-committee-reports-fr.html | Removes translation markers in FR committee-reports article content. |
| news/2026-02-22-committee-reports-fi.html | Removes translation markers in FI committee-reports article content. |
| news/2026-02-22-committee-reports-es.html | Removes translation markers in ES committee-reports article content. |
| news/2026-02-22-committee-reports-en.html | Removes translation markers in EN committee-reports article content. |
| news/2026-02-22-committee-reports-de.html | Removes translation markers in DE committee-reports article content. |
| news/2026-02-22-committee-reports-da.html | Removes translation markers in DA committee-reports article content. |
| news/2026-02-22-committee-reports-ar.html | Removes translation markers in AR committee-reports article content. |

@@ -343,53 +343,53 @@ <h4><a href="https://www.riksdagen.se/en/calendar/?from=2026-03-06&amp;to=2026-0

<h2>Parliamentary Questions to Ministers</h2>
<div class="document-entry">
-<h4><span data-translate="true" lang="sv">Åtgärder för att nå transportmålet</span></h4>
+<h4><span lang="sv">measureer för att nå transportmålet</span></h4>

Copilot AI Mar 6, 2026

The <span lang="sv">…</span> now contains mixed/incorrect-language text ("measureer för att nå transportmålet"). This is both a spelling issue ("measureer") and an accessibility issue because lang="sv" indicates Swedish while the content is not Swedish. Either keep the original Swedish text if it’s intentionally untranslated, or fully translate to English and remove/set the lang attribute accordingly.

<div class="document-entry">
-<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDC120260324ip/" target="_blank" rel="noopener noreferrer"><span data-translate="true" lang="sv">Interpellationssvar</span></a></h4>
+<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDC120260324ip/" target="_blank" rel="noopener noreferrer"><span lang="sv">质询ssvar</span></a></h4>

Copilot AI Mar 6, 2026

This entry still contains a Swedish suffix due to partial prefix-translation (e.g. "质询ssvar"). That indicates Interpellationssvar wasn’t translated as a whole term, and the result is mixed-language text wrapped in lang="sv". Please ensure the dictionary has an exact-match entry for the full Swedish term (and/or adjust the prefix-match algorithm) and regenerate/re-run the fixer so the output is a fully Chinese phrase without leftover Swedish fragments.

Comment on lines +250 to +260
<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDA3F%C3%B6U202603241100/" target="_blank" rel="noopener noreferrer"><span lang="sv">国防委员会s sammanträde</span></a></h4>
<p class="policy-significance">Touches on defence and security policy. Parliamentary committee review and chamber debate will shape the final outcome.</p>
</div>

<div class="document-entry">
-<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDA3UbU202603241100/" target="_blank" rel="noopener noreferrer"><span data-translate="true" lang="sv">Utbildningsutskottets sammanträde</span></a></h4>
+<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDA3UbU202603241100/" target="_blank" rel="noopener noreferrer"><span lang="sv">教育委员会s sammanträde</span></a></h4>
<p class="policy-significance">Touches on education policy. Parliamentary committee review and chamber debate will shape the final outcome.</p>
</div>

<div class="document-entry">
-<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDA3CU202603241100/" target="_blank" rel="noopener noreferrer"><span data-translate="true" lang="sv">Civilutskottets sammanträde</span></a></h4>
+<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDA3CU202603241100/" target="_blank" rel="noopener noreferrer"><span lang="sv">民事委员会s sammanträde</span></a></h4>

Copilot AI Mar 6, 2026

Several committee meeting entries are only partially translated (e.g. "国防委员会s sammanträde"), leaving the Swedish genitive suffix and the Swedish word "sammanträde" in a non-Swedish article. This contradicts the PR goal of translating Swedish parliamentary terms. Add exact dictionary entries for the full phrase (e.g. "försvarsutskottets sammanträde") and regenerate/re-run the fixer so the rendered text is fully in the target language (and doesn’t keep lang="sv" when translated).

Comment on lines +250 to +260
<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDA3F%C3%B6U202603241100/" target="_blank" rel="noopener noreferrer"><span lang="sv">Committee on Defences sammanträde</span></a></h4>
<p class="policy-significance">Touches on defence and security policy. Parliamentary committee review and chamber debate will shape the final outcome.</p>
</div>

<div class="document-entry">
-<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDA3UbU202603241100/" target="_blank" rel="noopener noreferrer"><span data-translate="true" lang="sv">Utbildningsutskottets sammanträde</span></a></h4>
+<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDA3UbU202603241100/" target="_blank" rel="noopener noreferrer"><span lang="sv">Committee on Educations sammanträde</span></a></h4>
<p class="policy-significance">Touches on education policy. Parliamentary committee review and chamber debate will shape the final outcome.</p>
</div>

<div class="document-entry">
-<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDA3CU202603241100/" target="_blank" rel="noopener noreferrer"><span data-translate="true" lang="sv">Civilutskottets sammanträde</span></a></h4>
+<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDA3CU202603241100/" target="_blank" rel="noopener noreferrer"><span lang="sv">Committee on Civil Affairss sammanträde</span></a></h4>

Copilot AI Mar 6, 2026

These committee meeting labels show partial translation artifacts from Swedish genitive handling (e.g. "Committee on Defences sammanträde", "Civil Affairss"), which reads incorrectly in English and leaves Swedish text in the UI. This suggests the dictionary/prefix-match logic is still splitting at the committee name and appending the remainder. Prefer exact-match dictionary entries for the full genitive phrase ("…utskottets sammanträde") and for standalone "sammanträde", then re-run generation so the output is fully translated (and avoid keeping lang="sv" on translated text).
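
The artifact described here is what a plain prefix substitution produces, and whole-phrase matching avoids it. The sketch below contrasts the two under stated assumptions: the `COMMITTEES` map uses the English names implied by the output above, while `prefixReplace`, `translateWhole`, and the "Meeting of the …" wording are hypothetical and not taken from translation-dictionary.ts.

```typescript
// Committee names as implied by the reviewed output; everything else is assumed.
const COMMITTEES: Record<string, string> = {
  försvarsutskottet: "Committee on Defence",
  utbildningsutskottet: "Committee on Education",
  civilutskottet: "Committee on Civil Affairs",
};

// Naive prefix substitution: swaps the committee name and appends the Swedish remainder.
function prefixReplace(swedish: string): string {
  const lower = swedish.toLowerCase();
  for (const [sv, en] of Object.entries(COMMITTEES)) {
    if (lower.startsWith(sv)) return en + swedish.slice(sv.length);
  }
  return swedish;
}

// Whole-phrase matching: translate only when the entire phrase is understood.
function translateWhole(swedish: string): string | null {
  const lower = swedish.trim().toLowerCase();
  const exact = COMMITTEES[lower];
  if (exact !== undefined) return exact;

  // "<committee>s sammanträde" is the genitive committee name followed by "meeting".
  const meeting = /^(.+utskottet)s sammanträde$/.exec(lower);
  const committee = meeting ? COMMITTEES[meeting[1]!] : undefined;
  return committee !== undefined ? `Meeting of the ${committee}` : null;
}
```

Returning null for anything that is not fully matched lets the caller keep the original Swedish span instead of emitting a mixed-language string.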

Comment on lines +250 to +260
<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDA3F%C3%B6U202603241100/" target="_blank" rel="noopener noreferrer"><span lang="sv">Comité de la défenses sammanträde</span></a></h4>
<p class="policy-significance">Touches on defence and security policy. Parliamentary committee review and chamber debate will shape the final outcome.</p>
</div>

<div class="document-entry">
-<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDA3UbU202603241100/" target="_blank" rel="noopener noreferrer"><span data-translate="true" lang="sv">Utbildningsutskottets sammanträde</span></a></h4>
+<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDA3UbU202603241100/" target="_blank" rel="noopener noreferrer"><span lang="sv">Comité de l'éducations sammanträde</span></a></h4>
<p class="policy-significance">Touches on education policy. Parliamentary committee review and chamber debate will shape the final outcome.</p>
</div>

<div class="document-entry">
-<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDA3CU202603241100/" target="_blank" rel="noopener noreferrer"><span data-translate="true" lang="sv">Civilutskottets sammanträde</span></a></h4>
+<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDA3CU202603241100/" target="_blank" rel="noopener noreferrer"><span lang="sv">Comité des affaires civiless sammanträde</span></a></h4>

Copilot AI Mar 6, 2026

The translated committee meeting strings still include Swedish remnants (e.g. "Comité de la défenses sammanträde") and are wrapped in lang="sv", which misrepresents the language for screen readers. After fixing the dictionary/prefix-match for full phrases, ensure the span lang is removed or set to the article language when a translation is applied.

Comment on lines +244 to 245
<h3><span lang="sv">瑞典国家审计局报告 om miljöräddning vid stora olyckor till sjöss</span></h3>
<p>该提案涉及 This Defence Ministry proposal concerns national security or defence posture.<br><strong>提交至:</strong> Försvarsdepartementet</p>

Copilot AI Mar 6, 2026

This heading is now mixed Chinese + Swedish ("Swedish National Audit Office report om …") but still marked lang="sv". That’s both a translation-quality issue and an accessibility issue (incorrect language metadata). After updating the dictionary/prefix matching, regenerate so the entire phrase is consistently in Chinese, or keep it fully Swedish if intentionally untranslated (and then don’t prepend Chinese).

Suggested change
-<h3><span lang="sv">瑞典国家审计局报告 om miljöräddning vid stora olyckor till sjöss</span></h3>
-<p>该提案涉及 This Defence Ministry proposal concerns national security or defence posture.<br><strong>提交至:</strong> Försvarsdepartementet</p>
+<h3>瑞典国家审计署关于重大海上事故环境救援的报告</h3>
+<p>该提案涉及一项国防部的提案,内容关系到国家安全和防务态势。<br><strong>提交至:</strong> Försvarsdepartementet</p>

Comment on lines +8 to +15
 * What it does:
 * - For each non-SV article containing `data-translate="true"` spans:
 *   1. Attempts dictionary translation of the enclosed Swedish text
 *   2. Removes the `data-translate="true"` attribute (and the span wrapper
 *      when no other attributes remain)
 *   3. Keeps the `lang="sv"` attribute for screen-reader accessibility
 *      when the phrase could not be translated
 *

Copilot AI Mar 6, 2026

The header comment claims the script removes the span wrapper when no other attributes remain and that it keeps lang="sv" only when translation fails. However translateSwedishContent() (translation-dictionary.ts) always preserves lang="sv" on the span (it just strips data-translate), so the wrapper will never be removed and translated content will still be tagged as Swedish. Consider updating translateSwedishContent() to drop/adjust lang when translation occurs (and only keep lang="sv" when the inner text is unchanged), then re-run this script.
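
One way to make the behaviour match the header comment is sketched below. It is an assumption-laden illustration, not the project's implementation: `rewriteMarkedSpans` and `PhraseTranslator` are hypothetical names, and the regular expression assumes marked spans contain plain text only.

```typescript
// Hypothetical per-phrase translator: returns null when no translation exists.
type PhraseTranslator = (swedish: string) => string | null;

function rewriteMarkedSpans(html: string, translate: PhraseTranslator): string {
  const markedSpan = /<span\b([^>]*\bdata-translate="true"[^>]*)>([^<]+)<\/span>/g;
  return html.replace(markedSpan, (whole: string, attrs: string, text: string) => {
    const translated = translate(text);
    // Untranslated text is still Swedish: keep the marker and lang="sv" as they are.
    if (translated === null) return whole;

    // Translated text is no longer Swedish, so drop both the marker and lang="sv".
    const remaining = attrs
      .replace(/\s*\bdata-translate="true"/, "")
      .replace(/\s*\blang="sv"/, "")
      .trim();
    // Unwrap the span entirely when no other attributes remain.
    return remaining === "" ? translated : `<span ${remaining}>${translated}</span>`;
  });
}
```

Dropping `lang` lets the translated text inherit the article's own language, which also covers the ar/he and ja/ko/zh pages without per-language handling.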

Comment on lines +28 to +53
const NON_SWEDISH_LANGS: readonly string[] = [
  'en', 'da', 'no', 'fi', 'de', 'fr', 'es', 'nl', 'ar', 'he', 'ja', 'ko', 'zh',
];

function getLanguageFromFilename(filename: string): string | null {
  const match = filename.match(/-([a-z]{2})\.html$/);
  return match ? (match[1] ?? null) : null;
}

function processFile(filepath: string): { modified: boolean; markersBefore: number; markersAfter: number } {
  const filename = path.basename(filepath);
  const lang = getLanguageFromFilename(filename);

  if (!lang || !NON_SWEDISH_LANGS.includes(lang)) {
    return { modified: false, markersBefore: 0, markersAfter: 0 };
  }

  const original = fs.readFileSync(filepath, 'utf-8');

  const markersBefore = (original.match(/data-translate="true"/g) ?? []).length;
  if (markersBefore === 0) {
    return { modified: false, markersBefore: 0, markersAfter: 0 };
  }

  const modified = translateSwedishContent(original, lang as Language);

Copilot AI Mar 6, 2026

lang is derived as string and then cast to Language when calling translateSwedishContent(). This can hide unexpected values at compile time and may cause runtime issues if new languages are added or filenames don’t match the expected pattern. Prefer typing NON_SWEDISH_LANGS as readonly Language[] and narrowing lang to Language before calling the translation function (removing the cast).
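
The narrowing suggested here could look roughly like the following. The sketch derives a `Language` type from the array purely to stay self-contained, which is an assumption: the project's real `Language` type is imported from elsewhere and may include more members (such as `sv`). `isNonSwedishLang` is a hypothetical helper name.

```typescript
// Assumed stand-in for the project's Language type, derived from the list itself.
const NON_SWEDISH_LANGS = [
  "en", "da", "no", "fi", "de", "fr", "es", "nl", "ar", "he", "ja", "ko", "zh",
] as const;
type Language = (typeof NON_SWEDISH_LANGS)[number];

// Type guard: narrows an arbitrary string to Language without a cast at the call site.
function isNonSwedishLang(value: string): value is Language {
  return (NON_SWEDISH_LANGS as readonly string[]).includes(value);
}

// Returns a typed language code, or null for Swedish, unknown, or malformed filenames.
function getLanguageFromFilename(filename: string): Language | null {
  const code = /-([a-z]{2})\.html$/.exec(filename)?.[1];
  return code !== undefined && isNonSwedishLang(code) ? code : null;
}
```

With the return type narrowed, `translateSwedishContent(original, lang)` type-checks without `as Language`, and an unexpected suffix is rejected before any file is read.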

Copilot AI and others added 2 commits March 6, 2026 00:48
…ed before marker removal

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Copilot AI changed the title from "fix: eliminate all data-translate="true" markers from non-Swedish news articles" to "fix: translate Swedish content before removing data-translate markers" on Mar 6, 2026
github-actions bot added the html-css, translation, rtl, testing, refactor, news, and size-xl labels on Mar 6, 2026

github-actions bot commented Mar 6, 2026

🔍 Lighthouse Performance Audit

| Category | Score | Status |
| --- | --- | --- |
| Performance | 85/100 | 🟡 |
| Accessibility | 95/100 | 🟢 |
| Best Practices | 90/100 | 🟢 |
| SEO | 95/100 | 🟢 |

📥 Download full Lighthouse report

Budget Compliance: Performance budgets enforced via budget.json
