fix: support youtu.be short URLs in YouTubeConverter #1705

Open
octo-patch wants to merge 3 commits into microsoft:main from octo-patch:fix/issue-1704-youtube-short-url-support

@octo-patch

Fixes #1704

Problem

YouTubeConverter.accepts() only matched URLs starting with https://www.youtube.com/watch?, silently rejecting the youtu.be/<id> short URL format that YouTube's share button produces by default.

Additionally, convert() extracted video_id only from the v query parameter, which is absent in youtu.be/<id> URLs where the video ID is in the URL path.

Solution

  • Updated accepts() to use urlparse for host-based matching, accepting both www.youtube.com/watch and youtu.be hostnames (also adds youtube.com without www. for robustness)
  • Updated convert() to extract video_id from the URL path when hostname == "youtu.be", falling back to the v query parameter for standard watch URLs
  • Added a unit test for YouTubeConverter.accepts() covering both URL formats and verifying non-YouTube URLs are rejected
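The host-based matching and video ID extraction described above could look roughly like the following. This is a minimal sketch, not the PR's actual diff: `accepts_youtube_url` and `extract_video_id` are hypothetical standalone functions, since the converter's real method signatures are not shown in this description.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hostnames taken from the description above.
YOUTUBE_WATCH_HOSTS = {"www.youtube.com", "youtube.com"}
YOUTUBE_SHORT_HOST = "youtu.be"


def accepts_youtube_url(url: str) -> bool:
    """Hypothetical helper: return True for watch URLs and youtu.be short URLs."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == YOUTUBE_SHORT_HOST:
        # A short URL needs a non-empty path segment to carry the video ID.
        return bool(parsed.path.strip("/"))
    return host in YOUTUBE_WATCH_HOSTS and parsed.path == "/watch"


def extract_video_id(url: str) -> Optional[str]:
    """Hypothetical helper: read the ID from the path (youtu.be) or the v parameter."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == YOUTUBE_SHORT_HOST:
        segment = parsed.path.strip("/").split("/")[0]
        return segment or None
    values = parse_qs(parsed.query).get("v")
    return values[0] if values else None
```

Comparing the parsed hostname rather than a string prefix means a URL such as `https://example.com/?u=https://youtu.be/abc` is not accepted by accident.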

Testing

  • New unit test test_youtube_converter_accepts added to test_module_misc.py
  • Test passes locally without network access (pure URL parsing logic)
packages/markitdown/tests/test_module_misc.py::test_youtube_converter_accepts PASSED
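
A network-free test of this kind might be shaped like the sketch below. The `accepts` function here is a hypothetical stand-in reduced to URL matching, and the URLs in `CASES` are illustrative placeholders; the real test in test_module_misc.py exercises YouTubeConverter.accepts() itself.

```python
from urllib.parse import urlparse


def accepts(url: str) -> bool:
    """Hypothetical stand-in for YouTubeConverter.accepts(), URL logic only."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return len(parsed.path) > 1  # something must follow the leading "/"
    return host in {"www.youtube.com", "youtube.com"} and parsed.path == "/watch"


# (url, expected) pairs: both YouTube formats plus non-YouTube URLs.
CASES = [
    ("https://www.youtube.com/watch?v=VIDEO_ID", True),
    ("https://youtu.be/VIDEO_ID", True),
    ("https://example.com/watch?v=VIDEO_ID", False),
    ("https://vimeo.com/12345", False),
]


def test_youtube_converter_accepts() -> None:
    for url, expected in CASES:
        assert accepts(url) is expected, url
```

Because the cases only parse strings, the test needs no HTTP mocking and runs the same way offline.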

octo-patch added 3 commits April 8, 2026 13:35
…icrosoft#1660)

Previously ExifTool was only auto-discovered if found directly in a trusted
directory (e.g. C:\Program Files). Common Windows installs place the binary
one level deeper (C:\Program Files\ExifTool\exiftool.exe), so dirname
returned a path that failed the exact-equality check and ExifTool was
silently not used.

Switch to startswith(d + os.sep) to allow subdirectory matches while keeping
the exact-match check, preserving the security boundary. Also add C:\Windows
as a trusted root since ExifTool is sometimes installed there directly.
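
The exact-match-or-subdirectory check described in this commit message can be sketched as follows. This is an illustration under assumptions, not the committed code: `is_in_trusted_dir` and `TRUSTED_DIRS` are hypothetical names, `ntpath` is used in place of `os.path` so the Windows paths behave the same on any platform, and the path normalization and case folding are additions not mentioned in the message.

```python
import ntpath

# Assumed list: the commit message mentions these two directories;
# the full trusted list is not shown.
TRUSTED_DIRS = [r"C:\Program Files", r"C:\Windows"]


def is_in_trusted_dir(exe_path: str, trusted_dirs=TRUSTED_DIRS) -> bool:
    """Hypothetical helper: True if exe_path lives in or below a trusted directory."""
    # Normalize first so ".." segments cannot escape a trusted prefix.
    exe_dir = ntpath.normcase(ntpath.dirname(ntpath.normpath(exe_path)))
    for d in trusted_dirs:
        d = ntpath.normcase(ntpath.normpath(d))
        # Exact match, or a subdirectory match. Appending the separator
        # keeps "C:\Program Files Evil" from matching "C:\Program Files".
        if exe_dir == d or exe_dir.startswith(d + ntpath.sep):
            return True
    return False
```

The separator in `d + ntpath.sep` is what preserves the security boundary: a bare `startswith(d)` would also accept sibling directories that merely share a name prefix.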

Development

Successfully merging this pull request may close these issues.

YouTube converter: short URLs (youtu.be/…) silently skipped, and get_transcript() removed in youtube-transcript-api 1.x
