Create a streamdown plugin `ReadDocxFile` that extracts text from `.docx` files. This plugin is executed via the `ReadFile` meta plugin when `FileFormat=Docx`.
Inputs
| Name | Type | Mandatory | Description |
| --- | --- | --- | --- |
| Path | String (Expression) | ❌ | |
| Url | String (Expression) | ❌ | |
| Base64 | String (Expression) | ❌ | |
Outputs (for meta normalization)
| Output Key | Description |
| --- | --- |
| Text | Extracted text |
| ParagraphsCount | Optional count of parsed paragraphs |
Implementation Notes
- Resolve bytes from source:
  - `Base64` → bytes
  - `Path` → `File.ReadAllBytes`
  - `Url` → HTTP GET bytes
- Parse using OpenXML SDK (`DocumentFormat.OpenXml`)
- Preserve order; join paragraphs with double newlines for readability.
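
The notes above could be sketched roughly as follows. This is a minimal illustration, not the plugin itself: the class name `DocxTextExtractor`, its method names, the source precedence (Base64, then Path, then Url), and the choice to skip empty paragraphs are all assumptions, and the streamdown plugin base class and input binding are not shown because the source does not describe them. It assumes the `DocumentFormat.OpenXml` NuGet package is referenced.

```csharp
#nullable enable
using System;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper; names are illustrative and not part of streamdown.
public static class DocxTextExtractor
{
    private static readonly HttpClient Http = new HttpClient();

    // Resolve raw .docx bytes from whichever input is set.
    // Assumption: Base64 wins over Path, which wins over Url.
    public static async Task<byte[]> ResolveBytesAsync(string? path, string? url, string? base64)
    {
        if (!string.IsNullOrWhiteSpace(base64)) return Convert.FromBase64String(base64);
        if (!string.IsNullOrWhiteSpace(path)) return File.ReadAllBytes(path);
        if (!string.IsNullOrWhiteSpace(url)) return await Http.GetByteArrayAsync(url);
        throw new ArgumentException("One of Path, Url, or Base64 must be provided.");
    }

    // Parse the document and return the two outputs from the table above.
    public static (string Text, int ParagraphsCount) Extract(byte[] docxBytes)
    {
        using var stream = new MemoryStream(docxBytes);
        using var doc = WordprocessingDocument.Open(stream, false); // read-only

        var body = doc.MainDocumentPart?.Document?.Body;
        if (body is null) return (string.Empty, 0);

        // Descendants<Paragraph>() walks the body in document order,
        // including paragraphs nested inside tables.
        // Assumption: whitespace-only paragraphs are skipped.
        var paragraphs = body.Descendants<Paragraph>()
            .Select(p => p.InnerText)
            .Where(t => !string.IsNullOrWhiteSpace(t))
            .ToList();

        // Join with a blank line between paragraphs for readability.
        return (string.Join("\n\n", paragraphs), paragraphs.Count);
    }
}
```

A caller would chain the two steps, for example `var (text, count) = DocxTextExtractor.Extract(await DocxTextExtractor.ResolveBytesAsync(path, url, base64));`, and map `text` and `count` to the `Text` and `ParagraphsCount` output keys.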