txt2phrases — Feature Enhancement Proposal
Enhance txt2phrases to support more flexible input handling and compatibility with research workflows such as pygetpapers.
This update will make the library capable of automatically processing research papers in varied directory structures, converting PDFs to text, and allowing both single-file and batch-folder input.
Proposed Enhancements
1. pygetpapers Output Compatibility
- Goal: Enable txt2phrases to automatically detect and process the directory structure generated by pygetpapers.
- Why: The directory structure that pygetpapers produces differs from the input layout txt2phrases currently expects.
- Expected Behavior: txt2phrases should recursively search nested folders to find and process .pdf or .txt files.
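The expected behavior above could be sketched as a recursive search with the standard library. This is only an illustration: the function name `find_input_files` and the constant `SUPPORTED_SUFFIXES` are hypothetical and not part of txt2phrases, and the sketch assumes nothing about pygetpapers folder names beyond the files sitting somewhere below the given root.

```python
from pathlib import Path

# Assumption: only these two extensions matter, as stated in the proposal.
SUPPORTED_SUFFIXES = {".pdf", ".txt"}


def find_input_files(root):
    """Return every .pdf and .txt file below root, sorted for a stable order."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")  # rglob walks all nested sub-folders
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )
```

Matching on the lowercased suffix means files such as `PAPER.PDF` are picked up too, while other files in the same folders (for example XML or JSON metadata) are skipped.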
2. PDF → TXT Conversion Method
- Goal: Add a built-in method to convert .pdf files into .txt for downstream keyword extraction.
- Why: Users should be able to directly process PDF research papers without manual text extraction.
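One possible shape for such a conversion method is sketched below. The proposal does not name a PDF library, so the use of pypdf here is an assumption, and the function names `extract_pdf_text` and `pdf_to_txt` are hypothetical. The extractor is passed in as a parameter so that a different backend could be swapped in without changing the file handling.

```python
from pathlib import Path


def extract_pdf_text(pdf_path):
    """Extract text with pypdf (an assumed backend, imported only when used)."""
    from pypdf import PdfReader  # third-party package: pip install pypdf

    reader = PdfReader(str(pdf_path))
    # extract_text() can yield nothing for image-only pages, hence the "or".
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def pdf_to_txt(pdf_path, extractor=extract_pdf_text):
    """Write the text of pdf_path to a sibling .txt file and return its path."""
    pdf_path = Path(pdf_path)
    txt_path = pdf_path.with_suffix(".txt")
    txt_path.write_text(extractor(pdf_path), encoding="utf-8")
    return txt_path
```

Writing the .txt file next to the source PDF keeps each paper's files together, so a later keyword-extraction pass can read the text without knowing a PDF was involved. Scanned PDFs without a text layer would produce empty output with this approach and would need OCR, which is outside this sketch.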
3. File and Folder Input Support
- Goal: Allow txt2phrases to work seamlessly with both single files and entire directories.
- Why: This provides flexibility for users who want to analyze one document or batch-process an entire dataset.
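A minimal sketch of how one entry point might accept either kind of input is shown here. The name `resolve_inputs` and its `recursive` flag are hypothetical, not an existing txt2phrases API. The supported extensions are assumed to be .pdf and .txt, as elsewhere in this proposal.

```python
from pathlib import Path

# Assumption: the same two extensions named in the proposal.
SUPPORTED_SUFFIXES = {".pdf", ".txt"}


def resolve_inputs(path, recursive=True):
    """Return the files to process for a single-file or directory argument."""
    path = Path(path)
    if path.is_file():
        # Single-file mode: reject unsupported types early with a clear error.
        if path.suffix.lower() not in SUPPORTED_SUFFIXES:
            raise ValueError(f"Unsupported file type: {path.suffix!r}")
        return [path]
    if path.is_dir():
        # Batch mode: optionally descend into sub-folders.
        candidates = path.rglob("*") if recursive else path.glob("*")
        return sorted(
            p for p in candidates
            if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
        )
    raise FileNotFoundError(f"No such file or directory: {path}")
```

Returning a list in both cases lets the rest of the pipeline loop over the inputs without caring whether the user passed one document or a whole dataset.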