Calculate readability scores using Flesch Reading Ease and Osman methods.
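For orientation, the standard Flesch Reading Ease formula can be sketched as below. This is an illustrative sketch, not the package's implementation: the function name `flesch_reading_ease` and its count-based parameters are hypothetical. The constants are those of the classic English-calibrated formula, and how the package tokenizes text or counts syllables is not described here.

```python
def flesch_reading_ease(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Classic Flesch Reading Ease score computed from pre-computed counts.

    Hypothetical helper for illustration only; higher scores mean easier text.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("total_words and total_sentences must be positive")

    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words

    # Standard constants of the Flesch Reading Ease formula.
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    # Example counts: 100 words, 5 sentences, 150 syllables.
    print(flesch_reading_ease(100, 5, 150))
```

Longer sentences and more syllables per word both lower the score, which is why the formula subtracts both ratios.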
This feature requires the [readability] extra:
pip install "dalla-data-processing[readability]"
# or install all features: pip install "dalla-data-processing[all]"

Command: dalla-dp readability [OPTIONS]
Arguments:
--add-ranks/--no-ranks - Add ranking and level columns (default: True)
Examples:
dalla-dp -i ./data/raw -o ./data/scored readability
dalla-dp -i ./data/raw -o ./data/scored readability --no-ranks
dalla-dp -i ./data/raw -o ./data/scored -c content readability

from datasets import load_from_disk
from dalla_data_processing.readability import score_readability
# Load dataset
dataset = load_from_disk("./data/raw")
# Score the "text" column and add ranking and level columns
scored = score_readability(dataset, column="text", add_ranks=True)
# Save result
scored.save_to_disk("./data/scored")

Readability levels:
0: Very Easy
1: Easy
2: Medium
3: Difficult
4: Very Difficult
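When working with the scored output, the integer level codes 0 to 4 can be turned back into their labels with a small lookup. The sketch below assumes nothing about the package beyond the five codes listed in this section; `LEVEL_LABELS` and `level_label` are hypothetical names, and the name of the column that holds the level is not specified here, so it is left out.

```python
# Mapping of level codes to labels, as listed in this section.
LEVEL_LABELS = {
    0: "Very Easy",
    1: "Easy",
    2: "Medium",
    3: "Difficult",
    4: "Very Difficult",
}


def level_label(level: int) -> str:
    """Return the human-readable label for a readability level code.

    Hypothetical helper; raises ValueError for codes outside 0 to 4.
    """
    try:
        return LEVEL_LABELS[level]
    except KeyError:
        raise ValueError(f"unknown readability level: {level!r}") from None


if __name__ == "__main__":
    for code in sorted(LEVEL_LABELS):
        print(code, level_label(code))
```

Raising on unknown codes, rather than returning a default, makes it easier to notice if a dataset was scored with a different level scheme.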