Add FILTER_VALIDATE_STRLEN for UTF-8 string length validation#21429

Open
masakielastic wants to merge 7 commits into php:master from masakielastic:feature/filter-validate-str

@masakielastic
Contributor

This PR introduces a new validation filter, FILTER_VALIDATE_STRLEN.

The validator measures the length of a UTF-8 string in Unicode code points (not bytes) and lets applications enforce a minimum length, a maximum length, or both.

API

New filter constant:

FILTER_VALIDATE_STRLEN

Supported options:

min_len   int   minimum accepted length (in Unicode code points)
max_len   int   maximum accepted length (in Unicode code points)

Example:

$options = [
    'options' => [
        'min_len' => 2,
        'max_len' => 2,
    ],
];

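var_dump(
    filter_var('ab', FILTER_VALIDATE_STRLEN, $options),     // 2 code points: within [2, 2]
    filter_var('🐘🐘', FILTER_VALIDATE_STRLEN, $options),   // 2 code points (8 bytes): within [2, 2]
    filter_var('🐘', FILTER_VALIDATE_STRLEN, $options),     // 1 code point: below min_len
    filter_var('🐘🐘🐘', FILTER_VALIDATE_STRLEN, $options)  // 3 code points: above max_len
);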

Implementation notes

The implementation:

  • counts UTF-8 code points
  • reuses UTF-8 handling logic already present in core (similar to the implementation used by htmlspecialchars)
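
To make the counting rule concrete, here is a minimal userland sketch of code-point-based length validation. It is not the C code from this PR: the function names utf8_codepoint_length() and validate_strlen_sketch() are hypothetical, and both rejecting invalid UTF-8 and returning the input on success are assumptions modeled on how other validation filters behave.

```php
<?php
// Userland illustration only; NOT the C implementation from this PR.
// Both function names are hypothetical.

/** Returns the number of code points in $s, or null if $s is not valid UTF-8. */
function utf8_codepoint_length(string $s): ?int
{
    // An empty pattern with the /u modifier fails (returns false) on malformed UTF-8.
    if (preg_match('//u', $s) !== 1) {
        return null;
    }

    $count = 0;
    $bytes = strlen($s);
    for ($i = 0; $i < $bytes; $i++) {
        // Continuation bytes have the form 10xxxxxx; any other byte starts a code point.
        if ((ord($s[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }
    return $count;
}

/** Returns $value if its length fits the bounds, false otherwise (assumed behavior). */
function validate_strlen_sketch(string $value, array $options = [])
{
    $len = utf8_codepoint_length($value);
    if ($len === null) {
        return false; // assumption: invalid UTF-8 is rejected
    }
    if (isset($options['min_len']) && $len < $options['min_len']) {
        return false;
    }
    if (isset($options['max_len']) && $len > $options['max_len']) {
        return false;
    }
    return $value;
}

$bounds = ['min_len' => 2, 'max_len' => 2];
var_dump(validate_strlen_sketch('ab', $bounds));       // string "ab"
var_dump(validate_strlen_sketch('🐘🐘', $bounds));     // string "🐘🐘"
var_dump(validate_strlen_sketch('🐘', $bounds));       // false
var_dump(validate_strlen_sketch('🐘🐘🐘', $bounds));   // false
```

The sketch relies only on strlen(), ord() and PCRE, so it does not need mbstring. Counting non-continuation bytes is only meaningful once the input is known to be well-formed, which is why the validity check runs first.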

Motivation

Validating the length of UTF-8 strings is a common requirement in applications (e.g. form input validation).

Currently this is typically implemented in userland using:

  • mb_strlen() from the mbstring extension
  • framework or CMS polyfills
  • custom helpers

Providing a dedicated validator in the filter extension allows this validation to be performed in a lightweight and consistent way without introducing a dependency on mbstring.

Related discussion

See issue #21428

Implements min/max string length validation according to Unicode Standard 3.9.6.

Adds Unicode and invalid UTF-8 test cases.

Upstream changed filter handler return type from void to zend_result. Update php_filter_validate_str() and related declarations accordingly.

The validator checks the length of a UTF-8 string in Unicode code points. Renaming improves clarity and aligns the filter name with the min_len / max_len options.