Refactor CombinedRegexp and MemoizedCombinedRegexp classes, add tests and docs
xepozz committed Aug 4, 2023
1 parent e2d1199 commit 11ce621
Showing 9 changed files with 473 additions and 296 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -2,6 +2,7 @@

## 2.2.0 under development

- New #103: Add `MemoizedCombinedRegexp` class (@xepozz)
- New #102, #106: Add `CombinedRegexp` class (@xepozz, @vjik)
- Enh #106: Using fully-qualified function calls to improve performance (@vjik)

34 changes: 33 additions & 1 deletion README.md
@@ -21,7 +21,8 @@ The package provides:
- `NumericHelper` that has static methods to work with numeric strings;
- `Inflector` provides methods such as `toPlural()` or `toSlug()` that derive a new string based on the string given;
- `WildcardPattern` is a shell wildcard pattern to match strings against.
- `CombinedRegexp` is a wrapper that optimizes multiple regular expressions matching.
- `CombinedRegexp` is a wrapper that optimizes multiple regular expressions matching
- `MemoizedCombinedRegexp` is a decorator that caches results of `CombinedRegexp` to speed up matching.

## Requirements

@@ -202,6 +203,37 @@ $regexp->getMatchingPatternPosition('a5'); // 2 – the index of the pattern in
$regexp->getCompiledPattern(); // '~(?|first|second()|^a\d$()())~'
```
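
The compiled pattern shown above relies on a branch-reset group `(?|...)` in which the pattern at index N is followed by N empty capture groups, so the number of captured groups tells which alternative matched. The following is a minimal standalone sketch of that idea, not the package's implementation: `combinePatterns()` and `matchingPosition()` are hypothetical helpers, the `~` delimiter is copied from the README output, and the sketch assumes the patterns do not contain the delimiter.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the package): joins patterns into one
 * branch-reset group, appending $i empty groups to the pattern at index $i.
 *
 * @param string[] $patterns
 */
function combinePatterns(array $patterns): string
{
    $branches = [];
    foreach (array_values($patterns) as $i => $pattern) {
        $branches[] = $pattern . str_repeat('()', $i);
    }

    // Assumption: no pattern contains the `~` delimiter, so nothing is escaped.
    return '~(?|' . implode('|', $branches) . ')~';
}

/**
 * Hypothetical helper: returns the index of the matching pattern,
 * or null when nothing matches.
 */
function matchingPosition(string $compiled, string $subject): ?int
{
    if (preg_match($compiled, $subject, $matches) !== 1) {
        return null;
    }

    // $matches[0] is the full match; every further entry is one empty group.
    return count($matches) - 1;
}

$compiled = combinePatterns(['first', 'second', '^a\d$']);
```

`preg_match` leaves trailing groups that did not take part in the match out of `$matches`, which is why `count($matches) - 1` can serve as the position.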

## MemoizedCombinedRegexp usage

`MemoizedCombinedRegexp` caches results of `CombinedRegexp` in memory.
It is useful when the same incoming string is matched multiple times or when different methods of `CombinedRegexp` are called for the same string.

```php
use \Yiisoft\Strings\CombinedRegexp;
use \Yiisoft\Strings\MemoizedCombinedRegexp;

$patterns = [
    'first',
    'second',
    '^a\d$',
];
$regexp = new MemoizedCombinedRegexp(new CombinedRegexp($patterns, 'i'));
$regexp->matches('a5'); // Fires `preg_match` inside the `CombinedRegexp`.
$regexp->matches('first'); // Fires `preg_match` inside the `CombinedRegexp`.
$regexp->matches('a5'); // Does not fire `preg_match` inside the `CombinedRegexp` because the result is cached.
$regexp->getMatchingPattern('a5'); // The result is cached so no `preg_match` is fired.
$regexp->getMatchingPatternPosition('a5'); // The result is cached so no `preg_match` is fired.

// The following code fires the matching mechanism only once.
if ($regexp->matches('second')) {
    echo sprintf(
        'Matched the pattern "%s" which is on the position "%s" in the expressions list.',
        $regexp->getMatchingPattern('second'),
        $regexp->getMatchingPatternPosition('second'),
    );
}
```
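
The caching behavior described in the comments above can be observed by counting how often the underlying match runs. This is a standalone sketch under stated assumptions: `CountingMatcher` and `MemoizedMatcher` are hypothetical stand-ins written for illustration, not the package's `CombinedRegexp` and `MemoizedCombinedRegexp`, and the compiled pattern is hard-coded.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for the decorated object: counts every real match.
 */
final class CountingMatcher
{
    public int $calls = 0;

    public function __construct(private string $compiledPattern)
    {
    }

    public function position(string $subject): ?int
    {
        $this->calls++;

        return preg_match($this->compiledPattern, $subject, $matches) === 1
            ? count($matches) - 1
            : null;
    }
}

/**
 * Hypothetical stand-in for the decorator: remembers the result per subject.
 */
final class MemoizedMatcher
{
    /** @var array<string, int|null> */
    private array $results = [];

    public function __construct(private CountingMatcher $inner)
    {
    }

    public function position(string $subject): ?int
    {
        // array_key_exists() also treats a cached "no match" (null) as a hit.
        if (!array_key_exists($subject, $this->results)) {
            $this->results[$subject] = $this->inner->position($subject);
        }

        return $this->results[$subject];
    }
}

$inner = new CountingMatcher('~(?|first|second()|^a\d$()())~i');
$memo = new MemoizedMatcher($inner);

$memo->position('a5');    // runs preg_match
$memo->position('first'); // runs preg_match
$memo->position('a5');    // served from the cache
```

The cache is keyed by the exact input string, so `'a5'` and `'A5'` are stored separately even though the `i` flag makes both match.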

## Testing

### Unit testing
65 changes: 65 additions & 0 deletions src/AbstractCombinedRegexp.php
@@ -0,0 +1,65 @@
<?php

declare(strict_types=1);

namespace Yiisoft\Strings;

use Exception;

/**
 * `CombinedRegexp` optimizes matching of multiple regular expressions.
 * Read more about the concept in
 * {@see https://nikic.github.io/2014/02/18/Fast-request-routing-using-regular-expressions.html}.
 */
abstract class AbstractCombinedRegexp
{
    public const REGEXP_DELIMITER = '/';
    public const QUOTE_REPLACER = '\\/';

    /**
     * @return string[] Regular expressions to combine.
     */
    abstract public function getPatterns(): array;

    /**
     * @return string Flags to apply to all regular expressions.
     */
    abstract public function getFlags(): string;

    /**
     * @return string The compiled pattern.
     */
    abstract public function getCompiledPattern(): string;

    /**
     * Returns `true` if the given string matches any of the patterns, `false` otherwise.
     */
    abstract public function matches(string $string): bool;

    /**
     * Returns pattern that matches the given string.
     * @throws Exception if the string does not match any of the patterns.
     */
    abstract public function getMatchingPattern(string $string): string;

    /**
     * Returns position of the pattern that matches the given string.
     * @throws Exception if the string does not match any of the patterns.
     */
    abstract public function getMatchingPatternPosition(string $string): int;

    /**
     * @throws Exception
     * @return never-return
     */
    protected function throwFailedMatchException(string $string): void
    {
        throw new Exception(
            sprintf(
                'Failed to match pattern "%s" with string "%s".',
                $this->getCompiledPattern(),
                $string,
            )
        );
    }
}
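
The helper above always throws, which lets a caller use it on the right-hand side of `??` in a method that returns `int`. The sketch below shows the same pattern in isolation. It is an illustration, not the package's code: `throwFailedMatch()` and `cachedPosition()` are hypothetical free functions, and the native `never` return type assumes PHP 8.1 or later (the file above documents the same intent with `@return never-return` on a `void` method).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical free-function version of the throwing helper.
 * Assumption: PHP 8.1+, which provides the `never` return type.
 */
function throwFailedMatch(string $compiledPattern, string $string): never
{
    throw new Exception(
        sprintf(
            'Failed to match pattern "%s" with string "%s".',
            $compiledPattern,
            $string,
        )
    );
}

/**
 * Hypothetical lookup over a result cache shaped like the one in this commit.
 *
 * @param array<string, array{matches: bool, position?: int}> $results
 */
function cachedPosition(array $results, string $compiledPattern, string $string): int
{
    // `??` falls through only when the key is missing or null,
    // so a legitimate position of 0 is returned as is.
    return $results[$string]['position'] ?? throwFailedMatch($compiledPattern, $string);
}

$results = [
    'first' => ['matches' => true, 'position' => 0],
    'nope' => ['matches' => false],
];
```

Because the helper never returns, the `int` return type of the caller still holds on every path that completes normally.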
28 changes: 14 additions & 14 deletions src/CombinedRegexp.php
@@ -5,7 +5,6 @@
namespace Yiisoft\Strings;

use Exception;

use InvalidArgumentException;

use function count;
@@ -15,11 +14,8 @@
* Read more about the concept in
* {@see https://nikic.github.io/2014/02/18/Fast-request-routing-using-regular-expressions.html}.
*/
final class CombinedRegexp
final class CombinedRegexp extends AbstractCombinedRegexp
{
private const REGEXP_DELIMITER = '/';
private const QUOTE_REPLACER = '\\/';

/**
* @var string[]
*/
@@ -32,13 +28,13 @@ final class CombinedRegexp
*/
public function __construct(
array $patterns,
string $flags = ''
private string $flags = ''
) {
if (empty($patterns)) {
throw new InvalidArgumentException('At least one pattern should be specified.');
}
$this->patterns = array_values($patterns);
$this->compiledPattern = $this->compilePatterns($this->patterns) . $flags;
$this->compiledPattern = $this->compilePatterns($this->patterns) . $this->flags;
}

/**
@@ -74,13 +70,7 @@ public function getMatchingPatternPosition(string $string): int
{
$match = preg_match($this->compiledPattern, $string, $matches);

Check failure on line 71 in src/CombinedRegexp.php

GitHub Actions / psalm / PHP 8.1-ubuntu-latest and PHP 8.2-ubuntu-latest

ArgumentTypeCoercion

src/CombinedRegexp.php:71:29: ArgumentTypeCoercion: Argument 1 of preg_match expects non-empty-string, but parent type string provided (see https://psalm.dev/193)
if ($match !== 1) {
throw new Exception(
sprintf(
'Failed to match pattern "%s" with string "%s".',
$this->getCompiledPattern(),
$string,
)
);
$this->throwFailedMatchException($string);
}

return count($matches) - 1;
@@ -110,4 +100,14 @@ private function compilePatterns(array $patterns): string

return self::REGEXP_DELIMITER . $combinedRegexps . self::REGEXP_DELIMITER;
}

public function getPatterns(): array
{
return $this->patterns;
}

public function getFlags(): string

Check warning on line 109 in src/CombinedRegexp.php (Codecov / codecov/patch): Added line #L109 was not covered by tests
{
return $this->flags;

Check warning on line 111 in src/CombinedRegexp.php (Codecov / codecov/patch): Added line #L111 was not covered by tests
}
}
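
The `REGEXP_DELIMITER` and `QUOTE_REPLACER` constants suggest that delimiter characters inside the combined expression are escaped before the expression is wrapped in delimiters and the flags are appended. The body of `compilePatterns()` is not shown in this diff, so the following is only a sketch of how such wrapping could work: `wrapWithDelimiter()` is a hypothetical helper, and the use of `strtr()` is an assumption.

```php
<?php

declare(strict_types=1);

// Values copied from the constants in AbstractCombinedRegexp.
const REGEXP_DELIMITER = '/';
const QUOTE_REPLACER = '\\/';

/**
 * Hypothetical helper: escapes the delimiter inside an already combined
 * expression, wraps the result in delimiters, and appends the flags.
 * Caveat: a delimiter that is already escaped in the input is escaped again.
 */
function wrapWithDelimiter(string $combined, string $flags = ''): string
{
    $escaped = strtr($combined, [REGEXP_DELIMITER => QUOTE_REPLACER]);

    return REGEXP_DELIMITER . $escaped . REGEXP_DELIMITER . $flags;
}

$compiled = wrapWithDelimiter('(?|api/users|api/posts())', 'i');

// The second alternative carries one empty group, so its position is 1.
$matched = preg_match($compiled, 'API/Posts', $matches);
$position = count($matches) - 1;
```

Without the escaping step, the first `/` inside a pattern would terminate the expression early and `preg_match` would treat the rest as invalid flags.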
62 changes: 24 additions & 38 deletions src/MemoizedCombinedRegexp.php
@@ -5,85 +5,71 @@
namespace Yiisoft\Strings;

/**
* `CombinedRegexp` optimizes matching of multiple regular expressions.
* Read more about the concept in
* {@see https://nikic.github.io/2014/02/18/Fast-request-routing-using-regular-expressions.html}.
* MemoizedCombinedRegexp is a decorator for {@see AbstractCombinedRegexp} that caches results of
* - {@see AbstractCombinedRegexp::matches()}
* - {@see AbstractCombinedRegexp::getMatchingPattern()}
* - {@see AbstractCombinedRegexp::getMatchingPatternPosition()}.
*/
final class MemoizedCombinedRegexp
final class MemoizedCombinedRegexp extends AbstractCombinedRegexp
{
private CombinedRegexp $combinedRegexp;

private array $results = [];

/**
* @param string[] $patterns Regular expressions to combine.
* @param string $flags Flags to apply to all regular expressions.
* @var array<string, array{matches:bool, position?:int}>
*/
private array $results = [];

public function __construct(
private array $patterns,
string $flags = ''
private AbstractCombinedRegexp $decorated,
) {
$this->combinedRegexp = new CombinedRegexp($patterns, $flags);
}

/**
* @return string The compiled pattern.
*/
public function getCompiledPattern(): string
{
return $this->combinedRegexp->getCompiledPattern();
return $this->decorated->getCompiledPattern();
}

/**
* Returns `true` whether the given string matches any of the patterns, `false` - otherwise.
*/
public function matches(string $string): bool
{
$this->evaluate($string);

return $this->results[$string]['matches'];
}

/**
* Returns pattern that matches the given string.
* @throws \Exception if the string does not match any of the patterns.
*/
public function getMatchingPattern(string $string): string
{
$this->evaluate($string);

Check warning on line 39 in src/MemoizedCombinedRegexp.php

GitHub Actions / mutation / PHP 8.1-ubuntu-latest

Escaped Mutant for Mutator "MethodCallRemoval": --- Original +++ New @@ @@ } public function getMatchingPattern(string $string) : string { - $this->evaluate($string); + return $this->getPatterns()[$this->getMatchingPatternPosition($string)]; } public function getMatchingPatternPosition(string $string) : int

return $this->patterns[$this->getMatchingPatternPosition($string)];
return $this->getPatterns()[$this->getMatchingPatternPosition($string)];
}

/**
* Returns position of the pattern that matches the given string.
* @throws \Exception if the string does not match any of the patterns.
*/
public function getMatchingPatternPosition(string $string): int
{
$this->evaluate($string);

return $this->results[$string]['position'] ?? throw new \Exception(
sprintf(
'Failed to match pattern "%s" with string "%s".',
$this->getCompiledPattern(),
$string,
)
);
return $this->results[$string]['position'] ?? $this->throwFailedMatchException($string);
}

protected function evaluate(string $string): void
private function evaluate(string $string): void
{
if (isset($this->results[$string])) {
return;
}
try {
$position = $this->combinedRegexp->getMatchingPatternPosition($string);
$position = $this->decorated->getMatchingPatternPosition($string);

$this->results[$string]['matches'] = true;
$this->results[$string]['position'] = $position;
} catch (\Exception) {
$this->results[$string]['matches'] = false;
}
}

public function getPatterns(): array
{
return $this->decorated->getPatterns();
}

public function getFlags(): string

Check warning on line 71 in src/MemoizedCombinedRegexp.php (Codecov / codecov/patch): Added line #L71 was not covered by tests
{
return $this->decorated->getFlags();

Check warning on line 73 in src/MemoizedCombinedRegexp.php (Codecov / codecov/patch): Added line #L73 was not covered by tests
}
}