Generic/ByteOrderMark: small performance improvement #360

rodrigoprimo · 2024-02-23T14:51:35Z

Description

The BOM character must be the first character of the file. This means that this sniff only needs to check the file once, but its code was executed for every T_INLINE_HTML token in the file.

As discussed in #278 (comment), this PR changes the sniff to return the number of tokens from all of its exit points, which ensures that PHPCS runs it only once per file.

I noticed that some sniffs return $phpcsFile->numTokens while others return ($phpcsFile->numTokens + 1) to ignore the rest of the file. As far as I can tell, returning the number of tokens is enough, because the code checks whether the returned value is greater than the current position in the token array, so I went with that option. It is not clear to me why some sniffs return the number of tokens plus one. I'm highlighting this in case I'm missing something and the code in this PR should return ($phpcsFile->numTokens + 1) instead.
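A simplified sketch of the skipping mechanism may help here. It models one rule only: a listener is not called again while the value it returned is greater than the current stack pointer. The runSniff() function and the token array shape are hypothetical. This is not the actual PHP_CodeSniffer dispatch code.

```php
<?php
// Hypothetical, simplified model of how a token loop can honour the value a
// sniff returns. This is NOT the real PHP_CodeSniffer implementation.

/**
 * Walk the tokens and call $sniff for every token of the registered type,
 * unless the sniff previously returned a pointer greater than the current one.
 * Returns how many times the sniff was invoked.
 */
function runSniff(array $tokens, string $registeredType, callable $sniff): int
{
    $numTokens = count($tokens);
    $ignoreTo  = null; // Pointer the sniff asked to skip to, if any.
    $calls     = 0;

    for ($stackPtr = 0; $stackPtr < $numTokens; $stackPtr++) {
        if ($tokens[$stackPtr]['type'] !== $registeredType) {
            continue;
        }
        // Skip this token while the returned value is > the current position.
        if ($ignoreTo !== null && $ignoreTo > $stackPtr) {
            continue;
        }
        $calls++;
        $ignoreTo = $sniff($tokens, $stackPtr, $numTokens);
    }

    return $calls;
}

// Hypothetical token stream containing three T_INLINE_HTML tokens.
$tokens = [
    ['type' => 'T_INLINE_HTML', 'content' => "\xEF\xBB\xBF<html>"],
    ['type' => 'T_OPEN_TAG', 'content' => '<?php '],
    ['type' => 'T_CLOSE_TAG', 'content' => '?>'],
    ['type' => 'T_INLINE_HTML', 'content' => '<p>'],
    ['type' => 'T_OPEN_TAG', 'content' => '<?php '],
    ['type' => 'T_CLOSE_TAG', 'content' => '?>'],
    ['type' => 'T_INLINE_HTML', 'content' => '</html>'],
];

// Old behaviour: nothing is returned, so the sniff runs for every T_INLINE_HTML token.
$before = runSniff($tokens, 'T_INLINE_HTML', function (array $tokens, int $stackPtr, int $numTokens) {
    return null;
});

// New behaviour: return the token count so the rest of the file is skipped.
$after = runSniff($tokens, 'T_INLINE_HTML', function (array $tokens, int $stackPtr, int $numTokens) {
    // A BOM check on the first token would go here.
    return $numTokens;
});

echo "before: $before, after: $after\n"; // before: 3, after: 1
```

In this model the sniff is invoked once per file instead of once per T_INLINE_HTML token, which is the saving this PR targets.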

Suggested changelog entry

Small performance improvement for the Generic.Files.ByteOrderMark sniff

Related issues/external references

Discussed in #278 (comment)

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
- This change is only breaking for integrators, not for external standards or end-users.
Documentation improvement

PR checklist

I have checked there is no other PR open for the same change.
I have read the Contribution Guidelines.
I grant the project the right to include and distribute the code under the BSD-3-Clause license (and I have the right to grant these rights).
I have added tests to cover my changes.
I have verified that the code complies with the projects coding standards.
[Required for new sniffs] I have added XML documentation for the sniff.

fredden · 2024-02-23T16:01:19Z

It is not clear to me why some sniffs return the number of tokens plus one.

See the documentation of the method here:

PHP_CodeSniffer/src/Sniffs/Sniff.php, lines 74 to 75 in 1d256d1:

     *                  pointer is reached. Return (count($tokens) + 1) to skip
     *                  the rest of the file.


jrfnl

Thanks @rodrigoprimo! LGTM.

jrfnl · 2024-02-24T08:09:05Z


@rodrigoprimo You are right, returning $phpcsFile->numTokens should be sufficient.


@fredden Good catch on the docs being outdated and not giving the best advice.

I have created PR #363 now to update the docs and to make the sniffs skipping to the end of a file consistent.

For the record, for anyone else coming across this discussion:
The token stack indexes are 0-based, while a token count is 1-based, so returning the token count is sufficient to skip the rest of the file.

Or in practical terms:
If a file contains 50 tokens, the last $stackPtr / index in the $tokens array will be 49, so there is no need to change the 50 to 51 as 50 is already > 49.
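The same arithmetic can be checked with a small standalone sketch. $tokens is a hypothetical stand-in for the token array. skipsRestOfFile() is a hypothetical helper that applies the "returned value > current stack pointer" rule to every index. Neither is part of the PHPCS API.

```php
<?php
// Hypothetical stand-in for a file containing 50 tokens (not the PHPCS API).
$tokens    = array_fill(0, 50, ['type' => 'T_STRING']);
$numTokens = count($tokens);           // 1-based count: 50
$lastPtr   = max(array_keys($tokens)); // 0-based index of the last token: 49

/**
 * Hypothetical helper: true when a listener that returned $ignoreTo would be
 * skipped for every stack pointer, i.e. $ignoreTo > $stackPtr holds for all
 * indexes 0 .. $numTokens - 1.
 */
function skipsRestOfFile(int $ignoreTo, int $numTokens): bool
{
    for ($stackPtr = 0; $stackPtr < $numTokens; $stackPtr++) {
        if ($ignoreTo <= $stackPtr) {
            return false;
        }
    }
    return true;
}

var_dump($lastPtr);                                    // int(49)
var_dump(skipsRestOfFile($numTokens, $numTokens));     // bool(true)
var_dump(skipsRestOfFile($numTokens + 1, $numTokens)); // bool(true), the "+ 1" adds nothing
var_dump(skipsRestOfFile($lastPtr, $numTokens));       // bool(false), 49 is not > 49
```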

github-actions bot added the "Standard: Generic" and "Status: triage" labels Feb 23, 2024

jrfnl added the "Focus: Performance" label and removed the "Status: triage" label Feb 24, 2024

jrfnl added this to the 3.9.1 milestone Feb 24, 2024

jrfnl mentioned this pull request Feb 24, 2024

Various sniffs: simplify skipping the rest of the file + update docs about skipping rest of file #363

Merged

jrfnl approved these changes Feb 24, 2024

jrfnl force-pushed the byte-order-mark-improvement branch from 669323b to 57d9e89 February 24, 2024 08:01

jrfnl merged commit 8e3a124 into PHPCSStandards:master Feb 24, 2024
38 checks passed

rodrigoprimo deleted the byte-order-mark-improvement branch February 26, 2024 13:44
