Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic/ByteOrderMark: small performance improvement #360

Merged

Conversation

rodrigoprimo
Copy link
Contributor

Description

The BOM character must be the first character of the file. This means that this sniff only needs to check the file once, but its code was executed for each occurrence of the T_INLINE_HTML token.

As discussed in #278 (comment), this PR changes the sniff code to return the number of tokens in all of its exit points to ensure that PHPCS executes it just a single time per file.

I noticed that some sniffs return $phpcsFile->numTokens while others return ($phpcsFile->numTokens + 1) to ignore the rest of the file. As far as I could check, returning the number of tokens is enough as the code checks if the returned value is greater than the current position in the token array, so I went with this option. It is not clear to me why some sniffs return the number of tokens plus one. Highlighting this in case I'm missing something, and the code in this PR should return ($phpcsFile->numTokens + 1).

Suggested changelog entry

Small performance improvement for the Generic.Files.ByteOrderMark sniff

Related issues/external references

Discussed in #278 (comment)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
    • This change is only breaking for integrators, not for external standards or end-users.
  • Documentation improvement

PR checklist

  • I have checked there is no other PR open for the same change.
  • I have read the Contribution Guidelines.
  • I grant the project the right to include and distribute the code under the BSD-3-Clause license (and I have the right to grant these rights).
  • I have added tests to cover my changes.
  • I have verified that the code complies with the projects coding standards.
  • [Required for new sniffs] I have added XML documentation for the sniff.

@fredden
Copy link
Member

fredden commented Feb 23, 2024

It is not clear to me why some sniffs return the number of tokens plus one.

See the documentation of the method here:

* pointer is reached. Return (count($tokens) + 1) to skip
* the rest of the file.

The BOM character must be the first character of the file. This means
that this sniff only needs to check the file once, but its code
was being executed for each occurrence of the T_INLINE_HTML token.

This commit changes the sniff code to return the number of tokens in all
of its exit points to ensure that PHPCS executes it just a single time
per file.
Copy link
Member

@jrfnl jrfnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rodrigoprimo ! LGTM.

@jrfnl jrfnl force-pushed the byte-order-mark-improvement branch from 669323b to 57d9e89 Compare February 24, 2024 08:01
@jrfnl
Copy link
Member

jrfnl commented Feb 24, 2024

I noticed that some sniffs return $phpcsFile->numTokens while others return ($phpcsFile->numTokens + 1) to ignore the rest of the file. As far as I could check, returning the number of tokens is enough as the code checks if the returned value is greater than the current position in the token array, so I went with this option. It is not clear to me why some sniffs return the number of tokens plus one. Highlighting this in case I'm missing something, and the code in this PR should return ($phpcsFile->numTokens + 1).

@rodrigoprimo You are right, returning $phpcsFile->numTokens should be sufficient.

See the documentation of the method here:

* pointer is reached. Return (count($tokens) + 1) to skip
* the rest of the file.

@fredden Good catch on the docs being outdated and not giving the best of advise.

I have created PR #363 now to update the docs and to make the sniffs skipping to the end of a file consistent.

For the record for anyone else coming across this discussion:
The tokens stack indexes are 0-based and a token count is 1-based, so it is sufficient to return the token count to skip the rest of the file.

Or in practical terms:
If a file contains 50 tokens, the last $stackPtr / index in the $tokens array will be 49, so there is no need to change the 50 to 51 as 50 is already > 49.

@jrfnl jrfnl merged commit 8e3a124 into PHPCSStandards:master Feb 24, 2024
38 checks passed
@rodrigoprimo rodrigoprimo deleted the byte-order-mark-improvement branch February 26, 2024 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants