Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Diagnose invalid characters in bundle identifiers #1012

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

agisilaos
Copy link

@agisilaos agisilaos commented Aug 21, 2024

Fixes: #290

Changes

This PR introduces bundle identifier validation in the NavigatorIndex.Builder class and adds corresponding unit tests to ensure its correctness.

Implementation

  • Added a private validateBundleIdentifier() method to NavigatorIndex.Builder.
  • The method checks for invalid characters in the bundle identifier based on RFC 3986 URL host requirements.
  • Invalid bundle identifiers trigger a warning diagnostic with a detailed explanation.

Tests

  • Created DiagnosticBundleIdentifierTests to thoroughly test the new validation logic.
  • Tests cover valid bundle identifiers, identifiers with spaces, and identifiers with both valid and invalid special characters.

Rationale

  1. Improved Data Integrity: Validating bundle identifiers ensures that only valid identifiers are used, preventing potential issues downstream in the documentation generation process.

  2. Compliance with Standards: By adhering to RFC 3986 for URL hosts, we ensure that our bundle identifiers are compatible with web-based systems and avoid potential conflicts or encoding issues.

  3. Early Error Detection: Catching invalid bundle identifiers early in the process allows developers to address issues promptly, improving the overall quality of documentation projects.

  4. Clear Feedback: The detailed error messages provide developers with actionable information to correct invalid bundle identifiers.

  5. Robust Testing: The comprehensive test suite ensures that the validation logic works as expected and helps prevent regressions in future updates.

Impact

  • Developers will receive immediate feedback if they use invalid characters in bundle identifiers.
  • This change may catch existing invalid bundle identifiers in projects, requiring updates to conform to the new validation rules.
  • The documentation generation process will be more robust against potential issues caused by malformed bundle identifiers.

Notes for Reviewers

  • Please pay special attention to the character set used for validation. Are there any additional characters we should consider allowing or restricting?

  • The error message content and severity (warning) are open for discussion. Should we consider making this a more severe error?

  • The test coverage aims to be comprehensive. Are there any additional edge cases you think we should test?

  • Added tests

  • Ran the ./bin/test script and it succeeded

  • Updated documentation if necessary

- Implemented `validateBundleIdentifier()` to validate that the bundle identifier only contains characters allowed in URL hosts as per RFC 3986.
- The method checks for invalid characters such as spaces and any characters not included in the alphanumeric, hyphen, period, underscore, and tilde sets.
- If invalid characters are detected, a diagnostic warning is generated and appended to the `problems` array with details about the issue.
- This enhancement helps prevent issues related to incorrect bundle identifiers by enforcing proper formatting standards.
- Implement tests for valid and invalid bundle identifiers
- Cover scenarios with spaces and special characters
- Use GIVEN-WHEN-THEN structure for improved readability
- Ensure consistent setup() call across all tests
Copy link
Contributor

@d-ronnqvist d-ronnqvist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for opening this PR. However, the NavigatorIndex is not the right place to validate the bundle identifier.

It would be better to validate the bundle identifier earlier, before the build starts DocC begins processing the documentation inputs.

There are only two places where the developer can specify a bundle identifier:

  • A value in the documentation catalog's Into.plist
  • A --fallback-bundle-identifier value on the command line

It would be a nicer experience for the developer if the diagnostic let the developer know which of these two that the invalid identifier came from.

This code in DocumentationBundle+Info.swift could be one place to consider adding this validation since it knows if the value came from the Info.plist or from the command line fallback.

When I tested running docc convert /path/to/SomeCatalog.docc --fallback-bundle-identifier "not valid" just now, it crashed failing to create a ResolvedTopicReference value. Because of this, a "warning" level diagnostic wouldn't be suitable, since that lets the build progress until it crashes.

That said, since we can diagnose the issue and know which characters are not allowed, it's possible that we could correct the issue by removing the disallowed characters in the developer-provided bundle identifier (or by replacing them with an allowed character like "-"). Doing so would avoid the crash which would allow for a "warning" level diagnostic again.


It's also worth noting that the docc process-archive index command has its own bundle identifier command line argument that possibly would require its own validation.

/// - Important: This method assumes that `bundleIdentifier` is a non-optional string.
private func validateBundleIdentifier() {
// URL host valid characters (RFC 3986)
let validCharacters = CharacterSet.alphanumerics.union(CharacterSet(charactersIn: "-._~"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to use CharacterSet.urlHostAllowed for this.

Comment on lines +17 to +28
// GIVEN
let outputURL = try createTemporaryDirectory().appendingPathComponent("valid-bundle")
let builder = NavigatorIndex.Builder(
outputURL: outputURL,
bundleIdentifier: "com.example.valid-bundle_identifier"
)

// WHEN
builder.setup()

// THEN
XCTAssertTrue(builder.problems.isEmpty, "No problems should be reported for a valid bundle identifier")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "given", "when", "then" comments is not a style that we use anywhere else.

These specific tests won't be relevant—since the navigator index isn't the right place to verify the bundle identifier—but for any other tests, please omit those specific comments to match the style of the existing tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The navigator index is just one of the places that make use of the bundle identifier but it's not a source of it. As such, it is not the right place to perform this validation.

@d-ronnqvist
Copy link
Contributor

@agisilaos Are you still working on this?
I'm touching some other code in this area and am wondering if I should fix this with those changes or leave it as-is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Diagnose invalid characters in bundle identifiers
2 participants