Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does not tolerate + in protocol #228

Open
rotu opened this issue Aug 2, 2024 · 8 comments
Open

Does not tolerate + in protocol #228

rotu opened this issue Aug 2, 2024 · 8 comments

Comments

@rotu
Copy link

rotu commented Aug 2, 2024

What is the issue with the URL Pattern Standard?

URLPattern doesn't tolerate + in protocol. The polyfill gives this error:

new URLPattern("web+foo://example.com/baz")

TypeError: Failed to construct 'URLPattern': Unexpected OTHER_MODIFIER at 3, expected END

This is especially a problem since the web+ prefix is mandatory when registering schemes.

@sisidovski
Copy link
Collaborator

The polyfill is managed in https://github.com/kenchris/urlpattern-polyfill and should be claimed there, but this is valid. To canonicalize a protocol, it uses the result of running basic URL parser. And in scheme state
, a character should be an ASCII alphanumeric, U+002B (+), U+002D (-), or U+002E (.). So it should tolerate +, like git+https.

Actually chromium throws an error with the protocol string containing +, it seems to be a bug to be fixed.

@jeremyroman
Copy link
Collaborator

I haven't looked into this, but @sisidovski if you think this is just a bug in the Chromium implementation, could you file & link a Chromium bug?

@crowlKats
Copy link

crowlKats commented Aug 6, 2024

for reference, this is also happening in Deno, so potentially more of a spec issue

@sisidovski
Copy link
Collaborator

sisidovski commented Aug 6, 2024

@jeremyroman Filed https://crbug.com/357760925. I'm happy to work on it in spare time.

@rotu
Copy link
Author

rotu commented Aug 6, 2024

The polyfill is managed in https://github.com/kenchris/urlpattern-polyfill and should be claimed there

@sisidovski I'm not sure if you're saying there is no bug in the spec here.

I have a very hard time reading state-machine-oriented specs. What are the expected "token list" and "part list" and "protocol component" from Constructor string parsing given input "web+foo://example.com/baz"?

@sisidovski
Copy link
Collaborator

@rotu Thanks. I took a look again a bit more in detail, and probably I caught your point. The step 3.11 Run consume a required token given parser and "end" in parse-a-pattern-string will throw TypeError if web+foo is passed, because this algorithm doesn't handle +, which is treated as "other-modifier" token type in the token list, and obviously this is not the "end" token type.

@rotu
Copy link
Author

rotu commented Aug 28, 2024

@sisidovski I don't think I even understood my point when I wrote that.

  • As things stands, it's not even clear what the constructor string can and should look like! The pattern string section could definitely use some examples! The URL spec has many expository examples, which make it more approachable. (It's confusing to me that this spec supports two pattern matching languages, both path-to-regexp-like patterns and regexp patterns. If I had my druthers, I'd probably ditch the new pattern syntax in favor of only regexp, but I doubt you share my appetite for that change!)

  • My naive expectation is that a URL string should also be a valid URL pattern. It a source of future confusion that this spec interprets legal URL characters non-literally in its pattern syntax. For instance, (, {, :, \\ are valid in the query string but would need to be escaped in a URLPattern constructor string. This deserves a prominent note explaining 1. what needs escaping 2. how to escape characters.

  • My naive expectation also that a URL object should not be reinterpreted when converted to a URLPattern. So new URLPattern(new URL('http://foo?json={}')) should NOT be equivalent to new URLPattern('http://foo?json=')

  • It does work to do new URLPattern("web\\+foo://*") (i.e. escape + in the "pattern string" language) or new URLPattern("(web\\+foo)://*") (escape + in a regexp), or new URLPattern("(web[+]foo)://*") (use a character class in a regexp). Per this issue, I don't think this should need escaping.

@jeremyroman
Copy link
Collaborator

I don't think we will ever be able to make all URLs valid URL patterns (or, if they're valid, have the same meaning), though we can make needing escaping a little less common. I agree that describing how to effectively escape (either by hand or programmatically) would be a useful addition (I've written such algorithms myself, and they are indeed not trivial).

I think it's probably possible to allow other-modifier tokens (+ and ?) after some fixed text to get subsumed by it, since it otherwise has no existing syntactic meaning. This would make things like web+foo viable without changing the meaning of :foo? and similar. It's not completely trivial to make this change, though, so I need to actually try to make the change for that to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants