Does not tolerate `+` in protocol #228

rotu · 2024-08-02T20:11:06Z

What is the issue with the URL Pattern Standard?

URLPattern doesn't tolerate + in protocol. The polyfill gives this error:

new URLPattern("web+foo://example.com/baz")

TypeError: Failed to construct 'URLPattern': Unexpected OTHER_MODIFIER at 3, expected END

This is especially a problem since the web+ prefix is mandatory for custom schemes registered with navigator.registerProtocolHandler().


sisidovski · 2024-08-06T15:01:29Z

The polyfill is maintained in https://github.com/kenchris/urlpattern-polyfill, so the bug should be reported there, but the report itself is valid. To canonicalize a protocol, the spec uses the result of running the basic URL parser, and in the scheme state a character is accepted if it is an ASCII alphanumeric, U+002B (+), U+002D (-), or U+002E (.). So it should tolerate +, as in git+https.
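A minimal sketch of those scheme character rules as a JavaScript check. It assumes only what the URL Standard's scheme start state and scheme state say (first code point an ASCII alpha, the rest alphanumeric, "+", "-", or "."); `isValidScheme` is a hypothetical helper, not part of any library, and it ignores the parser's lowercasing step.

```javascript
// Sketch of the URL Standard's scheme rules: the "scheme start state" requires
// an ASCII alpha first, and the "scheme state" then accepts ASCII alphanumerics,
// "+", "-", and ".". isValidScheme is a hypothetical helper, not a standard API.
function isValidScheme(scheme) {
  // ^[A-Za-z]          first code point: ASCII alpha
  // [A-Za-z0-9+\-.]*   remaining code points: alphanumeric, "+", "-", "."
  return /^[A-Za-z][A-Za-z0-9+\-.]*$/.test(scheme);
}

console.log(isValidScheme("web+foo"));   // true
console.log(isValidScheme("git+https")); // true
console.log(isValidScheme("+foo"));      // false: must start with a letter
console.log(isValidScheme("web_foo"));   // false: "_" is not a scheme character
```

By these rules web+foo is an ordinary, well-formed scheme, which is why rejecting it in URLPattern looks like a bug rather than intended behavior.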

Chromium also throws an error when the protocol string contains +; that looks like a bug that needs fixing.

jeremyroman · 2024-08-06T15:56:18Z

I haven't looked into this, but @sisidovski if you think this is just a bug in the Chromium implementation, could you file & link a Chromium bug?

crowlKats · 2024-08-06T15:58:46Z

For reference, this also happens in Deno, so it is potentially more of a spec issue.

sisidovski · 2024-08-06T16:15:01Z

@jeremyroman Filed https://crbug.com/357760925. I'm happy to work on it in spare time.

rotu · 2024-08-06T17:29:27Z

> The polyfill is managed in https://github.com/kenchris/urlpattern-polyfill and should be claimed there

@sisidovski I'm not sure if you're saying there is no bug in the spec here.

I have a very hard time reading state-machine-oriented specs. What are the expected "token list" and "part list" and "protocol component" from Constructor string parsing given input "web+foo://example.com/baz"?

sisidovski · 2024-08-28T16:05:12Z

@rotu Thanks. I looked at this again in more detail, and I think I now see your point. Step 3.11 of parse a pattern string, "Run consume a required token given parser and "end"", throws a TypeError when web+foo is passed. The algorithm doesn't handle + in this position: the tokenizer puts it in the token list as an "other-modifier" token, which is not the "end" token type that step requires.
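To make the failure concrete, here is a simplified sketch of how the URL Pattern tokenizer classifies characters. `tokenizeSubset` is a hypothetical helper that models only the single-character token types (it refuses ":" and "(", which start name and regexp tokens), so it is an illustration, not a conforming tokenizer.

```javascript
// Simplified sketch of how URL Pattern's tokenizer classifies characters.
// Only single-character token types are modeled; ":" (name) and "(" (regexp)
// start multi-character tokens that this sketch does not implement.
function tokenizeSubset(input) {
  const tokens = [];
  for (let i = 0; i < input.length; i++) {
    const c = input[i];
    if (c === ":" || c === "(") {
      throw new Error(`name/regexp tokens are not modeled (index ${i})`);
    } else if (c === "*") {
      tokens.push({ type: "asterisk", index: i, value: c });
    } else if (c === "+" || c === "?") {
      tokens.push({ type: "other-modifier", index: i, value: c });
    } else if (c === "\\") {
      // An escaped-char token carries the character after the backslash.
      if (i + 1 >= input.length) throw new TypeError("trailing backslash");
      tokens.push({ type: "escaped-char", index: i, value: input[i + 1] });
      i++; // skip the escaped character
    } else if (c === "{") {
      tokens.push({ type: "open", index: i, value: c });
    } else if (c === "}") {
      tokens.push({ type: "close", index: i, value: c });
    } else {
      tokens.push({ type: "char", index: i, value: c });
    }
  }
  tokens.push({ type: "end", index: input.length, value: "" });
  return tokens;
}

console.log(tokenizeSubset("web+foo").map((t) => t.type).join(" "));
// char char char other-modifier char char char end
console.log(tokenizeSubset("web\\+foo").map((t) => t.type).join(" "));
// char char char escaped-char char char char end
```

The + lands at index 3, matching the polyfill's "Unexpected OTHER_MODIFIER at 3" message. The parser consumes "web" as fixed text, but it only consumes a modifier right after a name, regexp, wildcard, or closing group, so at + it falls through to requiring "end" and throws. The escaped form produces an escaped-char token, which the parser accepts as fixed text.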

rotu · 2024-08-28T18:52:45Z

@sisidovski I don't think I even understood my point when I wrote that.

As things stand, it's not even clear what the constructor string can and should look like! The pattern string section could definitely use some examples! The URL spec has many expository examples, which make it more approachable. (It's confusing to me that this spec supports two pattern-matching languages: path-to-regexp-like patterns and regexp patterns. If I had my druthers, I'd probably ditch the new pattern syntax in favor of regexp only, but I doubt you share my appetite for that change!)

My naive expectation is that a URL string should also be a valid URL pattern. It is a source of future confusion that this spec interprets legal URL characters non-literally in its pattern syntax. For instance, (, {, :, and \ are valid in the query string but would need to be escaped in a URLPattern constructor string. This deserves a prominent note explaining (1) what needs escaping and (2) how to escape characters.

My naive expectation is also that a URL object should not be reinterpreted when converted to a URLPattern. So new URLPattern(new URL('http://foo?json={}')) should NOT be equivalent to new URLPattern('http://foo?json=').

It does work to write new URLPattern("web\\+foo://*") (i.e. escape + in the "pattern string" language), new URLPattern("(web\\+foo)://*") (escape + in a regexp), or new URLPattern("(web[+]foo)://*") (use a character class in a regexp). Per this issue, I don't think this should need escaping.
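For the "how to escape" question, a minimal sketch of escaping literal text before splicing it into a pattern string. The character set follows the URL Pattern Standard's "escape a pattern string" algorithm (+ * ? : { } ( ) \); `escapePatternString` is our own helper name here, not an exported API, and the sketch escapes one component's text at a time rather than handling constructor-string delimiters.

```javascript
// Sketch: escape literal text for use inside a URLPattern pattern string by
// prefixing each syntax character with a backslash. The character set
// (+ * ? : { } ( ) \) follows the "escape a pattern string" algorithm;
// escapePatternString is a hypothetical helper, not an exported API.
function escapePatternString(input) {
  // "$&" re-inserts the matched character after the added backslash.
  return input.replace(/[+*?:{}()\\]/g, "\\$&");
}

console.log(escapePatternString("web+foo")); // web\+foo
console.log(escapePatternString("json={}")); // json=\{\}

// Where a URLPattern implementation is available (not assumed by this sketch),
// the escaped protocol can be spliced into a constructor string:
// new URLPattern(`${escapePatternString("web+foo")}://*`);
```

Escaping per component like this keeps the literal text literal; it does not make an arbitrary URL string safe to pass whole, since the constructor-string parser also uses characters such as ":" and "?" to split components.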

jeremyroman · 2024-09-27T20:59:32Z

I don't think we will ever be able to make all URLs valid URL patterns (or, if they're valid, have the same meaning), though we can make needing escaping a little less common. I agree that describing how to effectively escape (either by hand or programmatically) would be a useful addition (I've written such algorithms myself, and they are indeed not trivial).

I think it's probably possible to let other-modifier tokens (+ and ?) that follow some fixed text be subsumed into that text, since a modifier in that position otherwise has no syntactic meaning. This would make things like web+foo viable without changing the meaning of :foo? and similar. It's not completely trivial to make this change, though, so I'll need to actually attempt it to see whether it works.
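A hypothetical sketch of that idea, operating on a token list shaped like the tokenizer's output (objects with `type` and `value`). This is not the spec's or any engine's behavior, and `absorbModifiersIntoFixedText` is an invented name; it only illustrates rewriting an other-modifier that directly follows fixed text (a char or escaped-char token) into a plain char, while leaving modifiers after names, regexps, wildcards, and groups untouched.

```javascript
// Hypothetical sketch of the proposal, NOT current spec or engine behavior:
// an other-modifier token ("+" or "?") directly after fixed text (char or
// escaped-char) is rewritten as a char token, so it joins the literal text.
// Modifiers after names, regexps, wildcards, or groups (e.g. ":foo?") keep
// their meaning.
function absorbModifiersIntoFixedText(tokens) {
  const out = [];
  for (const token of tokens) {
    // Looking at the *output* list means a run like "a++" absorbs both "+".
    const prev = out[out.length - 1];
    const followsFixedText =
      prev !== undefined && (prev.type === "char" || prev.type === "escaped-char");
    if (token.type === "other-modifier" && followsFixedText) {
      out.push({ type: "char", value: token.value });
    } else {
      out.push(token);
    }
  }
  return out;
}

const tok = (type, value) => ({ type, value });
const webFoo = [
  tok("char", "w"), tok("char", "e"), tok("char", "b"),
  tok("other-modifier", "+"),
  tok("char", "f"), tok("char", "o"), tok("char", "o"),
  tok("end", ""),
];
console.log(absorbModifiersIntoFixedText(webFoo).map((t) => t.type).join(" "));
// char char char char char char char end

const namedOptional = [tok("name", "foo"), tok("other-modifier", "?"), tok("end", "")];
console.log(absorbModifiersIntoFixedText(namedOptional).map((t) => t.type).join(" "));
// name other-modifier end
```

Doing the rewrite on the token list keeps the tokenizer unchanged; whether it belongs there or in parse a pattern string is exactly the kind of detail the real change would have to settle.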
