PackageURL not properly re-encoding strings when rendering to string #154

Open
jkugler opened this issue Apr 25, 2024 · 3 comments

jkugler commented Apr 25, 2024

When a URL-encoded name is passed to PackageURL.from_string, the name is decoded, which is correct because it yields the actual name. However, when the object is rendered back to a string, the name is not re-encoded, which produces an incorrect PURL.

>>> import packageurl
>>> from urllib.parse import quote_plus
>>> quote_plus("parent/child")
'parent%2Fchild'
>>> p = packageurl.PackageURL.from_string(f"pkg:my_type/my_namepace/{quote_plus('parent/child')}/@1234")
>>> p
PackageURL(type='my_type', namespace='my_namepace', name='parent/child', version='1234', qualifiers={}, subpath=None)

That is correct, as the name is parent/child. However:

>>> str(p)
'pkg:my_type/my_namepace/parent/child@1234'

Which is an invalid/incorrect PURL.

The fix looks easy. The line at https://github.com/package-url/packageurl-python/blob/main/src/packageurl/__init__.py#L458, instead of being

        purl.append(name)

looks like it should be

        purl.append(urllib.parse.quote_plus(name))
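
One detail worth checking before adopting the suggested change: `quote_plus` and `quote` differ in how they treat spaces, and `quote` leaves `/` unescaped unless `safe=""` is passed. The sketch below uses only the standard library to compare them. `encode_segment` is a hypothetical helper name for illustration, not part of packageurl-python, and which encoding the purl spec actually requires is not settled in this thread.

```python
from urllib.parse import quote, quote_plus


def encode_segment(value: str) -> str:
    """Hypothetical helper: percent-encode one purl segment, including '/'."""
    # safe="" overrides quote()'s default of leaving "/" unescaped.
    return quote(value, safe="")


print(encode_segment("parent/child"))  # parent%2Fchild
print(quote_plus("parent/child"))      # parent%2Fchild
print(encode_segment("my package"))    # my%20package
print(quote_plus("my package"))        # my+package
```

Both calls protect the `/`, but they disagree on spaces, so the choice matters for names that contain whitespace.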
@jkugler jkugler changed the title PackageURL not properly re-encode strings when rendering to string PackageURL not properly re-encoding strings when rendering to string Apr 25, 2024

jkugler commented Apr 26, 2024

I've been thinking about this some more, and I don't know if it's strictly a bug, or if it's spec compliant, but it does "break" in the round trip:

>>> p = packageurl.PackageURL.from_string(f"pkg:my_type/my_namepace/{quote_plus('parent/child')}/@1234")
>>> p
PackageURL(type='my_type', namespace='my_namepace', name='parent/child', version='1234', qualifiers={}, subpath=None)
>>> str(p)
'pkg:my_type/my_namepace/parent/child@1234'
>>> p = packageurl.PackageURL.from_string(str(p))
>>> p
PackageURL(type='my_type', namespace='my_namepace/parent', name='child', version='1234', qualifiers={}, subpath=None)

Note that both namespace and name change after the round trip. If PackageURL had re-applied the URL encoding in __str__, the name would have stayed parent/child.
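
The round-trip failure can be reproduced without the library. The following is a minimal sketch with a deliberately simplified renderer and parser; `render` and `parse` are hypothetical names, they only handle the `pkg:<type>/<namespace>/<name>@<version>` shape, and they are not how packageurl-python is implemented.

```python
from urllib.parse import quote, unquote


def render(type_, namespace, name, version):
    """Hypothetical renderer: encode every segment before joining."""
    # Namespace segments are encoded one by one so their "/" separators survive.
    ns = "/".join(quote(seg, safe="") for seg in namespace.split("/"))
    return f"pkg:{type_}/{ns}/{quote(name, safe='')}@{quote(version, safe='')}"


def parse(purl):
    """Hypothetical parser for pkg:<type>/<namespace...>/<name>@<version>."""
    remainder = purl.split(":", 1)[1]
    path, _, version = remainder.rpartition("@")
    type_, *ns_segments, name = path.split("/")
    # Decode only after splitting, so an encoded "/" stays inside its segment.
    namespace = "/".join(unquote(seg) for seg in ns_segments)
    return type_, namespace, unquote(name), unquote(version)


encoded = render("my_type", "my_namepace", "parent/child", "1234")
print(encoded)         # pkg:my_type/my_namepace/parent%2Fchild@1234
print(parse(encoded))  # ('my_type', 'my_namepace', 'parent/child', '1234')

# Without re-encoding, the "/" in the name is read as a separator.
print(parse("pkg:my_type/my_namepace/parent/child@1234"))
# ('my_type', 'my_namepace/parent', 'child', '1234')
```

The last call shows the same namespace/name shift as the transcript above, which suggests the missing encoding step on output is enough to explain it.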

@matt-phylum

Related to PR #123


jkugler commented Apr 30, 2024

So, another related issue. Is this a bug? Or is this expected behavior?

>>> p = PackageURL.from_string('pkg:maven/com.google.guava%3Aguava@25.1-jre')
>>> p
PackageURL(type='maven', namespace=None, name='com.google.guava:guava', version='25.1-jre', qualifiers={}, subpath=None)
>>> str(p)
'pkg:maven/com.google.guava:guava@25.1-jre'
>>> PackageURL.from_string(str(p))
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    PackageURL.from_string(str(p))
  File "/opt/homebrew/lib/python3.11/site-packages/packageurl/__init__.py", line 512, in from_string
    raise ValueError(msg)
ValueError: Invalid purl 'pkg:maven/com.google.guava:guava@25.1-jre' cannot contain a "user:pass@host:port" URL Authority component: ''.

What is the proper behavior here?
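
For reference, a standard-library sketch of what re-encoding the colon in this name would look like. It only illustrates the encoding step, assuming `quote(..., safe="")` is an acceptable encoder; it does not answer whether the library or the purl spec requires `:` to be encoded inside a name.

```python
from urllib.parse import quote, unquote

name = "com.google.guava:guava"

# With safe="", ":" is percent-encoded along with any other reserved character.
encoded = quote(name, safe="")
print(encoded)  # com.google.guava%3Aguava

# Decoding restores the original name, so this step is reversible.
assert unquote(encoded) == name
```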
