Fix for Appending Paths without Checking for Query Strings in URLs #193

Open
XYZliang opened this issue Sep 15, 2023 · 1 comment

XYZliang commented Sep 15, 2023

While integrating with linkedin_scraper, I've come across a potential issue where paths are directly appended to URLs without checking for the presence of query strings. This leads to malformed URLs if the original URL contains a query string.

Current Behavior:
When appending a path to a URL that already has a query string, the result is a malformed URL.
For example, appending details/experience to https://www.linkedin.com/in/douglas-b-b23472b/?trk=people-guest_people_search-card leaves the new path segment after the query string (following ?trk=people-guest_people_search-card) instead of producing the desired https://www.linkedin.com/in/douglas-b-b23472b/details/experience?trk=people-guest_people_search-card

Suggested Fix:
Before appending the path, the package should check whether the URL contains a query string. If it does, the path should be inserted before the query string, with the query string kept after the new path. Python's urllib.parse module (urlparse or urlsplit) can split the URL into its components and reassemble it.
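
A minimal sketch of the suggested fix, using only the standard library. The helper name append_path is hypothetical and is not part of linkedin_scraper; it only illustrates splitting the URL, extending the path component, and reassembling the URL so the query string stays at the end.

```python
from urllib.parse import urlsplit, urlunsplit


def append_path(url: str, suffix: str) -> str:
    """Append `suffix` to the path of `url`, keeping query and fragment intact.

    Hypothetical helper for illustration; not an existing linkedin_scraper API.
    """
    parts = urlsplit(url)
    # Join the existing path and the suffix with exactly one slash.
    new_path = parts.path.rstrip("/") + "/" + suffix.lstrip("/")
    # SplitResult is a namedtuple, so _replace returns a copy with the new path.
    return urlunsplit(parts._replace(path=new_path))


profile_url = (
    "https://www.linkedin.com/in/douglas-b-b23472b/"
    "?trk=people-guest_people_search-card"
)
print(append_path(profile_url, "details/experience"))
```

With the URL from the example above, this should print the desired form, with details/experience before ?trk=people-guest_people_search-card. URLs without a query string pass through the same code path, so no separate check is needed.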

Impact:
This change will ensure that the URLs constructed by linkedin_scraper are always correctly formatted and valid, reducing potential issues for downstream users and systems.

I believe this fix would greatly enhance the robustness of URL handling in the package. Please let me know if more information or context is needed, and I'd be happy to help further!


mhoualla commented Nov 1, 2023

@alicemy478 and I are interested in investigating this issue. After reviewing the latest commits, it appears that the problem is still present. We could work on a solution that checks for the presence of a query string in the URL before appending the path.
