Remove use of `requests`, switch to `httpx`, support Async http requests #235

ashleysommer · 2024-07-25T23:43:22Z

HTTPX is a much more modern HTTP library than requests, it is higher performance, supports async/await, supports HTTP/2, and supports steaming large responses (in sync or async mode).

The downside is the httpx library is larger in size and has more dependencies to pull in than requests does.

Related to #157

I think switching from requests to httpx would open the door to a bunch of features in SparqlWrapper, including better Session support, streaming requests and streaming responses, and multiple parallel queries.

Opinions? @edmondchuc @nicholascar

The text was updated successfully, but these errors were encountered:

vemonet · 2024-09-10T15:25:54Z

@ashleysommer SPARQLWrapper currently does not use requests unfortunately, it uses urllib from python standard lib: https://github.com/RDFLib/sparqlwrapper/blob/master/SPARQLWrapper/Wrapper.py#L838

In our case it is causing issues: it mysteriously fails for some endpoints that are behind proxies, while bare requests is working, cf. #234

So yes that would be really nice to upgrade SPARQLWrapper to use a reliable library for HTTP requests. Right now we don't use it at all anymore because urllib is not reliable (most python developers don't use urllib, probably there is a reason, people don't install external libraries for fun if the standard lib is decent :) )

edmondchuc · 2024-09-11T01:49:44Z

I think it's a good move to switch to httpx. I've been using it for a while now and it's worked well in both sync and async contexts. Being able to stream responses is a big plus too.

I'm actually using somewhat of a spiritual successor to sparqlwrapper based on httpx with query results modelled using pydantic. There are more things I want to add there such as streaming support but it's been catering to my needs in various projects. If anyone's interested, I can put the code somewhere and use it as a base.

ashleysommer · 2024-09-11T03:26:17Z

@vemonet

SPARQLWrapper currently does not use requests unfortunately, it uses urllib from python

You're right. I don't know why I thought it uses requests. I think I mis-read the details in this other Issue: #157 and I assumed sparqlwrapper already uses requests.

@edmondchuc

spiritual successor to sparqlwrapper based on httpx with query results modelled using pydantic

This sounds great. Does it support all of the 11 different SPARQL backend kinds that sparqlwrapper does?

edmondchuc · 2024-09-11T03:58:16Z

It should work with any backend that accepts URLs and header values based on the SPARQL HTTP Protocol since the user is just sending a regular httpx request. Any of the request parameters can be adjusted as needed.

response = client.post(url, headers=headers, ...)

The nice part is it will convert to a pydantic query results object if the response is JSON. If the result is text/turtle from a CONSTRUCT query, it'll return as text and the user can do what they want with it.

I had a quick look at the sparqlwrapper docs, the different backends it supports is really just detailing what header values to set in combination with the kind of query, isn't it?

Maybe an enhancement could look like this, where there are methods that map to the kind of query (SELECT, CONSTRUCT, UPDATE) that's sent to the server.

# This will set the Accept header to application/n-triples by default
response = client.construct(...)

We may be able to determine the kind of query by parsing it using this sparql parser and auto-set the correct headers. This will be slower (have to parse the tree etc) but could be a good idea to at least validate that the query is correct before sending it off.

vemonet · 2024-09-11T12:01:12Z

We may be able to determine the kind of query by parsing it using this sparql parser

Why not just using the RDFLib SPARQL parser? RDFLib is already a dependency of SPARQLWrapper, and it is probably better maintained than this repo that has not been published to pip yet

from rdflib.plugins.sparql import prepareQuery

sq = prepareQuery(sparql_query)
print(sq.algebra.name)
# SelectQuery or ConstructQuery or DescribeQuery or AskQuery...

with query results modelled using pydantic

I like pydantic, but it is introducing a new dependency that might conflict when SPARQLWrapper gets imported in projects that are already using pydantic (and all the issues brought by v1 vs v2 pydantic...)

Is it really needed here? Could we achieve the same with a dataclass with almost the same amount of code? Deserializing the SPARQL result JSON is not a lot of work

Maybe an enhancement could look like this, where there are methods that map to the kind of query (SELECT, CONSTRUCT, UPDATE) that's sent to the server.
# This will set the Accept header to application/n-triples by default
response = client.construct(...)

That could be nice to have the 2:

A generic .query() that returns SelectResults | str | bool, in this case the dev will not get fully automated help from type annotation, but they can still add the type annotation themselves, or handle result depending on the type of the object they get (if isinstance()...)
Specific .construct()/.select() methods which return the right type directly. .query() could just parse the query and run one of .construct()/.select() depending on the query type

edmondchuc · 2024-09-11T13:21:20Z

Good points @vemonet and I completely agree with not adding pydantic as a dependency and to just use dataclasses.

Yes, no reason not to use RDFLib's sparql parser. I was just outlining what I had in mind with the standalone sparql client project where I had the idea of adding a lightweight parser to check for errors etc before sending the query off.

Keen to get this other sparql client added within sparqlwrapper as I am using it quite a bit now. Should it go in a separate package under something like next? I can add it as a PR when I have some time and we can all review it and build on top of it.

vemonet · 2024-09-12T07:20:41Z

I would love to see these improvements in sparqlwrapper! But I just checked the last commits and PRs, and there have not been any PR merged in more than 2 years, with not a lot of activity.

Do you think such a big change will be actually merged at some point? Is there interest for this kind of substantive improvements from the maintainers? @nicholascar

Or would it be better to just fork the project or create a new package?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove use of `requests`, switch to `httpx`, support Async http requests #235

Remove use of `requests`, switch to `httpx`, support Async http requests #235

ashleysommer commented Jul 25, 2024

vemonet commented Sep 10, 2024 •

edited

Loading

edmondchuc commented Sep 11, 2024

ashleysommer commented Sep 11, 2024 •

edited

Loading

edmondchuc commented Sep 11, 2024

vemonet commented Sep 11, 2024

edmondchuc commented Sep 11, 2024

vemonet commented Sep 12, 2024

Remove use of requests, switch to httpx, support Async http requests #235

Remove use of requests, switch to httpx, support Async http requests #235

Comments

ashleysommer commented Jul 25, 2024

vemonet commented Sep 10, 2024 • edited Loading

edmondchuc commented Sep 11, 2024

ashleysommer commented Sep 11, 2024 • edited Loading

edmondchuc commented Sep 11, 2024

vemonet commented Sep 11, 2024

edmondchuc commented Sep 11, 2024

vemonet commented Sep 12, 2024

Remove use of `requests`, switch to `httpx`, support Async http requests #235

Remove use of `requests`, switch to `httpx`, support Async http requests #235

vemonet commented Sep 10, 2024 •

edited

Loading

ashleysommer commented Sep 11, 2024 •

edited

Loading