Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add sqlite-lines extension #519

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

feat: Add sqlite-lines extension #519

wants to merge 3 commits into from

Conversation

rfhb
Copy link
Contributor

@rfhb rfhb commented Aug 20, 2024

This is a proposal to add the lines_read() function as per source code by @asg017 at https://github.com/asg017/sqlite-lines/ with MIT licence. The documentation is updated, and simple tests are included. Local devtools::test() completed without errors.

As per @asg017's README, "sqlite-lines is great for line-oriented datasets, like ndjson or JSON Lines, when paired with SQLite's JSON support." Thus, I see interesting and important use cases for this extension, providing large performance improvements over loading data using e.g. DBI::dbAppendTable(). Also, other DMSs have similar SQL functions for reading data from file, including duckdb.

I acknowledge this PR pulls in source files from outside the SQLite repository, thus enlarging the risk surface of RSQLite; please let me know about any policy and source control requirements (I am not a C/C++ developer and cannot offer source code quality review).

Many thanks for considering!

PS. I could not get the derived fileio_scan() function working as available at https://github.com/nalgeon/sqlean/blob/main/docs/fileio.md

@krlmlr
Copy link
Member

krlmlr commented Aug 20, 2024

Thanks, Ralf!

I see in the checks at https://github.com/r-dbi/RSQLite/actions/runs/10479285293/job/29024577294?pr=519#step:14:503 :

Error: Error: Failed to load extension: /home/runner/work/RSQLite/RSQLite/check/RSQLite.Rcheck/RSQLite/libs/RSQLite.so: undefined symbol: sqlite3_lines_init

What do you mean by "could not get fileio to work"? That other repo seems to have quite a few more GitHub stars -- not the best metric, but still a signal.

@asg017
Copy link

asg017 commented Aug 21, 2024

Thanks for putting this together @rfhb!

One question I have: Would it be possible to add a RSQLite::loadExtension(db, "./custom_extension") function to RSQLite, that allows you to load in any SQLite extension?

The reason I ask: I have published 12+ SQLite extensions that can be used in a variety of environments. They all have "bindings" to different programming languages and library. For example, you can npm install sqlite-lines, pip install sqlite-lines, gem install sqlite-lines, and cargo add sqlite-lines for Python/JavaScript/Ruby/Rust. I have packaged all my extensions on those package managers, and all of those languages have SQLite libraries that support loadExtension() methods.

But RSQLite is the exception. I do like how you have an allow-list of SQLite extensions with ::initExtension(), but not all SQLite extensions can reasonably be compiled and included into RSQLite. Some are written in C++/Rust/Go, and others would be disruptive inside RSQLite environments. But if we could load in any SQLite extension, then there would be a path to include vector search/geospatial/regular expression extensions with RSQLite.

I know this is a big ask, and out-of-scope for this PR. But in case you all are interested, I can make a separate issue and add info there. If so, I'd be open to publishing my extensions on CRAN!

…ent in upgrade.R (was inadvertently disabled)
@rfhb
Copy link
Contributor Author

rfhb commented Aug 21, 2024

Error: Error: Failed to load extension: /home/runner/work/RSQLite/RSQLite/check/RSQLite.Rcheck/RSQLite/libs/RSQLite.so: undefined symbol: sqlite3_lines_init

Thanks, inadvertently I had left disabled parts in upgrade.R that manage the repo, will update PR. Not sure if I can activate the source GH workflows in my fork to check my commits.

@krlmlr
Copy link
Member

krlmlr commented Aug 21, 2024

Alex, I like the idea of packaging the extra extensions in separate R packages. I remember tweaking the SQLite code to essentially disable dynamic loading. I also think we want to use R to load the shared library provided by the separate R package, and then pass a function pointer into the main RSQLite package for registering the extension. Would you like to draft an implementation? Happy to review!

Ralf, there shouldn't be a need to activate GHA on your fork, you can always push to this PR. On the other hand, I don't see a reason why activating GHA on the fork would fail.

@rfhb
Copy link
Contributor Author

rfhb commented Aug 21, 2024

One question I have: Would it be possible to add a RSQLite::loadExtension(db, "./custom_extension") function to RSQLite, that allows you to load in any SQLite extension?

Thank you @asg017 Alex! I would favour a generic solution that enables users to decide on extensions while minimising the efforts for the maintainer.

Best for @krlmlr Kirill to express a view on what would be best in line with plans and principles for feature development for RSQLite.

I know this is a big ask, and out-of-scope for this PR. But in case you all are interested, I can make a separate issue and add info there. If so, I'd be open to publishing my extensions on CRAN!

From my perspective this would be great to advance and perhaps broaden the discussion of options, thank you.

@rfhb
Copy link
Contributor Author

rfhb commented Aug 21, 2024

I realise that compilation fails under MS Windows because R uses mingw gcc, which does not support GNU getdelim() functions. No need to act on this PR, and I can close if you want, considering your earlier comment on dynamic loading - which in R worked for me with e.g.

con <- DBI::dbConnect(drv = RSQLite::SQLite())
DBI::dbExecute(con, "SELECT load_extension('~/some.dylib')")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants