SQL DB #357

Open
pseudotensor opened this issue Jun 30, 2023 · 8 comments

Comments

@pseudotensor pseudotensor changed the title SQLDB SQL DB Jul 10, 2023
@pseudotensor
Collaborator Author

https://h2oai.slack.com/archives/C050PKJ6GAX/p1689022427466269

For SQL, we can try flan-t5-xxl or Flan-UL2 and fine-tune it on the BIRD dataset. Those models are known to do surprisingly well compared to much larger models:
Even t5-3B does reasonably well: https://bird-bench.github.io/#:~:text=BIRD%20(BIg%20Bench%20for%20LaRge,total%20size%20of%2033.4%20GB.
https://declare-lab.net/instruct-eval/
https://medium.com/@bnjmn_marie/behind-the-hype-models-based-on-t5-2019-still-better-than-vicuna-alpaca-mpt-and-dolly-6c4f1139f39e

Note that the Flan models have a 2048-token input context and a 512-token output context for sequence-to-sequence generation. That should be enough for many cases, although fine-tuning with a different output context length is possible.
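As a rough illustration of what that input limit means for text-to-SQL, here is a minimal sketch that checks whether a schema plus a question fits the input budget before it is sent to the model. The prompt layout and the function names are hypothetical, and the whitespace-based token count is only a crude stand-in: a real check would use the model's own tokenizer, which usually produces more tokens than this estimate.

```python
# Budgets taken from the note above: 2048 input tokens, 512 output tokens.
MAX_INPUT_TOKENS = 2048
MAX_OUTPUT_TOKENS = 512


def rough_token_count(text: str) -> int:
    """Crude estimate: count whitespace-separated pieces.

    Assumption: this is NOT the model tokenizer, only a placeholder
    so the sketch runs without extra dependencies.
    """
    return len(text.split())


def build_prompt(schema: str, question: str) -> str:
    """Hypothetical prompt layout: schema first, then the question."""
    return f"Schema:\n{schema}\n\nQuestion: {question}\nSQL:"


def fits_input_budget(prompt: str, limit: int = MAX_INPUT_TOKENS) -> bool:
    """Return True when the estimated prompt length is within the limit."""
    return rough_token_count(prompt) <= limit


schema = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);"
prompt = build_prompt(schema, "How many users are there?")
print(fits_input_budget(prompt))
```

For large databases the schema alone can exceed the budget, in which case only the tables relevant to the question would need to be included.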

https://huggingface.co/datasets/wikisql
https://paperswithcode.com/dataset/kaggledbqa

@NidhiMehta

@pseudotensor
Collaborator Author

@pseudotensor
Collaborator Author

@rkeshwani

Uhh, I realise this issue is a dump of notes for enabling this functionality, but seeing as h2oGPT integrates with LangChain, I thought I would post this here.

https://python.langchain.com/docs/use_cases/qa_structured/sql

Are there any plans to support SQL databases?
From my understanding, LangChain integrates with SQLAlchemy, so you could support various databases and let the user supply the connection string, or the credentials and host.

@pseudotensor
Collaborator Author

There are no immediate plans to enable this, but a PR is welcome :)

There is also a PR, still WIP, for Elasticsearch that seems to function; it just needs to be exposed in the UI, etc. #656
