Dev minor (#1321)
* no-verify (#1314)

* no-verify

* update readme

* Kg testing (#1300)

* Feature/encapsulate orchestration (#1265)

* fully encapsulate orchestration

* fully encapsulate orchestration

* complete encapsulation

* revert import cmt

* making default r2r lighter (#1268)

* making default r2r lighter

* fix bug in ingest files

* checkin

* workingupdate

* complete simple orch

* update docs

* up (#1273)

* up

* up

* merge (#1276)

* Postgres configuration settings (#1277)

* Improvements on Auth in JS, CLI (#1267)

* CLI Telemetry (#1266)

* check in

* working

* redundant

* JS auth improvements (#1263)

* Check in JS auth improvements

* Update login with toke

* Fix to allow disabling telemetry

* fix lock

* Try to avoid merge conflicts

* Clean up collection bugs

* remove comments

* Add Postgres configuration settings

* Image

* bad github conflict

* merge (#1278)

* port KG to postgres (#1272)

* create + cluster

* local search

* up

* clean

* format

* basics

* add collection_id and paginate

* rename

* change api

* up

* kg_creation_status

* up

* up

* up

* Feature/cleanup docker (#1279)

* merge

* up

* rm neo4j refs and cleanup docker cmds

* fixup

* up

* up

* cleanup kg migration

* up

* up

* up

* Kg testing (#1280)

* up

* up

* up

* up

* rename

* Patch/cleanup kg migration (#1281)

* cleanup kg migration

* up

* Kg testing (#1280)

* up

* up

* up

* up

* slay neo4j

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>

* add back poetry lock

* Default Collections (#1282)

* Default collections

* Naughty naughty need to follow the SRP

* project name

* up

* add chunk order

* fragments => extractions

* bug squash

* up

* up

* up

* Testing (#1284)

* CICD

* actions

* poetry

* poetry

* Add env vars

* name

* increase timeout

* add user to collection

* change postgres project name

* Kg testing (#1283)

* up

* up

* cleanup kg migration

* up

* up

* up

* Kg testing (#1280)

* up

* up

* up

* up

* rename

* project name

* up

* add chunk order

* fragments => extractions

* bug squash

* up

* up

* up

* change postgres project name

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>

* Feature/fix logic bugs (#1285)

* fixing minor logic bugs in dev branch

* fixing minor logic bugs in dev branch

* merge

* up

* Application docs

* add image (#1287)

* Add version to CLI telemetry (#1288)

* add image

* Add version to cli telemetry

* up

* KG hatchet orchestration (#1286)

* up

* up

* cleanup kg migration

* up

* up

* up

* Kg testing (#1280)

* up

* up

* up

* up

* rename

* project name

* up

* add chunk order

* fragments => extractions

* bug squash

* up

* up

* up

* change postgres project name

* up

* up

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>

* Feature/update documentation rebased (#1289)

* up

* merge

* rebase

* fix ingestion issues (#1291)

* fix ingestion issues

* fix lock file

* fix embedding

* Fix SDK KG Serialization (#1292)

* add image

* serialization

* cleanup cli (#1294)

* CLI serialization (#1295)

* add image

* Fix more serialization around kg

* Nolan/schemacreation (#1296)

* add image

* Fix more serialization around kg

* add quotes to prevent reserved keywords from failing

* Prevent errors if config name is reserved name in postgres (#1297)

* Prevent reserved words (#1298)

* default collection ID

* up

* Move default collection id method to utils (#1299)

* up

* Allow json fallback (#1301)

* hotfix: import

* Fix description error (#1302)

* up

* push

* up (#1303)

* up

* up

* up

* up

* minor tweaks

* up

* mypy

* add back missing file

* up

* up

* up

* fix id

* up

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: Nolan Tremelling <[email protected]>

* add way to access entities and relationships (#1313)

* Feature/encapsulate orchestration (#1265)

* fully encapsulate orchestration

* fully encapsulate orchestration

* complete encapsulation

* revert import cmt

* making default r2r lighter (#1268)

* making default r2r lighter

* fix bug in ingest files

* checkin

* workingupdate

* complete simple orch

* update docs

* up (#1273)

* up

* up

* merge (#1276)

* Postgres configuration settings (#1277)

* Improvements on Auth in JS, CLI (#1267)

* CLI Telemetry (#1266)

* check in

* working

* redundant

* JS auth improvements (#1263)

* Check in JS auth improvements

* Update login with toke

* Fix to allow disabling telemetry

* fix lock

* Try to avoid merge conflicts

* Clean up collection bugs

* remove comments

* Add Postgres configuration settings

* Image

* bad github conflict

* merge (#1278)

* port KG to postgres (#1272)

* create + cluster

* local search

* up

* clean

* format

* basics

* add collection_id and paginate

* rename

* change api

* up

* kg_creation_status

* up

* up

* up

* Feature/cleanup docker (#1279)

* merge

* up

* rm neo4j refs and cleanup docker cmds

* fixup

* up

* up

* cleanup kg migration

* up

* up

* up

* Kg testing (#1280)

* up

* up

* up

* up

* rename

* Patch/cleanup kg migration (#1281)

* cleanup kg migration

* up

* Kg testing (#1280)

* up

* up

* up

* up

* slay neo4j

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>

* add back poetry lock

* Default Collections (#1282)

* Default collections

* Naughty naughty need to follow the SRP

* project name

* up

* add chunk order

* fragments => extractions

* bug squash

* up

* up

* up

* Testing (#1284)

* CICD

* actions

* poetry

* poetry

* Add env vars

* name

* increase timeout

* add user to collection

* change postgres project name

* Kg testing (#1283)

* up

* up

* cleanup kg migration

* up

* up

* up

* Kg testing (#1280)

* up

* up

* up

* up

* rename

* project name

* up

* add chunk order

* fragments => extractions

* bug squash

* up

* up

* up

* change postgres project name

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>

* Feature/fix logic bugs (#1285)

* fixing minor logic bugs in dev branch

* fixing minor logic bugs in dev branch

* merge

* up

* Application docs

* add image (#1287)

* Add version to CLI telemetry (#1288)

* add image

* Add version to cli telemetry

* up

* KG hatchet orchestration (#1286)

* up

* up

* cleanup kg migration

* up

* up

* up

* Kg testing (#1280)

* up

* up

* up

* up

* rename

* project name

* up

* add chunk order

* fragments => extractions

* bug squash

* up

* up

* up

* change postgres project name

* up

* up

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>

* Feature/update documentation rebased (#1289)

* up

* merge

* rebase

* fix ingestion issues (#1291)

* fix ingestion issues

* fix lock file

* fix embedding

* Fix SDK KG Serialization (#1292)

* add image

* serialization

* cleanup cli (#1294)

* CLI serialization (#1295)

* add image

* Fix more serialization around kg

* Nolan/schemacreation (#1296)

* add image

* Fix more serialization around kg

* add quotes to prevent reserved keywords from failing

* Prevent errors if config name is reserved name in postgres (#1297)

* Prevent reserved words (#1298)

* default collection ID

* up

* Move default collection id method to utils (#1299)

* up

* Allow json fallback (#1301)

* hotfix: import

* Fix description error (#1302)

* up

* push

* up (#1303)

* up

* up

* up

* up

* minor tweaks

* up

* mypy

* add back missing file

* up

* add way to access entities and relationships

* up

* up

* fix id

* up

* refine end pts

---------

Co-authored-by: Nolan Tremelling <[email protected]>
Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>

* Add collection (#1317)

* add filters (in progress)

* todo comment

* minor addition

* modify command

* Permission Fixes around Collection Management (#1316)

* Allow super users to update others to super user status

* Fix auth on collections endpoints

* Better error message

* filters (#1318)

* Cost estimate (#1319)

* up

* slightly modify

* up

* minor fix

* docs

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>
Co-authored-by: Nolan Tremelling <[email protected]>
3 people authored Oct 3, 2024
1 parent 85e851e commit 55d7d87
Showing 78 changed files with 1,655 additions and 418 deletions.
2 changes: 1 addition & 1 deletion .env.example
@@ -11,4 +11,4 @@ export POSTGRES_PASSWORD=your_password
export POSTGRES_HOST=your_host
export POSTGRES_PORT=your_port
export POSTGRES_DBNAME=your_db
-export POSTGRES_PROJECT_NAME=your_project_name
+export R2R_PROJECT_NAME=your_project_name
2 changes: 1 addition & 1 deletion .github/workflows/integration-test-workflow-debian.yml
@@ -21,7 +21,7 @@ jobs:
POSTGRES_DBNAME: ${{ secrets.POSTGRES_DBNAME }}
POSTGRES_HOST: ${{ secrets.POSTGRES_HOST }}
POSTGRES_PORT: ${{ secrets.POSTGRES_PORT }}
-POSTGRES_PROJECT_NAME: ${{ secrets.POSTGRES_PROJECT_NAME }}
+R2R_PROJECT_NAME: ${{ secrets.R2R_PROJECT_NAME }}

steps:
- uses: actions/checkout@v4
2 changes: 1 addition & 1 deletion .github/workflows/py-ci-cd.yml
@@ -51,7 +51,7 @@ jobs:
POSTGRES_PORT: ${{ secrets.POSTGRES_PORT }}
POSTGRES_USER: ${{ secrets.POSTGRES_USER }}
POSTGRES_PASSWORD: ${{ secrets.POSTGRES_PASSWORD }}
-POSTGRES_PROJECT_NAME: ${{ secrets.POSTGRES_PROJECT_NAME }}
+R2R_PROJECT_NAME: ${{ secrets.R2R_PROJECT_NAME }}

steps:
- name: Checkout code
2 changes: 1 addition & 1 deletion .github/workflows/r2r-js-sdk-integration-tests.yml
@@ -38,7 +38,7 @@ jobs:
POSTGRES_PASSWORD: ${{ secrets.POSTGRES_PASSWORD }}
POSTGRES_PORT: ${{ secrets.POSTGRES_PORT }}
POSTGRES_DBNAME: ${{ secrets.POSTGRES_DBNAME }}
-POSTGRES_PROJECT_NAME: ${{ secrets.POSTGRES_PROJECT_NAME }}
+R2R_PROJECT_NAME: ${{ secrets.R2R_PROJECT_NAME }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
r2r serve --docker
2 changes: 1 addition & 1 deletion docs/api-reference/openapi.json

Large diffs are not rendered by default.

63 changes: 57 additions & 6 deletions docs/cookbooks/graphrag.mdx
@@ -166,16 +166,44 @@ Knowledge graph creation is done in two steps:
1. `create-graph`: Extracts nodes and relationships from your input document collection.
2. `enrich-graph`: Enhances the graph structure through clustering and explaining entities (commonly referred to as `GraphRAG`).
```bash
# collection ID is optional. If you don't specify one, the default collection will be used.
r2r create-graph --collection-id=122fdf6a-e116-546b-a8f6-e4cb2e2c0a09
```
This will run a cost estimation step to give you an estimate of the cost of the graph creation process.
```bash Example Response
Time taken: 0.21 seconds
{
"results": {
"message": "These are estimated ranges, actual values may vary. To run the KG creation process, run `create-graph` with `--run` in the cli, or `run_mode=\"run\"` in the client.",
"document_count": 2,
"number_of_jobs_created": 3,
"total_chunks": 29,
"estimated_entities": "290 - 580",
"estimated_triples": "362 - 870",
"estimated_llm_calls": "348 - 638",
"estimated_total_in_out_tokens_in_millions": "0 - 1",
"estimated_total_time_in_minutes": "Depends on your API key tier. Accurate estimate coming soon. Rough estimate: 0.0 - 0.17",
"estimated_cost_in_usd": "0.0 - 0.06"
}
}
```
Then, you can run the graph creation process with:
```bash
# document-ids are optional
r2r create-graph --document-ids=9fbe403b-c11c-5aae-8ade-ef22980c3ad1
r2r create-graph --collection-id=122fdf6a-e116-546b-a8f6-e4cb2e2c0a09 --run
```
```bash Example Response
[{'message': 'Graph creation task queued successfully.', 'task_id': 'd9dae1bb-5862-4a16-abaf-5297024df390'}]
```
This step will create a knowledge graph with nodes and relationships. Below is a visualization of the graph which we produced with Neo4j:
This step will create a knowledge graph with nodes and relationships. Below is a visualization of the graph which we produced with Neo4j (deprecated as of now. We are working on a new visualization tool):
```
MATCH (a)
@@ -226,13 +254,36 @@ Now we have a graph, but this graph is not searchable yet. We need to perform th
The graph enrichment step adds node and relationship descriptions, performs hierarchical leiden clustering to create communities, and embeds the descriptions. These embeddings will be used later in the local search stage of the pipeline. If you are more interested in the algorithm, please refer to the blog post [here](https://www.sciphi.ai/blog/graphrag).
```bash
r2r enrich-graph
# collection ID is optional. If you don't specify one, the default collection will be used.
r2r enrich-graph --collection-id=122fdf6a-e116-546b-a8f6-e4cb2e2c0a09
```
Now you can see that the graph is enriched with the following information. We have added descriptions and embeddings to the nodes and relationships. Also, each node is mapped to a community.
Similar to the graph creation step, this will run a cost estimation step to give you an estimate of the cost of the graph enrichment process.
![Enriched Graph](../images/enriched.png)
```bash Example Response
Time taken: 0.22 seconds
{
"results": {
"total_entities": 269,
"total_triples": 345,
"estimated_llm_calls": "26 - 53",
"estimated_total_in_out_tokens_in_millions": "0.05 - 0.11",
"estimated_total_time_in_minutes": "Depends on your API key tier. Accurate estimate coming soon. Rough estimate: 0.01 - 0.02",
"estimated_cost_in_usd": "0.0 - 0.01"
}
}
```
Now, you can run the graph enrichment process with:
```bash
r2r enrich-graph --collection-id=122fdf6a-e116-546b-a8f6-e4cb2e2c0a09 --run
```
Now you can see that the graph is enriched with the following information. We have added descriptions and embeddings to the nodes and relationships. Also, each node is mapped to a community. Following is a visualization of the enriched graph (deprecated as of now. We are working on a new visualization tool):
![Enriched Graph](../images/enriched.png)
## Search
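Reviewer note: the cookbook changes above describe a two-phase flow, where `create-graph` and `enrich-graph` first return a cost estimate and only do the work when rerun with `--run`. A minimal sketch of how a caller might gate the second phase on the estimate is shown below. It only parses an abridged copy of the example response from the diff; `parse_range` and `budget_usd` are hypothetical names introduced for illustration and are not part of R2R. Note that not every field is a plain range (for example `estimated_total_time_in_minutes` carries free text), so this helper is only meant for the `"low - high"` style values.

```python
import json


def parse_range(value: str) -> tuple[float, float]:
    """Split a 'low - high' string such as '0.0 - 0.06' into two floats."""
    low, high = (part.strip() for part in value.split(" - "))
    return float(low), float(high)


# Abridged copy of the example estimate response shown in the cookbook diff.
response = json.loads(
    """
    {
      "results": {
        "document_count": 2,
        "total_chunks": 29,
        "estimated_entities": "290 - 580",
        "estimated_cost_in_usd": "0.0 - 0.06"
      }
    }
    """
)

results = response["results"]
low_cost, high_cost = parse_range(results["estimated_cost_in_usd"])

# Hypothetical budget, chosen only for illustration.
budget_usd = 0.10
should_run = high_cost <= budget_usd

print(f"Estimated cost: ${low_cost:.2f} to ${high_cost:.2f}")
print(f"Within budget, rerun with --run: {should_run}")
```

Comparing against the upper bound of the range is the conservative choice, since the response itself states that actual values may vary.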
2 changes: 1 addition & 1 deletion docs/documentation/configuration/postgres.mdx
@@ -46,7 +46,7 @@ export POSTGRES_PASSWORD=your_postgres_password
export POSTGRES_HOST=your_postgres_host
export POSTGRES_PORT=your_postgres_port
export POSTGRES_DBNAME=your_database_name
-export POSTGRES_PROJECT_NAME=your_project_name
+export R2R_PROJECT_NAME=your_project_name
```

## Advanced Postgres Features in R2R
2 changes: 1 addition & 1 deletion docs/documentation/deep-dive/providers/database.mdx
@@ -40,7 +40,7 @@ export POSTGRES_PASSWORD=your_postgres_password
export POSTGRES_HOST=your_postgres_host
export POSTGRES_PORT=your_postgres_port
export POSTGRES_DBNAME=your_database_name
-export POSTGRES_PROJECT_NAME=your_project_name
+export R2R_PROJECT_NAME=your_project_name
```
Environment variables take precedence over the config settings in case of conflicts. The R2R Docker includes configuration options that facilitate integration with a combined Postgres+pgvector database setup.
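Reviewer note: the precedence rule in the context line above (environment variables win over config settings) can be sketched as follows. This is an illustration of the stated rule, not R2R's actual loader: `resolve_setting` and the `project_name` config key are hypothetical, and treating an empty environment variable as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    env_var: str,
    config: Mapping[str, str],
    config_key: str,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the environment value if set, otherwise the config value."""
    environ = os.environ if environ is None else environ
    value = environ.get(env_var)
    if value:  # assumption: an empty string counts as "not set"
        return value
    return config.get(config_key)


# Hypothetical config contents, for illustration only.
config = {"project_name": "from_config"}

with_env = resolve_setting(
    "R2R_PROJECT_NAME", config, "project_name",
    environ={"R2R_PROJECT_NAME": "from_env"},
)
without_env = resolve_setting(
    "R2R_PROJECT_NAME", config, "project_name", environ={}
)

print(with_env)
print(without_env)
```

Passing `environ` explicitly keeps the function testable without mutating the real process environment.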

@@ -19,7 +19,7 @@ Vector storage is a crucial component in R2R (RAG to Riches) for efficient simil
- `POSTGRES_HOST`
- `POSTGRES_PORT`
- `POSTGRES_DBNAME`
-- `POSTGRES_PROJECT_NAME`
+- `R2R_PROJECT_NAME`

3. **Check Docker Network:**
If using Docker, ensure the R2R and Postgres containers are on the same network:
4 changes: 2 additions & 2 deletions docs/documentation/installation/full/local-system.mdx
@@ -68,10 +68,10 @@ R2R requires connections to various services. Set up the following environment v
export POSTGRES_HOST=$YOUR_POSTGRES_HOST
export POSTGRES_PORT=$YOUR_POSTGRES_PORT
export POSTGRES_DBNAME=$YOUR_POSTGRES_DBNAME
-export POSTGRES_PROJECT_NAME=$YOUR_PROJECT_NAME # see note below
+export R2R_PROJECT_NAME=$YOUR_PROJECT_NAME # see note below
```
<Note>
-The `POSTGRES_PROJECT_NAME` environment variable defines the tables within your Postgres database where the selected R2R project resides. If the specified tables do not exist then they will be created by R2R during initialization.
+The `R2R_PROJECT_NAME` environment variable defines the tables within your Postgres database where the selected R2R project resides. If the specified tables do not exist then they will be created by R2R during initialization.
</Note>
</Accordion>
<Accordion title="Unstructured" icon="puzzle">
4 changes: 2 additions & 2 deletions docs/documentation/installation/light/local-system.mdx
@@ -46,10 +46,10 @@ R2R requires connections to various services. Set up the following environment v
export POSTGRES_HOST=$YOUR_POSTGRES_HOST
export POSTGRES_PORT=$YOUR_POSTGRES_PORT
export POSTGRES_DBNAME=$YOUR_POSTGRES_DBNAME
-export POSTGRES_PROJECT_NAME=$YOUR_PROJECT_NAME # see note below
+export R2R_PROJECT_NAME=$YOUR_PROJECT_NAME # see note below
```
<Note>
-The `POSTGRES_PROJECT_NAME` environment variable defines the tables within your Postgres database where the selected R2R project resides. If the specified tables do not exist then they will be created by R2R during initialization.
+The `R2R_PROJECT_NAME` environment variable defines the tables within your Postgres database where the selected R2R project resides. If the required tables for R2R do not exist then they will be created by R2R during initialization.
</Note>
If you are unfamiliar with Postgres then <a href="https://supabase.com/docs"> Supabase's free cloud offering </a> is a good place to start.
</Accordion>
8 changes: 4 additions & 4 deletions docs/introduction/whats-new.mdx
@@ -4,12 +4,12 @@ description: 'Changelog'
icon: 'bell'
---

-## Version 0.3.10 — Sep. 6, 2024
+## Version 0.3.20 — Sep. 6, 2024

### New Features
-- Orchestration with [Hatchet](https://github.com/hatchet-dev/hatchet)
-- Default ingestion provider set to [Unstructured](https://docs.unstructured.io/welcome)
-- Improved knowledge graph construction process
+- [R2R Light](https://r2r-docs.sciphi.ai/documentation/installation/light/local-system) installation added
+- Removed Neo4j and implemented GraphRAG inside of Postgres
+- Improved efficiency and configurability of knowledge graph construction process

### Bug Fixes
- Minor bug fixes around config logic and other.
36 changes: 20 additions & 16 deletions js/sdk/__tests__/r2rClientIntegrationUser.test.ts
@@ -88,24 +88,29 @@ describe("r2rClient Integration Tests", () => {
).resolves.not.toThrow();
});

-// test("User", async () => {
-// const asdf = await client.user();
-// console.log(asdf);
+test("User", async () => {
+const asdf = await client.user();

-// await expect(client.user()).resolves.not.toThrow();
+await expect(client.user()).resolves.not.toThrow();
+});

-// });
+test("Update user profile", async () => {
+const userId = "2bf8fd84-91ec-5048-9eb8-cf2ee9d66b64";
+const email = "[email protected]";
+const name = "New Name";
+const bio = "Updated bio";
+const profilePicture = "http://example.com/new-profile-pic.jpg";

-// test("Update user profile", async () => {
-// const email = "[email protected]";
-// const name = "New Name";
-// const bio = "Updated bio";
-// const profilePicture = "http://example.com/new-profile-pic.jpg";
+await expect(
+client.updateUser(userId, email, undefined, name, bio, profilePicture),
+).resolves.not.toThrow();
+});

-// await expect(
-// client.updateUser(email, name, bio, profilePicture)
-// ).resolves.not.toThrow();
-// });
+test("Login", async () => {
+await expect(
+client.login("[email protected]", "password"),
+).resolves.not.toThrow();
+});

test("Ingest file", async () => {
const files = [
@@ -189,8 +194,7 @@ describe("r2rClient Integration Tests", () => {

test("Login after logout", async () => {
await expect(
-client.login("[email protected]", "password"),
-// client.login("[email protected]", "password"),
+client.login("[email protected]", "password"),
).resolves.not.toThrow();
});

2 changes: 1 addition & 1 deletion js/sdk/package-lock.json

Some generated files are not rendered by default.

29 changes: 21 additions & 8 deletions js/sdk/src/r2rClient.ts
@@ -335,20 +335,33 @@ export class r2rClient {
*/
@feature("updateUser")
async updateUser(
+userId: string,
email?: string,
+isSuperuser?: boolean,
name?: string,
bio?: string,
profilePicture?: string,
): Promise<any> {
this._ensureAuthenticated();
-return await this._makeRequest("PUT", "user", {
-data: {
-email,
-name,
-bio,
-profile_picture: profilePicture,
-},
-});
+
+let data: Record<string, any> = { user_id: userId };
+if (email !== undefined) {
+data.email = email;
+}
+if (isSuperuser !== undefined) {
+data.is_superuser = isSuperuser;
+}
+if (name !== undefined) {
+data.name = name;
+}
+if (bio !== undefined) {
+data.bio = bio;
+}
+if (profilePicture !== undefined) {
+data.profile_picture = profilePicture;
+}
+
+return await this._makeRequest("PUT", "user", { data });
}

/**
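Reviewer note: the `updateUser` change above switches from always sending every field to sending `user_id` plus only the fields the caller actually supplied. The same pattern is sketched in Python below for readers following the Python side of the repository. `build_update_payload` is a hypothetical helper written for this note, not a function in the R2R SDK, and it uses `None` where the TypeScript code checks for `undefined`.

```python
from typing import Any, Optional


def build_update_payload(
    user_id: str,
    email: Optional[str] = None,
    is_superuser: Optional[bool] = None,
    name: Optional[str] = None,
    bio: Optional[str] = None,
    profile_picture: Optional[str] = None,
) -> dict[str, Any]:
    """Build a request body that contains only the fields that were provided."""
    optional_fields = {
        "email": email,
        "is_superuser": is_superuser,
        "name": name,
        "bio": bio,
        "profile_picture": profile_picture,
    }
    payload: dict[str, Any] = {"user_id": user_id}
    # Compare against None rather than truthiness so that False and ""
    # are still sent when the caller passes them explicitly.
    payload.update({k: v for k, v in optional_fields.items() if v is not None})
    return payload


payload = build_update_payload(
    "2bf8fd84-91ec-5048-9eb8-cf2ee9d66b64",
    name="New Name",
    is_superuser=False,
)
print(payload)
```

Omitting unset fields matters for an update endpoint: sending an explicit empty value could overwrite stored data, whereas leaving the key out leaves the stored value untouched, assuming the server treats missing keys that way.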
2 changes: 1 addition & 1 deletion py/README.md
@@ -57,7 +57,7 @@ Alternatively, you may run R2R directly from the python package, but additional
```bash
# export OPENAI_API_KEY=sk-...
# export POSTGRES...
-pip install 'r2r[core]'
+pip install 'r2r[core,ingestion-bundle]'
r2r --config-name=default serve
```

7 changes: 5 additions & 2 deletions py/cli/commands/ingestion.py
@@ -123,10 +123,13 @@ def update_files(ctx, file_paths, document_ids, metadatas):


@cli.command()
+@click.option("--v2", is_flag=True, help="use aristotle_v2.txt (a smaller file)")
@pass_context
-def ingest_sample_file(ctx):
+def ingest_sample_file(ctx, v2=False):
"""Ingest the first sample file into R2R."""
-sample_file_url = "https://raw.githubusercontent.com/SciPhi-AI/R2R/main/py/core/examples/data/aristotle.txt"
+sample_file_url = (
+f"https://raw.githubusercontent.com/SciPhi-AI/R2R/main/py/core/examples/data/aristotle{'_v2' if v2 else ''}.txt"
+)
client = ctx.obj

with timer():
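Reviewer note: the `--v2` flag above only changes which sample file URL is fetched. The URL selection can be isolated into a small pure function, sketched below, which makes the behaviour easy to check without invoking the CLI. `BASE_URL` and `sample_file_url` are names introduced for this sketch; the URL itself is copied from the diff.

```python
# Directory holding the sample files, taken from the URL in the diff.
BASE_URL = (
    "https://raw.githubusercontent.com/SciPhi-AI/R2R/main/py/core/examples/data"
)


def sample_file_url(v2: bool = False) -> str:
    """Return the aristotle sample URL, using the smaller _v2 file when requested."""
    suffix = "_v2" if v2 else ""
    return f"{BASE_URL}/aristotle{suffix}.txt"


default_url = sample_file_url()
small_url = sample_file_url(v2=True)
print(default_url)
print(small_url)
```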