From 55d7d873b0172f91814fc3579d64ce93922aa512 Mon Sep 17 00:00:00 2001 From: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com> Date: Wed, 2 Oct 2024 18:12:00 -0700 Subject: [PATCH] Dev minor (#1321) * no-verify (#1314) * no-verify * update readme * Kg testing (#1300) * Feature/encapsulate orchestration (#1265) * fully encapsulate orchestration * fully encapsulate orchestration * complete encapsulation * revert import cmt * making default r2r lighter (#1268) * making default r2r lighter * fix bug in ingest files * checkin * workingupdate * complete simple orch * update docs * up (#1273) * up * up * merge (#1276) * Postgres configuration settings (#1277) * Improvements on Auth in JS, CLI (#1267) * CLI Telemetry (#1266) * check in * working * redundant * JS auth improvements (#1263) * Check in JS auth improvements * Update login with toke * Fix to allow disabling telemetry * fix lock * Try to avoid merge conflicts * Clean up collection bugs * remove comments * Add Postgres configuration settings * Image * bad github conflict * merge (#1278) * port KG to postgres (#1272) * create + cluster * local search * up * clean * format * basics * add collection_id and paginate * rename * change api * up * kg_creation_status * up * up * up * Feature/cleanup docker (#1279) * merge * up * rm neo4j refs and cleanup docker cmds * fixup * up * up * cleanup kg migration * up * up * up * Kg testing (#1280) * up * up * up * up * rename * Patch/cleanup kg migration (#1281) * cleanup kg migration * up * Kg testing (#1280) * up * up * up * up * slay neo4j --------- Co-authored-by: Shreyas Pimpalgaonkar * add back poetry lock * Default Collections (#1282) * Default collections * Naughty naughty need to follow the SRP * project name * up * add chunk order * fragments => extractions * bug squash * up * up * up * Testing (#1284) * CICD * actions * poetry * poetry * Add env vars * name * increase timeout * add user to collection * change postgres project name * Kg testing (#1283) 
* up * up * cleanup kg migration * up * up * up * Kg testing (#1280) * up * up * up * up * rename * project name * up * add chunk order * fragments => extractions * bug squash * up * up * up * change postgres project name --------- Co-authored-by: emrgnt-cmplxty * Feature/fix logic bugs (#1285) * fixing minor logic bugs in dev branch * fixing minor logic bugs in dev branch * merge * up * Application docs * add image (#1287) * Add version to CLI telemetry (#1288) * add image * Add version to cli telemetry * up * KG hatchet orchestration (#1286) * up * up * cleanup kg migration * up * up * up * Kg testing (#1280) * up * up * up * up * rename * project name * up * add chunk order * fragments => extractions * bug squash * up * up * up * change postgres project name * up * up --------- Co-authored-by: emrgnt-cmplxty * Feature/update documentation rebased (#1289) * up * merge * rebase * fix ingestion issues (#1291) * fix ingestion issues * fix lock file * fix embedding * Fix SDK KG Serialization (#1292) * add image * serialization * cleanup cli (#1294) * CLI serialization (#1295) * add image * Fix more serialization around kg * Nolan/schemacreation (#1296) * add image * Fix more serialization around kg * add quotes to prevent reserved keywords from failing * Prevent errors if config name is reserved name in postgres (#1297) * Prevent reserved words (#1298) * default collection ID * up * Move default collection id method to utils (#1299) * up * Allow json fallback (#1301) * hotfix: import * Fix description error (#1302) * up * push * up (#1303) * up * up * up * up * minor tweaks * up * mypy * add back missing file * up * up * up * fix id * up --------- Co-authored-by: emrgnt-cmplxty <68796651+emrgnt-cmplxty@users.noreply.github.com> Co-authored-by: emrgnt-cmplxty Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com> * add way to access entities and relationships (#1313) * Feature/encapsulate orchestration (#1265) * fully encapsulate orchestration 
* fully encapsulate orchestration * complete encapsulation * revert import cmt * making default r2r lighter (#1268) * making default r2r lighter * fix bug in ingest files * checkin * workingupdate * complete simple orch * update docs * up (#1273) * up * up * merge (#1276) * Postgres configuration settings (#1277) * Improvements on Auth in JS, CLI (#1267) * CLI Telemetry (#1266) * check in * working * redundant * JS auth improvements (#1263) * Check in JS auth improvements * Update login with toke * Fix to allow disabling telemetry * fix lock * Try to avoid merge conflicts * Clean up collection bugs * remove comments * Add Postgres configuration settings * Image * bad github conflict * merge (#1278) * port KG to postgres (#1272) * create + cluster * local search * up * clean * format * basics * add collection_id and paginate * rename * change api * up * kg_creation_status * up * up * up * Feature/cleanup docker (#1279) * merge * up * rm neo4j refs and cleanup docker cmds * fixup * up * up * cleanup kg migration * up * up * up * Kg testing (#1280) * up * up * up * up * rename * Patch/cleanup kg migration (#1281) * cleanup kg migration * up * Kg testing (#1280) * up * up * up * up * slay neo4j --------- Co-authored-by: Shreyas Pimpalgaonkar * add back poetry lock * Default Collections (#1282) * Default collections * Naughty naughty need to follow the SRP * project name * up * add chunk order * fragments => extractions * bug squash * up * up * up * Testing (#1284) * CICD * actions * poetry * poetry * Add env vars * name * increase timeout * add user to collection * change postgres project name * Kg testing (#1283) * up * up * cleanup kg migration * up * up * up * Kg testing (#1280) * up * up * up * up * rename * project name * up * add chunk order * fragments => extractions * bug squash * up * up * up * change postgres project name --------- Co-authored-by: emrgnt-cmplxty * Feature/fix logic bugs (#1285) * fixing minor logic bugs in dev branch * fixing minor logic bugs 
in dev branch * merge * up * Application docs * add image (#1287) * Add version to CLI telemetry (#1288) * add image * Add version to cli telemetry * up * KG hatchet orchestration (#1286) * up * up * cleanup kg migration * up * up * up * Kg testing (#1280) * up * up * up * up * rename * project name * up * add chunk order * fragments => extractions * bug squash * up * up * up * change postgres project name * up * up --------- Co-authored-by: emrgnt-cmplxty * Feature/update documentation rebased (#1289) * up * merge * rebase * fix ingestion issues (#1291) * fix ingestion issues * fix lock file * fix embedding * Fix SDK KG Serialization (#1292) * add image * serialization * cleanup cli (#1294) * CLI serialization (#1295) * add image * Fix more serialization around kg * Nolan/schemacreation (#1296) * add image * Fix more serialization around kg * add quotes to prevent reserved keywords from failing * Prevent errors if config name is reserved name in postgres (#1297) * Prevent reserved words (#1298) * default collection ID * up * Move default collection id method to utils (#1299) * up * Allow json fallback (#1301) * hotfix: import * Fix description error (#1302) * up * push * up (#1303) * up * up * up * up * minor tweaks * up * mypy * add back missing file * up * add way to access entities and relationships * up * up * fix id * up * refine end pts --------- Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com> Co-authored-by: Shreyas Pimpalgaonkar * Add collection (#1317) * add filters (in progress) * todo comment * minor addition * modify command * Permission Fixes around Collection Management (#1316) * Allow super users to update others to super user status * Fix auth on collections endpoints * Better error message * filters (#1318) * Cost estimate (#1319) * up * slightly modify * up * minor fix * docs --------- Co-authored-by: Shreyas Pimpalgaonkar Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com> --- 
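Note on the environment variable rename: the hunks below replace `POSTGRES_PROJECT_NAME` with `R2R_PROJECT_NAME` in `.env.example` and in the three CI workflow files. Deployments that still export the old name may need a transitional lookup. The following is a minimal sketch under stated assumptions: the helper name `resolve_project_name`, its `default` parameter, and the fallback order are hypothetical and not part of this patch, and the hunks shown here do not reveal how R2R itself reads the variable.

```python
import os
from typing import Mapping, Optional


def resolve_project_name(
    env: Optional[Mapping[str, str]] = None,
    default: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper: prefer R2R_PROJECT_NAME, fall back to the legacy name."""
    # Use the real process environment unless a mapping is passed in (handy for tests).
    env = os.environ if env is None else env

    # The name introduced by this patch takes priority.
    name = env.get("R2R_PROJECT_NAME")
    if name:
        return name

    # The name removed by this patch, consulted only as a transitional fallback.
    legacy = env.get("POSTGRES_PROJECT_NAME")
    if legacy:
        return legacy

    # Neither variable is set (or both are empty): return the caller's default.
    return default
```

Passing the environment as a mapping keeps the lookup testable without mutating `os.environ`. Once every deployment exports `R2R_PROJECT_NAME`, the legacy branch can be dropped.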
.env.example | 2 +- .../integration-test-workflow-debian.yml | 2 +- .github/workflows/py-ci-cd.yml | 2 +- .../r2r-js-sdk-integration-tests.yml | 2 +- docs/api-reference/openapi.json | 2 +- docs/cookbooks/graphrag.mdx | 63 ++- docs/documentation/configuration/postgres.mdx | 2 +- .../deep-dive/providers/database.mdx | 2 +- .../troubleshooting/vector_store_issues.mdx | 2 +- .../installation/full/local-system.mdx | 4 +- .../installation/light/local-system.mdx | 4 +- docs/introduction/whats-new.mdx | 8 +- .../r2rClientIntegrationUser.test.ts | 36 +- js/sdk/package-lock.json | 2 +- js/sdk/src/r2rClient.ts | 29 +- py/README.md | 2 +- py/cli/commands/ingestion.py | 7 +- py/cli/commands/kg.py | 148 ++++- py/cli/commands/server.py | 4 +- py/cli/main.py | 2 +- py/cli/utils/docker_utils.py | 2 +- py/cli/utils/telemetry.py | 2 +- py/compose.full.yaml | 15 +- py/compose.yaml | 16 +- py/core/__init__.py | 3 +- py/core/base/__init__.py | 12 +- py/core/base/abstractions/__init__.py | 15 +- py/core/base/logging/run_logger.py | 6 +- py/core/base/providers/auth.py | 4 +- py/core/base/providers/database.py | 1 - py/core/base/providers/kg.py | 52 +- py/core/base/utils/__init__.py | 17 +- .../scripts/upload_hf_textbooks_ex.py | 4 +- py/core/main/api/auth_router.py | 25 +- .../api/data/ingestion_router_openapi.yml | 4 +- py/core/main/api/ingestion_router.py | 10 +- py/core/main/api/kg_router.py | 109 +++- py/core/main/api/management_router.py | 158 ++++-- py/core/main/api/retrieval_router.py | 6 + .../hatchet/ingestion_workflow.py | 16 +- .../main/orchestration/hatchet/kg_workflow.py | 10 +- .../simple/ingestion_workflow.py | 7 +- py/core/main/services/auth_service.py | 23 +- py/core/main/services/kg_service.py | 108 +++- py/core/main/services/retrieval_service.py | 6 +- py/core/pipes/ingestion/parsing_pipe.py | 4 +- py/core/pipes/kg/community_summary.py | 51 +- py/core/pipes/kg/entity_description.py | 40 +- py/core/pipes/kg/triples_extraction.py | 10 +- 
py/core/pipes/retrieval/kg_search_pipe.py | 9 +- py/core/providers/database/collection.py | 13 +- py/core/providers/database/document.py | 8 +- py/core/providers/database/postgres.py | 7 +- py/core/providers/database/user.py | 4 +- py/core/providers/database/vecs/collection.py | 4 +- py/core/providers/ingestion/r2r/base.py | 5 +- .../providers/ingestion/unstructured/base.py | 5 +- py/core/providers/kg/postgres.py | 529 +++++++++++++++--- py/core/providers/prompts/r2r_prompts.py | 6 +- py/core/utils/__init__.py | 16 +- py/poetry.lock | 2 +- py/sdk/auth.py | 9 +- py/sdk/kg.py | 83 ++- py/sdk/models.py | 1 + py/shared/abstractions/__init__.py | 18 +- py/shared/abstractions/document.py | 4 +- py/shared/abstractions/graph.py | 23 +- py/shared/abstractions/kg.py | 111 ++++ py/shared/abstractions/search.py | 19 +- py/shared/abstractions/user.py | 4 +- py/shared/utils/__init__.py | 20 +- py/shared/utils/base_utils.py | 84 ++- py/tests/conftest.py | 1 - .../database/relational/test_document_db.py | 10 +- services/unstructured/main.py | 2 +- .../web-app/public/data/r2r-installation.html | 6 +- .../python-backend/main.py | 3 - .../web-app/public/data/r2r-installation.html | 6 +- 78 files changed, 1655 insertions(+), 418 deletions(-) diff --git a/.env.example b/.env.example index 949030fe9..0069b6460 100644 --- a/.env.example +++ b/.env.example @@ -11,4 +11,4 @@ export POSTGRES_PASSWORD=your_password export POSTGRES_HOST=your_host export POSTGRES_PORT=your_port export POSTGRES_DBNAME=your_db -export POSTGRES_PROJECT_NAME=your_project_name +export R2R_PROJECT_NAME=your_project_name diff --git a/.github/workflows/integration-test-workflow-debian.yml b/.github/workflows/integration-test-workflow-debian.yml index 4b37be5cc..f027ee3d6 100644 --- a/.github/workflows/integration-test-workflow-debian.yml +++ b/.github/workflows/integration-test-workflow-debian.yml @@ -21,7 +21,7 @@ jobs: POSTGRES_DBNAME: ${{ secrets.POSTGRES_DBNAME }} POSTGRES_HOST: ${{ secrets.POSTGRES_HOST }} 
POSTGRES_PORT: ${{ secrets.POSTGRES_PORT }} - POSTGRES_PROJECT_NAME: ${{ secrets.POSTGRES_PROJECT_NAME }} + R2R_PROJECT_NAME: ${{ secrets.R2R_PROJECT_NAME }} steps: - uses: actions/checkout@v4 diff --git a/.github/workflows/py-ci-cd.yml b/.github/workflows/py-ci-cd.yml index 5ec11d497..d88023f31 100644 --- a/.github/workflows/py-ci-cd.yml +++ b/.github/workflows/py-ci-cd.yml @@ -51,7 +51,7 @@ jobs: POSTGRES_PORT: ${{ secrets.POSTGRES_PORT }} POSTGRES_USER: ${{ secrets.POSTGRES_USER }} POSTGRES_PASSWORD: ${{ secrets.POSTGRES_PASSWORD }} - POSTGRES_PROJECT_NAME: ${{ secrets.POSTGRES_PROJECT_NAME }} + R2R_PROJECT_NAME: ${{ secrets.R2R_PROJECT_NAME }} steps: - name: Checkout code diff --git a/.github/workflows/r2r-js-sdk-integration-tests.yml b/.github/workflows/r2r-js-sdk-integration-tests.yml index 6ce5241fd..2d5f82b44 100644 --- a/.github/workflows/r2r-js-sdk-integration-tests.yml +++ b/.github/workflows/r2r-js-sdk-integration-tests.yml @@ -38,7 +38,7 @@ jobs: POSTGRES_PASSWORD: ${{ secrets.POSTGRES_PASSWORD }} POSTGRES_PORT: ${{ secrets.POSTGRES_PORT }} POSTGRES_DBNAME: ${{ secrets.POSTGRES_DBNAME }} - POSTGRES_PROJECT_NAME: ${{ secrets.POSTGRES_PROJECT_NAME }} + R2R_PROJECT_NAME: ${{ secrets.R2R_PROJECT_NAME }} OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} run: | r2r serve --docker diff --git a/docs/api-reference/openapi.json b/docs/api-reference/openapi.json index 99adc7967..a44a15e9d 100644 --- a/docs/api-reference/openapi.json +++ b/docs/api-reference/openapi.json @@ -1 +1 @@ -{"openapi":"3.1.0","info":{"title":"R2R Application API","version":"1.0.0"},"paths":{"/v2/ingest_files":{"post":{"summary":"Ingest Files App","description":"Ingest files into the system.\n\nThis endpoint supports multipart/form-data requests, enabling you to ingest files and their associated metadatas into R2R.\n\nA valid user authentication token is required to access this endpoint, as regular users can only ingest files for their own access. 
More expansive collection permissioning is under development.","operationId":"ingest_files_app_v2_ingest_files_post","security":[{"HTTPBearer":[]}],"parameters":[{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"requestBody":{"required":true,"content":{"multipart/form-data":{"schema":{"$ref":"#/components/schemas/Body_ingest_files_app_v2_ingest_files_post"}}}},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"x-codeSamples":[{"lang":"Python","source":"from r2r import R2RClient\n\nclient = R2RClient(\"http://localhost:7272\")\n# when using auth, do client.login(...)\n\nresult = client.ingest_files(\n file_paths=[\"pg_essay_1.html\", \"got.txt\"],\n metadatas=[{\"metadata_1\":\"some random metadata\"}, {\"metadata_2\": \"some other random metadata\"}],\n document_ids=None,\n versions=None\n)\n"},{"lang":"Shell","source":"curl -X POST \"https://api.example.com/ingest_files\" \\\n -H \"Content-Type: multipart/form-data\" \\\n -H \"Authorization: Bearer YOUR_API_KEY\" \\\n -F \"file=@pg_essay_1.html;type=text/html\" \\\n -F \"file=@got.txt;type=text/plain\" \\\n -F 'metadatas=[{},{}]' \\\n -F 'document_ids=null' \\\n -F 'versions=null'\n"}]}},"/v2/update_files":{"post":{"summary":"Update Files App","description":"Update existing files in the system.\n\nThis endpoint supports multipart/form-data requests, enabling you to update files and their associated metadatas into R2R.\n\nA valid user authentication token is required to access this endpoint, as regular users can only update their own files. 
More expansive collection permissioning is under development.","operationId":"update_files_app_v2_update_files_post","security":[{"HTTPBearer":[]}],"parameters":[{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"requestBody":{"required":true,"content":{"multipart/form-data":{"schema":{"$ref":"#/components/schemas/Body_update_files_app_v2_update_files_post"}}}},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"x-codeSamples":[{"lang":"Python","source":"from r2r import R2RClient\n\nclient = R2RClient(\"http://localhost:7272\")\n# when using auth, do client.login(...)\n\nresult = client.update_files(\n file_paths=[\"pg_essay_1_v2.txt\"],\n document_ids=[\"b4ac4dd6-5f27-596e-a55b-7cf242ca30aa\"]\n)\n"},{"lang":"Shell","source":"curl -X POST \"https://api.example.com/update_files\" \\\n -H \"Content-Type: multipart/form-data\" \\\n -H \"Authorization: Bearer YOUR_API_KEY\" \\\n -F \"file=@pg_essay_1_v2.txt;type=text/plain\" \\\n -F 'document_ids=[\"b4ac4dd6-5f27-596e-a55b-7cf242ca30aa\"]'\n"}]}},"/v2/health":{"get":{"summary":"Health Check","operationId":"health_check_v2_health_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}}}}},"/v2/server_stats":{"get":{"summary":"Server Stats","operationId":"server_stats_v2_server_stats_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/update_prompt":{"post":{"summary":"Update Prompt 
App","operationId":"update_prompt_app_v2_update_prompt_post","security":[{"HTTPBearer":[]}],"parameters":[{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_update_prompt_app_v2_update_prompt_post"}}}},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/add_prompt":{"post":{"summary":"Add Prompt App","operationId":"add_prompt_app_v2_add_prompt_post","security":[{"HTTPBearer":[]}],"parameters":[{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_add_prompt_app_v2_add_prompt_post"}}}},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/get_prompt/{prompt_name}":{"get":{"summary":"Get Prompt App","operationId":"get_prompt_app_v2_get_prompt__prompt_name__get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"prompt_name","in":"path","required":true,"schema":{"type":"string","description":"Prompt name","title":"Prompt Name"},"description":"Prompt name"},{"name":"inputs","in":"query","required":false,"schema":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"type":"object"}},{"type":"null"}],"description":"JSON-encoded prompt inputs","title":"Inputs"},"description":"JSON-encoded prompt inputs"},{"name":"prompt_override","in":"query","required":false,"schema":{"anyOf":[{"type":"string"},{"type":"null"}],"description":"Prompt override","title":"Prompt 
Override"},"description":"Prompt override"},{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/get_all_prompts":{"get":{"summary":"Get All Prompts App","operationId":"get_all_prompts_app_v2_get_all_prompts_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/delete_prompt/{prompt_name}":{"delete":{"summary":"Delete Prompt App","operationId":"delete_prompt_app_v2_delete_prompt__prompt_name__delete","security":[{"HTTPBearer":[]}],"parameters":[{"name":"prompt_name","in":"path","required":true,"schema":{"type":"string","description":"Prompt name","title":"Prompt Name"},"description":"Prompt name"},{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/analytics":{"get":{"summary":"Get Analytics App","operationId":"get_analytics_app_v2_analytics_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"filter_criteria","in":"query","required":false,"schema":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"type":"object"}},{"type":"null"}],"default":{},"title":"Filter 
Criteria"}},{"name":"analysis_types","in":"query","required":false,"schema":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"type":"object"}},{"type":"null"}],"default":{},"title":"Analysis Types"}},{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/logs":{"get":{"summary":"Logs App","operationId":"logs_app_v2_logs_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"run_type_filter","in":"query","required":false,"schema":{"anyOf":[{"type":"string"},{"type":"null"}],"default":"","title":"Run Type Filter"}},{"name":"offset","in":"query","required":false,"schema":{"type":"integer","minimum":0,"default":0,"title":"Offset"}},{"name":"limit","in":"query","required":false,"schema":{"type":"integer","maximum":1000,"minimum":1,"default":100,"title":"Limit"}},{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/app_settings":{"get":{"summary":"App Settings","operationId":"app_settings_v2_app_settings_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/score_completion":{"post":{"summary":"Score 
Completion","operationId":"score_completion_v2_score_completion_post","security":[{"HTTPBearer":[]}],"parameters":[{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_score_completion_v2_score_completion_post"}}}},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/users_overview":{"get":{"summary":"Users Overview App","operationId":"users_overview_app_v2_users_overview_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"user_ids","in":"query","required":false,"schema":{"anyOf":[{"type":"array","items":{"type":"string"}},{"type":"null"}],"default":[],"title":"User Ids"}},{"name":"offset","in":"query","required":false,"schema":{"type":"integer","minimum":0,"default":0,"title":"Offset"}},{"name":"limit","in":"query","required":false,"schema":{"type":"integer","maximum":1000,"minimum":1,"default":100,"title":"Limit"}},{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/delete":{"delete":{"summary":"Delete App","operationId":"delete_app_v2_delete_delete","security":[{"HTTPBearer":[]}],"parameters":[{"name":"filters","in":"query","required":true,"schema":{"type":"string","description":"JSON-encoded filters","title":"Filters"},"description":"JSON-encoded filters"},{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"204":{"description":"Successful Response"},"422":{"description":"Validation 
Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/download_file/{document_id}":{"get":{"summary":"Download File App","description":"Download a file by its document ID as a stream.","operationId":"download_file_app_v2_download_file__document_id__get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"document_id","in":"path","required":true,"schema":{"type":"string","description":"Document ID","title":"Document Id"},"description":"Document ID"}],"responses":{"200":{"description":"Successful Response"},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/documents_overview":{"get":{"summary":"Documents Overview App","operationId":"documents_overview_app_v2_documents_overview_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"document_ids","in":"query","required":false,"schema":{"type":"array","items":{"type":"string"},"default":[],"title":"Document Ids"}},{"name":"offset","in":"query","required":false,"schema":{"type":"integer","minimum":0,"default":0,"title":"Offset"}},{"name":"limit","in":"query","required":false,"schema":{"type":"integer","maximum":1000,"minimum":1,"default":100,"title":"Limit"}},{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/document_chunks/{document_id}":{"get":{"summary":"Document Chunks App","operationId":"document_chunks_app_v2_document_chunks__document_id__get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"document_id","in":"path","required":true,"schema":{"type":"string","title":"Document 
Id"}},{"name":"offset","in":"query","required":false,"schema":{"anyOf":[{"type":"integer","minimum":0},{"type":"null"}],"default":0,"title":"Offset"}},{"name":"limit","in":"query","required":false,"schema":{"anyOf":[{"type":"integer","minimum":0},{"type":"null"}],"default":100,"title":"Limit"}},{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/inspect_knowledge_graph":{"get":{"summary":"Inspect Knowledge Graph","operationId":"inspect_knowledge_graph_v2_inspect_knowledge_graph_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"offset","in":"query","required":false,"schema":{"type":"integer","default":0,"title":"Offset"}},{"name":"limit","in":"query","required":false,"schema":{"type":"integer","default":100,"title":"Limit"}},{"name":"print_descriptions","in":"query","required":false,"schema":{"type":"boolean","default":false,"title":"Print Descriptions"}},{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/collections_overview":{"get":{"summary":"Collections Overview App","operationId":"collections_overview_app_v2_collections_overview_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"collection_ids","in":"query","required":false,"schema":{"anyOf":[{"type":"array","items":{"type":"string"}},{"type":"null"}],"title":"Collection 
Ids"}},{"name":"offset","in":"query","required":false,"schema":{"anyOf":[{"type":"integer","minimum":0},{"type":"null"}],"default":0,"title":"Offset"}},{"name":"limit","in":"query","required":false,"schema":{"anyOf":[{"type":"integer","maximum":1000,"minimum":1},{"type":"null"}],"default":100,"title":"Limit"}},{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/create_collection":{"post":{"summary":"Create Collection App","operationId":"create_collection_app_v2_create_collection_post","security":[{"HTTPBearer":[]}],"parameters":[{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_create_collection_app_v2_create_collection_post"}}}},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/get_collection/{collection_id}":{"get":{"summary":"Get Collection App","operationId":"get_collection_app_v2_get_collection__collection_id__get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"collection_id","in":"path","required":true,"schema":{"type":"string","description":"Collection ID","title":"Collection Id"},"description":"Collection ID"},{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation 
Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/update_collection":{"put":{"summary":"Update Collection App","operationId":"update_collection_app_v2_update_collection_put","security":[{"HTTPBearer":[]}],"parameters":[{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_update_collection_app_v2_update_collection_put"}}}},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/delete_collection/{collection_id}":{"delete":{"summary":"Delete Collection App","operationId":"delete_collection_app_v2_delete_collection__collection_id__delete","security":[{"HTTPBearer":[]}],"parameters":[{"name":"collection_id","in":"path","required":true,"schema":{"type":"string","description":"Collection ID","title":"Collection Id"},"description":"Collection ID"}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/list_collections":{"get":{"summary":"List Collections App","operationId":"list_collections_app_v2_list_collections_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"offset","in":"query","required":false,"schema":{"type":"integer","minimum":0,"default":0,"title":"Offset"}},{"name":"limit","in":"query","required":false,"schema":{"type":"integer","maximum":1000,"minimum":1,"default":100,"title":"Limit"}},{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"200":{"description":"Successful 
Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/add_user_to_collection":{"post":{"summary":"Add User To Collection App","operationId":"add_user_to_collection_app_v2_add_user_to_collection_post","security":[{"HTTPBearer":[]}],"parameters":[{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_add_user_to_collection_app_v2_add_user_to_collection_post"}}}},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/remove_user_from_collection":{"post":{"summary":"Remove User From Collection App","operationId":"remove_user_from_collection_app_v2_remove_user_from_collection_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_remove_user_from_collection_app_v2_remove_user_from_collection_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/get_users_in_collection/{collection_id}":{"get":{"summary":"Get Users In Collection App","operationId":"get_users_in_collection_app_v2_get_users_in_collection__collection_id__get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"collection_id","in":"path","required":true,"schema":{"type":"string","description":"Collection ID","title":"Collection Id"},"description":"Collection 
ID"},{"name":"offset","in":"query","required":false,"schema":{"type":"integer","minimum":0,"description":"Pagination offset","default":0,"title":"Offset"},"description":"Pagination offset"},{"name":"limit","in":"query","required":false,"schema":{"type":"integer","maximum":1000,"minimum":1,"description":"Pagination limit","default":100,"title":"Limit"},"description":"Pagination limit"},{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/user_collections/{user_id}":{"get":{"summary":"Get Collections For User App","operationId":"get_collections_for_user_app_v2_user_collections__user_id__get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"user_id","in":"path","required":true,"schema":{"type":"string","description":"User ID","title":"User Id"},"description":"User ID"},{"name":"offset","in":"query","required":false,"schema":{"type":"integer","minimum":0,"description":"Pagination offset","default":0,"title":"Offset"},"description":"Pagination offset"},{"name":"limit","in":"query","required":false,"schema":{"type":"integer","maximum":1000,"minimum":1,"description":"Pagination limit","default":100,"title":"Limit"},"description":"Pagination limit"},{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/assign_document_to_collection":{"post":{"summary":"Assign Document To Collection 
App","operationId":"assign_document_to_collection_app_v2_assign_document_to_collection_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_assign_document_to_collection_app_v2_assign_document_to_collection_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/remove_document_from_collection":{"post":{"summary":"Remove Document From Collection App","operationId":"remove_document_from_collection_app_v2_remove_document_from_collection_post","security":[{"HTTPBearer":[]}],"parameters":[{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_remove_document_from_collection_app_v2_remove_document_from_collection_post"}}}},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/document_collections/{document_id}":{"get":{"summary":"Document Collections App","operationId":"document_collections_app_v2_document_collections__document_id__get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"document_id","in":"path","required":true,"schema":{"type":"string","description":"Document ID","title":"Document Id"},"description":"Document 
ID"},{"name":"offset","in":"query","required":false,"schema":{"type":"integer","minimum":0,"default":0,"title":"Offset"}},{"name":"limit","in":"query","required":false,"schema":{"type":"integer","maximum":1000,"minimum":1,"default":100,"title":"Limit"}},{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/collection/{collection_id}/documents":{"get":{"summary":"Documents In Collection App","operationId":"documents_in_collection_app_v2_collection__collection_id__documents_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"collection_id","in":"path","required":true,"schema":{"type":"string","description":"Collection ID","title":"Collection Id"},"description":"Collection ID"},{"name":"offset","in":"query","required":false,"schema":{"type":"integer","minimum":0,"default":0,"title":"Offset"}},{"name":"limit","in":"query","required":false,"schema":{"type":"integer","maximum":1000,"minimum":1,"default":100,"title":"Limit"}},{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/search":{"post":{"summary":"Search App","description":"Perform a search query on the vector database and knowledge graph.\n\nThis endpoint allows for complex filtering of search results using PostgreSQL-based queries.\nFilters can be applied to various fields such as document_id, and internal metadata values.\n\n\nAllowed operators include `eq`, `neq`, `gt`, `gte`, `lt`, `lte`, `like`, `ilike`, `in`, and 
`nin`.","operationId":"search_app_v2_search_post","security":[{"HTTPBearer":[]}],"parameters":[{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_search_app_v2_search_post"}}}},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"x-codeSamples":[{"lang":"Python","source":"from r2r import R2RClient\n\nclient = R2RClient(\"http://localhost:7272\")\n# when using auth, do client.login(...)\n\nresult = client.search(\n query=\"Who is Aristotle?\",\n vector_search_settings={\n \"use_vector_search\": True,\n \"filters\": {\"document_id\": {\"eq\": \"3e157b3a-8469-51db-90d9-52e7d896b49b\"}},\n \"search_limit\": 20,\n \"use_hybrid_search\": True\n },\n kg_search_settings={\n \"use_kg_search\": True, # graph needs to be constructed first\n \"kg_search_type\": \"local\",\n \"kg_search_level\": \"0\",\n \"generation_config\": {\n \"model\": \"gpt-4o-mini\",\n \"temperature\": 0.7,\n },\n \"local_search_limits\": {\n \"__Entity__\": 20,\n \"__Relationship__\": 20,\n \"__Community__\": 20,\n },\n \"max_community_description_length\": 65536,\n \"max_llm_queries_for_global_search\": 250\n }\n)\n"},{"lang":"Shell","source":"# Note: the graph needs to be constructed before enabling use_kg_search\ncurl -X POST \"https://api.example.com/search\" \\\n -H \"Content-Type: application/json\" \\\n -H \"Authorization: Bearer YOUR_API_KEY\" \\\n -d '{\n \"query\": \"Who is Aristotle?\",\n \"vector_search_settings\": {\n \"use_vector_search\": true,\n \"filters\": {\"document_id\": {\"eq\": \"3e157b3a-8469-51db-90d9-52e7d896b49b\"}},\n \"search_limit\": 20,\n \"use_hybrid_search\": true\n },\n \"kg_search_settings\": {\n \"use_kg_search\": true,\n \"kg_search_type\": \"local\",\n
\"kg_search_level\": \"0\",\n \"generation_config\": {\n \"model\": \"gpt-4o-mini\",\n \"temperature\": 0.7\n },\n \"local_search_limits\": {\n \"__Entity__\": 20,\n \"__Relationship__\": 20,\n \"__Community__\": 20\n },\n \"max_community_description_length\": 65536,\n \"max_llm_queries_for_global_search\": 250\n }\n }'\n"}]}},"/v2/rag":{"post":{"summary":"Rag App","description":"Execute a RAG (Retrieval-Augmented Generation) query.\n\nThis endpoint combines search results with language model generation.\nIt supports the same filtering capabilities as the search endpoint,\nallowing for precise control over the retrieved context.\n\nThe generation process can be customized using the rag_generation_config parameter.","operationId":"rag_app_v2_rag_post","security":[{"HTTPBearer":[]}],"parameters":[{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_rag_app_v2_rag_post"}}}},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"x-codeSamples":[{"lang":"Python","source":"from r2r import R2RClient\n\nclient = R2RClient(\"http://localhost:7272\")\n# when using auth, do client.login(...)\n\nresult = client.rag(\n query=\"Who is Aristotle?\",\n vector_search_settings={\n \"use_vector_search\": True,\n \"filters\": {\"document_id\": {\"eq\": \"3e157b3a-8469-51db-90d9-52e7d896b49b\"}},\n \"search_limit\": 20,\n \"use_hybrid_search\": True\n },\n kg_search_settings={\n \"use_kg_search\": True,\n \"kg_search_type\": \"local\",\n \"kg_search_level\": \"0\",\n \"generation_config\": {\n \"model\": \"gpt-4o-mini\",\n \"temperature\": 0.7,\n },\n \"local_search_limits\": {\n \"__Entity__\": 20,\n \"__Relationship__\": 20,\n \"__Community__\": 20,\n },\n
\"max_community_description_length\": 65536,\n \"max_llm_queries_for_global_search\": 250\n },\n rag_generation_config={\n \"stream\": False,\n \"temperature\": 0.7,\n \"max_tokens\": 150\n }\n)\n"},{"lang":"Shell","source":"# Note: the graph needs to be constructed before enabling use_kg_search\ncurl -X POST \"https://api.example.com/rag\" \\\n -H \"Content-Type: application/json\" \\\n -H \"Authorization: Bearer YOUR_API_KEY\" \\\n -d '{\n \"query\": \"Who is Aristotle?\",\n \"vector_search_settings\": {\n \"use_vector_search\": true,\n \"filters\": {\"document_id\": {\"eq\": \"3e157b3a-8469-51db-90d9-52e7d896b49b\"}},\n \"search_limit\": 20,\n \"use_hybrid_search\": true\n },\n \"kg_search_settings\": {\n \"use_kg_search\": true,\n \"kg_search_type\": \"local\",\n \"kg_search_level\": \"0\",\n \"generation_config\": {\n \"model\": \"gpt-4o-mini\",\n \"temperature\": 0.7\n },\n \"local_search_limits\": {\n \"__Entity__\": 20,\n \"__Relationship__\": 20,\n \"__Community__\": 20\n },\n \"max_community_description_length\": 65536,\n \"max_llm_queries_for_global_search\": 250\n },\n \"rag_generation_config\": {\n \"stream\": false,\n \"temperature\": 0.7,\n \"max_tokens\": 150\n }\n }'\n"}]}},"/v2/agent":{"post":{"summary":"Agent App","description":"Implement an agent-based interaction for complex query processing.\n\nThis endpoint supports multi-turn conversations and can handle complex queries\nby breaking them down into sub-tasks.
It uses the same filtering capabilities\nas the search and RAG endpoints for retrieving relevant information.\n\nThe agent's behavior can be customized using the rag_generation_config and\ntask_prompt_override parameters.","operationId":"agent_app_v2_agent_post","security":[{"HTTPBearer":[]}],"parameters":[{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_agent_app_v2_agent_post"}}}},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"x-codeSamples":[{"lang":"Python","source":"from r2r import R2RClient\n\nclient = R2RClient(\"http://localhost:7272\")\n# when using auth, do client.login(...)\n\nresult = client.agent(\n messages=[\n {\"role\": \"user\", \"content\": \"Who is the greatest philosopher of all time?\"},\n {\"role\": \"assistant\", \"content\": \"Aristotle is widely considered the greatest philosopher of all time.\"},\n {\"role\": \"user\", \"content\": \"Can you tell me more about him?\"}\n ],\n vector_search_settings={\n \"use_vector_search\": True,\n \"filters\": {\"document_id\": {\"eq\": \"5e157b3a-8469-51db-90d9-52e7d896b49b\"}},\n \"search_limit\": 20,\n \"use_hybrid_search\": True\n },\n rag_generation_config={\n \"stream\": False,\n \"temperature\": 0.7,\n \"max_tokens\": 200\n },\n include_title_if_available=True\n)\n"},{"lang":"Shell","source":"curl -X POST \"https://api.example.com/agent\" \\\n -H \"Content-Type: application/json\" \\\n -H \"Authorization: Bearer YOUR_API_KEY\" \\\n -d '{\n \"messages\": [\n {\"role\": \"user\", \"content\": \"Who is the greatest philosopher of all time?\"},\n {\"role\": \"assistant\", \"content\": \"Aristotle is widely considered the greatest philosopher of all time.\"},\n {\"role\":
\"user\", \"content\": \"Can you tell me more about him?\"}\n ],\n \"vector_search_settings\": {\n \"use_vector_search\": true,\n \"filters\": {\"document_id\": {\"eq\": \"5e157b3a-8469-51db-90d9-52e7d896b49b\"}},\n \"search_limit\": 20,\n \"use_hybrid_search\": true\n },\n \"kg_search_settings\": {\n \"use_kg_search\": false\n },\n \"rag_generation_config\": {\n \"stream\": false,\n \"temperature\": 0.7,\n \"max_tokens\": 200\n },\n \"include_title_if_available\": true\n }'\n# To enable use_kg_search, please read the graphrag cookbook.\n"}]}},"/v2/register":{"post":{"summary":"Register App","description":"Register a new user with the given email and password.","operationId":"register_app_v2_register_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_register_app_v2_register_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_UserResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/verify_email":{"post":{"summary":"Verify Email App","description":"Verify a user's email address.\n\nThis endpoint is used to confirm a user's email address using the verification code\nsent to their email after registration.","operationId":"verify_email_app_v2_verify_email_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_verify_email_app_v2_verify_email_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_GenericMessageResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/login":{"post":{"summary":"Login App","description":"Authenticate a user and provide access tokens.\n\nThis
endpoint authenticates a user using their email (username) and password,\nand returns access and refresh tokens upon successful authentication.","operationId":"login_app_v2_login_post","requestBody":{"content":{"application/x-www-form-urlencoded":{"schema":{"$ref":"#/components/schemas/Body_login_app_v2_login_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_TokenResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/logout":{"post":{"summary":"Logout App","description":"Log out the current user.\n\nThis endpoint invalidates the user's current access token, effectively logging them out.","operationId":"logout_app_v2_logout_post","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_GenericMessageResponse_"}}}}},"security":[{"OAuth2PasswordBearer":[]},{"HTTPBearer":[]}]}},"/v2/user":{"get":{"summary":"Get User App","description":"Get the current user's profile information.\n\nThis endpoint returns the profile information of the currently authenticated user.","operationId":"get_user_app_v2_user_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_UserResponse_"}}}}},"security":[{"HTTPBearer":[]}]},"put":{"summary":"Put User App","description":"Update the current user's profile information.\n\nThis endpoint allows the authenticated user to update their profile information.","operationId":"put_user_app_v2_user_put","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_put_user_app_v2_user_put"}}}},"responses":{"200":{"description":"Successful 
Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_UserResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/refresh_access_token":{"post":{"summary":"Refresh Access Token App","description":"Refresh the access token using a refresh token.\n\nThis endpoint allows users to obtain a new access token using their refresh token.","operationId":"refresh_access_token_app_v2_refresh_access_token_post","requestBody":{"content":{"application/json":{"schema":{"type":"string","title":"Refresh Token","description":"Refresh token"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_TokenResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/change_password":{"post":{"summary":"Change Password App","description":"Change the authenticated user's password.\n\nThis endpoint allows users to change their password by providing their current password\nand a new password.","operationId":"change_password_app_v2_change_password_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_change_password_app_v2_change_password_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_GenericMessageResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/request_password_reset":{"post":{"summary":"Request Password Reset App","description":"Request a password reset for a user.\n\nThis endpoint initiates the password reset process by 
sending a reset link\nto the specified email address.","operationId":"request_password_reset_app_v2_request_password_reset_post","requestBody":{"content":{"application/json":{"schema":{"type":"string","format":"email","title":"Email","description":"User's email address"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_GenericMessageResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/reset_password":{"post":{"summary":"Reset Password App","operationId":"reset_password_app_v2_reset_password_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_reset_password_app_v2_reset_password_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_GenericMessageResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/user/{user_id}":{"delete":{"summary":"Delete User App","description":"Delete a user account.\n\nThis endpoint allows users to delete their own account or, for superusers,\nto delete any user account.","operationId":"delete_user_app_v2_user__user_id__delete","security":[{"HTTPBearer":[]}],"parameters":[{"name":"user_id","in":"path","required":true,"schema":{"type":"string","description":"ID of the user to delete","title":"User Id"},"description":"ID of the user to delete"}],"requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_delete_user_app_v2_user__user_id__delete"}}}},"responses":{"200":{"description":"Successful 
Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_GenericMessageResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/create_graph":{"post":{"summary":"Create Graph","description":"Create a graph from your documents. This endpoint takes as input a list of document IDs and KGCreationSettings. If document IDs are not provided, the graph will be created on all documents in the system.\n\nThis step extracts the relevant entities and relationships from the documents and creates a graph based on the extracted information.\n\nTo use GraphRAG, you will also need to run the enrich_graph endpoint.","operationId":"create_graph_v2_create_graph_post","security":[{"HTTPBearer":[]}],"parameters":[{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_create_graph_v2_create_graph_post"}}}},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/enrich_graph":{"post":{"summary":"Enrich Graph","description":"This endpoint enriches the graph with additional information. It creates communities of nodes based on their similarity and adds embeddings to the graph.
This step is necessary for GraphRAG to work.","operationId":"enrich_graph_v2_enrich_graph_post","security":[{"HTTPBearer":[]}],"parameters":[{"name":"response_model","in":"query","required":false,"schema":{"title":"Response Model"}}],"requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_enrich_graph_v2_enrich_graph_post"}}}},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/openapi_spec":{"get":{"summary":"Openapi Spec","operationId":"openapi_spec_v2_openapi_spec_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}}}}}},"components":{"schemas":{"Body_add_prompt_app_v2_add_prompt_post":{"properties":{"name":{"type":"string","title":"Name","description":"Prompt name"},"template":{"type":"string","title":"Template","description":"Prompt template"},"input_types":{"additionalProperties":{"type":"string"},"type":"object","title":"Input Types","description":"Input types","default":{}}},"type":"object","required":["name","template"],"title":"Body_add_prompt_app_v2_add_prompt_post"},"Body_add_user_to_collection_app_v2_add_user_to_collection_post":{"properties":{"user_id":{"type":"string","title":"User Id","description":"User ID"},"collection_id":{"type":"string","title":"Collection Id","description":"Collection ID"}},"type":"object","required":["user_id","collection_id"],"title":"Body_add_user_to_collection_app_v2_add_user_to_collection_post"},"Body_agent_app_v2_agent_post":{"properties":{"messages":{"items":{"$ref":"#/components/schemas/Message"},"type":"array","title":"Messages","description":"List of message objects"},"vector_search_settings":{"$ref":"#/components/schemas/VectorSearchSettings","description":"Vector search 
settings"},"kg_search_settings":{"$ref":"#/components/schemas/KGSearchSettings","description":"Knowledge graph search settings"},"rag_generation_config":{"$ref":"#/components/schemas/GenerationConfig","description":"RAG generation configuration"},"task_prompt_override":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Task Prompt Override","description":"Task prompt override"},"include_title_if_available":{"type":"boolean","title":"Include Title If Available","description":"Includes document title in chunk response, if available.","default":true}},"type":"object","required":["messages"],"title":"Body_agent_app_v2_agent_post"},"Body_assign_document_to_collection_app_v2_assign_document_to_collection_post":{"properties":{"document_id":{"type":"string","title":"Document Id","description":"Document ID"},"collection_id":{"type":"string","title":"Collection Id","description":"Collection ID"}},"type":"object","required":["document_id","collection_id"],"title":"Body_assign_document_to_collection_app_v2_assign_document_to_collection_post"},"Body_change_password_app_v2_change_password_post":{"properties":{"current_password":{"type":"string","title":"Current Password","description":"Current password"},"new_password":{"type":"string","title":"New Password","description":"New password"}},"type":"object","required":["current_password","new_password"],"title":"Body_change_password_app_v2_change_password_post"},"Body_create_collection_app_v2_create_collection_post":{"properties":{"name":{"type":"string","title":"Name","description":"Collection name"},"description":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Description","description":"Collection description","default":""}},"type":"object","required":["name"],"title":"Body_create_collection_app_v2_create_collection_post"},"Body_create_graph_v2_create_graph_post":{"properties":{"collection_id":{"type":"string","title":"Collection Id","description":"Collection ID to create graph 
for."},"kg_creation_settings":{"anyOf":[{"type":"object"},{"$ref":"#/components/schemas/KGCreationSettings"},{"type":"null"}],"title":"Kg Creation Settings","description":"Settings for the graph creation process.","default":"{}"}},"type":"object","required":["collection_id"],"title":"Body_create_graph_v2_create_graph_post"},"Body_delete_user_app_v2_user__user_id__delete":{"properties":{"password":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Password","description":"User's current password"},"delete_vector_data":{"anyOf":[{"type":"boolean"},{"type":"null"}],"title":"Delete Vector Data","description":"Whether to delete the user's vector data","default":false}},"type":"object","title":"Body_delete_user_app_v2_user__user_id__delete"},"Body_enrich_graph_v2_enrich_graph_post":{"properties":{"collection_id":{"type":"string","title":"Collection Id","description":"Collection ID to enrich graph for."},"kg_enrichment_settings":{"anyOf":[{"type":"object"},{"$ref":"#/components/schemas/KGEnrichmentSettings"},{"type":"null"}],"title":"Kg Enrichment Settings","description":"Settings for the graph enrichment process.","default":"{}"}},"type":"object","required":["collection_id"],"title":"Body_enrich_graph_v2_enrich_graph_post"},"Body_ingest_files_app_v2_ingest_files_post":{"properties":{"files":{"items":{"type":"string","format":"binary"},"type":"array","title":"Files","description":"The list of paths of input files to ingest into the system."},"document_ids":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"items":{"type":"string","format":"uuid"},"type":"array"}},{"type":"null"}],"title":"Document Ids","description":"An optional list of document ids for each file.
If not provided, the system will generate a unique document id via the `generate_user_document_id` method."},"metadatas":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"items":{"type":"object"},"type":"array"}},{"type":"null"}],"title":"Metadatas","description":"An optional list of JSON metadata to affix to each file"},"ingestion_config":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"type":"object"}},{"type":"null"}],"title":"Ingestion Config","description":"An optional dictionary to override the default chunking configuration for the ingestion process. If not provided, the system will use the default server-side chunking configuration."}},"type":"object","required":["files"],"title":"Body_ingest_files_app_v2_ingest_files_post"},"Body_login_app_v2_login_post":{"properties":{"grant_type":{"anyOf":[{"type":"string","pattern":"password"},{"type":"null"}],"title":"Grant Type"},"username":{"type":"string","title":"Username"},"password":{"type":"string","title":"Password"},"scope":{"type":"string","title":"Scope","default":""},"client_id":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Client Id"},"client_secret":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Client Secret"}},"type":"object","required":["username","password"],"title":"Body_login_app_v2_login_post"},"Body_put_user_app_v2_user_put":{"properties":{"email":{"anyOf":[{"type":"string","format":"email"},{"type":"null"}],"title":"Email","description":"Updated email address"},"name":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Name","description":"Updated user name"},"bio":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Bio","description":"Updated user bio"},"profile_picture":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Profile Picture","description":"Updated profile picture 
URL"}},"type":"object","title":"Body_put_user_app_v2_user_put"},"Body_rag_app_v2_rag_post":{"properties":{"query":{"type":"string","title":"Query","description":"RAG query"},"vector_search_settings":{"$ref":"#/components/schemas/VectorSearchSettings","description":"Vector search settings"},"kg_search_settings":{"$ref":"#/components/schemas/KGSearchSettings","description":"Knowledge graph search settings"},"rag_generation_config":{"$ref":"#/components/schemas/GenerationConfig","description":"RAG generation configuration"},"task_prompt_override":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Task Prompt Override","description":"Task prompt override"},"include_title_if_available":{"type":"boolean","title":"Include Title If Available","description":"Includes document title in chunk response, if available.","default":false}},"type":"object","required":["query"],"title":"Body_rag_app_v2_rag_post"},"Body_register_app_v2_register_post":{"properties":{"email":{"type":"string","format":"email","title":"Email","description":"User's email address"},"password":{"type":"string","title":"Password","description":"User's password"}},"type":"object","required":["email","password"],"title":"Body_register_app_v2_register_post"},"Body_remove_document_from_collection_app_v2_remove_document_from_collection_post":{"properties":{"document_id":{"type":"string","title":"Document Id","description":"Document ID"},"collection_id":{"type":"string","title":"Collection Id","description":"Collection ID"}},"type":"object","required":["document_id","collection_id"],"title":"Body_remove_document_from_collection_app_v2_remove_document_from_collection_post"},"Body_remove_user_from_collection_app_v2_remove_user_from_collection_post":{"properties":{"user_id":{"type":"string","title":"User Id","description":"User ID"},"collection_id":{"type":"string","title":"Collection Id","description":"Collection 
ID"}},"type":"object","required":["user_id","collection_id"],"title":"Body_remove_user_from_collection_app_v2_remove_user_from_collection_post"},"Body_reset_password_app_v2_reset_password_post":{"properties":{"reset_token":{"type":"string","title":"Reset Token","description":"Password reset token"},"new_password":{"type":"string","title":"New Password","description":"New password"}},"type":"object","required":["reset_token","new_password"],"title":"Body_reset_password_app_v2_reset_password_post"},"Body_score_completion_v2_score_completion_post":{"properties":{"message_id":{"type":"string","title":"Message Id","description":"Message ID"},"score":{"type":"number","title":"Score","description":"Completion score"}},"type":"object","required":["message_id","score"],"title":"Body_score_completion_v2_score_completion_post"},"Body_search_app_v2_search_post":{"properties":{"query":{"type":"string","title":"Query","description":"Search query"},"vector_search_settings":{"$ref":"#/components/schemas/VectorSearchSettings","description":"Vector search settings"},"kg_search_settings":{"$ref":"#/components/schemas/KGSearchSettings","description":"Knowledge graph search settings"}},"type":"object","required":["query"],"title":"Body_search_app_v2_search_post"},"Body_update_collection_app_v2_update_collection_put":{"properties":{"collection_id":{"type":"string","title":"Collection Id","description":"Collection ID"},"name":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Name","description":"Updated collection name"},"description":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Description","description":"Updated collection description"}},"type":"object","required":["collection_id"],"title":"Body_update_collection_app_v2_update_collection_put"},"Body_update_files_app_v2_update_files_post":{"properties":{"files":{"items":{"type":"string","format":"binary"},"type":"array","title":"Files","description":"The list of paths of input files to update in the 
system."},"document_ids":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"items":{"type":"string","format":"uuid"},"type":"array"}},{"type":"null"}],"title":"Document Ids","description":"An optional list of document ids for each file. If not provided, the system will generate a unique document id via the `generate_user_document_id` method."},"metadatas":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"items":{"type":"object"},"type":"array"}},{"type":"null"}],"title":"Metadatas","description":"An optional list of JSON metadata to affix to each file"},"ingestion_config":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"type":"object"}},{"type":"null"}],"title":"Ingestion Config","description":"An optional dictionary to override the default chunking configuration for the ingestion process. If not provided, the system will use the default server-side chunking configuration."}},"type":"object","required":["files"],"title":"Body_update_files_app_v2_update_files_post"},"Body_update_prompt_app_v2_update_prompt_post":{"properties":{"name":{"type":"string","title":"Name","description":"Prompt name"},"template":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Template","description":"Prompt template"},"input_types":{"anyOf":[{"additionalProperties":{"type":"string"},"type":"object"},{"type":"null"}],"title":"Input Types","description":"Input types","default":{}}},"type":"object","required":["name"],"title":"Body_update_prompt_app_v2_update_prompt_post"},"Body_verify_email_app_v2_verify_email_post":{"properties":{"email":{"type":"string","format":"email","title":"Email","description":"User's email address"},"verification_code":{"type":"string","title":"Verification Code","description":"Email verification 
code"}},"type":"object","required":["email","verification_code"],"title":"Body_verify_email_app_v2_verify_email_post"},"GenerationConfig":{"properties":{"model":{"type":"string","title":"Model"},"temperature":{"type":"number","title":"Temperature"},"top_p":{"type":"number","title":"Top P"},"max_tokens_to_sample":{"type":"integer","title":"Max Tokens To Sample"},"stream":{"type":"boolean","title":"Stream"},"functions":{"anyOf":[{"items":{"type":"object"},"type":"array"},{"type":"null"}],"title":"Functions"},"tools":{"anyOf":[{"items":{"type":"object"},"type":"array"},{"type":"null"}],"title":"Tools"},"add_generation_kwargs":{"anyOf":[{"type":"object"},{"type":"null"}],"title":"Add Generation Kwargs"},"api_base":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Api Base"}},"type":"object","title":"GenerationConfig","max_tokens_to_sample":1024,"model":"openai/gpt-4o","stream":false,"temperature":0.1,"top_p":1.0},"GenericMessageResponse":{"properties":{"message":{"type":"string","title":"Message"}},"type":"object","required":["message"],"title":"GenericMessageResponse"},"HTTPValidationError":{"properties":{"detail":{"items":{"$ref":"#/components/schemas/ValidationError"},"type":"array","title":"Detail"}},"type":"object","title":"HTTPValidationError"},"HybridSearchSettings":{"properties":{"full_text_weight":{"type":"number","title":"Full Text Weight","description":"Weight to apply to full text search","default":1.0},"semantic_weight":{"type":"number","title":"Semantic Weight","description":"Weight to apply to semantic search","default":5.0},"full_text_limit":{"type":"integer","title":"Full Text Limit","description":"Maximum number of results to return from full text search","default":200},"rrf_k":{"type":"integer","title":"Rrf K","description":"K-value for RRF (Reciprocal Rank
Fusion)","default":50}},"type":"object","title":"HybridSearchSettings"},"IndexMeasure":{"type":"string","enum":["cosine_distance","l2_distance","max_inner_product"],"title":"IndexMeasure","description":"An enum representing the types of distance measures available for indexing.\n\nAttributes:\n cosine_distance (str): The cosine distance measure for indexing.\n l2_distance (str): The Euclidean (L2) distance measure for indexing.\n max_inner_product (str): The maximum inner product measure for indexing."},"KGCreationSettings":{"properties":{"kg_triples_extraction_prompt":{"type":"string","title":"Kg Extraction Prompt","description":"The prompt to use for knowledge graph extraction.","default":"graphrag_triples_extraction_few_shot"},"force_kg_creation":{"type":"boolean","title":"Force Kg Creation","description":"Force run the KG creation step even if the graph is already created.","default":false},"entity_types":{"items":{"type":"string"},"type":"array","title":"Entity Types","description":"The types of entities to extract.","default":[]},"relation_types":{"items":{"type":"string"},"type":"array","title":"Relation Types","description":"The types of relations to extract.","default":[]},"extraction_merge_count":{"type":"integer","title":"Extraction Merge Count","description":"The number of extractions to merge into a single KG extraction.","default":4},"max_knowledge_triples":{"type":"integer","title":"Max Knowledge Triples","description":"The maximum number of knowledge triples to extract from each chunk.","default":100},"max_description_input_length":{"type":"integer","title":"Max Description Input Length","description":"The maximum length of the description for a node in the graph.","default":65536},"generation_config":{"$ref":"#/components/schemas/GenerationConfig","description":"Configuration for text generation during graph enrichment."}},"type":"object","title":"KGCreationSettings","description":"Settings for knowledge graph 
creation."},"KGEnrichmentSettings":{"properties":{"skip_clustering":{"type":"boolean","title":"Skip Clustering","description":"Whether to skip leiden clustering on the graph or not.","default":false},"force_enrichment":{"type":"boolean","title":"Force Enrichment","description":"Force run the enrichment step even if graph creation is still in progress for some documents.","default":false},"community_reports_prompt":{"type":"string","title":"Community Reports Prompt","description":"The prompt to use for knowledge graph enrichment.","default":"graphrag_community_reports_prompt"},"max_summary_input_length":{"type":"integer","title":"Max Summary Input Length","description":"The maximum length of the summary for a community.","default":65536},"generation_config":{"$ref":"#/components/schemas/GenerationConfig","description":"Configuration for text generation during graph enrichment."},"leiden_params":{"type":"object","title":"Leiden Params","description":"Parameters for the Leiden algorithm."}},"type":"object","title":"KGEnrichmentSettings","description":"Settings for knowledge graph enrichment."},"KGSearchSettings":{"properties":{"graphrag_map_system_prompt":{"type":"string","title":"Graphrag Map System Prompt","description":"The system prompt for the graphrag map prompt.","default":"graphrag_map_system_prompt"},"graphrag_reduce_system_prompt":{"type":"string","title":"Graphrag Reduce System Prompt","description":"The system prompt for the graphrag reduce prompt.","default":"graphrag_reduce_system_prompt"},"use_kg_search":{"type":"boolean","title":"Use Kg Search","description":"Whether to use KG search","default":false},"kg_search_type":{"type":"string","title":"Kg Search Type","description":"KG search type","default":"local"},"kg_search_level":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Kg Search Level","description":"KG search level"},"generation_config":{"$ref":"#/components/schemas/GenerationConfig","description":"Configuration for text generation during 
graph search."},"max_community_description_length":{"type":"integer","title":"Max Community Description Length","default":65536},"max_llm_queries_for_global_search":{"type":"integer","title":"Max Llm Queries For Global Search","default":250},"local_search_limits":{"additionalProperties":{"type":"integer"},"type":"object","title":"Local Search Limits","default":{"__Entity__":20,"__Relationship__":20,"__Community__":20}}},"type":"object","title":"KGSearchSettings","generation_config":{"max_tokens_to_sample":1024,"model":"openai/gpt-4o","stream":false,"temperature":0.1,"top_p":1.0},"kg_search_level":"0","kg_search_type":"global","local_search_limits":{"__Community__":20,"__Entity__":20,"__Relationship__":20},"max_community_description_length":65536,"max_llm_queries_for_global_search":250,"use_kg_search":true},"Message":{"properties":{"role":{"anyOf":[{"$ref":"#/components/schemas/MessageType"},{"type":"string"}],"title":"Role"},"content":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Content"},"name":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Name"},"function_call":{"anyOf":[{"type":"object"},{"type":"null"}],"title":"Function Call"},"tool_calls":{"anyOf":[{"items":{"type":"object"},"type":"array"},{"type":"null"}],"title":"Tool Calls"}},"type":"object","required":["role"],"title":"Message","content":"This is a test 
message.","role":"user"},"MessageType":{"type":"string","enum":["system","user","assistant","function","tool"],"title":"MessageType"},"ResultsWrapper_GenericMessageResponse_":{"properties":{"results":{"$ref":"#/components/schemas/GenericMessageResponse"}},"type":"object","required":["results"],"title":"ResultsWrapper[GenericMessageResponse]"},"ResultsWrapper_TokenResponse_":{"properties":{"results":{"$ref":"#/components/schemas/TokenResponse"}},"type":"object","required":["results"],"title":"ResultsWrapper[TokenResponse]"},"ResultsWrapper_UserResponse_":{"properties":{"results":{"$ref":"#/components/schemas/UserResponse"}},"type":"object","required":["results"],"title":"ResultsWrapper[UserResponse]"},"Token":{"properties":{"token":{"type":"string","title":"Token"},"token_type":{"type":"string","title":"Token Type"}},"type":"object","required":["token","token_type"],"title":"Token"},"TokenResponse":{"properties":{"access_token":{"$ref":"#/components/schemas/Token"},"refresh_token":{"$ref":"#/components/schemas/Token"}},"type":"object","required":["access_token","refresh_token"],"title":"TokenResponse"},"UserResponse":{"properties":{"id":{"type":"string","format":"uuid","title":"Id"},"email":{"type":"string","title":"Email"},"is_active":{"type":"boolean","title":"Is Active","default":true},"is_superuser":{"type":"boolean","title":"Is Superuser","default":false},"created_at":{"type":"string","format":"date-time","title":"Created At","default":"2024-09-30T22:13:38.156070"},"updated_at":{"type":"string","format":"date-time","title":"Updated At","default":"2024-09-30T22:13:38.156077"},"is_verified":{"type":"boolean","title":"Is Verified","default":false},"collection_ids":{"items":{"type":"string","format":"uuid"},"type":"array","title":"Collection Ids","default":[]},"hashed_password":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Hashed Password"},"verification_code_expiry":{"anyOf":[{"type":"string","format":"date-time"},{"type":"null"}],"title":"Verification 
Code Expiry"},"name":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Name"},"bio":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Bio"},"profile_picture":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Profile Picture"}},"type":"object","required":["id","email"],"title":"UserResponse"},"ValidationError":{"properties":{"loc":{"items":{"anyOf":[{"type":"string"},{"type":"integer"}]},"type":"array","title":"Location"},"msg":{"type":"string","title":"Message"},"type":{"type":"string","title":"Error Type"}},"type":"object","required":["loc","msg","type"],"title":"ValidationError"},"VectorSearchSettings":{"properties":{"use_vector_search":{"type":"boolean","title":"Use Vector Search","description":"Whether to use vector search","default":true},"use_hybrid_search":{"type":"boolean","title":"Use Hybrid Search","description":"Whether to perform a hybrid search (combining vector and keyword search)","default":false},"filters":{"type":"object","title":"Filters","description":"Filters to apply to the vector search"},"search_limit":{"type":"integer","maximum":1000.0,"minimum":1.0,"title":"Search Limit","description":"Maximum number of results to return","default":10},"offset":{"type":"integer","minimum":0.0,"title":"Offset","description":"Offset to paginate search results","default":0},"selected_collection_ids":{"items":{"type":"string","format":"uuid"},"type":"array","title":"Selected Collection Ids","description":"Collection IDs to search for"},"index_measure":{"$ref":"#/components/schemas/IndexMeasure","description":"The distance measure to use for indexing","default":"cosine_distance"},"include_values":{"type":"boolean","title":"Include Values","description":"Whether to include search score values in the search results","default":true},"include_metadatas":{"type":"boolean","title":"Include Metadatas","description":"Whether to include element metadata in the search 
results","default":true},"probes":{"anyOf":[{"type":"integer"},{"type":"null"}],"title":"Probes","description":"Number of ivfflat index lists to query. Higher increases accuracy but decreases speed.","default":10},"ef_search":{"anyOf":[{"type":"integer"},{"type":"null"}],"title":"Ef Search","description":"Size of the dynamic candidate list for HNSW index search. Higher increases accuracy but decreases speed.","default":40},"hybrid_search_settings":{"anyOf":[{"$ref":"#/components/schemas/HybridSearchSettings"},{"type":"null"}],"description":"Settings for hybrid search","default":{"full_text_weight":1.0,"semantic_weight":5.0,"full_text_limit":200,"rrf_k":50}},"search_strategy":{"type":"string","title":"Search Strategy","description":"Search strategy to use (e.g., 'default', 'query_fusion', 'hyde')","default":"vanilla"}},"type":"object","title":"VectorSearchSettings","ef_search":40,"filters":{"category":"technology"},"hybrid_search_settings":{"full_text_limit":200,"full_text_weight":1.0,"rrf_k":50,"semantic_weight":5.0},"include_metadata":true,"index_measure":"cosine_distance","limit":20,"offset":0,"probes":10,"selected_collection_ids":["2acb499e-8428-543b-bd85-0d9098718220","3e157b3a-8469-51db-90d9-52e7d896b49b"],"use_hybrid_search":true,"use_vector_search":true}},"securitySchemes":{"HTTPBearer":{"type":"http","scheme":"bearer"},"OAuth2PasswordBearer":{"type":"oauth2","flows":{"password":{"scopes":{},"tokenUrl":"token"}}}}}} + -H \"Authorization: Bearer YOUR_API_KEY\" \\\n -F \"file=@pg_essay_1_v2.txt;type=text/plain\" \\\n -F 'document_ids=[\"b4ac4dd6-5f27-596e-a55b-7cf242ca30aa\"]'\n"}]}},"/v2/health":{"get":{"summary":"Health Check","operationId":"health_check_v2_health_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}}}}},"/v2/server_stats":{"get":{"summary":"Server Stats","operationId":"server_stats_v2_server_stats_get","responses":{"200":{"description":"Successful 
Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_ServerStats_"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/update_prompt":{"post":{"summary":"Update Prompt App","operationId":"update_prompt_app_v2_update_prompt_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_update_prompt_app_v2_update_prompt_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_UpdatePromptResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/add_prompt":{"post":{"summary":"Add Prompt App","operationId":"add_prompt_app_v2_add_prompt_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_add_prompt_app_v2_add_prompt_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_UpdatePromptResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/get_prompt/{prompt_name}":{"get":{"summary":"Get Prompt App","operationId":"get_prompt_app_v2_get_prompt__prompt_name__get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"prompt_name","in":"path","required":true,"schema":{"type":"string","description":"Prompt name","title":"Prompt Name"},"description":"Prompt name"},{"name":"inputs","in":"query","required":false,"schema":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"type":"object"}},{"type":"null"}],"description":"JSON-encoded prompt inputs","title":"Inputs"},"description":"JSON-encoded prompt 
inputs"},{"name":"prompt_override","in":"query","required":false,"schema":{"anyOf":[{"type":"string"},{"type":"null"}],"description":"Prompt override","title":"Prompt Override"},"description":"Prompt override"}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_UpdatePromptResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/get_all_prompts":{"get":{"summary":"Get All Prompts App","operationId":"get_all_prompts_app_v2_get_all_prompts_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_AllPromptsResponse_"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/delete_prompt/{prompt_name}":{"delete":{"summary":"Delete Prompt App","operationId":"delete_prompt_app_v2_delete_prompt__prompt_name__delete","security":[{"HTTPBearer":[]}],"parameters":[{"name":"prompt_name","in":"path","required":true,"schema":{"type":"string","description":"Prompt name","title":"Prompt Name"},"description":"Prompt name"}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/analytics":{"get":{"summary":"Get Analytics App","operationId":"get_analytics_app_v2_analytics_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"filter_criteria","in":"query","required":false,"schema":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"type":"object"}},{"type":"null"}],"default":{},"title":"Filter 
Criteria"}},{"name":"analysis_types","in":"query","required":false,"schema":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"type":"object"}},{"type":"null"}],"default":{},"title":"Analysis Types"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_AnalyticsResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/logs":{"get":{"summary":"Logs App","operationId":"logs_app_v2_logs_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"run_type_filter","in":"query","required":false,"schema":{"anyOf":[{"type":"string"},{"type":"null"}],"default":"","title":"Run Type Filter"}},{"name":"offset","in":"query","required":false,"schema":{"type":"integer","minimum":0,"default":0,"title":"Offset"}},{"name":"limit","in":"query","required":false,"schema":{"type":"integer","maximum":1000,"minimum":1,"default":100,"title":"Limit"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_List_LogResponse__"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/app_settings":{"get":{"summary":"App Settings","operationId":"app_settings_v2_app_settings_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_AppSettingsResponse_"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/score_completion":{"post":{"summary":"Score Completion","operationId":"score_completion_v2_score_completion_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_score_completion_v2_score_completion_post"}}},"required":true},"responses":{"200":{"description":"Successful 
Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_ScoreCompletionResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/users_overview":{"get":{"summary":"Users Overview App","operationId":"users_overview_app_v2_users_overview_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"user_ids","in":"query","required":false,"schema":{"anyOf":[{"type":"array","items":{"type":"string"}},{"type":"null"}],"default":[],"title":"User Ids"}},{"name":"offset","in":"query","required":false,"schema":{"type":"integer","minimum":0,"default":0,"title":"Offset"}},{"name":"limit","in":"query","required":false,"schema":{"type":"integer","maximum":1000,"minimum":1,"default":100,"title":"Limit"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_List_UserOverviewResponse__"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/delete":{"delete":{"summary":"Delete App","operationId":"delete_app_v2_delete_delete","security":[{"HTTPBearer":[]}],"parameters":[{"name":"filters","in":"query","required":true,"schema":{"type":"string","description":"JSON-encoded filters","title":"Filters"},"description":"JSON-encoded filters"}],"responses":{"204":{"description":"Successful Response"},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/download_file/{document_id}":{"get":{"summary":"Download File App","description":"Download a file by its document ID as a 
stream.","operationId":"download_file_app_v2_download_file__document_id__get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"document_id","in":"path","required":true,"schema":{"type":"string","description":"Document ID","title":"Document Id"},"description":"Document ID"}],"responses":{"200":{"description":"Successful Response"},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/documents_overview":{"get":{"summary":"Documents Overview App","operationId":"documents_overview_app_v2_documents_overview_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"document_ids","in":"query","required":false,"schema":{"type":"array","items":{"type":"string"},"default":[],"title":"Document Ids"}},{"name":"offset","in":"query","required":false,"schema":{"type":"integer","minimum":0,"default":0,"title":"Offset"}},{"name":"limit","in":"query","required":false,"schema":{"type":"integer","maximum":1000,"minimum":1,"default":100,"title":"Limit"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_List_DocumentOverviewResponse__"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/document_chunks/{document_id}":{"get":{"summary":"Document Chunks App","operationId":"document_chunks_app_v2_document_chunks__document_id__get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"document_id","in":"path","required":true,"schema":{"type":"string","title":"Document Id"}},{"name":"offset","in":"query","required":false,"schema":{"anyOf":[{"type":"integer","minimum":0},{"type":"null"}],"default":0,"title":"Offset"}},{"name":"limit","in":"query","required":false,"schema":{"anyOf":[{"type":"integer","minimum":0},{"type":"null"}],"default":100,"title":"Limit"}}],"responses":{"200":{"description":"Successful 
Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_List_DocumentChunkResponse__"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/inspect_knowledge_graph":{"get":{"summary":"Inspect Knowledge Graph","operationId":"inspect_knowledge_graph_v2_inspect_knowledge_graph_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"offset","in":"query","required":false,"schema":{"type":"integer","default":0,"title":"Offset"}},{"name":"limit","in":"query","required":false,"schema":{"type":"integer","default":100,"title":"Limit"}},{"name":"print_descriptions","in":"query","required":false,"schema":{"type":"boolean","default":false,"title":"Print Descriptions"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_str_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/collections_overview":{"get":{"summary":"Collections Overview App","operationId":"collections_overview_app_v2_collections_overview_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"collection_ids","in":"query","required":false,"schema":{"anyOf":[{"type":"array","items":{"type":"string"}},{"type":"null"}],"title":"Collection Ids"}},{"name":"offset","in":"query","required":false,"schema":{"anyOf":[{"type":"integer","minimum":0},{"type":"null"}],"default":0,"title":"Offset"}},{"name":"limit","in":"query","required":false,"schema":{"anyOf":[{"type":"integer","maximum":1000,"minimum":1},{"type":"null"}],"default":100,"title":"Limit"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_List_GroupOverviewResponse__"}}}},"422":{"description":"Validation 
Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/create_collection":{"post":{"summary":"Create Collection App","operationId":"create_collection_app_v2_create_collection_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_create_collection_app_v2_create_collection_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_GroupResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/get_collection/{collection_id}":{"get":{"summary":"Get Collection App","operationId":"get_collection_app_v2_get_collection__collection_id__get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"collection_id","in":"path","required":true,"schema":{"type":"string","description":"Collection ID","title":"Collection Id"},"description":"Collection ID"}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_GroupResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/update_collection":{"put":{"summary":"Update Collection App","operationId":"update_collection_app_v2_update_collection_put","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_update_collection_app_v2_update_collection_put"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_GroupResponse_"}}}},"422":{"description":"Validation 
Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/delete_collection/{collection_id}":{"delete":{"summary":"Delete Collection App","operationId":"delete_collection_app_v2_delete_collection__collection_id__delete","security":[{"HTTPBearer":[]}],"parameters":[{"name":"collection_id","in":"path","required":true,"schema":{"type":"string","description":"Collection ID","title":"Collection Id"},"description":"Collection ID"}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/list_collections":{"get":{"summary":"List Collections App","operationId":"list_collections_app_v2_list_collections_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"offset","in":"query","required":false,"schema":{"type":"integer","minimum":0,"default":0,"title":"Offset"}},{"name":"limit","in":"query","required":false,"schema":{"type":"integer","maximum":1000,"minimum":1,"default":100,"title":"Limit"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_List_GroupResponse__"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/add_user_to_collection":{"post":{"summary":"Add User To Collection App","operationId":"add_user_to_collection_app_v2_add_user_to_collection_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_add_user_to_collection_app_v2_add_user_to_collection_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_AddUserResponse_"}}}},"422":{"description":"Validation 
Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/remove_user_from_collection":{"post":{"summary":"Remove User From Collection App","operationId":"remove_user_from_collection_app_v2_remove_user_from_collection_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_remove_user_from_collection_app_v2_remove_user_from_collection_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/get_users_in_collection/{collection_id}":{"get":{"summary":"Get Users In Collection App","operationId":"get_users_in_collection_app_v2_get_users_in_collection__collection_id__get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"collection_id","in":"path","required":true,"schema":{"type":"string","description":"Collection ID","title":"Collection Id"},"description":"Collection ID"},{"name":"offset","in":"query","required":false,"schema":{"type":"integer","minimum":0,"description":"Pagination offset","default":0,"title":"Offset"},"description":"Pagination offset"},{"name":"limit","in":"query","required":false,"schema":{"type":"integer","maximum":1000,"minimum":1,"description":"Pagination limit","default":100,"title":"Limit"},"description":"Pagination limit"}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/user_collections/{user_id}":{"get":{"summary":"Get Collections For User 
App","operationId":"get_collections_for_user_app_v2_user_collections__user_id__get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"user_id","in":"path","required":true,"schema":{"type":"string","description":"User ID","title":"User Id"},"description":"User ID"},{"name":"offset","in":"query","required":false,"schema":{"type":"integer","minimum":0,"description":"Pagination offset","default":0,"title":"Offset"},"description":"Pagination offset"},{"name":"limit","in":"query","required":false,"schema":{"type":"integer","maximum":1000,"minimum":1,"description":"Pagination limit","default":100,"title":"Limit"},"description":"Pagination limit"}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/assign_document_to_collection":{"post":{"summary":"Assign Document To Collection App","operationId":"assign_document_to_collection_app_v2_assign_document_to_collection_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_assign_document_to_collection_app_v2_assign_document_to_collection_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/remove_document_from_collection":{"post":{"summary":"Remove Document From Collection App","operationId":"remove_document_from_collection_app_v2_remove_document_from_collection_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_remove_document_from_collection_app_v2_remove_document_from_collection_post"}}},"required":true},"responses":{"200":{"description":"Successful 
Response","content":{"application/json":{"schema":{}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/document_collections/{document_id}":{"get":{"summary":"Document Collections App","operationId":"document_collections_app_v2_document_collections__document_id__get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"document_id","in":"path","required":true,"schema":{"type":"string","description":"Document ID","title":"Document Id"},"description":"Document ID"},{"name":"offset","in":"query","required":false,"schema":{"type":"integer","minimum":0,"default":0,"title":"Offset"}},{"name":"limit","in":"query","required":false,"schema":{"type":"integer","maximum":1000,"minimum":1,"default":100,"title":"Limit"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_List_GroupResponse__"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/collection/{collection_id}/documents":{"get":{"summary":"Documents In Collection App","operationId":"documents_in_collection_app_v2_collection__collection_id__documents_get","security":[{"HTTPBearer":[]}],"parameters":[{"name":"collection_id","in":"path","required":true,"schema":{"type":"string","description":"Collection ID","title":"Collection Id"},"description":"Collection ID"},{"name":"offset","in":"query","required":false,"schema":{"type":"integer","minimum":0,"default":0,"title":"Offset"}},{"name":"limit","in":"query","required":false,"schema":{"type":"integer","maximum":1000,"minimum":1,"default":100,"title":"Limit"}}],"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_List_DocumentOverviewResponse__"}}}},"422":{"description":"Validation 
Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/search":{"post":{"summary":"Search App","description":"Perform a search query on the vector database and knowledge graph.\n\nThis endpoint allows for complex filtering of search results using PostgreSQL-based queries.\nFilters can be applied to various fields such as document_id, and internal metadata values.\n\n\nAllowed operators include `eq`, `neq`, `gt`, `gte`, `lt`, `lte`, `like`, `ilike`, `in`, and `nin`.","operationId":"search_app_v2_search_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_search_app_v2_search_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_SearchResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}],"x-codeSamples":[{"lang":"Python","source":"from r2r import R2RClient\n\nclient = R2RClient(\"http://localhost:7272\")\n# when using auth, do client.login(...)\n\nresult = client.search(\n query=\"Who is Aristotle?\",\n vector_search_settings={\n \"use_vector_search\": True,\n \"filters\": {\"document_id\": {\"eq\": \"3e157b3a-8469-51db-90d9-52e7d896b49b\"}},\n \"search_limit\": 20,\n \"use_hybrid_search\": True\n },\n kg_search_settings={\n \"use_kg_search\": True, # graph needs to be constructed first\n \"kg_search_type\": \"local\",\n \"kg_search_level\": \"0\",\n \"generation_config\": {\n \"model\": \"gpt-4o-mini\",\n \"temperature\": 0.7,\n },\n \"local_search_limits\": {\n \"__Entity__\": 20,\n \"__Relationship__\": 20,\n \"__Community__\": 20,\n },\n \"max_community_description_length\": 65536,\n \"max_llm_queries_for_global_search\": 250\n }\n)\n"},{"lang":"Shell","source":"curl -X POST \"https://api.example.com/search\" \\\n -H 
\"Content-Type: application/json\" \\\n -H \"Authorization: Bearer YOUR_API_KEY\" \\\n -d '{\n \"query\": \"Who is Aristotle?\",\n \"vector_search_settings\": {\n \"use_vector_search\": true,\n \"filters\": {\"document_id\": {\"eq\": \"3e157b3a-8469-51db-90d9-52e7d896b49b\"}},\n \"search_limit\": 20,\n \"use_hybrid_search\": true\n },\n \"kg_search_settings\": {\n \"use_kg_search\": true,\n \"kg_search_type\": \"local\",\n \"kg_search_level\": \"0\",\n \"generation_config\": {\n \"model\": \"gpt-4o-mini\",\n \"temperature\": 0.7\n },\n \"local_search_limits\": {\n \"__Entity__\": 20,\n \"__Relationship__\": 20,\n \"__Community__\": 20\n },\n \"max_community_description_length\": 65536,\n \"max_llm_queries_for_global_search\": 250\n }\n }'\n"}]}},"/v2/rag":{"post":{"summary":"Rag App","description":"Execute a RAG (Retrieval-Augmented Generation) query.\n\nThis endpoint combines search results with language model generation.\nIt supports the same filtering capabilities as the search endpoint,\nallowing for precise control over the retrieved context.\n\nThe generation process can be customized using the rag_generation_config parameter.","operationId":"rag_app_v2_rag_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_rag_app_v2_rag_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_RAGResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}],"x-codeSamples":[{"lang":"Python","source":"from r2r import R2RClient\n\nclient = R2RClient(\"http://localhost:7272\")\n# when using auth, do client.login(...)\n\nresult = client.rag(\n query=\"Who is Aristotle?\",\n vector_search_settings={\n \"use_vector_search\": True,\n \"filters\": {\"document_id\": {\"eq\": 
\"3e157b3a-8469-51db-90d9-52e7d896b49b\"}},\n \"search_limit\": 20,\n \"use_hybrid_search\": True\n },\n kg_search_settings={\n \"use_kg_search\": True,\n \"kg_search_type\": \"local\",\n \"kg_search_level\": \"0\",\n \"generation_config\": {\n \"model\": \"gpt-4o-mini\",\n \"temperature\": 0.7,\n },\n \"local_search_limits\": {\n \"__Entity__\": 20,\n \"__Relationship__\": 20,\n \"__Community__\": 20,\n },\n \"max_community_description_length\": 65536,\n \"max_llm_queries_for_global_search\": 250\n },\n rag_generation_config={\n \"stream\": False,\n \"temperature\": 0.7,\n \"max_tokens\": 150\n }\n)\n"},{"lang":"Shell","source":"curl -X POST \"https://api.example.com/rag\" \\\n -H \"Content-Type: application/json\" \\\n -H \"Authorization: Bearer YOUR_API_KEY\" \\\n -d '{\n \"query\": \"Who is Aristotle?\",\n \"vector_search_settings\": {\n \"use_vector_search\": true,\n \"filters\": {\"document_id\": {\"eq\": \"3e157b3a-8469-51db-90d9-52e7d896b49b\"}},\n \"search_limit\": 20,\n \"use_hybrid_search\": true\n },\n \"kg_search_settings\": {\n \"use_kg_search\": true,\n \"kg_search_type\": \"local\",\n \"kg_search_level\": \"0\",\n \"generation_config\": {\n \"model\": \"gpt-4o-mini\",\n \"temperature\": 0.7\n },\n \"local_search_limits\": {\n \"__Entity__\": 20,\n \"__Relationship__\": 20,\n \"__Community__\": 20\n },\n \"max_community_description_length\": 65536,\n \"max_llm_queries_for_global_search\": 250\n },\n \"rag_generation_config\": {\n \"stream\": false,\n \"temperature\": 0.7,\n \"max_tokens\": 150\n }\n }'\n"}]}},"/v2/agent":{"post":{"summary":"Agent App","description":"Implement an agent-based interaction for complex query processing.\n\nThis endpoint supports multi-turn conversations and can handle complex queries\nby breaking them down into sub-tasks. 
It uses the same filtering capabilities\nas the search and RAG endpoints for retrieving relevant information.\n\nThe agent's behavior can be customized using the rag_generation_config and\ntask_prompt_override parameters.","operationId":"agent_app_v2_agent_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_agent_app_v2_agent_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_RAGAgentResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}],"x-codeSamples":[{"lang":"Python","source":"from r2r import R2RClient\n\nclient = R2RClient(\"http://localhost:7272\")\n# when using auth, do client.login(...)\n\nresult = client.agent(\n messages=[\n {\"role\": \"user\", \"content\": \"Who is the greatest philosopher of all time?\"},\n {\"role\": \"assistant\", \"content\": \"Aristotle is widely considered the greatest philosopher of all time.\"},\n {\"role\": \"user\", \"content\": \"Can you tell me more about him?\"}\n ],\n vector_search_settings={\n \"use_vector_search\": True,\n \"filters\": {\"document_id\": {\"eq\": \"5e157b3a-8469-51db-90d9-52e7d896b49b\"}},\n \"search_limit\": 20,\n \"use_hybrid_search\": True\n },\n rag_generation_config={\n \"stream\": False,\n \"temperature\": 0.7,\n \"max_tokens\": 200\n },\n include_title_if_available=True\n)\n"},{"lang":"Shell","source":"curl -X POST \"https://api.example.com/agent\" \\\n -H \"Content-Type: application/json\" \\\n -H \"Authorization: Bearer YOUR_API_KEY\" \\\n -d '{\n \"messages\": [\n {\"role\": \"user\", \"content\": \"Who is the greatest philosopher of all time?\"},\n {\"role\": \"assistant\", \"content\": \"Aristotle is widely considered the greatest philosopher of all time.\"},\n {\"role\": \"user\", \"content\": \"Can you tell me 
more about him?\"}\n ],\n \"vector_search_settings\": {\n \"use_vector_search\": true,\n \"filters\": {\"document_id\": {\"eq\": \"5e157b3a-8469-51db-90d9-52e7d896b49b\"}},\n \"search_limit\": 20,\n \"use_hybrid_search\": true\n },\n \"kg_search_settings\": {\n \"use_kg_search\": false\n },\n \"rag_generation_config\": {\n \"stream\": false,\n \"temperature\": 0.7,\n \"max_tokens\": 200\n },\n \"include_title_if_available\": true\n }'\n"}]}},"/v2/register":{"post":{"summary":"Register App","description":"Register a new user with the given email and password.","operationId":"register_app_v2_register_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_register_app_v2_register_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_UserResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/verify_email":{"post":{"summary":"Verify Email App","description":"Verify a user's email address.\n\nThis endpoint is used to confirm a user's email address using the verification code\nsent to their email after registration.","operationId":"verify_email_app_v2_verify_email_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_verify_email_app_v2_verify_email_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_GenericMessageResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/login":{"post":{"summary":"Login App","description":"Authenticate a user and provide access tokens.\n\nThis endpoint authenticates a user using 
their email (username) and password,\nand returns access and refresh tokens upon successful authentication.","operationId":"login_app_v2_login_post","requestBody":{"content":{"application/x-www-form-urlencoded":{"schema":{"$ref":"#/components/schemas/Body_login_app_v2_login_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_TokenResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/logout":{"post":{"summary":"Logout App","description":"Log out the current user.\n\nThis endpoint invalidates the user's current access token, effectively logging them out.","operationId":"logout_app_v2_logout_post","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_GenericMessageResponse_"}}}}},"security":[{"OAuth2PasswordBearer":[]},{"HTTPBearer":[]}]}},"/v2/user":{"get":{"summary":"Get User App","description":"Get the current user's profile information.\n\nThis endpoint returns the profile information of the currently authenticated user.","operationId":"get_user_app_v2_user_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_UserResponse_"}}}}},"security":[{"HTTPBearer":[]}]},"put":{"summary":"Put User App","description":"Update the current user's profile information.\n\nThis endpoint allows the authenticated user to update their profile information.","operationId":"put_user_app_v2_user_put","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_put_user_app_v2_user_put"}}}},"responses":{"200":{"description":"Successful 
Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_UserResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/refresh_access_token":{"post":{"summary":"Refresh Access Token App","description":"Refresh the access token using a refresh token.\n\nThis endpoint allows users to obtain a new access token using their refresh token.","operationId":"refresh_access_token_app_v2_refresh_access_token_post","requestBody":{"content":{"application/json":{"schema":{"type":"string","title":"Refresh Token","description":"Refresh token"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_TokenResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/change_password":{"post":{"summary":"Change Password App","description":"Change the authenticated user's password.\n\nThis endpoint allows users to change their password by providing their current password\nand a new password.","operationId":"change_password_app_v2_change_password_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_change_password_app_v2_change_password_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_GenericMessageResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/request_password_reset":{"post":{"summary":"Request Password Reset App","description":"Request a password reset for a user.\n\nThis endpoint initiates the password reset process by 
sending a reset link\nto the specified email address.","operationId":"request_password_reset_app_v2_request_password_reset_post","requestBody":{"content":{"application/json":{"schema":{"type":"string","format":"email","title":"Email","description":"User's email address"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_GenericMessageResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/reset_password":{"post":{"summary":"Reset Password App","operationId":"reset_password_app_v2_reset_password_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_reset_password_app_v2_reset_password_post"}}},"required":true},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_GenericMessageResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/user/{user_id}":{"delete":{"summary":"Delete User App","description":"Delete a user account.\n\nThis endpoint allows users to delete their own account or, for superusers,\nto delete any user account.","operationId":"delete_user_app_v2_user__user_id__delete","security":[{"HTTPBearer":[]}],"parameters":[{"name":"user_id","in":"path","required":true,"schema":{"type":"string","description":"ID of the user to delete","title":"User Id"},"description":"ID of the user to delete"}],"requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_delete_user_app_v2_user__user_id__delete"}}}},"responses":{"200":{"description":"Successful 
Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_GenericMessageResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}}}},"/v2/create_graph":{"post":{"summary":"Create Graph","description":"Create a graph from your documents. This endpoint takes a list of document IDs and KGCreationSettings as input. If document IDs are not provided, the graph will be created on all documents in the system.\n\nThis step extracts the relevant entities and relationships from the documents and creates a graph based on the extracted information. You can view the graph through the neo4j browser.\n\nTo use GraphRAG, you will also need to run the enrich_graph endpoint.","operationId":"create_graph_v2_create_graph_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_create_graph_v2_create_graph_post"}}}},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_KGCreationResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/enrich_graph":{"post":{"summary":"Enrich Graph","description":"This endpoint enriches the graph with additional information. It creates communities of nodes based on their similarity and adds embeddings to the graph. 
This step is necessary for GraphRAG to work.","operationId":"enrich_graph_v2_enrich_graph_post","requestBody":{"content":{"application/json":{"schema":{"$ref":"#/components/schemas/Body_enrich_graph_v2_enrich_graph_post"}}}},"responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ResultsWrapper_KGEnrichmentResponse_"}}}},"422":{"description":"Validation Error","content":{"application/json":{"schema":{"$ref":"#/components/schemas/HTTPValidationError"}}}}},"security":[{"HTTPBearer":[]}]}},"/v2/openapi_spec":{"get":{"summary":"Openapi Spec","operationId":"openapi_spec_v2_openapi_spec_get","responses":{"200":{"description":"Successful Response","content":{"application/json":{"schema":{}}}}}}}},"components":{"schemas":{"AddUserResponse":{"properties":{"result":{"type":"boolean","title":"Result"}},"type":"object","required":["result"],"title":"AddUserResponse"},"AllPromptsResponse":{"properties":{"prompts":{"additionalProperties":{"$ref":"#/components/schemas/PromptResponse"},"type":"object","title":"Prompts"}},"type":"object","required":["prompts"],"title":"AllPromptsResponse"},"AnalyticsResponse":{"properties":{"analytics_data":{"anyOf":[{"type":"object"},{"type":"null"}],"title":"Analytics Data"},"filtered_logs":{"type":"object","title":"Filtered Logs"}},"type":"object","required":["filtered_logs"],"title":"AnalyticsResponse"},"AppSettingsResponse":{"properties":{"config":{"type":"object","title":"Config"},"prompts":{"type":"object","title":"Prompts"}},"type":"object","required":["config","prompts"],"title":"AppSettingsResponse"},"Body_add_prompt_app_v2_add_prompt_post":{"properties":{"name":{"type":"string","title":"Name","description":"Prompt name"},"template":{"type":"string","title":"Template","description":"Prompt template"},"input_types":{"additionalProperties":{"type":"string"},"type":"object","title":"Input Types","description":"Input 
types","default":{}}},"type":"object","required":["name","template"],"title":"Body_add_prompt_app_v2_add_prompt_post"},"Body_add_user_to_collection_app_v2_add_user_to_collection_post":{"properties":{"user_id":{"type":"string","title":"User Id","description":"User ID"},"collection_id":{"type":"string","title":"Collection Id","description":"Collection ID"}},"type":"object","required":["user_id","collection_id"],"title":"Body_add_user_to_collection_app_v2_add_user_to_collection_post"},"Body_agent_app_v2_agent_post":{"properties":{"messages":{"items":{"$ref":"#/components/schemas/Message-Input"},"type":"array","title":"Messages","description":"List of message objects"},"vector_search_settings":{"$ref":"#/components/schemas/VectorSearchSettings","description":"Vector search settings"},"kg_search_settings":{"$ref":"#/components/schemas/KGSearchSettings","description":"Knowledge graph search settings"},"rag_generation_config":{"$ref":"#/components/schemas/GenerationConfig","description":"RAG generation configuration"},"task_prompt_override":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Task Prompt Override","description":"Task prompt override"},"include_title_if_available":{"type":"boolean","title":"Include Title If Available","description":"Includes document title in chunk response, if available.","default":true}},"type":"object","required":["messages"],"title":"Body_agent_app_v2_agent_post"},"Body_assign_document_to_collection_app_v2_assign_document_to_collection_post":{"properties":{"document_id":{"type":"string","title":"Document Id","description":"Document ID"},"collection_id":{"type":"string","title":"Collection Id","description":"Collection ID"}},"type":"object","required":["document_id","collection_id"],"title":"Body_assign_document_to_collection_app_v2_assign_document_to_collection_post"},"Body_change_password_app_v2_change_password_post":{"properties":{"current_password":{"type":"string","title":"Current Password","description":"Current 
password"},"new_password":{"type":"string","title":"New Password","description":"New password"}},"type":"object","required":["current_password","new_password"],"title":"Body_change_password_app_v2_change_password_post"},"Body_create_collection_app_v2_create_collection_post":{"properties":{"name":{"type":"string","title":"Name","description":"Collection name"},"description":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Description","description":"Collection description","default":""}},"type":"object","required":["name"],"title":"Body_create_collection_app_v2_create_collection_post"},"Body_create_graph_v2_create_graph_post":{"properties":{"document_ids":{"anyOf":[{"items":{"type":"string"},"type":"array"},{"type":"null"}],"title":"Document Ids","description":"List of document IDs to create the graph on."},"kg_creation_settings":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"$ref":"#/components/schemas/KGCreationSettings"}},{"type":"null"}],"title":"Kg Creation Settings","description":"Settings for the graph creation process."}},"type":"object","title":"Body_create_graph_v2_create_graph_post"},"Body_delete_user_app_v2_user__user_id__delete":{"properties":{"password":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Password","description":"User's current password"},"delete_vector_data":{"anyOf":[{"type":"boolean"},{"type":"null"}],"title":"Delete Vector Data","description":"Whether to delete the user's vector data","default":false}},"type":"object","title":"Body_delete_user_app_v2_user__user_id__delete"},"Body_enrich_graph_v2_enrich_graph_post":{"properties":{"skip_clustering":{"type":"boolean","title":"Skip Clustering","description":"Whether to skip leiden clustering on the graph or not.","default":false},"force_enrichment":{"type":"boolean","title":"Force Enrichment","description":"Force Enrichment step even if graph creation is still in progress for some 
documents.","default":false},"kg_enrichment_settings":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"$ref":"#/components/schemas/KGEnrichmentSettings"}},{"type":"null"}],"title":"Kg Enrichment Settings","description":"Settings for the graph enrichment process."}},"type":"object","title":"Body_enrich_graph_v2_enrich_graph_post"},"Body_ingest_files_app_v2_ingest_files_post":{"properties":{"files":{"items":{"type":"string","format":"binary"},"type":"array","title":"Files","description":"The list of paths of input files to ingest into the system."},"document_ids":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"items":{"type":"string","format":"uuid"},"type":"array"}},{"type":"null"}],"title":"Document Ids","description":"An optional list of document ids for each file. If not provided, the system will generate a unique document id via the `generate_user_document_id` method."},"metadatas":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"items":{"type":"object"},"type":"array"}},{"type":"null"}],"title":"Metadatas","description":"An optional list of JSON metadata to affix to each file"},"chunking_config":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"$ref":"#/components/schemas/ChunkingConfig"}},{"type":"null"}],"title":"Chunking Config","description":"An optional dictionary to override the default chunking configuration for the ingestion process. 
If not provided, the system will use the default server-side chunking configuration."}},"type":"object","required":["files"],"title":"Body_ingest_files_app_v2_ingest_files_post"},"Body_login_app_v2_login_post":{"properties":{"grant_type":{"anyOf":[{"type":"string","pattern":"password"},{"type":"null"}],"title":"Grant Type"},"username":{"type":"string","title":"Username"},"password":{"type":"string","title":"Password"},"scope":{"type":"string","title":"Scope","default":""},"client_id":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Client Id"},"client_secret":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Client Secret"}},"type":"object","required":["username","password"],"title":"Body_login_app_v2_login_post"},"Body_put_user_app_v2_user_put":{"properties":{"email":{"anyOf":[{"type":"string","format":"email"},{"type":"null"}],"title":"Email","description":"Updated email address"},"name":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Name","description":"Updated user name"},"bio":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Bio","description":"Updated user bio"},"profile_picture":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Profile Picture","description":"Updated profile picture URL"}},"type":"object","title":"Body_put_user_app_v2_user_put"},"Body_rag_app_v2_rag_post":{"properties":{"query":{"type":"string","title":"Query","description":"RAG query"},"vector_search_settings":{"$ref":"#/components/schemas/VectorSearchSettings","description":"Vector search settings"},"kg_search_settings":{"$ref":"#/components/schemas/KGSearchSettings","description":"Knowledge graph search settings"},"rag_generation_config":{"$ref":"#/components/schemas/GenerationConfig","description":"RAG generation configuration"},"task_prompt_override":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Task Prompt Override","description":"Task prompt override"},"include_title_if_available":{"type":"boolean","title":"Include Title If 
Available","description":"Includes document title in chunk response, if available.","default":false}},"type":"object","required":["query"],"title":"Body_rag_app_v2_rag_post"},"Body_register_app_v2_register_post":{"properties":{"email":{"type":"string","format":"email","title":"Email","description":"User's email address"},"password":{"type":"string","title":"Password","description":"User's password"}},"type":"object","required":["email","password"],"title":"Body_register_app_v2_register_post"},"Body_remove_document_from_collection_app_v2_remove_document_from_collection_post":{"properties":{"document_id":{"type":"string","title":"Document Id","description":"Document ID"},"collection_id":{"type":"string","title":"Collection Id","description":"Collection ID"}},"type":"object","required":["document_id","collection_id"],"title":"Body_remove_document_from_collection_app_v2_remove_document_from_collection_post"},"Body_remove_user_from_collection_app_v2_remove_user_from_collection_post":{"properties":{"user_id":{"type":"string","title":"User Id","description":"User ID"},"collection_id":{"type":"string","title":"Collection Id","description":"Collection ID"}},"type":"object","required":["user_id","collection_id"],"title":"Body_remove_user_from_collection_app_v2_remove_user_from_collection_post"},"Body_reset_password_app_v2_reset_password_post":{"properties":{"reset_token":{"type":"string","title":"Reset Token","description":"Password reset token"},"new_password":{"type":"string","title":"New Password","description":"New password"}},"type":"object","required":["reset_token","new_password"],"title":"Body_reset_password_app_v2_reset_password_post"},"Body_retry_ingest_files_v2_retry_ingest_files_post":{"properties":{"document_ids":{"items":{"type":"string","format":"uuid"},"type":"array","title":"Document Ids","description":"An optional list of document ids for each file. 
If not provided, the system will generate a unique document id via the `generate_user_document_id` method."}},"type":"object","required":["document_ids"],"title":"Body_retry_ingest_files_v2_retry_ingest_files_post"},"Body_score_completion_v2_score_completion_post":{"properties":{"message_id":{"type":"string","title":"Message Id","description":"Message ID"},"score":{"type":"number","title":"Score","description":"Completion score"}},"type":"object","required":["message_id","score"],"title":"Body_score_completion_v2_score_completion_post"},"Body_search_app_v2_search_post":{"properties":{"query":{"type":"string","title":"Query","description":"Search query"},"vector_search_settings":{"$ref":"#/components/schemas/VectorSearchSettings","description":"Vector search settings"},"kg_search_settings":{"$ref":"#/components/schemas/KGSearchSettings","description":"Knowledge graph search settings"}},"type":"object","required":["query"],"title":"Body_search_app_v2_search_post"},"Body_update_collection_app_v2_update_collection_put":{"properties":{"collection_id":{"type":"string","title":"Collection Id","description":"Collection ID"},"name":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Name","description":"Updated collection name"},"description":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Description","description":"Updated collection description"}},"type":"object","required":["collection_id"],"title":"Body_update_collection_app_v2_update_collection_put"},"Body_update_files_app_v2_update_files_post":{"properties":{"files":{"items":{"type":"string","format":"binary"},"type":"array","title":"Files","description":"The list of paths of input files to update in the system."},"document_ids":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"items":{"type":"string","format":"uuid"},"type":"array"}},{"type":"null"}],"title":"Document Ids","description":"An optional list of document ids for each file. 
If not provided, the system will generate a unique document id via the `generate_user_document_id` method."},"metadatas":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"items":{"type":"object"},"type":"array"}},{"type":"null"}],"title":"Metadatas","description":"An optional list of JSON metadata to affix to each file"},"chunking_config":{"anyOf":[{"type":"string","contentMediaType":"application/json","contentSchema":{"$ref":"#/components/schemas/ChunkingConfig"}},{"type":"null"}],"title":"Chunking Config","description":"An optional dictionary to override the default chunking configuration for the ingestion process. If not provided, the system will use the default server-side chunking configuration."}},"type":"object","required":["files"],"title":"Body_update_files_app_v2_update_files_post"},"Body_update_prompt_app_v2_update_prompt_post":{"properties":{"name":{"type":"string","title":"Name","description":"Prompt name"},"template":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Template","description":"Prompt template"},"input_types":{"anyOf":[{"additionalProperties":{"type":"string"},"type":"object"},{"type":"null"}],"title":"Input Types","description":"Input types","default":{}}},"type":"object","required":["name"],"title":"Body_update_prompt_app_v2_update_prompt_post"},"Body_verify_email_app_v2_verify_email_post":{"properties":{"email":{"type":"string","format":"email","title":"Email","description":"User's email address"},"verification_code":{"type":"string","title":"Verification Code","description":"Email verification 
code"}},"type":"object","required":["email","verification_code"],"title":"Body_verify_email_app_v2_verify_email_post"},"ChunkingConfig":{"properties":{"provider":{"type":"string"}},"type":"object","title":"ChunkingConfig","example":{"chunking_strategy":"by_title","combine_under_n_chars":128,"max_characters":1024,"new_after_n_chars":512,"overlap":20,"provider":"unstructured_local","strategy":"auto"}},"DocumentChunkResponse":{"properties":{"fragment_id":{"type":"string","format":"uuid","title":"Fragment Id"},"extraction_id":{"type":"string","format":"uuid","title":"Extraction Id"},"document_id":{"type":"string","format":"uuid","title":"Document Id"},"user_id":{"type":"string","format":"uuid","title":"User Id"},"collection_ids":{"items":{"type":"string","format":"uuid"},"type":"array","title":"Collection Ids"},"text":{"type":"string","title":"Text"},"metadata":{"type":"object","title":"Metadata"}},"type":"object","required":["fragment_id","extraction_id","document_id","user_id","collection_ids","text","metadata"],"title":"DocumentChunkResponse"},"DocumentOverviewResponse":{"properties":{"id":{"type":"string","format":"uuid","title":"Id"},"title":{"type":"string","title":"Title"},"user_id":{"type":"string","format":"uuid","title":"User Id"},"type":{"type":"string","title":"Type"},"created_at":{"type":"string","format":"date-time","title":"Created At"},"updated_at":{"type":"string","format":"date-time","title":"Updated At"},"ingestion_status":{"type":"string","title":"Ingestion Status"},"restructuring_status":{"type":"string","title":"Restructuring Status"},"version":{"type":"string","title":"Version"},"collection_ids":{"items":{"type":"string","format":"uuid"},"type":"array","title":"Collection 
Ids"},"metadata":{"type":"object","title":"Metadata"}},"type":"object","required":["id","title","user_id","type","created_at","updated_at","ingestion_status","restructuring_status","version","collection_ids","metadata"],"title":"DocumentOverviewResponse"},"GenerationConfig":{"properties":{"model":{"type":"string","title":"Model"},"temperature":{"type":"number","title":"Temperature"},"top_p":{"type":"number","title":"Top P"},"max_tokens_to_sample":{"type":"integer","title":"Max Tokens To Sample"},"stream":{"type":"boolean","title":"Stream"},"functions":{"anyOf":[{"items":{"type":"object"},"type":"array"},{"type":"null"}],"title":"Functions"},"tools":{"anyOf":[{"items":{"type":"object"},"type":"array"},{"type":"null"}],"title":"Tools"},"add_generation_kwargs":{"anyOf":[{"type":"object"},{"type":"null"}],"title":"Add Generation Kwargs"},"api_base":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Api Base"}},"type":"object","title":"GenerationConfig","max_tokens_to_sample":1024,"model":"openai/gpt-4o","stream":false,"temperature":0.1,"top_p":1.0},"GenericMessageResponse":{"properties":{"message":{"type":"string","title":"Message"}},"type":"object","required":["message"],"title":"GenericMessageResponse"},"GroupOverviewResponse":{"properties":{"collection_id":{"type":"string","format":"uuid","title":"Collection Id"},"name":{"type":"string","title":"Name"},"description":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Description"},"created_at":{"type":"string","format":"date-time","title":"Created At"},"updated_at":{"type":"string","format":"date-time","title":"Updated At"},"user_count":{"type":"integer","title":"User Count"},"document_count":{"type":"integer","title":"Document Count"}},"type":"object","required":["collection_id","name","description","created_at","updated_at","user_count","document_count"],"title":"GroupOverviewResponse"},"GroupResponse":{"properties":{"collection_id":{"type":"string","format":"uuid","title":"Collection 
Id"},"name":{"type":"string","title":"Name"},"description":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Description"},"created_at":{"type":"string","format":"date-time","title":"Created At"},"updated_at":{"type":"string","format":"date-time","title":"Updated At"}},"type":"object","required":["collection_id","name","description","created_at","updated_at"],"title":"GroupResponse"},"HTTPValidationError":{"properties":{"detail":{"items":{"$ref":"#/components/schemas/ValidationError"},"type":"array","title":"Detail"}},"type":"object","title":"HTTPValidationError"},"HybridSearchSettings":{"properties":{"full_text_weight":{"type":"number","title":"Full Text Weight","description":"Weight to apply to full text search","default":1.0},"semantic_weight":{"type":"number","title":"Semantic Weight","description":"Weight to apply to semantic search","default":5.0},"full_text_limit":{"type":"integer","title":"Full Text Limit","description":"Maximum number of results to return from full text search","default":200},"rrf_k":{"type":"integer","title":"Rrf K","description":"K-value for RRF (Reciprocal Rank Fusion)","default":50}},"type":"object","title":"HybridSearchSettings"},"IndexMeasure":{"type":"string","enum":["cosine_distance","l2_distance","max_inner_product"],"title":"IndexMeasure","description":"An enum representing the types of distance measures available for indexing.\n\nAttributes:\n cosine_distance (str): The cosine distance measure for indexing.\n l2_distance (str): The Euclidean (L2) distance measure for indexing.\n max_inner_product (str): The maximum inner product measure for indexing."},"IngestionResponse":{"properties":{"message":{"type":"string","title":"Message","description":"A message describing the result of the ingestion request."},"task_id":{"type":"string","format":"uuid","title":"Task Id","description":"The task ID of the ingestion request."},"document_id":{"type":"string","format":"uuid","title":"Document Id","description":"The ID of the document 
that was ingested."}},"type":"object","required":["message","task_id","document_id"],"title":"IngestionResponse","example":{"document_id":"9fbe403b-c11c-5aae-8ade-ef22980c3ad1","message":"Ingestion task queued successfully.","task_id":"c68dc72e-fc23-5452-8f49-d7bd46088a96"}},"KGCreationResponse":{"properties":{"message":{"type":"string","title":"Message","description":"A message describing the result of the restructure request."},"task_id":{"type":"string","format":"uuid","title":"Task Id","description":"The task ID of the restructure request."}},"type":"object","required":["message","task_id"],"title":"KGCreationResponse","example":{"message":"Graph creation queued successfully.","task_id":"c68dc72e-fc23-5452-8f49-d7bd46088a96"}},"KGCreationSettings":{"properties":{"entity_types":{"items":{"type":"string"},"type":"array","title":"Entity Types","description":"The types of entities to extract.","default":[]},"relation_types":{"items":{"type":"string"},"type":"array","title":"Relation Types","description":"The types of relations to extract.","default":[]},"fragment_merge_count":{"type":"integer","title":"Fragment Merge Count","description":"The number of fragments to merge into a single KG extraction.","default":4},"max_knowledge_triples":{"type":"integer","title":"Max Knowledge Triples","description":"The maximum number of knowledge triples to extract from each chunk.","default":100},"generation_config":{"$ref":"#/components/schemas/GenerationConfig","description":"Configuration for text generation during graph enrichment."}},"type":"object","title":"KGCreationSettings","description":"Settings for knowledge graph creation."},"KGEnrichmentResponse":{"properties":{"message":{"type":"string","title":"Message","description":"A message describing the result of the restructure request."},"task_id":{"type":"string","format":"uuid","title":"Task Id","description":"The task ID of the restructure 
request."}},"type":"object","required":["message","task_id"],"title":"KGEnrichmentResponse","example":{"message":"Graph enrichment queued successfully.","task_id":"c68dc72e-fc23-5452-8f49-d7bd46088a96"}},"KGEnrichmentSettings":{"properties":{"max_description_input_length":{"type":"integer","title":"Max Description Input Length","description":"The maximum length of the description for a node in the graph.","default":65536},"max_summary_input_length":{"type":"integer","title":"Max Summary Input Length","description":"The maximum length of the summary for a community.","default":65536},"generation_config":{"$ref":"#/components/schemas/GenerationConfig","description":"Configuration for text generation during graph enrichment."},"leiden_params":{"type":"object","title":"Leiden Params","description":"Parameters for the Leiden algorithm."}},"type":"object","title":"KGEnrichmentSettings","description":"Settings for knowledge graph enrichment."},"KGGlobalSearchResult":{"properties":{"query":{"type":"string","title":"Query"},"search_result":{"items":{"type":"string"},"type":"array","title":"Search Result"}},"type":"object","required":["query","search_result"],"title":"KGGlobalSearchResult","description":"Result of a global knowledge graph search operation.","query":"What were Aristotles key contributions to philosophy?","search_result":["### Aristotle's Key Contributions to Philosophy\n\nAristotle's extensive body of work laid the foundation for numerous fields within philosophy and beyond, significantly shaping the trajectory of Western thought. His systematic approach to data collection and analysis has had a lasting impact on modern scientific methods. Below, we explore some of his most influential contributions.\n\n#### Foundational Works and Systematic Approach\n\nAristotle's writings cover a broad spectrum of topics, including logic, biology, ethics, and political science. 
His key works such as 'Physics,' 'On the Soul,' and 'Nicomachean Ethics' delve into fundamental concepts like substance, memory, and the nature of the city [Data: Reports (1, 2, 3, 4, 5, +more)]. These texts not only provided a comprehensive framework for understanding various aspects of the natural and human world but also established methodologies that continue to influence contemporary scientific inquiry.\n\n#### Ethical and Political Philosophy\n\nIn 'Nicomachean Ethics,' Aristotle explores the concept of a virtuous character, emphasizing the importance of moral virtues and the development of good habits. His work 'Politics' further examines the structure and function of the city (polis), addressing issues related to property, trade, and governance. Aristotle's classification of political constitutions and his definition of the city as the natural political community have had a profound and enduring impact on political thought [Data: Reports (11, 12); Triples (21, 22, 23, 24, 25)].\n\n#### Theories on Memory and Perception\n\nAristotle's theories on memory and perception are articulated in his works 'On the Soul' and 'De Anima iii 3.' He defines memory as the retention of experiences shaped by sensation and discusses the faculty of imagination (phantasia). These theories have significantly influenced subsequent philosophical and psychological studies on cognition and perception [Data: Reports (13, 14); Triples (26, 27, 28, 29, 30)].\n\n#### Epistemology and Scientific Method\n\nAristotle's epistemology, known as immanent realism, is based on the study of things that exist or happen in the world. This approach emphasizes empirical observation and has been instrumental in shaping the development of scientific methods. 
His insistence on grounding knowledge in observable phenomena laid the groundwork for future empirical research [Data: Reports (3)].\n\n#### Engagement with Predecessors and Contemporaries\n\nAristotle was also known for his critical engagement with the ideas of his predecessors and contemporaries. For instance, he refuted Democritus's claim about the Milky Way and criticized Empedocles's materialist theory of 'survival of the fittest.' These critiques highlight Aristotle's active participation in the broader philosophical discourse of his time and his contributions to refining and advancing philosophical thought [Data: Reports (15, 16); Triples (31, 32, 33, 34, 35)].\n\n### Conclusion\n\nAristotle's contributions to philosophy are vast and multifaceted, encompassing ethics, politics, epistemology, and more. His works continue to be studied and revered for their depth, rigor, and enduring relevance. Through his systematic approach and critical engagement with existing ideas, Aristotle has left an indelible mark on the landscape of Western philosophy."]},"KGLocalSearchResult":{"properties":{"query":{"type":"string","title":"Query"},"entities":{"type":"object","title":"Entities"},"relationships":{"type":"object","title":"Relationships"},"communities":{"type":"object","title":"Communities"}},"type":"object","required":["query","entities","relationships","communities"],"title":"KGLocalSearchResult","description":"Result of a local knowledge graph search operation.","communities":{"0":{"summary":{"findings":[{"explanation":"Aristotle is credited with the earliest study of formal logic, and his conception of it was the dominant form of Western logic until the 19th-century advances in mathematical logic. 
His works compiled into a set of six books ...","summary":"Aristotle's Foundational Role in Logic"}],"rating":9.5,"rating_explanation":"The impact severity rating is high due to Aristotle's foundational influence on multiple disciplines and his enduring legacy in Western philosophy and science.","summary":"The community revolves around Aristotle, an ancient Greek philosopher and polymath, who made significant contributions to various fields including logic, biology, political science, and economics. His works, such as 'Politics' and 'Nicomachean Ethics', have influenced numerous disciplines and thinkers from antiquity through the Middle Ages and beyond. The relationships between his various works and the fields he contributed to highlight his profound impact on Western thought.","title":"Aristotle and His Contributions"}}},"entities":{"0":{"description":"Aristotle was an ancient Greek philosopher and polymath, recognized as the father of various fields including logic, biology, and political science. He authored significant works such as the *Nicomachean Ethics* and *Politics*, where he explored concepts of virtue, governance, and the nature of reality, while also critiquing Platos ideas. 
His teachings and observations laid the groundwork for numerous disciplines, influencing thinkers ...","name":"Aristotle"}},"query":"Who is Aristotle?","relationships":{}},"KGSearchResult":{"properties":{"local_result":{"anyOf":[{"$ref":"#/components/schemas/KGLocalSearchResult"},{"type":"null"}]},"global_result":{"anyOf":[{"$ref":"#/components/schemas/KGGlobalSearchResult"},{"type":"null"}]}},"type":"object","title":"KGSearchResult","description":"Result of a knowledge graph search operation.","global_result":{"query":"What were Aristotles key contributions to philosophy?","search_result":["### Aristotle's Key Contributions to Philosophy\n\nAristotle's extensive body of work laid the foundation for numerous fields within philosophy and beyond, significantly shaping the trajectory of Western thought. His systematic approach to data collection and analysis has had a lasting impact on modern scientific methods. Below, we explore some of his most influential contributions.\n\n#### Foundational Works and Systematic Approach\n\nAristotle's writings cover a broad spectrum of topics, including logic, biology, ethics, and political science. His key works such as 'Physics,' 'On the Soul,' and 'Nicomachean Ethics' delve into fundamental concepts like substance, memory, and the nature of the city [Data: Reports (1, 2, 3, 4, 5, +more)]. These texts not only provided a comprehensive framework for understanding various aspects of the natural and human world but also established methodologies that continue to influence contemporary scientific inquiry.\n\n#### Ethical and Political Philosophy\n\nIn 'Nicomachean Ethics,' Aristotle explores the concept of a virtuous character, emphasizing the importance of moral virtues and the development of good habits. His work 'Politics' further examines the structure and function of the city (polis), addressing issues related to property, trade, and governance. 
Aristotle's classification of political constitutions and his definition of the city as the natural political community have had a profound and enduring impact on political thought [Data: Reports (11, 12); Triples (21, 22, 23, 24, 25)].\n\n#### Theories on Memory and Perception\n\nAristotle's theories on memory and perception are articulated in his works 'On the Soul' and 'De Anima iii 3.' He defines memory as the retention of experiences shaped by sensation and discusses the faculty of imagination (phantasia). These theories have significantly influenced subsequent philosophical and psychological studies on cognition and perception [Data: Reports (13, 14); Triples (26, 27, 28, 29, 30)].\n\n#### Epistemology and Scientific Method\n\nAristotle's epistemology, known as immanent realism, is based on the study of things that exist or happen in the world. This approach emphasizes empirical observation and has been instrumental in shaping the development of scientific methods. His insistence on grounding knowledge in observable phenomena laid the groundwork for future empirical research [Data: Reports (3)].\n\n#### Engagement with Predecessors and Contemporaries\n\nAristotle was also known for his critical engagement with the ideas of his predecessors and contemporaries. For instance, he refuted Democritus's claim about the Milky Way and criticized Empedocles's materialist theory of 'survival of the fittest.' These critiques highlight Aristotle's active participation in the broader philosophical discourse of his time and his contributions to refining and advancing philosophical thought [Data: Reports (15, 16); Triples (31, 32, 33, 34, 35)].\n\n### Conclusion\n\nAristotle's contributions to philosophy are vast and multifaceted, encompassing ethics, politics, epistemology, and more. His works continue to be studied and revered for their depth, rigor, and enduring relevance. 
Through his systematic approach and critical engagement with existing ideas, Aristotle has left an indelible mark on the landscape of Western philosophy."]},"local_result":{"communities":{"0":{"summary":{"findings":[{"explanation":"Aristotle is credited with the earliest study of formal logic, and his conception of it was the dominant form of Western logic until the 19th-century advances in mathematical logic. His works compiled into a set of six books ...","summary":"Aristotle's Foundational Role in Logic"}],"rating":9.5,"rating_explanation":"The impact severity rating is high due to Aristotle's foundational influence on multiple disciplines and his enduring legacy in Western philosophy and science.","summary":"The community revolves around Aristotle, an ancient Greek philosopher and polymath, who made significant contributions to various fields including logic, biology, political science, and economics. His works, such as 'Politics' and 'Nicomachean Ethics', have influenced numerous disciplines and thinkers from antiquity through the Middle Ages and beyond. The relationships between his various works and the fields he contributed to highlight his profound impact on Western thought.","title":"Aristotle and His Contributions"}}},"entities":{"0":{"description":"Aristotle was an ancient Greek philosopher and polymath, recognized as the father of various fields including logic, biology, and political science. He authored significant works such as the *Nicomachean Ethics* and *Politics*, where he explored concepts of virtue, governance, and the nature of reality, while also critiquing Platos ideas. 
His teachings and observations laid the groundwork for numerous disciplines, influencing thinkers ...","name":"Aristotle"}},"query":"Who is Aristotle?","relationships":{}}},"KGSearchSettings":{"properties":{"use_kg_search":{"type":"boolean","title":"Use Kg Search","default":false},"kg_search_type":{"type":"string","title":"Kg Search Type","default":"global"},"kg_search_level":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Kg Search Level"},"generation_config":{"anyOf":[{"$ref":"#/components/schemas/GenerationConfig"},{"type":"null"}]},"max_community_description_length":{"type":"integer","title":"Max Community Description Length","default":65536},"max_llm_queries_for_global_search":{"type":"integer","title":"Max Llm Queries For Global Search","default":250},"local_search_limits":{"additionalProperties":{"type":"integer"},"type":"object","title":"Local Search Limits","default":{"__Entity__":20,"__Relationship__":20,"__Community__":20}}},"type":"object","title":"KGSearchSettings","generation_config":{"max_tokens_to_sample":1024,"model":"openai/gpt-4o","stream":false,"temperature":0.1,"top_p":1.0},"kg_search_level":"0","kg_search_type":"global","local_search_limits":{"__Community__":20,"__Entity__":20,"__Relationship__":20},"max_community_description_length":65536,"max_llm_queries_for_global_search":250,"use_kg_search":true},"LogEntry":{"properties":{"key":{"type":"string","title":"Key"},"value":{"title":"Value"},"timestamp":{"type":"string","format":"date-time","title":"Timestamp"}},"type":"object","required":["key","value","timestamp"],"title":"LogEntry"},"LogResponse":{"properties":{"run_id":{"type":"string","format":"uuid","title":"Run Id"},"run_type":{"type":"string","title":"Run 
Type"},"entries":{"items":{"$ref":"#/components/schemas/LogEntry"},"type":"array","title":"Entries"},"timestamp":{"anyOf":[{"type":"string","format":"date-time"},{"type":"null"}],"title":"Timestamp"},"user_id":{"anyOf":[{"type":"string","format":"uuid"},{"type":"null"}],"title":"User Id"}},"type":"object","required":["run_id","run_type","entries","timestamp","user_id"],"title":"LogResponse"},"Message-Input":{"properties":{"role":{"anyOf":[{"$ref":"#/components/schemas/MessageType"},{"type":"string"}],"title":"Role"},"content":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Content"},"name":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Name"},"function_call":{"anyOf":[{"type":"object"},{"type":"null"}],"title":"Function Call"},"tool_calls":{"anyOf":[{"items":{"type":"object"},"type":"array"},{"type":"null"}],"title":"Tool Calls"}},"type":"object","required":["role"],"title":"Message"},"Message-Output":{"properties":{"role":{"anyOf":[{"$ref":"#/components/schemas/MessageType"},{"type":"string"}],"title":"Role"},"content":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Content"},"name":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Name"},"function_call":{"anyOf":[{"type":"object"},{"type":"null"}],"title":"Function Call"},"tool_calls":{"anyOf":[{"items":{"type":"object"},"type":"array"},{"type":"null"}],"title":"Tool Calls"}},"type":"object","required":["role"],"title":"Message","content":"This is a test message.","role":"user"},"MessageType":{"type":"string","enum":["system","user","assistant","function","tool"],"title":"MessageType"},"PromptResponse":{"properties":{"name":{"type":"string","title":"Name"},"template":{"type":"string","title":"Template"},"created_at":{"type":"string","format":"date-time","title":"Created At"},"updated_at":{"type":"string","format":"date-time","title":"Updated At"},"input_types":{"additionalProperties":{"type":"string"},"type":"object","title":"Input 
Types"}},"type":"object","required":["name","template","created_at","updated_at","input_types"],"title":"PromptResponse"},"RAGAgentResponse":{"properties":{"messages":{"items":{"$ref":"#/components/schemas/Message-Output"},"type":"array","title":"Messages","description":"List of messages in the RAG agent response"}},"type":"object","required":["messages"],"title":"RAGAgentResponse","example":[{"content":"## You are a helpful assistant that can search for information.\n\nWhen asked a question, perform a search to find relevant information and provide a response.\n\nThe response should contain line-item attributions to relevant search results, and be as informative as possible.\nIf no relevant results are found, then state that no results were found.\nIf no obvious question is present, then do not carry out a search, and instead ask for clarification.","role":"system"},{"content":"You are a helpful assistant.","role":"system"},{"content":"Who is the greatest philosopher of all time?","role":"user"},{"content":"Aristotle is widely considered the greatest philosopher of all time.","role":"assistant"},{"content":"Can you tell me more about him?","role":"user"},{"function_call":{"arguments":"{\"query\":\"Aristotle biography\"}","name":"search"},"role":"assistant"},{"content":"1. Aristotle[A] (Greek: Ἀριστοτέλης Aristotélēs, pronounced [aristotélɛːs]; 384–322 BC) was an Ancient Greek philosopher and polymath. His writings cover a broad range of subjects spanning the natural sciences, philosophy, linguistics, economics, politics, psychology, and the arts. As the founder of the Peripatetic school of philosophy in the Lyceum in Athens, he began the wider Aristotelian tradition that followed, which set the groundwork for the development of modern science.\n2. Aristotle[A] (Greek: Ἀριστοτέλης Aristotélēs, pronounced [aristotélɛːs]; 384–322 BC) was an Ancient Greek philosopher and polymath. 
His writings cover a broad range of subjects spanning the natural sciences, philosophy, linguistics, economics, politics, psychology, and the arts. As the founder of the Peripatetic school of philosophy in the Lyceum in Athens, he began the wider Aristotelian tradition that followed, which set the groundwork for the development of modern science.\n3. Aristotle was born in 384 BC[C] in Stagira, Chalcidice,[2] about 55 km (34 miles) east of modern-day Thessaloniki.[3][4] His father, Nicomachus, was the personal physician to King Amyntas of Macedon. While he was young, Aristotle learned about biology and medical information, which was taught by his father.[5] Both of Aristotle's parents died when he was about thirteen, and Proxenus of Atarneus became his guardian.[6] Although little information about Aristotle's childhood has survived, he probably spent\n4. Aristotle was born in 384 BC[C] in Stagira, Chalcidice,[2] about 55 km (34 miles) east of modern-day Thessaloniki.[3][4] His father, Nicomachus, was the personal physician to King Amyntas of Macedon. While he was young, Aristotle learned about biology and medical information, which was taught by his father.[5] Both of Aristotle's parents died when he was about thirteen, and Proxenus of Atarneus became his guardian.[6] Although little information about Aristotle's childhood has survived, he probably spent\n5. Life\nIn general, the details of Aristotle's life are not well-established. The biographies written in ancient times are often speculative and historians only agree on a few salient points.[B]\n","name":"search","role":"function"},{"content":"Aristotle (384–322 BC) was an Ancient Greek philosopher and polymath whose contributions have had a profound impact on various fields of knowledge. Here are some key points about his life and work:\n\n1. **Early Life**: Aristotle was born in 384 BC in Stagira, Chalcidice, which is near modern-day Thessaloniki, Greece. 
His father, Nicomachus, was the personal physician to King Amyntas of Macedon, which exposed Aristotle to medical and biological knowledge from a young age [C].\n\n2. **Education and Career**: After the death of his parents, Aristotle was sent to Athens to study at Plato's Academy, where he remained for about 20 years. After Plato's death, Aristotle left Athens and eventually became the tutor of Alexander the Great [C].\n\n3. **Philosophical Contributions**: Aristotle founded the Lyceum in Athens, where he established the Peripatetic school of philosophy. His works cover a wide range of subjects, including metaphysics, ethics, politics, logic, biology, and aesthetics. His writings laid the groundwork for many modern scientific and philosophical inquiries [A].\n\n4. **Legacy**: Aristotle's influence extends beyond philosophy to the natural sciences, linguistics, economics, and psychology. His method of systematic observation and analysis has been foundational to the development of modern science [A].\n\nAristotle's comprehensive approach to knowledge and his systematic methodology have earned him a lasting legacy as one of the greatest philosophers of all time.\n\nSources:\n- [A] Aristotle's broad range of writings and influence on modern science.\n- [C] Details about Aristotle's early life and education.","role":"assistant"}]},"RAGResponse":{"properties":{"completion":{"title":"Completion","description":"The generated completion from the RAG process"},"search_results":{"$ref":"#/components/schemas/SearchResponse","description":"The search results used for the RAG process"}},"type":"object","required":["completion","search_results"],"title":"RAGResponse","example":{"completion":{"choices":[{"finish_reason":"stop","index":0,"message":{"content":"Paris is the capital of France.","role":"assistant"}}],"id":"chatcmpl-example123"},"search_results":{"kg_search_results":[{"global_result":{"query":"What were Aristotles key contributions to philosophy?","search_result":["### 
Aristotle's Key Contributions to Philosophy\n\nAristotle's extensive body of work laid the foundation for numerous fields within philosophy and beyond, significantly shaping the trajectory of Western thought. His systematic approach to data collection and analysis has had a lasting impact on modern scientific methods. Below, we explore some of his most influential contributions.\n\n#### Foundational Works and Systematic Approach\n\nAristotle's writings cover a broad spectrum of topics, including logic, biology, ethics, and political science. His key works such as 'Physics,' 'On the Soul,' and 'Nicomachean Ethics' delve into fundamental concepts like substance, memory, and the nature of the city [Data: Reports (1, 2, 3, 4, 5, +more)]. These texts not only provided a comprehensive framework for understanding various aspects of the natural and human world but also established methodologies that continue to influence contemporary scientific inquiry.\n\n#### Ethical and Political Philosophy\n\nIn 'Nicomachean Ethics,' Aristotle explores the concept of a virtuous character, emphasizing the importance of moral virtues and the development of good habits. His work 'Politics' further examines the structure and function of the city (polis), addressing issues related to property, trade, and governance. Aristotle's classification of political constitutions and his definition of the city as the natural political community have had a profound and enduring impact on political thought [Data: Reports (11, 12); Triples (21, 22, 23, 24, 25)].\n\n#### Theories on Memory and Perception\n\nAristotle's theories on memory and perception are articulated in his works 'On the Soul' and 'De Anima iii 3.' He defines memory as the retention of experiences shaped by sensation and discusses the faculty of imagination (phantasia). 
These theories have significantly influenced subsequent philosophical and psychological studies on cognition and perception [Data: Reports (13, 14); Triples (26, 27, 28, 29, 30)].\n\n#### Epistemology and Scientific Method\n\nAristotle's epistemology, known as immanent realism, is based on the study of things that exist or happen in the world. This approach emphasizes empirical observation and has been instrumental in shaping the development of scientific methods. His insistence on grounding knowledge in observable phenomena laid the groundwork for future empirical research [Data: Reports (3)].\n\n#### Engagement with Predecessors and Contemporaries\n\nAristotle was also known for his critical engagement with the ideas of his predecessors and contemporaries. For instance, he refuted Democritus's claim about the Milky Way and criticized Empedocles's materialist theory of 'survival of the fittest.' These critiques highlight Aristotle's active participation in the broader philosophical discourse of his time and his contributions to refining and advancing philosophical thought [Data: Reports (15, 16); Triples (31, 32, 33, 34, 35)].\n\n### Conclusion\n\nAristotle's contributions to philosophy are vast and multifaceted, encompassing ethics, politics, epistemology, and more. His works continue to be studied and revered for their depth, rigor, and enduring relevance. Through his systematic approach and critical engagement with existing ideas, Aristotle has left an indelible mark on the landscape of Western philosophy."]},"local_result":{"communities":{"0":{"summary":{"findings":[{"explanation":"Aristotle is credited with the earliest study of formal logic, and his conception of it was the dominant form of Western logic until the 19th-century advances in mathematical logic. 
His works compiled into a set of six books ...","summary":"Aristotle's Foundational Role in Logic"}],"rating":9.5,"rating_explanation":"The impact severity rating is high due to Aristotle's foundational influence on multiple disciplines and his enduring legacy in Western philosophy and science.","summary":"The community revolves around Aristotle, an ancient Greek philosopher and polymath, who made significant contributions to various fields including logic, biology, political science, and economics. His works, such as 'Politics' and 'Nicomachean Ethics', have influenced numerous disciplines and thinkers from antiquity through the Middle Ages and beyond. The relationships between his various works and the fields he contributed to highlight his profound impact on Western thought.","title":"Aristotle and His Contributions"}}},"entities":{"0":{"description":"Aristotle was an ancient Greek philosopher and polymath, recognized as the father of various fields including logic, biology, and political science. He authored significant works such as the *Nicomachean Ethics* and *Politics*, where he explored concepts of virtue, governance, and the nature of reality, while also critiquing Platos ideas. 
His teachings and observations laid the groundwork for numerous disciplines, influencing thinkers ...","name":"Aristotle"}},"query":"Who is Aristotle?","relationships":{}}}],"vector_search_results":[{"collection_ids":[],"document_id":"3e157b3a-8469-51db-90d9-52e7d896b49b","extraction_id":"3f3d47f3-8baf-58eb-8bc2-0171fb1c6e09","fragment_id":"c68dc72e-fc23-5452-8f49-d7bd46088a96","metadata":{"associated_query":"What is the capital of France?","title":"example_document.pdf"},"score":0.23943702876567796,"text":"Example text from the document","user_id":"2acb499e-8428-543b-bd85-0d9098718220"}]}}},"ResultsWrapper_AddUserResponse_":{"properties":{"results":{"$ref":"#/components/schemas/AddUserResponse"}},"type":"object","required":["results"],"title":"ResultsWrapper[AddUserResponse]"},"ResultsWrapper_AllPromptsResponse_":{"properties":{"results":{"$ref":"#/components/schemas/AllPromptsResponse"}},"type":"object","required":["results"],"title":"ResultsWrapper[AllPromptsResponse]"},"ResultsWrapper_AnalyticsResponse_":{"properties":{"results":{"$ref":"#/components/schemas/AnalyticsResponse"}},"type":"object","required":["results"],"title":"ResultsWrapper[AnalyticsResponse]"},"ResultsWrapper_AppSettingsResponse_":{"properties":{"results":{"$ref":"#/components/schemas/AppSettingsResponse"}},"type":"object","required":["results"],"title":"ResultsWrapper[AppSettingsResponse]"},"ResultsWrapper_GenericMessageResponse_":{"properties":{"results":{"$ref":"#/components/schemas/GenericMessageResponse"}},"type":"object","required":["results"],"title":"ResultsWrapper[GenericMessageResponse]"},"ResultsWrapper_GroupResponse_":{"properties":{"results":{"$ref":"#/components/schemas/GroupResponse"}},"type":"object","required":["results"],"title":"ResultsWrapper[GroupResponse]"},"ResultsWrapper_KGCreationResponse_":{"properties":{"results":{"$ref":"#/components/schemas/KGCreationResponse"}},"type":"object","required":["results"],"title":"ResultsWrapper[KGCreationResponse]"},"ResultsWrapper_KGEn
richmentResponse_":{"properties":{"results":{"$ref":"#/components/schemas/KGEnrichmentResponse"}},"type":"object","required":["results"],"title":"ResultsWrapper[KGEnrichmentResponse]"},"ResultsWrapper_List_DocumentChunkResponse__":{"properties":{"results":{"items":{"$ref":"#/components/schemas/DocumentChunkResponse"},"type":"array","title":"Results"}},"type":"object","required":["results"],"title":"ResultsWrapper[List[DocumentChunkResponse]]"},"ResultsWrapper_List_DocumentOverviewResponse__":{"properties":{"results":{"items":{"$ref":"#/components/schemas/DocumentOverviewResponse"},"type":"array","title":"Results"}},"type":"object","required":["results"],"title":"ResultsWrapper[List[DocumentOverviewResponse]]"},"ResultsWrapper_List_GroupOverviewResponse__":{"properties":{"results":{"items":{"$ref":"#/components/schemas/GroupOverviewResponse"},"type":"array","title":"Results"}},"type":"object","required":["results"],"title":"ResultsWrapper[List[GroupOverviewResponse]]"},"ResultsWrapper_List_GroupResponse__":{"properties":{"results":{"items":{"$ref":"#/components/schemas/GroupResponse"},"type":"array","title":"Results"}},"type":"object","required":["results"],"title":"ResultsWrapper[List[GroupResponse]]"},"ResultsWrapper_List_LogResponse__":{"properties":{"results":{"items":{"$ref":"#/components/schemas/LogResponse"},"type":"array","title":"Results"}},"type":"object","required":["results"],"title":"ResultsWrapper[List[LogResponse]]"},"ResultsWrapper_List_UserOverviewResponse__":{"properties":{"results":{"items":{"$ref":"#/components/schemas/UserOverviewResponse"},"type":"array","title":"Results"}},"type":"object","required":["results"],"title":"ResultsWrapper[List[UserOverviewResponse]]"},"ResultsWrapper_RAGAgentResponse_":{"properties":{"results":{"$ref":"#/components/schemas/RAGAgentResponse"}},"type":"object","required":["results"],"title":"ResultsWrapper[RAGAgentResponse]"},"ResultsWrapper_RAGResponse_":{"properties":{"results":{"$ref":"#/components/schemas/RAGResp
onse"}},"type":"object","required":["results"],"title":"ResultsWrapper[RAGResponse]"},"ResultsWrapper_ScoreCompletionResponse_":{"properties":{"results":{"$ref":"#/components/schemas/ScoreCompletionResponse"}},"type":"object","required":["results"],"title":"ResultsWrapper[ScoreCompletionResponse]"},"ResultsWrapper_SearchResponse_":{"properties":{"results":{"$ref":"#/components/schemas/SearchResponse"}},"type":"object","required":["results"],"title":"ResultsWrapper[SearchResponse]"},"ResultsWrapper_ServerStats_":{"properties":{"results":{"$ref":"#/components/schemas/ServerStats"}},"type":"object","required":["results"],"title":"ResultsWrapper[ServerStats]"},"ResultsWrapper_TokenResponse_":{"properties":{"results":{"$ref":"#/components/schemas/TokenResponse"}},"type":"object","required":["results"],"title":"ResultsWrapper[TokenResponse]"},"ResultsWrapper_UpdatePromptResponse_":{"properties":{"results":{"$ref":"#/components/schemas/UpdatePromptResponse"}},"type":"object","required":["results"],"title":"ResultsWrapper[UpdatePromptResponse]"},"ResultsWrapper_UpdateResponse_":{"properties":{"results":{"$ref":"#/components/schemas/UpdateResponse"}},"type":"object","required":["results"],"title":"ResultsWrapper[UpdateResponse]"},"ResultsWrapper_UserResponse_":{"properties":{"results":{"$ref":"#/components/schemas/UserResponse"}},"type":"object","required":["results"],"title":"ResultsWrapper[UserResponse]"},"ResultsWrapper_list_IngestionResponse__":{"properties":{"results":{"items":{"$ref":"#/components/schemas/IngestionResponse"},"type":"array","title":"Results"}},"type":"object","required":["results"],"title":"ResultsWrapper[list[IngestionResponse]]"},"ResultsWrapper_str_":{"properties":{"results":{"type":"string","title":"Results"}},"type":"object","required":["results"],"title":"ResultsWrapper[str]"},"ScoreCompletionResponse":{"properties":{"message":{"type":"string","title":"Message"}},"type":"object","required":["message"],"title":"ScoreCompletionResponse"},"SearchResp
onse":{"properties":{"vector_search_results":{"items":{"$ref":"#/components/schemas/VectorSearchResult"},"type":"array","title":"Vector Search Results","description":"List of vector search results"},"kg_search_results":{"anyOf":[{"items":{"$ref":"#/components/schemas/KGSearchResult"},"type":"array"},{"type":"null"}],"title":"Kg Search Results","description":"Knowledge graph search results, if applicable"}},"type":"object","required":["vector_search_results"],"title":"SearchResponse","example":{"kg_search_results":[{"global_result":{"query":"What were Aristotles key contributions to philosophy?","search_result":["### Aristotle's Key Contributions to Philosophy\n\nAristotle's extensive body of work laid the foundation for numerous fields within philosophy and beyond, significantly shaping the trajectory of Western thought. His systematic approach to data collection and analysis has had a lasting impact on modern scientific methods. Below, we explore some of his most influential contributions.\n\n#### Foundational Works and Systematic Approach\n\nAristotle's writings cover a broad spectrum of topics, including logic, biology, ethics, and political science. His key works such as 'Physics,' 'On the Soul,' and 'Nicomachean Ethics' delve into fundamental concepts like substance, memory, and the nature of the city [Data: Reports (1, 2, 3, 4, 5, +more)]. These texts not only provided a comprehensive framework for understanding various aspects of the natural and human world but also established methodologies that continue to influence contemporary scientific inquiry.\n\n#### Ethical and Political Philosophy\n\nIn 'Nicomachean Ethics,' Aristotle explores the concept of a virtuous character, emphasizing the importance of moral virtues and the development of good habits. His work 'Politics' further examines the structure and function of the city (polis), addressing issues related to property, trade, and governance. 
Aristotle's classification of political constitutions and his definition of the city as the natural political community have had a profound and enduring impact on political thought [Data: Reports (11, 12); Triples (21, 22, 23, 24, 25)].\n\n#### Theories on Memory and Perception\n\nAristotle's theories on memory and perception are articulated in his works 'On the Soul' and 'De Anima iii 3.' He defines memory as the retention of experiences shaped by sensation and discusses the faculty of imagination (phantasia). These theories have significantly influenced subsequent philosophical and psychological studies on cognition and perception [Data: Reports (13, 14); Triples (26, 27, 28, 29, 30)].\n\n#### Epistemology and Scientific Method\n\nAristotle's epistemology, known as immanent realism, is based on the study of things that exist or happen in the world. This approach emphasizes empirical observation and has been instrumental in shaping the development of scientific methods. His insistence on grounding knowledge in observable phenomena laid the groundwork for future empirical research [Data: Reports (3)].\n\n#### Engagement with Predecessors and Contemporaries\n\nAristotle was also known for his critical engagement with the ideas of his predecessors and contemporaries. For instance, he refuted Democritus's claim about the Milky Way and criticized Empedocles's materialist theory of 'survival of the fittest.' These critiques highlight Aristotle's active participation in the broader philosophical discourse of his time and his contributions to refining and advancing philosophical thought [Data: Reports (15, 16); Triples (31, 32, 33, 34, 35)].\n\n### Conclusion\n\nAristotle's contributions to philosophy are vast and multifaceted, encompassing ethics, politics, epistemology, and more. His works continue to be studied and revered for their depth, rigor, and enduring relevance. 
Through his systematic approach and critical engagement with existing ideas, Aristotle has left an indelible mark on the landscape of Western philosophy."]},"local_result":{"communities":{"0":{"summary":{"findings":[{"explanation":"Aristotle is credited with the earliest study of formal logic, and his conception of it was the dominant form of Western logic until the 19th-century advances in mathematical logic. His works compiled into a set of six books ...","summary":"Aristotle's Foundational Role in Logic"}],"rating":9.5,"rating_explanation":"The impact severity rating is high due to Aristotle's foundational influence on multiple disciplines and his enduring legacy in Western philosophy and science.","summary":"The community revolves around Aristotle, an ancient Greek philosopher and polymath, who made significant contributions to various fields including logic, biology, political science, and economics. His works, such as 'Politics' and 'Nicomachean Ethics', have influenced numerous disciplines and thinkers from antiquity through the Middle Ages and beyond. The relationships between his various works and the fields he contributed to highlight his profound impact on Western thought.","title":"Aristotle and His Contributions"}}},"entities":{"0":{"description":"Aristotle was an ancient Greek philosopher and polymath, recognized as the father of various fields including logic, biology, and political science. He authored significant works such as the *Nicomachean Ethics* and *Politics*, where he explored concepts of virtue, governance, and the nature of reality, while also critiquing Platos ideas. 
His teachings and observations laid the groundwork for numerous disciplines, influencing thinkers ...","name":"Aristotle"}},"query":"Who is Aristotle?","relationships":{}}}],"vector_search_results":[{"collection_ids":[],"document_id":"3e157b3a-8469-51db-90d9-52e7d896b49b","extraction_id":"3f3d47f3-8baf-58eb-8bc2-0171fb1c6e09","fragment_id":"c68dc72e-fc23-5452-8f49-d7bd46088a96","metadata":{"associated_query":"What is the capital of France?","title":"example_document.pdf"},"score":0.23943702876567796,"text":"Example text from the document","user_id":"2acb499e-8428-543b-bd85-0d9098718220"}]}},"ServerStats":{"properties":{"start_time":{"type":"string","format":"date-time","title":"Start Time"},"uptime_seconds":{"type":"number","title":"Uptime Seconds"},"cpu_usage":{"type":"number","title":"Cpu Usage"},"memory_usage":{"type":"number","title":"Memory Usage"}},"type":"object","required":["start_time","uptime_seconds","cpu_usage","memory_usage"],"title":"ServerStats"},"Token":{"properties":{"token":{"type":"string","title":"Token"},"token_type":{"type":"string","title":"Token Type"}},"type":"object","required":["token","token_type"],"title":"Token"},"TokenResponse":{"properties":{"access_token":{"$ref":"#/components/schemas/Token"},"refresh_token":{"$ref":"#/components/schemas/Token"}},"type":"object","required":["access_token","refresh_token"],"title":"TokenResponse"},"UpdatePromptResponse":{"properties":{"message":{"type":"string","title":"Message"}},"type":"object","required":["message"],"title":"UpdatePromptResponse"},"UpdateResponse":{"properties":{"message":{"type":"string","title":"Message","description":"A message describing the result of the ingestion request."},"task_id":{"type":"string","format":"uuid","title":"Task Id","description":"The task ID of the ingestion request."},"document_ids":{"items":{"type":"string","format":"uuid"},"type":"array","title":"Document Ids","description":"The ID of the document that was 
ingested."}},"type":"object","required":["message","task_id","document_ids"],"title":"UpdateResponse","example":{"document_ids":["9fbe403b-c11c-5aae-8ade-ef22980c3ad1"],"message":"Update task queued successfully.","task_id":"c68dc72e-fc23-5452-8f49-d7bd46088a96"}},"UserOverviewResponse":{"properties":{"user_id":{"type":"string","format":"uuid","title":"User Id"},"num_files":{"type":"integer","title":"Num Files"},"total_size_in_bytes":{"type":"integer","title":"Total Size In Bytes"},"document_ids":{"items":{"type":"string","format":"uuid"},"type":"array","title":"Document Ids"}},"type":"object","required":["user_id","num_files","total_size_in_bytes","document_ids"],"title":"UserOverviewResponse"},"UserResponse":{"properties":{"id":{"type":"string","format":"uuid","title":"Id"},"email":{"type":"string","title":"Email"},"is_active":{"type":"boolean","title":"Is Active","default":true},"is_superuser":{"type":"boolean","title":"Is Superuser","default":false},"created_at":{"type":"string","format":"date-time","title":"Created At","default":"2024-09-18T18:14:34.700273"},"updated_at":{"type":"string","format":"date-time","title":"Updated At","default":"2024-09-18T18:14:34.700279"},"is_verified":{"type":"boolean","title":"Is Verified","default":false},"collection_ids":{"items":{"type":"string","format":"uuid"},"type":"array","title":"Collection Ids","default":[]},"hashed_password":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Hashed Password"},"verification_code_expiry":{"anyOf":[{"type":"string","format":"date-time"},{"type":"null"}],"title":"Verification Code Expiry"},"name":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Name"},"bio":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Bio"},"profile_picture":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Profile 
Picture"}},"type":"object","required":["id","email"],"title":"UserResponse"},"ValidationError":{"properties":{"loc":{"items":{"anyOf":[{"type":"string"},{"type":"integer"}]},"type":"array","title":"Location"},"msg":{"type":"string","title":"Message"},"type":{"type":"string","title":"Error Type"}},"type":"object","required":["loc","msg","type"],"title":"ValidationError"},"VectorSearchResult":{"properties":{"fragment_id":{"type":"string","format":"uuid","title":"Fragment Id"},"extraction_id":{"type":"string","format":"uuid","title":"Extraction Id"},"document_id":{"type":"string","format":"uuid","title":"Document Id"},"user_id":{"anyOf":[{"type":"string","format":"uuid"},{"type":"null"}],"title":"User Id"},"collection_ids":{"items":{"type":"string","format":"uuid"},"type":"array","title":"Collection Ids"},"score":{"type":"number","title":"Score"},"text":{"type":"string","title":"Text"},"metadata":{"type":"object","title":"Metadata"}},"type":"object","required":["fragment_id","extraction_id","document_id","user_id","collection_ids","score","text","metadata"],"title":"VectorSearchResult","description":"Result of a search operation.","collection_ids":[],"document_id":"3e157b3a-8469-51db-90d9-52e7d896b49b","extraction_id":"3f3d47f3-8baf-58eb-8bc2-0171fb1c6e09","fragment_id":"c68dc72e-fc23-5452-8f49-d7bd46088a96","metadata":{"associated_query":"What is the capital of France?","title":"example_document.pdf"},"score":0.23943702876567796,"text":"Example text from the document","user_id":"2acb499e-8428-543b-bd85-0d9098718220"},"VectorSearchSettings":{"properties":{"use_vector_search":{"type":"boolean","title":"Use Vector Search","description":"Whether to use vector search","default":true},"use_hybrid_search":{"type":"boolean","title":"Use Hybrid Search","description":"Whether to perform a hybrid search (combining vector and keyword search)","default":false},"filters":{"type":"object","title":"Filters","description":"Filters to apply to the vector 
search"},"search_limit":{"type":"integer","maximum":1000.0,"minimum":1.0,"title":"Search Limit","description":"Maximum number of results to return","default":10},"selected_collection_ids":{"items":{"type":"string","format":"uuid"},"type":"array","title":"Selected Collection Ids","description":"Collection IDs to search for"},"index_measure":{"$ref":"#/components/schemas/IndexMeasure","description":"The distance measure to use for indexing","default":"cosine_distance"},"include_values":{"type":"boolean","title":"Include Values","description":"Whether to include search score values in the search results","default":true},"include_metadatas":{"type":"boolean","title":"Include Metadatas","description":"Whether to include element metadata in the search results","default":true},"probes":{"anyOf":[{"type":"integer"},{"type":"null"}],"title":"Probes","description":"Number of ivfflat index lists to query. Higher increases accuracy but decreases speed.","default":10},"ef_search":{"anyOf":[{"type":"integer"},{"type":"null"}],"title":"Ef Search","description":"Size of the dynamic candidate list for HNSW index search. 
Higher increases accuracy but decreases speed.","default":40},"hybrid_search_settings":{"anyOf":[{"$ref":"#/components/schemas/HybridSearchSettings"},{"type":"null"}],"description":"Settings for hybrid search","default":{"full_text_weight":1.0,"semantic_weight":5.0,"full_text_limit":200,"rrf_k":50}},"search_strategy":{"anyOf":[{"type":"string"},{"type":"null"}],"title":"Search Strategy","description":"Search strategy to use (e.g., 'default', 'query_fusion', 'hyde')","default":"vanilla"}},"type":"object","title":"VectorSearchSettings","ef_search":40,"filters":{"category":"technology"},"hybrid_search_settings":{"full_text_limit":200,"full_text_weight":1.0,"rrf_k":50,"semantic_weight":5.0},"include_metadata":true,"index_measure":"cosine_distance","probes":10,"search_limit":20,"selected_collection_ids":["2acb499e-8428-543b-bd85-0d9098718220","3e157b3a-8469-51db-90d9-52e7d896b49b"],"use_hybrid_search":true,"use_vector_search":true}},"securitySchemes":{"HTTPBearer":{"type":"http","scheme":"bearer"},"OAuth2PasswordBearer":{"type":"oauth2","flows":{"password":{"scopes":{},"tokenUrl":"token"}}}}}} diff --git a/docs/cookbooks/graphrag.mdx b/docs/cookbooks/graphrag.mdx index e85ec97e0..e89a74b56 100644 --- a/docs/cookbooks/graphrag.mdx +++ b/docs/cookbooks/graphrag.mdx @@ -166,16 +166,44 @@ Knowledge graph creation is done in two steps: 1. `create-graph`: Extracts nodes and relationships from your input document collection. 2. `enrich-graph`: Enhances the graph structure through clustering and explaining entities (commonly referred to as `GraphRAG`). + +```bash +# collection ID is optional. If you don't specify one, the default collection will be used. +r2r create-graph --collection-id=122fdf6a-e116-546b-a8f6-e4cb2e2c0a09 +``` + + +This will run a cost estimation step to give you an estimate of the cost of the graph creation process. + +```bash Example Response +Time taken: 0.21 seconds +{ + "results": { + "message": "These are estimated ranges, actual values may vary. 
To run the KG creation process, run `create-graph` with `--run` in the cli, or `run_mode=\"run\"` in the client.", + "document_count": 2, + "number_of_jobs_created": 3, + "total_chunks": 29, + "estimated_entities": "290 - 580", + "estimated_triples": "362 - 870", + "estimated_llm_calls": "348 - 638", + "estimated_total_in_out_tokens_in_millions": "0 - 1", + "estimated_total_time_in_minutes": "Depends on your API key tier. Accurate estimate coming soon. Rough estimate: 0.0 - 0.17", + "estimated_cost_in_usd": "0.0 - 0.06" + } +} +``` + +Then, you can run the graph creation process with: + ```bash -# document-ids are optional -r2r create-graph --document-ids=9fbe403b-c11c-5aae-8ade-ef22980c3ad1 +r2r create-graph --collection-id=122fdf6a-e116-546b-a8f6-e4cb2e2c0a09 --run ``` ```bash Example Response [{'message': 'Graph creation task queued successfully.', 'task_id': 'd9dae1bb-5862-4a16-abaf-5297024df390'}] ``` -This step will create a knowledge graph with nodes and relationships. Below is a visualization of the graph which we produced with Neo4j: +This step will create a knowledge graph with nodes and relationships. Below is a visualization of the graph, which we produced with Neo4j (this approach is now deprecated; we are working on a new visualization tool): ``` MATCH (a) @@ -226,13 +254,36 @@ Now we have a graph, but this graph is not searchable yet. We need to perform th The graph enrichment step adds node and relationship descriptions, performs hierarchical leiden clustering to create communities, and embeds the descriptions. These embeddings will be used later in the local search stage of the pipeline. If you are more interested in the algorithm, please refer to the blog post [here](https://www.sciphi.ai/blog/graphrag). ```bash -r2r enrich-graph +# collection ID is optional. If you don't specify one, the default collection will be used. 
+r2r enrich-graph --collection-id=122fdf6a-e116-546b-a8f6-e4cb2e2c0a09 ``` -Now you can see that the graph is enriched with the following information. We have added descriptions and embeddings to the nodes and relationships. Also, each node is mapped to a community. +As with graph creation, this first runs a cost estimation step that estimates the cost of the graph enrichment process. -![Enriched Graph](../images/enriched.png) +```bash Example Response +Time taken: 0.22 seconds +{ + "results": { + "total_entities": 269, + "total_triples": 345, + "estimated_llm_calls": "26 - 53", + "estimated_total_in_out_tokens_in_millions": "0.05 - 0.11", + "estimated_total_time_in_minutes": "Depends on your API key tier. Accurate estimate coming soon. Rough estimate: 0.01 - 0.02", + "estimated_cost_in_usd": "0.0 - 0.01" + } +} +``` + +Now, you can run the graph enrichment process with: +```bash +r2r enrich-graph --collection-id=122fdf6a-e116-546b-a8f6-e4cb2e2c0a09 --run +``` + + +Now you can see that the graph is enriched with the following information. We have added descriptions and embeddings to the nodes and relationships. Also, each node is mapped to a community. The following is a visualization of the enriched graph (this approach is now deprecated; 
we are working on a new visualization tool): + +![Enriched Graph](../images/enriched.png) ## Search diff --git a/docs/documentation/configuration/postgres.mdx b/docs/documentation/configuration/postgres.mdx index 86e2f6c43..f89111a92 100644 --- a/docs/documentation/configuration/postgres.mdx +++ b/docs/documentation/configuration/postgres.mdx @@ -46,7 +46,7 @@ export POSTGRES_PASSWORD=your_postgres_password export POSTGRES_HOST=your_postgres_host export POSTGRES_PORT=your_postgres_port export POSTGRES_DBNAME=your_database_name -export POSTGRES_PROJECT_NAME=your_project_name +export R2R_PROJECT_NAME=your_project_name ``` ## Advanced Postgres Features in R2R diff --git a/docs/documentation/deep-dive/providers/database.mdx b/docs/documentation/deep-dive/providers/database.mdx index 2e09b1587..11f5df28b 100644 --- a/docs/documentation/deep-dive/providers/database.mdx +++ b/docs/documentation/deep-dive/providers/database.mdx @@ -40,7 +40,7 @@ export POSTGRES_PASSWORD=your_postgres_password export POSTGRES_HOST=your_postgres_host export POSTGRES_PORT=your_postgres_port export POSTGRES_DBNAME=your_database_name -export POSTGRES_PROJECT_NAME=your_project_name +export R2R_PROJECT_NAME=your_project_name ``` Environment variables take precedence over the config settings in case of conflicts. The R2R Docker includes configuration options that facilitate integration with a combined Postgres+pgvector database setup. diff --git a/docs/documentation/deployment/troubleshooting/vector_store_issues.mdx b/docs/documentation/deployment/troubleshooting/vector_store_issues.mdx index 0d9338a69..877e9913b 100644 --- a/docs/documentation/deployment/troubleshooting/vector_store_issues.mdx +++ b/docs/documentation/deployment/troubleshooting/vector_store_issues.mdx @@ -19,7 +19,7 @@ Vector storage is a crucial component in R2R (RAG to Riches) for efficient simil - `POSTGRES_HOST` - `POSTGRES_PORT` - `POSTGRES_DBNAME` - - `POSTGRES_PROJECT_NAME` + - `R2R_PROJECT_NAME` 3. 
**Check Docker Network:** If using Docker, ensure the R2R and Postgres containers are on the same network: diff --git a/docs/documentation/installation/full/local-system.mdx b/docs/documentation/installation/full/local-system.mdx index 120ff0253..bc844ac53 100644 --- a/docs/documentation/installation/full/local-system.mdx +++ b/docs/documentation/installation/full/local-system.mdx @@ -68,10 +68,10 @@ R2R requires connections to various services. Set up the following environment v export POSTGRES_HOST=$YOUR_POSTGRES_HOST export POSTGRES_PORT=$YOUR_POSTGRES_PORT export POSTGRES_DBNAME=$YOUR_POSTGRES_DBNAME - export POSTGRES_PROJECT_NAME=$YOUR_PROJECT_NAME # see note below + export R2R_PROJECT_NAME=$YOUR_PROJECT_NAME # see note below ``` - The `POSTGRES_PROJECT_NAME` environment variable defines the tables within your Postgres database where the selected R2R project resides. If the specified tables do not exist then they will be created by R2R during initialization. + The `R2R_PROJECT_NAME` environment variable defines the tables within your Postgres database where the selected R2R project resides. If the specified tables do not exist then they will be created by R2R during initialization. diff --git a/docs/documentation/installation/light/local-system.mdx b/docs/documentation/installation/light/local-system.mdx index b7460c7d7..cb389feb0 100644 --- a/docs/documentation/installation/light/local-system.mdx +++ b/docs/documentation/installation/light/local-system.mdx @@ -46,10 +46,10 @@ R2R requires connections to various services. Set up the following environment v export POSTGRES_HOST=$YOUR_POSTGRES_HOST export POSTGRES_PORT=$YOUR_POSTGRES_PORT export POSTGRES_DBNAME=$YOUR_POSTGRES_DBNAME - export POSTGRES_PROJECT_NAME=$YOUR_PROJECT_NAME # see note below + export R2R_PROJECT_NAME=$YOUR_PROJECT_NAME # see note below ``` - The `POSTGRES_PROJECT_NAME` environment variable defines the tables within your Postgres database where the selected R2R project resides. 
If the specified tables do not exist then they will be created by R2R during initialization. + The `R2R_PROJECT_NAME` environment variable defines the tables within your Postgres database where the selected R2R project resides. If the required tables for R2R do not exist then they will be created by R2R during initialization. If you are unfamiliar with Postgres then Supabase's free cloud offering is a good place to start. diff --git a/docs/introduction/whats-new.mdx b/docs/introduction/whats-new.mdx index 4e355a902..0a8f775cb 100644 --- a/docs/introduction/whats-new.mdx +++ b/docs/introduction/whats-new.mdx @@ -4,12 +4,12 @@ description: 'Changelog' icon: 'bell' --- -## Version 0.3.10 — Sep. 6, 2024 +## Version 0.3.20 — Sep. 6, 2024 ### New Features -- Orchestration with [Hatchet](https://github.com/hatchet-dev/hatchet) -- Default ingestion provider set to [Unstructured](https://docs.unstructured.io/welcome) -- Improved knowledge graph construction process +- [R2R Light](https://r2r-docs.sciphi.ai/documentation/installation/light/local-system) installation added +- Removed Neo4j and implemented GraphRAG inside of Postgres +- Improved efficiency and configurability of knowledge graph construction process ### Bug Fixes - Minor bug fixes around config logic and other. 
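Note on the `POSTGRES_PROJECT_NAME` to `R2R_PROJECT_NAME` rename in the documentation hunks above: deployments that still export the old variable may want a transition shim. The following is a minimal sketch only, not code from this patch. The helper name `resolve_project_name` and the fallback to the legacy variable are assumptions made for illustration; the `"default"` fallback mirrors the `os.getenv("R2R_PROJECT_NAME", "default")` call in `run_logger.py` later in this patch.

```python
import os


def resolve_project_name(env=None, default="default"):
    """Hypothetical helper (not part of R2R): pick the project name.

    Prefers the new R2R_PROJECT_NAME variable, falls back to the legacy
    POSTGRES_PROJECT_NAME if it is still set, and otherwise returns `default`.
    """
    # Allow a plain dict to be passed in so the logic is easy to test.
    env = os.environ if env is None else env
    # Empty strings are treated as "unset" by the `or` chain.
    return (
        env.get("R2R_PROJECT_NAME")
        or env.get("POSTGRES_PROJECT_NAME")
        or default
    )
```

Called with no arguments, it reads from the process environment; passing a dict keeps the lookup order testable without touching real environment variables.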
diff --git a/js/sdk/__tests__/r2rClientIntegrationUser.test.ts b/js/sdk/__tests__/r2rClientIntegrationUser.test.ts index 558bad2f0..32ec27cae 100644 --- a/js/sdk/__tests__/r2rClientIntegrationUser.test.ts +++ b/js/sdk/__tests__/r2rClientIntegrationUser.test.ts @@ -88,24 +88,29 @@ describe("r2rClient Integration Tests", () => { ).resolves.not.toThrow(); }); - // test("User", async () => { - // const asdf = await client.user(); - // console.log(asdf); + test("User", async () => { + const asdf = await client.user(); - // await expect(client.user()).resolves.not.toThrow(); + await expect(client.user()).resolves.not.toThrow(); + }); - // }); + test("Update user profile", async () => { + const userId = "2bf8fd84-91ec-5048-9eb8-cf2ee9d66b64"; + const email = "newemail@example.com"; + const name = "New Name"; + const bio = "Updated bio"; + const profilePicture = "http://example.com/new-profile-pic.jpg"; - // test("Update user profile", async () => { - // const email = "newemail@example.com"; - // const name = "New Name"; - // const bio = "Updated bio"; - // const profilePicture = "http://example.com/new-profile-pic.jpg"; + await expect( + client.updateUser(userId, email, undefined, name, bio, profilePicture), + ).resolves.not.toThrow(); + }); - // await expect( - // client.updateUser(email, name, bio, profilePicture) - // ).resolves.not.toThrow(); - // }); + test("Login", async () => { + await expect( + client.login("newemail@example.com", "password"), + ).resolves.not.toThrow(); + }); test("Ingest file", async () => { const files = [ @@ -189,8 +194,7 @@ describe("r2rClient Integration Tests", () => { test("Login after logout", async () => { await expect( - client.login("test@gmail.com", "password"), - // client.login("newemail@example.com", "password"), + client.login("newemail@example.com", "password"), ).resolves.not.toThrow(); }); diff --git a/js/sdk/package-lock.json b/js/sdk/package-lock.json index 0d2d1ec9a..eef878747 100644 --- a/js/sdk/package-lock.json +++ 
b/js/sdk/package-lock.json @@ -1,6 +1,6 @@ { "name": "r2r-js", - "version": "0.3.5", + "version": "0.3.6", "lockfileVersion": 3, "requires": true, "packages": { diff --git a/js/sdk/src/r2rClient.ts b/js/sdk/src/r2rClient.ts index 8bce88dfa..c904cf1d1 100644 --- a/js/sdk/src/r2rClient.ts +++ b/js/sdk/src/r2rClient.ts @@ -335,20 +335,33 @@ export class r2rClient { */ @feature("updateUser") async updateUser( + userId: string, email?: string, + isSuperuser?: boolean, name?: string, bio?: string, profilePicture?: string, ): Promise { this._ensureAuthenticated(); - return await this._makeRequest("PUT", "user", { - data: { - email, - name, - bio, - profile_picture: profilePicture, - }, - }); + + let data: Record = { user_id: userId }; + if (email !== undefined) { + data.email = email; + } + if (isSuperuser !== undefined) { + data.is_superuser = isSuperuser; + } + if (name !== undefined) { + data.name = name; + } + if (bio !== undefined) { + data.bio = bio; + } + if (profilePicture !== undefined) { + data.profile_picture = profilePicture; + } + + return await this._makeRequest("PUT", "user", { data }); } /** diff --git a/py/README.md b/py/README.md index ecd653965..86cc99cfd 100644 --- a/py/README.md +++ b/py/README.md @@ -57,7 +57,7 @@ Alternatively, you may run R2R directly from the python package, but additional ```bash # export OPENAI_API_KEY=sk-... # export POSTGRES... 
-pip install 'r2r[core]' +pip install 'r2r[core,ingestion-bundle]' r2r --config-name=default serve ``` diff --git a/py/cli/commands/ingestion.py b/py/cli/commands/ingestion.py index d58362f0d..043e4dff6 100644 --- a/py/cli/commands/ingestion.py +++ b/py/cli/commands/ingestion.py @@ -123,10 +123,13 @@ def update_files(ctx, file_paths, document_ids, metadatas): @cli.command() +@click.option("--v2", is_flag=True, help="use aristotle_v2.txt (a smaller file)") @pass_context -def ingest_sample_file(ctx): +def ingest_sample_file(ctx, v2=False): """Ingest the first sample file into R2R.""" - sample_file_url = "https://raw.githubusercontent.com/SciPhi-AI/R2R/main/py/core/examples/data/aristotle.txt" + sample_file_url = ( + f"https://raw.githubusercontent.com/SciPhi-AI/R2R/main/py/core/examples/data/aristotle{'_v2' if v2 else ''}.txt" + ) client = ctx.obj with timer(): diff --git a/py/cli/commands/kg.py b/py/cli/commands/kg.py index 0e2a4c5d3..26e613129 100644 --- a/py/cli/commands/kg.py +++ b/py/cli/commands/kg.py @@ -10,16 +10,29 @@ @cli.command() @click.option( "--collection-id", - required=True, + required=False, + default="", help="Collection ID to create graph for.", ) +@click.option( + "--run", + is_flag=True, + help="Run the graph creation process.", +) @click.option( "--kg-creation-settings", required=False, help="Settings for the graph creation process.", ) +@click.option( + "--force-kg-creation", + is_flag=True, + help="Force the graph creation process.", +) @pass_context -def create_graph(ctx, collection_id, kg_creation_settings): +def create_graph( + ctx, collection_id, run, kg_creation_settings, force_kg_creation +): """ Create a new graph. 
""" @@ -34,8 +47,18 @@ def create_graph(ctx, collection_id, kg_creation_settings): ) return + if not run: + run_type = "estimate" + else: + run_type = "run" + + if force_kg_creation: + kg_creation_settings = {"force_kg_creation": True} + with timer(): - response = client.create_graph(collection_id, kg_creation_settings) + response = client.create_graph( + collection_id, run_type, kg_creation_settings + ) click.echo(json.dumps(response, indent=2)) @@ -43,16 +66,29 @@ def create_graph(ctx, collection_id, kg_creation_settings): @cli.command() @click.option( "--collection-id", - required=True, + required=False, + default="", help="Collection ID to enrich graph for.", ) +@click.option( + "--run", + is_flag=True, + help="Run the graph enrichment process.", +) +@click.option( + "--force-kg-enrichment", + is_flag=True, + help="Force the graph enrichment process.", +) @click.option( "--kg-enrichment-settings", required=False, help="Settings for the graph enrichment process.", ) @pass_context -def enrich_graph(ctx, collection_id, kg_enrichment_settings): +def enrich_graph( + ctx, collection_id, run, force_kg_enrichment, kg_enrichment_settings +): """ Enrich an existing graph. 
""" @@ -67,7 +103,107 @@ def enrich_graph(ctx, collection_id, kg_enrichment_settings): ) return + if not run: + run_type = "estimate" + else: + run_type = "run" + + if force_kg_enrichment: + kg_enrichment_settings = {"force_kg_enrichment": True} + + with timer(): + response = client.enrich_graph( + collection_id, run_type, kg_enrichment_settings + ) + + click.echo(json.dumps(response, indent=2)) + + +@cli.command() +@click.option( + "--collection-id", + required=True, + help="Collection ID to retrieve entities from.", +) +@click.option( + "--offset", + type=int, + default=0, + help="Offset for pagination.", +) +@click.option( + "--limit", + type=int, + default=100, + help="Limit for pagination.", +) +@click.option( + "--entity-ids", + multiple=True, + help="Entity IDs to filter by.", +) +@click.option( + "--with-description", + is_flag=True, + help="Include entity descriptions in the response.", +) +@pass_context +def get_entities( + ctx, collection_id, offset, limit, entity_ids, with_description +): + """ + Retrieve entities from the knowledge graph. + """ + client = ctx.obj + + with timer(): + response = client.get_entities( + collection_id, + offset, + limit, + list(entity_ids), + with_description, + ) + + click.echo(json.dumps(response, indent=2)) + + +@cli.command() +@click.option( + "--collection-id", + required=True, + help="Collection ID to retrieve triples from.", +) +@click.option( + "--offset", + type=int, + default=0, + help="Offset for pagination.", +) +@click.option( + "--limit", + type=int, + default=100, + help="Limit for pagination.", +) +@click.option( + "--triple-ids", + multiple=True, + help="Triple IDs to filter by.", +) +@pass_context +def get_triples(ctx, collection_id, offset, limit, triple_ids): + """ + Retrieve triples from the knowledge graph. 
+ """ + client = ctx.obj + with timer(): - response = client.enrich_graph(collection_id, kg_enrichment_settings) + response = client.get_triples( + collection_id, + offset, + limit, + list(triple_ids), + ) click.echo(json.dumps(response, indent=2)) diff --git a/py/cli/commands/server.py b/py/cli/commands/server.py index 720361c83..3d6a31376 100644 --- a/py/cli/commands/server.py +++ b/py/cli/commands/server.py @@ -244,12 +244,12 @@ async def serve( if full: click.echo( - "Running the full R2R compose which includes `Hatchet` and `Unstructured.io`." + "Running the full R2R setup which includes `Hatchet` and `Unstructured.io`." ) if project_name == "r2r": # overwrite project name if full compose project_name = "r2r-full" else: - click.echo("Running the lightweight R2R compose.") + click.echo("Running the lightweight R2R setup.") if config_path and config_name: raise click.UsageError( diff --git a/py/cli/main.py b/py/cli/main.py index 0c4756d9f..271093ffa 100644 --- a/py/cli/main.py +++ b/py/cli/main.py @@ -72,4 +72,4 @@ def main(): if __name__ == "__main__": - main() \ No newline at end of file + main() diff --git a/py/cli/utils/docker_utils.py b/py/cli/utils/docker_utils.py index dcbe91ae1..0d15e37ba 100644 --- a/py/cli/utils/docker_utils.py +++ b/py/cli/utils/docker_utils.py @@ -241,7 +241,7 @@ def check_external_ollama(ollama_url="http://localhost:11434/api/version"): def check_set_docker_env_vars(): env_vars = { - "POSTGRES_PROJECT_NAME": "r2r", + "R2R_PROJECT_NAME": "r2r", "POSTGRES_HOST": "postgres", "POSTGRES_PORT": "5432", "POSTGRES_DBNAME": "postgres", diff --git a/py/cli/utils/telemetry.py b/py/cli/utils/telemetry.py index cda8bfe59..9c19a8507 100644 --- a/py/cli/utils/telemetry.py +++ b/py/cli/utils/telemetry.py @@ -149,4 +149,4 @@ def tracked_callback(*args, **kwargs): raise command.callback = tracked_callback - return command \ No newline at end of file + return command diff --git a/py/compose.full.yaml b/py/compose.full.yaml index 2290372b5..b2c0b0902 
100644 --- a/py/compose.full.yaml +++ b/py/compose.full.yaml @@ -47,6 +47,17 @@ services: timeout: 5s retries: 5 restart: on-failure + command: > + bash -c " + if [ -z \"\$(ls -A /var/lib/postgresql/data)\" ]; then + echo \"Data directory is empty, initializing database...\" + echo \"max_connections = ${POSTGRES_MAX_CONNECTIONS:-1024}\" >> /var/lib/postgresql/data/postgresql.conf + docker-entrypoint.sh postgres + else + echo \"Data directory is not empty, skipping initialization...\" + docker-entrypoint.sh postgres + fi + " hatchet-rabbitmq: image: "rabbitmq:3-management" @@ -186,7 +197,7 @@ services: aliases: - host.docker.internal healthcheck: - test: ["CMD", "curl", "-f", "http://localhost:7077/health"] + test: ["CMD", "curl", "-f", "http://localhost:8733/live"] interval: 10s timeout: 5s retries: 5 @@ -333,7 +344,7 @@ services: - POSTGRES_HOST=${POSTGRES_HOST:-postgres} - POSTGRES_PORT=${POSTGRES_PORT:-5432} - POSTGRES_DBNAME=${POSTGRES_DBNAME:-postgres} - - POSTGRES_PROJECT_NAME=${POSTGRES_PROJECT_NAME:-r2r_default} + - R2R_PROJECT_NAME=${R2R_PROJECT_NAME:-r2r_default} - OPENAI_API_KEY=${OPENAI_API_KEY:-} - OPENAI_API_BASE=${OPENAI_API_BASE:-} - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY:-} diff --git a/py/compose.yaml b/py/compose.yaml index 4b20c2d72..5a355aca8 100644 --- a/py/compose.yaml +++ b/py/compose.yaml @@ -34,6 +34,17 @@ services: timeout: 5s retries: 5 restart: on-failure + command: > + bash -c " + if [ -z \"\$(ls -A /var/lib/postgresql/data)\" ]; then + echo \"Data directory is empty, initializing database...\" + echo \"max_connections = ${POSTGRES_MAX_CONNECTIONS:-1024}\" >> /var/lib/postgresql/data/postgresql.conf + docker-entrypoint.sh postgres + else + echo \"Data directory is not empty, skipping initialization...\" + docker-entrypoint.sh postgres + fi + " r2r: image: ${R2R_IMAGE:-ragtoriches/prod:latest-unstructured} @@ -58,8 +69,9 @@ services: - POSTGRES_HOST=${POSTGRES_HOST:-postgres} - POSTGRES_PORT=${POSTGRES_PORT:-5432} - 
POSTGRES_DBNAME=${POSTGRES_DBNAME:-postgres} - - POSTGRES_PROJECT_NAME=${POSTGRES_PROJECT_NAME:-${CONFIG_NAME:-vecs}} - POSTGRES_MAX_CONNECTIONS=${POSTGRES_MAX_CONNECTIONS:-} + - POSTGRES_PROJECT_NAME=${POSTGRES_PROJECT_NAME:-r2r_default} + - R2R_PROJECT_NAME=${R2R_PROJECT_NAME:-r2r_default} # OpenAI - OPENAI_API_KEY=${OPENAI_API_KEY:-} @@ -156,4 +168,4 @@ services: - "traefik.http.routers.hatchet-api.middlewares=no-cache-headers" volumes: - postgres_data: \ No newline at end of file + postgres_data: diff --git a/py/core/__init__.py b/py/core/__init__.py index be0ec7f7d..ae875d33a 100644 --- a/py/core/__init__.py +++ b/py/core/__init__.py @@ -54,7 +54,7 @@ "DocumentExtraction", "DocumentInfo", "IngestionStatus", - "KGCreationStatus", + "KGExtractionStatus", "KGEnrichmentStatus", "DocumentType", # Embedding abstractions @@ -161,7 +161,6 @@ "run_pipeline", "to_async_generator", "generate_run_id", - "generate_id_from_label", "increment_version", "EntityType", "RelationshipType", diff --git a/py/core/base/__init__.py b/py/core/base/__init__.py index 4a42d0307..37d8ef310 100644 --- a/py/core/base/__init__.py +++ b/py/core/base/__init__.py @@ -22,7 +22,7 @@ "DocumentExtraction", "DocumentInfo", "IngestionStatus", - "KGCreationStatus", + "KGExtractionStatus", "KGEnrichmentStatus", "DocumentType", # Embedding abstractions @@ -51,6 +51,9 @@ # KG abstractions "KGCreationSettings", "KGEnrichmentSettings", + "KGRunType", + "KGCreationEstimationResponse", + "KGEnrichmentEstimationResponse", # User abstractions "Token", "TokenData", @@ -142,9 +145,14 @@ "to_async_generator", "format_search_results_for_llm", "format_search_results_for_stream", + # ID generation "generate_run_id", - "generate_id_from_label", + "generate_document_id", + "generate_extraction_id", "generate_default_user_collection_id", + "generate_collection_id_from_name", + "generate_user_id", + "generate_message_id", "increment_version", "EntityType", "RelationshipType", diff --git 
a/py/core/base/abstractions/__init__.py b/py/core/base/abstractions/__init__.py index 74d2e2252..19d99f647 100644 --- a/py/core/base/abstractions/__init__.py +++ b/py/core/base/abstractions/__init__.py @@ -7,7 +7,7 @@ DocumentInfo, DocumentType, IngestionStatus, - KGCreationStatus, + KGExtractionStatus, KGEnrichmentStatus, ) from shared.abstractions.embedding import ( @@ -27,7 +27,13 @@ RelationshipType, Triple, ) -from shared.abstractions.kg import KGCreationSettings, KGEnrichmentSettings +from shared.abstractions.kg import ( + KGCreationEstimationResponse, + KGCreationSettings, + KGEnrichmentEstimationResponse, + KGEnrichmentSettings, + KGRunType, +) from shared.abstractions.llm import ( GenerationConfig, LLMChatCompletion, @@ -72,7 +78,7 @@ "DocumentExtraction", "DocumentInfo", "IngestionStatus", - "KGCreationStatus", + "KGExtractionStatus", "KGEnrichmentStatus", "DocumentType", # Embedding abstractions @@ -113,6 +119,9 @@ # Restructure abstractions "KGCreationSettings", "KGEnrichmentSettings", + "KGRunType", + "KGCreationEstimationResponse", + "KGEnrichmentEstimationResponse", # User abstractions "Token", "TokenData", diff --git a/py/core/base/logging/run_logger.py b/py/core/base/logging/run_logger.py index f768c516f..6f562908c 100644 --- a/py/core/base/logging/run_logger.py +++ b/py/core/base/logging/run_logger.py @@ -89,9 +89,9 @@ def __init__(self, config: LoggingConfig): self.log_table = config.log_table self.log_info_table = config.log_info_table # TODO - Should we re-consider this naming convention? - # e.g. it is confusing to have `POSTGRES_PROJECT_NAME` refer + # e.g. 
it is confusing to have `R2R_PROJECT_NAME` refer # to a global project name that is used in non-Postgres contexts - self.project_name = os.getenv("POSTGRES_PROJECT_NAME", "default") + self.project_name = os.getenv("R2R_PROJECT_NAME", "default") self.logging_path = config.logging_path or os.getenv( "LOCAL_DB_PATH", "local.sqlite" ) @@ -344,7 +344,7 @@ def __init__(self, config: PostgresLoggingConfig): self.log_table = config.log_table self.log_info_table = config.log_info_table self.config = config - self.project_name = os.getenv("POSTGRES_PROJECT_NAME", "default") + self.project_name = os.getenv("R2R_PROJECT_NAME", "default") self.pool = None if not os.getenv("POSTGRES_DBNAME"): raise ValueError( diff --git a/py/core/base/providers/auth.py b/py/core/base/providers/auth.py index df43ac0b3..0ac1a02c0 100644 --- a/py/core/base/providers/auth.py +++ b/py/core/base/providers/auth.py @@ -5,7 +5,7 @@ from fastapi import Security from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer -from ...utils import generate_id_from_label +from ...utils import generate_user_id from ..abstractions import R2RException, Token, TokenData from ..api.models import UserResponse from .base import Provider, ProviderConfig @@ -48,7 +48,7 @@ def __init__(self, config: AuthConfig, crypto_provider: CryptoProvider): def _get_default_admin_user(self) -> UserResponse: return UserResponse( - id=generate_id_from_label(self.admin_email), + id=generate_user_id(self.admin_email), email=self.admin_email, hashed_password=self.crypto_provider.get_password_hash( self.admin_password diff --git a/py/core/base/providers/database.py b/py/core/base/providers/database.py index d9b300e0c..85c471e5e 100644 --- a/py/core/base/providers/database.py +++ b/py/core/base/providers/database.py @@ -45,7 +45,6 @@ class DatabaseConfig(ProviderConfig): host: Optional[str] = None port: Optional[int] = None db_name: Optional[str] = None - vecs_collection: Optional[str] = None project_name: Optional[str] = None 
postgres_configuration_settings: Optional[ PostgresConfigurationSettings diff --git a/py/core/base/providers/kg.py b/py/core/base/providers/kg.py index 92ddcfa3f..6e3e366d2 100644 --- a/py/core/base/providers/kg.py +++ b/py/core/base/providers/kg.py @@ -6,6 +6,7 @@ from uuid import UUID from ..abstractions import ( + CommunityReport, Entity, KGCreationSettings, KGEnrichmentSettings, @@ -67,14 +68,16 @@ async def add_entities( pass @abstractmethod - async def add_triples(self, triples: list[Triple]) -> None: + async def add_triples( + self, triples: list[Triple], table_name: str + ) -> None: """Abstract method to add triples.""" pass @abstractmethod async def add_kg_extractions( - self, kg_extractions: list[KGExtraction] - ) -> None: + self, kg_extractions: list[KGExtraction], table_suffix: str = "_raw" + ) -> Tuple[int, int]: """Abstract method to add KG extractions.""" pass @@ -158,7 +161,7 @@ async def perform_graph_clustering( @abstractmethod async def get_entity_map( - self, offset: int, limit: int, document_id: str + self, offset: int, limit: int, document_id: UUID ) -> dict[str, Any]: """Abstract method to get the entity map.""" pass @@ -168,6 +171,47 @@ async def get_community_details(self, community_number: int): """Abstract method to get community details.""" pass + @abstractmethod + async def get_entity_count(self, document_id: UUID) -> int: + """Abstract method to get the entity count.""" + pass + + @abstractmethod + async def delete_graph_for_collection(self, collection_id: UUID) -> None: + """Abstract method to delete the graph for a collection.""" + pass + + @abstractmethod + async def get_creation_estimate(self, *args: Any, **kwargs: Any) -> Any: + """Abstract method to get the creation estimate.""" + pass + + @abstractmethod + async def get_enrichment_estimate(self, *args: Any, **kwargs: Any) -> Any: + """Abstract method to get the enrichment estimate.""" + pass + + @abstractmethod + async def add_community_report( + self, community_report: 
CommunityReport + ) -> None: + """Abstract method to add a community report.""" + pass + + @abstractmethod + async def get_community_reports( + self, collection_id: UUID + ) -> list[CommunityReport]: + """Abstract method to get community reports.""" + pass + + @abstractmethod + async def check_community_reports_exist( + self, collection_id: UUID, offset: int, limit: int + ) -> list[int]: + """Abstract method to check if community reports exist.""" + pass + def escape_braces(s: str) -> str: """ diff --git a/py/core/base/utils/__init__.py b/py/core/base/utils/__init__.py index 3a488f6e4..17adb7233 100644 --- a/py/core/base/utils/__init__.py +++ b/py/core/base/utils/__init__.py @@ -6,10 +6,14 @@ format_relations, format_search_results_for_llm, format_search_results_for_stream, + generate_collection_id_from_name, + generate_default_prompt_id, generate_default_user_collection_id, - generate_id_from_label, + generate_document_id, + generate_extraction_id, + generate_message_id, generate_run_id, - generate_user_document_id, + generate_user_id, increment_version, run_pipeline, to_async_generator, @@ -18,17 +22,20 @@ __all__ = [ "format_entity_types", "format_relations", - "generate_id_from_label", "format_search_results_for_stream", "format_search_results_for_llm", "generate_run_id", - "generate_user_document_id", "generate_default_user_collection_id", "increment_version", "decrement_version", "run_pipeline", "to_async_generator", - # Text splitter + "generate_document_id", + "generate_extraction_id", + "generate_user_id", + "generate_collection_id_from_name", + "generate_message_id", + "generate_default_prompt_id", "RecursiveCharacterTextSplitter", "TextSplitter", ] diff --git a/py/core/examples/scripts/upload_hf_textbooks_ex.py b/py/core/examples/scripts/upload_hf_textbooks_ex.py index 272d963a3..94b755413 100644 --- a/py/core/examples/scripts/upload_hf_textbooks_ex.py +++ b/py/core/examples/scripts/upload_hf_textbooks_ex.py @@ -9,7 +9,7 @@ from r2r import R2RClient -def 
generate_id_from_label(label: str) -> uuid.UUID: +def generate_id(label: str) -> uuid.UUID: return uuid.uuid5(uuid.NAMESPACE_DNS, label) @@ -42,7 +42,7 @@ async def process_dataset(client, dataset, batch_size, executor, semaphore): for example in dataset: count += 1 - fname = f"example_{generate_id_from_label(example['completion'])}.txt" + fname = f"example_{generate_id(example['completion'])}.txt" print(f"Streaming {fname} w/ completion {count} ...") # Asynchronously write to file diff --git a/py/core/main/api/auth_router.py b/py/core/main/api/auth_router.py index f906a1830..81bcd4c4e 100644 --- a/py/core/main/api/auth_router.py +++ b/py/core/main/api/auth_router.py @@ -1,10 +1,11 @@ -import uuid +from uuid import UUID from typing import Optional from fastapi import Body, Depends, Path from fastapi.security import OAuth2PasswordBearer, OAuth2PasswordRequestForm from pydantic import EmailStr +from core.base import R2RException from core.base.api.models import ( GenericMessageResponse, WrappedGenericMessageResponse, @@ -108,9 +109,13 @@ async def get_user_app( @self.router.put("/user", response_model=WrappedUserResponse) @self.base_endpoint async def put_user_app( + user_id: str = Body(None, description="ID of the user to update"), email: EmailStr | None = Body( None, description="Updated email address" ), + is_superuser: bool | None = Body( + None, description="Updated superuser status" + ), name: str | None = Body(None, description="Updated user name"), bio: str | None = Body(None, description="Updated user bio"), profile_picture: str | None = Body( @@ -123,9 +128,23 @@ async def put_user_app( This endpoint allows the authenticated user to update their profile information. """ + if is_superuser is not None and not auth_user.is_superuser: + raise R2RException( + "Only superusers can update the superuser status of a user", + 403, + ) + + try: + user_uuid = UUID(user_id) + except ValueError: + raise R2RException( + status_code=400, message="Invalid user ID format." 
+ ) + return await self.service.update_user( - user_id=auth_user.id, + user_id=user_uuid, email=email, + is_superuser=is_superuser, name=name, bio=bio, profile_picture=profile_picture, @@ -226,7 +245,7 @@ async def delete_user_app( raise Exception("User ID does not match authenticated user") if not auth_user.is_superuser and not password: raise Exception("Password is required for non-superusers") - user_uuid = uuid.UUID(user_id) + user_uuid = UUID(user_id) result = await self.service.delete_user( user_uuid, password, delete_vector_data ) diff --git a/py/core/main/api/data/ingestion_router_openapi.yml b/py/core/main/api/data/ingestion_router_openapi.yml index 08a419ff1..48c836b18 100644 --- a/py/core/main/api/data/ingestion_router_openapi.yml +++ b/py/core/main/api/data/ingestion_router_openapi.yml @@ -28,7 +28,7 @@ ingest_files: input_descriptions: files: "The list of paths of input files to ingest into the system." metadatas: "An optional list of JSON metadata to affix to each file" - document_ids: "An optional list of document ids for each file. If not provided, the system will generate a unique document id via the `generate_user_document_id` method." + document_ids: "An optional list of document ids for each file. If not provided, the system will generate a unique document id via the `generate_document_id` method." versions: "An optional list of versions for each file. If not provided, the system will default to the tag `v1`." ingestion_config: "An optional dictionary to override the default chunking configuration for the ingestion process. If not provided, the system will use the default server-side chunking configuration." @@ -56,6 +56,6 @@ update_files: input_descriptions: files: "The list of paths of input files to update in the system." - document_ids: "An optional list of document ids for each file. If not provided, the system will attempt to generate the corresponding unique from the `generate_user_document_id` method." 
+ document_ids: "An optional list of document ids for each file. If not provided, the system will attempt to generate the corresponding unique ID from the `generate_document_id` method." metadatas: "An optional list of JSON metadata to affix to each file" ingestion_config: "JSON string for chunking configuration override" diff --git a/py/core/main/api/ingestion_router.py b/py/core/main/api/ingestion_router.py index bad85f518..45297df39 100644 --- a/py/core/main/api/ingestion_router.py +++ b/py/core/main/api/ingestion_router.py @@ -9,7 +9,7 @@ from fastapi import Depends, File, Form, UploadFile from pydantic import Json -from core.base import R2RException, generate_user_document_id +from core.base import R2RException, generate_document_id from core.base.api.models import ( WrappedIngestionResponse, WrappedUpdateResponse, @@ -37,7 +37,7 @@ def _register_workflows(self): Workflow.INGESTION, self.service, { - "ingest-file-changed": ( + "ingest-files": ( "Ingestion task queued successfully." if self.orchestration_provider.config.provider != "simple" else "Ingestion task completed successfully." 
@@ -122,7 +122,7 @@ async def ingest_files_app( document_id = ( document_ids[it] if document_ids - else generate_user_document_id( + else generate_document_id( file_data["filename"], auth_user.id ) ) @@ -145,7 +145,7 @@ async def ingest_files_app( file_data["content_type"], ) raw_message = await self.orchestration_provider.run_workflow( - "ingest-file-changed", + "ingest-files", {"request": workflow_input}, options={ "additional_metadata": { @@ -212,7 +212,7 @@ async def update_files_app( document_id = ( document_ids[it] if document_ids - else generate_user_document_id( + else generate_document_id( file_data["filename"], auth_user.id ) ) diff --git a/py/core/main/api/kg_router.py b/py/core/main/api/kg_router.py index bab4e5131..081ad3a90 100644 --- a/py/core/main/api/kg_router.py +++ b/py/core/main/api/kg_router.py @@ -1,19 +1,23 @@ import logging from pathlib import Path from typing import Optional +from uuid import UUID import yaml -from fastapi import Body, Depends +from fastapi import Body, Depends, Query from pydantic import Json +from core.base import RunType from core.base.api.models import ( WrappedKGCreationResponse, WrappedKGEnrichmentResponse, ) from core.base.providers import OrchestrationProvider, Workflow +from core.utils import generate_default_user_collection_id +from shared.abstractions.kg import KGRunType from ..services.kg_service import KgService -from .base_router import BaseRouter, RunType +from .base_router import BaseRouter logger = logging.getLogger(__name__) @@ -52,9 +56,14 @@ def _setup_routes(self): ) @self.base_endpoint async def create_graph( - collection_id: str = Body( + collection_id: Optional[UUID] = Body( + default=None, description="Collection ID to create graph for.", ), + run_type: Optional[KGRunType] = Body( + default=KGRunType.ESTIMATE, + description="Run type for the graph creation process.", + ), kg_creation_settings: Optional[Json[dict]] = Body( default=None, description="Settings for the graph creation process.", @@ 
-64,23 +73,34 @@ async def create_graph( ): """ Creating a graph on your documents. This endpoint takes input a list of document ids and KGCreationSettings. If document IDs are not provided, the graph will be created on all documents in the system. - This step extracts the relevant entities and relationships from the documents and creates a graph based on the extracted information. - In order to do GraphRAG, you will need to run the enrich_graph endpoint. """ if not auth_user.is_superuser: logger.warning("Implement permission checks here.") + if not collection_id: + collection_id = generate_default_user_collection_id( + auth_user.id + ) + + logger.info(f"Running on collection {collection_id}") + server_kg_creation_settings = ( self.service.providers.kg.config.kg_creation_settings ) + if kg_creation_settings: for key, value in kg_creation_settings.items(): if value is not None: setattr(server_kg_creation_settings, key, value) + if run_type is KGRunType.ESTIMATE: + return await self.service.get_creation_estimate( + collection_id, server_kg_creation_settings + ) + workflow_input = { "collection_id": collection_id, "kg_creation_settings": server_kg_creation_settings.model_dump_json(), @@ -96,8 +116,13 @@ async def create_graph( ) @self.base_endpoint async def enrich_graph( - collection_id: str = Body( - description="Collection name to enrich graph for.", + collection_id: Optional[UUID] = Body( + default=None, + description="Collection ID to enrich graph for.", + ), + run_type: Optional[KGRunType] = Body( + default=KGRunType.ESTIMATE, + description="Run type for the graph enrichment process.", ), kg_enrichment_settings: Optional[Json[dict]] = Body( default=None, @@ -117,6 +142,15 @@ async def enrich_graph( self.service.providers.kg.config.kg_enrichment_settings ) + if not collection_id: + collection_id = generate_default_user_collection_id( + auth_user.id + ) + + if run_type is KGRunType.ESTIMATE: + + return await self.service.get_enrichment_estimate(collection_id, 
server_kg_enrichment_settings) + if kg_enrichment_settings: for key, value in kg_enrichment_settings.items(): if value is not None: @@ -131,3 +165,64 @@ async def enrich_graph( return await self.orchestration_provider.run_workflow( "enrich-graph", {"request": workflow_input}, {} ) + + @self.router.get("/entities") + @self.base_endpoint + async def get_entities( + collection_id: UUID = Query( + ..., description="Collection ID to retrieve entities from." + ), + offset: int = Query(0, ge=0, description="Offset for pagination."), + limit: int = Query( + 100, ge=1, le=1000, description="Limit for pagination." + ), + entity_ids: Optional[list[str]] = Query( + None, description="Entity IDs to filter by." + ), + with_description: bool = Query( + False, + description="Include entity descriptions in the response.", + ), + auth_user=Depends(self.service.providers.auth.auth_wrapper), + ): + """ + Retrieve entities from the knowledge graph. + """ + if not auth_user.is_superuser: + logger.warning("Implement permission checks here.") + + return await self.service.get_entities( + collection_id, + offset, + limit, + entity_ids, + with_description, + ) + + @self.router.get("/triples") + @self.base_endpoint + async def get_triples( + collection_id: UUID = Query( + ..., description="Collection ID to retrieve triples from." + ), + offset: int = Query(0, ge=0, description="Offset for pagination."), + limit: int = Query( + 100, ge=1, le=1000, description="Limit for pagination." + ), + triple_ids: Optional[list[str]] = Query( + None, description="Triple IDs to filter by." + ), + auth_user=Depends(self.service.providers.auth.auth_wrapper), + ): + """ + Retrieve triples from the knowledge graph. 
+ """ + if not auth_user.is_superuser: + logger.warning("Implement permission checks here.") + + return await self.service.get_triples( + collection_id, + offset, + limit, + triple_ids, + ) diff --git a/py/core/main/api/management_router.py b/py/core/main/api/management_router.py index f3fb34fa5..6d3566adf 100644 --- a/py/core/main/api/management_router.py +++ b/py/core/main/api/management_router.py @@ -2,7 +2,7 @@ import json import mimetypes from datetime import datetime, timezone -from typing import Optional +from typing import Optional, Set from uuid import UUID import psutil @@ -403,20 +403,34 @@ async def collections_overview_app( auth_user=Depends(self.service.providers.auth.auth_wrapper), response_model=WrappedCollectionOverviewResponse, ): - if not auth_user.is_superuser: - raise R2RException( - "Only a superuser can call the `collections_overview` endpoint.", - 403, - ) - - collection_uuids = ( - [UUID(collection_id) for collection_id in collection_ids] - if collection_ids - else None + user_collections: Optional[Set[UUID]] = ( + None + if auth_user.is_superuser + else {UUID(str(cid)) for cid in auth_user.collection_ids} ) + + filtered_collections: Optional[Set[UUID]] = None + + if collection_ids: + input_collections = {UUID(cid) for cid in collection_ids} + if user_collections is not None: + filtered_collections = input_collections.intersection( + user_collections + ) + else: + filtered_collections = input_collections + else: + filtered_collections = user_collections + collections_overview_response = ( await self.service.collections_overview( - collection_ids=collection_uuids, offset=offset, limit=limit + collection_ids=( + [str(cid) for cid in filtered_collections] + if filtered_collections is not None + else None + ), + offset=offset, + limit=limit, ) ) @@ -434,11 +448,12 @@ async def create_collection_app( auth_user=Depends(self.service.providers.auth.auth_wrapper), response_model=WrappedCollectionResponse, ): - if not auth_user.is_superuser: - raise 
R2RException( - "Only a superuser can create collections.", 403 - ) - return await self.service.create_collection(name, description) + collection_id = await self.service.create_collection( + name, description + ) + return await self.service.add_user_to_collection( + auth_user.id, collection_id.collection_id + ) @self.router.get("/get_collection/{collection_id}") @self.base_endpoint @@ -447,11 +462,16 @@ async def get_collection_app( auth_user=Depends(self.service.providers.auth.auth_wrapper), response_model=WrappedCollectionResponse, ): - if not auth_user.is_superuser: + collection_uuid = UUID(collection_id) + if ( + not auth_user.is_superuser + and collection_uuid not in auth_user.collection_ids + ): raise R2RException( - "Only a superuser can get Collection details.", 403 + "The currently authenticated user does not have access to the specified collection.", + 403, ) - collection_uuid = UUID(collection_id) + result = await self.service.get_collection(collection_uuid) return result @@ -468,11 +488,16 @@ async def update_collection_app( auth_user=Depends(self.service.providers.auth.auth_wrapper), response_model=WrappedCollectionResponse, ): - if not auth_user.is_superuser: + collection_uuid = UUID(collection_id) + if ( + not auth_user.is_superuser + and collection_uuid not in auth_user.collection_ids + ): raise R2RException( - "Only a superuser can update collections.", 403 + "The currently authenticated user does not have access to the specified collection.", + 403, ) - collection_uuid = UUID(collection_id) + return await self.service.update_collection( collection_uuid, name, description ) @@ -483,11 +508,15 @@ async def delete_collection_app( collection_id: str = Path(..., description="Collection ID"), auth_user=Depends(self.service.providers.auth.auth_wrapper), ): - if not auth_user.is_superuser: + collection_uuid = UUID(collection_id) + if ( + not auth_user.is_superuser + and collection_uuid not in auth_user.collection_ids + ): raise R2RException( - "Only a 
superuser can delete collections.", 403 + "The currently authenticated user does not have access to the specified collection.", + 403, ) - collection_uuid = UUID(collection_id) return await self.service.delete_collection(collection_uuid) @self.router.get("/list_collections") @@ -518,12 +547,17 @@ async def add_user_to_collection_app( auth_user=Depends(self.service.providers.auth.auth_wrapper), response_model=WrappedAddUserResponse, ): - if not auth_user.is_superuser: + collection_uuid = UUID(collection_id) + user_uuid = UUID(user_id) + if ( + not auth_user.is_superuser + and collection_uuid not in auth_user.collection_ids + ): raise R2RException( - "Only a superuser can add users to collections.", 403 + "The currently authenticated user does not have access to the specified collection.", + 403, ) - user_uuid = UUID(user_id) - collection_uuid = UUID(collection_id) + result = await self.service.add_user_to_collection( user_uuid, collection_uuid ) @@ -536,12 +570,17 @@ async def remove_user_from_collection_app( collection_id: str = Body(..., description="Collection ID"), auth_user=Depends(self.service.providers.auth.auth_wrapper), ): - if not auth_user.is_superuser: + collection_uuid = UUID(collection_id) + user_uuid = UUID(user_id) + if ( + not auth_user.is_superuser + and collection_uuid not in auth_user.collection_ids + ): raise R2RException( - "Only a superuser can remove users from collections.", 403 + "The currently authenticated user does not have access to the specified collection.", + 403, ) - user_uuid = UUID(user_id) - collection_uuid = UUID(collection_id) + await self.service.remove_user_from_collection( user_uuid, collection_uuid ) @@ -559,11 +598,16 @@ async def get_users_in_collection_app( auth_user=Depends(self.service.providers.auth.auth_wrapper), response_model=WrappedUsersInCollectionResponse, ): - if not auth_user.is_superuser: + collection_uuid = UUID(collection_id) + if ( + not auth_user.is_superuser + and collection_uuid not in 
auth_user.collection_ids + ): raise R2RException( - "Only a superuser can get users in a collection.", 403 + "The currently authenticated user does not have access to the specified collection.", + 403, ) - collection_uuid = UUID(collection_id) + users_in_collection_response = ( await self.service.get_users_in_collection( collection_id=collection_uuid, @@ -587,9 +631,10 @@ async def get_collections_for_user_app( auth_user=Depends(self.service.providers.auth.auth_wrapper), response_model=WrappedUserCollectionResponse, ): - if not auth_user.is_superuser: + if str(auth_user.id) != user_id or not auth_user.is_superuser: raise R2RException( - "Only a superuser can get collections for a user.", 403 + "The currently authenticated user does not have access to the specified collection.", + 403, ) user_uuid = UUID(user_id) user_collection_response = ( @@ -609,13 +654,17 @@ async def assign_document_to_collection_app( collection_id: str = Body(..., description="Collection ID"), auth_user=Depends(self.service.providers.auth.auth_wrapper), ): - if not auth_user.is_superuser: + collection_uuid = UUID(collection_id) + document_uuid = UUID(document_id) + if ( + not auth_user.is_superuser + and collection_uuid not in auth_user.collection_ids + ): raise R2RException( - "Only a superuser can assign documents to collections.", + "The currently authenticated user does not have access to the specified collection.", 403, ) - document_uuid = UUID(document_id) - collection_uuid = UUID(collection_id) + return await self.service.assign_document_to_collection( document_uuid, collection_uuid ) @@ -628,13 +677,17 @@ async def remove_document_from_collection_app( auth_user=Depends(self.service.providers.auth.auth_wrapper), response_model=None, ): - if not auth_user.is_superuser: + collection_uuid = UUID(collection_id) + document_uuid = UUID(document_id) + if ( + not auth_user.is_superuser + and collection_uuid not in auth_user.collection_ids + ): raise R2RException( - "Only a superuser can remove 
documents from collections.", + "The currently authenticated user does not have access to the specified collection.", 403, ) - document_uuid = UUID(document_id) - collection_uuid = UUID(collection_id) + await self.service.remove_document_from_collection( document_uuid, collection_uuid ) @@ -673,11 +726,16 @@ async def documents_in_collection_app( auth_user=Depends(self.service.providers.auth.auth_wrapper), response_model=WrappedDocumentOverviewResponse, ): - if not auth_user.is_superuser: + collection_uuid = UUID(collection_id) + if ( + not auth_user.is_superuser + and collection_uuid not in auth_user.collection_ids + ): raise R2RException( - "Only a superuser can get documents in a collection.", 403 + "The currently authenticated user does not have access to the specified collection.", + 403, ) - collection_uuid = UUID(collection_id) + documents_in_collection_response = ( await self.service.documents_in_collection( collection_uuid, offset, limit diff --git a/py/core/main/api/retrieval_router.py b/py/core/main/api/retrieval_router.py index 5c9a5edfb..f169020e3 100644 --- a/py/core/main/api/retrieval_router.py +++ b/py/core/main/api/retrieval_router.py @@ -126,6 +126,10 @@ async def search_app( auth_user, vector_search_settings ) + kg_search_settings.filters = self._select_filters( + auth_user, kg_search_settings + ) + results = await self.service.search( query=query, vector_search_settings=vector_search_settings, @@ -253,6 +257,8 @@ async def agent_app( auth_user, vector_search_settings ) + kg_search_settings.filters = vector_search_settings.filters + try: response = await self.service.agent( messages=messages, diff --git a/py/core/main/orchestration/hatchet/ingestion_workflow.py b/py/core/main/orchestration/hatchet/ingestion_workflow.py index 8199f352a..2e5cd27e1 100644 --- a/py/core/main/orchestration/hatchet/ingestion_workflow.py +++ b/py/core/main/orchestration/hatchet/ingestion_workflow.py @@ -4,13 +4,9 @@ from hatchet_sdk import Context -from core.base import 
( - IngestionStatus, - OrchestrationProvider, - generate_id_from_label, - increment_version, -) +from core.base import IngestionStatus, OrchestrationProvider, increment_version from core.base.abstractions import DocumentInfo, R2RException +from core.utils import generate_default_user_collection_id from ...services import IngestionService, IngestionServiceAdapter @@ -24,7 +20,7 @@ def hatchet_ingestion_factory( orchestration_provider: OrchestrationProvider, service: IngestionService ) -> dict[str, "Hatchet.Workflow"]: @orchestration_provider.workflow( - name="ingest-file-changed", + name="ingest-files", timeout="60m", ) class HatchetIngestFilesWorkflow: @@ -123,8 +119,8 @@ async def embed(self, context: Context) -> dict: collection_id = await service.providers.database.relational.assign_document_to_collection( document_id=document_info.id, - collection_id=generate_id_from_label( - str(document_info.user_id) + collection_id=generate_default_user_collection_id( + document_info.user_id ), ) @@ -265,7 +261,7 @@ async def update_files(self, context: Context) -> None: # Spawn ingest_file workflow as a child workflow child_result = ( await context.aio.spawn_workflow( - "ingest-file-changed", + "ingest-files", {"request": ingest_input}, key=f"ingest_file_{doc_id}", ) diff --git a/py/core/main/orchestration/hatchet/kg_workflow.py b/py/core/main/orchestration/hatchet/kg_workflow.py index ebb425f69..2e8ee0dd8 100644 --- a/py/core/main/orchestration/hatchet/kg_workflow.py +++ b/py/core/main/orchestration/hatchet/kg_workflow.py @@ -208,16 +208,18 @@ async def kg_community_summary(self, context: Context) -> dict: parallel_communities = min(100, num_communities) total_workflows = math.ceil(num_communities / parallel_communities) workflows = [] - for i, offset in enumerate( - range(0, num_communities, parallel_communities) - ): + for i in range(total_workflows): + offset = i * parallel_communities workflows.append( context.aio.spawn_workflow( "kg-community-summary", { "request": { 
"offset": offset, - "limit": parallel_communities, + "limit": min( + parallel_communities, + num_communities - offset, + ), "collection_id": collection_id, **input_data["kg_enrichment_settings"], } diff --git a/py/core/main/orchestration/simple/ingestion_workflow.py b/py/core/main/orchestration/simple/ingestion_workflow.py index e7f9a0956..4e4e06c11 100644 --- a/py/core/main/orchestration/simple/ingestion_workflow.py +++ b/py/core/main/orchestration/simple/ingestion_workflow.py @@ -1,7 +1,8 @@ import asyncio import logging -from core.base import R2RException, generate_id_from_label, increment_version +from core.base import R2RException, increment_version +from core.utils import generate_default_user_collection_id from ...services import IngestionService @@ -63,7 +64,7 @@ async def ingest_files(input_data): try: collection_id = await service.providers.database.relational.assign_document_to_collection( document_id=document_info.id, - collection_id=generate_id_from_label( + collection_id=generate_default_user_collection_id( str(document_info.user_id) ), ) @@ -167,4 +168,4 @@ async def update_files(input_data): await asyncio.gather(*results) - return {"ingest-file-changed": ingest_files, "update-files": update_files} + return {"ingest-files": ingest_files, "update-files": update_files} diff --git a/py/core/main/services/auth_service.py b/py/core/main/services/auth_service.py index 46d7093ed..646b62a50 100644 --- a/py/core/main/services/auth_service.py +++ b/py/core/main/services/auth_service.py @@ -118,21 +118,26 @@ async def update_user( self, user_id: UUID, email: Optional[str] = None, + is_superuser: Optional[bool] = None, name: Optional[str] = None, bio: Optional[str] = None, profile_picture: Optional[str] = None, ) -> UserResponse: - user = await self.providers.database.relational.get_user_by_id(user_id) + user: UserResponse = ( + await self.providers.database.relational.get_user_by_id(user_id) + ) if not user: raise R2RException(status_code=404, message="User not 
found") - if email: - setattr(user, "email", email) - if name: - setattr(user, "name", name) - if bio: - setattr(user, "bio", bio) - if profile_picture: - setattr(user, "profile_picture", profile_picture) + if email is not None: + user.email = email + if is_superuser is not None: + user.is_superuser = is_superuser + if name is not None: + user.name = name + if bio is not None: + user.bio = bio + if profile_picture is not None: + user.profile_picture = profile_picture return await self.providers.database.relational.update_user(user) @telemetry_event("DeleteUserAccount") diff --git a/py/core/main/services/kg_service.py b/py/core/main/services/kg_service.py index fbe1ebfc8..856f341a4 100644 --- a/py/core/main/services/kg_service.py +++ b/py/core/main/services/kg_service.py @@ -3,7 +3,10 @@ from typing import Any, AsyncGenerator, Optional from uuid import UUID + from core.base import KGCreationStatus, RunLoggingSingleton, RunManager +from core.base.abstractions import KGCreationSettings, KGEnrichmentSettings + from core.base.abstractions import GenerationConfig from core.telemetry.telemetry_decorator import telemetry_event @@ -62,7 +65,7 @@ async def kg_triples_extraction( await self.providers.database.relational.set_workflow_status( id=document_id, status_type="kg_extraction_status", - status=KGCreationStatus.PROCESSING, + status=KGExtractionStatus.PROCESSING, ) triples = await self.pipes.kg_triples_extraction_pipe.run( @@ -86,18 +89,12 @@ async def kg_triples_extraction( run_manager=self.run_manager, ) - await self.providers.database.relational.set_workflow_status( - id=document_id, - status_type="kg_extraction_status", - status=KGCreationStatus.SUCCESS, - ) - except Exception as e: logger.error(f"Error in kg_extraction: {e}") await self.providers.database.relational.set_workflow_status( id=document_id, status_type="kg_extraction_status", - status=KGCreationStatus.FAILED, + status=KGExtractionStatus.FAILED, ) return await _collect_results(result_gen) @@ -111,13 
+108,13 @@ async def get_document_ids_for_create_graph( ): document_status_filter = [ - KGCreationStatus.PENDING, - KGCreationStatus.FAILED, + KGExtractionStatus.PENDING, + KGExtractionStatus.FAILED, ] if force_kg_creation: document_status_filter += [ - KGCreationStatus.SUCCESS, - KGCreationStatus.PROCESSING, + KGExtractionStatus.SUCCESS, + KGExtractionStatus.PROCESSING, ] document_ids = await self.providers.database.relational.get_document_ids_by_status( @@ -136,7 +133,9 @@ async def kg_entity_description( **kwargs, ): - entity_count = await self.providers.kg.get_entity_count(document_id) + entity_count = await self.providers.kg.get_entity_count( + document_id=document_id + ) # TODO - Do not hardcode the batch size, # make it a configurable parameter at runtime & server-side defaults @@ -164,6 +163,12 @@ async def kg_entity_description( all_results.append(await _collect_results(node_descriptions)) + await self.providers.database.relational.set_workflow_status( + id=document_id, + status_type="kg_extraction_status", + status=KGExtractionStatus.SUCCESS, + ) + return all_results @telemetry_event("kg_clustering") @@ -211,3 +216,80 @@ async def kg_community_summary( run_manager=self.run_manager, ) return await _collect_results(summary_results) + + @telemetry_event("delete_graph_for_documents") + async def delete_graph_for_documents( + self, + document_ids: list[UUID], + **kwargs, + ): + # TODO: Implement this, as it needs some checks. 
+ raise NotImplementedError + + @telemetry_event("delete_graph_for_collection") + async def delete_graph_for_collection( + self, + collection_id: UUID, + cascade: bool, + **kwargs, + ): + return await self.providers.kg.delete_graph_for_collection( + collection_id, cascade + ) + + @telemetry_event("get_creation_estimate") + async def get_creation_estimate( + self, + collection_id: UUID, + kg_creation_settings: KGCreationSettings, + **kwargs, + ): + return await self.providers.kg.get_creation_estimate( + collection_id, kg_creation_settings + ) + + @telemetry_event("get_enrichment_estimate") + async def get_enrichment_estimate( + self, + collection_id: UUID, + kg_enrichment_settings: KGEnrichmentSettings, + **kwargs, + ): + + return await self.providers.kg.get_enrichment_estimate( + collection_id, kg_enrichment_settings + ) + + @telemetry_event("get_entities") + async def get_entities( + self, + collection_id: UUID, + offset: int = 0, + limit: int = 100, + entity_ids: Optional[list[str]] = None, + with_description: bool = False, + **kwargs, + ): + return await self.providers.kg.get_entities( + collection_id, + offset, + limit, + entity_ids, + with_description, + ) + + @telemetry_event("get_triples") + async def get_triples( + self, + collection_id: UUID, + offset: int = 0, + limit: int = 100, + triple_ids: Optional[list[str]] = None, + **kwargs, + ): + return await self.providers.kg.get_triples( + collection_id, + offset, + limit, + triple_ids, + ) diff --git a/py/core/main/services/retrieval_service.py b/py/core/main/services/retrieval_service.py index d38e785fd..0296a2024 100644 --- a/py/core/main/services/retrieval_service.py +++ b/py/core/main/services/retrieval_service.py @@ -17,12 +17,12 @@ RunManager, RunType, VectorSearchSettings, - generate_id_from_label, manage_run, to_async_generator, ) from core.base.api.models import RAGResponse, SearchResponse, UserResponse from core.telemetry.telemetry_decorator import telemetry_event +from core.utils import 
generate_message_id from ..abstractions import R2RAgents, R2RPipelines, R2RPipes, R2RProviders from ..config import R2RConfig @@ -140,9 +140,7 @@ async def rag( vector_search_settings.filters[filter] = str(value) completion_start_time = datetime.now() - message_id = generate_id_from_label( - f"{query}-{completion_start_time.isoformat()}" - ) + message_id = generate_message_id(query, completion_start_time) completion_record = CompletionRecord( message_id=message_id, diff --git a/py/core/pipes/ingestion/parsing_pipe.py b/py/core/pipes/ingestion/parsing_pipe.py index 75f8d3ab8..f86aa3fef 100644 --- a/py/core/pipes/ingestion/parsing_pipe.py +++ b/py/core/pipes/ingestion/parsing_pipe.py @@ -10,11 +10,11 @@ IngestionConfig, PipeType, RunLoggingSingleton, - generate_id_from_label, ) from core.base.abstractions import R2RDocumentProcessingError from core.base.pipes.base_pipe import AsyncPipe from core.base.providers.ingestion import IngestionProvider +from core.utils import generate_extraction_id logger = logging.getLogger(__name__) @@ -70,7 +70,7 @@ async def _parse( async for extraction in self.ingestion_provider.parse( # type: ignore file_content, document, ingestion_config_override ): - id = generate_id_from_label(f"{extraction.id}-{version}") + id = generate_extraction_id(extraction.id, version=version) extraction.id = id extraction.metadata["version"] = version yield extraction diff --git a/py/core/pipes/kg/community_summary.py b/py/core/pipes/kg/community_summary.py index d9e652014..0acece79c 100644 --- a/py/core/pipes/kg/community_summary.py +++ b/py/core/pipes/kg/community_summary.py @@ -92,7 +92,7 @@ def community_summary_prompt( async def process_community( self, - community_number: str, + community_number: int, max_summary_input_length: int, generation_config: GenerationConfig, collection_id: UUID, @@ -135,17 +135,21 @@ async def process_community( ) try: - if description.startswith("```json"): + if description and description.startswith("```json"): description 
= ( description.strip("```json").strip("```").strip() ) + else: + raise ValueError( + f"Failed to generate a summary for community {community_number} at level {community_level}." + ) - description = json.loads(description) - name = description["name"] - summary = description["summary"] - findings = description["findings"] - rating = description["rating"] - rating_explanation = description["rating_explanation"] + description_dict = json.loads(description) + name = description_dict["name"] + summary = description_dict["summary"] + findings = description_dict["findings"] + rating = description_dict["rating"] + rating_explanation = description_dict["rating_explanation"] break except Exception as e: if attempt == 2: @@ -153,7 +157,7 @@ async def process_community( f"Failed to generate a summary for community {community_number} at level {community_level}." ) from e - community = CommunityReport( + community_report = CommunityReport( community_number=community_number, collection_id=collection_id, level=community_level, @@ -170,11 +174,11 @@ async def process_community( ), ) - await self.kg_provider.add_community_report(community) + await self.kg_provider.add_community_report(community_report) return { - "community_number": community.community_number, - "name": community.name, + "community_number": community_report.community_number, + "name": community_report.name, } async def _run_logic( # type: ignore @@ -195,15 +199,24 @@ async def _run_logic( # type: ignore max_summary_input_length = input.message["max_summary_input_length"] collection_id = input.message["collection_id"] community_summary_jobs = [] + + # check which community summaries exist and don't run them again + community_numbers_exist = ( + await self.kg_provider.check_community_reports_exist( + collection_id=collection_id, offset=offset, limit=limit + ) + ) + for community_number in range(offset, offset + limit): - community_summary_jobs.append( - self.process_community( - community_number=community_number, - 
max_summary_input_length=max_summary_input_length, - generation_config=generation_config, - collection_id=collection_id, + if community_number not in community_numbers_exist: + community_summary_jobs.append( + self.process_community( + community_number=community_number, + max_summary_input_length=max_summary_input_length, + generation_config=generation_config, + collection_id=collection_id, + ) ) - ) for community_summary in asyncio.as_completed(community_summary_jobs): yield await community_summary diff --git a/py/core/pipes/kg/entity_description.py b/py/core/pipes/kg/entity_description.py index 6a85eccb7..2966e5564 100644 --- a/py/core/pipes/kg/entity_description.py +++ b/py/core/pipes/kg/entity_description.py @@ -15,6 +15,7 @@ RunLoggingSingleton, ) from core.base.pipes.base_pipe import AsyncPipe +from core.base.abstractions import Entity logger = logging.getLogger(__name__) @@ -91,21 +92,31 @@ def truncate_info(info_list, max_length): return truncated_info async def process_entity( - entities, triples, max_description_input_length + entities, triples, max_description_input_length, document_id ): entity_info = [ - f"{entity['name']}, {entity['description']}" - for entity in entities + f"{entity.name}, {entity.description}" for entity in entities ] triples_txt = [ - f"{i+1}: {triple['subject']}, {triple['object']}, {triple['predicate']} - Summary: {triple['description']}" + f"{i+1}: {triple.subject}, {triple.object}, {triple.predicate} - Summary: {triple.description}" for i, triple in enumerate(triples) ] - out_entity = {"name": entities[0]["name"]} - out_entity["description"] = ( + # potentially slow at scale, but set to avoid duplicates + unique_extraction_ids = set() + for entity in entities: + for extraction_id in entity.extraction_ids: + unique_extraction_ids.add(extraction_id) + + out_entity = Entity( + name=entities[0].name, + extraction_ids=list(unique_extraction_ids), + document_ids=[document_id], + ) + + out_entity.description = ( ( await 
self.llm_provider.aget_completion( messages=[ @@ -131,9 +142,9 @@ async def process_entity( ) # will do more requests, but it is simpler - out_entity["description_embedding"] = ( + out_entity.description_embedding = ( await self.embedding_provider.async_get_embeddings( - [out_entity["description"]] + [out_entity.description] ) )[0] @@ -141,15 +152,17 @@ async def process_entity( await self.kg_provider.upsert_embeddings( [ ( - out_entity["name"], - out_entity["description"], - str(out_entity["description_embedding"]), + out_entity.name, + out_entity.description, + str(out_entity.description_embedding), + out_entity.extraction_ids, + document_id, ) ], "entity_embedding", ) - return out_entity["name"] + return out_entity.name offset = input.message["offset"] limit = input.message["limit"] @@ -171,6 +184,7 @@ async def process_entity( entity_info["entities"], entity_info["triples"], input.message["max_description_input_length"], + document_id, ) ) except Exception as e: @@ -178,3 +192,5 @@ async def process_entity( for result in asyncio.as_completed(workflows): yield await result + + logger.info(f"Processed {total_entities} entities for document {document_id}") \ No newline at end of file diff --git a/py/core/pipes/kg/triples_extraction.py b/py/core/pipes/kg/triples_extraction.py index 62b74b739..e818b6cc9 100644 --- a/py/core/pipes/kg/triples_extraction.py +++ b/py/core/pipes/kg/triples_extraction.py @@ -147,10 +147,9 @@ def parse_fn(response_str: str) -> Any: category=entity_category, description=entity_description, name=entity_value, - document_id=str(extractions[0].document_id), + document_id=extractions[0].document_id, extraction_ids=[ - str(extraction.id) - for extraction in extractions + extraction.id for extraction in extractions ], attributes={}, ) @@ -172,10 +171,9 @@ def parse_fn(response_str: str) -> Any: object=object, description=description, weight=weight, - document_id=str(extractions[0].document_id), + document_id=extractions[0].document_id, 
extraction_ids=[ - str(extraction.id) - for extraction in extractions + extraction.id for extraction in extractions ], attributes={}, ) diff --git a/py/core/pipes/retrieval/kg_search_pipe.py b/py/core/pipes/retrieval/kg_search_pipe.py index cfaf74d8c..880a1b4f5 100644 --- a/py/core/pipes/retrieval/kg_search_pipe.py +++ b/py/core/pipes/retrieval/kg_search_pipe.py @@ -130,9 +130,9 @@ async def local_search( property_names=[ "name", "description", - # "extraction_ids", - # "document_ids", + "extraction_ids", ], + filters=kg_search_settings.filters, ): yield KGSearchResult( content=KGEntityResult( @@ -141,12 +141,12 @@ async def local_search( ), method=KGSearchMethod.LOCAL, result_type=KGSearchResultType.ENTITY, - # extraction_ids=search_result["extraction_ids"], - # document_ids=search_result["document_ids"], + extraction_ids=search_result["extraction_ids"], metadata={"associated_query": message}, ) # relationship search + # disabled for now. We will check evaluations and see if we need it # search_type = "__Relationship__" # async for search_result in self.kg_provider.vector_query( # type: ignore # input, @@ -192,6 +192,7 @@ async def local_search( "rating_explanation", "summary", ], + filters=kg_search_settings.filters, ): yield KGSearchResult( content=KGCommunityResult( diff --git a/py/core/providers/database/collection.py b/py/core/providers/database/collection.py index cc30736a2..ab0fed47c 100644 --- a/py/core/providers/database/collection.py +++ b/py/core/providers/database/collection.py @@ -4,13 +4,13 @@ from typing import Optional, Union from uuid import UUID, uuid4 -from core.base import ( - R2RException, - generate_default_user_collection_id, - generate_id_from_label, -) +from core.base import R2RException, generate_default_user_collection_id from core.base.abstractions import DocumentInfo, DocumentType, IngestionStatus from core.base.api.models import CollectionOverviewResponse, CollectionResponse +from core.utils import ( + 
generate_collection_id_from_name, + generate_default_user_collection_id, +) from .base import DatabaseMixin @@ -24,6 +24,7 @@ async def create_table(self) -> None: collection_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), name TEXT NOT NULL, description TEXT, + kg_enrichment_status TEXT DEFAULT 'PENDING', created_at TIMESTAMPTZ DEFAULT NOW(), updated_at TIMESTAMPTZ DEFAULT NOW() ); @@ -41,7 +42,7 @@ async def create_default_collection( user_id ) else: - default_collection_uuid = generate_id_from_label( + default_collection_uuid = generate_collection_id_from_name( config.default_collection_name ) diff --git a/py/core/providers/database/document.py b/py/core/providers/database/document.py index a4878319e..332b751d3 100644 --- a/py/core/providers/database/document.py +++ b/py/core/providers/database/document.py @@ -21,7 +21,7 @@ DocumentInfo, DocumentType, IngestionStatus, - KGCreationStatus, + KGExtractionStatus, KGEnrichmentStatus, R2RException, ) @@ -294,7 +294,7 @@ def _get_status_model_and_table_name(self, status_type: str): if status_type == "ingestion": return IngestionStatus, "document_info" elif status_type == "kg_extraction_status": - return KGCreationStatus, "document_info" + return KGExtractionStatus, "document_info" elif status_type == "kg_enrichment_status": return KGEnrichmentStatus, "collection_info" else: @@ -348,7 +348,7 @@ async def get_document_ids_by_status( self, status_type: str, status: Union[str, list[str]], - collection_id: UUID = None, + collection_id: Optional[UUID] = None, ): """ Get the IDs for a given status. 
@@ -437,7 +437,7 @@ async def get_documents_overview( ingestion_status=IngestionStatus(row["ingestion_status"]), created_at=row["created_at"], updated_at=row["updated_at"], - kg_extraction_status=KGCreationStatus( + kg_extraction_status=KGExtractionStatus( row["kg_extraction_status"] ), ) diff --git a/py/core/providers/database/postgres.py b/py/core/providers/database/postgres.py index 0109f17e4..5f8d58a05 100644 --- a/py/core/providers/database/postgres.py +++ b/py/core/providers/database/postgres.py @@ -73,18 +73,19 @@ def __init__( project_name = ( config.project_name - or config.vecs_collection # remove after deprecation + or os.getenv("R2R_PROJECT_NAME") + # Remove the following line after deprecation or os.getenv("POSTGRES_PROJECT_NAME") ) if not project_name: raise ValueError( - "Error, please set a valid POSTGRES_PROJECT_NAME environment variable or set a 'project_name' in the 'database' settings of your `r2r.toml`." + "Error, please set a valid R2R_PROJECT_NAME environment variable or set a 'project_name' in the 'database' settings of your `r2r.toml`." ) self.project_name = project_name if not all([user, password, host, port, db_name, project_name]): raise ValueError( - "Error, please set the POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_HOST, POSTGRES_PORT, POSTGRES_DBNAME, and POSTGRES_PROJECT_NAME environment variables to use pgvector database." + "Error, please set the POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_HOST, POSTGRES_PORT, POSTGRES_DBNAME, and `R2R_PROJECT_NAME` environment variables to use pgvector database." 
) # Check if it's a Unix socket connection diff --git a/py/core/providers/database/user.py b/py/core/providers/database/user.py index ebf39d17e..e315e17ec 100644 --- a/py/core/providers/database/user.py +++ b/py/core/providers/database/user.py @@ -4,7 +4,7 @@ from core.base.abstractions import R2RException, UserStats from core.base.api.models import UserResponse -from core.base.utils import generate_id_from_label +from core.utils import generate_user_id from .base import DatabaseMixin, QueryBuilder @@ -136,7 +136,7 @@ async def create_user(self, email: str, password: str) -> UserResponse: RETURNING user_id, email, is_superuser, is_active, is_verified, created_at, updated_at, collection_ids """ result = await self.fetchrow_query( - query, [email, generate_id_from_label(email), hashed_password, []] + query, [email, generate_user_id(email), hashed_password, []] ) if not result: diff --git a/py/core/providers/database/vecs/collection.py b/py/core/providers/database/vecs/collection.py index 45a990447..57b5f1d81 100644 --- a/py/core/providers/database/vecs/collection.py +++ b/py/core/providers/database/vecs/collection.py @@ -245,7 +245,7 @@ def __init__( ) elif len(reported_dimensions) > 1: raise MismatchedDimension( - "Mismatch in the reported dimensions of the selected vector collection and embedding model. Correct the selected embedding model or specify a new vector collection by modifying the `POSTGRES_PROJECT_NAME` environment variable." + "Mismatch in the reported dimensions of the selected vector collection and embedding model. Correct the selected embedding model or specify a new vector collection by modifying the `R2R_PROJECT_NAME` environment variable." ) def __repr__(self): @@ -314,7 +314,7 @@ def _create_if_not_exists(self): ) if len(reported_dimensions) > 1: raise MismatchedDimension( - "Mismatch in the reported dimensions of the selected vector collection and embedding model. 
Correct the selected embedding model or specify a new vector collection by modifying the `POSTGRES_PROJECT_NAME` environment variable." + "Mismatch in the reported dimensions of the selected vector collection and embedding model. Correct the selected embedding model or specify a new vector collection by modifying the `R2R_PROJECT_NAME` environment variable." ) if not collection_dimension: diff --git a/py/core/providers/ingestion/r2r/base.py b/py/core/providers/ingestion/r2r/base.py index 0f2034ba9..889c7b463 100644 --- a/py/core/providers/ingestion/r2r/base.py +++ b/py/core/providers/ingestion/r2r/base.py @@ -15,8 +15,9 @@ R2RDocumentProcessingError, RecursiveCharacterTextSplitter, TextSplitter, - generate_id_from_label, ) +from core.base.abstractions import DocumentExtraction +from core.utils import generate_extraction_id logger = logging.getLogger(__name__) @@ -184,7 +185,7 @@ async def parse( # type: ignore chunks = self.chunk(contents, ingestion_config_override) for chunk in chunks: extraction = DocumentExtraction( - id=generate_id_from_label(f"{document.id}-{iteration}"), + id=generate_extraction_id(document.id, iteration), document_id=document.id, user_id=document.user_id, collection_ids=document.collection_ids, diff --git a/py/core/providers/ingestion/unstructured/base.py b/py/core/providers/ingestion/unstructured/base.py index 24f437fb1..95a92510a 100644 --- a/py/core/providers/ingestion/unstructured/base.py +++ b/py/core/providers/ingestion/unstructured/base.py @@ -18,10 +18,10 @@ Document, DocumentExtraction, DocumentType, - generate_id_from_label, ) from core.base.abstractions import R2RSerializable from core.base.providers.ingestion import IngestionConfig, IngestionProvider +from core.utils import generate_extraction_id logger = logging.getLogger(__name__) @@ -84,6 +84,7 @@ class UnstructuredIngestionProvider(IngestionProvider): DocumentType.MP3: [parsers.AudioParser], DocumentType.JSON: [parsers.JSONParser], DocumentType.HTML: [parsers.HTMLParser], + 
DocumentType.XLSX: [parsers.XLSXParser], } IMAGE_TYPES = { @@ -269,7 +270,7 @@ async def parse( metadata["chunk_order"] = iteration # creating the text extraction yield DocumentExtraction( - id=generate_id_from_label(f"{document.id}-{iteration}"), + id=generate_extraction_id(document.id, iteration), document_id=document.id, user_id=document.user_id, collection_ids=document.collection_ids, diff --git a/py/core/providers/kg/postgres.py b/py/core/providers/kg/postgres.py index 4b6ce29b4..d86b8ab40 100644 --- a/py/core/providers/kg/postgres.py +++ b/py/core/providers/kg/postgres.py @@ -1,20 +1,31 @@ import json import logging -from typing import Any, Optional, Tuple +from typing import Any, Dict, List, Optional, Tuple, Union from uuid import UUID import asyncpg +from graspologic.partition import HierarchicalClusters -from core import KGExtraction from core.base import ( - Community, + CommunityReport, DatabaseProvider, EmbeddingProvider, Entity, KGConfig, + KGExtractionStatus, + KGExtraction, KGProvider, Triple, ) +from shared.abstractions import ( + KGCreationEstimationResponse, + KGEnrichmentEstimationResponse, + KGEnrichmentSettings, + KGCreationSettings, +) + +from shared.utils import llm_cost_per_million_tokens + logger = logging.getLogger(__name__) @@ -49,7 +60,7 @@ async def initialize(self): await self.create_tables(project_name=self.db_provider.project_name) async def execute_query( - self, query: str, params: Optional[list[tuple[Any]]] = None + self, query: str, params: Optional[list[Any]] = None ) -> Any: return await self.db_provider.execute_query(query, params) @@ -62,7 +73,9 @@ async def execute_many( return await self.db_provider.execute_many(query, params, batch_size) async def fetch_query( - self, query: str, params: Optional[list[tuple[Any]]] = None + self, + query: str, + params: Optional[Any] = None, # TODO: make this strongly typed ) -> Any: return await self.db_provider.fetch_query(query, params) @@ -74,7 +87,6 @@ async def create_tables(self, 
project_name: str): # create schema query = f""" - CREATE TABLE IF NOT EXISTS {self._get_table_name("entity_raw")} ( id SERIAL PRIMARY KEY, category TEXT NOT NULL, @@ -143,26 +155,17 @@ async def create_tables(self, project_name: str): id SERIAL PRIMARY KEY, name TEXT NOT NULL, description TEXT NOT NULL, + extraction_ids UUID[] NOT NULL, description_embedding vector({self.embedding_provider.config.base_dimension}) NOT NULL, - UNIQUE (name) + document_id UUID NOT NULL, + UNIQUE (name, document_id) ); """ await self.execute_query(query) - # triples embeddings table - query = f""" - CREATE TABLE IF NOT EXISTS {self._get_table_name("triple_embedding")} ( - id SERIAL PRIMARY KEY, - subject TEXT NOT NULL, - predicate TEXT NOT NULL, - object TEXT NOT NULL, - description_embedding vector({self.embedding_provider.config.base_dimension}) NOT NULL, - UNIQUE (subject, predicate, object) - ); - """ - - await self.execute_query(query) + # TODO: Create another table for entity_embedding_collection + # entity embeddings at a collection level # communities table, result of the Leiden algorithm query = f""" @@ -173,7 +176,8 @@ async def create_tables(self, project_name: str): parent_cluster INT, level INT NOT NULL, is_final_cluster BOOLEAN NOT NULL, - triple_ids INT[] NOT NULL + triple_ids INT[] NOT NULL, + collection_id UUID NOT NULL );""" await self.execute_query(query) @@ -247,8 +251,8 @@ async def add_entities( async def add_triples( self, triples: list[Triple], - table_name: str, - ) -> asyncpg.Record: + table_name: str = "triples", + ) -> None: """ Upsert triples into the triple_raw table. These are raw triples extracted from the document. 
@@ -318,30 +322,34 @@ async def add_kg_extractions( return (total_entities, total_relationships) async def get_entity_map( - self, offset: int, limit: int, document_id: str - ) -> dict[str, Any]: + self, offset: int, limit: int, document_id: UUID + ) -> Dict[str, Dict[str, List[Dict[str, Any]]]]: QUERY1 = f""" WITH entities_list AS ( - SELECT DISTINCT name FROM {self._get_table_name("entity_raw")} WHERE document_id = $1 ORDER BY name ASC LIMIT {limit} OFFSET {offset} ) - SELECT DISTINCT e.name, e.description, e.category + SELECT e.name, e.description, e.category, + (SELECT array_agg(DISTINCT x) FROM unnest(e.extraction_ids) x) AS extraction_ids, + e.document_id FROM {self._get_table_name("entity_raw")} e JOIN entities_list el ON e.name = el.name + GROUP BY e.name, e.description, e.category, e.extraction_ids, e.document_id ORDER BY e.name;""" entities_list = await self.fetch_query(QUERY1, [document_id]) entities_list = [ - { - "name": entity["name"], - "description": entity["description"], - "category": entity["category"], - } + Entity( + name=entity["name"], + description=entity["description"], + category=entity["category"], + extraction_ids=entity["extraction_ids"], + document_id=entity["document_id"], + ) for entity in entities_list ] @@ -355,7 +363,8 @@ async def get_entity_map( LIMIT {limit} OFFSET {offset} ) - SELECT DISTINCT t.subject, t.predicate, t.object, t.weight, t.description + SELECT DISTINCT t.subject, t.predicate, t.object, t.weight, t.description, + (SELECT array_agg(DISTINCT x) FROM unnest(t.extraction_ids) x) AS extraction_ids, t.document_id FROM {self._get_table_name("triple_raw")} t JOIN entities_list el ON t.subject = el.name ORDER BY t.subject, t.predicate, t.object; @@ -363,45 +372,49 @@ async def get_entity_map( triples_list = await self.fetch_query(QUERY2, [document_id]) triples_list = [ - { - "subject": triple["subject"], - "predicate": triple["predicate"], - "object": triple["object"], - "weight": triple["weight"], - "description": 
triple["description"], - } + Triple( + subject=triple["subject"], + predicate=triple["predicate"], + object=triple["object"], + weight=triple["weight"], + description=triple["description"], + extraction_ids=triple["extraction_ids"], + document_id=triple["document_id"], + ) for triple in triples_list ] - entity_map = {} + entity_map: Dict[str, Dict[str, List[Any]]] = {} for entity in entities_list: - if entity["name"] not in entity_map: - entity_map[entity["name"]] = {"entities": [], "triples": []} - entity_map[entity["name"]]["entities"].append(entity) + if entity.name not in entity_map: + entity_map[entity.name] = {"entities": [], "triples": []} + entity_map[entity.name]["entities"].append(entity) for triple in triples_list: - if triple["subject"] in entity_map: - entity_map[triple["subject"]]["triples"].append(triple) - if triple["object"] in entity_map: - entity_map[triple["object"]]["triples"].append(triple) + if triple.subject in entity_map: + entity_map[triple.subject]["triples"].append(triple) + if triple.object in entity_map: + entity_map[triple.object]["triples"].append(triple) return entity_map async def upsert_embeddings( self, - data: list[dict[str, Any]], + data: List[Tuple[Any]], table_name: str, ) -> None: QUERY = f""" - INSERT INTO {self._get_table_name(table_name)} (name, description, description_embedding) - VALUES ($1, $2, $3) - ON CONFLICT (name) DO UPDATE SET + INSERT INTO {self._get_table_name(table_name)} (name, description, description_embedding, extraction_ids, document_id) + VALUES ($1, $2, $3, $4, $5) + ON CONFLICT (name, document_id) DO UPDATE SET description = EXCLUDED.description, - description_embedding = EXCLUDED.description_embedding + description_embedding = EXCLUDED.description_embedding, + extraction_ids = EXCLUDED.extraction_ids, + document_id = EXCLUDED.document_id """ return await self.execute_many(QUERY, data) - async def upsert_entities(self, entities: list[Entity]) -> None: + async def upsert_entities(self, entities: 
List[Entity]) -> None: QUERY = """ INSERT INTO $1.$2 (category, name, description, description_embedding, extraction_ids, document_id, attributes) VALUES ($1, $2, $3, $4, $5, $6, $7) @@ -427,6 +440,7 @@ async def vector_query(self, query: str, **kwargs: Any) -> Any: search_type = kwargs.get("search_type", "__Entity__") embedding_type = kwargs.get("embedding_type", "description_embedding") property_names = kwargs.get("property_names", ["name", "description"]) + filters = kwargs.get("filters", {}) limit = kwargs.get("limit", 10) table_name = "" @@ -440,11 +454,32 @@ async def vector_query(self, query: str, **kwargs: Any) -> Any: raise ValueError(f"Invalid search type: {search_type}") property_names_str = ", ".join(property_names) + + collection_ids_dict = filters.get("collection_ids", {}) + filter_query = "" + if collection_ids_dict: + filter_query = "WHERE collection_id = ANY($3)" + filter_ids = collection_ids_dict['$overlap'] + + if search_type == "__Community__": + logger.info(f"Searching in collection ids: {filter_ids}") + + if search_type == "__Entity__" or search_type == "__Relationship__": + filter_query = "WHERE document_id = ANY($3)" + query = f""" + SELECT distinct document_id FROM {self._get_table_name('document_info')} WHERE $1 = ANY(collection_ids) + """ + filter_ids = [doc_id['document_id'] for doc_id in await self.fetch_query(query, filter_ids)] + logger.info(f"Searching in document ids: {filter_ids}") + QUERY = f""" - SELECT {property_names_str} FROM {self._get_table_name(table_name)} ORDER BY {embedding_type} <=> $1 LIMIT $2; + SELECT {property_names_str} FROM {self._get_table_name(table_name)} {filter_query} ORDER BY {embedding_type} <=> $1 LIMIT $2; """ - results = await self.fetch_query(QUERY, (str(query_embedding), limit)) + if filter_query != "": + results = await self.fetch_query(QUERY, (str(query_embedding), limit, filter_ids)) + else: + results = await self.fetch_query(QUERY, (str(query_embedding), limit)) for result in results: yield { @@ 
-452,7 +487,7 @@ async def vector_query(self, query: str, **kwargs: Any) -> Any: for property_name in property_names } - async def get_all_triples(self, collection_id: UUID) -> list[Triple]: + async def get_all_triples(self, collection_id: UUID) -> List[Triple]: # getting all documents for a collection QUERY = f""" @@ -467,30 +502,29 @@ async def get_all_triples(self, collection_id: UUID) -> list[Triple]: triples = await self.fetch_query(QUERY, [document_ids]) return triples - async def add_communities( - self, communities: list[tuple[int, Any]] - ) -> None: + async def add_communities(self, communities: List[Any]) -> None: QUERY = f""" - INSERT INTO {self._get_table_name("community")} (node, cluster, parent_cluster, level, is_final_cluster, triple_ids) - VALUES ($1, $2, $3, $4, $5, $6) + INSERT INTO {self._get_table_name("community")} (node, cluster, parent_cluster, level, is_final_cluster, triple_ids, collection_id) + VALUES ($1, $2, $3, $4, $5, $6, $7) """ await self.execute_many(QUERY, communities) - async def add_community_report(self, community: Community) -> None: + async def add_community_report( + self, community_report: CommunityReport + ) -> None: - community.embedding = str(community.embedding) + # TODO: Fix in the short term. 
+ # we need to do this because postgres insert needs to be a string + community_report.embedding = str(community_report.embedding) # type: ignore[assignment] non_null_attrs = { - k: v for k, v in community.__dict__.items() if v is not None + k: v for k, v in community_report.__dict__.items() if v is not None } columns = ", ".join(non_null_attrs.keys()) placeholders = ", ".join(f"${i+1}" for i in range(len(non_null_attrs))) conflict_columns = ", ".join( - [ - f"{k} = EXCLUDED.{k}" - for k in non_null_attrs.keys() - ] + [f"{k} = EXCLUDED.{k}" for k in non_null_attrs.keys()] ) QUERY = f""" @@ -499,18 +533,14 @@ async def add_community_report(self, community: Community) -> None: ON CONFLICT (community_number, level, collection_id) DO UPDATE SET {conflict_columns} """ - + await self.execute_many(QUERY, [tuple(non_null_attrs.values())]) async def perform_graph_clustering( self, collection_id: UUID, - leiden_params: dict, # TODO - Add typing for leiden_params - ) -> Tuple[int, int, set[tuple[int, Any]]]: - # TODO: implementing the clustering algorithm but now we will get communities at a document level and then we will get communities at a higher level. - # we will use the Leiden algorithm for this. - # but for now let's skip it and make other stuff work. - # we will need multiple tables for this to work. + leiden_params: Dict[str, Any], + ) -> Tuple[int, int, set[Tuple[int, Any]]]: """ Leiden clustering algorithm to cluster the knowledge graph triples into communities. 
@@ -527,7 +557,7 @@ async def perform_graph_clustering( weight_default: Union[int, float] = 1.0, check_directed: bool = True, """ - settings = {} + settings: Dict[str, Any] = {} triples = await self.get_all_triples(collection_id) logger.info(f"Clustering with settings: {str(settings)}") @@ -546,6 +576,7 @@ async def perform_graph_clustering( ) def triple_ids(node: int) -> list[int]: + # TODO: convert this to objects return [ triple["id"] for triple in triples @@ -555,12 +586,13 @@ def triple_ids(node: int) -> list[int]: # upsert the communities into the database. inputs = [ ( - item.node, + str(item.node), item.cluster, item.parent_cluster, item.level, item.is_final_cluster, triple_ids(item.node), + collection_id, ) for item in hierarchical_communities ] @@ -576,14 +608,16 @@ def triple_ids(node: int) -> list[int]: async def _compute_leiden_communities( self, graph: Any, - leiden_params: dict, # TODO - make serve-side and run-time configuration paradigm - ) -> dict[int, dict[str, int]]: + leiden_params: Dict[str, Any], + ) -> HierarchicalClusters: """Compute Leiden communities.""" try: from graspologic.partition import hierarchical_leiden - if not leiden_params.get("random_seed"): - leiden_params["random_seed"] = 7272 + if "random_seed" not in leiden_params: + leiden_params["random_seed"] = ( + 7272 # add seed to control randomness + ) community_mapping = hierarchical_leiden(graph, **leiden_params) @@ -592,7 +626,9 @@ async def _compute_leiden_communities( except ImportError as e: raise ImportError("Please install the graspologic package.") from e - async def get_community_details(self, community_number: int): + async def get_community_details( + self, community_number: int + ) -> Tuple[int, List[Dict[str, Any]], List[Dict[str, Any]]]: QUERY = f""" SELECT level FROM {self._get_table_name("community")} WHERE cluster = $1 @@ -632,6 +668,198 @@ async def get_community_details(self, community_number: int): return level, entities, triples + # async def client(self): + # 
return None + + async def get_community_reports( + self, collection_id: UUID + ) -> List[CommunityReport]: + QUERY = f""" + SELECT *c FROM {self._get_table_name("community_report")} WHERE collection_id = $1 + """ + return await self.fetch_query(QUERY, [collection_id]) + + async def check_community_reports_exist( + self, collection_id: UUID, offset: int, limit: int + ) -> List[int]: + QUERY = f""" + SELECT distinct community_number FROM {self._get_table_name("community_report")} WHERE collection_id = $1 AND community_number >= $2 AND community_number < $3 + """ + community_numbers = await self.fetch_query( + QUERY, [collection_id, offset, offset + limit] + ) + return [item["community_number"] for item in community_numbers] + + async def delete_graph_for_collection( + self, collection_id: UUID, cascade: bool = False + ) -> None: + + # don't delete if status is PROCESSING. + QUERY = f""" + SELECT kg_enrichment_status FROM {self._get_table_name("collections")} WHERE collection_id = $1 + """ + status = (await self.fetch_query(QUERY, [collection_id]))[0][ + "kg_enrichment_status" + ] + if status == KGExtractionStatus.PROCESSING.value: + return + + # remove all triples for these documents. + QUERY = f""" + DELETE FROM {self._get_table_name("community")} WHERE collection_id = $1; + DELETE FROM {self._get_table_name("community_report")} WHERE collection_id = $1; + """ + + document_ids = await self.db_provider.documents_in_collection( + collection_id + ) + + if cascade: + QUERY += f""" + DELETE FROM {self._get_table_name("entity_raw")} WHERE document_id = ANY($1); + DELETE FROM {self._get_table_name("triple_raw")} WHERE document_id = ANY($1); + DELETE FROM {self._get_table_name("entity_embedding")} WHERE document_id = ANY($1); + """ + + await self.execute_query(QUERY, [document_ids]) + + # set status to PENDING for this collection. 
+ QUERY = f""" + UPDATE {self._get_table_name("collections")} SET kg_enrichment_status = $1 WHERE collection_id = $2 + """ + await self.execute_query( + QUERY, [KGExtractionStatus.PENDING, collection_id] + ) + + + def _get_str_estimation_output(self, x: tuple[Any, Any]) -> str: + if isinstance(x[0], int) and isinstance(x[1], int): + return " - ".join(map(str, x)) + else: + return " - ".join(f"{round(a, 2)}" for a in x) + + async def get_creation_estimate( + self, collection_id: UUID, kg_creation_settings: KGCreationSettings + ) -> KGCreationEstimationResponse: + + # todo: harmonize the document_id and id fields: postgres table contains document_id, but other places use id. + document_ids = [ + doc.id + for doc in ( + await self.db_provider.documents_in_collection(collection_id) + )["results"] + ] + + # TODO: Vecs schema naming got messed up somewhere. + schema_name = self._get_table_name("document_chunks").split(".")[0] + + query = f""" + SELECT document_id, COUNT(*) as chunk_count + FROM {schema_name}.{schema_name} + WHERE document_id = ANY($1) + GROUP BY document_id + """ + + chunk_counts = await self.fetch_query(query, [document_ids]) + + total_chunks = ( + sum(doc["chunk_count"] for doc in chunk_counts) + // kg_creation_settings.extraction_merge_count + ) # 4 chunks per llm + estimated_entities = ( + (total_chunks * 10, total_chunks * 20) + ) # 25 entities per 4 chunks + estimated_triples = ( + int(estimated_entities[0] * 1.25), + int(estimated_entities[1] * 1.5), + ) # Assuming 1.25 triples per entity on average + + estimated_llm_calls = ( + total_chunks * 2 + estimated_entities[0], + total_chunks * 2 + estimated_entities[1], + ) + + total_in_out_tokens = ( + 2000 * estimated_llm_calls[0] // 1000000, + 2000 * estimated_llm_calls[1] // 1000000, + ) # in millions + + estimated_cost = ( + total_in_out_tokens[0] * llm_cost_per_million_tokens(kg_creation_settings.generation_config.model), + total_in_out_tokens[1] * 
llm_cost_per_million_tokens(kg_creation_settings.generation_config.model), + ) + + total_time_in_minutes = ( + total_in_out_tokens[0] * 10 / 60, + total_in_out_tokens[1] * 10 / 60, + ) # 10 minutes per million tokens + + + return KGCreationEstimationResponse( + message="These are estimated ranges, actual values may vary. To run the KG creation process, run `create-graph` with `--run` in the cli, or `run_mode=\"run\"` in the client.", + document_count=len(document_ids), + number_of_jobs_created=len(document_ids) + 1, + total_chunks=total_chunks, + estimated_entities=self._get_str_estimation_output(estimated_entities), + estimated_triples=self._get_str_estimation_output(estimated_triples), + estimated_llm_calls=self._get_str_estimation_output(estimated_llm_calls), + estimated_total_in_out_tokens_in_millions=self._get_str_estimation_output(total_in_out_tokens), + estimated_cost_in_usd=self._get_str_estimation_output(estimated_cost), + estimated_total_time_in_minutes="Depends on your API key tier. Accurate estimate coming soon. Rough estimate: " + self._get_str_estimation_output(total_time_in_minutes), + ) + + async def get_enrichment_estimate( + self, collection_id: UUID, + kg_enrichment_settings: KGEnrichmentSettings + ) -> KGEnrichmentEstimationResponse: + + document_ids = [doc.id for doc in (await self.db_provider.documents_in_collection( + collection_id + ))["results"]] + + QUERY = f""" + SELECT COUNT(*) FROM {self._get_table_name("entity_embedding")} WHERE document_id = ANY($1); + """ + entity_count = (await self.fetch_query(QUERY, [document_ids]))[0][ + "count" + ] + + if not entity_count: + raise ValueError("No entities found in the graph. 
Please run `create-graph` first.") + + QUERY = f""" + SELECT COUNT(*) FROM {self._get_table_name("triple_raw")} WHERE document_id = ANY($1); + """ + triple_count = (await self.fetch_query(QUERY, [document_ids]))[0][ + "count" + ] + + estimated_llm_calls = (entity_count // 10, entity_count // 5) + estimated_total_in_out_tokens_in_millions = ( + 2000 * estimated_llm_calls[0] / 1000000, + 2000 * estimated_llm_calls[1] / 1000000, + ) + cost_per_million_tokens = llm_cost_per_million_tokens(kg_enrichment_settings.generation_config.model) + estimated_cost = ( + estimated_total_in_out_tokens_in_millions[0] * cost_per_million_tokens, + estimated_total_in_out_tokens_in_millions[1] * cost_per_million_tokens, + ) + + estimated_total_time = ( + estimated_total_in_out_tokens_in_millions[0] * 10 / 60, + estimated_total_in_out_tokens_in_millions[1] * 10 / 60, + ) + + return KGEnrichmentEstimationResponse( + message="These are estimated ranges, actual values may vary. To run the KG enrichment process, run `enrich-graph` with `--run` in the cli, or `run_mode=\"run\"` in the client.", + total_entities=entity_count, + total_triples=triple_count, + estimated_llm_calls=self._get_str_estimation_output(estimated_llm_calls), + estimated_total_in_out_tokens_in_millions=self._get_str_estimation_output(estimated_total_in_out_tokens_in_millions), + estimated_cost_in_usd=self._get_str_estimation_output(estimated_cost), + estimated_total_time_in_minutes="Depends on your API key tier. Accurate estimate coming soon. Rough estimate: " + self._get_str_estimation_output(estimated_total_time), + ) + async def create_vector_index(self): # need to implement this. Just call vector db provider's create_vector_index method. # this needs to be run periodically for every collection. 
@@ -647,13 +875,72 @@ async def get_schema(self): async def get_entities( self, - entity_ids: list[str] | None = None, + collection_id: UUID, + offset: int = 0, + limit: int = 100, + entity_ids: Optional[List[str]] = None, with_description: bool = False, - ): - raise NotImplementedError + ) -> dict: + conditions = [] + params = [collection_id] - async def get_triples(self, triple_ids: list[str] | None = None): - raise NotImplementedError + if entity_ids: + conditions.append(f"id = ANY(${len(params) + 1})") + params.append(entity_ids) + + params.extend([offset, limit]) + + query = f""" + SELECT id, name, category, description + FROM {self._get_table_name("entity_raw")} + WHERE document_id = ANY( + SELECT document_id FROM {self._get_table_name("document_info")} + WHERE $1 = ANY(collection_ids) + ) + {" AND " + " AND ".join(conditions) if conditions else ""} + ORDER BY id + OFFSET ${len(params) - 1} LIMIT ${len(params)} + """ + results = await self.fetch_query(query, params) + total_entries = await self.get_entity_count( + collection_id=collection_id + ) + + return {"results": results, "total_entries": total_entries} + + async def get_triples( + self, + collection_id: UUID, + offset: int = 0, + limit: int = 100, + triple_ids: Optional[List[str]] = None, + ) -> dict: + conditions = [] + params = [str(collection_id)] + + if triple_ids: + conditions.append(f"id = ANY(${len(params) + 1})") + params.append([str(ele) for ele in triple_ids]) + + query = f""" + SELECT id, subject, predicate, object + FROM {self._get_table_name("triple_raw")} + WHERE document_id = ANY( + SELECT document_id FROM {self._get_table_name("document_info")} + WHERE $1 = ANY(collection_ids) + ) + {" AND " + " AND ".join(conditions) if conditions else ""} + ORDER BY id + OFFSET ${len(params) + 1} LIMIT ${len(params) + 2} + """ + params.extend([offset, limit]) + + results = await self.fetch_query(query, params) + total_entries = await self.get_triple_count( + collection_id=collection_id + ) + + return 
{"results": results, "total_entries": total_entries} async def structured_query(self): raise NotImplementedError @@ -667,8 +954,68 @@ async def update_kg_search_prompt(self): async def upsert_triples(self): raise NotImplementedError - async def get_entity_count(self, document_id: str) -> int: + async def get_entity_count( + self, + collection_id: Optional[UUID] = None, + document_id: Optional[UUID] = None, + ) -> int: + if collection_id is None and document_id is None: + raise ValueError( + "Either collection_id or document_id must be provided." + ) + + conditions = [] + params = [] + + if collection_id: + conditions.append( + f""" + document_id = ANY( + SELECT document_id FROM {self._get_table_name("document_info")} + WHERE $1 = ANY(collection_ids) + ) + """ + ) + params.append(str(collection_id)) + else: + conditions.append("document_id = $1") + params.append(str(document_id)) + + QUERY = f""" + SELECT COUNT(*) FROM {self._get_table_name("entity_raw")} + WHERE {" AND ".join(conditions)} + """ + return (await self.fetch_query(QUERY, params))[0]["count"] + + async def get_triple_count( + self, + collection_id: Optional[UUID] = None, + document_id: Optional[UUID] = None, + ) -> int: + if collection_id is None and document_id is None: + raise ValueError( + "Either collection_id or document_id must be provided." 
+ ) + + conditions = [] + params = [] + + if collection_id: + conditions.append( + f""" + document_id = ANY( + SELECT document_id FROM {self._get_table_name("document_info")} + WHERE $1 = ANY(collection_ids) + ) + """ + ) + params.append(str(collection_id)) + else: + conditions.append("document_id = $1") + params.append(str(document_id)) + QUERY = f""" - SELECT COUNT(*) FROM {self._get_table_name("entity_raw")} WHERE document_id = $1 + SELECT COUNT(*) FROM {self._get_table_name("triple_raw")} + WHERE {" AND ".join(conditions)} """ - return (await self.fetch_query(QUERY, [document_id]))[0]["count"] + return (await self.fetch_query(QUERY, params))[0]["count"] diff --git a/py/core/providers/prompts/r2r_prompts.py b/py/core/providers/prompts/r2r_prompts.py index 170356493..3fc882618 100644 --- a/py/core/providers/prompts/r2r_prompts.py +++ b/py/core/providers/prompts/r2r_prompts.py @@ -8,7 +8,7 @@ import yaml from core.base import DatabaseProvider, Prompt, PromptConfig, PromptProvider -from core.base.utils import generate_id_from_label +from core.base.utils import generate_default_prompt_id logger = logging.getLogger(__name__) @@ -203,7 +203,7 @@ async def add_prompt( modify_created_at: bool = False, ) -> None: prompt = Prompt( - prompt_id=generate_id_from_label(name), + prompt_id=generate_default_prompt_id(name), name=name, template=template, input_types=input_types, @@ -299,7 +299,7 @@ async def _save_prompt_to_database( await self.execute_query( query, [ - generate_id_from_label(prompt.name), + generate_default_prompt_id(prompt.name), prompt.name, prompt.template, json.dumps(prompt.input_types), diff --git a/py/core/utils/__init__.py b/py/core/utils/__init__.py index e43dc06c9..25b1b161a 100644 --- a/py/core/utils/__init__.py +++ b/py/core/utils/__init__.py @@ -4,9 +4,13 @@ format_relations, format_search_results_for_llm, format_search_results_for_stream, - generate_id_from_label, + generate_collection_id_from_name, + generate_default_user_collection_id, + 
generate_document_id, + generate_extraction_id, + generate_message_id, generate_run_id, - generate_user_document_id, + generate_user_id, increment_version, run_pipeline, to_async_generator, @@ -19,15 +23,19 @@ __all__ = [ "format_entity_types", "format_relations", - "generate_id_from_label", "format_search_results_for_stream", "format_search_results_for_llm", "generate_run_id", - "generate_user_document_id", + "generate_document_id", + "generate_extraction_id", + "generate_message_id", + "generate_collection_id_from_name", + "generate_user_id", "increment_version", "decrement_version", "run_pipeline", "to_async_generator", + "generate_default_user_collection_id", # Text splitter "RecursiveCharacterTextSplitter", "TextSplitter", diff --git a/py/poetry.lock b/py/poetry.lock index 488c2c690..0b1fd57a2 100644 --- a/py/poetry.lock +++ b/py/poetry.lock @@ -5610,4 +5610,4 @@ ingestion-bundle = ["beautifulsoup4", "markdown", "numpy", "openpyxl", "pypdf", [metadata] lock-version = "2.0" python-versions = ">=3.10,<3.13" -content-hash = "5c69411b316396adbf4267e0a9f531732d1758e58e8c37c27ca1581f3bcb4c00" \ No newline at end of file +content-hash = "5c69411b316396adbf4267e0a9f531732d1758e58e8c37c27ca1581f3bcb4c00" diff --git a/py/sdk/auth.py b/py/sdk/auth.py index 2ab8bace8..fc2716649 100644 --- a/py/sdk/auth.py +++ b/py/sdk/auth.py @@ -1,4 +1,5 @@ -from typing import Optional +from typing import Optional, Union +from uuid import UUID from .models import Token, UserResponse @@ -78,7 +79,9 @@ async def user(client) -> UserResponse: @staticmethod async def update_user( client, + user_id: Union[str, UUID], email: Optional[str] = None, + is_superuser: Optional[bool] = None, name: Optional[str] = None, bio: Optional[str] = None, profile_picture: Optional[str] = None, @@ -87,7 +90,9 @@ async def update_user( Updates the profile information for the currently authenticated user. Args: + user_id (Union[str, UUID]): The ID of the user to update. 
email (str, optional): The updated email for the user. + is_superuser (bool, optional): The updated superuser status for the user. name (str, optional): The updated name for the user. bio (str, optional): The updated bio for the user. profile_picture (str, optional): The updated profile picture URL for the user. @@ -96,7 +101,9 @@ async def update_user( UserResponse: The response from the server. """ data = { + "user_id": user_id, "email": email, + "is_superuser": is_superuser, "name": name, "bio": bio, "profile_picture": profile_picture, diff --git a/py/sdk/kg.py b/py/sdk/kg.py index 33a0f70e9..9ae1f1372 100644 --- a/py/sdk/kg.py +++ b/py/sdk/kg.py @@ -1,11 +1,13 @@ import json from typing import Optional, Union +from uuid import UUID from .models import ( KGCreationResponse, KGCreationSettings, KGEnrichmentResponse, KGEnrichmentSettings, + KGRunType, ) @@ -14,7 +16,8 @@ class KGMethods: @staticmethod async def create_graph( client, - collection_id: str, + collection_id: Optional[UUID] = None, + run_type: KGRunType = KGRunType.ESTIMATE, kg_creation_settings: Optional[Union[dict, KGCreationSettings]] = None, ) -> KGCreationResponse: """ @@ -26,18 +29,22 @@ async def create_graph( kg_creation_settings = {} data = { - "collection_id": collection_id, + "run_type": run_type, "kg_creation_settings": json.dumps(kg_creation_settings), } + if collection_id: + data["collection_id"] = collection_id + return await client._make_request("POST", "create_graph", json=data) @staticmethod async def enrich_graph( client, - collection_id: str, + collection_id: Optional[UUID] = None, + run_type: KGRunType = KGRunType.ESTIMATE, kg_enrichment_settings: Optional[ - Union[KGEnrichmentSettings, dict] + Union[dict, KGEnrichmentSettings] ] = None, ) -> KGEnrichmentResponse: """ @@ -55,8 +62,74 @@ async def enrich_graph( kg_enrichment_settings = {} data = { - "collection_id": collection_id, "kg_enrichment_settings": json.dumps(kg_enrichment_settings), + "run_type": run_type, } + if 
collection_id: + data["collection_id"] = collection_id + return await client._make_request("POST", "enrich_graph", json=data) + + @staticmethod + async def get_entities( + client, + collection_id: str, + offset: int = 0, + limit: int = 100, + entity_ids: Optional[list[str]] = None, + with_description: bool = False, + ) -> dict: + """ + Retrieve entities from the knowledge graph. + + Args: + collection_id (str): The ID of the collection to retrieve entities from. + offset (int): The offset for pagination. + limit (int): The limit for pagination. + entity_ids (Optional[List[str]]): Optional list of entity IDs to filter by. + with_description (bool): Whether to include entity descriptions in the response. + + Returns: + dict: A dictionary containing the retrieved entities and total count. + """ + params = { + "collection_id": collection_id, + "offset": offset, + "limit": limit, + "with_description": with_description, + } + if entity_ids: + params["entity_ids"] = ",".join(entity_ids) + + return await client._make_request("GET", "entities", params=params) + + @staticmethod + async def get_triples( + client, + collection_id: str, + offset: int = 0, + limit: int = 100, + triple_ids: Optional[list[str]] = None, + ) -> dict: + """ + Retrieve triples from the knowledge graph. + + Args: + collection_id (str): The ID of the collection to retrieve triples from. + offset (int): The offset for pagination. + limit (int): The limit for pagination. + triple_ids (Optional[List[str]]): Optional list of triple IDs to filter by. + + Returns: + dict: A dictionary containing the retrieved triples and total count. 
+ """ + params = { + "collection_id": collection_id, + "offset": offset, + "limit": limit, + } + if triple_ids: + params["triple_ids"] = ",".join(triple_ids) + + return await client._make_request("GET", "triples", params=params) diff --git a/py/sdk/models.py b/py/sdk/models.py index c17c3ee93..0b40bd0a7 100644 --- a/py/sdk/models.py +++ b/py/sdk/models.py @@ -7,6 +7,7 @@ KGEntityResult, KGGlobalResult, KGRelationshipResult, + KGRunType, KGSearchMethod, KGSearchResult, KGSearchResultType, diff --git a/py/shared/abstractions/__init__.py b/py/shared/abstractions/__init__.py index 7bfb9a711..6cb7b57ca 100644 --- a/py/shared/abstractions/__init__.py +++ b/py/shared/abstractions/__init__.py @@ -7,7 +7,7 @@ DocumentInfo, DocumentType, IngestionStatus, - KGCreationStatus, + KGExtractionStatus, KGEnrichmentStatus, ) from .embedding import EmbeddingPurpose, default_embedding_prefixes @@ -21,7 +21,13 @@ RelationshipType, Triple, ) -from .kg import KGCreationSettings, KGEnrichmentSettings +from .kg import ( + KGCreationEstimationResponse, + KGCreationSettings, + KGEnrichmentEstimationResponse, + KGEnrichmentSettings, + KGRunType, +) from .llm import ( GenerationConfig, LLMChatCompletion, @@ -61,7 +67,7 @@ "DocumentExtraction", "DocumentInfo", "IngestionStatus", - "KGCreationStatus", + "KGExtractionStatus", "KGEnrichmentStatus", "DocumentType", # Embedding abstractions @@ -99,9 +105,13 @@ "VectorSearchResult", "VectorSearchSettings", "HybridSearchSettings", - # Restructure abstractions + # KG abstractions "KGCreationSettings", "KGEnrichmentSettings", + "KGEnrichmentEstimationResponse", + "KGExtraction", + "KGCreationEstimationResponse", + "KGRunType", # User abstractions "Token", "TokenData", diff --git a/py/shared/abstractions/document.py b/py/shared/abstractions/document.py index 87714cb6c..84fae5554 100644 --- a/py/shared/abstractions/document.py +++ b/py/shared/abstractions/document.py @@ -120,7 +120,7 @@ class IngestionStatus(str, Enum): SUCCESS = "success" -class 
KGCreationStatus(str, Enum): +class KGExtractionStatus(str, Enum): """Status of KG Creation per document.""" PENDING = "pending" @@ -150,7 +150,7 @@ class DocumentInfo(R2RSerializable): version: str size_in_bytes: int ingestion_status: IngestionStatus = IngestionStatus.PENDING - kg_extraction_status: KGCreationStatus = KGCreationStatus.PENDING + kg_extraction_status: KGExtractionStatus = KGExtractionStatus.PENDING created_at: Optional[datetime] = None updated_at: Optional[datetime] = None ingestion_attempt_number: Optional[int] = None diff --git a/py/shared/abstractions/graph.py b/py/shared/abstractions/graph.py index 6af5e20c5..253fe8a7f 100644 --- a/py/shared/abstractions/graph.py +++ b/py/shared/abstractions/graph.py @@ -3,6 +3,7 @@ import uuid from dataclasses import dataclass from typing import Any, Optional, Union +from uuid import UUID from pydantic import BaseModel @@ -45,17 +46,19 @@ class RelationshipType(R2RSerializable): class Entity(R2RSerializable): """An entity extracted from a document.""" - id: Optional[int] = None - category: str name: str + id: Optional[int] = None + category: Optional[str] = None description: Optional[str] = None description_embedding: Optional[list[float]] = None - name_embedding: Optional[list[float]] = None - graph_embedding: Optional[list[float]] = None community_numbers: Optional[list[str]] = None - extraction_ids: Optional[list[str]] = None - document_id: Optional[str] = None - rank: Optional[int] = None + extraction_ids: Optional[list[UUID]] = None + collection_id: Optional[UUID] = None + document_id: Optional[UUID] = None + # we don't use these yet + # name_embedding: Optional[list[float]] = None + # graph_embedding: Optional[list[float]] = None + # rank: Optional[int] = None attributes: Optional[Union[dict[str, Any], str]] = None def __str__(self): @@ -98,10 +101,10 @@ class Triple(BaseModel): predicate_embedding: list[float] | None = None """The semantic embedding for the relationship description (optional).""" - 
extraction_ids: list[str] = [] + extraction_ids: list[UUID] = [] """List of text unit IDs in which the relationship appears (optional).""" - document_id: str | None = None + document_id: UUID | None = None """Document ID in which the relationship appears (optional).""" attributes: dict[str, Any] | str = {} @@ -177,7 +180,7 @@ class Community(BaseModel): rank: float | None = 1.0 """Rank of the report, used for sorting (optional). Higher means more important""" - summary_embedding: list[float] | None = None + embedding: list[float] | None = None """The semantic (i.e. text) embedding of the report summary (optional).""" full_content_embedding: list[float] | None = None diff --git a/py/shared/abstractions/kg.py b/py/shared/abstractions/kg.py index ca6c15559..2561b51a9 100644 --- a/py/shared/abstractions/kg.py +++ b/py/shared/abstractions/kg.py @@ -1,12 +1,118 @@ +from enum import Enum + from pydantic import Field from .base import R2RSerializable from .llm import GenerationConfig +class KGRunType(Enum): + """Type of KG run.""" + + ESTIMATE = "estimate" + RUN = "run" + + +class KGCreationEstimationResponse(R2RSerializable): + """Response for knowledge graph creation estimation.""" + + message: str = Field( + default="", + description="The message to display to the user.", + ) + + document_count: int = Field( + default=-1, + description="The number of documents in the collection.", + ) + + number_of_jobs_created: int = Field( + default=-1, + description="The number of jobs created for the graph creation process.", + ) + + total_chunks: int = Field( + default=-1, + description="The estimated total number of chunks.", + ) + + estimated_entities: str = Field( + default="NA", + description="The estimated number of entities in the graph.", + ) + + estimated_triples: str = Field( + default="NA", + description="The estimated number of triples in the graph.", + ) + + estimated_llm_calls: str = Field( + default="NA", + description="The estimated number of LLM calls in millions.", 
+ ) + + estimated_total_in_out_tokens_in_millions: str = Field( + default="NA", + description="The estimated total number of input and output tokens in millions.", + ) + + estimated_total_time_in_minutes: str = Field( + default="NA", + description="The estimated total time to run the graph creation process in minutes.", + ) + + estimated_cost_in_usd: str = Field( + default="NA", + description="The estimated cost to run the graph creation process in USD.", + ) + + +class KGEnrichmentEstimationResponse(R2RSerializable): + """Response for knowledge graph enrichment estimation.""" + + total_entities: int = Field( + default=-1, + description="The total number of entities in the graph.", + ) + + total_triples: int = Field( + default=-1, + description="The total number of triples in the graph.", + ) + + estimated_llm_calls: str = Field( + default="NA", + description="The estimated number of LLM calls.", + ) + + estimated_total_in_out_tokens_in_millions: str = Field( + default="NA", + description="The estimated total number of input and output tokens in millions.", + ) + + estimated_total_time_in_minutes: str = Field( + default="NA", + description="The estimated total time to run the graph enrichment process.", + ) + + estimated_cost_in_usd: str = Field( + default="NA", + description="The estimated cost to run the graph enrichment process.", + ) + + estimated_total_time_in_minutes: str = Field( + default="NA", + description="The estimated total time to run the graph enrichment process.", + ) + class KGCreationSettings(R2RSerializable): """Settings for knowledge graph creation.""" + run_mode: KGRunType = Field( + default=KGRunType.ESTIMATE, # or run + description="Run an estimate for the full graph creation process.", + ) + kg_triples_extraction_prompt: str = Field( default="graphrag_triples_extraction_few_shot", description="The prompt to use for knowledge graph extraction.", @@ -51,6 +157,11 @@ class KGCreationSettings(R2RSerializable): class 
KGEnrichmentSettings(R2RSerializable): """Settings for knowledge graph enrichment.""" + run_mode: str = Field( + default="estimate", # or run + description="Run an estimate for the full graph enrichment process.", + ) + skip_clustering: bool = Field( default=False, description="Whether to skip leiden clustering on the graph or not.", diff --git a/py/shared/abstractions/search.py b/py/shared/abstractions/search.py index 1a2c5b1cf..b0862e8a6 100644 --- a/py/shared/abstractions/search.py +++ b/py/shared/abstractions/search.py @@ -128,8 +128,7 @@ class KGSearchResult(R2RSerializable): KGEntityResult, KGRelationshipResult, KGCommunityResult, KGGlobalResult ] result_type: Optional[KGSearchResultType] = None - ids: Optional[list[UUID]] = None - document_ids: Optional[list[UUID]] = None + extraction_ids: Optional[list[UUID]] = None metadata: dict[str, Any] = {} class Config: @@ -137,8 +136,7 @@ class Config: "method": "local", "content": KGEntityResult.Config.json_schema_extra, "result_type": "entity", - "ids": ["c68dc72e-fc23-5452-8f49-d7bd46088a96"], - "document_ids": ["3e157b3a-8469-51db-90d9-52e7d896b49b"], + "extraction_ids": ["c68dc72e-fc23-5452-8f49-d7bd46088a96"], "metadata": {"associated_query": "What is the capital of France?"}, } @@ -287,6 +285,17 @@ def model_dump(self, *args, **kwargs): class KGSearchSettings(R2RSerializable): + + filters: dict[str, Any] = Field( + default_factory=dict, + description="Filters to apply to the KG search", + ) + + selected_collection_ids: list[UUID] = Field( + default_factory=list, + description="Collection IDs to search for", + ) + graphrag_map_system_prompt: str = Field( default="graphrag_map_system_prompt", description="The system prompt for the graphrag map prompt.", @@ -310,6 +319,8 @@ class KGSearchSettings(R2RSerializable): default_factory=GenerationConfig, description="Configuration for text generation during graph search.", ) + + # TODO: add these back in # entity_types: list = [] # relationships: list = [] diff --git 
a/py/shared/abstractions/user.py b/py/shared/abstractions/user.py index 9b2bc66ef..7cdaa625e 100644 --- a/py/shared/abstractions/user.py +++ b/py/shared/abstractions/user.py @@ -4,7 +4,7 @@ from pydantic import BaseModel, Field -from ..utils import generate_id_from_label +from ..utils import generate_default_user_collection_id class Collection(BaseModel): @@ -20,7 +20,7 @@ class Config: def __init__(self, **data): super().__init__(**data) if self.id is None: - self.id = generate_id_from_label(self.name) + self.id = generate_default_user_collection_id(self.name) class Token(BaseModel): diff --git a/py/shared/utils/__init__.py b/py/shared/utils/__init__.py index 8fe601043..7bf87e669 100644 --- a/py/shared/utils/__init__.py +++ b/py/shared/utils/__init__.py @@ -4,29 +4,41 @@ format_relations, format_search_results_for_llm, format_search_results_for_stream, + generate_collection_id_from_name, + generate_default_prompt_id, generate_default_user_collection_id, - generate_id_from_label, + generate_document_id, + generate_extraction_id, + generate_message_id, generate_run_id, - generate_user_document_id, + generate_user_id, increment_version, run_pipeline, to_async_generator, + llm_cost_per_million_tokens, ) from .splitter.text import RecursiveCharacterTextSplitter, TextSplitter __all__ = [ "format_entity_types", "format_relations", - "generate_id_from_label", "format_search_results_for_stream", "format_search_results_for_llm", + # ID generation "generate_run_id", - "generate_user_document_id", + "generate_document_id", + "generate_extraction_id", "generate_default_user_collection_id", + "generate_user_id", + "generate_collection_id_from_name", + "generate_message_id", + "generate_default_prompt_id", + # Other "increment_version", "decrement_version", "run_pipeline", "to_async_generator", + "llm_cost_per_million_tokens", # Text splitter "RecursiveCharacterTextSplitter", "TextSplitter", diff --git a/py/shared/utils/base_utils.py b/py/shared/utils/base_utils.py index 
7cef3b9cb..aaabfe9c2 100644 --- a/py/shared/utils/base_utils.py +++ b/py/shared/utils/base_utils.py @@ -1,6 +1,7 @@ import asyncio import json import logging +from datetime import datetime from typing import TYPE_CHECKING, Any, AsyncGenerator, Iterable from uuid import NAMESPACE_DNS, UUID, uuid4, uuid5 @@ -102,24 +103,68 @@ def format_search_results_for_stream( from ..pipeline.base_pipeline import AsyncPipeline +def _generate_id_from_label(label: str) -> UUID: + return uuid5(NAMESPACE_DNS, label) + + def generate_run_id() -> UUID: - return uuid5(NAMESPACE_DNS, str(uuid4())) + """ + Generates a unique run id + """ + return _generate_id_from_label(str(uuid4())) -def generate_id_from_label(label: str) -> UUID: - return uuid5(NAMESPACE_DNS, label) +def generate_document_id(filename: str, user_id: UUID) -> UUID: + """ + Generates a unique document id from a given filename and user id + """ + return _generate_id_from_label(f'{filename.split("/")[-1]}-{str(user_id)}') + + +def generate_extraction_id( + document_id: UUID, iteration: int = 0, version: str = "0" +) -> UUID: + """ + Generates a unique extraction id from a given document id and iteration + """ + return _generate_id_from_label(f"{str(document_id)}-{iteration}-{version}") def generate_default_user_collection_id(user_id: UUID) -> UUID: - """Generate the default collection ID for a user.""" - return generate_id_from_label(f"{user_id}") + """ + Generates a unique collection id from a given user id + """ + return _generate_id_from_label(str(user_id)) -def generate_user_document_id(filename: str, user_id: UUID) -> UUID: +def generate_collection_id_from_name(collection_name: str) -> UUID: """ - Generates a unique document id from a given filename and user id + Generates a unique collection id from a given collection name + """ + return _generate_id_from_label(collection_name) + + +def generate_user_id(email: str) -> UUID: + """ + Generates a unique user id from a given email + """ + return 
_generate_id_from_label(email) + + +def generate_message_id(query: str, completion_start_time: datetime) -> UUID: + """ + Generates a unique message id from a given query and completion start time + """ + return _generate_id_from_label( + f"{query}-{completion_start_time.isoformat()}" + ) + + +def generate_default_prompt_id(prompt_name: str) -> UUID: + """ + Generates a unique prompt id """ - return generate_id_from_label(f'{filename.split("/")[-1]}-{str(user_id)}') + return _generate_id_from_label(prompt_name) async def to_async_generator( @@ -162,3 +207,26 @@ def format_entity_types(entity_types: list[EntityType]) -> str: def format_relations(predicates: list[RelationshipType]) -> str: lines = [predicate.name for predicate in predicates] return "\n".join(lines) + +def llm_cost_per_million_tokens(model: str, input_output_ratio: float = 2) -> float: + """ + Returns the cost per million tokens for a given model and input/output ratio. + + Input/Output ratio is the ratio of input tokens to output tokens. + + """ + + # improving this to use provider in the future + + model = model.split("/")[-1] # simplifying assumption + cost_dict = { + "gpt-4o-mini": (0.15, 0.6), + "gpt-4o": (2.5, 10), + } + + if model in cost_dict: + return (cost_dict[model][0] * input_output_ratio + cost_dict[model][1])/(1 + input_output_ratio) + else: + # use gpt-4o as default + logger.warning(f"Unknown model: {model}. Using gpt-4o as default.") + return (cost_dict["gpt-4o"][0] * input_output_ratio + cost_dict["gpt-4o"][1])/(1 + input_output_ratio) \ No newline at end of file diff --git a/py/tests/conftest.py b/py/tests/conftest.py index 84def65c4..e8a3b83ca 100644 --- a/py/tests/conftest.py +++ b/py/tests/conftest.py @@ -16,7 +16,6 @@ LoggingConfig, Vector, VectorEntry, - generate_id_from_label, ) from core.providers import ( BCryptProvider, diff --git a/py/tests/core/providers/database/relational/test_document_db.py b/py/tests/core/providers/database/relational/test_document_db.py index add3aa7f4..459d1f842 100644 --- a/py/tests/core/providers/database/relational/test_document_db.py +++ b/py/tests/core/providers/database/relational/test_document_db.py @@ -7,7 +7,7 @@ DocumentInfo, DocumentType, IngestionStatus, - KGCreationStatus, + KGExtractionStatus, KGEnrichmentStatus, ) @@ -31,7 +31,7 @@ async def test_upsert_documents_overview(temporary_postgres_db_provider): version="1.0", size_in_bytes=1024, ingestion_status=IngestionStatus.PENDING, - kg_extraction_status=KGCreationStatus.PENDING, + kg_extraction_status=KGExtractionStatus.PENDING, ) await temporary_postgres_db_provider.relational.upsert_documents_overview( document_info @@ -89,7 +89,7 @@ async def test_delete_from_documents_overview(temporary_postgres_db_provider): version="1.0", size_in_bytes=1024, ingestion_status=IngestionStatus.PENDING, - kg_extraction_status=KGCreationStatus.PENDING, + kg_extraction_status=KGExtractionStatus.PENDING, ) await temporary_postgres_db_provider.relational.upsert_documents_overview( document_info @@ -120,7 +120,7 @@ async def test_get_documents_overview(temporary_postgres_db_provider): version="1.0", size_in_bytes=1024, ingestion_status=IngestionStatus.PENDING, - kg_extraction_status=KGCreationStatus.PENDING, + kg_extraction_status=KGExtractionStatus.PENDING, ) document_info2 = DocumentInfo( id=UUID("00000000-0000-0000-0000-000000000004"), @@ -132,7 +132,7 @@ async def 
test_get_documents_overview(temporary_postgres_db_provider): version="1.0", size_in_bytes=2048, ingestion_status=IngestionStatus.SUCCESS, - kg_extraction_status=KGCreationStatus.PENDING, + kg_extraction_status=KGExtractionStatus.PENDING, ) await temporary_postgres_db_provider.relational.upsert_documents_overview( [document_info1, document_info2] diff --git a/services/unstructured/main.py b/services/unstructured/main.py index c76327936..70e65bfa8 100644 --- a/services/unstructured/main.py +++ b/services/unstructured/main.py @@ -4,7 +4,7 @@ import logging import os from io import BytesIO -from typing import Dict, List, Optional +from typing import Dict, List from fastapi import FastAPI, HTTPException from pydantic import BaseModel diff --git a/templates/agentic_rag_chatbot/web-app/public/data/r2r-installation.html b/templates/agentic_rag_chatbot/web-app/public/data/r2r-installation.html index 6bbf10318..07783fdf9 100644 --- a/templates/agentic_rag_chatbot/web-app/public/data/r2r-installation.html +++ b/templates/agentic_rag_chatbot/web-app/public/data/r2r-installation.html @@ -71,8 +71,8 @@

export POSTGRES_HOST=$YOUR_POSTGRES_HOST export POSTGRES_PORT=$YOUR_POSTGRES_PORT export POSTGRES_DBNAME=$YOUR_POSTGRES_DBNAME - export POSTGRES_VECS_COLLECTION=$YOUR_VECS_COLLECTION # see note below -

The POSTGRES_VECS_COLLECTION environment variable defines the collection within your Postgres database where R2R related tables reside. If the specified collection does not exist then it will be created by R2R during initialization.

+

For more information on getting started with R2R, check out the quickstart and the walkthrough for even greater depth.

diff --git a/templates/ycombinator_graphrag/python-backend/main.py b/templates/ycombinator_graphrag/python-backend/main.py index d45eb2510..3850f6e80 100644 --- a/templates/ycombinator_graphrag/python-backend/main.py +++ b/templates/ycombinator_graphrag/python-backend/main.py @@ -33,9 +33,6 @@ def get_dataset(dataset_name, save_folder = '.data', split = "train", column_nam f.write(item[column_name]) yield file_path -def generate_id_from_label(label: str) -> uuid.UUID: - return uuid.uuid5(uuid.NAMESPACE_DNS, label) - def wait_till_ready(status_var, status_value): while True: documents_overview = client.documents_overview(limit=1000)['results'] diff --git a/templates/ycombinator_graphrag/web-app/public/data/r2r-installation.html b/templates/ycombinator_graphrag/web-app/public/data/r2r-installation.html index 6bbf10318..07783fdf9 100644 --- a/templates/ycombinator_graphrag/web-app/public/data/r2r-installation.html +++ b/templates/ycombinator_graphrag/web-app/public/data/r2r-installation.html @@ -71,8 +71,8 @@

export POSTGRES_HOST=$YOUR_POSTGRES_HOST export POSTGRES_PORT=$YOUR_POSTGRES_PORT export POSTGRES_DBNAME=$YOUR_POSTGRES_DBNAME - export POSTGRES_VECS_COLLECTION=$YOUR_VECS_COLLECTION # see note below -

The POSTGRES_VECS_COLLECTION environment variable defines the collection within your Postgres database where R2R related tables reside. If the specified collection does not exist then it will be created by R2R during initialization.

+

For more information on getting started with R2R, check out the quickstart and the walkthrough for even greater depth.