moving kg construction to enrich-graph #984
Merged
👍 Looks good to me! Reviewed everything up to 48a441d in 52 seconds
More details
- Looked at 539 lines of code in 14 files
- Skipped 0 files when reviewing.
- Skipped posting 4 drafted comments based on config settings.
1. py/cli/utils/docker_utils.py:124
- Draft comment:
Changing the default value of `click.confirm` from `False` to `True` alters the user interaction flow, making it more likely for users to proceed without explicit confirmation. Ensure this change aligns with the intended user experience.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is about a change made in the diff, specifically the change in the default value of `click.confirm`. However, the comment is speculative, asking to ensure the change aligns with the intended user experience, which violates the rule against speculative comments. The change itself is clear and does not require further confirmation or explanation from the PR author.
I might be overlooking the potential impact of the change on user experience, but the comment does not provide a clear, actionable code change suggestion.
The rules clearly state not to make speculative comments or ask for confirmation of intentions. The change is straightforward, and the comment does not suggest a specific code change.
The comment should be removed because it is speculative and does not suggest a specific code change.
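To make the behavioural difference concrete, here is a minimal stand-in for a yes/no prompt, assuming (as with `click.confirm`) that an empty answer falls back to `default`. The `confirm` function and its `read` parameter are hypothetical and only mimic the prompt; they are not the code in `docker_utils.py`.

```python
from typing import Callable


def confirm(text: str, default: bool = False,
            read: Callable[[str], str] = input) -> bool:
    """Hypothetical stand-in for a yes/no prompt with a default answer."""
    hint = "Y/n" if default else "y/N"
    answer = read(f"{text} [{hint}]: ").strip().lower()
    if not answer:
        # Pressing Enter without typing anything selects the default.
        return default
    # Simplification: anything other than y/yes counts as "no".
    return answer in ("y", "yes")


def press_enter(prompt: str) -> str:
    """Simulate a user who just presses Enter."""
    return ""


old_behaviour = confirm("Continue?", default=False, read=press_enter)
new_behaviour = confirm("Continue?", default=True, read=press_enter)
```

With the default flipped, a user who presses Enter without reading the prompt proceeds instead of stopping, which is the interaction change the draft comment describes.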
2. py/core/providers/database/vector.py:470
- Draft comment:
The `get_document_chunks` method now allows unlimited results by setting `limit` to -1. Ensure that the calling code can handle large result sets appropriately to avoid performance issues.
- Reason this comment was not posted:
Confidence changes required: 50%
The `get_document_chunks` method in `vector.py` has been modified to allow unlimited results by setting `limit` to -1. This change is reflected in the SQL query construction, where the `LIMIT` clause is conditionally included. This is a good practice for flexibility, but it should be ensured that the calling code handles potentially large result sets appropriately.
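A conditionally included `LIMIT` clause can be sketched as follows. This is an illustration under stated assumptions, not the code in `vector.py`: the `chunks` table, its columns, the `build_chunk_query` helper, and the default of 100 are all hypothetical, and SQLite stands in for the real database.

```python
import sqlite3
from typing import List, Tuple


def build_chunk_query(limit: int = 100) -> Tuple[str, List[int]]:
    """Return the query text and extra parameters; limit == -1 omits LIMIT."""
    query = (
        "SELECT chunk_id, text FROM chunks "
        "WHERE document_id = ? ORDER BY chunk_id"
    )
    params: List[int] = []
    if limit != -1:
        # Bounded case: append the clause and bind the value as a parameter.
        query += " LIMIT ?"
        params.append(limit)
    return query, params


def get_document_chunks(conn: sqlite3.Connection, document_id: str,
                        limit: int = 100) -> list:
    """Hypothetical stand-in for the method discussed in the draft comment."""
    query, extra = build_chunk_query(limit)
    return conn.execute(query, [document_id, *extra]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (chunk_id INTEGER, document_id TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?)",
    [(i, "doc-1", f"chunk {i}") for i in range(5)],
)

bounded = get_document_chunks(conn, "doc-1", limit=2)     # first two chunks
unbounded = get_document_chunks(conn, "doc-1", limit=-1)  # every chunk
```

Keeping a bounded default and requiring callers to pass -1 explicitly means a large result set is always an opt-in choice.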
3. py/core/parsers/media/openai_helpers.py:43
- Draft comment:
Consider using an asynchronous HTTP client for `process_audio_with_openai` to improve performance, especially if this function is called frequently or if network latency is high.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is relevant to the function that was changed in the diff, as it suggests a potential improvement in performance. The suggestion is actionable and clear, focusing on code quality and performance enhancement. It does not violate any of the rules for review comments.
The comment does not directly address the specific change made in the diff, which was the URL update. However, it is still relevant to the function's performance, which is a valid concern.
While the comment does not address the URL change, it provides a valuable suggestion for improving the function's performance, which is within the scope of code quality refactors.
Keep the comment as it provides a useful suggestion for improving the performance of the function, which is relevant to the changes made in the diff.
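One way to picture the suggestion, without assuming any particular HTTP library, is to offload the blocking call to worker threads so that several requests wait concurrently. Note that this sketch uses `asyncio.to_thread` rather than a true asynchronous HTTP client, and `transcribe_blocking` is a hypothetical stand-in that sleeps instead of calling any API.

```python
import asyncio
import time
from typing import List


def transcribe_blocking(path: str) -> str:
    """Hypothetical stand-in for a blocking HTTP request (sleeps, no network)."""
    time.sleep(0.1)
    return f"transcript of {path}"


async def transcribe_many(paths: List[str]) -> List[str]:
    # Each blocking call runs in a worker thread, so the waits overlap
    # instead of adding up one after another.
    tasks = [asyncio.to_thread(transcribe_blocking, p) for p in paths]
    return list(await asyncio.gather(*tasks))


results = asyncio.run(transcribe_many(["a.mp3", "b.mp3", "c.mp3"]))
```

`asyncio.gather` returns results in the order the tasks were passed in, so callers can still match each transcript to its input file.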
4. py/core/providers/kg/neo4j/provider.py:203
- Draft comment:
The `upsert_nodes_and_relationships` method now returns a tuple of lengths of nodes and relationships upserted. Ensure that any calling code is updated to handle this return type appropriately.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is about a change made in the diff, specifically the return type of a method. However, it asks the author to ensure that calling code is updated, which is speculative and not actionable. The comment does not suggest a specific code change within the diff itself.
I might be overlooking the importance of ensuring that calling code is updated, but the comment does not provide a specific action to take within the diff.
The rules specify not to ask the author to ensure behavior is intended or to double-check things, which this comment does.
The comment should be removed because it asks the author to ensure something outside the scope of the diff and does not suggest a specific code change.
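For readers unfamiliar with the pattern, a tuple-of-counts return value looks roughly like this. The in-memory `store` dictionary and the record shapes are hypothetical; only the method name and the "tuple of lengths" return type come from the draft comment, and the real provider writes to Neo4j.

```python
from typing import Dict, List, Tuple


def upsert_nodes_and_relationships(
    store: Dict[str, dict], nodes: List[dict], relationships: List[dict]
) -> Tuple[int, int]:
    """Hypothetical in-memory stand-in: upsert by id, return the counts handled."""
    for item in [*nodes, *relationships]:
        store[item["id"]] = item  # insert, or overwrite an existing id
    return len(nodes), len(relationships)


store: Dict[str, dict] = {}
nodes = [{"id": "n1", "name": "Alice"}, {"id": "n2", "name": "Bob"}]
rels = [{"id": "r1", "source": "n1", "target": "n2", "type": "KNOWS"}]

# Callers unpack the tuple rather than treating the result as a single value.
node_count, rel_count = upsert_nodes_and_relationships(store, nodes, rels)
```

Any caller that previously ignored the return value keeps working; only callers that relied on a different return type would need to change.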
Workflow ID: wflow_1wqBv2RATZ1HLSPk
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.
emrgnt-cmplxty added a commit that referenced this pull request on Aug 27, 2024
* moving kg construction to enrich-graph (#984)
* checkin
* up
* done
* formatting
* Feature/update ingestion issues (#985)
* udpate ingestion issues
* keep unbounded limit support, but default to bounded
* fix
* fmt
* Add support for CharacterTextSplitter (#986)
* Add support for CharacterTextSplitter

Allows R2R client to override the text splitter. Example:

```python
ingestion_response = client.ingest_files(
    file_paths=[file_path],
    metadatas=metadata,
    # optionally override chunking settings at runtime
    chunking_settings={
        "provider": "r2r",
        "method": "character",
        "extra_fields": {
            "separator": "---"
        },
    }
)
```

* fixup! Add support for CharacterTextSplitter
* fixup! fixup! Add support for CharacterTextSplitter

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>
Co-authored-by: Manuel R. Ciosici <[email protected]>
emrgnt-cmplxty added a commit that referenced this pull request on Aug 27, 2024
* Dev (#990) * moving kg construction to enrich-graph (#984) * checkin * up * done * formatting * Feature/update ingestion issues (#985) * udpate ingestion issues * keep unbounded limit support, but default to bounded * fix * fmt * Add support for CharacterTextSplitter (#986) * Add support for CharacterTextSplitter Allows R2R client to override the text splitter. Example: ```python ingestion_response = client.ingest_files( file_paths=[file_path], metadatas=metadata, # optionally override chunking settings at runtime chunking_settings={ "provider": "r2r", "method": "character", "extra_fields": { "separator": "---" }, } ) ``` * fixup! Add support for CharacterTextSplitter * fixup! fixup! Add support for CharacterTextSplitter --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]> Co-authored-by: Manuel R. Ciosici <[email protected]> * fix ollama cli --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]> Co-authored-by: Manuel R. Ciosici <[email protected]>
shreyaspimpalgaonkar added a commit that referenced this pull request on Aug 27, 2024
* moving kg construction to enrich-graph (#984) * checkin * up * done * formatting * Feature/update ingestion issues (#985) * udpate ingestion issues * keep unbounded limit support, but default to bounded * fix * fmt * Add support for CharacterTextSplitter (#986) * Add support for CharacterTextSplitter Allows R2R client to override the text splitter. Example: ```python ingestion_response = client.ingest_files( file_paths=[file_path], metadatas=metadata, # optionally override chunking settings at runtime chunking_settings={ "provider": "r2r", "method": "character", "extra_fields": { "separator": "---" }, } ) ``` * fixup! Add support for CharacterTextSplitter * fixup! fixup! Add support for CharacterTextSplitter * Patch/ollama base cli (#992) * Dev (#990) * moving kg construction to enrich-graph (#984) * checkin * up * done * formatting * Feature/update ingestion issues (#985) * udpate ingestion issues * keep unbounded limit support, but default to bounded * fix * fmt * Add support for CharacterTextSplitter (#986) * Add support for CharacterTextSplitter Allows R2R client to override the text splitter. Example: ```python ingestion_response = client.ingest_files( file_paths=[file_path], metadatas=metadata, # optionally override chunking settings at runtime chunking_settings={ "provider": "r2r", "method": "character", "extra_fields": { "separator": "---" }, } ) ``` * fixup! Add support for CharacterTextSplitter * fixup! fixup! Add support for CharacterTextSplitter --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]> Co-authored-by: Manuel R. Ciosici <[email protected]> * fix ollama cli --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]> Co-authored-by: Manuel R. Ciosici <[email protected]> * Ingestion refactor (#991) * fix test (#993) --------- Co-authored-by: emrgnt-cmplxty <[email protected]> Co-authored-by: emrgnt-cmplxty <[email protected]> Co-authored-by: Manuel R. Ciosici <[email protected]>
emrgnt-cmplxty added a commit that referenced this pull request on Aug 27, 2024
* moving kg construction to enrich-graph (#984) * checkin * up * done * formatting * Feature/update ingestion issues (#985) * udpate ingestion issues * keep unbounded limit support, but default to bounded * fix * fmt * removes an unnecessary abstraction * sync changes --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>
emrgnt-cmplxty added a commit that referenced this pull request on Aug 30, 2024
* Feature/remove extra r2r abstraction (#996) * moving kg construction to enrich-graph (#984) * checkin * up * done * formatting * Feature/update ingestion issues (#985) * udpate ingestion issues * keep unbounded limit support, but default to bounded * fix * fmt * removes an unnecessary abstraction * sync changes --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]> * first commit * move towards orchestration * tweaks * check in working ingestion * move * kg enrichment * update future, postgres compose * hatchetize ingestion pipeline * ready for prime time * finish --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>
shreyaspimpalgaonkar added a commit that referenced this pull request on Sep 4, 2024
* moving kg construction to enrich-graph (#984) * checkin * up * done * formatting * Feature/update ingestion issues (#985) * udpate ingestion issues * keep unbounded limit support, but default to bounded * fix * fmt * Add support for CharacterTextSplitter (#986) * Add support for CharacterTextSplitter Allows R2R client to override the text splitter. Example: ```python ingestion_response = client.ingest_files( file_paths=[file_path], metadatas=metadata, # optionally override chunking settings at runtime chunking_settings={ "provider": "r2r", "method": "character", "extra_fields": { "separator": "---" }, } ) ``` * fixup! Add support for CharacterTextSplitter * fixup! fixup! Add support for CharacterTextSplitter * Patch/ollama base cli (#992) * Dev (#990) * moving kg construction to enrich-graph (#984) * checkin * up * done * formatting * Feature/update ingestion issues (#985) * udpate ingestion issues * keep unbounded limit support, but default to bounded * fix * fmt * Add support for CharacterTextSplitter (#986) * Add support for CharacterTextSplitter Allows R2R client to override the text splitter. Example: ```python ingestion_response = client.ingest_files( file_paths=[file_path], metadatas=metadata, # optionally override chunking settings at runtime chunking_settings={ "provider": "r2r", "method": "character", "extra_fields": { "separator": "---" }, } ) ``` * fixup! Add support for CharacterTextSplitter * fixup! fixup! Add support for CharacterTextSplitter --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]> Co-authored-by: Manuel R. Ciosici <[email protected]> * fix ollama cli --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]> Co-authored-by: Manuel R. Ciosici <[email protected]> * Ingestion refactor (#991) * fix test (#993) * Increase Neo4j memory limits, add GDS plugin, and update LLM concurrency limit to 256. 
* Update ingestion sample file, disable KG node extraction pipe, add community processing in clustering, and enhance graph clustering queries. * Update runners (#1007) * Refactor KG clustering process to simplify community processing and enhance entity-triple retrieval from Neo4j. * Refactor Neo4j configuration for memory settings and update graph clustering logic in the KG provider. * Fix pipeline by enabling node extraction and refactor community processing logic in KGClusteringPipe. * hatchet works * throw error if you run global search before enrichment * Fix communities in local search * turn off node desc embedding * fix rag endpoint * Increase hatchet msg size * Update ingestion.py * Refactor and clean up code formatting * modified workflow * Add graph creation functionality * Refactor KG parameters and logging. * review * up --------- Co-authored-by: emrgnt-cmplxty <[email protected]> Co-authored-by: emrgnt-cmplxty <[email protected]> Co-authored-by: Manuel R. Ciosici <[email protected]> Co-authored-by: Nolan Tremelling <[email protected]>
emrgnt-cmplxty added a commit that referenced this pull request on Sep 6, 2024
* Feature/orchestration v0 (#1006) * Feature/remove extra r2r abstraction (#996) * moving kg construction to enrich-graph (#984) * checkin * up * done * formatting * Feature/update ingestion issues (#985) * udpate ingestion issues * keep unbounded limit support, but default to bounded * fix * fmt * removes an unnecessary abstraction * sync changes --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]> * first commit * move towards orchestration * tweaks * check in working ingestion * move * kg enrichment * update future, postgres compose * hatchetize ingestion pipeline * ready for prime time * finish --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]> * Feature/add update files workflow (#1010) * add update files workflow * rm ingestion pipeline * Feature/add enrichment flow (#1013) * add update files workflow * rm ingestion pipeline * v0 restructure orch * Feature/merged enrichment flow (#1016) * add update files workflow * rm ingestion pipeline * v0 restructure orch * kg orchestration * finish kg orchestration * update service * merge * cleanups * Rm graspologic (#1034) * moving kg construction to enrich-graph (#984) * checkin * up * done * formatting * Feature/update ingestion issues (#985) * udpate ingestion issues * keep unbounded limit support, but default to bounded * fix * fmt * Add support for CharacterTextSplitter (#986) * Add support for CharacterTextSplitter Allows R2R client to override the text splitter. Example: ```python ingestion_response = client.ingest_files( file_paths=[file_path], metadatas=metadata, # optionally override chunking settings at runtime chunking_settings={ "provider": "r2r", "method": "character", "extra_fields": { "separator": "---" }, } ) ``` * fixup! Add support for CharacterTextSplitter * fixup! fixup! 
Add support for CharacterTextSplitter * Patch/ollama base cli (#992) * Dev (#990) * moving kg construction to enrich-graph (#984) * checkin * up * done * formatting * Feature/update ingestion issues (#985) * udpate ingestion issues * keep unbounded limit support, but default to bounded * fix * fmt * Add support for CharacterTextSplitter (#986) * Add support for CharacterTextSplitter Allows R2R client to override the text splitter. Example: ```python ingestion_response = client.ingest_files( file_paths=[file_path], metadatas=metadata, # optionally override chunking settings at runtime chunking_settings={ "provider": "r2r", "method": "character", "extra_fields": { "separator": "---" }, } ) ``` * fixup! Add support for CharacterTextSplitter * fixup! fixup! Add support for CharacterTextSplitter --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]> Co-authored-by: Manuel R. Ciosici <[email protected]> * fix ollama cli --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]> Co-authored-by: Manuel R. Ciosici <[email protected]> * Ingestion refactor (#991) * fix test (#993) * Increase Neo4j memory limits, add GDS plugin, and update LLM concurrency limit to 256. * Update ingestion sample file, disable KG node extraction pipe, add community processing in clustering, and enhance graph clustering queries. * Update runners (#1007) * Refactor KG clustering process to simplify community processing and enhance entity-triple retrieval from Neo4j. * Refactor Neo4j configuration for memory settings and update graph clustering logic in the KG provider. * Fix pipeline by enabling node extraction and refactor community processing logic in KGClusteringPipe. 
* hatchet works * throw error if you run global search before enrichment * Fix communities in local search * turn off node desc embedding * fix rag endpoint * Increase hatchet msg size * Update ingestion.py * Refactor and clean up code formatting * modified workflow * Add graph creation functionality * Refactor KG parameters and logging. * review * up --------- Co-authored-by: emrgnt-cmplxty <[email protected]> Co-authored-by: emrgnt-cmplxty <[email protected]> Co-authored-by: Manuel R. Ciosici <[email protected]> Co-authored-by: Nolan Tremelling <[email protected]> * Feature/add hatchet api key setup rebased (#1040) * add update files workflow * rm ingestion pipeline * v0 restructure orch * kg orchestration * finish kg orchestration * update service * merge * cleanups * add hatchet api key setup * cleanup * add hatchet api key setup (#1037) * add hatchet api key setup * cleanup * fix merge * cleanups * Feature/nolan logs refactored (#1041) * Update runners (#1007) * Check in logs --------- Co-authored-by: Nolan Tremelling <[email protected]> * Pull open PRs into dev (#1042) * Pull in subnet and graph PR * Add in templates * Add python files for templates in cli (#1043) * working hatchet integration (#1046) * Update local_llm_neo4j_kg.toml * Unstructured fixes (#1048) * dockerfile * Update ingestion file with new sample URL and enhance unstructured chunking configuration and error handling. * clean up * clean up dockerfile * up * Update sample file and clean code * Add hatchet-sdk dependency in project. * Update providers to include local option. 
* Introduce File Provider (#1044) * Draft of file provider * Some cleanup * Regenearte lock * Stream it * Use document_id as primary key * Pydantic v2 * File provider finished * Make 7272 the default port (#1045) * Fix poetry.lock * Precommit * Enhance Dockerfile and add telemetry events (#1049) * Fix File Provider (#1050) * Fix * Fix parsing pipeline * working * Feature/improve docs (#1051) * improve documentation * fix unstr * add ingestion * fix compose * Add unstructured chunking configuration updates * Revert "Add unstructured chunking configuration updates" This reverts commit bae8c0b. * Separate File Provider and Relational Database Provider (#1054) * Move to self.execute_query * Check in push * Check in * Get file provider running * Actually use file provider * Final touches * undo changes in compose * Patch/fix unstructured config rebased (#1059) * fix unstr err * tweak * by_title default * cleanups * checkin * merge * Graph docs (#1058) * Add document chunks and enrich graph endpoints. * up * Add KG creation and enrichment responses * up * Remove duplicate UnstructuredChunkingConfig entry. * cleanup docs * up --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]> * Graph docs (#1060) * fix unstr err * tweak * by_title default * cleanups * Add document chunks and enrich graph endpoints. * up * Add KG creation and enrichment responses * checkin * merge * up * Remove duplicate UnstructuredChunkingConfig entry. * Remove unused kg_search settings. * Refactor knowledge graph settings handling. * Update image and clean up logs. --------- Co-authored-by: emrgnt-cmplxty <[email protected]> * Remove duplicate method (#1061) * update docs (#1064) * rm extra prints * fix img * Fallback logic (#1062) * fix unstr err * tweak * by_title default * cleanups * Add document chunks and enrich graph endpoints. * up * Add KG creation and enrichment responses * checkin * merge * up * Remove duplicate UnstructuredChunkingConfig entry. 
* Remove unused kg_search settings. * Refactor knowledge graph settings handling. * Update image and clean up logs. * Implement fallback parsing mechanism * Fallback parser * Refactor code for readability and formatting * Refactor and enhance media parsers * Update response types in router. * Remove telemetry and add logging * Refactor logging format in parsers * Refactor image and movie parsers * Fix formatting in movie_parser.py * Remove debug logging statements * Remove debug logging for chunking config * Rename debug option to build. --------- Co-authored-by: emrgnt-cmplxty <[email protected]> * Refactor response models for clarity * Refactor response types in router. * Feature/fix agent (#1065) * ready for merge * fix agent * Patch/fix 123 (#1066) * ready for merge * fix agent * fix import * Feature/add orchestration draft (#1067) * ready for merge * fix agent * fix import * Fix some of the tests (#1068) * Fix fallback parsing (#1069) * Fix fallback parsing * Fix * Compose * up * Feature/iterate on docs (#1070) * add orchestration docs * docs iteration * iterate * add images * add images * Fix restructuring enum (#1071) * Feature/formatting cleanup (#1072) * add orchestration docs * docs iteration * iterate * add images * add images * run pre-commit * reclean --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]> Co-authored-by: Manuel R. Ciosici <[email protected]> Co-authored-by: Nolan Tremelling <[email protected]>
shreyaspimpalgaonkar added a commit that referenced this pull request on Sep 6, 2024
* Feature/orchestration v0 (#1006) * Feature/remove extra r2r abstraction (#996) * moving kg construction to enrich-graph (#984) * checkin * up * done * formatting * Feature/update ingestion issues (#985) * udpate ingestion issues * keep unbounded limit support, but default to bounded * fix * fmt * removes an unnecessary abstraction * sync changes --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]> * first commit * move towards orchestration * tweaks * check in working ingestion * move * kg enrichment * update future, postgres compose * hatchetize ingestion pipeline * ready for prime time * finish --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]> * Feature/add update files workflow (#1010) * add update files workflow * rm ingestion pipeline * Feature/add enrichment flow (#1013) * add update files workflow * rm ingestion pipeline * v0 restructure orch * Feature/merged enrichment flow (#1016) * add update files workflow * rm ingestion pipeline * v0 restructure orch * kg orchestration * finish kg orchestration * update service * merge * cleanups * Refactor and update GraphRAG documentation * Rm graspologic (#1034) * moving kg construction to enrich-graph (#984) * checkin * up * done * formatting * Feature/update ingestion issues (#985) * udpate ingestion issues * keep unbounded limit support, but default to bounded * fix * fmt * Add support for CharacterTextSplitter (#986) * Add support for CharacterTextSplitter Allows R2R client to override the text splitter. Example: ```python ingestion_response = client.ingest_files( file_paths=[file_path], metadatas=metadata, # optionally override chunking settings at runtime chunking_settings={ "provider": "r2r", "method": "character", "extra_fields": { "separator": "---" }, } ) ``` * fixup! Add support for CharacterTextSplitter * fixup! fixup! 
Add support for CharacterTextSplitter * Patch/ollama base cli (#992) * Dev (#990) * moving kg construction to enrich-graph (#984) * checkin * up * done * formatting * Feature/update ingestion issues (#985) * udpate ingestion issues * keep unbounded limit support, but default to bounded * fix * fmt * Add support for CharacterTextSplitter (#986) * Add support for CharacterTextSplitter Allows R2R client to override the text splitter. Example: ```python ingestion_response = client.ingest_files( file_paths=[file_path], metadatas=metadata, # optionally override chunking settings at runtime chunking_settings={ "provider": "r2r", "method": "character", "extra_fields": { "separator": "---" }, } ) ``` * fixup! Add support for CharacterTextSplitter * fixup! fixup! Add support for CharacterTextSplitter --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]> Co-authored-by: Manuel R. Ciosici <[email protected]> * fix ollama cli --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]> Co-authored-by: Manuel R. Ciosici <[email protected]> * Ingestion refactor (#991) * fix test (#993) * Increase Neo4j memory limits, add GDS plugin, and update LLM concurrency limit to 256. * Update ingestion sample file, disable KG node extraction pipe, add community processing in clustering, and enhance graph clustering queries. * Update runners (#1007) * Refactor KG clustering process to simplify community processing and enhance entity-triple retrieval from Neo4j. * Refactor Neo4j configuration for memory settings and update graph clustering logic in the KG provider. * Fix pipeline by enabling node extraction and refactor community processing logic in KGClusteringPipe. 
* hatchet works * throw error if you run global search before enrichment * Fix communities in local search * turn off node desc embedding * fix rag endpoint * Increase hatchet msg size * Update ingestion.py * Refactor and clean up code formatting * modified workflow * Add graph creation functionality * Refactor KG parameters and logging. * review * up --------- Co-authored-by: emrgnt-cmplxty <[email protected]> Co-authored-by: emrgnt-cmplxty <[email protected]> Co-authored-by: Manuel R. Ciosici <[email protected]> Co-authored-by: Nolan Tremelling <[email protected]> * Feature/add hatchet api key setup rebased (#1040) * add update files workflow * rm ingestion pipeline * v0 restructure orch * kg orchestration * finish kg orchestration * update service * merge * cleanups * add hatchet api key setup * cleanup * add hatchet api key setup (#1037) * add hatchet api key setup * cleanup * fix merge * cleanups * Feature/nolan logs refactored (#1041) * Update runners (#1007) * Check in logs --------- Co-authored-by: Nolan Tremelling <[email protected]> * Pull open PRs into dev (#1042) * Pull in subnet and graph PR * Add in templates * Add python files for templates in cli (#1043) * working hatchet integration (#1046) * Update local_llm_neo4j_kg.toml * Unstructured fixes (#1048) * dockerfile * Update ingestion file with new sample URL and enhance unstructured chunking configuration and error handling. * clean up * clean up dockerfile * up * Update sample file and clean code * Add hatchet-sdk dependency in project. * Update providers to include local option. 
* Introduce File Provider (#1044) * Draft of file provider * Some cleanup * Regenearte lock * Stream it * Use document_id as primary key * Pydantic v2 * File provider finished * Make 7272 the default port (#1045) * Fix poetry.lock * Precommit * Enhance Dockerfile and add telemetry events (#1049) * Fix File Provider (#1050) * Fix * Fix parsing pipeline * working * Feature/improve docs (#1051) * improve documentation * fix unstr * add ingestion * fix compose * Add unstructured chunking configuration updates * Revert "Add unstructured chunking configuration updates" This reverts commit bae8c0b. * Separate File Provider and Relational Database Provider (#1054) * Move to self.execute_query * Check in push * Check in * Get file provider running * Actually use file provider * Final touches * undo changes in compose * Patch/fix unstructured config rebased (#1059) * fix unstr err * tweak * by_title default * cleanups * checkin * merge * Graph docs (#1058) * Add document chunks and enrich graph endpoints. * up * Add KG creation and enrichment responses * up * Remove duplicate UnstructuredChunkingConfig entry. * cleanup docs * up --------- Co-authored-by: Shreyas Pimpalgaonkar <[email protected]> * Graph docs (#1060) * fix unstr err * tweak * by_title default * cleanups * Add document chunks and enrich graph endpoints. * up * Add KG creation and enrichment responses * checkin * merge * up * Remove duplicate UnstructuredChunkingConfig entry. * Remove unused kg_search settings. * Refactor knowledge graph settings handling. * Update image and clean up logs. --------- Co-authored-by: emrgnt-cmplxty <[email protected]> * Remove duplicate method (#1061) * update docs (#1064) * rm extra prints * fix img * Fallback logic (#1062) * fix unstr err * tweak * by_title default * cleanups * Add document chunks and enrich graph endpoints. * up * Add KG creation and enrichment responses * checkin * merge * up * Remove duplicate UnstructuredChunkingConfig entry. 
* Remove unused kg_search settings. * Refactor knowledge graph settings handling. * Update image and clean up logs. * Implement fallback parsing mechanism * Fallback parser * Refactor code for readability and formatting * Refactor and enhance media parsers * Update response types in router. * Remove telemetry and add logging * Refactor logging format in parsers * Refactor image and movie parsers * Fix formatting in movie_parser.py * Remove debug logging statements * Remove debug logging for chunking config * Rename debug option to build. --------- Co-authored-by: emrgnt-cmplxty <[email protected]> * Refactor response models for clarity * Refactor response types in router. * Feature/fix agent (#1065) * ready for merge * fix agent * Patch/fix 123 (#1066) * ready for merge * fix agent * fix import * Feature/add orchestration draft (#1067) * ready for merge * fix agent * fix import * bump (#1075) * Enhance KG search capabilities and examples. * Fix formatting and update documentation. * Remove debug print statements in parsing. * Fix content and search level values * Update OpenAPI spec and responses. * Add customizable RAG assistant example * Update API documentation and search models. --------- Co-authored-by: emrgnt-cmplxty <[email protected]> Co-authored-by: emrgnt-cmplxty <[email protected]> Co-authored-by: Manuel R. Ciosici <[email protected]> Co-authored-by: Nolan Tremelling <[email protected]>
Summary:
Moved KG construction to the `enrich-graph` pipeline, updated configurations, improved logging, and optimized document and vector operations.
Key points:
- `py/cli/utils/docker_utils.py`: default value of `click.confirm` changed to `True`.
- `py/core/configs/neo4j_kg.toml`
- `py/core/examples/scripts/advanced_kg_cookbook.py`: changed to use string literals for entity types and relations.
- `py/core/main/assembly/factory.py`
- `py/core/main/assembly/factory.py`
- `py/core/main/services/restructure_service.py`: for enrichment.
- `py/core/parsers/media/audio_parser.py` and `py/core/parsers/media/openai_helpers.py`
- `py/core/pipes/ingestion/kg_extraction_pipe.py` and `py/core/pipes/kg/extraction.py`
- `py/core/pipes/kg/node_extraction.py` and `py/core/pipes/kg/storage.py`
- `py/core/providers/database/document.py` and `py/core/providers/database/vector.py`
- `py/core/providers/kg/neo4j/provider.py`
Generated with ❤️ by ellipsis.dev