Dev #990

Merged
merged 5 commits into from
Aug 27, 2024

emrgnt-cmplxty
Contributor

@emrgnt-cmplxty emrgnt-cmplxty commented Aug 27, 2024

🚀 This description was created by Ellipsis for commit 09ffa21

Summary:

This pull request enhances the R2R system with CLI improvements, Docker setup changes, knowledge graph updates, and document handling enhancements, while updating configurations and dependencies.

Key points:

  • Added --base-url option to py/cli/command_group.py for API configuration.
  • Changed default confirmation behavior in py/cli/utils/docker_utils.py for Docker setup.
  • Updated format_search_results_for_llm in py/core/agent/rag.py to use result.text.
  • Introduced CHARACTER method in py/core/base/providers/chunking.py.
  • Set DEFAULT_SEPARATOR in py/core/base/utils/splitter/text.py for CharacterTextSplitter.
  • Removed [embedding], [ingestion], and [database] sections from py/core/configs/neo4j_kg.toml.
  • Simplified entity and relationship types in py/core/examples/scripts/advanced_kg_cookbook.py.
  • Made kg_enrichment_pipeline optional in py/core/main/abstractions.py.
  • Added database_provider to create_kg_pipe in py/core/main/assembly/factory.py.
  • Limited file ingestion to 100 files in py/core/main/services/ingestion_service.py.
  • Updated api_base in py/core/parsers/media/audio_parser.py and openai_helpers.py.
  • Enhanced logging in py/core/pipelines/graph_enrichment.py and kg_extraction_pipe.py.
  • Modified upsert_nodes_and_relationships in py/core/providers/kg/neo4j/provider.py to return counts.
  • Updated project version to 3.0.5 in py/pyproject.toml.

Generated with ❤️ by ellipsis.dev
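
For readers skimming the summary, the ingestion limit ("Limited file ingestion to 100 files", with the commit note "keep unbounded limit support, but default to bounded") can be pictured roughly as below. This is only a sketch: `MAX_FILES_PER_REQUEST` and `check_file_count` are hypothetical names chosen for illustration, not the identifiers used in py/core/main/services/ingestion_service.py.

```python
from typing import Optional, Sequence

# Hypothetical constant; the PR summary only states that the limit is 100 files.
MAX_FILES_PER_REQUEST = 100


def check_file_count(
    file_paths: Sequence[str],
    limit: Optional[int] = MAX_FILES_PER_REQUEST,
) -> None:
    """Raise if more files are submitted than the limit allows.

    Passing limit=None disables the check, which mirrors the idea of
    keeping unbounded support while defaulting to a bounded limit.
    """
    if limit is not None and len(file_paths) > limit:
        raise ValueError(
            f"Got {len(file_paths)} files, but at most {limit} are allowed per request."
        )


# Bounded by default: 100 files pass, 101 would raise ValueError.
check_file_count([f"doc_{i}.txt" for i in range(100)])

# Explicitly unbounded: any number of files is accepted.
check_file_count([f"doc_{i}.txt" for i in range(500)], limit=None)
```

Defaulting to the bounded value and requiring an explicit opt-out keeps accidental bulk uploads from overloading the service.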

* checkin

* up

* done

* formatting
* udpate ingestion issues

* keep unbounded limit support, but default to bounded
* Add support for CharacterTextSplitter

Allows R2R client to override the text splitter. Example:

```python
ingestion_response = client.ingest_files(
    file_paths=[file_path],
    metadatas=metadata,
    # optionally override chunking settings at runtime
    chunking_settings={
        "provider": "r2r",
        "method": "character",
        "extra_fields": {
            "separator": "---"
        },
    },
)
```

* fixup! Add support for CharacterTextSplitter

* fixup! fixup! Add support for CharacterTextSplitter
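
To make the `"method": "character"` override above more concrete, here is a rough sketch of what splitting on a literal separator such as `"---"` amounts to. It is a simplified stand-in, not the CharacterTextSplitter code from this PR: `split_on_separator` is a hypothetical helper, the `"\n\n"` default is an assumption (the PR only says a DEFAULT_SEPARATOR was set), and chunk-size merging and overlap are left out.

```python
def split_on_separator(text: str, separator: str = "\n\n") -> list[str]:
    """Split text on a literal separator and drop empty pieces.

    Simplified illustration only: a real character splitter would
    typically also merge small pieces up to a target chunk size.
    """
    pieces = text.split(separator)
    return [piece.strip() for piece in pieces if piece.strip()]


document = "Intro section\n---\nDetails section\n---\nSummary"
chunks = split_on_separator(document, separator="---")
print(chunks)  # ['Intro section', 'Details section', 'Summary']
```

With a document that uses `---` as a section break, each section becomes its own chunk, which is the behavior the runtime override is meant to enable.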
@emrgnt-cmplxty emrgnt-cmplxty marked this pull request as ready for review August 27, 2024 14:27
@emrgnt-cmplxty emrgnt-cmplxty merged commit 4fbc436 into main Aug 27, 2024
2 of 3 checks passed
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

❌ Changes requested. Reviewed everything up to 09ffa21 in 1 minute and 56 seconds

More details
  • Looked at 762 lines of code in 24 files.
  • Skipped 1 file when reviewing.
  • Skipped posting 1 drafted comment based on config settings.
1. py/core/parsers/media/audio_parser.py:13
  • Draft comment:
    Access the 'text' key in the transcription dictionary using transcription['text'] instead of transcription.text.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_B5sY3fPlWG66AtX5



@@ -43,7 +43,7 @@ def process_frame_with_openai(
 def process_audio_with_openai(
     audio_file,
     api_key: str,
-    audio_api_base: str = "https://api.openai.com/v2/audio/transcriptions",
+    audio_api_base: str = "https://api.openai.com/v1/audio/transcriptions",
 ) -> str:
Contributor

Access the 'text' key in the transcription dictionary using transcription['text'] instead of transcription.text.
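
The point of this comment can be illustrated with a small, offline sketch. It assumes the transcription response is parsed from JSON into a plain dict with a `text` key; `extract_transcript` and `sample_body` are hypothetical names for illustration and are not taken from audio_parser.py or openai_helpers.py.

```python
import json

# Hypothetical sample body, shaped like a JSON transcription response.
sample_body = '{"text": "hello world"}'


def extract_transcript(response_body: str) -> str:
    """Return the transcript from a JSON transcription response."""
    transcription = json.loads(response_body)  # parsed JSON is a plain dict
    return transcription["text"]  # key lookup, not attribute access


transcription = json.loads(sample_body)
print(extract_transcript(sample_body))  # hello world
print(hasattr(transcription, "text"))   # False: dicts expose keys, not attributes
```

If the code instead receives a typed response object from an SDK, attribute access would be the right form, so which spelling is correct depends on what the helper actually returns.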

emrgnt-cmplxty added a commit that referenced this pull request Aug 27, 2024
* Dev (#990)

* moving kg construction to enrich-graph (#984)

* checkin

* up

* done

* formatting

* Feature/update ingestion issues (#985)

* udpate ingestion issues

* keep unbounded limit support, but default to bounded

* fix

* fmt

* Add support for CharacterTextSplitter (#986)

* Add support for CharacterTextSplitter

Allows R2R client to override the text splitter. Example:

```python
ingestion_response = client.ingest_files(
    file_paths=[file_path],
    metadatas=metadata,
    # optionally override chunking settings at runtime
    chunking_settings={
        "provider": "r2r",
        "method": "character",
        "extra_fields": {
            "separator": "---"
        },
    },
)
```

* fixup! Add support for CharacterTextSplitter

* fixup! fixup! Add support for CharacterTextSplitter

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>
Co-authored-by: Manuel R. Ciosici <[email protected]>

* fix ollama cli

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>
Co-authored-by: Manuel R. Ciosici <[email protected]>
shreyaspimpalgaonkar added a commit that referenced this pull request Aug 27, 2024
* moving kg construction to enrich-graph (#984)

* checkin

* up

* done

* formatting

* Feature/update ingestion issues (#985)

* udpate ingestion issues

* keep unbounded limit support, but default to bounded

* fix

* fmt

* Add support for CharacterTextSplitter (#986)

* Add support for CharacterTextSplitter

Allows R2R client to override the text splitter. Example:

```python
ingestion_response = client.ingest_files(
    file_paths=[file_path],
    metadatas=metadata,
    # optionally override chunking settings at runtime
    chunking_settings={
        "provider": "r2r",
        "method": "character",
        "extra_fields": {
            "separator": "---"
        },
    },
)
```

* fixup! Add support for CharacterTextSplitter

* fixup! fixup! Add support for CharacterTextSplitter

* Patch/ollama base cli (#992)

* Dev (#990)

* moving kg construction to enrich-graph (#984)

* checkin

* up

* done

* formatting

* Feature/update ingestion issues (#985)

* udpate ingestion issues

* keep unbounded limit support, but default to bounded

* fix

* fmt

* Add support for CharacterTextSplitter (#986)

* Add support for CharacterTextSplitter

Allows R2R client to override the text splitter. Example:

```python
ingestion_response = client.ingest_files(
    file_paths=[file_path],
    metadatas=metadata,
    # optionally override chunking settings at runtime
    chunking_settings={
        "provider": "r2r",
        "method": "character",
        "extra_fields": {
            "separator": "---"
        },
    },
)
```

* fixup! Add support for CharacterTextSplitter

* fixup! fixup! Add support for CharacterTextSplitter

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>
Co-authored-by: Manuel R. Ciosici <[email protected]>

* fix ollama cli

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>
Co-authored-by: Manuel R. Ciosici <[email protected]>

* Ingestion refactor (#991)

* fix test (#993)

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: Manuel R. Ciosici <[email protected]>
shreyaspimpalgaonkar added a commit that referenced this pull request Sep 4, 2024
* moving kg construction to enrich-graph (#984)

* checkin

* up

* done

* formatting

* Feature/update ingestion issues (#985)

* udpate ingestion issues

* keep unbounded limit support, but default to bounded

* fix

* fmt

* Add support for CharacterTextSplitter (#986)

* Add support for CharacterTextSplitter

Allows R2R client to override the text splitter. Example:

```python
ingestion_response = client.ingest_files(
    file_paths=[file_path],
    metadatas=metadata,
    # optionally override chunking settings at runtime
    chunking_settings={
        "provider": "r2r",
        "method": "character",
        "extra_fields": {
            "separator": "---"
        },
    },
)
```

* fixup! Add support for CharacterTextSplitter

* fixup! fixup! Add support for CharacterTextSplitter

* Patch/ollama base cli (#992)

* Dev (#990)

* moving kg construction to enrich-graph (#984)

* checkin

* up

* done

* formatting

* Feature/update ingestion issues (#985)

* udpate ingestion issues

* keep unbounded limit support, but default to bounded

* fix

* fmt

* Add support for CharacterTextSplitter (#986)

* Add support for CharacterTextSplitter

Allows R2R client to override the text splitter. Example:

```python
ingestion_response = client.ingest_files(
    file_paths=[file_path],
    metadatas=metadata,
    # optionally override chunking settings at runtime
    chunking_settings={
        "provider": "r2r",
        "method": "character",
        "extra_fields": {
            "separator": "---"
        },
    },
)
```

* fixup! Add support for CharacterTextSplitter

* fixup! fixup! Add support for CharacterTextSplitter

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>
Co-authored-by: Manuel R. Ciosici <[email protected]>

* fix ollama cli

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>
Co-authored-by: Manuel R. Ciosici <[email protected]>

* Ingestion refactor (#991)

* fix test (#993)

* Increase Neo4j memory limits, add GDS plugin, and update LLM concurrency limit to 256.

* Update ingestion sample file, disable KG node extraction pipe, add community processing in clustering, and enhance graph clustering queries.

* Update runners (#1007)

* Refactor KG clustering process to simplify community processing and enhance entity-triple retrieval from Neo4j.

* Refactor Neo4j configuration for memory settings and update graph clustering logic in the KG provider.

* Fix pipeline by enabling node extraction and refactor community processing logic in KGClusteringPipe.

* hatchet works

* throw error if you run global search before enrichment

* Fix communities in local search

* turn off node desc embedding

* fix rag endpoint

* Increase hatchet msg size

* Update ingestion.py

* Refactor and clean up code formatting

* modified workflow

* Add graph creation functionality

* Refactor KG parameters and logging.

* review

* up

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: Manuel R. Ciosici <[email protected]>
Co-authored-by: Nolan Tremelling <[email protected]>
emrgnt-cmplxty added a commit that referenced this pull request Sep 6, 2024
* Feature/orchestration v0 (#1006)

* Feature/remove extra r2r abstraction (#996)

* moving kg construction to enrich-graph (#984)

* checkin

* up

* done

* formatting

* Feature/update ingestion issues (#985)

* udpate ingestion issues

* keep unbounded limit support, but default to bounded

* fix

* fmt

* removes an unnecessary abstraction

* sync changes

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>

* first commit

* move towards orchestration

* tweaks

* check in working ingestion

* move

* kg enrichment

* update future, postgres compose

* hatchetize ingestion pipeline

* ready for prime time

* finish

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>

* Feature/add update files workflow (#1010)

* add update files workflow

* rm ingestion pipeline

* Feature/add enrichment flow (#1013)

* add update files workflow

* rm ingestion pipeline

* v0 restructure orch

* Feature/merged enrichment flow (#1016)

* add update files workflow

* rm ingestion pipeline

* v0 restructure orch

* kg orchestration

* finish kg orchestration

* update service

* merge

* cleanups

* Rm graspologic (#1034)

* moving kg construction to enrich-graph (#984)

* checkin

* up

* done

* formatting

* Feature/update ingestion issues (#985)

* udpate ingestion issues

* keep unbounded limit support, but default to bounded

* fix

* fmt

* Add support for CharacterTextSplitter (#986)

* Add support for CharacterTextSplitter

Allows R2R client to override the text splitter. Example:

```python
ingestion_response = client.ingest_files(
    file_paths=[file_path],
    metadatas=metadata,
    # optionally override chunking settings at runtime
    chunking_settings={
        "provider": "r2r",
        "method": "character",
        "extra_fields": {
            "separator": "---"
        },
    },
)
```

* fixup! Add support for CharacterTextSplitter

* fixup! fixup! Add support for CharacterTextSplitter

* Patch/ollama base cli (#992)

* Dev (#990)

* moving kg construction to enrich-graph (#984)

* checkin

* up

* done

* formatting

* Feature/update ingestion issues (#985)

* udpate ingestion issues

* keep unbounded limit support, but default to bounded

* fix

* fmt

* Add support for CharacterTextSplitter (#986)

* Add support for CharacterTextSplitter

Allows R2R client to override the text splitter. Example:

```python
ingestion_response = client.ingest_files(
    file_paths=[file_path],
    metadatas=metadata,
    # optionally override chunking settings at runtime
    chunking_settings={
        "provider": "r2r",
        "method": "character",
        "extra_fields": {
            "separator": "---"
        },
    },
)
```

* fixup! Add support for CharacterTextSplitter

* fixup! fixup! Add support for CharacterTextSplitter

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>
Co-authored-by: Manuel R. Ciosici <[email protected]>

* fix ollama cli

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>
Co-authored-by: Manuel R. Ciosici <[email protected]>

* Ingestion refactor (#991)

* fix test (#993)

* Increase Neo4j memory limits, add GDS plugin, and update LLM concurrency limit to 256.

* Update ingestion sample file, disable KG node extraction pipe, add community processing in clustering, and enhance graph clustering queries.

* Update runners (#1007)

* Refactor KG clustering process to simplify community processing and enhance entity-triple retrieval from Neo4j.

* Refactor Neo4j configuration for memory settings and update graph clustering logic in the KG provider.

* Fix pipeline by enabling node extraction and refactor community processing logic in KGClusteringPipe.

* hatchet works

* throw error if you run global search before enrichment

* Fix communities in local search

* turn off node desc embedding

* fix rag endpoint

* Increase hatchet msg size

* Update ingestion.py

* Refactor and clean up code formatting

* modified workflow

* Add graph creation functionality

* Refactor KG parameters and logging.

* review

* up

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: Manuel R. Ciosici <[email protected]>
Co-authored-by: Nolan Tremelling <[email protected]>

* Feature/add hatchet api key setup rebased (#1040)

* add update files workflow

* rm ingestion pipeline

* v0 restructure orch

* kg orchestration

* finish kg orchestration

* update service

* merge

* cleanups

* add hatchet api key setup

* cleanup

* add hatchet api key setup (#1037)

* add hatchet api key setup

* cleanup

* fix merge

* cleanups

* Feature/nolan logs refactored (#1041)

* Update runners (#1007)

* Check in logs

---------

Co-authored-by: Nolan Tremelling <[email protected]>

* Pull open PRs into dev (#1042)

* Pull in subnet and graph PR

* Add in templates

* Add python files for templates in cli (#1043)

* working hatchet integration (#1046)

* Update local_llm_neo4j_kg.toml

* Unstructured fixes (#1048)

* dockerfile

* Update ingestion file with new sample URL and enhance unstructured chunking configuration and error handling.

* clean up

* clean up dockerfile

* up

* Update sample file and clean code

* Add hatchet-sdk dependency in project.

* Update providers to include local option.

* Introduce File Provider (#1044)

* Draft of file provider

* Some cleanup

* Regenearte lock

* Stream it

* Use document_id as primary key

* Pydantic v2

* File provider finished

* Make 7272 the default port (#1045)

* Fix poetry.lock

* Precommit

* Enhance Dockerfile and add telemetry events (#1049)

* Fix File Provider (#1050)

* Fix

* Fix parsing pipeline

* working

* Feature/improve docs (#1051)

* improve documentation

* fix unstr

* add ingestion

* fix compose

* Add unstructured chunking configuration updates

* Revert "Add unstructured chunking configuration updates"

This reverts commit bae8c0b.

* Separate File Provider and Relational Database Provider (#1054)

* Move to self.execute_query

* Check in push

* Check in

* Get file provider running

* Actually use file provider

* Final touches

* undo changes in compose

* Patch/fix unstructured config rebased (#1059)

* fix unstr err

* tweak

* by_title default

* cleanups

* checkin

* merge

* Graph docs (#1058)

* Add document chunks and enrich graph endpoints.

* up

* Add KG creation and enrichment responses

* up

* Remove duplicate UnstructuredChunkingConfig entry.

* cleanup docs

* up

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>

* Graph docs (#1060)

* fix unstr err

* tweak

* by_title default

* cleanups

* Add document chunks and enrich graph endpoints.

* up

* Add KG creation and enrichment responses

* checkin

* merge

* up

* Remove duplicate UnstructuredChunkingConfig entry.

* Remove unused kg_search settings.

* Refactor knowledge graph settings handling.

* Update image and clean up logs.

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>

* Remove duplicate method (#1061)

* update docs (#1064)

* rm extra prints

* fix img

* Fallback logic (#1062)

* fix unstr err

* tweak

* by_title default

* cleanups

* Add document chunks and enrich graph endpoints.

* up

* Add KG creation and enrichment responses

* checkin

* merge

* up

* Remove duplicate UnstructuredChunkingConfig entry.

* Remove unused kg_search settings.

* Refactor knowledge graph settings handling.

* Update image and clean up logs.

* Implement fallback parsing mechanism

* Fallback parser

* Refactor code for readability and formatting

* Refactor and enhance media parsers

* Update response types in router.

* Remove telemetry and add logging

* Refactor logging format in parsers

* Refactor image and movie parsers

* Fix formatting in movie_parser.py

* Remove debug logging statements

* Remove debug logging for chunking config

* Rename debug option to build.

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>

* Refactor response models for clarity

* Refactor response types in router.

* Feature/fix agent (#1065)

* ready for merge

* fix agent

* Patch/fix 123 (#1066)

* ready for merge

* fix agent

* fix import

* Feature/add orchestration draft (#1067)

* ready for merge

* fix agent

* fix import

* Fix some of the tests (#1068)

* Fix fallback parsing (#1069)

* Fix fallback parsing

* Fix

* Compose

* up

* Feature/iterate on docs (#1070)

* add orchestration docs

* docs iteration

* iterate

* add images

* add images

* Fix restructuring enum (#1071)

* Feature/formatting cleanup (#1072)

* add orchestration docs

* docs iteration

* iterate

* add images

* add images

* run pre-commit

* reclean

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>
Co-authored-by: Manuel R. Ciosici <[email protected]>
Co-authored-by: Nolan Tremelling <[email protected]>
shreyaspimpalgaonkar added a commit that referenced this pull request Sep 6, 2024
* Feature/orchestration v0 (#1006)

* Feature/remove extra r2r abstraction (#996)

* moving kg construction to enrich-graph (#984)

* checkin

* up

* done

* formatting

* Feature/update ingestion issues (#985)

* udpate ingestion issues

* keep unbounded limit support, but default to bounded

* fix

* fmt

* removes an unnecessary abstraction

* sync changes

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>

* first commit

* move towards orchestration

* tweaks

* check in working ingestion

* move

* kg enrichment

* update future, postgres compose

* hatchetize ingestion pipeline

* ready for prime time

* finish

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>

* Feature/add update files workflow (#1010)

* add update files workflow

* rm ingestion pipeline

* Feature/add enrichment flow (#1013)

* add update files workflow

* rm ingestion pipeline

* v0 restructure orch

* Feature/merged enrichment flow (#1016)

* add update files workflow

* rm ingestion pipeline

* v0 restructure orch

* kg orchestration

* finish kg orchestration

* update service

* merge

* cleanups

* Refactor and update GraphRAG documentation

* Rm graspologic (#1034)

* moving kg construction to enrich-graph (#984)

* checkin

* up

* done

* formatting

* Feature/update ingestion issues (#985)

* udpate ingestion issues

* keep unbounded limit support, but default to bounded

* fix

* fmt

* Add support for CharacterTextSplitter (#986)

* Add support for CharacterTextSplitter

Allows R2R client to override the text splitter. Example:

```python
ingestion_response = client.ingest_files(
    file_paths=[file_path],
    metadatas=metadata,
    # optionally override chunking settings at runtime
    chunking_settings={
        "provider": "r2r",
        "method": "character",
        "extra_fields": {
            "separator": "---"
        },
    },
)
```

* fixup! Add support for CharacterTextSplitter

* fixup! fixup! Add support for CharacterTextSplitter

* Patch/ollama base cli (#992)

* Dev (#990)

* moving kg construction to enrich-graph (#984)

* checkin

* up

* done

* formatting

* Feature/update ingestion issues (#985)

* udpate ingestion issues

* keep unbounded limit support, but default to bounded

* fix

* fmt

* Add support for CharacterTextSplitter (#986)

* Add support for CharacterTextSplitter

Allows R2R client to override the text splitter. Example:

```python
ingestion_response = client.ingest_files(
    file_paths=[file_path],
    metadatas=metadata,
    # optionally override chunking settings at runtime
    chunking_settings={
        "provider": "r2r",
        "method": "character",
        "extra_fields": {
            "separator": "---"
        },
    },
)
```

* fixup! Add support for CharacterTextSplitter

* fixup! fixup! Add support for CharacterTextSplitter

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>
Co-authored-by: Manuel R. Ciosici <[email protected]>

* fix ollama cli

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>
Co-authored-by: Manuel R. Ciosici <[email protected]>

* Ingestion refactor (#991)

* fix test (#993)

* Increase Neo4j memory limits, add GDS plugin, and update LLM concurrency limit to 256.

* Update ingestion sample file, disable KG node extraction pipe, add community processing in clustering, and enhance graph clustering queries.

* Update runners (#1007)

* Refactor KG clustering process to simplify community processing and enhance entity-triple retrieval from Neo4j.

* Refactor Neo4j configuration for memory settings and update graph clustering logic in the KG provider.

* Fix pipeline by enabling node extraction and refactor community processing logic in KGClusteringPipe.

* hatchet works

* throw error if you run global search before enrichment

* Fix communities in local search

* turn off node desc embedding

* fix rag endpoint

* Increase hatchet msg size

* Update ingestion.py

* Refactor and clean up code formatting

* modified workflow

* Add graph creation functionality

* Refactor KG parameters and logging.

* review

* up

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: Manuel R. Ciosici <[email protected]>
Co-authored-by: Nolan Tremelling <[email protected]>

* Feature/add hatchet api key setup rebased (#1040)

* add update files workflow

* rm ingestion pipeline

* v0 restructure orch

* kg orchestration

* finish kg orchestration

* update service

* merge

* cleanups

* add hatchet api key setup

* cleanup

* add hatchet api key setup (#1037)

* add hatchet api key setup

* cleanup

* fix merge

* cleanups

* Feature/nolan logs refactored (#1041)

* Update runners (#1007)

* Check in logs

---------

Co-authored-by: Nolan Tremelling <[email protected]>

* Pull open PRs into dev (#1042)

* Pull in subnet and graph PR

* Add in templates

* Add python files for templates in cli (#1043)

* working hatchet integration (#1046)

* Update local_llm_neo4j_kg.toml

* Unstructured fixes (#1048)

* dockerfile

* Update ingestion file with new sample URL and enhance unstructured chunking configuration and error handling.

* clean up

* clean up dockerfile

* up

* Update sample file and clean code

* Add hatchet-sdk dependency in project.

* Update providers to include local option.

* Introduce File Provider (#1044)

* Draft of file provider

* Some cleanup

* Regenearte lock

* Stream it

* Use document_id as primary key

* Pydantic v2

* File provider finished

* Make 7272 the default port (#1045)

* Fix poetry.lock

* Precommit

* Enhance Dockerfile and add telemetry events (#1049)

* Fix File Provider (#1050)

* Fix

* Fix parsing pipeline

* working

* Feature/improve docs (#1051)

* improve documentation

* fix unstr

* add ingestion

* fix compose

* Add unstructured chunking configuration updates

* Revert "Add unstructured chunking configuration updates"

This reverts commit bae8c0b.

* Separate File Provider and Relational Database Provider (#1054)

* Move to self.execute_query

* Check in push

* Check in

* Get file provider running

* Actually use file provider

* Final touches

* undo changes in compose

* Patch/fix unstructured config rebased (#1059)

* fix unstr err

* tweak

* by_title default

* cleanups

* checkin

* merge

* Graph docs (#1058)

* Add document chunks and enrich graph endpoints.

* up

* Add KG creation and enrichment responses

* up

* Remove duplicate UnstructuredChunkingConfig entry.

* cleanup docs

* up

---------

Co-authored-by: Shreyas Pimpalgaonkar <[email protected]>

* Graph docs (#1060)

* fix unstr err

* tweak

* by_title default

* cleanups

* Add document chunks and enrich graph endpoints.

* up

* Add KG creation and enrichment responses

* checkin

* merge

* up

* Remove duplicate UnstructuredChunkingConfig entry.

* Remove unused kg_search settings.

* Refactor knowledge graph settings handling.

* Update image and clean up logs.

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>

* Remove duplicate method (#1061)

* update docs (#1064)

* rm extra prints

* fix img

* Fallback logic (#1062)

* fix unstr err

* tweak

* by_title default

* cleanups

* Add document chunks and enrich graph endpoints.

* up

* Add KG creation and enrichment responses

* checkin

* merge

* up

* Remove duplicate UnstructuredChunkingConfig entry.

* Remove unused kg_search settings.

* Refactor knowledge graph settings handling.

* Update image and clean up logs.

* Implement fallback parsing mechanism

* Fallback parser

* Refactor code for readability and formatting

* Refactor and enhance media parsers

* Update response types in router.

* Remove telemetry and add logging

* Refactor logging format in parsers

* Refactor image and movie parsers

* Fix formatting in movie_parser.py

* Remove debug logging statements

* Remove debug logging for chunking config

* Rename debug option to build.

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>

* Refactor response models for clarity

* Refactor response types in router.

* Feature/fix agent (#1065)

* ready for merge

* fix agent

* Patch/fix 123 (#1066)

* ready for merge

* fix agent

* fix import

* Feature/add orchestration draft (#1067)

* ready for merge

* fix agent

* fix import

* bump (#1075)

* Enhance KG search capabilities and examples.

* Fix formatting and update documentation.

* Remove debug print statements in parsing.

* Fix content and search level values

* Update OpenAPI spec and responses.

* Add customizable RAG assistant example

* Update API documentation and search models.

---------

Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: emrgnt-cmplxty <[email protected]>
Co-authored-by: Manuel R. Ciosici <[email protected]>
Co-authored-by: Nolan Tremelling <[email protected]>
3 participants