Dev minor #1148

Merged 37 commits on Sep 13, 2024
Commits
0c09129
Feature/improve r2r telemetry (#1122)
emrgnt-cmplxty Sep 11, 2024
7111728
Feature/improve cli infra (#1123)
emrgnt-cmplxty Sep 11, 2024
338bdc7
Feature/add serve fallback to main (#1125)
emrgnt-cmplxty Sep 11, 2024
9f12b2c
Merge fragments (#1127)
shreyaspimpalgaonkar Sep 11, 2024
b64aead
troubleshooting docs (#1128)
emrgnt-cmplxty Sep 11, 2024
de9f9f5
troubleshooting docs (#1129)
emrgnt-cmplxty Sep 11, 2024
042bffb
add system diagram (#1130)
emrgnt-cmplxty Sep 11, 2024
80255b0
change to fast strategy by default (#1133)
emrgnt-cmplxty Sep 12, 2024
0f5a65f
Update parameter passing in js sdk (#1132)
NolanTrem Sep 12, 2024
a2e4173
Docs changes + add entity and relationship types (#1134)
shreyaspimpalgaonkar Sep 12, 2024
1f7361f
reduce verbosity
emrgnt-cmplxty Sep 12, 2024
f8de663
Feature/dev minor cleanups (#1135)
emrgnt-cmplxty Sep 12, 2024
e381a3c
rebase
emrgnt-cmplxty Sep 12, 2024
ac9d558
Update Tesseract OCR version in Dockerfile and change chunking strate…
shreyaspimpalgaonkar Sep 12, 2024
386e21e
Merge branch 'dev-minor' of https://github.com/SciPhi-AI/R2R into dev…
shreyaspimpalgaonkar Sep 12, 2024
65c3d7c
Merge remote-tracking branch 'origin/main' into dev-minor
shreyaspimpalgaonkar Sep 12, 2024
ce23a0c
Update chunking strategy to "auto" in r2r.toml
shreyaspimpalgaonkar Sep 12, 2024
5808846
Merge remote-tracking branch 'origin/main' into dev-minor
shreyaspimpalgaonkar Sep 12, 2024
79c7fc7
Feature/improve docker workflow (#1146)
emrgnt-cmplxty Sep 12, 2024
bb98f3a
Update docker_utils.py (#1143)
Renaulte Sep 12, 2024
91afece
adding test pypi (#1147)
emrgnt-cmplxty Sep 12, 2024
16eb957
Feature/add test pypi (#1149)
emrgnt-cmplxty Sep 12, 2024
873b660
Feature/add boto3 fix pypi (#1151)
emrgnt-cmplxty Sep 12, 2024
cd324d4
try new pub strat (#1153)
emrgnt-cmplxty Sep 12, 2024
009c930
update publish
emrgnt-cmplxty Sep 12, 2024
1243752
try compliant version for test pypi
emrgnt-cmplxty Sep 12, 2024
d1de904
fix build workflow
emrgnt-cmplxty Sep 12, 2024
82adc14
ad registyr image output
emrgnt-cmplxty Sep 12, 2024
772eb93
add reminder to rag agent reply
emrgnt-cmplxty Sep 12, 2024
d762261
update prompt (#1137)
shreyaspimpalgaonkar Sep 12, 2024
1be1851
fix cli and workflows
emrgnt-cmplxty Sep 12, 2024
61fdc51
fix cli and workflows
emrgnt-cmplxty Sep 12, 2024
a66c1d9
merge and improve server logic
emrgnt-cmplxty Sep 12, 2024
d00cf53
add filetypes (#1157)
shreyaspimpalgaonkar Sep 12, 2024
ec0c653
fix l74
emrgnt-cmplxty Sep 12, 2024
2d460c9
fix indent level
emrgnt-cmplxty Sep 13, 2024
198f872
bump pkg
emrgnt-cmplxty Sep 13, 2024
120 changes: 120 additions & 0 deletions .github/workflows/build-docker.yml
@@ -0,0 +1,120 @@
name: Build and Publish Docker Image

on:
  workflow_dispatch:
    inputs:
      branch:
        description: 'Branch to simulate (e.g., dev, dev-minor, main)'
        required: false
        default: 'main'
  push:
    branches:
      - dev
      - dev-minor

env:
  REGISTRY_BASE: ragtoriches
jobs:
  prepare:
    runs-on: ubuntu-latest
    outputs:
      release_version: ${{ steps.version.outputs.RELEASE_VERSION }}
      registry_image: ${{ steps.version.outputs.REGISTRY_IMAGE }}
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.inputs.branch || github.ref }} # This checks out the correct branch

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Install toml package
        run: pip install toml

      - name: Determine version and registry
        id: version
        run: |
          VERSION=$(python -c "import toml; print(toml.load('py/pyproject.toml')['tool']['poetry']['version'])")
          RELEASE_VERSION=$VERSION

          # Use input branch if this is a workflow dispatch
          if [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
            BRANCH="${{ github.event.inputs.branch }}"
          else
            BRANCH="${{ github.ref }}"
          fi

          # Determine the registry based on the branch
          if [ "$BRANCH" == "refs/heads/dev" ] || [ "$BRANCH" == "dev" ]; then
            REGISTRY_IMAGE="${{ env.REGISTRY_BASE }}/dev"
          elif [ "$BRANCH" == "refs/heads/dev-minor" ] || [ "$BRANCH" == "dev-minor" ]; then
            REGISTRY_IMAGE="${{ env.REGISTRY_BASE }}/dev-minor"
          else
            REGISTRY_IMAGE="${{ env.REGISTRY_BASE }}/prod"
          fi

          echo "RELEASE_VERSION=$RELEASE_VERSION" >> $GITHUB_OUTPUT
          echo "REGISTRY_IMAGE=$REGISTRY_IMAGE" >> $GITHUB_OUTPUT

      - name: Set matrix
        id: set-matrix
        run: |
          echo "matrix={\"include\":[{\"platform\":\"amd64\",\"runner\":\"amd2\"},{\"platform\":\"arm64\",\"runner\":\"arm2\"}]}" >> $GITHUB_OUTPUT

  build:
    needs: prepare
    strategy:
      fail-fast: false
      matrix: ${{fromJson(needs.prepare.outputs.matrix)}}
    runs-on: ${{ matrix.runner }}
    steps:
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Docker Auth
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.RAGTORICHES_DOCKER_UNAME }}
          password: ${{ secrets.RAGTORICHES_DOCKER_TOKEN }}

      - name: Build and push image
        uses: docker/build-push-action@v5
        with:
          context: ./py
          file: ./py/Dockerfile
          platforms: ${{ matrix.platform }}
          push: true
          tags: |
            ${{ needs.prepare.outputs.REGISTRY_IMAGE }}:${{ needs.prepare.outputs.release_version }}-${{ matrix.platform }}
            ${{ needs.prepare.outputs.REGISTRY_IMAGE }}:latest-${{ matrix.platform }}
          provenance: false
          sbom: false

  create-manifest:
    needs: [prepare, build]
    runs-on: ubuntu-latest
    steps:
      - name: Docker Auth
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.RAGTORICHES_DOCKER_UNAME }}
          password: ${{ secrets.RAGTORICHES_DOCKER_TOKEN }}

      - name: Create and push multi-arch manifest
        run: |
          docker buildx imagetools create -t ${{ needs.prepare.outputs.REGISTRY_IMAGE }}:${{ needs.prepare.outputs.release_version }} \
            ${{ needs.prepare.outputs.REGISTRY_IMAGE }}:${{ needs.prepare.outputs.release_version }}-amd64 \
            ${{ needs.prepare.outputs.REGISTRY_IMAGE }}:${{ needs.prepare.outputs.release_version }}-arm64

          docker buildx imagetools create -t ${{ needs.prepare.outputs.REGISTRY_IMAGE }}:latest \
            ${{ needs.prepare.outputs.REGISTRY_IMAGE }}:${{ needs.prepare.outputs.release_version }}-amd64 \
            ${{ needs.prepare.outputs.REGISTRY_IMAGE }}:${{ needs.prepare.outputs.release_version }}-arm64

      - name: Verify manifests
        run: |
          docker buildx imagetools inspect ${{ needs.prepare.outputs.REGISTRY_IMAGE }}:${{ needs.prepare.outputs.release_version }}
          docker buildx imagetools inspect ${{ needs.prepare.outputs.REGISTRY_IMAGE }}:latest
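
The branch-to-registry mapping, the per-platform tags, and the matrix JSON in `build-docker.yml` are easier to reason about (and unit test) outside of shell. The following is a minimal Python sketch of that logic, not code from this PR: the function names `registry_image`, `platform_tags`, and `build_matrix` are hypothetical, and only the registry base, branch names, platforms, and runner labels are taken from the workflow.

```python
import json

REGISTRY_BASE = "ragtoriches"  # same value as env.REGISTRY_BASE in the workflow


def registry_image(branch: str, base: str = REGISTRY_BASE) -> str:
    """Map a branch name or full ref to the image repository to push to."""
    # The workflow accepts both "dev" and "refs/heads/dev", so strip the
    # ref prefix once and compare short branch names.
    short = branch.removeprefix("refs/heads/")
    if short in ("dev", "dev-minor"):
        return f"{base}/{short}"
    # Every other branch (including main) falls through to prod.
    return f"{base}/prod"


def platform_tags(image: str, version: str, platform: str) -> list[str]:
    """Tags pushed by one matrix job: a versioned tag and a latest tag."""
    return [f"{image}:{version}-{platform}", f"{image}:latest-{platform}"]


def build_matrix() -> str:
    """Serialize the build matrix in the compact form the workflow echoes."""
    include = [
        {"platform": "amd64", "runner": "amd2"},
        {"platform": "arm64", "runner": "arm2"},
    ]
    return json.dumps({"include": include}, separators=(",", ":"))
```

Under these assumptions, `registry_image("refs/heads/dev-minor")` yields `ragtoriches/dev-minor`, and any branch other than `dev` or `dev-minor` maps to `ragtoriches/prod`, which matches the `else` branch of the shell step.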
123 changes: 0 additions & 123 deletions .github/workflows/build-main.yml

This file was deleted.

36 changes: 30 additions & 6 deletions .github/workflows/publish-to-pypi.yml
@@ -2,28 +2,52 @@ name: Publish to PyPI

 on:
   push:
-    tags:
-      - "*"
+    branches:
+      - dev
+      - dev-minor
   workflow_dispatch:

 jobs:
   publish:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout code
-        uses: actions/checkout@v2
+        uses: actions/checkout@v3

       - name: Set up Python
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v4
         with:
           python-version: '3.10'

       - name: Install poetry
         working-directory: ./py
         run: pip install poetry

-      - name: Build and publish
+      - name: Bump version for dev branches (TestPyPI)
+        if: github.event_name == 'push'
         working-directory: ./py
         run: |
+          version=$(poetry version -s)
+          new_version="${version}a$(date +'%Y%m%d%H%M')"
+          poetry version $new_version
+
+      - name: Build and publish to TestPyPI
+        if: github.event_name == 'push'
+        working-directory: ./py
+        run: |
+          poetry config repositories.testpypi https://test.pypi.org/legacy/
+          poetry build
+          poetry config pypi-token.testpypi ${{ secrets.TEST_PYPI_API_TOKEN }}
+          poetry publish -r testpypi -vvv
+        env:
+          PYTHON_KEYRING_BACKEND: keyring.backends.null.Keyring
+
+      - name: Build and publish to PyPI
+        if: github.event_name == 'workflow_dispatch'
+        working-directory: ./py
+        run: |
           poetry build
-          poetry publish --username __token__ --password ${{ secrets.PYPI_API_TOKEN }}
+          poetry config pypi-token.pypi ${{ secrets.PYPI_API_TOKEN }}
+          poetry publish -vvv
+        env:
+          PYTHON_KEYRING_BACKEND: keyring.backends.null.Keyring
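
The "Bump version for dev branches" step appends an alpha segment built from a minute-resolution timestamp, so each push to `dev` or `dev-minor` publishes a distinct pre-release to TestPyPI. A minimal Python sketch of the same string construction follows; the function name `dev_version` and the base version used in the example are hypothetical, and only the `a` + `%Y%m%d%H%M` format is taken from the workflow.

```python
from datetime import datetime


def dev_version(base_version: str, now: datetime) -> str:
    """Append an alpha pre-release segment derived from a timestamp.

    Mirrors the shell expression "${version}a$(date +'%Y%m%d%H%M')".
    """
    return f"{base_version}a{now:%Y%m%d%H%M}"


# Example with a placeholder base version (not the project's real version).
print(dev_version("3.1.0", datetime(2024, 9, 12, 15, 30)))
```

Because the timestamp has minute resolution, two pushes within the same minute would produce the same version string, and a package index generally rejects a second upload of an existing file. The result is a pre-release in the PEP 440 sense, so it sorts before the corresponding final release.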
2 changes: 1 addition & 1 deletion docs/cookbooks/walkthrough.mdx
@@ -118,7 +118,7 @@ Key features include:
<AccordionGroup>

<Accordion icon="database" title="Ingest Data" defaultOpen={true}>
-R2R offers a powerful data ingestion process that handles various file types including `html`, `pdf`, `png`, `mp3`, and `txt`. The ingestion process parses, chunks, embeds, and stores documents efficiently with a fully asynchronous pipeline. To demonstrate this functionality:
+R2R offers a powerful data ingestion process that handles various file types including `html`, `pdf`, `png`, `mp3`, and `txt`. The full list of supported filetypes is available [here](/documentation/configuration/parsing_and_chunking). The ingestion process parses, chunks, embeds, and stores documents efficiently with a fully asynchronous pipeline. To demonstrate this functionality:

<Tabs>
<Tab title="CLI">
2 changes: 1 addition & 1 deletion docs/documentation/configuration/ingestion/overview.mdx
@@ -6,7 +6,7 @@ description: 'Configure your R2R ingestion pipeline'

R2R's ingestion pipeline efficiently processes various document formats, transforming them into searchable content. It seamlessly integrates with vector databases and knowledge graphs for optimal retrieval and analysis.

-By default, R2R leverages Unstructured's open-source [ingestion platform](https://docs.unstructured.io/open-source/introduction/overview) to handle supported file types. For formats not covered by Unstructured, such as `.mp3`, R2R implements custom ingestion logic to ensure comprehensive support.
+By default, R2R leverages Unstructured's open-source [ingestion platform](https://docs.unstructured.io/open-source/introduction/overview) to handle supported file types. For formats not covered by Unstructured, such as `.mp3`, R2R implements custom ingestion logic to ensure comprehensive support. Supported file types are listed [here](/documentation/configuration/parsing_and_chunking).

## Key Configuration Areas

@@ -21,20 +21,39 @@ Available providers:

**R2R supports parsing for the following file types:**

+- BMP (Bitmap Image)
 - CSV (Comma-Separated Values)
+- DOC (Microsoft Word Document)
 - DOCX (Microsoft Word Document)
+- EML (Electronic Mail)
+- EPUB (Electronic Publication)
+- GIF (Graphics Interchange Format)
+- HEIC (High-Efficiency Image Format)
+- HTM (HyperText Markup)
 - HTML (HyperText Markup Language)
+- JPEG (Joint Photographic Experts Group)
+- JPG (Joint Photographic Experts Group)
 - JSON (JavaScript Object Notation)
 - MD (Markdown)
+- MSG (Microsoft Outlook Message)
+- MP3 (MPEG Audio Layer III)
+- MP4 (MPEG-4 Part 14)
+- ODT (Open Document Text)
+- ORG (Org Mode)
 - PDF (Portable Document Format)
+- P7S (PKCS#7)
+- PNG (Portable Network Graphics)
+- PPT (PowerPoint)
 - PPTX (Microsoft PowerPoint Presentation)
+- RST (reStructured Text)
+- RTF (Rich Text Format)
+- SVG (Scalable Vector Graphics)
+- TSV (Tab-Separated Values)
 - TXT (Plain Text)
+- XLS (Microsoft Excel Spreadsheet)
 - XLSX (Microsoft Excel Spreadsheet)
-- GIF (Graphics Interchange Format)
-- JPEG/JPG (Joint Photographic Experts Group)
-- PNG (Portable Network Graphics)
-- SVG (Scalable Vector Graphics)
-- MP3 (MPEG Audio Layer III)
+- XML (Extensible Markup Language)
+- TIFF (Tagged Image File Format)
 - MP4 (MPEG-4 Part 14)

<Note> Parsing providers for an R2R system cannot be configured at runtime and are instead configured server side. </Note>
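
A client can check a file's extension against the list above before sending it for ingestion. The snippet below is an illustrative sketch only and does not use any R2R API: `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical names, and the set is transcribed from the updated list in this diff.

```python
from pathlib import Path

# Extensions transcribed from the updated documentation list (lowercase, no dot).
SUPPORTED_EXTENSIONS = frozenset({
    "bmp", "csv", "doc", "docx", "eml", "epub", "gif", "heic", "htm", "html",
    "jpeg", "jpg", "json", "md", "msg", "mp3", "mp4", "odt", "org", "pdf",
    "p7s", "png", "ppt", "pptx", "rst", "rtf", "svg", "tsv", "txt", "xls",
    "xlsx", "xml", "tiff",
})


def is_supported(filename: str) -> bool:
    """Return True if the file's final extension appears in the documented list."""
    # Path.suffix keeps the leading dot (".pdf"); normalize case and strip it.
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS
```

The check is case-insensitive and looks only at the final suffix, so `report.PDF` passes while a file with no extension does not. The server remains the authority on what it can parse, since parsing providers are configured server side.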