API Updates: routing table, models endpoint, inference routing, rebase on safety_refactor #85

Open · wants to merge 45 commits into safety_refactor

Conversation

@yanxi0830 (Contributor) commented Sep 20, 2024

Changes

Major Changes

✅ rebase on top of safety_refactor branch
✅ routers migration
- move the memory router to llama_stack/distribution/routers
- add a RoutingTable class to manage provider implementations
- add inference routing that selects a provider based on the model param (see the sketch after this list)
✅ models/list & models/get endpoints to query which models are currently being served and by which API provider
- one adapter can support multiple models (e.g. fireworks/together/ollama); in this case, we list all models the adapter supports
✅ safety/list_shields to query the shields available in a distribution
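
As a rough sketch of the routing idea (class and method names below are illustrative, not the actual llama_stack/distribution/routers implementation):

# Illustrative sketch only -- names are hypothetical, not the actual
# llama_stack/distribution/routers implementation.
from typing import Any, Dict


class RoutingTable:
    """Maps a routing key (e.g. a model name) to a provider implementation."""

    def __init__(self, providers: Dict[str, Any]) -> None:
        # providers: routing_key -> instantiated provider impl
        self.providers = providers

    def get_provider(self, routing_key: str) -> Any:
        if routing_key not in self.providers:
            raise ValueError(f"no provider registered for routing_key={routing_key!r}")
        return self.providers[routing_key]


class InferenceRouter:
    """Dispatches inference requests to a provider based on the model param."""

    def __init__(self, table: RoutingTable) -> None:
        self.table = table

    async def chat_completion(self, model: str, messages: list, **kwargs: Any) -> Any:
        provider = self.table.get_provider(model)
        return await provider.chat_completion(model=model, messages=messages, **kwargs)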

Minor Changes

✅ backward compatibility: a single adapter without a routing table is still supported (see the sketch after this list)
✅ example configs demonstrating an advanced routing table
✅ configure script for single-adapter-backed endpoints
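
A minimal sketch of how that backward compatibility could work (build_routing_table is a hypothetical helper, not part of llama_stack; only the provider_map / provider_routing_table keys come from this PR's configs):

# Sketch only: treat a plain provider_map entry as a one-entry routing
# table so single-adapter configs keep working.
from typing import Any, Dict


def build_routing_table(run_config: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    table: Dict[str, Dict[str, Any]] = {}
    for api, entries in run_config.get("provider_routing_table", {}).items():
        table[api] = {entry["routing_key"]: entry for entry in entries}
    for api, entry in run_config.get("provider_map", {}).items():
        # single adapter without routing: route every request to it
        table.setdefault(api, {"*": entry})
    return table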

Not in this PR

🚧 register_model endpoint to add new models
🚧 Client CLI for querying the distribution's models endpoint (will land in llama-stack-apps)
🚧 Update the client SDK package to handle the new OpenAPI spec for run_shield

Tests

Routing Table

Test Inference Routing
python -m llama_stack.apis.inference.client 
python sdk_examples/inference/client.py localhost 5000 false
  • Meta-Llama3.1-8B-Instruct (screenshot)
  • Meta-Llama3.1-8B (screenshot)
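
The same routing can be exercised over raw HTTP; a minimal sketch, assuming the server is on localhost:5000 (the endpoint path and payload shape are assumptions based on this PR's description, not a documented contract):

# Sketch only: send the same prompt through two routing keys.
import requests

for model in ["Meta-Llama3.1-8B-Instruct", "Meta-Llama3.1-8B"]:
    resp = requests.post(
        "http://localhost:5000/inference/chat_completion",
        json={
            "model": model,  # routing key: selects the provider from the table
            "messages": [{"role": "user", "content": "hello"}],
            "stream": False,
        },
    )
    print(model, "->", resp.status_code)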
Test Memory Routing
python -m llama_stack.apis.memory.client
  • vector memory type (screenshot)
  • switch to keyvalue memory type via remote::pgvector (screenshot)
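
A memory-bank creation request of the kind this test exercises might look like the following sketch (the endpoint path and the config shape, including type acting as the routing key, are assumptions):

# Sketch only: the memory router picks a provider from the routing key
# ("vector" vs "keyvalue"). Path and payload are assumptions.
import requests

resp = requests.post(
    "http://localhost:5000/memory/create",
    json={
        "config": {
            "type": "vector",  # routing key -> the vector provider
            "embedding_model": "all-MiniLM-L6-v2",
            "chunk_size_in_tokens": 512,
        },
    },
)
print(resp.status_code)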

Test Agents
python -m llama_stack.apis.agents.client
python sdk_examples/agents/client.py

(screenshot)

Test Safety
python -m llama_stack.apis.safety.client localhost 5000

(screenshot)

Models Endpoint without Routing Table

  • using the ollama adapter's supported models list
inference:
    provider_id: remote::ollama
    config:
      url: https://xxx

(screenshot)

  • using default local
inference:
    provider_id: meta-reference
    config:
      model: Meta-Llama3.1-8B-Instruct
      quantization: null
      torch_seed: null
      max_seq_len: 4096
      max_batch_size: 1

(screenshot)

Models Endpoint with Routing Table

  • using routing table
provider_routing_table:
  inference:
    - routing_key: Meta-Llama3.1-8B-Instruct
      provider_id: meta-reference
      config:
        model: Meta-Llama3.1-8B-Instruct
        quantization: null
        torch_seed: null
        max_seq_len: 4096
        max_batch_size: 1
    - routing_key: Meta-Llama3.1-8B
      provider_id: remote::ollama
      config:
        url: https://ollama.com

(screenshot)
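
Either way, the endpoint can be queried directly; a minimal sketch (the exact path and response shape are assumptions):

# Sketch only: list the models currently being served and their providers.
import requests

resp = requests.get("http://localhost:5000/models/list")
for spec in resp.json():
    print(spec)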

run.yaml

built_at: '2024-09-18T13:41:17.656743'
image_name: local
docker_image: null
conda_env: local
apis_to_serve:
- inference
- memory
- telemetry
- agents
- safety
provider_map:
  telemetry:
    provider_id: meta-reference
    config: {}
  safety:
    provider_id: meta-reference
    config:
      llama_guard_shield:
        model: Llama-Guard-3-8B
        excluded_categories: []
        disable_input_check: false
        disable_output_check: false
      prompt_guard_shield:
        model: Prompt-Guard-86M
  agents:
    provider_id: meta-reference
    config: {}
provider_routing_table:
  inference:
    - routing_key: Meta-Llama3.1-8B-Instruct
      provider_id: meta-reference
      config:
        model: Meta-Llama3.1-8B-Instruct
        quantization: null
        torch_seed: null
        max_seq_len: 4096
        max_batch_size: 1
    # - routing_key: Meta-Llama3.1-8B
    #   provider_id: meta-reference
    #   config:
    #     model: Meta-Llama3.1-8B
    #     quantization: null
    #     torch_seed: null
    #     max_seq_len: 4096
    #     max_batch_size: 1
  memory:
    # - routing_key: keyvalue
    #   provider_id: remote::pgvector
    #   config:
    #     host: localhost
    #     port: 5432
    #     db: vectordb
    #     user: vectoruser
    #     password: xxxx
    - routing_key: vector
      provider_id: meta-reference
      config: {}

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Sep 20, 2024
@yanxi0830 changed the title from [WIP] new router migration to API Updates: routing table, models endpoint, inference routing, rebase on safety_refactor Sep 22, 2024
@ashwinb changed the base branch from main to api_updates_3 September 22, 2024 07:11
@ashwinb changed the base branch from api_updates_3 to safety_refactor September 22, 2024 07:12
@yanxi0830 marked this pull request as ready for review September 22, 2024 07:25