Feature Request: Add support for Raspberry Pi AI Kit #548

Open · 4 tasks done
beingminimal opened this issue Aug 20, 2024 · 5 comments

Comments
@beingminimal

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Add support for the Raspberry Pi AI Kit to run llamafile.

Motivation

It would be much faster to run LLMs on such a small, inexpensive device.

Possible Implementation

Not aware of any.

@jart
Collaborator

jart commented Aug 20, 2024

llamafile already works great on Raspberry Pi 5. It goes very fast with CPU alone. We put a lot of work into making that happen.

Their AI accelerator module is cool, but supporting it isn't on our roadmap. I'm not even sure it has functionality that would help transformer models. If anyone here understands its capabilities and could explain how it could potentially make our matrix multiplications go faster, then I'm all ears. But I'm willing to bet we're already offering you the best RPi support that's possible today.

@beingminimal
Author

@jart This is just a small effort from my side, so I'm only posting an AI-generated reply; I'm not that technical a person.

The Raspberry Pi 5 AI Kit can potentially help make your matrix multiplications go faster for LLMs in several ways, leveraging its hardware and software capabilities:

Hardware Acceleration

  1. RP2040 Microcontroller: While not as powerful as a GPU, the RP2040's dual-core architecture can be used to parallelize certain parts of the matrix multiplication process, offloading some compute from the main CPU.
  2. Neural Compute Stick: If included in the AI Kit, the Neural Compute Stick can provide dedicated hardware acceleration for certain types of matrix operations, especially those commonly found in neural networks.

Software Optimization

  1. TensorFlow Lite: The Raspberry Pi 5 is optimized to run TensorFlow Lite, a lightweight version of TensorFlow designed for edge devices. TensorFlow Lite includes optimized kernels for matrix operations that can significantly improve performance on the Raspberry Pi 5.
  2. Python Libraries: NumPy and other Python libraries offer optimized functions for matrix operations that can take advantage of the Raspberry Pi 5's hardware (a short illustrative sketch follows this list).
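
To make the "optimized kernels" point concrete, here is a minimal, illustrative comparison that runs on any machine with NumPy installed: a pure-Python triple loop versus `np.matmul`, which dispatches to an optimized BLAS kernel. The matrix size is arbitrary, and this is not llamafile's actual code path (llamafile ships its own hand-tuned CPU kernels); it only shows the kind of gap optimized matrix routines close.

```python
import time
import numpy as np

n = 256  # kept small so the pure-Python version finishes in seconds
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

def naive_matmul(x, y):
    """Triple-loop multiply in pure Python: no vectorization, no cache blocking."""
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += x[i][k] * y[k][j]
            out[i][j] = s
    return out

t0 = time.perf_counter()
naive_matmul(a.tolist(), b.tolist())
t_naive = time.perf_counter() - t0

# np.matmul dispatches to an optimized BLAS kernel (vectorized, cache-aware).
t0 = time.perf_counter()
a @ b
t_blas = time.perf_counter() - t0

print(f"pure Python: {t_naive:.2f}s   BLAS-backed np.matmul: {t_blas:.5f}s")
```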

Specific Techniques

  1. Quantization: Quantization reduces the precision of the numbers used in the matrix calculations, allowing for faster computations and reduced memory usage. TensorFlow Lite supports quantization, which can be particularly effective on the Raspberry Pi 5 (a toy sketch follows this list).
  2. Pruning: Pruning removes less important connections within the neural network, leading to smaller matrices and faster calculations.
  3. Sparsity: Sparsity leverages the fact that many matrices in LLMs contain a large number of zero values. Optimized sparse matrix operations can significantly speed up calculations.
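
As a rough illustration of the quantization idea, here is a minimal sketch of symmetric per-tensor int8 quantization in NumPy. This is a toy, not the scheme llamafile or TensorFlow Lite actually uses (llama.cpp-family runtimes use block-wise quantization formats); it only shows why lower precision shrinks the weights, and memory traffic is usually the bottleneck for LLM inference on a device like the Pi.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: one float scale for the whole tensor."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# A weight matrix roughly the size of one projection layer in a small LLM.
w = np.random.randn(4096, 4096).astype(np.float32)
q, scale = quantize_int8(w)

print(f"float32 weights: {w.nbytes / 1e6:.1f} MB")  # ~67 MB
print(f"int8 weights:    {q.nbytes / 1e6:.1f} MB")  # ~17 MB, 4x less to move
print(f"max round-trip error: {np.abs(w - dequantize(q, scale)).max():.4f}")
```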

@jart
Collaborator

jart commented Aug 21, 2024

So what you're telling me is that it's got a 32-bit ARM CPU on it with 2 cores. I doubt there's much advantage offloading to that. Plus having to use a proprietary SDK and drag and drop a special executable and reboot the thing for a program to run. It'd be simpler and platform-agnostic to just plug a second Raspberry Pi into your Raspberry Pi over the ethernet and we'll give you software that lets you cluster llamafile instances. Wouldn't you rather have that instead?

@beingminimal
Author

> So what you're telling me is that it's got a 32-bit ARM CPU on it with 2 cores. I doubt there's much advantage offloading to that. Plus having to use a proprietary SDK and drag and drop a special executable and reboot the thing for a program to run. It'd be simpler and platform-agnostic to just plug a second Raspberry Pi into your Raspberry Pi over the ethernet and we'll give you software that lets you cluster llamafile instances. Wouldn't you rather have that instead?

OK, got it. Thanks for the clarity.

@devlux76

> So what you're telling me is that it's got a 32-bit ARM CPU on it with 2 cores. I doubt there's much advantage offloading to that. Plus having to use a proprietary SDK and drag and drop a special executable and reboot the thing for a program to run. It'd be simpler and platform-agnostic to just plug a second Raspberry Pi into your Raspberry Pi over the ethernet and we'll give you software that lets you cluster llamafile instances. Wouldn't you rather have that instead?

Ok that sounds awesome! I'm not the OP but I'm VERY interested in that!
