Commit: Update document
leng-yue committed Dec 18, 2023
1 parent ca11d4b commit c3f325e
Showing 8 changed files with 184 additions and 55 deletions.
30 changes: 30 additions & 0 deletions .github/workflows/docs.yaml
@@ -0,0 +1,30 @@
name: ci
on:
  push:
    branches:
      - main

permissions:
  contents: write

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure Git Credentials
        run: |
          git config user.name github-actions[bot]
          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
      - uses: actions/setup-python@v4
        with:
          python-version: 3.x
      - run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV
      - uses: actions/cache@v3
        with:
          key: mkdocs-material-${{ env.cache_id }}
          path: .cache
          restore-keys: |
            mkdocs-material-
      - run: pip install mkdocs-material
      - run: mkdocs gh-deploy --force
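The `cache_id` step above keys the MkDocs build cache on the UTC ISO week number (`date --utc '+%V'`), so the cache rolls over once per week while `restore-keys` still allows falling back to an older week's cache. A minimal Python sketch of the same key computation (the example date is the commit date on this page):

```python
import datetime

def weekly_cache_key(day: datetime.date, prefix: str = "mkdocs-material-") -> str:
    """Build a cache key that changes once per ISO week, mirroring the
    workflow's `date --utc '+%V'` step (assuming the runner clock is UTC)."""
    week = day.isocalendar()[1]  # ISO week number, 1..53
    return f"{prefix}{week:02d}"

# The commit date shown on this page, 2023-12-18, falls in ISO week 51.
print(weekly_cache_key(datetime.date(2023, 12, 18)))  # mkdocs-material-51
```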
63 changes: 8 additions & 55 deletions README.md
@@ -1,66 +1,19 @@
# Fish Speech

**Documentation is under construction, English is not fully supported yet.**

[中文文档](README.zh.md)

This codebase is released under BSD-3-Clause License, and all models are released under CC-BY-NC-SA-4.0 License. Please refer to [LICENSE](LICENSE) for more details.

## Disclaimer
We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.

## Requirements
- GPU memory: 2GB (for inference), 24GB (for finetuning)
- System: Linux (full functionality), Windows (inference only; `flash-attn` and `torch.compile` are not supported)

We therefore strongly recommend that Windows users run the codebase under WSL2 or Docker.

## Setup
```bash
# Basic environment setup
conda create -n fish-speech python=3.10
conda activate fish-speech
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

# Install flash-attn (for linux)
pip3 install ninja && MAX_JOBS=4 pip3 install flash-attn --no-build-isolation

# Install fish-speech
pip3 install -e .
```
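After the conda and pip steps above, a quick sanity check is to confirm the expected packages can actually be imported. This is just a convenience sketch, not part of fish-speech itself, and the package names passed in are assumptions about what the install provides:

```python
import importlib.util

def missing(packages):
    """Return the subset of `packages` that cannot be imported —
    a quick post-install sanity check."""
    return [p for p in packages if importlib.util.find_spec(p) is None]

# After a successful setup, this list should come back empty.
print(missing(["torch", "torchaudio"]))
```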

## Inference (CLI)
Download the required `vqgan` and `text2semantic` models from our Hugging Face repo.

```bash
wget https://huggingface.co/fishaudio/speech-lm-v1/resolve/main/vqgan-v1.pth -O checkpoints/vqgan-v1.pth
wget https://huggingface.co/fishaudio/speech-lm-v1/resolve/main/text2semantic-400m-v0.2-4k.pth -O checkpoints/text2semantic-400m-v0.2-4k.pth
```
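A note on the URL forms above: Hugging Face `blob/` URLs return the HTML file viewer rather than the file itself, while `resolve/` URLs return the actual file bytes (following Git LFS redirects). A small helper for building direct-download URLs — the repo and filename are taken from the commands above:

```python
def hf_file_url(repo: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL on the Hugging Face Hub.
    `resolve/` serves file content (following LFS); `blob/` is the HTML viewer."""
    return f"https://huggingface.co/{repo}/resolve/{revision}/{filename}"

print(hf_file_url("fishaudio/speech-lm-v1", "vqgan-v1.pth"))
# https://huggingface.co/fishaudio/speech-lm-v1/resolve/main/vqgan-v1.pth
```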

Generate semantic tokens from text:
```bash
python tools/llama/generate.py \
--text "Hello" \
--num-samples 2 \
--compile
```

You may want to use `--compile` to fuse CUDA kernels for faster inference (~25 tokens/sec -> ~300 tokens/sec).

Generate vocals from semantic tokens:
```bash
python tools/vqgan/inference.py -i codes_0.npy
```
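According to the docs, `generate.py` leaves one `codes_N.npy` file per sample in the working directory, and the vocoder step above consumes one of them. A sketch of collecting those outputs in numeric order (the naming convention is assumed from the document, not a tested API):

```python
import pathlib
import tempfile

def list_code_files(workdir: pathlib.Path):
    """Collect the `codes_N.npy` sample files left in the working
    directory, sorted by their numeric suffix."""
    files = workdir.glob("codes_*.npy")
    return sorted(files, key=lambda p: int(p.stem.split("_")[1]))

# Simulate a run that produced two samples (--num-samples 2):
with tempfile.TemporaryDirectory() as d:
    work = pathlib.Path(d)
    for n in (1, 0):
        (work / f"codes_{n}.npy").touch()
    print([p.name for p in list_code_files(work)])  # ['codes_0.npy', 'codes_1.npy']
```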
## Disclaimer
We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.

## Rust Data Server
Since loading and shuffling the dataset is very slow and memory-consuming, we use a Rust server to load and shuffle the data. The server is based on gRPC and can be installed with:
## Documents
- [English](https://speech.fish.audio/en/)
- [中文](https://speech.fish.audio/zh/)

```bash
cd data_server
cargo build --release
```

## Credits
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
File renamed without changes
3 changes: 3 additions & 0 deletions docs/en/index.md
@@ -0,0 +1,3 @@
# Welcome to Fish Speech

English Document is under construction.
4 changes: 4 additions & 0 deletions docs/index.md
@@ -0,0 +1,4 @@
---
template: redirect.html
location: /zh/
---
1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -0,0 +1 @@
mkdocs-material
85 changes: 85 additions & 0 deletions docs/zh/index.md
@@ -0,0 +1,85 @@
# Introduction

This codebase is released under the BSD-3-Clause License, and all models are released under the CC-BY-NC-SA-4.0 License. Please refer to [LICENSE](LICENSE) for more details.

<p align="center">
<img src="/assets/figs/diagram.png" width="75%">
</p>

## Disclaimer
We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.

## Requirements
- GPU memory: 2GB (for inference), 16GB (for finetuning)
- System: Linux (full functionality), Windows (inference only; `flash-attn` and `torch.compile` are not supported)

We therefore strongly recommend that Windows users run the codebase under WSL2 or Docker.

## Setup
```bash
# Basic environment setup
conda create -n fish-speech python=3.10
conda activate fish-speech
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

# Install flash-attn (Linux only)
pip3 install ninja && MAX_JOBS=4 pip3 install flash-attn --no-build-isolation

# Install fish-speech
pip3 install -e .
```

## Inference (CLI)

Download the required `vqgan` and `text2semantic` models from our Hugging Face repo.

```bash
wget https://huggingface.co/fishaudio/speech-lm-v1/resolve/main/vqgan-v1.pth -O checkpoints/vqgan-v1.pth
wget https://huggingface.co/fishaudio/speech-lm-v1/resolve/main/text2semantic-400m-v0.2-4k.pth -O checkpoints/text2semantic-400m-v0.2-4k.pth
```

### 1. [Optional] Generate a prompt from speech:
```bash
python tools/vqgan/inference.py -i paimon.wav --checkpoint-path checkpoints/vqgan-v1.pth
```

You should get a `fake.npy` file.

### 2. Generate semantic tokens from text:
```bash
python tools/llama/generate.py \
    --text "Text to convert" \
    --prompt-text "Your reference text" \
    --prompt-tokens "fake.npy" \
    --checkpoint-path "checkpoints/text2semantic-400m-v0.1-4k.pth" \
    --num-samples 2 \
    --compile
```

This command creates `codes_N` files in the working directory, where N is an integer starting from 0.
You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/sec -> ~500 tokens/sec).

### 3. Generate vocals from semantic tokens:
```bash
python tools/vqgan/inference.py -i codes_0.npy --checkpoint-path checkpoints/vqgan-v1.pth
```
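The three steps above chain through files on disk (reference audio → `fake.npy` → `codes_N.npy` → output audio). A sketch of the whole pipeline as a command list — paths and flags are copied from the document, not a tested API, and the helper itself is hypothetical:

```python
CHECKPOINT_VQGAN = "checkpoints/vqgan-v1.pth"

def pipeline_commands(text: str, prompt_wav: str = "paimon.wav"):
    """Return the three CLI invocations described above, in order."""
    return [
        # 1. (optional) encode a reference recording into prompt tokens -> fake.npy
        ["python", "tools/vqgan/inference.py", "-i", prompt_wav,
         "--checkpoint-path", CHECKPOINT_VQGAN],
        # 2. text -> semantic tokens (writes codes_0.npy, codes_1.npy, ...)
        ["python", "tools/llama/generate.py", "--text", text,
         "--prompt-tokens", "fake.npy", "--num-samples", "2"],
        # 3. semantic tokens -> audio
        ["python", "tools/vqgan/inference.py", "-i", "codes_0.npy",
         "--checkpoint-path", CHECKPOINT_VQGAN],
    ]

for cmd in pipeline_commands("Hello"):
    print(" ".join(cmd))
```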

## Rust Data Server
Since loading and shuffling the dataset is very slow and memory-consuming, we use a Rust server to load and shuffle the data. The server is based on gRPC and can be installed with:

```bash
cd data_server
cargo build --release
```

## Changelog

- 2023/12/17: Updated the `text2semantic` model to support the phoneme-free mode.
- 2023/12/13: Beta release, including the VQGAN model and a LLAMA-based language model (phonemes only).

## Credits
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
53 changes: 53 additions & 0 deletions mkdocs.yml
@@ -0,0 +1,53 @@
site_name: Fish Speech
repo_url: https://github.com/fishaudio/fish-speech

theme:
  name: material
  language: en
  features:
    - navigation.instant
    - navigation.instant.prefetch
    - navigation.tracking
    - search
    - search.suggest
    - search.highlight
    - search.share

  palette:
    # Palette toggle for automatic mode
    - media: "(prefers-color-scheme)"
      toggle:
        icon: material/brightness-auto
        name: Switch to light mode

    # Palette toggle for light mode
    - media: "(prefers-color-scheme: light)"
      scheme: default
      toggle:
        icon: material/brightness-7
        name: Switch to dark mode
      primary: black
      font:
        code: Roboto Mono

    # Palette toggle for dark mode
    - media: "(prefers-color-scheme: dark)"
      scheme: slate
      toggle:
        icon: material/brightness-4
        name: Switch to light mode
      primary: black
      font:
        code: Roboto Mono

extra:
  homepage: https://speech.fish.audio
  version:
    provider: mike
  alternate:
    - name: English
      link: /en/
      lang: en
    - name: 中文
      link: /zh/
      lang: zh
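Before deploying, a config like the one above can be sanity-checked by parsing it and asserting the pieces the site depends on. A sketch using PyYAML (which MkDocs itself depends on); the abbreviated config string here is an illustrative excerpt, not the full file:

```python
import yaml  # PyYAML, already a dependency of mkdocs

CONFIG = """
site_name: Fish Speech
repo_url: https://github.com/fishaudio/fish-speech
theme:
  name: material
  language: en
extra:
  homepage: https://speech.fish.audio
  alternate:
    - {name: English, link: /en/, lang: en}
    - {name: 中文, link: /zh/, lang: zh}
"""

cfg = yaml.safe_load(CONFIG)
# Every language listed in `alternate` should have a matching docs/<lang>/ tree.
langs = [alt["lang"] for alt in cfg["extra"]["alternate"]]
print(cfg["theme"]["name"], langs)  # material ['en', 'zh']
```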
