Update docs etc. (#524)
* fully support ormsgpack

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* dependency

* torch==2.4.1 windows compilable

* Update docs

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove unused code

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove autorerank

* api usage

* back slash

* fix docs

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
AnyaCoder and pre-commit-ci[bot] committed Sep 12, 2024
1 parent b186c98 commit 9cb84a6
Showing 11 changed files with 125 additions and 549 deletions.
125 changes: 65 additions & 60 deletions docs/en/index.md
@@ -27,66 +27,70 @@

## Windows Setup

Windows professional users may consider WSL2 or Docker to run the codebase.

Non-professional Windows users can consider the following methods to run the codebase without a Linux environment (with model compilation support, i.e. `torch.compile`):

<ol>
<li>Unzip the project package.</li>
<li>Click <code>install_env.bat</code> to install the environment.
<ul>
<li>You can decide whether to use a mirror site for downloads by editing the <code>USE_MIRROR</code> item in <code>install_env.bat</code>.</li>
<li><code>USE_MIRROR=false</code> downloads the latest stable version of <code>torch</code> from the original site. <code>USE_MIRROR=true</code> downloads the latest version of <code>torch</code> from a mirror site. The default is <code>true</code>.</li>
<li>You can decide whether to enable the compiled environment download by editing the <code>INSTALL_TYPE</code> item in <code>install_env.bat</code>.</li>
<li><code>INSTALL_TYPE=preview</code> downloads the preview version with the compiled environment. <code>INSTALL_TYPE=stable</code> downloads the stable version without the compiled environment.</li>
</ul>
</li>
<li>If you set <code>INSTALL_TYPE=preview</code> in step 2, execute this step (optional; it activates the environment for model compilation):
<ol>
<li>Download the LLVM compiler using the following links:
<ul>
<li><a href="https://huggingface.co/fishaudio/fish-speech-1/resolve/main/LLVM-17.0.6-win64.exe?download=true">LLVM-17.0.6 (original site download)</a></li>
<li><a href="https://hf-mirror.com/fishaudio/fish-speech-1/resolve/main/LLVM-17.0.6-win64.exe?download=true">LLVM-17.0.6 (mirror site download)</a></li>
<li>After downloading <code>LLVM-17.0.6-win64.exe</code>, double-click to install it, choose an appropriate installation location, and most importantly, check <code>Add Path to Current User</code> to add it to the environment variables.</li>
<li>Confirm the installation is complete.</li>
</ul>
</li>
<li>Download and install the Microsoft Visual C++ Redistributable package to resolve potential .dll missing issues.
<ul>
<li><a href="https://aka.ms/vs/17/release/vc_redist.x64.exe">MSVC++ 14.40.33810.0 Download</a></li>
</ul>
</li>
<li>Download and install Visual Studio Community Edition to obtain MSVC++ build tools, resolving LLVM header file dependencies.
<ul>
<li><a href="https://visualstudio.microsoft.com/zh-hans/downloads/">Visual Studio Download</a></li>
<li>After installing Visual Studio Installer, download Visual Studio Community 2022.</li>
<li>Click the <code>Modify</code> button as shown below, find the <code>Desktop development with C++</code> option, and check it for download.</li>
<p align="center">
<img src="../assets/figs/VS_1.jpg" width="75%">
</p>
</ul>
</li>
<li>Install <a href="https://developer.nvidia.com/cuda-12-1-0-download-archive?target_os=Windows&target_arch=x86_64">CUDA Toolkit 12</a></li>
</ol>
</li>
<li>Double-click <code>start.bat</code> to open the Fish-Speech training/inference configuration WebUI.
<ul>
<li>(Optional) Want to go directly to the inference page? Edit the <code>API_FLAGS.txt</code> in the project root directory and modify the first three lines as follows:
<pre><code>--infer
# --api
# --listen ...
...</code></pre>
</li>
<li>(Optional) Want to start the API server? Edit the <code>API_FLAGS.txt</code> in the project root directory and modify the first three lines as follows:
<pre><code># --infer
--api
--listen ...
...</code></pre>
</li>
</ul>
</li>
<li>(Optional) Double-click <code>run_cmd.bat</code> to enter the conda/python command line environment of this project.</li>
</ol>
Professional Windows users may consider using WSL2 or Docker to run the codebase.

```bash
# Create a Python 3.10 virtual environment; you can also use virtualenv
conda create -n fish-speech python=3.10
conda activate fish-speech

# Install pytorch
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install fish-speech
pip3 install -e .

# (Enable acceleration) Install triton-windows
pip install https://github.com/AnyaCoder/fish-speech/releases/download/v0.1.0/triton_windows-0.1.0-py3-none-any.whl
```
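
After installation, a quick sanity check can confirm that CUDA and `torch.compile` are usable. The snippet below is a minimal sketch and not part of the project; on Windows, compilation relies on the `triton-windows` wheel installed above.

```python
# Minimal sanity-check sketch (not part of the project):
# verifies that CUDA is visible and that torch.compile can compile a trivial function.
import torch

print(torch.__version__, "CUDA available:", torch.cuda.is_available())

@torch.compile
def square(x: torch.Tensor) -> torch.Tensor:
    return x * x

device = "cuda" if torch.cuda.is_available() else "cpu"
print(square(torch.randn(4, device=device)))
```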

Non-professional Windows users can consider the following basic methods to run the project without a Linux environment (with model compilation capabilities, i.e., `torch.compile`):

1. Extract the project package.
2. Click `install_env.bat` to install the environment.
3. If you want to enable compilation acceleration, follow this step:
1. Download the LLVM compiler from the following links:
- [LLVM-17.0.6 (Official Site Download)](https://huggingface.co/fishaudio/fish-speech-1/resolve/main/LLVM-17.0.6-win64.exe?download=true)
- [LLVM-17.0.6 (Mirror Site Download)](https://hf-mirror.com/fishaudio/fish-speech-1/resolve/main/LLVM-17.0.6-win64.exe?download=true)
- After downloading `LLVM-17.0.6-win64.exe`, double-click to install, select an appropriate installation location, and most importantly, check the `Add Path to Current User` option to add the environment variable.
- Confirm that the installation is complete.
2. Download and install the Microsoft Visual C++ Redistributable to solve potential .dll missing issues:
- [MSVC++ 14.40.33810.0 Download](https://aka.ms/vs/17/release/vc_redist.x64.exe)
3. Download and install Visual Studio Community Edition to get MSVC++ build tools and resolve LLVM's header file dependencies:
- [Visual Studio Download](https://visualstudio.microsoft.com/zh-hans/downloads/)
- After installing Visual Studio Installer, download Visual Studio Community 2022.
- As shown below, click the `Modify` button and find the `Desktop development with C++` option to select and download.
4. Download and install [CUDA Toolkit 12.x](https://developer.nvidia.com/cuda-12-1-0-download-archive?target_os=Windows&target_arch=x86_64)
4. Double-click `start.bat` to open the training/inference WebUI management interface. If needed, you can modify `API_FLAGS` as prompted below.

!!! info "Optional"

Want to start the inference WebUI?

Edit the `API_FLAGS.txt` file in the project root directory and modify the first three lines as follows:
```
--infer
# --api
# --listen ...
...
```

!!! info "Optional"

Want to start the API server?

Edit the `API_FLAGS.txt` file in the project root directory and modify the first three lines as follows:

```
# --infer
--api
--listen ...
...
```

!!! info "Optional"

Double-click `run_cmd.bat` to enter the conda/python command line environment of this project.

## Linux Setup

@@ -107,6 +111,7 @@ apt install libsox-dev

## Changelog

- 2024/09/10: Updated Fish-Speech to version 1.4, increased the dataset size, and changed the quantizer's n_groups from 4 to 8.
- 2024/07/02: Updated Fish-Speech to version 1.2, removed the VITS decoder, and greatly enhanced the zero-shot ability.
- 2024/05/10: Updated Fish-Speech to version 1.1 and implemented a VITS decoder to reduce WER and improve timbre similarity.
- 2024/04/22: Finished Fish-Speech version 1.0 with significant modifications to the VQGAN and LLAMA models.
51 changes: 11 additions & 40 deletions docs/en/inference.md
@@ -90,51 +90,22 @@ python -m tools.post_api \

The above command synthesizes the desired audio from the reference audio information and returns it as a stream.
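
For context, a rough sketch of consuming such a streamed response over plain HTTP is shown below. The endpoint path, port, and payload fields are illustrative assumptions only; `tools.post_api` above remains the supported client.

```python
# Rough sketch of reading a streamed TTS response; the URL and payload fields
# are assumptions for illustration, not the project's documented API.
import requests

payload = {"text": "Text to be input", "streaming": True}

with requests.post("http://127.0.0.1:8000/v1/tts", json=payload, stream=True) as resp:
    resp.raise_for_status()
    with open("generated.wav", "wb") as out:
        for chunk in resp.iter_content(chunk_size=4096):
            if chunk:
                out.write(chunk)  # write each audio chunk as it arrives
```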

If you need to randomly select reference audio based on `{SPEAKER}` and `{EMOTION}`, configure it according to the following steps:

### 1. Create a `ref_data` folder in the root directory of the project.

### 2. Create a directory structure similar to the following within the `ref_data` folder.

```
.
├── SPEAKER1
│   ├── EMOTION1
│   │   ├── 21.15-26.44.lab
│   │   ├── 21.15-26.44.wav
│   │   ├── 27.51-29.98.lab
│   │   ├── 27.51-29.98.wav
│   │   ├── 30.1-32.71.lab
│   │   └── 30.1-32.71.flac
│   └── EMOTION2
│       ├── 30.1-32.71.lab
│       └── 30.1-32.71.mp3
└── SPEAKER2
    └── EMOTION3
        ├── 30.1-32.71.lab
        └── 30.1-32.71.mp3
```

That is, first place `{SPEAKER}` folders in `ref_data`, then place `{EMOTION}` folders under each speaker, and place any number of `audio-text pairs` under each emotion folder.

### 3. Enter the following command in the virtual environment

```bash
python tools/gen_ref.py
```

### 4. Call the API.
```bash
python -m tools.post_api \
    --text "Text to be input" \
    --speaker "${SPEAKER1}" \
    --emotion "${EMOTION1}" \
    --streaming True
```

The above example is for testing purposes only.

The following example demonstrates that you can use **multiple** reference audio paths and reference audio texts at once. Separate them with spaces in the command.

```bash
python -m tools.post_api \
    --text "Text to input" \
    --reference_audio "reference audio path1" "reference audio path2" \
    --reference_text "reference audio text1" "reference audio text2" \
    --streaming False \
    --output "generated" \
    --format "mp3"
```

The above command synthesizes the desired `MP3` format audio based on the information from multiple reference audios and saves it as `generated.mp3` in the current directory.
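
Since this commit's message mentions ormsgpack support on the API side, here is a hedged sketch of posting a msgpack-encoded request directly; the endpoint, header, and field names are assumptions for illustration and may not match the actual server schema.

```python
# Hedged sketch of a msgpack-encoded TTS request; endpoint and field names are assumptions.
import ormsgpack
import requests

with open("reference audio path1", "rb") as f:
    ref_audio = f.read()

request = {
    "text": "Text to input",
    "references": [{"audio": ref_audio, "text": "reference audio text1"}],
    "format": "mp3",
}

resp = requests.post(
    "http://127.0.0.1:8000/v1/tts",
    data=ormsgpack.packb(request),
    headers={"Content-Type": "application/msgpack"},
)
resp.raise_for_status()
with open("generated.mp3", "wb") as out:
    out.write(resp.content)
```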

## GUI Inference
[Download client](https://github.com/AnyaCoder/fish-speech-gui/releases/tag/v0.1.0)

## WebUI Inference

24 changes: 18 additions & 6 deletions docs/zh/index.md
@@ -29,15 +29,26 @@

Professional Windows users may consider WSL2 or Docker to run the codebase.

```bash
# Create a Python 3.10 virtual environment; you can also use virtualenv
conda create -n fish-speech python=3.10
conda activate fish-speech

# Install pytorch
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install fish-speech
pip3 install -e .

# (Enable compilation acceleration) Install triton-windows
pip install https://github.com/AnyaCoder/fish-speech/releases/download/v0.1.0/triton_windows-0.1.0-py3-none-any.whl
```

Non-professional Windows users can consider the following basic methods to run the project without a Linux environment (with model compilation support, i.e. `torch.compile`):

1. Extract the project package.
2. Click `install_env.bat` to install the environment.
- You can decide whether to use a mirror site for downloads by editing the `USE_MIRROR` item in `install_env.bat`.
- `USE_MIRROR=false` downloads the latest stable `torch` from the original site; `USE_MIRROR=true` downloads the latest `torch` from a mirror site. The default is `true`.
- You can decide whether to enable the compiled-environment download by editing the `INSTALL_TYPE` item in `install_env.bat`.
- `INSTALL_TYPE=preview` downloads the preview version with the compilation environment; `INSTALL_TYPE=stable` downloads the stable version without it.
3. If `INSTALL_TYPE=preview` was set in step 2, perform this step (optional; it activates the model-compilation environment):
3. If you want to enable compilation acceleration, perform this step:
1. Download the LLVM compiler from the following links:
- [LLVM-17.0.6 (original site download)](https://huggingface.co/fishaudio/fish-speech-1/resolve/main/LLVM-17.0.6-win64.exe?download=true)
- [LLVM-17.0.6 (mirror site download)](https://hf-mirror.com/fishaudio/fish-speech-1/resolve/main/LLVM-17.0.6-win64.exe?download=true)
@@ -49,7 +60,7 @@ Non-professional Windows users can consider the following basic methods to run the project without a Linux environment
- [Visual Studio Download](https://visualstudio.microsoft.com/zh-hans/downloads/)
- After installing Visual Studio Installer, download Visual Studio Community 2022.
- As shown below, click the `Modify` button, find the `Desktop development with C++` option, and check it to download.
4. Download and install [CUDA Toolkit 12](https://developer.nvidia.com/cuda-12-1-0-download-archive?target_os=Windows&target_arch=x86_64)
4. Download and install [CUDA Toolkit 12.x](https://developer.nvidia.com/cuda-12-1-0-download-archive?target_os=Windows&target_arch=x86_64)
4. Double-click `start.bat` to open the training/inference WebUI management interface. If needed, you can modify `API_FLAGS` as prompted below.

!!! info "Optional"
@@ -158,6 +169,7 @@ apt install libsox-dev

## Changelog

- 2024/09/10: Updated Fish-Speech to version 1.4, increased the dataset size, and changed the quantizer's n_groups from 4 to 8.
- 2024/07/02: Updated Fish-Speech to version 1.2, removed the VITS decoder, and greatly enhanced the zero-shot ability.
- 2024/05/10: Updated Fish-Speech to version 1.1 and introduced a VITS decoder to reduce hallucination and improve timbre similarity.
- 2024/04/22: Finished Fish-Speech version 1.0 with significant modifications to the VQGAN and LLAMA models.
50 changes: 10 additions & 40 deletions docs/zh/inference.md
@@ -100,52 +100,22 @@ python -m tools.post_api \

The above command synthesizes the desired audio from the reference audio information and returns it as a stream.

If you need to randomly select a reference audio by `{SPEAKER}` and `{EMOTION}`, configure it according to the following steps:

### 1. Create a `ref_data` folder in the project root directory.

### 2. Create a directory structure like the following inside the `ref_data` folder.

```
.
├── SPEAKER1
│   ├── EMOTION1
│   │   ├── 21.15-26.44.lab
│   │   ├── 21.15-26.44.wav
│   │   ├── 27.51-29.98.lab
│   │   ├── 27.51-29.98.wav
│   │   ├── 30.1-32.71.lab
│   │   └── 30.1-32.71.flac
│   └── EMOTION2
│       ├── 30.1-32.71.lab
│       └── 30.1-32.71.mp3
└── SPEAKER2
    └── EMOTION3
        ├── 30.1-32.71.lab
        └── 30.1-32.71.mp3
```

That is, first place `{SPEAKER}` folders in `ref_data`, then place `{EMOTION}` folders under each speaker, and place any number of `audio-text pairs` under each emotion folder.

### 3. Enter the following command in the virtual environment

```bash
python tools/gen_ref.py
```

This generates the reference directory.

### 4. Call the API.
```bash
python -m tools.post_api \
    --text "Text to be input" \
    --speaker "Speaker 1" \
    --emotion "Emotion 1" \
    --streaming True
```

The above example is for testing purposes only.

The following example shows that you can use **multiple** `reference audio paths` and `reference audio texts` at once; just separate them with spaces in the command.

```bash
python -m tools.post_api \
    --text "Text to input" \
    --reference_audio "reference audio path 1" "reference audio path 2" \
    --reference_text "reference audio text 1" "reference audio text 2" \
    --streaming False \
    --output "generated" \
    --format "mp3"
```

The above command synthesizes the desired `MP3` format audio based on the information from multiple reference audios and saves it as `generated.mp3` in the current directory.

## GUI Inference
[Download client](https://github.com/AnyaCoder/fish-speech-gui/releases/tag/v0.1.0)

## WebUI Inference

2 changes: 2 additions & 0 deletions fish_speech/train.py
@@ -1,4 +1,6 @@
import os

os.environ["USE_LIBUV"] = "0"
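# Note (assumption): torch >= 2.4 makes libuv the default TCPStore backend for
# distributed init, and Windows builds may lack libuv support; forcing USE_LIBUV=0
# here, before torch is imported, avoids that failure on Windows.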
import sys
from typing import Optional

8 changes: 5 additions & 3 deletions fish_speech/webui/manage.py
@@ -1,9 +1,11 @@
from __future__ import annotations

import os

os.environ["USE_LIBUV"] = "0"
import datetime
import html
import json
import os
import platform
import shutil
import signal
@@ -862,15 +864,15 @@ def llama_quantify(llama_weight, quantify_mode):
minimum=1,
maximum=32,
step=1,
value=4,
value=2,
)
llama_data_max_length_slider = gr.Slider(
label=i18n("Maximum Length per Sample"),
interactive=True,
minimum=1024,
maximum=4096,
step=128,
value=1024,
value=2048,
)
with gr.Row(equal_height=False):
llama_precision_dropdown = gr.Dropdown(