diff --git a/sat/README.md b/sat/README.md
index f2e8dc5..476ebcc 100644
--- a/sat/README.md
+++ b/sat/README.md
@@ -1,4 +1,4 @@
-# SAT CogView3 & CogView-3-Plus
+# SAT CogView3 & CogView3-Plus
 
 [Read this in Chinese](./README_zh.md)
 
@@ -20,43 +20,49 @@ pip install -r requirements.txt
 
 The following links are for different model weights:
 
-### CogView-3-Plus-3B
+### CogView3-Plus-3B
 
 + transformer: https://cloud.tsinghua.edu.cn/d/f913eabd3f3b4e28857c
 + vae: https://cloud.tsinghua.edu.cn/d/af4cc066ce8a4cf2ab79
 
-### CogView-3-Base-3B
+### CogView3-Base-3B
 
 + transformer:
-  + cogview3-base: https://cloud.tsinghua.edu.cn/d/242b66daf4424fa99bf0
-  + cogview3-base-distill-4step: https://cloud.tsinghua.edu.cn/d/d10032a94db647f5aa0e
-  + cogview3-base-distill-8step: https://cloud.tsinghua.edu.cn/d/1598d4fe4ebf4afcb6ae
+  + cogview3-base-3b: https://cloud.tsinghua.edu.cn/d/242b66daf4424fa99bf0
+  + cogview3-base-3b-distill-4step: https://cloud.tsinghua.edu.cn/d/d10032a94db647f5aa0e
+  + cogview3-base-3b-distill-8step: https://cloud.tsinghua.edu.cn/d/1598d4fe4ebf4afcb6ae
 
 **These three versions are interchangeable. Choose the one that suits your needs and run it with the corresponding configuration file.**
 
 + vae: https://cloud.tsinghua.edu.cn/d/c8b9497fc5124d71818a/
 
-### CogView-3-Base-3B-Relay
+### CogView3-Base-3B-Relay
 
 + transformer:
-  + cogview3-relay: https://cloud.tsinghua.edu.cn/d/134951acced949c1a9e1/
-  + cogview3-relay-distill-2step: https://cloud.tsinghua.edu.cn/d/6a902976fcb94ac48402
-  + cogview3-relay-distill-1step: https://cloud.tsinghua.edu.cn/d/4d50ec092c64418f8418/
+  + cogview3-relay-3b: https://cloud.tsinghua.edu.cn/d/134951acced949c1a9e1/
+  + cogview3-relay-3b-distill-2step: https://cloud.tsinghua.edu.cn/d/6a902976fcb94ac48402
+  + cogview3-relay-3b-distill-1step: https://cloud.tsinghua.edu.cn/d/4d50ec092c64418f8418/
 
 **These three versions are interchangeable. Choose the one that suits your needs and run it with the corresponding configuration file.**
 
-+ vae: Same as CogView-3-Base-3B
++ vae: Same as CogView3-Base-3B
 
 Next, arrange the model files into the following format:
 
 ```
-.cogview3-plus-3b
+cogview3-plus-3b
 ├── transformer
 │   ├── 1
 │   │   └── mp_rank_00_model_states.pt
 │   └── latest
 └── vae
     └── imagekl_ch16.pt
+cogview3-base-3b
+├── 1
+│   └── mp_rank_00_model_states.pt
+└── latest
+cogview3-base-3b-vae
+└── sdxl_vae.safetensors
 ```
 
 Clone the T5 model. This model is not used for training or fine-tuning but is necessary. You can download the T5 model separately, but it must be in `safetensors` format, not `bin` format (otherwise an error may occur).
@@ -73,6 +79,7 @@ mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
 
 With this setup, you will have a safetensor format T5 file, ensuring no errors during Deepspeed fine-tuning.
 ```
+t5-v1_1-xxl
 ├── added_tokens.json
 ├── config.json
 ├── model-00001-of-00002.safetensors
@@ -92,8 +99,8 @@ Here is an example using `CogView3-Base`, with explanations for some of the parameters:
 
 ```yaml
 args:
   mode: inference
-  relay_model: False # Set to True when using CogView-3-Relay
-  load: "cogview3_base/transformer" # Path to the transformer folder
+  relay_model: False # Set to True when using CogView3-Relay
+  load: "cogview3-base-3b" # Path to the folder containing "latest"
   batch_size: 8 # Number of images per inference
   grid_num_columns: 2 # Number of columns in grid.png output
   input_type: txt # Input can be from command line or TXT file
@@ -105,9 +112,9 @@ args:
   # sampling_image_size_x: 1024 (width)
   # sampling_image_size_y: 1024 (height)
 
-  output_dir: "outputs/cogview3_base-512x512"
-  # This section is for CogView-3-Relay. Set the input_dir to the folder with base model generated images.
-  # input_dir: "outputs/cogview3_base-512x512"
+  output_dir: "outputs/cogview3_base_512x512"
+  # This section is for CogView3-Relay. Set input_dir to the folder of images generated by the base model.
+  # input_dir: "outputs/cogview3_base_512x512"
   deepspeed_config: { }
 
 model:
@@ -119,13 +126,14 @@ model:
       input_key: txt
       target: sgm.modules.encoders.modules.FrozenT5Embedder
       params:
-        model_dir: "google/t5-v1_1-xxl" # Path to T5 safetensors
+        model_dir: "t5-v1_1-xxl" # Path to T5 safetensors
         max_length: 225 # Maximum prompt length
 
   first_stage_config:
     target: sgm.models.autoencoder.AutoencodingEngine
     params:
-      ckpt_path: "cogview3_base/vae/imagekl_ch16.pt" # Path to VAE PT file
+      ckpt_path: "cogview3-base-3b-vae/sdxl_vae.safetensors" # Path to VAE file
+      # ckpt_path: "cogview3-plus-3b/vae/imagekl_ch16.pt" # Path to CogView3-Plus VAE PT file
     monitor: val/rec_loss
 ```
@@ -170,16 +178,18 @@ python sample_unet.py --base configs/cogview3_relay_distill_1step.yaml
 
 The output image format will be a folder. The folder name will consist of the sequence number and the first 15 characters of the prompt, containing multiple images. The number of images is based on the `batch` parameter. The structure should look like this:
 
 ```
-.
-├── 000000000.png
-├── 000000001.png
-├── 000000002.png
-├── 000000003.png
-├── 000000004.png
-├── 000000005.png
-├── 000000006.png
-├── 000000007.png
-└── grid.png
+outputs
+└── cogview3_base_512x512
+    └── 0_
+        ├── 000000000.png
+        ├── 000000001.png
+        ├── 000000002.png
+        ├── 000000003.png
+        ├── 000000004.png
+        ├── 000000005.png
+        ├── 000000006.png
+        ├── 000000007.png
+        └── grid.png
 
 1 directory, 9 files
 ```
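Review note: the hunks above rename every checkpoint folder, so a missing or mis-unpacked archive is the most likely failure mode after this change. Below is a minimal pre-flight sketch against the exact folder names used in this diff; the cloud-drive archive names themselves may differ, so adjust paths as needed.

```bash
# Probe the paths the updated README and configs expect; print whatever is missing.
for f in cogview3-base-3b/latest \
         cogview3-base-3b/1/mp_rank_00_model_states.pt \
         cogview3-base-3b-vae/sdxl_vae.safetensors \
         cogview3-plus-3b/transformer/latest \
         cogview3-plus-3b/vae/imagekl_ch16.pt \
         t5-v1_1-xxl/model-00001-of-00002.safetensors; do
  [ -e "$f" ] || echo "missing: $f"
done
```

Silence means the layout matches the trees shown in the README.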
diff --git a/sat/README_zh.md b/sat/README_zh.md
index 8433a72..4f7787b 100644
--- a/sat/README_zh.md
+++ b/sat/README_zh.md
@@ -1,4 +1,4 @@
-# SAT CogView3 && CogView-3-Plus
+# SAT CogView3 && CogView3-Plus
 
 本文件夹包含了使用 [SAT](https://github.com/THUDM/SwissArmyTransformer) 权重的推理代码,以及 SAT 权重的微调代码。
 
@@ -18,43 +18,49 @@ pip install -r requirements.txt
 
 以下链接为各个模型权重:
 
-### CogView-3-Plus-3B
+### CogView3-Plus-3B
 
 + transformer: https://cloud.tsinghua.edu.cn/d/f913eabd3f3b4e28857c
 + vae: https://cloud.tsinghua.edu.cn/d/af4cc066ce8a4cf2ab79
 
-### CogView-3-Base-3B
+### CogView3-Base-3B
 
 + transformer:
-  + cogview3-base: https://cloud.tsinghua.edu.cn/d/242b66daf4424fa99bf0
-  + cogview3-base-distill-4step: https://cloud.tsinghua.edu.cn/d/d10032a94db647f5aa0e
-  + cogview3-base-distill-8step: https://cloud.tsinghua.edu.cn/d/1598d4fe4ebf4afcb6ae
+  + cogview3-base-3b: https://cloud.tsinghua.edu.cn/d/242b66daf4424fa99bf0
+  + cogview3-base-3b-distill-4step: https://cloud.tsinghua.edu.cn/d/d10032a94db647f5aa0e
+  + cogview3-base-3b-distill-8step: https://cloud.tsinghua.edu.cn/d/1598d4fe4ebf4afcb6ae
+**以上三个版本为替换关系,选择适合自己的版本和对应的配置文件进行运行**
 
 + vae: https://cloud.tsinghua.edu.cn/d/c8b9497fc5124d71818a/
 
-### CogView-3-Base-3B-Relay
+### CogView3-Base-3B-Relay
 
 + transformer:
-  + cogview3-relay: https://cloud.tsinghua.edu.cn/d/134951acced949c1a9e1/
-  + cogview3-relay-distill-2step: https://cloud.tsinghua.edu.cn/d/6a902976fcb94ac48402
-  + cogview3-relay-distill-1step: https://cloud.tsinghua.edu.cn/d/4d50ec092c64418f8418/
+  + cogview3-relay-3b: https://cloud.tsinghua.edu.cn/d/134951acced949c1a9e1/
+  + cogview3-relay-3b-distill-2step: https://cloud.tsinghua.edu.cn/d/6a902976fcb94ac48402
+  + cogview3-relay-3b-distill-1step: https://cloud.tsinghua.edu.cn/d/4d50ec092c64418f8418/
 
 **以上三个版本为替换关系,选择适合自己的版本和对应的配置文件进行运行**
 
-+ vae: 与 CogView-3-Base-3B 相同
++ vae: 与 CogView3-Base-3B 相同
 
 接着,你需要将模型文件排版成如下格式:
 
 ```
-.cogview3-plus-3b
+cogview3-plus-3b
 ├── transformer
 │   ├── 1
 │   │   └── mp_rank_00_model_states.pt
 │   └── latest
 └── vae
     └── imagekl_ch16.pt
+cogview3-base-3b
+├── 1
+│   └── mp_rank_00_model_states.pt
+└── latest
+cogview3-base-3b-vae
+└── sdxl_vae.safetensors
 ```
 
 克隆 T5 模型,该模型不用做训练和微调,但是必须使用。这里,您可以单独下载T5模型,必须是`safetensors`类型,不能是`bin`
@@ -72,6 +78,7 @@ mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
 
 通过上述方案,你将会得到一个 safetensor 格式的T5文件,确保在 Deepspeed微调过程中读入的时候不会报错。
 ```
+t5-v1_1-xxl
 ├── added_tokens.json
 ├── config.json
 ├── model-00001-of-00002.safetensors
@@ -91,22 +98,22 @@ mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
 
 ```yaml
 args:
   mode: inference
-  relay_model: False # 当模型类型为 CogView-3-Relay 时,需要将该参数设置为 True
-  load: "cogview3_base/transformer" # 这里填写到transformer文件夹
+  relay_model: False # 当模型类型为 CogView3-Relay 时,需要将该参数设置为 True
+  load: "cogview3-base-3b" # 这里填写到含有latest的文件夹
   batch_size: 8 # 每次推理图像数
   grid_num_columns: 2 # 推理结束后,每个提示词文件夹下会有 grid.png 图片,该数字代表列数。
   input_type: txt # 可以选择命令行输入,或者TXT文件输入
   input_file: configs/test.txt # 如果使用命令行,不需要这个参数
-  fp16: True # CogView-3-Plus 模型 需要更换为 bf16 推理
+  fp16: True # CogView3-Plus 模型需要更换为 bf16 推理
   # bf16: True
   sampling_image_size: 512 # 固定大小,支持512 * 512 分辨率图像
   # CogView-3-Plus 模型可以使用以下两个参数。
   # sampling_image_size_x: 1024 宽
   # sampling_image_size_y: 1024 高
 
-  output_dir: "outputs/cogview3_base-512x512"
+  output_dir: "outputs/cogview3_base_512x512"
   #
   # 这个部分是给 CogView-3-Relay 模型使用的,需要将该参数设置为推理模型的输入文件夹,提示词建议与 base 模型生成图片时的提示词的一致。
-  # input_dir: "outputs/cogview3_base-512x512"
+  # input_dir: "outputs/cogview3_base_512x512"
   deepspeed_config: { }
 
 model:
@@ -118,13 +125,14 @@ model:
       input_key: txt
       target: sgm.modules.encoders.modules.FrozenT5Embedder
       params:
-        model_dir: "google/t5-v1_1-xxl" # T5 safetensors的绝对路径
+        model_dir: "t5-v1_1-xxl" # T5 safetensors的绝对路径
         max_length: 225 # 支持输入的提示词的最大长度
 
   first_stage_config:
     target: sgm.models.autoencoder.AutoencodingEngine
     params:
-      ckpt_path: "cogview3_base/vae/imagekl_ch16.pt" # VAE PT文件绝对路径
+      ckpt_path: "cogview3-base-3b-vae/sdxl_vae.safetensors" # VAE文件绝对路径
+      # ckpt_path: "cogview3-plus-3b/vae/imagekl_ch16.pt" # CogView3-Plus VAE PT文件绝对路径
     monitor: val/rec_loss
 ```
@@ -170,18 +178,20 @@ python sample_unet.py --base configs/cogview3_relay_distill_1step.yaml
 
 其结构应该如下:
 
 ```
-.
-├── 000000000.png
-├── 000000001.png
-├── 000000002.png
-├── 000000003.png
-├── 000000004.png
-├── 000000005.png
-├── 000000006.png
-├── 000000007.png
-└── grid.png
+outputs
+└── cogview3_base_512x512
+    └── 0_
+        ├── 000000000.png
+        ├── 000000001.png
+        ├── 000000002.png
+        ├── 000000003.png
+        ├── 000000004.png
+        ├── 000000005.png
+        ├── 000000006.png
+        ├── 000000007.png
+        └── grid.png
 
 1 directory, 9 files
 ```
 
-上述例子中,`batch` 为8。因此,有8张图像并带有一张`grid.png`的图像。
\ No newline at end of file
+上述例子中,`batch` 为8。因此,有8张图像并带有一张`grid.png`的图像。
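Review note: both READMEs now describe the same two-stage flow — the base model writes 512x512 images into its `output_dir`, and the relay model upscales them to 1024x1024 by reading that same folder through `input_dir`. The config changes below keep the two paths in sync. A sketch of the resulting invocation order, using the `sample_unet.py` entry point the README shows:

```bash
# Stage 1: base model, 512x512, writes to outputs/cogview3_base_512x512
python sample_unet.py --base configs/cogview3_base.yaml
# Stage 2: relay model, upsamples to 1024x1024, reads the folder written in stage 1
python sample_unet.py --base configs/cogview3_relay.yaml
```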
diff --git a/sat/configs/cogview3_base.yaml b/sat/configs/cogview3_base.yaml
index f03757d..2928cf0 100644
--- a/sat/configs/cogview3_base.yaml
+++ b/sat/configs/cogview3_base.yaml
@@ -1,15 +1,15 @@
 args:
   mode: inference
   relay_model: False
-  load: "transformer"
+  load: "cogview3-base-3b"
   batch_size: 4
   grid_num_columns: 2
   input_type: txt
-  input_file: "configs/test_old.txt"
+  input_file: "configs/test.txt"
   fp16: True
   force_inference: True
   sampling_image_size: 512
-  output_dir: "outputs/cogview3_base-512x512"
+  output_dir: "outputs/cogview3_base_512x512"
   deepspeed_config: { }
 
 model:
@@ -61,7 +61,7 @@ model:
       input_key: txt
       target: sgm.modules.encoders.modules.FrozenT5Embedder
       params:
-        model_dir: "google/t5-v1_1-xxl"
+        model_dir: "t5-v1_1-xxl"
         max_length: 225
 
       # vector cond
@@ -86,7 +86,7 @@ model:
   first_stage_config:
     target: sgm.models.autoencoder.AutoencoderKLInferenceWrapper
     params:
-      ckpt_path: "vae/sdxl_vae.safetensors"
+      ckpt_path: "cogview3-base-3b-vae/sdxl_vae.safetensors"
       embed_dim: 4
       monitor: val/rec_loss
       ddconfig:
diff --git a/sat/configs/cogview3_base_distill_4step.yaml b/sat/configs/cogview3_base_distill_4step.yaml
index 2f832c9..849ab95 100644
--- a/sat/configs/cogview3_base_distill_4step.yaml
+++ b/sat/configs/cogview3_base_distill_4step.yaml
@@ -1,7 +1,7 @@
 args:
   mode: inference
   relay_model: False
-  load: "transformer"
+  load: "cogview3-base-3b-distill-4step"
   batch_size: 4
   grid_num_columns: 2
   input_type: txt
@@ -9,7 +9,7 @@ args:
   fp16: True
   force_inference: True
   sampling_image_size: 512
-  output_dir: "outputs/cogview3_base_distill-4step"
+  output_dir: "outputs/cogview3_base_distill_4step"
   deepspeed_config: {}
 
 model:
@@ -61,7 +61,7 @@ model:
      input_key: txt
       target: sgm.modules.encoders.modules.FrozenT5Embedder
       params:
-        model_dir: "google/t5-v1_1-xxl"
+        model_dir: "t5-v1_1-xxl"
         max_length: 225
 
      # vector cond
@@ -86,7 +86,7 @@ model:
   first_stage_config:
     target: sgm.models.autoencoder.AutoencoderKLInferenceWrapper
     params:
-      ckpt_path: "vae/sdxl_vae.safetensors"
+      ckpt_path: "cogview3-base-3b-vae/sdxl_vae.safetensors"
       embed_dim: 4
       monitor: val/rec_loss
       ddconfig:
@@ -94,7 +94,7 @@ model:
         double_z: true
         z_channels: 4
         resolution: 256
-        in_channels: 3f
+        in_channels: 3
         out_ch: 3
         ch: 128
         ch_mult: [ 1, 2, 4, 4 ]
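Review note: the `in_channels: 3f` fix above is easy to miss because YAML happily loads `3f` as the string "3f"; the failure only surfaces when the autoencoder is constructed. A quick audit sketch, assuming PyYAML is importable in the environment; the walker makes no assumption about where `ddconfig` nests:

```bash
python - <<'EOF'
# Walk each parsed config and flag any in_channels that is not an int.
import yaml

def walk(node, path=""):
    if isinstance(node, dict):
        for k, v in node.items():
            if k == "in_channels" and not isinstance(v, int):
                print(f"suspicious {path}/{k}: {v!r}")
            walk(v, f"{path}/{k}")
    elif isinstance(node, list):
        for i, v in enumerate(node):
            walk(v, f"{path}[{i}]")

for name in ("configs/cogview3_base.yaml", "configs/cogview3_base_distill_4step.yaml"):
    walk(yaml.safe_load(open(name)), name)
print("checked")
EOF
```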
diff --git a/sat/configs/cogview3_plus.yaml b/sat/configs/cogview3_plus.yaml
index 30b9a30..baca1c0 100644
--- a/sat/configs/cogview3_plus.yaml
+++ b/sat/configs/cogview3_plus.yaml
@@ -1,7 +1,7 @@
 args:
   mode: inference
   relay_model: False
-  load: "transformer"
+  load: "cogview3-plus-3b/transformer"
   batch_size: 4
   grid_num_columns: 2
   input_type: txt
@@ -77,7 +77,7 @@ model:
       input_key: txt
       target: sgm.modules.encoders.modules.FrozenT5Embedder
       params:
-        model_dir: "google/t5-v1_1-xxl"
+        model_dir: "t5-v1_1-xxl"
         max_length: 224
 
       # vector cond
       - is_trainable: False
@@ -101,7 +101,7 @@ model:
   first_stage_config:
     target: sgm.models.autoencoder.AutoencodingEngine
     params:
-      ckpt_path: "vae/imagekl_ch16.pt"
+      ckpt_path: "cogview3-plus-3b/vae/imagekl_ch16.pt"
       monitor: val/rec_loss
 
       loss_config:
diff --git a/sat/configs/cogview3_relay.yaml b/sat/configs/cogview3_relay.yaml
index b9a3056..a2994f7 100644
--- a/sat/configs/cogview3_relay.yaml
+++ b/sat/configs/cogview3_relay.yaml
@@ -1,7 +1,7 @@
 args:
   mode: inference
   relay_model: True
-  load: "transformer"
+  load: "cogview3-relay-3b"
   batch_size: 4
   grid_num_columns: 2
   input_type: txt
@@ -9,8 +9,8 @@ args:
   fp16: True
   force_inference: True
   sampling_image_size: 1024
-  output_dir: "outputs/cogview3_relay-1024x1024"
-  input_dir: "outputs/cogview3_base-512x512"
+  output_dir: "outputs/cogview3_relay_1024x1024"
+  input_dir: "outputs/cogview3_base_512x512"
   deepspeed_config: { }
 
 model:
@@ -63,7 +63,7 @@ model:
       input_key: txt
       target: sgm.modules.encoders.modules.FrozenT5Embedder
       params:
-        model_dir: "google/t5-v1_1-xxl"
+        model_dir: "t5-v1_1-xxl"
         max_length: 225
 
       # vector cond
       - is_trainable: False
@@ -87,7 +87,7 @@ model:
   first_stage_config:
     target: sgm.models.autoencoder.AutoencoderKLInferenceWrapper
     params:
-      ckpt_path: "vae/sdxl_vae.safetensors"
+      ckpt_path: "cogview3-base-3b-vae/sdxl_vae.safetensors"
       embed_dim: 4
       monitor: val/rec_loss
       ddconfig:
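Review note: a relay config is only consistent when it both sets `relay_model: True` and points `input_dir` at a folder of base-model outputs; the next file's diff adds the flag that the 1-step distill config had been missing. A one-line audit across all relay configs:

```bash
# Each file should report relay_model: True plus an input_dir under outputs/.
grep -En 'relay_model|input_dir' configs/cogview3_relay*.yaml
```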
diff --git a/sat/configs/cogview3_relay_distill_1step.yaml b/sat/configs/cogview3_relay_distill_1step.yaml
index 480de44..3793776 100644
--- a/sat/configs/cogview3_relay_distill_1step.yaml
+++ b/sat/configs/cogview3_relay_distill_1step.yaml
@@ -1,6 +1,7 @@
 args:
   mode: inference
-  load: "transformer"
+  relay_model: True
+  load: "cogview3-relay-3b-distill-1step"
   batch_size: 4
   grid_num_columns: 2
   input_type: txt
@@ -9,7 +10,7 @@ args:
   force_inference: True
   sampling_image_size: 1024 # 这个值应该是你输入的图像分辨率的两倍
   output_dir: "outputs/cogview3_relay_distill_1step"
-  input_dir: "inputs" # the inputs image should follow the order of input_file or cli input
+  input_dir: "outputs/cogview3_base_512x512" # the input images should follow the order of input_file or cli input
   deepspeed_config: { }
 
 model:
@@ -63,7 +64,7 @@ model:
      input_key: txt
       target: sgm.modules.encoders.modules.FrozenT5Embedder
       params:
-        model_dir: "google/t5-v1_1-xxl"
+        model_dir: "t5-v1_1-xxl"
         max_length: 225
 
      # vector cond
       - is_trainable: False
@@ -87,7 +88,7 @@ model:
   first_stage_config:
     target: sgm.models.autoencoder.AutoencoderKLInferenceWrapper
     params:
-      ckpt_path: "vae/sdxl_vae.safetensors"
+      ckpt_path: "cogview3-base-3b-vae/sdxl_vae.safetensors"
       embed_dim: 4
       monitor: val/rec_loss
       ddconfig:
diff --git a/sat/configs/test.txt b/sat/configs/test.txt
new file mode 100644
index 0000000..f59cc91
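Review note: `configs/test.txt` (added below) is the default `input_file` for the txt input type. The format appears to be one prompt per line — this file adds exactly one — and for relay models the n-th image in `input_dir` is paired with the n-th prompt, per the `input_dir` comment above. Extending the file is just appending lines; the prompt text here is purely illustrative:

```bash
# test.txt ends without a trailing newline, so lead with one when appending.
printf '\n%s' 'A watercolor fox curled in a snowy pine forest, soft morning light' >> configs/test.txt
```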
--- /dev/null
+++ b/sat/configs/test.txt
@@ -0,0 +1 @@
+Model portrait with pink hair, her long hair is soft and flowy. At dawn, she was surrounded by delicate flowers in the misty countryside. The style should be ethereal and dreamy, with soft and bright sunlight shining on her face to create a soft atmosphere. Her face has a classical beauty, her eyes are large, deep and bright, and her facial expressions reflect calmness, mystery and elegance, adding to the overall surrealist atmosphere. (Good proportions, cinematic angle:1.3)
\ No newline at end of file
diff --git a/sat/requirements.txt b/sat/requirements.txt
index 43a2b98..1930e61 100644
--- a/sat/requirements.txt
+++ b/sat/requirements.txt
@@ -16,4 +16,5 @@ scipy>=1.14.1
 SwissArmyTransformer>=0.4.12
 tqdm>=4.66.5
 wandb>=0.18.1
-openai>=1.48.0
\ No newline at end of file
+openai>=1.48.0
+triton==2.1.0
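Review note: `triton` is newly pinned to an exact version rather than a floor, presumably (this rationale is an assumption, not stated in the diff) because triton kernels are sensitive to the installed PyTorch build. The pin is easy to confirm after installation:

```bash
pip install -r requirements.txt
python -c "import triton; print(triton.__version__)"   # expect 2.1.0
```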