Pretrain HuBERT on English and Chinese speech dataset. #5526

Open
shihuai opened this issue Jul 19, 2024 · 5 comments

Comments

@shihuai

shihuai commented Jul 19, 2024

Hi! I'm trying to pretrain HuBERT from scratch on an English and Chinese speech dataset. During pretraining, the first-iteration loss dropped from 6.7 to 3.3 and the second-iteration loss dropped from 11.2 to 4.0. The loss in both iterations still seems quite large. Is this a normal phenomenon?
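
For context, the two iterations here follow the standard HuBERT recipe: iteration 1 is trained on k-means labels over MFCC features (typically 100 clusters), and iteration 2 on k-means labels over features from an intermediate transformer layer of the iteration-1 model (typically layer 6 with 500 clusters for the base model). A rough label-generation sketch, assuming the script names and argument order from fairseq's examples/hubert/simple_kmeans README (split names, shard counts, and paths are placeholders):

# run from fairseq's examples/hubert/simple_kmeans/
# iteration 1: MFCC features -> k-means (100 clusters) -> frame-level labels
python dump_mfcc_feature.py ${tsv_dir} train ${nshard} ${rank} ${mfcc_dir}
python learn_kmeans.py ${mfcc_dir} train ${nshard} ${km_path} 100 --percent 0.1
python dump_km_label.py ${mfcc_dir} train ${km_path} ${nshard} ${rank} ${lab_dir}

# iteration 2: layer-6 features from the iteration-1 checkpoint -> k-means (500 clusters) -> new labels
python dump_hubert_feature.py ${tsv_dir} train ${ckpt_path} 6 ${nshard} ${rank} ${feat_dir}
python learn_kmeans.py ${feat_dir} train ${nshard} ${km_path} 500 --percent 0.1
python dump_km_label.py ${feat_dir} train ${km_path} ${nshard} ${rank} ${lab_dir}

Note that iteration 2 predicts 500 classes rather than 100, so its raw cross-entropy naturally starts from a higher value than iteration 1's.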

@zw76859420

Can you share your training config?

@shihuai
Author

shihuai commented Jul 24, 2024

> Can you share your training config?

I use hubert_base_librispeech.yaml for pretraining and only changed ddp_backend and max_sample_size:

common:
  fp16: true
  log_format: json
  log_interval: 200
  seed: 1337
  tensorboard_logdir: tblog

checkpoint:
  save_interval_updates: 25000
  keep_interval_updates: 1
  no_epoch_checkpoints: true


distributed_training:
  ddp_backend: c10d
  distributed_backend: 'nccl'
  distributed_world_size: 4
  distributed_port: 29671
  nprocs_per_node: 4
  find_unused_parameters: true

task:
  _name: hubert_pretraining
  data: ${task.data}
  label_dir: ${task.label_dir}
  labels: ${task.labels}
  label_rate: ${model.label_rate}
  sample_rate: 16000
  max_sample_size: 320000 #250000
  min_sample_size: 32000
  pad_audio: false
  random_crop: true
  normalize: false # must be consistent with extractor

dataset:
  num_workers: 6
  max_tokens: 1400000
  skip_invalid_size_inputs_valid_test: true
  validate_interval: 5
  validate_interval_updates: 10000

criterion:
  _name: hubert
  pred_masked_weight: 1.0
  pred_nomask_weight: 0.0
  loss_weights: [10,]

optimization:
  max_update: 400000
  lr: [0.00025]
  clip_norm: 10.0

optimizer:
  _name: adam
  adam_betas: (0.9,0.98)
  adam_eps: 1e-06
  weight_decay: 0.01

lr_scheduler:
  _name: polynomial_decay
  warmup_updates: 32000

model:
  _name: hubert
  label_rate: 100
  skip_masked: false
  skip_nomask: false
  mask_prob: 0.80
  extractor_mode: default
  conv_feature_layers: '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2'
  final_dim: 256
  encoder_layerdrop: 0.05
  dropout_input: 0.1
  dropout_features: 0.1
  dropout: 0.1
  attention_dropout: 0.1
  feature_grad_mult: 0.1
  untie_final_proj: true
  activation_dropout: 0.0

hydra:
  job:
    config:
      override_dirname:
        kv_sep: '-'
        item_sep: '__'
        exclude_keys:
          - run
          - task.data
          - task.label_dir
  run:
    dir: ???
  sweep:
    dir: ???
    subdir: ${hydra.job.config_name}__${hydra.job.override_dirname}
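
For reference, a pretraining run with this config goes through fairseq's Hydra entry point roughly as in the sketch below; the paths are placeholders, task.labels and model.label_rate must match the k-means label files (e.g. .km labels at 100 Hz), and since hydra.run.dir is ??? in the config an output directory has to be passed as well:

python fairseq_cli/hydra_train.py \
  --config-dir /path/to/fairseq/examples/hubert/config/pretrain \
  --config-name hubert_base_librispeech \
  task.data=/path/to/tsv_dir \
  task.label_dir=/path/to/km_labels \
  task.labels='["km"]' \
  model.label_rate=100 \
  hydra.run.dir=/path/to/exp_dir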

@zw76859420

On my side, the HuBERT training loss eventually converges to around 2.5. I used the WenetSpeech dataset for pretraining, which provides 10,000 hours of purely Chinese data.

@zw76859420

We believe the key to training a HuBERT base model is to look at the pretrained model's performance on your main downstream tasks. You can finetune the model pretrained with your recipe and then test its accuracy on your tasks.
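
As a concrete example, a CTC finetuning run based on the stock fairseq HuBERT recipe could look roughly like the sketch below; base_10h is the bundled 10-hour finetuning config, and all paths are placeholders for your labeled data and pretrained checkpoint:

python fairseq_cli/hydra_train.py \
  --config-dir /path/to/fairseq/examples/hubert/config/finetune \
  --config-name base_10h \
  task.data=/path/to/finetune_tsv \
  task.label_dir=/path/to/transcriptions \
  model.w2v_path=/path/to/hubert_base_checkpoint.pt

Decoding and WER/CER evaluation can then follow the inference steps in the fairseq HuBERT README.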

@shihuai
Author

shihuai commented Jul 26, 2024

> We believe the key to training a HuBERT base model is to look at the pretrained model's performance on your main downstream tasks. You can finetune the model pretrained with your recipe and then test its accuracy on your tasks.

OK, thank you for your reply! We have tried training a SpeechTokenizer with features from HuBERT, and the reconstructed speech also sounds good. We will try more experiments on downstream tasks.
