Nan outputs from encoder #1731

Open
Manjunath-mlp opened this issue Aug 22, 2024 · 12 comments
Comments

@Manjunath-mlp

I am getting NaN outputs from the encoder of the pruned transducer streaming model:
tensor([[[nan, nan, nan, ..., nan, nan, nan],
         [nan, nan, nan, ..., nan, nan, nan],
         [nan, nan, nan, ..., nan, nan, nan],
         ...,
         [nan, nan, nan, ..., nan, nan, nan],
         [nan, nan, nan, ..., nan, nan, nan],
         [nan, nan, nan, ..., nan, nan, nan]]], grad_fn=)
I am running on a Mac CPU. Any suggestions?

@csukuangfj
Collaborator

There should be some logs telling you how to deal with it. Have you followed them?

@Manjunath-mlp
Author

I am using a pretrained model to decode. I am not sure which logs you are talking about.

@csukuangfj
Collaborator

Would you mind posting all of the logs?

The information you have given is too limited.

@Manjunath-mlp
Author

Manjunath-mlp commented Aug 23, 2024

These are the args I used:

{'best_train_loss': float("inf"), 'best_valid_loss': float("inf"), 
'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50,
 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4,
 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release',
 'k2-with-cuda': False, 'k2-git-sha1': '5735fa707f6091856d13ccd230aced6e9e64f815', 
'k2-git-date': 'Thu Jul 25 09:16:03 2024', 'lhotse-version': '1.28.0.dev+git.4ca97dc.clean', 
'torch-version': '2.3.0', 'torch-cuda-available': False, 'torch-cuda-version': None, 
'python-version': '3.10', 'icefall-git-branch': 'master', 'icefall-git-sha1': '59529722-dirty',
 'icefall-git-date': 'Sat Aug 17 10:54:38 2024', 'icefall-path': '/Users/Manjunath/Downloads/sourcek2/icefall',
 'k2-path': '/Users/Manjunath/miniconda3/envs/k2source/lib/python3.10/site-packages/k2-1.24.4.dev20240823+cpu.torch2.3.0-py3.10-macosx-11.1-arm64.egg/k2/__init__.py',
 'lhotse-path': '/Users/Manjunath/miniconda3/envs/k2source/lib/python3.10/site-packages/lhotse/__init__.py', 
'hostname': '', 'IP address': ''}, 'epoch': 30, 'iter': 0, 'avg': 1, 
'use_averaged_model': False,
 'exp_dir': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp', 
'bpe_model': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model', 
'lang_dir': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500', 
'decoding_method': 'fast_beam_search', 'beam_size': 4,
 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 
'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'use_shallow_fusion': False, 
'lm_type': 'rnn', 'lm_scale': 0.3, 'tokens_ngram': 2, 'backoff_id': 500, 
'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 
'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 
'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 
'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 
'joiner_dim': 512, 'short_chunk_size': 50, 'num_left_chunks': 4, 
'decode_chunk_len': 32, 'full_libri': True, 'mini_libri': False, 
'manifest_dir': '../data/fbank', 'max_duration': 600, 'bucketing_sampler': True, 
'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0,
 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True,
 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True,
 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 
'input_strategy': 'PrecomputedFeatures', 'lm_vocab_size': 500, 'lm_epoch': 7, 
'lm_avg': 1, 'lm_exp_dir': None, 'rnn_lm_embedding_dim': 2048, 'rnn_lm_hidden_dim': 2048,
 'rnn_lm_num_layers': 3, 'rnn_lm_tie_weights': True, 'transformer_lm_exp_dir': None, 
'transformer_lm_dim_feedforward': 2048, 'transformer_lm_encoder_dim': 768,
 'transformer_lm_embedding_dim': 768, 'transformer_lm_nhead': 8,
 'transformer_lm_num_layers': 16, 'transformer_lm_tie_weights': True, 'res_dir': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/fast_beam_search', 
'suffix': 'epoch-30-avg-1-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64', 'blank_id': 0, 'unk_id': 2, 'vocab_size': 500}

@csukuangfj
Collaborator

Also, would you mind sharing the command you are using?
And could you tell us what steps you have done?

More details are always helpful.

@Manjunath-mlp
Author

Manjunath-mlp commented Aug 23, 2024

Sorry, I pressed Enter before pasting everything. Here are the code blocks I am using.

These are the args I used (the same dictionary as in my previous comment).

# Initialized the model using the above args
model = get_transducer_model(args)

And I used the LibriSpeech cuts dataset:

args1=Namespace(epoch=30, avg=1, use_averaged_model=True, exp_dir='../../../../../icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/', lang_dir='../../../../../icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/', decoding_method='fast_beam_search', iter=0, context_size=2, max_sym_per_frame=1, return_cuts=True, on_the_fly_feats=False, input_strategy='PrecomputedFeatures', max_duration=10, num_workers=2)

librispeech = LibriSpeechAsrDataModule(args1)
test_clean_cuts = librispeech.test_clean_cuts_soft()
test_other_cuts = librispeech.dev_other_cuts_soft()

test_clean_dl = librispeech.test_dataloaders(test_clean_cuts)
test_other_dl = librispeech.test_dataloaders(test_other_cuts)

test_sets = ["test-clean", "test-other"]
test_dl = [test_clean_dl, test_other_dl]

# Use the first batch to decode
for i, j in enumerate(test_clean_dl):
    print(i, j)
    break

# i, j are:
0 {'inputs': tensor([[[-1.4938e+01, -1.3318e+01, -1.3666e+01, ..., -9.2335e+00,
-9.9011e+00, -1.0107e+01],
[-1.4294e+01, -1.2946e+01, -1.2869e+01, ..., -9.3337e+00,
-1.0197e+01, -1.0312e+01],
[-1.5064e+01, -1.5173e+01, -1.5958e+01, ..., -9.8856e+00,
-9.9233e+00, -1.0065e+01],
...,
[-1.2570e+01, -1.2061e+01, -1.3426e+01, ..., 5.1158e+37,
9.4185e+37, 1.7210e+38],
[-1.4552e+01, -1.3632e+01, -1.3024e+01, ..., 8.7674e+37,
1.6231e+38, 2.9824e+38],
[-1.5214e+01, -1.5527e+01, -1.3573e+01, ..., 1.4966e+38,
2.7860e+38, inf]]]), 'supervisions': {'text': ['BY DEGREES ALL HIS HAPPINESS ALL HIS BRILLIANCY SUBSIDED INTO REGRET AND UNEASINESS SO THAT HIS LIMBS LOST THEIR POWER HIS ARMS HUNG HEAVILY BY HIS SIDES AND HIS HEAD DROOPED AS THOUGH HE WAS STUPEFIED'], 'sequence_idx': tensor([0], dtype=torch.int32), 'start_frame': tensor([0], dtype=torch.int32), 'num_frames': tensor([1608], dtype=torch.int32), 'cut': [MonoCut(id='7127-75946-0028-495', start=0, duration=16.075, channel=0, supervisions=[SupervisionSegment(id='7127-75946-0028', recording_id='7127-75946-0028', start=0.0, duration=16.075, channel=0, text='BY DEGREES ALL HIS HAPPINESS ALL HIS BRILLIANCY SUBSIDED INTO REGRET AND UNEASINESS SO THAT HIS LIMBS LOST THEIR POWER HIS ARMS HUNG HEAVILY BY HIS SIDES AND HIS HEAD DROOPED AS THOUGH HE WAS STUPEFIED', language='English', speaker='7127', gender=None, custom=None, alignment=None)], features=Features(type='kaldi-fbank', num_frames=1608, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0, duration=16.075, storage_type='lilcom_chunky', storage_path='../data/fbank/librispeech_feats_test-clean/feats-0.lca', storage_key='2337650,45819,45198,44901,10324', recording_id='None', channels=0), recording=Recording(id='7127-75946-0028', sources=[AudioSource(type='file', channels=[0], source='/grid/codes/icefall/egs/librispeech/ASR/download/LibriSpeech/test-clean/7127/75946/7127-75946-0028.flac')], sampling_rate=16000, num_samples=257200, duration=16.075, channel_ids=[0], transforms=None), custom={'dataloading_info': {'rank': 0, 'world_size': 1, 'worker_id': None}})]}}

feature = j["inputs"]
supervisions = j["supervisions"]
texts = j["supervisions"]["text"]
feature_lens = supervisions["num_frames"]
feature_lens += 30  # account for the 30 frames of padding added below

import torch
import math
LOG_EPS = math.log(1e-10)

# Pad 30 extra frames, filled with LOG_EPS, at the end of the time axis
feature = torch.nn.functional.pad(
    feature,
    pad=(0, 0, 0, 30),
    value=LOG_EPS,
)
encoder_out, encoder_out_lens = model.encoder(x=feature, x_lens=feature_lens)

Here, encoder_out comes back as all NaNs.
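
A possible starting point, judging only from the batch printed above: the last frames of `inputs` hold values around 1e+38 and one `inf`, so the features may already be non-finite before they reach the encoder. The sketch below is plain Python so that it runs without the model or the data. The helper name `find_nonfinite_frames` and the four-frame excerpt are made up for illustration and are not part of icefall or lhotse.

```python
import math


def find_nonfinite_frames(frames):
    """Return the indices of frames (rows) that contain inf or NaN."""
    return [
        t
        for t, frame in enumerate(frames)
        if not all(math.isfinite(v) for v in frame)
    ]


# Hypothetical 4-frame, 3-bin excerpt modelled on the batch printed above.
frames = [
    [-14.938, -13.318, -10.107],
    [-14.294, -12.946, -10.312],
    [-14.552, -13.632, 2.9824e38],      # huge but still finite
    [-15.214, -15.527, float("inf")],   # non-finite
]

bad = find_nonfinite_frames(frames)
print(bad)  # [3]
```

On the real tensor, `torch.isfinite(feature).all()` gives the same yes/no answer in one call. If it returns `False`, the problem is more likely in the stored features than in the model.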

@csukuangfj
Collaborator

Could you share the complete file?

You can upload your code file as an attachment in the comment.

@Manjunath-mlp
Author

Will this work?
fast_beam_search.txt

@csukuangfj
Collaborator

Could you post a runnable PYTHON CODE FILE?

We need to know which script you are using.

@csukuangfj
Collaborator

csukuangfj commented Aug 23, 2024

By the way, I suggest that you follow the doc
https://k2-fsa.github.io/icefall/model-export/export-model-state-dict.html
to learn how to use pre-trained models.

@Manjunath-mlp
Author

Manjunath-mlp commented Aug 23, 2024

That is the ipynb file I am using to run this; I am unable to attach a .py or .ipynb file. I am trying to implement this https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless2/beam_search.py#L444 for the stateless7 streaming model, and I am trying to see the outputs at each timestep.

@Manjunath-mlp
Author

Manjunath-mlp commented Aug 23, 2024

I think I have loaded the state dict of the pretrained model in pretty much the same way you have implemented it. For model.decoder, I can see that the model is producing numbers. I don't know why the encoder is producing NaN.
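
One way a single bad input value could turn a whole output into NaN, even with correctly loaded weights, is through any step that mixes values together, such as subtracting a mean. The sketch below shows that effect with a generic mean-subtraction step in plain Python. It is not the Zipformer code, and `mean_center` is a hypothetical helper used only for illustration.

```python
import math


def mean_center(values):
    """Subtract the mean from every element (a generic normalisation step)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Finite inputs stay finite.
clean = mean_center([-15.2, -15.5, -13.6])
print(all(math.isfinite(v) for v in clean))  # True

# One inf makes the mean inf: finite - inf = -inf, and inf - inf = nan.
dirty = mean_center([-15.2, -15.5, float("inf")])
print(dirty)  # [-inf, -inf, nan]

# A second pass spreads the NaN to every element.
dirty_twice = mean_center(dirty)
print(dirty_twice)  # [nan, nan, nan]
```

If the inputs turn out to be finite, the next thing to compare would be the model arguments and checkpoint loading against the export documentation linked earlier in the thread.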
