pad_length in streaming decode #1736

Open
1215thebqtic opened this issue Sep 4, 2024 · 0 comments
Comments

@1215thebqtic

Hi,

I'm confused about the pad_length in set_features. Since features shorter than chunk_size * 2 + 7 + 2 * 3 frames are already padded in streaming_decode.py, why do we still add 7 + 2 * 3 when the features are first received here? Is the 7 + 2 * 3 padding necessary in set_features?

def set_features(
    self,
    features: torch.Tensor,
    tail_pad_len: int = 0,
) -> None:
    """Set features tensor of current utterance."""
    assert features.dim() == 2, features.dim()
    self.features = torch.nn.functional.pad(
        features,
        (0, 0, 0, self.pad_length + tail_pad_len),
        mode="constant",
        value=self.LOG_EPS,
    )
    self.num_frames = self.features.size(0)
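
For reference, here is my reading of the arithmetic behind that per-stream padding. This is only a rough sketch based on the comments quoted below; frames_after_embed and the example length t = 100 are illustrative, not code from the repo:

def frames_after_embed(t: int) -> int:
    # encoder_embed subsampling as described in the comments below: (T - 7) // 2
    return (t - 7) // 2

pad_length = 7 + 2 * 3  # the per-stream padding added in set_features

# Without any padding, the last 7 input frames produce no subsampled frame,
# and the final 3 subsampled frames are consumed as ConvNeXt right context.
t = 100  # arbitrary example length
print(frames_after_embed(t) - 3)               # 43 usable subsampled frames
print(frames_after_embed(t + pad_length) - 3)  # 50, i.e. t // 2, so nothing is lost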

# Make sure the length after encoder_embed is at least 1.
# The encoder_embed subsample features (T - 7) // 2
# The ConvNeXt module needs (7 - 1) // 2 = 3 frames of right padding after subsampling
tail_length = chunk_size * 2 + 7 + 2 * 3
if features.size(1) < tail_length:
    pad_length = tail_length - features.size(1)
    feature_lens += pad_length
    features = torch.nn.functional.pad(
        features,
        (0, 0, 0, pad_length),
        mode="constant",
        value=LOG_EPS,
    )

Thanks!
