Support reading from reco2dur in lhotse.kaldi.load_kaldi_data_dir #788

Closed
wgb14 opened this issue Aug 12, 2022 · 3 comments · Fixed by #832 · May be fixed by #693

@wgb14
Contributor

wgb14 commented Aug 12, 2022

Currently, lhotse.kaldi.load_kaldi_data_dir reads the original audio file to get the duration. This is inefficient for long recordings that go through sox or ffmpeg conversion, especially when reco2dur already exists in the Kaldi data directory.

Could you support loading reco2dur and taking the duration information from that file?

@pzelasko
Collaborator

We initially supported reco2dur but unfortunately it was not precise enough for the durations and we were running into issues with mismatched manifest metadata and audio that was loaded from file/command. I see the following options:

  1. Modify Kaldi's reco2dur to carry precise duration information (num_samples / sampling_rate, without truncation after 2 decimal places IIRC). I don't know whether this would break anything else, though.
  2. Since that time, Lhotse has gained support for setting a tolerance threshold for duration mismatch between audio and manifests. We could technically support reading the imprecise reco2dur, and the user could increase the mismatch threshold if necessary. But I think it could be confusing and not the right thing to do in general.
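
To make the trade-off between the two options concrete, here is a minimal sketch in plain Python (not Lhotse or Kaldi code). It assumes the reco2dur value is the exact duration cut off after two decimal places, as recalled above; the helper names `precise_duration`, `truncate_2dp`, and `within_tolerance` are hypothetical and exist only for this illustration.

```python
import math


def precise_duration(num_samples: int, sampling_rate: int) -> float:
    """Exact duration in seconds (option 1)."""
    return num_samples / sampling_rate


def truncate_2dp(duration: float) -> float:
    """Assumed reco2dur-style value: duration cut off after two decimals."""
    return math.floor(duration * 100) / 100


def within_tolerance(manifest_duration: float, audio_duration: float,
                     tolerance: float) -> bool:
    """Hypothetical mismatch check between a manifest and the loaded audio (option 2)."""
    return abs(manifest_duration - audio_duration) <= tolerance


num_samples, sampling_rate = 160_123, 16_000
exact = precise_duration(num_samples, sampling_rate)   # 10.0076875 s
coarse = truncate_2dp(exact)                            # 10.0 s
# The truncated value is off by 123 samples at 16 kHz.
mismatch_samples = round((exact - coarse) * sampling_rate)

print(exact, coarse, mismatch_samples)
print(within_tolerance(coarse, exact, tolerance=0.001))  # tight threshold rejects it
print(within_tolerance(coarse, exact, tolerance=0.01))   # looser threshold accepts it
```

Under this assumption the truncation error stays below 0.01 s, so a tolerance of that size would absorb it, at the cost of also hiding real mismatches of the same magnitude.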

@wgb14
Contributor Author

wgb14 commented Aug 12, 2022

Makes sense. So for option 1 we would still have to recalculate the precise duration. I'll try option 2. I guess it's fairly safe to set the tolerance threshold to 0.01 s, as this is usually what the frame_shift is.

@pzelasko
Collaborator

If you can contribute the relevant option for load_kaldi_data_dir (disabled by default, enabled via an argument/flag) in Lhotse, I'd be happy to merge that PR.
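
A rough sketch of what such an opt-in path might look like, in plain Python only. It assumes reco2dur uses the usual Kaldi layout of one `<recording-id> <duration-in-seconds>` pair per line. The names `read_reco2dur`, `load_durations`, and the `use_reco2dur` flag are hypothetical and are not taken from Lhotse's actual API or from the PR that closed this issue.

```python
from pathlib import Path
from typing import Dict, Union


def read_reco2dur(path: Union[str, Path]) -> Dict[str, float]:
    """Parse a reco2dur file into {recording_id: duration_in_seconds}."""
    durations: Dict[str, float] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            recording_id, duration = line.split(maxsplit=1)
            durations[recording_id] = float(duration)
    return durations


def load_durations(data_dir: Union[str, Path],
                   use_reco2dur: bool = False) -> Dict[str, float]:
    """Return durations from reco2dur only when explicitly requested.

    An empty dict means the caller should fall back to reading the audio,
    which remains the default behaviour in this sketch.
    """
    reco2dur = Path(data_dir) / "reco2dur"
    if use_reco2dur and reco2dur.is_file():
        return read_reco2dur(reco2dur)
    return {}
```

Keeping the flag off by default preserves the precise, audio-derived durations for existing users; anyone who turns it on would likely also need to raise the duration mismatch tolerance.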
