-
Notifications
You must be signed in to change notification settings - Fork 309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Audio: Add audio feature extractor component MFCC #5964
Conversation
The work so far runs in testbench. It reads a wav file and outputs to raw binary file the MFCC data. A Matlab/Octave script is provided to parse the output and extract the MFCC payload from audio capture file. E.g.
The above commands create this plot: The MFCC operation followed configuration that was defined in mfcc_setup.m. It was used to output the configuration blob. Editing it and redoing above step with test topologies would change the audio features plot appearance. |
src/audio/Kconfig
Outdated
@@ -483,6 +483,23 @@ config COMP_RTNR | |||
proprietary binary libSOF_RTK_MA_API.a, libSuite_rename.a, libNet.a and libPreset.a. | |||
Please contact [email protected] for any question about the binary. | |||
|
|||
config COMP_MFCC | |||
bool "MFCC component" | |||
default y |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for now default yes to get the full CI scan. Since it's quite big default no would be assumed later.
65828ef
to
9708e05
Compare
The just pushed version ran successfully in my TGL-H test device. Used topology was sof-hda-generic-2ch-mfcc.tplg. Average load was quite decent 24 MCPS for 16 kHz mono MFCC computation, FFT size 512, FFT hop 10 ms, hamming window, 23 Mel bands, 13 cepstral coefficients out. The output stream is not ALSA compress, but a fake PCM stream with magic sync word followed by 16 bit data when ceps were inserted. Otherwise zeros to maintain in sink same PCM format as in source. |
This version separated matrix, window and Mel frequency functions into separate generic library functions. |
This version changes FFT to a new 16 bit version. It saves a lot of RAM with minimal impact to quality with 16 bit data. Since there's a lot to review I will split the FFT change to other PR. I'm now happy with FFT quality so FFT should be ready after that PR for xtensa SIMD optimization patches. |
b1ef453
to
cb18fb3
Compare
The just pushed version fixed a build issue with functions comp_update_buffer_consume/produce() those have changed names. No other changes. Also more recent versions of FFT library and window functions library are now in their own PRs. |
#define DEBUGFILES | ||
#endif | ||
|
||
#ifdef MFCC_DEBUGFILES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be done as a cmocka test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will move libraries tests to cmocka after they have been split out of this main PR. The MFCC overall test will be testbench based. Like with TDFB I will attempt to extract needed internal state information from traces instead of these files. So, these will go away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking to add cmocka unit test with other PR since this is quite large. It could be part of #5769 that creates the reference data with Pytorch to compare. Would that be OK @lgirdwood ?
@ShriramShastry would you be able to review. Thanks ! |
Sure, I'II review the PR. |
@singalsu conflicts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, Seppo
For such a big feature, if people want to know the background, do we have a design document to explain the details? if not, I would suggest add a readme to describe the whole feature, then reviewer can have better understanding on this, do you think so?
like design? filter type, filter stage, Q format, etc
Thanks
Tim
cb18fb3
to
ff608a4
Compare
The reference code for this component is in #5769. The work target is a low-power SOF component that is setup parameters compatible Pytorch library Kaldi MFCC and librosa MFCC. The Matlab concept achieves with a limited set of parameters fair match with Pytorch. Librosa compatibilility need to be improved. The output stream needs to be changed to ALSA compress type. It's currently only a fake PCM stream. I hope I can make a user space demo small scale ASR with those libraries that demonstrates the FW MFCC. I will keep adding more Pytorch and Librosa like options to improve compatibility with parameters variation. See tools/tune/mfcc how to set up it via binary blob. |
The just pushed version is without libraries and with minor updates. This should build OK when #6178 is merged. I will next address the review feedback for this component. |
Thanks, Seppo, that's will be helpful, I roughly went through MFCC design(outside), it is complexed, since our design is low power, there should have some tradeoffs. Do we have local C environment test for ASR with MFCC? with matlab code, seems only few people know this. If more people want to know current MFCC framework, a diagram for the whole MFCC flow is helpful, especially compared with full MFCC flow, then more people will know our design's benefit, do you think so? Thanks |
@btian1 I would like to make a demo ASR (limited to e.g. numbers recognize) with e.g. python libraries, run it in e.g. UP extreme or UP2 user space with MFCC data from DSP. The MFCC flow is simple but there's complexity in details. Matlab offline processed ASR would be even quicker path but it I'd prefer a demo that could run on our test DUTs. For now the best documents are about librosa and Pytorch: https://pytorch.org/audio/0.11.0/tutorials/audio_feature_extractions_tutorial.html |
The RMS error in chirp test drops from 4.708 to 1.771, so the 32 bit FFT improves quite a bit. Below is the error vs. Pytorch for Matlab float, 32 bit FFT version version of component, 16 bit FFT version. All use 16 bit PCM data as input. |
52b8e44
to
e333199
Compare
To fix this I tried to change to src/audio/Kconfig
Edit: Seems to not help, change back to depends that other module_adapter clients use. |
|
||
for (j = 0; j < fft->fft_size; j++) { | ||
x = fft->fft_buf[i + j].real; | ||
absx = (x < 0) ? -x : x; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use ABS()
from src/include/sof/math/numbers.h ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is hot path, the macro is more than above line, is there overhead from typeof() typecast operation there? Also I wonder if Zephyr comes with a different ABS()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the potential overhead from using ABS()
would be not from using typeof()
but from allocating a new variable on stack. ABS()
does the right thing with that to avoid repeated expression evaluation. E.g. it's safe to do ABS(f(x))
whereas if ABS()
were just a trivial #define ABS(x) ((x) < 0 ? -(x) : (x))
that would lead to evaluating f(x)
twice which can have undesirable side effects. Whereas the SOF ABS(x)
implementation avoids that. In the above case it isn't needed since x
is a simple variable, but it's important for a generic macro. I would guess that the compiler would optimise that additional variable on stack out, but if you're concerned about that - I'd understand that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah for that, makes sense. In the beginning we made for most things macros but I've been wondering if inline functions would be better to avoid unknowns. Inlines would be also easier for seamless intrinsic replace. I'd like to keep this as current. Also mfcc_generic.c would be later converted with intrinsics.
e333199
to
dedf93e
Compare
dedf93e
to
adc0e1d
Compare
Changed vs. previous push, in Kconfig
that seems to avoid the imx build issue in CI. |
* Configuration blob | ||
*/ | ||
struct sof_mfcc_config { | ||
int32_t sample_frequency; /**< Hz. e.g. 16000 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a size as first word for ABI tracking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was a good catch! I added also some reserved words to begin. The three reserved bool in the end are to make the blob multiple of 32 bits size. The sof_setup() function now checks for size match also.
This patch adds basic macros needed for MFCC in testbench and in developmemnt topologies for hda-generic-2ch and up2. The configuration blob in this matches the reference Matlab code as configured to match Pytorch default MFCC. Signed-off-by: Seppo Ingalsuo <[email protected]>
This patch adds load of MFCC component to testbench. Signed-off-by: Seppo Ingalsuo <[email protected]>
This patch adds initial version of MFCC setup tool setup_mfcc.m. It outputs a configuration topology macro file that matches the current Matlab concept code. The configuration can be tested in testbench with the supplied scripts run_mfcc.sh and decode_ceps.m. Signed-off-by: Seppo Ingalsuo <[email protected]>
The increase of non-32bit aligned blob sizes needs to be removed because it can cause mismatch of blob binary header vs. actual size. Instead error if blob size is not multiple of four bytes. Signed-off-by: Seppo Ingalsuo <[email protected]>
This patch adds the SOF component for Mel Frequency Cepstral Coefficients (MFCC) streaming to sink from source PCM stream. The MFCC audio features are commonly used for neural network based speech recognition services. Signed-off-by: Seppo Ingalsuo <[email protected]>
adc0e1d
to
099fb4c
Compare
@lgirdwood a lot of tests are running today, but I have already added everything to the queue, so within 2-3 hours it should be ready |
No description provided.