Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audio: Add audio feature extractor component MFCC #5964

Merged
merged 5 commits into from
Oct 5, 2022

Conversation

singalsu
Copy link
Collaborator

No description provided.

@singalsu
Copy link
Collaborator Author

singalsu commented Jun 29, 2022

The work so far runs in testbench. It reads a wav file and outputs to raw binary file the MFCC data. A Matlab/Octave script is provided to parse the output and extract the MFCC payload from audio capture file. E.g.

cd $SOF_WORKSPACE/sof
scripts/build-tools.sh -t
scripts/rebuild-testbench.sh 
cd tools/tune/mfcc/
./run_mfcc.sh /usr/share/sounds/alsa/Front_Center.wav 
octave --gui &
decode_ceps('mfcc.raw',13);

The above commands create this plot:

Screenshot from 2022-06-29 19-20-26

The MFCC operation followed configuration that was defined in mfcc_setup.m. It was used to output the configuration blob. Editing it and redoing above step with test topologies would change the audio features plot appearance.

@@ -483,6 +483,23 @@ config COMP_RTNR
proprietary binary libSOF_RTK_MA_API.a, libSuite_rename.a, libNet.a and libPreset.a.
Please contact [email protected] for any question about the binary.

config COMP_MFCC
bool "MFCC component"
default y
Copy link
Collaborator Author

@singalsu singalsu Jun 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for now default yes to get the full CI scan. Since it's quite big default no would be assumed later.

src/audio/mfcc/mfcc_mat.c Outdated Show resolved Hide resolved
src/audio/mfcc/mfcc_psy.c Outdated Show resolved Hide resolved
src/audio/mfcc/mfcc_win.c Outdated Show resolved Hide resolved
src/audio/mfcc/mfcc.c Outdated Show resolved Hide resolved
@singalsu
Copy link
Collaborator Author

singalsu commented Jul 1, 2022

The just pushed version ran successfully in my TGL-H test device. Used topology was sof-hda-generic-2ch-mfcc.tplg. Average load was quite decent 24 MCPS for 16 kHz mono MFCC computation, FFT size 512, FFT hop 10 ms, hamming window, 23 Mel bands, 13 cepstral coefficients out. The output stream is not ALSA compress, but a fake PCM stream with magic sync word followed by 16 bit data when ceps were inserted. Otherwise zeros to maintain in sink same PCM format as in source.

@singalsu
Copy link
Collaborator Author

This version separated matrix, window and Mel frequency functions into separate generic library functions.

@singalsu
Copy link
Collaborator Author

This version changes FFT to a new 16 bit version. It saves a lot of RAM with minimal impact to quality with 16 bit data. Since there's a lot to review I will split the FFT change to other PR. I'm now happy with FFT quality so FFT should be ready after that PR for xtensa SIMD optimization patches.

@singalsu
Copy link
Collaborator Author

The just pushed version fixed a build issue with functions comp_update_buffer_consume/produce() those have changed names. No other changes. Also more recent versions of FFT library and window functions library are now in their own PRs.

src/math/auditory.c Outdated Show resolved Hide resolved
src/math/window.c Outdated Show resolved Hide resolved
src/math/window.c Outdated Show resolved Hide resolved
#define DEBUGFILES
#endif

#ifdef MFCC_DEBUGFILES
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be done as a cmocka test?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will move libraries tests to cmocka after they have been split out of this main PR. The MFCC overall test will be testbench based. Like with TDFB I will attempt to extract needed internal state information from traces instead of these files. So, these will go away.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking to add cmocka unit test with other PR since this is quite large. It could be part of #5769 that creates the reference data with Pytorch to compare. Would that be OK @lgirdwood ?

src/audio/mfcc/mfcc_generic.c Outdated Show resolved Hide resolved
src/audio/mfcc/mfcc_init.c Outdated Show resolved Hide resolved
src/audio/mfcc/mfcc_init.c Outdated Show resolved Hide resolved
src/audio/mfcc/mfcc_init.c Outdated Show resolved Hide resolved
src/audio/mfcc/mfcc_init.c Outdated Show resolved Hide resolved
@lgirdwood
Copy link
Member

@ShriramShastry would you be able to review. Thanks !

@ShriramShastry
Copy link
Contributor

@ShriramShastry would you be able to review. Thanks !

Sure, I'II review the PR.
Thank you

@lgirdwood
Copy link
Member

@singalsu conflicts

src/audio/mfcc/mfcc.c Outdated Show resolved Hide resolved
Copy link
Contributor

@btian1 btian1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, Seppo

For such a big feature, if people want to know the background, do we have a design document to explain the details? if not, I would suggest add a readme to describe the whole feature, then reviewer can have better understanding on this, do you think so?
like design? filter type, filter stage, Q format, etc

Thanks
Tim

@singalsu
Copy link
Collaborator Author

singalsu commented Sep 1, 2022

Hi, Seppo

For such a big feature, if people want to know the background, do we have a design document to explain the details? if not, I would suggest add a readme to describe the whole feature, then reviewer can have better understanding on this, do you think so? like design? filter type, filter stage, Q format, etc

Thanks Tim

The reference code for this component is in #5769. The work target is a low-power SOF component that is setup parameters compatible Pytorch library Kaldi MFCC and librosa MFCC. The Matlab concept achieves with a limited set of parameters fair match with Pytorch. Librosa compatibilility need to be improved. The output stream needs to be changed to ALSA compress type. It's currently only a fake PCM stream. I hope I can make a user space demo small scale ASR with those libraries that demonstrates the FW MFCC. I will keep adding more Pytorch and Librosa like options to improve compatibility with parameters variation. See tools/tune/mfcc how to set up it via binary blob.

@singalsu
Copy link
Collaborator Author

singalsu commented Sep 1, 2022

The just pushed version is without libraries and with minor updates. This should build OK when #6178 is merged. I will next address the review feedback for this component.

@btian1
Copy link
Contributor

btian1 commented Sep 2, 2022

Thanks, Seppo, that's will be helpful, I roughly went through MFCC design(outside), it is complexed, since our design is low power, there should have some tradeoffs. Do we have local C environment test for ASR with MFCC? with matlab code, seems only few people know this.

If more people want to know current MFCC framework, a diagram for the whole MFCC flow is helpful, especially compared with full MFCC flow, then more people will know our design's benefit, do you think so?

Thanks
Tim

@singalsu
Copy link
Collaborator Author

singalsu commented Sep 2, 2022

@btian1 I would like to make a demo ASR (limited to e.g. numbers recognize) with e.g. python libraries, run it in e.g. UP extreme or UP2 user space with MFCC data from DSP. The MFCC flow is simple but there's complexity in details. Matlab offline processed ASR would be even quicker path but it I'd prefer a demo that could run on our test DUTs.

For now the best documents are about librosa and Pytorch: https://pytorch.org/audio/0.11.0/tutorials/audio_feature_extractions_tutorial.html

src/audio/mfcc/mfcc_init.c Outdated Show resolved Hide resolved
@singalsu
Copy link
Collaborator Author

singalsu commented Sep 20, 2022

@singalsu is the small accuracy delta a result of using 32bit numbers for FW ?

The largest difference contribution is from 16 bit FFT and 16 bit Mel band triangles. I need to retest with with 32 bit FFT version. It's a kconfig option, so easy to switch.

I'd like to understand how much the difference in fixed point MFCC impacts speech recognition word error rate. I should be able to do such test too with not too much work.

The RMS error in chirp test drops from 4.708 to 1.771, so the 32 bit FFT improves quite a bit. Below is the error vs. Pytorch for Matlab float, 32 bit FFT version version of component, 16 bit FFT version. All use 16 bit PCM data as input.

image

@singalsu singalsu force-pushed the mfcc_component branch 2 times, most recently from 52b8e44 to e333199 Compare September 20, 2022 14:55
@singalsu
Copy link
Collaborator Author

singalsu commented Sep 20, 2022

To fix this I tried to change to src/audio/Kconfig select COMP_MODULE_ADAPTER instead of depends.

[ 96%] Linking C executable sof
/home/sof/work/xtensa-imx-elf/lib/gcc/xtensa-imx-elf/10.2.0/../../../../xtensa-imx-elf/bin/ld: CMakeFiles/sof.dir/src/audio/mfcc/mfcc.c.o: in function `sys_comp_module_mfcc_interface_init':
/home/sof/work/sof.git/src/audio/mfcc/mfcc.c:282: undefined reference to `module_adapter_new'

Edit: Seems to not help, change back to depends that other module_adapter clients use.

src/audio/mfcc/mfcc.c Outdated Show resolved Hide resolved
src/audio/mfcc/mfcc.c Show resolved Hide resolved
src/audio/mfcc/mfcc.c Show resolved Hide resolved

for (j = 0; j < fft->fft_size; j++) {
x = fft->fft_buf[i + j].real;
absx = (x < 0) ? -x : x;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use ABS() from src/include/sof/math/numbers.h ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hot path, the macro is more than above line, is there overhead from typeof() typecast operation there? Also I wonder if Zephyr comes with a different ABS()?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the potential overhead from using ABS() would be not from using typeof() but from allocating a new variable on stack. ABS() does the right thing with that to avoid repeated expression evaluation. E.g. it's safe to do ABS(f(x)) whereas if ABS() were just a trivial #define ABS(x) ((x) < 0 ? -(x) : (x)) that would lead to evaluating f(x) twice which can have undesirable side effects. Whereas the SOF ABS(x) implementation avoids that. In the above case it isn't needed since x is a simple variable, but it's important for a generic macro. I would guess that the compiler would optimise that additional variable on stack out, but if you're concerned about that - I'd understand that too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah for that, makes sense. In the beginning we made for most things macros but I've been wondering if inline functions would be better to avoid unknowns. Inlines would be also easier for seamless intrinsic replace. I'd like to keep this as current. Also mfcc_generic.c would be later converted with intrinsics.

@singalsu
Copy link
Collaborator Author

singalsu commented Sep 27, 2022

Changed vs. previous push, in Kconfig

        depends on COMP_MODULE_ADAPTER
        depends on !COMP_LEGACY_INTERFACE

that seems to avoid the imx build issue in CI.

@singalsu singalsu marked this pull request as ready for review September 27, 2022 17:29
* Configuration blob
*/
struct sof_mfcc_config {
int32_t sample_frequency; /**< Hz. e.g. 16000 */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a size as first word for ABI tracking.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a good catch! I added also some reserved words to begin. The three reserved bool in the end are to make the blob multiple of 32 bits size. The sof_setup() function now checks for size match also.

This patch adds basic macros needed for MFCC in testbench and in
developmemnt topologies for hda-generic-2ch and up2. The
configuration blob in this matches the reference Matlab code
as configured to match Pytorch default MFCC.

Signed-off-by: Seppo Ingalsuo <[email protected]>
This patch adds load of MFCC component to testbench.

Signed-off-by: Seppo Ingalsuo <[email protected]>
This patch adds initial version of MFCC setup tool setup_mfcc.m. It
outputs a configuration topology macro file that matches the current
Matlab concept code. The configuration can be tested in testbench
with the supplied scripts run_mfcc.sh and decode_ceps.m.

Signed-off-by: Seppo Ingalsuo <[email protected]>
The increase of non-32bit aligned blob sizes needs to be removed
because it can cause mismatch of blob binary header vs. actual
size. Instead error if blob size is not multiple of four bytes.

Signed-off-by: Seppo Ingalsuo <[email protected]>
This patch adds the SOF component for Mel Frequency Cepstral
Coefficients (MFCC) streaming to sink from source PCM stream.
The MFCC audio features are commonly used for neural network
based speech recognition services.

Signed-off-by: Seppo Ingalsuo <[email protected]>
@lgirdwood
Copy link
Member

@wszypelt @lrudyX the failed logs are timing out. Can you check. Thanks !

@wszypelt
Copy link

wszypelt commented Oct 3, 2022

@lgirdwood a lot of tests are running today, but I have already added everything to the queue, so within 2-3 hours it should be ready

@lgirdwood lgirdwood merged commit 025c64e into thesofproject:main Oct 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants