Audio: Add audio feature extractor component MFCC #5964

singalsu · 2022-06-29T16:13:33Z

No description provided.

singalsu · 2022-06-29T16:23:00Z

So far the work runs in the testbench. It reads a WAV file and writes the MFCC data to a raw binary file. A Matlab/Octave script is provided to parse the output and extract the MFCC payload from the audio capture file. For example:

cd $SOF_WORKSPACE/sof
scripts/build-tools.sh -t
scripts/rebuild-testbench.sh 
cd tools/tune/mfcc/
./run_mfcc.sh /usr/share/sounds/alsa/Front_Center.wav 
octave --gui &
decode_ceps('mfcc.raw',13);

The above commands create this plot:

The MFCC operation follows the configuration defined in mfcc_setup.m, which was used to output the configuration blob. Editing it and repeating the steps above with the test topologies changes the appearance of the audio features plot.

singalsu · 2022-06-29T16:27:05Z

src/audio/Kconfig

@@ -483,6 +483,23 @@ config COMP_RTNR
 	  proprietary binary libSOF_RTK_MA_API.a, libSuite_rename.a, libNet.a and libPreset.a.
 	  Please contact [email protected] for any question about the binary.

+config COMP_MFCC
+	bool "MFCC component"
+	default y


This is default y for now to get the full CI scan. Since the component is quite big, default n is expected later.

src/audio/mfcc/mfcc_mat.c

src/audio/mfcc/mfcc_psy.c

src/audio/mfcc/mfcc_win.c

src/audio/mfcc/mfcc.c

singalsu · 2022-07-01T18:16:00Z

The just-pushed version ran successfully on my TGL-H test device with the topology sof-hda-generic-2ch-mfcc.tplg. The average load was a quite decent 24 MCPS for 16 kHz mono MFCC computation with FFT size 512, FFT hop 10 ms, Hamming window, 23 Mel bands, and 13 cepstral coefficients out. The output stream is not ALSA compress but a fake PCM stream: when cepstral coefficients are inserted, a magic sync word is followed by 16 bit data, and otherwise the stream carries zeros so that the sink keeps the same PCM format as the source.
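As a rough illustration of how a receiver could pick cepstral frames out of such a fake PCM stream, the sketch below scans 16 bit samples for a sync word and copies the coefficients that follow it. The magic value, its split into two 16 bit words (low half first), and the name find_ceps_frame() are assumptions made for this example only; they are not taken from the component source or from decode_ceps.m.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sync word, the real value is defined by the component. */
#define EXAMPLE_MAGIC 0x12345678u

/*
 * Scan 16 bit samples from index start for the sync word (assumed to be
 * stored low half first) and copy the num_ceps values that follow it to out.
 * Returns the index of the first sample after the copied frame, or -1 if
 * no complete frame was found.
 */
long find_ceps_frame(const int16_t *pcm, size_t n, size_t start,
		     int16_t *out, size_t num_ceps)
{
	const uint16_t lo = EXAMPLE_MAGIC & 0xffff;
	const uint16_t hi = EXAMPLE_MAGIC >> 16;
	size_t i, k;

	for (i = start; i + 2 + num_ceps <= n; i++) {
		if ((uint16_t)pcm[i] != lo || (uint16_t)pcm[i + 1] != hi)
			continue;

		for (k = 0; k < num_ceps; k++)
			out[k] = pcm[i + 2 + k];

		return (long)(i + 2 + num_ceps);
	}

	return -1;
}
```

A caller would invoke this repeatedly, passing the returned index as the next start, until it returns -1. Samples outside the frames are expected to be zeros, so they are skipped by the scan.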

singalsu · 2022-07-27T11:41:30Z

This version moves the matrix, window, and Mel frequency functions into separate generic library functions.

singalsu · 2022-07-29T16:55:02Z

This version changes the FFT to a new 16 bit version. With 16 bit data it saves a lot of RAM with minimal impact on quality. Since there's a lot to review, I will split the FFT change into another PR. I'm now happy with the FFT quality, so after that PR the FFT should be ready for xtensa SIMD optimization patches.

singalsu · 2022-08-15T15:55:51Z

The just-pushed version fixes a build issue with the functions comp_update_buffer_consume/produce(), whose names have changed. No other changes. Also, more recent versions of the FFT library and the window functions library are now in their own PRs.

src/math/auditory.c

src/math/window.c

lyakh · 2022-08-16T08:33:16Z

src/audio/mfcc/mfcc_generic.c

+#define DEBUGFILES
+#endif
+
+#ifdef MFCC_DEBUGFILES


can this be done as a cmocka test?

Yes, I will move the library tests to cmocka after they have been split out of this main PR. The overall MFCC test will be testbench based. As with TDFB, I will attempt to extract the needed internal state information from traces instead of these files, so these will go away.

I was thinking of adding the cmocka unit test in another PR since this one is quite large. It could be part of #5769, which creates the reference data with Pytorch for comparison. Would that be OK @lgirdwood ?

src/audio/mfcc/mfcc_generic.c

src/audio/mfcc/mfcc_init.c

lgirdwood · 2022-08-24T11:19:21Z

@ShriramShastry would you be able to review? Thanks!

ShriramShastry · 2022-08-24T11:31:34Z

Sure, I'll review the PR.
Thank you

src/include/sof/math/matrix.h

lgirdwood · 2022-08-31T15:57:58Z

@singalsu conflicts

src/audio/mfcc/mfcc.c

src/include/sof/audio/coefficients/fft/twiddle_32.h

btian1

Hi, Seppo

For such a big feature, do we have a design document that explains the details for people who want to know the background? If not, I would suggest adding a README that describes the whole feature, so that reviewers can understand it better. What do you think?
For example the design: filter type, filter stages, Q format, etc.

Thanks
Tim

singalsu · 2022-09-01T16:05:21Z

The reference code for this component is in #5769. The target is a low-power SOF component whose setup parameters are compatible with the Pytorch library Kaldi MFCC and with librosa MFCC. With a limited set of parameters, the Matlab concept achieves a fair match with Pytorch. Librosa compatibility needs to be improved. The output stream needs to be changed to the ALSA compress type; it is currently only a fake PCM stream. I hope I can make a small-scale user space ASR demo with those libraries that demonstrates the FW MFCC. I will keep adding more Pytorch-like and librosa-like options to improve compatibility as parameters vary. See tools/tune/mfcc for how to set it up via a binary blob.

singalsu · 2022-09-01T16:09:54Z

The just pushed version is without libraries and with minor updates. This should build OK when #6178 is merged. I will next address the review feedback for this component.

btian1 · 2022-09-02T03:00:36Z

Thanks, Seppo, that will be helpful. I roughly went through MFCC designs elsewhere and they are complex; since our design is low power, there should be some tradeoffs. Do we have a local C environment test for ASR with MFCC? With the Matlab code, it seems only a few people know this.

If more people want to understand the current MFCC framework, a diagram of the whole MFCC flow would be helpful, especially one compared with the full MFCC flow, so that more people see the benefit of our design. What do you think?

Thanks
Tim

singalsu · 2022-09-02T07:33:53Z

@btian1 I would like to make a demo ASR (limited to e.g. recognizing numbers) with e.g. Python libraries and run it in e.g. UP extreme or UP2 user space with MFCC data from the DSP. The MFCC flow is simple but there's complexity in the details. Matlab offline processed ASR would be an even quicker path, but I'd prefer a demo that could run on our test DUTs.

For now the best documents are about librosa and Pytorch: https://pytorch.org/audio/0.11.0/tutorials/audio_feature_extractions_tutorial.html

src/audio/mfcc/mfcc_init.c

singalsu · 2022-09-20T13:38:31Z

@singalsu is the small accuracy delta a result of using 32bit numbers for FW ?

The largest contribution to the difference is from the 16 bit FFT and the 16 bit Mel band triangles. I need to retest with the 32 bit FFT version. It's a Kconfig option, so it is easy to switch.

I'd like to understand how much the difference in fixed point MFCC impacts the speech recognition word error rate. I should be able to do such a test too without too much work.

The RMS error in the chirp test drops from 4.708 to 1.771, so the 32 bit FFT improves quite a bit. Below is the error vs. Pytorch for Matlab float, the 32 bit FFT version of the component, and the 16 bit FFT version. All use 16 bit PCM data as input.

singalsu · 2022-09-20T14:58:29Z

To fix this I tried changing src/audio/Kconfig to use select COMP_MODULE_ADAPTER instead of depends on.

[ 96%] Linking C executable sof
/home/sof/work/xtensa-imx-elf/lib/gcc/xtensa-imx-elf/10.2.0/../../../../xtensa-imx-elf/bin/ld: CMakeFiles/sof.dir/src/audio/mfcc/mfcc.c.o: in function `sys_comp_module_mfcc_interface_init':
/home/sof/work/sof.git/src/audio/mfcc/mfcc.c:282: undefined reference to `module_adapter_new'

Edit: That does not seem to help, so I changed back to depends on, which the other module_adapter clients use.

src/audio/mfcc/mfcc.c

lyakh · 2022-09-21T08:55:27Z

src/audio/mfcc/mfcc_generic.c

+
+	for (j = 0; j < fft->fft_size; j++) {
+		x = fft->fft_buf[i + j].real;
+		absx = (x < 0) ? -x : x;


why not use ABS() from src/include/sof/math/numbers.h ?

This is a hot path and the macro is more than the line above. Is there overhead from the typeof() typecast operation there? Also, I wonder if Zephyr comes with a different ABS()?

The potential overhead from using ABS() would come not from typeof() but from allocating a new variable on the stack. ABS() does the right thing there to avoid repeated evaluation of the expression. E.g. it's safe to do ABS(f(x)), whereas if ABS() were just a trivial #define ABS(x) ((x) < 0 ? -(x) : (x)), f(x) would be evaluated twice, which can have undesirable side effects. The SOF ABS(x) implementation avoids that. In the above case it isn't needed since x is a simple variable, but it's important for a generic macro. I would guess that the compiler optimises the additional stack variable out, but if you're concerned about that, I'd understand that too.

Ah, for that it makes sense. In the beginning we used macros for most things, but I've been wondering if inline functions would be better to avoid unknowns. Inlines would also be easier to replace seamlessly with intrinsics. I'd like to keep this as it is. Also, mfcc_generic.c will later be converted to use intrinsics.
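The double-evaluation point discussed in this thread can be seen in a small sketch. NAIVE_ABS, SAFE_ABS, abs32() and next_sample() are illustrative names; SAFE_ABS follows the pattern described above (a typeof temporary inside a GNU C statement expression) and is not copied from the SOF header, while abs32() shows the inline function alternative.

```c
#include <stdint.h>

/* Trivial macro: the argument expression is expanded, and evaluated, twice. */
#define NAIVE_ABS(x) ((x) < 0 ? -(x) : (x))

/* GNU C statement expression: the argument is evaluated once into a temporary. */
#define SAFE_ABS(x) ({			\
	__typeof__(x) _v = (x);		\
	_v < 0 ? -_v : _v;		\
})

/* Inline function alternative: single evaluation and a fixed argument type. */
static inline int32_t abs32(int32_t x)
{
	return x < 0 ? -x : x;
}

/* Counts calls to next_sample() so that repeated evaluation becomes visible. */
int call_count;

/* Helper with a side effect, standing in for f(x) in the discussion. */
int32_t next_sample(void)
{
	call_count++;
	return -5;
}
```

With these definitions, NAIVE_ABS(next_sample()) calls the helper twice, while SAFE_ABS(next_sample()) and abs32(next_sample()) call it once. For a plain variable, as in the hot loop above, all three forms evaluate the same expression.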

singalsu · 2022-09-27T12:02:00Z

Changed vs. previous push, in Kconfig

        depends on COMP_MODULE_ADAPTER
        depends on !COMP_LEGACY_INTERFACE

that seems to avoid the imx build issue in CI.

lgirdwood · 2022-09-28T13:56:13Z

src/include/user/mfcc.h

+ * Configuration blob
+ */
+struct sof_mfcc_config {
+	int32_t sample_frequency; /**< Hz. e.g. 16000 */


We need a size as first word for ABI tracking.

That was a good catch! I also added some reserved words at the beginning. The three reserved bool fields at the end make the blob size a multiple of 32 bits. The sof_setup() function now also checks that the size matches.
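A minimal sketch of the size-first idea, under stated assumptions: struct example_config and example_check_blob() are hypothetical and much smaller than the real struct sof_mfcc_config. The sketch only shows a size word at the start, reserved words after it, padding to a 32 bit multiple, and a setup-time check that the sizes agree.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical blob layout: size first, then reserved words, then parameters. */
struct example_config {
	uint32_t size;			/* total size of this struct in bytes */
	uint32_t reserved[4];		/* kept zero, room for later ABI growth */
	int32_t sample_frequency;	/* Hz, e.g. 16000 */
	int16_t num_ceps;		/* e.g. 13 */
	int16_t pad;			/* keeps the struct a multiple of 32 bits */
};

/* Return 0 if the received blob is usable, -1 otherwise. */
int example_check_blob(const void *blob, size_t received_bytes)
{
	const struct example_config *cfg = blob;

	if (!blob || received_bytes < sizeof(cfg->size))
		return -1;

	/* Reject blobs whose size is not a multiple of four bytes. */
	if (received_bytes % sizeof(uint32_t) != 0)
		return -1;

	/* The size word must match both the expected struct and the payload. */
	if (cfg->size != sizeof(*cfg) || cfg->size != received_bytes)
		return -1;

	return 0;
}
```

Checking the size word against both the compiled-in struct size and the received byte count catches a blob generated for a different ABI version as well as a truncated transfer.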

This patch adds basic macros needed for MFCC in testbench and in development topologies for hda-generic-2ch and up2. The configuration blob in this matches the reference Matlab code as configured to match Pytorch default MFCC. Signed-off-by: Seppo Ingalsuo <[email protected]>

This patch adds loading of the MFCC component to testbench. Signed-off-by: Seppo Ingalsuo <[email protected]>

This patch adds the initial version of the MFCC setup tool setup_mfcc.m. It outputs a configuration topology macro file that matches the current Matlab concept code. The configuration can be tested in testbench with the supplied scripts run_mfcc.sh and decode_ceps.m. Signed-off-by: Seppo Ingalsuo <[email protected]>

The increase of non-32bit aligned blob sizes needs to be removed because it can cause a mismatch of the blob binary header vs. the actual size. Instead, error out if the blob size is not a multiple of four bytes. Signed-off-by: Seppo Ingalsuo <[email protected]>

This patch adds the SOF component for Mel Frequency Cepstral Coefficients (MFCC) streaming to sink from source PCM stream. The MFCC audio features are commonly used for neural network based speech recognition services. Signed-off-by: Seppo Ingalsuo <[email protected]>

lgirdwood · 2022-10-03T14:13:09Z

@wszypelt @lrudyX the failed logs are timing out. Can you check? Thanks!

wszypelt · 2022-10-03T14:19:23Z

@lgirdwood a lot of tests are running today, but I have already added everything to the queue, so within 2-3 hours it should be ready

singalsu requested review from ShriramShastry, aiChaoSONG and andrula-song June 30, 2022 06:47

singalsu force-pushed the mfcc_component branch 2 times, most recently from 65828ef to 9708e05 July 1, 2022 17:54

singalsu force-pushed the mfcc_component branch from 9708e05 to 49573cc July 27, 2022 11:37

singalsu requested a review from a team July 27, 2022 11:39

singalsu force-pushed the mfcc_component branch from 49573cc to b1ef453 July 29, 2022 16:47

singalsu force-pushed the mfcc_component branch from b1ef453 to cb18fb3 August 15, 2022 15:52

lyakh reviewed Aug 16, 2022

lyakh reviewed Aug 25, 2022

btian1 reviewed Sep 1, 2022

singalsu force-pushed the mfcc_component branch from cb18fb3 to ff608a4 September 1, 2022 15:56

singalsu force-pushed the mfcc_component branch 2 times, most recently from 52b8e44 to e333199 September 20, 2022 14:55

lyakh requested changes Sep 21, 2022

singalsu force-pushed the mfcc_component branch from e333199 to dedf93e September 26, 2022 14:57

lyakh approved these changes Sep 27, 2022

singalsu force-pushed the mfcc_component branch from dedf93e to adc0e1d September 27, 2022 11:40

singalsu marked this pull request as ready for review September 27, 2022 17:29

singalsu requested review from marc-hb, aborisovich, ranj063, a team, plbossart, mmaka1, lbetlej and dbaluta as code owners September 27, 2022 17:29

lgirdwood reviewed Sep 28, 2022

singalsu added 5 commits September 30, 2022 17:51

Tools: Testbench: Add MFCC component

0f4cff6

singalsu force-pushed the mfcc_component branch from adc0e1d to 099fb4c September 30, 2022 14:53

singalsu requested a review from lgirdwood September 30, 2022 14:56

lgirdwood approved these changes Oct 3, 2022

lgirdwood merged commit 025c64e into thesofproject:main Oct 5, 2022