[DNM] Audio: MFCC: Use the MFCC module as compress PCM encoder with discontinuous stream #10814
Open
singalsu wants to merge 8 commits into
Add mfcc_vad module with A-weighted energy-based voice activity detection that operates on the Mel log spectrum produced by the MFCC component. The algorithm tracks a per-bin noise floor with instant-down and slow-rise behavior, then computes a weighted energy delta above the floor. Speech is declared when the delta exceeds a threshold (0.35 in Q9.23) with a 20-frame hangover to prevent rapid toggling. The VAD is gated on the new enable_vad flag in sof_mfcc_config.

Add struct mfcc_data_header with six int32 fields (magic, frame_number, reserved, energy, noise_energy, vad_flag) prepended to every output frame in all format paths (S16, S24, S32). This replaces the previous magic-word-only header. The header carries the VAD decision and energy values from the DSP for downstream consumers.

Extend sof_mfcc_config in user/mfcc.h with reserved16[3] padding for 32-bit alignment, and new boolean fields enable_vad, enable_dtx, update_controls, and reserved_bool[5]. The config blob size increases from 104 to 116 bytes.

Update Matlab/Octave decode scripts (decode_mel.m, decode_ceps.m, decode_all.m) and setup_mfcc.m for the expanded header and config struct. Regenerate topology2 configuration blobs (default.conf, mel80.conf) with the new blob size.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
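The noise-floor tracking and hangover behavior described in this commit message can be illustrated with a minimal C sketch. This is not the code in mfcc_vad.c: the bin count, the rise step, the Q1.15 weights, and every `*_sketch` name are assumptions made for illustration. Only the 0.35 threshold in Q9.23 and the 20-frame hangover are taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VAD_BINS	4		/* assumed; a real Mel spectrum has more bins */
#define VAD_THRESHOLD	2936013		/* 0.35 in Q9.23, round(0.35 * 2^23) */
#define VAD_HANGOVER	20		/* frames, from the commit message */
#define VAD_RISE_STEP	84		/* assumed slow-rise step per frame, Q9.23 */

struct vad_sketch {
	int32_t floor[VAD_BINS];	/* per-bin noise floor, Q9.23 */
	int hangover;			/* frames left before speech is released */
	bool initialized;
};

/* weight[] is Q1.15 and assumed to sum to about 1.0 (32768) */
static bool vad_sketch_update(struct vad_sketch *v, const int32_t *mel_log,
			      const int16_t *weight)
{
	int64_t delta = 0;
	int64_t above;
	int i;

	assert(v && mel_log && weight);
	for (i = 0; i < VAD_BINS; i++) {
		above = (int64_t)mel_log[i] - v->floor[i];
		if (!v->initialized || above < 0)
			v->floor[i] = mel_log[i];	/* instant down */
		else if (above > VAD_RISE_STEP)
			v->floor[i] += VAD_RISE_STEP;	/* slow rise */
		else
			v->floor[i] = mel_log[i];	/* reached the input level */

		/* weighted energy above the updated floor, never negative */
		delta += (int64_t)weight[i] * ((int64_t)mel_log[i] - v->floor[i]);
	}
	v->initialized = true;
	delta >>= 15;			/* Q1.15 * Q9.23 back to Q9.23 */

	if (delta > VAD_THRESHOLD) {
		v->hangover = VAD_HANGOVER;
		return true;
	}
	if (v->hangover > 0) {
		v->hangover--;		/* hold speech during the hangover */
		return true;
	}
	return false;
}
```

In this sketch a single loud frame keeps the decision at speech for the following 20 quiet frames, after which the output returns to silence.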
Add sof_mel_to_text_live_dsp_vad.py that captures mel spectrogram frames from ALSA with embedded DSP VAD flag and performs live speech-to-text transcription using OpenVINO Whisper. The script buffers mel frames during speech and triggers Whisper inference when silence is detected after speech. Capture runs continuously in a separate thread during inference to avoid frame drops.

Replace the old README.txt with a comprehensive README.md that documents the MFCC tuning tools, testbench usage with run_mfcc.sh, output file formats, Matlab/Octave decode and plotting scripts, and the new live transcription workflow.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Add IPC4 notification that sends the VAD state to user space via a switch control whenever the VAD decision changes between speech and silence. The notification is initialized during prepare and sent from the audio processing path on VAD state transitions.

The implementation follows the TDFB/sound_dose notification pattern: mfcc_ipc4.c contains the IPC4-specific notification init and send functions, while mfcc.c provides weak stubs so IPC3 builds link without the IPC4 dependencies.

Add handling for SOF_IPC4_SWITCH_CONTROL_PARAM_ID in mfcc_get_config and mfcc_set_config so the kernel driver can read back the current VAD state after receiving a notification. The switch control is read-only from the DSP side.

Both the notification init and the VAD state change detection are gated on the update_controls flag in the configuration blob struct.

Add a switch control (mixer) to the MFCC topology2 widget definition for the VAD notification.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
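The two ideas in this commit message, a weak stub that a stronger definition can override and a notification sent only on state transitions, can be sketched as follows. All names here (`mfcc_vad_notify_sketch`, `struct vad_ctrl_sketch`) are hypothetical and do not reflect the actual functions in mfcc.c or mfcc_ipc4.c. The `weak` attribute assumes a GCC or Clang toolchain.

```c
#include <assert.h>
#include <stdbool.h>

/* Weak default: does nothing and reports success. A build that links an
 * IPC4-specific file could supply a strong definition with the same name,
 * which the linker would prefer over this stub.
 */
__attribute__((weak)) int mfcc_vad_notify_sketch(bool is_speech)
{
	(void)is_speech;
	return 0;
}

struct vad_ctrl_sketch {
	bool update_controls;	/* gate, mirrors the flag named in the commit */
	bool last_state;	/* last VAD state that was reported */
	int transitions;	/* number of notifications attempted */
};

/* Call once per processed frame; notifies only when the state changes */
static void vad_ctrl_sketch_process(struct vad_ctrl_sketch *c, bool is_speech)
{
	assert(c);
	if (!c->update_controls || is_speech == c->last_state)
		return;

	c->last_state = is_speech;
	c->transitions++;
	mfcc_vad_notify_sketch(is_speech);
}
```

Keeping the comparison against the last reported state in the processing path means a long run of identical VAD decisions produces no notification traffic.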
mfcc_reset() did not free buffers allocated by mfcc_setup(), so a stop->reset->prepare->start cycle would leak all MFCC allocations (FFT buffers, mel filterbank, DCT matrix, lifter, VAD buffers).

This patch fixes the issue by calling mfcc_free_buffers() from mfcc_reset(). The pointers are set to NULL after free via a helper function mfcc_free_and_null(), so mfcc_free() won't double-free when it calls mfcc_free_buffers() again later.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
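The free-and-null pattern that makes a second free pass harmless can be sketched with the standard C allocator. The struct, its fields, and the `*_sketch` function names are hypothetical, and the firmware uses its own allocator rather than `malloc()`/`free()`; the sketch only shows why clearing the pointer prevents a double free.

```c
#include <assert.h>
#include <stdlib.h>

struct bufs_sketch {
	void *fft_buf;		/* stand-ins for the component's allocations */
	void *mel_buf;
};

/* Free the buffer and clear the pointer so a later call is a no-op */
static void free_and_null_sketch(void **ptr)
{
	assert(ptr);
	free(*ptr);		/* free(NULL) is defined to do nothing */
	*ptr = NULL;
}

/* Safe to call from both a reset path and a final free path */
static void free_buffers_sketch(struct bufs_sketch *b)
{
	free_and_null_sketch(&b->fft_buf);
	free_and_null_sketch(&b->mel_buf);
}
```

Calling `free_buffers_sketch()` twice in a row is safe because the second call only sees NULL pointers.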
Note: Running the MFCC compress topologies requires kernel patches thesofproject/linux#5647 and thesofproject/linux#5789.
singalsu commented May 26, 2026
d5267b3 to 969d644
Pull request overview
This PR extends the SOF MFCC component and related tooling/topology to support VAD + DTX behavior and to use MFCC as a compress PCM “encoder” that can emit discontinuous (DTX-suppressed) feature frames, including optional IPC4 control notifications for VAD state.
Changes:
- Add MFCC VAD/DTX support in firmware (new VAD implementation, frame header with VAD/energy fields, optional IPC4 notifications, and compress-output mode).
- Add/adjust topology2 definitions to expose MFCC feature capture for both normal PCM and compress PCM on SDW jack/DMIC, including new build targets.
- Update MFCC tuning/export and host-side decode/visualization/transcription tools (Matlab/Octave + Python scripts), plus new documentation.
Reviewed changes
Copilot reviewed 40 out of 40 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tools/topology/topology2/platform/intel/sdw-jack-audio-feature.conf | Adds MFCC frame sizing define and VAD mixer control naming for jack feature capture. |
| tools/topology/topology2/platform/intel/sdw-jack-audio-feature-compress.conf | New compress PCM MFCC feature-capture topology for jack (MFCC encoder type, blob selection, VAD control). |
| tools/topology/topology2/platform/intel/sdw-dmic-audio-feature.conf | Adds MFCC frame sizing define and VAD mixer control naming for DMIC feature capture. |
| tools/topology/topology2/platform/intel/sdw-dmic-audio-feature-compress.conf | New compress PCM MFCC feature-capture topology for DMIC (MFCC encoder type, blob selection, VAD control). |
| tools/topology/topology2/platform/intel/dmic1-mfcc.conf | Renames MFCC bytes control and adds VAD mixer control naming. |
| tools/topology/topology2/include/pipelines/cavs/host-gateway-src-mfcc-capture.conf | Adds MFCC_FRAME_BYTES-driven ibs/obs to support variable-sized (compress) MFCC frames. |
| tools/topology/topology2/include/components/mfcc/mel80.conf | Updates exported MFCC configuration blob. |
| tools/topology/topology2/include/components/mfcc/mel80_compress.conf | New exported MFCC configuration blob for compress output. |
| tools/topology/topology2/include/components/mfcc/mel80_compress_dtx.conf | New exported MFCC configuration blob for compress output + DTX. |
| tools/topology/topology2/include/components/mfcc/default.conf | Updates exported default MFCC configuration blob. |
| tools/topology/topology2/include/components/mfcc/ceps13_compress_dtx.conf | New exported MFCC configuration blob for cepstral output + compress + DTX. |
| tools/topology/topology2/include/components/mfcc.conf | Adds mixer control template to MFCC widget and allows type override (e.g., encoder). |
| tools/topology/topology2/include/common/common_definitions.conf | Adds default feature flags for SDW jack/DMIC compress MFCC capture. |
| tools/topology/topology2/include/bench/mfcc_controls_playback.conf | Enables an MFCC mixer switch control in bench playback controls. |
| tools/topology/topology2/include/bench/mfcc_controls_capture.conf | Enables an MFCC mixer switch control in bench capture controls. |
| tools/topology/topology2/development/tplg-targets.cmake | Renames MFCC topology targets and adds compress MFCC mel/ceps variants with frame sizing + blob selection. |
| tools/topology/topology2/cavs-sdw.conf | Adds feature-gated includes for new compress MFCC capture topologies. |
| src/include/user/mfcc.h | Extends MFCC config ABI with VAD/DTX/compress flags and timing parameters. |
| src/include/sof/audio/mfcc/mfcc_vad.h | New VAD API/state definitions for MFCC. |
| src/include/sof/audio/mfcc/mfcc_comp.h | Refactors MFCC component interfaces (source/sink API, frame header, VAD/DTX state, IPC4 helpers). |
| src/audio/mfcc/tune/sof_mel_to_text_live_dsp_vad.py | New live Whisper transcription script using DSP VAD embedded in PCM stream. |
| src/audio/mfcc/tune/sof_mel_to_text_live_compress.py | New live Whisper transcription script for compress PCM + DTX/discontinuous frames. |
| src/audio/mfcc/tune/sof_mel_spectrogram_compress.py | New live mel spectrogram viewer for compress PCM MFCC frames. |
| src/audio/mfcc/tune/sof_ceps_spectrogram_compress.py | New live cepstral viewer for compress PCM MFCC frames. |
| src/audio/mfcc/tune/setup_mfcc.m | Updates blob export for new config layout; adds compress + DTX blob exports. |
| src/audio/mfcc/tune/README.txt | Removed in favor of README.md. |
| src/audio/mfcc/tune/README.md | New markdown documentation for tuning, decoding, and live scripts. |
| src/audio/mfcc/tune/decode_mel.m | Updates decoder for new int32 + header format and DTX gap filling. |
| src/audio/mfcc/tune/decode_ceps.m | Updates decoder for new int32 + header format and DTX gap filling. |
| src/audio/mfcc/tune/decode_all.m | Updates batch decode to new decoder signatures and int32 outputs. |
| src/audio/mfcc/mfcc.c | Moves MFCC to source/sink API processing, hooks VAD notifications and compress/DTX behavior. |
| src/audio/mfcc/mfcc_vad.c | New VAD implementation (noise floor tracking + weighted energy + hangover). |
| src/audio/mfcc/mfcc_setup.c | Adds VAD init, DTX/compress state init, buffer free fixes, sample-rate limit check. |
| src/audio/mfcc/mfcc_ipc4.c | New IPC4 control notification plumbing for VAD state reporting. |
| src/audio/mfcc/mfcc_hifi4.c | Removes old stream-buffer source copy implementations (now in common source/sink code). |
| src/audio/mfcc/mfcc_hifi3.c | Removes old stream-buffer source copy implementations (now in common source/sink code). |
| src/audio/mfcc/mfcc_generic.c | Removes old stream-buffer source copy implementations (now in common source/sink code). |
| src/audio/mfcc/mfcc_common.c | Adds source/sink copy funcs, header/VAD handling, legacy vs compress output paths, and DTX suppression logic. |
| src/audio/mfcc/CMakeLists.txt | Registers new mfcc_vad.c and conditionally mfcc_ipc4.c in build. |
| src/audio/base_fw.c | Advertises BESPOKE codec capability for MFCC compress capture. |
Switch from process_audio_stream to source/sink API. Add compress PCM output mode (variable-size frames, no zero padding) alongside legacy mode (full period with zero-fill). Unify all output to int32 Q9.23 regardless of source format. Remove out_data_ptr_32, mel_spectra int16 copy, mfcc_func typedef, and per-format output functions from mfcc_common/hifi3/hifi4.

Add DTX for compress mode: suppress silence frames after configurable trailing count, with optional periodic keepalive.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
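The DTX rule in this commit (send a configurable number of trailing silence frames, then suppress, with an optional periodic keepalive) can be sketched as a small per-frame decision function. The struct and function names are hypothetical, and the assumption that every speech frame is always sent is an inference from the commit text, not a statement about the merged code.

```c
#include <assert.h>
#include <stdbool.h>

struct dtx_sketch {
	bool enable_vad;
	bool enable_dtx;
	int trailing_silence;	/* silence frames still sent after speech */
	int silence_interval;	/* keepalive period in frames, 0 disables */
	int silence_count;	/* consecutive silence frames seen */
	int interval_counter;	/* suppressed frames since last keepalive */
};

/* Returns true when the current frame should be written to the sink */
static bool dtx_sketch_should_send(struct dtx_sketch *s, bool is_speech)
{
	assert(s);
	if (!s->enable_vad || is_speech) {
		/* Speech, or VAD not in use: reset counters and send */
		s->silence_count = 0;
		s->interval_counter = 0;
		return true;
	}

	s->silence_count++;
	if (!s->enable_dtx || s->silence_count <= s->trailing_silence)
		return true;	/* no DTX, or still in the trailing window */

	if (s->silence_interval > 0 &&
	    ++s->interval_counter >= s->silence_interval) {
		s->interval_counter = 0;
		return true;	/* periodic keepalive frame */
	}
	return false;		/* suppressed */
}
```

With `trailing_silence = 2` and `silence_interval = 3`, a run of silence frames after speech gives the sequence send, send, drop, drop, send, drop in this sketch.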
Register SND_AUDIOCODEC_BESPOKE capture in codec info TLV when CONFIG_COMP_MFCC is enabled so the kernel detects compress capture support via IPC4_SOF_CODEC_INFO.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Update Octave decode scripts for int32 Q9.23 output and DTX gap filling. Add DTX blob generation to setup_mfcc.m.

Add Python compress capture tools: sof_mel_spectrogram_compress.py, sof_ceps_spectrogram_compress.py, sof_mel_to_text_live_compress.py. Refactor sof_mel_to_text_live_dsp_vad.py to use shared compress capture code. Add README with usage examples.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Add sdw-jack-audio-feature-compress.conf (PCM 53, pipeline 132) and sdw-dmic-audio-feature-compress.conf (PCM 54, pipeline 133) for compress MFCC capture with DTX blobs.

Fix buffer sizes: set MFCC obs and host-copier ibs/obs to 344 bytes (24-byte header + 80 x int32).

Add mel and ceps compress topology targets for MTL and ARL. Rename normal MFCC topologies to *-mfcc-mel-normal for clarity.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
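The 344-byte figure follows from the frame layout: a header of six int32 fields (24 bytes) followed by 80 int32 Mel values. A minimal sketch of that arithmetic is shown here; the struct mirrors the field list given for mfcc_data_header earlier in this PR, but the `_sketch` names and the helper function are hypothetical and the real definition may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field order as listed in the first commit message of this PR */
struct mfcc_data_header_sketch {
	int32_t magic;
	int32_t frame_number;
	int32_t reserved;
	int32_t energy;
	int32_t noise_energy;
	int32_t vad_flag;
};

/* Bytes in one output frame: header plus num_values int32 coefficients */
static size_t mfcc_frame_bytes_sketch(int num_values)
{
	assert(num_values >= 0);
	return sizeof(struct mfcc_data_header_sketch) +
	       (size_t)num_values * sizeof(int32_t);
}
```

For 80 Mel bins the helper returns 24 + 80 * 4 = 344, which matches the obs and ibs/obs values set in this commit.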
35e56d5 to 71404ce
Comment on lines +407 to +441

	int ret;

	if (num_ceps <= 0)
		return 0;

	out_bytes = sizeof(state->header) + num_ceps * sizeof(int32_t);

	if (cd->config->enable_vad && !cd->vad.is_speech) {
		state->vad_silence_count++;
		/* With DTX enabled, send trailing silence frames
		 * (configurable count) then suppress. After trailing
		 * frames, optionally send periodic silence updates
		 * at the configured interval. This gives the host
		 * enough silence to detect end-of-speech while
		 * keeping alive updates during long silence.
		 * Without DTX, output every frame regardless of VAD.
		 */
		if (cd->config->enable_dtx) {
			if (state->vad_silence_count > state->dtx_trailing_silence) {
				/* Check periodic silence frame send */
				if (state->dtx_silence_interval > 0) {
					state->dtx_silence_counter++;
					if (state->dtx_silence_counter >= state->dtx_silence_interval) {
						state->dtx_silence_counter = 0;
						goto send_frame;
					}
				}
				state->header_pending = false;
				state->out_remain = 0;
				return 0;
			}
		}
	} else {
		state->vad_silence_count = 0;
		state->dtx_silence_counter = 0;
Comment on lines +444 to +448

send_frame:
	commit_bytes = out_bytes;

static int32_t *mfcc_sink_copy_data_s32(const struct audio_stream *sink, int32_t *w_ptr,
					int samples, int32_t *r_ptr)
{
	int copied;
	int nmax;
	int n;
	if (sink_get_free_size(sinks[0]) < commit_bytes)
		return -ENOSPC;
This PR adds commits on top of the earlier VAD PR #10782.
A kernel PR that fixes the encoder-type ALSA controls is needed to run this.