WebSocket Closure, Background Noise Interruption, and ASR Resource Management in vosk-asterisk #53

siship · 2024-10-28T10:02:31Z

Hello,

We have set up vosk-asterisk and built some demos with custom Kaldi models. We have been using the Vosk offline API and it's great. Thank you, @nshmyrev.

While building a demo for a streaming application where interruption is important, we ran into a few issues during testing.

Vosk websocket streaming-based ASR server:

• Number of cores: 12
• RAM: 56GB
• Total number of instances: 51
• Load averages:
    ◦ For ~50 active calls: ~17
    ◦ For ~67 active calls: ~22
    ◦ For ~30 active calls: ~12
    ◦ For ~15 active calls: ~5
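
As a rough way to read these figures, the sketch below normalizes each reported load average by the number of active calls and by the 12 cores. The helper name `per_call_and_per_core` is only illustrative, and the numbers are the approximate values listed above.

```python
CORES = 12  # core count of the ASR server described above

# (active calls, load average) pairs as reported above, sorted by call count
samples = [(15, 5), (30, 12), (50, 17), (67, 22)]


def per_call_and_per_core(calls, load, cores=CORES):
    """Return (load per active call, load per core) for one measurement."""
    return load / calls, load / cores


for calls, load in samples:
    per_call, per_core = per_call_and_per_core(calls, load)
    print(f"{calls:>3} calls: {per_call:.2f} load/call, {per_core:.2f} load/core")
```

By this estimate each active call costs between about 0.33 and 0.40 load units, and the load per core reaches about 1.0 at ~30 calls.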

model.config

--min-active=200
--max-active=7000
--beam=11.0
--lattice-beam=6.0
--acoustic-scale=1.0
--frame-subsampling-factor=3
--endpoint.silence-phones=1:2:3:4:5:
--endpoint.rule1.min-trailing-silence=20
--endpoint.rule2.min-trailing-silence=0.5
--endpoint.rule3.min-trailing-silence=1.0
--endpoint.rule4.min-trailing-silence=2.0

A few observations:
• With the resources listed above, the server handles only about 50 active streaming calls without delay (the offline Vosk API can handle at least 300 calls at a time on the same resources).
• The WebSocket closes unexpectedly before sending the final text result, without any error. As a result, we receive None from Vosk.
• Even small background noise interrupts the call.
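
To check whether the missing final result comes from the server or from the client side closing too early, a small test client outside Asterisk may help. The sketch below follows the pattern we understand the vosk-server example client to use: send binary audio chunks, read one JSON reply per chunk, then send an `{"eof" : 1}` message and wait for the final reply before closing. The function name `stream_and_finalize`, the `timeout` parameter, and the `ws` object (anything with async `send()`/`recv()`, for example a connection from the third-party `websockets` package) are assumptions for illustration, not part of vosk-asterisk.

```python
import asyncio
import json


async def stream_and_finalize(ws, chunks, timeout=5.0):
    """Send audio chunks, then the EOF marker, and wait for the final result.

    `ws` is assumed to expose async send()/recv(); `chunks` is an iterable of
    raw PCM byte strings. Returns (per-chunk replies, final reply) as dicts.
    """
    replies = []
    for chunk in chunks:
        await ws.send(chunk)
        # The server is assumed to answer every chunk with a partial or result.
        replies.append(json.loads(await ws.recv()))

    # Ask the server to flush, and do not close before its answer arrives.
    await ws.send('{"eof" : 1}')
    final = json.loads(await asyncio.wait_for(ws.recv(), timeout))
    return replies, final
```

If this client always receives the final text while the Asterisk side still sees None, the socket is probably being closed before the reply to the EOF message is read.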

Questions:
1. How can we prevent the IVR hold tone from triggering the ASR model? Is there any easy solution?
2. The ASR is triggered continuously, even during long silences (e.g., when the user is listening to long IVR questions), consuming more resources. Is there a more efficient way to handle this?
3. [Related to point 2] Is it possible to integrate an external VAD algorithm, such as Silero-VAD, with vosk-asterisk to detect silence and noise before sending audio to the ASR?
4. We noticed that when speaking on speakerphone (hands-free), the recognition results are somewhat inconsistent and inaccurate. Could this be an issue with how Asterisk is recording the audio?
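
To make questions 2 and 3 concrete, here is a minimal sketch of the kind of gate we have in mind. It is a plain energy (RMS) threshold with a hangover, used here as a stand-in for a real VAD such as Silero-VAD, not an implementation of it. It assumes 16-bit mono PCM frames in native byte order; the names `rms` and `gate_frames` and the `threshold` and `hangover` values are hypothetical and would need tuning.

```python
import array
import math


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit PCM frame (native byte order)."""
    samples = array.array("h")
    samples.frombytes(frame)  # frame length must be a multiple of 2 bytes
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gate_frames(frames, threshold=500.0, hangover=10):
    """Yield frames at or above `threshold`, plus `hangover` frames after them.

    Frames below the threshold outside the hangover window are dropped and
    would never be sent to the ASR server.
    """
    remaining = 0
    for frame in frames:
        if rms(frame) >= threshold:
            remaining = hangover
            yield frame
        elif remaining > 0:
            remaining -= 1
            yield frame
```

A gate like this would only help with long silences. A hold tone carries energy and would pass the threshold, so question 1 would still need a model-based VAD or some other way to tell tone from speech. Dropping frames also changes the amount of trailing silence the recognizer sees, which may interact with the endpoint rules in model.config.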

Versions:
Asterisk 18.21.0
CentOS 7

Thanks in advance.
