We have set up vosk-asterisk and built a few demos with custom Kaldi models. We have been using the Vosk offline API, and it's great! Thank you @nshmyrev.
While building a demo for a streaming application where interruption is important, we ran into a few issues during testing.
Vosk WebSocket streaming ASR server:
• Number of cores: 12
• RAM: 56 GB
• Total number of instances: 51
• Load averages:
◦ For ~15 active calls: ~5
◦ For ~30 active calls: ~12
◦ For ~50 active calls: ~17
◦ For ~67 active calls: ~22
A few observations:
• With the resources above, it can only handle about 50 concurrent streaming calls without delay (the offline Vosk API handles at least 300 calls at a time on the same resources).
• The WebSocket closes unexpectedly, without any error, before the final text result is sent. As a result, we receive None from Vosk.
• Even small background noise interrupts the call.
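To narrow down whether the missing final result comes from the server or from the Asterisk side, one option is to replay a recorded call with a standalone client. This is a minimal sketch, assuming the message flow of the vosk-server WebSocket example client (a `config` message with the sample rate, binary audio chunks, then `{"eof" : 1}` sent exactly as the example writes it, after which the server should reply with the final result before closing) and the third-party `websockets` package. `transcribe`, `final_text`, and the chunk size are our own hypothetical choices.

```python
import json
import wave
from typing import Iterable, List, Optional


def final_text(messages: Iterable[str]) -> Optional[str]:
    """Return the text of the last full result among server messages.

    Partial hypotheses carry a "partial" key; full results carry "text".
    Returns None if no full result arrived (the symptom described above).
    """
    text = None
    for raw in messages:
        msg = json.loads(raw)
        if "text" in msg:
            text = msg["text"]
    return text


async def transcribe(uri: str, wav_path: str, chunk_frames: int = 4000) -> Optional[str]:
    # Imported lazily so final_text() works without the dependency installed.
    import websockets  # third-party: pip install websockets
    from websockets.exceptions import ConnectionClosed

    received: List[str] = []
    try:
        async with websockets.connect(uri) as ws:
            with wave.open(wav_path, "rb") as wf:
                # Tell the server the sample rate of the audio we are streaming.
                await ws.send(json.dumps({"config": {"sample_rate": wf.getframerate()}}))
                while True:
                    data = wf.readframes(chunk_frames)
                    if not data:
                        break
                    await ws.send(data)
                    # The server answers each chunk with a partial or full result.
                    received.append(await ws.recv())
            # End of stream: the server should send the final result, then close.
            await ws.send('{"eof" : 1}')
            received.append(await ws.recv())
    except ConnectionClosed:
        pass  # closed before the final result arrived
    return final_text(received)
```

Usage, with a placeholder URI and an 8 kHz mono 16-bit WAV file: `asyncio.run(transcribe("ws://localhost:2700", "call.wav"))`. If this reliably returns the final text for the same audio, the early close is more likely on the Asterisk side.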
Questions:
1. How can we prevent the IVR hold tone from triggering the ASR model? Is there any easy solution?
2. The ASR is triggered continuously, even during long silences (e.g., when the user is listening to long IVR questions), consuming more resources. Is there a more efficient way to handle this?
3. [Related to point 2] Is it possible to integrate an external VAD algorithm, such as Silero-VAD, with vosk-asterisk to detect silence and noise before the audio is sent to the ASR?
4. We noticed that when speaking on speakerphone (hands-free), the recognition results are somewhat inconsistent and inaccurate. Could this be an issue with how Asterisk is recording the audio?
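For questions 1 to 3, the kind of pre-ASR gating we have in mind works roughly like this: look at each 20 ms frame and forward it only if it is not near-silent (RMS energy below a floor) and not dominated by a steady tone (measured with the Goertzel algorithm at the hold-tone frequency), with a short hangover so word endings are not cut off. This is a minimal sketch under stated assumptions: a plain energy/tone heuristic standing in for a trained VAD such as Silero-VAD, not something vosk-asterisk provides. The 8 kHz 16-bit format, the 425 Hz tone, all thresholds, and the names `should_forward` and `Gate` are hypothetical and would need tuning per deployment.

```python
import math
from typing import Sequence

SAMPLE_RATE = 8000      # assumed telephony audio, 16-bit signed samples
FRAME_LEN = 160         # 20 ms at 8 kHz
HOLD_TONE_HZ = 425.0    # assumed hold-tone frequency; measure the real one
SILENCE_RMS = 300.0     # assumed energy floor for 16-bit PCM
TONE_RATIO = 0.8        # share of frame energy at the tone frequency


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of a non-empty frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def goertzel_power(frame: Sequence[int], freq: float, rate: int = SAMPLE_RATE) -> float:
    """Squared DFT magnitude at `freq` via the Goertzel recurrence.

    Uses the exact frequency rather than the nearest bin, which is valid
    for the magnitude (only the phase would differ).
    """
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s_prev = s_prev2 = 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def tone_ratio(frame: Sequence[int], freq: float = HOLD_TONE_HZ, rate: int = SAMPLE_RATE) -> float:
    """Fraction of frame energy at `freq` (about 1.0 for a pure tone)."""
    energy = sum(s * s for s in frame)
    if energy == 0:
        return 0.0
    # Parseval: a real tone splits its energy between +freq and -freq.
    return 2.0 * goertzel_power(frame, freq, rate) / (len(frame) * energy)


def should_forward(frame: Sequence[int]) -> bool:
    """True if the frame may contain speech and is worth sending to the ASR."""
    if rms(frame) < SILENCE_RMS:
        return False  # near silence: don't wake the recognizer
    if tone_ratio(frame) > TONE_RATIO:
        return False  # steady hold tone
    return True


class Gate:
    """Forward speech-like frames plus a hangover so word endings survive."""

    def __init__(self, hangover_frames: int = 15):  # 300 ms at 20 ms frames
        self.hangover_frames = hangover_frames
        self.remaining = 0

    def push(self, frame: Sequence[int]) -> bool:
        if should_forward(frame):
            self.remaining = self.hangover_frames
            return True
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False
```

A trained VAD would replace `should_forward` while the hangover logic stays the same. Gating like this would also keep the ASR idle during long IVR prompts, which is the resource concern in question 2.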
Versions:
Asterisk 18.21.0
CentOS 7
Thanks in advance.