I applied SQUIM to assess speech quality as a way to correct the direction-of-arrival of a location-based speech enhancement system. More info here.
I'm feeding the last 3-second window of the input to SQUIM every 0.1 seconds. It responds in less than that time: the maximum response time I measured was 0.0704 seconds. Thus, in terms of response time, SQUIM seems to be able to run in real time.
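To make the windowing scheme concrete, here is a minimal sketch of the sliding-window loop described above. It is not the code used in the system: `sliding_estimates` and the `estimate` callable are hypothetical names, and the 16 kHz sample rate is an assumption based on what the torchaudio SQUIM objective pipeline expects. The estimator is passed in as a plain function so that the buffering logic can be checked without loading a model.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List, Sequence

SAMPLE_RATE = 16_000    # assumption: SQUIM objective model works on 16 kHz audio
WINDOW_SECONDS = 3.0    # length of the analysis window fed to the estimator


def sliding_estimates(
    chunks: Iterable[Sequence[float]],
    estimate: Callable[[List[float]], float],
    sample_rate: int = SAMPLE_RATE,
    window_seconds: float = WINDOW_SECONDS,
) -> Iterator[float]:
    """Yield one estimate per incoming chunk once the window is full.

    Each chunk is assumed to hold one hop of audio (0.1 s in the setup above).
    `estimate` is a hypothetical stand-in; in practice it could wrap the model
    from torchaudio.pipelines.SQUIM_OBJECTIVE.get_model() and return its
    SI-SDR output for the window.
    """
    window_len = int(round(window_seconds * sample_rate))
    # A bounded deque drops the oldest samples automatically.
    buffer = deque(maxlen=window_len)
    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) == window_len:
            yield estimate(list(buffer))
```

With a 0.1-second hop, the estimator has to return within 0.1 seconds for the loop to keep up, which is the condition the 0.0704-second maximum response time satisfies.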
However, it does seem to struggle to provide a consistent speech quality assessment over time. I'm using the SI-SDR metric from the objective model. For a speech recording with no enhancement or spatial variation applied, the ideal behavior would be for SQUIM to report the same SI-SDR value over time but, as can be seen in Figure 2 of the aforementioned paper, it does not. The estimate varies wildly, so some smoothing was required for it to work well with the rest of the system.
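As an illustration of the kind of smoothing this implies, below is a sketch of a causal exponential moving average over successive SI-SDR estimates. The post does not say which smoothing method was actually used, so the technique, the `CausalSmoother` name, and the `alpha` value are all assumptions chosen for the example.

```python
from typing import Optional


class CausalSmoother:
    """Exponential moving average that only uses past and current values."""

    def __init__(self, alpha: float = 0.1) -> None:
        # alpha close to 0 smooths heavily; alpha = 1 disables smoothing.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value: Optional[float] = None

    def update(self, sample: float) -> float:
        """Fold a new estimate into the running average and return it."""
        if self.value is None:
            # The first estimate initializes the average.
            self.value = float(sample)
        else:
            self.value = self.alpha * float(sample) + (1.0 - self.alpha) * self.value
        return self.value
```

A smaller `alpha` gives a steadier SI-SDR track at the cost of reacting more slowly to genuine quality changes, which matters if the smoothed value drives the direction-of-arrival correction.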
So here are my questions:
Is it possible to modify SQUIM for this type of real-time application? I'm assuming it would need some sort of causality built into it. Or not? I was actually impressed that it was able to provide a workable result without any modification. Maybe fine-tuning would be enough?
If so, what steps would you recommend for fine-tuning SQUIM? I've taken a look at this paper that @nateanl pointed another user to when they asked about it (in Batch processing torchaudio-squim #3424), but it is still not clear to me how I should proceed.
Is SQUIM the best alternative for this? I've looked at other techniques for non-reference speech quality assessment, and it seems SQUIM is up there with the best of them for offline applications. But for real-time scenarios, I'm not sure.
Thank you in advance for any help/guidance you can provide. I'm open to help out in any way, if need be, to make SQUIM work better in real-time applications.