SQUIM running in real-time #3870

Open
balkce opened this issue Jan 9, 2025 · 0 comments

I applied SQUIM to assess speech quality as a way to correct the direction-of-arrival of a location-based speech enhancement system. More info here.

I'm feeding the last 3-second window of the input to SQUIM every 0.1 seconds. It responds in less than that time: the maximum response time observed was 0.0704 seconds. Thus, in terms of response time, SQUIM seems to be able to run in real-time.
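For reference, the windowing scheme described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the actual code used in the system: the class name `SlidingWindowAssessor` and the `estimate` callable are hypothetical, and the 16 kHz sample rate is an assumption. In practice, `estimate` would wrap a call to the SQUIM objective model and return its SI-SDR output.

```python
from collections import deque
from typing import Callable, Iterable, List

SAMPLE_RATE = 16_000    # assumed input sample rate (Hz)
WINDOW_SECONDS = 3.0    # length of the window handed to the estimator
HOP_SECONDS = 0.1       # how often a new estimate is requested


class SlidingWindowAssessor:
    """Keeps the most recent window of samples and calls `estimate` every hop.

    `estimate` is a hypothetical callable that takes a list of samples and
    returns a quality score (e.g. SI-SDR from a SQUIM wrapper).
    """

    def __init__(
        self,
        estimate: Callable[[List[float]], float],
        sample_rate: int = SAMPLE_RATE,
        window_seconds: float = WINDOW_SECONDS,
        hop_seconds: float = HOP_SECONDS,
    ) -> None:
        self.estimate = estimate
        self.window_size = int(round(window_seconds * sample_rate))
        self.hop_size = int(round(hop_seconds * sample_rate))
        # A bounded deque drops the oldest sample automatically when full.
        self.buffer: deque = deque(maxlen=self.window_size)
        self.samples_since_eval = 0

    def push(self, chunk: Iterable[float]) -> List[float]:
        """Append new samples and return any estimates produced meanwhile."""
        results = []
        for sample in chunk:
            self.buffer.append(sample)
            self.samples_since_eval += 1
            window_full = len(self.buffer) == self.window_size
            if window_full and self.samples_since_eval >= self.hop_size:
                results.append(self.estimate(list(self.buffer)))
                self.samples_since_eval = 0
        return results
```

No estimate is produced until a full 3-second window has accumulated; after that, one estimate is produced per 0.1 seconds of new audio.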

However, it does seem to struggle to provide a consistent speech quality assessment over time. I'm using the SI-SDR metric from the objective model. With a speech recording with no enhancement or spatial variation carried out, the ideal behavior would be for SQUIM to provide the same SI-SDR measurement over time but, as can be seen in Figure 2 of the aforementioned paper, it does not. It varies wildly, which required some smoothing to work well with the rest of the system.
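To illustrate the kind of smoothing meant here, below is a minimal sketch of an exponential moving average applied to successive SI-SDR estimates. This is only one possible choice and is an assumption for illustration; it is not necessarily the smoothing used in the paper. The class name `ExponentialSmoother` and the `alpha` parameter are hypothetical.

```python
from typing import Optional


class ExponentialSmoother:
    """Exponential moving average over a stream of scalar estimates.

    `alpha` close to 0 gives heavy smoothing (slow to react);
    `alpha` equal to 1 passes the raw estimates through unchanged.
    """

    def __init__(self, alpha: float = 0.1) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in the interval (0, 1]")
        self.alpha = alpha
        self.value: Optional[float] = None

    def update(self, estimate: float) -> float:
        """Fold a new raw estimate into the running average and return it."""
        if self.value is None:
            # The first estimate initializes the average directly.
            self.value = estimate
        else:
            self.value = self.alpha * estimate + (1.0 - self.alpha) * self.value
        return self.value
```

The trade-off is that heavier smoothing also delays the reaction to genuine changes in quality, which matters when the estimate drives a correction loop.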

So here are my questions:

  • Is it possible to modify SQUIM for this type of real-time application? I'm assuming it would need some sort of causality built into it. Or not? I was actually impressed that it was able to provide a workable result without any modification. Maybe fine-tuning would be enough?
  • If so, what steps would you recommend I follow to fine-tune SQUIM? I've taken a look at this paper that @nateanl provided to another user who inquired about it (in Batch processing torchaudio-squim #3424), but it is still not clear to me how I should proceed.
  • Is SQUIM the best alternative for this? I've looked at other techniques for non-reference speech quality assessment, and it seems SQUIM is up there with the best of them for offline applications. But for real-time scenarios, I'm not sure.

Thank you in advance for any help/guidance you can provide. I'm open to help out in any way, if need be, to make SQUIM work better in real-time applications.
