Surge.fft() and numpy.fft.fft() seem to produce different results #94
Comments
Looks like they are using different scales. How are you computing the x-axis? Maybe try removing |
Yeah. Here is the dummy data used to create the signals/graphs above. Column 1, "RightLegAngleSignal", is what I used to construct the periodogram in both Swift and numpy/Python. The last column, "pgram", is the periodogram that Surge output (first graph above). If you run the Python code on Column 1, "RightLegAngleSignal", you will get the second graph. Thanks so much |
I see there are a ton of |
Yes. In both Swift and Python, I filtered them out. |
I get a different result, are you on the latest version? |
I'm on Surge 2.0.0. Are both your Python and Swift periodograms matching? |
Try 2.2.0 |
Thanks I'll try that right now |
Ok, I just ran the app again (using Surge 2.2.0) and generated a new CSV, Swift periodogram, and numpy periodogram. Data attached below: They are still different (but have some similarities in shape). Do you think this has something to do with the symmetry of the FFT around the center point? It also looks like the scales are different, do you know why that is? Thanks so much |
DSP guy here. I've managed to match Surge's FFT and numpy's FFT. They are not equivalent as it stands. Here are the differences:
I rewrote the Surge FFT code to apply a Hamming window and return the power spectrum (no scaling) to match the librosa one-liner: I tested on 1 frame of the same audio wav file within Python and on iOS using Surge and got basically the same results (except for some floating point precision differences).
Not sure what the actionable items are for the Surge maintainer(s) to patch up the FFT function. It's not necessarily wrong, I think it just needs some more parameters so people can match output from other libraries. Possibly adding a window parameter to the Surge FFT function as well as an option to specify whether to scale the result. In my opinion letting people scale the result outside of this function is probably best. |
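For readers who want a reference point on the Python side, the approach described above (Hamming window, power spectrum, no scaling) can be sketched roughly as follows. This is a minimal NumPy sketch, not the modified Surge code and not librosa's implementation; the helper name `power_spectrum` and the test tone are made up for illustration.

```python
import numpy as np

def power_spectrum(frame):
    """Hamming-windowed, unscaled power spectrum of one real frame (hypothetical helper)."""
    frame = np.asarray(frame, dtype=float)
    window = np.hamming(len(frame))          # symmetric Hamming window
    spectrum = np.fft.rfft(frame * window)   # one-sided FFT of the windowed frame
    return np.abs(spectrum) ** 2             # |X[k]|^2, with no 1/N or 2/N factor applied

n = 512
t = np.arange(n)
frame = np.sin(2 * np.pi * 32 * t / n)       # test tone centered on bin 32
ps = power_spectrum(frame)
peak_bin = int(np.argmax(ps))
```

Any scaling (for example a 2/N factor) can then be applied by the caller, which is in line with the suggestion above to keep scaling outside the FFT function.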
Hey, thanks for writing that. What does the mul(input,win) do? Does it multiply the two arrays? How does that work? |
Element-wise multiplication |
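To make "element-wise" concrete, here is a tiny NumPy illustration with made-up values: each output element is the product of the input elements at the same index, which is how a window gets applied to a frame.

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0, 4.0])
window = np.array([0.5, 1.0, 1.0, 0.5])

# Element-wise product: windowed[i] = signal[i] * window[i]
windowed = signal * window
print(windowed.tolist())  # [0.5, 2.0, 3.0, 2.0]
```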
I'm personally not familiar enough with the FFT, so … @gburlet would you consider your modified implementation a bug fix or an alternative to our current one? Or put differently: would you say our current implementation is wrong and should be fixed, or is it simply a different way of using FFT and we might want to provide a choice to the user? Either way I'd be more than happy to merge a PR addressing the issue appropriately. 🙂 |
I would say the Surge FFT algorithm returns a scaled amplitude spectrum (something like My personal recommendation would be to try to match output of heavy-use libraries like numpy, scipy for compatibility between experimental systems (done off device, e.g., machine learning experimentation) and production systems (e.g., prediction on device). Specific Recommendations:
|
This is great, thanks a lot @gburlet! |
@gburlet Do you have the source for the mul() in the following line? |
Hi @brett-eugenelabs, it's element-wise multiplication: same function as |
@gburlet is there a possibility in your example to make This is my torch.stft example in case needed.
|
Hi @abokhalel2, my example is just for the FFT on a single window, not an STFT. To get the STFT you would slide a window over the samples and send each frame to the function above. Typically both the window and hop size are powers of 2, with the hop being some division of the window size. Hope that helps. |
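The sliding-window idea described above can be sketched in NumPy as follows. This is only an illustration under simple assumptions: the names `frames` and `stft_power` are hypothetical, the window is a symmetric Hamming window, and no padding is applied (torch.stft pads the signal by default with `center=True`, so its frame count and values will differ from this sketch).

```python
import numpy as np

def frames(samples, window_size, hop_size):
    """Yield overlapping frames; trailing samples that do not fill a frame are dropped."""
    for start in range(0, len(samples) - window_size + 1, hop_size):
        yield samples[start:start + window_size]

def stft_power(samples, window_size=1024, hop_size=256):
    """Stack the unscaled power spectrum of each windowed frame (hypothetical helper)."""
    window = np.hamming(window_size)
    return np.array([np.abs(np.fft.rfft(f * window)) ** 2
                     for f in frames(samples, window_size, hop_size)])

samples = np.random.default_rng(0).standard_normal(4096)
spec = stft_power(samples)   # one row per frame, one column per frequency bin
```

With 4096 samples, a window of 1024, and a hop of 256, this gives 13 frames of 513 bins each.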
I have a perhaps naive question for @gburlet (I'm not a DSP guy at all, so forgive me if this is a newbie question): why do you fill only the real components of the DSP Split Complex with the source audio frame? Looking at Apple's Mel sample code on the vDSP page, they bind the incoming time domain audio signal via With their implementation, for an FFT size of 400, you'd get 200 complex samples back, or 200 |S|^2 magnitude values. With your implementation, you'd get 400 magnitude values back? Am I understanding that correctly? And if so, what are the differences in the FFT output when populating the Split Complex with only real components, vs running Thank you in advance. This thread has been INCREDIBLY helpful in trying to match Torch's STFT code to produce a Log Mel Spectrogram. I'm not there yet, but I'm getting closer thanks to this thread. |
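One general property that bears on this question: when the input is real-valued, the upper half of a full complex FFT mirrors the lower half, so a real-input FFT only needs to return about half the bins. The NumPy sketch below shows this for a frame of 400 samples. It illustrates NumPy's conventions only; vDSP's real FFT routines use their own packed output layout, which is not modeled here.

```python
import numpy as np

n = 400
x = np.random.default_rng(1).standard_normal(n)  # real-valued frame

full = np.fft.fft(x)    # complex FFT of real input: n bins
half = np.fft.rfft(x)   # real-input FFT: n // 2 + 1 bins

# For real input, bin k and bin n - k are complex conjugates,
# so the upper half of `full` carries no extra information.
mirrored = np.allclose(full[1:], np.conj(full[1:][::-1]))
same_lower_half = np.allclose(full[: n // 2 + 1], half)
```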
@vade I unfortunately don't have time / brain bandwidth to jump into this in more detail but see above, this point specifically, which may help you understand:
|
Hey @gburlet - no worries, Your input here has been super helpful. I appreciate the prompt reply! thank you! |
@vade I see you're contributing to an OpenAI whisper CoreML project. I like that. Find my email online and I'll share my Accelerate vDSP Mel Spectrogram code written in Swift privately. It took a while to make it match librosa output. Not sure if it will match Torch's STFT code but might give you another reference point. Also, from what I remember, it's really fast. |
Much obliged! For real! |
@gburlet I'm trying to use your
At the line:
Am I missing something? What kind of multiplication is this: element-wise, or ...? Sorry if this question seems obtuse. |
@annoybot see above, it has been answered |
Ah yes, indeed it has been answered, thanks. 😬 |
@gburlet, I hope the following question is not too off topic for this thread. Do you think there would be an advantage in using Welch's method over a plain FFT when processing sound data from an instrument? I was wondering if it would help eliminate noise? |
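For context on the question above, Welch's method averages the periodograms of overlapping windowed segments, which reduces the variance of the spectral estimate at the cost of frequency resolution. A minimal NumPy sketch of that idea follows; the function name `welch_psd`, the segment sizes, and the test signal are assumptions for illustration, and the result is left unnormalized.

```python
import numpy as np

def welch_psd(samples, segment_size=256, hop_size=128):
    """Average Hamming-windowed periodograms over overlapping segments (unnormalized)."""
    window = np.hamming(segment_size)
    starts = range(0, len(samples) - segment_size + 1, hop_size)
    periodograms = [np.abs(np.fft.rfft(samples[s:s + segment_size] * window)) ** 2
                    for s in starts]
    return np.mean(periodograms, axis=0)

rng = np.random.default_rng(0)
n = 4096
tone = np.sin(2 * np.pi * 0.125 * np.arange(n))   # 0.125 cycles/sample, bin 32 of 256
noisy = tone + rng.standard_normal(n)             # tone buried in white noise
psd = welch_psd(noisy)
peak_bin = int(np.argmax(psd))
```

Whether this helps for a given instrument recording depends on how stationary the sound is over the averaged segments.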
I am attempting to plot a periodogram in Swift of a signal using Surge.
Code:
Plotting pgram yields the following results
However, after loading and creating the exact same periodogram in Python I get a very different periodogram.
Code:
pgram = (2.0/len(signal)) * numpy.power(numpy.fft.fft(signal), 2)
Since I am using the exact same method to plot the periodogram (and the same data as well), I was wondering if there are some differences in the implementation of the Surge fft and numpy fft which might cause this issue?
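One detail worth checking in the Python line above: `numpy.power(numpy.fft.fft(signal), 2)` squares the complex FFT values, so the result is still complex, whereas a periodogram is usually built from the squared magnitude. The sketch below shows the difference on a made-up test tone. It is only an illustration and may not be the sole reason the two plots differ.

```python
import numpy as np

n = 256
t = np.arange(n)
signal = np.sin(2 * np.pi * 8 * t / n)           # test tone centered on bin 8

spectrum = np.fft.fft(signal)
squared = (2.0 / n) * np.power(spectrum, 2)      # complex: squares the complex values
pgram = (2.0 / n) * np.abs(spectrum) ** 2        # real, non-negative power per bin

print(np.iscomplexobj(squared), np.iscomplexobj(pgram))  # True False
```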