
Potentially get worse fits when fitting knee. #35

Closed

TomDonoghue opened this issue Sep 26, 2017 · 13 comments

TomDonoghue (Member) commented Sep 26, 2017

In theory, asking to fit a knee should reduce to a linear fit, if that is indeed the better fit. In practice, that is not the case - there are PSDs for which setting the model to not fit a knee (a linear fit) leads to a better fit (in the R^2 sense) than also fitting a knee parameter.

It's weird that with an extra parameter to play with, it can perform worse. It might have to do with the interaction between fitting slope & oscillations. It may also simply be due to the fact that fitting a knee effectively adds a new constraint - that the slope of the background go to zero below the knee - and this, in at least some cases, may be unhelpful.

Something to perhaps look into, play around with a bit. If nothing else, supports a strong suggestion to only try to fit a knee if there really is one (although that's potentially hard to evaluate).
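To make the comparison concrete, here is a minimal sketch of the issue, using scipy's curve_fit with FOOOF-style backgrounds in log10 power space. The function names and parameter values here are illustrative, not FOOOF's internals; the knee form offset - log10(knee + f^exp) reduces to the linear form as knee goes to 0, so in principle the knee fit should never do worse:

```python
import numpy as np
from scipy.optimize import curve_fit

def bg_linear(f, offset, exp):
    # Linear (no-knee) background in log-log space
    return offset - exp * np.log10(f)

def bg_knee(f, offset, knee, exp):
    # Lorentzian-style background; reduces to bg_linear as knee -> 0
    return offset - np.log10(knee + f**exp)

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

rng = np.random.default_rng(0)
freqs = np.linspace(1, 50, 200)
# Synthetic log-power with NO knee (pure 1/f) plus noise
psd = bg_linear(freqs, 1.0, 2.0) + rng.normal(0, 0.05, freqs.size)

lin_params, _ = curve_fit(bg_linear, freqs, psd, p0=[1, 1])
knee_params, _ = curve_fit(bg_knee, freqs, psd, p0=[1, 1, 1],
                           bounds=([-np.inf, 0, 0], np.inf))

print(r_squared(psd, bg_linear(freqs, *lin_params)))
print(r_squared(psd, bg_knee(freqs, *knee_params)))
```

On clean single-step fits like this, the two R^2 values come out essentially equal; the reported problem is that inside the full multi-step procedure, on real PSDs, they do not.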

TomDonoghue (Member, Author) commented:

@rdgao Any thoughts on this?

rdgao (Contributor) commented Sep 30, 2017

hmm, that's weird. Was this before or after the b=1 fix in lorentzian_nk? It's possible that the "linear" fit was better but R^2 was computed on the b=1 fit.

Although, theoretically, the Lorentzian case should still converge to the same optimal fit because it's a superset, in which case it might be a problem with seeding? A few things to potentially explore:

  • do multiple fits with random seeds and see if they converge
  • visually examine test cases where the linear fit returns a greater R^2
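The first check could be sketched like this (a scipy-based sketch with illustrative values, not FOOOF code): refit the knee model from several random initial parameters and look at the spread of the solutions.

```python
import numpy as np
from scipy.optimize import curve_fit

def bg_knee(f, offset, knee, exp):
    # Lorentzian-style background in log10 power space
    return offset - np.log10(knee + f**exp)

rng = np.random.default_rng(1)
freqs = np.linspace(1, 50, 200)
# Synthetic log-power WITH a knee, plus noise
psd = bg_knee(freqs, 1.0, 10.0, 2.0) + rng.normal(0, 0.05, freqs.size)

fits = []
for _ in range(10):
    seed = rng.uniform([0.0, 0.0, 0.5], [2.0, 30.0, 3.0])  # random initial params
    try:
        params, _ = curve_fit(bg_knee, freqs, psd, p0=seed,
                              bounds=([-np.inf, 0, 0], np.inf))
        fits.append(params)
    except RuntimeError:
        pass  # a seed that failed to converge

fits = np.array(fits)
print(fits.std(axis=0))  # small spread across seeds -> stable convergence
```

If the spread is large, the optimization is seed-sensitive and the worse-than-linear fits may just be bad local minima.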

From a fitting perspective (as opposed to interpretability), it should use whatever gives the best R^2. Theoretically, that should be the general Lorentzian, but if that's not always the case, then yeah, having a note would help. Or we can make modality-specific suggestions, i.e. EEG/MEG use linear, especially if fitting only up to 30-40 Hz, whereas LFP/ECoG should benefit from the knee fit.

I'll take a look in the monkey data to see whether any of the linear fits give a better R^2 and check back.

rdgao (Contributor) commented Sep 30, 2017

Left panel: linear vs. Lorentzian fit R^2 for awake eyes closed (blue) and anesthetized (red).
Middle and right panels: example PSDs where linear R^2 > Lorentzian R^2.
[image: fit comparison plots]

It seems like Lorentzian has the most trouble when the low freq region is not flat, which is unsurprising given it tries to fit a flat top, but unexpected because it should converge to a linear fit anyway, so the optimization procedure perhaps warrants some more exploration.

This brings up another issue: what do we do about PSDs with oscillations in the low frequency region? In the right panel, the non-flat tops are due to a slow oscillation (~1 Hz) during anesthesia. This is apparent in the time domain, as well as when you increase the frequency resolution of the PSDs (below). It may be that people expect to fit a delta oscillation that has its left half cut off?

[image: higher frequency resolution PSDs]

rdgao (Contributor) commented Nov 5, 2017

@TomDonoghue was this the only issue with the knee fitting, that it doesn't find the knee=0 solution even when default linear fits better?

This seems more to be a scipy optimize problem. In any case, I'm trying to collect all the problems about it to see if I can fix them in one go.

TomDonoghue (Member, Author) commented:

As to why: I thought perhaps b=0 is some kind of weird (discontinuous) special case that optimize wouldn't land on, but from just playing around with the function, it seems that's not really it. It's perhaps/probably some weird interaction between the many steps of FOOOF. A sort of brute-force fix might be to explicitly test and choose b=0 if it's better...

I'm currently working on the synthetic datasets, which will give us a better way to compare versions. I'm also going to do a sweep of the other issues first. I consider this relatively low priority - the FOOOF paper as is doesn't really use knee fits, so it's tractable to tag a v0.1, with a note that knee fits are still experimental. Long story short - if you have a suggested fix, great, but if it's a matter of poking around, I'd wait a bit until we settle the other things over the next couple of days, and have synthetic data to properly test with.

rdgao (Contributor) commented Nov 6, 2017

You're not using knee fits at all? LFP & ECoG are (typically) much better with a knee fit, so I'd recommend at least showing an example of that, to capture a greater audience.

In terms of the error, yeah, that was my guess too: since the background and Gaussians are fit in separate steps, it's possible that error is minimized for the initial background fit, leading to a worse oscillation fit because some of it was already "captured". I was going to try fitting the linear model first and using those parameters to seed the background fit, like you were doing with the quadratic fit originally, but that feels like just piling on more hacks. I might play around with some LFP data in the meantime, but won't push anything until you're done with the synthetic test data.

TomDonoghue (Member, Author) commented:

So, @rdgao, we should revisit this and check in on it. Now that I have a proper synthetic data testing setup, we can explore this more formally - and perhaps come to some fixes and/or guidelines.

At a first pass, synthetic fitting is neither wildly bad nor good - it can have problems reconstructing generated knees/slopes (so sometimes it's not all that good at recovering the generating parameters).

I think it's partly a degeneracy of the solution space - multiple slope/knee combinations can capture the bend. In some tests, once you add a little noise, it ends up with a solution that very reasonably captures the background, but with a different parameter combination than the one that actually generated the data.

So: knee-fitting, as currently implemented, is a great way to capture the background (and thus extract oscillations well), but the actual background fits may be degenerate, and difficult to interpret. It might be a curve_fit thing that we can tune with bounds, etc., but overall I'm not too sure what to do here.

So far I've only run a small number of tests. It might be, for example, that for more extreme cases (a larger frequency range, knees & slopes that come apart more) it does much better - but even if so, it's not clear how to relate that back to guidelines and interpretations for real data, plus there is the issue from above that the procedure doesn't necessarily converge on a background fit that leads to the best overall fit.
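The degeneracy is easy to demonstrate numerically. In this sketch (illustrative parameter values, not from FOOOF), a curve generated with knee=10 and slope=2 is refit with the exponent deliberately pinned to 1.9; a different knee and offset absorb most of the difference, leaving residuals on the order of typical noise levels:

```python
import numpy as np
from scipy.optimize import curve_fit

freqs = np.linspace(3, 40, 200)
# Noiseless background generated with offset=1, knee=10, slope=2
true_bg = 1.0 - np.log10(10.0 + freqs ** 2.0)

def bg_wrong_exp(f, offset, knee):
    # Same functional form, but with the exponent pinned to the wrong value
    return offset - np.log10(knee + f ** 1.9)

params, _ = curve_fit(bg_wrong_exp, freqs, true_bg, p0=[1, 1],
                      bounds=([-np.inf, 0], np.inf))
max_resid = np.abs(true_bg - bg_wrong_exp(freqs, *params)).max()
print(params, max_resid)  # residual of a few hundredths of a log10 unit
```

With any realistic noise added, the two parameter combinations become effectively indistinguishable over this frequency range, which is exactly the interpretability problem described above.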

rdgao (Contributor) commented Nov 27, 2017

Interesting... so, say you generate fake data with a knee: how often does the with-knee fit give you a slope that's worse than just the linear fit, over the full range?

I think knee fitting is most useful for getting good oscillations, especially when the knee gets confused as an oscillation, but for slope fitting it's perhaps not as good. This gets at two potentially different use cases:
#1. Fit PSDs as well as possible, in terms of minimizing MSE: capture oscillations when they are there, and don't fit oscillations when there are none.
#2. Get the best estimates of slope for regions that are actually 1/f.

IMO, fitting the knee gets you #1, by virtue of having one more parameter, and especially in cases where there is obviously a knee, although this is not a given (see monkey plots above).
If the experiment shows that, even with inaccurate parameter retrieval, the slope fit with exp is still better than with linear, then it gets you #2 as well, and I think we're good in that case.
If it doesn't, then we should make some decisions: basically, either explicitly inform the user of this distinction (i.e. trust slope values less when you fit with exp, even if you get a better model overall), or bake in some mechanism to do the fits separately, although that might be confusing, since the "best slope fit" slope is a bit different from the "best overall fit" slope.

THAT BEING SAID, I advocate for fitting slope (for the explicit slope value) separately anyway, over a region where it is definitely linear, because otherwise it's less meaningful.

I have a few ideas, starting from easiest to hardest to implement:

  • just run a linear fit anyway prior to the exp fit, when asked to fit exp, and either use that slope as a seed, or compare the MSEs and pick the better model (i.e. manually override the knee fit even when asked)

  • provide both exp and linear fit models regardless, and remove the user option to specify.

  • run exp a few times and pick some convergent value (potentially trading off accuracy in favor of more converged fits), or just pick the best fit. The evaluation should probably come after the oscillation fit, since the background likely captures some oscillations, but this might not be worth the complexity for the little gain (compared to just re-running prior to the oscillation fit).

  • fit slope linearly after exp fit, over the region PAST the knee parameter (where it's more likely to be actually linear)

  • iteratively exclude oscillatory regions found in an initial run and redo the background fit, up to some number of iterations or until convergence.
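The first idea could look something like this - a hypothetical fit_background helper using scipy's curve_fit with FOOOF-style log-space background forms, not FOOOF's actual implementation:

```python
import numpy as np
from scipy.optimize import curve_fit

def bg_linear(f, offset, exp):
    # Linear (no-knee) background in log-log space
    return offset - exp * np.log10(f)

def bg_knee(f, offset, knee, exp):
    # Lorentzian-style background; reduces to bg_linear as knee -> 0
    return offset - np.log10(knee + f**exp)

def fit_background(freqs, psd):
    """Fit linear first, seed the knee fit with it, return the lower-MSE model."""
    lin_params, _ = curve_fit(bg_linear, freqs, psd, p0=[1, 1])
    # Seed the knee fit from the linear solution, with the knee near zero
    knee_params, _ = curve_fit(bg_knee, freqs, psd,
                               p0=[lin_params[0], 1e-3, lin_params[1]],
                               bounds=([-np.inf, 0, 0], np.inf))
    mse_lin = np.mean((psd - bg_linear(freqs, *lin_params)) ** 2)
    mse_knee = np.mean((psd - bg_knee(freqs, *knee_params)) ** 2)
    if mse_lin <= mse_knee:
        # Manual override: report the linear model, expressed with knee = 0
        return np.array([lin_params[0], 0.0, lin_params[1]]), mse_lin
    return knee_params, mse_knee
```

This keeps the user-facing output in one (offset, knee, exponent) format, reporting knee = 0 whenever the linear model wins the comparison.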

rdgao (Contributor) commented Dec 8, 2017

Did you want the knee fitting to be relatively not-stupid for the release? If so, I'll try to push something by tomorrow, at the very least implementing the linear vs. knee comparison for with-knee fits.

TomDonoghue (Member, Author) commented:

I'm not sure what you mean by 'implementing linear vs knee comparison' but at this point, I think no algorithmic / API changes for a v0.1.0 tag / soft release (we can note knee fitting is still experimental, with caveats as mentioned above).

Figuring out what's best to do for the knee fitting is probably best served by properly running through simulations and then exploring some options, which is all development over and above tagging a first version; after tagging, I'll focus more on the simulations. From there, if knee-related updates are relatively minor, we can add them to v0.1.1, along with any other small updates, that being the tagged version I foresee publicizing. (If the updates are bigger, maybe supporting proper knee-fitting becomes a v0.2.0 thing, and then we figure out what goes into the paper, etc., trying to be careful about scope creep and eternal beta.)

rdgao (Contributor) commented Dec 8, 2017

I mean that when the user requests a knee fit, we internally run a linear fit beforehand and return the better model (with knee = 0 if linear is better).

TomDonoghue (Member, Author) commented:

For v0.1.0, we'll note that 'knee' fitting is still somewhat experimental, and in particular, that you should only fit a knee when you have high confidence there is one. A fuller investigation / update of this is pushed to v0.2.

TomDonoghue (Member, Author) commented:

Okay, so I'm going to say this is more of a development question (concept / algorithm related more than code related), and also that some aspects of this thread are quite outdated.

Moving this over to the development board here:
fooof-tools/Development#7
