[Feature]: Kernel Density Estimation #193

humphreylee · 2023-09-22T00:23:18Z

Thanks for sharing the good work. Is there any implementation for kernel density estimation available (univariate, bivariate or multivariate)? Thanks.

wangjiawen2013 · 2023-10-08T03:46:29Z

I think this would be very helpful! Here is a repo for kernel density estimation (https://github.com/seatonullberg/kernel-density-estimation).
It would be great to incorporate it into statrs.

YeungOnion · 2024-03-08T21:46:41Z

I'm unsure how well this would fit into the existing tooling in statrs. While we do have the Empirical distribution, introducing hyperparameters for data-driven distributions starts to move into a broader, data-driven realm of statistics.

@henryjac what would you say? We're in the midst of defining a new direction, but I think this would fit better in a different crate.

henryjac · 2024-03-09T15:57:20Z

I would not be against adding functionality for more data-driven distributions. Since this is a statistics crate, most features regarding any kind of statistics fit quite well.

We even already have functionality for statistics in the statistics module, so expanding that with KDE etc. makes sense to me.

YeungOnion · 2024-03-11T00:21:51Z

Well, I would be happy to support data-driven distributions as long as we look ahead a little bit at what we'll choose (for near-future) as out of scope. Perhaps I'd have done better had I said that I'm averse to looking at it now since we've got some short-term priorities right now.

Overall, what I think is that it just takes some discussion on clearly choosing scope, starting with upper bound (things that will certainly be out of scope). What can you say won't be in scope (for near-future)?

But all cards on the table, I actually think KDE would be a good candidate compared to some other data-driven distributions, the reasons being:

- the distribution function for it is similar to the named random variables in statrs::distribution in that it is
  - invariant under permutation of the data
  - deterministic and closed form in terms of the specified data and kernel (plus possible hyperparameters)
- I'd expect real-world use cases where the data volume is less than the scale of memory (i.e. not developing an API to support larger-than-memory data beyond considering IntoIterator)
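
To make the properties above concrete, here is a minimal sketch of a univariate Gaussian KDE, using the standard estimator f(x) = 1/(n·h) · Σ K((x − x_i)/h) with K the standard normal density. The names `GaussianKde`, `new`, and `pdf` are hypothetical and chosen only for illustration; nothing here is existing statrs API, and the fixed bandwidth `h` is an assumption standing in for whatever hyperparameter selection a real design would use.

```rust
use std::f64::consts::PI;

/// Hypothetical type for illustration only; not part of statrs.
/// Holds the full sample in memory, as assumed in the discussion above.
struct GaussianKde {
    data: Vec<f64>,
    bandwidth: f64, // the hyperparameter h, assumed fixed by the caller
}

impl GaussianKde {
    /// Collects the sample from anything iterable.
    /// Returns None for an empty sample or a non-positive / non-finite bandwidth.
    fn new<I: IntoIterator<Item = f64>>(data: I, bandwidth: f64) -> Option<Self> {
        let data: Vec<f64> = data.into_iter().collect();
        if data.is_empty() || !bandwidth.is_finite() || bandwidth <= 0.0 {
            return None;
        }
        Some(Self { data, bandwidth })
    }

    /// Density estimate at x: (1 / (n h)) * sum_i K((x - x_i) / h),
    /// where K is the standard normal density.
    fn pdf(&self, x: f64) -> f64 {
        let h = self.bandwidth;
        let norm = 1.0 / (self.data.len() as f64 * h * (2.0 * PI).sqrt());
        let sum: f64 = self
            .data
            .iter()
            .map(|&xi| {
                let u = (x - xi) / h;
                (-0.5 * u * u).exp()
            })
            .sum();
        norm * sum
    }
}
```

Because `pdf` is a plain sum over the stored sample, reordering the data changes the result only by floating-point rounding, and the value is fully determined by the data, the kernel, and `h`. Taking `IntoIterator` in the constructor keeps the input flexible while still assuming the sample fits in memory.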

wangjiawen2013 mentioned this issue Oct 8, 2023

statrs kernel density estimation seatonullberg/kernel-density-estimation#5

Closed
