Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ArgIntervalExcl #303

Open
rm-minus-r-star opened this issue Sep 24, 2024 · 14 comments
Open

ArgIntervalExcl #303

rm-minus-r-star opened this issue Sep 24, 2024 · 14 comments

Comments

@rm-minus-r-star
Copy link

I guess I've hit an edge case that was never expected to fail ...

While trying to (I think) get my gradients right in optimization_engine to find a best-fit gamma distribution under the statrs crate, I get

thread 'gamma_from_quantile_check_3' panicked at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/statrs-0.17.1/src/function/gamma.rs:245:28:
called `Result::unwrap()` on an `Err` value: ArgIntervalExcl("x", 0.0, inf)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The RUST_BACKTRACE=full is

thread 'gamma_from_quantile_check_3' panicked at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/statrs-0.17.1/src/function/gamma.rs:245:28:
called `Result::unwrap()` on an `Err` value: ArgIntervalExcl("x", 0.0, inf)
stack backtrace:
   0:     0x55e3bf67c895 - std::backtrace_rs::backtrace::libunwind::trace::h649ab3318d3445c5
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/../../backtrace/src/backtrace/libunwind.rs:116:5
   1:     0x55e3bf67c895 - std::backtrace_rs::backtrace::trace_unsynchronized::hf4bb60c3387150c3
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x55e3bf67c895 - std::sys::backtrace::_print_fmt::hd9186c800e44bd00
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/sys/backtrace.rs:65:5
   3:     0x55e3bf67c895 - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::h1b9dad2a88e955ff
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/sys/backtrace.rs:40:26
   4:     0x55e3bf6a230b - core::fmt::rt::Argument::fmt::h351a7824f737a6a0
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/fmt/rt.rs:173:76
   5:     0x55e3bf6a230b - core::fmt::write::h4b5a1270214bc4a7
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/fmt/mod.rs:1182:21
   6:     0x55e3bf67a25f - std::io::Write::write_fmt::hd04af345a50c312d
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/io/mod.rs:1827:15
   7:     0x55e3bf67e0b1 - std::sys::backtrace::BacktraceLock::print::h68d41b51481bce5c
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/sys/backtrace.rs:43:9
   8:     0x55e3bf67e0b1 - std::panicking::default_hook::{{closure}}::h96ab15e9936be7ed
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panicking.rs:269:22
   9:     0x55e3bf67dd8c - std::panicking::default_hook::h3cacb9c27561ad33
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panicking.rs:296:9
  10:     0x55e3bf67e711 - std::panicking::rust_panic_with_hook::hfe205f6954b2c97b
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panicking.rs:800:13
  11:     0x55e3bf67e577 - std::panicking::begin_panic_handler::{{closure}}::h6cb44b3a50f28c44
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panicking.rs:674:13
  12:     0x55e3bf67cd59 - std::sys::backtrace::__rust_end_short_backtrace::hf1c1f2a92799bb0e
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/sys/backtrace.rs:168:18
  13:     0x55e3bf67e204 - rust_begin_unwind
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panicking.rs:665:5
  14:     0x55e3bf57b6d3 - core::panicking::panic_fmt::h3d8fc78294164da7
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/panicking.rs:74:14
  15:     0x55e3bf57bbe6 - core::result::unwrap_failed::hfa79a499befff387
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/result.rs:1679:5
  16:     0x55e3bf6165e1 - core::result::Result<T,E>::unwrap::h4d5069e52e690aad
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/result.rs:1102:23
  17:     0x55e3bf6165e1 - statrs::function::gamma::gamma_lr::hb4c143124c0eee71
                               at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/statrs-0.17.1/src/function/gamma.rs:245:5
  18:     0x55e3bf61580b - <statrs::distribution::gamma::Gamma as statrs::distribution::ContinuousCDF<f64,f64>>::cdf::ha0b4a6b5543280f0
                               at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/statrs-0.17.1/src/distribution/gamma.rs:118:13
  19:     0x55e3bf6158fc - <statrs::distribution::gamma::Gamma as statrs::distribution::ContinuousCDF<f64,f64>>::inverse_cdf::h72660ef191f61bd8
                               at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/statrs-0.17.1/src/distribution/gamma.rs:161:15
  20:     0x55e3bf5c2aa8 - verteilungen::ops_gammashifted::GammaShiftedDistribution::fit_to_quantiles::{{closure}}::hbf3b9a90dbad0f56
                               at /home/me/projdir/crates.local/verteilungen/src/ops_gammashifted.rs:211:31
  21:     0x55e3bf5f1d27 - core::ops::function::impls::<impl core::ops::function::Fn<A> for &F>::call::h26741b6cf369fbe2
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/ops/function.rs:262:13
  22:     0x55e3bf5f8242 - optimization_engine::core::panoc::panoc_engine::PANOCEngine<GradientType,ConstraintType,CostType>::update_lipschitz_constant::h9f83205051cb52da
                               at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/optimization_engine-0.9.1/src/core/panoc/panoc_engine.rs:168:9
  23:     0x55e3bf5f2859 - <optimization_engine::core::panoc::panoc_engine::PANOCEngine<GradientType,ConstraintType,CostType> as optimization_engine::core::AlgorithmEngine>::step::h1fe822003e09f10a
                               at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/optimization_engine-0.9.1/src/core/panoc/panoc_engine.rs:330:9
  24:     0x55e3bf5fe462 - <optimization_engine::core::panoc::panoc_optimizer::PANOCOptimizer<GradientType,ConstraintType,CostType> as optimization_engine::core::Optimizer>::solve::hd920e90590649da6
                               at /home/me/.cargo/registry/src/index.crates.io-6f17d22bba15001f/optimization_engine-0.9.1/src/core/panoc/panoc_optimizer.rs:151:29
  25:     0x55e3bf5e3e43 - verteilungen::optimierung::Optimierungproblem::machen::h87d5bb488d161c91
                               at /home/me/projdir/crates.local/verteilungen/src/optimierung.rs:125:22
  26:     0x55e3bf5ee119 - verteilungen::ops_gammashifted::GammaShiftedDistribution::fit_to_quantiles::h5451b3ca911dc406
                               at /home/me/projdir/crates.local/verteilungen/src/ops_gammashifted.rs:430:24
  27:     0x55e3bf5d4b96 - verteilungen::ops_alles::OpsDistribution::fit_to_quantiles::h170f113577677218
                               at /home/me/projdir/crates.local/verteilungen/src/ops_alles.rs:66:65
  28:     0x55e3bf57de01 - test_gammashifted::gamma_from_quantile_check_3::h541798544e0aa6fc
                               at /home/me/projdir/crates.local/verteilungen/tests/test_gammashifted.rs:42:11
  29:     0x55e3bf580077 - test_gammashifted::gamma_from_quantile_check_3::{{closure}}::hdbfc21990461ffaa
                               at /home/me/projdir/crates.local/verteilungen/tests/test_gammashifted.rs:29:33
  30:     0x55e3bf5806c6 - core::ops::function::FnOnce::call_once::h9485c7780f7932fb
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/ops/function.rs:250:5
  31:     0x55e3bf5ba80b - core::ops::function::FnOnce::call_once::h81f56a195fe4862e
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/ops/function.rs:250:5
  32:     0x55e3bf5ba80b - test::__rust_begin_short_backtrace::h919c79c8b896f9e2
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/test/src/lib.rs:624:18
  33:     0x55e3bf5ba0b5 - test::run_test_in_process::{{closure}}::h7b3d5751c5b4dd75
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/test/src/lib.rs:647:60
  34:     0x55e3bf5ba0b5 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::hdabd61465e4dbd80
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/panic/unwind_safe.rs:272:9
  35:     0x55e3bf5ba0b5 - std::panicking::try::do_call::hc813c79fd64b0a90
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panicking.rs:557:40
  36:     0x55e3bf5ba0b5 - std::panicking::try::h055c5de7e7bfc209
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panicking.rs:521:19
  37:     0x55e3bf5ba0b5 - std::panic::catch_unwind::h4265d6525195c807
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panic.rs:350:14
  38:     0x55e3bf5ba0b5 - test::run_test_in_process::he72c277a35f96567
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/test/src/lib.rs:647:27
  39:     0x55e3bf5ba0b5 - test::run_test::{{closure}}::h974e632522c0fbcf
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/test/src/lib.rs:568:43
  40:     0x55e3bf5820a4 - test::run_test::{{closure}}::hdc2c89ce8b601dda
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/test/src/lib.rs:598:41
  41:     0x55e3bf5820a4 - std::sys::backtrace::__rust_begin_short_backtrace::h342cb8e53aeb2076
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/sys/backtrace.rs:152:18
  42:     0x55e3bf5857d2 - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::h67b1b5c1709ad95b
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/thread/mod.rs:538:17
  43:     0x55e3bf5857d2 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::hd8c7a030ea8b7676
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/panic/unwind_safe.rs:272:9
  44:     0x55e3bf5857d2 - std::panicking::try::do_call::h512c2ab2c15b7d31
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panicking.rs:557:40
  45:     0x55e3bf5857d2 - std::panicking::try::h5c2903f8937bc868
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panicking.rs:521:19
  46:     0x55e3bf5857d2 - std::panic::catch_unwind::h242c80217c2dbece
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/panic.rs:350:14
  47:     0x55e3bf5857d2 - std::thread::Builder::spawn_unchecked_::{{closure}}::h6cb4494ebdd8caf7
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/thread/mod.rs:537:30
  48:     0x55e3bf5857d2 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h42193b008049ba94
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/ops/function.rs:250:5
  49:     0x55e3bf682b2b - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::ha1963004222e7822
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/alloc/src/boxed.rs:2070:9
  50:     0x55e3bf682b2b - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h1086ced1f7c494c2
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/alloc/src/boxed.rs:2070:9
  51:     0x55e3bf682b2b - std::sys::pal::unix::thread::Thread::new::thread_start::ha8af9c992ef0b208
                               at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/std/src/sys/pal/unix/thread.rs:108:17
  52:     0x7fed2c8ed609 - start_thread
                               at /build/glibc-LcI20x/glibc-2.31/nptl/pthread_create.c:477:8
  53:     0x7fed2cb76353 - clone
                               at /build/glibc-LcI20x/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  54:                0x0 - <unknown>
test gamma_from_quantile_check_3 ... FAILED
@FreezyLemon
Copy link
Contributor

FreezyLemon commented Sep 24, 2024

Looks like an overflow of x * self.rate in this line:

gamma::gamma_lr(self.shape, x * self.rate)

(we know that self.rate and x are greater than zero, so the error most likely means the parameter is +INF)

This is not nice from an API perspective.. maybe add (x * self.rate).is_infinite() should be added to the list of checks in cdf and sf to avoid this?

@rm-minus-r-star
Copy link
Author

A different error is thrown if either of the parameters are invalid, so they are definitely both positive. But (presumably owing to my gradients being wrong) both parameters are infinitesimally small. So while the gamma function does instantiate, the inverse_cdf can't handle it. I'm sure it's only in my trial-and-error with something else that I've run into it. As it's possibly unlikely to be encountered in any real applications, it's probably not a high priority bug.

@YeungOnion
Copy link
Contributor

What would the behavior be if cdf or sf fail? At this point, most are panicking via bounds checking on the input (sample value, x), assuming valid state for the realization of the variable / instantiation of Gamma, but this would be a separate kind of failure.

As alternative to a failure, we can look at a way to expose gamma function evaluating these cases,

  • gamma_lr_underflowed_x(a, beta, x) or gamma_lr_underflowed_x(a, x_scale, x) for more general variable naming
  • gamma_lr_overflowed_x(a, x_scale, x)

The underflow implementation could naively use the lowest order that has all the terms separately without underflowing multiply
$$\gamma\left(\alpha, \beta x\right) \approx \exp\left[\alpha\left(\log \beta + \log x\right) - \log \alpha\right]$$
would need to do a fair bit of testing to see how good/bad this is and what ranges it's well suited to.

@FreezyLemon
Copy link
Contributor

FreezyLemon commented Sep 25, 2024

But (presumably owing to my gradients being wrong) both parameters are infinitesimally small.

Ah, I missed that possibility. That same term can also underflow, which should mean that the result (=param passed to gamma_lr) is 0.0, not infinity.

@FreezyLemon
Copy link
Contributor

This opens up a bit of a question on how to handle cases like this. Is it unreasonable to expect that the API will handle special cases like this?

Catching over/underflows like this may not be trivial. E.g. it's evidently not enough to check

x > 0, self.rate > 0

because x * self.rate can overflow (to +INF) and underflow (to 0). Even

x.is_normal() && self.rate.is_normal()

is not enough because 1e-200 is normal but 1e-200 * 1e-200 will underflow (there are also examples of normal numbers which will overflow). The real solution here (code-wise) would be checking the result of self.rate * x, but that opens up another question of what the affected functions (cdf, sf, etc) should do in a case like this. I'd argue it could be better to at least catch these cases explicitly instead of what happened here. But it's not simple to say

@YeungOnion
Copy link
Contributor

Is it unreasonable to expect that the API will handle special cases like this?

In a general sense,

I think a human directly evaluating the density and distribution functions should be a little bit more wary or willing to take fault out of misunderstanding, but evaluation at fairly unpredictable points but a numerical iteration scheme for fitting or root finding would run across these cases. statrs doesn't have features that search across distribution parameters, but this is something we'd address if we get to that. However, the inverse_cdf for Gamma does search the sample space.

I think since we have checking on constructing the distribution, then an Ok variant would indicate that the distribution parameters should be considered good to go for "reasonable inputs" to trait methods and other implemented functions. Docs could add detail at the trait method level.

I'd expect evaluating a function near a finite boundary of its support would be considered "reasonable input" - my expectation is that there are usually ways one can implement a "small input" regime with approximations to the function that are more numerically stable so I'd expect them to work for normal floats near a function's edge of support at zero.

Specific to Gamma,

Maybe the question is where to reject constructions of "unreasonably" small beta? Then we can do some numerical tricks to ensure that reasonable accuracy for normal float input to trait methods. After all, in terms of the cumulative distribution function, the scale parameter is exactly a scale, and it's easy to argue that if a scale parameter is to be that small, maybe the user should consider moving some of that scale into the "units" of the random variable so that an iterative search scheme has some freedom to hop around inputs. They'd need to know that a search is being used internally.


Would probably require similar changes to checks in other distribution constructors.

@YeungOnion
Copy link
Contributor

@rm-minus-r-star was this underflow issue what helped you identify the optimizer was incorrectly implemented?

@rm-minus-r-star
Copy link
Author

rm-minus-r-star commented Sep 28, 2024

@rm-minus-r-star was this underflow issue what helped you identify the optimizer was incorrectly implemented?

No, I was writing debugging messages to stdout in the closures that I fed into the optimiser, and could see it making much bigger leaps in parameter values than I thought it should ... into the negative range.

I then implemented taking the .exp() of the parameter inputs to keep them positive, and set the starting parameters to the correct answer. My debugging output on tests tracks them moving away from those correct initial parameters, all the way down into the infinitesimally small range. So actually it's the same behaviour from the optimiser for reasons I still can't explain -- the optimiser keeps pushing the parameters it knows more and more negative, which become infinitesimally small upon taking the .exp(). in my closure functions.

Whereas the negative parameter case was caught appropriately at the point of setting the function ("Bad distribution parameters"), the .exp()/underflow case of panics subsequently when the .inverse_cdf() function assumes success with .unwrap().

@YeungOnion
Copy link
Contributor

At the moment, I'm leaning toward rejecting values of beta too small at ctor. As we should require that $\beta x$ be numerically stable from the multiplication.

If this is done well - for phenomena that are not multi-scale - then x should be $\mathcal{O}(1)$. If x is around this scale, and one is doing a fit, but the data is driving beta to incredibly small or large values, it seems to reason that the Gamma distribution is not very suitable. Or in this case, that the optimizer is acting up.

I'm trying to think of a counterexample where I have appropriate units and extreme values of beta are still useful. Obtaining one and discussing where this change affects it would be useful.

@rm-minus-r-star
Copy link
Author

rm-minus-r-star commented Oct 18, 2024

I've run into this again, and this time I'm not quite sure why.

My function that draws on the statrs crate is

`fn two_tailed_p_value(beta: f64, se: f64, dof: usize) -> Result<f64, Box<dyn Error>> {
    let t_value = beta.abs() / se;
    let t_dist = StudentsT::new(0_f64, 1_f64, dof as f64).expect("Couldn't get t-distribution for degrees of freedom");
    let p_value = 2_f64 * (1_f64 - t_dist.cdf(t_value));

    Ok(p_value)
}`

thread 'tokio-runtime-worker' panicked at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/statrs-0.17.1/src/function/beta.rs:101:31:
called Result::unwrap() on an Err value: ArgIntervalIncl("x", 0.0, 1.0)
stack backtrace:
0: rust_begin_unwind
1: core::panicking::panic_fmt
2: core::result::unwrap_failed
3: <statrs::distribution::students_t::StudentsT as statrs::distribution::ContinuousCDF<f64,f64>>::cdf
4: linear_regrs::wls::WLS::two_tailed_p_value
5: linear_regrs::wls::WLS::regression
6: value_regression::SuburbValueNowcast::run_wls
7: tokio::runtime::task::core::Core<T,S>::poll
8: tokio::runtime::task::harness::Harness<T,S>::poll
9: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
10: tokio::runtime::scheduler::multi_thread::worker::Context::run
11: tokio::runtime::context::runtime::enter_runtime
12: tokio::runtime::scheduler::multi_thread::worker::run
13: tokio::runtime::task::core::Core<T,S>::poll
14: tokio::runtime::task::harness::Harness<T,S>::poll
15: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with RUST_BACKTRACE=full for a verbose backtrace.
Error: JoinError::Panic(Id(1975), "called Result::unwrap() on an Err value: ArgIntervalIncl("x", 0.0, 1.0)", ...)

I guess there are reasons why the beta distribution is being called, but I can't find any pathological t_values at the level of my function that could be causing problems in the t_dist.cdf() function. Also, it isn't consistent. This function is being called as part of an async/parallel processing of a vector, and sometimes I get stacktraces from two threads, sometimes one, and sometimes it completes without error.

@YeungOnion
Copy link
Contributor

The repeatability concern immediately raises a flag in my head about memory. What platform are you running this on and are you using musl or glibc for floating point?

Suggestions:

  • lower effort: try single threaded to see if it's less brittle
  • higher effort: see what args are passed that fail (perhaps many retries via rust-gdb if rust installed via rustup) and what inconsistency appears.

@FreezyLemon
Copy link
Contributor

FreezyLemon commented Nov 11, 2024

I did a quick dive into this. StudentsT::cdf calls beta_reg(self.freedom / 2.0, 0.5, h), and h is the only parameter that matters here regarding this error. h must be in [0, 1] or this panic happens.

Now, what is h? It's dof / (dof + k^2), with k = (x - loc) / scale (x being the parameter passed to the cdf function).

In your code snippet, you set loc = 0 and scale = 1. So the term should reduce to k = x and therefore h = dof / (dof + x^2).

The "variable-inserted" code (see StudentsT::cdf) would be

let k = (x - 0.0) / 1.0;
let h = self.freedom / (self.freedom + k * k);

I think that even with FP math and its weirdness, this should always be in [0, 1] unless a NaN is present somewhere..

StudentsT::new(0.0, 1.0, 3.0).unwrap().cdf(f64::NAN) will show the same error that you got. Is it possible you somehow passed a NaN to the function?

EDIT: This should not be as big of a surprise to users as it is. cdf(NAN) isn't really explicitly mentioned in the docs, basically falling back to the cdf may panic depending on the implementor warning in the ContinuousCDF trait. This would've been way easier to debug with a simple assert!(!x.is_nan(), "x must not be NaN"); or something similar. Degenerate cases like this should probably also be handled consistently and at least be called out in the crate docs.

@YeungOnion
Copy link
Contributor

YeungOnion commented Dec 4, 2024

NaN is a tricky topic and it doesn't have a well defined role within defensive programming. Scipy has a NaN policy that can be set globally or at the call site.

There are different stages at which the policy could be set, least granular is as a compilation option, the policy would be on the consumption of NaN, so we could still return NaN and that could allow us to identify bugs in statrs.

More granular is to offer more functions at the API, foo and foo_nan with naming convention aligning to some policies. For scipy,

  • propagate is the loosest policy but more formalized and consistent than ours,
  • omit would be useful in the context of data, but not parametric distributions
  • raise encapsulates two different options for Rust, panicking (callee could catch the unwind if they expect NaN to appear) or Result

Are you, @FreezyLemon, wanting to move toward enforcing a policy consistency or do you want to figure out a policy we should strive for first?

@FreezyLemon
Copy link
Contributor

I'm not sure... it might make sense to talk about NaN in general.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants