rolling().sum() is numerically unstable #7128

Closed
3 tasks done
ilemhadri opened this issue Oct 4, 2022 · 4 comments
Labels
plan to close (may be closeable, needs more eyeballs), topic-faq

Comments

ilemhadri commented Oct 4, 2022

What happened?

On an input array like

array([0.        , 0.        , 0.        , 0.57392103, 0.57392103,
       0.57392103, 0.57392103, 0.57392103, 0.57392103, 0.57392103,
       0.57392103, 0.57392103, 0.        , 0.57392103, 0.57392103,
       0.57392103, 0.57392103, 0.57392103, 0.57392103, 2.29551022,
       2.29551022, 2.29551022, 2.29551022, 2.29551022, 2.29551022,
       2.29551022, 2.29551022, 2.29551022, 2.29551022, 2.29551022,
       2.29551022, 2.29551022, 2.29551022, 0.57383408, 0.57383408,
       0.57383408, 0.57383408, 0.57383408, 0.57383408, 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ])

that contains only non-negative values followed by a long run of zeros, computing the rolling sum (or mean) produces small negative values:

mydata.rolling(t=3, min_periods=1).sum().values[0,0,:]
array([-3.33066907e-14, -3.33066907e-14, -3.33066907e-14,  5.73921029e-01,
        1.14784206e+00,  1.72176309e+00,  1.72176309e+00,  1.72176309e+00,
        1.72176309e+00,  1.72176309e+00,  1.72176309e+00,  1.72176309e+00,
        1.14784206e+00,  1.14784206e+00,  1.14784206e+00,  1.72176309e+00,
        1.72176309e+00,  1.72176309e+00,  1.72176309e+00,  3.44335228e+00,
        5.16494146e+00,  6.88653065e+00,  6.88653065e+00,  6.88653065e+00,
        6.88653065e+00,  6.88653065e+00,  6.88653065e+00,  6.88653065e+00,
        6.88653065e+00,  6.88653065e+00,  6.88653065e+00,  6.88653065e+00,
        6.88653065e+00,  5.16485452e+00,  3.44317838e+00,  1.72150224e+00,
        1.72150224e+00,  1.72150224e+00,  1.72150224e+00,  1.14766816e+00,
        5.73834081e-01, -3.35287353e-14, -3.35287353e-14, -3.35287353e-14,
       -3.35287353e-14, -3.35287353e-14, -3.35287353e-14, -3.35287353e-14,
       -3.35287353e-14, -3.35287353e-14, -3.35287353e-14, -3.35287353e-14,
       -3.35287353e-14, -3.35287353e-14, -3.35287353e-14, -3.35287353e-14,
       -3.35287353e-14, -3.35287353e-14, -3.35287353e-14, -3.35287353e-14,
       -3.35287353e-14, -3.35287353e-14, -3.35287353e-14, -3.35287353e-14,
       -3.35287353e-14, -3.35287353e-14, -3.35287353e-14, -3.35287353e-14,
       -3.35287353e-14, -3.35287353e-14, -3.35287353e-14, -3.35287353e-14,
       -3.35287353e-14, -3.35287353e-14, -3.35287353e-14, -3.35287353e-14,
       -3.35287353e-14, -3.35287353e-14, -3.35287353e-14, -3.35287353e-14,
       -3.35287353e-14, -3.35287353e-14, -3.35287353e-14, -3.35287353e-14,
       -3.35287353e-14, -3.35287353e-14, -3.35287353e-14, -3.35287353e-14,
       -3.35287353e-14, -3.35287353e-14, -3.35287353e-14, -3.35287353e-14,
       -3.35287353e-14, -3.35287353e-14, -3.35287353e-14, -3.35287353e-14,
       -3.35287353e-14, -3.35287353e-14, -3.35287353e-14, -3.35287353e-14])

Both arrays have dtype = float64.

The error gets worse as the rolling window gets larger.
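A minimal sketch of how such a residue can arise, assuming the rolling sum is computed with a single running accumulator (add the value entering the window, subtract the one leaving it). That is an assumption about the accelerated code path, not something stated in this thread, and the function names below are hypothetical. The magnitudes are deliberately exaggerated so the rounding is easy to follow by hand:

```python
def running_window_sum(values, window):
    """Rolling sum that updates one accumulator as the window slides."""
    out = []
    acc = 0.0
    for i, x in enumerate(values):
        acc += x                        # value entering the window
        if i >= window:
            acc -= values[i - window]   # value leaving the window
        out.append(acc)
    return out


def direct_window_sum(values, window):
    """Rolling sum that re-adds every window from scratch."""
    out = []
    for i in range(len(values)):
        total = 0.0
        for x in values[max(0, i - window + 1): i + 1]:
            total += x
        out.append(total)
    return out


# In float64, 1e16 + 1.0 rounds back to 1e16, so the 1.0 is lost when added
# but still subtracted in full when it leaves the window.
values = [1e16, 1.0, 0.0, 0.0, 0.0]

running = running_window_sum(values, window=2)
direct = direct_window_sum(values, window=2)
print(running)  # [1e+16, 1e+16, 0.0, -1.0, -1.0]
print(direct)   # [1e+16, 1e+16, 1.0, 0.0, 0.0]
```

Once the accumulator has picked up a rounding error, it carries it into every later window, which matches the constant negative value over the trailing zeros in the output above. Recomputing each window avoids this at the cost of more work per element.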

What did you expect to happen?

The rolling calculation could be made more numerically precise, for instance by keeping track of the Kahan compensation term:
https://en.wikipedia.org/wiki/Kahan_summation_algorithm
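For reference, a minimal sketch of Kahan summation over a plain list. This illustrates the technique from the linked article only; it is not a rolling-window variant and not code from xarray or bottleneck, and the function names are hypothetical:

```python
def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Kahan (compensated) summation."""
    total = 0.0
    compensation = 0.0  # negated low-order part lost in earlier additions
    for x in values:
        y = x - compensation            # re-inject what was lost so far
        t = total + y                   # low-order bits of y may be dropped here
        compensation = (t - total) - y  # what was actually added, minus y
        total = t
    return total


# Each 1.0 is below the float64 spacing at 1e16, so naive addition drops both.
values = [1e16, 1.0, 1.0]
print(naive_sum(values))  # 1e+16
print(kahan_sum(values))  # 1.0000000000000002e+16
```

The compensation term costs a few extra floating-point operations per element. Carrying it through a sliding window, where values are also removed, needs more care than this sketch shows.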

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X ] New issue — a search of GitHub Issues suggests this is not a duplicate.

Environment

This is reproducible across xarray versions; mine is 2022.09.0.

ilemhadri added the "bug" and "needs triage" labels Oct 4, 2022
ilemhadri changed the title from "rolling().mean() is numerically unstable" to "rolling().sum() is numerically unstable" Oct 4, 2022
mathause (Collaborator) commented Oct 5, 2022

Do you have bottleneck installed? Could you try xr.set_options(use_bottleneck=False)?

See pydata/bottleneck#379
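A short sketch of what that suggestion looks like in use, assuming an xarray version recent enough to have the `use_bottleneck` option. The array here is a shortened stand-in for the one in the report, not the reporter's actual data:

```python
import numpy as np
import xarray as xr

# Stand-in data: non-negative values followed by zeros, on a dimension named "t".
values = np.array(
    [0.0, 0.57392103, 0.57392103, 2.29551022, 0.57383408, 0.0, 0.0, 0.0, 0.0]
)
data = xr.DataArray(values, dims="t")

# Used as a context manager, the option only applies inside the with-block.
with xr.set_options(use_bottleneck=False):
    rolled = data.rolling(t=3, min_periods=1).sum()

print(rolled.values)
```

Calling `xr.set_options(use_bottleneck=False)` without `with` applies the setting for the rest of the session instead.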

ilemhadri (Author) commented Oct 5, 2022

> Do you have bottleneck installed? Could you try xr.set_options(use_bottleneck=False)?

use_bottleneck=False seems to solve it on the latest version of xarray, in the sense that the negative values are now exactly 0. However, the option does not seem to be available in older versions of xarray such as 0.11.3:

ValueError: argument name 'use_bottleneck' is not in the set of valid options set(['keep_attrs', 'enable_cftimeindex', 'cmap_sequential', 'arithmetic_join', 'warn_for_unclosed_files', 'file_cache_maxsize', 'cmap_divergent', 'display_width'])

In our setting, we would love to preserve backwards-compatibility and be able to do this in v0.11.3 as well!

mathause (Collaborator) commented Oct 5, 2022

You'll have to uninstall bottleneck then - the use_bottleneck option was added more recently.

max-sixty added the "plan to close" label and removed the "bug" and "needs triage" labels Oct 5, 2022
ilemhadri (Author) commented Oct 6, 2022

> You'll have to uninstall bottleneck then - the use_bottleneck option was added more recently.

@mathause Does xarray use bottleneck in other places than rolling operations?

That'd help me assess the scope of this change!
