Continuous

Numerical predictions with continuous real numbers. For all continuous predictions, minima and maxima are defined as lower and upper (defaults: lower = -Inf, upper = Inf).

Point

A numeric point prediction.

CSV column name: point

Validity:

  • Numeric
  • Not NA
  • lower <= point <= upper

Samples

Numeric samples for continuous outcomes between (and including) lower and upper.

CSV column name: sample

Validity:

  • Numeric
  • No NAs
  • lower <= sample <= upper
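The validity rules above can be sketched as a small check in base R. This is only an illustration: `is_valid_sample` is a hypothetical helper name, not a function confirmed to exist in predx, and the requirement of at least one sample is an assumption. A point prediction is the same check applied to a single value.

```r
# Hypothetical helper (not part of predx): checks the sample validity rules.
is_valid_sample <- function(sample, lower = -Inf, upper = Inf) {
  is.numeric(sample) &&
    length(sample) > 0 &&                     # assumption: at least one sample
    !anyNA(sample) &&                         # no NAs (also rejects NaN)
    all(sample >= lower & sample <= upper)    # bounds are inclusive
}

is_valid_sample(c(0.2, 1.5, 3.0), lower = 0)   # TRUE
is_valid_sample(c(0.2, NA, 3.0), lower = 0)    # FALSE: contains NA
is_valid_sample(c(-1, 2), lower = 0)           # FALSE: -1 is below lower
```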

Binned

Predictions specified as a set of probabilities corresponding to a discrete set of bins across a range of possible numeric outcomes defined by lower and upper. The bins may be specified by a bin interval (which generates equally sized bins) or by a vector of the lower bounds of each bin. In either case, the lower bound of each bin is inclusive and the upper bound is exclusive, except for the final bin, which also includes upper. That is, for an observed value x, each bin holds the probability that bin_lwr <= x < bin_upr, and the final bin holds the probability that bin_lwr <= x <= upper.

Interval-defined binned predictions

Bins are defined by bounds and intervals. For example, Continous(prob = probs, type='Bin', lower = 0, upper = 100, interval = 1) requires 100 probabilities (probs), one per bin: probs[1] covers 0 <= x < 1, probs[2] covers 1 <= x < 2, ..., probs[99] covers 98 <= x < 99, and probs[100] covers 99 <= x <= 100.

Interval-defined binned predictions are represented internally as a list of:

  • lower: the lower bound of the range of possible predictions
  • upper: the upper bound of the range of possible predictions
  • interval: the span of each bin
  • prob: the ordered probabilities assigned to each bin

Validity:

  • All inputs are numeric
  • No NAs
  • lower != -Inf and upper != Inf
  • A probability is specified for each bin defined by lower, upper, and interval
  • The sum of prob is 1.0
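As a rough sketch of how the bins follow from lower, upper, and interval, the base R code below derives the bin lower bounds and applies the validity rules. The function names and the tolerance `tol` are hypothetical, and the checks that interval is positive and divides the range evenly are assumptions rather than documented rules. With lower = 0, upper = 100, and interval = 1, the bins [0, 1), [1, 2), ..., [99, 100] number 100.

```r
# Hypothetical helpers (not part of predx).

# Lower bound of every bin implied by lower, upper and interval.
bin_lower_bounds <- function(lower, upper, interval) {
  n_bins <- round((upper - lower) / interval)
  lower + interval * (seq_len(n_bins) - 1)
}

# Validity rules for interval-defined binned predictions.
is_valid_interval_bins <- function(prob, lower, upper, interval, tol = 1e-8) {
  inputs <- list(prob, lower, upper, interval)
  if (!all(vapply(inputs, is.numeric, logical(1)))) return(FALSE)
  if (anyNA(unlist(inputs))) return(FALSE)
  if (!is.finite(lower) || !is.finite(upper)) return(FALSE)  # bounds must be finite
  if (interval <= 0 || upper <= lower) return(FALSE)         # assumption
  n_bins <- (upper - lower) / interval
  if (abs(n_bins - round(n_bins)) > tol) return(FALSE)       # assumption: even split
  length(prob) == round(n_bins) && abs(sum(prob) - 1) < tol
}

probs <- rep(1 / 100, 100)
length(bin_lower_bounds(0, 100, 1))                                   # 100
is_valid_interval_bins(probs, lower = 0, upper = 100, interval = 1)   # TRUE
```

The sum is compared to 1 within a tolerance because floating-point probabilities rarely add up to exactly 1.0.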

Bin-defined binned predictions

Bins are defined explicitly by their lower bounds. For example, Continous(prob = probs, type='Bin', lwr = lwr_bounds) defines the bins by their lower bounds (lwr) and accepts an equal number of probabilities (probs), which are associated in order with those bins.

Bin-defined binned predictions are represented internally as a data.frame with two columns:

  • lwr: inclusive numeric lower bounds for sequential bins (equal intervals)
  • prob: probabilities assigned to each bin

Validity:

  • All inputs are numeric
  • No NAs
  • lower != -Inf
  • lwr[1] == lower
  • max(lwr) < upper
  • A probability is specified for each bin (length(prob) == length(lwr))
  • The sum of prob is 1.0
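The rules for bin-defined predictions can be sketched in the same way. `is_valid_lwr_bins` and its tolerance argument are hypothetical, not predx API; the data.frame layout follows the lwr and prob columns described above, which also guarantees one probability per bin.

```r
# Hypothetical helper (not part of predx): validity rules for bin-defined predictions.
is_valid_lwr_bins <- function(bins, lower, upper = Inf, tol = 1e-8) {
  if (!is.data.frame(bins) || !all(c("lwr", "prob") %in% names(bins))) return(FALSE)
  if (!is.numeric(bins$lwr) || !is.numeric(bins$prob)) return(FALSE)
  if (anyNA(bins$lwr) || anyNA(bins$prob)) return(FALSE)
  if (!is.finite(lower)) return(FALSE)       # lower != -Inf
  nrow(bins) > 0 &&
    bins$lwr[1] == lower &&                  # first bin starts at lower
    max(bins$lwr) < upper &&                 # last bin starts below upper
    abs(sum(bins$prob) - 1) < tol            # probabilities sum to 1
}

bins <- data.frame(lwr = c(0, 10, 20, 30), prob = c(0.1, 0.4, 0.3, 0.2))
is_valid_lwr_bins(bins, lower = 0, upper = 40)   # TRUE
is_valid_lwr_bins(bins, lower = 0, upper = 30)   # FALSE: max(lwr) is not below upper
```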

Parametric

Predictions characterized by parametric distributions as defined in base R. Distribution truncation has not been configured, so lower and upper should not be specified; they default to the support of the respective distribution.

Parametric predictions are represented internally as a data.frame with 2 columns:

  • parameter_name with the parameter name (from the set described below)
  • parameter_value the corresponding numeric parameter

The following distributions and parameters are currently supported:

  • Normal: mean, sd (Support: lower = -Inf, upper = Inf)
  • Log-normal: meanlog, sdlog (Support: lower = 0, upper = Inf)
  • Gamma: shape, rate (or shape, scale) (Support: lower = 0, upper = Inf)
  • Beta: shape1, shape2 (Support: lower = 0, upper = 1)

Validity:

  • The supplied parameter names (parameter_name) must exactly match those of the specified parametric distribution
  • The parameter values (parameter_value) must be numeric and not include NA
  • The parameter values (parameter_value) must be appropriate for the specified parametric distribution (e.g. shape > 0)
  • lower and upper must be equivalent to those of the specified parametric distribution
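A minimal sketch of the first three rules, in base R, is shown below. The lookup table `param_names`, the function `is_valid_parametric`, and the short distribution keys (chosen to match the suffixes of base R's dnorm, dlnorm, dgamma, and dbeta) are all hypothetical. The sketch handles only the shape/rate form of the gamma distribution and assumes every parameter other than mean and meanlog must be strictly positive.

```r
# Hypothetical lookup of expected parameter names per distribution.
param_names <- list(
  norm  = c("mean", "sd"),
  lnorm = c("meanlog", "sdlog"),
  gamma = c("shape", "rate"),      # shape/scale form not handled in this sketch
  beta  = c("shape1", "shape2")
)

# Hypothetical helper (not part of predx).
is_valid_parametric <- function(dist, params) {
  expected <- param_names[[dist]]
  if (is.null(expected)) return(FALSE)             # unsupported distribution
  if (!is.data.frame(params)) return(FALSE)
  if (nrow(params) != length(expected) ||
      !setequal(params$parameter_name, expected)) return(FALSE)
  values <- params$parameter_value
  if (!is.numeric(values) || anyNA(values)) return(FALSE)
  # Assumption: only location parameters may be zero or negative.
  unconstrained <- params$parameter_name %in% c("mean", "meanlog")
  all(values[!unconstrained] > 0)
}

gamma_pred <- data.frame(
  parameter_name = c("shape", "rate"),
  parameter_value = c(2, 0.5),
  stringsAsFactors = FALSE
)
is_valid_parametric("gamma", gamma_pred)   # TRUE
is_valid_parametric("beta", gamma_pred)    # FALSE: parameter names do not match
```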


Discrete

Quantitative discrete numeric forecasts.

Point

A numeric point prediction.

CSV column name: point

Validity:

  • Numeric
  • Not NA

Categorical

Point

A prediction of the most likely categorical outcome.

CSV column name: point

Validity:

  • String
  • Not NA

Binary

Point

A numeric point prediction for a binary outcome, between 0 and 1 inclusive.

CSV column name: point

Validity:

  • Numeric
  • Not NA
  • 0 <= value <= 1

Dates & Times

All dates are formatted in ISO standard format: YYYY-MM-DD. Forecasts may be specific to a time period, such as a week, month, or year. These periods should be defined consistently in the context of the forecast, as they are not defined explicitly in the predx object.

Times are formatted in ISO standard 24-hour format: YYYY-MM-DDTHH:MM:SS+HH:MM, where the final HH:MM is the time zone offset from Coordinated Universal Time (UTC). If the final :MM in the time zone offset is :00, it may be dropped. Examples:

  • 2020-12-18T13:20:37+00:00 is 13:20:37 (1:20 PM with 37 seconds) on 18 December 2020 in UTC (Greenwich Mean Time)
  • 2025-01-03T02:00:00-05:00 or 2025-01-03T02:00:00-05 is 2:00 (2:00 AM) on 3 January 2025 in UTC-05 (Eastern Standard Time).
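One way to read these formats in base R is sketched below. `is_iso_date` and `parse_iso_time` are hypothetical helpers, not predx functions. The offset is rewritten to the +hhmm form before parsing, on the assumption that the installed R version's `%z` specifier may not accept a colon or a bare hour offset.

```r
# Hypothetical helpers (not part of predx).

# TRUE if x is a valid YYYY-MM-DD date string.
is_iso_date <- function(x) {
  grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x) &&
    !is.na(as.Date(x, format = "%Y-%m-%d"))
}

# Parse YYYY-MM-DDTHH:MM:SS+HH:MM (or +HH) into a POSIXct in UTC.
parse_iso_time <- function(x) {
  # Expand a bare hour offset ("-05") to hours and minutes ("-05:00").
  x <- sub("(T[0-9]{2}:[0-9]{2}:[0-9]{2}[+-][0-9]{2})$", "\\1:00", x)
  # Drop the colon in the offset ("-05:00" -> "-0500") so %z can read it.
  x <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
}

is_iso_date("2020-12-18")    # TRUE
is_iso_date("2020-13-45")    # FALSE

long_form  <- parse_iso_time("2025-01-03T02:00:00-05:00")
short_form <- parse_iso_time("2025-01-03T02:00:00-05")
long_form == short_form                      # TRUE: both offsets mean UTC-05
format(long_form, "%Y-%m-%d %H:%M:%S")       # "2025-01-03 07:00:00" (UTC)
```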

Point

A prediction of the most likely date.

CSV column name: point

Validity:

  • ISO date or time (described above)
  • Not NA

Time

All dates are formatted in

Point

A prediction of the most likely categorical outcome.

CSV column name: point

Validity:

  • Numeric
  • Not NA
  • 0 <= value <= 1

Point predictions

PointCat

A character string point prediction, e.g. associated with SampleCat or BinCat.

CSV column name: point

Validity:

  • Not NA

Continuous distributions

Normal: mean, sd

Log-normal: mean, sd

Gamma: shape, rate

Beta: a, b

Discrete distributions

Binary: prob

A numeric probability.

CSV column name: prob

Validity:

  • Not NA
  • 0 <= prob <= 1

Binomial: p, n

Poisson: mean

Negative-Binomial: r, p

Negative-Binomial2: mean, dispersion

Empirical distributions

BinLwr

Binned distribution defined by inclusive lower bounds for each bin.

A data.frame object with two columns:

  • lwr: inclusive numeric lower bounds for sequential bins (equal intervals)
  • prob: probabilities assigned to each bin

CSV column names: lwr, prob

Validity:

  • No NAs in lwr or prob
  • Probabilities are positive
  • Probabilities sum to 1.0
  • Bins are in ascending order
  • Bin sizes are uniform
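The BinLwr rules, including the ordering and uniform-width checks, might be expressed as follows. `is_valid_binlwr` and `tol` are hypothetical. "Positive" is read here as non-negative so that empty bins are allowed, which is an assumption; use `> 0` for the strict reading.

```r
# Hypothetical helper (not part of predx): validity rules for BinLwr.
is_valid_binlwr <- function(x, tol = 1e-8) {
  if (!is.data.frame(x) || !all(c("lwr", "prob") %in% names(x))) return(FALSE)
  if (!is.numeric(x$lwr) || !is.numeric(x$prob)) return(FALSE)
  if (nrow(x) == 0 || anyNA(x$lwr) || anyNA(x$prob)) return(FALSE)
  widths <- diff(x$lwr)                      # distance between successive lower bounds
  all(x$prob >= 0) &&                        # assumption: zero-probability bins allowed
    abs(sum(x$prob) - 1) < tol &&            # probabilities sum to 1
    all(widths > 0) &&                       # bins in ascending order
    all(abs(widths - widths[1]) < tol)       # uniform bin sizes
}

uniform <- data.frame(lwr = 0:4, prob = rep(0.2, 5))
uneven  <- data.frame(lwr = c(0, 1, 5), prob = c(0.2, 0.3, 0.5))
is_valid_binlwr(uniform)   # TRUE
is_valid_binlwr(uneven)    # FALSE: bin sizes differ
```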

BinCat

Binned distribution with a category for each bin.

A data.frame object with two columns:

  • cat: character strings representing each possible outcome category
  • prob: probabilities assigned to each bin

CSV column names: cat, prob

Validity:

  • No NAs in cat or prob
  • Probabilities are positive
  • Probabilities sum to 1.0

Sample

Numeric samples.

CSV column name: sample

Validity:

  • No NAs

SampleCat

Character string samples.

CSV column name: sample

Validity:

  • No NAs