🆕 Schema v4 release on 2024-11-25 #32
zkamvar announced in Announcements
We will be releasing schema version v4.0.0 on Monday, 25 November 2024. This contains changes that are incompatible with earlier schema versions. We have taken care to ensure backward compatibility across the hubverse packages so you should not notice any change to your existing workflows.
In this announcement, we discuss the important changes in v4 and provide a checklist to update to v4.
Hub- and Round-level derived_task_ids
In v4, you can add a derived_task_ids property at the hub or round level to define task IDs that are derived from other task IDs. This allows more efficient validations.
If you are not familiar with these, derived task IDs are non-independent task IDs that are derived from other task IDs. A common example of a derived task ID is target_end_date, which is most often derived from the origin_date and horizon task IDs.
All output_type elements gain the is_required element
In v4, all output_types gain the is_required element to indicate whether or not that particular output type is required (is_required: true) or if it is optional (is_required: false). A special note for the sample output type: is_required is moved from being a property of output_type_id_params to being a property of sample.
In addition, this means that we also disallow the optional property for output_type_id objects (this is still allowed and necessary in task_id objects).
Take for example, this optional set of quantiles for a v3 hub:
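A minimal sketch of what such a v3 object might look like, wrapped in a short Python snippet so the acceptance rule can be checked. The JSON uses the required and optional properties discussed in this post; the helper name v3_accepts is hypothetical, and the value block that a real tasks.json carries is omitted.

```python
import json

# Hypothetical v3 fragment: no required quantile levels, three optional ones.
V3_QUANTILE = json.loads("""
{
  "quantile": {
    "output_type_id": {
      "required": null,
      "optional": [0, 0.5, 1]
    }
  }
}
""")


def v3_accepts(config, submitted):
    """Return True if the submitted output_type_ids pass under v3 rules.

    Every required ID must be present, and nothing outside
    required + optional may appear.
    """
    ids = config["quantile"]["output_type_id"]
    required = set(ids["required"] or [])
    optional = set(ids["optional"] or [])
    submitted = set(submitted)
    return required <= submitted <= (required | optional)


print(v3_accepts(V3_QUANTILE, []))           # True: none of the IDs
print(v3_accepts(V3_QUANTILE, [0.5]))        # True: any subset
print(v3_accepts(V3_QUANTILE, [0, 0.5, 1]))  # True: all of the IDs
print(v3_accepts(V3_QUANTILE, [0.25]))       # False: not in the set
```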
In English, this object is stating that a hub can optionally accept a quantile output type with any, all, or none of the output_type_ids in the set [0, 0.5, 1].
This is the identical object in a v4 hub:
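A sketch of how this set of quantiles might be written in v4, again wrapped in Python so the rule can be exercised. It assumes the IDs move to required and the output type as a whole is marked optional through is_required; the helper name v4_accepts is hypothetical and the value block is omitted.

```python
import json

# Hypothetical v4 fragment: all IDs are listed under `required`, and
# `is_required` says whether the output type itself must be submitted.
V4_QUANTILE = json.loads("""
{
  "quantile": {
    "output_type_id": {
      "required": [0, 0.5, 1]
    },
    "is_required": false
  }
}
""")


def v4_accepts(config, submitted):
    """Return True if the submitted output_type_ids pass under v4 rules."""
    quantile = config["quantile"]
    required = set(quantile["output_type_id"]["required"])
    submitted = set(submitted)
    if not submitted:
        # Skipping the output type entirely is fine only when it is optional.
        return not quantile["is_required"]
    # Once the output type is submitted, every listed ID must be present.
    return submitted == required


print(v4_accepts(V4_QUANTILE, []))           # True: output type skipped
print(v4_accepts(V4_QUANTILE, [0.5]))        # False: partial set
print(v4_accepts(V4_QUANTILE, [0, 0.5, 1]))  # True: complete set
```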
The interpretation differs slightly, however. In English, this is stating that a hub can optionally accept a quantile output type. If a quantile output type is submitted, it must have all output_type_ids in the set [0, 0.5, 1].
Discussion: no more mixing optional and required output type IDs
The impetus for this change was the fact that it was possible to include both optional and required output type IDs in a model submission. While this made things flexible on the side of the modelers, downstream analyses became more difficult because of the heterogeneity of the outputs.
Part of the reason is that output_type_ids are ordered. It becomes impossible to know how to combine these output_type_ids if they are split between "required" and "optional". Take for instance a CDF output that requires forecasts for every other epiweek, but optionally modelers could submit every week.
Situations like this mean that it becomes more difficult to validate a model submission because we cannot programmatically confirm that the output type IDs are in the correct order.
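To make the CDF situation concrete, here is a small sketch with made-up epiweek labels (the labels and variable names are hypothetical and not taken from any hub). Under v3 rules both submissions would be valid, yet they cover different output_type_ids, and the two config lists on their own do not say how the IDs interleave.

```python
# Hypothetical v3-style CDF output_type_ids: every other epiweek is required,
# the weeks in between are optional.
required = ["EW202240", "EW202242", "EW202244"]
optional = ["EW202241", "EW202243"]

# Two submissions that would both pass under v3 rules.
model_a = list(required)               # required weeks only
model_b = sorted(required + optional)  # every week

# Only the required IDs are guaranteed to be shared across models.
shared = [week for week in model_b if week in model_a]
print(shared)  # ['EW202240', 'EW202242', 'EW202244']

# Concatenating the two config lists does not give the intended order.
# Sorting recovers it here only because these labels sort lexicographically.
config_order = required + optional
print(config_order == sorted(config_order))  # False
```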
Setting the output_type to be either required or not allows for straightforward validation of these elements.
Point estimate output_type_ids: use null instead of NA
Since the beginning of the hubverse, point estimate (e.g. mean and median) output_type_ids have not been applicable, that is, they are encoded as missing values.
In v3, if you wanted to specify output_type_id for a required mean output_type, you would write ["NA"] to indicate a presence of an absence.
This led many modelers to incorrectly assume that the output_type_id column of their submissions should be the character "NA". While we updated our documentation to clarify this, adding the is_required property to output_types allows us to make the expectation clear. Now, to specify a required mean output_type, you would set is_required to true and write null for the required output_type_id.
Documentation
The full documentation for v4 on https://hubverse.io/ is still in progress and will be updated shortly (we will release a news item for that later).
Updating to v4
- Update the schema version in tasks.json & admin.json.
- Move all output_type_id values to the required property and delete any optional properties.
- For each output_type, add an is_required property and use it to indicate whether an output type is required or not. If you've already been collecting samples, make sure to move this property from the output_type_id_params object.
- Point estimate output types should use null instead of ["NA"] in their output_type_id required property.
- Optionally, add a derived_task_ids property at the top level of the config.
- Run hubAdmin::validate_hub_config() to ensure your hub config is valid.
Full list of updates
- Added is_required boolean property at the output_type level to configure whether the output type is required for submissions to be considered valid (#99).
- Removed optional property in output_type_id objects. As such, when a given output type is submitted, values for all output type IDs must be submitted (#100, #101, #102).
- Point estimate output_type_id required properties now encoded with null instead of ["NA"] (#109).
- Added hub- and round-level derived_task_ids properties to enable hub administrators to define derived task IDs (i.e. task IDs whose values depend on the values of other task IDs). The higher level derived_task_ids property sets the property globally at the hub level but can be overridden by the round level derived_task_ids property. The property primarily allows validation functionality to ignore such task IDs when appropriate, which can significantly improve validation efficiency (#96). For more information see hubValidations documentation on ignoring derived task IDs.
- Changes to target_keys to ensure only string properties are allowed (#97).
- Changes to cdf numeric output_type_ids (#113).
- Restricted additional properties (in output_type objects etc). Custom additional properties are only allowed at the round level, while additional task ID objects that match the expected task ID schema are allowed in the task_ids object (#114).
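The hub-level and round-level derived_task_ids behaviour described in the list above can be sketched as follows. The config is a stripped-down, hypothetical tasks.json (the round IDs, the location_name task ID and both helper names are made up, and most properties a real config needs are omitted). It is an illustration of the override rule only, not how hubValidations implements it.

```python
import json

# Hypothetical, heavily trimmed tasks.json: one hub-level setting and one
# round that overrides it.
CONFIG = json.loads("""
{
  "derived_task_ids": ["target_end_date"],
  "rounds": [
    {"round_id": "2024-11-25"},
    {"round_id": "2024-12-02",
     "derived_task_ids": ["target_end_date", "location_name"]}
  ]
}
""")


def effective_derived_task_ids(config, round_id):
    """Return the derived task IDs for a round.

    A round-level derived_task_ids property overrides the hub-level one.
    """
    hub_level = config.get("derived_task_ids") or []
    for rnd in config["rounds"]:
        if rnd["round_id"] == round_id:
            return rnd.get("derived_task_ids", hub_level)
    raise KeyError(round_id)


def independent_task_ids(task_ids, derived):
    """Task IDs that validation still has to consider."""
    return [task_id for task_id in task_ids if task_id not in derived]


derived = effective_derived_task_ids(CONFIG, "2024-11-25")
print(derived)  # ['target_end_date']
print(effective_derived_task_ids(CONFIG, "2024-12-02"))
# ['target_end_date', 'location_name']
print(independent_task_ids(
    ["origin_date", "horizon", "target_end_date", "location"], derived))
# ['origin_date', 'horizon', 'location']
```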
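As a rough illustration of the checklist under Updating to v4, the sketch below converts a single v3 output_type block to the v4 layout. The function name and the rule used to decide is_required (an output type counts as required if it had any required IDs) are assumptions made for this example, and the input is a trimmed, hypothetical config. Review each output type by hand and run hubAdmin::validate_hub_config() afterwards.

```python
import copy
import json


def migrate_output_type(v3_output_type):
    """Convert a v3 output_type object to a v4-style one (illustrative only)."""
    v4 = {}
    for name, spec in copy.deepcopy(v3_output_type).items():
        if name == "sample":
            # is_required moves from output_type_id_params up to sample.
            params = spec["output_type_id_params"]
            spec["is_required"] = params.pop("is_required")
        else:
            ids = spec["output_type_id"]
            required = ids.get("required") or []
            optional = ids.pop("optional", None) or []
            # Assumption: the output type was required if it had required IDs.
            spec["is_required"] = bool(required)
            if name in ("mean", "median"):
                # Point estimates use null (None) rather than ["NA"].
                ids["required"] = None
            else:
                # All IDs now live in `required`. Check the order by hand
                # when both lists were populated.
                ids["required"] = required + optional
        v4[name] = spec
    return v4


v3 = {
    "mean": {"output_type_id": {"required": ["NA"], "optional": None}},
    "quantile": {"output_type_id": {"required": None, "optional": [0, 0.5, 1]}},
    "sample": {"output_type_id_params": {"is_required": True, "type": "integer"}},
}
v4 = migrate_output_type(v3)
print(json.dumps(v4, indent=2))
```

The input dictionary is copied before it is changed, so the v3 config stays available for comparison.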