Need for disambiguation of terms "ID" and "value" #221

Open
zkamvar opened this issue Dec 6, 2024 · 4 comments
Labels
good first issue Good for newcomers help wanted Extra attention is needed question Further information is requested

Comments

Member

zkamvar commented Dec 6, 2024

I don't know exactly where this should go, so I figured I would open it here in the documentation. I'm opening this in hopes that there can be a discussion about the best way to handle these.

There are a couple of things that always make me pause to make sure I'm understanding them correctly. This could be addressed by either a clear summary of conventions or a reevaluation of the terms. At the moment, I can think of two terms that are used in different ways in our documentation (and in our schemas): "ID" and "value." This may seem like a minor quibble, but it does affect how easy the documentation is to understand, and I am interested in hearing how others perceive these.

ID

The term ID is used in task_id and output_type_id. A task_id describes a column of data (factors used for modeling) while an output_type_id describes elements/values/items within the output_type_id column.

I'm honestly not sure the best way to disambiguate these, but I have to keep reminding myself of this every time I look at them.
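
To make the two uses of "ID" concrete, here is a minimal Python sketch. It assumes a simplified model-output table, and the task ID variables `location` and `horizon` are made up for illustration; neither the column set nor the helper names come from our schemas.

```python
# Simplified, hypothetical model-output rows. "location" and "horizon" are
# made-up task ID variables used only for illustration.
rows = [
    {"location": "US", "horizon": 1, "output_type": "quantile", "output_type_id": 0.25, "value": 10.0},
    {"location": "US", "horizon": 1, "output_type": "quantile", "output_type_id": 0.5, "value": 12.0},
    {"location": "US", "horizon": 1, "output_type": "quantile", "output_type_id": 0.75, "value": 15.0},
]

# Assumption: every column other than these three is a task ID variable.
STANDARD_COLUMNS = {"output_type", "output_type_id", "value"}


def task_id_variables(rows):
    """Return the names of whole columns: each task ID is a column."""
    return [name for name in rows[0] if name not in STANDARD_COLUMNS]


def output_type_ids(rows):
    """Return the distinct entries found within the output_type_id column."""
    return sorted({row["output_type_id"] for row in rows})


print(task_id_variables(rows))  # ['location', 'horizon']
print(output_type_ids(rows))    # [0.25, 0.5, 0.75]
```

In this sketch a "task ID" names a column, while an "output type ID" is an entry inside one particular column, which is the asymmetry described above.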

value

The term "value" is used to describe elements/items within a column, elements within a JSON list, and a column of model output.

There is an unstated convention of using value (formatted as inline code) to refer to the column and value (formatted as plain text) to refer to the elements/items/contents within a column.

This can get especially confusing when parsing sentences like this:

Values are required to sum to 1 across all output_type_id values within each combination of values of task ID variables.

I think this one could be solved by using an analogous term for values in a column or clarifying when we mean "value column."
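
One way to unpack the quoted sentence is to write the check out. The following is a minimal sketch, assuming the sentence refers to the pmf output type; the rows, the `location` task ID variable, and the function names are hypothetical and not part of any hubverse package.

```python
import math
from collections import defaultdict

# Hypothetical pmf rows; "location" is a made-up task ID variable.
pmf_rows = [
    {"location": "US", "output_type": "pmf", "output_type_id": "low", "value": 0.25},
    {"location": "US", "output_type": "pmf", "output_type_id": "moderate", "value": 0.5},
    {"location": "US", "output_type": "pmf", "output_type_id": "high", "value": 0.25},
    {"location": "CA", "output_type": "pmf", "output_type_id": "low", "value": 0.5},
    {"location": "CA", "output_type": "pmf", "output_type_id": "high", "value": 0.25},
]


def pmf_sums(rows, task_id_vars):
    """Sum the `value` column over output_type_id entries, per task ID combination."""
    sums = defaultdict(float)
    for row in rows:
        if row["output_type"] != "pmf":
            continue
        # One key per combination of entries in the task ID variable columns.
        key = tuple(row[var] for var in task_id_vars)
        sums[key] += row["value"]
    return dict(sums)


def invalid_pmf_groups(rows, task_id_vars, tol=1e-9):
    """Return the task ID combinations whose `value` entries do not sum to 1."""
    return [
        key
        for key, total in pmf_sums(rows, task_id_vars).items()
        if not math.isclose(total, 1.0, abs_tol=tol)
    ]


print(invalid_pmf_groups(pmf_rows, ["location"]))  # [('CA',)]
```

Read this way, the first "Values" in the sentence means entries in the `value` column, while the other two uses mean entries in the `output_type_id` column and in the task ID variable columns.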

Contributor

elray1 commented Dec 6, 2024

It's been brought up before that the use of "id" in output_type_id is confusing, and we actually thought about changing the name of this column a couple of times. But we couldn't think of another name that was better. If you were going to call that column something else, the common name for it would depend on the output type:

  • output type = mean or median: concept doesn't apply
  • output type = quantile: "quantile level" or "probability" or "cumulative probability"
  • output type = cdf: "target variable value"
  • output type = pmf: "target variable value or bin or category"
  • output type = sample: "sample index"

We basically couldn't think of a meaningful name for the column that felt like it improved upon output_type_id and was appropriately descriptive across the different output types.
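
As a rough illustration of the list above, the per-output-type meanings can be written as a lookup table. This is only a sketch: the dictionary and function names are hypothetical, and the descriptions simply restate the bullets.

```python
# What the output_type_id column means for each output type, restating the
# list above. None marks output types where the concept does not apply.
OUTPUT_TYPE_ID_MEANING = {
    "mean": None,
    "median": None,
    "quantile": "quantile level (cumulative probability)",
    "cdf": "target variable value",
    "pmf": "target variable value, bin, or category",
    "sample": "sample index",
}


def describe_output_type_id(output_type):
    """Return a human-readable description of output_type_id for an output type."""
    if output_type not in OUTPUT_TYPE_ID_MEANING:
        raise ValueError(f"unknown output type: {output_type!r}")
    meaning = OUTPUT_TYPE_ID_MEANING[output_type]
    return meaning if meaning is not None else "not applicable"


print(describe_output_type_id("quantile"))  # quantile level (cumulative probability)
print(describe_output_type_id("mean"))      # not applicable
```

No single entry in the table fits every output type, which is why a more specific column name was hard to choose.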

Contributor

elray1 commented Dec 6, 2024

I might also tend to try to use the phrase "task id variable" to help remember that it refers to a variable/column?

@zkamvar zkamvar added good first issue Good for newcomers help wanted Extra attention is needed question Further information is requested labels Dec 6, 2024
Member Author

zkamvar commented Dec 6, 2024

I might also tend to try to use the phrase "task id variable" to help remember that it refers to a variable/column?

I think this might be the best/most immediate solution.

So in this case a sentence like:

If any required task IDs have an associated derived task ID, it is essential for derived_task_ids to be specified.

would become:

If any required task ID variables have an associated derived task ID variable, it is essential for derived_task_ids to be specified.

Member Author

zkamvar commented Jan 2, 2025

This has been partially addressed in #222
