Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enabling calculations in between two related Measurements by adding a new criteria attribute #210

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

alex-odysseus
Copy link

MeasurementOperand attribute is possible to configure for a Measurement so that calculation formula will be possible to construct in between two related Measurements

If multiple related Measurement events are at place it is possible to specify that the earliest/latest event should be used, another configuration option is that Measurements should belong to the same Visit only

The formula result value is treated as a number then and compared to some value(s) of interest

@@ -38,6 +45,34 @@ protected String getQueryTemplate() {
return MEASUREMENT_TEMPLATE;
}

/*
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be deleted

@@ -151,7 +151,9 @@ c.visitType??><#local temp>a visit occurrence that is: <@inputTypes.ConceptList
measurement<#if isPlural && !(c.first!false)>s</#if> of ${utils.codesetName(c.codesetId!"", "any measurement")}<#if
c.measurementSourceConcept??> (including ${utils.codesetName(c.measurementSourceConcept, "any measurement")} source concepts)</#if><#if
c.first!false> for the first time in the person's history</#if><#if attrs?size gt 0>, ${attrs?join("; ")}</#if><#if
c.CorrelatedCriteria??>; <@Group group=c.CorrelatedCriteria level=level indexLabel=utils.codesetName(c.codesetId!"", "any measurement") /><#else>.</#if></#macro>
c.CorrelatedCriteria??>; <@Group group=c.CorrelatedCriteria level=level indexLabel=utils.codesetName(c.codesetId!"", "any measurement") /><#else>.</#if>
<#if c.measurementOperand??>numeric value calculated as <@MeasurementOperand c=c.measurementOperand level=level+1 /></#if><#nested />
Copy link
Author

@alex-odysseus alex-odysseus Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to enhance it so that the sentence is better readable

}

protected abstract String getTableColumnForCriteriaColumn(CriteriaColumn column);

protected String getAdditionalColumns(List<CriteriaColumn> columns) {
protected String getAdditionalColumns(T criteria, List<CriteriaColumn> columns) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the criteria parameter added to this function when it's not even used in it?

@chrisknoll
Copy link
Collaborator

I wanted to offer some test case implementations but I noticed this is on a fork that I might not have access to push branches, so I'll see about requesting access (or if you could grant) and then I can push up PRs for your branch on your fork (I'd rather not fork-a-fork in order to collaborate).

if (filteredColumns.size() > 0) {
query = StringUtils.replace(query, "@additionalColumns", ", " + this.getAdditionalColumns(filteredColumns));
return StringUtils.replace(query, "@additionalColumns", ", " + this.getAdditionalColumns(criteria, filteredColumns));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe changing the 3 query assignments into 3 individual returns doesn't improve anything with the code, and makes it harder to debug when you just want to know what the query result that will be returned.

Comment on lines +276 to +278
whereClauses.add(buildNumericRangeClause(
MessageFormat.format("(C.value_as_number {0} SC.value_as_number)", criteria.measurementOperand.operator),
criteria.measurementOperand.valueAsNumber, ".4f")
Copy link
Collaborator

@chrisknoll chrisknoll Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried that we embed the operator (ie the sql operator) as the string in the payload.

I feel like the operators should be a distinct list of supported operators, and based on the user selection we translate it into the sql operation.

And, it's not quite clear how this is used. The template is:

(C.value_as_number {0} SC.value_as_number)

So can the operator be > or ==?

Or is the operator supposed to be an arithmetic operation (+, - , / *)

Unfortunately, I can't find documentation on this feature about how it's supposed to be used and what functions it supports.

@JsonProperty("ValueAsNumber")
public NumericRange valueAsNumber;
@JsonProperty("SameVisit")
public Boolean sameVisit = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this needs to be true by default. Most of the time, we default boolean values to false.

Comment on lines +177 to +185
if (criteria.measurementOperand != null && criteria.measurementOperand.measurement != null) {
BuilderOptions options = new BuilderOptions();
options.additionalColumns = Collections.singletonList(CriteriaColumn.VALUE_AS_NUMBER);
String subquery = getCriteriaSql((T) criteria.measurementOperand.measurement, options);
joinClauses.add(MessageFormat.format("JOIN (SELECT E.person_id, E.event_id, E.visit_occurrence_id, E.value_as_number, row_number() over (PARTITION BY E.person_id ORDER BY E.start_date, E.event_id {1}) as ordinal FROM (\n{0}\n) E) SC on C.person_id = SC.person_id",
subquery,
Optional.ofNullable(criteria.measurementOperand.limit).map(limit -> limit.equals("Last") ? "DESC" : "").orElse("")
));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be time-bounding to looking for these measurements (either prior to or post the 'index event'). I believe you would want to specify this somehow, and maybe the extent of the prior/post time to select records from. I understand 'same visit' addresses some of this, but I don't think you can depend on Measurement domains to come from the same visit, especially if the idea is to compare 2 measurements that could be spread across different visits.

@chrisknoll
Copy link
Collaborator

Hi @wivern and @alex-odysseus ,

Thanks very much for the efforts you put into this feature. As it stands, I think there's necessary additions that are going to be necessary in order to accept this PR:

Time bounding measurements

As I was writing tests, I imagined an index measurement and then an associated measurements being located before and after this index within a time window. So I put my target measurement in the data, and then scattered other measurements before and after at 1mo, 3mo, 6mo time. When i wrote my usecase to use the earliest event that came from 3/6/9 months prior to the measurement, I noticed there wasn't an attribute to define this. This meant that a associated measurements could appear anywhere in the patient history and I'm pretty sure this would never be useful. I think the 'same visit' function would narrow the context to find the correlated measurement but frequently there is no visit-to-measurement relationship in CDMs, so depending on that behavior is not a good strategy. So limiting finding measurements to a time horizon is necessary.

Operators are free-form text

I was wondering what sort of operations are allowed in the MeasurementOperands operator field, and it looks like it's going to inject any string the user inputs into the SQL. Sql injection attacks aside, we would want to define what the valid operators are (maybe through an enum) and map user choices to the available operator, and then build the correct sql statement based on that choice (as well as the correct print-friendly output)

Measurements can refer to itself

Consider a patient with exactly 1 measurement, and you create the measurement criteria to look for latest other measurement and subtract the values. Should it use itself as the 'other' measurement in these operations? I'm thinking 'no'.

Joins lead to multiple rows to count when used in the CriteriaGroups

This might be hard to describe, but given the index measurement event, you want it to appear once for counting when you are dealing with criteria groups (which say 'having at least one or exactly 0'). So if you do this measurement calculation, you could return multiple rows (if you don't specify first or last). I should confirm if you must specify the first or last in the criteria, but assuming you can let multiple measurement records into the query, if you have N number of related measurements that meet the NumericRange criteria, then the index event will yield multiple times from the JOIN statement. I'm wondering if it would be more appropriate to use exists (select 1 from Measurement....) kind of query. Then if you find > 1 measurement in the subquery, the exists will only return the single row (the measurement index) from the root query.

If it is the case that the first/last is requred such that it will only evaluate 1 record from the subquery and therefore never return more than 1 record, then I think the change here would be to add aggregated functions to the co-measurement query to find the AVG, MIN, MAX of the records so you can say things like 'what was the average measurements values compared to this index measurement's value' or 'what was the min value in history compared to the index measurement's value'.

Naming

I'm not sure if MeasurementOperand is a good name for this...I need to think of a name but the terminology here is too technical (I'm not sure your typical phenotype researcher is going to know what an 'Operand' is) I was thinking something like CoMeasurement (don't cite me on this one yet) but I was looking at the object model and wasn't sure about this choice of naming.

Conclusion

This is as far as I got on the review, so there still may be more to come (ie: the above is not a complete review). It would have really helped if the intention of how this functionality was supposted to work and what the tradeoffs / limitations were in implementation so that I would have a better understanding about what to look for and what to accept as limitations (such as it was perfectly fine to not limit associated measurements to time windows because the default behavior is to use same-visit).

Thank you again for your hard work on this. Being able to make assertions about associated or correlated measurements is a powerful feature (such as looking for people who experienced weight gain (at least 30 lbs) could be defined as taking a weight measurement and determing if the min() weight in the past year - index weight > 30). We just need some minimum functionality before I would want to incorporate it into the library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants