Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for datetime calculations in cohort definitions #200

Open
wants to merge 27 commits into
base: master
Choose a base branch
from

Conversation

jcnamendiOdysseus
Copy link

No description provided.

@jcnamendiOdysseus jcnamendiOdysseus self-assigned this Dec 21, 2023
@chrisknoll chrisknoll marked this pull request as draft December 21, 2023 15:09
Comment on lines 32 to 38
private static final Map<String, Integer> TIME_UNIT_CONVERSION = new HashMap<>();
static {
TIME_UNIT_CONVERSION.put(IntervalUnit.HOUR.getName(), 60 * 60);
TIME_UNIT_CONVERSION.put(IntervalUnit.MINUTE.getName(), 60);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't do it this way, it makes it seem like it's dynamic (ie: can change at runtime). Can't this function of conversion be associated to the enum itself?

@alex-odysseus
Copy link

Chris, please don't go deep into this pull request as it is a draft and incomplete @chrisknoll

@chrisknoll
Copy link
Collaborator

Will be easier to point out issues earlier instead of sifting through potentially thousands of lines of changes.

@jcnamendiOdysseus jcnamendiOdysseus force-pushed the issues-2886 branch 4 times, most recently from 5098a3a to dfa05f5 Compare January 4, 2024 14:20
.editorconfig Outdated Show resolved Hide resolved
Comment on lines 110 to 118
int range1 , range2Start , range2End;
if(Objects.nonNull(window.start) && Objects.nonNull(window.start.days)){
range1 = filter.postDays + filter.priorDays;
range2Start = window.start.coeff * window.start.days;
range2End = Objects.nonNull(window.end) && Objects.nonNull(window.end.days) ? window.end.coeff * window.end.days : 0;
}else {
range1 = (filter.postDays + filter.priorDays) * 24 * 60 * 60;
range2Start = getTimeInSeconds(window.start);
range2End = getTimeInSeconds(window.end);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the 4sp indent is glaring, so before you have to go back and change all of it back to 2sp, i'd adjust your formatting.

Comment on lines 3 to 8
### ${conceptSet.name}
<#list conceptSet.expression.items>
|Concept ID|Concept Name|Code|Vocabulary|Excluded|Descendants|Mapped
|:---|:---------------------------------------|:--|:-----|:--:|:--:|:--:|
<#items as conceptSetItem>
|${conceptSetItem.concept.conceptId?c}|<#--
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want to be extremly careful with changing the formatting of the Freemarker templates (ftl files) that are building the markdown format, since these are very sensitive to whitespacing. Unless you need to change this, I would not touch any of these.

# Conflicts:
#	src/main/java/org/ohdsi/circe/check/checkers/Comparisons.java
#	src/main/java/org/ohdsi/circe/check/checkers/ExitCriteriaDaysOffsetCheck.java
#	src/main/java/org/ohdsi/circe/check/checkers/FirstTimeInHistoryCheck.java
#	src/main/java/org/ohdsi/circe/check/checkers/RangeCheck.java
#	src/main/java/org/ohdsi/circe/cohortdefinition/CohortExpressionQueryBuilder.java
#	src/main/java/org/ohdsi/circe/cohortdefinition/CollapseSettings.java
#	src/main/java/org/ohdsi/circe/cohortdefinition/CustomEraStrategy.java
#	src/main/java/org/ohdsi/circe/cohortdefinition/DateOffsetStrategy.java
#	src/main/java/org/ohdsi/circe/cohortdefinition/IntervalUnit.java
#	src/main/java/org/ohdsi/circe/cohortdefinition/Window.java
#	src/main/resources/resources/cohortdefinition/printfriendly/cohortExpression.ftl
#	src/main/resources/resources/cohortdefinition/printfriendly/criteriaTypes.ftl
#	src/main/resources/resources/cohortdefinition/printfriendly/inputTypes.ftl
#	src/main/resources/resources/cohortdefinition/sql/customEraStrategy.sql
#	src/main/resources/resources/cohortdefinition/sql/dateOffsetStrategy.sql
#	src/main/resources/resources/cohortdefinition/sql/generateCohort.sql
#	src/test/java/org/ohdsi/circe/check/checkers/CriteriaCheckValueTest.java
#	src/test/java/org/ohdsi/circe/check/checkers/DeathTimeWindowCheckTest.java
#	src/test/java/org/ohdsi/circe/cohortdefinition/builders/CriteriaUtils.java
#	src/test/java/org/ohdsi/circe/cohortdefinition/printfriendly/PrintFriendlyTest.java
@jcnamendiOdysseus jcnamendiOdysseus marked this pull request as ready for review January 12, 2024 07:38
jcnamendiOdysseus and others added 4 commits January 17, 2024 09:47
# Conflicts:
#	src/main/java/org/ohdsi/circe/check/checkers/Comparisons.java
#	src/main/java/org/ohdsi/circe/check/checkers/ExitCriteriaDaysOffsetCheck.java
#	src/main/java/org/ohdsi/circe/check/checkers/FirstTimeInHistoryCheck.java
#	src/main/java/org/ohdsi/circe/check/checkers/RangeCheck.java
#	src/main/java/org/ohdsi/circe/cohortdefinition/CohortExpressionQueryBuilder.java
#	src/main/java/org/ohdsi/circe/cohortdefinition/CollapseSettings.java
#	src/main/java/org/ohdsi/circe/cohortdefinition/CustomEraStrategy.java
#	src/main/java/org/ohdsi/circe/cohortdefinition/DateOffsetStrategy.java
#	src/main/java/org/ohdsi/circe/cohortdefinition/IntervalUnit.java
#	src/main/java/org/ohdsi/circe/cohortdefinition/Window.java
#	src/main/resources/resources/cohortdefinition/printfriendly/cohortExpression.ftl
#	src/main/resources/resources/cohortdefinition/printfriendly/criteriaTypes.ftl
#	src/main/resources/resources/cohortdefinition/printfriendly/inputTypes.ftl
#	src/main/resources/resources/cohortdefinition/sql/customEraStrategy.sql
#	src/main/resources/resources/cohortdefinition/sql/dateOffsetStrategy.sql
#	src/main/resources/resources/cohortdefinition/sql/generateCohort.sql
#	src/test/java/org/ohdsi/circe/check/checkers/CriteriaCheckValueTest.java
#	src/test/java/org/ohdsi/circe/check/checkers/DeathTimeWindowCheckTest.java
#	src/test/java/org/ohdsi/circe/cohortdefinition/builders/CriteriaUtils.java
#	src/test/java/org/ohdsi/circe/cohortdefinition/printfriendly/PrintFriendlyTest.java
…le resolving the SELECT clause in SQL builders
return range1 - (range2End - range2Start);
public static int compareTo(ObservationFilter filter, Window window) {
int range1, range2Start, range2End;
if (Objects.nonNull(window.start) && (Objects.nonNull(window.start.days) || IntervalUnit.DAY.getName().equals(window.start.timeUnit))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This produces an NPE when window.start is not set and timeUnit is a DAY.

…ntation of the datediff() function.

- Modify the cdm_v5.0.sql file to incorporate additional datetime fields.
- Include timeIntervalUnit parameters to handle various scenarios such as hours/minutes/seconds.
- Write tests to validate the functionality under different time interval units, including hours/minutes/seconds.
@TomWhite-MedStar
Copy link

TomWhite-MedStar commented Feb 15, 2024

@chrisknoll , if this pull request is related to OHDSI/Atlas#2886, I'm looking for advice on getting past a hurdle. the GUI looks good, and when one explicitly specifies hour or minute for a widget, the generated code looks correct for that widget. However, if there is a single reference to datetime in the JSON, then all dates that can be converted to datetime must be.

There are two areas of challenge in the current efforts to implement this:

  1. Identifying that datetime is used somewhere within the cohort logic
  2. Ensuring that all references to date are converted to datetime (for the parts of the CDM that support datetime).

I think this can be handled by post-processing the SQL generated by circe-be:

The following python function addresses both steps:

def Fix_OHDSI_sql_if_uses_datetime(ohdsi_sql):
    import re

    # Test whether generate SQL uses datetime.  Another option is to query the cohort JSON definition for TimeUnit of hour|minute|second
    if (re.search(r'_datetime', ohdsi_sql)):
        # Convert all references to _date to _datetime, unless CDM does not support it (e.g. for observation_period and cohort)
        step1 = re.sub(r'(?i)(?<!op_)(?<!observation_period_)(start|end|event|sort)_date\b',r'\1_datetime',ohdsi_sql)
        # Cohort final table will have datetime, but cohort table only supports date.
        # The below substitution assumes that step1 has already occured, thus the reference to datetime in the first parameter to re.sub
        step2 = re.sub(r'as cohort_definition_id, person_id, start_datetime, end_datetime',r'as cohort_definition_id, person_id, cast(start_datetime as date), cast(end_datetime as date)', step1)
        return step2
    else:
        return ohdsi_sql

If we find that the generated SQL doesn't always have a datetime reference, another option is to use regular expressions to search the JSON definition for "TimeUnit" of "hour|minute|second".

Can the circe-be function that does the SQL template generation be amended (or intercepted) to call this additional function (or its Java equivalent) before the final String is returned to the requestor? From there, SqlRender should work fine on the syntax-specific translation.

@chrisknoll
Copy link
Collaborator

chrisknoll commented Feb 15, 2024

Hi, @TomWhite-MedStar ,

I haven't dug deeply into this, as there's missing information in the PR description about what the changes are (in moderate detail), but currently has nothing yet. Second issue is the entire notion of minute/second granularity in the cohort result, it is not clear what the impact on the external tools will be if there is a unit of time other than day.

Also, looks like this has some conflicts based on recent PRs that have been merged into master.

As far as using this for your own purposes, a branch or fork approach would work for you to use this in your own environment if there is that need.

As you might imagine that this cohort definition work is central to a lot of the OHDSI tool stack, we must be extremely careful what gets put into a formal release. On the forums, I've observed support that it's a good idea (hour/minute granularity), but beyond that there are technical considerations (such as: sometimes the result query will put _datetime as an output column other times will be simply _date) that haven't been vetted, but we're making the decision on how that should work here.

So for starters, I think the PR author should include details on what the intended changes are that are being introduced in this PR (so that we have something to cross-check between the 'intended change' vs. the actual change in the 'lines of code changed'). This is important because we don't want to be left to reverse-engineer the lines of code that were changed to infer the intent of the author...it should be declared, and then we can judge the code changes based on that declared intent.

On small changes, it's pretty straight forward to declare what the intended changes of a PR is. On something larger like this, we have new test cases, enums, date calculations (I think I saw in some cases that we intended to resolve the time diffs to seconds, but in certain places I see we're accounting for sometimes hours, sometimes minutes, sometimes seconds), so I just would like to know what I should be expecting when looking through the code changes before digesting all the lines of code changes.

Re: your approach to fix sql to date or date_time via string replace
I'm not a fan of this approach, I don't know if we can assume that date-time cols are always _datetime I am thinking that it may be a cohort expression param that you specify the granularity of time for the output such that if you want it by hour, then you query all the entry events at the hour level, vs some other granularity. This is definitely something that should be thought about at the design level, but we sorta jumped right into the implementation so there wasn't much time to talk about...but we have this implementation as a 'frame of reference' that we could raise technical issues, so that is also a fine perspective to reason from.

@TomWhite-MedStar
Copy link

@chrisknoll , thanks for the lengthy reply. I am presuming, but not sure, that this PR might be for work that Odysseus is doing for me at MedStar. If so, I'm trying to get past a hurdle in that work. They have working for months, and finding that the source code is very complex - so I'm wanting to brainstorm whether there might be post-processing approach that is easier to implement and maintain.

@chrisknoll
Copy link
Collaborator

Thanks for your understanding, @TomWhite-MedStar . Although the codebase may be complicated, at the core it's a string-builder plus logic validator (through the checkers). This means that, at the core, it's assembling the string through the various builder functions. Introducing a notion of building the final string through a process of find/replaces might seem straightforward from an implementation perspective, but out of place from a functional perspective.

So, lets get the description of what's in the PR in the description of this PR, resolve the conflicts, evaluate if the code is doing what it intended, and then we can make implementation recommendations based on findings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New Feature: Support for datetime calculations in cohort definitions
5 participants