General Feedback - FHIR Search: Challenges with Deep Chaining: Tagged Patients pattern #84

Open
bwalsh opened this issue Dec 5, 2024 · 6 comments

bwalsh commented Dec 5, 2024

The Challenge of Deep Chaining in FHIR Searches

FHIR’s RESTful API provides powerful mechanisms like chaining and reverse chaining for searching across interconnected resources. However, deep chaining, that is, searching through multiple levels of relationships (e.g., ResearchStudy->ResearchSubject->Patient->Specimen), often runs into practical and architectural limitations.

The Issue with Deep Chaining

  1. Performance Concerns:

    • Deeply chained searches require traversing multiple resource relationships, which can involve significant database joins or recursive queries. This impacts query performance, especially in large datasets with complex relationships.
  2. Limited Server Support:

    • Not all FHIR servers support chaining beyond one or two levels, leaving users unable to construct queries that navigate through deeply nested relationships.
  3. Ambiguous Results:

    • Complex chaining can result in ambiguous or overly broad results, requiring additional client-side filtering, which negates the efficiency of server-side processing.
  4. Usability Challenges:

    • Querying deeply nested relationships is not intuitive for users. Writing and debugging such queries can be cumbersome, especially without robust documentation or testing tools.

Example Problem

A researcher may want to count all Specimens tied to a ResearchStudy. This involves the following chain:

  • ResearchStudy->ResearchSubject->Patient->Specimen

A direct query like this is not supported by many FHIR servers due to depth limitations.

A Potential Workaround: Tagging Patients

To address this, we can leverage tags or extensions on Patient resources to simplify queries:

  1. Tagging Patients:

    • Tag or extend Patient resources with metadata indicating their association with a specific ResearchStudy.
    • Example: Add a ResearchStudy identifier as a tag or extension to all related Patient resources.
  2. Simplified Query:

    • Instead of deep chaining, filter on the Patient tag with a single forward chain from Specimen (Specimen.subject references the Patient, so _has does not apply in this direction):
      GET [base]/Specimen?subject:Patient._tag=ResearchStudy/[study-identifier]
  3. Benefits:

    • Simplifies queries by reducing chaining depth.
    • Improves performance since fewer resource relationships need to be traversed.
    • Provides flexibility for analytics use cases without requiring extensive server-side changes.
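
A minimal sketch of the tagging idea, assuming a hypothetical tag system URL (`http://example.org/fhir/study-tags`) and resources held as plain dicts. The helper names are illustrative and not part of any FHIR library. The query uses one forward chain from Specimen to its subject Patient and the standard `_tag` token form `system|code`.

```python
from urllib.parse import quote

# Hypothetical tag system, chosen only for illustration.
STUDY_TAG_SYSTEM = "http://example.org/fhir/study-tags"


def tag_patient(patient, study_id):
    """Return a copy of the Patient dict with a study tag in meta.tag."""
    tagged = dict(patient)
    meta = dict(tagged.get("meta", {}))
    tags = list(meta.get("tag", []))
    new_tag = {"system": STUDY_TAG_SYSTEM, "code": study_id}
    if new_tag not in tags:  # avoid duplicate tags on repeated calls
        tags.append(new_tag)
    meta["tag"] = tags
    tagged["meta"] = meta
    return tagged


def specimen_count_url(base, study_id):
    """Build a Specimen count query filtered by the subject Patient's tag."""
    token = quote(f"{STUDY_TAG_SYSTEM}|{study_id}", safe="")
    return f"{base}/Specimen?subject:Patient._tag={token}&_summary=count"
```

Whether a server indexes `_tag` through a chained reference is server-specific, so the URL is worth trying against the target server before relying on it.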

Considerations for Tagging

  • Governance: Establish clear guidelines for tagging to maintain consistency across resources.
  • Scalability: Ensure that tagging does not introduce additional performance overhead, data redundancy or ETL complexity.

By strategically tagging or extending resources, FHIR implementers can address the limitations of deep chaining, enabling efficient and effective data queries while maintaining compliance with the FHIR standard.

bwalsh (Author) commented Dec 5, 2024

Example:

Count a cohort from Patient->ResearchSubject->ResearchStudy

# get count of patients in a study
curl -s $FHIR_BASE'/Patient?_has:ResearchSubject:subject:study.identifier=TCGA-KIRC&_summary=count'
# returns "total": 537

However, this seems to be unsupported: Specimen->Patient->ResearchSubject->ResearchStudy

curl -v -s $FHIR_BASE'/Specimen?subject=Patient:_has:ResearchSubject:subject:study.identifier=TCGA-KIRC&_total=accurate'
# returns "total": 0
# no errors or warnings in server logs

Your mileage may vary, but I see many examples in the spec for queries of the form:

[parameter]=[type]/[id]

However, I can't see any examples in the spec for:

[parameter]=[type]:_has
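
For comparison, a small sketch of the two query shapes as strings. The second function uses the dot-chained form (`subject:Patient._has:...`), which follows one reading of the spec's reverse-chaining section, where `_has` is attached to a chained reference with a dot rather than with `=[type]:_has`. That reading, and whether a given server supports it, are assumptions here and were not tested in this thread. The function names are illustrative.

```python
from urllib.parse import quote


def cohort_count_url(base, study_id):
    """Patients in a study via reverse chaining (the form that returned a count)."""
    sid = quote(study_id, safe="")
    return (
        f"{base}/Patient?_has:ResearchSubject:subject:study.identifier={sid}"
        "&_summary=count"
    )


def specimen_chain_url(base, study_id):
    """Specimens whose subject Patient is in the study (assumed dot-chained form)."""
    sid = quote(study_id, safe="")
    return (
        f"{base}/Specimen?subject:Patient._has:ResearchSubject:subject:"
        f"study.identifier={sid}&_total=accurate"
    )
```

The only structural difference from the failing query is `subject:Patient.` in place of `subject=Patient:`.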

bwalsh changed the title from "General Feedback - FHIR Search: Challenges with Deep Chaining: Tag Patients" to "General Feedback - FHIR Search: Challenges with Deep Chaining: Tagged Patients pattern" on Dec 5, 2024
bwalsh self-assigned this on Dec 5, 2024

bwalsh (Author) commented Dec 5, 2024

@teslajoy Can you review and comment?
@RobertJCarroll FYI

teslajoy (Collaborator) commented Dec 6, 2024

LGTM 👍 confirming, don't see :_has

RadixSeven commented

Tags are a poor choice because of their odd update semantics

An extension and a supported search parameter are better for linking to other resources than tags. This is because update operations for tags do not allow removing tags by default. Instead, the result is the union of old and new tags. I have also heard of servers that resurrect tags on a resource when it is deleted and re-created. The same problem exists for security labels. This is rarely an issue since security labels seldom change. Still, it is serious if you need to change them since it has implications for maintaining subject consent and privacy requirements.

If you have access to your server's source code, the standard allows you to deviate from this default behavior, but we wouldn't want to limit the NCPI IG only to servers that users could customize in this way.

Use extensions

Instead, I would want custom extensions. For example, one experimental internal system we set up has a "partOfStudy" extension of type "Reference(ResearchStudy)" and an associated search parameter. This can be searched directly or can be used for chaining. We put this on all resources derived from a particular study. This would reduce the example query to {base}/Specimen?part-of-study=ResearchStudy/TCGA-KIRC&_total=accurate - requiring no joins.
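
A rough sketch of that pattern on plain dict resources. The extension URL below is a placeholder (the internal system's actual URL is not given here), and the helper names are illustrative. Unlike tags, the extension value is simply replaced on update.

```python
# Placeholder URL; the real extension URL is not stated in this thread.
PART_OF_STUDY_URL = "http://example.org/fhir/StructureDefinition/partOfStudy"


def set_part_of_study(resource, study_id):
    """Return a copy of the resource with exactly one partOfStudy extension."""
    updated = dict(resource)
    # Drop any earlier partOfStudy value so an update replaces it.
    extensions = [
        ext for ext in resource.get("extension", [])
        if ext.get("url") != PART_OF_STUDY_URL
    ]
    extensions.append({
        "url": PART_OF_STUDY_URL,
        "valueReference": {"reference": f"ResearchStudy/{study_id}"},
    })
    updated["extension"] = extensions
    return updated


def part_of_study_url(base, resource_type, study_id):
    """Build the direct search, assuming a part-of-study search parameter exists."""
    return f"{base}/{resource_type}?part-of-study=ResearchStudy/{study_id}&_total=accurate"
```

The same helper can be applied to every resource type derived from the study, which is what keeps the search free of joins.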

Governance

Using extensions also helps ensure good governance over the available "tags" - since we'd define an extension for each one. This is something commonly specified in an IG, so users would know exactly where to look and whether the IG handles a case or not. However, governance is not unique to this solution. One can also limit tags in an IG by sub-classing the Meta element and using the appropriate subclass in your IG-specialized resources.

A note on chaining

Some servers have limited reverse chaining (_has) and forward chaining (.) implementations. In particular, Google Cloud Healthcare limits the number of results to 100 joined resources. So, we should put less weight on chaining for essential use cases.
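
Where chaining is capped, one fallback is to split the query on the client. The sketch below assumes the Patient IDs for the study have already been collected by some other means, and only builds batched Specimen count URLs using comma-separated (OR) reference values. The function name and batch size are illustrative.

```python
def batched_specimen_urls(base, patient_ids, batch_size=50):
    """Build one Specimen count URL per batch of Patient IDs."""
    urls = []
    for start in range(0, len(patient_ids), batch_size):
        batch = patient_ids[start:start + batch_size]
        # Comma-separated values are OR-ed within a single search parameter.
        refs = ",".join(f"Patient/{pid}" for pid in batch)
        urls.append(f"{base}/Specimen?subject={refs}&_summary=count")
    return urls
```

Because each Specimen has a single subject, the per-batch totals can be summed without double counting, provided the ID list has no duplicates.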

bwalsh (Author) commented Dec 10, 2024

> This would reduce the example query to {base}/Specimen?part-of-study=ResearchStudy/TCGA-KIRC&_total=accurate - requiring no joins

@RadixSeven Thanks Eric. Agreed, you had me at "requiring no joins." Anything that will simplify querying for downstream analysts.

Question: When you set this up, did you need to create a SearchParameter?

Something like:

{
  "resourceType": "SearchParameter",
  "id": "patient-partOfStudy",
  "url": "http://example.org/fhir/SearchParameter/patient-partOfStudy",
  "version": "1.0.0",
  "name": "PartOfStudy",
  "status": "active",
  "publisher": "Example Organization",
  "description": "Search for Patients who are part of a specific ResearchStudy.",
  "code": "partOfStudy",
  "base": ["Patient"],
  "type": "reference",
  "expression": "Patient.extension.where(url='http://example.org/fhir/StructureDefinition/patient-partOfStudy').valueReference",
  "target": ["ResearchStudy"]
}
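
To make the intent of that draft concrete, here is a plain-Python stand-in for what the expression would select from a Patient, plus the search URL implied by the `code` value. This is only an approximation of the FHIRPath, not a FHIRPath engine, and the function names are illustrative.

```python
# Values copied from the draft SearchParameter above.
EXT_URL = "http://example.org/fhir/StructureDefinition/patient-partOfStudy"
SEARCH_PARAM = {"code": "partOfStudy", "base": ["Patient"], "target": ["ResearchStudy"]}


def indexed_references(patient):
    """Approximate Patient.extension.where(url=...).valueReference on a dict."""
    return [
        ext["valueReference"]["reference"]
        for ext in patient.get("extension", [])
        if ext.get("url") == EXT_URL and "valueReference" in ext
    ]


def search_url(base, study_id):
    """Build the search that the draft parameter's code would enable."""
    resource_type = SEARCH_PARAM["base"][0]
    return f"{base}/{resource_type}?{SEARCH_PARAM['code']}=ResearchStudy/{study_id}"
```

With this draft, the query parameter would be `partOfStudy` on Patient only; matching the `part-of-study` name used on Specimen earlier would need a different `code` and `base`.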

RadixSeven commented Dec 11, 2024

> Question: When you set this up, did you need to create a SearchParameter?

Yes. We created several SearchParameter resources. If you'd like more details, I'll need to check with the developer who did the work.
