Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(external_data_files): load data from other YAML files #1880

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

yajo
Copy link
Member

@yajo yajo commented Dec 4, 2024

When composing templates, it's often needed to be able to load answers from other templates that you know are usually combined with yours. Or any other kind of external data.

@moduon MT-8282

@yajo yajo marked this pull request as ready for review December 10, 2024 12:18
@yajo
Copy link
Member Author

yajo commented Dec 10, 2024

@copier-org/maintainers this one was a bit hard to get right, but I think it's OK now.

The whole point of this PR is to open a new venue for template composability. Now you can have a template that reads information from answers written for other templates. Still, both templates can stay independent. But if joined, they can have some extra features.

For example, some default values from one template can depend on the answers for another template. And others' answers can be accessed in current template.

Copy link

codecov bot commented Dec 10, 2024

Codecov Report

Attention: Patch coverage is 95.38462% with 3 lines in your changes missing coverage. Please review.

Project coverage is 97.81%. Comparing base (507cab3) to head (a586b28).
Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
copier/user_data.py 81.81% 2 Missing ⚠️
copier/main.py 95.83% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1880      +/-   ##
==========================================
+ Coverage   97.76%   97.81%   +0.05%     
==========================================
  Files          49       49              
  Lines        5232     5267      +35     
==========================================
+ Hits         5115     5152      +37     
+ Misses        117      115       -2     
Flag Coverage Δ
unittests 97.81% <95.38%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@yajo yajo requested a review from sisp December 10, 2024 12:36
Copy link
Member

@sisp sisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before reviewing the implementation in detail, would you mind discussing the design first?

Even prior to this PR, multiple Copier templates could be applied to the same subproject. But each template was applied completely independently, and when some templates had overlapping questions that should be answered the same for consistency, then it was the user's responsibility to keep the answers in sync. The way I understand this PR, it aims to improve the UX in this regard by allowing a child template to load external data (e.g., answers from an answers file from a previously applied parent template) and, e.g., use them as default values for questions in the child template, facilitating the synchronization of semantically identical questions across a parent and child template.

That said, I have some doubts about the current design:

  • A child template makes assumptions about the answers structure of a parent template. Although there is a graceful fallback when, e.g., the parent template's answers file is not found, the child template's external data loading is tailored to the parent template. This limits template composition with enhanced UX to composing the child template with a specific parent template.
  • Enhanced UX for template composition relies on the order in which templates are applied. A child template that reuses answers from a parent template must be applied after the parent template has been applied.
  • Although answers from a parent template may be used as default values for questions in a child template, the questions are asked nevertheless and may be answered differently, which may lead to inconsistencies.

I have been thinking about a similar feature myself but with a different design, inspired by how, e.g., React components are composed.

Instead of coupling the child template with the parent template, how about adding a way for a parent template to configure child templates to be applied? The new parent template may have a copier.yml file like this:

_templates:
  # A Copier template for generating a Python project using
  # Poetry as package manager
  https://git.example.com/copier-python.git:
    _ref: v1.2.3 # optional but highly recommended
    project_name: "{{ project_name }}"
    python_versions: "{{ python_versions }}"

  # A Copier template for generating GitHub Actions configs for
  # a Python project with support for several package managers
  # including Poetry
  https://git.example.com/copier-python-github-actions.git:
    _ref: v2.3.4 # optional but highly recommended
    python_versions: "{{ python_versions }}"
    package_manager: poetry # hardcoded because `copier-python` uses Poetry

# This Copier template must define all questions needed for itself
# and all child templates.

project_name:
  type: str
  help: The name of the Python project

python_versions:
  type: str
  help: The supported Python versions
  choices: &python_versions
    - "3.9"
    - "3.10"
    - "3.11"
    - "3.12"
    - "3.13"
  default: *python_versions
  multiselect: true

The answers of this template are recorded in its answers file, and any answers files that are rendered by the child templates are omitted because they aren't needed.

I see a few advantages of this design:

  • There is hardly any coupling between child templates, as the parent template defines the exposed questionnaire, passes the answers down to the child templates as needed, and orchestrates their application.
  • There is no order to remember in which templates must be applied because the parent template orchestrates the application of the child templates.
  • There might even be a way to handle multi-template merge conflicts (during both "copy" and "update" operations) via some kind of N-way merge (just dreaming, I have no proof of concept).
  • There could even be a hierarchy of child templates. It's important to note that this hierarchy is different than what we have discussed in Updatable templates for template hierarchies and reuse #934.

But I also see a few disadvantages of this design:

  • Composing templates requires one additional template that acts as the root/parent template to orchestrate the child templates' applications. So, ad-hoc template composition is not supported – at least I don't see how it would be possible.
  • Some questions from the child templates need to be (approximately) duplicated in the root/parent template. While this sounds redundant and cumbersome, it's the same situation that, e.g., React component developers are facing. I think there's no way around it.

WDYT, @yajo?

@yajo
Copy link
Member Author

yajo commented Dec 12, 2024 via email

Copy link
Member

@sisp sisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarifications, this was quite helpful. Indeed, template composition as shown is only one example of the feature; it's actually more general – and lightweight, which is nice.

I've left a few remarks related to documentation and license poisoning.

Now, thinking of your proposal, if you really need that, can't you do it
already? Just by using a clever combination of git submodules, !include
tags, and symlinks in a parent template that includes 2 child ones... my
intuition is that it'll work out-of-the-box right now.

Almost, I believe. Mapping answers in the parent template to questions in a child template seems to be missing though. When using !include, the questionnaire of a child template is simply merged with the parent template, which is not the same as passing down answers in the parent template as data to the child template. Also, !include doesn't offer encapsulation of child templates, e.g. templates with different Jinja settings wouldn't be properly supported.

But as you said, the feature of this PR does not block or prevent my suggested design from becoming available in the future. 👍

docs/configuring.md Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
@hparfr
Copy link
Contributor

hparfr commented Dec 17, 2024

It's not really clear for me why should I use this instead of --data-file argument

Copy link
Contributor

@hparfr hparfr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's _external_data_files but the key in the template is different "{{ _ext.parent1.name }}"
may be use the same term to be more clear like : "{{ _external_data_file.parent1.name}}"

When composing templates, it's often needed to be able to load answers from other templates that you know are usually combined with yours. Or any other kind of external data.

@moduon MT-8282
@yajo
Copy link
Member Author

yajo commented Dec 30, 2024

All attended. Could you please review again?

@yajo yajo requested a review from sisp December 30, 2024 10:20
Copy link
Member

@sisp sisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍, just a minor remark regarding type hints.

@@ -329,6 +329,14 @@ def exclude(self) -> tuple[str, ...]:
)
)

@cached_property
def external_data(self) -> Dict[str, str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can simply use dict[str, str] here for the return type and omit the import above:

Suggested change
def external_data(self) -> Dict[str, str]:
def external_data(self) -> dict[str, str]:

Copy link
Member Author

@yajo yajo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix

@pawamoy
Copy link
Contributor

pawamoy commented Jan 18, 2025

Since you removed DEFAULT_DATA it will also fix #1929 🙂

Copy link
Contributor

@pawamoy pawamoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I simply second @hparfr here:

It's not really clear for me why should I use this instead of --data-file argument

A small note could be added in each to make the distinction. My own, quick wording:

The CLI argument --data-file is not the same thing as the template configuration external_data: the former lets you provide a dictionary of answers matching a template questionnaire as a user, while the latter lets you enhance the Jinja rendering context of your template as a template writer.

Something like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants