Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move away from orConditionGroup to UNION-ized queries #1068

Draft
wants to merge 8 commits into
base: 2.x
Choose a base branch
from

Conversation

adam-vessey
Copy link

GitHub Issue: (link)

  • Other Relevant Links (Google Groups discussion, related pull requests,
    Release pull requests, etc.)

What does this Pull Request do?

Issues reporting slowness in queries that was being patched by effectively avoiding the orConditionGroup, so, it seems we want to avoid them. At the same time, submitting the queries separately can be avoided by constructing the set of results by UNION-ing all the queries together so, let's make it so.

What's new?

A in-depth description of the changes made by this PR. Technical details and
possible side effects.

  • Changes x feature to such that y
  • Added x
  • Removed y
  • Does this change add any new dependencies?
  • Does this change require any other modifications to be made to the repository
    (i.e. Regeneration activity, etc.)?
  • Could this change impact execution of existing code?

How should this be tested?

A description of what steps someone could take to:

  • Reproduce the problem you are fixing (if applicable)
  • Test that the Pull Request does what is intended.
  • Please be as detailed as possible.
  • Good testing instructions help get your PR completed faster.

Documentation Status

  • Does this change existing behaviour that's currently documented?
  • Does this change require new pages or sections of documentation?
  • Who does this need to be documented for?
  • Associated documentation pull request(s): ___ or documentation issue ___

Additional Notes:

Any additional information that you think would be helpful when reviewing this
PR.

Interested parties

Tag (@ mention) interested parties or, if unsure, @Islandora/committers

Was looking at other callers of the original method; however, missed
that the caller as a DGI-specific module. No need to roll such a method
in the core code... not that it would really hurt per-se, just not
wanting to add method for it at the moment.
Also, handle other return-types.
@joecorall joecorall closed this Jan 22, 2025
@joecorall joecorall reopened this Jan 22, 2025
@joecorall
Copy link
Member

I tested this PR locally and 100 calls to getTermForUri is still taking over 10s

time drush scr scripts/1067.php 
real    0m 12.29s
user    0m 0.56s
sys     0m 0.35s

for getTermForUri this PR creates a query like

SELECT base_table.revision_id AS revision_id, base_table.tid AS tid
FROM
taxonomy_term_data base_table
INNER JOIN taxonomy_term_field_data taxonomy_term_field_data ON taxonomy_term_field_data.tid = base_table.tid
WHERE taxonomy_term_field_data.tid IN (SELECT f.entity_id AS entity_id
FROM
taxonomy_term__field_authority_link f
WHERE f.field_authority_link_uri = 'http://purl.org/coar/resource_type/c_18cc' UNION SELECT f.entity_id AS entity_id
FROM
taxonomy_term__field_external_uri f
WHERE f.field_external_uri_uri = 'http://purl.org/coar/resource_type/c_18cc')
GROUP BY base_table.revision_id, base_table.tid
LIMIT 1 OFFSET 0;

Which will have the same result of running N number of queries, where N is the number of external URI fields in the site config. Which was the original issue with #1067 (comment)

Just yeah, I find that I'm really not a fan of the idea of starting to make arbitrarily many queries as more fields are introduced

@adam-vessey
Copy link
Author

Which will have the same result of running N number of queries, where N is the number of external URI fields in the site config.

Unsure what you're trying to say. The same result is expected; however, it's the actual execution of the query with which I'm concerned. Like, sure there will be N parts under the UNION; however, the query as a whole is submitted to the server as a single query, with only a single pass of the query tagging. Building and tagging multiple queries, if there were actually other conditions being added for access control or whatever could start to overshadow.

That said, I am unable to see the same query performance you're seeing between any of:

  • our internal envs (though we use PostgreSQL)
  • my ddev branch of islandora-starter-site (https://github.com/adam-vessey/islandora-starter-site/tree/feature/ddev; should be able to use with a ddev delete -Oy ; ddev full-install... IIRC, there's presently issues generating derivatives, but it uses MariaDB, more in-line with what would be running elsewhere, possibly?)
    adam@compy:~/repos/islandora-starter-site$ git log
    commit 6449ba063b86ff19e4fe2496a90ed32796d08289 (HEAD -> feature/ddev, me/feature/ddev)
    Merge: a3f782f 9bf6d90
    Author: Adam Vessey <...>
    Date:   Wed Dec 11 14:38:26 2024 -0400
    [...]
    adam@compy:~/repos/islandora-starter-site$ ddev delete -Oy ; ddev full-install
    [...]
    adam@compy:~/repos/islandora-starter-site$ /usr/bin/time ddev drush php:eval "
    \$i = 0;
    while (++\$i < 100) {
      \$term = \Drupal::service('islandora.utils')->getTermForUri('http://purl.org/coar/resource_type/c_18cc');
      if (\$term === NULL) {
        print 'Failed to query term
    ';
        exit(1);
      }
    }
    print 'Finished
    ';
    "
    Finished
            0.86 real         0.08 user         0.12 sys
    adam@compy:~/repos/islandora-starter-site$
    
  • a fresh isle-dc env (@ origin/development, commit current at Islandora-Devops/isle-dc@57bbd3f ): (adding in print statements for paranoia, just to be sure results were being returned)
    adam@DockerBusiness:~/repos/isle-dc$ git log -n1
    commit 57bbd3f21da96385243bda364322ec26b59bb176 (HEAD -> development, origin/development, origin/HEAD)
    Author: Don Richards <[email protected]>
    Date:   Tue Dec 10 18:55:55 2024 -0500
    [...]
    adam@DockerBusiness:~/repos/isle-dc$ make starter
    [...]
    make[2]: Leaving directory '/home/adam/repos/isle-dc'
    make wait-for-drupal-locally
    make[2]: Entering directory '/home/adam/repos/isle-dc'
    make[2]: Leaving directory '/home/adam/repos/isle-dc'
    make[1]: Leaving directory '/home/adam/repos/isle-dc'
    adam@DockerBusiness:~/repos/isle-dc$ docker exec -it isle-dc-drupal-1 /bin/bash
    27eed0f0f257:/var/www/drupal# /usr/bin/time drush php:eval "
    \$i = 0;
    while (++\$i < 100) {
      \$term = \Drupal::service('islandora.utils')->getTermForUri('http://purl.org/coar/resource_type/c_18cc');
      if (\$term === NULL) {
        print 'Failed to query term
    ';
        exit(1);
      }
    }
    print 'Finished
    ';
    "
    Finished
    real	0m 0.56s
    user	0m 0.24s
    sys	0m 0.09s
    27eed0f0f257:/var/www/drupal#
    exit
    adam@DockerBusiness:~/repos/isle-dc$
    

The islandora-starter-site and isle-dc tests timings here were using the current (non-UNION) code.

@joecorall
Copy link
Member

joecorall commented Jan 23, 2025

however, it's the actual execution of the query with which I'm concerned. ... Building and tagging multiple queries, if there were actually other conditions being added for access control or whatever could start to overshadow.

Oh, I see. Yeah, that's reasonable. I thought you were concered about running N number of queries vs just one in a general sense. Yeah, makes sense since Drupal can tack on a lot of extras on a per-query basis.


I am unable to see the same query performance you're seeing between any of:

With this script

<?php

$i = 0;
while (++$i < 100) {
  $term = \Drupal::service('islandora.utils')->getTermForUri("http://purl.org/coar/resource_type/c_18cc");
}

on a fresh install I get

That's because there's pretty much nothing in taxonomy_term_data and taxonomy_term_field_data. I was testing this against a repo with 105K taxonomy terms.

On a fresh isle-site-template install I imported taxonomy_term_* from my prod database and ran the drush script again and got


So I think we still have a performance issue with this PR as-is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants