-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve query speed for IslandoraUtils::getTermForUri() #1067
base: 2.x
Are you sure you want to change the base?
Conversation
foreach ($fields as $field) { | ||
$orGroup->condition("$field.uri", $uri); | ||
} | ||
|
||
$results = $query | ||
->accessCheck(TRUE) | ||
->condition($orGroup) | ||
->execute(); | ||
|
||
if (empty($results)) { | ||
return NULL; | ||
$query = $storage->getQuery(); | ||
$results = $query | ||
->accessCheck(TRUE) | ||
->condition("$field.uri", $uri) | ||
->execute(); | ||
if (!empty($results)) { | ||
return $storage->load(reset($results)); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is a general solution to the issue? Sure, it might be faster if it happens to try querying with a field containing the URI earlier; however, if the field happens to be last (i.e, EXTERNAL_URI_FIELD
, with (m)any other populated authority_link
fields) , it would still have to perform all the other queries first? Would be entirely up to the given sites configuration and content?
I feel like the mention of improving the indexing of uri
values was very much overlooked (as originally mentioned in #997 (comment)); Only the first 30 characters are indexed by default. With this default configuration, longer values cannot be uniquely identified by the index, probably falling back to table scans.
Another potential easy optimization might be to provide a limit to the query? If we only expect one value (and it is all we plan to use), the DB engine might be able to return early after it finds a matching value. Vague thoughts about avoiding sorts (which would require the full result set in order to calculate the ordering) or possibly reworking to use UNION
queries across the fields; however, unsure if it is really necessary to go that far.
Really, the indexing is probably the big thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it might be faster if it happens to try querying with a field containing the URI earlier; however, if the field happens to be last
FWIW in my particular example case it is the second table that gets called in my little test script that has the value. I'd suspect whatever N is (N being the number of EXTERNAL_URI_FIELD fields on the site) will grow the query speed with the current implementation or the proposal in this PR. Tested this by manually adding 8 new authority field and timing it.
N=2
Current implementation
time drush scr scripts/1067.php
real 0m 10.39s
user 0m 0.54s
sys 0m 0.26s
This PR
time drush scr scripts/1067.php
real 0m 1.18s
user 0m 0.52s
sys 0m 0.25s
N=10
Current implementation
time drush scr scripts/1067.php
real 0m 26.47s
user 0m 0.57s
sys 0m 0.24s
This PR
(confirmed the found result was in the last table out of 10 in the iteration)
time drush scr scripts/1067.php
real 0m 1.19s
user 0m 0.57s
sys 0m 0.18s
So we actually saw worse performance as N grew with the current implementation.
I feel like the mention of improving the indexing of uri values was very much overlooked
I tried manually adding indexes to the two tables that get called with this function on my site and it had no effect on improving the query speed
e.g.
CREATE TABLE `taxonomy_term__field_external_uri` (
`bundle` varchar(128) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT '' COMMENT 'The
...
...
KEY `field_external_uri_uri_2` (`field_external_uri_uri`(768))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci COMMENT='Data storage for taxonomy_term field field_external_uri.';
time drush scr scripts/1067.php
real 0m 10.39s
user 0m 0.54s
sys 0m 0.26s
Also tried adding ->range(0, 1)
to the entity query with no effect to query speed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... things might be more from the left joins used by the query builder to make things the or-group happen. The UNION
approach would allow to move away from them, but isn't supported by Drupal "entity query" mechanism, so would be somewhat of a bigger refactor with different access checking.
Just yeah, I find that I'm really not a fan of the idea of starting to make arbitrarily many queries as more fields are introduced. Might end up looking something like: 2.x...adam-vessey:islandora:fix/union-query ? Will probably give this a bit more of a look next week... not seeing the same slowness here, but unsure if due to different DB or fields or what.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I don't have an opinion on the implementation details as long as it makes this hot path faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Continuation of #997
Closes Islandora/documentation#2272
What does this Pull Request do?
Improve performance for
IslandoraUtils::getTermForUri()
How should this be tested?
Made a simple drush script
Then ran it and it took 17s to complete
$ time drush scr scripts/1067.php real 0m 17.46s user 0m 0.52s sys 0m 0.38s
Applied this PR and ran again and execution dropped to 1s
It's definitely the query here causing the slowdown, as if I replace the query with $results = [12]; I get similar 1s executions.
Here's the query that entityTypeManager is running:
I think due to the orGroup this is slowing down these queries. It appears be more efficient to run several queries.
Documentation Status
Additional Notes:
Any additional information that you think would be helpful when reviewing this
PR.
Interested parties
Tag (@ mention) interested parties or, if unsure, @Islandora/committers