chore(Python): Improve slicing performance #6042
base: master
(I think I'm missing permissions to add #2313 as an issue linked on the PR.)
Thanks for the contribution!
AFAIK you can just say phrases like "Resolves #2313" to make that link happen. But in this case we shouldn't anyway, since that issue tracks a more general cross-backend improvement, and this change only improves things for Python.
Cool, thanks! I made these changes as part of debugging performance issues in Crypto Tools products. Now I'm running into more performance "hiccups" and might need to make more changes.
I'd say if the additional changes are small and somewhat related, feel free to add them here. Otherwise, getting this merged and working on a new PR is probably better.
I added my new changes here because they conflicted with the previous ones.
Thanks, this is great! Do you have data on the impact of `_SeqSlice` as well?
@@ -116,11 +116,71 @@ def flatten(self):
            e = q.pop()
            if isinstance(e, list):
                l += e
            elif isinstance(e, _SeqSlice):
How do you feel about merging that with the previous case?
I'm fine either way -- I'll leave it up to you
I prefer merging
            elif isinstance(e, Concat):
                q.append(e.r)
                q.append(e.l)
        return l


class _SeqSlice:
It's a little strange that we have `Concat` and `_SeqSlice`. Unifying the naming scheme seems appropriate.
I think ideally, `Concat` would be `_Concat` or `_SeqConcat`. But you're right that unifying is better. Will just change to `Slice`.
I'm fine either way
    def __repr__(self):
        return f"_SeqSlice({list(self)})"

    def __contains__(self, item):
        return any(x == item for x in self)

    def index(self, value):
        for i, x in enumerate(self):
            if x == value:
                return i
        raise ValueError(f"{value} is not in list")
Are those necessary?
This might be leftover from debugging. I probably don't need these anymore. Will clean up.
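For context on whether these methods are needed: when a class defines `__iter__`, the `in` operator already falls back to iteration without a custom `__contains__`. If the class also defines `__len__` and `__getitem__`, inheriting from `collections.abc.Sequence` supplies `__contains__`, `index`, `count`, `__iter__` and `__reversed__` as mixin methods. The sketch below illustrates this with a hypothetical `_SliceView` class; whether the runtime's slice class should inherit from `Sequence` is an assumption, not something decided in this PR.

```python
from collections.abc import Sequence


class _SliceView(Sequence):
    """Hypothetical lazy view over source[start:stop:step]."""

    def __init__(self, source, start, stop, step=1):
        self._source = source
        self._range = range(start, stop, step)

    def __len__(self):
        return len(self._range)

    def __getitem__(self, i):
        # Integer indexing only; range handles negative indices and raises
        # IndexError when out of bounds, which ends Sequence.__iter__.
        return self._source[self._range[i]]

    def __repr__(self):
        return f"_SliceView({list(self)})"


view = _SliceView(["a", "b", "c", "d", "e"], 1, 5, 2)
print(view)             # _SliceView(['b', 'd'])
print("d" in view)      # True  (__contains__ inherited from Sequence)
print(view.index("d"))  # 1     (index inherited from Sequence)
```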
        self._step = source._step * step
        self._stop = (
            source._start + (stop * source._step if stop is not None else len(source) * source._step)
            if stop is not None
Flip?
        self._start = source._start + start * source._step
        self._step = source._step * step
        self._stop = (
            source._start + (stop * source._step if stop is not None else len(source) * source._step)
Suggested change (replacing the line above):
            source._start + stop * source._step
Right?
Co-authored-by: Fabio Madge
What was changed?
Changed Python `Seq` implementation:
- If a `Seq` is provided as the `iterable` input to a new `Seq`, just copy its internals to the new `Seq`.
- … `islice` into a list, improving performance.
- … `islice`.

How has this been tested?
No new functional testing. I hope that existing tests would catch any issues, but let me know if there are any doubts.
I ran a slicing performance test locally.
Results from my local machine:
TestNum = 10000, Python before change: 9s
TestNum = 10000, Python after change: <1s
TestNum = 10000, .NET: ~1s
TestNum = 100000, Python after change: 18s
TestNum = 100000, .NET: 64s
I don't plan on committing the performance test file, since it contains no functional testing, but let me know if I should.
(I have separate testing in Crypto Tools' Python DBESDK (JSON encryption library) that brings a particular test execution from ~15 minutes to ~30 seconds.)
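The performance test file itself is not part of the PR, so the following is only a hypothetical sketch of a comparable micro-benchmark. It contrasts repeatedly dropping the first element by materializing an `itertools.islice` into a new list (a full copy per step) with moving an offset on a lazy view (constant work per step). The `_LazySlice`, `copy_tails`, and `view_tails` names, the access pattern, and the sizes are all assumptions, and the printed timings depend on the machine.

```python
import itertools
import timeit


class _LazySlice:
    """Hypothetical O(1) view over base[start:stop] (step 1, non-negative bounds)."""

    def __init__(self, base, start, stop):
        self._base = base
        self._start = start
        self._stop = min(stop, len(base))

    def __len__(self):
        return max(0, self._stop - self._start)

    def __iter__(self):
        return (self._base[i] for i in range(self._start, self._stop))


def copy_tails(base, n):
    # Drop the first element n times, copying the remainder each time.
    seq = base
    for _ in range(n):
        seq = list(itertools.islice(seq, 1, None))
    return seq


def view_tails(base, n):
    # Same walk, but each step only moves the start offset; no elements are copied.
    view = _LazySlice(base, 0, len(base))
    for _ in range(n):
        view = _LazySlice(base, view._start + 1, view._stop)
    return view


if __name__ == "__main__":
    data = list(range(5_000))
    for name, fn in (("copy", copy_tails), ("view", view_tails)):
        seconds = timeit.timeit(lambda: fn(data, 2_500), number=1)
        print(f"{name}: {seconds:.4f}s")
```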
By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.