Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(Python): Improve slicing performance #6042

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

lucasmcdonald3
Copy link
Contributor

@lucasmcdonald3 lucasmcdonald3 commented Jan 10, 2025

What was changed?

Changed Python Seq implementation:

  1. If a Seq is provided as the iterable input to a new Seq, just copy its internals to the new Seq.
  • Before, this would create a new list out of the Seq, which is expensive.
  • Now, with the changes in 2, this delays turning the islice into a list, improving performance.
  1. Replace the genexpr to evaluate the slice with islice.
  • This is evaluated lazily, improving performance.

How has this been tested?

No new functional testing. I hope that existing tests would catch any issues, but let me know if there are any doubts.

I ran a slicing performance test locally.

module Main {
  method Main(args: seq<string>) {
    var longList := seq(TestNum, i => i);
    var currentSeq := longList;
    while |currentSeq| > 0
    {
        var firstElem := currentSeq[0];
        currentSeq := currentSeq[1..];
    }
  }
}

Results from my local:

  • TestNum = 10000, Python before change: 9s

  • TestNum = 10000, Python after change: <1s

  • TestNum = 10000, .NET: ~1s

  • TestNum = 100000, Python after change: 18s

  • TestNum = 100000, NET: 64s

I don't plan committing the performance test file, since there's no functional testing there, but let me know if I should.

(I have separate testing in Crypto Tools' Python DBESDK (JSON encryption library) that brings a particular test execution from ~15 minutes to ~30 seconds.)

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

@lucasmcdonald3 lucasmcdonald3 marked this pull request as ready for review January 10, 2025 22:53
@lucasmcdonald3
Copy link
Contributor Author

(I think I'm missing permissions to add #2313 as an issue linked on the PR)

@robin-aws
Copy link
Member

Thanks for the contribution!

(I think I'm missing permissions to add #2313 as an issue linked on the PR)

AFAIK you can just say phrases like "Resolves #2313" to make that link happen.

But in this case we shouldn't anyway, since that issue is tracking a more general cross-backend improvement, and this change is just improving things for Python.

@robin-aws robin-aws requested a review from fabiomadge January 13, 2025 17:40
@lucasmcdonald3
Copy link
Contributor Author

Cool, thanks!

I made these changes as part debugging performance issues in Crypto Tools products. Now, I'm running into more performance "hiccups", and might need to make more changes.
Would y'all prefer I add any other changes to this PR, or a new PR?

@robin-aws
Copy link
Member

I'd say if the additional changes are small and somewhat related, feel free to add them here. Otherwise getting this merged and working on a new PR is probably better.

@lucasmcdonald3
Copy link
Contributor Author

I added my changes here because my new changes conflicted with the previous changes.
I'm probably done making significant changes.

Copy link
Collaborator

@fabiomadge fabiomadge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is great! Do you have data on the impact of _SeqSlice as well?

Source/DafnyRuntime/DafnyRuntimePython/_dafny/__init__.py Outdated Show resolved Hide resolved
@@ -116,11 +116,71 @@ def flatten(self):
e = q.pop()
if isinstance(e, list):
l += e
elif isinstance(e, _SeqSlice):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you feel about merging that with the previous case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine either way -- I'll leave it up to you

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer merging

elif isinstance(e, Concat):
q.append(e.r)
q.append(e.l)
return l

class _SeqSlice:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little strange, that we have Concat and _SeqSlice. Unifying the naming scheme seems appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ideally, Concat would be _Concat or _SeqConcat. But you're right that unifying is better. Will just change to Slice.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine either way

Comment on lines +172 to +182
def __repr__(self):
return f"_SeqSlice({list(self)})"

def __contains__(self, item):
return any(x == item for x in self)

def index(self, value):
for i, x in enumerate(self):
if x == value:
return i
raise ValueError(f"{value} is not in list")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be leftover from debugging. I probably don't need these anymore. Will clean up.

self._step = source._step * step
self._stop = (
source._start + (stop * source._step if stop is not None else len(source) * source._step)
if stop is not None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flip?

self._start = source._start + start * source._step
self._step = source._step * step
self._stop = (
source._start + (stop * source._step if stop is not None else len(source) * source._step)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
source._start + (stop * source._step if stop is not None else len(source) * source._step)
source._start + stop * source._step

Right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants