chore(Python): Improve slicing performance #6042

Open

lucasmcdonald3 wants to merge 12 commits into dafny-lang:master from lucasmcdonald3:python-lazy-slicing

Contributor

lucasmcdonald3 commented Jan 10, 2025

What was changed?

Changed the Python Seq implementation:

1. If a Seq is provided as the iterable input to a new Seq, just copy its internals to the new Seq.
   - Before, this would create a new list out of the Seq, which is expensive.
   - Now, with the changes in 2, this delays turning the islice into a list, improving performance.
2. Replace the genexpr used to evaluate the slice with islice.
   - This is evaluated lazily, improving performance.
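As a rough illustration of the difference between the two approaches (not the runtime's actual code; `eager_slice` and `lazy_slice` are hypothetical helper names), the sketch below contrasts building a list from a generator expression with returning an `itertools.islice` object that does no work until it is consumed:

```python
from itertools import islice


def eager_slice(elems, lo, hi):
    # Builds a brand-new list right away: O(hi - lo) work on every slice.
    return list(elems[i] for i in range(lo, hi))


def lazy_slice(elems, lo, hi):
    # Returns an iterator; no element is read until something consumes it.
    return islice(elems, lo, hi)


data = list(range(10))
pending = lazy_slice(data, 2, 5)   # nothing has been read from data yet
materialized = list(pending)       # the elements are read here
```

Two caveats apply to this sketch: an `islice` object can only be consumed once, and it reaches `lo` by stepping over the preceding elements, so the work is deferred rather than eliminated.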

How has this been tested?

No new functional testing. I hope that existing tests would catch any issues, but let me know if there are any doubts.

I ran a slicing performance test locally.

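module Main {
  // Sequence length under test; set to 10000 or 100000 for the runs reported below.
  const TestNum: nat := 10000
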
  method Main(args: seq<string>) {
    var longList := seq(TestNum, i => i);
    var currentSeq := longList;
    while |currentSeq| > 0
    {
        var firstElem := currentSeq[0];
        currentSeq := currentSeq[1..];
    }
  }
}

Results from my local machine:

TestNum = 10000, Python before change: 9s
TestNum = 10000, Python after change: <1s
TestNum = 10000, .NET: ~1s
TestNum = 100000, Python after change: 18s
TestNum = 100000, .NET: 64s
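For context on why this loop is a stress test, here is a small Python sketch that counts element copies under the assumption that every `[1..]` slice builds a fresh list. The function name is hypothetical and the model is a simplification, not a measurement of the Dafny runtime:

```python
def copies_when_each_slice_copies(n):
    """Count element copies if every s[1:] in the loop builds a new list."""
    seq = list(range(n))
    copied = 0
    while seq:
        seq = seq[1:]          # copies every remaining element
        copied += len(seq)
    return copied
```

Under that assumption the total is n(n-1)/2 copies, which is 49,995,000 for n = 10000, so the cost grows quadratically with the sequence length.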

I don't plan on committing the performance test file, since there's no functional testing there, but let me know if I should.

(I have separate testing in Crypto Tools' Python DBESDK (JSON encryption library) that brings a particular test execution from ~15 minutes to ~30 seconds.)

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

lucasmcdonald3 added 2 commits

January 10, 2025 11:48


          chore(Python): Improve slicing performance

479be8b


          Merge branch 'master' into python-lazy-slicing

848f6c7

lucasmcdonald3 marked this pull request as ready for review

January 10, 2025 22:53

Contributor Author

lucasmcdonald3 commented Jan 10, 2025

(I think I'm missing permissions to add #2313 as an issue linked on the PR)

Member

robin-aws commented Jan 13, 2025

Thanks for the contribution!

> (I think I'm missing permissions to add #2313 as an issue linked on the PR)

AFAIK you can just say phrases like "Resolves #2313" to make that link happen.

But in this case we shouldn't anyway, since that issue is tracking a more general cross-backend improvement, and this change is just improving things for Python.

robin-aws requested a review from fabiomadge

January 13, 2025 17:40

Contributor Author

lucasmcdonald3 commented Jan 13, 2025

Cool, thanks!

I made these changes as part of debugging performance issues in Crypto Tools products. Now, I'm running into more performance "hiccups", and might need to make more changes.
Would y'all prefer I add any other changes to this PR, or a new PR?

Member

robin-aws commented Jan 15, 2025

I'd say if the additional changes are small and somewhat related, feel free to add them here. Otherwise getting this merged and working on a new PR is probably better.

lucasmcdonald3 added 6 commits

January 17, 2025 10:34


          list view wip

2a01d07


          lazy slicing

bf615d5

6d2ed2d

263be55


          Merge branch 'master' into python-lazy-slicing

b019fca

Contributor Author

lucasmcdonald3 commented Jan 17, 2025

I added my changes here because my new changes conflicted with the previous changes.
I'm probably done making significant changes.


          Merge branch 'master' into python-lazy-slicing

58fa9f7

fabiomadge reviewed

Collaborator

fabiomadge left a comment

Thanks, this is great! Do you have data on the impact of _SeqSlice as well?

Source/DafnyRuntime/DafnyRuntimePython/_dafny/__init__.py

@@ -116,11 +116,71 @@ def flatten(self):
                           e = q.pop()
                           if isinstance(e, list):
                               l += e
+                          elif isinstance(e, _SeqSlice):

Collaborator

fabiomadge Jan 23, 2025

How do you feel about merging that with the previous case?

Contributor Author

lucasmcdonald3 Jan 23, 2025

I'm fine either way -- I'll leave it up to you

Collaborator

fabiomadge Jan 23, 2025

I prefer merging

Source/DafnyRuntime/DafnyRuntimePython/_dafny/__init__.py

                           elif isinstance(e, Concat):
                               q.append(e.r)
                               q.append(e.l)
                       return l
+              class _SeqSlice:

Collaborator

fabiomadge Jan 23, 2025

It's a little strange that we have Concat and _SeqSlice. Unifying the naming scheme seems appropriate.

Contributor Author

lucasmcdonald3 Jan 23, 2025

I think ideally, Concat would be _Concat or _SeqConcat. But you're right that unifying is better. Will just change to Slice.

Collaborator

fabiomadge Jan 23, 2025

I'm fine either way

Source/DafnyRuntime/DafnyRuntimePython/_dafny/__init__.py

Comment on lines +172 to +182

+                  def __repr__(self):
+                      return f"_SeqSlice({list(self)})"
+                  def __contains__(self, item):
+                      return any(x == item for x in self)
+                  def index(self, value):
+                      for i, x in enumerate(self):
+                          if x == value:
+                              return i
+                      raise ValueError(f"{value} is not in list")

Collaborator

fabiomadge Jan 23, 2025

Are those necessary?

Contributor Author

lucasmcdonald3 Jan 23, 2025

This might be leftover from debugging. I probably don't need these anymore. Will clean up.

Source/DafnyRuntime/DafnyRuntimePython/_dafny/__init__.py

+                          self._step = source._step * step
+                          self._stop = (
+                              source._start + (stop * source._step if stop is not None else len(source) * source._step)
+                              if stop is not None

Collaborator

fabiomadge Jan 23, 2025

Flip?

Source/DafnyRuntime/DafnyRuntimePython/_dafny/__init__.py

+                          self._start = source._start + start * source._step
+                          self._step = source._step * step
+                          self._stop = (
+                              source._start + (stop * source._step if stop is not None else len(source) * source._step)

Collaborator

fabiomadge Jan 23, 2025

Suggested change

      
-                              source._start + (stop * source._step if stop is not None else len(source) * source._step)
+                              source._start + stop * source._step

Right?
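To make the index arithmetic in this thread concrete, here is a minimal sketch of a slice-of-a-slice view. `SliceView` and its attributes are hypothetical stand-ins rather than the PR's actual class, and the sketch assumes bounds are first normalized with `slice.indices`, so `stop` is never `None` by the time it is composed:

```python
class SliceView:
    """Hypothetical read-only view over a list; slicing it copies nothing."""

    def __init__(self, source, start=None, stop=None, step=None):
        if isinstance(source, SliceView):
            # Normalize relative to the parent view, then map into the base list.
            start, stop, step = slice(start, stop, step).indices(len(source))
            self._base = source._base
            self._start = source._start + start * source._step
            self._stop = source._start + stop * source._step
            self._step = source._step * step
        else:
            self._base = source
            self._start, self._stop, self._step = slice(start, stop, step).indices(len(source))

    def __len__(self):
        return len(range(self._start, self._stop, self._step))

    def __iter__(self):
        return (self._base[i] for i in range(self._start, self._stop, self._step))


data = list(range(20))
outer = SliceView(data, 2, 18, 2)   # same elements as data[2:18:2]
inner = SliceView(outer, 1, 5, 2)   # same elements as data[2:18:2][1:5:2]
```

Because element j of a view is `base[start + j * step]`, composing two views is a composition of two linear index maps, which is where `start * source._step` and `source._step * step` come from.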

lucasmcdonald3 and others added 3 commits

January 23, 2025 14:28


          Update Source/DafnyRuntime/DafnyRuntimePython/_dafny/__init__.py

a7c5a28

Co-authored-by: Fabio Madge <[email protected]>


          Update Source/DafnyRuntime/DafnyRuntimePython/_dafny/__init__.py

2167e55

Co-authored-by: Fabio Madge <[email protected]>


          Update Source/DafnyRuntime/DafnyRuntimePython/_dafny/__init__.py

b9b8b97

Co-authored-by: Fabio Madge <[email protected]>
