Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Faster repr() logic for String, StringRef, UInt, DType and others #3736

Closed
wants to merge 20 commits into from

Conversation

msaelices
Copy link
Contributor

@msaelices msaelices commented Nov 3, 2024

  • Implement StringRef.__repr__()
  • Faster path.stat() implementation using Writer
  • Faster UInt.__repr__()
  • Faster Error.__repr__()
  • Faster DType.__repr__()

Signed-off-by: Manuel Saelices <[email protected]>
This improves the representation to be like String.__repr__()

Signed-off-by: Manuel Saelices <[email protected]>
Signed-off-by: Manuel Saelices <[email protected]>
@msaelices msaelices requested a review from a team as a code owner November 3, 2024 23:56
@msaelices msaelices changed the title [stdlib] Better repr() logic for String and StringRef [stdlib] Faster repr() logic for String and StringRef Nov 3, 2024
Copy link
Collaborator

@JoeLoser JoeLoser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind looking into the pathlib test failure as a result of this change? See this CI run for example.

stdlib/src/os/fstat.mojo Outdated Show resolved Hide resolved
@JoeLoser JoeLoser requested a review from jackos November 4, 2024 19:27
@msaelices msaelices force-pushed the better-representable branch from 8be18b2 to 348b434 Compare November 7, 2024 21:47
@msaelices
Copy link
Contributor Author

Do you mind looking into the pathlib test failure as a result of this change? See this CI run for example.

I think the check is flaky as it depends on the OS, and I don't have the MacOS to test it. I've just removed off-topic checks

@msaelices msaelices requested a review from JoeLoser November 7, 2024 22:21
@msaelices
Copy link
Contributor Author

@JoeLoser could you please TAL?

@msaelices msaelices changed the title [stdlib] Faster repr() logic for String and StringRef [stdlib] Faster repr() logic for String, StringRef, UInt, DType and others Nov 8, 2024
@JoeLoser
Copy link
Collaborator

JoeLoser commented Nov 9, 2024

@JoeLoser could you please TAL?

Yep, CI is failing though - do you mind looking into that first?

value: The string to represent.
"""
self._value = StringSlice[ImmutableAnyOrigin](
unsafe_from_utf8_ptr=value.unsafe_ptr(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unsafe_from_utf8_ptr=value.unsafe_ptr(),
ptr=value.unsafe_ptr(),

stdlib/src/collections/string.mojo Outdated Show resolved Hide resolved
stdlib/src/collections/string.mojo Outdated Show resolved Hide resolved
stdlib/src/collections/string.mojo Outdated Show resolved Hide resolved
@@ -46,7 +46,7 @@ fn _constrain_unix():


@value
struct stat_result(Stringable):
struct stat_result(Stringable, Writable):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion Split this change to stat_result to be Writable along with its tests to a separate PR, please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of this commit was to have a faster representation of path.stat() which uses stat_result underneath, which was already representable but slow, so to me was in the same scope of "Faster repr() logic" msaelices@950d36a

Do you still want me to split those changes into a new PR?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the scope of the commit, but in the future, I'd recommend teasing apart the two ideas (applying Writable to a struct and making repr() faster for a few types). They're orthogonal things in my mind and could have been two separate PRs. It's fine for this one though, but something to consider in the future 😄

@JoeLoser JoeLoser self-assigned this Nov 10, 2024
Copy link
Contributor

@martinvuyk martinvuyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I've been working on faster repr and ascii logic for about 3 weeks now, see PR #3686. This does not add much speed really.

I think you can still merge the other improvements you made on this PR. But see the _repr function that I've been working on, there are several other things that need to exist as helpers to make this much faster.

Comment on lines 720 to 722
var quote = "'"
if len(self._value) > 0 and self._value[0] == "'":
quote = '"'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not how python works

>>> print(repr("hi"))
'hi'
>>> print(repr("'hi'"))
"'hi'"
>>> print(repr("a 'hi' a"))
"a 'hi' a"

Copy link
Contributor Author

@msaelices msaelices Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot find the issue here. This is a side-by-side comparison:
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current nightly implementation does this:

var use_dquote = False
for s in self:
    use_dquote = use_dquote or (s == "'")

which checks if any character in the string is ', and that makes the __repr__ change to using double quotes. The logic above only checks if the first character is ' and would fail in the 3rd example.

Not sure where the command line execution you're showing here comes from, but if it's from the results of this code then I must've misread the logic 😄

Comment on lines 723 to 743
writer.write(quote)
for s in self._value:
if s == "\\":
writer.write(r"\\")
elif s == "\t":
writer.write(r"\t")
elif s == "\n":
writer.write(r"\n")
elif s == "\r":
writer.write(r"\r")
else:
var codepoint = ord(s)
if isprintable(codepoint):
writer.write(s)
elif codepoint < 0x10:
writer.write(hex(codepoint, prefix=r"\x0"))
elif codepoint < 0x20 or codepoint == 0x7F:
writer.write(hex(codepoint, prefix=r"\x"))
else: # multi-byte character
writer.write(s)
writer.write(quote)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't really much faster. It still uses write hex() and doesn't alloc before doing everything

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was not a great optimization. I only mapped the top-level constructor to use .write() but did not spend the needed work on making it way faster.

msaelices and others added 2 commits November 12, 2024 14:50
Co-authored-by: Joe Loser <[email protected]>
Signed-off-by: Manuel Saelices <[email protected]>
We will take advantage of a faster StringRef.__repr__() when
modular#3686 was merged

Signed-off-by: Manuel Saelices <[email protected]>
@msaelices msaelices force-pushed the better-representable branch from c85a70a to cb66ec9 Compare November 12, 2024 15:30
@msaelices
Copy link
Contributor Author

Hi, I've been working on faster repr and ascii logic for about 3 weeks now, see PR #3686. This does not add much speed really.

I think you can still merge the other improvements you made on this PR. But see the _repr function that I've been working on, there are several other things that need to exist as helpers to make this much faster.

Ohh! What an awesome optimization. I've reverted the changes here:

msaelices@cb66ec9
msaelices@958106b

@msaelices
Copy link
Contributor Author

@martinvuyk, @JoeLoser could you please TAL?

Copy link
Contributor

@martinvuyk martinvuyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +164 to +180
writer.write("os.stat_result(")
writer.write("st_mode=", self.st_mode)
writer.write(", st_ino=", self.st_ino)
writer.write(", st_dev=", self.st_dev)
writer.write(", st_nlink=", self.st_nlink)
writer.write(", st_uid=", self.st_uid)
writer.write(", st_gid=", self.st_gid)
writer.write(", st_size=", self.st_size)
writer.write(", st_atime=", str(self.st_atimespec))
writer.write(", st_mtime=", str(self.st_mtimespec))
writer.write(", st_ctime=", str(self.st_ctimespec))
writer.write(", st_birthtime=", str(self.st_birthtimespec))
writer.write(", st_blocks=", self.st_blocks)
writer.write(", st_blksize=", self.st_blksize)
writer.write(", st_rdev=", self.st_rdev)
writer.write(", st_flags=", self.st_flags)
writer.write(")")
Copy link
Contributor

@martinvuyk martinvuyk Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion:
The String.write_bytes() reserves the amount of bytes to join the next item every time, in the future we should be able to constrain traits and know if they implement for example as_bytes() and be able to reserve the total amount of bytes beforehand. So this suggestion doesn't make a difference currently but maybe in the future:

these can all go in the same function call separated by commas. They will be formatted similarly by the Mojo formatter so it won't make much difference in the aesthetics other than the name of the field won't be directly to the left, still it's just a matter of personal preference and not performance currently 🤷‍♂️ .

(This would all become redundant if somebody reserves enough capacity for the whole os.stat result before calling write_to())

@JoeLoser
Copy link
Collaborator

!sync

@modularbot modularbot added the imported-internally Signals that a given pull request has been imported internally. label Nov 14, 2024
@modularbot
Copy link
Collaborator

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions, click here to learn more.

@modularbot modularbot added the merged-internally Indicates that this pull request has been merged internally label Nov 14, 2024
@modularbot
Copy link
Collaborator

Landed in b27d751! Thank you for your contribution 🎉

@modularbot modularbot added the merged-externally Merged externally in public mojo repo label Nov 15, 2024
modularbot pushed a commit that referenced this pull request Nov 15, 2024
[External] [stdlib] Make `StringRef` Representable

- [X] Implement `StringRef.__repr__()`
- [X] Implement `Writable` for `os.stat_result`

Co-authored-by: Manuel Saelices <[email protected]>
Closes #3736
MODULAR_ORIG_COMMIT_REV_ID: f8fe27aa7b0274fa27665b8cd0b19af22f3cac3b
@modularbot modularbot closed this Nov 15, 2024
msaelices added a commit to msaelices/mojo that referenced this pull request Nov 20, 2024
[External] [stdlib] Make `StringRef` Representable

- [X] Implement `StringRef.__repr__()`
- [X] Implement `Writable` for `os.stat_result`

Co-authored-by: Manuel Saelices <[email protected]>
Closes modular#3736
MODULAR_ORIG_COMMIT_REV_ID: f8fe27aa7b0274fa27665b8cd0b19af22f3cac3b
msaelices added a commit to msaelices/mojo that referenced this pull request Nov 22, 2024
[External] [stdlib] Make `StringRef` Representable

- [X] Implement `StringRef.__repr__()`
- [X] Implement `Writable` for `os.stat_result`

Co-authored-by: Manuel Saelices <[email protected]>
Closes modular#3736
MODULAR_ORIG_COMMIT_REV_ID: f8fe27aa7b0274fa27665b8cd0b19af22f3cac3b

Signed-off-by: Manuel Saelices <[email protected]>
modularbot pushed a commit that referenced this pull request Dec 17, 2024
[External] [stdlib] Make `StringRef` Representable

- [X] Implement `StringRef.__repr__()`
- [X] Implement `Writable` for `os.stat_result`

Co-authored-by: Manuel Saelices <[email protected]>
Closes #3736
MODULAR_ORIG_COMMIT_REV_ID: f8fe27aa7b0274fa27665b8cd0b19af22f3cac3b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
imported-internally Signals that a given pull request has been imported internally. merged-externally Merged externally in public mojo repo merged-internally Indicates that this pull request has been merged internally
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants