-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Faster repr() logic for String, StringRef, UInt, DType and others #3736
Conversation
Signed-off-by: Manuel Saelices <[email protected]>
Signed-off-by: Manuel Saelices <[email protected]>
Signed-off-by: Manuel Saelices <[email protected]>
This improves the representation to be like String.__repr__() Signed-off-by: Manuel Saelices <[email protected]>
Signed-off-by: Manuel Saelices <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind looking into the pathlib test failure as a result of this change? See this CI run for example.
Signed-off-by: Manuel Saelices <[email protected]>
Signed-off-by: Manuel Saelices <[email protected]>
Signed-off-by: Manuel Saelices <[email protected]>
Signed-off-by: Manuel Saelices <[email protected]>
8be18b2
to
348b434
Compare
Signed-off-by: Manuel Saelices <[email protected]>
Signed-off-by: Manuel Saelices <[email protected]>
…e OS Signed-off-by: Manuel Saelices <[email protected]>
I think the check is flaky as it depends on the OS, and I don't have the MacOS to test it. I've just removed off-topic checks |
@JoeLoser could you please TAL? |
Yep, CI is failing though - do you mind looking into that first? |
stdlib/src/collections/string.mojo
Outdated
value: The string to represent. | ||
""" | ||
self._value = StringSlice[ImmutableAnyOrigin]( | ||
unsafe_from_utf8_ptr=value.unsafe_ptr(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsafe_from_utf8_ptr=value.unsafe_ptr(), | |
ptr=value.unsafe_ptr(), |
@@ -46,7 +46,7 @@ fn _constrain_unix(): | |||
|
|||
|
|||
@value | |||
struct stat_result(Stringable): | |||
struct stat_result(Stringable, Writable): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion Split this change to stat_result
to be Writable
along with its tests to a separate PR, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of this commit was to have a faster representation of path.stat()
which uses stat_result
underneath, which was already representable but slow, so to me was in the same scope of "Faster repr() logic" msaelices@950d36a
Do you still want me to split those changes into a new PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the scope of the commit, but in the future, I'd recommend teasing apart the two ideas (applying Writable
to a struct and making repr()
faster for a few types). They're orthogonal things in my mind and could have been two separate PRs. It's fine for this one though, but something to consider in the future 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I've been working on faster repr and ascii logic for about 3 weeks now, see PR #3686. This does not add much speed really.
I think you can still merge the other improvements you made on this PR. But see the _repr function that I've been working on, there are several other things that need to exist as helpers to make this much faster.
stdlib/src/collections/string.mojo
Outdated
var quote = "'" | ||
if len(self._value) > 0 and self._value[0] == "'": | ||
quote = '"' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not how python works
>>> print(repr("hi"))
'hi'
>>> print(repr("'hi'"))
"'hi'"
>>> print(repr("a 'hi' a"))
"a 'hi' a"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current nightly implementation does this:
var use_dquote = False
for s in self:
use_dquote = use_dquote or (s == "'")
which checks if any character in the string is '
, and that makes the __repr__
change to using double quotes. The logic above only checks if the first character is '
and would fail in the 3rd example.
Not sure where the command line execution you're showing here comes from, but if it's from the results of this code then I must've misread the logic 😄
stdlib/src/collections/string.mojo
Outdated
writer.write(quote) | ||
for s in self._value: | ||
if s == "\\": | ||
writer.write(r"\\") | ||
elif s == "\t": | ||
writer.write(r"\t") | ||
elif s == "\n": | ||
writer.write(r"\n") | ||
elif s == "\r": | ||
writer.write(r"\r") | ||
else: | ||
var codepoint = ord(s) | ||
if isprintable(codepoint): | ||
writer.write(s) | ||
elif codepoint < 0x10: | ||
writer.write(hex(codepoint, prefix=r"\x0")) | ||
elif codepoint < 0x20 or codepoint == 0x7F: | ||
writer.write(hex(codepoint, prefix=r"\x")) | ||
else: # multi-byte character | ||
writer.write(s) | ||
writer.write(quote) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't really much faster. It still uses write hex()
and doesn't alloc before doing everything
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it was not a great optimization. I only mapped the top-level constructor to use .write()
but did not spend the needed work on making it way faster.
Signed-off-by: Manuel Saelices <[email protected]>
Signed-off-by: Manuel Saelices <[email protected]>
Co-authored-by: Joe Loser <[email protected]> Signed-off-by: Manuel Saelices <[email protected]>
We will take advantage of a faster StringRef.__repr__() when modular#3686 was merged Signed-off-by: Manuel Saelices <[email protected]>
c85a70a
to
cb66ec9
Compare
waiting for being merged here: modular#3686 Signed-off-by: Manuel Saelices <[email protected]>
Ohh! What an awesome optimization. I've reverted the changes here: |
@martinvuyk, @JoeLoser could you please TAL? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
writer.write("os.stat_result(") | ||
writer.write("st_mode=", self.st_mode) | ||
writer.write(", st_ino=", self.st_ino) | ||
writer.write(", st_dev=", self.st_dev) | ||
writer.write(", st_nlink=", self.st_nlink) | ||
writer.write(", st_uid=", self.st_uid) | ||
writer.write(", st_gid=", self.st_gid) | ||
writer.write(", st_size=", self.st_size) | ||
writer.write(", st_atime=", str(self.st_atimespec)) | ||
writer.write(", st_mtime=", str(self.st_mtimespec)) | ||
writer.write(", st_ctime=", str(self.st_ctimespec)) | ||
writer.write(", st_birthtime=", str(self.st_birthtimespec)) | ||
writer.write(", st_blocks=", self.st_blocks) | ||
writer.write(", st_blksize=", self.st_blksize) | ||
writer.write(", st_rdev=", self.st_rdev) | ||
writer.write(", st_flags=", self.st_flags) | ||
writer.write(")") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion:
The String.write_bytes()
reserves the amount of bytes to join the next item every time, in the future we should be able to constrain traits and know if they implement for example as_bytes()
and be able to reserve the total amount of bytes beforehand. So this suggestion doesn't make a difference currently but maybe in the future:
these can all go in the same function call separated by commas. They will be formatted similarly by the Mojo formatter so it won't make much difference in the aesthetics other than the name of the field won't be directly to the left, still it's just a matter of personal preference and not performance currently 🤷♂️ .
(This would all become redundant if somebody reserves enough capacity for the whole os.stat
result before calling write_to()
)
!sync |
✅🟣 This contribution has been merged 🟣✅ Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the nightly branch during the next Mojo nightly release, typically within the next 24-48 hours. We use Copybara to merge external contributions, click here to learn more. |
Landed in b27d751! Thank you for your contribution 🎉 |
[External] [stdlib] Make `StringRef` Representable - [X] Implement `StringRef.__repr__()` - [X] Implement `Writable` for `os.stat_result` Co-authored-by: Manuel Saelices <[email protected]> Closes #3736 MODULAR_ORIG_COMMIT_REV_ID: f8fe27aa7b0274fa27665b8cd0b19af22f3cac3b
[External] [stdlib] Make `StringRef` Representable - [X] Implement `StringRef.__repr__()` - [X] Implement `Writable` for `os.stat_result` Co-authored-by: Manuel Saelices <[email protected]> Closes modular#3736 MODULAR_ORIG_COMMIT_REV_ID: f8fe27aa7b0274fa27665b8cd0b19af22f3cac3b
[External] [stdlib] Make `StringRef` Representable - [X] Implement `StringRef.__repr__()` - [X] Implement `Writable` for `os.stat_result` Co-authored-by: Manuel Saelices <[email protected]> Closes modular#3736 MODULAR_ORIG_COMMIT_REV_ID: f8fe27aa7b0274fa27665b8cd0b19af22f3cac3b Signed-off-by: Manuel Saelices <[email protected]>
[External] [stdlib] Make `StringRef` Representable - [X] Implement `StringRef.__repr__()` - [X] Implement `Writable` for `os.stat_result` Co-authored-by: Manuel Saelices <[email protected]> Closes #3736 MODULAR_ORIG_COMMIT_REV_ID: f8fe27aa7b0274fa27665b8cd0b19af22f3cac3b
StringRef.__repr__()
path.stat()
implementation using WriterUInt.__repr__()
Error.__repr__()
DType.__repr__()