[Stdlib] Add stol function #3178

Closed
wants to merge 44 commits into from


jjvraw
Contributor

@jjvraw jjvraw commented Jul 5, 2024

Introduces stol (string to long) function for more flexible string-to-integer parsing (issue #2639).

The implementation and logic closely follow those of atol. The function parses integer literals with base handling (2-36), supports base prefixes, and stops at the first invalid character. Unlike atol, it returns a tuple (parsed_int, remaining_string), allowing partial parsing without raising errors. This makes stol more flexible and better suited to mixed-content strings.
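
The behaviour described above can be illustrated with a minimal Python sketch. This is not the PR's Mojo implementation: the name `stol_sketch`, the `(0, text)` result when no digits are found, and the choice to ignore underscores are all assumptions made for illustration.

```python
from typing import Tuple

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"
PREFIXES = {"0b": 2, "0o": 8, "0x": 16}


def stol_sketch(text: str, base: int = 10) -> Tuple[int, str]:
    """Parse a leading integer and return (value, unparsed remainder).

    Hypothetical helper: mirrors the description in this PR, not its code.
    """
    if base != 0 and not 2 <= base <= 36:
        raise ValueError("base must be 0 or between 2 and 36")

    pos, n = 0, len(text)

    # Skip leading whitespace.
    while pos < n and text[pos].isspace():
        pos += 1

    # Optional sign.
    sign = 1
    if pos < n and text[pos] in "+-":
        sign = -1 if text[pos] == "-" else 1
        pos += 1

    # Optional base prefix, consumed only when a valid digit follows it.
    prefix = text[pos:pos + 2].lower()
    if prefix in PREFIXES and base in (0, PREFIXES[prefix]):
        prefix_base = PREFIXES[prefix]
        if pos + 2 < n and text[pos + 2].lower() in DIGITS[:prefix_base]:
            base = prefix_base
            pos += 2
    if base == 0:
        base = 10

    # Accumulate digits until the first character that is invalid for the base.
    value, start = 0, pos
    while pos < n:
        digit = DIGITS.find(text[pos].lower())
        if digit < 0 or digit >= base:
            break
        value = value * base + digit
        pos += 1

    if pos == start:
        # Assumption: nothing parsed means (0, original text), no error raised.
        return 0, text
    return sign * value, text[pos:]


value, rest = stol_sketch("123abc")
print(value, repr(rest))
```

Returning the remainder lets a caller keep consuming a mixed-content string, which is the flexibility the description attributes to stol.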

@jjvraw jjvraw requested a review from a team as a code owner July 5, 2024 07:43
@martinvuyk
Contributor

Hey @jjvraw, this looks good. Just a couple of comments on some edge cases where Python accepts leading underscores and zeroes for non-base-10 numbers (remember to always check against Python, since we are aiming for maximum compatibility).

Other Changes

  • Added _handle_base_prefix function for improved reusability and readability.
    • Modularises previously inline code.
  • Added _trim_and_handle_sign function for improved reusability and readability.
    • Modularises previously inline code.
  • Renamed _atol_error to _str_to_base_error for reusability and clarity.
  • Revised the docstring of atol for improved clarity.

Could you split this into another PR? It will be easier to review those changes separately and then incorporate this PR on top once that one is merged.
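
The edge cases mentioned above can be checked directly against Python's built-in `int()`. The helper name `parses` is only for illustration; it returns `None` where `int()` raises `ValueError`.

```python
def parses(text, base=10):
    """Return int(text, base), or None if Python rejects the literal."""
    try:
        return int(text, base)
    except ValueError:
        return None


cases = [
    ("0x_1f", 16),   # one underscore directly after a base prefix is accepted
    ("0b_101", 2),
    ("0x001f", 16),  # leading zeroes after a prefix are accepted
    ("0o_17", 0),    # base 0 infers the base from the prefix
    ("1_000", 10),   # single underscores between digits are accepted
    ("_1000", 10),   # a leading underscore without a prefix is rejected
    ("1__000", 10),  # doubled underscores are rejected
    ("1000_", 10),   # a trailing underscore is rejected
    ("010", 0),      # base 0 rejects a nonzero decimal with a leading zero
    ("010", 10),     # an explicit base 10 accepts it
]

for text, base in cases:
    print(f"int({text!r}, {base}) -> {parses(text, base)}")
```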

@jjvraw
Contributor Author

jjvraw commented Jul 5, 2024

Hi @martinvuyk, thank you!

Python accepts leading underscores and zeroes for non-base-10 numbers

Thanks for catching that, as well as the overflow! I misread the lexical syntax for Python's integer literals and had intentionally tried to reject the leading underscore.

Could you split this into another PR? It will be easier to review those changes separately and then incorporate this PR on top once that one is merged.

Sure, I had a feeling these should be separate. Regarding StringRef to StringSlice, shall I apply the same change to _atol's signature in the mentioned PR?

@martinvuyk
Contributor

lexical syntax for Python's Integer literals

I actually hadn't read the docs; I just tried it out. The table they have there seems pretty nice for documentation. We could either copy it or put a more visible link to it (I hadn't actually seen the link at the bottom of your docstrings).

Regarding StringRef to StringSlice, shall I apply the same change to _atol's signature in the mentioned PR?

Hmm, I didn't realize atol uses StringRef as well. Maybe we should leave changing both to StringSlice for another PR. Someone from the Mojo team would have to weigh in here.

jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 5, 2024
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modular#3178.
jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 5, 2024
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modularml#3178.

Signed-off-by: Joshua James Venter <[email protected]>
jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 5, 2024
- Support leading underscores
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modular#3178.

Signed-off-by: Joshua James Venter <[email protected]>
@jjvraw
Contributor Author

jjvraw commented Jul 5, 2024

With "Other Changes" moved to #3180, let's leave StringRef for now, on the basis that atol uses it? We can get this merged, then handle StringRef -> StringSlice as a separate issue or discussion. Thoughts, @martinvuyk?

jjvraw added a commit to jjvraw/mojo that referenced this pull request Jul 6, 2024
- Support leading underscores (bug fix)
- Add handle_base_prefix and trim_and_handle_sign helper functions
- Rename atol_error to str_to_base_error for clarity
- Update atol docstring for improved clarity

Breaks up the functionality of atol for better readability
and reusability, as suggested in PR modularml#3178.

Co-authored-by: martinvuyk <[email protected]>
Co-authored-by: soraros <[email protected]>

Signed-off-by: Joshua James Venter <[email protected]>
ConnorGray and others added 27 commits January 16, 2025 11:58
MODULAR_ORIG_COMMIT_REV_ID: ebb0b89772d7462416c97559b41e4f4cd198a74f

Signed-off-by: Joshua James Venter <[email protected]>
Add conversion from all floating point types to fp8.

MODULAR_ORIG_COMMIT_REV_ID: 6c8a09290a6a9b67ea2e95167ceb8e9673125ded

Signed-off-by: Joshua James Venter <[email protected]>
low-level utils

This is part of preparing the way to change `StringSlice.__len__()` to
return the string length in bytes instead of codepoints.
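
As a rough illustration of why the two lengths differ, here is the same distinction in Python, where `len()` on a `str` counts codepoints and the UTF-8 encoding gives the byte count. The sample string is an arbitrary choice, and this says nothing about how `StringSlice` is implemented.

```python
# "h", "é" (U+00E9), "l", "l", "o", " ", and a fire emoji (U+1F525)
text = "h\u00e9llo \U0001F525"

codepoint_len = len(text)               # counts Unicode codepoints
byte_len = len(text.encode("utf-8"))    # counts UTF-8 encoded bytes

# "é" takes 2 bytes and the emoji takes 4, so the byte length is larger.
print(codepoint_len, byte_len)
```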

MODULAR_ORIG_COMMIT_REV_ID: 05955b1974e7682b8d9da1214049e850a2278e21

Signed-off-by: Joshua James Venter <[email protected]>
`StaticTuple` has some issues that `InlineArray` fixes. By switching to
`InlineArray` the payload copies behave properly. One side effect of
this change is that we have to get rid of the `address_space` tag on the
`payload_t` pointers, since non-trivial types (`InlineArray`) can't be
copied across address spaces.

MODULAR_ORIG_COMMIT_REV_ID: f48eedfc43db45c3465508390aa9ae2421c52bdd

Signed-off-by: Joshua James Venter <[email protected]>
In many cases, code that currently iterates over `String` using its
`__iter__()` method, which returns single-character substring slices as
`StringSlice` elements, would be more readable if it were written using
a `Char` iterator. This will enable us to do that refactoring.

MODULAR_ORIG_COMMIT_REV_ID: 1f9a40abee881a7438707b4e6159fac7b42206f9

Signed-off-by: Joshua James Venter <[email protected]>
Phase in removal of `int` with a deprecation warning.

MODULAR_ORIG_COMMIT_REV_ID: 8d6dee6333592145c3a7945a1a6f8624fc32326b

Signed-off-by: Joshua James Venter <[email protected]>
Accidentally multiplying by chunk size.

MODULAR_ORIG_COMMIT_REV_ID: 26a6cb4c7a2546794ef3f63728a71097326a2b50

Signed-off-by: Joshua James Venter <[email protected]>
`StringSlice.isspace()`

This adds a new `Char.is_python_space()` utility function, and cleans up
`StringSlice.isspace()` to use it, along with the recently added
character iterator.
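
For reference, this is how Python's own `str.isspace()` classifies a few characters. Whether `Char.is_python_space()` covers exactly the same set is not stated in this commit message, so treat the list only as a description of the Python behaviour the name alludes to.

```python
# Each entry: (description, string, what Python's str.isspace() returns)
samples = [
    ("ASCII space", " ", True),
    ("tab, LF, CR, VT, FF", "\t\n\r\x0b\x0c", True),
    ("file/group/record separators", "\x1c\x1d\x1e", True),
    ("next line (NEL)", "\x85", True),
    ("line and paragraph separators", "\u2028\u2029", True),
    ("zero width space", "\u200b", False),
    ("empty string", "", False),
    ("letter followed by space", "a ", False),
]

for description, value, expected in samples:
    print(f"{description}: {value.isspace()}")
```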

MODULAR_ORIG_COMMIT_REV_ID: 0739c7a540cedefec7e31ced1550310b5a6a56e2

Signed-off-by: Joshua James Venter <[email protected]>
API.

MODULAR_ORIG_COMMIT_REV_ID: bcc8fb2803ffa8e234a27467ad18c422c5990aa1

Signed-off-by: Joshua James Venter <[email protected]>
…(#53928)

[External] [stdlib] Use named output for _ListIter __next__() method

Add a trivial optimization to use a named output for the `__next__()`
method in `_ListIter` to save a subtraction operation on every iteration

Co-authored-by: bgreni <[email protected]>
Closes modularml#3941
MODULAR_ORIG_COMMIT_REV_ID: 426ada8f07f91a0d14ffe0c6e77ae074405e52c2

Signed-off-by: Joshua James Venter <[email protected]>
Now that we have nightly docs, people can see the documentation for
otherwise closed-source things in the standard library, such as the
`benchmark` module. So, remove the temporary markdown docs we provided
as an interim thing for users and contributors of the standard library.
They can be read in a more up-to-date form at
https://docs.modular.com/nightly/mojo/stdlib/benchmark.

MODULAR_ORIG_COMMIT_REV_ID: f5dfec06fc337ca87349888e14079971d09e75b9

Signed-off-by: Joshua James Venter <[email protected]>
[External] [stdlib] Fix `input()` segfaults on EOF

Pressing `ctrl-d` with no input when `input()` is called causes Mojo to
crash because `read_until_delimiter()` doesn't check the return value of
the C function `getdelim()`. It assumes `getdelim()` always succeeds, so
when an error occurs it blindly creates a `StringRef` with its
length set to the return value - 1 (so the length is -2 in this case).
This `StringRef` is then passed to `String()`, which in turn passes it
to `memcpy()` with a count of -2, ultimately crashing Mojo.

This PR adds a check in `read_until_delimiter()` that raises an error if
`getdelim()` fails, along with a test to
ensure `read_until_delimiter()` continues to behave as it should.
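
The arithmetic behind the crash can be sketched in Python. The function names below are hypothetical and only model the length computation, not the Mojo code; the one external fact relied on is that `getdelim()` returns -1 on failure, including end of file.

```python
import ctypes

GETDELIM_FAILURE = -1  # getdelim() returns -1 on error or end of file


def length_without_check(getdelim_result: int) -> int:
    """Models the old behaviour: trust the result and drop the delimiter."""
    return getdelim_result - 1


def as_size_t(count: int) -> int:
    """Show how a negative count looks when reinterpreted as a C size_t."""
    bits = 8 * ctypes.sizeof(ctypes.c_size_t)
    return count % (1 << bits)


def length_with_check(getdelim_result: int) -> int:
    """Models the fix: fail loudly instead of building a negative length."""
    if getdelim_result == GETDELIM_FAILURE:
        raise EOFError("getdelim() failed or reached end of file")
    return getdelim_result - 1


bad_length = length_without_check(GETDELIM_FAILURE)
print(bad_length)             # the negative length described above
print(as_size_t(bad_length))  # the enormous byte count memcpy() would see
```

Checking the return value before computing the length is what keeps the negative count from ever reaching `memcpy()`.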

Fixes modular#3908

Co-authored-by: mahiro21h <[email protected]>
Closes modular#3919
MODULAR_ORIG_COMMIT_REV_ID: c3457f3377bfcfe0379e31fbd31e72ec53fe7516

Signed-off-by: Joshua James Venter <[email protected]>
MODULAR_ORIG_COMMIT_REV_ID: 639aca98351b3616d041da634ec24c456cb44f4d

Signed-off-by: Joshua James Venter <[email protected]>
be `Char` methods

MODULAR_ORIG_COMMIT_REV_ID: ea353f81276ccc4a700ec8373ee150b38f0bb17c

Signed-off-by: Joshua James Venter <[email protected]>
MODULAR_ORIG_COMMIT_REV_ID: fd1b1dafcf04683dae78ad61f7d5086db50ebd6b

Signed-off-by: Joshua James Venter <[email protected]>
MODULAR_ORIG_COMMIT_REV_ID: 73f546472958e8f645c868ffa41db6b3e331ede6

Signed-off-by: Joshua James Venter <[email protected]>
MODULAR_ORIG_COMMIT_REV_ID: 6df31c8fe894ab751471259dae63af76a6ace222

Signed-off-by: Joshua James Venter <[email protected]>
MODULAR_ORIG_COMMIT_REV_ID: 111e67f7d270db6368c3b592ff5437d24c4a464c

Signed-off-by: Joshua James Venter <[email protected]>
`__iter__()` method

MODULAR_ORIG_COMMIT_REV_ID: a50d7efa09350c65b20877feec2d99217c8da65b

Signed-off-by: Joshua James Venter <[email protected]>
This allows overloading keyword-only argument names, and overloading a
positional arg with a keyword-only argument:

```mojo
struct OverloadedKwArgs:
    var val: Int

    fn __init__(out self, single: Int):
        self.val = single

    fn __init__(out self, *, double: Int):
        self.val = double * 2

    fn __init__(out self, *, triple: Int):
        self.val = triple * 3

fn main():
    OverloadedKwArgs(1)        # val=1
    OverloadedKwArgs(double=1) # val=2
    OverloadedKwArgs(triple=2) # val=3
```

It also enables indexing overloading:

```mojo
struct OverloadedKwArgs:
    var vals: List[Int]

    fn __init__(out self):
        self.vals = List[Int](0, 1, 2, 3, 4)

    fn __getitem__(self, idx: Int) -> Int:
        return self.vals[idx]

    fn __getitem__(self, *, idx2: Int) -> Int:
        return self.vals[idx2 * 2]

    fn __setitem__(mut self, idx: Int, val: Int):
        self.vals[idx] = val

    fn __setitem__(mut self, val: Int, *, idx2: Int):
        self.vals[idx2 * 2] = val

fn main():
    var x = OverloadedKwArgs()
    print(x[1])       # 1
    print(x[idx2=1])  # 2

    x[1] = 42
    x[idx2=1] = 84

    print(x[1])       # 42
    print(x[idx2=1])  # 84
```

MODULAR_ORIG_COMMIT_REV_ID: 3a2fbcd84ea3ecb83e30ae21d34006759d61743c

Signed-off-by: Joshua James Venter <[email protected]>
[External] [stdlib] Make examples compile in `math`

- This is needed because of
modular#3828.

Co-authored-by: soraros <[email protected]>
Closes modular#3912
MODULAR_ORIG_COMMIT_REV_ID: 1b67d641b8fc389d43fab99ebf6785fc9c906d55

Signed-off-by: Joshua James Venter <[email protected]>
MODULAR_ORIG_COMMIT_REV_ID: e80f95fab01d0a24aeabdfa7438e94131b3b2493

Signed-off-by: Joshua James Venter <[email protected]>
And add deprecation warnings to the `str` functions, to be phased out to
a compiler error in the release after next.

MODULAR_ORIG_COMMIT_REV_ID: 54247b2e5fd77efc455237f0b372fa7c983c8cda

Signed-off-by: Joshua James Venter <[email protected]>
And add deprecation warnings to the `bool` functions, to be phased out
to a compiler error in the release after next.

MODULAR_ORIG_COMMIT_REV_ID: 6aac03285e2f69659e1208a372d5130c4f203f3d

Signed-off-by: Joshua James Venter <[email protected]>
@jjvraw jjvraw force-pushed the feature/add-stol-function branch from 4f7cdcd to e4cb52b Compare January 16, 2025 09:59
@jjvraw jjvraw closed this Jan 16, 2025
Labels
waiting for response Needs action/response from contributor before a PR can proceed