Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(typing): allow annotate methods with pos_only when only have the self argument #5403

Merged
merged 10 commits into from
Nov 11, 2024

Conversation

XuehaiPan
Copy link
Contributor

@XuehaiPan XuehaiPan commented Oct 9, 2024

Description

This PR allows to annotate methods such as __repr__(self) to be positional-only. For example:

py::class_<MyClass>(mod, "MyClass")
.def("__repr__", &MyClass::ToString, py::pos_only());
.def("my_method_no_args", &MyClass::MyMethodNoArgs, py::pos_only());

will generate annotation:

class MyClass:
    def __repr__(self: MyClass, /) -> str: ...
    def my_method_no_args(self: MyClass, /) -> RType: ...

Previously, the compilation failed here:

constexpr bool has_kw_only_args = any_of<std::is_same<kw_only, Extra>...>::value,
has_pos_only_args = any_of<std::is_same<pos_only, Extra>...>::value,
has_arg_annotations = any_of<is_keyword<Extra>...>::value;
static_assert(has_arg_annotations || !has_kw_only_args,
"py::kw_only requires the use of argument annotations");
static_assert(has_arg_annotations || !has_pos_only_args,
"py::pos_only requires the use of argument annotations (for docstrings "
"and aligning the annotations to the argument)");

Although there is an implicitly defined argument self not present by py::arg("self").

Suggested changelog entry:

- Allow annotate methods with ``py::pos_only`` when only have the ``self`` argument.
- Make arguments for auto-generated dunder methods positional-only.

@XuehaiPan XuehaiPan force-pushed the method-pos-only-args branch from 9cd2ff6 to 567000e Compare October 9, 2024 08:17
@XuehaiPan XuehaiPan force-pushed the method-pos-only-args branch from 93cc71f to fa887bd Compare October 14, 2024 12:55
@rwgk
Copy link
Collaborator

rwgk commented Nov 8, 2024

This PR allows to annotate methods such as __repr__(self) to be positional-only.

Could you please explain in the PR description why this is beneficial?

@XuehaiPan
Copy link
Contributor Author

XuehaiPan commented Nov 8, 2024

This PR allows to annotate methods such as __repr__(self) to be positional-only.

Could you please explain in the PR description why this is beneficial?

We generate the help message and function signature from the method definition.

>>> import pybind_module
>>> help(pybind_module.MyClass)
>>> print(pybind_module.MyClass.some_method.__doc__)

For those methods are not explicitly defined by the users, they will use the definition from CPython:

https://github.com/python/cpython/blob/54c63a32d06cb5f07a66245c375eac7d7efb964a/Objects/typeobject.c#L10558-L10753

E.g.:

    TPSLOT(__repr__, tp_repr, slot_tp_repr, wrap_unaryfunc,
           "__repr__($self, /)\n--\n\nReturn repr(self)."),
    TPSLOT(__hash__, tp_hash, slot_tp_hash, wrap_hashfunc,
           "__hash__($self, /)\n--\n\nReturn hash(self)."),
    TPSLOT(__str__, tp_str, slot_tp_str, wrap_unaryfunc,
           "__str__($self, /)\n--\n\nReturn str(self)."),
    TPSLOT(__getattr__, tp_getattro, _Py_slot_tp_getattr_hook, NULL,
           "__getattr__($self, name, /)\n--\n\nImplement getattr(self, name)."),
    TPSLOT(__setattr__, tp_setattro, slot_tp_setattro, wrap_setattr,
           "__setattr__($self, name, value, /)\n--\n\nImplement setattr(self, name, value)."),
    TPSLOT(__delattr__, tp_setattro, slot_tp_setattro, wrap_delattr,
           "__delattr__($self, name, /)\n--\n\nImplement delattr(self, name)."),

Their arguments are positional-only.

Before this PR, if the user defines __repr__. The auto-generated help message will like this:

>>> help(pybind_module.MyClass)
 |  ...
 |
 |  __repr__(...)
 |      __repr__(self: pybind_module.MyClass) -> str  # <--- generated by pybind11: not positional-only
 |
 |      Return a string representation of pybind_module.MyClass.
 |
 |  __str__(self, /)  # <--- from cpython/Objects/typeobject.c: positional-only
 |      Return str(self).

@rwgk
Copy link
Collaborator

rwgk commented Nov 8, 2024

Thanks. So is this PR to make the pybind11 behavior compatible with standard Python behavior? Is this only for cosmetic reasons, or do you see failures in some context/environment without this PR?

I will need some time to fully understand the static_assert you're changing. Could you maybe help with an explanation why your change is correct in all situations? Concretely, why do we not have to worry about distinguishing:

  • method without arguments other then self
  • method with additional arguments

?

@XuehaiPan
Copy link
Contributor Author

XuehaiPan commented Nov 9, 2024

So is this PR to make the pybind11 behavior compatible with standard Python behavior?

Yes. The previous static_assert only counts the non-self argument in methods. It disallows method definitions like my_method(self, /) while the Python can.

Is this only for cosmetic reasons, or do you see failures in some context/environment without this PR?

Before this PR (the current master branch), if I define a positional-only def __repr__(self, /) -> str:

py::class_<MyClass>(...)
    .def("__repr__", &MyClass::ToString, "Return a string representation.", py::pos_only())
    ...;

The compilation fails:

pybind11/include/pybind11/pybind11.h:306:27: error: static assertion failed due to requirement 'has_arg_annotations || !has_pos_only_args': py::pos_only requires the use of argument annotations (for docstrings and aligning the annotations to the argument)
  306 |             static_assert(has_arg_annotations || !has_pos_only_args,
      |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pybind11/include/pybind11/pybind11.h:157:9: note: in instantiation of function template specialization 'pybind11::cpp_function::initialize<(lambda at pybind11/include/pybind11/pybind11.h:157:20), std::string, const MyClass *, pybind11::name, pybind11::is_method, pybind11::sibling, char[32], pybind11::pos_only>' requested here
  157 |         initialize([f](const Class *c,
      |         ^
pybind11/include/pybind11/pybind11.h:1621:22: note: in instantiation of function template specialization 'pybind11::cpp_function::cpp_function<std::string, MyClass, pybind11::name, pybind11::is_method, pybind11::sibling, char[32], pybind11::pos_only>' requested here
  1621 |         cpp_function cf(method_adaptor<type>(std::forward<Func>(f)),
      |                      ^
myclass.cpp:342:10: note: in instantiation of function template specialization 'pybind11::class_<MyClass>::def<std::string (MyClass::*)() const, char[32], pybind11::pos_only>' requested here
  342 |         .def("__repr__", &MyClass::ToString, "Return a string representation.", py::pos_only())

Because the static_assert complains there is no argument before py::pos_only(). However, for methods, there is at least one positional-only argument: the self argument. The self-argument is automatically prepended into the front of the parameter list. The previous static_assert statement does not count for that.

I will need some time to fully understand the static_assert you're changing. Could you maybe help with an explanation why your change is correct in all situations? Concretely, why do we not have to worry about distinguishing:

  • method without arguments other then self
  • method with additional arguments

For functions, this PR does not change any behavior other than the current master:

// before and after this PR
// signature: my_function(/)  # this is also not allowed in Python
mod.def("my_function", py::pos_only())                              // fail(expected): no annotated argument
// signature: my_function(x, /)
mod.def("my_function", py::arg("x"), py::pos_only())                // OK
// signature: my_function(x, /, y)
mod.def("my_function", py::arg("x"), py::pos_only(), py::arg("y"))  // OK

For methods, it allows a standalone py::pos_only() annotation in the method definition:

// before this PR
// signature: my_method(self, /)  # allowed in Python
cls.def("my_method", py::pos_only())                              // fail(unexpected): no annotated argument
                                                                  // (this is not correct because there is
                                                                  //  an implicitly added `self` argument
                                                                  //  automatically added in `py::class_.def(...)`
                                                                  //  (which set `py::is_method`))
// signature: my_method(self, x, /)
cls.def("my_method", py::arg("x"), py::pos_only())                // OK
// signature: my_method(self, x, /, y)
cls.def("my_method", py::arg("x"), py::pos_only(), py::arg("y"))  // OK

// after this PR
// signature: my_method(self, /)  # allowed in Python
cls.def("my_method", py::pos_only())                              // OK
// signature: my_method(self, x, /)
cls.def("my_method", py::arg("x"), py::pos_only())                // OK
// signature: my_method(self, x, /, y)
cls.def("my_method", py::arg("x"), py::pos_only(), py::arg("y"))  // OK

Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for 09a4992, that makes it pretty easy to follow the logic.

Did you consider making pos_only the default for the is_method_with_self_arg_only case? Then we (and the rest of the world) wouldn't need all the changes adding pos_only() explicitly.

include/pybind11/pybind11.h Outdated Show resolved Hide resolved
@XuehaiPan
Copy link
Contributor Author

Did you consider making pos_only the default for the is_method_with_self_arg_only case? Then we (and the rest of the world) wouldn't need all the changes adding pos_only() explicitly.

I'd like to leave this decision to the users. They should explicitly add pos_only() if they want. Because the Python side does not restrict this: self can be a kwarg

In [1]: class Foo:
   ...:     def foo(self):
   ...:         print('Hello')
   ...:

In [2]: obj = Foo()

In [3]: obj.foo()
Hello

In [4]: Foo.foo(obj)
Hello

In [5]: Foo.foo(self=obj)
Hello

In this PR, I add pos_only() in duder methods in tests (e.g., __repr__) and pybind11 generated methods (e.g. __iter__, __getstate__).

@rwgk
Copy link
Collaborator

rwgk commented Nov 10, 2024

I added a test (34e808f), in part to verify my understanding that adding pos_only() changes the generated __doc__, but also to have a test for the changed static_assert().

I'd like to leave this decision to the users.

OK, I agree after thinking about it more.

I strongly agree that changing the static_assert() is a good move (I think of it as a bug fix, more than a feature btw, but OK to keep the PR title as is).

I'm less certain about the rest of the changes, adding pos_only() to the dunder methods. Strictly speaking those are breaking changes, although I think in practice disallowing self=obj will not break anything ... except, will it change mypy stubgen stubs? — That'd actually be likely to break Google-internal tests (@nevedaren, for awareness): they use mypy stubgen to generate .pyi files, check them into their version control system, and then run tests to compare the generated .pyi files with the checked-in files, failing if there is a difference due to pybind11 behavior changes.

Anyway, that's a weird/unfortunate/artificial problem, and Google has ways to deal with it, so I'd support making the change here.

If you want to keep the changes for the dunder methods, could you please add unit tests to cover all changes you're making, by adding asserts for __doc__, similar to my commit?

Sorry it's a bit of a chore to figure out where to insert those tests, but it's important to have them for long-term stability. (To pin-point where to add the tests, I'd try to intentionally break one changed dunder method at a time, then run all existing unit tests locally.)

@rwgk
Copy link
Collaborator

rwgk commented Nov 10, 2024

BTW

Earlier I wrote:

Is this only for cosmetic reasons, or do you see failures in some context/environment without this PR?

What I had in mind: Did you see failures that forced you to add pos_only()?

(I realized before that the static_assert() is in the way add adding pos_only(), but why did you want to add pos_only()?)

@XuehaiPan
Copy link
Contributor Author

XuehaiPan commented Nov 11, 2024

I'm less certain about the rest of the changes, adding pos_only() to the dunder methods. Strictly speaking those are breaking changes, although I think in practice disallowing self=obj will not break anything ...

Yes, it is a breaking change for generated dunder methods. For user-defined methods, it is an opt-in feature.

except, will it change mypy stubgen stubs?

No. I have tested with the latest mypy-stubgen locally. I found it ignores the pos-only attribute in the method signature.

pip3 install -U mypy
pip3 install -e tests
cd tests
stubgen -m pybind11_tests.enums
class UnscopedEnum:
    __members__: ClassVar[dict] = ...  # read-only
    EOne: ClassVar[UnscopedEnum] = ...
    EThree: ClassVar[UnscopedEnum] = ...
    ETwo: ClassVar[UnscopedEnum] = ...
    __entries: ClassVar[dict] = ...
    def __init__(self, value: int) -> None: ...
    def __and__(self, other): ...
    def __eq__(self, other: object) -> bool: ...
    def __ge__(self, other: object) -> bool: ...
    def __gt__(self, other: object) -> bool: ...
    def __hash__(self) -> int: ...
    def __index__(self) -> int: ...
    def __int__(self) -> int: ...
    def __invert__(self): ...
    def __le__(self, other: object) -> bool: ...
    def __lt__(self, other: object) -> bool: ...
    def __ne__(self, other: object) -> bool: ...
    def __or__(self, other): ...
    def __rand__(self, other): ...
    def __ror__(self, other): ...
    def __rxor__(self, other): ...
    def __xor__(self, other): ...
    @property
    def name(self): ...
    @property
    def value(self) -> int: ...

In the generated stub file, no argument (including unary and binary methods) is marked as pos-only.

If you want to keep the changes for the dunder methods, could you please add unit tests to cover all changes you're making, by adding asserts for __doc__, similar to my commit?

Sorry it's a bit of a chore to figure out where to insert those tests, but it's important to have them for long-term stability. (To pin-point where to add the tests, I'd try to intentionally break one changed dunder method at a time, then run all existing unit tests locally.)

I have added corresponding tests for each change in the latest commit in this PR.


Earlier I wrote:

Is this only for cosmetic reasons, or do you see failures in some context/environment without this PR?

What I had in mind: Did you see failures that forced you to add pos_only()?

There was no failure that forced me to add pos_only().

(I realized before that the static_assert() is in the way add adding pos_only(), but why did you want to add pos_only()?)

As commented in #5403 (comment), just for consistency with the CPython source code and the Python code where self can be pos-only.

I'm working on a PR to drop Python 3.7 and enable Python 3.8 features for my C extension. I got failures as commented in #5403 (comment).

@XuehaiPan XuehaiPan force-pushed the method-pos-only-args branch from 07c7c7d to 601e499 Compare November 11, 2024 15:24
@XuehaiPan XuehaiPan force-pushed the method-pos-only-args branch from 41dccd4 to ebaf3c8 Compare November 11, 2024 15:50
Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the thorough testing!

@@ -198,10 +198,11 @@ def pytest_assertrepr_compare(op, left, right): # noqa: ARG001


def gc_collect():
"""Run the garbage collector twice (needed when running
"""Run the garbage collector three times (needed when running
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't realize we had this here. Skipping for PyPy (and GraalPy I guess) would be better. But for later, another PR, maybe.

@rwgk rwgk merged commit 7f94f24 into pybind:master Nov 11, 2024
81 checks passed
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Nov 11, 2024
@XuehaiPan XuehaiPan deleted the method-pos-only-args branch November 12, 2024 08:00
@XuehaiPan
Copy link
Contributor Author

Out of curiosity, is there any version that this patch targeted? I tried to enable this feature in my source code with a version guard.

#if PYBIND11_VERSION_HEX >= 0x020E00F0  // pybind11 2.14.0
#define def_method_pos_only(...) def(__VA_ARGS__ __VA_OPT__(, ) py::pos_only())
#else
#define def_method_pos_only(...) def(__VA_ARGS__)
#endif

I'm not sure if this is the correct version that my code will not break in the future.

@rwgk
Copy link
Collaborator

rwgk commented Dec 12, 2024

Seems fine to me:

Your code will break only if we revert this PR, which seems very unlikely to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs changelog Possibly needs a changelog entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants