Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Handle public symbols declared as Unknown like undeclared symbols #15715

Closed
wants to merge 2 commits into from

Conversation

sharkdp
Copy link
Contributor

@sharkdp sharkdp commented Jan 24, 2025

Summary

For definitely declared symbols, we currently rely on the declared type only, even if we also see bindings for that symbol. This means that we handle the two cases (1) a symbol is undeclared and (2) a symbol is declared with type Any/Unknown differently, which seems inconsistent.

One potential solution to this problem is the following: Instead of returning T_declared as the public type of a declared symbol, we return T_declared | T_declared & T_inferred instead. This represents a type that is no smaller than and no larger than T_declared. For fully static types, this construct simplifies to T_declared. But for non-fully-static types, this can lead to more precise types. For example, if the declared type is Any (or Unknown), we now use Any | Any & T which can be simplified to Any | T. This represents the fact that the public type for this symbol must be at least as large as the inferred type T. This allows us to issue diagnostics in cases like the following:

from typing import Any

class C:
    x: Any = 1

reveal_type(C.x)  # Any | Literal[1]    (previously: Any)

y: str = C.x  # error: [invalid-assignment]

Red knot (main), mypy and pyright issue no diagnostic in the last line here.

In the example given above, it looks like the intersection is not necessary. However, consider the following scenario. We still want to see int as the public type here. This is possible because int | int & Any = int:

from typing import Any

def any() -> Any: ...

class C:
    x: int = any()

reveal_type(C.x)  # int

The same behavior should also apply to possibly-declared symbols.

Why (T_declared | T_declared & T_inferred)?

The nice property of this expression is that it works for other cases in the matrix below as well.

Undeclared (T_declared = Unknown)

If a symbol is definitely undeclared, we can set T_declared = Unknown. In this case, the expression T_declared | T_declared & T_inferred = Unknown | Unknown & T_inferred simplifies to Unknown | T_inferred. This is exactly what we have in the last column of the matrix below (where the entry in the last row further simplifies because Unknown | Unknown = Unknown).

Unbound (T_inferred = Unknown)

If a symbol is definitely unbound, we can set T_inferred = Unknown. In this case, the expression T_declared | T_declared & T_inferred = T_declared | T_declared & Unknown simplifies to T_declared. This is exactly what we have in the last row of the matrix (where the entry in the last column simplifies because T_declared is Unknown in this case as well)

Type inference matrix

(changes affect the first column only)

Before

Public type declared possibly-undeclared undeclared
bound T_decl                                   T_decl | T_inf Unknown | T_inf
possibly-unbound T_decl T_decl | T_inf Unknown | T_inf
unbound T_decl T_decl Unknown

After

Public type declared possibly-undeclared undeclared
bound T_decl | T_decl & T_inf T_decl | T_inf Unknown | T_inf
possibly-unbound T_decl | T_decl & T_inf T_decl | T_inf Unknown | T_inf
unbound T_decl T_decl Unknown

What about possibly-undeclared?

It would be great if we could replace T_declared -> T_declared | Unknown for this case (and similar for possibly unbound symbols). This is doable, but leads to lots of unions/intersections with Unknown, something that we don't fully simplify at the moment, so I have not fully looked into this yet. As it stands, our handling of possibly-undeclared symbols is at least debatable. For example, should we issue a diagnostic in the following example (we currently do not).

def flag() -> bool: return True
def get_int() -> int: ...

class C:
    if flag():
        x: str
    else:
        x = get_int()

C.x = 1  # this seems to violate the `str` declaration?

Test Plan

Updated existing tests.

@sharkdp sharkdp added the red-knot Multi-file analysis & type inference label Jan 24, 2025
Comment on lines +52 to +57
reveal_type(e) # revealed: @Todo(full tuple[...] support) | tuple[()]
reveal_type(f) # revealed: @Todo(full tuple[...] support) | tuple[Literal["42"], Literal[b"42"]]
reveal_type(g) # revealed: @Todo(full tuple[...] support) | tuple[Literal["42"], Literal[b"42"]]

# TODO: support more kinds of type expressions in annotations
reveal_type(h) # revealed: @Todo(full tuple[...] support)
reveal_type(h) # revealed: @Todo(full tuple[...] support) | tuple[list, list]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look a bit strange, but they will all go away when the TODO is resolved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. For f, and g once we understand the annotation we will infer the type as being

tuple[str, *tuple[int, ...], bytes] | tuple[Literal["42"], Literal[b"42"]]

Which, since the second element in the union is a subtype of the first element, simplifies to

tuple[str, *tuple[int, ...], bytes]

which is just what they have as their annotation already

Copy link
Contributor Author

@sharkdp sharkdp Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And that reasoning is true for every fully-static declared+inferred type. If we did not issue a diagnostic on the definition x: T_declared = <expr of type T_inferred>, it means that T_inferred is assignable to T_declared. And if T_declared and T_inferred are fully-static, that means T_inferred <: T_declared, i.e. T_declared | T_inferred = T_declared

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any tests where the assigned value is inferred as having a non-fully-static type but the declared type is fully static? If not, could we add some?

E.g. for this:

from typing_extensions import Any, reveal_type

def f(x: Any):
    y: int = x

    def g():
        reveal_type(y)

We reveal int on main. What do we reveal with this PR?

Copy link
Contributor Author

@sharkdp sharkdp Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we reveal with this PR?

int | Any

Copy link
Contributor

@carljm carljm Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that's a reasonable outcome. The assignment y: int = x is only valid because x has a not fully static type that can materialize to something that's a subtype of int. I don't think after that assignment we should be anymore accounting for a possibility that y has a type wider than int. If anything it seems like the type of y (and maybe also x!) here should be int & Any, but I don't think that's quite right either, it corresponds to effectively ignoring annotated types on valid assignments, which is what we do in local inference but is not right for public types.

Overall I'm not sold on this PR yet, would like to think it over a bit more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that's a good point that I hadn't thought about. And I'm not sure I like the behavior. In the x: Any = 1 case, we make the resulting public type more precise. But in the case x: int = returns_any() here, we make the public type of x less precise. I'm not sure if that's desirable. Maybe we want (T_declared | T_inferred) & T_declared? 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If anything it seems like this example should introduce a narrowing constraint on x: x: Any & int

Copy link
Contributor Author

@sharkdp sharkdp Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember that this is about public uses of x. Someone could have modified x in the meantime. This is why we also want a lower bound, I think. So (int | Any) & int = int | Any & int = int for this case here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should document this by adding a test, since this PR changes behaviour for things like this!

Done.

@@ -18,7 +18,7 @@ def f(*args: Unpack[Ts]) -> tuple[Unpack[Ts]]:
# TODO: should understand the annotation
reveal_type(args) # revealed: tuple

reveal_type(Alias) # revealed: @Todo(Unsupported or invalid type in a type expression)
reveal_type(Alias) # revealed: @Todo(Unsupported or invalid type in a type expression) | Literal[int]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the other comment: This change look a bit strange, but it will go away when the TODO is resolved.

@sharkdp sharkdp force-pushed the david/union-for-declared-types branch from fde7f20 to 5bb5ca5 Compare January 24, 2025 14:41
@sharkdp sharkdp marked this pull request as draft January 24, 2025 18:44
@sharkdp sharkdp changed the title [red-knot] Use T_declared | T_inferred for declared public symbols [red-knot] Handle public symbols declared as Unknown like undeclared symbols Jan 24, 2025
@sharkdp sharkdp force-pushed the david/union-for-declared-types branch from 5bb5ca5 to a5e3304 Compare January 24, 2025 19:53
@sharkdp sharkdp force-pushed the david/union-for-declared-types branch from a5e3304 to a3d730b Compare January 24, 2025 20:33
symbol, not just declarations), we use `T_decl | T_decl & T_inf` as the public type, which
simplifies to `T_decl` for `T_inf = Unknown` (the unbound case).

[TODO: more explanation]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO for me.

@sharkdp sharkdp closed this Jan 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants