Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[json schema] Do not throw errors when parsing non annotated parameters #1157

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

h-mayorquin
Copy link
Collaborator

@h-mayorquin h-mayorquin commented Dec 13, 2024

This comes from the IBL project.

Summary:
If someone creates their own interface (inherits from BaseDataInterface) but does not annotate the __init__ function of the json validation will throw an obscure error:

self = <pydantic.json_schema.GenerateJsonSchema object at 0x77dfd6ff52d0>, schema = {'cls': <class 'inspect._empty'>, 'metadata': {}, 'type': 'is-instance'}
error_info = "core_schema.IsInstanceSchema (<class 'inspect._empty'>)"

    def handle_invalid_for_json_schema(self, schema: CoreSchemaOrField, error_info: str) -> JsonSchemaValue:
>       raise PydanticInvalidForJsonSchema(f'Cannot generate a JsonSchema for {error_info}')
E       pydantic.errors.PydanticInvalidForJsonSchema: Cannot generate a JsonSchema for core_schema.IsInstanceSchema (<class 'inspect._empty'>)
E       
E       For further information visit https://errors.pydantic.dev/2.7/u/invalid-for-json-schema

../../miniconda3/envs/work/lib/python3.11/site-packages/pydantic/json_schema.py:2152: PydanticInvalidForJsonSchema
================================================================================================ short test summary info =================================================================================================
FAILED tests/test_minimal/test_utils/test_get_json_schema_from_method_signature.py::test_get_json_schema_from_method_signature_unannotated - pydantic.errors.PydanticInvalidForJsonSchema: Cannot generate a JsonSchema for core_schema.IsInstanceSchema (<class 'inspect._empty'>)
==============================================================

The fix, in my view, is to omit validation when the user does not annotate their signature parameters as there is nothing to validate. I added a fix and a test here.

@h-mayorquin h-mayorquin marked this pull request as ready for review December 13, 2024 16:42
@bendichter
Copy link
Contributor

bendichter commented Dec 16, 2024

I don't really see the point of leaving an arg un-annotated, especially if one is going to use the get_json_schema_from_method_signature method. I agree that the error message is obscure, but if that is the problem I think the best solution would be to improve the error message. If they really have a compelling use-case for leaving an arg fully unconstrained then I guess I would be OK with supporting that by throwing a warning instead of an error.

@h-mayorquin
Copy link
Collaborator Author

h-mayorquin commented Dec 16, 2024

I don't really see the point of leaving an arg un-annotated, especially if one is going to use the get_json_schema_from_method_signature method. I agree that the error message is obscure, but if that is the problem I think the best solution would be to improve the error message. If they really have a compelling use-case for leaving an arg fully unconstrained then I guess I would be OK with supporting that by throwing a warning instead of an error.

Important context: the method get_json_schema_from_method_signature is always used when initializing interfaces. Currently, there is no option to avoid source validation, either at the interface or converter level, so they need to use it.

The argument to leave arguments unannotated is the one that you have made before: lowering the barriers to contribution for users by not adding hurdles. Forcing them to use type annotations to inherit from BaseDataInterface is a contentious hurdle. My impression is that the Python community is still quite divided about using annotations or not.

If you still prefer to force the user to use annotations, then the error can be something like:

"The arguments x, y, and z lack type annotations. Add them to initialize the interface."

What do you think?

@bendichter
Copy link
Contributor

the method get_json_schema_from_method_signature is always used when initializing interfaces

Why are we always calling this function? Is this necessary?

@bendichter
Copy link
Contributor

If we want to enforce typing, we could use beartype: https://github.com/beartype/beartype

@h-mayorquin
Copy link
Collaborator Author

h-mayorquin commented Dec 17, 2024

the method get_json_schema_from_method_signature is always used when initializing interfaces

Why are we always calling this function? Is this necessary?

To validate source data. To me it seems like a nice feature to have and it has been like that since the nwb-conversion-tools times. At least for the NWBConverter.

If we were to disable source validation under which conditions we would?

I personally would prefer the outcome of this PR. Once this PR is merged it will work like this for interfaces and converters:

  • If someone does annotate their interfaces __init__ then we do the source schema validation.
  • If someone does not or only annotates some of them then we respect their intent and we don't validate the ones without type annotations (that is, we don't force them to annotate or add typing.Any)

beartype looks nice. We have something similar with Pydantic validate_call:
https://docs.pydantic.dev/latest/api/validate_call/

@bendichter
Copy link
Contributor

I think these issues are separable:

  1. We want to generate a json schema for input args. This is useful for interfacing with other applications like NWB GUIDE
  2. We want to validate inputs to the init function.

We don't necessarily need to do validation with the json schema, when working within Python. There are some nuances in the conversion from args to json schema that can cause complications here.

  1. It currently requires annotation for every arg, which is cumbersome
  2. There are some inconsistencies in requirements between Python and JSON. For example there is this pydantic issue where JSON schema dates require timezones and Pydantic-validated dates do not.

The pydantic library has come a long way since we originally wrote this code, and now facilitates a much simpler approach. Now we can change the code so that instead of attempting to create a json schema and then validate every call with it, we use the method you mentioned, @validated_call. In so doing, we are removing the get_json_schema_from_method_signature method from this workflow, and we can have stricter requirements for this without worrying about creating a burden for users.

@h-mayorquin
Copy link
Collaborator Author

Oh, I see how you are thinking. I fully agree with this reasoning that they are two distinct responsibilities! I was under the wrong impression that you wanted to have a JSON validation mechanism for source data of some sort. My bad. I think that Pydantic (or Beartype) does a better job and offers many advantages: easier to opt out, automatic handling of non-annotated parameters, etc.

For your information, we already have a @validate_call mechanism for most interfaces. When this was introduced, I argued for removing the JSON source checks here, but I failed to convince Cody. Perhaps some of the context from that discussion is important here.

So, what do you think of the following plan?

  • The current PR should throw a more explicit error instead of omitting validation of untyped parameters as it currently does.
  • In another PR we should remove source validation with JSON on the interfaces and converters.

The first point makes sense because the get_json_schema_from_method_signature will then only be called explicitly by users like the NWBGuide, and then an error is better aligned with the intent than an omission in that case.

For the second point, we are already performing validate_call at the interface level, so removing what is arguably redundant functionality makes sense.

@bendichter
Copy link
Contributor

Sounds great!

@h-mayorquin
Copy link
Collaborator Author

I implemented your feedback, @bendichter, and changed from omission to error and modified the test. Thanks for the discussion.

Copy link

codecov bot commented Dec 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.75%. Comparing base (96dfdff) to head (5bd670b).
Report is 12 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1157      +/-   ##
==========================================
+ Coverage   90.69%   90.75%   +0.06%     
==========================================
  Files         129      129              
  Lines        8189     8287      +98     
==========================================
+ Hits         7427     7521      +94     
- Misses        762      766       +4     
Flag Coverage Δ
unittests 90.75% <100.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...nterfaces/ecephys/basesortingextractorinterface.py 80.19% <100.00%> (ø)
src/neuroconv/utils/json_schema.py 95.06% <100.00%> (+0.51%) ⬆️

... and 3 files with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants