Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial support for typing.Annotated, support for extractFrom #141

Merged
merged 18 commits into from
Dec 12, 2023

Conversation

wRAR
Copy link
Contributor

@wRAR wRAR commented Oct 11, 2023

scrapy_zyte_api/providers.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Oct 11, 2023

Codecov Report

Merging #141 (412475e) into main (d701f48) will decrease coverage by 0.33%.
The diff coverage is 100.00%.

❗ Current head 412475e differs from pull request most recent head f5bb894. Consider uploading reports for the commit f5bb894 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #141      +/-   ##
==========================================
- Coverage   98.96%   98.64%   -0.33%     
==========================================
  Files          10       10              
  Lines         776      810      +34     
==========================================
+ Hits          768      799      +31     
- Misses          8       11       +3     
Files Coverage Δ
scrapy_zyte_api/providers.py 96.87% <100.00%> (-3.13%) ⬇️

scrapy_zyte_api/providers.py Outdated Show resolved Hide resolved
for option in ExtractFrom:
if option in metadata:
product_options = zyte_api_meta.setdefault("productOptions", {})
if "extractFrom" in product_options:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't check if the previous value is the same, maybe we should do that instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we should treat values from different sources with different priorities and use the top one instead of raising this error.

@attrs.define
class AnnotatedProductPage(BasePage):
product: Annotated[Product, ExtractFrom.httpResponseBody]
product2: Annotated[Product, ExtractFrom.httpResponseBody]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't raise a double-extractFrom exception because there is only one unique dep.

@@ -117,6 +117,14 @@ def render_POST(self, request):
"price": "10",
"currency": "USD",
}
extract_from = request_data.get("productOptions", {}).get("extractFrom")
if extract_from:
from scrapy_zyte_api.providers import ExtractFrom
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively it can be moved to a different module (importing providers needs additional deps).

Comment on lines +125 to +126
assert isinstance(response_data["product"], dict)
assert isinstance(response_data["product"]["name"], str)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are needed because of how response_data is typed, otherwise mypy thinks the values could be anything supported by JSON.

product2: Annotated[Product, ExtractFrom.httpResponseBody]

class AnnotatedZyteAPISpider(ZyteAPISpider):
def parse_(self, response: DummyResponse, page: AnnotatedProductPage): # type: ignore[override]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error: Argument 2 of "parse_" is incompatible with supertype "ZyteAPISpider"; supertype defines the argument type as "ProductPage" [override]
note: This violates the Liskov substitution principle
note: See https://mypy.readthedocs.io/en/stable/common_issues.html#incompatible-overrides

I think this is a general problem with scrapy-poet-annotated callbacks, it will also be worse when we release typed Scrapy as Scrapy.parse has Response but we often use DummyResponse.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please open an issue for this in scrapy-poet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will, but I'm not sure what to do with it :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wRAR wRAR changed the title Initial support for typing.Annotated. Initial support for typing.Annotated, support for extractFrom Dec 11, 2023
@wRAR
Copy link
Contributor Author

wRAR commented Dec 12, 2023

So the last question is where should we define ExtractFrom. Do we want to expose it in the top-level __init__.py? If so, we shouldn't put it in scrapy_zyte_api.providers as importing it needs additional deps.

@wRAR wRAR closed this Dec 12, 2023
@wRAR wRAR reopened this Dec 12, 2023
@Gallaecio
Copy link
Contributor

What about a scrapy_zyte_api._annotations module?

@wRAR wRAR merged commit 11a9b03 into main Dec 12, 2023
18 checks passed
@wRAR wRAR deleted the annotated-support branch December 12, 2023 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants