Datasets module #492
…docstring in previous datasets
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #492      +/-   ##
==========================================
- Coverage   90.52%   83.73%   -6.80%
==========================================
  Files          62       69       +7
  Lines        2513     2718     +205
==========================================
+ Hits         2275     2276       +1
- Misses        238      442     +204
srai/datasets/_base.py
    data = load_dataset(dataset_name, version, token=hf_token, trust_remote_code=True)
    processed_data = self._preprocessing(data)

    return processed_data
I would add a proper index name here, compatible with our loaders, so that the result can be used later with srai.
srai/datasets/_base.py
    averages_hex = joined_gdf.groupby("region_id").size().reset_index(name=target_column)
Instead, I would set the index name in the loaded dataset to "region_id" to keep it aligned with the rest of our loaders.
I meant "feature_id", of course ;)
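For illustration, the suggestion could look roughly like the sketch below. It uses a plain pandas DataFrame in place of the loaded dataset, and both the `set_features_index` helper and the locally defined `FEATURES_INDEX` constant are hypothetical names chosen for this example, not code from the PR.

```python
import pandas as pd

# Assumption: the loaders name their index "feature_id", as the comment above says.
FEATURES_INDEX = "feature_id"


def set_features_index(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose index is named FEATURES_INDEX."""
    # rename_axis with a scalar sets the index name and leaves df untouched.
    return df.rename_axis(FEATURES_INDEX)


# Stand-in for the preprocessed dataset; the column name is made up.
raw = pd.DataFrame({"price": [100, 250, 175]})
processed = set_features_index(raw)
```

Applying the rename once, at the end of loading, would mean downstream code can rely on the index name without each dataset class setting it separately.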
Datasets module cleaned up version