Datasets module #492

Open
wants to merge 16 commits into base: main

Conversation

mskaa3 commented Jan 3, 2025

Cleaned-up version of the datasets module.


codecov bot commented Jan 3, 2025

Codecov Report

Attention: Patch coverage is 0% with 204 lines in your changes missing coverage. Please review.

Project coverage is 83.73%. Comparing base (22662f3) to head (f63fe1a).
Report is 11 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| srai/datasets/_base.py | 0.00% | 107 Missing ⚠️ |
| srai/datasets/philadelphia_crime.py | 0.00% | 19 Missing ⚠️ |
| srai/datasets/airbnb_multicity.py | 0.00% | 18 Missing ⚠️ |
| srai/datasets/chicago_crime.py | 0.00% | 18 Missing ⚠️ |
| srai/datasets/police_department_incidents.py | 0.00% | 18 Missing ⚠️ |
| srai/datasets/house_sales_in_king_county.py | 0.00% | 17 Missing ⚠️ |
| srai/datasets/__init__.py | 0.00% | 7 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (22662f3) and HEAD (f63fe1a).

HEAD has 1 upload less than BASE

| Flag | BASE (22662f3) | HEAD (f63fe1a) |
|---|---|---|
| windows-latest-python3.12 | 1 | 0 |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #492      +/-   ##
==========================================
- Coverage   90.52%   83.73%   -6.80%     
==========================================
  Files          62       69       +7     
  Lines        2513     2718     +205     
==========================================
+ Hits         2275     2276       +1     
- Misses        238      442     +204     

| Flag | Coverage Δ |
|---|---|
| macos-13-python3.12 | 83.73% <0.00%> (-6.72%) ⬇️ |
| ubuntu-latest-python3.10 | 83.66% <0.00%> (-6.87%) ⬇️ |
| ubuntu-latest-python3.11 | 83.66% <0.00%> (-6.87%) ⬇️ |
| ubuntu-latest-python3.12 | 83.73% <0.00%> (-6.80%) ⬇️ |
| ubuntu-latest-python3.9 | 83.71% <0.00%> (-6.81%) ⬇️ |
| windows-latest-python3.12 | ? |

Flags with carried forward coverage won't be shown. Click here to find out more.


data = load_dataset(dataset_name, version, token=hf_token, trust_remote_code=True)
processed_data = self._preprocessing(data)

return processed_data
Collaborator

I would add a proper index name, compatible with the loaders, so that the returned data can be used later with srai.


averages_hex = joined_gdf.groupby("region_id").size().reset_index(name=target_column)
Collaborator

Instead, I would set the index name in the loaded dataset to "region_id" to keep it aligned with the rest of our loaders.

Collaborator

I meant "feature_id" of course ;)
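
For illustration, a minimal sketch of what the suggestion above might look like in practice. It assumes the loaded dataset is a pandas-compatible frame and that "feature_id" is the index name the other loaders use, as the correction implies; `FEATURES_INDEX`, `with_feature_index`, and the toy columns are hypothetical names, not code from this PR.

```python
import pandas as pd

# Assumption: the rest of the loaders name their index "feature_id",
# as stated in the review comment. This constant is illustrative only.
FEATURES_INDEX = "feature_id"


def with_feature_index(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose index is named FEATURES_INDEX."""
    # rename_axis with a scalar sets the index name without touching the data.
    return df.rename_axis(FEATURES_INDEX)


# Toy stand-in for a dataset after preprocessing (hypothetical columns).
raw = pd.DataFrame({"price": [100, 250, 180], "city": ["a", "b", "a"]})
loaded = with_feature_index(raw)
```

Naming the index once at load time would let later steps refer to it by name instead of resetting and renaming it after each groupby.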

2 participants