Datasets module #492

mskaa3 · 2025-01-03T12:12:17Z

Datasets module cleaned up version

…docstring in previous datasets

srai/loaders/huggingface_loader.py

codecov · 2025-01-03T12:26:53Z

Codecov Report

Attention: Patch coverage is 0% with 204 lines in your changes missing coverage. Please review.

Project coverage is 83.73%. Comparing base (22662f3) to head (f63fe1a).
Report is 11 commits behind head on main.

Files with missing lines	Patch %	Lines
srai/datasets/_base.py	0.00%	107 Missing ⚠️
srai/datasets/philadelphia_crime.py	0.00%	19 Missing ⚠️
srai/datasets/airbnb_multicity.py	0.00%	18 Missing ⚠️
srai/datasets/chicago_crime.py	0.00%	18 Missing ⚠️
srai/datasets/police_department_incidents.py	0.00%	18 Missing ⚠️
srai/datasets/house_sales_in_king_county.py	0.00%	17 Missing ⚠️
srai/datasets/__init__.py	0.00%	7 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (22662f3) and HEAD (f63fe1a).

HEAD has 1 upload less than BASE

Flag	BASE (22662f3)	HEAD (f63fe1a)
windows-latest-python3.12	1	0

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #492      +/-   ##
==========================================
- Coverage   90.52%   83.73%   -6.80%     
==========================================
  Files          62       69       +7     
  Lines        2513     2718     +205     
==========================================
+ Hits         2275     2276       +1     
- Misses        238      442     +204

Flag	Coverage Δ
macos-13-python3.12	`83.73% <0.00%> (-6.72%)`	⬇️
ubuntu-latest-python3.10	`83.66% <0.00%> (-6.87%)`	⬇️
ubuntu-latest-python3.11	`83.66% <0.00%> (-6.87%)`	⬇️
ubuntu-latest-python3.12	`83.73% <0.00%> (-6.80%)`	⬇️
ubuntu-latest-python3.9	`83.71% <0.00%> (-6.81%)`	⬇️
windows-latest-python3.12	`?`


piotrgramacki · 2025-01-03T12:38:19Z

srai/datasets/_base.py

+        data = load_dataset(dataset_name, version, token=hf_token, trust_remote_code=True)
+        processed_data = self._preprocessing(data)
+
+        return processed_data


I would give the returned data a proper index name, compatible with the loaders, so it can be used later with srai.

piotrgramacki · 2025-01-17T21:57:58Z

srai/datasets/_base.py


-            averages_hex = joined_gdf.groupby("region_id").size().reset_index(name=target_column)


Instead, I would set the index name in the loaded dataset to "region_id" to keep it aligned with the rest of our loaders.

I meant "feature_id" of course ;)
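
A minimal sketch of what this suggestion might look like, using plain pandas. The sample data, the region labels, and the `count` target column are made up for illustration; only the `"feature_id"` and `"region_id"` index names and the `groupby("region_id").size()` call come from this thread, and the actual `_base.py` code may differ.

```python
import pandas as pd

# Hypothetical stand-in for the preprocessed dataset (illustrative columns only).
processed_data = pd.DataFrame(
    {
        "latitude": [41.88, 41.89, 41.90],
        "longitude": [-87.63, -87.62, -87.65],
        "region_id": ["hex_a", "hex_a", "hex_b"],
    }
)

# Name the index of the loaded dataset "feature_id", as suggested in the review.
processed_data.index.name = "feature_id"

# Count features per region while keeping "region_id" as the index,
# instead of moving it into a column with reset_index().
target_column = "count"  # assumed name, not taken from the PR
counts = processed_data.groupby("region_id").size().rename(target_column).to_frame()
```

Keeping both index names explicit means the features and the per-region counts can be told apart by index name alone, without relying on column position.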

mskaa3 and others added 6 commits January 2, 2025 17:20

feat: added initial files for philadelphia crime dataset

b3750ff

feat: added initial files for chicago crime dataset

0bb6263

feat: added initial files for police incidents dataset

e240707

feat: added initial files for airbnb dataset & updated load function …

b8344dc

…docstring in previous datasets

feat: added initial files for house sales in king county dataset

cbf930a

Merge branch 'main' into develop

bd9ad65

piotrgramacki reviewed Jan 3, 2025

srai/loaders/huggingface_loader.py (outdated, resolved)

feat: added required libraries for dataset module

3afa07e

piotrgramacki added 2 commits January 3, 2025 13:27

feat: remove HF loader

9f0e979

fix: update lockfile with new pdm

daa05f8

piotrgramacki reviewed Jan 3, 2025

piotrgramacki and others added 6 commits January 3, 2025 13:46

fix: update lockfile platform

1c1b0b5

fix: pdm lock now working?

eea4e88

chore: simplify directory structure for datasets

ed5a0d0

fix: fixed loading train/test splits & corrected dataset attributes

00e6daa

fix: deleted unnecessary code

4306636

fix: fixed train test split

9593a42

piotrgramacki reviewed Jan 17, 2025

feat: added function to retrieve h3 indexes with target labels

f63fe1a
