171 implement multi table helper functions #180

Merged: 12 commits, Sep 16, 2024
3 changes: 3 additions & 0 deletions .pre-commit-config.yaml
@@ -33,13 +33,16 @@ repos:
rev: 0.29.0
hooks:
- id: check-github-workflows
name: gh-workflows
args: [--verbose]
- id: check-github-actions
name: gh-actions
args: [--verbose]
- repo: https://github.com/jumanjihouse/pre-commit-hooks
rev: 3.0.0
hooks:
- id: shellcheck
name: shellcheck
- repo: local
hooks:
- id: samples-generation
10 changes: 9 additions & 1 deletion CHANGELOG.md
@@ -6,9 +6,17 @@
- Example: 10.2.1.4 is the 5th version that supports khiops 10.2.1.
- Internals: Changes in *Internals* sections are unlikely to be of interest for data scientists.

## 10.2.2.5 - Unreleased

### Added

- (General) `train_test_split_dataset` helper function to ease the splitting in train/test for
multi-table datasets.
- (General) `sort_dataset` helper function to ease the sorting by key of multi-table datasets.
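
A rough illustration of why a multi-table train/test split needs a dedicated helper: the split is decided on the main table, and each secondary row has to follow its parent record. The sketch below uses only the standard library and plain lists of dicts. The function name `split_multi_table` and its parameters are hypothetical, and this is not the implementation of `train_test_split_dataset`.

```python
import random


def split_multi_table(main_rows, secondary_rows, key, test_size=0.25, seed=0):
    """Split main-table rows, then route secondary rows by their parent's key.

    Hypothetical helper for illustration only; not part of khiops.
    """
    rng = random.Random(seed)
    shuffled = list(main_rows)
    rng.shuffle(shuffled)

    # Decide the split on the main table only
    n_test = round(len(shuffled) * test_size)
    test_main = shuffled[:n_test]
    train_main = shuffled[n_test:]

    # Secondary rows go to the side that holds their parent key
    test_keys = {row[key] for row in test_main}
    train_secondary = [r for r in secondary_rows if r[key] not in test_keys]
    test_secondary = [r for r in secondary_rows if r[key] in test_keys]
    return (train_main, train_secondary), (test_main, test_secondary)


# Toy data: 8 accidents, 2 vehicles per accident
accidents = [{"AccidentId": i} for i in range(8)]
vehicles = [
    {"AccidentId": i, "VehicleId": v} for i in range(8) for v in ("A", "B")
]
(train_main, train_sec), (test_main, test_sec) = split_multi_table(
    accidents, vehicles, "AccidentId"
)
```

Splitting each table independently would instead scatter the vehicles of one accident across both sides, which is the situation a multi-table helper is meant to avoid.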

## 10.2.2.4 - 2024-08-05

-## Added
+### Added
- (`sklearn`) Sklearn's attributes for supervised estimators.

## 10.2.2.3 - 2024-08-02
2 changes: 1 addition & 1 deletion doc/convert_samples.py
@@ -67,7 +67,7 @@ def create_rest_page_header(script_name):
subtitle += ":py:mod:`khiops.core` module."
else:
title = "Samples sklearn"
-subtitle += ":py:mod:`khiops.sklearn` module."
+subtitle += ":py:mod:`khiops.sklearn <khiops.sklearn.estimators>` module."
return (
":orphan:\n"
"\n"
12 changes: 6 additions & 6 deletions doc/core/index.rst
@@ -20,9 +20,9 @@ Main Modules
:recursive:
:nosignatures:

-khiops.core.api
-khiops.core.dictionary
-khiops.core.analysis_results
-khiops.core.coclustering_results
-khiops.core.exceptions
-khiops.core.helpers
+api
+dictionary
+analysis_results
+coclustering_results
+exceptions
+helpers
11 changes: 4 additions & 7 deletions doc/create-doc
@@ -90,21 +90,18 @@ fi

# Create the coursework materials
echo "Creating ZIP files"
-(cd "$KHIOPS_TUTORIAL_REPO_DIR" && cp -r data helper_functions.py "../$tutorials_dir")
cd "$tutorials_dir"
mkdir -p exercises
touch exercises/.dummy # Create a dummy so the "exercises" directory is created on unzip
-zip "core_tutorials_solutions.zip" Core*.ipynb helper_functions.py data/*/* exercises/.dummy
-zip "sklearn_tutorials_solutions.zip" Sklearn*.ipynb helper_functions.py data/*/* exercises/.dummy
+zip "core_tutorials_solutions.zip" Core*.ipynb data/*/* exercises/.dummy
+zip "sklearn_tutorials_solutions.zip" Sklearn*.ipynb data/*/* exercises/.dummy
cd "$KHIOPS_TUTORIAL_REPO_DIR"
python create-coursework.py
cd coursework
mkdir -p exercises
touch exercises/.dummy # Create a dummy so the "exercises" directory is created on unzip
-zip "../../$tutorials_dir/core_tutorials.zip" \
-Core*.ipynb helper_functions.py data/*/* exercises/.dummy
-zip "../../$tutorials_dir/sklearn_tutorials.zip" \
-Sklearn*.ipynb helper_functions.py data/*/* exercises/.dummy
+zip "../../$tutorials_dir/core_tutorials.zip" Core*.ipynb data/*/* exercises/.dummy
+zip "../../$tutorials_dir/sklearn_tutorials.zip" Sklearn*.ipynb data/*/* exercises/.dummy
cd "../.."

# Create the documentation with Sphinx
24 changes: 15 additions & 9 deletions doc/internal/index.rst
@@ -3,17 +3,23 @@ Internals
These are internal modules with no "data science" functionality. Their documentation is available
for completeness.

+.. currentmodule:: khiops.utils
.. autosummary::
:nosignatures:
:toctree: generated

-khiops.sklearn.tables
-khiops.core.internals.common
-khiops.core.internals.filesystems
-khiops.core.internals.io
-khiops.core.internals.runner
-khiops.core.internals.scenario
-khiops.core.internals.task
-khiops.core.internals.types
-khiops.core.internals.version
+dataset

+.. currentmodule:: khiops.core.internals
+.. autosummary::
+:nosignatures:
+:toctree: generated

+common
+filesystems
+io
+runner
+scenario
+task
+types
+version
28 changes: 28 additions & 0 deletions doc/samples/samples.rst
@@ -1185,6 +1185,34 @@ Samples
output_data_table_path,
sort_variables=["AccidentId", "VehicleId"],
)
.. autofunction:: sort_data_tables_mt
.. code-block:: python

    # Imports
    import os
    from khiops import core as kh  # provides kh.get_samples_dir() used below
    from khiops.utils.helpers import sort_dataset

    # Set the file paths
    accidents_dir = os.path.join(kh.get_samples_dir(), "Accidents")
    accidents_table_path = os.path.join(accidents_dir, "Accidents.txt")
    vehicles_table_path = os.path.join(accidents_dir, "Vehicles.txt")
    users_table_path = os.path.join(accidents_dir, "Users.txt")
    places_table_path = os.path.join(accidents_dir, "Places.txt")
    results_dir = os.path.join("kh_samples", "sort_data_tables_mt")

    # Build the dataset spec
    ds_spec = {
        "main_table": "Accidents",
        "tables": {
            "Accidents": (accidents_table_path, "AccidentId"),
            "Vehicles": (vehicles_table_path, ["AccidentId", "VehicleId"]),
            "Users": (users_table_path, ["AccidentId", "VehicleId"]),
            "Places": (places_table_path, "AccidentId"),
        },
    }

    # Sort the dataset
    sort_dataset(ds_spec, output_dir=results_dir)
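
To make the idea behind the sample above concrete, here is a minimal in-memory sketch of sorting every table of a dataset by its own key columns. It mirrors the `(table, key)` layout of the `tables` entry but holds rows as lists of dicts instead of file paths. The function `sort_tables_by_key` is hypothetical and is not how `sort_dataset` is implemented. It relies on Python's default ordering, which may differ from the ordering Khiops applies to key values.

```python
from operator import itemgetter


def sort_tables_by_key(tables):
    """Return each table's rows sorted by that table's own key columns.

    Hypothetical helper for illustration only; not part of khiops.
    """
    sorted_tables = {}
    for name, (rows, key) in tables.items():
        # A key may be a single column name or a list of column names
        key_columns = [key] if isinstance(key, str) else list(key)
        sorted_tables[name] = sorted(rows, key=itemgetter(*key_columns))
    return sorted_tables


# Toy data with the same key layout as the Accidents sample
tables = {
    "Accidents": ([{"AccidentId": 2}, {"AccidentId": 1}], "AccidentId"),
    "Vehicles": (
        [
            {"AccidentId": 2, "VehicleId": "A01"},
            {"AccidentId": 1, "VehicleId": "B01"},
            {"AccidentId": 1, "VehicleId": "A01"},
        ],
        ["AccidentId", "VehicleId"],
    ),
}
result = sort_tables_by_key(tables)
```

Each table is sorted independently, so a secondary table keyed by several columns is ordered by the main-table key first and then by its own identifier.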
.. autofunction:: extract_keys_from_data_table
.. code-block:: python