Commit

Merge pull request #180 from KhiopsML/171-implement-multi-table-helper-functions

171 implement multi table helper functions
folmos-at-orange authored Sep 16, 2024
2 parents 1518484 + 558866c commit 4ae7f30
Showing 32 changed files with 3,982 additions and 3,518 deletions.
3 changes: 3 additions & 0 deletions .pre-commit-config.yaml
@@ -33,13 +33,16 @@ repos:
     rev: 0.29.0
     hooks:
       - id: check-github-workflows
+        name: gh-workflows
         args: [--verbose]
       - id: check-github-actions
+        name: gh-actions
         args: [--verbose]
   - repo: https://github.com/jumanjihouse/pre-commit-hooks
     rev: 3.0.0
     hooks:
       - id: shellcheck
+        name: shellcheck
   - repo: local
     hooks:
       - id: samples-generation
10 changes: 9 additions & 1 deletion CHANGELOG.md
@@ -6,9 +6,17 @@
 - Example: 10.2.1.4 is the 5th version that supports khiops 10.2.1.
 - Internals: Changes in *Internals* sections are unlikely to be of interest for data scientists.
 
+## 10.2.2.5 - Unreleased
+
+### Added
+
+- (General) `train_test_split_dataset` helper function to ease the splitting in train/test for
+  multi-table datasets.
+- (General) `sort_dataset` helper function to ease the sorting by key of multi-table datasets.
+
 ## 10.2.2.4 - 2024-08-05
 
-## Added
+### Added
 - (`sklearn`) Sklearn's attributes for supervised estimators.
 
 ## 10.2.2.3 - 2024-08-02
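
The `train_test_split_dataset` changelog entry above concerns splitting multi-table datasets. As a rough illustration of why a dedicated helper is useful, the sketch below decides the split on the main table and then routes secondary rows by their parent key, so that child records follow their parent. This is not the khiops implementation: `split_by_main_key`, its parameters, and the toy rows are hypothetical names chosen for the example.

```python
import random


def split_by_main_key(main_rows, secondary_rows, key, test_size=0.25, seed=0):
    """Split main rows at random, then route secondary rows by their parent key.

    Hypothetical helper for illustration only; it is not part of khiops.
    """
    # Decide the split on the distinct key values of the main table
    keys = sorted({row[key] for row in main_rows})
    rng = random.Random(seed)
    rng.shuffle(keys)
    n_test = max(1, round(len(keys) * test_size))
    test_keys = set(keys[:n_test])

    def part(rows, in_test):
        # Keep the rows whose key falls on the requested side of the split
        return [row for row in rows if (row[key] in test_keys) == in_test]

    train = {"main": part(main_rows, False), "secondary": part(secondary_rows, False)}
    test = {"main": part(main_rows, True), "secondary": part(secondary_rows, True)}
    return train, test


# Toy data: 8 accidents and 12 vehicles attached to them
accidents = [{"AccidentId": i} for i in range(8)]
vehicles = [{"AccidentId": i % 8, "VehicleId": v} for i, v in enumerate("abcdefghijkl")]
train, test = split_by_main_key(accidents, vehicles, "AccidentId")
```

Splitting each table independently would scatter the vehicles of one accident across both sets, which is the situation a key-aware split avoids.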
2 changes: 1 addition & 1 deletion doc/convert_samples.py
@@ -67,7 +67,7 @@ def create_rest_page_header(script_name):
         subtitle += ":py:mod:`khiops.core` module."
     else:
         title = "Samples sklearn"
-        subtitle += ":py:mod:`khiops.sklearn` module."
+        subtitle += ":py:mod:`khiops.sklearn <khiops.sklearn.estimators>` module."
     return (
         ":orphan:\n"
         "\n"
12 changes: 6 additions & 6 deletions doc/core/index.rst
@@ -20,9 +20,9 @@ Main Modules
   :recursive:
   :nosignatures:
 
-  khiops.core.api
-  khiops.core.dictionary
-  khiops.core.analysis_results
-  khiops.core.coclustering_results
-  khiops.core.exceptions
-  khiops.core.helpers
+  api
+  dictionary
+  analysis_results
+  coclustering_results
+  exceptions
+  helpers
11 changes: 4 additions & 7 deletions doc/create-doc
@@ -90,21 +90,18 @@ fi

 # Create the coursework materials
 echo "Creating ZIP files"
-(cd "$KHIOPS_TUTORIAL_REPO_DIR" && cp -r data helper_functions.py "../$tutorials_dir")
 cd "$tutorials_dir"
 mkdir -p exercises
 touch exercises/.dummy # Create a dummy so the "exercises" directory is created on unzip
-zip "core_tutorials_solutions.zip" Core*.ipynb helper_functions.py data/*/* exercises/.dummy
-zip "sklearn_tutorials_solutions.zip" Sklearn*.ipynb helper_functions.py data/*/* exercises/.dummy
+zip "core_tutorials_solutions.zip" Core*.ipynb data/*/* exercises/.dummy
+zip "sklearn_tutorials_solutions.zip" Sklearn*.ipynb data/*/* exercises/.dummy
 cd "$KHIOPS_TUTORIAL_REPO_DIR"
 python create-coursework.py
 cd coursework
 mkdir -p exercises
 touch exercises/.dummy # Create a dummy so the "exercises" directory is created on unzip
-zip "../../$tutorials_dir/core_tutorials.zip" \
-    Core*.ipynb helper_functions.py data/*/* exercises/.dummy
-zip "../../$tutorials_dir/sklearn_tutorials.zip" \
-    Sklearn*.ipynb helper_functions.py data/*/* exercises/.dummy
+zip "../../$tutorials_dir/core_tutorials.zip" Core*.ipynb data/*/* exercises/.dummy
+zip "../../$tutorials_dir/sklearn_tutorials.zip" Sklearn*.ipynb data/*/* exercises/.dummy
 cd "../.."
 
 # Create the documentation with Sphinx
24 changes: 15 additions & 9 deletions doc/internal/index.rst
@@ -3,17 +3,23 @@ Internals
 These are internal modules with no "data science" functionality. Their documentation is available
 for completeness.
 
+.. currentmodule:: khiops.utils
 .. autosummary::
   :nosignatures:
   :toctree: generated
 
-  khiops.sklearn.tables
-  khiops.core.internals.common
-  khiops.core.internals.filesystems
-  khiops.core.internals.io
-  khiops.core.internals.runner
-  khiops.core.internals.scenario
-  khiops.core.internals.task
-  khiops.core.internals.types
-  khiops.core.internals.version
+  dataset
 
+.. currentmodule:: khiops.core.internals
+.. autosummary::
+  :nosignatures:
+  :toctree: generated
+
+  common
+  filesystems
+  io
+  runner
+  scenario
+  task
+  types
+  version
28 changes: 28 additions & 0 deletions doc/samples/samples.rst
@@ -1185,6 +1185,34 @@ Samples
         output_data_table_path,
         sort_variables=["AccidentId", "VehicleId"],
     )
+.. autofunction:: sort_data_tables_mt
+.. code-block:: python
+
+    # Imports
+    import os
+    from khiops.utils.helpers import sort_dataset
+
+    # Set the file paths
+    accidents_dir = os.path.join(kh.get_samples_dir(), "Accidents")
+    accidents_table_path = os.path.join(accidents_dir, "Accidents.txt")
+    vehicles_table_path = os.path.join(accidents_dir, "Vehicles.txt")
+    users_table_path = os.path.join(accidents_dir, "Users.txt")
+    places_table_path = os.path.join(accidents_dir, "Places.txt")
+    results_dir = os.path.join("kh_samples", "sort_data_tables_mt")
+
+    # Build the dataset spec
+    ds_spec = {
+        "main_table": "Accidents",
+        "tables": {
+            "Accidents": (accidents_table_path, "AccidentId"),
+            "Vehicles": (vehicles_table_path, ["AccidentId", "VehicleId"]),
+            "Users": (users_table_path, ["AccidentId", "VehicleId"]),
+            "Places": (places_table_path, "AccidentId"),
+        },
+    }
+
+    # Sort the dataset
+    sort_dataset(ds_spec, output_dir=results_dir)
 .. autofunction:: extract_keys_from_data_table
 .. code-block:: python
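
The `sort_data_tables_mt` sample above calls `kh.get_samples_dir()` without importing `kh` in the snippet itself; presumably the alias comes from `from khiops import core as kh`, which is an assumption here. To make the dataset spec easier to read, the following sketch shows what "sorting by key" means for such a spec: each table is ordered by its own key, which may be a single column name or a list of them. It works on in-memory rows instead of files, and `sort_tables_by_key` is a hypothetical name, not the khiops `sort_dataset` function.

```python
def sort_tables_by_key(tables):
    """Return each table's rows sorted by its key column(s).

    `tables` maps a table name to (rows, key), mirroring the (path, key)
    tuples of the dataset spec, but with in-memory rows instead of files.
    Hypothetical helper for illustration only; it is not part of khiops.
    """
    sorted_tables = {}
    for name, (rows, key) in tables.items():
        # A key is either a single column name or a list of column names
        key_columns = [key] if isinstance(key, str) else list(key)
        sorted_tables[name] = sorted(
            rows, key=lambda row: tuple(row[col] for col in key_columns)
        )
    return sorted_tables


# Toy data shaped like the "Accidents" and "Vehicles" tables of the sample
toy_tables = {
    "Accidents": ([{"AccidentId": "A2"}, {"AccidentId": "A1"}], "AccidentId"),
    "Vehicles": (
        [
            {"AccidentId": "A2", "VehicleId": "V1"},
            {"AccidentId": "A1", "VehicleId": "V2"},
            {"AccidentId": "A1", "VehicleId": "V1"},
        ],
        ["AccidentId", "VehicleId"],
    ),
}
result = sort_tables_by_key(toy_tables)
```

With a composite key the comparison is lexicographic, so vehicles are grouped by accident first and ordered by vehicle within each accident.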