Expose Dataset API #158
Labels
Priority/1-Medium
To do after P0
Status/Draft
The issue is still not well defined
Type/Feature
A new feature request or an improvement of a feature
Description
Currently the `Dataset` class is an internal utility for the `sklearn` module. The idea is to make this class public so that it becomes a utility for creating multi-table datasets.
Questions/Ideas
This feature would ease many tasks:
- `additional_data_tables`
- `get_dataset_sample("Accidents", type="pandas")`
Main design element: A builder pattern.
- `Dataset()`
- `PandasDataset`
- `FileDataset`
- `add_table(self, name, source, key=None)`
  - `key`: mandatory for multi-table datasets
  - `source`: will be different in each `Dataset` subclass
- `train_test_split` (implemented in `PandasDataset` only)
- `sort`: sorts the dataset by its keys (implemented in `FileDataset` only)
- `create_khiops_dictionary_domain`
- `create_additional_data_table_param`
- `add_relation(self, parent_table_name, child_table_name, one_to_one=False)`
- `remove_table(self, name)`
- `remove_relation(self, parent_table_name, child_table_name)`
- `check(self)`
- `add_external_relation(self, parent_table_name, key, another_dataset)`
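To make the builder idea concrete, here is a minimal sketch of how the mutators listed above could fit together. It is only an illustration under assumptions: the internal storage (`tables`, `relations`), the chaining via `return self`, the normalization of `key` to a list, and the rules enforced by `check` are hypothetical and not taken from the current internal `Dataset` class. The table names, sources and key columns in the usage part are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical sketch of the proposed builder, not the khiops implementation."""

    tables: dict = field(default_factory=dict)   # name -> (source, key columns)
    relations: set = field(default_factory=set)  # (parent, child, one_to_one)

    def add_table(self, name, source, key=None):
        if name in self.tables:
            raise ValueError(f"table '{name}' already exists")
        # Assumption: a single key column is normalized to a list of columns
        if isinstance(key, str):
            key = [key]
        self.tables[name] = (source, key)
        return self  # returning self allows call chaining

    def add_relation(self, parent_table_name, child_table_name, one_to_one=False):
        self.relations.add((parent_table_name, child_table_name, one_to_one))
        return self

    def remove_relation(self, parent_table_name, child_table_name):
        pair = (parent_table_name, child_table_name)
        self.relations = {r for r in self.relations if r[:2] != pair}
        return self

    def remove_table(self, name):
        del self.tables[name]
        # Drop every relation that mentions the removed table
        self.relations = {r for r in self.relations if name not in r[:2]}
        return self

    def check(self):
        # Assumed rule 1: keys are mandatory as soon as there are several tables
        if len(self.tables) > 1:
            for name, (_, key) in self.tables.items():
                if not key:
                    raise ValueError(f"table '{name}' needs a key")
        # Assumed rule 2: relations may only refer to known tables
        for parent, child, _ in self.relations:
            for name in (parent, child):
                if name not in self.tables:
                    raise ValueError(f"relation refers to unknown table '{name}'")
        return self


# Usage with placeholder sources and key columns
ds = (
    Dataset()
    .add_table("Accidents", "Accidents.txt", key="AccidentId")
    .add_table("Vehicles", "Vehicles.txt", key=["AccidentId", "VehicleId"])
    .add_relation("Accidents", "Vehicles")
    .check()
)
```

In this sketch `source` is an opaque value, which leaves room for a `PandasDataset` to accept dataframes and a `FileDataset` to accept file paths.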
Design questions:
- Should `check` be called at each mutator call?
- Or should the user `check` the consistency before using it?
- `FileDataset`:
  - `train_predictor_ds(ds, target_variable_name, output_dir, <kwargs without additional_data_tables, header_line, field_separator>)`
  - `deploy_model_ds(model_kdic, ds, output_dir, <kwargs - additional_data_tables, header_line, field_separator>)`
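The trade-off behind the two `check` questions can be illustrated with a small sketch. Everything here is hypothetical: the `CheckedDataset` class, its `eager_check` flag and the single consistency rule are invented for the comparison and are not part of any existing API.

```python
class CheckedDataset:
    """Hypothetical sketch contrasting the two validation strategies."""

    def __init__(self, eager_check=False):
        self.eager_check = eager_check
        self.tables = {}
        self.relations = []

    def _mutated(self):
        # Strategy 1: validate after every mutator call
        if self.eager_check:
            self.check()

    def add_table(self, name, source, key=None):
        self.tables[name] = (source, key)
        self._mutated()

    def add_relation(self, parent_table_name, child_table_name, one_to_one=False):
        self.relations.append((parent_table_name, child_table_name, one_to_one))
        self._mutated()

    def check(self):
        # Strategy 2: the caller invokes this once, before using the dataset
        for parent, child, _ in self.relations:
            for name in (parent, child):
                if name not in self.tables:
                    raise ValueError(f"unknown table '{name}'")


# Deferred: a temporarily inconsistent state is allowed while building
lazy = CheckedDataset(eager_check=False)
lazy.add_relation("Accidents", "Vehicles")  # tables not added yet: accepted
lazy.add_table("Accidents", "accidents-source", key="AccidentId")
lazy.add_table("Vehicles", "vehicles-source", key="AccidentId")
lazy.check()  # passes once the dataset is complete

# Eager: the same call order fails at the first inconsistent step
eager = CheckedDataset(eager_check=True)
try:
    eager.add_relation("Accidents", "Vehicles")
    eager_error = None
except ValueError as error:
    eager_error = str(error)
```

Checking at each mutator reports errors at the call that caused them but forces an ordering (tables before relations). Checking only before use accepts any ordering but reports problems later, away from their cause.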