Khiops v11: Scaling New Heights in Automation, Scalability, and Interpretability #546
lucaurelien announced in Announcements
We’re excited to unveil some key features of our next major release of Khiops (v11), true to our DNA of offering automation, scalability, and interpretability. This major release introduces powerful capabilities to tackle real-world data science challenges with even greater precision.
Here’s a teaser of three standout features; the full list of new features will be shared in the upcoming release notes.
1 - Instance-Level Interpretation
In addition to global, model-level interpretability, Khiops now introduces instance-level interpretation, a game-changing feature for domains requiring case-by-case analysis.
Users can now obtain precise explanations for individual predictions, an essential capability for tasks such as fraud detection, churn prediction, or root-cause analysis in manufacturing.
At the core of this feature is the exact computation of Shapley values, which quantify the contribution of each variable to an individual prediction.
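As a reminder of the underlying quantity (this is the standard game-theoretic definition, not a description of Khiops internals), the Shapley value of variable $i$ for an instance $x$ is:

```math
\phi_i(x) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \left( v_x(S \cup \{i\}) - v_x(S) \right)
```

where $N$ is the set of input variables and $v_x(S)$ denotes the model's prediction for $x$ when only the variables in $S$ are known. Khiops v11 computes these values exactly.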
Reference:
2 - Text Data
Khiops v11 takes a major leap forward with native support for text data, enabling the use of verbatims in tabular and multi-table datasets. By automating the transformation of raw text into meaningful features, Khiops eliminates the need for manual preprocessing or feature engineering, making the process faster, easier, and fully interpretable.
Khiops offers three methods for constructing these text features; details will be shared in the release notes.
This new feature is particularly valuable for scenarios where text data is embedded within tabular datasets and can complement other variables. While Khiops’ approach to text data is not designed to replace specialized models (e.g. LLMs), it provides a lightweight, automated, and interpretable solution for incorporating textual insights into tabular analyses, with minimal effort.
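To give a rough idea of how this could look from Python, here is a minimal sketch using the scikit-learn style estimator from khiops-python; the toy dataset and column names are made up, and the assumption that v11 consumes the raw text column natively, with no options to set, is illustrative rather than taken from the release:

```python
# Hypothetical sketch: training on a tabular dataset that embeds a raw text column.
# Assumes the khiops-python scikit-learn estimator and that v11 handles the
# "verbatim" column natively; exact behavior and options may differ in the release.
import pandas as pd
from khiops.sklearn import KhiopsClassifier

# Toy customer table mixing numeric features with a free-text verbatim
df = pd.DataFrame({
    "tenure_months": [3, 48, 12, 7],
    "monthly_spend": [19.9, 54.0, 32.5, 22.0],
    "verbatim": [
        "app keeps crashing, very frustrated",
        "great service, no complaints",
        "billing error last month, support was slow",
        "thinking about switching providers",
    ],
    "churn": [1, 0, 1, 1],
})

X = df.drop(columns=["churn"])
y = df["churn"]

clf = KhiopsClassifier()  # text features would be constructed automatically
clf.fit(X, y)
print(clf.predict_proba(X))
```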
3 - Optimal Histograms
Khiops v11 enhances data visualization with optimal histograms, providing a fully automatic and scalable solution for univariate data exploration. Both the number and placement of bins are determined using the Minimum Description Length (MDL) principle, ensuring the binning reflects the structure of the data without overfitting. This approach is robust to outliers and heavy-tailed distributions, enabling accurate density estimation and meaningful pattern discovery in large-scale datasets. By summarizing univariate distributions effectively, these histograms are indispensable for gaining deeper insights into your data.
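As a quick, self-contained illustration of the problem these histograms address (plain NumPy here, not the Khiops MDL algorithm), fixed equal-width binning collapses on heavy-tailed data:

```python
# Illustration only (not the Khiops MDL algorithm): why fixed equal-width binning
# struggles on heavy-tailed data, which is the situation optimal histograms address.
import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=2.0, size=100_000)  # heavy-tailed sample

# Equal-width binning: a few extreme values stretch the range, so almost all
# observations fall into the first bin and the density structure is lost.
counts, edges = np.histogram(data, bins=20)
print("share of data in the first of 20 equal-width bins:", counts[0] / counts.sum())

# A data-adaptive alternative (equal-frequency bins) keeps resolution where the
# mass actually is.
quantile_edges = np.quantile(data, np.linspace(0, 1, 21))
print("equal-frequency bin edges:", np.round(quantile_edges, 2))
```

Equal-frequency bins already adapt to the data; the MDL-based optimal histograms in Khiops go further by also selecting the number of bins and their boundaries from the data itself.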
Reference: