Khiops v11: Scaling New Heights in Automation, Scalability, and Interpretability #546
lucaurelien announced in Announcements
We’re excited to unveil some key features of our next major release of Khiops (v11), true to our DNA of offering automation, scalability, and interpretability. This major release introduces powerful capabilities to tackle real-world data science challenges with even greater precision.
Here’s a teaser of three standout features; the full list of new features will be shared in the upcoming release notes.
1 - Instance-Level Interpretation
In addition to global, model-level interpretability, Khiops now introduces instance-level interpretation, a game-changing feature for domains requiring case-by-case analysis.
Users can now obtain precise explanations for individual predictions, an essential capability for tasks such as fraud detection, churn prediction, or root-cause analysis in manufacturing.
At the core of this feature is the exact computation of Shapley values, which quantify the contribution of each variable to an individual prediction.
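As a reminder of the underlying quantity (this is the standard game-theoretic definition, not a description of Khiops internals), the Shapley value of variable $i$ for an instance $x$ is:

```math
\phi_i(x) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \left( v_x(S \cup \{i\}) - v_x(S) \right)
```

where $N$ is the set of input variables and $v_x(S)$ denotes the model's prediction for $x$ when only the variables in $S$ are known. Khiops v11 computes these values exactly.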
Reference:
2 - Text Data
Khiops v11 takes a major leap forward with native support for text data, enabling the use of verbatims in tabular and multi-table datasets. By automating the transformation of raw text into meaningful features, Khiops eliminates the need for manual preprocessing or feature engineering, making the process faster, easier, and fully interpretable.
Khiops offers three methods for constructing these text features; details will be shared in the release notes.
This new feature is particularly valuable for scenarios where text data is embedded within tabular datasets and can complement other variables. While Khiops’ approach to text data is not designed to replace specialized models (e.g. LLMs), it provides a lightweight, automated, and interpretable solution for incorporating textual insights into tabular analyses, with minimal effort.
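To give a rough idea of how this could look from Python, here is a minimal sketch using the scikit-learn style estimator from khiops-python; the toy dataset and column names are made up, and the assumption that v11 consumes the raw text column natively, with no options to set, is illustrative rather than taken from the release:

```python
# Hypothetical sketch: training on a tabular dataset that embeds a raw text column.
# Assumes the khiops-python scikit-learn estimator and that v11 handles the
# "verbatim" column natively; exact behavior and options may differ in the release.
import pandas as pd
from khiops.sklearn import KhiopsClassifier

# Toy customer table mixing numeric features with a free-text verbatim
df = pd.DataFrame({
    "tenure_months": [3, 48, 12, 7],
    "monthly_spend": [19.9, 54.0, 32.5, 22.0],
    "verbatim": [
        "app keeps crashing, very frustrated",
        "great service, no complaints",
        "billing error last month, support was slow",
        "thinking about switching providers",
    ],
    "churn": [1, 0, 1, 1],
})

X = df.drop(columns=["churn"])
y = df["churn"]

clf = KhiopsClassifier()  # text features would be constructed automatically
clf.fit(X, y)
print(clf.predict_proba(X))
```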
3 - Optimal Histograms
Khiops v11 enhances data visualization with optimal histograms, providing a fully automatic and scalable solution for univariate data exploration. Both the number and placement of bins are determined using the Minimum Description Length (MDL) principle, ensuring the binning reflects the structure of the data without overfitting. This approach is robust to outliers and heavy-tailed distributions, enabling accurate density estimation and meaningful pattern discovery in large-scale datasets. By summarizing univariate distributions effectively, these histograms are indispensable for gaining deeper insights into your data.
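As a quick, self-contained illustration of the problem these histograms address (plain NumPy here, not the Khiops MDL algorithm), fixed equal-width binning collapses on heavy-tailed data:

```python
# Illustration only (not the Khiops MDL algorithm): why fixed equal-width binning
# struggles on heavy-tailed data, which is the situation optimal histograms address.
import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=2.0, size=100_000)  # heavy-tailed sample

# Equal-width binning: a few extreme values stretch the range, so almost all
# observations fall into the first bin and the density structure is lost.
counts, edges = np.histogram(data, bins=20)
print("share of data in the first of 20 equal-width bins:", counts[0] / counts.sum())

# A data-adaptive alternative (equal-frequency bins) keeps resolution where the
# mass actually is.
quantile_edges = np.quantile(data, np.linspace(0, 1, 21))
print("equal-frequency bin edges:", np.round(quantile_edges, 2))
```

Equal-frequency bins already adapt to the data; the MDL-based optimal histograms in Khiops go further by also selecting the number of bins and their boundaries from the data itself.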
Reference: