Commit
move essential README.md into docs/*.rst
rolandrmgservices committed Dec 7, 2023
1 parent 028c63c commit 4b81be0
Showing 3 changed files with 109 additions and 239 deletions.
221 changes: 13 additions & 208 deletions README.md
@@ -14,6 +14,7 @@
# Disclaimer
This project is stable and being incubated for long-term support. It may contain new experimental code, for which APIs are subject to change.


# Causal ML: A Python Package for Uplift Modeling and Causal Inference with ML

**Causal ML** is a Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent
@@ -25,230 +26,34 @@ research [[1]](#Literature). It provides a standard interface that allows user t

* **Personalized engagement**: A company has multiple options to interact with its customers such as different product choices in up-sell or messaging channels for communications. One can use CATE to estimate the heterogeneous treatment effect for each customer and treatment option combination for an optimal personalized recommendation system.

The package currently supports the following methods

* **Tree-based algorithms**
  * Uplift tree/random forests on KL divergence, Euclidean Distance, and Chi-Square [[2]](#Literature)
  * Uplift tree/random forests on Contextual Treatment Selection [[3]](#Literature)
  * Uplift tree/random forests on DDP [[4]](#Literature)
  * Uplift tree/random forests on IDDP [[5]](#Literature)
  * Interaction Tree [[6]](#Literature)
  * Conditional Interaction Tree [[7]](#Literature)
  * Causal Tree [[8]](#Literature) - Work-in-progress
* **Meta-learner algorithms**
  * S-learner [[9]](#Literature)
  * T-learner [[9]](#Literature)
  * X-learner [[9]](#Literature)
  * R-learner [[10]](#Literature)
  * Doubly Robust (DR) learner [[11]](#Literature)
  * TMLE learner [[12]](#Literature)
* **Instrumental variables algorithms**
  * 2-Stage Least Squares (2SLS)
  * Doubly Robust (DR) IV [[13]](#Literature)
* **Neural-network-based algorithms**
  * CEVAE [[14]](#Literature)
  * DragonNet [[15]](#Literature) - with `causalml[tf]` installation (see [Installation](#installation))


# Installation

Installation with `conda` is recommended.

`conda` environment files for Python 3.7, 3.8 and 3.9 are available in the repository. To use models under the `inference.tf` module (e.g. `DragonNet`), additional dependency of `tensorflow` is required. For detailed instructions, see below.

## Install using `conda`:

Install `conda` with:

```
wget https://repo.anaconda.com/miniconda/Miniconda3-py38_23.5.0-3-Linux-x86_64.sh
bash Miniconda3-py38_23.5.0-3-Linux-x86_64.sh -b
source miniconda3/bin/activate
conda init
source ~/.bashrc
```

### Install from `conda-forge`
Directly install from the conda-forge channel using conda.

```sh
conda install -c conda-forge causalml
```

### Install with the `conda` virtual environment
This will create a new `conda` virtual environment named `causalml-[tf-]py3x`, where `x` is in `[6, 7, 8, 9]`. e.g. `causalml-py37` or `causalml-tf-py38`. If you want to change the name of the environment, update the relevant YAML file in `envs/`

```bash
git clone https://github.com/uber/causalml.git
cd causalml/envs/
conda env create -f environment-py38.yml # for the virtual environment with Python 3.8 and CausalML
conda activate causalml-py38
(causalml-py38)
```

### Install `causalml` with `tensorflow`
```bash
git clone https://github.com/uber/causalml.git
cd causalml/envs/
conda env create -f environment-tf-py38.yml # for the virtual environment with Python 3.8 and CausalML
conda activate causalml-tf-py38
(causalml-tf-py38) pip install -U numpy # this step is necessary to fix [#338](https://github.com/uber/causalml/issues/338)
```

## Install from `PyPI`:

```bash
pip install causalml
```

### Install `causalml` with `tensorflow`
```bash
pip install causalml[tf]
pip install -U numpy # this step is necessary to fix [#338](https://github.com/uber/causalml/issues/338)
```

## Install from source:

### Create a clean conda environment

```
conda create -n causalml-py38 -y python=3.8
conda activate causalml-py38
conda install -c conda-forge cxx-compiler
conda install python-graphviz
conda install -c conda-forge xorg-libxrender
```

Then:

```bash
git clone https://github.com/uber/causalml.git
cd causalml
pip install .
python setup.py build_ext --inplace
```

with `tensorflow`:

```bash
pip install .[tf]
```
# Documentation

Documentation is available at:

# Quick Start
https://causalml.readthedocs.io/en/latest/about.html

## Average Treatment Effect Estimation with S, T, X, and R Learners

```python
from causalml.inference.meta import LRSRegressor
from causalml.inference.meta import XGBTRegressor, MLPTRegressor
from causalml.inference.meta import BaseXRegressor
from causalml.inference.meta import BaseRRegressor
from xgboost import XGBRegressor
from causalml.dataset import synthetic_data

y, X, treatment, _, _, e = synthetic_data(mode=1, n=1000, p=5, sigma=1.0)

lr = LRSRegressor()
te, lb, ub = lr.estimate_ate(X, treatment, y)
print('Average Treatment Effect (Linear Regression): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

xg = XGBTRegressor(random_state=42)
te, lb, ub = xg.estimate_ate(X, treatment, y)
print('Average Treatment Effect (XGBoost): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

nn = MLPTRegressor(hidden_layer_sizes=(10, 10),
                   learning_rate_init=.1,
                   early_stopping=True,
                   random_state=42)
te, lb, ub = nn.estimate_ate(X, treatment, y)
print('Average Treatment Effect (Neural Network (MLP)): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

xl = BaseXRegressor(learner=XGBRegressor(random_state=42))
te, lb, ub = xl.estimate_ate(X, treatment, y, e)
print('Average Treatment Effect (BaseXRegressor using XGBoost): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))

rl = BaseRRegressor(learner=XGBRegressor(random_state=42))
te, lb, ub = rl.estimate_ate(X=X, p=e, treatment=treatment, y=y)
print('Average Treatment Effect (BaseRRegressor using XGBoost): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))
```

See the [Meta-learner example notebook](https://github.com/uber/causalml/blob/master/docs/examples/meta_learners_with_synthetic_data.ipynb) for details.


## Interpretable Causal ML

Causal ML provides methods to interpret the treatment effect models trained as follows:

### Meta Learner Feature Importances

```python
from causalml.inference.meta import BaseSRegressor, BaseTRegressor, BaseXRegressor, BaseRRegressor
from causalml.dataset.regression import synthetic_data

# Load synthetic data
y, X, treatment, tau, b, e = synthetic_data(mode=1, n=10000, p=25, sigma=0.5)
w_multi = np.array(['treatment_A' if x==1 else 'control' for x in treatment]) # customize treatment/control names

slearner = BaseSRegressor(LGBMRegressor(), control_name='control')
slearner.estimate_ate(X, w_multi, y)
slearner_tau = slearner.fit_predict(X, w_multi, y)

model_tau_feature = RandomForestRegressor() # specify model for model_tau_feature

slearner.get_importance(X=X, tau=slearner_tau, model_tau_feature=model_tau_feature,
                        normalize=True, method='auto', features=feature_names)

# Using the feature_importances_ method in the base learner (LGBMRegressor() in this example)
slearner.plot_importance(X=X, tau=slearner_tau, normalize=True, method='auto')

# Using eli5's PermutationImportance
slearner.plot_importance(X=X, tau=slearner_tau, normalize=True, method='permutation')
# Installation

# Using SHAP
shap_slearner = slearner.get_shap_values(X=X, tau=slearner_tau)
Installation instructions are available at:

# Plot shap values without specifying shap_dict
slearner.plot_shap_values(X=X, tau=slearner_tau)
https://causalml.readthedocs.io/en/latest/installation.html

# Plot shap values WITH specifying shap_dict
slearner.plot_shap_values(X=X, shap_dict=shap_slearner)

# interaction_idx set to 'auto' (searches for feature with greatest approximate interaction)
slearner.plot_shap_dependence(treatment_group='treatment_A',
                              feature_idx=1,
                              X=X,
                              tau=slearner_tau,
                              interaction_idx='auto')
```
<div align="center">
<img width="629px" height="618px" src="https://raw.githubusercontent.com/uber/causalml/master/docs/_static/img/shap_vis.png">
</div>
# Quickstart

See the [feature interpretations example notebook](https://github.com/uber/causalml/blob/master/docs/examples/feature_interpretations_example.ipynb) for details.
Quickstarts with code-snippets are available at:

### Uplift Tree Visualization
https://causalml.readthedocs.io/en/latest/quickstart.html

```python
from IPython.display import Image
from causalml.inference.tree import UpliftTreeClassifier, UpliftRandomForestClassifier
from causalml.inference.tree import uplift_tree_string, uplift_tree_plot

uplift_model = UpliftTreeClassifier(max_depth=5, min_samples_leaf=200, min_samples_treatment=50,
                                    n_reg=100, evaluationFunction='KL', control_name='control')
# Example Notebooks

uplift_model.fit(df[features].values,
                 treatment=df['treatment_group_key'].values,
                 y=df['conversion'].values)
Example notebooks are available at:

graph = uplift_tree_plot(uplift_model.fitted_uplift_tree, features)
Image(graph.create_png())
```
<div align="center">
<img width="800px" height="479px" src="https://raw.githubusercontent.com/uber/causalml/master/docs/_static/img/uplift_tree_vis.png">
</div>
https://causalml.readthedocs.io/en/latest/examples.html

See the [Uplift Tree visualization example notebook](https://github.com/uber/causalml/blob/master/docs/examples/uplift_tree_visualization.ipynb) for details.

# Contributing

99 changes: 68 additions & 31 deletions docs/about.rst
@@ -1,36 +1,73 @@
About Causal ML
About CausalML
===========================

``Causal ML`` is a Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent research.
It provides a standard interface that allows user to estimate the **Conditional Average Treatment Effect** (CATE) or **Individual Treatment Effect** (ITE) from experimental or observational data.
Essentially, it estimates the causal impact of intervention **T** on outcome **Y** for users with observed features **X**, without strong assumptions on the model form.
``CausalML`` is a Python package that provides a suite of uplift modeling and causal inference methods using machine learning algorithms based on recent research.
It provides a standard interface that allows user to estimate the **Conditional Average Treatment Effect** (CATE), also known as **Individual Treatment Effect** (ITE), from experimental or observational data.
Essentially, it estimates the causal impact of intervention **W** on outcome **Y** for users with observed features **X**, without strong assumptions on the model form.

Typical use cases include:
GitHub Repo
-----------

https://github.com/uber/causalml

Mission
-------

From the CausalML `Charter <https://github.com/uber/causalml/blob/master/CHARTER.md>`_:

CausalML is committed to democratizing causal machine learning through accessible, innovative, and well-documented open-source tools that empower data scientists, researchers, and organizations. At our core, we embrace inclusivity and foster a vibrant community where members exchange ideas, share knowledge, and collaboratively shape a future where CausalML drives advancements across diverse domains.

Contributing
------------
`Contributing.md <https://github.com/uber/causalml/blob/master/CONTRIBUTING.md>`_

Governance
----------
* `Charter <https://github.com/uber/causalml/blob/master/CHARTER.md>`_
* `Contributors <https://github.com/uber/causalml/graphs/contributors>`_
* `Maintainers <https://github.com/uber/causalml/blob/master/MAINTAINERS.md>`_

Intro to Causal Machine Learning
================================

What is Causal Machine Learning?
--------------------------------

Causal machine learning is a branch of machine learning that focuses on understanding the cause and effect relationships in data. It goes beyond just predicting outcomes based on patterns in the data, and tries to understand how changing one variable can affect an outcome.
Suppose we are trying to predict a student’s test score based on how many hours they study and how much sleep they get. Traditional machine learning models would find patterns in the data, like students who study more or sleep more tend to get higher scores.
But what if you want to know what would happen if a student studied an extra hour each day? Or slept an extra hour each night? Modeling these potential outcomes or counterfactuals is where causal machine learning comes in. It tries to understand cause-and-effect relationships - how much changing one variable (like study hours or sleep hours) will affect the outcome (the test score).
This is useful in many fields, including economics, healthcare, and policy making, where understanding the impact of interventions is crucial.
While traditional machine learning is great for prediction, causal machine learning helps us understand the difference in outcomes due to interventions.



Difference from Traditional Machine Learning
--------------------------------------------

Traditional machine learning and causal machine learning are both powerful tools, but they serve different purposes and answer different types of questions.
Traditional Machine Learning is primarily concerned with prediction. Given a set of input features, it learns a function from the data that can predict an outcome. It’s great at finding patterns and correlations in large datasets, but it doesn’t tell us about the cause-and-effect relationships between variables. It answers questions like “Given a patient’s symptoms, what disease are they likely to have?”
On the other hand, Causal Machine Learning is concerned with understanding the cause-and-effect relationships between variables. It goes beyond prediction and tries to answer questions about intervention: “What will happen if we change this variable?” For example, in a medical context, it could help answer questions like “What will happen if a patient takes this medication?”
In essence, while traditional machine learning can tell us “what is”, causal machine learning can help us understand “what if”. This makes causal machine learning particularly useful in fields where we need to make decisions based on data, such as policy making, economics, and healthcare.


Measuring Causal Effects
------------------------

Different causal effects can be measured using varying techniques.

**Randomized Control Trials (RCT)** are the gold standard for causal effect measurements. Subjects are randomly assigned to treatment or control, and the Average Treatment Effect (ATE) is measured as the difference between the mean outcomes of the treatment and control groups. Random assignment removes the influence of confounders on treatment assignment.
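
As a minimal sketch of the difference-in-means estimate described above, the following simulates a hypothetical randomized experiment using only the Python standard library. The data-generating process, the function names (``simulate_rct``, ``difference_in_means``), and the effect size of 2.0 are assumptions made for illustration; none of this is part of the CausalML API.

```python
import random
import statistics


def simulate_rct(n=10_000, true_effect=2.0, seed=0):
    """Simulate a hypothetical RCT where a coin flip assigns treatment."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)             # covariate that also drives the outcome
        w = 1 if rng.random() < 0.5 else 0  # random assignment, independent of x
        y = 1.0 + 0.5 * x + true_effect * w + rng.gauss(0.0, 1.0)
        rows.append((w, y))
    return rows


def difference_in_means(rows):
    """ATE estimate: mean outcome of treated minus mean outcome of control."""
    treated = [y for w, y in rows if w == 1]
    control = [y for w, y in rows if w == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


rows = simulate_rct()
ate_hat = difference_in_means(rows)
print(f"Estimated ATE: {ate_hat:.2f} (simulated true effect: 2.00)")
```

Because assignment is independent of the covariate in this simulation, the plain difference in means needs no adjustment to recover the simulated effect.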

**Instrumental Variables (IV)** is a technique in which subjects are randomly exposed to a variable that influences treatment, but has no direct effect on the outcome.

An example of an instrumental variable is a streamlined sign-up page that allows an Uber user to sign up for Uber Eats. Not all subjects will sign up for Uber Eats, but the streamlined sign-up makes it easier for them to experience both Uber and Uber Eats. Thereafter, the subject’s outcome is unrelated to their signup experience. In other words, the streamlined signup influences the treatment, not the outcome.
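
The idea can be sketched with the Wald ratio, which is the form that two-stage least squares takes with a single binary instrument and no covariates. The simulation below is hypothetical: the variable names, probabilities, and the unobserved confounder ``u`` are illustrative assumptions, and the code uses only the standard library rather than any CausalML estimator.

```python
import random
import statistics


def simulate_iv(n=50_000, true_effect=2.0, seed=0):
    """Hypothetical data: z is randomized, u is an unobserved confounder."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)             # unobserved, affects uptake and outcome
        z = 1 if rng.random() < 0.5 else 0  # instrument, e.g. streamlined sign-up shown
        p_treat = 0.2 + 0.4 * z + 0.3 * (u > 0)
        w = 1 if rng.random() < p_treat else 0  # treatment actually taken up
        y = true_effect * w + 1.5 * u + rng.gauss(0.0, 1.0)
        rows.append((z, w, y))
    return rows


def mean_where(rows, col, key, value):
    """Mean of column `col` over rows whose column `key` equals `value`."""
    return statistics.fmean(r[col] for r in rows if r[key] == value)


rows = simulate_iv()

# Naive comparison of treated vs. untreated is biased by the confounder u.
naive = mean_where(rows, 2, 1, 1) - mean_where(rows, 2, 1, 0)

# Wald estimator: effect of z on the outcome divided by effect of z on uptake.
effect_on_outcome = mean_where(rows, 2, 0, 1) - mean_where(rows, 2, 0, 0)
effect_on_uptake = mean_where(rows, 1, 0, 1) - mean_where(rows, 1, 0, 0)
wald = effect_on_outcome / effect_on_uptake

print(f"Naive estimate: {naive:.2f}")
print(f"IV (Wald) estimate: {wald:.2f} (simulated true effect: 2.00)")
```

In this simulation the instrument changes the outcome only through treatment uptake, so the ratio removes the bias that the naive comparison inherits from ``u``.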

If an RCT is not an option, random assignment cannot be relied on to neutralize confounders. The next best option is to **control for confounders** explicitly and measure the Conditional Average Treatment Effect (CATE). The CATE is an estimate of the treatment effect conditioned on the available covariates, including observed confounders. Even when an RCT is available, it might be preferable to measure the CATE if the treatment effect varies across covariates. Such effects are called Heterogeneous Treatment Effects (HTEs).
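
One simple way to illustrate conditioning on a confounder is to stratify on it. The sketch below assumes a single observed binary covariate that drives both treatment assignment and the outcome, and estimates a separate difference in means within each segment. The segment effects in ``TRUE_CATE``, the propensities, and all function names are hypothetical; CausalML's meta-learners and tree-based methods handle many continuous covariates, which this sketch does not attempt.

```python
import random
import statistics

TRUE_CATE = {0: 1.0, 1: 3.0}  # hypothetical per-segment treatment effects


def simulate_observational(n=40_000, seed=1):
    """Hypothetical observational data where segment x confounds treatment."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        propensity = 0.7 if x == 1 else 0.2  # segment 1 is treated more often
        w = 1 if rng.random() < propensity else 0
        y = 2.0 * x + TRUE_CATE[x] * w + rng.gauss(0.0, 1.0)
        rows.append((x, w, y))
    return rows


def mean_diff(rows):
    """Difference in mean outcome between treated and control rows."""
    treated = [y for _, w, y in rows if w == 1]
    control = [y for _, w, y in rows if w == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def cate_by_segment(rows):
    """Estimate the CATE separately within each value of the covariate."""
    segments = sorted({x for x, _, _ in rows})
    return {s: mean_diff([r for r in rows if r[0] == s]) for s in segments}


rows = simulate_observational()
naive = mean_diff(rows)       # ignores the confounder
cate = cate_by_segment(rows)  # conditions on the confounder

# Averaging the segment-level CATEs by segment share gives an adjusted ATE.
share = {s: sum(1 for x, _, _ in rows if x == s) / len(rows) for s in cate}
ate_adjusted = sum(cate[s] * share[s] for s in cate)

print(f"Naive difference in means: {naive:.2f}")
print(f"CATE by segment: { {s: round(v, 2) for s, v in cate.items()} }")
print(f"Adjusted ATE: {ate_adjusted:.2f} (simulated true ATE: 2.00)")
```

The per-segment estimates also show the heterogeneity directly: the simulated effect in segment 1 is three times the effect in segment 0, which a single ATE would hide.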


Example Use Cases
-----------------

- **Campaign Targeting Optimization**: An important lever to increase ROI in an advertising campaign is to target the ad to the set of customers who will have a favorable response in a given KPI such as engagement or sales. CATE identifies these customers by estimating the effect of ad exposure on the KPI at the individual level from A/B experiments or historical observational data.
- **Personalized Engagement**: Company has multiple options to interact with its customers such as different product choices in up-sell or messaging channels for communications. One can use CATE to estimate the heterogeneous treatment effect for each customer and treatment option combination for an optimal personalized recommendation system.

The package currently supports the following methods:

- Tree-based algorithms
    - :ref:`Uplift Random Forests <Uplift Tree>` on KL divergence, Euclidean Distance, and Chi-Square
    - :ref:`Uplift Random Forests <Uplift Tree>` on Contextual Treatment Selection
    - :ref:`Uplift Random Forests <DDP>` on delta-delta-p (:math:`\Delta\Delta P`) criterion (only for binary trees and two-class problems)
    - :ref:`Uplift Random Forests <IDDP>` on IDDP (only for binary trees and two-class problems)
    - :ref:`Interaction Tree <IT>` (only for binary trees and two-class problems)
    - :ref:`Causal Inference Tree <CIT>` (only for binary trees and two-class problems)
- Meta-learner algorithms
    - :ref:`S-learner`
    - :ref:`T-learner`
    - :ref:`X-learner`
    - :ref:`R-learner`
    - :ref:`Doubly Robust (DR) learner`
- Instrumental variables algorithms
    - :ref:`2-Stage Least Squares (2SLS)`
    - :ref:`Doubly Robust Instrumental Variable (DRIV) learner`
- Neural network based algorithms
    - CEVAE
    - DragonNet
- Treatment optimization algorithms
    - :ref:`Counterfactual Unit Selection`
    - :ref:`Counterfactual Value Estimator`

- **Personalized Engagement**: A company might have multiple options to interact with its customers such as different product choices in up-sell or different messaging channels for communications. One can use CATE to estimate the heterogeneous treatment effect for each customer and treatment option combination for an optimal personalized engagement experience.

28 changes: 28 additions & 0 deletions docs/methodology.rst
@@ -2,6 +2,34 @@
Methodology
===========

Supported Algorithms
--------------------
CausalML currently supports the following methods:

- Tree-based algorithms
    - :ref:`Uplift Random Forests <Uplift Tree>` on KL divergence, Euclidean Distance, and Chi-Square
    - :ref:`Uplift Random Forests <Uplift Tree>` on Contextual Treatment Selection
    - :ref:`Uplift Random Forests <DDP>` on delta-delta-p (:math:`\Delta\Delta P`) criterion (only for binary trees and two-class problems)
    - :ref:`Uplift Random Forests <IDDP>` on IDDP (only for binary trees and two-class problems)
    - :ref:`Interaction Tree <IT>` (only for binary trees and two-class problems)
    - :ref:`Causal Inference Tree <CIT>` (only for binary trees and two-class problems)
- Meta-learner algorithms
    - :ref:`S-learner`
    - :ref:`T-learner`
    - :ref:`X-learner`
    - :ref:`R-learner`
    - :ref:`Doubly Robust (DR) learner`
- Instrumental variables algorithms
    - :ref:`2-Stage Least Squares (2SLS)`
    - :ref:`Doubly Robust Instrumental Variable (DRIV) learner`
- Neural network based algorithms
    - CEVAE
    - DragonNet
- Treatment optimization algorithms
    - :ref:`Counterfactual Unit Selection`
    - :ref:`Counterfactual Value Estimator`


Meta-Learner Algorithms
-----------------------

