Adding a new data generation function based on logistic regression #166
Conversation
```python
    x = np.array(x)
    return np.maximum(x, 0)

# @staticmethod
```
We don't need the `@staticmethod` decorator for functions that are not member functions of a class.
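For illustration, a minimal sketch of the helper above as a plain module-level function (the name `relu` is an assumption based on the snippet's body):

```python
import numpy as np

# No @staticmethod needed: this is a plain module-level function, not a class member.
def relu(x):
    x = np.array(x)
    return np.maximum(x, 0)
```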
```python
if f_index < len(fs):
    fi = [f_index]
else:
    fi = np.random.choice(range(len(fs)), 1)
y = fs[fi[0]](x)
```
We can simplify this block by using `np.asscalar()` and by passing `N` instead of `range(N)` to `np.random.choice()`, as follows:

```python
try:
    y = fs[f_index](x)
except IndexError:
    y = fs[np.asscalar(np.random.choice(len(fs), 1))](x)
```
I also use `try ... except IndexError` to handle the case `f_index < 0`, which would fall into the `if` block even though it's not a valid index.
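As a side note beyond the original review: `np.asscalar()` was deprecated in NumPy 1.16, so on current NumPy the same simplification could be sketched as:

```python
try:
    y = fs[f_index](x)
except IndexError:
    # np.random.choice(n) without a size argument already returns a scalar,
    # so no np.asscalar() (or .item()) call is needed.
    y = fs[np.random.choice(len(fs))](x)
```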
```python
    return res


# ------ Data generation function (V2) using logistic regression as underlying model
def make_uplift_classification_logistic(
```
What about adding it to `make_uplift_classification()` with an additional input argument, e.g. `use_logistic=True`?
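A hypothetical sketch of that interface (the dispatch and the helper name `_make_uplift_classification_original` are assumptions for illustration, not existing API):

```python
def make_uplift_classification(n_samples=1000, use_logistic=False, **kwargs):
    if use_logistic:
        # delegate to the logistic-regression-based generator added in this PR
        return make_uplift_classification_logistic(n_samples=n_samples, **kwargs)
    # otherwise fall back to the original generator
    return _make_uplift_classification_original(n_samples=n_samples, **kwargs)
```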
```python
df1['conversion_prob'] = [max(0, min(1, xi))
                          for xi in df1['conversion_prob'].values]
```
```python
df1['conversion_prob'] = np.clip(df1['conversion_prob'].values, 0, 1)
```
```python
Y1 = np.random.binomial(1, df1['conversion_prob'].values)

df1[y_name] = Y1
```
`Y1` is not necessary.
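That is, the intermediate variable can be dropped by assigning directly:

```python
df1[y_name] = np.random.binomial(1, df1['conversion_prob'].values)
```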
```python
# rcoef = [0]
# while np.abs(rcoef) < 0.1:
#     rcoef = np.random.uniform(-1, 1, 1)
```
Remove the commented code block
```python
rcoef = [0.5]
coef_uplift.append(rcoef[0])
```
`rcoef` doesn't need to be an array.
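A minimal sketch of the scalar version of the two lines above:

```python
rcoef = 0.5
coef_uplift.append(rcoef)
```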
```python
    xb : list
        An array, with each element as the sum of product of coefficient and feature value
    """
    sm_arr = 1/(1+np.exp(-(z+np.array(xb))))
```
```python
from scipy.special import expit

sm_arr = expit(z + np.array(xb))
```
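One practical benefit of `expit`, shown here as a quick check: it is numerically stable at extreme arguments, where the hand-rolled `1/(1+np.exp(-x))` would raise an overflow warning.

```python
import numpy as np
from scipy.special import expit

# No overflow warning even at extreme inputs; the hand-rolled sigmoid
# would warn on np.exp(1000.0).
print(expit(np.array([-1000.0, 0.0, 1000.0])))  # [0.  0.5 1. ]
```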
```python
coef_classify.append(rcoef[0])
x_classify = df1[x_informative_transformed].values
p1 = positive_class_proportion
a10 = np.log(p1/(1.-p1))
```
```python
from scipy.special import logit

a10 = logit(p1)
```
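`logit` computes exactly log(p / (1 - p)), so it is a drop-in replacement for the original expression; a quick sanity check (the value 0.3 is just an example proportion):

```python
import numpy as np
from scipy.special import logit

p1 = 0.3  # example positive class proportion
assert np.isclose(logit(p1), np.log(p1 / (1.0 - p1)))
```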
[The diff here shows several consecutive blank lines.]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove extra blank lines
@zhenyuz0500 @jeongyoonlee If this PR is still active, I'd be happy to address the review requests and get it merged in. I was able to check out the branch, build from source, and reproduce the notebook results with no issues.
@rolandrmgservices, that'd be great. Thanks!
* adding new data generation function based on logistic regression for classification
* adding new data generation function: modify docstring
* adding example notebook for DGP
* simplify function selection
* use np.clip
* remove Y1
* rm commented code block
* rm rcoef, use 0.5
* switch to expit
* switch to logit
* remove blank lines
* link with black

Co-authored-by: Zhenyu Zhao <[email protected]>
Co-authored-by: Roland Stevenson <[email protected]>
Closing this PR in favor of #701.
Adding a new data generation function for classification uplift problem.
This data generation function is based on logistic regression.
The advantage of this new function is that it enables better control over feature importance and feature patterns.
In the previous data generation function, if we specify M informative features, the actual number of informative features ends up being some m < M, and many of the informative features behave similarly to irrelevant features. In this function, all informative features show a clear pattern of impact on the outcome.
In addition, we can now specify the uplift feature pattern, which can be one of ['linear', 'quadratic', 'cubic', 'relu', 'sin', 'cos']. For example, if the feature pattern is 'quadratic', the treatment effect will increase or decrease quadratically with the feature; a sketch of what such a mapping could look like follows.
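As a hedged illustration (the exact transformations in the PR's implementation may differ), a name-to-pattern mapping could look like:

```python
import numpy as np

# Illustrative mapping only; names match the pattern list above, but the
# actual scaling/coefficients in the PR may differ.
uplift_patterns = {
    'linear':    lambda x: x,
    'quadratic': lambda x: x ** 2,
    'cubic':     lambda x: x ** 3,
    'relu':      lambda x: np.maximum(x, 0),
    'sin':       np.sin,
    'cos':       np.cos,
}
```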