Adding a new data generation function based on logistic regression #166
Conversation
```python
    x = np.array(x)
    return np.maximum(x, 0)

# @staticmethod
```
We don't need the `@staticmethod` decorator for functions that are not member functions of a class.
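For illustration, a minimal sketch of the helper above as a plain module-level function (the name `relu` is an assumption based on the snippet's body):

```python
import numpy as np

# No @staticmethod needed: this is a plain module-level function, not a class member.
def relu(x):
    x = np.array(x)
    return np.maximum(x, 0)
```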
```python
if f_index < len(fs):
    fi = [f_index]
else:
    fi = np.random.choice(range(len(fs)), 1)
y = fs[fi[0]](x)
```
We can simplify this block by using `np.asscalar()` and by passing `N` instead of `range(N)` to `np.random.choice()`, as follows:

```python
try:
    y = fs[f_index](x)
except IndexError:
    y = fs[np.asscalar(np.random.choice(len(fs), 1))](x)
```
I also use `try ... except IndexError` to handle the case `f_index < 0`, which would fall into the `if` block even though it's not a valid index.
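As a side note beyond the original review: `np.asscalar()` was deprecated in NumPy 1.16, so on current NumPy the same simplification could be sketched as:

```python
try:
    y = fs[f_index](x)
except IndexError:
    # np.random.choice(n) without a size argument already returns a scalar,
    # so no np.asscalar() (or .item()) call is needed.
    y = fs[np.random.choice(len(fs))](x)
```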
```python
    return res


# ------ Data generation function (V2) using logistic regression as underlying model
def make_uplift_classification_logistic(
```
What about adding it to `make_uplift_classification()` with an additional input argument, e.g. `use_logistic=True`?
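A hypothetical sketch of that interface (the dispatch and the helper name `_make_uplift_classification_original` are assumptions for illustration, not existing API):

```python
def make_uplift_classification(n_samples=1000, use_logistic=False, **kwargs):
    if use_logistic:
        # delegate to the logistic-regression-based generator added in this PR
        return make_uplift_classification_logistic(n_samples=n_samples, **kwargs)
    # otherwise fall back to the original generator
    return _make_uplift_classification_original(n_samples=n_samples, **kwargs)
```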
```python
df1['conversion_prob'] = [max(0, min(1, xi))
                          for xi in df1['conversion_prob'].values]
```
```python
df1['conversion_prob'] = np.clip(df1['conversion_prob'].values, 0, 1)
```
```python
Y1 = np.random.binomial(1, df1['conversion_prob'].values)

df1[y_name] = Y1
```
`Y1` is not necessary.
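That is, the intermediate variable can be dropped by assigning directly:

```python
df1[y_name] = np.random.binomial(1, df1['conversion_prob'].values)
```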
```python
# rcoef = [0]
# while np.abs(rcoef) < 0.1:
#     rcoef = np.random.uniform(-1, 1, 1)
```
Remove the commented code block
```python
rcoef = [0.5]
coef_uplift.append(rcoef[0])
```
`rcoef` doesn't need to be an array.
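A minimal sketch of the scalar version of the two lines above:

```python
rcoef = 0.5
coef_uplift.append(rcoef)
```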
```python
    xb : list
        An array, with each element as the sum of product of coefficient and feature value
    """
    sm_arr = 1/(1+np.exp(-(z+np.array(xb))))
```
```python
from scipy.special import expit

sm_arr = expit(z + np.array(xb))
```
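One practical benefit of `expit`, shown here as a quick check: it is numerically stable at extreme arguments, where the hand-rolled `1/(1+np.exp(-x))` would raise an overflow warning.

```python
import numpy as np
from scipy.special import expit

# No overflow warning even at extreme inputs; the hand-rolled sigmoid
# would warn on np.exp(1000.0).
print(expit(np.array([-1000.0, 0.0, 1000.0])))  # [0.  0.5 1. ]
```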
```python
coef_classify.append(rcoef[0])
x_classify = df1[x_informative_transformed].values
p1 = positive_class_proportion
a10 = np.log(p1/(1.-p1))
```
```python
from scipy.special import logit

a10 = logit(p1)
```
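`logit` computes exactly log(p / (1 - p)), so it is a drop-in replacement for the original expression; a quick sanity check (the value 0.3 is just an example proportion):

```python
import numpy as np
from scipy.special import logit

p1 = 0.3  # example positive class proportion
assert np.isclose(logit(p1), np.log(p1 / (1.0 - p1)))
```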
[The diff here shows several consecutive blank lines.]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove extra blank lines
@zhenyuz0500 @jeongyoonlee If this PR is still active, I'd be happy to address the review requests and get it merged in. I was able to check out the branch, build from source, and reproduce the notebook results with no issues.
@rolandrmgservices, that'd be great. Thanks!
* adding new data generation function based on logistic regression for classification
* adding new data generation function: modify docstring
* adding example notebook for DGP
* simplify function selection
* use np.clip
* remove Y1
* rm commented code block
* rm rcoef, use 0.5
* switch to expit
* switch to logit
* remove blank lines
* link with black

Co-authored-by: Zhenyu Zhao <[email protected]>
Co-authored-by: Roland Stevenson <[email protected]>
Closing this PR in favor of #701.
Adding a new data generation function for classification uplift problem.
This data generation function is based on logistic regression.
The advantage of this new function is that it enables better control over feature importance and feature patterns.
In the previous data generation function, if we specify M informative features, the actual number of informative features ends up being some m < M, and many of the informative features behave similarly to irrelevant features. In this function, all informative features show a clear pattern of impact on the outcome.
In addition, we can now specify the uplift feature pattern, which can be one of ['linear', 'quadratic', 'cubic', 'relu', 'sin', 'cos']. For example, if the feature pattern is 'quadratic', the treatment effect will increase or decrease quadratically with the feature; a sketch of what such a mapping could look like follows.
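As a hedged illustration (the exact transformations in the PR's implementation may differ), a name-to-pattern mapping could look like:

```python
import numpy as np

# Illustrative mapping only; names match the pattern list above, but the
# actual scaling/coefficients in the PR may differ.
uplift_patterns = {
    'linear':    lambda x: x,
    'quadratic': lambda x: x ** 2,
    'cubic':     lambda x: x ** 3,
    'relu':      lambda x: np.maximum(x, 0),
    'sin':       np.sin,
    'cos':       np.cos,
}
```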