Adding a new data generation function based on logistic regression #166

Closed
wants to merge 4 commits into from


zhenyuz0500
Collaborator

This PR adds a new data generation function for classification uplift problems.
The function is based on logistic regression.
Its advantage is better control over feature importance and feature patterns.
With the previous data generation function, if we specify M informative features, the actual number of informative features ends up being m < M, because many informative features behave like irrelevant features. With this function, every informative feature shows a clear pattern in its impact on the outcome.
In addition, we can now specify the uplift feature pattern, which can be one of ['linear', 'quadratic', 'cubic', 'relu', 'sin', 'cos']. For example, if the feature pattern is 'quadratic', the treatment effect increases or decreases quadratically with the feature.
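
To make the pattern names concrete, here is a minimal sketch of how such a lookup could work. The `PATTERNS` table and `transform_feature` helper are hypothetical names introduced for illustration; only the six pattern names come from the description above, and the function bodies are assumptions rather than the PR's actual code.

```python
import numpy as np

# Hypothetical pattern table: the keys mirror the list in the PR description,
# the bodies are illustrative assumptions.
PATTERNS = {
    "linear": lambda x: x,
    "quadratic": lambda x: x ** 2,
    "cubic": lambda x: x ** 3,
    "relu": lambda x: np.maximum(x, 0),
    "sin": np.sin,
    "cos": np.cos,
}


def transform_feature(x, pattern):
    """Apply the named pattern element-wise to a feature array."""
    return PATTERNS[pattern](np.asarray(x, dtype=float))


x = np.array([-2.0, 0.0, 2.0])
quadratic = transform_feature(x, "quadratic")  # symmetric around zero
relu = transform_feature(x, "relu")            # flat for negative values
```

With a 'quadratic' pattern the transformed feature is the same for -2 and 2, so a treatment effect driven by it would depend on the feature's magnitude rather than its sign.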

x = np.array(x)
return np.maximum(x, 0)

# @staticmethod

We don't need the @staticmethod decorator for functions that are not a member function of a class.

Comment on lines +118 to +122
if f_index<len(fs):
    fi = [f_index]
else:
    fi = np.random.choice(range(len(fs)), 1)
y = fs[fi[0]](x)

We can simplify this block by using np.asscalar() and passing N instead of range(N) to np.random.choice(), as follows:

try:
    y = fs[f_index](x)
except IndexError:
    y = fs[np.asscalar(np.random.choice(len(fs), 1))](x)

I also use try...except IndexError to handle the case f_index < 0, which would fall into the if block even when it is not a valid index.
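
The suggestion above can be sketched as a self-contained helper. Note that `np.asscalar()` has since been deprecated and removed from recent NumPy releases, so this sketch swaps in a `Generator.integers()` draw converted with `int()`. The helper name `apply_selected` and the `rng` parameter are hypothetical and not part of the PR.

```python
import numpy as np


def apply_selected(fs, f_index, x, rng=None):
    """Apply fs[f_index] to x, falling back to a random function
    when f_index is not a valid index into fs."""
    rng = np.random.default_rng() if rng is None else rng
    try:
        return fs[f_index](x)
    except IndexError:
        # Draw one valid index uniformly at random.
        return fs[int(rng.integers(len(fs)))](x)


fs = [lambda v: v + 1, lambda v: v * 2]
in_range = apply_selected(fs, 1, 3)      # uses fs[1]
out_of_range = apply_selected(fs, 5, 3)  # falls back to a random choice
```

One caveat with this design: a negative index within range (for example -1) is valid in Python and selects from the end of the list, so only indices below -len(fs) or at least len(fs) trigger the random fallback.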

return res

# ------ Data generation function (V2) using logistic regression as underlying model
def make_uplift_classification_logistic(

What about adding it to make_uplift_classification() with an additional input argument, e.g. use_logistic=True?

Comment on lines +414 to +415
df1['conversion_prob'] = [max(0, min(1, xi))
                          for xi in df1['conversion_prob'].values]

df1['conversion_prob'] = np.clip(df1['conversion_prob'].values, 0, 1)
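
As a quick check that the two forms agree, the following sketch compares the list comprehension with `np.clip` on a made-up probability array. The sample values are illustrative only.

```python
import numpy as np

# Illustrative values, including some outside [0, 1].
probs = np.array([-0.2, 0.0, 0.35, 1.0, 1.4])

# Original approach: clamp each element in a Python loop.
looped = np.array([max(0, min(1, p)) for p in probs])

# Suggested approach: vectorized clamp.
clipped = np.clip(probs, 0, 1)
```

Both produce the same array; `np.clip` avoids the Python-level loop and states the intent directly.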

Comment on lines +416 to +418
Y1 = np.random.binomial(1, df1['conversion_prob'].values)

df1[y_name] = Y1

Y1 is not necessary; the result of np.random.binomial() can be assigned to df1[y_name] directly.

Comment on lines +387 to +389
#rcoef = [0]
#while np.abs(rcoef) < 0.1:
# rcoef = np.random.uniform(-1, 1, 1)

Remove the commented code block

Comment on lines +390 to +391
rcoef = [0.5]
coef_uplift.append(rcoef[0])

rcoef doesn't need to be an array.

xb : list
An array, with each element as the sum of product of coefficient and feature value
"""
sm_arr = 1/(1+np.exp(-(z+np.array(xb))))

from scipy.special import expit

sm_arr = expit(z + np.array(xb))
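
The equivalence behind this suggestion can be checked with a short sketch. The intercept `z` and the `xb` values below are made-up inputs, not values from the PR.

```python
import numpy as np
from scipy.special import expit

z = -1.0              # hypothetical intercept
xb = [0.0, 1.0, 3.0]  # hypothetical sums of coefficient * feature value

# Hand-written logistic function, as in the reviewed line.
manual = 1 / (1 + np.exp(-(z + np.array(xb))))

# Same computation using SciPy's logistic sigmoid.
sm_arr = expit(z + np.array(xb))
```

The two arrays match to floating-point precision, so the change is about readability rather than behavior for inputs like these.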

coef_classify.append(rcoef[0])
x_classify = df1[x_informative_transformed].values
p1 = positive_class_proportion
a10 = np.log(p1/(1.-p1))

from scipy.special import logit

a10 = logit(p1)
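
A short sketch of why the log-odds of `p1` works as the intercept: passing it back through the logistic function recovers `p1`, so with all feature terms at zero the conversion probability equals the requested positive class proportion. The value 0.1 is an assumed example.

```python
import numpy as np
from scipy.special import expit, logit

p1 = 0.1  # assumed example value for positive_class_proportion

a10_manual = np.log(p1 / (1.0 - p1))  # reviewed line
a10 = logit(p1)                       # suggested replacement

# With no feature contribution, the baseline probability is p1 again.
baseline = expit(a10)
```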

Comment on lines +603 to +606





Remove extra blank lines

@jeongyoonlee jeongyoonlee added the enhancement New feature or request label Oct 7, 2020
@Patiljd

Patiljd commented Sep 7, 2022

For dgp

@rolandrmgservices
Contributor

@zhenyuz0500 @jeongyoonlee if this PR is still active, I'd be happy to work to address the review request and get it merged in. I was able to check out the branch, build from source, and reproduce the notebook results with no issues.

@jeongyoonlee
Collaborator

@rolandrmgservices, that'd be great. Thanks!

@ras44 ras44 mentioned this pull request Nov 7, 2023
10 tasks
jeongyoonlee pushed a commit that referenced this pull request Nov 8, 2023
* adding new data generation function based on logistic regression for classification
* adding new data generation function: modify docstring
* adding example notebook for DGP
* simplify function selection
* use np.clip
* remove Y1
* rm commented code block
* rm rcoef, use 0.5
* switch to expit
* switch to logit
* remove blank lines
* link with black

---------

Co-authored-by: Zhenyu Zhao <[email protected]>
Co-authored-by: Roland Stevenson <[email protected]>
@jeongyoonlee
Collaborator

Closing this PR in favor of #701.

@ras44 ras44 mentioned this pull request Nov 15, 2023
10 tasks
Labels
enhancement New feature or request

4 participants