Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The init status in initialize_latent #1078

Closed
kubotty opened this issue Jul 5, 2024 · 4 comments · Fixed by #1082
Closed

The init status in initialize_latent #1078

kubotty opened this issue Jul 5, 2024 · 4 comments · Fixed by #1082
Assignees

Comments

@kubotty
Copy link

kubotty commented Jul 5, 2024

In initialization.py

def initialize_latent(init, input_dim, Y):
    Xr = np.asfortranarray(np.random.normal(0, 1, (Y.shape[0], input_dim)))
    if 'PCA' in init:
        ...
    elif init in 'empirical_samples':
        ...
    else:
        ...

If the init has any strings that include three letters of "PCA", it takes the first branch, and if the init includes any length of strings in "empirical_samples" (even an empty string: ""!), it takes the second branch. There is a high possibility that an unexpected branch may be called.

I propose changing it to:

def initialize_latent(init, input_dim, Y):
    Xr = np.asfortranarray(np.random.normal(0, 1, (Y.shape[0], input_dim)))
    if init == 'PCA':
        ...
    elif init == 'empirical_samples':
        ...
    else:
        ...

In addition, I can't find any situations where 'empirical_samples' is used in any of the calling modules. The init possibly has two values: "PCA" and "random" (I apologize if I missed others), so it seems better like this:

def initialize_latent(init, input_dim, Y):
    Xr = np.asfortranarray(np.random.normal(0, 1, (Y.shape[0], input_dim)))
    if init == 'PCA':
        ...
    elif init == 'random':
        ...
    else:
        raise ValueError(...)
@MartinBubel
Copy link
Contributor

MartinBubel commented Jul 10, 2024

Hi @kubotty
thank you for the issue.

I agree with you that this is potentially unsafe and that your suggestion is better/safer.
Personally I would prefer an enum but that would introduce a major change which is not what I would like to have at the moment - so your suggestion remains great.

I also agree with your comments on the "empirical_samples" branch. However, I would like to keep this as is. I don't have the overview if this is something that may is in use by some power users or whether there is any pull request in the loop that is using it.
Therefore I recommend to keep the empirical samples even if it does not seem necessary at the moment.
Also, I would like to keep "random" in the else case. Even if I cannot think of a dangerous scenario at the moment, I don't feel comfortable with such a change at the moment.

I joined GPy just a while ago and came more or less for maintenance. I don't have the full overview on the backend and thus I am a bit hesitating when it comes to deeper changes.

With that said, feel free to start a pull request if you want :)

Best
Martin

@lawrennd
Copy link
Member

Hi Both,

@MartinBubel you're doing a great job and feel free to have confidence around deeper changes! I wouldn't be 100% sure of the protocol here, but perhaps a deprecation warning initially for empirical samples? Then that would trigger anyone who's using it to (hopefully) raise an issue if it's a problem to remove it?

Neil

@MartinBubel
Copy link
Contributor

thanks Neil for your input. Deprecation warning like "empirical samples is on a deprecation path will be removed in a future version, raise an issue if you need it" would definitely be a good solution - probably the best.

@MartinBubel
Copy link
Contributor

@kubotty are you interested in doing the PR? If so, feel free to choose me for the review.

@MartinBubel MartinBubel self-assigned this Jul 21, 2024
@MartinBubel MartinBubel linked a pull request Jul 21, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants