Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cache_call messes up likelihood if more than one CW source is added #214

Open
bencebecsy opened this issue Dec 10, 2019 · 4 comments
Open

Comments

@bencebecsy
Copy link
Contributor

TLDR: It looks like cache_call does something funky if the PTA object has more than one CW signal in it and the likelihood is called multiple times (which, of course, happens always in MCMCs). For now, I have a workaround of changing the default limit to 1 in cache call (here: https://github.com/nanograv/enterprise/blob/master/enterprise/signals/signal_base.py#L853), but it would be nice if someone familiar with cache_call could look into this and figure out how to solve this properly.

Details:
I ran on Linux, with an enterprise version sometime from earlier this year. I set up my PTA object like this:

n_source = 2

efac = parameter.Constant(1.0)
ef = white_signals.MeasurementNoise(efac=efac)
tm = gp_signals.TimingModel(use_svd=True)

base_model = ef + tm

cws = []
for i in range(n_source):
    log10_fgw = parameter.Uniform(np.log10(3.5e-9), -7)(str(i)+'_'+'log10_fgw')
    log10_mc = parameter.Constant(np.log10(5e9))(str(i)+'_'+'log10_mc')
    cos_gwtheta = parameter.Uniform(-1, 1)(str(i)+'_'+'cos_gwtheta')
    gwphi = parameter.Uniform(0, 2*np.pi)(str(i)+'_'+'gwphi')
    phase0 = parameter.Uniform(0, 2*np.pi)(str(i)+'_'+'phase0')
    psi = parameter.Uniform(0, np.pi)(str(i)+'_'+'psi')
    cos_inc = parameter.Uniform(-1, 1)(str(i)+'_'+'cos_inc')
    log10_h = parameter.LinearExp(-18, -13)(str(i)+'_'+'log10_h')
    cw_wf = models.cw_delay(cos_gwtheta=cos_gwtheta, gwphi=gwphi, log10_mc=log10_mc,
                 log10_h=log10_h, log10_fgw=log10_fgw, phase0=phase0,
                 psi=psi, cos_inc=cos_inc, tref=53000*86400)
    cws.append(models.CWSignal(cw_wf, psrTerm=False, name='cw'+str(i)))

s = base_model
for i in range(n_source):
    s = s + cws[i]

model = []
for p in psrs:
    model.append(s(p))

pta = signal_base.PTA(model)

Then, I calculate the likelihood difference when moving a bit away from a given point:

xx = {'0_cos_gwtheta':np.cos(np.pi/3),
      '0_cos_inc':np.cos(1.0),
      '0_gwphi':4.5,
      '0_log10_fgw':np.log10(8e-9),
      '0_log10_h':np.log10(7.5e-15),
      '0_phase0':1.0,
      '0_psi':1.0,
      '1_cos_gwtheta':np.cos(np.pi/2),
      '1_cos_inc':np.cos(2.0),
      '1_gwphi':1.0,
      '1_log10_fgw':np.log10(2e-8),
      '1_log10_h':np.log10(5e-15),
      '1_phase0':2.0,
      '1_psi':0.5}

xx_keys = xx.keys()
print(xx_keys)
dx = 0.1
for key in xx_keys:
    xx_mod = copy.deepcopy(xx)
    xx_mod[key] += dx
    delta_L = pta.get_lnlikelihood(xx_mod) - pta.get_lnlikelihood(xx)
    print(delta_L)

When I run it first, I get the right answer:

-1.38655732293
-0.674862583255
-0.217860197201
-51.636416802
-2.66131148511
-0.424086374253
-1.72225094947
0.0709663308226
-0.252515423083
-0.958126336074
-0.490308497469
-0.7501117713
-0.173086848721
0.0540416986587

The second time I run the same code, I get different values:

0.0
0.703805200599
-0.214432425753
-51.6128221575
-2.66575064577
-0.415693103816
-1.7059718191
0.0169246321639
-0.252515423083
-0.958126336074
-0.490308497469
-0.7501117713
-0.173086848721
0.0540416986587

If I change the limit to 1 in cache_call as described above, I get the same results anytime I call this test which is identical to the one I get the first time.

@vallis
Copy link
Collaborator

vallis commented Dec 11, 2019

This seems to be a problem with the way that models.CWSignal manages its parameters.

I think I can fix this. Let me ask though, are you using a branch or fork? In enterprise_extensions master and dev, models.CWSignal() does not have the argument name.

@bencebecsy
Copy link
Contributor Author

I was using a fork, but now it's in master of e_e. Also models.CWSignal is deterministic.CWSignal now.

Not sure how related it is, but once we look at how CW parameters are handled, maybe we would also want to look at how the pulsar term parameters are handled if multiple CWs are present? As in this issue: #199

danielreardon added a commit to danielreardon/enterprise that referenced this issue Aug 26, 2024
Fixes this issue: nanograv#214

And related problems when using Deterministic signals other than CWs
@danielreardon
Copy link
Contributor

danielreardon commented Aug 26, 2024

Hi @bencebecsy and @vallis,

Unfortunately, I've found that this likelihood error can occur in much broader situations and that many people may have been unknowingly affected. The issue can occur when you use multiple Deterministic signals. For example solar wind, exponential dips, DM annual.

This is of course a significant problem but even moreso because once the likelihood calculation breaks it can stay broken, unnoticed, for a while. It can clearly manifest if the likelihood happens to be erroneously too large, which would make the PTMCMC chain get stuck (yet those using a nested sampler can continue on, unfortunately).

The fix I have is to set limit=1 as @bencebecsy did, but specifically for the get_delay function. I believe the function has similar "side-effects" as get_basis (see the comment here). Perhaps it would be safest just to use limit=1 for everything..?

Once the limit=1 fix is implemented, I also found that adding the function name to the cache_call key, makes the code faster. You can see my suggested changes here: master...danielreardon:enterprise:master

I'm keen to get your thoughts on the underlying cause, @vallis . If the fixes make sense I can make a pull request.

@danielreardon
Copy link
Contributor

danielreardon commented Aug 26, 2024

Ah! @vallis the issue existed in the version of enterprise in my conda environment, which was the version used for the NANOGrav 15yr analysis, but it does not exist in the most recent version.

This bug fix for deterministic_signals solves it!

Does it make sense that setting limit=1 as above also made the symptoms disappear?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants