cache_call messes up likelihood if more than one CW source is added #214

bencebecsy · 2019-12-10T17:55:38Z

TLDR: It looks like cache_call does something funky if the PTA object has more than one CW signal in it and the likelihood is called multiple times (which, of course, happens always in MCMCs). For now, I have a workaround of changing the default limit to 1 in cache call (here: https://github.com/nanograv/enterprise/blob/master/enterprise/signals/signal_base.py#L853), but it would be nice if someone familiar with cache_call could look into this and figure out how to solve this properly.

Details:
I ran on Linux, with an enterprise version sometime from earlier this year. I set up my PTA object like this:

n_source = 2

efac = parameter.Constant(1.0)
ef = white_signals.MeasurementNoise(efac=efac)
tm = gp_signals.TimingModel(use_svd=True)

base_model = ef + tm

cws = []
for i in range(n_source):
    log10_fgw = parameter.Uniform(np.log10(3.5e-9), -7)(str(i)+'_'+'log10_fgw')
    log10_mc = parameter.Constant(np.log10(5e9))(str(i)+'_'+'log10_mc')
    cos_gwtheta = parameter.Uniform(-1, 1)(str(i)+'_'+'cos_gwtheta')
    gwphi = parameter.Uniform(0, 2*np.pi)(str(i)+'_'+'gwphi')
    phase0 = parameter.Uniform(0, 2*np.pi)(str(i)+'_'+'phase0')
    psi = parameter.Uniform(0, np.pi)(str(i)+'_'+'psi')
    cos_inc = parameter.Uniform(-1, 1)(str(i)+'_'+'cos_inc')
    log10_h = parameter.LinearExp(-18, -13)(str(i)+'_'+'log10_h')
    cw_wf = models.cw_delay(cos_gwtheta=cos_gwtheta, gwphi=gwphi, log10_mc=log10_mc,
                 log10_h=log10_h, log10_fgw=log10_fgw, phase0=phase0,
                 psi=psi, cos_inc=cos_inc, tref=53000*86400)
    cws.append(models.CWSignal(cw_wf, psrTerm=False, name='cw'+str(i)))

s = base_model
for i in range(n_source):
    s = s + cws[i]

model = []
for p in psrs:
    model.append(s(p))

pta = signal_base.PTA(model)

Then, I calculate the likelihood difference when moving a bit away from a given point:

xx = {'0_cos_gwtheta':np.cos(np.pi/3),
      '0_cos_inc':np.cos(1.0),
      '0_gwphi':4.5,
      '0_log10_fgw':np.log10(8e-9),
      '0_log10_h':np.log10(7.5e-15),
      '0_phase0':1.0,
      '0_psi':1.0,
      '1_cos_gwtheta':np.cos(np.pi/2),
      '1_cos_inc':np.cos(2.0),
      '1_gwphi':1.0,
      '1_log10_fgw':np.log10(2e-8),
      '1_log10_h':np.log10(5e-15),
      '1_phase0':2.0,
      '1_psi':0.5}

xx_keys = xx.keys()
print(xx_keys)
dx = 0.1
for key in xx_keys:
    xx_mod = copy.deepcopy(xx)
    xx_mod[key] += dx
    delta_L = pta.get_lnlikelihood(xx_mod) - pta.get_lnlikelihood(xx)
    print(delta_L)

When I run it first, I get the right answer:

-1.38655732293
-0.674862583255
-0.217860197201
-51.636416802
-2.66131148511
-0.424086374253
-1.72225094947
0.0709663308226
-0.252515423083
-0.958126336074
-0.490308497469
-0.7501117713
-0.173086848721
0.0540416986587

The second time I run the same code, I get different values:

0.0
0.703805200599
-0.214432425753
-51.6128221575
-2.66575064577
-0.415693103816
-1.7059718191
0.0169246321639
-0.252515423083
-0.958126336074
-0.490308497469
-0.7501117713
-0.173086848721
0.0540416986587

If I change the limit to 1 in cache_call as described above, I get the same results anytime I call this test which is identical to the one I get the first time.

The text was updated successfully, but these errors were encountered:

vallis · 2019-12-11T00:02:14Z

This seems to be a problem with the way that models.CWSignal manages its parameters.

I think I can fix this. Let me ask though, are you using a branch or fork? In enterprise_extensions master and dev, models.CWSignal() does not have the argument name.

bencebecsy · 2020-02-25T18:36:31Z

I was using a fork, but now it's in master of e_e. Also models.CWSignal is deterministic.CWSignal now.

Not sure how related it is, but once we look at how CW parameters are handled, maybe we would also want to look at how the pulsar term parameters are handled if multiple CWs are present? As in this issue: #199

Fixes this issue: nanograv#214 And related problems when using Deterministic signals other than CWs

danielreardon · 2024-08-26T16:11:28Z

Hi @bencebecsy and @vallis,

Unfortunately, I've found that this likelihood error can occur in much broader situations and that many people may have been unknowingly affected. The issue can occur when you use multiple Deterministic signals. For example solar wind, exponential dips, DM annual.

This is of course a significant problem but even moreso because once the likelihood calculation breaks it can stay broken, unnoticed, for a while. It can clearly manifest if the likelihood happens to be erroneously too large, which would make the PTMCMC chain get stuck (yet those using a nested sampler can continue on, unfortunately).

The fix I have is to set limit=1 as @bencebecsy did, but specifically for the get_delay function. I believe the function has similar "side-effects" as get_basis (see the comment here). Perhaps it would be safest just to use limit=1 for everything..?

Once the limit=1 fix is implemented, I also found that adding the function name to the cache_call key, makes the code faster. You can see my suggested changes here: master...danielreardon:enterprise:master

I'm keen to get your thoughts on the underlying cause, @vallis . If the fixes make sense I can make a pull request.

danielreardon · 2024-08-26T16:29:26Z

Ah! @vallis the issue existed in the version of enterprise in my conda environment, which was the version used for the NANOGrav 15yr analysis, but it does not exist in the most recent version.

This bug fix for deterministic_signals solves it!

Does it make sense that setting limit=1 as above also made the symptoms disappear?

danielreardon added a commit to danielreardon/enterprise that referenced this issue Aug 26, 2024

Fixed to cache_call errors related to the get_delay function

edc8a5d

Fixes this issue: nanograv#214 And related problems when using Deterministic signals other than CWs

danielreardon referenced this issue Aug 26, 2024

Bugfix delay caching in deterministic_signals.py

ae35f14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cache_call messes up likelihood if more than one CW source is added #214

cache_call messes up likelihood if more than one CW source is added #214

bencebecsy commented Dec 10, 2019

vallis commented Dec 11, 2019

bencebecsy commented Feb 25, 2020

danielreardon commented Aug 26, 2024 •

edited

Loading

danielreardon commented Aug 26, 2024 •

edited

Loading

cache_call messes up likelihood if more than one CW source is added #214

cache_call messes up likelihood if more than one CW source is added #214

Comments

bencebecsy commented Dec 10, 2019

vallis commented Dec 11, 2019

bencebecsy commented Feb 25, 2020

danielreardon commented Aug 26, 2024 • edited Loading

danielreardon commented Aug 26, 2024 • edited Loading

danielreardon commented Aug 26, 2024 •

edited

Loading

danielreardon commented Aug 26, 2024 •

edited

Loading