Update fred.py #52

Open
wants to merge 1 commit into base: master

@dsgerbc commented May 10, 2022

The FRED API appears to have introduced a hard cap of 2000 vintage dates per request. The cap is hit when get_series_all_releases is run on any daily data series present in ALFRED (for example 'T1YFF'): the request comes back with a "Bad Request. This exceeds the maximum number of vintage dates allowed (2000)" error.
This fix splits data retrieval into batches of at most 2000 vintages.
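
The batching idea can be sketched roughly as follows. This is an illustration only, not the code in this PR: the helper name `chunk_vintage_dates` is hypothetical, and the default `max_chunk_length = 2000` simply mirrors the cap quoted in the error message.

```python
from datetime import date
from typing import Iterable, Iterator, Tuple


def chunk_vintage_dates(
    vintage_dates: Iterable[date], max_chunk_length: int = 2000
) -> Iterator[Tuple[date, date]]:
    """Yield (first, last) vintage date for each batch of at most max_chunk_length vintages.

    Hypothetical helper: each yielded pair is meant to be used as the
    realtime_start / realtime_end of one observations request.
    """
    if max_chunk_length < 1:
        raise ValueError("max_chunk_length must be at least 1")
    ordered = sorted(vintage_dates)
    for i in range(0, len(ordered), max_chunk_length):
        chunk = ordered[i:i + max_chunk_length]
        yield chunk[0], chunk[-1]


# Example: ten made-up daily vintages split into batches of at most four.
example_vintages = [date(2020, 1, day) for day in range(1, 11)]
windows = list(chunk_vintage_dates(example_vintages, max_chunk_length=4))
```

Each window stays under the cap, but the results of the separate requests still have to be stitched together afterwards.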

@tommyjcarpenter

Hi, I have opened a PR with an additional commit against @dsgerbc's branch. Could @dsgerbc merge that, so that this PR can then be merged? For now I have forked this and pushed it to our private PyPI, but I would prefer upstream to have it.

@TomasKoutek

Hi guys,
I have a similar issue, reported by @almostintuitive, in my pystlouisfed library. I have run various tests and I think this problem probably has no solution. Unfortunately, even this implementation is not correct.

The "realtime_start" and "realtime_end" fields in the response change depending on the "realtime_start" and "realtime_end" request parameters.

For example, for testing we change "max_chunk_length = 2000" to "max_chunk_length = 90" and run:

```python
from fredapi import Fred

fred = Fred(api_key='xyz')
df = fred.get_series_all_releases(series_id='GNPCA', realtime_start='1776-09-24', realtime_end='2001-03-27')

# Show every vintage returned for the observation dated 1929-01-01
df[df['date'] == '1929-01-01']
```

```
idx  realtime_start  date        value
0    1958-12-21      1929-01-01  181.8
1    1965-08-19      1929-01-01  203.6
2    1976-01-16      1929-01-01  314.7
3    1980-12-23      1929-01-01  315.7
4    1985-12-20      1929-01-01  709.6
0    1987-02-19      1929-01-01  709.6
1    1991-12-04      1929-01-01  NaT
2    1992-12-22      1929-01-01  827.4
3    1996-01-19      1929-01-01  NaT
4    1997-04-30      1929-01-01  796.8
5    1999-10-29      1929-01-01  NaT
6    2000-04-27      1929-01-01  828.9
```

The value "709.6" is duplicated in the resulting DataFrame because its real-time period (1985-12-20 to 1991-12-03) falls into two chunk ranges:

1958-12-21 -> 1987-01-22
1987-02-19 -> 2000-07-28
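
The duplication can be reproduced offline with a small sketch. It assumes that the API clips each observation's real-time period to the requested window; that is an inference from the output in this comment, not documented behavior. The helper name `clip_to_window` is hypothetical, and the three periods are taken from the unchunked API response quoted in this comment.

```python
from datetime import date

# Real-time periods for date=1929-01-01: (realtime_start, realtime_end, value).
PERIODS = [
    (date(1980, 12, 23), date(1985, 12, 19), "315.7"),
    (date(1985, 12, 20), date(1991, 12, 3), "709.6"),
    (date(1991, 12, 4), date(1992, 12, 21), "."),
]


def clip_to_window(periods, window_start, window_end):
    """Keep periods overlapping the window, clipped to it (assumed API behavior)."""
    clipped = []
    for start, end, value in periods:
        if end < window_start or start > window_end:
            continue  # no overlap with this chunk's window
        clipped.append((max(start, window_start), min(end, window_end), value))
    return clipped


# The two chunk ranges listed above.
chunk_1 = clip_to_window(PERIODS, date(1958, 12, 21), date(1987, 1, 22))
chunk_2 = clip_to_window(PERIODS, date(1987, 2, 19), date(2000, 7, 28))

# Concatenating the chunks yields "709.6" twice, with different realtime_start values.
combined = chunk_1 + chunk_2
starts_of_709_6 = [start for start, _, value in combined if value == "709.6"]
```

Under that assumption, the second copy of "709.6" starts at 1987-02-19, which is the start of the second chunk rather than a real vintage of that value.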

https://api.stlouisfed.org/fred/series/observations?series_id=GNPCA&api_key=xyz&realtime_start=1776-09-24&realtime_end=2001-03-27

<observation realtime_start="1958-12-21" realtime_end="1965-08-18" date="1929-01-01" value="181.8"/>
<observation realtime_start="1965-08-19" realtime_end="1976-01-15" date="1929-01-01" value="203.6"/>
<observation realtime_start="1976-01-16" realtime_end="1980-12-22" date="1929-01-01" value="314.7"/>
<observation realtime_start="1980-12-23" realtime_end="1985-12-19" date="1929-01-01" value="315.7"/>
<observation realtime_start="1985-12-20" realtime_end="1991-12-03" date="1929-01-01" value="709.6"/>
<observation realtime_start="1991-12-04" realtime_end="1992-12-21" date="1929-01-01" value="."/>
<observation realtime_start="1992-12-22" realtime_end="1996-01-18" date="1929-01-01" value="827.4"/>
<observation realtime_start="1996-01-19" realtime_end="1997-04-29" date="1929-01-01" value="."/>
<observation realtime_start="1997-04-30" realtime_end="1999-10-28" date="1929-01-01" value="796.8"/>
<observation realtime_start="1999-10-29" realtime_end="2000-04-26" date="1929-01-01" value="."/>
<observation realtime_start="2000-04-27" realtime_end="2001-03-27" date="1929-01-01" value="828.9"/>

Even if realtime_end could be calculated from realtime_start (by shifting the Series), it is not possible to remove the realtime_start duplicates with certainty on the client side. That's why I think pagination cannot be applied to vintage data.
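
To make the point concrete, here is a plain-Python sketch of both steps, using a few rows for 1929-01-01 from the chunked output in this comment. `derive_realtime_end` mimics the "shift" idea (each period ends the day before the next one starts), and `drop_repeated_values` is a heuristic that drops a row whose value repeats the previous one. Both helper names are hypothetical, and the heuristic is only a guess: it removes the duplicate in this example, but nothing in the data marks a repeated row as a chunk-boundary artifact.

```python
from datetime import date, timedelta

# (realtime_start, value) rows for date=1929-01-01; None stands for a missing value.
ROWS = [
    (date(1980, 12, 23), 315.7),
    (date(1985, 12, 20), 709.6),
    (date(1987, 2, 19), 709.6),  # repeated value after the chunk boundary
    (date(1991, 12, 4), None),
]


def derive_realtime_end(rows):
    """Set each realtime_end to the day before the next realtime_start."""
    ordered = sorted(rows, key=lambda row: row[0])
    result = []
    for i, (start, value) in enumerate(ordered):
        if i + 1 < len(ordered):
            end = ordered[i + 1][0] - timedelta(days=1)
        else:
            end = None  # last period: the end is unknown from these rows alone
        result.append((start, end, value))
    return result


def drop_repeated_values(rows):
    """Heuristic: drop a row whose (non-missing) value equals the previous kept value."""
    kept = []
    for start, value in sorted(rows, key=lambda row: row[0]):
        if kept and value is not None and kept[-1][1] == value:
            continue
        kept.append((start, value))
    return kept


naive = derive_realtime_end(ROWS)
cleaned = derive_realtime_end(drop_repeated_values(ROWS))
```

Without the heuristic, the first "709.6" period ends at 1987-02-18, a date that does not appear in the unchunked response; with it, the period ends at 1991-12-03 as in the unchunked response.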

FRED's approach is wrong: when the vintage limit is exceeded, pagination should be enforced directly by FRED (limit/offset). In addition, this limit is undocumented. But FRED has more such problems...

Tomas

@almostintuitive

@TomasKoutek: thanks for the investigation! I'll also look into it soon. We are specifically interested only in the original release date, which, if I understand correctly, can still be extracted for these series. Do you know if that is correct?
Thanks!
Mark
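
For reference, "original release" here means the earliest realtime_start per observation date. A minimal sketch of that extraction is below; the helper name `first_releases` is hypothetical, and whether the earliest realtime_start survives chunking unchanged in every case is exactly the open question, so treat this as an illustration rather than an answer.

```python
from datetime import date


def first_releases(rows):
    """Map each observation date to its earliest (realtime_start, value).

    rows: iterable of (realtime_start, observation_date, value) tuples.
    """
    earliest = {}
    for realtime_start, obs_date, value in rows:
        current = earliest.get(obs_date)
        if current is None or realtime_start < current[0]:
            earliest[obs_date] = (realtime_start, value)
    return earliest


# A few rows from the chunked output earlier in this thread.
sample_rows = [
    (date(1985, 12, 20), date(1929, 1, 1), 709.6),
    (date(1958, 12, 21), date(1929, 1, 1), 181.8),
    (date(1987, 2, 19), date(1929, 1, 1), 709.6),
]
original = first_releases(sample_rows)
```

In this sample the duplicated "709.6" row does not affect the result, because only the earliest row per date is kept.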
