Create geosoft_gxf_io.py #539
base: main
Conversation
I found a small file that works from here: https://pubs.usgs.gov/of/2000/ofr-00-0198/html/wyoming.htm

Specifically: mag5001_gxf.gz | 2000-04-24 16:50 | 57K

Potential citation: Kucks, R.P., and Hill, P.L., 2000, Wyoming aeromagnetic and gravity maps and data—A web site for distribution of data: U.S. Geological Survey Open-File Report 00-0198, https://pubs.usgs.gov/of/2000/ofr-00-0198/html/wyoming.htm
ToDo: Licensing of data file from USGS.
ToDo: Add attribution to gist.
For USGS, looks like we just need to add attribution as well: https://www.usgs.gov/faqs/are-usgs-reportspublications-copyrighted
@santisoler added some text about the USGS, the attribution to the gist was already there. Let me know if I need to do anything else! I probably won't edit this further until I have some more feedback.
@santisoler, is there anything else you want to see?
Hi @ThomasMGeo. Thanks for pushing these changes. Let me take a look at this and I'll come back to you.
Thanks @ThomasMGeo for opening this PR!
I took a quick look at it and I have the following comments.
I'm not sure what the benefit is of not making `read_gxf` always return an `xarray.DataArray`. Once we have the `DataArray`, it's trivial to get the underlying NumPy array and the metadata as a dict. By doing so, we only need to add one public function to Harmonica (`read_gxf`), instead of both `read_gxf` and `gxf_to_xarray` (which has only a single use). In my opinion we should make `read_gxf` behave similarly to the other readers in Harmonica and return a `DataArray`.
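For illustration, pulling the array and the metadata back out of a returned `DataArray` is a one-liner each. A minimal sketch (the coordinate and attribute names here are made up for the example, not the actual GXF header keys):

```python
import numpy as np
import xarray as xr

# Build a small DataArray like the one a read_gxf-style reader might return.
# The attrs used as "metadata" are illustrative only.
grid = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    coords={"northing": [0.0, 100.0], "easting": [0.0, 100.0, 200.0]},
    dims=("northing", "easting"),
    attrs={"title": "example grid", "dummy": -1e32},
)

values = grid.values         # underlying NumPy array
metadata = dict(grid.attrs)  # metadata as a plain dict
```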
BTW, we need to remember to add any new public function to `harmonica/__init__.py` and to the API reference in `doc/api/index.rst`.
One little thing to note is that the docstrings should follow numpydoc's style, so they can be nicely rendered in Sphinx.
I left some more comments below.
Let me know what you think. We can iterate on it later! And feel free to ask for help if you need it.
1. Grid eXchange Format (*.gxf)

GXF (Grid eXchange File) is a standard ASCII file format for
exchanging gridded data among different software systems.
Software that supports the GXF standard will be able to import
properly formatted GXF files and export grids in GXF format.
I suspect that these paragraphs are coming from this PDF: https://pubs.usgs.gov/of/1999/of99-514/grids/gxf.pdf. Am I right?
Do we have permission to reproduce the same text here? Can we double-check the license of those files? If in doubt, I would remove these lines from this file; a reference should be enough. If we do have permission we can leave them, but I would make sure we are compliant with the license (reference the document, acknowledgement, include license text, etc.).
I am happy to remove it.
if reading_data:
    data_list.append(line)
From what I understood, these lines populate `data_list` with the values of the grid. Since the number of data lines is usually going to be much larger than the number of header lines, why not use `np.loadtxt` to read them? It performs much faster than appending lines to a list, it already returns an array in the right shape, and it has some sanity checks (that the number of elements per row is consistent) that we then don't have to take care of ourselves.

I think we could read the header lines until we find the `#GRID` line, keep a record of the number of header rows, and stop the iteration. Then we can use `np.loadtxt` to read the rest of the file by passing the number of rows to `skiprows` (+/- 1, we need to figure out the correct indexing).
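A rough sketch of that approach, assuming a simplified GXF-like layout where the data values start right after a `#GRID` record (real files may need extra handling for comments and continuation records):

```python
import io
import numpy as np

# Simplified GXF-like content: header records, then data after #GRID.
text = """#TITLE
example grid
#ROWS
2
#POINTS
3
#GRID
1.0 2.0 3.0
4.0 5.0 6.0
"""

# Read header lines until #GRID, counting how many lines to skip.
lines = text.splitlines()
n_header = next(i for i, line in enumerate(lines) if line.strip() == "#GRID") + 1

# Let np.loadtxt parse the data block, skipping the header.
data = np.loadtxt(io.StringIO(text), skiprows=n_header)
```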
# Get grid dimensions
nrows = int(headers['ROWS'])
ncols = int(headers['POINTS'])
In lines https://github.com/fatiando/harmonica/pull/539/files#diff-5ac31ec9d8d11b47ff39f59bf25e1cad43484857580dd7b6c65cddd8e0f504cfR219-R229 we populate the `metadata` dictionary with parsed values (float, int, stripped str). Isn't it better to use `metadata` to get the number of rows and columns, instead of having to parse them again as ints?

That being said, maybe we want the keys in `metadata` to be lowercase (as you do in the following lines with `nx`, `ny`, `x_inc`, etc.).
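For example, with lowercase keys parsed once while reading the header, the dimensions come straight from the dict with no re-parsing (hypothetical key names mirroring the `nx`/`ny` mentioned above):

```python
# Hypothetical metadata dict, with values already parsed while reading the header.
metadata = {"nx": 3, "ny": 2, "x_inc": 100.0, "y_inc": 100.0}

# No need to parse strings as ints again: reuse the parsed values directly.
nrows = metadata["ny"]
ncols = metadata["nx"]
```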
def get_grid_info(metadata: Dict[str, Any]) -> None:
    """
    Print comprehensive information about the GXF grid.

    Parameters:
        metadata (dict): Metadata dictionary from read_gxf
    """
    print("=== Grid Information ===")
    print(f"Title: {metadata.get('TITLE', 'Not specified')}")
    print(f"Grid size: {metadata['nx']} x {metadata['ny']} points")
    print(f"Cell size: {metadata['x_inc']} x {metadata['y_inc']}")
    print(f"Origin: ({metadata['x_min']}, {metadata['y_min']})")
    print(f"Rotation: {metadata.get('ROTATION', 0)}")
    print(f"Dummy value: {metadata.get('DUMMY', 'Not specified')}")

    if 'projection' in metadata:
        print("\n=== Projection Information ===")
        print(f"Type: {metadata['projection']['type']}")
        print(f"Units: {metadata['projection']['units']}")

        params = metadata['projection']['parameters']
        if any(params.values()):
            print("\nProjection Parameters:")
            for key, value in params.items():
                if value is not None:
                    print(f"{key}: {value}")
I think this function is nice for debugging the reader, but I can hardly see users using it (and we would need to expose it as well). Instead, if `read_gxf` returns a DataArray, then we don't need this function: `xarray` can nicely display all the data.

What do you think? Is there any special reason to include this function?
Co-authored-by: Santiago Soler <[email protected]>
Hi @santisoler, thanks for the review. I just want to touch on these few comments before I dig into the rest of the issues.
I am happy to have it be a very plain DataArray without the CRS (usually the CRS is not given as a code in the files that I have seen). For some of my work, I have had to do some weird warping to get them to match, but this might be rare and isolated. The get_grid_info function was just used to figure out the projections (again, these are not standard CRSs with modern EPSG codes from what I have seen). So my general workflow would be to load it as a numpy array, look at that output, and then load it into xarray with all of the EPSG codes attached. Let me know what you think is best, and I can tackle it!
Hi @ThomasMGeo. I see your point. Since the gxf files don't have a strict specification for setting the CRS, I think we should just add to the [...]

I do like having the information about the projection in the metadata. A nested dict (like the one you already implemented) should be ok (we might want to make sure that that structure follows the CF metadata conventions).

Another thing we were discussing today during the meeting is to use [...]
Awesome, that all makes sense. Let me work on it for a bit and get back to you. Thanks for giving it a look and for all the feedback! :)
GXF is a common file format for gravity/magnetic data in USGS releases. It's an ASCII format with variable header structures.
On the dataset that I tested this on, the grid comes out upside down when decoded to xarray. This is not impossible to fix, but I do like keeping the option for numpy and metadata output.

I bet this could be optimized further; I will try to find more datasets to test with (the smaller the better!). Any comments/optimizations are welcome.
Relevant issues/PRs: This addresses #538, which has some more information about the format itself.