
Create geosoft_gxf_io.py #539

Open · wants to merge 9 commits into main
Conversation

@ThomasMGeo

GXF (Grid eXchange Format) is a common file type for gravity/magnetic data in USGS releases. It's an ASCII format with variable header structures.

On the dataset I tested this on, the grid comes out upside down when decoding to xarray. This is not impossible to fix (a flip of the row order, sketched below), but I do like keeping the option for numpy and metadata output.
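A minimal sketch of such a flip, on a stand-in array:

import numpy as np

data = np.arange(6).reshape(3, 2)  # stand-in for the decoded grid
data_fixed = np.flipud(data)       # reverse the row order (flip north/south)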

I bet this could be optimized further; I'll try to find more datasets to test with (the smaller the better!). Any comments/optimizations are welcome.

Relevant issues/PRs: This addresses #538, which has some more information about the format itself.

@ThomasMGeo
Author

I found a small file that works from here: https://pubs.usgs.gov/of/2000/ofr-00-0198/html/wyoming.htm

Specifically: mag5001_gxf.gz | 2000-04-24 16:50 | 57K

Potential citation:

Kucks, R.P., and Hill, P.L., 2000, Wyoming aeromagnetic and gravity maps and data—A web site for distribution of data: U.S. Geological Survey Open-File Report 00-0198, https://pubs.usgs.gov/of/2000/ofr-00-0198/html/wyoming.htm

@ThomasMGeo
Author

To do: check the licensing of the data file from USGS.

@ThomasMGeo
Author

To do: add attribution to the gist.

@ThomasMGeo
Author

For USGS data, it looks like we just need to add attribution as well: https://www.usgs.gov/faqs/are-usgs-reportspublications-copyrighted

@ThomasMGeo
Author

@santisoler I added some text about the USGS; the attribution to the gist was already there. Let me know if I need to do anything else! I probably won't edit this further until I have some more feedback.

@ThomasMGeo
Author

@santisoler, is there anything else you want to see?

@santisoler
Member

Hi @ThomasMGeo. Thanks for pushing these changes. Let me take a look at this and I'll come back to you.

@santisoler (Member) left a comment

Thanks @ThomasMGeo for opening this PR!

I took a quick look at it and I have the following comments.

I'm not sure what the benefit is of not making read_gxf always return an xarray.DataArray. Once we have the DataArray, it's trivial to get the underlying NumPy array and the metadata as a dict (see the sketch below). And by doing so, we only need to add one public function to Harmonica (read_gxf), instead of read_gxf and gxf_to_xarray (which has only a single use). In my opinion we should make read_gxf behave similarly to the other readers in Harmonica and return a DataArray.
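For example (a sketch, assuming read_gxf gets exposed as a public function returning a DataArray; the filename is just for illustration):

import harmonica as hm

grid = hm.read_gxf("mag5001.gxf")  # hypothetical: returns an xarray.DataArray
data = grid.values                 # the underlying NumPy array
metadata = dict(grid.attrs)        # the header metadata as a plain dict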

BTW, we need to remember to add any new public function to harmonica/__init__.py and to the API reference in doc/api/index.rst.

One little thing to note is that the docstrings should follow numpydoc's style, so they can be nicely rendered in Sphinx.
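For reference, a numpydoc-style docstring would look something like this (a sketch with a hypothetical signature):

def read_gxf(fname):
    """
    Read a GXF (Grid eXchange Format) file.

    Parameters
    ----------
    fname : str or pathlib.Path
        Path to the GXF file to read.

    Returns
    -------
    grid : xarray.DataArray
        Grid values with the header metadata stored in ``attrs``.
    """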

I left some more comments below.

Let me know what you think. We can iterate over it later on! And feel free to ask for help if you need it.

Comment on lines +10 to +15
1. Grid eXchange Format (*.gxf)

GXF (Grid eXchange File) is a standard ASCII file format for
exchanging gridded data among different software systems.
Software that supports the GXF standard will be able to import
properly formatted GXF files and export grids in GXF format.
@santisoler (Member)

I suspect that these paragraphs are coming from this PDF: https://pubs.usgs.gov/of/1999/of99-514/grids/gxf.pdf. Am I right?

Do we have permission to reproduce the same text here? Can we double-check the license of those files? If in doubt, I would remove these lines from this file; a reference should be enough. If we do have permission we can leave them, but I would make sure we are compliant with the license (reference the document, acknowledgement, include the license text, etc.).

@ThomasMGeo (Author)

I am happy to remove it.

harmonica/_io/geosoft_gxf_io.py (resolved comment)
Comment on lines +187 to +188
if reading_data:
    data_list.append(line)
@santisoler (Member)

From what I understood, these lines populate data_list with the values of the grid. Since the number of data lines is usually going to be much larger than the number of header lines, why not use np.loadtxt to read them? It performs much faster than appending lines to a list, it already returns an array in the right shape, and it has some sanity checks (that the number of elements per row is consistent) that we then don't have to take care of.

I think we could start reading the header lines until we find the #GRID line, keep a record of the number of header rows, and stop the iteration. Then we can use np.loadtxt to read the rest of the file by passing that number of rows to skiprows (+/- 1, we need to figure out the correct indexing). See the sketch below.
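A rough sketch of that approach (the filename and variable names are illustrative):

import numpy as np

fname = "mag5001.gxf"  # illustrative path

# Count header lines until the #GRID marker.
header_lines = 0
with open(fname) as gxf_file:
    for line in gxf_file:
        header_lines += 1
        if line.strip().upper() == "#GRID":
            break

# Let np.loadtxt handle the (much larger) data block.
data = np.loadtxt(fname, skiprows=header_lines)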

Comment on lines +238 to +240
# Get grid dimensions
nrows = int(headers['ROWS'])
ncols = int(headers['POINTS'])
@santisoler (Member)

In lines https://github.com/fatiando/harmonica/pull/539/files#diff-5ac31ec9d8d11b47ff39f59bf25e1cad43484857580dd7b6c65cddd8e0f504cfR219-R229 we populate the metadata dictionary with parsed values (float, int, stripped str). Isn't it better to use metadata to get the number of rows and columns, instead of having to parse them again as ints?

That being said, maybe we want the keys in metadata to be lowercase (as you do in the following lines with nx, ny, x_inc, etc.). That would be a one-liner, sketched below.
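A sketch, using a stand-in header dict:

headers = {"ROWS": "512", "POINTS": "256"}  # stand-in for the parsed header
metadata = {key.lower(): value for key, value in headers.items()}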

Comment on lines +283 to +308
def get_grid_info(metadata: Dict[str, Any]) -> None:
    """
    Print comprehensive information about the GXF grid.

    Parameters:
        metadata (dict): Metadata dictionary from read_gxf
    """
    print("=== Grid Information ===")
    print(f"Title: {metadata.get('TITLE', 'Not specified')}")
    print(f"Grid size: {metadata['nx']} x {metadata['ny']} points")
    print(f"Cell size: {metadata['x_inc']} x {metadata['y_inc']}")
    print(f"Origin: ({metadata['x_min']}, {metadata['y_min']})")
    print(f"Rotation: {metadata.get('ROTATION', 0)}")
    print(f"Dummy value: {metadata.get('DUMMY', 'Not specified')}")

    if 'projection' in metadata:
        print("\n=== Projection Information ===")
        print(f"Type: {metadata['projection']['type']}")
        print(f"Units: {metadata['projection']['units']}")

        params = metadata['projection']['parameters']
        if any(params.values()):
            print("\nProjection Parameters:")
            for key, value in params.items():
                if value is not None:
                    print(f"{key}: {value}")
@santisoler (Member)

I think this function is nice for debugging the reader, but I hardly see users using it (and we would need to expose it as well). Instead, if read_gxf returns a DataArray, then we don't need this function: xarray can nicely display all the data.

What do you think? Is there any special reason to include this function?

@ThomasMGeo
Author

Hi @santisoler, thanks for the review. I just want to touch on a few comments before I dig into the rest of the issues.

> I'm not sure what the benefit is of not making read_gxf always return an xarray.DataArray

I am happy to have it be a very plain DataArray without the CRS (usually the CRS is not given as a code in the files that I have seen). For some of my work, I have had to do some weird warping to get them to match, but this might be rare and isolated.

The get_grid_info function was just used to figure out the projections (again, these are not standard CRSs with modern EPSG codes, from what I have seen). So my general workflow would be to load the file as a numpy array, look at that output, and then load it into xarray with all of the EPSG codes attached.

Let me know what you think is best, and I can tackle it!

@santisoler
Member

Hi @ThomasMGeo. I see your point. Since GXF files don't have a strict specification for setting the CRS, I think we should just add whatever information we can find in the header of the file to the attrs of the xarray.DataArray. If that information is enough to determine the CRS, that's cool. But if it's not, too bad. I don't think there's anything the reader function can do to solve that, am I right?

I do like having the information about the projection in the metadata. A nested dict (like the one you already implemented) should be fine (we might want to make sure that the structure follows the CF metadata conventions).

Another thing we were discussing today during the meeting is to use easting and northing for the x and y directions, but x and y in case the grid is rotated (very much like the read_oasis_grd function does). A sketch of how that could look is below.
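A sketch of building the DataArray that way (all variables are stand-ins for values parsed from the header):

import numpy as np
import xarray as xr

# Stand-ins for values parsed from a GXF header:
data = np.zeros((3, 4))
easting = np.linspace(0, 300, 4)
northing = np.linspace(0, 200, 3)
metadata = {"title": "example grid"}

grid = xr.DataArray(
    data,
    coords={"northing": northing, "easting": easting},
    dims=("northing", "easting"),
    attrs=metadata,  # header fields, including any projection info
)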

@ThomasMGeo
Author

Awesome, that all makes sense. Let me work on it for a bit and get back to you. Thanks for giving it a look and for all the feedback! :)
