Reading local Zarr files into stars #663

Open
oshuwilson opened this issue Jan 30, 2024 · 23 comments

@oshuwilson

oshuwilson commented Jan 30, 2024

Hi,

After looking at the vignette for reading Zarr files in stars, I am unsure how to read local Zarr directories into R. I have been trying to work with satellite imagery for the Southern Ocean downloaded from Copernicus' Marine Data Client.

Here is my attempt:

library(stars)

dsn <- 'ZARR:"sic_daily_samples.zarr/"'

read_mdim(dsn)

Which gives the error message:

Error in CPL_read_mdim(file, array_name, options, offset, count, step, :
  CHAR() can only be applied to a 'CHARSXP', not a 'NULL'
In addition: Warning messages:
1: In CPL_read_mdim(file, array_name, options, offset, count, step, :
  GDAL Error 1: Decompressor blosc not handled
2: In CPL_read_mdim(file, array_name, options, offset, count, step, :
  GDAL Error 1: Decompressor blosc not handled
3: In CPL_read_mdim(file, array_name, options, offset, count, step, :
  GDAL Error 1: Decompressor blosc not handled
4: In CPL_read_mdim(file, array_name, options, offset, count, step, :
  GDAL Error 1: Decompressor blosc not handled

I've uploaded a subset of the data for ease but I can't figure out how to read it as a zipped or unzipped file, so any help with this would be appreciated!

Thanks,
Josh

sic_daily_samples.zarr.zip
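
For anyone hitting the same error with a local store, one rough way to see which compressor each array was written with is to look at its .zarray metadata. The sketch below assumes a Zarr v2 store (one .zarray JSON file per array) and uses a hypothetical helper name, zarr_compressors. It does a plain-text scan rather than real JSON parsing, so treat it as a quick check only.

```shell
#!/bin/sh
# Sketch: list the compressor id recorded in each array's .zarray file.
# Assumes a Zarr v2 layout; zarr_compressors is a hypothetical helper name.
zarr_compressors() {
  store="$1"
  find "$store" -name .zarray | while read -r meta; do
    # Join the JSON onto one line, then capture the "id" inside "compressor": {...}
    id=$(tr -d '\n' < "$meta" | sed -n 's/.*"compressor"[[:space:]]*:[[:space:]]*{[^}]*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
    # A null or absent compressor is reported as "none"
    printf '%s: %s\n' "$meta" "${id:-none}"
  done
}

# Only run when a store path is given on the command line
if [ $# -gt 0 ]; then
  zarr_compressors "$1"
fi
```

Saved under a name of your choice and run with the unzipped store directory as its argument, this prints one line per array. Any array reporting blosc can only be read by a GDAL build that includes blosc support.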

@edzer
Member

edzer commented Jan 30, 2024

I get

> read_mdim("sic_daily_sample.zarr/")
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
           Min. 1st Qu. Median Mean 3rd Qu. Max.  NA's
siconc [1]   NA      NA     NA  NaN      NA   NA 1e+05
dimension(s):
          from   to  refsys point
longitude    1 4320  WGS 84    NA
latitude     1  961  WGS 84    NA
time         1    1 POSIXct  TRUE
                                                      values x/y
longitude       [-180.0417,-179.9583),...,[179.875,179.9583) [x]
latitude  [-80.04167,-79.95833),...,[-0.04166667,0.04166667) [y]
time                                          2021-01-09 UTC    

What is your sessionInfo() and sf_extSoftVersion() output, after loading stars?

@oshuwilson
Author

Thanks Edzer, I tried the same code and got the same error message.

My sessionInfo() gives

R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stars_0.6-4 sf_1.0-14   abind_1.4-5

loaded via a namespace (and not attached):
 [1] utf8_1.2.4         R6_2.5.1           tidyselect_1.2.0   e1071_1.7-13       magrittr_2.0.3    
 [6] glue_1.6.2         tibble_3.2.1       KernSmooth_2.23-22 parallel_4.3.2     pkgconfig_2.0.3   
[11] generics_0.1.3     dplyr_1.1.3        lifecycle_1.0.4    classInt_0.4-10    cli_3.6.1         
[16] fansi_1.0.5        vctrs_0.6.4        grid_4.3.2         DBI_1.2.1          proxy_0.4-27      
[21] class_7.3-22       compiler_4.3.2     rstudioapi_0.15.0  tools_4.3.2        pillar_1.9.0      
[26] Rcpp_1.0.11        rlang_1.1.2        units_0.8-4       

And my sf_extSoftVersion() prints

   GEOS           GDAL         proj.4 GDAL_with_GEOS     USE_PROJ_H           PROJ 
      "3.11.2"        "3.7.2"        "9.3.0"         "true"         "true"        "9.3.0" 

@edzer
Member

edzer commented Jan 30, 2024

Please update sf to 1.0-15, and try again.

@oshuwilson
Author

That still printed the same error message as previously. I haven't yet downloaded the latest version of RStudio but I don't imagine that would cause this error?

@edzer
Member

edzer commented Jan 30, 2024

See also #566 (comment)

@oshuwilson
Author

Apologies, I'm not yet proficient with R. How do I install that patch? I tried using remotes::install_github("rspatial/sf") but I'm still seeing the same error code.

@edzer
Member

edzer commented Jan 30, 2024

No need for you to install that patch.

@oshuwilson
Author

Sorry I'm a bit lost as to what steps I can take from the other issue to fix my issue.

@edzer
Member

edzer commented Jan 30, 2024

I'm just cross linking them; I can reproduce the error on GitHub actions here: https://github.com/r-spatial/stars/actions/runs/7712573313/job/21020420577#step:6:297

@pepijn-devries

@oshuwilson,

It seems that this issue is specific to the Windows binary release. Note that you can use CopernicusMarine for subsetting Copernicus Marine data as well. However, it does not yet support ZARR data because of the issue reported here and #566 (comment)

@oshuwilson
Author

Thanks @pepijn-devries - I'll look at doing that to download as a netCDF if the Zarr format remains unusable for my setup. My main issue is that the full data I need is massive (~1.3TB as a netCDF but only ~250GB as Zarr), so Zarr would be preferable if it can work! But if not, I'll get a new hard drive and put my computer to the test.

@edzer
Member

edzer commented Jan 30, 2024

> It seems that this issue is specific to the Windows binary release.

Windows and macOS binary releases; we added blosc, at least to the Windows binary builds, but this suggests it's not working.

@pepijn-devries

pepijn-devries commented Mar 11, 2024

Hi @edzer,

Is there any news on the Windows build and blosc decompression of ZARR files? Thanks for your work on the package!

By the way, I did some additional testing. The issue occurs not only on Windows but also on a Fedora Linux (virtual) machine I have set up:

library(stars)
#> Loading required package: abind
#> Loading required package: sf
#> Linking to GEOS 3.12.1, GDAL 3.7.3, PROJ 9.2.1; sf_use_s2() is TRUE
dsn <- 'ZARR:"/vsicurl/https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/gpcp-feedstock/gpcp.zarr"'
bounds <- c(longitude = "lon_bounds", latitude = "lat_bounds")
r <- read_mdim(dsn, bounds = bounds)
#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled

#> Warning in CPL_read_mdim(file, array_name, options, offset, count, step, : GDAL
#> Error 1: Decompressor blosc not handled
#> Error in CPL_read_mdim(file, array_name, options, offset, count, step, : CHAR() can only be applied to a 'CHARSXP', not a 'NULL'

Created on 2024-03-11 with reprex v2.1.0

With sessionInfo():

R version 4.3.2 (2023-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora Linux 39 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLAS-OPENMP;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=nl_NL.UTF-8       LC_NUMERIC=C               LC_TIME=nl_NL.UTF-8        LC_COLLATE=nl_NL.UTF-8    
 [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=nl_NL.UTF-8    LC_PAPER=nl_NL.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Amsterdam
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] gtable_0.3.4       dplyr_1.1.4        compiler_4.3.2     tidyselect_1.2.0   reprex_2.1.0       Rcpp_1.0.12       
 [7] clipr_0.8.0        callr_3.7.5        scales_1.3.0       yaml_2.3.8         fastmap_1.1.1      ggplot2_3.5.0     
[13] R6_2.5.1           generics_0.1.3     classInt_0.4-10    sf_1.0-15          knitr_1.45         tibble_3.2.1      
[19] units_0.8-5        munsell_0.5.0      DBI_1.2.2          pillar_1.9.0       rlang_1.1.3        utf8_1.2.4        
[25] xfun_0.42          fs_1.6.3           cli_3.6.2          withr_3.0.0        magrittr_2.0.3     ps_1.7.6          
[31] class_7.3-22       processx_3.8.3     digest_0.6.34      grid_4.3.2         rstudioapi_0.15.0  lifecycle_1.0.4   
[37] vctrs_0.6.5        KernSmooth_2.23-22 proxy_0.4-27       evaluate_0.23      glue_1.7.0         fansi_1.0.6       
[43] e1071_1.7-14       colorspace_2.1-0   rmarkdown_2.26     tools_4.3.2        pkgconfig_2.0.3    htmltools_0.5.7   

@Artur-man

Same here, using macOS.

library(stars)
> dsn = 'ZARR:"/vsicurl/https://storage.googleapis.com/cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/highresSST-present/r1i1p1f1/6hrPlev/psl/gn/v20170706"/'
> gdal_utils("info", dsn)
Warning messages:
1: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled
2: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled
3: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled
4: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled
5: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled
6: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled
7: In CPL_gdalinfo(if (missing(source)) character(0) else source, options,  :
  GDAL Error 1: Decompressor blosc not handled

With sessionInfo():

R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.5.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sf_1.0-16

loaded via a namespace (and not attached):
 [1] compiler_4.3.1     magrittr_2.0.3     class_7.3-22       DBI_1.2.3          tools_4.3.1        units_0.8-5        proxy_0.4-27       rstudioapi_0.16.0  Rcpp_1.0.13        KernSmooth_2.23-24 grid_4.3.1         e1071_1.7-14       classInt_0.4-10 

@ateucher
Contributor

This was failing for me on Mac, both in R and on the command line, when accessing the ERA5 dataset on GCP:

gdalinfo ZARR:"/vsigs/gcp-public-data-arco-era5/co/single-level-reanalysis.zarr":/time

and

stars::read_stars('ZARR:"/vsigs/gcp-public-data-arco-era5/co/single-level-reanalysis.zarr":/time')

Both failed with "GDAL Error 1: Decompressor blosc not handled" errors.

On Mac, it appears that Homebrew does not build gdal with blosc by default (c-blosc is not in the depends_on list in the brew formula). However, according to the GDAL build docs, gdal will use blosc if it is found.

So I did:

brew install c-blosc
brew reinstall gdal --build-from-source
$ gdalinfo ZARR:"/vsigs/gcp-public-data-arco-era5/co/single-level-reanalysis.zarr":/time
Driver: Zarr/Zarr
Files: /vsigs/gcp-public-data-arco-era5/co/single-level-reanalysis.zarr/time/.zarray
Size is 374016, 1
Origin = (-0.500000000000000,-0.500000000000000)
Pixel Size = (1.000000000000000,1.000000000000000)
Metadata:
  calendar=proleptic_gregorian
  long_name=initial time of forecast
  standard_name=forecast_reference_time
Corner Coordinates:
Upper Left  (  -0.5000000,  -0.5000000)
Lower Left  (  -0.5000000,   0.5000000)
Upper Right (  374015.500,      -0.500)
Lower Right (  374015.500,       0.500)
Center      (  187007.500,       0.000)
Band 1 Block=374016x1 Type=Int64, ColorInterp=Undefined
  Unit Type: hours since 1979-01-01 00:00:00

Then

install.packages(c("sf", "stars"), type = "source")

and voila:

stars::read_stars('ZARR:"/vsigs/gcp-public-data-arco-era5/co/single-level-reanalysis.zarr":/time')
#> stars object with 2 dimensions and 1 attribute
#> attribute(s):
#>                                          Min.  1st Qu.   Median     Mean
#> time [(hours since 1979-01-01 00:00:00)]    0 93503.75 187007.5 187007.5
#>                                           3rd Qu.   Max.
#> time [(hours since 1979-01-01 00:00:00)] 280511.2 374015
#> dimension(s):
#>   from     to offset delta x/y
#> x    1 374016   -0.5     1 [x]
#> y    1      1   -0.5     1 [y]

So I think a PR to gdal to add c-blosc to the gdal formula might just do it for Mac?
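
One way to check whether a given command-line GDAL build has blosc is to inspect the Zarr driver metadata printed by gdalinfo --format Zarr. The sketch below assumes that output contains a COMPRESSORS=... line, as shown in the GDAL Zarr driver documentation; has_blosc is a hypothetical helper name. Note that it checks the gdalinfo on your PATH, which may differ from the GDAL linked into a binary install of sf.

```shell
#!/bin/sh
# Sketch: read `gdalinfo --format Zarr` output on stdin and report whether
# blosc is listed on the COMPRESSORS line (assumed format, see lead-in).
has_blosc() {
  # Keep COMPRESSORS= lines, drop the separate BLOSC_COMPRESSORS= line,
  # then look for blosc among the remaining values
  if grep -i 'COMPRESSORS=' | grep -v -i 'BLOSC_COMPRESSORS' | grep -q -i 'blosc'; then
    echo "blosc available"
  else
    echo "blosc missing"
  fi
}

# Only query GDAL when the command-line tools are installed
if command -v gdalinfo >/dev/null 2>&1; then
  gdalinfo --format Zarr | has_blosc
fi
```

If this reports "blosc missing" before the rebuild and "blosc available" after it, the remaining step is reinstalling sf and stars from source so that they link against the rebuilt GDAL.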

@ateucher
Contributor

Actually, maybe a PR to https://github.com/R-macos/recipes/ is a better choice (or in addition to homebrew)

@edzer
Member

edzer commented Dec 12, 2024

> Actually, maybe a PR to https://github.com/R-macos/recipes/ is a better choice (or in addition to homebrew)

Yes, that is where the CRAN macOS binaries, which most people use, originate.

@pepijn-devries

I'll see if I can send a PR to https://github.com/r-windows/ for the windows build of gdal...

@ateucher
Contributor

PR submitted here: R-macos/recipes#60. I had some trouble building gdal locally, any suggestions would be welcome!

@ateucher
Contributor

Homebrew PR submitted as well: Homebrew/homebrew-core#201008

@edzer
Member

edzer commented Dec 13, 2024

> I'll see if I can send a PR to https://github.com/r-windows/ for the windows build of gdal...

@pepijn-devries : blosc support should be present in the windows build: https://github.com/r-spatial/sf/blob/main/src/Makevars.ucrt#L8

@pepijn-devries

> > I'll see if I can send a PR to https://github.com/r-windows/ for the windows build of gdal...

> @pepijn-devries : blosc support should be present in the windows build: https://github.com/r-spatial/sf/blob/main/src/Makevars.ucrt#L8

I think on Windows this bundle is used for GDAL:

https://github.com/r-spatial/sf/blob/39e8f51372e19237d95cd406ae4683a253c3c5b2/tools/winlibs.R#L11

And if I look at the build script that creates that bundle:

https://github.com/rwinlib/gdal3/blob/master/.github/workflows/bundle.sh

It does not link to blosc. Isn't that the problem?

@edzer
Member

edzer commented Dec 13, 2024

I think rwinlib is no longer used for building CRAN Windows binaries, and IIRC the R source tree (and build tools) is not on, or taken from, GitHub; at best it is a copy.

5 participants