Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sentinel-2 BOA_ADD_OFFSET harmonisation #134

Open
petebunting opened this issue Nov 23, 2022 · 13 comments
Open

Sentinel-2 BOA_ADD_OFFSET harmonisation #134

petebunting opened this issue Nov 23, 2022 · 13 comments
Labels
bug Something isn't working

Comments

@petebunting
Copy link

Hi all,

I wonder whether you might consider harmonisation of the Sentinel-2 data concerning the BOA_ADD_OFFSET, which was introduced in Jan 2022?

The Google Earth Engine do this (https://developers.google.com/earth-engine/datasets/catalog/sentinel-2) as does sentinel-hub (https://forum.sentinel-hub.com/t/sentinel-2-processing-baseline-changes-and-harmonizevalues/4635)

The BOA_ADD_OFFSET value is provided within the product-metadata XML file, but it is not very easy to read this offset when using several images. Even assuming the offset remains at -1000 and isn't read from the header file, it is still something that all users need to remember to apply, and this isn't documented on the planetary computer Sentinel-2 page.

Pete

@petebunting
Copy link
Author

petebunting commented Nov 23, 2022

For reference for others, this is the function I've written to apply the offset - don't know if there is a better solution?

def apply_sen2_offset(sen2_scns_xa, offset=-1000):
    import numpy
    # Define the date splitting whether the offset should be applied.
    off_date = numpy.datetime64('2022-01-25')
    # Get Minimum date in timeseries
    time_min = sen2_scns_xa.time.min().values
    # Get Maximum date in timeseries
    time_max = sen2_scns_xa.time.max().values
    
    # Get the list of variables
    bands = list(sen2_scns_xa.data_vars)
    # List of all bands for which offset should be applied if present.
    s2_img_bands = ["B01", "B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B09", "B10", "B11", "B12"]
    
    sen2_scns_xa
    if (time_min < off_date) and (time_max > off_date):
        # Crosses the offset data and therefore part of the dataset needs offset applying
        sen2_scns_xa_pre_off = sen2_scns_xa.sel(time=slice(time_min, off_date))
        sen2_scns_xa_post_off = sen2_scns_xa.sel(time=slice(off_date, time_max))
        for band in bands:
            if band in s2_img_bands:
                sen2_scns_xa_post_off[band] = sen2_scns_xa_post_off[band] + offset
                sen2_scns_xa_post_off[band].where(sen2_scns_xa_post_off[band] < 0, 0)
        sen2_scns_xa = xarray.concat([sen2_scns_xa_pre_off, sen2_scns_xa_post_off], dim="time")
    elif time_min > off_date:
        # All scenes after offset date apply to all
        for band in bands:
            if band in s2_img_bands:
                sen2_scns_xa[band] = sen2_scns_xa[band] + offset
                sen2_scns_xa[band].where(sen2_scns_xa[band] < 0, 0)
    #else: time_max < off_date:
        # Do nothing - no offset required
    return sen2_scns_xa

To use:

# Load data
sen2_scn_xa = odc.stac.stac_load(
    signed_items,
    bands=bands,
    groupby="solar_day",
    chunks={"time":24, "latitude": 1028, "longitude": 1028},
    bbox=bbox,
    crs="EPSG:4326",
    resolution=img_res
)
# Apply offset
sen2_scn_xa = apply_sen2_offset(sen2_scn_xa)

@TomAugspurger
Copy link

Thanks for opening this. xref #40.

As you note the current workaround is for users to apply the offset themselves. This isn't great, and we're looking into how to make this better.

I'll take a closer look at your function, but one thing I noticed: you might need to be careful with dtypes and wrapping around. I believe the raw data for some of those bands is uint16 so you need to ensure that values smaller than the offset don't wrap around before clipping.

@TomAugspurger
Copy link

Also xref pytroll/satpy#1850, which (maybe?) handled this in satpy. I haven't looked too closely.

@petebunting
Copy link
Author

petebunting commented Nov 23, 2022

Thanks Tom.

Good point on wrapping hadn't thought of that. I've added an extra where call to set values > 10,000 to 0 as well. Bit of a hack, but it is a solution...

def apply_sen2_offset(sen2_scns_xa, offset=-1000):
    import numpy
    # Define the date splitting whether the offset should be applied.
    off_date = numpy.datetime64('2022-01-25')
    # Get Minimum date in timeseries
    time_min = sen2_scns_xa.time.min().values
    # Get Maximum date in timeseries
    time_max = sen2_scns_xa.time.max().values
    
    # Get the list of variables
    bands = list(sen2_scns_xa.data_vars)
    # List of all bands for which offset should be applied if present.
    s2_img_bands = ["B01", "B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B09", "B10", "B11", "B12"]
    
    sen2_scns_xa
    if (time_min < off_date) and (time_max > off_date):
        # Crosses the offset data and therefore part of the dataset needs offset applying
        sen2_scns_xa_pre_off = sen2_scns_xa.sel(time=slice(time_min, off_date))
        sen2_scns_xa_post_off = sen2_scns_xa.sel(time=slice(off_date, time_max))
        for band in bands:
            if band in s2_img_bands:
                sen2_scns_xa_post_off[band] = sen2_scns_xa_post_off[band] + offset
                sen2_scns_xa_post_off[band].where(sen2_scns_xa_post_off[band] < 0, 0)
                sen2_scns_xa_post_off[band].where(sen2_scns_xa_post_off[band] > 10000, 0)
        sen2_scns_xa = xarray.concat([sen2_scns_xa_pre_off, sen2_scns_xa_post_off], dim="time")
    elif time_min > off_date:
        # All scenes after offset date apply to all
        for band in bands:
            if band in s2_img_bands:
                sen2_scns_xa[band] = sen2_scns_xa[band] + offset
                sen2_scns_xa[band].where(sen2_scns_xa[band] < 0, 0)
                sen2_scns_xa[band].where(sen2_scns_xa[band] > 10000, 0)
    #else: time_max < off_date:
        # Do nothing - no offset required
    return sen2_scns_xa

@TomAugspurger
Copy link

I haven't tested it yet, but I think that

band = band.clip(1000)
band -= 1000

should be equivalent and avoids the need to clip / .where values that potentially wrapped around.

@TomAugspurger
Copy link

Quick note: we have an example at https://planetarycomputer.microsoft.com/dataset/sentinel-2-l2a#Baseline-Change discussing this now.

@dzanaga
Copy link

dzanaga commented Mar 3, 2023

Quick note: we have an example at https://planetarycomputer.microsoft.com/dataset/sentinel-2-l2a#Baseline-Change discussing this now.

Thanks for sharing. There is a typo to fix perhaps, the clipped dataset is not returned.

I am pasting here my version of that example which checks on the baseline instead of the date. Perhaps not needed, but I was wondering if all products on the 25th of Jan are with the new baseline...

def harmonize(data):
    """
    Harmonize new Sentinel-2 data to the old baseline.
    
    https://planetarycomputer.microsoft.com/dataset/sentinel-2-l2a#Baseline-Change
    https://github.com/microsoft/PlanetaryComputer/issues/134
    
    Parameters
    ----------
    data: xarray.DataArray
        A DataArray with four dimensions: time, band, y, x

    Returns
    -------
    harmonized: xarray.DataArray
        A DataArray with all values harmonized to the old
        processing baseline.
    """
    baseline = data.coords['s2:processing_baseline'].astype(float)
    baseline_flag = baseline < 4
    
    if all(baseline_flag):
        return data
    
    offset = 1000
    bands = ["B01", "B02", "B03", "B04",
             "B05", "B06", "B07", "B08",
             "B8A", "B09", "B10", "B11", "B12"]

    old = data.isel(time=baseline_flag)
    to_process = list(set(bands) & set(data.band.data.tolist()))
    
    new = data.sel(time=~baseline_flag).drop_sel(band=to_process)

    new_harmonized = data.sel(time=~baseline_flag, band=to_process).copy()
    
    new_harmonized = new_harmonized.clip(offset)
    new_harmonized -= offset

    new = xr.concat([new, new_harmonized], "band").sel(band=data.band.data.tolist())
    return xr.concat([old, new], dim="time")

@TomAugspurger
Copy link

Thanks

There is a typo to fix perhaps, the clipped dataset is not returned.

Can you expand on this? In

    new = xr.concat([new, new_harmonized], "band").sel(band=data.band.data.tolist())
    return xr.concat([old, new], dim="time")

the clipped data should be in new_harmonized. And so it should be in the output thanks to the final concat.

@dzanaga
Copy link

dzanaga commented Mar 6, 2023

Sorry, I meant that the line:

new_harmonized.clip(offset)

in the example, should be:

new_harmonized = new_harmonized.clip(offset)

that's the typo I was referring to.

@TomAugspurger
Copy link

TomAugspurger commented Mar 6, 2023

Whoops. I thought that clip operated inplace. I'll get that fixed. Thanks for catching that! (And this highlights the need for this kind of functionality to be in a library, not copied from script to script).

TomAugspurger pushed a commit to microsoft/PlanetaryComputerExamples that referenced this issue Mar 6, 2023
@TomAugspurger
Copy link

microsoft/PlanetaryComputerExamples#249 updates the notebook.

TomAugspurger pushed a commit to microsoft/PlanetaryComputerExamples that referenced this issue Mar 6, 2023
@mhungen
Copy link

mhungen commented May 8, 2023

I would also like to know if there are plans to integrate the BOA_ADD_OFFSET.
As far as I know, these can be different for each band (at least in theory).
Is it possible to add the offset as a property of the assets?

e.g.: item.assets.B04.s2:boa-add-offset

@TomAugspurger
Copy link

stactools-packages/sentinel2#47 added it to the stactools package. We'll incorporate that into our pipeline at some point, but don't have an ETA right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants