Challenge #21 - Machine Learning to improve the CAMS global air quality forecasts #6

EsperanzaCuartero · 2021-01-28T13:33:52Z

Challenge 21 - Machine Learning to improve the CAMS global air quality forecasts

Stream 2 - Machine Learning for weather, climate and atmosphere applications

Goal

Develop an ML algorithm to predict (and correct) the time-varying bias of the global CAMS forecast for surface PM2.5, O3 and NO2 at the locations of air quality monitoring stations.

Mentors and skills

Mentors: @JohannesFlemming, @miha-at-ecmwf, @jerome-barre-ecmwf
Skills required:
- Machine Learning
- Time-series analysis
- Handling geospatial data
- Basic understanding of air quality observations and models
- Python

Note: Challenge is funded by Copernicus. Only nationals from the European Union and ECMWF Member States are eligible to apply (see Terms and Conditions).

Challenge description

CAMS/ECMWF runs a computer model to predict global air pollution at a spatial resolution of about 40 x 40 km (grid-box size). While the CAMS model predicts the observed air quality reasonably well in most cases, errors in the prediction can occur because of the necessary simplifications in the CAMS model and the uncertainties in the input data, such as the emissions.
The main task is to develop an ML approach to predict the forecast errors at the location of air quality stations in order to correct them as a post-processing step. The observations to be used are hourly observations of surface ozone, NO2 and PM2.5 from about 2000 stations worldwide as provided in the openAQ data repository.

We suggest the following steps towards the solution:

1. Build ML model
   - Train the ML model to predict the difference between forecast and station observations using data from a recent previous year (2019 or 2018).
   - The input to the correction algorithm can be the CAMS model forecast of the air quality value (O3, NO2, PM2.5) and forecast meteorological parameters (temperature, wind speed, etc.).
2. Test performance of ML bias prediction with independent data
   - Test the performance of the ML model for recent forecasts of the 2020-2021 period. Use basic error statistics such as bias, RMSE and correlation to compare the forecast accuracy of the ML-corrected forecast against the uncorrected forecast.
3. Model error analysis (optional)
   - Investigate the importance of the individual predictors and do a spatial analysis to identify patterns that could be used to better understand or improve the forecast model.
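The comparison in the testing step could be sketched as follows. This is a minimal illustration, not the mentors' evaluation code: the function names and the sample numbers are made up, and the sign convention (error = forecast minus observation) is an assumption.

```python
import math


def bias(forecast, observed):
    """Mean of (forecast - observed); positive means the forecast is too high."""
    return sum(f - o for f, o in zip(forecast, observed)) / len(observed)


def rmse(forecast, observed):
    """Root-mean-square error between forecast and observation."""
    return math.sqrt(
        sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed)
    )


def correlation(forecast, observed):
    """Pearson correlation coefficient between forecast and observation."""
    n = len(observed)
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_f * var_o)


def compare(raw, corrected, observed):
    """Score the uncorrected and the ML-corrected forecast side by side."""
    return {
        name: {
            "bias": bias(fc, observed),
            "rmse": rmse(fc, observed),
            "corr": correlation(fc, observed),
        }
        for name, fc in (("raw", raw), ("corrected", corrected))
    }


# Made-up hourly values for one station (illustrative only).
observed = [10.0, 12.0, 15.0, 11.0]
raw = [13.0, 14.0, 19.0, 13.0]
predicted_error = [2.5, 2.5, 3.5, 2.0]  # hypothetical output of an ML model
corrected = [f - e for f, e in zip(raw, predicted_error)]

scores = compare(raw, corrected, observed)
for name, stats in scores.items():
    print(name, stats)
```

In practice the same scores would be computed per station and per pollutant on the independent 2020-2021 data, so that the corrected and uncorrected forecasts are judged on data the model never saw.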

parantak · 2021-02-03T17:56:19Z

Greetings, I am Parantak, a final-year CS undergrad. This challenge seems interesting and falls within the domain of my prior work. I'd like to contribute to it.
I have experience in time-series modeling, specifically with financial data. I have built models for time-series classification, long-horizon/short-horizon prediction, anomaly detection, etc. Additionally, I've dealt with geospatial data before for a project in epidemiology. I'd be obliged if someone could share a more detailed explanation of the problem statement and perhaps a brief description of the dataset that will be used.

Thanks :D

EsperanzaCuartero · 2021-02-03T18:07:11Z

Dear Parantak, thank you for your interest. The mentors will answer your questions as soon as possible. Best, Esperanza.

JohannesFlemming · 2021-02-05T17:30:19Z

Hello Parantak,

Many thanks for your interest in ESoWC. It sounds like you are well suited to tackle the task, given your previous experience.

One of the challenges of the task is dealing with the air quality observations as hosted by openAQ. The data come from different providers and may have gaps and errors in them. So basically it can be "real-life / dirty" data that needs specific attention before the ML processing starts. The CAMS model data for air quality and meteorological variables are gridded data maps provided in NetCDF format. Handling them should be easier. They do not have gaps in space and time.
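The kind of attention the station data might need before ML processing can be sketched as below. This is only an illustration under assumptions: the record layout (timestamp, value), the plausibility limit `max_value`, and the use of negative numbers such as -999 as missing-value sentinels are hypothetical and do not describe the actual openAQ schema.

```python
from datetime import datetime, timedelta


def clean_hourly(records, max_value=1000.0):
    """Drop missing or implausible values and duplicate timestamps.

    records: iterable of (datetime, value or None) for one station and pollutant.
    Returns a list of (datetime, value) sorted by time.
    """
    seen = {}
    for ts, value in records:
        if value is None or value < 0 or value > max_value:
            continue  # missing, negative sentinel, or implausibly large
        seen.setdefault(ts, value)  # keep the first report per timestamp
    return sorted(seen.items())


def find_gaps(cleaned, step=timedelta(hours=1)):
    """Return (first_missing, last_missing) timestamps for every gap in the series."""
    gaps = []
    for (t0, _), (t1, _) in zip(cleaned, cleaned[1:]):
        if t1 - t0 > step:
            gaps.append((t0 + step, t1 - step))
    return gaps


# Made-up records for one station (illustrative only).
start = datetime(2019, 1, 1, 0)
hour = timedelta(hours=1)
records = [
    (start, 12.0),
    (start, 12.0),            # duplicate report
    (start + hour, -999.0),   # assumed missing-value sentinel
    (start + 2 * hour, None), # missing value
    (start + 3 * hour, 15.5),
    (start + 4 * hour, 14.0),
]

cleaned = clean_hourly(records)
gaps = find_gaps(cleaned)
print(cleaned)
print(gaps)
```

Recording the gaps explicitly, rather than silently interpolating over them, keeps the later training step free to decide how to treat missing hours.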

To get an idea about the model forecast have a look here:
https://www.windy.com/-PM2-5-pm2p5?camsEu,pm2p5,51.443,-0.927,5

The mismatch between the model and the air quality observations is the forecast error. We want you to make it possible to predict the forecast error for each station using meteorological model data and other predictors. Once we have a good prediction of the error, we can correct the model forecast and make the users happy. We also hope to learn something about the model itself, as we get a better view of the nature of the model error.
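As a toy illustration of that idea, the sketch below fits a straight line that predicts the forecast error from a single meteorological predictor and then subtracts the predicted error from a new forecast. Ordinary least squares stands in here for whatever ML method a proposal would actually use; the choice of temperature as predictor, the function names, the sign convention (error = forecast minus observation) and all numbers are assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return slope, mean_y - slope * mean_x


def correct(forecast, temperature, slope, intercept):
    """Subtract the predicted forecast error from the raw forecast."""
    predicted_error = slope * temperature + intercept
    return forecast - predicted_error


# Made-up training data for one station (illustrative only).
train_temperature = [0.0, 10.0, 20.0, 30.0]  # forecast temperature, deg C
train_forecast = [12.0, 15.0, 18.0, 21.0]    # model forecast
train_observed = [11.0, 13.0, 15.0, 17.0]    # station observation

# Forecast error as defined in this sketch: forecast minus observation.
train_error = [f - o for f, o in zip(train_forecast, train_observed)]
slope, intercept = fit_line(train_temperature, train_error)

# Apply the correction to a new forecast.
corrected_value = correct(forecast=20.0, temperature=25.0,
                          slope=slope, intercept=intercept)
print(slope, intercept, corrected_value)
```

A real solution would use several predictors and a more flexible model, but the structure stays the same: learn the error from past forecast/observation pairs, then remove the predicted error from new forecasts.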

My advice for a good proposal would be to demonstrate convincingly that you will be able to handle the openAQ data and account for their fragmentation in the process. We also want to see a sound proposal for the procedure to test your ML approach.

Thanks again,
We are looking forward to your proposal or any further questions you may have.

Johannes Flemming

FedericaCas · 2021-02-12T08:40:14Z

Hi,
I am Federica Casamento, a master's student in Environmental Engineering.
I have a keen interest in climate change studies and I am studying Machine Learning and Python. I have academic experience in time series analysis, extreme value analysis of rainfall and temperature, bias correction and hydrological modelling using HEC-HMS, ERA5-Land dataset, observed data.
All the projects here are very appealing, but I've chosen this challenge, in line with my interests and the topics I would like to prove myself on.
I am drafting some ideas for the proposal and look forward to sharing them with you!

Thanks so much!

EsperanzaCuartero · 2021-02-12T11:40:07Z

Dear Federica, many thanks for your interest. Looking forward to seeing your ideas.
Best
Esperanza Cuartero

jwagemann · 2021-03-12T13:01:01Z

Hi,
join us for the ECMWF Summer of Weather Code Ask Me Anything sessions and learn all things ESoWC.

When:

17 March 2021 at 4 pm GMT and
24 March 2021 at 4 pm GMT

What:

learn everything about ESoWC - how it works, the challenges this year, some tips for your proposal and listen to ESoWC experiences from previous participants

How: register here.

svijayETH · 2021-03-15T08:51:38Z

Hi,

I am Saloni, a master's student in environmental engineering. I did my bachelor's major in environmental engineering and a minor in computer science and engineering. I am very much interested in dispersion modeling, source apportionment studies, and air pollution forecasting. My latest interest is to use machine learning in this field. Thus, this project aligns very well with my interest. I have a good command of Python and R. I have taken classes in Data Mining and Environmental Systems Data Science. I have experience in handling NetCDF files as I worked on an emission inventory update project. I have used USEPA AERMOD for dispersion modeling studies. Also, I have used machine learning in two projects.

I found this idea of predicting the forecast error and using it to correct the forecast very innovative. I am looking forward to submitting a detailed proposal. I have a small question, preceded by the thought behind it.

The model forecast is available globally at 40 x 40 km resolution. The observations are available at only 2000 stations. At each location, the reason for the forecast error can be different. The difference between the forecast and the measurement can be due to ignoring building downwash, an inaccurate emission inventory, monitoring instrument error (observation error), etc. All these errors and their causes would vary from one location to another. Thus, ideally, the ML model should be trained to predict the forecast error at each location separately (i.e., independent of other locations). However, if this is the case, the forecast error can be predicted only at the locations where ambient air observations are available.

So my question is:

Is the scope of this project only to predict the forecast error at locations where ambient observations are available? Or for the entire globe?

Thanks in advance!
Regards,
Saloni

JohannesFlemming · 2021-03-15T11:47:54Z

Hello Saloni,

thank you for your interest in Challenge #21. I think your experience and your knowledge of atmospheric dispersion modelling will be very helpful in putting together a good proposal.

You are right, the model error correction method can only be derived for the station locations, where station observations are available. This will be the most important task in the challenge. The level of difficulty increases from approaches that are allowed to use today's near-real-time observation data in the error predictions to approaches that rely only on last year's observation data to train the ML method. A special issue will be to come up with an error correction method for stations that have different error characteristics but are located in the same model grid box.

Depending on the success of the error prediction method, it can be further explored whether the method is also suited to extrapolating parts of the model error correction in space. But it will require careful cross-validation.
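One way to set up such a cross-validation is to hold out whole stations, so that the model is always scored at locations it has never seen. The sketch below is only one possible scheme under assumptions: the function name, the round-robin assignment of stations to folds and the station IDs are made up.

```python
def station_folds(station_ids, n_folds):
    """Yield (train_idx, test_idx) so that each station falls in exactly one test fold.

    station_ids: station identifier of every sample, in sample order.
    """
    unique = sorted(set(station_ids))
    # Round-robin assignment of stations (not samples) to folds.
    fold_of = {sid: i % n_folds for i, sid in enumerate(unique)}
    for k in range(n_folds):
        test_idx = [i for i, sid in enumerate(station_ids) if fold_of[sid] == k]
        train_idx = [i for i, sid in enumerate(station_ids) if fold_of[sid] != k]
        yield train_idx, test_idx


# Made-up sample-to-station mapping (illustrative only).
station_ids = ["A", "A", "B", "B", "C", "C", "D"]
folds = list(station_folds(station_ids, n_folds=2))

for train_idx, test_idx in folds:
    train_stations = {station_ids[i] for i in train_idx}
    test_stations = {station_ids[i] for i in test_idx}
    print(sorted(train_stations), sorted(test_stations))
```

Splitting by sample instead of by station would let the same station appear on both sides of the split and would overstate how well the correction transfers to unobserved locations.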

kind regards,
Johannes

JohannesFlemming · 2021-03-16T09:55:49Z

Hello Apurba,
many thanks for your interest in ESoWC 2021 and ECMWF. We are looking forward to your project proposal. I think your expertise and educational background are very well suited for the task.

Please note, ESoWC is not an education programme. We expect the project teams to complete their tasks in the best possible way on their own. We will hold meetings to discuss progress and give guidance and assistance when needed, but we do not commit to mentoring and training. Needless to say ... I hope that all participating teams/individuals will learn a lot by working on the projects.

kind regards,
Johannes

jwagemann · 2021-03-22T09:56:45Z

Hi,
join us for the ECMWF Summer of Weather Code Ask Me Anything session and learn all things ESoWC.

When: Wednesday, 24 March 2021 at 4 pm GMT

What: learn everything about ESoWC - how it works, the challenges this year, some tips for your proposal and listen to ESoWC experiences from previous participants

How: register here.

apurba-biswas · 2021-04-04T19:49:45Z

Hi @JohannesFlemming, I was wondering what the traditional post-processing method is for bias-correction in this context. What do you currently do with the CAMS model after it's been run?

EsperanzaCuartero · 2021-04-06T09:10:46Z

Hi Apurba, many thanks for your interest. Before submitting your proposal, please bear in mind that this is a challenge funded by Copernicus and only nationals from the European Union and ECMWF Member States are eligible to apply (see Terms and Conditions). On the ESoWC website there is a list of challenges not funded by Copernicus which may also capture your interest and fit your skills. Best, Esperanza

JohannesFlemming · 2021-04-06T15:07:27Z

Hi Apurba,
we currently do not use any model-bias correction or post-processing methods. We only interpolate (linearly) to the station location.
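For readers unfamiliar with this step, a linear interpolation of a gridded field to a station location might look like the sketch below. It assumes a bilinear scheme on a regular latitude/longitude grid with ascending coordinates; the function names, the 0.4 degree spacing and the grid values are made up, and this is not the CAMS implementation.

```python
from bisect import bisect_right


def _cell(coords, x):
    """Return the index of the lower grid point and the weight of the upper one."""
    if not coords[0] <= x <= coords[-1]:
        raise ValueError("point outside grid")
    i = min(bisect_right(coords, x) - 1, len(coords) - 2)
    weight = (x - coords[i]) / (coords[i + 1] - coords[i])
    return i, weight


def bilinear(grid, lats, lons, lat, lon):
    """Bilinear interpolation of grid[i][j] (value at lats[i], lons[j]) to (lat, lon)."""
    i, wy = _cell(lats, lat)
    j, wx = _cell(lons, lon)
    south = grid[i][j] * (1 - wx) + grid[i][j + 1] * wx
    north = grid[i + 1][j] * (1 - wx) + grid[i + 1][j + 1] * wx
    return south * (1 - wy) + north * wy


# Made-up 2 x 2 patch of a gridded field (illustrative only).
lats = [50.0, 50.4]
lons = [0.0, 0.4]
grid = [
    [10.0, 20.0],  # values along lat 50.0
    [30.0, 40.0],  # values along lat 50.4
]

station_value = bilinear(grid, lats, lons, lat=50.2, lon=0.1)
print(station_value)
```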
Johannes

EsperanzaCuartero · 2021-04-07T09:21:42Z

Dear Apurba, many thanks for sharing your personal details. Your team is eligible. I wish you good luck with the proposal. Best, Esperanza

apurba-biswas · 2021-04-14T19:06:21Z

@jwagemann Are we allowed to submit the proposal on the 16th of April (i.e. before midnight?)

jwagemann · 2021-04-14T19:45:49Z

Hi @apurba-biswas ,
the deadline to submit a proposal is this Friday, 16 April at 23:59 BST. Mentors are not able to provide direct feedback on your proposal, but feel free to ask any questions here on GitHub that might help to tailor your proposal.
Hope this helps.

EsperanzaCuartero added the stream-2 Stream 2 - Machine Learning for weather, climate and atmosphere applications label Jan 28, 2021

EsperanzaCuartero assigned miha-at-ecmwf and JohannesFlemming Jan 28, 2021

EsperanzaCuartero changed the title ~~Challenge #21 - Machine learning to improve the CAMS global air quality forecasts~~ Challenge #21 - Machine Learning to improve the CAMS global air quality forecasts Jan 29, 2021

EsperanzaCuartero assigned JohannesFlemming and miha-at-ecmwf and unassigned miha-at-ecmwf and JohannesFlemming Feb 3, 2021

EsperanzaCuartero assigned jerome-barre-ecmwf Feb 3, 2021
