
Challenge #21 - Machine Learning to improve the CAMS global air quality forecasts #6

Open
EsperanzaCuartero opened this issue Jan 28, 2021 · 18 comments
Labels
stream-2 Stream 2 - Machine Learning for weather, climate and atmosphere applications

Comments

@EsperanzaCuartero
Contributor

EsperanzaCuartero commented Jan 28, 2021

Challenge 21 - Machine Learning to improve the CAMS global air quality forecasts

Stream 2 - Machine Learning for weather, climate and atmosphere applications

Goal

Develop an ML algorithm to predict (and correct) the time-varying bias of the global CAMS forecast for surface PM2.5, O3 and NO2 at the locations of air quality monitoring stations.

Mentors and skills


Note: This challenge is funded by Copernicus. Only nationals of European Union and ECMWF Member States are eligible to apply (see Terms and Conditions).


Challenge description

CAMS/ECMWF runs a computer model to predict global air pollution at a spatial resolution of about 40 × 40 km (grid-box size). While the CAMS model generally predicts the observed air quality reasonably well, prediction errors can occur because of necessary simplifications in the model and uncertainties in input data such as emissions.
The main task is to develop an ML approach to predict the forecast errors at the location of air quality stations in order to correct them as a post-processing step. The observations to be used are hourly observations of surface ozone, NO2 and PM2.5 from about 2000 stations worldwide as provided in the openAQ data repository.
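To make the post-processing idea concrete, here is a minimal sketch. It assumes the forecast error is defined as forecast minus observation, and that some trained model has already produced a predicted error for each station and hour. The function names and the O3 values are hypothetical, and clipping at zero is an illustrative choice rather than part of the challenge description.

```python
import numpy as np

def forecast_error(forecast, observed):
    """Forecast error at a station: model value minus observed value."""
    return np.asarray(forecast) - np.asarray(observed)

def correct_forecast(forecast, predicted_error):
    """Post-processing step: subtract the predicted error from the raw forecast."""
    corrected = np.asarray(forecast) - np.asarray(predicted_error)
    # Concentrations cannot be negative, so clip corrected values at zero.
    return np.clip(corrected, 0.0, None)

# Hypothetical hourly O3 values (ug/m3) at one station.
forecast = np.array([62.0, 70.0, 81.0, 75.0])
observed = np.array([55.0, 61.0, 70.0, 69.0])
predicted_error = np.array([6.5, 8.0, 10.5, 6.0])  # output of some trained model

print(forecast_error(forecast, observed))
print(correct_forecast(forecast, predicted_error))
```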

We suggest the following steps towards the solution:

  1. Build ML model
    Train the ML model to predict the difference between forecast and station observations using data from a recent previous year (2019 or 2018).
    The input to the correction algorithm can be the CAMS model forecast of the air quality value (O3, NO2, PM2.5) and forecast meteorological parameters (temperature, wind speed, etc.).

  2. Test performance of ML bias prediction with independent data
    Test the performance of the ML model for recent forecasts of the 2020-2021 period. Use basic error statistics such as bias, RMSE and correlation to compare the forecast accuracy of the ML-corrected forecast against the uncorrected forecast.

  3. Model error analysis (optional)
    Investigate the importance of the individual predictors and do a spatial analysis to identify patterns that could be used to better understand or improve the forecast model.
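Steps 1 and 2 can be sketched end to end on synthetic data. This is only an illustration under stated assumptions: the "stations" are random draws, the error structure (more overestimation in warm, calm conditions) is invented, and an ordinary least-squares fit stands in for a real ML model. In practice the training data would come from 2019 (or 2018) and the test data from 2020-2021.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_synthetic(n):
    """Hypothetical station samples: predictors, raw forecast and observation."""
    forecast = rng.uniform(20, 100, n)      # CAMS O3 forecast at the station
    temperature = rng.uniform(0, 35, n)     # forecast temperature (deg C)
    wind = rng.uniform(0, 10, n)            # forecast wind speed (m/s)
    # Assumed error structure: larger overestimation in warm, calm conditions.
    true_error = 5 + 0.3 * temperature - 1.0 * wind + rng.normal(0, 2, n)
    observed = forecast - true_error
    X = np.column_stack([forecast, temperature, wind])
    return X, forecast, observed

def design(X):
    """Add an intercept column to the predictor matrix."""
    return np.column_stack([np.ones(len(X)), X])

def fit_error_model(X, forecast, observed):
    """Least-squares fit of error = forecast - observation (stand-in for an ML model)."""
    coef, *_ = np.linalg.lstsq(design(X), forecast - observed, rcond=None)
    return coef

def scores(pred, obs):
    """Bias, RMSE and Pearson correlation of a prediction against observations."""
    diff = pred - obs
    return {
        "bias": float(diff.mean()),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "corr": float(np.corrcoef(pred, obs)[0, 1]),
    }

# Step 1: train on one synthetic "year"; step 2: evaluate on an independent draw.
X_train, f_train, o_train = make_synthetic(5000)
X_test, f_test, o_test = make_synthetic(2000)

coef = fit_error_model(X_train, f_train, o_train)
corrected = f_test - design(X_test) @ coef

raw_scores = scores(f_test, o_test)
corrected_scores = scores(corrected, o_test)
print("raw:      ", raw_scores)
print("corrected:", corrected_scores)
```

Fitted coefficients from the error model also give a first, crude view of predictor importance for step 3, although a real ML model would need a dedicated method such as permutation importance.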


ESoWC

@EsperanzaCuartero EsperanzaCuartero added the stream-2 Stream 2 - Machine Learning for weather, climate and atmosphere applications label Jan 28, 2021
@EsperanzaCuartero EsperanzaCuartero changed the title Challenge #21 - Machine learning to improve the CAMS global air quality forecasts Challenge #21 - Machine Learning to improve the CAMS global air quality forecasts Jan 29, 2021
@parantak

parantak commented Feb 3, 2021

Greetings, I am Parantak, a final-year CS undergrad. This challenge seems interesting and falls within the domain of my prior work. I'd like to contribute to it.
I have experience in time-series modelling, specifically with financial data. I have built models for time-series classification, long-horizon/short-horizon prediction, anomaly detection, etc. Additionally, I've dealt with geospatial data before in an epidemiology project. I'd be grateful if someone could share a more detailed explanation of the problem statement and a brief overview of the dataset that will be used.

Thanks :D

@EsperanzaCuartero
Contributor Author

Dear Parantak, thank you for your interest. The mentors will answer your questions as soon as possible. Best, Esperanza.

@JohannesFlemming

Hello Parantak,

Many thanks for your interest in ESoWC. It sounds like you are well suited to tackle the task, given your previous experience.

One of the challenges of the task is dealing with the air quality observations hosted by openAQ. The data come from different providers and may contain gaps and errors. So basically it is "real-life"/"dirty" data that needs specific attention before the ML processing starts. The CAMS model data for air quality and meteorological variables are gridded maps provided in netCDF format. Handling them should be easier, as they have no gaps in space or time.
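As an illustration of the kind of attention the openAQ data may need, here is a minimal quality-control sketch for one station and one pollutant using pandas. The column names `time` and `value` and both thresholds are hypothetical, not the actual openAQ schema; real screening rules would need to be chosen per pollutant and provider.

```python
import pandas as pd

def clean_station_series(df, max_value=1000.0, stuck_hours=6):
    """Basic quality control for one station and one pollutant.

    Assumes hypothetical columns 'time' (timestamp) and 'value' (concentration)."""
    s = (df.drop_duplicates(subset="time")
           .set_index("time")["value"]
           .sort_index())
    # Physically implausible values: negative or absurdly large concentrations.
    s = s.where((s >= 0) & (s <= max_value))
    # Regular hourly time axis, so that missing hours become explicit NaNs.
    s = s.resample("1h").mean()
    # "Stuck" instrument: the same value repeated for many consecutive hours.
    run_id = (s != s.shift()).cumsum()
    run_len = run_id.map(run_id.value_counts())
    return s.where(~((run_len >= stuck_hours) & s.notna()))

raw = pd.DataFrame({
    "time": pd.to_datetime(["2019-01-01 00:00", "2019-01-01 01:00", "2019-01-01 01:00",
                            "2019-01-01 03:00", "2019-01-01 04:00"]),
    "value": [10.0, -5.0, -5.0, 5000.0, 12.0],  # duplicate, negative and spike values
})
print(clean_station_series(raw))
```

Making gaps explicit as NaNs, rather than silently dropping rows, keeps the later matching against hourly model output straightforward.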

To get an idea about the model forecast have a look here:
https://www.windy.com/-PM2-5-pm2p5?camsEu,pm2p5,51.443,-0.927,5

The mismatch between the model and the air quality observations is the forecast error. We want you to make it possible to predict the forecast error for each station using meteorological model data and other predictors. Once we have a good prediction of the error, we can correct the model forecast and make the users happy. We also hope to learn something about the model itself, as we get a better view of the nature of the model error.

My advice for a good proposal would be to demonstrate convincingly that you will be able to handle the openAQ data and account for its fragmentation in the process. We also want to see a sound procedure for testing your ML approach.
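One possible testing procedure, sketched under assumptions: split the data chronologically so that the test period is never seen in training, and keep only stations with enough hourly coverage in the training period. The column names `station`, `time` and `value`, the split date and the 75 % coverage threshold are all hypothetical choices.

```python
import pandas as pd

def split_by_period(df, train_end="2020-01-01", test_start="2020-01-01"):
    """Chronological split: train strictly before `train_end`, test from `test_start` on."""
    t = pd.to_datetime(df["time"])
    return df[t < pd.Timestamp(train_end)], df[t >= pd.Timestamp(test_start)]

def hourly_coverage(df, start, end):
    """Fraction of expected hourly values actually present, per station, in [start, end)."""
    t = pd.to_datetime(df["time"])
    in_period = df[(t >= pd.Timestamp(start)) & (t < pd.Timestamp(end))]
    expected_hours = (pd.Timestamp(end) - pd.Timestamp(start)) / pd.Timedelta(hours=1)
    return in_period.groupby("station")["value"].count() / expected_hours

times = pd.date_range("2019-12-31", periods=48, freq="h")
obs = pd.concat([
    pd.DataFrame({"station": "S1", "time": times, "value": 1.0}),
    pd.DataFrame({"station": "S2", "time": times[::2], "value": 2.0}),  # every 2nd hour missing
], ignore_index=True)

train, test = split_by_period(obs)
cov = hourly_coverage(train, "2019-12-31", "2020-01-01")
keep = cov[cov >= 0.75].index  # stations with enough training data
print(cov)
```

A random row-wise split would leak information between neighbouring hours, which is why the split here is by time.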

Thanks again,
We are looking forward to your proposal or any further question you may have.

Johannes Flemming

@FedericaCas

Hi,
I am Federica Casamento, a master's student in Environmental Engineering.
I have a keen interest in climate change studies and I am studying Machine Learning and Python. I have academic experience in time series analysis, extreme value analysis of rainfall and temperature, bias correction, and hydrological modelling with HEC-HMS using the ERA5-Land dataset and observed data.
All the projects here are very appealing, but I've chosen this challenge because it is in line with my interests and the topics I would like to prove myself on.
I am drafting some ideas for the proposal and look forward to sharing them with you!

Thanks so much!

@EsperanzaCuartero
Contributor Author

Dear Federica, many thanks for your interest. Looking forward to seeing your ideas.
Best
Esperanza Cuartero

@jwagemann

Hi,
join us for the ECMWF Summer of Weather Code Ask Me Anything sessions and learn all things ESoWC.

When:

  • 17 March 2021 at 4 pm GMT and
  • 24 March 2021 at 4 pm GMT

What:

  • learn everything about ESoWC - how it works, the challenges this year, some tips for your proposal and listen to ESoWC experiences from previous participants

How: register here.

@svijayETH

Hi,

I am Saloni, a master's student in environmental engineering. I did my bachelor's major in environmental engineering and a minor in computer science and engineering. I am very much interested in dispersion modelling, source apportionment studies, and air pollution forecasting. My latest interest is using machine learning in this field, so this project aligns very well with my interests. I have a good command of Python and R. I have taken classes in Data Mining and Environmental Systems Data Science. I have experience in handling NetCDF files, as I worked on an emission inventory update project. I have used USEPA AERMOD for dispersion modelling studies, and I have used machine learning in two projects.

I found this idea of predicting the forecast error and using it to correct the forecast very innovative. I am looking forward to submitting a detailed proposal. I have a small question; first some context, then the question itself.

The model forecast is available globally at 40 × 40 km resolution, while observations are available at only about 2000 stations. At each location, the reasons for the forecast error can differ: the difference between forecast and measurement may stem from neglecting building downwash, an inaccurate emission inventory, monitoring instrument error (observation error), etc. These errors and their causes vary from one location to another. Ideally, then, the ML model should be trained to predict the forecast error at each location separately (i.e., independently of other locations). However, in that case the forecast error can only be predicted at locations where ambient air observations are available.
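The "one model per location" idea can be sketched as follows. This assumes a simple least-squares error model per station (a stand-in for any ML regressor) and hypothetical station IDs and predictors. It also makes the limitation explicit: a station without training data gets no prediction (NaN).

```python
import numpy as np

def fit_per_station(station_ids, X, error):
    """Fit one least-squares error model per station (stand-in for any ML regressor)."""
    models = {}
    for sid in np.unique(station_ids):
        mask = station_ids == sid
        A = np.column_stack([np.ones(mask.sum()), X[mask]])
        coef, *_ = np.linalg.lstsq(A, error[mask], rcond=None)
        models[sid] = coef
    return models

def predict_per_station(models, station_ids, X):
    """Predict with each station's own model; stations without a model get NaN."""
    pred = np.full(len(X), np.nan)
    for sid, coef in models.items():
        mask = station_ids == sid
        if mask.any():
            pred[mask] = np.column_stack([np.ones(mask.sum()), X[mask]]) @ coef
    return pred

# Two hypothetical stations whose errors depend on temperature in opposite ways.
T = np.array([[0.0], [10.0], [20.0], [0.0], [10.0], [20.0]])
ids = np.array(["A", "A", "A", "B", "B", "B"])
err = np.where(ids == "A", 2 + 0.5 * T[:, 0], -1 - 0.2 * T[:, 0])

models = fit_per_station(ids, T, err)
print(predict_per_station(models, np.array(["A", "B", "C"]), np.array([[5.0], [5.0], [5.0]])))
```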

So my question is:

Is the scope of this project only to predict the forecast error at locations where ambient observations are available, or for the entire globe?

Thanks in advance!
Regards,
Saloni

@JohannesFlemming

Hello Saloni,

thank you for your interest in Challenge #21. I think your experience and your knowledge of atmospheric dispersion modelling will be very helpful in putting together a good proposal.

You are right, the model error correction method can only be derived for station locations where observations are available. This will be the most important task in the challenge. The level of difficulty increases from approaches that use today's near-real-time observations in the error prediction to approaches that rely only on last year's observations to train the ML method. A special issue will be to come up with an error correction method for stations that have different error characteristics but are located in the same model grid box.
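For the near-real-time variant, one common option (an assumption here, not a method prescribed by the mentors) is to feed the error observed some hours earlier at the same station back in as a predictor. The sketch below uses hypothetical columns `station`, `time` and `error`, and matches on timestamps rather than row positions so that gaps in the observations yield NaN instead of a misaligned value. The lag should be at least the forecast lead time, so that no observation from the future leaks into the prediction.

```python
import pandas as pd

def add_lagged_error(df, lag_hours=24):
    """Add the forecast error observed `lag_hours` earlier at the same station.

    Assumes hypothetical columns 'station', 'time' and 'error' (forecast minus
    observation). If the earlier hour is missing, the new feature is NaN."""
    lagged = df[["station", "time", "error"]].copy()
    lagged["time"] = lagged["time"] + pd.Timedelta(hours=lag_hours)
    lagged = lagged.rename(columns={"error": f"error_lag{lag_hours}h"})
    return df.merge(lagged, on=["station", "time"], how="left")

times = pd.date_range("2020-01-01", periods=48, freq="h")
errors = pd.DataFrame({"station": "S1", "time": times, "error": [float(i) for i in range(48)]})
features = add_lagged_error(errors)
print(features.tail())
```

Because stations in the same grid box share identical model input, a station-specific feature like this one (or a station identifier) is one way to let them end up with different corrections.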

Depending on the success of the error prediction method, it can be further explored whether the method is also suited to extrapolating parts of the model error correction in space. However, this will require careful cross-validation.
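For spatial extrapolation, one form of careful cross-validation is to hold out whole stations, so the score reflects skill at locations the model has never seen. A minimal numpy sketch with hypothetical station IDs is shown below; scikit-learn's `GroupKFold` implements the same grouping idea if that library is already in use.

```python
import numpy as np

def leave_stations_out_folds(station_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which every station is held out exactly once.

    All samples of a held-out station are excluded from training."""
    station_ids = np.asarray(station_ids)
    rng = np.random.default_rng(seed)
    stations = rng.permutation(np.unique(station_ids))
    for held_out in np.array_split(stations, n_folds):
        test_mask = np.isin(station_ids, held_out)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Example: 10 hypothetical stations with 3 samples each.
ids = np.repeat([f"ST{k:02d}" for k in range(10)], 3)
for train_idx, test_idx in leave_stations_out_folds(ids, n_folds=5):
    print(sorted(set(ids[test_idx])), len(train_idx), len(test_idx))
```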

kind regards,
Johannes

@apurba-biswas

This comment has been minimized.

@JohannesFlemming

Hello Apurba,
many thanks for your interest in ESoWC 2021 and ECMWF. We are looking forward to your project proposal. I think your expertise and educational background are very well suited to the task.

Please note that ESoWC is not an education programme. We expect the project teams to complete their tasks in the best possible way on their own. We will hold meetings to discuss progress and give guidance and assistance when needed, but we do not commit to mentoring and training. That said, I hope that all participating teams and individuals will learn a lot by working on the projects.

kind regards,
Johannes

@jwagemann

Hi,
join us for the ECMWF Summer of Weather Code Ask Me Anything session and learn all things ESoWC.

When: Wednesday, 24 March 2021 at 4 pm GMT

What: learn everything about ESoWC - how it works, the challenges this year, some tips for your proposal and listen to ESoWC experiences from previous participants

How: register here.

@apurba-biswas

Hi @JohannesFlemming, I was wondering what the traditional post-processing method is for bias-correction in this context. What do you currently do with the CAMS model after it's been run?

@EsperanzaCuartero
Contributor Author

Hi Apurba, many thanks for your interest. Before submitting your proposal, please bear in mind that this is a challenge funded by Copernicus and only nationals from the European Union and ECMWF Member States are eligible to apply (see Terms and Conditions). On the ESoWC website there is a list of challenges not funded by Copernicus which can also capture your interest and fit your skills. Best, Esperanza

@JohannesFlemming

Hi Apurba,
we currently do not use any model-bias correction or post-processing methods. We only interpolate (linearly) to the station location.
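A minimal sketch of that baseline, assuming "linear interpolation" means bilinear interpolation on a regular latitude/longitude grid with ascending coordinates. The grid and field below are hypothetical; real CAMS files may store latitude in descending order, and longitude wrap-around is not handled here. With xarray, `DataArray.interp(..., method="linear")` is the usual library route for the same operation.

```python
import numpy as np

def bilinear_at_point(field, lats, lons, lat, lon):
    """Bilinearly interpolate a 2-D field on a regular, ascending lat/lon grid to one point."""
    i = int(np.clip(np.searchsorted(lats, lat) - 1, 0, len(lats) - 2))
    j = int(np.clip(np.searchsorted(lons, lon) - 1, 0, len(lons) - 2))
    wy = (lat - lats[i]) / (lats[i + 1] - lats[i])   # weight towards the upper latitude
    wx = (lon - lons[j]) / (lons[j + 1] - lons[j])   # weight towards the eastern longitude
    return ((1 - wy) * (1 - wx) * field[i, j]
            + (1 - wy) * wx * field[i, j + 1]
            + wy * (1 - wx) * field[i + 1, j]
            + wy * wx * field[i + 1, j + 1])

# Hypothetical 0.4-degree grid and a field that varies linearly in lat and lon.
lats = np.array([50.0, 50.4, 50.8])
lons = np.array([-1.2, -0.8, -0.4])
field = 2 * lats[:, None] + 3 * lons[None, :]

print(bilinear_at_point(field, lats, lons, 50.3, -0.9))
```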
Johannes

@apurba-biswas

This comment has been minimized.

@EsperanzaCuartero
Contributor Author

Dear Apurba, many thanks for sharing your personal details. Your team is eligible. I wish you good luck with the proposal. Best, Esperanza

@apurba-biswas

apurba-biswas commented Apr 14, 2021

@jwagemann Are we allowed to submit the proposal on the 16th of April (i.e. before midnight)?

@jwagemann

Hi @apurba-biswas ,
the deadline to submit a proposal is this Friday, 16 April, at 23:59 BST. Mentors are not able to provide direct feedback on your proposal, but feel free to ask any questions here on GitHub that might help you tailor your proposal.
Hope this helps.


9 participants