Challenge #21 - Machine Learning to improve the CAMS global air quality forecasts #6
Greetings, I am Parantak, a final-year CS undergrad. This challenge seems interesting and falls within the domain of my prior work. I'd like to contribute to it. Thanks :D
Dear Parantak, thank you for your interest. The mentors will answer your questions as soon as possible. Best, Esperanza.
Hello Parantak, many thanks for your interest in ESoWC. It sounds like you are well suited to tackle the task, given your previous experience. One of the challenges of the task is dealing with the air quality observations as hosted by openAQ. The data come from different providers and may have gaps and errors in them. So basically it can be "real-life / dirty" data that needs specific attention before the ML processing starts. The CAMS model data for air quality and meteorological variables are gridded data maps provided in NetCDF format. Handling them should be easier, since they have no gaps in space and time. To get an idea about the model forecast, have a look here: The mismatch between the model and the air quality observations is the forecast error. We want you to make it possible to predict the forecast error for each station using meteorological model data and other predictors. Once we have a good prediction of the error, we can correct the model forecast and make the users happy. We also hope to learn something about the model as such, as we get a better view of the nature of the model error. My advice for a good proposal would be to demonstrate convincingly that you will be able to handle the openAQ data and account for its fragmentation in the process. We also want to see a sound proposal for the procedure to test your ML approach. Thanks again, Johannes Flemming
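The point about openAQ data being "real-life / dirty" can be made concrete with a minimal quality-control sketch. It assumes each station's data arrive as (ISO 8601 timestamp, value) pairs, which is not the openAQ response format. The plausibility limits are hypothetical placeholders. The sketch drops missing, negative or implausible values, keeps one report per hour, and lists the hours that remain empty.

```python
from datetime import datetime, timedelta

# Hypothetical plausibility limits in µg/m³; real limits should be chosen per pollutant and station.
PLAUSIBLE_RANGE = {"o3": (0.0, 500.0), "no2": (0.0, 500.0), "pm25": (0.0, 1000.0)}


def clean_hourly_series(records, pollutant):
    """Return {hour: value} with missing, implausible and duplicated reports removed.

    `records` is assumed to be an iterable of (ISO 8601 timestamp, value) pairs
    for a single station; this is an illustrative format, not the openAQ one.
    """
    lo, hi = PLAUSIBLE_RANGE[pollutant]
    cleaned = {}
    for stamp, value in records:
        if value is None or value != value:  # missing value or NaN
            continue
        if not lo <= value <= hi:  # negative fill values such as -999, or outliers
            continue
        hour = datetime.fromisoformat(stamp).replace(minute=0, second=0, microsecond=0)
        cleaned.setdefault(hour, value)  # keep the first report for a duplicated hour
    return cleaned


def find_gaps(series, start, end):
    """List the hours from start to end (inclusive) that have no valid observation."""
    gaps = []
    hour = start
    while hour <= end:
        if hour not in series:
            gaps.append(hour)
        hour += timedelta(hours=1)
    return gaps


raw = [("2019-07-01T00:00:00", 41.0), ("2019-07-01T01:00:00", -999.0),
       ("2019-07-01T02:00:00", 38.5), ("2019-07-01T02:00:00", 38.7)]
obs = clean_hourly_series(raw, "o3")
missing = find_gaps(obs, datetime(2019, 7, 1, 0), datetime(2019, 7, 1, 3))
# obs keeps 00:00 (41.0) and 02:00 (38.5); missing lists 01:00 and 03:00
```

Counting gaps per station in this way gives a simple basis for deciding which stations have enough data to train on.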
Hi, thanks so much!
Dear Federica, many thanks for your interest. Looking forward to seeing your ideas.
Hi, When:
What:
How: register here.
Hi, I am Saloni, a master's student in environmental engineering. I completed my bachelor's degree with a major in environmental engineering and a minor in computer science and engineering. I am very interested in dispersion modeling, source apportionment studies, and air pollution forecasting, and my latest interest is using machine learning in this field, so this project aligns very well with my interests. I have a good command of Python and R and have taken classes in Data Mining and Environmental Systems Data Science. I have experience handling NetCDF files from an emission inventory update project, I have used USEPA AERMOD for dispersion modeling studies, and I have used machine learning in two projects. I found the idea of predicting the forecast error and using it to correct the forecast very innovative, and I am looking forward to submitting a detailed proposal. I have a small question; first some context, then the question itself. The model forecast is available globally at 40 × 40 km resolution, while the observations are available at only 2000 stations. At each location, the forecast error can have different causes: the difference between forecast and measurement can be due to neglected building downwash, an inaccurate emission inventory, monitoring instrument error (observation error), etc., and all of these vary from one location to another. Thus, ideally, the ML model should be trained to predict the forecast error at each location separately (i.e., independently of other locations). However, in that case the forecast error can be predicted only at locations where ambient air observations are available. So my question is: is the scope of this project only to predict the forecast error at locations where ambient observations are available, or for the entire globe? Thanks in advance!
Hello Saloni, thank you for your interest in Challenge #21. I think your experience and your knowledge of atmospheric dispersion modelling will be very helpful in putting together a good proposal. You are right: the model error correction method can only be derived for station locations where observations are available. This will be the most important task in the challenge. The level of difficulty increases from approaches that use today's near-real-time observations in the error prediction to approaches that rely only on last year's observations to train the ML method. A particular issue will be devising an error correction method for stations that have different error characteristics but are located in the same model grid box. Depending on the success of the error prediction method, it can then be explored whether the method is also suited to extrapolating parts of the model error correction in space, but this will require careful cross-validation. Kind regards,
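A first step toward the issue of stations with different error characteristics inside the same model grid box is to find which stations actually share a box. This minimal sketch assumes the model fields are available on a regular 0.4° × 0.4° latitude/longitude grid (roughly 40 km) with box centres at multiples of 0.4° counted from 90°N and 0°E. The actual grid of the retrieved CAMS data may differ. Station ids and coordinates are hypothetical.

```python
import math
from collections import defaultdict

GRID_RES_DEG = 0.4  # assumption: regular 0.4° x 0.4° lat/lon grid (~40 km)


def grid_box(lat, lon, res=GRID_RES_DEG):
    """Return the (row, col) index of the nearest grid-box centre.

    Assumes box centres lie at multiples of `res`, counted southward from 90°N
    and eastward from 0°E, with longitudes wrapped into [0, 360).
    """
    n_cols = int(round(360.0 / res))
    row = int(math.floor((90.0 - lat) / res + 0.5))
    col = int(math.floor((lon % 360.0) / res + 0.5)) % n_cols
    return row, col


def stations_sharing_boxes(stations):
    """Group station ids by grid box and keep only boxes holding more than one station."""
    boxes = defaultdict(list)
    for station_id, lat, lon in stations:
        boxes[grid_box(lat, lon)].append(station_id)
    return {box: ids for box, ids in boxes.items() if len(ids) > 1}


# Hypothetical stations: two in the same box, one elsewhere.
stations = [("DE-A", 52.52, 13.30), ("DE-B", 52.50, 13.35), ("FR-A", 48.85, 2.35)]
shared = stations_sharing_boxes(stations)  # {(94, 33): ["DE-A", "DE-B"]}
```

All stations in one box see the same model value, so differences in their errors must come from station-level predictors or from training separate per-station corrections.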
Hello Apurba, please note that ESoWC is not an education programme. We expect the project teams to complete their tasks in the best possible way on their own. We will hold meetings to discuss progress and give guidance and assistance when needed, but we do not commit to mentoring and training. Needless to say, I hope that all participating teams and individuals will learn a lot by working on the projects. Kind regards,
Hi,
When: Wednesday, 24 March 2021 at 4 pm GMT
What: learn everything about ESoWC: how it works, this year's challenges, some tips for your proposal, and ESoWC experiences from previous participants
How: register here.
Hi @JohannesFlemming, I was wondering what the traditional post-processing method for bias correction is in this context. What do you currently do with the CAMS forecasts after the model has been run?
Hi Apurba, many thanks for your interest. Before submitting your proposal, please bear in mind that this challenge is funded by Copernicus, and only nationals from the European Union and ECMWF Member States are eligible to apply (see Terms and Conditions). The ESoWC website lists challenges not funded by Copernicus that may also interest you and fit your skills. Best, Esperanza
Hi Apurba,
Dear Apurba, many thanks for sharing your personal details. Your team is eligible. I wish you good luck with the proposal. Best, Esperanza
@jwagemann Are we allowed to submit the proposal on the 16th of April (i.e., before midnight)?
Hi @apurba-biswas,
Challenge 21 - Machine Learning to improve the CAMS global air quality forecasts
Goal
Develop an ML algorithm to predict (and correct) the time-varying bias of the global CAMS forecast for surface PM2.5, O3 and NO2 at the locations of air quality monitoring stations.
Mentors and skills
Challenge description
CAMS/ECMWF runs a computer model to predict global air pollution at a spatial resolution of about 40 × 40 km (the grid-box size). While the CAMS model mostly predicts the observed air quality reasonably well, prediction errors can occur because of the necessary simplifications in the CAMS model and uncertainties in input data such as the emissions.
The main task is to develop an ML approach to predict the forecast errors at the locations of air quality stations in order to correct them as a post-processing step. The observations to be used are hourly observations of surface ozone, NO2 and PM2.5 from about 2000 stations worldwide, as provided in the openAQ data repository.
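The error target and the post-processing correction can be stated in a few lines. This minimal sketch assumes the sign convention forecast minus observation, which matches "the difference between forecast and station observations". It also assumes each station's series is a dict keyed by forecast lead time in hours; that key choice is only for brevity. Time steps with a missing observation simply produce no training target.

```python
def forecast_errors(forecast, observed):
    """Forecast minus observation for every time step present in both series."""
    return {t: forecast[t] - observed[t] for t in forecast if t in observed}


def correct_forecast(forecast, predicted_error):
    """Subtract the predicted error; time steps without a prediction stay unchanged."""
    return {t: value - predicted_error.get(t, 0.0) for t, value in forecast.items()}


# Keys are forecast lead times in hours (an assumption made for brevity).
forecast = {0: 60.0, 1: 55.0, 2: 50.0}
observed = {0: 50.0, 2: 45.0}  # hour 1 is missing at the station
errors = forecast_errors(forecast, observed)  # {0: 10.0, 2: 5.0}
corrected = correct_forecast(forecast, {0: 8.0, 1: 6.0, 2: 4.0})  # {0: 52.0, 1: 49.0, 2: 46.0}
```

Because the correction is applied after the model run, it leaves the CAMS model itself unchanged.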
We suggest the following steps towards the solution:
Build ML model
Train the ML model to predict the difference between the forecast and the station observations using data from a recent previous year (2019 or 2018).
The inputs to the correction algorithm can be the CAMS model forecast of the air quality variable (O3, NO2, PM2.5) and forecast meteorological parameters (temperature, wind speed, etc.).
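The training step can be illustrated with a deliberately simple stand-in for the ML model. The challenge leaves the method open (gradient boosting, neural networks and so on are equally valid choices). This minimal sketch fits a linear ridge regression from predictor rows to forecast errors using only the standard library. The predictor choice (forecast O3, temperature, wind speed) and the `alpha` value are assumptions. The same function could be fitted per station or on pooled stations.

```python
def fit_ridge(features, targets, alpha=1e-6):
    """Fit error ≈ b0 + b1*x1 + ... by ridge-regularised least squares.

    `features` holds one predictor row per station-hour (e.g. forecast O3,
    temperature, wind speed) and `targets` the matching forecast errors.
    The intercept b0 is not penalised.
    """
    rows = [[1.0] + list(x) for x in features]  # prepend an intercept column
    n = len(rows[0])
    # Normal equations (XᵀX + αI') b = Xᵀy, where I' leaves the intercept unpenalised.
    a = [[sum(r[i] * r[j] for r in rows) + (alpha if i == j and i > 0 else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for k in range(col + 1, n):
            factor = a[k][col] / a[col][col]
            for c in range(col, n):
                a[k][c] -= factor * a[col][c]
            rhs[k] -= factor * rhs[col]
    # Back substitution.
    coef = [0.0] * n
    for k in reversed(range(n)):
        coef[k] = (rhs[k] - sum(a[k][c] * coef[c] for c in range(k + 1, n))) / a[k][k]
    return coef


def predict_error(coef, x):
    """Predicted forecast error for one predictor row."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))
```

In this setup the rows would come from the 2019 (or 2018) station-hour pairs. The predicted error is then subtracted from each new forecast. A linear model keeps the coefficients directly interpretable, which also helps with the optional error analysis.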
Test performance of ML bias prediction with independent data
Test the performance of the ML model for recent forecasts of the 2020-2021 period. Use basic error statistics such as bias, RMSE and correlation to compare the forecast accuracy of the ML-corrected forecast against the uncorrected forecast.
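The three statistics named above can be computed identically for the uncorrected and the ML-corrected forecast, which makes the comparison direct. This minimal sketch assumes both forecasts have already been paired hour by hour with the same observations. It uses the same forecast-minus-observation sign convention for the bias, and the Pearson correlation coefficient.

```python
import math


def bias(pred, obs):
    """Mean of forecast minus observation."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root-mean-square error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def correlation(pred, obs):
    """Pearson correlation coefficient."""
    n = len(obs)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)


def compare(raw, corrected, obs):
    """Score the uncorrected and the ML-corrected forecast against the same observations."""
    return {name: {"bias": bias(f, obs), "rmse": rmse(f, obs), "r": correlation(f, obs)}
            for name, f in (("raw", raw), ("corrected", corrected))}


scores = compare(raw=[50.0, 60.0, 70.0], corrected=[42.0, 49.0, 61.0], obs=[40.0, 50.0, 60.0])
# raw: bias 10, RMSE 10; corrected: bias ≈ 0.67, RMSE ≈ 1.41
```

The example shows why all three statistics matter. The raw forecast correlates perfectly with the observations yet has a constant offset, which bias and RMSE expose.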
Model error analysis (optional)
Investigate the importance of the individual predictors and perform a spatial analysis to identify patterns that could be used to better understand or improve the forecast model.
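One model-agnostic way to rank predictors is permutation importance. The challenge does not prescribe a method, so this is a swapped-in technique: shuffle one predictor column, re-score the model, and record how much the mean squared error grows. This minimal sketch assumes `predict` is any function mapping a predictor row (a tuple) to a predicted forecast error. Computing the scores per station would give values that can then be mapped for the spatial analysis.

```python
import random


def permutation_importance(predict, rows, targets, n_repeats=5, seed=0):
    """Mean increase in squared error when each predictor column is shuffled.

    `rows` is a list of predictor tuples and `targets` the matching forecast
    errors; larger scores mean the model relies more on that predictor.
    """
    rng = random.Random(seed)

    def mse(data):
        return sum((predict(x) - y) ** 2 for x, y in zip(data, targets)) / len(targets)

    baseline = mse(rows)
    scores = []
    for j in range(len(rows[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [x[j] for x in rows]
            rng.shuffle(column)  # break the link between predictor j and the target
            shuffled = [x[:j] + (v,) + x[j + 1:] for x, v in zip(rows, column)]
            total += mse(shuffled) - baseline
        scores.append(total / n_repeats)
    return scores
```

Scores should be computed on the independent 2020-2021 data rather than the training year, so that they reflect what the model has learned to generalise.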