In this section you will learn how to register and access pangeo-eosc
services.
You need to create an EGI Check-in account and enroll to the vo.pangeo.eu
Virtual Organisation. There are several steps to follow:
- Sign up for an EGI Check-in account following these steps. Using ORCID iD to authenticate is recommended.
- Enroll in the
vo.pangeo.eu
Virtual Organisation (VO) by clicking on the enrollment URL using the EGI Check-in account created in the previous step. Review and click onSubmit
. Please add a note in the statement of purpose when requesting to join the VO explaining why you want to accesspangeo-eosc
.
Managers of the Virtual Organisations may take several days to approve your petitions to join and also get back to you via email to verify your identity.
Access DaskHub via https://pangeo-eosc.vm.fedcloud.eu/ and choose among the 4 available flavors (as shown on the figure below):
- Pangeo Notebook uses a docker image maintained by the Pangeo community. It contains all the Python packages you need to data analysis and visualization. The list of packages and all the Pangeo Notebook environment is made available here; look up the
pangeo-notebook
folder. - Machine Learning Pangeo notebook with GPU enable tensorflow2: similarly, it is maintained by the Pangeo community and the complete computational environment with the list of Python packages is also available at https://github.com/pangeo-data/pangeo-docker-images in the
ml-notebook
folder. This flavor contains all the packages from the Pangeo Notebook flavor and is GPU-enabled tensorflow2. Choose this flavor if you need GPUs; for instance for training neural networks; - Machine Learning Pangeo notebook with GPU enable pytorch: it is the same as
ml-notebook
but with GPU-enabled pytorch. - Datascience Notebook with Python, R and Julia is maintained by the Jupyter community at https://github.com/jupyter/docker-stacks. Look up the
datascience-notebook
folder. It contains 3 different kernels, namely Python, R and Julia notebooks. Please note that you would probably need to add additional packages as the list of available packages is not exhaustive.
Currently (September 2023) we have configured quotas to host 20 simultaneous users with Jupyter (8 CPUs, 32GB RAM) and a Dask cluster (max: 4 workers, each worker with 8 CPUs and 32 GB RAM). This is subject to change depending on usage and resource availability at CESNET.
You need to click on Sign in with EGI Check-in
and then use your ORCID iD credentials.
A Dask Gateway is available for scaling your computation. For more details on this deployment, you may want to take a look at Daskhub helm chart.
Each user has a very small amount of local storage when using the DaskHub as it is not meant to be used for storing large data. Instead a dedicated MinIO Object storage has been setup.
The MinIO console endpoint is: https://pangeo-eosc-minio.vm.fedcloud.eu/. You can authenticate to the MinIO Object Storage in the same way you login to DaskHub. As shown on the Figure below, make sure you "Select Other Authentication Method" and "Login with SSO (checkin)" to access the MinIO console. Then use your ORCID iD to login.
You can create, access and manage your buckets from the minIO console (or use minIO Python package). The figure below shows the GUI (with several tabs on the left; the bucket tab is selected on the figure): initially, you won't have any buckets so please feel free to create public/privates buckets. As an individual user, make sure to let your bucket-name start with the prefix os.environ['JUPYTERHUB_USER']+'-'
as shown in DaskHub, otherwise the bucket will not be created. The value of os.environ['JUPYTERHUB_USER']
shows up in the top-right corner, next to the Logout
button when you first log into https://pangeo-eosc.vm.fedcloud.eu/.
In addition to the MinIO console, the API end point is https://pangeo-eosc-minioapi.vm.fedcloud.eu/
for those who prefer to interact with MinIO via the API. Please check out this example to get started.
If you need support, please open an issue.
Check out the open grafana dashboard. It is particularly useful to check that there are GPUs available before requesting an environment with GPU.
Join the Pangeo community in Europe in a weekly call every Tuesday at 9.30am CET/CEST at: https://meet.jit.si/pangeo-europe
Attend the meeting not only to get to know each other but also to ask questions about how to use the Pangeo ecosystem.
Please use the text below to cite/acknowledge Pangeo-EOSC services:
Pangeo-EOSC has benefited from services and resources provided by the EGI-ACE project (funded by the European Union’s Horizon 2020 research and innovation programme under Grant Agreement no. 101017567), and the C-SCALE project (funded by the European Union's Horizon 2020 research and innovation programme under grant agreement no. 101017529), with the dedicated support of CESNET. Computational resources were provided by the e-INFRA CZ project (ID:90254), supported by the Ministry of Education, Youth and Sports of the Czech Republic.
The European Open Science Cloud (EOSC) aims at becoming the main environment for hosting and processing research data to support European Science.
Pangeo is a worldwide community for Big Data geoscience promoting open, reproducible, and scalable science.
Pangeo Europe aims at highlighting European contributions to the Pangeo Community and at providing a reference deployment for Pangeo on EOSC. The Pangeo deployment on EOSC has been made possible thanks to CESNET in the context of the the EGI-ACE project and the C-SCALE project.