These are the steps to deploy DaskHub, a Dask Gateway-enabled JupyterHub, using the infrastructure of the EGI Federation.
Please contact us for support via GitHub.
Getting access consists of the following steps:
- Sign up for an EGI Check-In account.
- Request to join the vo.pangeo.eu Virtual Organisation (VO) by visiting the enrollment URL with your EGI Check-In account. The subscription requires approval from the VO Managers. For further information, please check the VO ID card.
- Use Infrastructure Manager. Look at how to add your credentials here.
A few considerations before we start:
- You need to be a member of the vo.pangeo.eu VO. Please see the steps above.
- We will use the terms DaskHub and Pangeo interchangeably throughout this document. See the history here.
Here is an overview of the steps that we will follow:
- Configure a DNS name for your Pangeo deployment using the Dynamic DNS service.
- Get credentials from EGI Check-In to configure JupyterHub authentication with this service, giving all members of the vo.pangeo.eu Virtual Organisation access to your deployment.
- Deploy a Kubernetes cluster on top of OpenStack, along with other tools like Grafana. The Infrastructure Manager Dashboard (or simply IM Dashboard) will do all of this for us automatically.
- Configure and install the DaskHub Helm chart using Helm.
Log into the Dynamic DNS web GUI portal
with your EGI Check-In account to configure the DNS host name of your choice.
The web portal is intuitive, and there is also the associated documentation,
so we will not go into more detail here.
Just use the Add Host button and follow the steps.
For the rest of this tutorial, we will use the pangeo.vm.fedcloud.eu host name.
You need to follow this step for every new Pangeo deployment host name you create, if you want to link it with EGI Check-In (which is recommended).
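You can optionally verify that the host name was registered, using standard DNS tools (a minimal check; the IP it resolves to will only point to your cluster once you update it after the deployment step further below):
host pangeo.vm.fedcloud.eu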
Please follow the steps in the Check-In documentation for Service Providers, in particular the instructions for using OpenID Connect as the authentication and authorization protocol. You need to fill in the registration form at the EGI Federation Registry.
Check-In runs three separate instances: production (aai.egi.eu), demo
(aai-demo.egi.eu), and development (aai-dev.egi.eu). To quickly test
integration with Check-In, we suggest configuring Pangeo to connect to
the development instance (this way you can self-approve the service registration).
Note that by default the vo.pangeo.eu VO only exists in the production
instance of Check-In. Please ask the EGI Check-In team (via an email to
[email protected]) to create the VO in the development instance of Check-In.
Here are additional details to fill out the registration form:
- General tab:
  - Integration environment: Development
- Protocol Specific tab:
  - Select Protocol: OIDC Service
  - Client ID: leave this empty
  - Application Type: Web
  - Redirect URI: https://pangeo.vm.fedcloud.eu/hub/oauth_callback. Adapt it to your own host name.
  - Scope: select openid, email, profile, and eduperson_entitlement.
  - Grant Types: authorization code
  - Token Endpoint Authorization Method: Client Secret over HTTP Basic
  - Client Secret: leave this empty
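To double-check which Check-In instance you are integrating with, you can fetch its OpenID Connect discovery document (a quick, optional check; the URL assumes the development instance and follows the endpoints used in the values.yaml further below):
curl -s https://aai-dev.egi.eu/auth/realms/egi/.well-known/openid-configuration | grep -o '"authorization_endpoint":"[^"]*"'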
Once you have submitted the registration form, you'll need to wait for approval
(or review the request yourself if using the development Federation Registry), and then
wait for the deployment to be done.
Then, you just need to get the Client ID and Client Secret that have been generated.
Log into the IM Dashboard (if you want to deploy an elastic cluster, be sure
to use the operational instance of IM Dashboard, not im-dashboard-dev) with
your EGI Check-In account, and configure your credentials with the
vo.pangeo.eu VO.
Then click on the Kubernetes template and add Make Kubernetes Virtual Cluster Elastic
and Launch Prometheus + Grafana on top of a Kubernetes Virtual Cluster
as optional features.
There are also JupyterHub and DaskHub templates available, but for now we'll deploy
these services using Helm, as the DaskHub TOSCA template is not yet mature enough.
Out of all the configuration options, please pay special attention to the following:
- HW Data tab:
  - Front-end node CPUs and Memory: if you want to create a big Kubernetes cluster, set at least 4 CPUs and 16 GB of memory. No user pods will run on this front-end node.
  - Worker node CPUs and Memory: on CESNET, you can go up to 16 CPUs and 64 GB of memory. All user pods will run on these nodes.
- Kubernetes Data tab:
  - Access Token for the Kubernetes admin user: please change it.
  - Version of Kubernetes to install: we have currently only tested with 1.23.11.
  - Flag to install Cert-Manager: must be set to True.
  - Email to be used in the Let's Encrypt issuer: add your preferred email.
  - DNS name of the public interface of the FE node to generate the certificate: must be set to the one configured in Step 1.
- Elastic Data tab:
  - Maximum Number of WNs in the cluster: set the maximum number of nodes you want the Kubernetes cluster to expand to.
  - Min Number of free WNs in the cluster: 0 or 1. These are WNs without allocated pods; with this option you keep a number of empty WNs ready for a peak workload, and therefore save some time when growing the cluster.
- Cloud Provider Selection tab:
  - Select the credentials configured earlier.
  - Select Site image: select ubuntu-focal or ubuntu-jammy.
Please review all configuration options, click Submit and wait for the
deployment to finish with the status Configured. Then click on Outputs and
take note of the IP address assigned. See the IM Dashboard documentation
to learn more.
To be able to issue the kubectl or helm commands described below, you'll need
to connect to the Kubernetes cluster front-end node.
Once the deployment is in Configured status, follow these steps:
- On the IM Dashboard, click on the VM with id 0 of the deployed infrastructure.
- Download the credential file (key.pem) to your local computer.
- Add the key to your SSH agent:
chmod 600 /path/to/key.pem
ssh-add /path/to/key.pem
- Connect to the VM with ssh cloudadm@<external_ip_of_frontendnode>. The external IP is shown in the IM Dashboard Outputs of your infrastructure.
- Launch a bash session using bash.
From there, you should be able to issue kubectl or helm commands, for example:
$ sudo kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cert-manager-5754c77bd9-sw8gj 1/1 Running 0 36h
cert-manager cert-manager-cainjector-7bb5c8d6-2xf5w 1/1 Running 0 36h
cert-manager cert-manager-webhook-56ccc5ff8f-pj52w 1/1 Running 0 36h
daskhub api-daskhub-dask-gateway-655db6fb79-q6626 1/1 Running 0 11m
daskhub continuous-image-puller-lkzfc 1/1 Running 0 11m
daskhub controller-daskhub-dask-gateway-6d988656cf-66kht 1/1 Running 0 36h
...
Please go to the Dynamic DNS web GUI portal and update the public IP for the
DNS name with the one shown under the Outputs button of the IM Dashboard.
Then Reconfigure the deployment so that https is correctly configured with
Let's Encrypt. There is still one missing step, which will be done in the
section below (configuring the ingress with the correct values in the Helm
values.yaml file).
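Once the DNS record points to the front-end node and the DaskHub chart from the next section is installed, you can verify the name resolution and the Let's Encrypt certificate from the front-end node (a sketch, assuming the daskhub namespace used in the helm command below):
host pangeo.vm.fedcloud.eu
sudo kubectl get certificate -n daskhub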
After you have successfully registered the Pangeo service in the
EGI Federation Registry and the
Kubernetes cluster is deployed, below is the
values.yaml
that you need to deploy a Daskhub with Check-In authentication.
You'll need to replace some values in there:
- token1 must be replaced with a hash generated using openssl rand -hex 32 on Linux.
- token2 must be replaced with another hash generated using openssl rand -hex 32 on Linux.
- egi_oauth_client must be replaced with the Client ID created during Step 2.
- egi_oauth_secret must be replaced with the Client Secret created during Step 2.
- Don't forget to replace pangeo.vm.fedcloud.eu with your DNS name.
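For example, you could generate the two tokens and substitute them directly (a minimal sketch; it assumes the edited file is named values.yaml, as in the helm command below, and note that token1 appears twice and must be the same value in both places):
TOKEN1=$(openssl rand -hex 32)
TOKEN2=$(openssl rand -hex 32)
sed -i "s/token1/${TOKEN1}/g; s/token2/${TOKEN2}/g" values.yaml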
You might also want to modify other things (you'll be able to do this later if needed):
- The dasklimits part.
- The c.Backend.cluster_options limits of dask-gateway workers.
- The Docker image used for Jupyter notebooks and Dask; search for pangeo/pangeo-notebook. You might want to change either the image or just the associated tag.
- The Jupyter notebook resource limits in singleuser.
dask-gateway:
  enabled: true
  gateway:
    auth:
      jupyterhub:
        apiToken: token1 # replace this
      type: jupyterhub
    extraConfig:
      dasklimits: |
        c.ClusterConfig.cluster_max_cores = 6
        c.ClusterConfig.cluster_max_memory = "24 G"
        c.ClusterConfig.cluster_max_workers = 4
        c.ClusterConfig.idle_timeout = 1800
      optionHandler: |
        from dask_gateway_server.options import Options, Integer, Float, String
        def options_handler(options):
            if ":" not in options.image:
                raise ValueError("When specifying an image you must also provide a tag")
            return {
                "worker_cores": options.worker_cores,
                "worker_memory": int(options.worker_memory * 2 ** 30),
                "image": options.image,
            }
        c.Backend.cluster_options = Options(
            Integer("worker_cores", default=1, min=1, max=4, label="Worker Cores"),
            Float("worker_memory", default=2, min=2, max=8, label="Worker Memory (GiB)"),
            String("image", default="pangeo/pangeo-notebook:2022.09.21", label="Image"),
            handler=options_handler,
        )
dask-kubernetes:
  enabled: false
jupyterhub:
  hub:
    config:
      GenericOAuthenticator:
        client_id: egi_oauth_client # replace this
        client_secret: egi_oauth_secret # replace this
        oauth_callback_url: https://pangeo.vm.fedcloud.eu/hub/oauth_callback # replace this with your DNS name
        authorize_url: https://aai-dev.egi.eu/auth/realms/egi/protocol/openid-connect/auth
        token_url: https://aai-dev.egi.eu/auth/realms/egi/protocol/openid-connect/token
        userdata_url: https://aai-dev.egi.eu/auth/realms/egi/protocol/openid-connect/userinfo
        login_service: EGI Check-In
        scope:
          - openid
          - email
          - profile
          - eduperson_entitlement
        username_key: preferred_username
        userdata_params:
          state: state
        allowed_groups:
          - urn:mace:egi.eu:group:vo.pangeo.eu:role=member#aai.egi.eu
        claim_groups_key: eduperson_entitlement
      JupyterHub:
        authenticator_class: generic-oauth
    services:
      dask-gateway:
        apiToken: token1 # replace this
  ingress:
    annotations:
      kubernetes.io/ingress.class: nginx
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    enabled: true
    hosts:
      - pangeo.vm.fedcloud.eu # replace this with your DNS name
    tls:
      - hosts:
          - pangeo.vm.fedcloud.eu # replace this with your DNS name
        secretName: pangeo.vm.fedcloud.eu # replace this with your DNS name
  proxy:
    secretToken: token2 # replace this
    service:
      type: ClusterIP
  singleuser:
    cpu:
      guarantee: 1
      limit: 2
    defaultUrl: /lab
    extraEnv:
      DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE: '{JUPYTER_IMAGE_SPEC}'
    image:
      name: pangeo/pangeo-notebook
      tag: 2022.09.21
    memory:
      guarantee: 2G
      limit: 4G
    startTimeout: 600
    storage:
      capacity: 2Gi
      type: dynamic
  rbac:
    enabled: true
Here is the helm command to apply the changes (you might want to update or change the Helm chart version):
sudo helm upgrade daskhub daskhub \
--repo=https://helm.dask.org \
--install --wait \
--cleanup-on-fail \
--create-namespace \
--namespace daskhub \
--version 2022.8.2 \
--values values.yaml
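Once the command finishes, a quick way to confirm that everything is running and that the ingress and certificate were created (a sketch; the exact pod names will differ in your deployment):
sudo kubectl get pods -n daskhub
sudo kubectl get ingress -n daskhub
sudo kubectl get certificate -n daskhub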
If all went well, JupyterHub will be available at https://pangeo.vm.fedcloud.eu/.
All members of the vo.pangeo.eu VO will now be able to log into JupyterHub
with Check-In at the DNS name created in Step 1
(e.g. https://pangeo.vm.fedcloud.eu/).
Here we collect additional information that may be helpful for reconfiguring the Pangeo cluster.
If things are not working as expected, you might want to look at
the CLUES logs (/var/log/clues2/clues2.log).
If you see an OIDC auth Token expired message in the file (which might
happen right after the Kubernetes deployment), you'll need to renew
the OIDC token manually. To do so:
- Log in to https://im.egi.eu using EGI Check-In.
- Go to the menu Advanced -> Settings and copy the OIDC Access token value.
- SSH to the cluster.
- Remove /usr/local/ec3/refresh.dat.
- Edit the /usr/local/ec3/auth.dat file and copy your OIDC token into the two places where it appears.
- Wait a few minutes: a new refresh.dat file should appear, and the OIDC auth Token expired message should disappear from the log.
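To confirm that the renewal worked, you can check that the refresh.dat file is back and that the error no longer shows up in recent log entries (a minimal check using only the files mentioned above):
ls -l /usr/local/ec3/refresh.dat
sudo tail -n 100 /var/log/clues2/clues2.log | grep "OIDC auth Token expired"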
You need to check in /etc/clues2/conf.d/plugin-kubernetes.cfg that the variables are correct, especially those related to the current flavor of the WN VMs.
For example, for a flavor with 8 cores and 32 GiB:
KUBERNETES_NODE_MEMORY=33567285248
KUBERNETES_NODE_SLOTS=8
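If you are not sure which values to use, you can read the capacity that Kubernetes reports for a worker node (a sketch; replace <wn-node-name> with one of the node names returned by the first command):
sudo kubectl get nodes
sudo kubectl describe node <wn-node-name> | grep -A 5 "Capacity:"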
Then, you'll need to restart the clues2 service:
service cluesd restart
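You can then check that the service came back up and keep an eye on the log while the cluster resizes:
service cluesd status
sudo tail -f /var/log/clues2/clues2.log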
If you don't want to go through the step of configuring EGI Check-In auth for development purposes, you can choose another authentication method, for example NativeAuthenticator.
You might also want to remove all the limitations on the Dask cluster and workers, give your JupyterLab notebook server more resources, and configure the latest Pangeo images for Jupyter and Dask.
Please use the values.yaml below to do so:
dask-gateway:
  enabled: true
  gateway:
    auth:
      jupyterhub:
        apiToken: token1 # replace this
      type: jupyterhub
    extraConfig:
      optionHandler: |
        from dask_gateway_server.options import Options, Integer, Float, String
        def options_handler(options):
            if ":" not in options.image:
                raise ValueError("When specifying an image you must also provide a tag")
            return {
                "worker_cores": options.worker_cores,
                "worker_memory": int(options.worker_memory * 2 ** 30),
                "image": options.image,
            }
        c.Backend.cluster_options = Options(
            Integer("worker_cores", default=1, min=1, max=8, label="Worker Cores"),
            Float("worker_memory", default=4, min=2, max=32, label="Worker Memory (GiB)"),
            String("image", default="pangeo/pangeo-notebook:latest", label="Image"),
            handler=options_handler,
        )
dask-kubernetes:
  enabled: false
jupyterhub:
  hub:
    config:
      Authenticator:
        admin_users:
          - admin
      JupyterHub:
        admin_access: true
        authenticator_class: nativeauthenticator.NativeAuthenticator
    extraConfig:
      10-auth-config: |
        import os, nativeauthenticator
        c.JupyterHub.template_paths = [f"{os.path.dirname(nativeauthenticator.__file__)}/templates/"]
    services:
      dask-gateway:
        apiToken: token1 # replace this
  ingress:
    annotations:
      kubernetes.io/ingress.class: nginx
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    enabled: true
    hosts:
      - pangeo.vm.fedcloud.eu # replace this with your DNS name
    tls:
      - hosts:
          - pangeo.vm.fedcloud.eu # replace this with your DNS name
        secretName: pangeo.vm.fedcloud.eu # replace this with your DNS name
  proxy:
    secretToken: token2 # replace this
    service:
      type: ClusterIP
  singleuser:
    cpu:
      guarantee: 2
      limit: 4
    defaultUrl: /lab
    extraEnv:
      DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE: '{JUPYTER_IMAGE_SPEC}'
    image:
      name: pangeo/pangeo-notebook
      tag: latest
    memory:
      guarantee: 4G
      limit: 8G
    startTimeout: 600
    storage:
      capacity: 2Gi
      type: dynamic
  rbac:
    enabled: true
You'll need the same helm
command as above to apply the changes:
sudo helm upgrade daskhub daskhub \
--repo=https://helm.dask.org \
--install --wait \
--cleanup-on-fail \
--create-namespace \
--namespace daskhub \
--version 2022.8.2 \
--values values.yaml
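As a quick check that the hub answers on your DNS name (a minimal probe; note that with NativeAuthenticator the admin user from the values above still has to create a password through the hub's sign-up page before logging in):
curl -s -o /dev/null -w "%{http_code}\n" https://pangeo.vm.fedcloud.eu/hub/login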
You can use the DaskHub TOSCA template to deploy a DaskHub platform. However, it won't be configured with EGI Check-In, and it may lack some settings, as it has not been updated since July 2022. If you use it, you'll probably have to use Helm commands anyway.
You can use it by selecting it as the Kubernetes option and filling in the options:
- Dask Data tab:
  - Jupyterhub auth token: please configure an auth token (e.g. generated with openssl rand -hex 32 on Linux).
  - Use Jupyterhub singleuser image and Jupyterhub singleuser image version to configure the default user environment in JupyterHub with a container image of your choice.