\label{sec:consortium}
\eucommentary{
The individual members of the consortium are described in a
separate section under Part A. There is no need to repeat that
information here.
\begin{itemize}
\item
Describe the consortium. How does it match the project's objectives
and bring together the necessary disciplinary and inter-disciplinary
knowledge? Show how this includes expertise in social sciences and
humanities, open science practices, and gender aspects of R\&I, as
appropriate. Include in the description affiliated entities and
associated partners, if any.
\item
Show how the partners will have access to critical infrastructure
needed to carry out the project activities.
\item
Describe how the members complement one another (and cover the value
chain, where appropriate).
\item
In what way does each of them contribute to the project? Show that
each has a valid role, and adequate resources in the project to fulfil
that role.
\item
If applicable, describe the industrial/commercial involvement in the
project to ensure exploitation of the results and explain why this is
consistent with and will help to achieve the specific measures which
are proposed for exploitation of the results of the project (see
section 2.2).
\item
\textbf{Other countries and international organisations}: If one or
more of the participants requesting EU funding is based in a country
or is an international organisation that is not automatically eligible
for such funding (entities from Member States of the EU, from
Associated Countries and from one of the countries in the exhaustive
list included in the Work Programme General Annex B are
automatically eligible for EU funding), explain why the participation
of the entity in question is essential to successfully carry out the
project.
\end{itemize}
}
\subsubsection{Consortium composition}
\TODO{Time permitting: add map of Europe with pointers to locations of each
site. Just to break up the pages of text without pictures.}
The \TheProject consortium spans the broad spectrum of actors required to
successfully develop and disseminate tools and infrastructure for open and
reproducible computational science that cater to the needs of the European and
global scientific communities. It comprises one academic institution
(University of Oslo), three research organisations (Max Planck Gesellschaft, Ifremer,
Simula), and one SME (QuantStack); together, the partners are based in three
countries (Norway, France, Germany).
The consortium has grown out of long-standing collaborations and common interests. Some
partners have worked together for many years on different aspects of Jupyter development
(\site{QS}, \site{SRL}), software for education (\site{QS}, \site{SRL},
\site{MP}) and the use of Jupyter tools for reproducible science (\site{SRL}, \site{MP}).
Others contribute significant expertise in
the practice of open science and training (\site{IFR}, \site{UIO}).
Several partners (\site{MP}, \site{UIO}, \site{IFR}) employ scientists whose work is to
facilitate computational and open science for the researchers in their
institutions. Each of them therefore has experience with, and a good overview of, the
requirements for effective and reproducible science, gained from the many
research projects they are connected to. In addition, several of them are
research-active in their scientific domains, in reproducibility, and in education.
The existing Binder tools -- which are the baseline for this project --
originate from Project Jupyter. We have core Jupyter and Binder developers in
our team, and thus direct access to developer expertise and experience.
Finally, all project partners are long-time, passionate advocates of
open and reproducible science. Building on highly successful past experience
with OpenDreamKit, they \emph{have chosen to write this proposal fully in the
open} on GitHub
(\href{https://github.com/minrk/horizon-widera-2022}{https://github.com/minrk/horizon-widera-2022})
for maximum transparency and engagement of the community, using the same
collaboration tools and practices as the open-source and open-science
communities.
\subsubsection{Complementarity and interdisciplinarity}
\label{sec:complementarity-and-interdisciplinarity}
For the successful delivery of this project and its mission to enable better
reproducibility and better science through better software (and the services built upon it),
we need complementary expertise from researchers and research software
engineers. As we build on, improve, and advance existing software tools from
Project Jupyter, in-depth knowledge of these tools is essential. Because our approach
provides reproducible computational environments automatically as long as researchers
follow best practice, education and training in that best practice are also vital
for this project.
The chosen consortium ensures a critical mass of interdisciplinary expertise and
excellence in key areas (such as natural sciences, education, software
engineering, and Project Jupyter), combining a university, research organisations
and an SME of recognised international reputation:
\begin{compactitem}
\item A set of use cases that cover several application domains and users, and that impose very diverse
requirements on open tools (\site{MP}, \site{IFR}, \site{UIO});
\item Lead developers in the Jupyter ecosystem, including IPython, the Jupyter Notebook, JupyterLab,
JupyterHub, Binder, mybinder.org and Jupyter Widgets (\site{SRL}, \site{QS});
\item Experts and major promoters of the Jupyter collaborative user interfaces
for interactive, exploratory and reproducible computing in a variety of scientific domains (\site{MP}, \site{IFR}, \site{UIO});
\item Long experience and a proven track record of success with large and complex collaborative projects,
including European e-infrastructure projects (\site{MP}, \site{SRL}),
projects focused on large-scale infrastructures and large experimental services (\site{MP}, \site{IFR}),
as well as experience in running large-scale open-source projects such as Project Jupyter
(\site{SRL}, \site{QS});
\item Experience in educating students and experienced researchers on
computational methods and open science (\site{SRL}, \site{MP}, \site{IFR}, \site{UIO});
\item A comprehensive range of skill sets and competencies in several relevant domains,
from applied research to standardisation to business analysis.
\end{compactitem}
We have budgeted travel funds for short visits (of a few weeks) between partners
where this helps us work more effectively together within the interdisciplinary
team of participants.
\subsubsection{Capacities and roles of participants}
\noindent \textbf{Simula Research Laboratory}
(\site{SRL}) is an internationally leading Norwegian research institute in the key
ICT areas of communication systems, scientific computing, and software
engineering. Dedicated to tackling scientific challenges with long-term impact and of
genuine importance to real life, Simula offers an environment that emphasises
and promotes basic research. As a result, Simula has been involved in numerous
projects funded by the EU, the Norwegian government and regional institutions.
Benjamin Ragan-Kelley has been a lead developer of Project Jupyter since its
inception, and headed the Numerical Analysis and
Scientific Computing department at Simula from 2018 to 2021.
While continuing to contribute to and maintain the
open-source software around which \TheProject centres,
he also researches the effectiveness and usefulness of such tools for education~\cite{JupyterHub-for-education-2016}
and reproducible science~\cite{binder,Forde2018ReproducibleRE,nbval-arxiv,repo2docker-checker2020,Beg2021}.
Simula's role -- in addition to managing the overall project -- is to provide
technical leadership and in-depth expertise in the Jupyter and Binder projects, which will be
instrumental in the execution of this project.
As lead partner,
Simula is the largest beneficiary: it is the largest technical contributor,
provides project-wide administrative support,
and hosts project activities such as workshops
as well as cloud computing resources.
\medskip \noindent The \textbf{Max Planck Society}
(Max Planck Gesellschaft, MPG) is a non-profit research
organisation with 86 research institutes and nearly 24,000 staff. For this
project, we have representatives from the \textbf{Max Planck Computing and Data
Facility (MPCDF)} -- the organisation's cross-institutional competence centre
for computational and data sciences -- and staff from the \textbf{Max Planck Institute
for Structure and Dynamics of Matter (MPSD)}, who are active in condensed matter research and
reproducibility research.
The MPCDF operates large state-of-the-art supercomputers, several mid-range
compute systems and data repositories for various Max Planck institutes and
provides an up-to-date infrastructure for data management including long-term
archival.
The MPCDF is a member of major European exascale projects, in particular the
BioExcel and Novel Materials Discovery (NOMAD) Centres of Excellence, and of projects
such as Big data driven material science (BIGMax), FAIR data infrastructure for
Condensed-Matter Physics and the Chemical Physics of Solids (FAIRmat), and Data
Infrastructure Capacity for EOSC (DICE).
The MPSD enjoys an international reputation in
the field of ultrafast structural dynamics. The MPSD currently comprises
three departments and several independent research groups, and is one of the
partners of the cluster formed with the Center for Free-Electron Laser Science
(CFEL), DESY, and the University of Hamburg.
MPCDF and MPSD have in-depth experience in delivering training and
workshops on computational methods, including best practice and reproducibility.
% A bit odd to mention an individual here, but lots of the prior work
% was not done at MPG, so it may help reviewers to know the name.
One team member (Hans Fangohr, MPSD) has a long-standing collaboration with \site{SRL}
and the Jupyter project, and a research interest in the use of open source
tools, Jupyter, and Binder for research and reproducible
research~\cite{Fangohr:ICALEPCS2017-TUCPA01,Fangohr2020,nbval-arxiv,Beg2021}.
% not sure if we want to show case the following:
% https://fangohr.github.io/teaching/index.html#awards
Moreover,
Hans Fangohr has repeatedly won awards for excellence in the design and delivery of
teaching activities at different universities.
%
The MPCDF has many years of expertise in deploying and using cloud-based
JupyterHub and BinderHub installations for use cases from a variety of
scientific domains.
In this project, the team will co-design Binder-based services for
reproducible science and data publishing. They will draw on the broad research
activities of scientists in the Max Planck Society -- including the social sciences,
humanities and HPC-based activities -- to evaluate, improve and apply the Binder
tools for use cases such as data publishing and improved reproducibility in HPC.
\medskip \noindent \textbf{QuantStack}
is a France-based software company specialising in open-source
scientific computing.
Clients and partners of QuantStack range from financial software companies to robotics
startups and public research institutions.
QuantStack's team comprises maintainers of and contributors to open-source technologies
that are considered industry standards and are used by millions of people worldwide, such as Jupyter,
conda-forge and mamba, among many others. It is home to some of the most prolific contributors
to the ecosystem.
QuantStack is responsible for some of the main innovations in the Jupyter ecosystem
of the past few years. Features developed by the team include support for
\emph{collaborative editing} in JupyterLab, the \emph{JupyterLab Visual Debugger},
\emph{JupyterLite} (an in-browser distribution of JupyterLab leveraging WebAssembly for language kernels),
the \emph{xeus Jupyter kernels} (xeus-robot, xeus-cling, xeus-sql, xeus-lua, xeus-python),
many data visualisation libraries such as ipygany, ipyleaflet, and ipycanvas,
and the Voilà dashboarding system. Beyond new developments, QuantStack takes on
a large share of the maintenance burden of the underlying Jupyter components.
QuantStack's open-source development is not limited to the Jupyter ecosystem: the team
is also very active in the conda-forge project, a community-maintained distribution
of packages for scientific computing with tens of thousands of packages available
and hundreds of millions of package downloads per month. QuantStack is also responsible for
the development of the mamba package manager, which has been adopted by the conda-forge
and Binder projects, among others.
QuantStack's team will provide expertise to the consortium as core Jupyter developers,
and will contribute to the project by improving the performance and reliability with
which Binder builds software environments.
\medskip \noindent
The \textbf{French National Institute for Ocean Science (Ifremer)} is a French public
scientific and technological institution dedicated to exploring, understanding
and predicting the ocean. A pioneer in ocean science, Ifremer conducts cutting-edge
research grounded in sustainable development and open science. Ifremer's
vision is to advance science, expertise and innovation by creating and sharing
ocean data, information \& knowledge. Ifremer employs more than 1,500 staff
at more than 20 sites spread along the French coastline.
Ifremer operates a marine scientific computing centre that hosts world-class oceanographic data for national, European and international projects.
The Pangeo platform is already deployed on Ifremer's HPC resources, using JupyterHub
and Python environments.
Within this project, Ifremer will focus on testing, validating, improving, and
applying the developed
tools for practical reproducibility in real-world research contexts,
such as the analysis of ocean physics data spanning satellite observations and model output, and the analysis of biologging data from fish to track their behaviour and environment.
These data and analyses enable biologists to better understand fish movement, preferred habitats, and the environmental conditions fish need to thrive, all of which are essential for the future protection of natural resources.
%\TODO{Tina, a bit more detail on the use cases, and the challenges?}
\medskip \noindent The \textbf{University of Oslo}
(UiO) is Norway's oldest institution for research and
higher education, with 28,000 students and 6,000 employees. UiO aspires to be an
international hub for the research-based integration of computing into science
education, and has funded a university-wide JupyterHub service for Jupyter
notebooks in order to introduce a computational component into all
study programmes in all science disciplines, from bachelor to postdoctoral
level.
The University of Oslo is a Silver Partner of \href{https://carpentries.org}{The
Carpentries}, a successful international, community-driven project whose
Instructors, Trainers, Maintainers, helpers and supporters share a mission
to teach foundational computational and data science skills to researchers. It
is also actively involved in the \href{https://coderefinery.org/}{CodeRefinery}
initiative, which acts as a hub for FAIR (Findable, Accessible, Interoperable, and
Reusable) software practices. Twice a year, CodeRefinery organises large online
training events with more than 300 attendees each.
UiO will use its extensive, leading experience in training
and communication to educate researchers globally about open science and
practical reproducibility, helping to translate the technical advances of this
project into widespread impact.
The University Center for Information Technology (USIT), which represents the University of Oslo in this
project, is part of the Norwegian Research Infrastructure Services (NRIS),
a collaboration between highly qualified IT staff at four Norwegian universities (NTNU and the
Universities of Bergen, Oslo and Tromsø) and employees at Sigma2 (the Norwegian national e-infrastructure provider)
to pool competencies, resources and services. Together with NRIS, USIT also organises
community-specific outreach events to connect with local communities, and it collaborates with ELIXIR and
with other European initiatives that offer Galaxy training and mentoring in EOSC (EOSC-Nordic, RELIANCE, EuroScienceGateway).
USIT actively contributes to Nordic e-Infrastructure Collaboration (NeIC) projects, such as the Nordic distributed Tier-1 facility of the worldwide computing
grid serving the Large Hadron Collider (LHC) at CERN, and
leads the Nordic Collaboration on e-Infrastructures for Earth System Modeling Tools (NICEST2).
\subsubsection{Connections beyond project partners}
As our ambition is to \textbf{improve practical reproducibility for the global
community of researchers}, we need to be well connected to understand
requirements and constraints from many domains. To improve our networking and
information gathering, we have started to assemble our Community Engagement Panel
(Section~\ref{sec:community-engagement-panel},
Task~\taskref{management}{community-engagement-panel}),
which aims to bring together
representatives from diverse research domains, research infrastructure
providers, research funders, publishers, educators, and policy makers. We also
expect to use this network to support the communication, dissemination and
exploitation of our results.
The project partners are research-active in current topics of open science, and
are involved in various research activities and organisations, including
BIGMax, NOMAD, FAIRmat, DICE, EOSC-Nordic, RELIANCE, EuroScienceGateway, The Carpentries,
CodeRefinery and the Software Sustainability Institute (SSI).
\TODO{All, Please list more. Feel free to add more relevant ones at the beginning.
EOSC, FAIR, RDA, Carpentries, SSI, ?}
% \input{coherence}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "proposal"
%%% End: