dsMTL (Federated Multi-Task Learning based on DataSHIELD) provides federated, privacy-preserving MTL analysis. dsMTL was developed on top of DataSHIELD, an ecosystem supporting the federated analysis of sensitive individual-level data, which remains stored behind the data owner's firewall throughout the analysis. Multi-task learning (MTL) aims to learn outcome-associated patterns (e.g. for a diagnosis) across several datasets simultaneously, capturing dataset-specific as well as shared effects. MTL has numerous exciting application areas, such as comorbidity modeling, and has already been applied successfully to, e.g., disease progression analysis.
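The idea of shared versus dataset-specific effects can be made concrete with the base-R sketch below, which simulates the federated pattern on a single machine. It rests on an assumption that is not taken from the dsMTLClient API: each server evaluates the loss gradient for its own cohort and returns only that p-length vector, while the client stacks the per-cohort coefficient vectors into a matrix `W` and applies an L2,1 penalty that selects features jointly across cohorts. The simulated cohorts and the functions `local_gradient`, `prox_l21` and `fit_fed_mtl_l21` are hypothetical illustrations.

```r
set.seed(1)

# Simulate one cohort (task): n samples, p features, nonzero effects on `support`
simulate_task <- function(n, p, support) {
  X <- matrix(rnorm(n * p), n, p)
  beta <- numeric(p)
  beta[support] <- runif(length(support), 1, 2)
  y <- drop(X %*% beta + rnorm(n, sd = 0.1))
  list(X = X, y = y)
}

# Two simulated "servers"; both cohorts share the informative features 1:3
servers <- list(simulate_task(100, 10, 1:3), simulate_task(80, 10, 1:3))

# Server side (hypothetical): gradient of 0.5 * mean squared error for one task.
# Only this p-length vector would leave the server, never X or y.
local_gradient <- function(data, w) {
  r <- drop(data$X %*% w) - data$y
  drop(crossprod(data$X, r)) / nrow(data$X)
}

# Client side: proximal operator of thresh * ||W||_{2,1}.
# Rows whose L2 norm is below `thresh` become exactly zero in all tasks.
prox_l21 <- function(W, thresh) {
  norms <- sqrt(rowSums(W^2))
  W * pmax(0, 1 - thresh / pmax(norms, 1e-12))
}

# Client side (hypothetical): proximal gradient descent on the stacked p x k matrix W
fit_fed_mtl_l21 <- function(servers, lambda, step = 0.1, iters = 500) {
  W <- matrix(0, ncol(servers[[1]]$X), length(servers))
  for (it in seq_len(iters)) {
    # Collect one gradient vector per server (columns of G)
    G <- sapply(seq_along(servers), function(k) local_gradient(servers[[k]], W[, k]))
    W <- prox_l21(W - step * G, step * lambda)
  }
  W
}

W <- fit_fed_mtl_l21(servers, lambda = 0.3)
print(round(W, 2))
```

Each column of `W` keeps its own cohort-specific effect sizes, while the zero rows are shared by all cohorts; in this simulation only features 1 to 3 remain nonzero.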
dsMTL currently includes three supervised and one unsupervised federated multi-task learning algorithms, as well as two federated machine learning algorithms. Each algorithm captures a specific form of cross-cohort heterogeneity that is linked to different applications in molecular studies.
| Name | Type | Task | Effect |
|---|---|---|---|
| dsLasso | ML | Classification/Regression | Train a Lasso model on the combined cohorts |
| dsLassoCov | ML | Classification/Regression | Federated ML model that can capture the covariate effect |
| dsMTL_L21 | MTL | Classification/Regression | Screen out features that are unimportant for all tasks |
| dsMTL_trace | MTL | Classification/Regression | Identify models represented in a low-dimensional space |
| dsMTL_net | MTL | Classification/Regression | Incorporate task relatedness described as a graph |
| dsMTL_iNMF | MTL | Matrix factorization | Factorize matrices into shared and specific components |
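
The penalties behind the three supervised MTL methods can be sketched in base R. This sketch assumes the standard formulations (an L2,1 norm for joint feature screening, a trace/nuclear norm for a low-rank structure, and a graph penalty ||W G||_F^2 for task relatedness); it is not the dsMTL implementation, and the function names and the edge matrix `G` are hypothetical.

```r
# W is a p x k coefficient matrix with one column per task (cohort).

# dsMTL_L21-style penalty thresh * sum_i ||W[i, ]||_2:
# rows with L2 norm <= thresh are set to exactly zero in every task at once.
prox_l21 <- function(W, thresh) {
  norms <- sqrt(rowSums(W^2))
  W * pmax(0, 1 - thresh / pmax(norms, 1e-12))
}

# dsMTL_trace-style penalty thresh * ||W||_* (nuclear norm):
# soft-threshold the singular values, pushing W toward a low-rank solution.
prox_trace <- function(W, thresh) {
  s <- svd(W)
  d <- pmax(s$d - thresh, 0)
  s$u %*% diag(d, nrow = length(d)) %*% t(s$v)
}

# dsMTL_net-style penalty lambda * ||W G||_F^2, where each column of G encodes
# one edge of the task graph (e.g. c(1, -1) links tasks 1 and 2).
# The penalty is smooth, so its gradient 2 * lambda * W G G^T is used directly.
grad_net <- function(W, G, lambda) {
  2 * lambda * W %*% G %*% t(G)
}

set.seed(2)
W <- matrix(rnorm(20), nrow = 10, ncol = 2)

W_l21 <- prox_l21(W, thresh = 1.5)             # rows with norm <= 1.5 become zero
W_tr <- prox_trace(W, thresh = min(svd(W)$d))  # rank drops from 2 to 1
G <- matrix(c(1, -1), ncol = 1)                # one edge linking task 1 and task 2
grad_net(cbind(W[, 1], W[, 1]), G, lambda = 1) # identical tasks: zero gradient
```

The graph penalty only pulls connected tasks toward each other, so two tasks that already agree receive no gradient from it.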
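The server-side package can be found in dsMTLBase (https://github.com/transbioZI/dsMTLBase). The client-side package dsMTLClient can be installed from GitHub:

```r
install.packages("devtools")
library("devtools")
install_github("transbioZI/dsMTLClient")
```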
The dsMTL server-side package is pre-installed on the Opal demo server, so the most convenient way to test dsMTL functions is to use the Opal demo server as the back end. To use dsMTL in real applications, please follow the tutorial to install the dsMTL server-side package on your own server. Simulated datasets based on a two-server scenario are provided. For each algorithm, we provide code for testing the optimization solvers with different options, for several training procedures of the model, and for cross-validation.
The test files are here. Please download the file for each algorithm and run it line by line.
DSLite is an R package that simulates a DataSHIELD server environment on a single machine. We provide test files for running every algorithm with DSLite. The files are located here.
- Install two DataSHIELD servers and the dsMTL server-side package. Please find the tutorial in the server-side repository dsMTLBase.
- Upload and import the simulated datasets to your servers. Please find the tutorial in the server-side repository dsMTLBase.
- Download and run the test files here.
Note
In addition to the original architecture for federated MTL presented in the initial publication (Cao et al., 2022) and described above, which supports the analysis of sensitive individual-level data from geographically distributed sources via the DataSHIELD platform, differential privacy was added to the dsMTL package in 2025. This optional feature offers an additional security mechanism specific to the MTL models; in particular, differential privacy can protect the models against so-called membership inference attacks. Instructions for setting up the optional differential privacy feature are given in the annotations of the corresponding MTL functions, and example code can be found here.
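As an illustration of how differential privacy can be attached to a federated learner, the base-R sketch below applies the classical Gaussian mechanism to a sum of clipped per-record contributions, the same clip-and-noise pattern used in DP-SGD. This is an assumption for illustration only: which quantities dsMTL perturbs and how it calibrates the noise are described in the annotations of its MTL functions, and `gaussian_sigma`, `clip_rows` and `privatize_sum` are hypothetical names.

```r
# Noise scale of the classical Gaussian mechanism for (epsilon, delta)-DP,
# given the L2 sensitivity `sens` of the released quantity (valid for 0 < epsilon < 1).
gaussian_sigma <- function(epsilon, delta, sens) {
  stopifnot(epsilon > 0, epsilon < 1, delta > 0, delta < 1)
  sqrt(2 * log(1.25 / delta)) * sens / epsilon
}

# Rescale each row (one record's contribution) to an L2 norm of at most `clip`
clip_rows <- function(contrib, clip) {
  norms <- sqrt(rowSums(contrib^2))
  contrib * pmin(1, clip / pmax(norms, 1e-12))
}

# Release the sum of clipped contributions with Gaussian noise. Adding or removing
# one record changes this sum by at most `clip` in L2 norm, so sens = clip.
privatize_sum <- function(contrib, clip, epsilon, delta) {
  total <- colSums(clip_rows(contrib, clip))
  total + rnorm(length(total), sd = gaussian_sigma(epsilon, delta, sens = clip))
}

set.seed(3)
contrib <- matrix(rnorm(50 * 4, sd = 3), nrow = 50, ncol = 4)  # 50 records, 4 coefficients
noisy <- privatize_sum(contrib, clip = 1, epsilon = 0.5, delta = 1e-5)
print(noisy)
```

Clipping bounds how far any single record can move the released values, which is what limits a membership inference attacker; a smaller `epsilon` gives stronger protection at the price of more noise.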
Cao, H., Zhang, Y., Baumbach, J., Burton, P. R., Dwyer, D., Koutsouleris, N., Matschinske, J., Marcon, Y., Rajan, S., Rieg, T., Ryser-Welch, P., Späth, J., The COMMITMENT Consortium, Herrmann, C., and Schwarz, E. (2022). dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning. Bioinformatics, 38(21), 4919-4926. DOI: 10.1093/bioinformatics/btac616
Han Cao
- dsMTLBase - federated, privacy-preserving machine-learning and multi-task learning analysis: https://github.com/transbioZI/dsMTLBase
- Documentation of Opal servers: https://opaldoc.obiba.org/en/latest/index.html
- Tutorial on DataSHIELD for beginners: https://data2knowledge.atlassian.net/wiki/spaces/DSDEV/pages/12943395/Beginners+Hub
- DataSHIELD forum: https://datashield.discourse.group/
- opalr - an R package for managing DataSHIELD servers from scripts: https://cran.r-project.org/web/packages/opalr/index.html
- resources - an R package for importing data from different sources: https://opaldoc.obiba.org/en/latest/resources.html
- Tutorial on resources: https://rpubs.com/jrgonzalezISGlobal/tutorial_resources
- dsOmics - an R package based on DataSHIELD for omics analysis: https://github.com/isglobal-brge/dsOmics
- Tutorial on omics analysis using dsOmics: https://rpubs.com/jrgonzalezISGlobal/tutorial_DSomics
- Tutorial on omics analysis using dsOmics2: https://htmlpreview.github.io/?https://github.com/isglobal-brge/dsOmicsClient/blob/master/vignettes/dsOmics.html
- A DataSHIELD book with detailed explanations of essential packages: https://isglobal-brge.github.io/resource_bookdown/