Refactoring Ligandomics analysis #2

Open
wants to merge 6 commits into
base: main

@CaroAMN CaroAMN commented Jul 4, 2022

Start of the refactoring of the Ligandomics analysis, plus an extra script for the helper functions that I use.

Done:

  • loading the data
  • data preparation like filtering
  • Waterfall plots
  • basic Venn diagrams

To do:

  • saturation analysis
  • length distribution
  • all todos open in the code
  • netMHCpan output reader
  • peptide selection

RNAseq analysis:

  • small changes like linting
  • included a reduced data set in which the DNA-contaminated sample and all benign samples were excluded (just for testing)


@marissaDubbelaar marissaDubbelaar left a comment


In addition to the comments, take a look at the linting.

cschwitalla/Ligandomics_Analysis/Ligandomics_Analysis.R (outdated)
required_Libs <- c("tidyr","readxl", "ggVennDiagram", "dplyr", "stringr", "tibble",
"ggplot2", "org.Hs.eg.db")
required_Libs <- c("tidyr","readxl", "ggVennDiagram", "dplyr", "stringr",
"tibble", "ggplot2", "org.Hs.eg.db")

suppressMessages(invisible(lapply(required_Libs, library, character.only = T)))

Include a commented line that enables the user to install the libraries in one go
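
A minimal sketch of what such a line could look like. It assumes BiocManager is acceptable as the installer, because `org.Hs.eg.db` is a Bioconductor package and `install.packages()` alone would not find it. The `missing_libs` variable is only an illustration and is not part of the script.

```r
required_Libs <- c("tidyr", "readxl", "ggVennDiagram", "dplyr", "stringr",
                   "tibble", "ggplot2", "org.Hs.eg.db")

# Uncomment to install everything in one go (BiocManager handles CRAN and
# Bioconductor packages alike):
# install.packages("BiocManager")
# BiocManager::install(required_Libs)

# Optional: list the packages that are not installed yet
missing_libs <- setdiff(required_Libs, rownames(installed.packages()))
```

Keeping the install lines commented out means the script never changes the user's library unless they opt in.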

GB_HLA_types <- read_xlsx(paste0(input_dir, "HLA-Typisierung_GBM.xlsx"), col_names = TRUE)

# get list of unique HLA types
uniqe_HLA_types <- unique(c(as.matrix(GB_HLA_types[2:16, 2:7])))

It is unclear to me what the selected rows and columns contain. Can you use another approach?
If not, document this information clearly.
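
One possible approach is to select the allele columns by name instead of by position. The sketch below uses a made-up stand-in for the Excel sheet; the column names `Patient`, `A_1` and `A_2` are assumptions, since the real layout of `HLA-Typisierung_GBM.xlsx` is not visible here.

```r
# Hypothetical stand-in for the HLA typing sheet (real column names unknown)
GB_HLA_types <- data.frame(
  Patient = c("P1", "P2"),
  A_1 = c("A*02:01", "A*01:01"),
  A_2 = c("A*03:01", "A*02:01"),
  stringsAsFactors = FALSE
)

# Everything except the identifier column is treated as an allele column
allele_cols <- setdiff(names(GB_HLA_types), "Patient")

# Flatten the allele columns into one vector and keep each type once
unique_HLA_types <- unique(unlist(GB_HLA_types[allele_cols], use.names = FALSE))
```

Selecting by name keeps working when rows or columns are added to the sheet, and it documents what is being collected.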

# Benign data Immunology -------------------------------------------------------
# more specific
# less hits
benign_pep_I <- read.csv(paste0(input_dir, "newBenignmorespecific/Benign_class1.csv"),

Can you find a way to reduce these 7-8 lines even more?
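
As a rough idea, a small helper could build the path once and be applied to all benign files. This is only a sketch: `read_benign`, the file `Benign_class2.csv` and the `Sequence` column are hypothetical, and the demo writes temporary files so that it runs on its own.

```r
# Hypothetical helper: builds the path and forwards extra arguments to read.csv()
read_benign <- function(input_dir, file_name, ...) {
  read.csv(file.path(input_dir, "newBenignmorespecific", file_name), ...)
}

# Demo setup: a temporary directory stands in for input_dir
input_dir <- tempfile("input_")
benign_dir <- file.path(input_dir, "newBenignmorespecific")
dir.create(benign_dir, recursive = TRUE)
for (f in c("Benign_class1.csv", "Benign_class2.csv")) {
  write.csv(data.frame(Sequence = c("AAA", "BBB")),
            file.path(benign_dir, f), row.names = FALSE)
}

# One call reads all files; the list names identify the HLA class
benign_files <- c(I = "Benign_class1.csv", II = "Benign_class2.csv")
benign_pep <- lapply(benign_files, read_benign, input_dir = input_dir)
```

The result is a named list, so `benign_pep$I` would take the place of `benign_pep_I`.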

##
## OUTPUT:
##
getProteinAcc_uniqemappers <- function(list) {

You don't need the for loop; you can manipulate the data as it is.
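
To illustrate the vectorised style, here is a sketch under strong assumptions: the function body is not shown in this diff, so the input format (UniProt-style accession strings, several accessions separated by `;`) and the definition of a unique mapper (exactly one accession) are guesses.

```r
# Illustrative input: one string per peptide
accessions <- c(
  "sp|P04439|HLAA_HUMAN",
  "sp|P01889|HLAB_HUMAN;sp|P10321|HLAC_HUMAN",
  "sp|P61769|B2MG_HUMAN"
)

# Unique mappers contain no separator (assumption: ';' separates accessions)
is_unique <- !grepl(";", accessions, fixed = TRUE)

# Extract the middle field of "db|ACCESSION|NAME" for all unique mappers at once
unique_acc <- sub("^[a-z]{2}\\|([^|]+)\\|.*$", "\\1", accessions[is_unique])
```

`grepl()` and `sub()` both operate on the whole vector, so no explicit loop is needed.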



################################################################################
### Load Data ###
################################################################################
# Load meta data --> Metadata_GB.tsv in workdir
metadata <- read.table(file = metadata_file, sep = "\t", header = TRUE)
metadata2 <- metadata[-grep(("QATLV129AQ|QATLV139AX|QATLV162AW|QATLV171AV|QATLV188AQ"),metadata$QBiC.Code),]

QATLV(129AQ|139AX|162AW|171AV|188AQ) might be a better alternative
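
A short sketch of the grouped pattern on made-up data (the code `QATLV200AB` is hypothetical). It also uses `!grepl()` instead of `-grep()`, because `-grep()` returns zero rows when nothing matches.

```r
# Toy metadata; only the QBiC.Code column matters for this example
metadata <- data.frame(
  QBiC.Code = c("QATLV129AQ", "QATLV200AB", "QATLV188AQ"),
  stringsAsFactors = FALSE
)

# The shared prefix is written once, the alternatives are grouped
exclude_pattern <- "QATLV(129AQ|139AX|162AW|171AV|188AQ)"

# Logical negation stays correct even if no code matches the pattern
metadata2 <- metadata[!grepl(exclude_pattern, metadata$QBiC.Code), , drop = FALSE]
```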

# get filenames of inputDir
file_names <- list.files(path = input_dir)
# files without ben + outlier sample
filnames_excl <- grep(("NEC|INF|T1"), file_names, value = TRUE)
filnames_excl <- filnames_excl[c(1:7,9:45)]

Define which entries you select from filnames_excl; the index vector c(1:7,9:45) does not explain which file is dropped or why.
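
One way to make the selection explicit is to drop the outlier by name rather than by position. The file names and the sample id `S02` below are invented for illustration; the real naming scheme is not visible in this diff.

```r
# Hypothetical file listing standing in for list.files(path = input_dir)
file_names <- c("S01_NEC.tsv", "S01_T1.tsv", "S02_INF.tsv", "S03_BEN.tsv")

# Assumed id of the outlier sample
outlier_sample <- "S02"

# Keep tumor regions only, then remove every file of the outlier sample
tumor_files <- grep("NEC|INF|T1", file_names, value = TRUE)
filnames_excl <- tumor_files[!startsWith(tumor_files, outlier_sample)]
```

This keeps working when files are added to the input directory and the positions shift.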

@@ -154,13 +158,87 @@ make_heatmap <- function(gene_selection, vsd, batch, annotation_color) {
"Sex" = vsd@colData@listData$Sex,
"MGMT_methylation" = vsd@colData@listData$MGMT
)
if (!is.null(k)) {

Make the if-else shorter

## - batch: vsd column of the batch [vsd column]
##
## OUTPUT: PCA plot
plot_pca <- function(dds_default, batch) {

Comments are missing in this function.


2 participants