Problem: Estimating Optimal Numbers of Clusters #39

Open
mariacastillo982 opened this issue Apr 26, 2023 · 5 comments

Comments

@mariacastillo982

Hello, I want to report that when I'm running my analysis, I'm getting the following error:

Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
In addition: There were 50 or more warnings (use warnings() to see the first 50)

warnings()
Warning messages:
1: The dot-dot notation (..count..) was deprecated in ggplot2 3.4.0.
ℹ Please use after_stat(count) instead.
ℹ The deprecated feature was likely used in the RnBeads package.
Please report the issue to the authors.
This warning is displayed once every 8 hours.
Call lifecycle::last_lifecycle_warnings() to see where this warning was generated.
2: Removed 5370 rows containing non-finite values (stat_ydensity()).
3: Removed 1743 rows containing non-finite values (stat_density()).
4: Removed 1743 rows containing non-finite values (stat_density()).
5: Removed 627 rows containing non-finite values (stat_density()).
6: Removed 627 rows containing non-finite values (stat_density()).
7: In cor(t(x), use = "pairwise.complete.obs") : the standard deviation is zero
8: In cor(t(x), use = "pairwise.complete.obs") : the standard deviation is zero
9: In cor(t(x), use = "pairwise.complete.obs") : the standard deviation is zero
10: In cor(t(x), use = "pairwise.complete.obs") : the standard deviation is zero
...
50: In cor(t(x), use = "pairwise.complete.obs") : the standard deviation is zero

Thank you for your help!

@schmic05

Hi Maria,

Thanks for using RnBeads. From the message above, I cannot tell where the issue arises. Could you share the full analysis.log with us?

Thanks,

Michael

@jcdaneshmand

jcdaneshmand commented Nov 7, 2023

Hello all, I am getting the exact same issue as Maria. Here is my full log:

2023-11-06 19:39:19 1.9 STATUS STARTED RnBeads Pipeline
2023-11-06 19:39:20 1.9 INFO Initialized report index and saved to index.html
2023-11-06 19:39:20 1.9 STATUS STARTED Loading Data
2023-11-06 19:39:20 1.9 INFO Number of cores: 1
2023-11-06 19:39:20 1.9 INFO Loading data of type "bs.bed.dir"
2023-11-06 19:39:20 1.9 STATUS STARTED Performing loading test
2023-11-06 19:39:20 1.9 INFO The first 10000 rows will be read from each data file
2023-11-06 19:39:20 1.9 INFO No column with file names specified: will try to find one
2023-11-06 19:39:20 1.9 STATUS STARTED Loading Data From BED Files
2023-11-06 19:39:21 2.0 STATUS STARTED Automatically parsing the provided sample annotation file
2023-11-06 19:39:21 2.0 STATUS Potential file names found in column 2 of the supplied annotation table
2023-11-06 19:39:21 2.0 STATUS COMPLETED Automatically parsing the provided sample annotation file
2023-11-06 19:39:21 2.0 INFO Reading BED file: /mnt/e/Repositories/Jonah/SFT_nextflow/methyl_results/bismark/methylation_calls/methylation_coverage/gunzipped/10303-AM-0001_S1_L005_1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
2023-11-06 19:39:21 2.0 INFO Reading BED file: /mnt/e/Repositories/Jonah/SFT_nextflow/methyl_results/bismark/methylation_calls/methylation_coverage/gunzipped/10303-AM-0048_S1_L005_1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
2023-11-06 19:39:22 2.0 INFO Reading BED file: /mnt/e/Repositories/Jonah/SFT_nextflow/methyl_results/bismark/methylation_calls/methylation_coverage/gunzipped/10303-AM-0125_S1_L005_1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
2023-11-06 19:39:22 2.0 STATUS Read 3 BED files
2023-11-06 19:39:22 2.0 STATUS Matched chromosomes and strands to annotation
2023-11-06 19:39:22 2.0 STATUS Checked for the presence of sites and coverage
2023-11-06 19:39:22 2.0 STATUS Initialized meth/covg matrices
2023-11-06 19:39:23 2.0 STATUS Combined a data matrix with 14144 sites and 3 samples
2023-11-06 19:39:23 2.0 STATUS Processed all BED files
2023-11-06 19:39:23 2.0 STATUS STARTED Creating RnBiseqSet object
2023-11-06 19:39:23 2.0 INFO Inferring strand information from annotation enabled
2023-11-06 19:39:56 4.9 STATUS Matched 14111 of 14144 methylation sites to the annotation
2023-11-06 19:39:56 4.9 STATUS Checking site coverage
2023-11-06 19:39:56 4.9 STATUS Creating methylation matrix
2023-11-06 19:39:56 4.9 STATUS Creating coverage matrix
2023-11-06 19:39:56 4.9 STATUS Creating object
2023-11-06 19:39:56 4.9 STATUS Summarizing strand methylation
2023-11-06 19:39:59 5.0 STATUS Summarizing tiling methylation
2023-11-06 19:39:59 5.0 STATUS Summarizing genes methylation
2023-11-06 19:39:59 5.0 STATUS Summarizing promoters methylation
2023-11-06 19:39:59 5.0 STATUS Summarizing cpgislands methylation
2023-11-06 19:39:59 5.0 STATUS COMPLETED Creating RnBiseqSet object
2023-11-06 19:39:59 5.0 STATUS COMPLETED Loading Data From BED Files
2023-11-06 19:39:59 5.0 STATUS STARTED Checking the loaded object
2023-11-06 19:39:59 5.0 INFO Checking the supplied RnBiseqSet object
2023-11-06 19:39:59 5.0 INFO The object contains information for 8216 methylation sites
2023-11-06 19:39:59 5.0 INFO The object contains information for 3 samples
2023-11-06 19:39:59 5.0 INFO The object contains 6099 missing methylation values
2023-11-06 19:39:59 5.0 INFO Methylation values are within the expected range
2023-11-06 19:39:59 5.0 INFO The object contains coverage information
2023-11-06 19:39:59 5.0 INFO Coverage values are within the expected range
2023-11-06 19:39:59 5.0 INFO The object loaded during the loading test is valid
2023-11-06 19:39:59 5.0 STATUS COMPLETED Checking the loaded object
2023-11-06 19:39:59 5.0 STATUS COMPLETED Performing loading test
2023-11-06 19:39:59 5.0 INFO No column with file names specified: will try to find one
2023-11-06 19:39:59 5.0 STATUS STARTED Loading Data From BED Files
2023-11-06 19:39:59 5.0 STATUS STARTED Automatically parsing the provided sample annotation file
2023-11-06 19:39:59 5.0 STATUS Potential file names found in column 2 of the supplied annotation table
2023-11-06 19:39:59 5.0 STATUS COMPLETED Automatically parsing the provided sample annotation file
2023-11-06 19:39:59 5.0 INFO Reading BED file: /mnt/e/Repositories/Jonah/SFT_nextflow/methyl_results/bismark/methylation_calls/methylation_coverage/gunzipped/10303-AM-0001_S1_L005_1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
2023-11-06 19:40:24 5.8 INFO Reading BED file: /mnt/e/Repositories/Jonah/SFT_nextflow/methyl_results/bismark/methylation_calls/methylation_coverage/gunzipped/10303-AM-0048_S1_L005_1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
2023-11-06 19:40:43 5.3 INFO Reading BED file: /mnt/e/Repositories/Jonah/SFT_nextflow/methyl_results/bismark/methylation_calls/methylation_coverage/gunzipped/10303-AM-0125_S1_L005_1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
2023-11-06 19:41:01 5.3 STATUS Read 3 BED files
2023-11-06 19:41:09 5.9 STATUS Matched chromosomes and strands to annotation
2023-11-06 19:41:09 5.9 STATUS Checked for the presence of sites and coverage
2023-11-06 19:41:10 6.1 STATUS Initialized meth/covg matrices
2023-11-06 19:41:16 5.7 STATUS Combined a data matrix with 10667924 sites and 3 samples
2023-11-06 19:41:16 5.7 STATUS Processed all BED files
2023-11-06 19:41:16 5.7 STATUS STARTED Creating RnBiseqSet object
2023-11-06 19:41:16 6.1 INFO Removed 26147 sites with unknown chromosomes
2023-11-06 19:41:16 5.9 INFO Inferring strand information from annotation enabled
2023-11-06 19:41:49 6.9 STATUS Matched 10595713 of 10667924 methylation sites to the annotation
2023-11-06 19:41:49 6.9 STATUS Checking site coverage
2023-11-06 19:41:52 6.8 STATUS Creating methylation matrix
2023-11-06 19:41:55 6.9 STATUS Creating coverage matrix
2023-11-06 19:41:58 7.1 STATUS Creating object
2023-11-06 19:41:59 6.5 STATUS Summarizing strand methylation
2023-11-06 19:43:37 7.1 STATUS Summarizing tiling methylation
2023-11-06 19:43:50 8.0 STATUS Summarizing genes methylation
2023-11-06 19:43:56 8.2 STATUS Summarizing promoters methylation
2023-11-06 19:44:01 8.2 STATUS Summarizing cpgislands methylation
2023-11-06 19:44:04 8.0 STATUS COMPLETED Creating RnBiseqSet object
2023-11-06 19:44:04 8.0 STATUS COMPLETED Loading Data From BED Files
2023-11-06 19:44:04 8.0 STATUS Loaded data from /mnt/e/Repositories/Jonah/SFT_nextflow/methyl_results/bismark/methylation_calls/methylation_coverage/gunzipped
2023-11-06 19:44:19 9.0 STATUS Predicted sex for the loaded samples
2023-11-06 19:44:19 9.0 STATUS Added data loading section to the report
2023-11-06 19:44:19 9.0 STATUS Loaded 3 samples and 6292277 sites
2023-11-06 19:44:19 9.0 INFO Output object is of type RnBiseqSet
2023-11-06 19:44:19 9.0 STATUS COMPLETED Loading Data
2023-11-06 19:44:25 9.0 INFO Initialized report index and saved to index.html
2023-11-06 19:44:25 9.0 STATUS STARTED Quality Control
2023-11-06 19:44:25 9.0 INFO Number of cores: 1
2023-11-06 19:44:25 9.0 STATUS STARTED Preparing Quality Control Information
2023-11-06 19:44:25 9.0 STATUS COMPLETED Preparing Quality Control Information
2023-11-06 19:44:25 9.0 STATUS STARTED Quality Control Section
2023-11-06 19:44:41 8.5 STATUS Added sequencing coverage histograms
2023-11-06 19:45:09 8.6 STATUS Added sample coverage section
2023-11-06 19:45:29 9.9 STATUS Added sequencing coverage violin plots
2023-11-06 19:45:32 9.7 STATUS COMPLETED Quality Control Section
2023-11-06 19:45:47 9.5 STATUS COMPLETED Quality Control
2023-11-06 19:45:48 9.5 INFO Initialized report index and saved to index.html
2023-11-06 19:45:48 9.5 STATUS STARTED Preprocessing
2023-11-06 19:45:48 9.5 INFO Number of cores: 1
2023-11-06 19:45:48 9.5 WARNING filtering.greedycut disabled for non-array datasets.
2023-11-06 19:45:48 9.5 STATUS STARTED Filtering Procedures
2023-11-06 19:45:53 9.9 STATUS STARTED Removal of SNP-enriched Sites
2023-11-06 19:45:53 9.9 STATUS Removed 46155 sites using SNP criterion "any"
2023-11-06 19:45:53 9.9 STATUS Saved removed sites to /mnt/e/Repositories/Jonah/SFT_MethylAnalysis/reports/preprocessing_data/removed_sites_snp.csv
2023-11-06 19:45:53 9.9 STATUS Added a corresponding section to the report
2023-11-06 19:45:53 9.9 STATUS COMPLETED Removal of SNP-enriched Sites
2023-11-06 19:45:53 9.9 STATUS STARTED Removal of Cross-reactive Probes
2023-11-06 19:45:53 9.9 STATUS Added a corresponding section to the report
2023-11-06 19:45:53 9.9 STATUS COMPLETED Removal of Cross-reactive Probes
2023-11-06 19:45:53 9.9 STATUS STARTED Removal of High Coverage (Outlier) Sites
2023-11-06 19:45:54 8.1 STATUS Removed 0 high coverage outlier sites
2023-11-06 19:45:54 8.1 STATUS Saved removed sites to /mnt/e/Repositories/Jonah/SFT_MethylAnalysis/reports/preprocessing_data/removed_sites_high_coverage.csv
2023-11-06 19:45:54 8.1 STATUS Added a corresponding section to the report
2023-11-06 19:45:54 8.1 STATUS COMPLETED Removal of High Coverage (Outlier) Sites
2023-11-06 19:45:54 8.1 STATUS STARTED Replacing Low Coverage Sites by NA
2023-11-06 19:45:55 9.5 STATUS Masked 6946071 site(s) based on coverage threshold 5
2023-11-06 19:45:56 9.5 STATUS Saved numbers of masked sites per sample to /mnt/e/Repositories/Jonah/SFT_MethylAnalysis/reports/preprocessing_data/masked_sites_coverage.csv
2023-11-06 19:45:56 9.5 STATUS Added a corresponding section to the report
2023-11-06 19:45:56 9.5 STATUS COMPLETED Replacing Low Coverage Sites by NA
2023-11-06 19:45:56 9.5 STATUS STARTED Removal of Sites on Sex Chromosomes
2023-11-06 19:45:56 9.6 STATUS Removed 160956 site(s) on sex chromosomes
2023-11-06 19:45:56 9.6 STATUS Saved removed sites to /mnt/e/Repositories/Jonah/SFT_MethylAnalysis/reports/preprocessing_data/removed_sites_sex.csv
2023-11-06 19:45:56 9.6 STATUS Added a corresponding section to the report
2023-11-06 19:45:56 9.6 STATUS COMPLETED Removal of Sites on Sex Chromosomes
2023-11-06 19:45:56 9.6 STATUS STARTED Missing Value Removal
2023-11-06 19:45:56 9.6 STATUS Using a sample quantile threshold of 0.5
2023-11-06 19:45:58 8.1 STATUS Removed 3395595 site(s) with too many missing values
2023-11-06 19:46:05 8.1 STATUS Saved removed sites to /mnt/e/Repositories/Jonah/SFT_MethylAnalysis/reports/preprocessing_data/removed_sites_na.csv
2023-11-06 19:46:08 9.4 STATUS Added a corresponding section to the report
2023-11-06 19:46:08 9.4 STATUS COMPLETED Missing Value Removal
2023-11-06 19:46:09 9.4 STATUS Retained 3 samples and 2689571 sites
2023-11-06 19:46:09 9.4 STATUS COMPLETED Filtering Procedures
2023-11-06 19:46:09 9.4 STATUS STARTED Summary of Filtering Procedures
2023-11-06 19:46:10 9.1 STATUS Created summary table of removed sites, samples and unreliable measurements
2023-11-06 19:46:10 9.1 STATUS Added summary table of removed and retained items
2023-11-06 19:46:11 9.1 INFO Subsampling 2000000 sites for plotting density distributions
2023-11-06 19:46:11 9.1 STATUS Constructed sequences of removed and retained methylation values
2023-11-06 19:46:14 9.1 STATUS Added comparison between removed and retained beta values
2023-11-06 19:46:14 9.1 STATUS COMPLETED Summary of Filtering Procedures
2023-11-06 19:46:14 9.1 STATUS STARTED Manipulating the object
2023-11-06 19:46:15 9.2 STATUS Updated NA masking
2023-11-06 19:46:29 9.3 STATUS Removed 3602706 sites (probes)
2023-11-06 19:46:29 9.3 INFO Retained 2689571 sites and 3 samples
2023-11-06 19:46:29 9.3 STATUS COMPLETED Manipulating the object
2023-11-06 19:46:29 9.3 INFO Imputation was skipped, data set may still contain missing methylation values
2023-11-06 19:46:29 9.3 STATUS COMPLETED Preprocessing
2023-11-06 19:46:32 9.3 INFO Initialized report index and saved to index.html
2023-11-06 19:46:32 9.3 STATUS STARTED Exploratory Analysis
2023-11-06 19:46:32 9.3 INFO Number of cores: 1
2023-11-06 19:46:33 9.3 STATUS Designed color mappings for probe type and CGI status
2023-11-06 19:47:20 9.0 STATUS STARTED Dimension Reduction Techniques
2023-11-06 19:47:20 9.0 WARNING Skipped due to too few samples
2023-11-06 19:47:20 9.0 STATUS COMPLETED Dimension Reduction Techniques
2023-11-06 19:47:24 9.5 STATUS STARTED Methylation Value Distributions - Sample Groups
2023-11-06 19:47:24 9.5 INFO processing beta_density_samples_1_1
2023-11-06 19:47:24 9.5 INFO Density estimation ( all samples--sites ): Groupwise retained observations after missing value removal: all:7062610/8068713
2023-11-06 19:47:24 9.5 INFO Density estimation ( all samples--sites ): Groupwise retained observations after subsampling: all:1000000/7062610
2023-11-06 19:47:26 9.5 INFO processing beta_density_samples_1_2
2023-11-06 19:47:26 9.5 INFO Density estimation ( all samples--tiling ): Groupwise retained observations after missing value removal: all:316269/348171
2023-11-06 19:47:26 9.5 INFO processing beta_density_samples_1_3
2023-11-06 19:47:26 9.5 INFO Density estimation ( all samples--genes ): Groupwise retained observations after missing value removal: all:87234/90342
2023-11-06 19:47:27 9.5 INFO processing beta_density_samples_1_4
2023-11-06 19:47:27 9.5 INFO Density estimation ( all samples--promoters ): Groupwise retained observations after missing value removal: all:95853/100704
2023-11-06 19:47:28 9.5 INFO processing beta_density_samples_1_5
2023-11-06 19:47:28 9.5 INFO Density estimation ( all samples--cpgislands ): Groupwise retained observations after missing value removal: all:76014/76956
2023-11-06 19:47:28 9.5 STATUS COMPLETED Methylation Value Distributions - Sample Groups
2023-11-06 19:47:28 9.5 STATUS STARTED Methylation Value Distributions - Site Categories
2023-11-06 19:47:28 9.5 INFO Density estimation ( CGI Relation--all samples ): Groupwise retained observations after missing value removal: Open Sea:2653048/3173163; Shelf:97996/116853; Shore:136662/163152; Island:4174904/4615545
2023-11-06 19:47:29 9.5 INFO Density estimation ( CGI Relation--all samples ): Groupwise retained observations after subsampling: Open Sea:1502588/2653048; Shelf:55501/97996; Shore:77400/136662; Island:2364511/4174904
2023-11-06 19:47:34 9.6 STATUS COMPLETED Methylation Value Distributions - Site Categories
2023-11-06 19:47:34 9.6 STATUS STARTED Sample Clustering
2023-11-06 19:47:34 9.6 STATUS STARTED Agglomerative Hierarchical Clustering
2023-11-06 19:47:34 9.6 STATUS Performed clustering on sites using correlation as a distance metric
2023-11-06 19:47:34 9.6 STATUS Performed clustering on sites using manhattan as a distance metric
2023-11-06 19:47:35 9.6 STATUS Performed clustering on sites using euclidean as a distance metric
2023-11-06 19:47:35 9.6 STATUS Performed clustering on tiling using correlation as a distance metric
2023-11-06 19:47:35 9.6 STATUS Performed clustering on tiling using manhattan as a distance metric
2023-11-06 19:47:35 9.6 STATUS Performed clustering on tiling using euclidean as a distance metric
2023-11-06 19:47:35 9.6 STATUS Performed clustering on genes using correlation as a distance metric
2023-11-06 19:47:35 9.6 STATUS Performed clustering on genes using manhattan as a distance metric
2023-11-06 19:47:35 9.6 STATUS Performed clustering on genes using euclidean as a distance metric
2023-11-06 19:47:35 9.6 STATUS Performed clustering on promoters using correlation as a distance metric
2023-11-06 19:47:35 9.6 STATUS Performed clustering on promoters using manhattan as a distance metric
2023-11-06 19:47:35 9.6 STATUS Performed clustering on promoters using euclidean as a distance metric
2023-11-06 19:47:35 9.6 STATUS Performed clustering on cpgislands using correlation as a distance metric
2023-11-06 19:47:35 9.6 STATUS Performed clustering on cpgislands using manhattan as a distance metric
2023-11-06 19:47:35 9.6 STATUS Performed clustering on cpgislands using euclidean as a distance metric
2023-11-06 19:47:35 9.6 STATUS COMPLETED Agglomerative Hierarchical Clustering
2023-11-06 19:47:35 9.6 STATUS STARTED Clustering Section
2023-11-06 19:47:35 9.6 STATUS STARTED Generating Heatmaps
2023-11-06 19:47:35 9.6 STATUS STARTED Region type: sites
2023-11-06 19:48:06 9.8 STATUS COMPLETED Region type: sites
2023-11-06 19:48:06 9.8 STATUS STARTED Region type: tiling
2023-11-06 19:48:35 9.8 STATUS COMPLETED Region type: tiling
2023-11-06 19:48:35 9.8 STATUS STARTED Region type: genes
2023-11-06 19:49:05 9.8 STATUS COMPLETED Region type: genes
2023-11-06 19:49:05 9.8 STATUS STARTED Region type: promoters
2023-11-06 19:49:35 9.8 STATUS COMPLETED Region type: promoters
2023-11-06 19:49:35 9.8 STATUS STARTED Region type: cpgislands
2023-11-06 19:50:05 9.2 STATUS COMPLETED Region type: cpgislands
2023-11-06 19:50:05 9.2 STATUS Created 135 heatmaps based on the clustering results
2023-11-06 19:50:05 9.2 STATUS COMPLETED Generating Heatmaps
2023-11-06 19:50:05 9.2 STATUS STARTED Adding Color Legends
2023-11-06 19:50:08 9.2 STATUS COMPLETED Adding Color Legends
2023-11-06 19:50:08 9.2 STATUS STARTED Estimating Optimal Numbers of Clusters

and this is where the error occurs:

Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
In addition: There were 31 warnings (use warnings() to see them).

My run command looks like this:

rnb.run.analysis(dir.reports=report.dir, data.source=data_source, data.type="bs.bed.dir", initialize.reports = TRUE, save.rdata = TRUE)

and my data source looks like this:

str(data_source)
List of 2
$ : chr "/mnt/e/Repositories/Jonah/SFT_nextflow/methyl_results/bismark/methylation_calls/methylation_coverage/gunzipped"
$ :'data.frame': 3 obs. of 4 variables:
..$ SampleName : chr [1:3] "10303-AM-0001_S1_L005" "10303-AM-0048_S1_L005" "10303-AM-0125_S1_L005"
..$ FilePath : chr [1:3] "10303-AM-0001_S1_L005_1_val_1_bismark_bt2_pe.deduplicated.bismark.cov" "10303-AM-0048_S1_L005_1_val_1_bismark_bt2_pe.deduplicated.bismark.cov" "10303-AM-0125_S1_L005_1_val_1_bismark_bt2_pe.deduplicated.bismark.cov"
..$ group : chr [1:3] "SFT" "SFT" "MEN"
..$ group_Level2: chr [1:3] "iSFT" "eSFT" "MEN"

It runs for a while and writes a fair amount of output to the reports folder, but the analysis does not finish. What am I doing wrong here? Thanks so much for any input.
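
For reference, both messages can be reproduced in plain base R, independent of RnBeads. The sketch below only illustrates what the messages mean in general; the toy matrix and vectors are made up for the example, and it does not show where inside RnBeads the failure happens.

```r
# Base R only; this does not call RnBeads.

# 1) Assigning dimnames whose length does not match the matrix extent.
m <- matrix(1:6, nrow = 3, ncol = 2)
dim_error <- tryCatch(
  {
    dimnames(m) <- list(NULL, c("a", "b", "c"))  # 3 column names for 2 columns
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(dim_error)

# 2) Correlating a constant vector, whose standard deviation is zero.
sd_warning <- tryCatch(
  cor(c(0.5, 0.5, 0.5), c(0.1, 0.4, 0.9)),
  warning = function(w) conditionMessage(w)
)
print(sd_warning)
```

With only a handful of samples and many missing values, constant vectors like the one in step 2 become much more likely, which would be consistent with the repeated cor() warnings.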

@mariacastillo982
Author

mariacastillo982 commented Nov 7, 2023 via email

@jcdaneshmand

Thank you @mariacastillo982 ! That turned out to be my problem as well and makes total sense. I tested the pipeline with 9 samples instead and it worked. Cheers!
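
For anyone hitting the same error with a very small cohort, a minimal pre-flight check might look like the sketch below. `check_sample_count` and `min_samples` are hypothetical names, not part of RnBeads, and the value 9 is only the sample count that worked in the test above, not a documented minimum.

```r
# Hypothetical helper, not part of RnBeads: stop early when the sample
# annotation has fewer rows than a threshold you choose.
check_sample_count <- function(annotation, min_samples) {
  n <- nrow(annotation)
  if (n < min_samples) {
    stop(sprintf("Only %d samples found; at least %d requested.", n, min_samples))
  }
  invisible(n)
}

# Toy annotation mirroring the three-sample table shown earlier in this thread.
annotation <- data.frame(
  SampleName = c("10303-AM-0001_S1_L005", "10303-AM-0048_S1_L005", "10303-AM-0125_S1_L005"),
  group = c("SFT", "SFT", "MEN"),
  stringsAsFactors = FALSE
)

# min_samples = 9 is an assumption taken from the run that worked above.
result <- tryCatch(
  check_sample_count(annotation, min_samples = 9),
  error = function(e) conditionMessage(e)
)
print(result)
```

Calling the check before rnb.run.analysis fails fast, instead of after several minutes of loading and preprocessing.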

@mariacastillo982
Author

mariacastillo982 commented Nov 12, 2023 via email
