Disease Transcriptomic Similarity Network

Disease Transcriptomic Similarity Network by pathway category

Pathways associated with comorbid diseases in women and men

Diseases connected by drugs in women and men

Methods:

Network construction:

Transcriptional similarities between diseases were calculated from differential expression values (logFC) on three gene selections: the complete list of annotated genes, the union of the annotated significantly differentially expressed genes (sDEGs), and their intersection. Six similarity metrics were calculated: Pearson’s and Spearman’s correlation coefficients, cosine similarity, and Euclidean, Canberra, and Manhattan distances. For the cosine similarity and the Euclidean, Canberra, and Manhattan distances, empirical p-values were calculated through 10,000 permutations. In brief, for each gene selection (complete list, union of sDEGs, and intersection of sDEGs), the logFC values were shuffled and the similarities between diseases were recalculated. We corrected for multiple testing using the Bonferroni approach and considered as significant those similarities with a corrected p-value <= 0.05. In the case of the Euclidean, Canberra, and Manhattan distances, the mean of the random distances was compared with the actual distance, such that positive (negative) values indicate a greater (lesser) similarity than expected by chance. The similarity values (obtained from the comparison between real and random distances in the case of the Euclidean, Canberra, and Manhattan distances, and from the coefficients in the case of the Pearson and Spearman correlations and the cosine similarity) were binarized, converting values greater than 0 to +1 and values less than 0 to -1.
Going one step further, we generated disease networks with the metrics mentioned above for each comparison (sex), separately using the genes of each Reactome category (29 in total) and the genes associated with mitochondrial processes (extracted from MitoCarta [PMID:33174596]). In this way, we generated a multilayer network of diseases, where each layer represents the similarity between diseases based on one of the Reactome categories or on the mitochondrial genes.
Thus, in each generated network, nodes represent diseases, and edges represent similarities between them based on a given metric and gene list.
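
The permutation step for the distance-based metrics can be illustrated with a minimal Python sketch. It is not the code used in this work: the function names (permutation_score, binarize, bonferroni), the choice to shuffle only one of the two logFC vectors, and the one-sided definition of the empirical p-value are assumptions made for illustration.

```python
import random
from statistics import mean


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length logFC vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def permutation_score(logfc_a, logfc_b, distance=manhattan, n_perm=10_000, seed=0):
    """Compare the observed distance with distances obtained after shuffling.

    Returns (score, p_value), where score = mean(random distances) - observed,
    so a positive score means the two profiles are closer than expected by chance.
    Assumption: only the second vector is shuffled, and the p-value is one-sided
    in the direction given by the sign of the score.
    """
    rng = random.Random(seed)
    observed = distance(logfc_a, logfc_b)
    shuffled = list(logfc_b)
    random_distances = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        random_distances.append(distance(logfc_a, shuffled))
    score = mean(random_distances) - observed
    if score >= 0:
        extreme = sum(d <= observed for d in random_distances)
    else:
        extreme = sum(d >= observed for d in random_distances)
    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return score, p_value


def binarize(score):
    """Map a similarity value to +1, -1, or 0 (exactly zero)."""
    return (score > 0) - (score < 0)


def bonferroni(p_values):
    """Bonferroni correction: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

In this sketch, an edge between two diseases would be kept when its corrected p-value is at most 0.05, and its sign would be given by binarize(score). Other distance functions (Euclidean, Canberra) can be passed through the distance argument.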

Overlap with epidemiology:

To assess the extent to which the disease transcriptomic similarity networks (DTSNs), built from similarities between differential expression profiles, recover and can explain known comorbidity relationships, we used a previously published epidemiological network [PMID:30737381] (Supplementary Note 1). The overlap between networks was computed on the shared set of diseases (those present in both the DTSNs and the epidemiological networks). The overlaps of positive and negative transcriptomic similarities with the epidemiological networks were analyzed separately. Overlaps were measured by sex (women vs. women, men vs. men, and adjusted vs. adjusted). The significance of each overlap was assessed by Fisher’s exact tests and by randomizations (generating 10,000 random networks by shuffling the edges of the DTSNs while maintaining the degree distribution).
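
A minimal Python sketch of the two significance tests is given below. It is an illustration under stated assumptions rather than the implementation used here: networks are treated as undirected simple graphs, the Fisher test is one-sided over all unordered pairs of shared diseases, and the degree-preserving randomization is done with double edge swaps. All function names are hypothetical.

```python
import random
from math import comb


def fisher_overlap_p(edges_a, edges_b, nodes):
    """One-sided Fisher's exact test (hypergeometric tail) for edge overlap.

    The universe is every unordered pair of the shared nodes.
    Returns (overlap size, P(overlap >= observed)).
    """
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    n = len(nodes)
    total = n * (n - 1) // 2
    k = len(a & b)
    tail = sum(
        comb(len(a), i) * comb(total - len(a), len(b) - i)
        for i in range(k, min(len(a), len(b)) + 1)
    )
    return k, tail / comb(total, len(b))


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Randomize an undirected simple graph with double edge swaps.

    Each swap replaces (u, v), (x, y) with (u, x), (v, y), which keeps
    the degree of every node unchanged.
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (u, v), (x, y) = edges[i], edges[j]
        if rng.random() < 0.5:
            x, y = y, x
        if u == x or v == y:
            continue  # would create a self-loop
        new1, new2 = frozenset((u, x)), frozenset((v, y))
        if new1 in present or new2 in present:
            continue  # would create a duplicate edge
        present -= {frozenset(edges[i]), frozenset(edges[j])}
        present |= {new1, new2}
        edges[i], edges[j] = (u, x), (v, y)
        done += 1
    return edges


def randomization_p(dtsn_edges, epi_edges, n_random=10_000, seed=0):
    """Empirical p-value: share of shuffled DTSNs overlapping at least as much."""
    rng = random.Random(seed)
    epi = {frozenset(e) for e in epi_edges}
    observed = sum(frozenset(e) in epi for e in dtsn_edges)
    hits = 0
    for _ in range(n_random):
        shuffled = degree_preserving_shuffle(dtsn_edges, 10 * len(dtsn_edges), rng)
        hits += sum(frozenset(e) in epi for e in shuffled) >= observed
    return observed, (hits + 1) / (n_random + 1)
```

Both functions would be called once per sex and once per edge sign (positive or negative similarities), after restricting both networks to the shared set of diseases.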

Disease-drug associations:

To study the potential sex-specific role of drugs in comorbidities, we retrieved drug targets from DrugBank [PMID:29126136]. Since the number of targets per drug is relatively small for enrichment analyses, we expanded the set of targets associated with each drug using the protein-protein interaction network extracted from IID [PMID:34755877], keeping only experimentally verified human protein-protein interactions: the targets of each drug were mapped onto the network and their first neighbors were selected. We then conducted a gene set enrichment analysis (GSEA) [PMID:16199517] to associate drugs targeting the products of up- or down-regulated genes with the corresponding disease, separately by sex. Disease-drug associations were extracted from the SIDER database [PMID:26481350]. Disease names were transformed into ICD10 codes using the Unified Medical Language System [PMID:14681409], and DrugBank IDs were mapped to drug names.
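
The target expansion and the enrichment step can be sketched as follows. This is a simplified illustration, not the pipeline used here: the identifiers (drugX, P1, ...) are placeholders, the expanded set is assumed to include the original targets, and enrichment_score is the unweighted running-sum variant of the GSEA statistic without the permutation-based significance step.

```python
from collections import defaultdict


def build_adjacency(interactions):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adj = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-interactions
            adj[a].add(b)
            adj[b].add(a)
    return adj


def expand_targets(drug_targets, adj):
    """Add the first neighbors of each target to the drug's target set.

    Assumption: the original targets are kept in the expanded set, and
    targets absent from the network are kept without neighbors.
    """
    expanded = {}
    for drug, targets in drug_targets.items():
        genes = set(targets)
        for target in targets:
            genes |= adj.get(target, set())
        expanded[drug] = genes
    return expanded


def enrichment_score(ranked_genes, gene_set):
    """Unweighted GSEA-style running-sum statistic.

    Walks down the ranked list (e.g. genes sorted by logFC), stepping up
    at genes in the set and down otherwise; returns the maximum deviation
    from zero, with its sign.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy usage with placeholder identifiers.
ppi = build_adjacency([("P1", "P2"), ("P2", "P3"), ("P4", "P5")])
drug_sets = expand_targets({"drugX": {"P1"}, "drugY": {"P4", "P9"}}, ppi)
ranked = ["P1", "P2", "P3", "P4", "P5", "P6"]  # most up- to most down-regulated
score_x = enrichment_score(ranked, drug_sets["drugX"])
```

A positive score indicates that the expanded targets of a drug concentrate among the up-regulated genes of a disease, and a negative score that they concentrate among the down-regulated genes.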