13 Differential Expression

The rnaseqDRaMA workflow relies on the edgeR package (Chen et al. 2019) to perform differential expression analysis. Typically, genes with low expression levels (genes that do not reach 3 counts per million (CPM) in at least one group) are filtered out of all downstream analyses. The Benjamini-Hochberg false discovery rate (FDR) procedure is used to correct for multiple testing. Genes with an FDR-corrected P-value less than 0.05 and an absolute log2 fold change greater than 1 can be considered differentially expressed.
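The low-expression filter and the multiple-testing correction described above can be sketched in plain Python. This is a minimal illustration, not the workflow's edgeR code. CPM is computed from raw library sizes, without the TMM normalization factors edgeR normally applies. The rule "reaches 3 CPM in at least one group" is interpreted here as "the group's mean CPM is at least 3", which is an assumption. The Benjamini-Hochberg step follows the standard step-up definition (the same one used by R's p.adjust with method = "BH"). The Python function names are hypothetical.

```python
from typing import List, Sequence


def cpm(counts: Sequence[Sequence[float]]) -> List[List[float]]:
    """Counts per million: scale each sample (column) by its total library size.

    counts[g][s] is the raw read count of gene g in sample s.
    Unlike edgeR, no TMM normalization factors are applied here.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[s] for row in counts) for s in range(n_samples)]
    return [[row[s] / lib_sizes[s] * 1e6 for s in range(n_samples)] for row in counts]


def keep_expressed(counts, groups, min_cpm=3.0):
    """Indices of genes whose mean CPM reaches min_cpm in at least one group.

    ASSUMPTION: "expression in a group" is taken to be the mean CPM of the
    group's samples; the workflow's exact rule may differ.
    """
    values = cpm(counts)
    members = {lab: [s for s, g in enumerate(groups) if g == lab] for lab in set(groups)}
    keep = []
    for gene, row in enumerate(values):
        if any(sum(row[s] for s in idx) / len(idx) >= min_cpm for idx in members.values()):
            keep.append(gene)
    return keep


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values (FDR), capped at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, keeping a running minimum of p * m / rank
    # so that adjusted values never decrease as the raw P-value increases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up counts: 4 samples, each with a library size of exactly 1,000,000 reads.
    counts = [
        [999988, 999988, 999998, 999998],  # highly expressed
        [0, 0, 0, 0],                      # never detected
        [10, 10, 0, 0],                    # 10 CPM, TNF samples only
        [2, 2, 2, 2],                      # 2 CPM everywhere, below the cutoff
    ]
    groups = ["TNF", "TNF", "FBS", "FBS"]
    print(keep_expressed(counts, groups))      # [0, 2]
    print(bh_adjust([0.01, 0.04, 0.03, 0.5]))  # approximately [0.04, 0.0533, 0.0533, 0.5]
```

Requiring expression in only one group (rather than in every group) keeps genes that are switched on or off by a treatment, which are often the most interesting ones.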

13.1 Volcano Plot

rnaseqDRaMA uses a volcano plot to visualize the results of differential expression analysis.

A volcano plot combines a measure of statistical significance (P-value or FDR) with the magnitude of the change between the compared samples (logFC). The X-axis shows the log2 fold change, i.e. the difference of group means on a log scale, for the comparison named in the plot's title; the comparison can be changed via the Select Contrast control menu. The Y-axis shows the negative base-10 logarithm of the P-value or FDR for this comparison, so more significant genes appear higher on the plot. Thus, genes in the right half of the plot (positive fold change) are expressed at a higher level in the “first” condition, whereas genes in the left half (negative fold change) are expressed at a higher level in the “second” condition (Figure 13.1).
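To make the axis definitions concrete, the sketch below turns per-gene logFC and P-value (or FDR) values into volcano plot coordinates and records which condition each gene is higher in. It is not rnaseqDRaMA's plotting code: the function volcano_points, the P_FLOOR clamp used to avoid taking the log of zero, and the TNF/FBS example values are illustrative assumptions.

```python
import math
from typing import List, NamedTuple, Sequence

# Smallest P-value used before taking the log, so that P = 0 does not raise an error.
P_FLOOR = 1e-300


class VolcanoPoint(NamedTuple):
    gene: str
    x: float          # log2 fold change, "first" vs "second" condition
    y: float          # -log10 of the P-value or FDR
    higher_in: str    # condition with the higher average expression


def volcano_points(genes: Sequence[str], logfc: Sequence[float],
                   pvalues: Sequence[float], first: str, second: str) -> List[VolcanoPoint]:
    points = []
    for gene, fc, p in zip(genes, logfc, pvalues):
        y = -math.log10(max(p, P_FLOOR))
        if fc > 0:
            higher = first       # right half of the plot
        elif fc < 0:
            higher = second      # left half of the plot
        else:
            higher = "neither"   # exactly on the vertical axis
        points.append(VolcanoPoint(gene, fc, y, higher))
    return points


if __name__ == "__main__":
    # Made-up values for illustration only.
    for pt in volcano_points(["Tnf", "Col1a1"], [2.0, -1.5], [0.001, 0.01], "TNF", "FBS"):
        print(pt)
```

A P-value of 0.001 lands at y = 3 and a P-value of 0.01 at y = 2, so each step up the Y-axis corresponds to a tenfold smaller P-value.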

Each circle on the volcano plot represents a single gene. Hovering over a circle displays a tooltip with the gene name, gene type, and average expression value across all samples (Average CPM).

Figure 13.1: Volcano plot showing gene expression in TNF versus FBS-treated cells

13.2 Table of Differential Expression Statistics

Below the volcano plot, rnaseqDRaMA displays a table summarizing differential expression statistics for all contrasts. This table provides the same search and filtering capabilities as the gene expression table described in the Gene Expression chapter (see Gene Expression Table).

Figure 13.2: Gene-wise summary of differential expression statistics

In addition to the gene_id and gene_name columns, which are the same as in the Gene Expression Table (Section 12.1), several additional columns are available:

gene_type: Describes the biotype of the gene. These values are taken from the mandatory fields in the attribute column of the GENCODE/Ensembl/RefSeq GTF/GFF annotation file.

tag: Provides additional details regarding gene function and annotation. This value is taken from the optional tag field in the GENCODE annotation file.

seqnames: Typically a chromosome or a scaffold name in the relevant genome assembly.

width: Length of the gene between the transcription start site (TSS) and the transcription termination site.

LogCPM: Average log-transformed CPM across all samples; a good indication of the average expression level of the gene.

CON1vsCON2.logFC: Log2 fold change for the comparison of CON1 and CON2. When positive, the average expression of the gene in CON1 is greater than its average expression in CON2. Depending on the model, CON1 and CON2 can be complex contrasts (e.g. interactions) rather than simple one-to-one group comparisons.

CON1vsCON2.PValue: Unadjusted P-value of the relevant test for equality of expression in the comparison of CON1 and CON2. Because thousands of genes are tested at once, unadjusted P-values should not be reported in publications of RNA-seq data.

CON1vsCON2.FDR: P-value corrected for multiple testing using the Benjamini–Hochberg method.

The last three columns are contrast-specific and repeated for each contrast in the experiment.
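Because the per-contrast columns follow the CONTRAST.logFC / CONTRAST.PValue / CONTRAST.FDR pattern, an exported table can be taken apart programmatically. The sketch below assumes the table has already been read into a list of column names and per-row dictionaries (for example with Python's csv.DictReader); the export file format, the helper names, and the contrast name "TNFvsFBS" are assumptions for illustration.

```python
from typing import Dict, List, Mapping, Sequence

# Per-contrast column suffixes, following the CON1vsCON2.<statistic> pattern.
STAT_SUFFIXES = (".logFC", ".PValue", ".FDR")


def contrasts_in(columns: Sequence[str]) -> List[str]:
    """Contrast names that have all three per-contrast columns, in table order."""
    names = []
    for col in columns:
        if col.endswith(".logFC"):
            name = col[: -len(".logFC")]
            if all(name + suffix in columns for suffix in STAT_SUFFIXES) and name not in names:
                names.append(name)
    return names


def contrast_stats(row: Mapping[str, str], contrast: str) -> Dict[str, float]:
    """Pull logFC, PValue and FDR for one contrast out of a wide table row."""
    return {suffix[1:]: float(row[contrast + suffix]) for suffix in STAT_SUFFIXES}


if __name__ == "__main__":
    # Hypothetical header and row; values are made up.
    cols = ["gene_id", "gene_name", "LogCPM",
            "TNFvsFBS.logFC", "TNFvsFBS.PValue", "TNFvsFBS.FDR"]
    row = {"TNFvsFBS.logFC": "2.5", "TNFvsFBS.PValue": "0.001", "TNFvsFBS.FDR": "0.01"}
    for name in contrasts_in(cols):
        print(name, contrast_stats(row, name))
```

Requiring all three suffixes guards against unrelated columns that happen to end in ".logFC".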

13.3 Data Selection and Filtering

rnaseqDRaMA provides multiple gene selection and filtering capabilities.

Box- and Lasso- selection: These methods are available through the plotly image control panel in the upper right corner of the volcano plot panel (see Controls for a detailed description of the plotly image control panel). To de-select genes, double-click anywhere within the plot area.

Significant gene selection: Available by checking the Select significant checkbox in the volcano plot control panel. "Significance" is defined by the combination of the FDR/P-Value Threshold and Log Fold Change Threshold settings in the volcano plot control panel. Selected "significant" genes are colored red (Figure 13.3). To de-select "significant" genes, uncheck the Select significant checkbox.

Figure 13.3: Significant gene selection
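The Select significant checkbox combines two thresholds. The sketch below reproduces that logic on rows of the differential expression table. It assumes strict inequalities and an absolute log fold change (so genes changed in either direction qualify); rnaseqDRaMA's exact comparison rules are not documented here, and the function name, default thresholds, and example rows are hypothetical.

```python
from typing import Iterable, List, Mapping


def select_significant(rows: Iterable[Mapping[str, str]], contrast: str,
                       stat: str = "FDR", stat_threshold: float = 0.05,
                       lfc_threshold: float = 1.0) -> List[str]:
    """Gene names passing both the FDR/P-value and the log fold change threshold.

    ASSUMPTION: strict inequalities and an absolute log fold change are used.
    """
    if stat not in ("FDR", "PValue"):
        raise ValueError("stat must be 'FDR' or 'PValue'")
    selected = []
    for row in rows:
        p = float(row[f"{contrast}.{stat}"])
        lfc = float(row[f"{contrast}.logFC"])
        if p < stat_threshold and abs(lfc) > lfc_threshold:
            selected.append(row["gene_name"])
    return selected


if __name__ == "__main__":
    # Made-up rows for a hypothetical TNFvsFBS contrast.
    rows = [
        {"gene_name": "Tnf", "TNFvsFBS.logFC": "3.0",
         "TNFvsFBS.PValue": "0.0001", "TNFvsFBS.FDR": "0.001"},
        {"gene_name": "Col1a1", "TNFvsFBS.logFC": "-1.5",
         "TNFvsFBS.PValue": "0.01", "TNFvsFBS.FDR": "0.08"},
    ]
    print(select_significant(rows, "TNFvsFBS"))                # ['Tnf']
    print(select_significant(rows, "TNFvsFBS", stat="PValue"))  # ['Tnf', 'Col1a1']
```

Switching the statistic from FDR to the unadjusted P-value admits more genes, which is why the FDR is the safer default for reporting.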

Select by Gene Name: Allows selection by entering HGNC symbols separated by spaces, commas, or semicolons. This search box also accepts wildcards. For examples, see the previous section. Genes selected by this method are highlighted in navy blue (Figure 13.4). To de-select genes selected by name, delete all gene names from the selection box and make an "empty" selection.

Figure 13.4: Gene selection by name. All Tnf* genes are shown
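The name-based selection can be approximated as "split the query on spaces, commas, or semicolons, then match each token as a wildcard pattern". The sketch below uses Python's fnmatch module for shell-style wildcards and matches case-insensitively; both choices are assumptions, since the exact wildcard syntax and case handling of the rnaseqDRaMA search box are not specified here. The function names and gene list are hypothetical.

```python
import fnmatch
import re
from typing import List, Sequence


def parse_gene_query(text: str) -> List[str]:
    """Split a query on runs of spaces, commas, or semicolons; drop empty tokens."""
    return [tok for tok in re.split(r"[\s,;]+", text.strip()) if tok]


def select_by_name(gene_names: Sequence[str], query: str) -> List[str]:
    """Genes matching any shell-style pattern in the query.

    ASSUMPTION: matching is case-insensitive. An empty query yields an
    empty selection, mirroring the "empty selection" used to de-select.
    """
    patterns = [p.lower() for p in parse_gene_query(query)]
    return [g for g in gene_names
            if any(fnmatch.fnmatchcase(g.lower(), p) for p in patterns)]


if __name__ == "__main__":
    genes = ["Tnf", "Tnfaip3", "Tnfrsf1a", "Il6", "Col1a1"]
    print(select_by_name(genes, "Tnf*"))          # ['Tnf', 'Tnfaip3', 'Tnfrsf1a']
    print(select_by_name(genes, "il6; Col1a1"))   # ['Il6', 'Col1a1']
```

Note that the pattern "Tnf*" also matches "Tnf" itself, because "*" can match an empty string.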

Select genes from the differential expression table: Clicking anywhere in the row of a gene of interest highlights this gene in the table and colors the corresponding gene light blue on the volcano plot. When selecting genes from the table, avoid clicking the link to the gene's Ensembl page. In the example in Figure 13.5, the differential expression table was searched for all genes containing Col in the gene name. The matching genes were first filtered to keep only those with P-values less than 0.05, and then only the collagens were selected manually.

Figure 13.5: Gene selection by table selection. All Col* genes are shown with P-values less than 0.05.

Multiple selection types can be combined on the same volcano plot. With the exception of Select significant, all selections persist across contrasts until they are reset, which makes it possible to follow how the selected genes change from one contrast to another.

With the exception of Box and Lasso selections (plotly selections), all selections can be reset at once using the Reset all Selections button in the control panel.

The volcano plot can be saved by pressing the Download plot (PDF) button. The table of selected differentially expressed genes can be saved by pressing the Download Selected Data button. Finally, once any type of selection has been performed, a Copy selected gene button appears. Pressing this button opens an internal clipboard window from which the selected gene names can be copied (Figure 13.6).

Figure 13.6: An internal clipboard window

References

Chen, Yunshun, Aaron TL Lun, Davis J McCarthy, Matthew E Ritchie, Belinda Phipson, Yifang Hu, Xiaobei Zhou, Mark D Robinson, and Gordon K Smyth. 2019. “edgeR: Empirical Analysis of Digital Gene Expression Data in R.” http://bioinf.wehi.edu.au/edgeR.