- Welcome to TCC-GUI
- Data Simulation
- Exploratory Analysis
- TCC Computation
- MA Plot
- Volcano Plot
- Heatmap
- Expression Level Plot
- Analysis Report
🤔What’s TCC?
TCC^[1] is an R/Bioconductor package that provides a series of functions for differential expression (DE) analysis of RNA-seq count data, using a robust normalization strategy called DEGES.
The basic idea of DEGES (DEG elimination strategy) is that potential differentially expressed genes (DEGs) among the compared samples should be removed before data normalization, in order to obtain a well-ranked gene list in which true DEGs are top-ranked and non-DEGs are bottom-ranked. This is done with the multi-step normalization procedure based on DEGES that is implemented in TCC.
TCC internally uses functions provided by edgeR^[2], DESeq^[3], DESeq2^[4], and baySeq^[5]. The multi-step normalization in TCC can be carried out with functions from these four packages.
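The DEG elimination idea can be illustrated with a deliberately simplified sketch. It is not TCC's implementation: it swaps in total-count scaling for the normalization step and a plain log-ratio ranking for the DEG identification step (TCC relies on the packages listed above for both), and the function names and toy data are hypothetical.

```python
import math

def size_factors(counts, keep):
    """Per-sample total count over the genes whose `keep` flag is True."""
    n_samples = len(counts[0])
    return [sum(row[j] for row, k in zip(counts, keep) if k)
            for j in range(n_samples)]

def deges_like_factors(counts, pdeg):
    """One round of a simplified DEG-elimination normalization (two samples).

    Assumes every count is positive so that the log ratio is defined.
    """
    n_genes = len(counts)
    # Step 1: provisional normalization using every gene.
    f = size_factors(counts, [True] * n_genes)
    # Step 2: rank genes by absolute normalized log2 ratio and flag the
    # top `pdeg` fraction as potential DEGs.
    m = [abs(math.log2((row[1] / f[1]) / (row[0] / f[0]))) for row in counts]
    n_flag = round(n_genes * pdeg)
    ranked = sorted(range(n_genes), key=lambda g: m[g], reverse=True)
    flagged = set(ranked[:n_flag])
    # Step 3: recompute the factors from the remaining (presumed non-DEG) genes.
    keep = [g not in flagged for g in range(n_genes)]
    return f, size_factors(counts, keep)

# Toy data: two genes are 10x higher in the second sample, eight are unchanged.
counts = [[100, 1000], [100, 1000]] + [[100, 100] for _ in range(8)]
naive, refined = deges_like_factors(counts, pdeg=0.2)
print(naive)    # [1000, 2800]
print(refined)  # [800, 800]
```

In this toy case the two strongly changed genes inflate the total of the second sample, so the provisional factors differ even though the eight unchanged genes have identical counts. Once the flagged genes are removed, the factors become equal.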
🔬TCC-GUI: Graphical User Interface for TCC package
In this GUI version of TCC (TCC-GUI), all parameter settings are available, just as in the original package. In addition, it provides many plotting functions that the original package does not currently support.
🛠Function
- Generation of simulation data.
- Dataset summarization and sample distribution plot for sample quality control.
- Detection of differentially expressed genes (DEGs).
- Interactive visualization of MA plot, volcano plot, expression level plot, and so on.
- PCA and heatmap analysis (clustering included).
- Output of results as a table, figure, code, or report (.md, .pdf) (under development).
Please check the other tabs in Guidance for details.
📧Contact
If you have any questions, comments, or advice about the application, please contact 📧suwei(at)bi.a.u-tokyo.ac.jp or 📧kadota(at)bi.a.u-tokyo.ac.jp.
You can also visit 🔗GitHub and open a new issue for a bug report or feature request (you can write in English, Chinese, or Japanese as you like).
📚References
[1] Sun J, Nishiyama T, Shimizu K, et al. TCC: an R package for comparing tag count data with robust normalization strategies. BMC bioinformatics, 2013, 14(1): 219.
[2] Robinson M D, McCarthy D J, Smyth G K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 2010, 26(1): 139-140.
[3] Anders S, Huber W. Differential expression analysis for sequence count data. Genome biology, 2010, 11(10): R106.
[4] Love M I, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology, 2014, 15(12): 550.
[5] Hardcastle T J, Kelly K A. baySeq: empirical Bayesian methods for identifying differential expression in sequence count data. BMC bioinformatics, 2010, 11(1): 422.
🔗Emoji icons supplied by EmojiOne
Simulation Data
The Simulation Data tab is the GUI version of the simulateReadCounts function, which generates simulation data based on various parameters.
You can set the random seed, the number of genes (Ngene), the proportion of DEGs (PDEG), and the number of groups (Ngroup). The tabs in the Group parameters panel change with the Ngroup value so that the settings stay consistent. The number of DEGs shown at the bottom of this panel and the Summary panel update in real time as the parameters change, which helps speed up the analysis.
Besides Ngene, PDEG, and Ngroup, the Summary panel also shows:
- PGi: The assignment of DEGs in group i (i = 1, 2… Ngroup).
- FCGi: Degree of fold-change in group i (i = 1, 2… Ngroup).
- NRGi: Number of replicates in group i (i = 1, 2… Ngroup).
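As a rough illustration of how the number of DEGs follows from these parameters, the sketch below assumes that PGi is the fraction of all DEGs assigned to group i and that counts are rounded to whole genes. The function name `deg_counts` is hypothetical and is not part of TCC or TCC-GUI.

```python
def deg_counts(ngene, pdeg, pg):
    """Return the total number of DEGs and the number assigned to each group.

    Assumption: each entry of `pg` is the fraction of all DEGs that is
    assigned to the corresponding group.
    """
    total_deg = round(ngene * pdeg)
    per_group = [round(ngene * pdeg * p) for p in pg]
    return total_deg, per_group

# Example: 10,000 genes, 20% DEGs, split 90% / 10% between two groups.
total, per_group = deg_counts(ngene=10000, pdeg=0.2, pg=[0.9, 0.1])
print(total, per_group)  # 2000 [1800, 200]
```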
After you click the Generate Simulation Data button, the dataset is generated within several seconds from the current parameters and displayed in the Simulation Data panel, with cells colored from white to dark blue as the expression values go from low to high. You can download the dataset to your local machine or click “Exploratory Analysis (Step 1)” to analyze it directly.
Steps for data import
- First, click the Data Import (Step 1) tab in the sidebar on the left of this page.
- At the top left, you can click [Import Data] to load data just for a test, or click the [Upload] tab to upload your own tab-delimited text file, such as hypodata.txt. Please make sure it is an original count data file. If you are uploading a large dataset (such as a file with 50,000 rows) to the online version of TCC-GUI, please wait until the file has been uploaded completely. In this case, the offline version is highly recommended. The data will be shown once it has loaded.
- After the dataset has loaded, enter your grouping in the [Group Selection] panel.
  - The first column is the sample name (the same as the column name in your input dataset). Only the columns listed here will be included in the analysis.
  - The second column is the group name (such as “control” or “sample”).
- Click the [Confirmed] button and wait a moment. [Summary of Data] and [Sample Distribution] will then show more information about your dataset. You can modify the plots and save them for study or publication.
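The input format and the effect of the group selection can be sketched as follows. This is only an illustration of the idea, not how TCC-GUI reads files. The sample names, gene IDs, and counts are made up, and the sketch assumes the first column of the file holds the gene ID.

```python
import csv
import io

# Hypothetical tab-delimited count table: the first column is the gene ID,
# the remaining columns are raw counts per sample.
raw = (
    "gene\tG1_rep1\tG1_rep2\tG2_rep1\tG2_rep2\n"
    "gene_1\t34\t45\t122\t16\n"
    "gene_2\t358\t388\t22\t36\n"
)

# Group selection as (sample name, group name) pairs. G2_rep2 is not listed,
# so that column is left out of the analysis.
group_selection = [
    ("G1_rep1", "control"),
    ("G1_rep2", "control"),
    ("G2_rep1", "sample"),
]

reader = csv.DictReader(io.StringIO(raw), delimiter="\t")

# Every listed sample name must match a column name in the dataset.
missing = [s for s, _ in group_selection if s not in reader.fieldnames]
if missing:
    raise ValueError(f"samples not found in the count table: {missing}")

# Keep only the listed columns, in the listed order.
counts = {row["gene"]: [int(row[s]) for s, _ in group_selection] for row in reader}
groups = [g for _, g in group_selection]

print(counts)  # {'gene_1': [34, 45, 122], 'gene_2': [358, 388, 22]}
print(groups)  # ['control', 'control', 'sample']
```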
Steps for calculation
- Click the [Calculation] tab in the sidebar on the left of this page.
- You can change any of the TCC calculation parameters or leave them at their defaults. Click the [Run TCC Calculation] button and wait for the calculation to finish. Depending on the size of your dataset, the method you have chosen, and the number of iterations, the calculation takes from several seconds to 2 minutes (WAD < voom < others).
- After the calculation, the [Result Table] will appear on the right of the page. The [Sample Distribution] before and after normalization will be drawn at the same time. You can also copy and save the R code of the TCC calculation (under the [TCC Parameters] panel) to study the code or to reproduce the same results on a local machine.
- Next, the Step3 and Step4 tabs will appear in the sidebar. Step3 is for data exploration, visualization, and analysis, while Step4 is for outputs. You can choose any of them for the next step of your analysis.
🤔What’s MA plot?
An MA plot is an application of a Bland–Altman plot for visual representation of genomic data. The plot visualizes the differences between measurements taken in two samples, by transforming the data onto M (log ratio) and A (mean average) scales, then plotting these values. Though originally applied in the context of two channel DNA microarray gene expression data, MA plots are also used to visualize high-throughput sequencing analysis (quote from 🔗wikipedia).
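The M and A transformation described above can be sketched in a few lines. The function name `ma_values` is hypothetical, the inputs stand for the expression of one gene in each of the two samples, and both are assumed to be positive. How TCC-GUI treats zero counts is not covered here.

```python
import math

def ma_values(x, y):
    """Return (M, A) for one gene measured as x in sample 1 and y in sample 2.

    M is the log2 ratio and A is the mean of the two log2 values.
    Both inputs must be positive.
    """
    m = math.log2(y) - math.log2(x)
    a = (math.log2(x) + math.log2(y)) / 2
    return m, a

# log2(4) = 2 and log2(16) = 4, so M = 2 and A = 3.
m, a = ma_values(4.0, 16.0)
print(m, a)  # 2.0 3.0
```

A gene with equal expression in both samples has M = 0, so unchanged genes scatter around the horizontal axis of the plot.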
Steps for plotting
- Change the parameters or leave them at their defaults, then click the [Generate MA-Plot] button. The MA plot will appear in the middle of the page.
- To check the information for a specific point (transcript or gene), hover your cursor over the point. Additional information will be displayed, and an expression level plot will also be shown on the right side of the page.
- To mark genes on the plot, click the rows for those genes in the Result Table panel, then click the [Generate MA-Plot] button again to refresh the plot.
- On the left of the page, the FDR vs DEGs panel lists a range of FDR cutoffs and the number of DEGs at each cutoff.
- On the right of the page, the R code for the MA plot is also provided.
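The kind of table shown in the FDR vs DEGs panel can be approximated as below. This is a sketch under the assumption that a gene counts as a DEG when its q-value is strictly below the cutoff. The q-values and the function name `degs_per_cutoff` are made up for illustration.

```python
def degs_per_cutoff(q_values, cutoffs):
    """Count genes whose q-value falls below each FDR cutoff."""
    return {c: sum(q < c for q in q_values) for c in cutoffs}

# Hypothetical q-values for seven genes.
q_values = [0.001, 0.004, 0.03, 0.08, 0.2, 0.6, 0.9]
table = degs_per_cutoff(q_values, [0.01, 0.05, 0.10])
print(table)  # {0.01: 2, 0.05: 3, 0.1: 4}
```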
🤔What’s Volcano Plot?
In statistics, a volcano plot is a type of scatter-plot that is used to quickly identify changes in large data sets composed of replicate data. It plots significance versus fold-change on the y and x axes, respectively. These plots are increasingly common in omic experiments such as genomics, proteomics, and metabolomics where one often has a list of many thousands of replicate data points between two conditions and one wishes to quickly identify the most meaningful changes (quote from 🔗wikipedia).
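A minimal sketch of the coordinates of one point on such a plot, assuming the common convention of log2 fold-change on the x axis and -log10 of the p-value on the y axis. The exact axes used by TCC-GUI may differ, and the function name `volcano_point` is hypothetical.

```python
import math

def volcano_point(mean_a, mean_b, p_value):
    """Return (x, y) for one gene: log2 fold-change and -log10 p-value.

    Both means and the p-value must be positive.
    """
    x = math.log2(mean_b / mean_a)
    y = -math.log10(p_value)
    return x, y

# An 8-fold increase with p = 0.001 lands at roughly (3, 3).
x, y = volcano_point(10.0, 80.0, 0.001)
print(round(x, 6), round(y, 6))  # 3.0 3.0
```

Points far from x = 0 and high on the y axis are the ones that are both strongly changed and statistically significant.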
Steps for plotting
The steps are almost the same as for [MA Plot]. Please see the MA Plot documentation.