Make sure to upload proteinGroups.txt file before running ssGSEA
  • Single-sample GSEA (ssGSEA) is an extension of conventional Gene Set Enrichment Analysis (GSEA), developed by the Broad Institute [1].
  • ssGSEA version used: v4
  • MSigDB version used: v6.1

[1] Krug, K., et al., A Curated Resource for Phosphosite-specific Signature Analysis. Mol Cell Proteomics, 2019. 18(3): p. 576-593.


Eatomics is an R-Shiny based web application that enables interactive exploration of quantitative proteomics data generated by the MaxQuant software, specifically label-free quantification (LFQ) and intensity-based absolute quantification (iBAQ) values. Eatomics enables fast exploration of differential abundance and pathway analysis for researchers with limited bioinformatics knowledge. The application aids in quality control, visualization, differential abundance analysis and pathway analysis of quantitative proteomics data. Highlights of the application are an extensive experimental setup module, the data download and report generation features, and the multiple ways to interact with and customize the analysis.

1. Input files

Eatomics requires two file inputs:

  1. Demo_proteinGroups.txt: The proteinGroups.txt file (i.e. a tab-separated file) as generated by MaxQuant, the quantitative analysis software for raw mass spectrometry data. The file should contain at least the columns Protein IDs, Majority protein IDs, Gene names, the LFQ/iBAQ measurement columns, Reverse, Potential contaminant and Only identified by site. The latter three may be empty.

  2. Demo_clinicaldata.txt: The sample description file, a tab-separated text file that can be produced with any office program by saving the spreadsheet as .txt. The file needs to contain a column named “PatientID”, whose IDs match the sample IDs from the proteinGroups header (without the “LFQ intensity” or “iBAQ” prefixes), and one or more named columns with “parameters”, i.e. textual/factual/logical or continuous/integer values. Column names have to be unique.
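The matching rule between the two input files can be checked before uploading. The following is a minimal Python sketch, not part of Eatomics; the function names and the returned dictionary keys are hypothetical, and the prefix handling is simplified (it assumes measurement columns are named "LFQ intensity <sample>" or "iBAQ <sample>").

```python
import csv
import io

# Assumed measurement-column prefixes (simplified; other MaxQuant columns
# that happen to start with "iBAQ " are not treated specially here).
PREFIXES = ("LFQ intensity ", "iBAQ ")


def sample_ids_from_header(header):
    """Return the sample IDs found in measurement column names, prefix removed."""
    ids = []
    for name in header:
        for prefix in PREFIXES:
            if name.startswith(prefix):
                sample = name[len(prefix):]
                if sample not in ids:
                    ids.append(sample)
    return ids


def check_sample_sheet(protein_groups_text, clinical_text):
    """Compare PatientID values with the sample IDs in the proteinGroups header."""
    pg_header = next(csv.reader(io.StringIO(protein_groups_text), delimiter="\t"))
    clinical = csv.DictReader(io.StringIO(clinical_text), delimiter="\t")
    fields = clinical.fieldnames or []
    if "PatientID" not in fields:
        raise ValueError("sample description needs a 'PatientID' column")
    if len(set(fields)) != len(fields):
        raise ValueError("column names have to be unique")
    patient_ids = [row["PatientID"] for row in clinical]
    measured = sample_ids_from_header(pg_header)
    return {
        "missing_in_clinical": sorted(set(measured) - set(patient_ids)),
        "missing_in_proteinGroups": sorted(set(patient_ids) - set(measured)),
    }
```

Two empty lists in the result indicate that every measured sample has a description row and vice versa.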

Demo data can be accessed directly via the upload button if you are testing on our public server. For a local installation you may use your own data or the demo files in Eatomics/Data from the GitHub repository. The demo proteinGroups file is a shortened version of the data assessed and described in Chen et al. [4] and is accompanied by a sample description file that we prepared based on the publication's supplementary data.

2. Application walk-through

Eatomics functionality is structured into four tab panels:

  • Load and Prepare raw data on samples and MaxQuant output, as well as quality control.
  • Conduct differential abundance analysis.
  • Calculate enrichment scores per sample (ssGSEA).
  • Conduct differential enrichment (or pathway) analysis.

All tabs consist of a side panel to configure the analysis and a main panel for interactive analysis visualization.

Step 1: Load and Prepare

The first tab provides an overview of the data quality and enables filtering and preparation of the data for differential abundance and enrichment analysis.

Configuration panel

Within the side panel the user can load data and configure quality control options.

Load proteinGroups.txt input file

To begin the analysis, the user has to upload the MaxQuant file (e.g. proteinGroups.txt), as specified above. Once the file is fully uploaded, rows that were only found in the reverse database, that belong to potential contaminants or that have only been identified by site are filtered out automatically.
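The automatic row filter can be illustrated with a short Python sketch. It is not the Eatomics implementation; the function name is hypothetical, and rows are assumed to be dictionaries keyed by the proteinGroups column names, with a "+" marking a flagged entry.

```python
FLAG_COLUMNS = ("Reverse", "Potential contaminant", "Only identified by site")


def drop_flagged_rows(rows):
    """Keep only rows that carry no '+' in any of the three flag columns.

    Empty or missing flag columns are treated as 'not flagged'.
    """
    kept = []
    for row in rows:
        flagged = any((row.get(col) or "").strip() == "+" for col in FLAG_COLUMNS)
        if not flagged:
            kept.append(row)
    return kept
```

For example, a row with `"Reverse": "+"` is dropped, while a row whose three flag columns are empty is kept.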

Quality control and data cleansing

  • The user selects either LFQ or iBAQ as the intensity metric to be used in the subsequent differential abundance analysis. If available, we suggest using LFQ intensities, as Eatomics was optimized for these. Internally, the intensity widget uses the selectProteinData function. In the case of iBAQ values, Eatomics performs automatic normalization via limma’s normalizeVSN() function. proteinGroups entries (rows) with a “+” in the “Reverse”, “Potential contaminant” or “Only identified by site” columns are removed automatically.
  • The exclude column widget allows the user to exclude samples, for example if outliers are found during initial quality analysis such as PCA. Selecting a sample here removes that sample from all subsequent analysis steps.
  • To avoid proteins with many missing values across the samples, the user selects the minimum number of samples in which a protein must have been detected. Internally, the filter widget uses the filterProteins function.
  • Meaningful gene names: As gene names are easier to interpret than peptide identifiers, the gene names are displayed primarily. As gene names can be non-unique, the user can choose to let Eatomics
    • prepare unique IDs for duplicate gene names (make.unique() base R function), or
    • sum up multiple abundance values for one gene name (checkForIsoforms custom function). In the latter case, the user is informed about intensity shares. Within the data download, UniProt accession numbers and protein names (or any other identifiers supplied by the MaxQuant output) are given.
  • Missing value imputation can be performed using knn (k-nearest-neighbour) from the impute package [5], MinDet or QRILC from the imputeLCMD package [3], or a custom implementation of Perseus’ sampling from a down-shifted Gaussian distribution (implemented by Matthias Ziehm) with default parameters of width = 0.3 and shift = 1.8.
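The down-shifted Gaussian imputation mentioned in the last item can be sketched as follows. This is a Python illustration of the commonly described Perseus-style approach, not the Eatomics code: missing values are drawn from a normal distribution whose mean is the observed mean minus shift times the observed standard deviation, and whose standard deviation is width times the observed standard deviation. The function name, the use of None for missing values, the seed parameter and the application to a single sample's log-transformed intensities are assumptions.

```python
import random
import statistics


def impute_downshifted(values, width=0.3, shift=1.8, seed=0):
    """Replace None entries by draws from a narrowed, down-shifted Gaussian.

    values: log-transformed intensities of one sample, None = missing.
    The distribution is N(mean - shift * sd, (width * sd)^2), where mean and
    sd are computed from the observed (non-missing) values only.
    """
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values to estimate the spread")
    mu = statistics.mean(observed)
    sd = statistics.stdev(observed)
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [
        v if v is not None else rng.gauss(mu - shift * sd, width * sd)
        for v in values
    ]
```

Observed values are returned unchanged; only the missing entries are filled, and they land in the low-intensity tail of the sample's distribution, which reflects the assumption that values are missing because they fell below the detection limit.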

Load the sample description/clinical data file

Select and load the clinical data input file (e.g. clinicaldata.txt), as specified above.

Configuration panel to load input data and to prepare the data set for analysis.

Visualization panel

In the main panel (right) interactive visualizations are shown.

Principal component analysis

A common method of dimensionality reduction is principal component analysis (PCA). PCA calculates the axes of greatest variation (principal components) within the abundance data. A common assumption is that a plot along these axes will separate the samples/patients into the groups under investigation. The user can choose which principal components to visualize in the PCA and can color the samples based on the uploaded sample/clinical characteristics.

Distribution overview

The distribution overview gives an impression of the sample-wise distribution of all measured intensities.

Protein coverage

Protein coverage describes the count of distinct protein groups per sample.
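As a small illustration of this definition, the Python sketch below counts the detected protein groups per sample. It is not the Eatomics code; the function name and the input layout (a dictionary mapping each sample to its list of intensities) are hypothetical, and a value of None or 0 is assumed to mean "not detected".

```python
def protein_coverage(intensities):
    """Count detected protein groups per sample.

    intensities: dict mapping sample ID -> list of intensities, one entry
    per protein group; None or 0 is treated as 'not detected' (assumption).
    """
    return {
        sample: sum(1 for v in values if v is not None and v > 0)
        for sample, values in intensities.items()
    }
```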

Sample to sample heatmap

The sample-to-sample heatmap describes the biological and technical variability of the samples. The user can choose Euclidean distance or Pearson correlation as the (dis)similarity metric. The clusters that form should resemble the sample groups under investigation.
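The two metrics can be sketched in a few lines of Python. This is an illustration only, not the Eatomics implementation; the function names and the dictionary-based input are hypothetical, and each sample is assumed to be a complete (imputed) intensity vector of equal length.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two intensity vectors (a dissimilarity)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation between two intensity vectors (a similarity)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def pairwise(samples, metric):
    """Apply a metric to every ordered pair of samples.

    samples: dict mapping sample ID -> intensity vector.
    Returns a dict keyed by (sample, sample) tuples, i.e. the heatmap cells.
    """
    return {(s, t): metric(samples[s], samples[t]) for s in samples for t in samples}
```

Note the opposite orientation of the two metrics: a small Euclidean distance and a correlation close to 1 both indicate similar samples.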

Cumulative Protein Intensities

Protein intensities are summed across all samples and plotted according to their relative abundance. Coloring marks the respective quantile of the proteins. Highly abundant proteins, i.e. proteins ranked in the first quartile, are colored in red and labeled. The top 20 ranked proteins and their cumulated intensities are given in the table to the right.
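The ranking behind this plot can be sketched in Python as follows. The sketch is not the Eatomics code; the function name and input layout are hypothetical, and it only covers the summing, ranking and cumulative share, not the quantile coloring.

```python
def rank_cumulative(intensities):
    """Rank proteins by their intensity summed across all samples.

    intensities: dict mapping protein -> list of per-sample intensities
    (None = missing, ignored in the sum).
    Returns a list of (protein, summed intensity, cumulative fraction of the
    total intensity), sorted from most to least abundant.
    """
    totals = {
        protein: sum(v for v in values if v is not None)
        for protein, values in intensities.items()
    }
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    result = []
    running = 0.0
    for protein, total in ranked:
        running += total
        result.append((protein, total, running / grand_total))
    return result
```

Taking the first 20 entries of the returned list (`rank_cumulative(data)[:20]`) corresponds to the kind of table described above.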

Visualization of protein abundance in a PCA.

Sample-wise distribution overview of protein abundance data.

Sample-wise coverage of protein abundance data.