Welcome to POMAShiny!

POMAShiny is a user-friendly web-based workflow for pre-processing and statistical analysis of mass spectrometry data. This tool allows you to analyze your data:

Fast: Analyze and visualize your data easily in a few steps

Friendly: POMAShiny provides an intuitive structure and a fully interactive analysis

Free: All POMAShiny options are completely open and free for all users


Upload Data

  • Upload your data in the “Upload Data” tab
  • Data must be a CSV (comma-separated-value) file

Target File

A .CSV with two mandatory columns + n optional covariates:

  • Each row denotes a sample (the same as in the features file)
  • First (left-hand) column must contain the sample IDs => red
  • Second column must contain the sample group/factor (e.g. treatment) => green
  • Covariates (optional): From the third column onward, users can include any number of experimental covariates => purple

Features File

A .CSV with m columns:

  • Each row denotes a sample and each column denotes a feature
  • First row must contain the feature names

More Help and Instructions

Additional help and more detailed instructions are provided in the “Help” panel.


About POMAShiny

POMAShiny has been developed by Pol Castellano-Escuder, Raúl González-Domínguez, Cristina Andrés-Lacueva and Alex Sánchez-Pla at University of Barcelona, Spain.

The source code of POMAShiny is freely available on GitHub at https://github.com/pcastellanoescuder/POMAShiny.

We would appreciate reports of any issues with the app via the GitHub issue tracking at https://github.com/pcastellanoescuder/POMAShiny/issues.




Upload data panel

Exploratory report

After clicking the button above, go to the Pre-processing step


Help

Last update: September 28, 2020

Upload Data Panel

In this panel users can upload their data to be analyzed in POMAShiny. Data format must be a CSV (comma-separated-value) file.

Target File

A .CSV with two mandatory columns + n optional covariates:

  • Each row denotes a sample (the same as in the features file)
  • First (left-hand) column must contain the sample IDs => red
  • Second column must contain the sample group/factor (e.g. treatment) => green
  • Covariates (optional): From the third column onward, users can include any number of experimental covariates => purple

Once this file has been uploaded, users can select rows in the “Target File” panel table to create a subset of the uploaded data. If rows are selected, only those rows are analyzed in POMAShiny. Otherwise (default), all uploaded data are analyzed.

Features File

A .CSV with m columns:

  • Each row denotes a sample and each column denotes a feature
  • First row must contain the feature names
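The two file layouts described above can be checked before uploading. The following is a minimal sketch in Python, not part of POMAShiny: the function names `read_table` and `check_inputs` are hypothetical, and the sketch assumes that the target file also has a header row.

```python
import csv
import io

def read_table(text):
    """Parse CSV text into (header, data_rows), skipping blank lines."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    return rows[0], rows[1:]

def check_inputs(target_text, features_text):
    """Return a list of problems; an empty list means the files look consistent."""
    problems = []
    # Assumption: the target file has a header row, like the features file.
    target_header, target_rows = read_table(target_text)
    feature_header, feature_rows = read_table(features_text)

    # Target file: sample IDs and group are mandatory, covariates are optional.
    if len(target_header) < 2:
        problems.append("target file needs at least two columns (sample ID, group)")
    # Each row denotes a sample in both files, so the row counts must agree.
    if len(target_rows) != len(feature_rows):
        problems.append("target and features files must describe the same samples")
    # Features file: every sample row needs one value per feature name.
    if any(len(row) != len(feature_header) for row in feature_rows):
        problems.append("every features row must have one value per feature name")
    return problems

target = "ID,Group,Age\ns1,control,34\ns2,treated,41\n"
features = "glucose,lactate\n5.1,1.2\n4.8,1.9\n"
print(check_inputs(target, features))
```

The check only covers the structural rules listed in this panel. It does not verify that the samples appear in the same order in both files.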

Exploratory report

After uploading the data and clicking the “Submit” button, users can automatically generate an exploratory data analysis PDF report by clicking the green “Exploratory report” button at the top of the central panel. See a PDF report example here.

Example data

POMAShiny includes two example datasets that are both freely available at https://www.metabolomicsworkbench.org. The first example dataset consists of a targeted metabolomics three-group study and the second example dataset consists of a targeted metabolomics two-group study. These two datasets allow users to explore all available functionalities in POMAShiny. Documentation for both datasets is available at https://github.com/pcastellanoescuder/POMA.

NOTE: Once the target and features files are uploaded and the desired rows are selected in the target file (if necessary), users must click the “Submit” button to continue with the analysis.

Equivalent functions in POMA: POMA::PomaMSnSetClass() (format data) and POMA::PomaEDA() (automatic PDF report).

Pre-processing Panel

Impute Values

Mass spectrometry data usually contain a high number of missing values, most of them due to low peak signal intensity. The missing value imputation process in POMAShiny is divided into three sequential steps:

  1. Distinguish between zeros and missing values. If the data contain both zeros and missing values, users can choose whether or not to distinguish between them. This option may be useful in experiments combining endogenous and exogenous features, since a zero in an exogenous feature can be a real zero (absence), whereas endogenous features are unlikely to have real zeros.

  2. Remove all features that have more than a user-defined percentage of missing values in ALL study groups. By default this percentage is 20%.

  3. Imputation. POMAShiny offers six different methods to impute missing values:

  • replace missing values by zero
  • replace missing values by half of the minimum positive value in the original data (in each column)
  • replace missing values by the median of the column (feature)
  • replace missing values by the mean of the column (feature)
  • replace missing values by the minimum value in the column (feature)
  • replace missing values using the KNN (k-nearest neighbors) algorithm (default)

Armitage, E. G., Godzien, J., Alonso‐Herranz, V., López‐Gonzálvez, Á., & Barbas, C. (2015). Missing value imputation strategies for metabolomics data. Electrophoresis, 36(24), 3050-3060.

Equivalent function in POMA: POMA::PomaImpute().
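Steps 2 and 3 above can be illustrated with a short Python sketch. This is not the POMA implementation: the function names are hypothetical, missing values are represented as `None`, and only the "half of the minimum positive value" method is shown.

```python
def missing_fraction(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)

def filter_features(data, groups, cutoff=0.20):
    """Drop features whose missing fraction exceeds `cutoff` in ALL study groups."""
    kept = {}
    for name, values in data.items():
        per_group = {}
        for value, group in zip(values, groups):
            per_group.setdefault(group, []).append(value)
        # A feature is removed only if every group is above the cutoff.
        if not all(missing_fraction(v) > cutoff for v in per_group.values()):
            kept[name] = values
    return kept

def impute_half_min(data):
    """Replace missing values by half of the minimum positive value of each feature."""
    imputed = {}
    for name, values in data.items():
        positives = [v for v in values if v is not None and v > 0]
        # Assumption: fall back to 0.0 when a feature has no positive value.
        fill = min(positives) / 2 if positives else 0.0
        imputed[name] = [fill if v is None else v for v in values]
    return imputed

groups = ["A", "A", "B", "B"]
data = {
    "f1": [1.0, None, 2.0, 4.0],    # group B is complete, so f1 is kept
    "f2": [None, None, None, 3.0],  # above 20% missing in both groups, so f2 is dropped
}
kept = filter_features(data, groups)
print(impute_half_min(kept))
```

In this example the missing value of `f1` is replaced by 0.5, which is half of its minimum positive value (1.0).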

Normalization

It’s known that some factors can introduce variability in MS data. Even if the data have been generated under identical experimental conditions, this introduced variability can have a critical influence on the final statistical results, making normalization a key step in the workflow.

POMAShiny offers six different methods to normalize data:

  • Autoscaling
  • Level scaling
  • Log scaling
  • Log transformation
  • Vast scaling
  • Log pareto scaling (default)

van den Berg, R. A., Hoefsloot, H. C., Westerhuis, J. A., Smilde, A. K., & van der Werf, M. J. (2006). Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC genomics, 7(1), 142.

Users can evaluate the normalization effects in the interactive boxplots located in the “Normalized Data” tab.

Equivalent functions in POMA: POMA::PomaNorm() (normalization) and POMA::PomaBoxplots(group = "samples") (boxplots).
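To make the difference between some of these methods concrete, the sketch below applies three of them to a single feature (one column). It follows the general definitions in van den Berg et al. (2006), not the POMA source code. The function names are hypothetical, and the choice of the natural logarithm on strictly positive values for the log pareto variant is an assumption.

```python
import math
import statistics

def autoscale(values):
    """Center on the mean and divide by the standard deviation."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def pareto_scale(values):
    """Center on the mean and divide by the square root of the standard deviation."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / math.sqrt(sd) for v in values]

def log_pareto_scale(values):
    """Log-transform, then pareto-scale.

    Assumption: natural log of strictly positive values; the actual
    transformation used by POMA may differ (base, offset).
    """
    return pareto_scale([math.log(v) for v in values])

feature = [2.0, 4.0, 6.0]
print(autoscale(feature))
print(pareto_scale(feature))
print(log_pareto_scale(feature))
```

Autoscaling gives every feature unit variance, while pareto scaling shrinks large features less aggressively, so they keep more of their original weight.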

Outlier Detection

POMAShiny allows users to analyze outliers through different plots and tables, and to remove statistical outliers from the analysis (default) using several adjustable parameters.

The method implemented in POMAShiny is based on the Euclidean distances (default, but adjustable) among observations and their distances to each group centroid in a two-dimensional space. Once these distances are computed, the classical univariate outlier detection formula Q3 + 1.5*IQR (the coefficient can be adjusted by the user) is applied to the distances to each group centroid to detect multivariate group-dependent outliers.

Select the method (distance), type and coefficient (the higher this value, the less sensitive the method is to outliers) to adapt the outlier detection method to your data. Turning on the “Show labels” button displays the sample IDs in all outlier detection plots.

  • Distances Polygon Plot: Group centroids and sample coordinates in a two-dimensional space
  • Distances Boxplot: Boxplots of all computed distances to group centroid by group

NOTE: If the “Remove outliers” button is turned on (default), all detected outliers are excluded from the analysis automatically.

Equivalent functions in POMA: POMA::PomaOutliers(do = "analyze") (analyze outliers) and POMA::PomaOutliers(do = "clean") (remove outliers).
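The centroid-distance rule described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the POMA implementation: the function name `detect_outliers` is hypothetical, samples are assumed to be already projected onto two dimensions, and quartiles are computed with the "inclusive" method of Python's `statistics.quantiles`.

```python
import math
import statistics

def detect_outliers(coords, groups, coef=1.5):
    """Flag samples whose distance to their group centroid exceeds Q3 + coef * IQR."""
    by_group = {}
    for index, group in enumerate(groups):
        by_group.setdefault(group, []).append(index)

    outliers = []
    for indices in by_group.values():
        # Group centroid in the two-dimensional space.
        cx = statistics.mean(coords[i][0] for i in indices)
        cy = statistics.mean(coords[i][1] for i in indices)
        # Euclidean distance of every sample to its own group centroid.
        distances = {i: math.dist(coords[i], (cx, cy)) for i in indices}
        # Note: each group needs at least two samples for quartiles to exist.
        q1, _, q3 = statistics.quantiles(distances.values(), n=4, method="inclusive")
        fence = q3 + coef * (q3 - q1)
        outliers.extend(i for i, d in distances.items() if d > fence)
    return sorted(outliers)

coords = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
groups = ["A", "A", "A", "A", "A"]
print(detect_outliers(coords, groups))
```

In this example only the last sample, which lies far from the other four, is flagged. A larger `coef` raises the fence and makes the rule less sensitive.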

EDA Panel

POMAShiny offers several interactive and highly customizable plots designed to facilitate the exploratory data analysis (EDA) process, giving a wide range of visualization options.

Volcano Plot

In this tab, users can explore their data in an interactive volcano plot. This plot is based on the results of a standard T-test, for which users can specify whether the data are paired and whether the study group variances are equal. This option is only available for two-group studies.

The POMAShiny interactive volcano plot gives users information about T-test significance and fold changes. log2 fold changes between groups are represented on the horizontal axis, while -log10 T-test p-values are represented on the vertical axis.

Users can select whether raw p-values or adjusted (FDR) p-values are displayed. Other parameters, such as the p-value threshold, the log2 fold change threshold or the x-axis range, are available in the parameters menu.

Equivalent function in POMA: POMA::PomaVolcano().
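As an illustration of how a single feature is placed on the volcano plot, the sketch below computes both coordinates from the two group means and a p-value. The function name, the direction of the fold change (group B over group A) and the default thresholds are assumptions made for this example, not POMAShiny defaults.

```python
import math

def volcano_point(mean_a, mean_b, p_value, p_cutoff=0.05, log2fc_cutoff=1.0):
    """Return (x, y, highlighted) for one feature of a two-group study."""
    # Horizontal axis: log2 fold change (assumed here as group B over group A).
    log2fc = math.log2(mean_b / mean_a)
    # Vertical axis: -log10 of the raw or adjusted T-test p-value.
    neg_log10_p = -math.log10(p_value)
    # A feature is highlighted only if it passes both thresholds.
    highlighted = p_value < p_cutoff and abs(log2fc) >= log2fc_cutoff
    return log2fc, neg_log10_p, highlighted

print(volcano_point(mean_a=2.0, mean_b=8.0, p_value=0.001))
```

A feature whose mean is four times higher in the second group has a log2 fold change of 2, so with these thresholds it is highlighted as long as its p-value is below 0.05.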

Boxplot

This tab provides a highly interactive boxplot that shows all data features by their different study groups. Each feature is represented by as many boxplots as there are groups in the study. Different visualization parameters are available in this tab:

  • Features to plot: By default this box is empty because all features are plotted. However, by selecting some specific features in this box, only these features are plotted
  • Show points: By turning on this button, points corresponding to each sample are shown in each feature boxplot. If your data contain many features, this option can slow down the interactive display
  • Split boxes: By default, study group boxplots overlap within each feature. By turning on this button, study group boxplots are split within each feature. Only recommended when few features are selected (maximum 10 features)

Equivalent functions in POMA: POMA::PomaBoxplots(group = "features") (all features) and POMA::PomaBoxplots(group = "features", feature_name = c("XXX", "YYY", "ZZZ")) (only features XXX, YYY and ZZZ).

Density Plot

POMAShiny provides an interactive density plot to explore all study group distributions (default). However, by turning off the “Plot groups” button, POMAShiny plots the distributions of the features indicated by the user instead of the study group distributions.

Equivalent functions in POMA: POMA::PomaDensity() (study groups) and POMA::PomaDensity(group = "features", feature_name = c("XXX", "YYY", "ZZZ")) (only features XXX, YYY and ZZZ).

Heatmap

In this panel POMAShiny offers a classical heatmap with hierarchical clustering and a color stripe that corresponds to each sample group label. Users can choose whether to display sample IDs (not recommended if n is too large) and feature names (not recommended for too many features).

Equivalent function in POMA: POMA::PomaHeatmap().

Statistical Analysis Panel

Univariate Analysis

Univariate analysis is the simplest form of data analysis where the data being analyzed contains only one variable. Since it’s a single variable it doesn’t deal with causes or relationships.

T-test

The T-test is a parametric statistical hypothesis test in which the test statistic follows a Student’s t-distribution under the null hypothesis. This analysis is used to compare two groups, and it assumes that features are normally distributed. T-test results can be visualized in the volcano plot provided in the EDA panel.

  • Equal Variance (or pooled) T-test: The equal variance T-test is used when the variance of the two tested groups is similar.

  • Unequal Variance T-test: The unequal variance T-test is used when the variance of the two tested groups is different (default). This test is also called Welch’s T-test.

  • Paired T-test: The paired T-test is performed when samples consist of matched pairs of similar units or when there are repeated measures. This method can also be applied to cases where the samples are related in some manner or have matching characteristics (by default, groups are not paired).

Equivalent function in POMA: POMA::PomaUnivariate(method = "ttest").
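For reference, the unequal variance (Welch) statistic for a single feature can be written out as follows. This is a minimal sketch with a hypothetical function name. It returns the t statistic and the Welch-Satterthwaite degrees of freedom only, because obtaining the p-value requires the t-distribution, which is outside the Python standard library.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Return Welch's t statistic and the approximate degrees of freedom."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = statistics.variance(group_a) / len(group_a)
    se2_b = statistics.variance(group_b) / len(group_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df

print(welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0]))
```

Unlike the pooled test, the two variances are never averaged, which is why this variant is the safer default when group variances differ.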

ANOVA

The analysis of variance (ANOVA) tests the hypothesis that the averages of two or more groups are the same. The ANOVA evaluates the importance of one or more factors when comparing the means of the response variable in the different levels of the factors. The null hypothesis states that all the means of the groups are the same while the alternative hypothesis states that at least one is different. ANOVA is a parametric method that assumes the normal distribution of features.

If one or more covariates have been included in the target file, an analysis of covariance (ANCOVA) is performed automatically and the results are available in the “ANCOVA Results” tab. ANCOVA is a general linear model that combines ANOVA and regression. It evaluates whether the group means are equal while statistically controlling for the effects of other continuous variables, known as covariates, that are not of primary interest (unlike the group or treatment).

Equivalent functions in POMA: POMA::PomaUnivariate(method = "anova") (ANOVA) and POMA::PomaUnivariate(method = "anova", covariates = TRUE) (ANCOVA).
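The one-way ANOVA test statistic for a single feature compares the variability between group means with the variability within groups. The sketch below, with a hypothetical function name, computes only the F statistic. It does not compute the p-value and does not cover the ANCOVA case.

```python
import statistics

def anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of values."""
    pooled = [v for g in groups for v in g]
    grand_mean = statistics.mean(pooled)
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of the values around their own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(pooled) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

print(anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]))
```

A large F value means the group means differ by more than the within-group noise would explain, which is evidence against the null hypothesis that all means are equal.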

Mann-Whitney U Test

The Mann-Whitney U test is the non-parametric alternative to the independent samples T-test. It is used to test whether two independent groups come from the same population, that is, whether values in one group tend to be larger than values in the other. Usually, the Mann-Whitney U test is used when the assumptions of the T-test are not met. When the study groups are paired, the Wilcoxon signed-rank test is used instead.

Equivalent function in POMA: POMA::PomaUnivariate(method = "mann").
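The U statistic itself has a simple interpretation: it counts, over all pairs of one value from each group, how often the value from one group is larger, with ties counted as one half. A minimal sketch with a hypothetical function name:

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) by direct pair counting; ties count as 0.5."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # The two statistics always add up to the number of pairs.
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b

print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

When every value in the first group is smaller than every value in the second, as in this example, U for the first group is 0 and U for the second equals the number of pairs. Pair counting is quadratic in the group sizes, so rank-based formulas are preferred for large datasets.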

Kruskal Wallis Test

Kruskal-Wallis test is a non-parametric alternative to ANOVA. It is an extension of the Mann-Whitney U test for 3 or more groups. Kruskal-Wallis test does not assume normality in the data, as opposed to the traditional ANOVA.

Equivalent function in POMA: POMA::PomaUnivariate(method = "kruskal").
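The Kruskal-Wallis statistic replaces the values by their ranks in the pooled data and compares the rank sums of the groups. The following sketch uses hypothetical function names and, as a simplifying assumption, omits the correction for ties that statistical packages usually apply.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start+1..end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks

def kruskal_wallis_h(groups):
    """Return the Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        weighted += rank_sum ** 2 / len(g)
        offset += len(g)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Because only ranks enter the formula, the statistic is unaffected by monotone transformations of the data, which is why no normality assumption is needed.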

Multivariate Analysis

Unlike univariate methods, multivariate methods focus on the study of more than one feature at a time. These approaches have been widely used because of their informativeness. Although they are more complex than conventional univariate statistics, these methods can provide information about the structure of the data and internal relationships that would not be observed with univariate statistics. However, the interpretation of this type of analysis can also be more complex.

PCA (principal component analysis)

PCA is one of the most used methods for data dimension reduction. POMAShiny allows users to compute a PCA controlling different parameters:

  • Number of components: This number indicates the number of components that are calculated
  • Scale and Center: By default these parameters are disabled. If the data have been normalized
  • Show ellipses: By turning on this button, the ellipses computed assuming a multivariate normal distribution are drawn in a score plot and biplot

Equivalent function in POMA: POMA::PomaMultivariate(method = "pca").

PLS-DA (partial least squares discriminant analysis)

PLS-DA is a supervised method that uses the multiple linear regression method to find the direction of maximum covariance between the data and the sample group. POMAShiny allows users to compute a PLS-DA controlling different parameters:

  • Number of components: This number indicates the number of components that are calculated
  • VIP cutoff: This value indicates the variable importance in the projection (VIP) cutoff. Only features with a VIP higher than this value are shown in the VIP plot tab. This is a reactive option: users do not have to recalculate the PLS-DA to change this value, and the VIP plot is updated automatically when it changes
  • Show ellipses: By turning on this button (default), the ellipses computed assuming a multivariate normal distribution are drawn in a score plot
  • Validation type: Internal validation to use, options are “Mfold” (default) or “Leave One Out”
  • Number of folds: Number of folds for the Mfold validation method (default is 5). If the validation method is “Leave One Out”, this value is set to 1
  • Number of iterations for validation process: Number of iterations for the validation method selected