Poster Presentation AUS-oMicS 2025

ProteomeScholaR: Enabling Reproducible Analyses of Quantitative Proteomic Datasets Through Easy-to-Follow Scripting Templates (118510)

William Klare¹, Nader Aryamanesh²,³, Mark Graham⁴, Ignatius Pang¹
  1. Australian Proteome Analysis Facility, Macquarie University, Sydney, NSW, Australia
  2. Bioinformatics Core Facility, Children's Medical Research Institute, Westmead, NSW, Australia
  3. Embryology Research Unit, Children's Medical Research Institute, Westmead, NSW, Australia
  4. Biomedical Proteomics Facility, Children's Medical Research Institute, Westmead, NSW, Australia

Modern proteomic datasets require substantial programming expertise and practical knowledge to analyse. Because analysis is skill-gated, many researchers find it challenging to apply modern statistical best practices. ProteomeScholaR is an R package that addresses this challenge by providing a novel pipeline that enables users to perform comprehensive differential protein abundance analyses across DIA, DDA/LFQ, and TMT workflows. Through well-documented workflow templates, researchers can systematically apply best practices to every step of quantitative proteomic analysis.

ProteomeScholaR implements stringent quality control for peptide and protein identification, incorporating criteria such as false discovery rate thresholds, a minimum number of unique peptidoforms per protein, and limits on missing values across samples. It integrates several sophisticated analytical tools: the IQ tool for peptide-to-protein quantitative data summarisation¹, RUVIII-C for removing unwanted variation², and limma for sample normalisation and linear modelling³. Pathway analysis can be performed either with user-supplied annotations via clusterProfiler⁴ or automatically with gProfiler2⁵.
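The three quality-control criteria above can be illustrated with a minimal Python sketch. It is not ProteomeScholaR's implementation (the package is written in R); the record layout, the function name `qc_filter`, and the default thresholds (q-value ≤ 0.01, at least two unique peptidoforms per protein, quantified in at least half of the samples) are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence, Set


@dataclass
class PrecursorRow:
    """Hypothetical long-format record: one peptidoform quantified in one sample."""
    protein: str
    peptidoform: str  # assumed proteotypic; shared peptides removed upstream
    sample: str
    q_value: float
    intensity: Optional[float]  # None marks a missing value


def qc_filter(
    rows: Iterable[PrecursorRow],
    samples: Sequence[str],
    fdr: float = 0.01,  # illustrative threshold, not a package default
    min_peptidoforms: int = 2,  # illustrative threshold, not a package default
    min_fraction_present: float = 0.5,  # illustrative threshold, not a package default
) -> Set[str]:
    """Return the proteins passing three illustrative QC criteria."""
    # 1. False discovery rate: keep identifications at or below the q-value
    #    cutoff that also carry a quantitative value.
    confident = [r for r in rows if r.q_value <= fdr and r.intensity is not None]

    peptidoforms = defaultdict(set)   # protein -> distinct peptidoforms
    quantified_in = defaultdict(set)  # protein -> samples with at least one value
    for r in confident:
        peptidoforms[r.protein].add(r.peptidoform)
        quantified_in[r.protein].add(r.sample)

    passing = set()
    for protein, peps in peptidoforms.items():
        # 2. Minimum number of unique peptidoforms per protein.
        if len(peps) < min_peptidoforms:
            continue
        # 3. Missing-value limit: the protein must be quantified in enough samples.
        if len(quantified_in[protein]) / len(samples) < min_fraction_present:
            continue
        passing.add(protein)
    return passing


# Usage with made-up example records.
samples = ["S1", "S2", "S3", "S4"]
rows = [
    PrecursorRow("P1", "PEPTIDEA", "S1", 0.001, 1.2e6),
    PrecursorRow("P1", "PEPTIDEB", "S2", 0.005, 8.0e5),
    PrecursorRow("P2", "PEPTIDEC", "S1", 0.002, 5.0e5),  # only one peptidoform
    PrecursorRow("P3", "PEPTIDED", "S1", 0.03, 2.0e5),   # fails the FDR cutoff
    PrecursorRow("P3", "PEPTIDEE", "S2", 0.001, 3.0e5),
]
print(qc_filter(rows, samples))  # {'P1'}
```

Applying the FDR cutoff first means a rejected peptidoform no longer counts toward its protein's peptidoform total, which is why P3 fails the second criterion despite having two measured peptidoforms.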

Built from modular, object-oriented components, ProteomeScholaR's architecture makes it straightforward to integrate new tools as they emerge. Comprehensive, documented workbooks guide users through each analytical step, facilitating reproducibility and enabling public sharing of analyses. We demonstrate the pipeline's capabilities by re-analysing published data on proteome changes in sepsis-causing bacteria adapting to growth in serum⁶.

By streamlining complex proteomic analyses, ProteomeScholaR makes advanced analytical techniques accessible to researchers at all levels of programming expertise. The complete library and a step-by-step tutorial will be available as an R package via https://github.com/APAF-bioinformatics/ProteomeScholaR

1) Pham et al. 2020, Bioinformatics 36(8):2611–2613.

2) Poulos et al. 2020, Nature Communications 11(1):3793.

3) Ritchie et al. 2015, Nucleic Acids Research 43(7):e47.

4) Wu et al. 2021, The Innovation 2(3):100141.

5) Kolberg et al. 2020, F1000Research 9 (ELIXIR):709.

6) Mu et al. 2023, Nature Communications 14(1):1530.