Online tools for proteomics data interpretation
Database toolbox
A database generation tool designed for the specific requirements of proteomics data interpretation.
Generating an appropriate database against which to search MS data is a crucial step in any proteomics data interpretation workflow. This is obviously true for database search algorithms, but sequence alignment search tools also require a reference database in which to find homologies. Identification validation approaches require generating target/decoy databases and adding contaminant proteins, spiked proteins/peptides, etc. Finally, publicly available protein sequence databases such as UniProtKB or NCBInr have become huge and are difficult to handle with basic office tools.
For all these reasons, this database toolbox was designed to facilitate the generation of custom databases. It can extract any specific taxonomy from the UniProtKB, UniProtKB-SwissProt and NCBInr databases, add custom sequences and common contaminants, generate decoy databases, and upload and merge databases.
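The operations listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the toolbox's implementation. It assumes UniProtKB-style FASTA headers, where the `OX=` field holds the NCBI taxonomy identifier, and it matches only that exact taxon (descendant taxa are not expanded). Decoys are built by simple sequence reversal with a hypothetical `DECOY_` prefix. NCBInr headers use a different format and are not handled here.

```python
import re

# Hypothetical decoy tag; pipelines use various conventions (e.g. "DECOY_", "REV_").
DECOY_PREFIX = "DECOY_"
TAXON_FIELD = re.compile(r"\bOX=(\d+)\b")  # UniProtKB header field for the taxonomy ID


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_taxon(entries, taxon_id):
    """Keep entries whose UniProtKB header carries OX=<taxon_id> (exact match only)."""
    for header, seq in entries:
        match = TAXON_FIELD.search(header)
        if match and match.group(1) == str(taxon_id):
            yield header, seq


def add_reversed_decoys(entries):
    """Return the target entries followed by one reversed-sequence decoy per target."""
    targets = list(entries)
    decoys = [(DECOY_PREFIX + header, seq[::-1]) for header, seq in targets]
    return targets + decoys


def to_fasta(entries, width=60):
    """Serialize (header, sequence) pairs as FASTA text with fixed-width sequence lines."""
    lines = []
    for header, seq in entries:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

For example, `add_reversed_decoys(list(filter_by_taxon(parse_fasta(text), 9606)) + contaminants)` would give a human target/decoy database including contaminants, assuming `contaminants` is a list of (header, sequence) pairs. Reversal is a common decoy strategy because it preserves each protein's length and amino acid composition.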

Database searches
A user-friendly interface to run database searches with OMSSA [PubMed] using grid computing.
Database searches are central to the high-throughput MS/MS data interpretation workflow. The open-source OMSSA algorithm has proven robust and valuable, and it has been implemented on MSDA through a user-friendly interface. The user can set all search parameters, define custom modifications and dynamically select databases from the database toolbox.
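To illustrate what a database search engine does with the selected database, the sketch below performs an in silico tryptic digestion and keeps the peptides whose neutral mass matches an observed precursor within a tolerance. This is only the candidate-selection step, written as a generic example. It is not OMSSA's algorithm, which goes on to score the experimental MS/MS spectrum against the theoretical fragment ions of each candidate. The function names, the 0.02 Da default tolerance and the toy protein in the usage example are assumptions for illustration.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276


def tryptic_peptides(protein, missed_cleavages=0):
    """Cleave after K/R unless the next residue is P (classic trypsin rule)."""
    sites = [i + 1 for i, aa in enumerate(protein[:-1])
             if aa in "KR" and protein[i + 1] != "P"]
    bounds = [0] + sites + [len(protein)]
    peptides = []
    for start in range(len(bounds) - 1):
        for mc in range(missed_cleavages + 1):
            end = start + 1 + mc
            if end < len(bounds):
                peptides.append(protein[bounds[start]:bounds[end]])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def candidates(precursor_mz, charge, proteins, tol_da=0.02):
    """Return (protein, peptide) pairs whose neutral mass matches the precursor."""
    neutral = (precursor_mz - PROTON) * charge  # assumes [M+zH]z+ precursor ions
    hits = []
    for name, seq in proteins.items():
        for pep in tryptic_peptides(seq):
            if abs(peptide_mass(pep) - neutral) <= tol_da:
                hits.append((name, pep))
    return hits
```

For instance, a doubly charged precursor at the m/z of `PEPTIDEK` would return `[("toy", "PEPTIDEK")]` when searched against `{"toy": "PEPTIDEKGGR"}`.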
Because database searching is one of the most computationally intensive steps of the proteomics workflow, MSDA provides tools to run database searches on the biomed virtual organization of the European Grid Infrastructure.
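Grid execution relies on splitting the work into independent jobs. The source does not describe how MSDA partitions a search, so the following is a hypothetical sketch. It splits an MGF peak list into at most `n_jobs` chunks without ever cutting a spectrum, so that each chunk could be searched as a separate grid job and the results merged afterwards.

```python
def split_mgf(mgf_text, n_jobs):
    """Split MGF text into at most n_jobs chunks without cutting any BEGIN/END IONS block.

    Global parameter lines appearing before the first spectrum stay attached to the
    first block, so they end up in the first chunk only.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be >= 1")
    blocks, current = [], []
    for line in mgf_text.splitlines():
        current.append(line)
        if line.strip() == "END IONS":
            blocks.append("\n".join(current) + "\n")
            current = []
    # Round-robin assignment keeps chunk sizes within one spectrum of each other.
    chunks = [[] for _ in range(min(n_jobs, len(blocks)))]
    for i, block in enumerate(blocks):
        chunks[i % len(chunks)].append(block)
    return ["".join(chunk) for chunk in chunks]
```

Round-robin assignment is one simple choice. Balancing by spectrum count ignores differences in per-spectrum search cost, which a real scheduler might take into account.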

De novo searches
An automated de novo search pipeline combining PepNovo [PubMed], MSBlast [PubMed] and an in-house protein grouping algorithm, run using grid computing.
Despite significant advances in high-throughput protein identification by database searching, identifying proteins from organisms that are poorly represented in protein databases remains a challenging open problem. Moreover, de novo sequencing will always be necessary to identify sequence variants, even in well-annotated organisms. This fully automated de novo pipeline has therefore been implemented on MSDA. Because it is even more resource-consuming than database searching, it also runs on the biomed virtual organization of the European Grid Infrastructure.
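The core idea behind de novo sequencing is that mass differences between consecutive fragment ions of the same ion series correspond to amino acid residue masses. The sketch below reads short sequence tags from an already clean, singly charged ion ladder. It is a deliberately simplified illustration. PepNovo scores spectrum graphs with a probabilistic network and copes with noise peaks and missing fragments, which this code does not do, and MSBlast-style homology searching of the resulting sequences is not shown. The tolerance and minimum tag length are hypothetical defaults.

```python
# Monoisotopic residue masses (Da). I is omitted because it is isobaric with L,
# so a 113.084 Da gap is reported as L.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def sequence_tags(peaks, tol=0.01, min_length=2):
    """Read residues from mass gaps between consecutive peaks of a singly charged ladder.

    A gap that matches no residue ends the current tag; tags shorter than
    min_length are discarded.
    """
    peaks = sorted(peaks)
    tags, current = [], ""
    for low, high in zip(peaks, peaks[1:]):
        gap = high - low
        residue = next((aa for aa, mass in RESIDUE_MASS.items()
                        if abs(gap - mass) <= tol), None)
        if residue:
            current += residue
        else:
            if len(current) >= min_length:
                tags.append(current)
            current = ""
    if len(current) >= min_length:
        tags.append(current)
    return tags
```

Given the b-ion ladder of `GASPEK`, the function returns `["ASPEK"]`: the first residue is not read because there is no peak corresponding to the empty prefix.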

Annotation explorer
A comprehensive search facility to automatically extract GO terms for long lists of genes/proteins and obtain a graph-based visualization of the GO hierarchy.
Because of the continuously growing number of proteins identified and quantified in large-scale proteomics studies, the extraction of meaningful and relevant biological information must be at least partly automated. The Annotation Explorer module of MSDA was developed to automatically extract, in batch mode (i.e. for long lists of proteins or transcripts), all GO terms assigned to the given entries in the GO database, along with their ancestries (parental lineage). This provides a broader view of the functions present in a given proteome or reflected in differential expression patterns. The extracted GO terms, including ancestries, can be visualized as a graph through a reconstructed AmiGO URL link. From a biological point of view, the Annotation Explorer therefore helps even non-specialists identify the most relevant functional categories in huge datasets.
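Extracting term ancestries amounts to computing, for each annotated term, its transitive closure in the GO graph. This graph is a directed acyclic graph in which a term may have several parents. The sketch below assumes that a `parents` mapping (each term to its direct parents, for whichever relation types one chooses to follow) has already been loaded. It then counts how many proteins in a list fall under each term, either directly or through an ancestor. The term identifiers in the usage note are placeholders, not real GO IDs.

```python
from collections import Counter, deque


def ancestors(term, parents):
    """Return all ancestors of `term` by walking parent links breadth-first."""
    seen, queue = set(), deque([term])
    while queue:
        for parent in parents.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def term_counts(annotations, parents):
    """Count, for each term, how many proteins carry it directly or via an ancestor."""
    counts = Counter()
    for terms in annotations.values():
        closure = set(terms)
        for term in terms:
            closure |= ancestors(term, parents)
        counts.update(closure)  # a set, so each protein counts at most once per term
    return counts
```

With the toy graph `{"T:4": ["T:2", "T:3"], "T:2": ["T:1"], "T:3": ["T:1"]}`, a protein annotated only with `T:4` is also counted under `T:1`, `T:2` and `T:3`. The root `T:1` is counted once for that protein, even though it is reached by two paths.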

MSDA overview

Downloadable desktop software
Download a compressed version of the macro
Validor is a macro for Microsoft™ Excel©, designed to extract from identification results the pairs of peptides that carry two different chemical derivatizations.
Validor was developed to automate our TMPP-based N-terminomics workflow, but it can also be used for any other mass adduct or PTM.
To be extracted as a pair, two peptides must have the same accession number, the same peptide sequence and close retention times. The macro also extracts peptides that share the same accession number and the same peptide start position.
See the following publication for more detailed information: An improved stable isotope N-terminal labeling approach with light/heavy TMPP to automate proteogenomics data validation: dN-TOP
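The two extraction criteria can be sketched as follows. This is not Validor's code (Validor is an Excel macro). It is a hypothetical Python illustration in which each identified peptide is a dictionary with the assumed keys `accession`, `sequence`, `start`, `rt` (retention time, in minutes) and `label`. The `label` value is either `"light"` or `"heavy"` and stands for the two derivatizations. The 1.0 min retention time tolerance is an arbitrary default.

```python
from collections import defaultdict


def pair_by_sequence(peptides, rt_tol=1.0):
    """Pair light/heavy forms sharing accession and sequence and eluting within rt_tol."""
    groups = defaultdict(lambda: {"light": [], "heavy": []})
    for p in peptides:
        # Assumes p["label"] is either "light" or "heavy".
        groups[(p["accession"], p["sequence"])][p["label"]].append(p)
    pairs = []
    for forms in groups.values():
        for light in forms["light"]:
            for heavy in forms["heavy"]:
                if abs(light["rt"] - heavy["rt"]) <= rt_tol:
                    pairs.append((light, heavy))
    return pairs


def group_by_start(peptides):
    """Group peptides that share accession and start position, whatever their sequence."""
    groups = defaultdict(list)
    for p in peptides:
        groups[(p["accession"], p["start"])].append(p)
    # Only start positions observed more than once are kept.
    return {key: ps for key, ps in groups.items() if len(ps) > 1}
```

In this sketch, two forms of the same sequence that elute several minutes apart are not paired by `pair_by_sequence`. They would still appear together under `group_by_start` if they share a start position.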

Download a compressed version of Recover
Recover is an MS/MS spectrum viewer/extractor designed to extract "high-quality" spectra from peak list files.
Recover was developed to select high-quality spectra from peak lists based on the following user-adjustable parameters:
Spectrum quality filters:
  • The Emergence (E) is a multiplication factor applied to the noise level (computed for each spectrum with an appropriate algorithm). "Useful Peaks" are defined as peaks with intensities higher than E × noise level.
  • The Useful Peaks Number (UPN) is the minimum number of Useful Peaks, as defined above, that a spectrum must contain to be recovered.
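The two quality criteria can be expressed in a few lines. The noise estimate used here is the median intensity of the spectrum. This is a stand-in chosen for illustration, because the algorithm Recover uses to compute the noise level is not described here. The default values of `emergence` and `upn` are also hypothetical.

```python
import statistics


def noise_level(intensities):
    """Stand-in noise estimate: the median peak intensity (not Recover's own algorithm)."""
    return statistics.median(intensities)


def useful_peaks(peaks, emergence):
    """Return the (m/z, intensity) peaks whose intensity exceeds emergence x noise level."""
    threshold = emergence * noise_level([intensity for _, intensity in peaks])
    return [(mz, intensity) for mz, intensity in peaks if intensity > threshold]


def is_high_quality(peaks, emergence=3.0, upn=10):
    """Recover a spectrum only if it contains at least `upn` useful peaks."""
    if not peaks:
        return False
    return len(useful_peaks(peaks, emergence)) >= upn
```

Raising E makes the definition of a useful peak stricter, and raising UPN requires more of them. Together, the two settings control how aggressively low-information spectra are discarded.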

Additional filters:
  • The charge state filter removes spectra according to the precursor charge states written in the peak list.
  • Identification results: an Excel file containing identification results can be loaded to remove spectra that have already been identified.
  • Additional filtering options: removes spectra with no fragment ions above the precursor m/z. Because singly charged fragments of a singly charged precursor necessarily have a lower m/z than the precursor, this removes fragmentation spectra of singly charged parent ions.
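The additional filters can be sketched in a similar way. Spectrum dictionaries with the keys `title`, `charge`, `precursor_mz` and `peaks` are an assumed representation. Identification results are assumed to be matched on spectrum titles, because the actual column layout of the Excel file is not specified. The 0.5 m/z margin used to decide that a fragment lies above the precursor is a hypothetical choice.

```python
def has_fragment_above_precursor(spectrum, margin=0.5):
    """True if at least one fragment m/z exceeds the precursor m/z by more than `margin`."""
    return any(mz > spectrum["precursor_mz"] + margin for mz, _ in spectrum["peaks"])


def apply_additional_filters(spectra, allowed_charges=None, identified_titles=(),
                             require_fragment_above_precursor=False):
    """Return the spectra that pass every enabled filter."""
    identified = set(identified_titles)
    kept = []
    for s in spectra:
        # Charge filter: spectra without a charge are dropped when the filter is on.
        if allowed_charges is not None and s.get("charge") not in allowed_charges:
            continue
        # Identification filter: skip spectra already identified in a previous search.
        if s["title"] in identified:
            continue
        # Singly charged precursors cannot yield singly charged fragments above their m/z.
        if require_fragment_above_precursor and not has_fragment_above_precursor(s):
            continue
        kept.append(s)
    return kept
```

Each filter is optional and off by default, which mirrors the user-adjustable nature of the filters described above.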

Once these filters are adjusted, they can be applied in batch mode to multiple files, and the new peak lists can be exported for alternative downstream processing, such as:
  • De novo searches on high-quality spectra only (see De novo searches in the MSDA online tools above)
  • Database searches with multiple PTMs
  • Database searches in refined databases
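For batch processing, the peak lists have to be read, filtered and written back out. The sketch below assumes the files are in Mascot Generic Format (MGF), one of several peak list formats in common use. The source does not say which formats Recover reads or writes. The parser handles only `TITLE`, `PEPMASS` and `CHARGE`, and the batch function accepts any predicate, for instance one implementing the Emergence/UPN criteria, to decide which spectra to keep. The output file naming is hypothetical.

```python
import re


def parse_mgf(text):
    """Parse MGF text into dicts with title, precursor_mz, charge and peaks."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN IONS":
            current = {"title": "", "precursor_mz": None, "charge": None, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None or not line or line[0] in "#;!/":
            continue  # outside a spectrum, blank, or comment line
        elif "=" in line:
            key, value = line.split("=", 1)
            if key == "TITLE":
                current["title"] = value
            elif key == "PEPMASS":
                current["precursor_mz"] = float(value.split()[0])  # ignore optional intensity
            elif key == "CHARGE":
                match = re.search(r"\d+", value)  # "2+" or "2+ and 3+": keep the first
                current["charge"] = int(match.group()) if match else None
        else:
            parts = line.split()
            intensity = float(parts[1]) if len(parts) > 1 else 0.0
            current["peaks"].append((float(parts[0]), intensity))
    return spectra


def write_mgf(spectra):
    """Serialize spectrum dicts back to MGF text (charges written as positive)."""
    out = []
    for s in spectra:
        out.append("BEGIN IONS")
        out.append(f"TITLE={s['title']}")
        if s["precursor_mz"] is not None:
            out.append(f"PEPMASS={s['precursor_mz']}")
        if s["charge"] is not None:
            out.append(f"CHARGE={s['charge']}+")
        out.extend(f"{mz} {intensity}" for mz, intensity in s["peaks"])
        out.append("END IONS")
    return "\n".join(out) + "\n"


def filter_files(paths, keep, suffix=".filtered.mgf"):
    """Apply the `keep` predicate to every spectrum of every file, writing new peak lists."""
    for path in paths:
        with open(path) as fh:
            spectra = parse_mgf(fh.read())
        with open(str(path) + suffix, "w") as fh:
            fh.write(write_mgf([s for s in spectra if keep(s)]))
```

Global MGF parameters that appear before the first spectrum are skipped by this parser, so they are not carried over to the exported files.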

Recover reduces the computing resources and time wasted on processing the many low-quality spectra that typically remain in peak lists. It also enables more refined searches on selected spectra of potentially high informative value.
