PRIDE Utilities Libraries and Algorithms

ms-data-core-api

The ms-data-core-api is a modular, open-source library for developing computational proteomics tools. The API, written in Java, enables rapid tool creation by providing a robust, pluggable programming interface and a common data model. The data object model is based on controlled vocabularies/ontologies and captures the whole range of data types found in common proteomics experimental workflows, from spectra and identification data to quantification results. The library contains readers for three of the most widely used Proteomics Standards Initiative (PSI) standard file formats: mzML, mzIdentML and mzTab. In addition to mzML, it supports the other most commonly used mass spectra formats: dta, ms2, mgf, pkl and apl (text-based), and mzXML and mzData (XML-based).

Related Documents:

Project Issues
Project Wiki
JavaDoc

List of libraries and tools that use ms-data-core-api

Name	Repository	Description
PRIDE Inspector	pride-inspector	A Desktop visualisation tool for proteomics experiments
PRIDE protein inference library	pride-protein-inference	A library to compute protein inference in proteomics experiments
PRIDE Repository Pipelines	(PRIDE Database)	The Proteomics Identification Database

PRIDE Utilities

The PRIDE Utilities module contains data structures and algorithms used by all the components of the PRIDE Inspector Toolsuite and by other PRIDE projects such as the PRIDE Archive libraries and PRIDE Cluster. Among the values defined in PRIDE Utilities are the amino acid mass table, pK values and hydrophobicity indexes. The module also contains mappings between different ontology terms that denote the same concept; for example, the b ion can be annotated with the PRIDE ontology term PRIDE:0000194 or the PSI-MS ontology term MS:1001224. In this way the module homogenizes the terms and concepts used in metadata annotations. For instance, the library defines the well-established search engines and processing software, together with their corresponding scores, in different controlled vocabularies (CVs) or ontologies.
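
The term mapping described above can be illustrated with a minimal sketch. The class and enum names below (CvTermNormalizer, Concept) are hypothetical and are not taken from PRIDE Utilities; only the two b ion accessions come from the text. The idea is simply that every known accession resolves to one canonical concept, so annotations from different ontologies can be compared.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: resolves equivalent CV accessions to one canonical concept. */
final class CvTermNormalizer {

    /** Illustrative concept names; not the types used by PRIDE Utilities. */
    public enum Concept { B_ION }

    private final Map<String, Concept> byAccession = new HashMap<>();

    public CvTermNormalizer() {
        // Both accessions for the b ion are the ones quoted in the text above.
        register("PRIDE:0000194", Concept.B_ION);
        register("MS:1001224", Concept.B_ION);
    }

    private void register(String accession, Concept concept) {
        byAccession.put(key(accession), concept);
    }

    /** Accessions are compared case-insensitively and without surrounding whitespace. */
    private static String key(String accession) {
        return accession.trim().toUpperCase(Locale.ROOT);
    }

    /** Returns the canonical concept for an accession, or empty if it is unknown. */
    public Optional<Concept> normalize(String accession) {
        if (accession == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(byAccession.get(key(accession)));
    }

    /** Two accessions are equivalent when both resolve to the same known concept. */
    public boolean sameConcept(String first, String second) {
        Optional<Concept> concept = normalize(first);
        return concept.isPresent() && concept.equals(normalize(second));
    }

    public static void main(String[] args) {
        CvTermNormalizer normalizer = new CvTermNormalizer();
        System.out.println(normalizer.sameConcept("PRIDE:0000194", "MS:1001224"));
    }
}
```

A real table would hold many concepts (search engines, scores, ion types); the lookup logic stays the same.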

Related Documents:

Project Issues
Project Wiki
JavaDoc

jmzReader

The jmzReader Library is a collection of Java APIs to parse the most commonly used MS peak list formats. Currently, the library contains parsers for:

dta
mgf
ms2
mzData
mzXML
pkl
mzML
PRIDE XML

All parsers are optimized to be used in conjunction with mzIdentML. Built on a custom class that efficiently parses text files line by line, all parsers can handle arbitrarily large files with minimal memory, allowing easy and efficient processing of peak list files in Java. mzIdentML files do not contain spectra data but refer to external peak list files. All peak list parsers support the methods used by mzIdentML to reference external spectra and implement a common interface. Thus, software developed for mzIdentML no longer has to support multiple peak list file formats, only this one interface. An example of how the jmzReader library can be used in conjunction with mzIdentML can be found in the wiki.
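
To make the two ideas above concrete (line-by-line parsing with low memory use, and one interface shared by all formats), here is a minimal sketch for the MGF format. It does not use the jmzReader API: the names MgfSketch, PeakListSource, MgfSource and Spectrum are assumptions made for illustration, and only the TITLE header and the peak lines are interpreted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch of a streaming peak list reader behind a common interface. */
final class MgfSketch {

    /** Minimal spectrum: a title and a list of {m/z, intensity} pairs. */
    public static final class Spectrum {
        public final String title;
        public final List<double[]> peaks;

        Spectrum(String title, List<double[]> peaks) {
            this.title = title;
            this.peaks = peaks;
        }
    }

    /** Assumed common interface; one implementation per peak list format. */
    public interface PeakListSource extends Iterable<Spectrum> { }

    /** Reads MGF text line by line, building one spectrum at a time. */
    public static final class MgfSource implements PeakListSource {
        private final BufferedReader reader;

        public MgfSource(Reader in) {
            this.reader = new BufferedReader(in);
        }

        /** Single-pass iterator: the reader is consumed as spectra are returned. */
        @Override
        public Iterator<Spectrum> iterator() {
            return new Iterator<Spectrum>() {
                private Spectrum pending = readNext();

                @Override
                public boolean hasNext() {
                    return pending != null;
                }

                @Override
                public Spectrum next() {
                    if (pending == null) {
                        throw new NoSuchElementException();
                    }
                    Spectrum current = pending;
                    pending = readNext();
                    return current;
                }
            };
        }

        /** Returns the next BEGIN IONS ... END IONS block, or null at end of input. */
        private Spectrum readNext() {
            try {
                String title = "";
                List<double[]> peaks = null; // null while outside a spectrum block
                String line;
                while ((line = reader.readLine()) != null) {
                    line = line.trim();
                    if (line.equals("BEGIN IONS")) {
                        title = "";
                        peaks = new ArrayList<>();
                    } else if (peaks == null || line.isEmpty()) {
                        continue; // text outside a block, or a blank line
                    } else if (line.equals("END IONS")) {
                        return new Spectrum(title, peaks);
                    } else if (line.startsWith("TITLE=")) {
                        title = line.substring("TITLE=".length());
                    } else if (Character.isDigit(line.charAt(0))) {
                        String[] parts = line.split("\\s+");
                        double mz = Double.parseDouble(parts[0]);
                        double intensity = parts.length > 1 ? Double.parseDouble(parts[1]) : 0.0;
                        peaks.add(new double[] {mz, intensity});
                    }
                    // Other KEY=VALUE headers (PEPMASS, CHARGE, ...) are ignored in this sketch.
                }
                return null;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        String mgf = "BEGIN IONS\nTITLE=demo\nPEPMASS=500.25\n100.1 10\n200.2 20\nEND IONS\n";
        for (Spectrum s : new MgfSource(new StringReader(mgf))) {
            System.out.println(s.title + ": " + s.peaks.size() + " peaks");
        }
    }
}
```

Code that consumes PeakListSource does not need to know which format is behind it, which is the benefit the paragraph above describes for mzIdentML tooling.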

Related Documents:

Project Issues
Project Wiki

PRIDE Protein Inference

Protein Inference Algorithms (PIA) is a toolbox for MS-based protein inference and identification analysis. PIA allows you to inspect the results of common proteomics spectrum identification search engines, combine them and conduct statistical analyses. The main focus of PIA lies in the integrated inference algorithms, i.e. concluding the proteins from a set of identified spectra. It also allows you to inspect your peptide spectrum matches, calculate FDR values across different search engine results and visualize the correspondence between PSMs, peptides and proteins.

Most search engines for protein identification in MS/MS experiments return protein lists, although the actual search yields a set of peptide spectrum matches (PSMs). The step from PSMs to proteins is called “protein inference”. If a set of identified PSMs supports the detection of more than one protein in the searched database (“protein ambiguity”), usually only one representative accession is reported. These representatives may differ depending on the search engine and settings used, so the protein lists of different search engines generally cannot be compared with one another. PSMs of complementary search engines are often combined to increase the number of reported proteins or to verify the evidence for a peptide, which is strengthened when it is detected by distinct algorithms.
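
The notion of protein ambiguity can be sketched in a few lines. The example below is not PIA's inference algorithm; it only groups proteins that are supported by exactly the same set of peptides, which is the situation in which a search engine would report a single representative accession. The class name AmbiguityGroups and the peptide and accession strings are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: groups proteins that share an identical set of peptides. */
final class AmbiguityGroups {

    /**
     * Takes peptide -> protein accessions (as derived from PSMs) and returns
     * peptide set -> proteins that cannot be told apart by that evidence.
     */
    public static Map<Set<String>, SortedSet<String>> group(Map<String, Set<String>> peptideToProteins) {
        // Invert the evidence: protein -> sorted set of supporting peptides.
        Map<String, Set<String>> proteinToPeptides = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : peptideToProteins.entrySet()) {
            for (String protein : e.getValue()) {
                proteinToPeptides.computeIfAbsent(protein, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        // Proteins with equal peptide sets fall into the same group.
        Map<Set<String>, SortedSet<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : proteinToPeptides.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new TreeSet<>()).add(e.getKey());
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> evidence = new HashMap<>();
        evidence.put("PEPTIDEA", new HashSet<>(Arrays.asList("P1", "P2")));
        evidence.put("PEPTIDEB", new HashSet<>(Arrays.asList("P1", "P2")));
        evidence.put("PEPTIDEC", new HashSet<>(Arrays.asList("P3")));
        // The alphabetically first accession stands in as the representative.
        group(evidence).forEach((peptides, proteins) ->
                System.out.println(proteins.first() + " represents " + proteins + " via " + peptides));
    }
}
```

Choosing the alphabetically first accession as representative is an arbitrary rule for this sketch; it shows why two tools with different rules can report different protein lists from the same PSMs.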

Related Documents:

Project Issues
Project Wiki
JavaDoc

PRIDE Modifications

The PRIDE Modification library is used to retrieve the protein modification information for a specific identifier from different databases: Unimod, PSI-MOD and the PRIDE Modification controlled vocabulary.

Protein post-translational modifications (PTMs) increase the functional diversity of the proteome by the covalent addition of functional groups or proteins, proteolytic cleavage of regulatory subunits or degradation of entire proteins. These modifications include phosphorylation, glycosylation, ubiquitination, nitrosylation, methylation, acetylation, lipidation and proteolysis, and they influence almost all aspects of normal cell biology and pathogenesis. Identifying and understanding PTMs is therefore critical in the study of cell biology and in disease treatment and prevention. In addition to PTMs, there are artefactual protein modifications introduced by the experimental protocol, such as carbamidomethylation or oxidation. The proteomics community has developed two major resources for protein modifications (including PTMs): Unimod [13] and PSI-MOD [14]. However, modification identifiers from these two resources are not trivial to map, since some of the modifications in Unimod are not present in PSI-MOD and vice versa. In addition, every search engine uses its own notation together with either Unimod or PSI-MOD. The PRIDE Modification library retrieves the modification information for a specific identifier from Unimod, PSI-MOD and the PRIDE Modification controlled vocabulary (the internal nomenclature used in PRIDE tools). This library is now used by different tools and pipelines.
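
The lookup problem described above can be sketched as a small cross-reference table. This is not the PRIDE Modification library API: ModificationLookup and its methods are hypothetical, and the accession pairings in demo() are illustrative entries that should be checked against Unimod and PSI-MOD before being relied on. The sketch shows lookup by any accession and a translation that returns an empty result when no counterpart is registered.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: looks up a modification by any of its accessions. */
final class ModificationLookup {

    public static final class Modification {
        public final String name;
        public final double monoDeltaMass; // monoisotopic mass shift in Da
        public final List<String> accessions;

        Modification(String name, double monoDeltaMass, List<String> accessions) {
            this.name = name;
            this.monoDeltaMass = monoDeltaMass;
            this.accessions = accessions;
        }
    }

    private final Map<String, Modification> byAccession = new HashMap<>();

    /** Registers one modification under every accession that refers to it. */
    public void add(String name, double monoDeltaMass, String... accessions) {
        Modification mod = new Modification(name, monoDeltaMass,
                Collections.unmodifiableList(Arrays.asList(accessions.clone())));
        for (String accession : accessions) {
            byAccession.put(accession, mod);
        }
    }

    public Optional<Modification> find(String accession) {
        return Optional.ofNullable(byAccession.get(accession));
    }

    /** Translates an accession to another resource ("UNIMOD" or "MOD"), if a counterpart is registered. */
    public Optional<String> translate(String accession, String targetPrefix) {
        return find(accession).flatMap(mod -> mod.accessions.stream()
                .filter(a -> a.startsWith(targetPrefix + ":"))
                .findFirst());
    }

    /** Illustrative entries only; verify pairings against the source databases. */
    public static ModificationLookup demo() {
        ModificationLookup lookup = new ModificationLookup();
        lookup.add("Carbamidomethyl", 57.021464, "UNIMOD:4", "MOD:01060");
        lookup.add("Oxidation", 15.994915, "UNIMOD:35", "MOD:00719");
        // Registered with a single accession on purpose, to show the unmapped case.
        // This does not imply that the other resource lacks the modification.
        lookup.add("Phospho", 79.966331, "UNIMOD:21");
        return lookup;
    }

    public static void main(String[] args) {
        ModificationLookup lookup = demo();
        System.out.println(lookup.translate("UNIMOD:4", "MOD"));
        System.out.println(lookup.translate("UNIMOD:21", "MOD"));
    }
}
```

Returning Optional makes the "not present in the other resource" case explicit, so callers cannot silently assume that every identifier has a counterpart.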

Related Documents:

Project Issues
Project Wiki
JavaDoc

PRIDE XML Parser

The PRIDE XML JAXB library indexes and parses PRIDE XML 2.1 files. It does not load the whole file into memory up front; instead, it uses an XML indexing technique to index the file on the fly, which gives fast access with a small memory footprint. In addition, all entities in a PRIDE XML file are mapped to objects, and internal references between objects are resolved automatically, which gives direct access in the object model to entities that are only referenced by ID in the XML file.
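
The general idea behind offset-based XML indexing can be sketched as follows. This is a simplified illustration, not the library's implementation: XmlOffsetIndex is a hypothetical class, it matches tags as plain bytes rather than with an XML parser, it assumes the indexed element carries at least one attribute, and it does not handle nested elements of the same name. One streaming pass records the byte range of each element; later reads seek directly to a range and load only that slice.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: byte-offset index over repeated XML elements. */
final class XmlOffsetIndex {

    /** Incremental matcher for a tag whose only '<' is its first byte. */
    private static final class TagMatcher {
        private final byte[] pattern;
        private int matched;

        TagMatcher(String text) {
            this.pattern = text.getBytes(StandardCharsets.US_ASCII);
        }

        /** Feeds one byte; returns true when the whole pattern has just been seen. */
        boolean feed(int b) {
            if (b == pattern[matched]) {
                matched++;
            } else {
                matched = (b == pattern[0]) ? 1 : 0; // a '<' may start a new match
            }
            if (matched == pattern.length) {
                matched = 0;
                return true;
            }
            return false;
        }
    }

    /** Streams the file once and returns {start, end} byte offsets of each element. */
    public static List<long[]> index(Path file, String element) throws IOException {
        String openTag = "<" + element + " "; // trailing space avoids matching e.g. <spectrumList
        TagMatcher open = new TagMatcher(openTag);
        TagMatcher close = new TagMatcher("</" + element + ">");
        List<long[]> ranges = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long position = 0; // number of bytes consumed so far
            long start = -1;
            int b;
            while ((b = in.read()) != -1) {
                position++;
                if (open.feed(b)) {
                    start = position - openTag.length();
                }
                if (close.feed(b) && start >= 0) {
                    ranges.add(new long[] {start, position});
                    start = -1;
                }
            }
        }
        return ranges;
    }

    /** Reads only the bytes of one indexed element. */
    public static String read(Path file, long[] range) throws IOException {
        byte[] buffer = new byte[(int) (range[1] - range[0])];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(range[0]);
            raf.readFully(buffer);
        }
        return new String(buffer, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("index-demo", ".xml");
        String xml = "<spectrumList count=\"2\"><spectrum id=\"1\"/>"
                + "<spectrum id=\"2\"><peaks/></spectrum></spectrumList>";
        Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
        for (long[] range : index(file, "spectrum")) {
            System.out.println(range[0] + "-" + range[1] + ": " + read(file, range));
        }
        Files.delete(file);
    }
}
```

Each extracted slice is a small, well-formed fragment that could then be handed to an XML binding layer such as JAXB, so memory use depends on the size of one element rather than the size of the file.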

Related Documents:

Project Issues
Project Wiki

Protein Details Fetcher

The Protein Details Fetcher is a library for retrieving protein information and details from different databases such as UniProt, Ensembl and NCBI. It retrieves information about the protein sequence and the status of the protein.

Related Documents:

Project Issues
Project Wiki

About Us

PRIDE Development Team