Contents Menu Expand Light mode Dark mode Auto light/dark mode Try online with binder
pyOpenMS 2.8.0 documentation
Launch Binder
Logo
pyOpenMS 2.8.0 documentation

Introduction

  • Background

Getting started

  • Installation
  • Import pyopenms
  • Getting help
  • First look at data
  • Introduction
  • Background

Mass Spectrometry Concepts

  • MS Data
  • Chemistry
  • Peptides and Proteins
  • Oligonucleotides: RNA
  • Fragment spectrum generation
  • Spectrum alignment
  • Digestion
  • Identification Data
  • Quantitative Data

OpenMS Algorithms

  • Parameter Handling
  • Algorithms
  • Smoothing
  • Centroiding
  • Spectrum normalization
  • Charge and Isotope Deconvolution
  • Feature Detection
  • Map Alignment
  • Feature Linking
  • Peptide Search
  • Chromagraphic Analysis
  • Quality Control
  • Mass Decomposition
  • Export files for GNPS

Example Workflows

  • Identification by Accurate Mass
  • Untargeted Metabolomics Pre-Processing

How to contribute

  • Contribute

Advanced Topics

  • Reading Raw MS data
  • Other MS data formats
  • mzML files
  • Scoring spectra with HyperScore
  • Export to pandas DataFrame
  • Query MSExperiment with MassQL
  • Memory management
  • pyOpenMS in R
  • Build from source
  • Wrapping Workflow and wrapping new Classes
  • Interactive plots
  • Interfacing with ML libraries

Full API documentation

  • Full API documentation
    • pyopenms
      • pyopenms.free_mem
      • pyopenms.peptide_identifications_to_df
      • pyopenms.update_scores_from_df
      • AAIndex
      • AASeqWithMass
      • AASequence
      • AMSE_AdductInfo
      • AQS_featureConcentration
      • AQS_runConcentration
      • AScore
      • AbsoluteQuantitation
      • AbsoluteQuantitationMethod
      • AbsoluteQuantitationMethodFile
      • AbsoluteQuantitationStandards
      • AbsoluteQuantitationStandardsFile
      • AccurateMassSearchEngine
      • AccurateMassSearchResult
      • Acquisition
      • AcquisitionInfo
      • ActivationMethod
      • Adduct
      • AggregationMethod
      • AnalysisSummary
      • AnnotationState
      • AnnotationStatistics
      • ArrayWrapperFloat
      • Assay
      • Attachment
      • AverageLinkage
      • BSpline2d
      • Base64
      • BaseFeature
      • BasicProteinInferenceAlgorithm
      • BayesianProteinInferenceAlgorithm
      • BernNorm
      • BiGaussFitter1D
      • BiGaussModel
      • BilinearInterpolation
      • BinnedSpectrum
      • BoundaryCondition
      • BoxElement
      • CV
      • CVMappingFile
      • CVMappingRule
      • CVMappingTerm
      • CVMappings
      • CVReference
      • CVTerm
      • CVTermList
      • CVTermListInterface
      • CVTerm_ControlledVocabulary
      • CachedMzMLHandler
      • CachedSwathFileConsumer
      • CachedmzML
      • CalibrationData
      • ChannelInfo
      • ChargePair
      • ChargedIndexSet
      • ChecksumType
      • ChromatogramExtractor
      • ChromatogramExtractorAlgorithm
      • ChromatogramPeak
      • ChromatogramSettings
      • ChromatogramTools
      • ChromeleonFile
      • ClusterProxyKD
      • ClusteringGrid
      • CoarseIsotopePatternGenerator
      • ColumnHeader
      • CompNovoIdentification
      • CompNovoIdentificationCID
      • CompNovoIonScoring
      • CompNovoIonScoringCID
      • ComplementFilter
      • ComplementMarker
      • Compomer
      • Compound
      • ConfidenceScoring
      • Configuration
      • ConsensusFeature
      • ConsensusIDAlgorithm
      • ConsensusIDAlgorithmAverage
      • ConsensusIDAlgorithmBest
      • ConsensusIDAlgorithmIdentity
      • ConsensusIDAlgorithmPEPIons
      • ConsensusIDAlgorithmPEPMatrix
      • ConsensusIDAlgorithmRanks
      • ConsensusIDAlgorithmSimilarity
      • ConsensusIDAlgorithmWorst
      • ConsensusMap
      • ConsensusMapNormalizerAlgorithmMedian
      • ConsensusMapNormalizerAlgorithmQuantile
      • ConsensusMapNormalizerAlgorithmThreshold
      • ConsensusXMLFile
      • Contact
      • ContactPerson
      • ContinuousWaveletTransform
      • ContinuousWaveletTransformNumIntegration
      • ControlledVocabulary
      • ConvexHull2D
      • CrossLinkSpectrumMatch
      • CrossLinksDB
      • CsiAdapterHit
      • CsiAdapterIdentification
      • CsiAdapterRun
      • CsiFingerIdMzTabWriter
      • CsvFile
      • CubicSpline2d
      • DBoundingBox2
      • DIAScoring
      • DPosition1
      • DPosition2
      • DRange1
      • DRange2
      • DRangeIntersection
      • DTA2DFile
      • DTAFile
      • DataFilter
      • DataFilters
      • DataProcessing
      • DataType
      • DataValue
      • Date
      • DateTime
      • DeNovoIonScore
      • DecoyGenerator
      • DecoyTransitionType
      • Deisotoper
      • DigestSimulation
      • Digestion
      • DigestionEnzyme
      • DigestionEnzymeProtein
      • DigestionEnzymeRNA
      • DimensionDescription
      • DistanceMatrix
      • DocumentIdentifier
      • DriftTimeUnit
      • EDTAFile
      • Element
      • ElementDB
      • ElutionModelFitter
      • ElutionPeakDetection
      • EmgFitter1D
      • EmgGradientDescent
      • EmgModel
      • EmgScoring
      • EmpiricalFormula
      • EnzymaticDigestion
      • EnzymaticDigestionLogModel
      • ExperimentalDesign
      • ExperimentalDesignFile
      • ExperimentalDesign_MSFileSectionEntry
      • ExperimentalDesign_SampleSection
      • ExperimentalSettings
      • ExtractionCoordinates
      • FASTAEntry
      • FASTAFile
      • FIAMSDataProcessor
      • FIAMSScheduler
      • FalseDiscoveryRate
      • Feature
      • FeatureDeconvolution
      • FeatureDistance
      • FeatureFileOptions
      • FeatureFinder
      • FeatureFinderAlgorithmIsotopeWavelet
      • FeatureFinderAlgorithmMetaboIdent
      • FeatureFinderAlgorithmPicked
      • FeatureFinderIdentificationAlgorithm
      • FeatureFinderMetaboIdentCompound
      • FeatureFinderMultiplexAlgorithm
      • FeatureFindingMetabo
      • FeatureGroupingAlgorithm
      • FeatureGroupingAlgorithmKD
      • FeatureGroupingAlgorithmLabeled
      • FeatureGroupingAlgorithmQT
      • FeatureGroupingAlgorithmUnlabeled
      • FeatureHandle
      • FeatureMap
      • FeatureMapping
      • FeatureMapping_FeatureMappingInfo
      • FeatureMapping_FeatureToMs2Indices
      • FeatureXMLFile
      • File
      • FileHandler
      • FileType
      • FileTypes
      • FilterFunctor
      • FineIsotopePatternGenerator
      • Fitter1D
      • FloatDataArray
      • GNPSMGFFile
      • GNPSMetaValueFile
      • GNPSQuantificationFile
      • GaussFilter
      • GaussFitResult
      • GaussFitter
      • GaussTraceFitter
      • GoodDiffFilter
      • Gradient
      • GridBasedCluster
      • HMMState
      • HPLC
      • HiddenMarkovModel
      • HyperScore
      • IBSpectraFile
      • IDConflictResolverAlgorithm
      • IDDecoyProbability
      • IDFilter
      • IDMapper
      • IDMapper_SpectraIdentificationState
      • IDRipper
      • ILPDCWrapper
      • IMFormat
      • IMSAlphabet
      • IMSElement
      • IMSIsotopeDistribution
      • IMSIsotopeDistribution_Peak
      • IMSWeights
      • IMTypes
      • ISDGroup
      • ITRAQ_TYPES
      • IdXMLFile
      • Identification
      • IdentificationData
      • IdentificationDataConverter
      • IdentificationHit
      • IdentificationRuns
      • IncludeExcludeTarget
      • InclusionExclusionList
      • IndexTriple
      • IndexedMzMLDecoder
      • IndexedMzMLFileLoader
      • IndexedMzMLHandler
      • InspectInfile
      • InspectOutfile
      • Instrument
      • InstrumentSettings
      • IntegerDataArray
      • IntensityBalanceFilter
      • Interfaces
      • InternalCalibration
      • InternalCalibration_LockMass
      • Internal_MzMLValidator
      • InterpolationModel
      • IonDetector
      • IonIdentityMolecularNetworking
      • IonOpticsType
      • IonSource
      • IonType
      • IsobaricChannelExtractor
      • IsobaricChannelInformation
      • IsobaricIsotopeCorrector
      • IsobaricNormalizer
      • IsobaricQuantifier
      • IsobaricQuantifierStatistics
      • IsotopeCluster
      • IsotopeDiffFilter
      • IsotopeDistribution
      • IsotopeDistributionCache
      • IsotopeFitter1D
      • IsotopeLabelingMDVs
      • IsotopeMarker
      • IsotopeModel
      • IsotopePattern
      • IsotopeVariant
      • IsotopeWavelet
      • IsotopeWaveletTransform
      • ItraqConstants
      • ItraqEightPlexQuantitationMethod
      • ItraqFourPlexQuantitationMethod
      • JavaInfo
      • KDTreeFeatureMaps
      • KDTreeFeatureNode
      • Kernel_MassTrace
      • KroenikFile
      • LLMParam
      • LPWrapper
      • LabeledPairFinder
      • LibSVMEncoder
      • LightCompound
      • LightMRMTransitionGroupCP
      • LightModification
      • LightProtein
      • LightTargetedExperiment
      • LightTransition
      • LinearInterpolation
      • LinearResampler
      • LinearResamplerAlign
      • LocalLinearMap
      • LogType
      • LowessSmoothing
      • MRMAssay
      • MRMDecoy
      • MRMFP_ComponentGroupParams
      • MRMFP_ComponentParams
      • MRMFQC_ComponentGroupPairQCs
      • MRMFQC_ComponentGroupQCs
      • MRMFQC_ComponentQCs
      • MRMFeature
      • MRMFeatureFilter
      • MRMFeatureFinderScoring
      • MRMFeaturePicker
      • MRMFeaturePickerFile
      • MRMFeatureQC
      • MRMFeatureQCFile
      • MRMFragmentSelection
      • MRMIonSeries
      • MRMMapping
      • MRMRTNormalizer
      • MRMScoring
      • MRMTransitionGroupCP
      • MRMTransitionGroupPicker
      • MS2File
      • MSChromatogram
      • MSDGroup
      • MSDataAggregatingConsumer
      • MSDataCachedConsumer
      • MSDataSqlConsumer
      • MSDataStoringConsumer
      • MSExperiment
      • MSNumpressCoder
      • MSPFile
      • MSPGenericFile
      • MSQuantifications
      • MSSim
      • MSSpectrum
      • MSstatsFile
      • MT_QUANTMETHOD
      • MZTrafoModel
      • MZTrafoModel_MODELTYPE
      • MapAlignmentAlgorithmIdentification
      • MapAlignmentAlgorithmKD
      • MapAlignmentAlgorithmPoseClustering
      • MapAlignmentAlgorithmSpectrumAlignment
      • MapAlignmentEvaluationAlgorithmPrecision
      • MapAlignmentEvaluationAlgorithmRecall
      • MapAlignmentTransformer
      • MapConversion
      • MarkerMower
      • MascotGenericFile
      • MascotInfile
      • MascotXMLFile
      • MassAnalyzer
      • MassDecomposition
      • MassDecompositionAlgorithm
      • MassExplainer
      • MassTrace
      • MassTraceDetection
      • MassTraces
      • MasstraceCorrelator
      • MatrixDouble
      • Measure
      • MetaInfo
      • MetaInfoDescription
      • MetaInfoInterface
      • MetaInfoRegistry
      • MetaboTargetedAssay
      • MetaboTargetedAssay_CompoundTargetDecoyPair
      • MetaboTargetedTargetDecoy
      • MetaboTargetedTargetDecoy_MetaboTargetDecoyMassMapping
      • MetaboliteFeatureDeconvolution
      • MetaboliteSpectralMatching
      • Modification
      • ModificationDefinition
      • ModificationDefinitionsSet
      • ModificationsDB
      • ModifiedPeptideGenerator
      • ModifiedPeptideGenerator_MapToResidueType
      • MorpheusScore
      • MorpheusScore_Result
      • MorphologicalFilter
      • MsInspectFile
      • MultiplexDeltaMasses
      • MultiplexDeltaMassesGenerator
      • MultiplexDeltaMassesGenerator_Label
      • MultiplexDeltaMasses_DeltaMass
      • MultiplexIsotopicPeakPattern
      • MzDataFile
      • MzIdentMLFile
      • MzMLFile
      • MzMLSpectrumDecoder
      • MzMLSqliteHandler
      • MzMLSwathFileConsumer
      • MzQCFile
      • MzQuantMLFile
      • MzTab
      • MzTabFile
      • MzTabM
      • MzTabMFile
      • MzXMLFile
      • NASequence
      • NLargest
      • NeutralLossDiffFilter
      • NeutralLossMarker
      • NoiseEstimator
      • NonNegativeLeastSquaresSolver
      • NoopMSDataWritingConsumer
      • NormalizationMethod
      • Normalizer
      • NucleicAcidSpectrumGenerator
      • NumpressConfig
      • OMSSACSVFile
      • OMSSAXMLFile
      • OPXLDataStructs
      • OPXLHelper
      • OPXLSpectrumProcessingAlgorithms
      • OPXL_PreprocessedPairSpectra
      • OSBinaryDataArray
      • OSChromatogram
      • OSChromatogramMeta
      • OSSpectrum
      • OSSpectrumMeta
      • OSWFile
      • OSW_ChromExtractParams
      • OfflinePrecursorIonSelection
      • OnDiscMSExperiment
      • OpenMSBuildInfo
      • OpenMSOSInfo
      • OpenPepXLAlgorithm
      • OpenPepXLLFAlgorithm
      • OpenSwathDataAccessHelper
      • OpenSwathHelper
      • OpenSwathOSWWriter
      • OpenSwathScoring
      • OpenSwath_Scores
      • OpenSwath_Scores_Usage
      • OptimizationFunctions_PenaltyFactors
      • OptimizePeakDeconvolution
      • OptimizePeakDeconvolution_Data
      • OptimizePick
      • OptimizePick_Data
      • OriginAnnotationFormat
      • PI_PeakArea
      • PI_PeakBackground
      • PI_PeakShapeMetrics
      • PSLPFormulation
      • PSProteinInference
      • PScore
      • Param
      • ParamCTDFile
      • ParamEntry
      • ParamNode
      • ParamValue
      • ParamXMLFile
      • ParentPeakMower
      • Peak1D
      • Peak2D
      • PeakBoundary
      • PeakCandidate
      • PeakFileOptions
      • PeakIndex
      • PeakIntegrator
      • PeakIntensityPredictor
      • PeakMap
      • PeakMarker
      • PeakPickerCWT
      • PeakPickerHiRes
      • PeakPickerIterative
      • PeakPickerMRM
      • PeakPickerMaxima
      • PeakPickerSH
      • PeakShape
      • PeakSpectrum
      • PeakTypeEstimator
      • PeakWidthEstimator
      • PenaltyFactorsIntensity
      • PepXMLFile
      • PepXMLFileMascot
      • Peptide
      • PeptideAndProteinQuant
      • PeptideAndProteinQuant_PeptideData
      • PeptideAndProteinQuant_ProteinData
      • PeptideAndProteinQuant_Statistics
      • PeptideEntry
      • PeptideEvidence
      • PeptideHit
      • PeptideHit_AnalysisResult
      • PeptideHit_PeakAnnotation
      • PeptideIdentification
      • PeptideIndexing
      • PeptideProteinResolution
      • PeptideProteinResolution_ConnectedComponent
      • PercolatorFeatureSetHelper
      • PercolatorInfile
      • PercolatorOutfile
      • PlainMSDataWritingConsumer
      • PosteriorErrorProbabilityModel
      • Precursor
      • PrecursorCorrection
      • PrecursorIonSelection
      • PrecursorIonSelectionPreprocessing
      • PrecursorPurity
      • Prediction
      • ProbablePhosphoSites
      • Product
      • ProgressLogger
      • ProtXMLFile
      • ProteaseDB
      • ProteaseDigestion
      • Protein
      • ProteinEntry
      • ProteinGroup
      • ProteinHit
      • ProteinIdentification
      • ProteinInference
      • ProteinProteinCrossLink
      • ProteinResolver
      • ProteinResolverResult_Type
      • ProtonDistributionModel
      • Publication
      • PurityScores
      • QTCluster
      • QTClusterFinder
      • QcMLFile
      • QualityParameter
      • QuantitativeExperimentalDesign
      • QuotingMethod
      • RANSAC
      • RANSACParam
      • RANSACQuadratic
      • RNPxlMarkerIonExtractor
      • RNPxlModificationMassesResult
      • RNPxlModificationsGenerator
      • RNPxlReport
      • RNPxlReportRow
      • RNPxlReportRowHeader
      • RNaseDB
      • RNaseDigestion
      • RansacModelLinear
      • RansacModelQuadratic
      • Ratio
      • ReactionMonitoringTransition
      • RealMassDecomposer
      • RegularSwathFileConsumer
      • Residue
      • ResidueDB
      • ResidueModification
      • ResolverResult
      • RetentionTime
      • Ribonucleotide
      • RibonucleotideDB
      • RichPeak2D
      • RipFileContent
      • RipFileIdentifier
      • SIDE
      • SILACLabeler
      • SVMData
      • SVMPrediction
      • SVMWrapper
      • Sample
      • SavitzkyGolayFilter
      • Scaler
      • ScanMode
      • ScanWindow
      • SearchParameters
      • Seed
      • SeedListGenerator
      • SemanticValidator
      • SemanticValidator_CVTerm
      • SequestInfile
      • SequestOutfile
      • SignalToNoiseEstimatorMeanIterative
      • SignalToNoiseEstimatorMedian
      • SignalToNoiseEstimatorMedianChrom
      • SignalToNoiseEstimatorMedianRapid
      • SimProtein
      • SimRandomNumberGenerator
      • SimpleOpenMSSpectraFactory
      • SimplePairFinder
      • SimplePeak
      • SimpleSVM
      • SimpleSearchEngineAlgorithm
      • SimpleTSGXLMS
      • SiriusAdapterAlgorithm
      • SiriusAdapterHit
      • SiriusAdapterIdentification
      • SiriusAdapterRun
      • SiriusFragmentAnnotation
      • SiriusFragmentAnnotation_SiriusTargetDecoySpectra
      • SiriusMSFile
      • SiriusMSFile_AccessionInfo
      • SiriusMSFile_CompoundInfo
      • SiriusMzTabWriter
      • SiriusMzTabWriter_SiriusSpectrumMSInfo
      • SiriusTemporaryFileSystemObjects
      • Software
      • SolverParam
      • SourceFile
      • SpectraMerger
      • SpectraSTSimilarityScore
      • SpectralMatch
      • SpectrumAccessOpenMS
      • SpectrumAccessOpenMSCached
      • SpectrumAccessOpenMSInMemory
      • SpectrumAccessQuadMZTransforming
      • SpectrumAccessSqMass
      • SpectrumAccessTransforming
      • SpectrumAlignment
      • SpectrumAlignmentScore
      • SpectrumAnnotator
      • SpectrumHelper
      • SpectrumIdentification
      • SpectrumLookup
      • SpectrumMetaData
      • SpectrumMetaDataLookup
      • SpectrumSettings
      • SplineInterpolatedPeaks
      • SplinePackage
      • SplineSpectrum_Navigator
      • SqMassConfig
      • SqMassFile
      • SqrtMower
      • StablePairFinder
      • String
      • StringDataArray
      • StringView
      • SvmModelParameterSet
      • SvmTheoreticalSpectrumGenerator
      • SvmTheoreticalSpectrumGeneratorSet
      • SvmTheoreticalSpectrumGeneratorTrainer
      • SwathFile
      • SwathMap
      • SwathMapMassCorrection
      • SwathWindowLoader
      • SysInfo
      • TICFilter
      • TMTEighteenPlexQuantitationMethod
      • TMTElevenPlexQuantitationMethod
      • TMTSixPlexQuantitationMethod
      • TMTSixteenPlexQuantitationMethod
      • TMTTenPlexQuantitationMethod
      • TM_DataPoint
      • TOFCalibration
      • TSE_Match
      • Tagger
      • Tagging
      • TargetedExperiment
      • TargetedExperiment_Instrument
      • TargetedExperiment_Interpretation
      • TargetedExperiment_Modification
      • TargetedSpectraExtractor
      • TermSpecificityNuc
      • TextFile
      • TheoreticalIsotopePattern
      • TheoreticalSpectrumGenerator
      • TheoreticalSpectrumGeneratorXLMS
      • ThresholdMower
      • ToolInfo
      • TraMLFile
      • TraMLProduct
      • TraceInfo
      • TransformationDescription
      • TransformationModelBSpline
      • TransformationModelInterpolated
      • TransformationModelLinear
      • TransformationModelLowess
      • TransformationStatistics
      • TransformationXMLFile
      • TransitionPQPFile
      • TransitionTSVFile
      • TwoDOptimization
      • UnimodXMLFile
      • UniqueIdGenerator
      • Unit
      • UnitType
      • ValueType
      • VersionDetails
      • VersionInfo
      • WindowMower
      • XFDRAlgorithm
      • XLPrecursor
      • XMLFile
      • XMLHandler
      • XQuestResultXMLFile
      • XQuestScores
      • XRefType_CVTerm_ControlledVocabulary
      • XTandemInfile
      • XTandemXMLFile
      • streampos
    • pyopenms.Constants
      • pyopenms.Constants.EPSILON
      • pyopenms.Constants.PI
      • pyopenms.Constants.E
      • pyopenms.Constants.ELEMENTARY_CHARGE
      • pyopenms.Constants.ELECTRON_MASS
      • pyopenms.Constants.ELECTRON_MASS_U
      • pyopenms.Constants.PROTON_MASS
      • pyopenms.Constants.PROTON_MASS_U
      • pyopenms.Constants.C13C12_MASSDIFF_U
      • pyopenms.Constants.NEUTRON_MASS
      • pyopenms.Constants.NEUTRON_MASS_U
      • pyopenms.Constants.AVOGADRO
      • pyopenms.Constants.BOLTZMANN
      • pyopenms.Constants.PLANCK
      • pyopenms.Constants.GAS_CONSTANT
      • pyopenms.Constants.FARADAY
      • pyopenms.Constants.BOHR_RADIUS
      • pyopenms.Constants.VACUUM_PERMITTIVITY
      • pyopenms.Constants.VACUUM_PERMEABILITY
      • pyopenms.Constants.SPEED_OF_LIGHT
      • pyopenms.Constants.GRAVITATIONAL_CONSTANT
      • pyopenms.Constants.FINE_STRUCTURE_CONSTANT
      • pyopenms.Constants.DEG_PER_RAD
      • pyopenms.Constants.RAD_PER_DEG
      • pyopenms.Constants.MM_PER_INCH
      • pyopenms.Constants.M_PER_FOOT
      • pyopenms.Constants.JOULE_PER_CAL
      • pyopenms.Constants.CAL_PER_JOULE
      • pyopenms.Constants.PRECURSOR_ERROR_PPM_USERPARAM
      • pyopenms.Constants.FRAGMENT_ANNOTATION_USERPARAM
    • pyopenms.plotting
      • pyopenms.plotting.mirror_plot_spectrum
      • pyopenms.plotting.plot_chromatogram
      • pyopenms.plotting.plot_spectrum

Support

  • Support
  • Frequently Asked Questions
  • OpenMS Glossary
Back to top
Launch Binder

Peptide Search#

In MS-based proteomics, fragment ion spectra (MS2 spectra) are often interpreted by comparing them against a theoretical set of spectra generated from a FASTA database. OpenMS contains a (simple) implementation of such a “search engine” that compares experimental spectra against theoretical spectra generated from an enzymatic or chemical digest of a proteome (e.g. tryptic digest).

_images/database_search_illustration.png

SimpleSearch#

In most proteomics applications, a dedicated search engine (such as Comet, Crux, Mascot, MSGFPlus, MSFragger, Myrimatch, OMSSA, SpectraST or XTandem; all of which are supported by pyOpenMS) will be used to search data. Here, we will use the internal SimpleSearchEngineAlgorithm from OpenMS used for teaching purposes. This makes it very easy to search an (experimental) mzML file against a fasta database of protein sequences:

 1from urllib.request import urlretrieve
 2from pyopenms import *
 3
 4gh = "https://raw.githubusercontent.com/OpenMS/pyopenms-docs/master"
 5urlretrieve(gh + "/src/data/SimpleSearchEngine_1.mzML", "searchfile.mzML")
 6urlretrieve(gh + "/src/data/SimpleSearchEngine_1.fasta", "search.fasta")
 7protein_ids = []
 8peptide_ids = []
 9SimpleSearchEngineAlgorithm().search(
10    "searchfile.mzML", "search.fasta", protein_ids, peptide_ids
11)

This will print search engine output including the number of peptides and proteins in the database and how many spectra were matched to peptides and proteins:

Peptide statistics

  unmatched                : 0 (0 %)
  target/decoy:
    match to target DB only: 3 (100 %)
    match to decoy DB only : 0 (0 %)
    match to both          : 0 (0 %)

PSM inspection#

We can now investigate the individual hits as we have done before in the Identification tutorial.

 1for peptide_id in peptide_ids:
 2    # Peptide identification values
 3    print(35 * "=")
 4    print("Peptide ID m/z:", peptide_id.getMZ())
 5    print("Peptide ID rt:", peptide_id.getRT())
 6    print("Peptide scan index:", peptide_id.getMetaValue("scan_index"))
 7    print("Peptide scan name:", peptide_id.getMetaValue("scan_index"))
 8    print("Peptide ID score type:", peptide_id.getScoreType())
 9    # PeptideHits
10    for hit in peptide_id.getHits():
11        print(" - Peptide hit rank:", hit.getRank())
12        print(" - Peptide hit charge:", hit.getCharge())
13        print(" - Peptide hit sequence:", hit.getSequence())
14        mz = (
15            hit.getSequence().getMonoWeight(
16                Residue.ResidueType.Full, hit.getCharge()
17            )
18            / hit.getCharge()
19        )
20        print(" - Peptide hit monoisotopic m/z:", mz)
21        print(
22            " - Peptide ppm error:", abs(mz - peptide_id.getMZ()) / mz * 10**6
23        )
24        print(" - Peptide hit score:", hit.getScore())

We notice that the second peptide spectrum match (PSM) was found for the third spectrum in the file for a precursor at 775.38 m/z for the sequence RPGADSDIGGFGGLFDLAQAGFR.

 1tsg = TheoreticalSpectrumGenerator()
 2thspec = MSSpectrum()
 3p = Param()
 4p.setValue("add_metainfo", "true")
 5tsg.setParameters(p)
 6peptide = AASequence.fromString("RPGADSDIGGFGGLFDLAQAGFR")
 7tsg.getSpectrum(thspec, peptide, 1, 1)
 8# Iterate over annotated ions and their masses
 9for ion, peak in zip(thspec.getStringDataArrays()[0], thspec):
10    print(ion, peak.getMZ())
11
12e = MSExperiment()
13MzMLFile().load("searchfile.mzML", e)
14print("Spectrum native id", e[2].getNativeID())
15mz, i = e[2].get_peaks()
16peaks = [(mz, i) for mz, i in zip(mz, i) if i > 1500 and mz > 300]
17for peak in peaks:
18    print(peak[0], "mz", peak[1], "int")

Comparing the theoretical spectrum and the experimental spectrum for RPGADSDIGGFGGLFDLAQAGFR we can easily see that the most abundant ions in the spectrum are y8 (877.452 m/z), b10 (926.432), y9 (1024.522 m/z) and b13 (1187.544 m/z).

Visualization#

When loading the searchfile.mzML into the OpenMS visualization software TOPPView, we can convince ourselves that the observed spectrum indeed was generated by the peptide RPGADSDIGGFGGLFDLAQAGFR by loading the corresponding theoretical spectrum into the viewer using “Tools”->”Generate theoretical spectrum”:

_images/psm.png

From our output above, we notice that the second peptide spectrum match (PSM) at 775.38 m/z for sequence RPGADSDIGGFGGLFDLAQAGFR was found with an error tolerance of 2.25 ppm, therefore if we set the precursor mass tolerance to 4 ppm (+/- 2ppm), we expect that we will not find the hit at 775.38 m/z any more:

 1salgo = SimpleSearchEngineAlgorithm()
 2p = salgo.getDefaults()
 3print(p.items())
 4p[b"precursor:mass_tolerance"] = 4.0
 5salgo.setParameters(p)
 6
 7protein_ids = []
 8peptide_ids = []
 9salgo.search("searchfile.mzML", "search.fasta", protein_ids, peptide_ids)
10print("Found", len(peptide_ids), "peptides")

As we can see, using a smaller precursor mass tolerance leads the algorithm to find only one hit instead of two. Similarly, if we use the wrong enzyme for the digestion (e.g. p[b'enzyme'] = "Formic_acid"), we find no results.

More detailed example#

Now include some additional decoy database generation step as well as subsequent FDR filtering.

 1from urllib.request import urlretrieve
 2
 3searchfile = "../../src/data/BSA1.mzML"
 4searchdb = "../../src/data/18Protein_SoCe_Tr_detergents_trace.fasta"
 5
 6# generate a protein database with additional decoy sequenes
 7targets = list()
 8decoys = list()
 9FASTAFile().load(
10    searchdb, targets
11)  # read FASTA file into a list of FASTAEntrys
12decoy_generator = DecoyGenerator()
13for entry in targets:
14    rev_entry = FASTAEntry(entry)  # copy entry
15    rev_entry.identifier = "DECOY_" + rev_entry.identifier  # mark as decoy
16    aas = AASequence().fromString(
17        rev_entry.sequence
18    )  # convert string into amino acid sequence
19    rev_entry.sequence = decoy_generator.reverseProtein(
20        aas
21    ).toString()  # reverse
22    decoys.append(rev_entry)
23
24target_decoy_database = "search_td.fasta"
25FASTAFile().store(
26    target_decoy_database, targets + decoys
27)  # store the database with appended decoy sequences
28
29# Run SimpleSearchAlgorithm, store protein and peptide ids
30protein_ids = []
31peptide_ids = []
32
33# set some custom search parameters
34simplesearch = SimpleSearchEngineAlgorithm()
35params = simplesearch.getDefaults()
36score_annot = [b"fragment_mz_error_median_ppm", b"precursor_mz_error_ppm"]
37params.setValue(b"annotate:PSM", score_annot)
38params.setValue(b"peptide:max_size", 30)
39simplesearch.setParameters(params)
40
41simplesearch.search(searchfile, target_decoy_database, protein_ids, peptide_ids)
42
43# Annotate q-value
44FalseDiscoveryRate().apply(peptide_ids)
45
46# Filter by 1% PSM FDR (q-value < 0.01)
47idfilter = IDFilter()
48idfilter.filterHitsByScore(peptide_ids, 0.01)
49idfilter.removeDecoyHits(peptide_ids)
50
51# store PSM-FDR filtered
52IdXMLFile().store(
53    "searchfile_results_1perc_FDR.idXML", protein_ids, peptide_ids
54)

However, usually researchers are interested in the most confidently identified proteins. This so called protein inference problem is a difficult problem because of often occurring shared/ambiguous peptides. To be able to calculate a target/decoy-based FDR on the protein level, we need to assign scores to proteins first (e.g. based on their observed peptides). This is done by applying one of the available protein inference algorithms on the peptide and protein IDs.

 1protein_ids = []
 2peptide_ids = []
 3
 4# Re-run search since we need to keep decoy hits for inference
 5simplesearch.search(searchfile, target_decoy_database, protein_ids, peptide_ids)
 6
 7# Run inference
 8bpia = BasicProteinInferenceAlgorithm()
 9params = bpia.getDefaults()
10# FDR with groups currently not supported in pyopenms
11params.setValue("annotate_indistinguishable_groups", "false")
12bpia.setParameters(params)
13bpia.run(peptide_ids, protein_ids)
14
15
16# Annotate q-value on protein level
17# Removes decoys in default settings
18FalseDiscoveryRate().apply(protein_ids)
19
20# Filter targets by 1% protein FDR (q-value < 0.01)
21idfilter = IDFilter()
22idfilter.filterHitsByScore(protein_ids, 0.01)
23
24# Restore valid references into the proteins
25remove_peptides_without_reference = True
26idfilter.updateProteinReferences(
27    peptide_ids, protein_ids, remove_peptides_without_reference
28)
29
30# store protein-FDR filtered
31IdXMLFile().store(
32    "searchfile_results_1perc_protFDR.idXML", protein_ids, peptide_ids
33)
Next
Chromagraphic Analysis
Previous
Feature Linking
Copyright © 2023, OpenMS Team
Made with Sphinx and @pradyunsg's Furo
On this page
  • Peptide Search
    • SimpleSearch
    • PSM inspection
    • Visualization
    • More detailed example