IDFilter

class pyopenms.IDFilter

Bases: object

Cython implementation of _IDFilter

Documentation is available at http://www.openms.de/current_doxygen/html/classOpenMS_1_1IDFilter.html

This class provides functions for filtering collections of peptide or protein identifications according to various criteria. It also contains helper functions and classes (functors that implement predicates) that are used in this context.

The filter functions modify their inputs, rather than creating filtered copies.

Most filters work on the hit level, i.e. they remove peptide or protein hits from peptide or protein identifications (IDs). A few filters work on the ID level instead, i.e. they remove peptide or protein IDs from vectors thereof. Independent of this, the inputs for all filter functions are vectors of IDs, because the data most often comes in this form. This design also allows many helper objects to be set up only once per vector, rather than once per ID.

The filter functions for vectors of peptide/protein IDs do not include clean-up steps (e.g. removal of IDs without hits, reassignment of hit ranks, …). They only carry out their specific filtering operations. This is so filters can be chained without having to repeat clean-up operations. The group of clean-up functions provides helpers that are useful to ensure data integrity after filters have been applied, but it is up to the individual developer to use them when necessary.

The filter functions for MS/MS experiments do include clean-up steps, because they filter peptide and protein IDs in conjunction and potential contradictions between the two must be eliminated.
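The "chain filters first, clean up once at the end" pattern described above can be sketched in plain Python. This is only an illustrative model, not the pyopenms implementation: each ID is represented as a list of hit scores, and the helper names (`filter_hits_by_score`, `keep_n_best_hits`, `remove_empty_identifications`) are hypothetical stand-ins that merely mirror the method names on this page.

```python
# Pure-Python model of in-place filtering followed by a single clean-up step.
# Each ID is modelled as a list of hit scores (higher is better here);
# these helpers are hypothetical stand-ins, not pyopenms calls.

def filter_hits_by_score(ids, threshold):
    """Hit-level filter: drop low-scoring hits in place; empty IDs are kept."""
    for hits in ids:
        hits[:] = [score for score in hits if score >= threshold]

def keep_n_best_hits(ids, n):
    """Hit-level filter: keep the n highest-scoring hits of each ID in place."""
    for hits in ids:
        hits[:] = sorted(hits, reverse=True)[:n]

def remove_empty_identifications(ids):
    """Clean-up step: drop IDs that no longer contain any hits."""
    ids[:] = [hits for hits in ids if hits]

ids = [[0.9, 0.4, 0.7], [0.2, 0.1], [0.8]]

# Two filters are chained; neither removes the ID that became empty.
filter_hits_by_score(ids, 0.5)
keep_n_best_hits(ids, 1)
after_filters = [list(hits) for hits in ids]

# The clean-up is applied once, after all filters have run.
remove_empty_identifications(ids)
```

In this model the second ID survives both filters as an empty list and only disappears in the final clean-up call, which is the behaviour the description attributes to the vector-based filters.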

__init__()
  • Cython signature: void IDFilter()

  • Cython signature: void IDFilter(IDFilter &)

Methods

__init__

  • Cython signature: void IDFilter()

countHits

  • Cython signature: size_t countHits(libcpp_vector[PeptideIdentification] identifications)

extractPeptideSequences

Cython signature: void extractPeptideSequences(libcpp_vector[PeptideIdentification] & peptides, libcpp_set[String] & sequences, bool ignore_mods)

filterHitsByRank

  • Cython signature: void filterHitsByRank(libcpp_vector[PeptideIdentification] & ids, size_t min_rank, size_t max_rank)

filterHitsByScore

  • Cython signature: void filterHitsByScore(libcpp_vector[PeptideIdentification] & ids, double threshold_score)

filterPeptidesByCharge

Cython signature: void filterPeptidesByCharge(libcpp_vector[PeptideIdentification] & peptides, size_t min_charge, size_t max_charge) Filters peptide identifications according to charge state

filterPeptidesByLength

Cython signature: void filterPeptidesByLength(libcpp_vector[PeptideIdentification] & peptides, size_t min_length, size_t max_length) Filters peptide identifications according to peptide sequence length

filterPeptidesByMZ

Cython signature: void filterPeptidesByMZ(libcpp_vector[PeptideIdentification] & peptides, size_t min_mz, size_t max_mz) Filters peptide identifications by precursor m/z, keeping only IDs in the given range

filterPeptidesByMZError

Cython signature: void filterPeptidesByMZError(libcpp_vector[PeptideIdentification] & peptides, double mass_error, bool unit_ppm) Filter peptide identifications according to mass deviation

filterPeptidesByRT

Cython signature: void filterPeptidesByRT(libcpp_vector[PeptideIdentification] & peptides, size_t min_rt, size_t max_rt) Filters peptide identifications by precursor RT, keeping only IDs in the given range

filterPeptidesByRTPredictPValue

Cython signature: void filterPeptidesByRTPredictPValue(libcpp_vector[PeptideIdentification] & peptides, const String & metavalue_key, double threshold)

getBestHit

  • Cython signature: bool getBestHit(libcpp_vector[PeptideIdentification] identifications, bool assume_sorted, PeptideHit & best_hit)

keepBestPeptideHits

Cython signature: void keepBestPeptideHits(libcpp_vector[PeptideIdentification] & peptides, bool strict)

keepBestPerPeptide

Cython signature: void keepBestPerPeptide(libcpp_vector[PeptideIdentification] & peptides, bool ignore_mods, bool ignore_charges, size_t nr_best_spectrum) Filters PeptideHits from PeptideIdentification by keeping only the best peptide hits for every peptide sequence

keepBestPerPeptidePerRun

Cython signature: void keepBestPerPeptidePerRun(libcpp_vector[ProteinIdentification] & prot_ids, libcpp_vector[PeptideIdentification] & peptides, bool ignore_mods, bool ignore_charges, size_t nr_best_spectrum) Filters PeptideHits from PeptideIdentification by keeping only the best peptide hits for every peptide sequence on a per run basis

keepHitsMatchingProteins

  • Cython signature: void keepHitsMatchingProteins(libcpp_vector[PeptideIdentification] & ids, libcpp_set[String] accessions)

keepNBestHits

  • Cython signature: void keepNBestHits(libcpp_vector[PeptideIdentification] & ids, size_t n)

keepNBestSpectra

Cython signature: void keepNBestSpectra(libcpp_vector[PeptideIdentification] & peptides, size_t n) Keeps the "N best" PeptideIdentification objects (one PeptideIdentification is better than another if its best PeptideHit is better)

keepPeptidesWithMatchingModifications

Cython signature: void keepPeptidesWithMatchingModifications(libcpp_vector[PeptideIdentification] & peptides, libcpp_set[String] & modifications) Keeps only peptide hits that have at least one of the given modifications

keepPeptidesWithMatchingSequences

Cython signature: void keepPeptidesWithMatchingSequences(libcpp_vector[PeptideIdentification] & peptides, libcpp_vector[PeptideIdentification] & bad_peptides, bool ignore_mods) Removes all peptide hits with a sequence that does not match one in the reference peptides (the second argument, named 'bad_peptides' in this signature)

keepUniquePeptidesPerProtein

Cython signature: void keepUniquePeptidesPerProtein(libcpp_vector[PeptideIdentification] & peptides) Removes all peptides that are not annotated as unique for a protein (by PeptideIndexer)

removeDecoyHits

  • Cython signature: void removeDecoyHits(libcpp_vector[PeptideIdentification] & ids)

removeDuplicatePeptideHits

Cython signature: void removeDuplicatePeptideHits(libcpp_vector[PeptideIdentification] & peptides) Removes duplicate peptide hits from each peptide identification, keeping only unique hits (per ID)

removeEmptyIdentifications

  • Cython signature: void removeEmptyIdentifications(libcpp_vector[PeptideIdentification] & ids)

removeHitsMatchingProteins

  • Cython signature: void removeHitsMatchingProteins(libcpp_vector[PeptideIdentification] & ids, libcpp_set[String] accessions)

removePeptidesWithMatchingModifications

Cython signature: void removePeptidesWithMatchingModifications(libcpp_vector[PeptideIdentification] & peptides, libcpp_set[String] & modifications) Removes all peptide hits that have at least one of the given modifications

removePeptidesWithMatchingSequences

Cython signature: void removePeptidesWithMatchingSequences(libcpp_vector[PeptideIdentification] & peptides, libcpp_vector[PeptideIdentification] & bad_peptides, bool ignore_mods) Removes all peptide hits with a sequence that matches one in 'bad_peptides'

removeUnreferencedProteins

Cython signature: void removeUnreferencedProteins(libcpp_vector[ProteinIdentification] & proteins, libcpp_vector[PeptideIdentification] & peptides) Removes protein hits from the protein IDs in a 'cmap' that are not referenced by a peptide in the features or if requested in the unassigned peptide list

updateHitRanks

  • Cython signature: void updateHitRanks(libcpp_vector[PeptideIdentification] & identifications)

updateProteinGroups

Cython signature: bool updateProteinGroups(libcpp_vector[ProteinGroup] & groups, libcpp_vector[ProteinHit] & hits)

updateProteinReferences

Cython signature: void updateProteinReferences(libcpp_vector[PeptideIdentification] & peptides, libcpp_vector[ProteinIdentification] & proteins, bool remove_peptides_without_reference) Removes references to missing proteins.

DigestionFilter

alias of pyopenms.pyopenms_2.__DigestionFilter

countHits()
  • Cython signature: size_t countHits(libcpp_vector[PeptideIdentification] identifications) Returns the total number of peptide hits in a vector of peptide identifications

  • Cython signature: size_t countHits(libcpp_vector[ProteinIdentification] identifications) Returns the total number of protein hits in a vector of protein identifications

extractPeptideSequences()

Cython signature: void extractPeptideSequences(libcpp_vector[PeptideIdentification] & peptides, libcpp_set[String] & sequences, bool ignore_mods)

Parameters
  • peptides – Input peptide identifications

  • ignore_mods – Boolean flag (default: false) controlling whether modifications are ignored when sequences are extracted

Returns

Sequences

filterHitsByRank()
  • Cython signature: void filterHitsByRank(libcpp_vector[PeptideIdentification] & ids, size_t min_rank, size_t max_rank)

The hits between ‘min_rank’ and ‘max_rank’ (both inclusive) in each ID are kept. Counting starts at 1, i.e. the best (highest/lowest scoring) hit has rank 1. The ranks are (re-)computed before filtering. ‘max_rank’ is ignored if it is smaller than ‘min_rank’.

Note that there may be several hits with the same rank in a peptide or protein ID (if the scores are the same). This method is useful if a range of higher hits is needed for decoy fairness analysis.

  • Cython signature: void filterHitsByRank(libcpp_vector[ProteinIdentification] & ids, size_t min_rank, size_t max_rank)

The hits between ‘min_rank’ and ‘max_rank’ (both inclusive) in each ID are kept. Counting starts at 1, i.e. the best (highest/lowest scoring) hit has rank 1. The ranks are (re-)computed before filtering. ‘max_rank’ is ignored if it is smaller than ‘min_rank’.

Note that there may be several hits with the same rank in a peptide or protein ID (if the scores are the same). This method is useful if a range of higher hits is needed for decoy fairness analysis.
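The rank rules above (1-based counting, shared ranks for equal scores, inclusive bounds, and an ignored ‘max_rank’ when it is smaller than ‘min_rank’) can be illustrated with a small pure-Python model. It is a sketch, not the library code: hits are represented by their scores only, the function names are hypothetical, and the tie handling assumes "dense" ranking (the rank increases by one each time the score changes), which the text above does not spell out.

```python
def assign_ranks(scores, higher_is_better=True):
    """Return (score, rank) pairs, best hit first.

    Assumption: equal scores share a rank and the next distinct score
    gets the next integer (dense ranking).
    """
    ordered = sorted(scores, reverse=higher_is_better)
    ranked = []
    rank = 0
    previous = None
    for score in ordered:
        if score != previous:
            rank += 1
            previous = score
        ranked.append((score, rank))
    return ranked

def filter_hits_by_rank(scores, min_rank, max_rank, higher_is_better=True):
    """Keep scores whose rank lies in [min_rank, max_rank] (both inclusive)."""
    ranked = assign_ranks(scores, higher_is_better)  # ranks are recomputed first
    if max_rank < min_rank:
        # max_rank is ignored: only the lower bound applies
        return [score for score, rank in ranked if rank >= min_rank]
    return [score for score, rank in ranked if min_rank <= rank <= max_rank]

scores = [0.5, 0.9, 0.9, 0.3, 0.1]
ranked = assign_ranks(scores)
kept = filter_hits_by_rank(scores, 2, 3)
open_ended = filter_hits_by_rank(scores, 3, 0)
```

Here the two hits scoring 0.9 both receive rank 1, so a request for ranks 2 to 3 keeps the hits scoring 0.5 and 0.3.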

filterHitsByScore()
  • Cython signature: void filterHitsByScore(libcpp_vector[PeptideIdentification] & ids, double threshold_score) Filters peptide or protein identifications according to the score of the hits. The score orientation has to be set to higherscorebetter in each PeptideIdentification. Only peptide/protein hits with a score at least as good as ‘threshold_score’ are kept

  • Cython signature: void filterHitsByScore(libcpp_vector[ProteinIdentification] & ids, double threshold_score) Filters peptide or protein identifications according to the score of the hits. The score orientation has to be set to higherscorebetter in each PeptideIdentification/ProteinIdentification. Only peptide/protein hits with a score at least as good as ‘threshold_score’ are kept

  • Cython signature: void filterHitsByScore(MSExperiment & experiment, double peptide_threshold_score, double protein_threshold_score) Filters an MS/MS experiment according to score thresholds

filterPeptidesByCharge()

Cython signature: void filterPeptidesByCharge(libcpp_vector[PeptideIdentification] & peptides, size_t min_charge, size_t max_charge) Filters peptide identifications according to charge state

filterPeptidesByLength()

Cython signature: void filterPeptidesByLength(libcpp_vector[PeptideIdentification] & peptides, size_t min_length, size_t max_length) Filters peptide identifications according to peptide sequence length

filterPeptidesByMZ()

Cython signature: void filterPeptidesByMZ(libcpp_vector[PeptideIdentification] & peptides, size_t min_mz, size_t max_mz) Filters peptide identifications by precursor m/z, keeping only IDs in the given range

filterPeptidesByMZError()

Cython signature: void filterPeptidesByMZError(libcpp_vector[PeptideIdentification] & peptides, double mass_error, bool unit_ppm) Filter peptide identifications according to mass deviation
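The ‘unit_ppm’ flag switches the interpretation of ‘mass_error’ between parts per million and absolute m/z units. The arithmetic can be sketched as follows; this is an illustrative calculation under the assumption that the deviation is measured between an observed precursor m/z and the theoretical m/z of the hit, and the function names are hypothetical rather than part of pyopenms.

```python
def mz_error(observed_mz, theoretical_mz, unit_ppm):
    """Signed deviation of the observed m/z, in ppm or in absolute m/z units."""
    difference = observed_mz - theoretical_mz
    if unit_ppm:
        # parts per million relative to the theoretical value
        return difference / theoretical_mz * 1e6
    return difference

def within_tolerance(observed_mz, theoretical_mz, mass_error, unit_ppm):
    """True if the absolute deviation does not exceed the allowed mass error."""
    return abs(mz_error(observed_mz, theoretical_mz, unit_ppm)) <= mass_error

# 0.002 m/z units at m/z 500 correspond to roughly 4 ppm.
error_ppm = mz_error(500.002, 500.0, unit_ppm=True)
passes_10_ppm = within_tolerance(500.002, 500.0, 10.0, unit_ppm=True)
fails_10_ppm = within_tolerance(500.01, 500.0, 10.0, unit_ppm=True)
passes_absolute = within_tolerance(500.01, 500.0, 0.02, unit_ppm=False)
```

The same absolute deviation of 0.01 fails a 10 ppm tolerance at m/z 500 but passes an absolute tolerance of 0.02, which is why the unit has to be stated together with the threshold.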

filterPeptidesByRT()

Cython signature: void filterPeptidesByRT(libcpp_vector[PeptideIdentification] & peptides, size_t min_rt, size_t max_rt) Filters peptide identifications by precursor RT, keeping only IDs in the given range

filterPeptidesByRTPredictPValue()

Cython signature: void filterPeptidesByRTPredictPValue(libcpp_vector[PeptideIdentification] & peptides, const String & metavalue_key, double threshold)

Parameters
  • peptides – Input/output

  • metavalue_key – Name of the meta value that holds the p-value: “predicted_RT_p_value” or “predicted_RT_p_value_first_dim”

  • threshold – P-value threshold

getBestHit()
  • Cython signature: bool getBestHit(libcpp_vector[PeptideIdentification] identifications, bool assume_sorted, PeptideHit & best_hit)

Parameters
  • identifications – Vector of peptide or protein IDs, each containing one or more (peptide/protein) hits

  • assume_sorted – Are hits sorted by score (best score first) already? This allows for faster query, since only the first hit needs to be looked at

  • best_hit – Receives the best hit found in the vector of peptide identifications, if any

Returns

true if a hit was present, false otherwise

  • Cython signature: bool getBestHit(libcpp_vector[ProteinIdentification] identifications, bool assume_sorted, ProteinHit & best_hit)

Parameters
  • identifications – Vector of peptide or protein IDs, each containing one or more (peptide/protein) hits

  • assume_sorted – Are hits sorted by score (best score first) already? This allows for faster query, since only the first hit needs to be looked at

  • best_hit – Receives the best hit found in the vector of protein identifications, if any

Returns

true if a hit was present, false otherwise
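The effect of ‘assume_sorted’ can be illustrated with a pure-Python model in which each ID is a list of hit scores. This is a sketch of the described behaviour, not the pyopenms implementation: the function name and the `(found, best)` return value are hypothetical (the real method returns a bool and writes into ‘best_hit’), and higher scores are assumed to be better.

```python
def get_best_hit(ids, assume_sorted, higher_is_better=True):
    """Return (found, best_score) over all IDs.

    With assume_sorted=True only the first hit of each ID is inspected,
    which is faster but only correct if hits really are sorted best-first.
    """
    best = None
    for hits in ids:
        candidates = hits[:1] if assume_sorted else hits
        for score in candidates:
            better = best is None or (
                score > best if higher_is_better else score < best
            )
            if better:
                best = score
    return best is not None, best

unsorted_ids = [[0.4, 0.9], [0.7]]
full_scan = get_best_hit(unsorted_ids, assume_sorted=False)
# The first ID is not sorted best-first, so the shortcut misses the 0.9 hit.
shortcut_on_unsorted = get_best_hit(unsorted_ids, assume_sorted=True)
nothing_found = get_best_hit([[], []], assume_sorted=False)
```

The second call shows why the flag should only be set when the hits are known to be sorted: the shortcut reports 0.7 although a hit scoring 0.9 exists.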

keepBestPeptideHits()

Cython signature: void keepBestPeptideHits(libcpp_vector[PeptideIdentification] & peptides, bool strict)

Parameters
  • peptides – Input/output

  • strict – If set, keep the best hit only if its score is unique, i.e. ties are not allowed. (Otherwise all hits with the best score are kept.)
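The difference between the strict and non-strict modes can be sketched for a single ID, modelled as a list of hit scores. This is an illustrative model with a hypothetical function name; in particular, it assumes that a tie for the best score in strict mode leaves the ID without any hits, which the parameter description implies but does not state outright.

```python
def keep_best_hits(scores, strict, higher_is_better=True):
    """Return the hits kept for one ID, modelled as a list of scores."""
    if not scores:
        return []
    best = max(scores) if higher_is_better else min(scores)
    tied = [score for score in scores if score == best]
    if strict and len(tied) > 1:
        # Assumption: in strict mode a tie means no hit is kept.
        return []
    return tied

lenient_with_tie = keep_best_hits([0.9, 0.9, 0.5], strict=False)
strict_with_tie = keep_best_hits([0.9, 0.9, 0.5], strict=True)
strict_unique = keep_best_hits([0.9, 0.5], strict=True)
```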

keepBestPerPeptide()

Cython signature: void keepBestPerPeptide(libcpp_vector[PeptideIdentification] & peptides, bool ignore_mods, bool ignore_charges, size_t nr_best_spectrum) Filters PeptideHits from PeptideIdentification by keeping only the best peptide hits for every peptide sequence

keepBestPerPeptidePerRun()

Cython signature: void keepBestPerPeptidePerRun(libcpp_vector[ProteinIdentification] & prot_ids, libcpp_vector[PeptideIdentification] & peptides, bool ignore_mods, bool ignore_charges, size_t nr_best_spectrum) Filters PeptideHits from PeptideIdentification by keeping only the best peptide hits for every peptide sequence on a per run basis

keepHitsMatchingProteins()
  • Cython signature: void keepHitsMatchingProteins(libcpp_vector[PeptideIdentification] & ids, libcpp_set[String] accessions) Filters peptide or protein identifications according to the given proteins (positive)

  • Cython signature: void keepHitsMatchingProteins(libcpp_vector[ProteinIdentification] & ids, libcpp_set[String] accessions) Filters peptide or protein identifications according to the given proteins (positive)

  • Cython signature: void keepHitsMatchingProteins(MSExperiment & experiment, libcpp_vector[FASTAEntry] & proteins)

keepNBestHits()
  • Cython signature: void keepNBestHits(libcpp_vector[PeptideIdentification] & ids, size_t n)

  • Cython signature: void keepNBestHits(libcpp_vector[ProteinIdentification] & ids, size_t n)

  • Cython signature: void keepNBestHits(MSExperiment & experiment, size_t n) Filters an MS/MS experiment by keeping the N best peptide hits for every spectrum

keepNBestSpectra()

Cython signature: void keepNBestSpectra(libcpp_vector[PeptideIdentification] & peptides, size_t n) Keeps the “N best” PeptideIdentification objects (one PeptideIdentification is better than another if its best PeptideHit is better)

keepPeptidesWithMatchingModifications()

Cython signature: void keepPeptidesWithMatchingModifications(libcpp_vector[PeptideIdentification] & peptides, libcpp_set[String] & modifications) Keeps only peptide hits that have at least one of the given modifications

keepPeptidesWithMatchingSequences()

Cython signature: void keepPeptidesWithMatchingSequences(libcpp_vector[PeptideIdentification] & peptides, libcpp_vector[PeptideIdentification] & bad_peptides, bool ignore_mods) Removes all peptide hits with a sequence that does not match one in the reference peptides (the second argument, named ‘bad_peptides’ in this signature)

keepUniquePeptidesPerProtein()

Cython signature: void keepUniquePeptidesPerProtein(libcpp_vector[PeptideIdentification] & peptides) Removes all peptides that are not annotated as unique for a protein (by PeptideIndexer)

removeDecoyHits()
  • Cython signature: void removeDecoyHits(libcpp_vector[PeptideIdentification] & ids) Removes hits annotated as decoys from peptide or protein identifications. Checks for meta values named “target_decoy” and “isDecoy”, and removes protein/peptide hits if the values are “decoy” and “true”, respectively

  • Cython signature: void removeDecoyHits(libcpp_vector[ProteinIdentification] & ids) Removes hits annotated as decoys from peptide or protein identifications. Checks for meta values named “target_decoy” and “isDecoy”, and removes protein/peptide hits if the values are “decoy” and “true”, respectively
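The decoy test described above checks two meta values. A minimal pure-Python model of that predicate is shown below; hits are represented as plain dictionaries of meta values, and the function names are hypothetical stand-ins rather than pyopenms calls.

```python
def is_decoy(meta_values):
    """True if the hit is annotated as a decoy by either meta value."""
    return (
        meta_values.get("target_decoy") == "decoy"
        or meta_values.get("isDecoy") == "true"
    )

def remove_decoy_hits(ids):
    """Remove decoy hits in place; IDs that become empty are left in the list."""
    for hits in ids:
        hits[:] = [hit for hit in hits if not is_decoy(hit)]

ids = [
    [{"target_decoy": "target"}, {"target_decoy": "decoy"}],
    [{"isDecoy": "true"}],
    [{"isDecoy": "false"}, {}],
]
remove_decoy_hits(ids)
```

In this model a hit without either annotation is kept, and the second ID ends up empty, so a clean-up such as removeEmptyIdentifications would be a natural follow-up step.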

removeDuplicatePeptideHits()

Cython signature: void removeDuplicatePeptideHits(libcpp_vector[PeptideIdentification] & peptides) Removes duplicate peptide hits from each peptide identification, keeping only unique hits (per ID)

removeEmptyIdentifications()
  • Cython signature: void removeEmptyIdentifications(libcpp_vector[PeptideIdentification] & ids) Removes peptide or protein identifications that have no hits in them

  • Cython signature: void removeEmptyIdentifications(libcpp_vector[ProteinIdentification] & ids) Removes peptide or protein identifications that have no hits in them

removeHitsMatchingProteins()
  • Cython signature: void removeHitsMatchingProteins(libcpp_vector[PeptideIdentification] & ids, libcpp_set[String] accessions) Filters peptide or protein identifications according to the given proteins (negative)

  • Cython signature: void removeHitsMatchingProteins(libcpp_vector[ProteinIdentification] & ids, libcpp_set[String] accessions) Filters peptide or protein identifications according to the given proteins (negative)

removePeptidesWithMatchingModifications()

Cython signature: void removePeptidesWithMatchingModifications(libcpp_vector[PeptideIdentification] & peptides, libcpp_set[String] & modifications) Removes all peptide hits that have at least one of the given modifications

removePeptidesWithMatchingSequences()

Cython signature: void removePeptidesWithMatchingSequences(libcpp_vector[PeptideIdentification] & peptides, libcpp_vector[PeptideIdentification] & bad_peptides, bool ignore_mods) Removes all peptide hits with a sequence that matches one in ‘bad_peptides’

removeUnreferencedProteins()

Cython signature: void removeUnreferencedProteins(libcpp_vector[ProteinIdentification] & proteins, libcpp_vector[PeptideIdentification] & peptides) Removes protein hits from the protein IDs in a ‘cmap’ that are not referenced by a peptide in the features or if requested in the unassigned peptide list

updateHitRanks()
  • Cython signature: void updateHitRanks(libcpp_vector[PeptideIdentification] & identifications) Updates the hit ranks on all peptide or protein IDs

  • Cython signature: void updateHitRanks(libcpp_vector[ProteinIdentification] & identifications) Updates the hit ranks on all peptide or protein IDs

updateProteinGroups()

Cython signature: bool updateProteinGroups(libcpp_vector[ProteinGroup] & groups, libcpp_vector[ProteinHit] & hits)

Parameters
  • groups – Input/output protein groups

  • hits – Available protein hits (all others are removed from the groups)

Returns

Returns whether the groups are still valid (which is the case if only whole groups, if any, were removed)
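The validity rule in the return value can be modelled in plain Python. In this sketch each group is a list of protein accessions and the available hits are a set of accessions; the function name is hypothetical, and the model assumes that a partially reduced group is kept with its remaining members while a fully emptied group is dropped.

```python
def update_protein_groups(groups, available_accessions):
    """Remove unavailable accessions from the groups in place.

    Returns True if the groups are still valid, i.e. every affected group
    was removed as a whole rather than losing only some of its members.
    """
    valid = True
    updated = []
    for accessions in groups:
        remaining = [acc for acc in accessions if acc in available_accessions]
        if 0 < len(remaining) < len(accessions):
            valid = False  # the group lost only part of its members
        if remaining:
            updated.append(remaining)
    groups[:] = updated
    return valid

whole_group_removed = [["A", "B"], ["C"]]
still_valid = update_protein_groups(whole_group_removed, {"A", "B"})

partially_reduced = [["A", "B"], ["C"]]
no_longer_valid = update_protein_groups(partially_reduced, {"A", "C"})
```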

updateProteinReferences()

Cython signature: void updateProteinReferences(libcpp_vector[PeptideIdentification] & peptides, libcpp_vector[ProteinIdentification] & proteins, bool remove_peptides_without_reference) Removes references to missing proteins. Only PeptideEvidence entries that reference protein hits in ‘proteins’ are kept in the peptide hits