ProteaseDigestion#
- class pyopenms.ProteaseDigestion#
Bases: object
Cython implementation of _ProteaseDigestion
- Original C++ documentation is available here
- Inherits from EnzymaticDigestion
Class for the enzymatic digestion of proteins
Digestion can be performed using simple regular expressions, e.g. [KR] | [^P] for trypsin. Missed cleavages can also be modeled, i.e. adjacent peptides are left uncleaved because of enzyme malfunction or access restrictions. If n missed cleavages are allowed, all possible resulting peptides (cleaved and uncleaved) with up to n missed cleavages are returned; no random selection of just n specific missed cleavage sites is performed.
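The enumeration described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the lookaround pattern, the helper names `cleavage_positions` and `digest_sketch`, and the output order are assumptions made for illustration only.

```python
import re

# Assumed trypsin-like rule: cut after K or R unless the next residue is P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def cleavage_positions(sequence, site_pattern=TRYPSIN_SITE):
    """Return the internal indices at which the sequence would be cut."""
    return [m.start() for m in site_pattern.finditer(sequence)
            if 0 < m.start() < len(sequence)]


def digest_sketch(sequence, missed_cleavages=0, site_pattern=TRYPSIN_SITE):
    """Return every peptide with at most `missed_cleavages` uncut sites."""
    bounds = [0] + cleavage_positions(sequence, site_pattern) + [len(sequence)]
    peptides = []
    for mc in range(missed_cleavages + 1):
        # A peptide with mc missed cleavages spans mc + 1 adjacent fragments.
        for i in range(len(bounds) - 1 - mc):
            peptides.append(sequence[bounds[i]:bounds[i + mc + 1]])
    return peptides


print(digest_sketch("AKRPGKCDR", missed_cleavages=1))
# ['AK', 'RPGK', 'CDR', 'AKRPGK', 'RPGKCDR']
```

Note that the R|P position is not cut, and that allowing one missed cleavage adds the two joined peptides to the three fully cleaved ones rather than replacing them.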
Usage:
    from pyopenms import *
    from urllib.request import urlretrieve

    urlretrieve("http://www.uniprot.org/uniprot/P02769.fasta", "bsa.fasta")

    dig = ProteaseDigestion()
    dig.setEnzyme('Lys-C')
    bsa_string = "".join([l.strip() for l in open("bsa.fasta").readlines()[1:]])
    bsa_oms_string = String(bsa_string)  # convert python string to OpenMS::String for further processing

    minlen = 6
    maxlen = 30

    # Using AASequence and digest
    result_digest = []
    result_digest_min_max = []
    bsa_aaseq = AASequence.fromString(bsa_oms_string)
    dig.digest(bsa_aaseq, result_digest)
    dig.digest(bsa_aaseq, result_digest_min_max, minlen, maxlen)
    print(result_digest[4].toString())  # GLVLIAFSQYLQQCPFDEHVK
    print(len(result_digest))  # 57 peptides
    print(result_digest_min_max[4].toString())  # LVNELTEFAK
    print(len(result_digest_min_max))  # 42 peptides

    # Using digestUnmodified without the need for AASequence from the EnzymaticDigestion base class
    result_digest_unmodified = []
    dig.digestUnmodified(StringView(bsa_oms_string), result_digest_unmodified, minlen, maxlen)
    print(result_digest_unmodified[4].getString())  # LVNELTEFAK
    print(len(result_digest_unmodified))  # 42 peptides
- __init__()#
Overload:
- __init__(self) None
Overload:
- __init__(self, in_0: ProteaseDigestion) None
Methods
- __init__() – Overloaded; see the individual overloads.
- countInternalCleavageSites(self, sequence) – Returns the number of internal cleavage sites for this sequence.
- digest() – Overloaded; see the individual overloads.
- digestUnmodified(self, sequence, output, ...) – Performs the enzymatic digestion of an unmodified sequence
- getEnzymeName(self) – Returns the enzyme for the digestion
- getMissedCleavages(self) – Returns the max. number of allowed missed cleavages for the digestion
- getSpecificity(self) – Returns the specificity for the digestion
- getSpecificityByName(self, name) – Returns the specificity by name.
- isValidProduct() – Overloaded; see the individual overloads.
- peptideCount(self, protein) – Returns the number of peptides a digestion of protein would yield under the current enzyme and missed cleavage settings
- setEnzyme() – Overloaded; see the individual overloads.
- setMissedCleavages(self, missed_cleavages) – Sets the max. number of allowed missed cleavages for the digestion (default is 0).
- setSpecificity(self, spec) – Sets the specificity for the digestion (default is SPEC_FULL)
- countInternalCleavageSites(self, sequence: bytes | str | String) int#
Returns the number of internal cleavage sites for this sequence.
- digest()#
Overload:
- digest(self, protein: AASequence, output: List[AASequence]) int
Overload:
- digest(self, protein: AASequence, output: List[AASequence], min_length: int, max_length: int) int
Performs the enzymatic digestion of a protein.
- Parameters:
protein – Sequence to digest
output – Digestion products (peptides)
min_length – Minimal length of reported products
max_length – Maximal length of reported products (0 = no restriction)
- Returns:
Number of discarded digestion products (those not matching the length restrictions)
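The relationship between the length limits and the returned count can be sketched as follows. This is an illustration of the documented semantics only; the helper name `filter_by_length` is hypothetical, and treating both bounds as inclusive is an assumption.

```python
def filter_by_length(peptides, min_length=1, max_length=0):
    """Return (kept peptides, number discarded).

    max_length == 0 means "no upper limit", as in the parameter list above.
    Assumption: both bounds are inclusive.
    """
    kept = []
    discarded = 0
    for peptide in peptides:
        too_short = len(peptide) < min_length
        too_long = max_length != 0 and len(peptide) > max_length
        if too_short or too_long:
            discarded += 1
        else:
            kept.append(peptide)
    return kept, discarded


products = ["AK", "RPGK", "CDR", "AKRPGK", "RPGKCDR"]
kept, discarded = filter_by_length(products, min_length=3, max_length=6)
print(kept, discarded)  # ['RPGK', 'CDR', 'AKRPGK'] 2
```

The return value counts what was dropped, not what was kept, so a result of 0 means every product satisfied the length restrictions.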
- digestUnmodified(self, sequence: StringView, output: List[StringView], min_length: int, max_length: int) int#
Performs the enzymatic digestion of an unmodified sequence.
This is very fast because only references into the original string are returned.
- Parameters:
sequence – Sequence to digest
output – Digestion products
min_length – Minimal length of reported products
max_length – Maximal length of reported products (0 = no restriction)
- Returns:
Number of discarded digestion products (those not matching the length restrictions)
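The idea of returning references rather than copies can be illustrated in plain Python by yielding index pairs into the original string. This is only a conceptual sketch: the function name `digest_spans` and the trypsin-like pattern are assumptions, and it says nothing about how StringView is implemented.

```python
import re

# Assumed trypsin-like rule: cut after K or R unless the next residue is P.
SITE = re.compile(r"(?<=[KR])(?!P)")


def digest_spans(sequence):
    """Yield (start, end) index pairs instead of copying substrings."""
    start = 0
    for m in SITE.finditer(sequence):
        if 0 < m.start() < len(sequence):
            yield start, m.start()
            start = m.start()
    yield start, len(sequence)


protein = "AKRPGKCDR"
spans = list(digest_spans(protein))
print(spans)  # [(0, 2), (2, 6), (6, 9)]
# Substrings are materialised only when they are actually needed.
print([protein[s:e] for s, e in spans])  # ['AK', 'RPGK', 'CDR']
```

Because the products refer to the original sequence, that sequence has to stay alive and unchanged for as long as the products are used.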
- getMissedCleavages(self) int#
Returns the max. number of allowed missed cleavages for the digestion
- getSpecificity(self) int#
Returns the specificity for the digestion
- getSpecificityByName(self, name: bytes | str | String) int#
Returns the specificity by name. Returns SPEC_UNKNOWN if name is not valid
- isValidProduct()#
Overload:
- isValidProduct(self, protein: AASequence, pep_pos: int, pep_length: int, ignore_missed_cleavages: bool, methionine_cleavage: bool) bool
Variant of EnzymaticDigestion::isValidProduct() with support for n-term protein cleavage and random D|P cleavage
Checks if peptide is a valid digestion product of the enzyme, taking into account specificity and the flags provided here
- Parameters:
protein – Protein sequence
pep_pos – Starting index of potential peptide
pep_length – Length of potential peptide
ignore_missed_cleavages – Do not compare the number of missed cleavages of the potential peptide to the maximum allowed number
allow_nterm_protein_cleavage – Regard the peptide as N-terminal of the protein if it starts at pos=1 or 2 and the protein starts with ‘M’
allow_random_asp_pro_cleavage – Allow cleavage at D|P sites to count as N-/C-terminal
- Returns:
True if the peptide has correct N-/C-termini (according to enzyme, specificity and the above flags)
Overload:
- isValidProduct(self, protein: bytes | str | String, pep_pos: int, pep_length: int, ignore_missed_cleavages: bool, methionine_cleavage: bool) bool
Forwards to isValidProduct using protein.toUnmodifiedString()
Overload:
- isValidProduct(self, sequence: bytes | str | String, pos: int, length: int, ignore_missed_cleavages: bool) bool
Returns true if the peptide fragment starting at position pos with length length within sequence could have been generated by the current enzyme
Checks if the peptide is a valid digestion product of the enzyme, taking into account specificity and the missed cleavage flag provided here
- Parameters:
sequence – Protein sequence
pos – Starting index of potential peptide
length – Length of potential peptide
ignore_missed_cleavages – Do not compare the number of missed cleavages of the potential peptide to the maximum allowed number
- Returns:
True if the peptide has correct N-/C-termini (according to enzyme, specificity and missed cleavages)
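The check described for this overload can be sketched in plain Python. The sketch assumes a trypsin-like rule and full specificity (both termini must be enzymatic), and the function name `is_valid_product_sketch` and its `max_missed_cleavages` argument are hypothetical; in the class itself the limit comes from setMissedCleavages.

```python
import re

# Assumed trypsin-like rule: cut after K or R unless the next residue is P.
SITE = re.compile(r"(?<=[KR])(?!P)")


def is_valid_product_sketch(sequence, pos, length,
                            ignore_missed_cleavages=True,
                            max_missed_cleavages=0):
    """Check that sequence[pos:pos + length] has enzymatic termini."""
    end = pos + length
    if length <= 0 or pos < 0 or end > len(sequence):
        return False
    cuts = {m.start() for m in SITE.finditer(sequence)}
    # Each terminus must be either a protein end or a cleavage site.
    if pos != 0 and pos not in cuts:
        return False
    if end != len(sequence) and end not in cuts:
        return False
    if ignore_missed_cleavages:
        return True
    # Otherwise the uncut sites inside the peptide must not exceed the limit.
    internal = sum(1 for c in cuts if pos < c < end)
    return internal <= max_missed_cleavages


seq = "AKRPGKCDR"
print(is_valid_product_sketch(seq, 2, 4))         # True: "RPGK"
print(is_valid_product_sketch(seq, 3, 3))         # False: R|P is not a site
print(is_valid_product_sketch(seq, 0, 6, False))  # False: one missed cleavage
```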
- peptideCount(self, protein: AASequence) int#
Returns the number of peptides a digestion of protein would yield under the current enzyme and missed cleavage settings
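How such a count follows from the number of cleavage sites and the missed cleavage limit can be worked out with a short sketch. It assumes no length filtering and is not taken from the library source; the function name `peptide_count_sketch` is hypothetical.

```python
def peptide_count_sketch(n_sites, missed_cleavages=0):
    """Count peptides for n_sites internal cleavage sites.

    Full cleavage gives n_sites + 1 fragments. Each additional missed
    cleavage k joins k + 1 adjacent fragments, which is possible in
    (n_sites + 1 - k) positions along the protein.
    """
    fragments = n_sites + 1
    max_k = min(missed_cleavages, n_sites)
    return sum(fragments - k for k in range(max_k + 1))


print(peptide_count_sketch(2, 0))  # 3
print(peptide_count_sketch(2, 1))  # 5
print(peptide_count_sketch(2, 5))  # 6 (limit capped at the number of sites)
```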
- setEnzyme()#
Overload:
- setEnzyme(self, name: bytes | str | String) None
Sets the enzyme for the digestion (by name)
Overload:
- setEnzyme(self, enzyme: DigestionEnzyme) None
Sets the enzyme for the digestion
- setMissedCleavages(self, missed_cleavages: int) None#
Sets the max. number of allowed missed cleavages for the digestion (default is 0). This setting is ignored when the log model is used
- setSpecificity(self, spec: int) None#
Sets the specificity for the digestion (default is SPEC_FULL)