This bio-recipe shows two things: (1) how to search for a given name (or for that matter, a very short fragment) in the peptide sequences of the entire SwissProt database, (2) how to align a given name to all peptide sequences of SwissProt.
The name is first converted into uppercase letters, because peptide sequences are stored in uppercase. Then all the letters that do not correspond to a one-letter-code of an amino acid are replaced. The letter 'O' is replaced by 'Q', and all other letters are replaced by an 'X'.
After the replacement, the program first looks for an exact match of the name in the database. If this exact search is successful, the sequence is shown. Next, the name is aligned to every sequence in the SwissProt database using the local alignment method. The sequence and the score of the best alignment are stored.
Define the location of the SwissProt database and create a Dayhoff matrix suitable for these matches (PAM 30, that is, not too distant).
SwissProt := '~cbrg/DB/SwissProt.Z':
ReadDb(SwissProt);
DM := DayMatrix(30);
Reading 169638448 characters from file /home/cbrg/DB/SwissProt.Z
Pre-processing input (peptides)
163235 sequences within 163235 entries considered
Peptide file(/home/cbrg/DB/SwissProt.Z(169638448), 163235 entries, 59631787 aminoacids)
DM := DayMatrix(Peptide, pam=30, Sim: max=18.263, min=-19.062, del=-26.659-1.396*(k-1))
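The `del=-26.659-1.396*(k-1)` term in the output above can be evaluated for a few values of k. The following is a minimal Python sketch (not Darwin code); it assumes that k is the length of the gap, so that the first constant is the cost of opening a gap and the second the cost of each additional position. The function name `deletion_cost` is made up for this illustration.

```python
# Constants copied from the DayMatrix(30) output above.
GAP_OPEN = -26.659    # cost of a gap of length 1
GAP_EXTEND = -1.396   # additional cost per extra gap position


def deletion_cost(k: int) -> float:
    """Cost of a gap of length k, assuming k counts gap positions (k >= 1)."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return GAP_OPEN + GAP_EXTEND * (k - 1)


for k in (1, 2, 3):
    print(k, round(deletion_cost(k), 3))
```

Under this reading, opening a gap costs more than the best possible match gains (max=18.263), which is consistent with the gap-free alignments printed later in this recipe.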
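Assign the name. Each assignment overwrites the previous one, so only the last name ('Widmayer') is used in the rest of the recipe; reorder the assignments to try another name.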
Name := 'Peter';
Name := 'Gaston';
Name := 'Widmayer';
Name := Peter
Name := Gaston
Name := Widmayer
Convert the name to uppercase letters and replace all letters that are not one-letter codes of amino acids
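Name := uppercase(Name):
for i to length(Name) do
    # 'O' is not an amino acid code; map it to 'Q'
    if Name[i]='O' then Name[i] := 'Q'
    # any other letter without an amino acid code becomes 'X'
    elif AToInt(Name[i]) = 0 then Name[i] := 'X' fi
od;
Name;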
WIDMAYER
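The replacement rule can be restated outside Darwin. The following is a minimal Python sketch of the same rule, not a translation of the Darwin library: it assumes that the valid letters are exactly the 20 standard one-letter amino acid codes, and the name `to_peptide` is made up for this illustration.

```python
# The 20 standard one-letter amino acid codes (assumption of this sketch).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def to_peptide(name: str) -> str:
    """Uppercase the name, map 'O' to 'Q' and every other non-code letter to 'X'."""
    out = []
    for ch in name.upper():
        if ch == "O":
            out.append("Q")
        elif ch in AMINO_ACIDS:
            out.append(ch)
        else:
            out.append("X")
    return "".join(out)


for name in ("Peter", "Gaston", "Widmayer"):
    print(name, "->", to_peptide(name))
# Peter -> PETER
# Gaston -> GASTQN
# Widmayer -> WIDMAYER
```

Of the three names used in this recipe, only 'Gaston' is changed by the rule, because it contains an 'O'.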
Look for an exact match
Sequence(SearchSeqDb(Name));
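Conceptually, the exact search asks which sequences contain the name as a substring. The following Python sketch shows that idea with a plain linear scan; it says nothing about how Darwin implements `SearchSeqDb` internally. The function `find_exact`, the entry names, and the sequences in `toy_db` are all invented for this illustration and are not SwissProt data.

```python
def find_exact(fragment, database):
    """Return (entry_id, 1-based offset) of the first occurrence in each matching entry."""
    hits = []
    for entry_id, seq in database.items():
        pos = seq.find(fragment)  # -1 when the fragment does not occur
        if pos >= 0:
            hits.append((entry_id, pos + 1))
    return hits


# Hypothetical toy database, invented for this example.
toy_db = {
    "TOY1": "MKTWIDMAYERLL",
    "TOY2": "GASTQNPETER",
    "TOY3": "AAAA",
}

print(find_exact("WIDMAYER", toy_db))  # [('TOY1', 4)]
print(find_exact("PETER", toy_db))     # [('TOY2', 7)]
```

The longer the name, the less likely an exact hit becomes, which is why the recipe follows the exact search with a local alignment.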
Local alignment of name with all sequences in SwissProt. Print each alignment that is a new maximum.
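# start with the alignment against the first entry as the current best
BestAl := Align( Name, Sequence(Entry(1)) );
for s in Sequences() do
    a := Align( Name, s );
    # keep and print only alignments that improve on the best score so far
    if a[Score] > BestAl[Score] then
        BestAl := a;
        print(a)
    fi
od: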
BestAl := Alignment('MAY',Sequence(AC('P15711'))[102..104],38.1052,DM,0,0,{Local})

lengths=4,4 simil=46.4, PAM_dist=30, identity=75.0%
ID=1A11_ORYSA AC=Q07215;
DE=1-aminocyclopropane-1-carboxylate synthase 1 (EC 4.4.1.14) (ACC synthase 1) (S-adenosyl-L-methionine methylthioadenosine-lyase 1).
OS=Oryza sativa (Rice).
OC=Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; Ehrhartoideae; Oryzeae; Oryza.
KW=Ethylene biosynthesis; Fruit ripening; Lyase; Multigene family; Pyridoxal phosphate.
WIDM
|!||
WVDM

lengths=5,5 simil=46.7, PAM_dist=30, identity=80.0%
ID=3BHD_HORSE AC=O46516; P79437;
DE=3 beta-hydroxysteroid dehydrogenase/ delta 5-->4-is ..(295).. somerase)].
OS=Equus caballus (Horse).
OC=Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Perissodactyla; Equidae; Equus.
KW=Endoplasmic reticulum; Isomerase; Mitochondrion; Multifunctional enzyme; NAD; Oxidoreductase; Steroidogenesis; Transmembrane.
DMAYE
||.||
DMGYE

lengths=7,7 simil=50.7, PAM_dist=30, identity=57.1%
ID=3SHD_NEUCR AC=P07046; Q7RVA2;
DE=3-dehydroshikimate dehydratase (EC 4.2.1.-) (DHS dehydratase) (DHSase).
OS=Neurospora crassa.
OC=Eukaryota; Fungi; Ascomycota; Pezizomycotina; Sordariomycetes; Sordariomycetidae; Sordariales; Sordariaceae; Neurospora.
KW=Lyase; Quinate metabolism.
WIDMAYE
||::|.|
WIELAHE

lengths=6,6 simil=51.8, PAM_dist=30, identity=66.7%
ID=AATM_YEAST AC=Q01802;
DE=Aspartate aminotransferase, mitochondrial precursor (EC 2.6.1.1) (Transaminase A).
OS=Saccharomyces cerevisiae (Baker's yeast).
OC=Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes; Saccharomycetales; Saccharomycetaceae; Saccharomyces.
KW=Aminotransferase; Mitochondrion; Pyridoxal phosphate; Transferase; Transit peptide.
IDMAYE
!||||:
VDMAYQ

lengths=4,4 simil=55.2, PAM_dist=30, identity=100.0%
ID=AGAL_RHIME AC=Q9X4Y0;
DE=Alpha-galactosidase (EC 3.2.1.22) (Melibiase).
OS=Rhizobium meliloti (Sinorhizobium meliloti).
OC=Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Rhizobiaceae; Sinorhizobium/Ensifer group; Sinorhizobium.
KW=Complete proteome; Glycosidase; Hydrolase; Magnesium; NAD; Plasmid.
WIDM
||||
WIDM

lengths=7,7 simil=56.2, PAM_dist=30, identity=57.1%
ID=AROK_HELHP AC=Q7VIH7;
DE=Shikimate kinase (EC 2.7.1.71) (SK).
OS=Helicobacter hepaticus.
OC=Bacteria; Proteobacteria; Epsilonproteobacteria; Campylobacterales; Helicobacteraceae; Helicobacter.
KW=Aromatic amino acid biosynthesis; ATP-binding; Complete proteome; Kinase; Transferase.
WIDMAYE
|.||.!|
WLDMSFE

lengths=6,6 simil=58.2, PAM_dist=30, identity=66.7%
ID=AROK_METTH AC=O26896;
DE=Shikimate kinase (EC 2.7.1.71) (SK).
OS=Methanobacterium thermoautotrophicum.
OC=Archaea; Euryarchaeota; Methanobacteria; Methanobacteriales; Methanobacteriaceae; Methanothermobacter.
KW=Aromatic amino acid biosynthesis; ATP-binding; Complete proteome; Kinase; Transferase.
WIDMAY
|!|||!
WVDMAF

lengths=5,5 simil=60.5, PAM_dist=30, identity=100.0%
ID=CAHC_SPIOL AC=P16016;
DE=Carbonic anhydrase, chloroplast precursor (EC 4.2.1.1) (Carbonate dehydratase).
OS=Spinacia oleracea (Spinach).
OC=Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; Caryophyllales; Amaranthaceae; Spinacia.
KW=Chloroplast; Direct protein sequencing; Lyase; Transit peptide; Zinc.
DMAYE
|||||
DMAYE

lengths=5,5 simil=64.8, PAM_dist=30, identity=100.0%
ID=DP3A_SYNY3 AC=P74750; P73215;
DE=DNA polymerase III alpha subunit (EC 2.7.7.7) [Contains: Ssp dnaE intein].
OS=Synechocystis sp. (strain PCC 6803).
OC=Bacteria; Cyanobacteria; Chroococcales; Synechocystis.
KW=Autocatalytic cleavage; Complete proteome; DNA replication; DNA-directed DNA polymerase; Protein splicing; Transferase.
WIDMA
|||||
WIDMA

bytes alloc=34000000, time=24.130

lengths=8,8 simil=67.4, PAM_dist=30, identity=75.0%
ID=HUTU_AGRT5 AC=Q8U8Z9;
DE=Urocanate hydratase (EC 4.2.1.49) (Urocanase) (Imidazolonepropionate hydrolase).
OS=Agrobacterium tumefaciens (strain C58 / ATCC 33970).
OC=Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales; Rhizobiaceae; Rhizobium/Agrobacterium group; Agrobacterium.
KW=Complete proteome; Histidine metabolism; Lyase; NAD.
WIDMAYER
|.|||.||
WLDMARER

lengths=8,8 simil=67.5, PAM_dist=30, identity=75.0%
ID=HUTU_BACSU AC=P25503;
DE=Urocanate hydratase (EC 4.2.1.49) (Urocanase) (Imidazolonepropionate hydrolase).
OS=Bacillus subtilis.
OC=Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus.
KW=Complete proteome; Histidine metabolism; Lyase; NAD.
WIDMAYER
|||||.|:
WIDMAQEK

lengths=6,6 simil=71.4, PAM_dist=30, identity=100.0%
ID=LEU1_CAMJE AC=Q9PLV9;
DE=2-isopropylmalate synthase (EC 2.3.3.13) (Alpha-isopropylmalate synthase) (Alpha-IPM synthetase).
OS=Campylobacter jejuni.
OC=Bacteria; Proteobacteria; Epsilonproteobacteria; Campylobacterales; Campylobacteraceae; Campylobacter.
KW=Complete proteome; Leucine biosynthesis; Transferase.
IDMAYE
||||||
IDMAYE

bytes alloc=34400000, time=46.080
bytes alloc=34400000, time=65.180
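The scan that produced the output above (align the name to every sequence, report each new maximum) can be sketched in Python. This is only an illustration under stated assumptions: it uses a textbook Smith-Waterman recurrence with a made-up scoring scheme (match +2, mismatch -1, linear gap -2) rather than the Dayhoff matrix and the gap costs that Darwin's `Align` uses, so its scores are not comparable to the `simil` values above. The names `local_score` and `scan`, and the sequences in `toy_db`, are invented for the example.

```python
def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # 0 restarts the alignment, which is what makes it local
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def scan(name, database):
    """Return (entry_id, score) for each entry that sets a new maximum."""
    ids = list(database)
    best = local_score(name, database[ids[0]])  # first entry is the starting point
    improvements = []
    for entry_id in ids:
        score = local_score(name, database[entry_id])
        if score > best:
            best = score
            improvements.append((entry_id, score))
    return improvements


# Hypothetical toy sequences, invented for this example.
toy_db = {
    "TOY1": "GGWIDGG",
    "TOY2": "GGWIDMAYGG",
    "TOY3": "GGMAYGG",
    "TOY4": "GGWIDMAYERGG",
}

print(scan("WIDMAYER", toy_db))  # [('TOY2', 12), ('TOY4', 16)]
```

As in the Darwin loop, the first entry only initializes the running best and is never reported itself, and entries that merely tie the current maximum (TOY3 here) are skipped.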
© 2005 by Gaston Gonnet, Informatik, ETH Zurich
Last updated on Fri Sep 2 16:52:15 2005 by GhG