Search for Your Name in the SwissProt Database

     This bio-recipe shows two things: (1) how to search for a given name (or, for that matter, any very short fragment) in the peptide sequences of the entire SwissProt database, and (2) how to align a given name with all peptide sequences in SwissProt.

     The name is first converted to uppercase, because peptide sequences are stored in uppercase. Then every letter that is not a one-letter amino acid code is replaced: 'O' becomes 'Q', and any other such letter becomes 'X'.

     After the replacement, the program first looks for an exact match of the name in the database. If this exact search is successful, the sequence is shown. Next, the name is aligned to every sequence in the SwissProt database using the local alignment method. The sequence and the score of the best alignment are stored.

Define the location of the SwissProt database and create a Dayhoff matrix suitable for these matches (not too distant).

SwissProt := '~cbrg/DB/SwissProt.Z':
ReadDb(SwissProt);    # load the database from the location defined above
DM := DayMatrix(30);
Reading 169638448 characters from file /home/cbrg/DB/SwissProt.Z
Pre-processing input (peptides)
163235 sequences within 163235 entries considered
Peptide file(/home/cbrg/DB/SwissProt.Z(169638448), 163235 entries, 59631787
DM := DayMatrix(Peptide, pam=30, Sim: max=18.263, min=-19.062,

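With the SwissProt database loaded, assign the names. Each assignment overwrites the previous one, so the run shown here uses the last name, 'Widmayer'.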

Name := 'Peter';
Name := 'Gaston';
Name := 'Widmayer';
Name := Peter
Name := Gaston
Name := Widmayer

Convert the name to uppercase and replace all letters that are not one-letter amino acid codes.

Name := uppercase(Name):
for i to length(Name) do
    if Name[i]='O' then Name[i] := 'Q'
    elif AToInt(Name[i]) = 0 then Name[i] := 'X'
    fi od;
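
For readers without a Darwin installation, the same replacement rule can be sketched in Python. This is an illustration only: the function name name_to_peptide is hypothetical, and the set of 20 standard amino acid letters is an assumption standing in for whatever AToInt accepts in Darwin.

```python
# One-letter codes of the 20 standard amino acids (assumed alphabet).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def name_to_peptide(name: str) -> str:
    """Uppercase the name, map 'O' to 'Q', and map any other non-code letter to 'X'."""
    out = []
    for ch in name.upper():
        if ch == "O":
            out.append("Q")
        elif ch in AMINO_ACIDS:
            out.append(ch)
        else:
            out.append("X")
    return "".join(out)


for name in ("Peter", "Gaston", "Widmayer"):
    print(name, "->", name_to_peptide(name))
# Peter -> PETER
# Gaston -> GASTQN
# Widmayer -> WIDMAYER
```

Of the three names used in this recipe, only 'Gaston' contains a letter that has to be replaced.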

Look for an exact match
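
The idea of the exact search can be illustrated with a plain substring scan in Python. This is a sketch under stated assumptions, not the Darwin code of the recipe: the helper find_exact is hypothetical, and toy_db holds made-up sequences rather than SwissProt entries.

```python
def find_exact(name, sequences):
    """Return (sequence id, 1-based position) for every occurrence of name."""
    hits = []
    for seq_id, seq in sequences.items():
        start = seq.find(name)
        while start != -1:
            hits.append((seq_id, start + 1))
            # Continue one position further so overlapping hits are found too.
            start = seq.find(name, start + 1)
    return hits


# Made-up toy sequences, not real SwissProt entries.
toy_db = {
    "TOY1": "MKPETERLLA",
    "TOY2": "GASTQNGASTQN",
    "TOY3": "MAYAAA",
}

print(find_exact("PETER", toy_db))     # [('TOY1', 3)]
print(find_exact("GASTQN", toy_db))    # [('TOY2', 1), ('TOY2', 7)]
print(find_exact("WIDMAYER", toy_db))  # []
```

An empty result corresponds to the case in which the exact search is unsuccessful and only the alignment step can report a partial match.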


Local alignment of name with all sequences in SwissProt. Print each alignment that is a new maximum.

BestAl := Align( Name, Sequence(Entry(1)) );
for s in Sequences() do
    a := Align( Name, s );
    if a[Score] > BestAl[Score] then BestAl := a; print(a) fi
od:
BestAl := Alignment('MAY',Sequence(AC('P15711'))[102..104],38.1052,DM,0,0,{Local})
lengths=4,4 simil=46.4, PAM_dist=30, identity=75.0%
ID=1A11_ORYSA   AC=Q07215;   DE=1-aminocyclopropane-1-carboxylate synthase 1 (EC (ACC synthase 1) (S-adenosyl-L-methionine methylthioadenosine-lyase
1).   OS=Oryza sativa (Rice).   OC=Eukaryota; Viridiplantae; Streptophyta;
Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; Liliopsida; Poales;
Poaceae; Ehrhartoideae; Oryzeae; Oryza.   KW=Ethylene biosynthesis; Fruit
ripening; Lyase; Multigene family; Pyridoxal phosphate.
lengths=5,5 simil=46.7, PAM_dist=30, identity=80.0%
ID=3BHD_HORSE   AC=O46516; P79437;   DE=3 beta-hydroxysteroid dehydrogenase/
delta 5-->4-is ..(295).. somerase)].   OS=Equus caballus (Horse).  
OC=Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
Eutheria; Perissodactyla; Equidae; Equus.   KW=Endoplasmic reticulum; Isomerase;
Mitochondrion; Multifunctional enzyme; NAD; Oxidoreductase; Steroidogenesis;
lengths=7,7 simil=50.7, PAM_dist=30, identity=57.1%
ID=3SHD_NEUCR   AC=P07046; Q7RVA2;   DE=3-dehydroshikimate dehydratase (EC
4.2.1.-) (DHS dehydratase) (DHSase).   OS=Neurospora crassa.   OC=Eukaryota;
Fungi; Ascomycota; Pezizomycotina; Sordariomycetes; Sordariomycetidae;
Sordariales; Sordariaceae; Neurospora.   KW=Lyase; Quinate metabolism.
lengths=6,6 simil=51.8, PAM_dist=30, identity=66.7%
ID=AATM_YEAST   AC=Q01802;   DE=Aspartate aminotransferase, mitochondrial
precursor (EC (Transaminase A).   OS=Saccharomyces cerevisiae (Baker's
yeast).   OC=Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;
Saccharomycetales; Saccharomycetaceae; Saccharomyces.   KW=Aminotransferase;
Mitochondrion; Pyridoxal phosphate; Transferase; Transit peptide.
lengths=4,4 simil=55.2, PAM_dist=30, identity=100.0%
ID=AGAL_RHIME   AC=Q9X4Y0;   DE=Alpha-galactosidase (EC (Melibiase).  
OS=Rhizobium meliloti (Sinorhizobium meliloti).   OC=Bacteria; Proteobacteria;
Alphaproteobacteria; Rhizobiales; Rhizobiaceae; Sinorhizobium/Ensifer group;
Sinorhizobium.   KW=Complete proteome; Glycosidase; Hydrolase; Magnesium; NAD;
lengths=7,7 simil=56.2, PAM_dist=30, identity=57.1%
ID=AROK_HELHP   AC=Q7VIH7;   DE=Shikimate kinase (EC (SK).  
OS=Helicobacter hepaticus.   OC=Bacteria; Proteobacteria; Epsilonproteobacteria;
Campylobacterales; Helicobacteraceae; Helicobacter.   KW=Aromatic amino acid
biosynthesis; ATP-binding; Complete proteome; Kinase; Transferase.
lengths=6,6 simil=58.2, PAM_dist=30, identity=66.7%
ID=AROK_METTH   AC=O26896;   DE=Shikimate kinase (EC (SK).  
OS=Methanobacterium thermoautotrophicum.   OC=Archaea; Euryarchaeota;
Methanobacteria; Methanobacteriales; Methanobacteriaceae; Methanothermobacter.  
KW=Aromatic amino acid biosynthesis; ATP-binding; Complete proteome; Kinase;
lengths=5,5 simil=60.5, PAM_dist=30, identity=100.0%
ID=CAHC_SPIOL   AC=P16016;   DE=Carbonic anhydrase, chloroplast precursor (EC (Carbonate dehydratase).   OS=Spinacia oleracea (Spinach).  
OC=Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; Caryophyllales;
Amaranthaceae; Spinacia.   KW=Chloroplast; Direct protein sequencing; Lyase;
Transit peptide; Zinc.
lengths=5,5 simil=64.8, PAM_dist=30, identity=100.0%
ID=DP3A_SYNY3   AC=P74750; P73215;   DE=DNA polymerase III alpha subunit (EC [Contains: Ssp dnaE intein].   OS=Synechocystis sp. (strain PCC 6803). 
OC=Bacteria; Cyanobacteria; Chroococcales; Synechocystis.   KW=Autocatalytic
cleavage; Complete proteome; DNA replication; DNA-directed DNA polymerase;
Protein splicing; Transferase.
bytes alloc=34000000, time=24.130
lengths=8,8 simil=67.4, PAM_dist=30, identity=75.0%
ID=HUTU_AGRT5   AC=Q8U8Z9;   DE=Urocanate hydratase (EC (Urocanase)
(Imidazolonepropionate hydrolase).   OS=Agrobacterium tumefaciens (strain C58 /
ATCC 33970).   OC=Bacteria; Proteobacteria; Alphaproteobacteria; Rhizobiales;
Rhizobiaceae; Rhizobium/Agrobacterium group; Agrobacterium.   KW=Complete
proteome; Histidine metabolism; Lyase; NAD.
lengths=8,8 simil=67.5, PAM_dist=30, identity=75.0%
ID=HUTU_BACSU   AC=P25503;   DE=Urocanate hydratase (EC (Urocanase)
(Imidazolonepropionate hydrolase).   OS=Bacillus subtilis.   OC=Bacteria;
Firmicutes; Bacillales; Bacillaceae; Bacillus.   KW=Complete proteome; Histidine
metabolism; Lyase; NAD.
lengths=6,6 simil=71.4, PAM_dist=30, identity=100.0%
ID=LEU1_CAMJE   AC=Q9PLV9;   DE=2-isopropylmalate synthase (EC (Alpha-
isopropylmalate synthase) (Alpha-IPM synthetase).   OS=Campylobacter jejuni.  
OC=Bacteria; Proteobacteria; Epsilonproteobacteria; Campylobacterales;
Campylobacteraceae; Campylobacter.   KW=Complete proteome; Leucine biosynthesis;
bytes alloc=34400000, time=46.080
bytes alloc=34400000, time=65.180
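
The scan above, which keeps the best local alignment and reports each new maximum, can be sketched in Python for experimentation. The sketch makes several assumptions: it uses a simple match/mismatch score with a linear gap cost in place of the Dayhoff matrix DM, it computes only the score and not the aligned segments, and the names local_align_score, scan and toy_db (made-up sequences) are hypothetical.

```python
def local_align_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score of a and b (linear gap cost)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def scan(name, sequences):
    """Score name against each sequence and record every new maximum in order."""
    best_score, best_id, maxima = -1, None, []
    for seq_id, seq in sequences.items():
        score = local_align_score(name, seq)
        if score > best_score:
            best_score, best_id = score, seq_id
            maxima.append((seq_id, score))
    return best_id, best_score, maxima


# Made-up toy sequences, not real SwissProt entries.
toy_db = {
    "TOY1": "AAMAYAA",
    "TOY2": "KKDMAYEKK",
    "TOY3": "GGMAYGG",
    "TOY4": "WIDMAYER",
}

print(scan("WIDMAYER", toy_db))
# ('TOY4', 16, [('TOY1', 6), ('TOY2', 10), ('TOY4', 16)])
```

As in the Darwin output, a sequence is reported only when it beats the best score seen so far, so TOY3, which ties with TOY1, is skipped.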

© 2005 by Gaston Gonnet, Informatik, ETH Zurich

Last updated on Fri Sep 2 16:52:15 2005 by GhG

!!! This document is stored in the ETH Web archive and is no longer maintained !!!