Darwin is an interactive system which executes expressions or programs which are given as input. Expressions are evaluated directly, and the result(s) are printed for each statement.
1+2; 2*3; 3/4; 4-5; 5^6;
3 6 0.7500 -1 15625
(1+2)*3/4^5;
0.00878906
Each Darwin statement is terminated with a semicolon (;). If you forget the semicolon, the system will do nothing until one is read. In some cases, it may result in an invalid input.
1+1nothing happens until we enter the missing (;)
;
2
355/113 1+2;
on line 12, syntax error: 1+2; ^forgetting the semi colon causes an error in the next statement.
Functions are evaluated on the arguments which are enclosed in parenthesis. Darwin knows about most mathematical functions.
sqrt(0); sqrt(1); sqrt(2); sqrt(36);
0 1 1.4142 6
log(1); log10(10); log(2); exp(log(2));
0 1 0.6931 2
max(Pi,exp(1),sqrt(2)); abs(tan(2));
3.1416 2.1850
round(2.499999), round(2.5), round(2.50000001);
2, 2, 3
sin(0), cos(Pi), arctan(1000000000)/Pi;
0, -1, 0.5000
The ditto symbol (") can be used to recall the previous expression evaluated. A double ditto ("") recalls the second back and a triple (""") the third back.
1; 20; 30; ("" + """) / ";
1 20 30 0.7000
Repetition of statements is done with the do ... od construct. Any number of statements or expressions can be placed in the body of the loop. The do part is preceeded by many optional parts, for, from, by, to and while. The following loop calculates an expression with four terms for each value of i from 1 to 5.
for i to 5 do i, i^2, i^3, log(i) od;
1, 1, 1, 0 2, 4, 8, 0.6931 3, 9, 27, 1.0986 4, 16, 64, 1.3863 5, 25, 125, 1.6094
The following loop finds out prime numbers less than 100 (by taking a shortcut and testing only against the possible primes). The function mod computes the rest of the division by its second argument, so by testing that the mod is not zero we are testing for not divisible by the second argument.
for i from 101 by -2 to 70 do if mod(i,3) <> 0 and mod(i,5) <> 0 and mod(i,7) <> 0 then lprint( i, 'is prime') fi od;
101 is prime 97 is prime 89 is prime 83 is prime 79 is prime 73 is prime 71 is prime
The following loop, starts at 36 increasing by 3 while the square of this number is less than 3000. For each value it tests several conditions with an if statement with several branches and prints messages accordingly.
for i from 36 by 3 while i^2 < 3000 do if mod(i,13) = 0 then lprint( i,'is an unlucky number' ) elif mod(i,7) = 0 then lprint( i,'is a lucky number' ) elif mod(i,9) = 0 then lprint( i,'is easy to check for divisibility' ) else lprint( i, 'is a boring number' ) fi od;
36 is easy to check for divisibility 39 is an unlucky number 42 is a lucky number 45 is easy to check for divisibility 48 is a boring number 51 is a boring number 54 is easy to check for divisibility
Variables are like boxes where we can store values. Variables are identified by a name, which has to be composed of letters and numbers or underscores (numbers or underscores cannot be the first character). To store a value in a variable we use the assignment operator (:=). In Darwin, any name can be used for any value without any previous declaration.
a := 1; b := 2;
a := 1 b := 2
x := [1,2,3]; MyFile := 'gonnet.drw'; An_extremely_long_name := 0;
x := [1, 2, 3] MyFile := gonnet.drw An_extremely_long_name := 0
Assigned variables can be used later anywhere a constant can be used.
(10*a+b)^b;
144
Procedures or functions are declared enclosed between the tokens proc and end. The procedure is an object which can be assigned to a variable. Once this is done, the named variable can be used as a function.
p := proc() print('the procedure is being executed') end; p();
p := proc () print(the procedure is being executed) end the procedure is being executed
A function or procedure may have parameters. These are specified in parenthesis after the word proc, and can be used anywhere inside the procedure. The final value evaluated by a procedure is the value returned, and can be used as a function value.
q := proc(a,b,c) (a+b)*c end; q(1,2,3); q(1/2,3,Pi);
q := proc (a, b, c) (a+b)*c end 9 10.9956
In principle, the arguments can contain any types of values.
r := proc(a,b,c) lprint('the first argument is',a); lprint('the second argument is',b); lprint('the third argument is',c); end: r(1/7,100*Pi,'this is a string of text');
the first argument is 0.1429 the second argument is 314.1593 the third argument is this is a string of text
The following procedure does some computation which requires the use of a local variable res. The result is computed in this local variable and it is evaluated at the end of the function so that its value is returned as the value of the function. Since the second argument should be an integer, we force checking its type.
s := proc( a, b:integer ) res := a; for j to b do res := res*(a+j) od; res end: s(1,10); s(10,3);
39916800 17160
s(10,2.5);
Error, s expects a 2nd argument, b:integer, found: 2.5000
The local variables of the function above res, j exist only inside the procedure, and do leave traces once the procedure finished execution.
res, j;
res, j
Any variable which is assigned inside a procedure (or is the variable or a for loop) becomes local. To prevent an assigned variable to be local, it has to be explicitly declared global.
s2 := proc( a, b ) global type_of_error; if b=0 then type_of_error := 'division by 0'; 0 else type_of_error := 'none'; a/b fi; end: s2(1,3); type_of_error;
0.3333 none
s2(1,0); type_of_error;
0 division by 0
Several names have special meaning inside procedures. procname is the name of the function called,args contains all the argumentsnargs is the number of arguments used. This facilitates writing very generic functions.
t := proc() printf('the function %s was called with %d arguments\n', procname, nargs); for i to nargs do printf('argument %d --> %a of type %a\n', i, args[i], type(args[i]) ) od; end: t();
the function t was called with 0 arguments
t(1,Pi,[1,2,3],'abcd');
the function t was called with 4 arguments argument 1 --> 1 of type integer argument 2 --> 3.1416 of type float argument 3 --> [1, 2, 3] of type list argument 4 --> abcd of type string
The following is a classical example to show recursive computation; the computation of factorials.
fact := proc( n:integer ) description 'compute the factorial of a positive integer'; if n < 0 then error('negative argument') elif n < 2 then 1 else fact(n-1)*n fi end: fact(0); fact(10);
1 3628800The following produce errors:
fact(1/2); fact(-1);
Error, fact expects a 1st argument, n:integer, found: 0.5000 Error, (in fact) negative argumentThe description becomes a quick way of identifying the function:
print(fact);
fact: Usage: fact( n:integer ) compute the factorial of a positive integer
The following function returns a list with three numbers. Lists are the same objects as arrays, and matrices are lists of lists.
three := proc( v:numeric ) [ v+1,v+2,v+3 ] end: three(100);
[101, 102, 103]
List of numbers allow normal arithmetic.
10 * three(100) + [0.1,0.11,0.111];
[1010.1000, 1020.1100, 1030.1110]
Vectors multiplied by vectors (of the same length) are interpreted as inner products.
three(1) * three(1000);
9020
Data structures are identical to function calls, except that the function does not evaluate, it remains identical to the call itself.
me := Person( Gaston, 1948, true, [Ariana,Pedro,Julio,Ignacio] );
me := Person(Gaston,1948,true,[Ariana, Pedro, Julio, Ignacio])
For further operations with data structures, please see the bio-recipes on classes (Counters and Poisson). Here we will just perform the most basic operations.
Person := proc( name:string, BirthYear:posint, married:boolean, children:list(string) ) noeval( procname(args) ) end: CompleteClass( Person ); type(me,Person);
true
length(me), me[3];
4, true
me[name], length(me[children]);
Gaston, 4
A comprehensive help system is available. By entering a name after a question mark (at the beginning of a line), the help system is activated:
?Align;
Function Align - align sequences using various modes of dynamic programming Calling Sequence: Align(seq1,seq2,method,DayMat) Parameters: Name Type Description ------------------------------------------------------------------------------ seq1 {ProbSeq,string} pept, nucleot or probabilistic sequence seq2 {ProbSeq,string} pept, nucleot or probabilistic sequence method string the mode of dynamic programming to use DayMat {DayMatrix,list(DayMatrix)} Dayhoff matrices used for alignment Returns: Alignment Synopsis: Align does an alignment of two sequences using the similarity scores given in the DayMat and the given method. If a single DayMatrix is given, the alignment is done using it. If a list of DayMatrix is given, it is understood that the best PAM matrix be used. In this case Align will also compute the PamDistance and PamVariance between the two sequences. The method is optional, if not given it assumes Local. The valid methods are: Local A local alignment will be performed, this means that the best subsequences of seq1 and seq2 will be selected to be aligned. This type of alignment gives the highest possible similarity score of any alignment. This is sometimes called the Smith & Watermann algorithm. Global A global alignment will be performed, this means that the entire seq1 is aligned against the entire seq2. This may result in a negative score if the sequences do not align very well. This is sometimes called the Needleman & Wunsch algorithm. CFE A Cost-Free ends alignment is done. This is like a Global alignment, but deletions of one of the sequences at each of the end are not penalized. In some sense it is between a Local and a Global alignment. Shake A forward-backward alignment is performed. This alignment iterates forward and backwards until the score cannot be increased. In its forward phase will start at the given positions for seq1 and seq2 and find the ends which give a maximal score. From this end, it will perform backwards dynamic programming to find the optimal beginning, and so on until convergence. This type of alignment is quite similar to a Local alignment, but can be directed to focus on a particular alignment, even though it may not be the best of the two sequences. If the DayMat is omitted, the global variable DM (if assigned a DayMatrix) is used, else a PAM-250 matrix is constructed. If in addition to the method, the keyword "NoSelf" is included, when sequences of peptides or nucleotides are aligned (excluding ProbSeq), self-matches are not allowed. That is, if a sequence is aligned to itself (being structurally the same string, this we call self-alignment), the self-match (which is trivial) will not be allowed. This is done by giving the alignment of a position with itself a large penalty. By doing this it is possible to find repeated patterns. I.e. an alignment with itself, where the identity is ruled out, will show any repeated patterns. In particular if the sequences align with an offset of k, then there is a k-long motif which is repeated in the sequence. The method to find the approximate PamDistance and variance may not find the global maximum of the Score, it may find a local maximum. By using the argument "ApproxPAM=ppp", the search for the maximum will be started at PAM distance ppp. This may help when we know an approximation of the distance, or may provide a way of exploring the existence of other local maxima. Examples: > Align(AC(P00083),AC(P00091)); Alignment(Sequence(AC('P00083'))[14..92],Sequence(AC('P00091'))[19..97],177.7799,DM,0,0,{Local}) > Align(Entry(1),Entry(2),Local,DMS); Alignment(Sequence(AC('P15711'))[905..917],Sequence(AC('Q43495'))[13..25],45.1050,DMS[346],80,1153.8025,{Local}) > Align(AC(P13475),AC(P13475),Local,DMS,NoSelf); Alignment(Sequence(AC('P13475'))[128..178],Sequence(AC('P13475'))[137..188],279.9088,DMS[308],42.1286,98.4150,{Local,NoSelf}) See Also: ?Alignment ?CodonAlign ?DynProgStrings ?MAlign ?CalculateScore ?DynProgScore ?EstimatePam ------------
The help system uses approximate string matching to find appropriate topics:
?geriatic code
GGG G Gly AGG R Arg CGG R Arg UGG W Trp GGA G Gly AGA R Arg CGA R Arg UGA Stop GGC G Gly AGC S Ser CGC R Arg UGC C Cys GGU G Gly AGU S Ser CGU R Arg UGU C Cys GAG E Glu AAG K Lys CAG Q Gln UAG Stop GAA E Glu AAA K Lys CAA Q Gln UAA Stop GAC D Asp AAC N Asn CAC H His UAC Y Tyr GAU D Asp AAU N Asn CAU H His UAU Y Tyr GCG A Ala ACG T Thr CCG P Pro UCG S Ser GCA A Ala ACA T Thr CCA P Pro UCA S Ser GCC A Ala ACC T Thr CCC P Pro UCC S Ser GCU A Ala ACU T Thr CCU P Pro UCU S Ser GUG V Val AUG M Met CUG L Leu UUG L Leu GUA V Val AUA I Ile CUA L Leu UUA L Leu GUC V Val AUC I Ile CUC L Leu UUC F Phe GUU V Val AUU I Ile CUU L Leu UUU F Phe See Also: ?AltGenCode ?BaseToInt ?CIntToAmino ?CodonToInt ?IntToBBB ?AminoToInt ?BBBToInt ?CIntToCodon ?Complement ?IntToCInt ?antiparallel ?BToInt ?CIntToInt ?GeneticCode ?IntToCodon ?AToCInt ?CIntToA ?CodonToA ?IntToB ?Reverse ?AToCodon ?CIntToAAA ?CodonToCInt ?IntToBase ------------
Next we will align two sequences of amino acids.
s1 := 'NMTTSRQLLFTFFFTTTFFFFFFQARGLPCSPTWC'; s2 := 'NQLLFTFFTTTFFFFQAKGLRSAS'; a := Align(s1,s2):
s1 := NMTTSRQLLFTFFFTTTFFFFFFQARGLPCSPTWC Warning: procedure s2 reassigned s2 := NQLLFTFFTTTFFFFQAKGLRSAS
The system warns the user that the variable s2 had been previously assigned to a procedure and now it is reassigned (and hence the procedure becomes inaccessible). We have terminated the last line with a colon (:). This makes the system not echo the value of the line. In this case we want to print the alignment, rather than to see the structure holding it.
print(a);
lengths=27,24 simil=61.9, PAM_dist=250, identity=63.0% RQLLFTFFFTTTFFFFFFQARGLPCSP :|||||||.|| ||||||!||.::: NQLLFTFFTTT___FFFFQAKGLRSAS
For further details on alignments and using a database please see the bio-recipe on Alignments.
© 2011 by Gaston Gonnet, Informatik, ETH Zurich
Last updated on Mon Oct 3 17:23:30 2011 by GhG
!!! This document is stored in the ETH Web archive and is no longer maintained !!!