pidgin.ml 4.83 KB
Newer Older
huet's avatar
huet committed
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118
(**************************************************************************)
(*                                                                        *)
(*                 The Zen Computational Linguistics Toolkit              *)
(*                                                                        *)
(*                              Gérard Huet                               *)
(*                                                                        *)
(* ©2002 Institut National de Recherche en Informatique et en Automatique *)
(**************************************************************************)

(*i Fake ml source file, used for printing ML code with Ocamlweb look i*)

(*
The core language has types, values, and exceptions. 
Thus, [1] is a value of predefined type [int], whereas 
["CL"] is a [string]. 
Pairs of values inhabit the corresponding product type. Thus:
[(1,"CL") : (int * string)]. 
Recursive type declarations create new types,
whose values are inductively built from the associated constructors. 
Thus the Boolean type could be declared
as a sum by: *)
type bool = [True | False];

(* Parametric types give rise to polymorphism.
Thus if [x] is of type [t] and [l] is of type 
[(list t)], we construct the list adding [x] to [l]
as [[x :: l]]. The empty list is [[]], of (polymorphic) type
[(list 'a)]. Although the language is strongly typed, explicit type
specification is rarely needed from the designer, since principal types
may be inferred mechanically.

The language is functional in the sense that functions are first class
objects. Thus the doubling integer function may be written as
[fun x -> x+x], and it has type [int -> int]. It may be associated
to the name [double] by declaring: *)
value double = fun x -> x+x;

(* Equivalently we could write: *)
value double x = x+x;

(* Its application to value [n]
is written as [(double n)] or even [double n] when there is no
ambiguity. Application associates to the left, and thus [f x y] stands
for [((f x) y)]. 
Recursive functional values are declared with the keyword [rec].
Thus we may define factorial as: *)

value rec fact n = if n=0 then 1 else n*(fact (n-1));

(* Functions may be defined by pattern matching. Thus the first projection of
pairs could be defined by: *)
value fst = fun [ (x,y) -> x ];

(* or equivalently (since there is only one pattern in this case) by: *)
value fst (x,y) = x;

(* Pattern-matching is also usable in [match] expressions which generalize
case analysis,
such as: [match l with [ [] -> True | _ -> False ]], which
tests whether list [l] is empty, using underscore as catch-all
pattern.

Evaluation is strict, which means that [x] is evaluated before
[f] in the evaluation of [(f x)]. The [let] expressions allow the
sharing of sub-computations. Thus
[let x = fact 10 in x+x] will compute [fact 10] first, 
and only once. 
An equivalent postfix [where] notation may be used as well. Thus
the conditional expression [if b then e1 else e2] is equivalent to: *)

(* [ choose b where choose = fun [ True -> e1 | False -> e2]; ] *)

(* Exceptions are declared with the type of their parameters, like in: *)
exception Failure of string;

(* An exceptional value may be raised, like in:
[raise (Failure "div 0")] and handled by a [try] switch on
exception patterns, such as: *)

(* [ try expression with [ Failure s -> ... ]; ] *)

(* Other imperative constructs may be used, such as 
references, mutable arrays, while loops and I/O commands,
but we shall seldom need them.
%If [x] has been declared as a [ref int], it may be assigned
%by an instruction such as [x.val:=0]. 
Sequences of instructions are
evaluated in left to right regime in [do] expressions, such as:
[do {e1; ... en}]. 

ML is a {\sl modular} language,
in the sense that sequences of type, value
and exception declarations may be packed in a structural unit called a
[module], amenable to separate treatment. 
Modules have types themselves, called {\sl signatures}. Parametric
modules are called {\sl functors}. The algorithms presented in this paper
will use in essential ways 
this modularity structure, but the syntax ought to be self-evident.
Finally, comments are enclosed within starred parens like: *)

value s = "This is a string"; (* This is a comment *)

(* Readers not acquainted with programming languages may think of ML
definitions as recursive
equations over inductively defined algebras. Most of them are simple
primitive recursive functionals. The more complex recursions of our
automata coroutines will be shown to be well-founded by 
a combination of lexicographic and multiset orderings.

Pidgin ML definitions may actually be directly executed as 
Objective Caml programs \cite{ocaml},
under the so-called revised syntax \cite{camlp4}. 
The following development may thus 
be used as the reference implementation of a core
computational linguistics platform, dealing with lexical, phonemic and
morphological aspects. 
*)