Lecture 2 goes live

parent 4787aea3
......@@ -161,7 +161,7 @@ The deadline is **Friday, February 16, 2018**.
* [Guidelines](agda/Index.lagda.rst)
* [Effectful functional programming](slides/pedagand-01.pdf) ([Source](agda/01-effectful/Monad.lagda.rst)).
* Dependent functional programming.
* [Dependent functional programming](slides/pedagand-02.pdf) ([Source](agda/02-dependent/Indexed.lagda.rst), [McCompiler.v](coq/McCompiler.v)).
* Total functional programming.
* Generic functional programming.
* Open problems in dependent functional programming.
......
..
::
{-# OPTIONS --allow-unsolved-metas --type-in-type #-}
open import Level renaming (zero to zeroℓ ; suc to sucℓ)
open import Data.Unit hiding (_≤_)
open import Data.Bool
open import Data.Nat hiding (_*_ ; _≤_)
open import Data.Maybe
open import Data.Product
open import Data.List hiding (_++_)
open import Data.String
open import Function hiding (id ; const)
open import Relation.Binary.PropositionalEquality
module 02-dependent.Indexed where
================================================================
Indexed functional programming
================================================================
Last week:
- monadic programming: "how to survive without (some) effects"
- took advantage of dependent types for proofs (tactics, anyone?)
- but wrote **simply**-typed programs (mostly)
Today, we shall:
- write dependently-typed program: types will dependent on values
- exploit inductive families to encode our invariants ("syntax")
- take advantage of this precision to host these domain-specific languages ("semantics")
The vision: `The Proof Assistant as an Integrated Development Environment`_
..
::
showNat : ℕ → String
showNat zero = "0"
showNat (suc n) = "S(" ++ showNat n ++ ")"
************************************************
Typing ``sprintf``
************************************************
..
::
module Format where
open import Data.Char
open import Function
Our introductory example is a Classic, introduced by Lennart
Augustsson in his seminal paper on `Cayenne`_. In plain ML (*ie.*
without GADTs), the ``sprintf`` function cannot be given an ML type:
the value of its arguments depending on the user-provided format.
.. code-block:: guess
sprintf "foo %d" : ℕ → String
sprintf "bar %s" : String → String
sprintf "baz %d %s" : ℕ → String → String
Formats are not random strings of characters:
- structure = syntax (`format`)
- from string to structure = parser
::
data format : Set where
digit : (k : format) → format
string : (k : format) → format
symb : (c : Char)(k : format) → format
end : format
parse : List Char → format
parse ('%' ∷ 'd' ∷ cs ) = digit (parse cs)
parse ('%' ∷ 's' ∷ cs ) = string (parse cs)
parse ('%' ∷ c ∷ cs ) = symb c (parse cs)
parse ( c ∷ cs) = symb c (parse cs)
parse [] = end
We can *embed* the semantics of a format by describing its meaning
within Agda itself::
⟦_⟧ : format → Set
⟦ digit k ⟧ = ℕ → ⟦ k ⟧
⟦ string k ⟧ = String → ⟦ k ⟧
⟦ symb c k ⟧ = ⟦ k ⟧
⟦ end ⟧ = String
⟦p_⟧ : String → Set
⟦p_⟧ = ⟦_⟧ ∘ parse ∘ toList
And we can easily realize this semantics::
eval : (fmt : format) → String → ⟦ fmt ⟧
eval (digit k) acc = λ n → eval k (acc ++ showNat n)
eval (string k) acc = λ s → eval k (acc ++ s)
eval (symb c k) acc = eval k (acc ++ fromList (c ∷ []))
eval end acc = acc
sprintf : (fmt : String) → ⟦p fmt ⟧
sprintf fmt = eval (parse (toList fmt)) ""
``sprintf`` can thus be seen as an interpreter for a small language
(whose AST is described by ``format``) to the semantic domain
described by ``⟦_⟧``. And it works::
test : sprintf "test %%d & %%s: %d & %s" 2 "hello world!"
≡ "test %d & %s: S(S(0)) & hello world!"
test = refl
**Exercise (difficulty: 1)** Print ``%d`` using decimal numbers instead of Peano numbers
**Exercise (difficulty: 3)** Add support for the format ``%.ns`` where
`n` is a (decimal) number representing the maximum size of the prefix
of the string ``s`` to be printed. Careful, the formats ``%.s`` or
``%.cs`` are non-sensical: this should impact either parsing or
interpretation.
************************************************
Mostly correct-by-construction compiler
************************************************
..
::
module Compiler where
infixr 5 _∙_
infixr 5 _#_
Using dependent types (in particular, inductive families), we can bake
our invariants in the datatypes we manipulate and make sure that they
are preserved as we process them. The advantage is twofold:
- build *initial* models of a domain of interest (syntax!)
- total denotational semantics in Agda itself (interpreter!)
We illustrate this approach with another Classic, a draft of James
McKinna & Joel Wright entitled `A type-correct, stack-safe, provably
correct, expression compiler`_. As suggested by the title, we are
going to implement a correct-by-construction compiler from a language
of expressions to a stack machine.
--------------------------------
Type-safe representation
--------------------------------
Because Agda's type system is extremely rich, we can in fact *absorb*
the type discipline of expressions in Agda. In programming terms, we
define a datatype ``exp`` that represents only well-typed expressions::
data typ : Set where
nat bool : typ
sem : typ → Set
sem nat = ℕ
sem bool = Bool
data exp : typ → Set where
val : ∀ {T} → (v : sem T) → exp T
plus : (e₁ e₂ : exp nat) → exp nat
ifte : ∀ {T} → (c : exp bool)(e₁ e₂ : exp T) → exp T
We define the semantics of this language by interpretation within
Agda::
eval : ∀ {T} → exp T → sem T
eval (val v) = v
eval (plus e₁ e₂) = eval e₁ + eval e₂
eval (ifte c e₁ e₂) = if eval c then eval e₁ else eval e₂
If we were pedantic, we would call this a *denotational*
semantics.
Note that we crucially rely on the fact that ``sem`` computes at the
type level to ensure that, for example, the ``if_then_else_`` is
performed on a Boolean and not a natural number. This is called a
*tagless* interpreter. In a non-dependent setting, values would have
carried a tag (discriminating them based on their type) and the
evaluator would have to deal with type errors dynamically::
module Tagged where
data value : Set where
isNat : (n : ℕ) → value
isBool : (b : Bool) → value
data exp' : Set where
val : (v : value) → exp'
plus : (e₁ e₂ : exp') → exp'
ifte : (c e₁ e₂ : exp') → exp'
eval' : exp' → Maybe value
eval' (val v) = just v
eval' (plus e₁ e₂)
with eval' e₁ | eval' e₂
... | just (isNat n₁)
| just (isNat n₂) = just (isNat (n₁ + n₂))
... | _ | _ = nothing
eval' (ifte c e₁ e₂)
with eval' c | eval' e₁ | eval' e₂
... | just (isBool b) | v₁ | v₂ = if b then v₁ else v₂
... | _ | _ | _ = nothing
**Exercise (difficulty: 1)** The above implementation is needlessly
verbose, use the Maybe monad to hide the treatment of errors.
The moral of this implementation is that we failed to encode our
invariant in the datatype ``exp'`` and paid the price in the
implementation of ``eval'``.
--------------------------------
Stack machine
--------------------------------
Our stack machine will interpret a fixed set of opcodes, transforming
input stack to output stack. A stack will contain values,
ie. Booleans or integers. We can therefore describe well-typed stacks
by identifying the type of each elements::
stack-typ = List typ
data stack : stack-typ → Set where
ε : stack []
_∙_ : ∀ {T σ} → sem T → stack σ → stack (T ∷ σ)
In particular, a non-empty stack allows us to peek at the top element
and to take its tail::
top : ∀ {T σ} → stack (T ∷ σ) → sem T
top (t ∙ _) = t
tail : ∀ {T σ} → stack (T ∷ σ) → stack σ
tail (_ ∙ s) = s
Using an inductive family, we can once again garantee that
instructions are only applied onto well-formed and well-typed stacks::
data code : stack-typ → stack-typ → Set where
skip : ∀ {σ} → code σ σ
_#_ : ∀ {σ₁ σ₂ σ₃} → (c₁ : code σ₁ σ₂)(c₂ : code σ₂ σ₃) → code σ₁ σ₃
PUSH : ∀ {T σ} → (v : sem T) → code σ (T ∷ σ)
ADD : ∀ {σ} → code (nat ∷ nat ∷ σ) (nat ∷ σ)
IFTE : ∀ {σ₁ σ₂} → (c₁ c₂ : code σ₁ σ₂) → code (bool ∷ σ₁) σ₂
As a result, we can implement a (total) interpreter for our stack
machine::
exec : ∀ {σ-i σ-f} → code σ-i σ-f → stack σ-i → stack σ-f
exec skip s = s
exec (c₁ # c₂) s = exec c₂ (exec c₁ s)
exec (PUSH v) s = v ∙ s
exec ADD (x₁ ∙ x₂ ∙ s) = x₁ + x₂ ∙ s
exec (IFTE c₁ c₂) (true ∙ s) = exec c₁ s
exec (IFTE c₁ c₂) (false ∙ s) = exec c₂ s
**Exercise (difficulty: 1)** Implement a simply-typed version of
``code`` and update ``exec`` to work (partially) from list of tagged
values to list of tagged values.
--------------------------------
Compilation
--------------------------------
The compiler from expressions to stack machine code is then
straightforward, the types making sure that we cannot generate
non-sensical opcodes::
compile : ∀ {T σ} → exp T → code σ (T ∷ σ)
compile (val v) = PUSH v
compile (plus e₁ e₂) = compile e₂ # compile e₁ # ADD
compile (ifte c e₁ e₂) = compile c # IFTE (compile e₁) (compile e₂)
**Exercise (difficulty: 1)** Implement the (same) compiler on the
simply-typed representation of expressions ``exp'``.
Note that this does not guarantee that we preserve the semantics!
**Exercise (difficulty: 4)** We could address that remark by indexing
expressions (``exp``) not only by their type but also by their
denotation (a natural number):
.. code-block:: guess
expSem : (T : typ) → ⟦ T ⟧ → Set
Similarly, the stack machine opcodes could be indexed by their
denotation (a stack):
.. code-block:: guess
codeSem : (σ : stack-typ) → stack σ → Set
As a result, a type-safe ``compile`` function from ``expSem`` to
``codeSem`` could ensure semantics-preservation by
construction. Implement these source and target languages and the
correct-by-construction compiler.
--------------------------------
Correctness
--------------------------------
The correctness proof amounts to showing that the interpreter for
expressions agrees with the result of executing the stack
machine. Having baked the typing discipline in our input expressions
and output machine codes, we can focus on proving only the meaningful
cases::
correctness : forall {T σ} → (e : exp T)(s : stack σ) → exec (compile e) s ≡ eval e ∙ s
correctness (val v) s = refl
correctness (plus e₁ e₂) s
rewrite correctness e₂ s
| correctness e₁ (eval e₂ ∙ s) = refl
correctness (ifte c e₁ e₂) s
rewrite correctness c s
with eval c
... | true rewrite correctness e₁ s = refl
... | false rewrite correctness e₂ s = refl
**Exercise (difficulty: 2)** Prove the same theorem one the
simply-typed implementations. You may prefer to work in Coq, so as to
take advantage of tactics to automate the tedium.
This exercise has its roots in the very origin of most programming and
reasoning techniques we take for granted today:
- the role of initiality in formal reasoning
- the importance of equational reasoning for proving program correctness
These ideas were, for examples, in their inception at the first
edition of POPL with `Advice on structuring compilers and proving them
correct`_ (1973), which was further refined by `More on advice on
structuring compilers and proving them correct`_, (1980). This
reflects the influence it had on a generation of computer scientists
interested in language design on one hand (they gave us algebraic
datatypes) and verified compilation on the other hand (they gave us
denotational models).
************************************************
Computing normal forms of λ-terms
************************************************
In Lecture 1, we have seen that, by finding a suitable semantics
domain, we could auto-magically compute normal forms for monadic
programs. Could we do the same for the whole (effect-free) λ-calculus?
..
::
module STLC where
infix 35 _∈_
infixl 40 _▹_
infixr 50 _⇒_
infixr 55 _*_
infix 60 _!_
--------------------------------
Types and terms
--------------------------------
We consider the simply-typed λ-calculus, whose grammar of types and
contexts is as follows::
data type : Set where
unit : type
_⇒_ _*_ : (S T : type) → type
data context : Set where
ε : context
_▹_ : (Γ : context)(T : type) → context
Thanks to inductive families, we can represent *exactly* the
well-scoped and well-typed λ-terms::
data _∈_ (T : type) : context → Set where
here : ∀ {Γ} → T ∈ Γ ▹ T
there : ∀{Γ T'} → (h : T ∈ Γ) → T ∈ Γ ▹ T'
data _⊢_ (Γ : context) : type → Set where
lam : ∀{S T} →
(b : Γ ▹ S ⊢ T) →
---------------
Γ ⊢ S ⇒ T
var : ∀{T} →
(v : T ∈ Γ) →
-----------
Γ ⊢ T
_!_ : ∀{S T} →
(f : Γ ⊢ S ⇒ T)(s : Γ ⊢ S) →
--------------------------
Γ ⊢ T
tt :
--------
Γ ⊢ unit
pair : ∀{A B} →
(a : Γ ⊢ A)(b : Γ ⊢ B) →
----------------------
Γ ⊢ A * B
fst : ∀{A B} →
Γ ⊢ A * B →
---------
Γ ⊢ A
snd : ∀{A B} →
Γ ⊢ A * B →
---------
Γ ⊢ B
This representation of λ-terms is folklore amongst programmers of the
dependent kind. A comprehensive discussion of its pros and cons can be
found in the pedagogical `Strongly Typed Term Representations in
Coq`_.
-------------------------------------
Interlude: substitution, structurally
-------------------------------------
Substitution for de Bruijn λ-terms is usually (offhandedly) specified
in the following manner:
.. code-block:: guess
n [σ] = σ(n)
(M N)[σ] = M[σ] N[σ]
(λ M)[σ] = λ (M[0 · (σ ∘ λ n. suc n)])
σ ∘ ρ = λ n. (σ n)[ρ]
However, this definition contains a fair amount of mutual recursion,
whose validity is not obvious and will be a hard sell to a termination
checker. Let us exhibit this structure and, at the same time, exercise
ourselves in the art of unearthing initial models.
..
::
module Exercise-mono where
open import Data.Fin
**Exercise (difficulty: 2)** In Agda, the type of finite sets of
cardinality ``n`` is defined by an inductive family:
.. code-block:: guess
data Fin : ℕ → Set where
zero : {n : ℕ} → Fin (suc n)
suc : {n : ℕ} (i : Fin n) → Fin (suc n)
We are interested in **monotone** functions from ``Fin n`` to ``Fin
m``. We could obviously formalize this class of functions as "any
function from ``Fin n`` to ``Fin m`` as long as it is monotone"
however a more *intentional* characterization can be given by means of
an inductive family::
data _⊇_ : (m : ℕ)(n : ℕ) → Set where
-- COMPLETE
Intuitively, this datatype provides a grammar of monotone functions,
which we can then interpret back into actual (monotone) functions::
⟦_⟧ : ∀ {m n} → m ⊇ n → Fin n → Fin m
⟦ wk ⟧ k = {!!}
lemma-valid : ∀{m n k l} → (wk : m ⊇ n) → k ≤ l → ⟦ wk ⟧ k ≤ ⟦ wk ⟧ l
lemma-valid = {!!}
We can adapt this intentional characterization of monotone functions
to typed embeddings::
data _⊇_ : context → context → Set where
id : ∀ {Γ} → Γ ⊇ Γ
weak1 : ∀ {Γ Δ A} → (wk : Δ ⊇ Γ) → Δ ▹ A ⊇ Γ
weak2 : ∀ {Γ Δ A} → (wk : Δ ⊇ Γ) → Δ ▹ A ⊇ Γ ▹ A
shift : ∀ {Γ Δ T} → Γ ⊇ Δ → T ∈ Δ → T ∈ Γ
shift id v = v
shift (weak1 wk) v = there (shift wk v)
shift (weak2 wk) here = here
shift (weak2 wk) (there v) = there (shift wk v)
rename : ∀ {Γ Δ T} → Γ ⊇ Δ → Δ ⊢ T → Γ ⊢ T
rename wk (lam t) = lam (rename (weak2 wk) t)
rename wk (var v) = var (shift wk v)
rename wk (f ! s) = rename wk f ! rename wk s
rename wk tt = tt
rename wk (pair a b) = pair (rename wk a) (rename wk b)
rename wk (fst p) = fst (rename wk p)
rename wk (snd p) = snd (rename wk p)
sub : ∀ {Γ Δ T} → Γ ⊢ T → (∀ {S} → S ∈ Γ → Δ ⊢ S) → Δ ⊢ T
sub (lam t) ρ = lam (sub t (λ { here → var here ;
(there v) → rename (weak1 id) (ρ v) }))
sub (var v) ρ = ρ v
sub (f ! s) ρ = sub f ρ ! sub s ρ
sub tt ρ = tt
sub (pair a b) ρ = pair (sub a ρ) (sub b ρ)
sub (fst p) ρ = fst (sub p ρ)
sub (snd p) ρ = snd (sub p ρ)
sub1 : ∀ {Γ S T} → Γ ▹ S ⊢ T → Γ ⊢ S → Γ ⊢ T
sub1 t s = sub t (λ { here → s ; (there v) → var v })
A formal treatment of this construction can be found in `Formalized
metatheory with terms represented by an indexed family of types`_, for
example.
**Exercise (difficulty: 2)** Weakenings interpret to renaming
functions and functions do compose so we are naturally driven to
implement composition directly on renamings::
_∘wk_ : ∀ {Δ ∇ Γ} → Δ ⊇ ∇ → Γ ⊇ Δ → Γ ⊇ ∇
_∘wk_ = {!!}
And we must make sure, that this notion of composition is the *right*
one::
lemma-right-unit : ∀ {Γ Δ} → (wk : Γ ⊇ Δ) → wk ∘wk id ≡ wk
lemma-right-unit = {!!}
lemma-left-unit : ∀ {Γ Δ} → (wk : Γ ⊇ Δ) → id ∘wk wk ≡ wk
lemma-left-unit = {!!}
lemma-assoc : ∀ {Γ Δ ∇ Ω} → (wk₃ : Γ ⊇ Δ)(wk₂ : Δ ⊇ ∇)(wk₁ : ∇ ⊇ Ω) →
(wk₁ ∘wk wk₂) ∘wk wk₃ ≡ wk₁ ∘wk (wk₂ ∘wk wk₃)
lemma-assoc = {!!}
-------------------------------------
Normal forms
-------------------------------------
We can represent the equation theory as an inductive family::
data _⊢_∋_↝βη_ : (Γ : context)(T : type) → Γ ⊢ T → Γ ⊢ T → Set where
rule-β : ∀{Γ S T}{b : Γ ▹ S ⊢ T}{s : Γ ⊢ S} →
------------------------------------
Γ ⊢ T ∋ (lam b) ! s ↝βη sub1 b s
rule-η-fun : ∀{Γ S T}{f : Γ ⊢ S ⇒ T} →
------------------------------------------------------
Γ ⊢ S ⇒ T ∋ f ↝βη lam (rename (weak1 id) f ! var here)
rule-η-pair : ∀{Γ A B}{p : Γ ⊢ A * B} →
------------------------------------------------------
Γ ⊢ A * B ∋ p ↝βη pair (fst p) (snd p)
data _⊢_∋_∼βη_ : (Γ : context)(T : type) → Γ ⊢ T → Γ ⊢ T → Set where
inc : ∀ {Γ T t₁ t₂} →
Γ ⊢ T ∋ t₁ ↝βη t₂ →
-----------------
Γ ⊢ T ∋ t₁ ∼βη t₂
reflexivity : ∀{Γ T t} →
-----------
Γ ⊢ T ∋ t ∼βη t
symmetry : ∀{Γ T t t'} →
Γ ⊢ T ∋ t ∼βη t' →
------------
Γ ⊢ T ∋ t' ∼βη t
transitivity : ∀{Γ T t t' t''} →
Γ ⊢ T ∋ t ∼βη t' →
Γ ⊢ T ∋ t' ∼βη t'' →
--------------
Γ ⊢ T ∋ t ∼βη t''
struct-lam : ∀{Γ S T b b'} →
Γ ▹ S ⊢ T ∋ b ∼βη b' →
----------------
Γ ⊢ S ⇒ T ∋ lam b ∼βη lam b'
struct-! : ∀{Γ S T f f' s s'} →
Γ ⊢ S ⇒ T ∋ f ∼βη f' →
Γ ⊢ S ∋ s ∼βη s' →
-----------------
Γ ⊢ T ∋ f ! s ∼βη f' ! s'
struct-pair : ∀{Γ A B a a' b b'} →
Γ ⊢ A ∋ a ∼βη a' →
Γ ⊢ B ∋ b ∼βη b' →
----------------
Γ ⊢ A * B ∋ pair a b ∼βη pair a' b'
struct-fst : ∀{Γ A B p p'} →
Γ ⊢ A * B ∋ p ∼βη p' →
------------------------
Γ ⊢ A ∋ fst p ∼βη fst p'
struct-snd : ∀{Γ A B p p'} →
Γ ⊢ A * B ∋ p ∼βη p' →
------------------------
Γ ⊢ B ∋ snd p ∼βη snd p'
..
::
module NBE-gensym where
open STLC
Compute η-long β-normal forms for the simply typed λ-calculus:
- define a representation of terms (``term``)
- interpret types and contexts in this syntactic model (``⟦_⟧`` and ``⟦_⟧context``)
- interpret terms in this syntactic model (``eval``)
::
data term : Set where
lam : (v : String)(b : term) → term
var : (v : String) → term
_!_ : (f : term)(s : term) → term
tt : term
pair : (x y : term) → term
fst : (p : term) → term