PoprC Code Explanation: Introduction and Overview
Modified on November 11, 2013
I’ve been working on a compiler called PoprC for my programming language, Popr. It has been about a year since I started, so I want to explain how the compiler currently works to help clarify my ideas. It might also be interesting to others, and I hope to get some feedback.
The compiler consists of four main components: runtime (rt.c
), evaluation (eval.c
), predefined primitives (primitive.c
), and LLVM code generation (llvm.cpp
). The runtime provides code to be linked into compiled Popr code, to handle memory management and graph reduction. eval.c has code for parsing Popr expressions and displaying results, as well as functions to generate .dot files (graphing working memory) for debugging. It contains any code related to evaluation that is not required in the runtime. The predefined primitives in primitives.c form the basic building blocks for Popr programs. llvm.cpp
handles tracing evaluation to generate LLVM IR that is currently JIT compiled.
The compiler is based on abstract interpretation; it is an interpreter that can also handle variables and placeholders. Variables represent a partially unknown value, such as an argument to a function. Placeholders are partially unknown functions. Functions continue to operate in the presence of variables and placeholders; if the result can’t be known given the arguments, variables and placeholders are used to denote the unknown portions in the resulting value. This allows the interpreter to partially evaluate functions. A tracing function can be hooked into the interpreter to convert operations on variables and placeholders into code implementing the partially applied function. The ability of the interpreter to execute alternatives allows all branches to be explored while generating code for full flow analysis.
Some examples might help illustrate how it works (you can try them online):
: +
?_2 <- arg(0)
?_3 <- arg(1)
?i3 <- type
?i2 <- type
?i1 <- ?3 ?2 add
[ ?i1 ]
A +
is entered at the prompt (user input is shown after the prompt ‘:
’). First the interpreter produces arguments until the expression is complete. It creates ?_2
and ?_3
as the first and second arguments, respectively. The character after the ?
denotes the type, an underscore (_
) represents an unknown type, and i
represents an integer. The +
function restricts its arguments to integer type, producing the next two lines. Finally +
is applied to its arguments, producing the add
line, and the result is the variable ?i1
.
: popr swap drop
?_6 <- arg(0)
?f7 <- ?l6 head
?8 <- f[?7]
[ ?_8 ]
A more complex example is the head
function in lib.peg
. It takes a list argument (?l6
), extracts the function within (?f7
), and applies it, producing a value of unknown type (?_8
).
To understand more in depth how PoprC works, you will need to understand the details of the interpreter, which will be the topic of my next post.