Posted on August 15, 2020
Arbitrary limits are bad
I’ve written PoprC in an embedded style of C, where malloc()
is avoided. This has a lot of benefits: no dynamic allocation overhead, repeatable addresses which can be used as identifiers, and the ability to use watchpoints when debugging, as well as being able to easily iterate over arrays.
It does have one large drawback, though: because memory is statically allocated, the sizes cannot change. I generally try to set the limits around 2x a reasonable amount, but while this works okay for development, I know it’s not acceptable for end use.
Startup “static” allocation
To address this, I implemented an easy way to perform “static” allocation on startup (in static_alloc.c
. The idea is to perform all allocations with arbitrary limits on startup, which allows the opportunity to adjust the sizes based on flags or a configuration file.
Doing this manually would be annoying, so I used the code generation system that I’ve used elsewhere to collect macros of the form STATIC_ALLOC(name, type, default_size)
. Then, I would translate allocations of the form type name[size]
into that macro throughout PoprC’s source. In addition, there are extended forms of the macro to specify alignment (STATIC_ALLOC_ALIGNED
) and sizes that are dependent on another (STATIC_ALLOC_DEPENDENT
.)
There are a few drawbacks to this system:
- Sizes can’t depend on things not visible in
static_alloc.c
- Types must also be visible in that file.
- Static allocations are in a single namespace, so collisions might be a problem.
- Addresses are no longer static, so some things must be set up on initialization (e.g. in
*_init()
procedures.)
- LLDB seems to have more difficulty with
malloc
’ed pointers for some reason.
All of these limitations were fairly easy to work with.
A nice added feature for debugging is the compiler can identify pointers in the allocated region, which is most of the pointers I’m interested in during debugging. I can type pp (some pointer)
in LLDB now to display a description of the form variable[offset]
(see print_static_alloc()
.)
Future work
Some allocations aren’t needed all of the time. I would like to implement a way to perform temporary allocations in an area allocated on startup (using static_alloc.c
) which is large enough to hold the maximum temporary allocation. This would be like a statically allocated union where the fields can be declared anywhere.
I have yet to implement reading allocation sizes from a configuration file, so PoprC is just using the default sizes. After implementing this, it would be good to have assertion messages that indicate which size needs to be adjusted.
Posted on July 21, 2020
What is an error anyway…
I have refrained from implementing error messages for PoprC until now.
There are several reasons for this:
Failure is okay, and expected.
Other than trivial syntax errors, there is only one possible error in a Popr program - program non-determinism. Determinism is the property of having a unique successful execution path for any input, so a non-deterministic program has either no successful path, or multiple successful paths, for some input. While the program must be deterministic, it’s okay for any part to fail (e.g. division when the divisor is zero) as long as it is handled by the consumer of that computation. Every branch must fail except one, so failure is common and expected.
1 0 / __ Division by zero
"one" 2 + __ Addition operates on integers
3 False ! __ Asserting False
These all fail, but appending True |
to any one of these will result in True
.
This means detecting true errors requires the entire program context, which is not always available. Automatically proving determinism is not implemented yet, and is not solvable in general.
Reduction is lossy, which can make locating errors in the source imprecise.
Location information is discarded during reduction, and yet the result might fail after reduction. The reduction graph is more expressive than the source language, so a failure might not correspond to a precise location in the original source code.
A simple example is that in 1 2 swap 3 + odd !
, 2
is not involved in the error, but after reducing the addition, the location of the result must include the constants 1
and 3
from which the result (4
) is derived.
Source locations take up space.
Popr uses a very compact internal representation (IR) graph. For a long time, I’ve been able to fit everything I’ve needed for the IR into 64 byte nodes, but there’s no space for another byte, much less another pointer.
Posted on March 31, 2018
Modified on April 5, 2018
Popr is a new programming language (compiler, try it) that works unlike any other programming language that I know of, so a concise description of the language is difficult.
Rather than describe the semantics in relation to other languages, or listing the formal evaluation rules, I will present a graphical notation that, while impractical for larger programs, shows how Popr programs work in an intuitive way.
In this notation, we will build machines that consume and produce dots.
Posted on November 19, 2013
Every programming language supports some form of branching. A simple form of branching is the C if
statement, which evaluates a boolean expression, and then selects a branch based on the value of that expression. This works well if the branching condition can be calculated before choosing a branch, but sometimes a branch can’t be selected until one of the branches has been partially computed. If this is the case, the programmer has to write several nested if
statements, one for each decision point, and divide the calculation into incremental fragments, often repeating parts of the same calculation in different branches. Another example of this type of branching is common in error handling, where errors are checked in several places. If an error occurs, the program must jump to an alternate branch which handles the error, often by aborting the calculation and returning an error to the caller, which will continue to propagate upwards until it can be handled reasonably. This is why C++ explicitly supports exceptions, although it is really just a type of branching.
Posted on November 10, 2013
A quote in the PoprC runtime is a fragment of possibly unevaluated code. You can think of a quote as holding a section of code, which can be further assembled and executed. Arguments can be appended to the left (pushl
), and results can be removed from the right (popr
). Quotes can also be composed using (.
). Quote literals are indicated by a section of code surrounded by square brackets ([ ... ]
). Quotes can be nested without limit. Quotes can be used as auxiliary stacks.
Quotes cannot be spliced up a level (‘removing the square brackets’.) This would result in all higher order functions having variable arity. Furthermore, the higher order functions’ behavior would be tied to their internal operation, because any function executed within another could consume and modify its internal data, making abstract interpretation and even compile time parsing of library functions difficult. So, the only way to manipulate and access the contents from outside are through pushl
and popr
.
This is important, because it is what allows PoprC to entirely remove all local (intermediate) quote operations (pushl
, popr
, and .
) from the resulting compiled code. Furthermore, inlining can make more quote operations local, at the expense of code size. This is also interesting because, although most implementations of concatenative languages rely on one or more stacks to store arguments to functions, PoprC aggressively eliminates the only non-primitive datatype (which can be used as a stack), and most function arguments are assigned at compile time.
Posted on November 4, 2013
Memory management in the PoprC runtime is done using a custom allocator, based on a ring of cells. It’s fairly simple, but it works, and I’m not yet to the point where it needs to be optimized. Memory allocation must be handled carefully, because I want PoprC to be able to produce code suitable for embedded devices, which have very little memory, and need real-time guarantees.
The runtime’s garbage collection is based on reference counting to minimize memory usage, to avoid unpredictable pauses, and also because it’s simple. In addition, the reference count can be used to determine when it is okay to modify a closure, rather than making a modified copy. This is an effective optimization because most closures only have a single reference.
Posted on November 1, 2013
Because Popr is a functional language, execution of a Popr expression can be thought of as a series of reductions to reach the final result. PoprC relies on functions in the runtime (rt.c
) to handle the reduction of Popr expressions. It can reduce functions with static (known) arguments. If this were all PoprC could do, it would be an interpreter, and not a compiler. It can also reduce functions with variables as arguments, producing a trace which is converted into LLVM IR. This results in code with all static parts fully reduced, and only the dynamic (not known in advance) portions implemented by using calls to the runtime.
The runtime is also responsible for memory management. Unallocated memory is arrange as a free ring (a doubly-linked list with no head or tail) of cells. Each cell stores data required to implement a closure, which is a function and its arguments, but a cell also has more information required for memory management, alternatives, reduced values, alternatives, and a temporary pointer used during graph copying.
Posted on October 30, 2013
Modified on November 11, 2013
I’ve been working on a compiler called PoprC for my programming language, Popr. It has been about a year since I started, so I want to explain how the compiler currently works to help clarify my ideas. It might also be interesting to others, and I hope to get some feedback.
The compiler consists of four main components: runtime (rt.c
), evaluation (eval.c
), predefined primitives (primitive.c
), and LLVM code generation (llvm.cpp
). The runtime provides code to be linked into compiled Popr code, to handle memory management and graph reduction. eval.c has code for parsing Popr expressions and displaying results, as well as functions to generate .dot files (graphing working memory) for debugging. It contains any code related to evaluation that is not required in the runtime. The predefined primitives in primitives.c form the basic building blocks for Popr programs. llvm.cpp
handles tracing evaluation to generate LLVM IR that is currently JIT compiled.
Posted on October 29, 2013
Modified on November 5, 2013
Popr is a pure functional lazy post-fix concatenative programming language. I will briefly explain some of the features of Popr through some examples. You can try these examples online.
1 2 + --> [ 3 ]
+
is a function that takes two integers and returns one. There are several similar functions, with the same meaning as in C: -
, *
, <
, <=
, ==
, >
, >=
. Booleans are currently represented as integers, non-zero for true, zero for false.