PoprC Runtime Operation Part 3: Quotes

Posted on November 10, 2013

A quote in the PoprC runtime is a fragment of possibly unevaluated code. You can think of a quote as holding a section of code, which can be further assembled and executed. Arguments can be appended to the left (pushl), and results can be removed from the right (popr). Quotes can also be composed using (.). Quote literals are indicated by a section of code surrounded by square brackets ([ ... ]). Quotes can be nested without limit. Quotes can be used as auxiliary stacks.

Quotes cannot be spliced up a level (‘removing the square brackets’.) This would result in all higher order functions having variable arity. Furthermore, the higher order functions’ behavior would be tied to their internal operation, because any function executed within another could consume and modify its internal data, making abstract interpretation and even compile time parsing of library functions difficult. So, the only way to manipulate and access the contents from outside are through pushl and popr.

This is important, because it is what allows PoprC to entirely remove all local (intermediate) quote operations (pushl, popr, and .) from the resulting compiled code. Furthermore, inlining can make more quote operations local, at the expense of code size. This is also interesting because, although most implementations of concatenative languages rely on one or more stacks to store arguments to functions, PoprC aggressively eliminates the only non-primitive datatype (which can be used as a stack), and most function arguments are assigned at compile time.

Figure 1: [1 + 2 +]

Quotes are stored as lists, which are vectors of pointers to closures. Pointers to incomplete closures are always on the left of the list (c->ptr[list_size(c) - 1]). An incomplete closure can contain another incomplete closure (but only one), which forms a chain terminating at the innermost incomplete closure (node 4 in Figure 1, notice it only has one argument). To the right are complete closures, ready to be reduced (except for deps, which although marked as complete, can point to incomplete closures.)

cell_t *empty_list() {
  cell_t *c = closure_alloc(1);
  c->func = func_reduced;
  c->type = T_LIST;
  return c;
}

cell_t *quote(cell_t *x) {
  cell_t *c = closure_alloc(2);
  c->func = func_reduced;
  c->type = T_LIST;
  c->ptr[0] = x;
  return c;
}

It’s easy to make a list; just allocate the cells, set c->func and c->type, and assign the items of the list to c->ptr. Above is code to create an empty list (empty_list()), and a list with just one item (quote(c)).

cell_t *expand(cell_t *c, unsigned int s) {
  if(!c) return 0;
  int n = closure_args(c);
  int cn_p = calculate_cells(n);
  int cn = calculate_cells(n + s);
  if(!c->n && cn == cn_p) {
     c->size += s;
    return c;
  } else {
    /* copy */
    cell_t *new = closure_alloc(n + s);
    memcpy(new, c, cn_p * sizeof(cell_t));
    if(is_placeholder(c)) trace(new, c, tt_copy);
    new->n = 0;
    traverse_ref(new, ARGS_IN | PTRS | ALT);
    new->size = n + s;
    if(is_reduced(c)) alt_set_ref(c->alt_set);
    drop(c);
    return new;
  }
}

When a larger list is needed to insert more items (such as in compose() and pushl_nd()), expand(c, n) is used to expand c to allow n more items, returning the expanded list.

cell_t *pushl_nd(cell_t *a, cell_t *b) {
  assert(is_closure(a) &&
	 is_closure(b) && is_list(b));

  int n = list_size(b);
  if(n) {
    cell_t *l = b->ptr[n-1];
    if(!closure_is_ready(l)) {
      cell_t *_b = arg_nd(l, a, b);
      if(is_placeholder(l)) trace(_b->ptr[n-1], l, tt_copy);
      return _b;
    }
  }

  cell_t *e = expand(b, 1);
  e->ptr[n] = a;
  return e;
}

pushl_nd(a, b) tries to fill in a as an argument for the leftmost item in the list using arg_nd, otherwise the list is expanded and a is inserted in the leftmost position.

cell_t *compose_nd(cell_t *a, cell_t *b) {
  int n = list_size(b);
  int n_a = list_size(a);
  int i = 0;
  if(n && n_a) {
    cell_t *l;
    while(!closure_is_ready(l = b->ptr[n-1]) && i < n_a) {
      cell_t *x = a->ptr[i];
      if(is_placeholder(x)) {
	/* ... */
      } else {
	b = arg_nd(l, ref(x), b);
	++i;
      }
    }
  }
  cell_t *e = expand(b, n_a - i);
  int j;
  for(j = n; i < n_a; ++i, ++j) {
    e->ptr[j] = ref(a->ptr[i]);
  }
  drop(a);
  return e;
}

compose_nd(a, b) is similar to pushl_nd, but a is a list, so it conceptually the same as performing pushl_nd(x, b) for each item x in a, but it is much more efficient because b is expanded only once. Ignore if(is_placeholder(x) { ... } for now, just look at the else clause.

The _nd variants of functions are nondestructive to their arguments, so that pushl_nd doesn’t affect other references; only the portion of the graph that has an exclusive reference can be modified and the rest must be copied. This is accomplished with arg_nd(c, a, r) which is similar to arg(c, a), except the argument r points to the root of the graph of which arg_nd will return a modified version in which c has been supplied argument a.

arg_nd uses modify_copy(c, r), which returns a copy of r where it is safe to modify c.

cell_t *modify_copy(cell_t *c, cell_t *r) {
  cell_t *new = _modify_copy1(c, r, true);
  if(new && new != r) {
    ref(new);
    drop(r);
  }
  if(new) {
    _modify_copy2(new);
    return new;
  } else return r;
}

void _modify_new(cell_t *r, bool u) {
  cell_t *n;
  if(clear_ptr(r->tmp, 3)) return;
  if(u) {
    n = ref(r);
  } else {
    n = copy(r);
    n->tmp = (cell_t *)3;
    n->n = 0;
  }
  r->tmp = mark_ptr(n, 3);
}

/* first sweep of modify_copy */
cell_t *_modify_copy1(cell_t *c, cell_t *r, bool up) {
  if(!is_closure(r)) return 0;

  r = clear_ptr(r, 3);
  int nd = nondep_n(r);

  /* is r unique (okay to replace)? */
  bool u = up && !nd;

  if(r->tmp) {
    assert(is_marked(r->tmp, 3));
    /* already been replaced */
    return clear_ptr(r->tmp, 3);
  } else r->tmp = (cell_t *)3;
  if(c == r) _modify_new(r, u);
  traverse(r, {
      if(_modify_copy1(c, *p, u))
	_modify_new(r, u);
    }, ARGS | PTRS | ALT);
  return clear_ptr(r->tmp, 3);
}

cell_t *get_mod(cell_t *r) {
  if(!r) return 0;
  cell_t *a = r->tmp;
  if(is_marked(a, 2)) return clear_ptr(a, 3);
  else return 0;
}

/* second sweep of modify copy */
void _modify_copy2(cell_t *r) {

  /* r is modified in place */
  bool s = r == clear_ptr(r->tmp, 3);

  if(!is_closure(r)) return;
  /* alread been here */
  if(!is_marked(r->tmp, 1)) return;
  r->tmp = clear_ptr(r->tmp, 1);
  traverse(r, {
      cell_t *u = clear_ptr(*p, 3);
      cell_t *t = get_mod(u);
      if(t) {
	if(!(s && t == u)) {
	  *p = ref(t);
	  if(s) drop(u);
	}
	_modify_copy2(t);
      } else if(!s) ref(u);
      if((!s || t != u) && is_weak(r, *p)) {
	--(*p)->n;
      }
    }, ARGS | PTRS | ALT);
}

modify_copy is complex, but essentially, it makes a lazy copy using the tmp fields in cell_t. _modify_copy1 performs the first sweep, where copied closures are allocated as needed and stored in tmp, and _modify_copy2 performs the second sweep, where references are updated in the copied graph to point to the new closures.

As you can see, quotes are very powerful, although the implementation is somewhat complex. The compiler can eliminate any overhead from using quotes in most cases.

In the next article, I will cover how alternatives work in the PoprC runtime.