Wednesday, 6 June 2012

Dynamic Optimisation: the Next Logical Step?

Dynamic optimisation covers a set of techniques used to optimise a piece of code at runtime, based on the dynamic state and the path taken through the code so far. The canonical example is optimising a loop: while the statically-optimised version unrolls several iterations of the loop, the dynamically-optimised version realises that the loop's predicate can never be true and removes the loop entirely. Dynamic optimisation is most commonly applied when compiling a higher-level language into machine code, but what if we applied the technique at a higher level of abstraction?

One of my favourite interview questions is 'what implementation of data structure x would fit this use case?'. In practice, though, you often don't know the answer up front: maybe you're concentrating on the higher-level guarantees a data structure provides (e.g. uniqueness or ordering), or maybe you're prioritising time-to-market over application performance. Instead of specifying up front which implementation of a data structure to use for a given pattern of data access, or a given availability of system resources, you could use a data structure that optimises its own implementation over its lifetime, trading off predictability and book-keeping overhead to raise the lower bound on performance.

I've written a proof-of-concept to highlight some of the interesting problems this data structure faces. It's an implementation of the java.util.List interface that decides whether to store its data in a LinkedList or an ArrayList. As you probably know, the two perform similarly for appends (an ArrayList append is amortised O(1), while a LinkedList append is O(1) every time), but an ArrayList gives O(1) random access and O(n) prepending, while a LinkedList gives O(n) random access and O(1) prepending.
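To make the prepend difference concrete, here's a tiny illustrative snippet (not part of the proof-of-concept): each add(0, ...) on the ArrayList shifts every existing element, while the LinkedList just relinks its head node.

import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class PrependDemo {
  public static void main(String[] args) {
    List<Integer> arrayList = new ArrayList<Integer>();
    List<Integer> linkedList = new LinkedList<Integer>();
    for (int i = 0; i < 100000; ++i) {
      arrayList.add(0, i);  // shifts every existing element: O(n) per call
      linkedList.add(0, i); // relinks the head node: O(1) per call
    }
    // roughly O(n^2) total work for the ArrayList, O(n) for the LinkedList
  }
}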

We'll start with the wrapper class, which delegates to the current backing implementation. I'll use Google's Guava to reduce the boilerplate. You could write a slower but more generic implementation with java.lang.reflect's dynamic proxy mechanism, and could probably do this in a single line of a dynamic language.

⟨Dynamic List Class⟩+≡
private static final class DynamicList<E> extends ForwardingList<E> {
  private List<E> implementation = new ArrayList<E>();
  protected List<E> delegate() {
    return implementation;
  }
  ⟨Book keeping⟩
  ⟨Other Methods⟩
}

We'll need to gather data to determine the access patterns. All accesses (additions, replacements, deletions and queries) fall into two categories: accesses at the ends of the list (index 0 and the last index) and other "random" accesses. We'll do this by (tediously) instrumenting the interface's method calls:

⟨Other Methods⟩+≡
public boolean add(E arg0) {
  endAccess();
  return super.add(arg0);
}
public void add(int index, E element) {
  accessAt(index);
  super.add(index, element);
}
public E set(int index, E element) {
  accessAt(index);
  return super.set(index, element);
}
public E get(int index) {
  accessAt(index);
  return super.get(index);
}
public E remove(int index) {
  accessAt(index);
  return super.remove(index);
}    

Bulk accesses are marginally more interesting; we'll count them as one access each (not n), since at the implementation level they only require an O(1) reorganisation of pointers or an (amortised) single call to System.arraycopy (which is really O(n) - it'd be interesting to see what effect the two cost-counting methods would have on the structure's behaviour).

⟨Other Methods⟩+≡
public boolean addAll(Collection<? extends E> collection) {
  endAccess();
  return super.addAll(collection);
}
public boolean addAll(int index, Collection<? extends E> elements) {
  accessAt(index);
  return super.addAll(index, elements);
}
public List<E> subList(int fromIndex, int toIndex) {
  accessAt(fromIndex);
  return standardSubList(fromIndex, toIndex);
}

For the actual book-keeping we'll just use int counters, and will ignore the effects of overflow. Industrial-strength solutions may want to scale the numbers down (or subtract the same constant from each of them) if either one gets too large:

⟨Book keeping⟩+≡
private int endAccess = 0;
private int randomAccess = 0;
private void endAccess() {
  ++endAccess;
  evaluateImpl();
}
private void accessAt(int index) {
  if (index == 0 || index == size()) {
    ++endAccess;
  } else {
    ++randomAccess;
  }
  evaluateImpl();
}
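An industrial-strength version might guard against overflow with something like this hypothetical helper (not part of the proof-of-concept), halving both counters so that their ratio - the only thing the evaluation below looks at - is preserved. Note you'd also want to relax evaluateImpl's exact-equality check (below) to >= so that a halving can't skip past the next evaluation point.

private void guardAgainstOverflow() {
  // halve both counters when either gets near Integer.MAX_VALUE;
  // their ratio (all we ever compare) is unchanged
  if (endAccess > Integer.MAX_VALUE / 2 || randomAccess > Integer.MAX_VALUE / 2) {
    endAccess >>= 1;
    randomAccess >>= 1;
  }
}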

Now to the first really interesting problem. The evaluateImpl method might be quite expensive (especially if it decides we need to change the implementation we're using), and once the pattern of access behaviour is obvious we don't really need the evaluation overhead when we're not going to do anything with it. Also, if the access pattern is right on the border of the trade-off between implementations, we don't want to end up switching implementations backwards and forwards on every access. We'll therefore want some degree of hysteresis - I'll go for an exponential back-off (so we only consider our choice of implementation on the 1st, 2nd, 4th, 8th, 16th, &c. access). This means that if the access behaviour changes later in the list's lifetime we'll only react to it very slowly. It's also likely that the pattern of accesses just after the list is created won't be representative - consider a list that's populated once and then only read, where the first n accesses are insertions at the end (so a LinkedList would be chosen) but all accesses after that are random reads (where an ArrayList would be much better). I'll therefore make the exponential back-off start at 32 - that is, the first evaluation of the implementation's suitability will be after 32 accesses:

⟨Book keeping⟩+≡
private int nextEvaluation = 32;
private void evaluateImpl() {
  if (randomAccess + endAccess == nextEvaluation) {
    ⟨Evaluate Suitability⟩
    nextEvaluation <<= 1; // exponential back-off
    if (nextEvaluation <= 0) nextEvaluation = Integer.MAX_VALUE; // don't overflow
  }
}

And now for the key algorithm - which implementation best fits the metrics we've gathered? I'll do a trivial implementation in-line; you could probably abstract the logic behind a strategy pattern to keep it tidier. I've tried to choose numbers that give sensible results: if the ratio of random accesses to end accesses is greater than 1:4, I believe an ArrayList will probably perform better.

⟨Evaluate Suitability⟩+≡
if(randomAccess > endAccess / 4) {
  useArrayList();
} else {
  useLinkedList();
}
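If you wanted to keep the decision logic tidier, it could sit behind a small strategy interface - a hypothetical sketch (these names are mine, not part of the proof-of-concept; assumes java.util.List, ArrayList and LinkedList are imported):

// a pluggable policy for choosing the backing implementation
interface ImplementationChooser {
  /** @return the implementation best suited to the observed access counts */
  <E> List<E> choose(int endAccesses, int randomAccesses, List<E> current);
}

// the in-line 1:4 ratio above, expressed as one such strategy
final class RatioChooser implements ImplementationChooser {
  public <E> List<E> choose(int endAccesses, int randomAccesses, List<E> current) {
    if (randomAccesses > endAccesses / 4) {
      return current instanceof ArrayList ? current : new ArrayList<E>(current);
    }
    return current instanceof LinkedList ? current : new LinkedList<E>(current);
  }
}

The DynamicList would then just hold an ImplementationChooser and call it from evaluateImpl.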

Writing meaningful micro-benchmarks in any statically-compiled language is very hard, and in Java it's virtually impossible. Despite this, I wrote a loop that first inserts 50,000 random numbers into a list at a random index, then inserts 50,000 (different) random numbers at index 0 of a new list - to get a rough idea about whether my implementation-choosing algorithm above "works". The (meaningless) results are below, with total loop times in milliseconds:

Test                        ArrayList   LinkedList   DynamicList
Inserting at random index         613        3,245           648
Inserting at index 0            1,126           11            22
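The loop itself was along these lines - a rough sketch reconstructed from the description above, not the actual harness (assumes java.util.Random is imported):

// time 50,000 inserts at random indices into a fresh DynamicList
Random random = new Random();
List<Integer> list = DynamicDataStructures.newList();
long start = System.currentTimeMillis();
for (int i = 0; i < 50000; ++i) {
  list.add(random.nextInt(list.size() + 1), random.nextInt());
}
long randomIndexMillis = System.currentTimeMillis() - start;

// ...then 50,000 inserts at index 0 into another fresh DynamicList
list = DynamicDataStructures.newList();
start = System.currentTimeMillis();
for (int i = 0; i < 50000; ++i) {
  list.add(0, random.nextInt());
}
long indexZeroMillis = System.currentTimeMillis() - start;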

I (rather unethically) adjusted the ratio in my algorithm until the DynamicList picked the right implementation in each benchmark - yet I still think this shows the idea of a dynamically-chosen data structure implementation is workable.

Back to the code - the guard clauses in the implementation-switching methods avoid unnecessary work:

⟨Other Methods⟩+≡
private void useLinkedList() {
  if (implementation instanceof LinkedList) return;
  implementation = new LinkedList<E>(implementation);
  ⟨Changed Implementation⟩
}

As re-evaluating the implementation happens relatively infrequently, we can do a bit of maintenance at the same time: if we're already using an ArrayList I'll compact its backing array down to remove any unused trailing slots, which should (marginally) reduce the memory footprint of the process.

⟨Other Methods⟩+≡
private void useArrayList() {
  if (implementation instanceof ArrayList) {
    ((ArrayList<E>)implementation).trimToSize();            
    return;
  }
  implementation = new ArrayList<E>(implementation);
  ⟨Changed Implementation⟩
}

The next, more Java-specific, implementation detail is how to deal with Iterators (or, more generally, long-lived mutable objects that hold pointers into the DynamicList's internal structures). I don't want the list itself to keep track of all the objects that have pointers into it, as that makes the code less efficient (every time we change implementation we'd have to update those pointers - even ones that are never going to be dereferenced again) and could lead to memory leaks (the list would keep otherwise-unreachable Iterators alive). Instead I'll make all the references one-way, from the Iterators to the DynamicList. Before dereferencing pointers into the DynamicList's internal state, an Iterator will check whether the DynamicList has changed implementation since the last time it looked: if so, its pointers are invalid and have to be refreshed (in this case, by asking the DynamicList for new ones).

⟨Changed Implementation⟩+≡ ++switches;
⟨Book keeping⟩+≡ private int switches = 0;
⟨Other Methods⟩+≡
public ListIterator<E> listIterator(int index) {
  accessAt(index);
  return new DynamicListIterator(implementation.listIterator(index), switches);
}
private class DynamicListIterator extends ForwardingListIterator<E> {
  private ListIterator<E> delegate;
  private int age; // the number of switches the last time we looked at the DynamicList
  DynamicListIterator(ListIterator<E> delegate, int age) {
    this.delegate = delegate; this.age = age;
  }
  protected ListIterator<E> delegate() {
    if (age == switches) { // no changes
      return delegate;
    } else { // the implementation's changed!
      age = switches;
      return delegate = implementation.listIterator(delegate.nextIndex());
    }
  }
}   

I've used an integer counter to keep track of the "age" of the DynamicList, rather than a real timestamp of the last implementation switch. While this has a very unlikely overflow bug (if the implementation switches so many times that the counter wraps round and comes back to the number it started at - the Iterator won't notice its pointers are now invalid), it avoids system calls and is more deterministic (useful when debugging application code using this list).

We'll hide all the implementation details behind a static factory method, just in case we want to change the constructor later (e.g. to pass in hints about the initial size of the backing array of the ArrayList implementation).

⟨com/blogspot/remisthoughts/dynamicoptimisation/DynamicDataStructures.java:*⟩+≡
package com.blogspot.remisthoughts.dynamicoptimisation;
⟨Imports⟩
public final class DynamicDataStructures {
    ⟨Dynamic List Class⟩
    public static @Nonnull <E> List<E> newList() {
        return new DynamicList<E>();
    }
}
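Using it is then no different from using any other List - an illustrative example:

List<Integer> list = DynamicDataStructures.newList();
for (int i = 0; i < 1000; ++i) {
  list.add(0, i); // prepends count as end accesses, so the list settles on a LinkedList
}
int middle = list.get(list.size() / 2); // enough random accesses like this
                                        // would switch it back to an ArrayList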

And we'll finish off with all the Java boilerplate needed to make this work:

⟨Imports⟩+≡
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;
import javax.annotation.Nonnull;
import com.google.common.collect.ForwardingList;
import com.google.common.collect.ForwardingListIterator;
⟨Other Methods⟩+≡
public ListIterator<E> listIterator() {
  return listIterator(0);
}
public Iterator<E> iterator() {
  return listIterator(0);
}

Unit testing this code will be quite difficult as there are a lot of edge cases - ideally I'd use Google's Collections API compatibility test framework but there doesn't seem to be an easily-accessible (i.e. in the central Maven repository) compiled version of the code.
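If a packaged version were available, wiring it up would look roughly like this - a sketch against guava-testlib's API as I understand it, so treat the class and feature names as assumptions:

import java.util.List;
import junit.framework.TestSuite;
import com.google.common.collect.testing.ListTestSuiteBuilder;
import com.google.common.collect.testing.TestStringListGenerator;
import com.google.common.collect.testing.features.CollectionSize;
import com.google.common.collect.testing.features.ListFeature;

public class DynamicListTests {
  public static TestSuite suite() {
    return ListTestSuiteBuilder
        .using(new TestStringListGenerator() {
          protected List<String> create(String[] elements) {
            List<String> list = DynamicDataStructures.newList();
            for (String element : elements) { list.add(element); }
            return list;
          }
        })
        .named("DynamicList")
        .withFeatures(ListFeature.GENERAL_PURPOSE, CollectionSize.ANY)
        .createTestSuite();
  }
}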

In the meantime, you can find this code in my Github repo...

Wednesday, 28 March 2012

Perfect Hashes in Java

Given a set of m keys, a minimal perfect hash function maps each key to an integer between 0 and m-1, and (most importantly) each key maps to a different integer. This means you can use the "perfect hash" number as an index into an array (i.e. use it as a hashmap) for guaranteed O(1) insertions & lookups. I'm going to explain the BMZ algorithm, roughly following the authors' C implementation, as it creates perfect hashes in O(m) space and time. I'll end up with an implementation of Google Guava's Equivalence, as then you can use wrappers and standard Java HashMaps to create an efficient Collection with a minimum of wheel-reinventing.
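To show where this is heading, here's a hypothetical example of using the end result (PerfectHashes.create is defined towards the end of the post): wrap each key with the generated Equivalence so that a plain HashMap sees the perfect hash codes.

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import com.google.common.base.Equivalence;

List<String> keys = Arrays.asList("foo", "bar", "baz");
Equivalence<String> perfect = PerfectHashes.create(keys);
Map<Equivalence.Wrapper<String>, Integer> lengths =
    new HashMap<Equivalence.Wrapper<String>, Integer>();
for (String key : keys) {
  lengths.put(perfect.wrap(key), key.length()); // the wrapper's hashCode is the perfect hash
}
System.out.println(lengths.get(perfect.wrap("bar"))); // prints 3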

But first I'll start with a simple example. Working in Java is useful as we can re-use our key objects' hashCode methods to do most of the work. However, it's unlikely that the numbers hashCode returns are already "perfect" - so we'll have to modify them deterministically. I'll use an idea I got from the Jenkins hash algorithm - basically choose a seed integer and mix it with the hashCodes of the keys. As we want the resulting hash code to lie between 0 and m-1 we'll just take the result mod m after mixing in the seed - so now we just have to worry about choosing a seed that makes each object map to a different number.

⟨First Draft⟩+≡ 
private static final class MixSeed<E> extends Equivalence<E> {
    private final int seed;
    private final int m;
    MixSeed(int seed, int m) { this.seed = seed; this.m = m; }
    protected boolean doEquivalent(E a, E b) {
        return a.equals(b);
    }
    protected int doHash(E t) {
        return (t.hashCode() ^ seed) % m;
    }
}

The first-draft approach is simply to guess a seed: if the resulting hash codes are perfect then return an Equivalence that uses that seed, and if not try again. We don't want to keep looping forever, so we'll fix the number of tries and fail if no perfect hash is found.

⟨First Draft⟩+≡ 
public static <E> Equivalence<E> create(Collection<? extends E> keys) {
    Random r = new Random(17); // fixed seed, so deterministic
    for (int tries = 0; tries < 1000; ++tries) {
        int seed = r.nextInt();
        SortedSet<Integer> perfectHashes = Sets.newTreeSet();
        for (E key : keys) {
            perfectHashes.add((key.hashCode() ^ seed) % keys.size());
        }
        if (perfectHashes.size() == keys.size() && perfectHashes.first() == 0 && perfectHashes.last() == keys.size() - 1) {
            return new MixSeed<E>(seed, keys.size());
        }
   }
   ⟨Give Up⟩
}

This is clearly not very likely to succeed. To work out the exact probability of an iteration finding a perfect hash, we'll assume the hashCode mixed with the seed is uniformly distributed between 0 and m-1. The first key can be mapped to any of the m integers in this range, the second to any of the m-1 remaining integers, the third to the m-2 remaining integers, &c., and the probability of this happening is m/m * (m-1)/m * (m-2)/m * ... * 1/m, which is m!/m^m - so not very likely! For m = 10 keys that's already only about 0.036%, so even 1,000 tries will usually fail.

The BMZ algorithm takes a pretty interesting approach. To build the perfect hash in O(m) time we can only store an O(m) amount of state. We want to pack as much information as possible into that state (which uses a lot of memory - not ideal), so we could either store really big state objects, or make several queries into smaller state objects (which is what BMZ does). The problem then becomes: (1) how do you work out what queries to make, and, more importantly, (2) how do you build up the state so that each key results in a different hash number?

BMZ queries the state twice to get the data it needs to return the hash number, and solves the first step by a logical extension of the first draft above: instead of having one seed, have two! The Equivalence below takes the shared state g (an array whose length is a little more than m - more on that below), queries it twice with the two different seeds, and combines the results by simply summing the two values it finds. I've made the Equivalence Serializable so that once you've done the hard work of generating it you can persist it somewhere and load it in other applications.

⟨Returned Equivalence⟩+≡ 
private static final class BMZ<E> extends Equivalence<E> implements Serializable {
    private final int seed1;
    private final int seed2;
    private final int[] g;
    BMZ(int seed1, int seed2, int[] g) { 
        this.seed1 = seed1; this.seed2 = seed2; this.g = g;
    }
    protected boolean doEquivalent(E a, E b) {
        return a.equals(b);
    }
    protected int doHash(E t) {
        int[] hashes = getTwoHashes(t, seed1, seed2, g.length); 
        return g[hashes[0]] + g[hashes[1]];
    }
}
/** we'll use this elsewhere, so let's extract this logic into its own method */
private static int[] getTwoHashes(Object t, int seed1, int seed2, int n) {
    int hc = t.hashCode(); // don't call twice - premature optimization?
    int h1 = (hc ^ seed1) % n;
    int h2 = (hc ^ seed2) % n;
    if(h1 < 0) { h1 += n; } // Java modulus gives numbers -n < h1 < n...
    if(h2 < 0) { h2 += n; } // ...but we want positive numbers to use as indices
    if(h1 == h2) { h2 = (h2 + 1) % n; } // h1 == h2 violates some assumptions (see later) - this is a quick fix!
    return new int[]{h1, h2};
}

That was the easy part - so how do we know what to put in g? The BMZ algorithm centres around treating this state as a graph. Each key is mapped to an edge (which is why it makes two queries - one for the vertex at each end) and each vertex has an integer attached to it. The vertices are numbered from 0 to n-1 (I'll use the same letters as the paper to make it easier to read this side-by-side), and the integer attached to each vertex v is stored in the g array at index v. This means that the lookup operation in the Equivalence above adds the two numbers attached to the vertices at either end of the edge that corresponds to the key - so, for example, if a key hashes to the edge (2,5), and g[2] = 1 and g[5] = 3, its perfect hash is 4.

So how should we choose how big n is? The answer again parallels the "First Draft" solution: we relax the problem slightly, and say that we only require a solution (i.e. a perfect-hash Equivalence) with a reasonable probability. As above, we make several guesses and fail if none of them reaches an answer - and the relaxed problem means we can choose an n that is reasonably likely to give us a solution (much easier than working out an exact answer); the paper suggests this should be 1.15m.

⟨BMZ Method⟩+≡ 
public static <E> Equivalence<E> createBMZ(
    Collection<? extends E> keys,
    int maxTries /*100 in c implementation*/, 
    double c /*1.15 suggested*/) 
{
    Random r = new Random(17); // fixed seed, so deterministic
    for (int tries = 0; tries < maxTries; ++tries) {
        int seed1 = r.nextInt();
        int seed2 = r.nextInt();
        int[] g = new int[(int)Math.ceil(c * keys.size())];
        ⟨Put Numbers in g⟩
   }
   ⟨Give Up⟩
}

Now we have to choose what number to give each vertex so that the edges match to the perfect hash codes of the keys. It'll help if we break this problem down. In the following situations, a, b, c and d are vertices and the edges are numbered in square brackets (how we choose which number gets assigned to which edge comes later).

  1. a
    a single vertex can be trivially assigned zero
  2. a--[4]--b
    if either a or b are known, then the other is chosen so that a+b=4. If neither a nor b have been assigned then we can choose any pair of integers that sum to 4
  3. a--[4]--b--[7]--c--[4]--d
    an interesting case - if we know the integer of one of the end vertices (a or d) in this chain we can work out the other vertices by walking along the chain, at each step picking a number to make the edge just walked calculate correctly
  4. a--b--c--a
    this is going to be the really hard case to solve - cycles, or vertices of degree greater than two.

We'll therefore divide the vertices of the graph into two parts - one set that has to be solved the hard way (case 4 - called "critical nodes" in the paper), and the rest, which can be solved by walking down chains or via the other two simple cases. You can also see that loops in the graph (edges with both ends at the same vertex) will cause real problems: a loop at vertex v always sums to 2*g[v], which is even, so if that edge needs to be an odd number we can't solve the graph. This is why the BMZ Equivalence class adds one to one of the hashes in a lookup if both hashes are the same - it turns loops into normal edges.

We'll first need to convert the Objects passed to the graph into a set of edges (in O(m) time and space - or we'll lose any big-O speedup this algorithm gives). We'll make our domain objects immutable, and not worry about all the garbage they make.

⟨Graph Utilities⟩+≡ 
private static final class Edge {
     final int a; final int b;
     Edge(int[] ab) { this.a = ab[0]; this.b = ab[1]; } 
     public String toString() { return String.format("(%d,%d)", a, b); }
}
private static final class Graph {
    final List<Edge> edges;
    /** indexed by vertex, holds list of vertices that vertex is connected to */
    final List<Integer>[] adjacencyList;
    Graph(int n, int m) { 
        this.edges = new ArrayList<Edge>(m);
        this.adjacencyList = new List[n];  
    }
    /** @returns true if this edge is a duplicate */
    boolean addEdge(Edge e) {
        edges.add(e);
        if(getAdjacencyList(e.a).contains(e.b)) return true; // linear, but list should be v. small
        getAdjacencyList(e.a).add(e.b); 
        getAdjacencyList(e.b).add(e.a); 
        return false;
    }
    private List<Integer> getAdjacencyList(int forVertex) {
        List<Integer> ret = adjacencyList[forVertex];
        return ret == null ? (adjacencyList[forVertex] = new LinkedList<Integer>()) : ret;
    }
}
private static Graph toGraph(Collection<?> objects, int seed1, int seed2, int n) {
    Graph ret = new Graph(n, objects.size());
    for(Object object : objects) { if(ret.addEdge(new Edge(getTwoHashes(object, seed1, seed2, n)))) return null; }
    return ret;
}
⟨Put Numbers in g⟩+≡
Graph graph = toGraph(keys, seed1, seed2, g.length);
if(graph == null) { continue; } // some duplicates - try again with new seeds

So how do we work out whether a node is "critical" or not? We know that degree-0 and degree-1 nodes definitely aren't critical, so we'll start by eliminating them. This leaves us with the remaining tangled mess (or messes - the graph could be disconnected). We can then "strip off" any chains of edges (case 3 above) as we can solve them the easy way. We find the ends of all the chains (if there are any) by looking through all the degree-one vertices, then follow each chain towards the mess as far as it'll go, removing any vertices we cross from the critical set:

⟨Find Critical Nodes Helper⟩+≡
private static BitSet findCriticalNodes(Graph graph, int n) {
    // calculate node degrees...
    int[] degrees = new int[n];
    for(Edge edge : graph.edges){ ++degrees[edge.a]; ++degrees[edge.b]; };
    // ...and trim the chains...
    List<Integer> degree1 = new LinkedList<Integer>();
    for(int i=0; i<n; ++i) { if(degrees[i] == 1) degree1.add(i); }
    while(!degree1.isEmpty()){
        int v = degree1.remove(0); --degrees[v];
        if(graph.adjacencyList[v] != null) 
            for(int adjacent : graph.adjacencyList[v] ) 
                if(--degrees[adjacent] == 1 ) degree1.add(adjacent);
    }
    // ...and return a bitmap of critical vertices
    BitSet ret = new BitSet(n); // all non-critical by default - very useful!
    for(int i=0; i<n; ++i) { if(degrees[i] > 1) ret.set(i); }
    return ret;
}
⟨Put Numbers in g⟩+≡
BitSet criticalNodes = findCriticalNodes(graph, g.length);

Now that we've classified the vertices into "critical" and (therefore) "non-critical" ones, we can start assigning integers to them. Assigning numbers to the critical vertices is essentially a graph colouring problem - we want to choose the integers so that the two vertices of each edge sum to that edge's value (and note we haven't assigned the integers 0 to m-1 to the edges yet!). We'll therefore decide what integer each edge should have as we go along - this gives us a bit more flexibility when we assign integers to vertices. As we've still not assigned numbers to the non-critical vertices, we don't have to assign edge integers sequentially in this step: we can skip any edge integers that would require impossible combinations of vertex integers, and assign these leftover edge integers to the non-critical vertices later.

We'll therefore have a bitmap ae that stores all the edge integers we've assigned so far. We can only assign each integer to an edge once or we won't end up with a perfect hash (remember, each edge is a key and a perfect hash assigns a different integer to each key).

⟨Put Numbers in g⟩+≡
BitSet ae = new BitSet(g.length);

We'll call the value we'll try to give to the next critical vertex x, and will start our assignment at the lowest-numbered critical vertex (this is an arbitrary choice - we need to start our depth-first search somewhere). For each vertex we process, we must make sure the integer we give it (i.e. x) doesn't cause two edges to end up with the same integer (as each edge is a key, and two keys that hash to the same number means our hash isn't perfect). We'll therefore just keep incrementing x (in getXThatSatifies) until it doesn't break this invariant. However, we mustn't forget the other invariant - the hash of each key (i.e. the integer assigned to each edge) must be between 0 and m-1. We'll have to add a bit of validation every time we pick a new x: we'll check every adjacent vertex to make sure this new x doesn't push an edge's value out of range or give it the same value as one of the other edges.

⟨Label Critical Nodes Helper⟩+≡ 
/** @returns false if we couldn't assign the integers */
private static boolean assignIntegersToCriticalVertices(Graph graph, int[] g, BitSet ae, BitSet criticalNodes) {
    int x = 0;
    List<Integer> toProcess = new LinkedList<Integer>(); 
    BitSet assigned = new BitSet(g.length);
    while(!assigned.equals(criticalNodes)) {
        BitSet unprocessed = ((BitSet)criticalNodes.clone()); unprocessed.andNot(assigned);
        toProcess.add(unprocessed.nextSetBit(0)); // start at the lowest unassigned critical vertex
        // assign another "tree" of vertices - not all critical ones are necessarily connected!
        x = processCriticalNodes(toProcess, graph, ae, g, x, assigned, criticalNodes);
        if(x < 0) return false; // x is overloaded as a failure signal
    }
    return true;
}
/** process a single "tree" of connected critical nodes, rooted at the vertex in toProcess */
private static int processCriticalNodes(List<Integer> toProcess, Graph graph, BitSet ae, int[] g, int x, BitSet assigned, BitSet criticalNodes) {
    while(!toProcess.isEmpty()) {
        int v = toProcess.remove(0);
        if(v < 0 || assigned.get(v)) continue; // there are no critical nodes || already done this vertex
        if(graph.adjacencyList[v] != null) {
            x = getXThatSatifies(graph.adjacencyList[v], x, ae, assigned, g);
            for(Integer adjacent : graph.adjacencyList[v]) {
                if(!assigned.get(adjacent) && criticalNodes.get(adjacent) && v!= adjacent) { 
                    // give this one an integer, & note we shouldn't have loops - except if there is one key 
                    toProcess.add(adjacent); 
                } 
                if(assigned.get(adjacent)) {  
                    int edgeXtoAdjacent = x + g[adjacent]; // if x is ok, then this edge is now taken
                    if(edgeXtoAdjacent >= graph.edges.size()) return -1; // this edge is too big! we're only assigning between 0 & m-1
                    ae.set(edgeXtoAdjacent); 
                } 
            }
        }
        g[v] = x; assigned.set(v); // assign candidate x to g
        ++x; // next v needs a new candidate x
    } 
    return x; // will use this as a candidate for other "trees" of critical vertices
}
private static int getXThatSatifies(List<Integer> adjacencyList, int x, BitSet ae, BitSet assigned, int[] g) {
    for(Integer adjacent : adjacencyList) {
        if(assigned.get(adjacent) /*only covers critical nodes*/ && ae.get(g[adjacent] + x)) { 
            // if we assign x to v, then the edge between v & and 'adjacent' will
            // be a duplicate - so our hash code won't be perfect! Try again with a new x:
            return getXThatSatifies(adjacencyList, x + 1, ae, assigned, g);
        } 
    }
    return x; // this one satisfies all edges
}
⟨Put Numbers in g⟩+≡
if(!assignIntegersToCriticalVertices(graph, g, ae, criticalNodes)) continue; // try again from the start with different seeds

We've done the hard part - it's all downhill from here. All the integers we haven't yet assigned to edges are zeros in the ae BitSet, and we know that the edges between vertices in the non-critical group form simple chains (i.e. case 3 above). We'll therefore do a breadth-first search of the vertices starting at the critical ones, and every time we go from a critical to a non-critical vertex, or from one non-critical vertex to another, we'll assign an integer to that non-critical vertex so that the edge between them gets the next unassigned integer in the ae set:

⟨Label Non Critical Nodes Helper⟩+≡ 
private static void assignIntegersToNonCriticalVertices(Graph graph, int[] g, BitSet ae, BitSet criticalNodes) {
    List<Integer> toProcess = new LinkedList<Integer>();
    for(int v = criticalNodes.nextSetBit(0); v != -1; v = criticalNodes.nextSetBit(v+1)) { toProcess.add(v); } // load with the critical vertices
    BitSet visited = (BitSet) criticalNodes.clone();
    processNonCriticalNodes(toProcess, graph, ae, visited, g); // process the critical nodes
    // we've done everything reachable from the critical nodes - but
    // what about isolated chains?
    for(int v = visited.nextClearBit(0); v != -1 && v < g.length; v = visited.nextClearBit(v+1)) { 
        toProcess.add(v);
        processNonCriticalNodes(toProcess, graph, ae, visited, g);
    }    
}
/** process everything in the list and all vertices reachable from it */
private static void processNonCriticalNodes(List<Integer> toProcess, Graph graph, BitSet ae, BitSet visited, int[] g) {
    int nextEdge = ae.nextClearBit(0);
    while(!toProcess.isEmpty()) {
        int v = toProcess.remove(0);
        if(v < 0) continue; // there are no critical nodes
        if(graph.adjacencyList[v] != null) {
            for(int adjacent : graph.adjacencyList[v]) {
                if(!visited.get(adjacent) && v != adjacent) { // shouldn't have loops - only if one key 
                    // we must give it a value
                    g[adjacent] = nextEdge - g[v]; // i.e. g[v] + g[a] = edge as needed
                    toProcess.add(adjacent);
                    ae.set(nextEdge);
                    nextEdge = ae.nextClearBit(nextEdge + 1);                    
                }
            }
        }
        visited.set(v);
    } 
}
⟨Put Numbers in g⟩+≡
assignIntegersToNonCriticalVertices(graph, g, ae, criticalNodes); // this can't fail

And that's it! Every vertex has a value so our graph is complete. We'll just return it, wrapped in the Equivalence we made above:

⟨Put Numbers in g⟩+≡ 
return new BMZ<E>(seed1, seed2, g);

To make this code into a useful library we'll add a public static method that chooses the hash algorithm and fills in some sensible default parameters:

⟨Generic Method⟩+≡
/** makes a perfect hash function for the given set of keys */
public static <E> Equivalence<E> create(Collection<? extends E> keys) {
    return createBMZ(keys, 100, 1.15);
}

And here's the overall framework of the class:

⟨com/googlecode/perfecthashes/PerfectHashes.java:*⟩+≡
package com.googlecode.perfecthashes;
⟨Imports⟩
public final class PerfectHashes {
    /*
    ⟨First Draft⟩
    */
    ⟨Returned Equivalence⟩
    ⟨BMZ Method⟩
    ⟨Generic Method⟩
    // Helpers
    ⟨Graph Utilities⟩
    ⟨Find Critical Nodes Helper⟩
    ⟨Label Critical Nodes Helper⟩
    ⟨Label Non Critical Nodes Helper⟩
}

And some final Java boilerplate:

⟨Imports⟩+≡ 
import java.io.Serializable;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;
import com.google.common.base.Equivalence;
⟨Give Up⟩+≡
throw new IllegalStateException("giving up - perfect hashcode too hard to find!");

And we're finished! The code's here and you can use it in a maven project by adding the dependency:

<dependency>
  <groupId>com.googlecode.perfect-hashes</groupId>
  <artifactId>perfect-hashes</artifactId>
  <version>1.0.1</version>
</dependency>

Friday, 9 March 2012

A Literate Programming Tool

This is the last blog post I'll be hand-writing html for. I'd like a tool that can take a markdown document (as they're easy to write) and either "weave" it into a legible html document or "tangle" it into source code. There are loads of tools that do this already - but I want one that requires a minimum of markdown syntax, can understand (read: syntax highlight) many different languages, and, most importantly, can produce more than one source file from a single markdown document (for polyglot programming, or code & config, or antlr grammars and other DSLs). I don't want to re-invent any more of the wheel than I have to, so I'll use redcarpet to do the markdown parsing, trollop for a command-line interface and coderay to generate syntax-highlighted html.

I'll start with a basic overview of the functionality:


opts = Trollop::options do
  opt :weave, "Produce documentation", :short => 'w'
  opt :tangle, "Produce code", :short => 't'
  opt :outputdir, "Directory to write files to", :default => Dir.pwd, :short => 'o'
  opt :lang, "Default language of code", :default => "ruby", :short => 'l'
  opt :files, "Files to process", :type => :string, :short => 'f', :required => true
end

It's pretty straightforward; I want to be able to tangle and weave in a single command as well as one or the other, and want to be able to override the output directory to aid integration into build chains. Coderay can't guess what (programming) language a piece of text is in, so I'll have to either specify it in the markdown (as part of a fenced code block) or fall back to a default. I'll also enable some other markdown extensions that seem useful:


html_opts = {
  :fenced_code_blocks => true, 
  :superscript => true, 
  :tables => true, 
  :lax_html_blocks => true, 
  :strikethrough => true
}

The markdown must specify how to glue the various bits of code together. I like to think of it in terms of "anchors", or as a tree of bits of code. Each section of code can define one or more "anchors", or named markers, and should say which anchor it should be attached to. Each output source file has an artificial root anchor (called *). I'll delimit anchors with at signs (as that seems to be a common feature of literate tags in other implementations), so something like @My Anchor Name@. I'll also indicate that a section of code should be added to an anchor using anchor-name-plus-equals, e.g. @My Anchor Name@ += .... A markdown code block can have zero or more of these plus-equals, and any code before the first plus-equals is added to the root (i.e. *) anchor of a file with the same name as the markdown document, but with a file extension corresponding to the default language (let's call this the "default source file"). You can add a block to the root of a specific source file by prepending the source file path (including folders) to the *, so something like @my_folder/my_output.c:*@ += .... We'll find these tags in the code using regexes:


$allowed = '(\w| )*|((.*:)?\*)'

And we'll have a poor-but-simple way of guessing the output extension for the default source file: a hardcoded hash keyed by the default language (config files, command line overrides or generally anything not hard-coded would be useful here - I haven't found a library that does this):


$ext_for_lang = {
  :ruby => 'rb',
  :c => 'c',
}

And as we should create any directory structure needed to write the source files to the locations given in their root tags, I'll add a helper (this should really be an optional argument to File.open - maybe I should monkey-patch it?):


def write_to path, data
 FileUtils.mkdir_p(File.dirname(path))
 File.open(path, 'w') {|f| f.write(data)}
end

I'll do "weaving" (conerting to HTML) first. I'll use redcarpet's XHTML renderer for the most part - we just have to add some highlighting to the code blocks. I'll also use Coderay's :line_number_start feature to print the line numbers in the markdown document. There's not any obvious helper method that redcarpet can give me (it's written in C, and only exposes some lightweight ruby object wrappers, so there's not much in the way of utility functions - or even any way to interact with the parsing process that I can see), so I'll just count the new lines between the start of the markdown document and this fragment of code (and yes, this is very inefficient and potentially incorrect if the code block is copy+pasted - maybe I should add a new-line counter to every callback). There's one really big problem: "@" symbols and unicode. Any "@" passed to Coderay get highlighted (i.e. surrounded by HTML tags), which will interfere with the anchor display. I want to display the tags as left- and right-angle brackets (a bit like the latex output of Knuth's web), so I'll have to substitute dummy strings in place of the "@"s before passing the code to Coderay for formatting and then replace the dummy strings with the HTML entity codes for the symbols I want. Ideally I'd just pass the unicode equivalent of the left- and right-angle brackets, but that gets tagged by Coderay too.


class Weave < Redcarpet::Render::XHTML
  attr_accessor :default_lang, :original_text
  def block_code(code, lang)
    l_ang, r_ang, equiv = "__lang__", "__rang__", "__equiv__"
    line_num = @original_text[0,@original_text.index(code)].count("\n")+1
    code = code.
      gsub(/@(#{$allowed})@\s*\+=/,"#{l_ang}\\1#{r_ang}+#{equiv}").
      gsub(/@(#{$allowed})@/,"#{l_ang}\\1#{r_ang}")
    code = CodeRay.
      scan(code, lang.nil? ? @default_lang : lang.to_sym).
      html(
        :wrap => :div, 
        :css => :style, 
        :line_numbers => :table,
        :line_number_start => line_num)
    code.
      gsub(/#{l_ang}(#{$allowed})#{r_ang}(\s*)\+#{equiv}/,'⟨\\1⟩+≡').
      gsub(/#{l_ang}(#{$allowed})#{r_ang}/,'⟨\\1⟩')
  end  
  def codespan(code); block_code(code,nil) end
end

The code that calls this is pretty standard boilerplate: create a parser, iterate over the input markdown files, render them to HTML & write the output to an HTML file with the same name as the markdown file (with an HTML file extension tacked on):


if opts.weave
  r = Redcarpet::Markdown.new(Weave, html_opts)
  r.renderer.default_lang = opts.lang.to_sym
  opts.files.split(',').each{|file|
    code = File.open(file, 'r'){|f|f.read}
    r.renderer.original_text = code
    html = r.render(code)
    write_to("#{opts.outputdir}/#{file}.html", html)
  }
end

The tangle code's a bit more interesting. We have to build up a hash of which strings of code are attached to each anchor, and then run a pass after processing every file to construct the output source files (fully & recursively resolving all the anchors in the code blocks). The normalise method converts the default root node * into a root node with the default source file path - so it needs to know the markdown file's name (@file_no_ext) and, if no language is given, the default language (@default_lang).

The chunk-processing code splits the code block into sections to be added to different anchors (including a synthetic anchor-add for everything before the first anchor). It picks up the start index and the length of the anchor tokens so we can remove them from the code blocks - they won't be valid c/ruby/&c. code. There's also a synthetic anchor-add at the end to make the iteration easier.


class Tangle < Redcarpet::Render::Base
  attr_accessor :default_lang, :links, :file_no_ext
  def block_code(code, lang)
    chunks = 
      [{:start => 0, :anchor => '*', :anchor_len => 0}] + 
      code.scan(/(@(#{$allowed})@\s*\+=)/).map{|x|
        {:start => code.index(x[0]), :anchor => x[1], :anchor_len => x[0].length}
      } +
      [{:start => code.length}]
    (1..chunks.length-1).each{|index|
      last, this = chunks[index-1], chunks[index]
      @links[normalise(last[:anchor],lang)] << code[last[:start] + last[:anchor_len], this[:start] - last[:start] - last[:anchor_len]]
    }
    nil
  end
  def normalise(link_name, lang)
    if link_name == '*'
      "#{@file_no_ext}.#{$ext_for_lang[lang.nil? ? @default_lang : lang.to_sym]}:*"
    else
      link_name
    end
  end
  def codespan(code); block_code(code,nil) end
end

The last bit of code does the tangling. As in the weaving code it uses redcarpet to process each file, but this time we don't care about the rendered output. Instead we're left with the links Hash, which has all the sections of code to be added to each anchor point. The (normalised) root anchor(s) will also be keys of this hash. There's one final step - we'd like to just extract the roots and write their arrays of code snippets to a file, but as these snippets may still have anchors in them we should remove the anchor tokens and replace them with whatever snippets are attached to those anchors, recursively, until no anchors are left. My implementation isn't very safe - you could easily make it recurse infinitely, or forget to attach snippets to an anchor because of a typo. It also doesn't warn you if anchors don't have any code attached to them - another potential sign of misspelt anchor names.


if opts.tangle
  links = Hash.new{|h,k|h[k]=[]}
  r = Redcarpet::Markdown.new(Tangle, html_opts)
  r.renderer.default_lang, r.renderer.links = opts.lang.to_sym, links
  opts.files.split(',').each{|file|
    r.renderer.file_no_ext = file[0,file.rindex('.')]
    r.render(File.open(file, 'r'){|f|f.read})
  }
  resolve = lambda{|parts| parts.join('').gsub(/@(#{$allowed})@/) {|match|resolve.call(links[match[1..-2]])} }
  links.keys.reject{|k|!k.end_with? ':*'}.each{|root|
    write_to(
      "#{opts.outputdir}/#{root[0..-3]}",
      resolve.call(links[root]))
  }
end

So that's it! An example: the following markdown...


### my file ###
add some *text*:

~~~~~
@*@ +=
int main() { 
  @Do Stuff@
  @Do More Stuff@
  return 0; 
}
~~~~~

and some more text. we should probably say what dostuff is; it's:

~~~~~
@Do Stuff@ +=
int* me = &1;
*me++;
~~~~~

was that clearer? More stuff is the same as the first:

~~~~
@Do More Stuff@ += @Do Stuff@
~~~~

...can be woven into the following html using literate_md --lang=c --weave -f example.md...

my file

add some text:

⟨*⟩+≡
int main() { 
  ⟨Do Stuff⟩
  ⟨Do More Stuff⟩
  return 0; 
}

and some more text. we should probably say what dostuff is; it's:

⟨Do Stuff⟩+≡
int* me = &1;
*me++;

was that clearer? More stuff is the same as the first:

⟨Do More Stuff⟩+≡ ⟨Do Stuff⟩

...or tangled (using --tangle instead of --weave) into...


int main() { 
  
int* me = &1;
*me++;

   
int* me = &1;
*me++;


  return 0; 
}

I've put this script as a gem here - let me know if you have any improvements!

Monday, 30 January 2012

Triple Buffering as a Concurrency Mechanism (II)

As a commenter on Reddit has pointed out, this implementation is flawed. The following steps illustrate how one of the buffers can be orphaned (0, 1 and 2 refer to the indices within the buffer array):

Macro                      Line           tmp1  tmp2  Clean  Dirty  Snap
(initial)                                             0      1      2
TRIPLE_BUFFER_FLIP_WRITER  tmp1 = clean   0           0      1      2
TRIPLE_BUFFER_NEW_SNAP     tmp2 = clean   0     0     0      1      2
TRIPLE_BUFFER_NEW_SNAP     clean = snap   0     0     2      1      2
TRIPLE_BUFFER_NEW_SNAP     snap = tmp2    0     0     2      1      0
TRIPLE_BUFFER_FLIP_WRITER  clean = dirty  0     0     1      1      0
TRIPLE_BUFFER_FLIP_WRITER  dirty = tmp1   0     0     1      0      0

...and we can no longer access buffer two. The problem comes from trying to order two sets of three operations so that pretty much any interleaving leaves us in a valid state. My revised solution (which will also fix the problem of double-snapping from the first post) uses a form of Optimistic Concurrency Control - basically, in the two main macros we take a copy of the pointers, do the relevant swap, and then atomically overwrite the three pointers only if nothing else has changed them since we took our copy. As three pointers are too big for a single atomic operation, we'll compact them down into three indices into the buffer array, encoded as three pairs of bits in a single integer flag:

/* bit flags are (unused) (new write) (2x dirty) (2x clean) (2x snap) */
#define TRIPLE_BUFFER_TYPE(TYPENAME, TYPE) \
  struct TYPENAME { \
    TYPE buffer[3]; \
    volatile uint_fast8_t flags; \
  }

Note that the flags are now volatile too - we don't want the compiler to optimise away checks to see whether they've changed. I've also moved them to the end of the struct - I'm beginning to think about alignment, and if the struct is aligned in a useful way I don't want to misalign the three buffers by the eight bits of the flags field. Initialising is a bit simpler:

/* initially dirty = 0, clean = 1 and snap = 2 */
#define TRIPLE_BUFFER_NEW(NAME,TYPENAME) \
  struct TYPENAME NAME; \
  NAME.flags = 0x6;

We have to update our accessors to pull the index of the snap or write (i.e. dirty) buffer out of the flags, and return a pointer to that element of the array:

#define TRIPLE_BUFFER_SNAP_PTR(BUFFER) &BUFFER.buffer[BUFFER.flags & 0x3]
#define TRIPLE_BUFFER_WRITE_PTR(BUFFER) &BUFFER.buffer[(BUFFER.flags & 0x30) >> 4]

Google helps with binary-to-hex conversions. The meat of the problem is still the flip and snap operations. I've added an extra bit flag that is set by the writer and cleared by the reader. The reader can use this to determine whether there have been any writes since its last snap, and if not it won't take another snap. This means that if the macro is called twice it won't snap and then "un-snap" as the previous implementation did:

#define TRIPLE_BUFFER_NEW_SNAP(BUFFER) \
  do { \
    uint_fast8_t flagsNow; \
    uint_fast8_t newFlags; \
    do { \
      flagsNow = BUFFER.flags; \
      if((flagsNow & 0x40)==0) break; \
      newFlags = (flagsNow & 0x30) | ((flagsNow & 0x3) << 2) | ((flagsNow & 0xC) >> 2); \
    } while(!__sync_bool_compare_and_swap(&BUFFER.flags, flagsNow, newFlags)); \
  } while(0)

The I-have-written flag is the seventh bit (0x40), and the newFlags bit-shifting logic just builds the new value by or'ing together the extracted indices after shifting them to their new positions. The flip macro is similar:

#define TRIPLE_BUFFER_FLIP_WRITER(BUFFER) \
  do { \
    uint_fast8_t flagsNow; \
    uint_fast8_t newFlags; \
    do { \
      flagsNow = BUFFER.flags; \
      newFlags = 0x40 | ((flagsNow & 0xC) << 2) | ((flagsNow & 0x30) >> 2) | (flagsNow & 0x3); \
    } while(!__sync_bool_compare_and_swap(&BUFFER.flags, flagsNow, newFlags)); \
  } while(0)

And you can add the following to the bottom of the unit test to prove the snap-unsnap problem's been fixed:

*TRIPLE_BUFFER_WRITE_PTR(it) = 7;
TRIPLE_BUFFER_FLIP_WRITER(it);
*TRIPLE_BUFFER_WRITE_PTR(it) = 8;
TRIPLE_BUFFER_FLIP_WRITER(it);

TRIPLE_BUFFER_NEW_SNAP(it);
assert(*TRIPLE_BUFFER_SNAP_PTR(it) == 8);
TRIPLE_BUFFER_NEW_SNAP(it);
assert(*TRIPLE_BUFFER_SNAP_PTR(it) == 8);