Wednesday 6 June 2012

Dynamic Optimisation: the Next Logical Step?

Dynamic Optimisation covers a set of techniques used to optimise a piece of code at runtime based on the dynamic state and path through the code so far. The canonical example is optimising a loop: while statically-optimised version unrolls several iterations of the loop, the dynamically-optimised version has realised that the loop's predicate can never be true and so removes the loop entirely. Dynamic optimisation is most commonly applied when compiling a higher-level language into machine code, but what if we applied the technique at a higher level of abstraction?

One of my favourite interview questions is 'what implementation of data structure x would fit this use case?'. Maybe you're concentrating on the higher-level guarantees a data structure provides (e.g. uniqueness or ordering), or maybe you're prioritising time-to-market over application performance. If you could specify what implementation of a data structure to use for a given pattern of data access, or a given availability of system resources you could use a data structure that optimises its implementation over its lifetime, trading off predictability and book-keeping overhead to raise the lower bound on performance.

I've written a proof-of-concept to highlight some of the interesting problems this data structure faces. It's a implementation of the java.util.List interface that decides whether to store its data in a LinkedList or an ArrayList. As you probably know, they perform equally for appends (although an ArrayList is amortised O(1) while a LinkedList is O(1) every time), but an ArrayList gives O(1) random access but O(n) prepending while a LinkedList gives O(n) random access and O(1) prepending.

We'll start with the wrapper class, that delegates to the current backing implementation. I'll use Google's Guava to reduce the boilerplate. You could write a slower but more generic implementation with the java.lang.reflect's proxying mechanism, and could probably do this in a single line of a dynamic language.

12
13
14
15
16
17
18
19
20
⟨Dynamic List Class⟩+≡
private static final class DynamicList<E> extends ForwardingList<E> {
  private List<E> implementation = new ArrayList<E>();
  protected List<E> delegate() {
    return implementation;
  }
  ⟨Book keeping⟩
  ⟨Other Methods⟩
}

We'll need to gather data to determine the access patterns. All accesses (additions, replacements, deletions and queries) fall into two categories: accesses at the ends of the list (index 0 and the last index) and other "random" accesses. We'll do this by (tediously) instrumenting the interface's method calls:

26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
⟨Other Methods⟩+≡
public boolean add(E arg0) {
  endAccess();
  return super.add(arg0);
}
public void add(int index, E element) {
  accessAt(index);
  super.add(index, element);
}
public E set(int index, E element) {
  accessAt(index);
  return super.set(index, element);
}
public E get(int index) {
  accessAt(index);
  return super.get(index);
}
public E remove(int index) {
  accessAt(index);
  return super.remove(index);
}    

Bulk accesses are marginally more interesting; we'll consider then as one access each (not n) as at an implementation level they only require an O(1) reorganisation of pointers or a )amortised) single call to System.arrayCopy (which is really O(n) - it'd be interesting to see what effects the two cost-counting methods had on the structure's behaviour).

52
53
54
55
56
57
58
59
60
61
62
63
64
⟨Other Methods⟩+≡
public boolean addAll(Collection<? extends E> collection) {
  endAccess();
  return super.addAll(collection);
}
public boolean addAll(int index, Collection<? extends E> elements) {
  accessAt(index);
  return super.addAll(index, elements);
}
public List<E> subList(int fromIndex, int toIndex) {
  accessAt(fromIndex);
  return standardSubList(fromIndex, toIndex);
}

For the actual book-keeping, we'll just have int counters, and will ignore the effects of overflow. Industrial-strength solutions may want to scale the numbers down (or subtract the same constant from each of them) if any one gets too large:

70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
⟨Book keeping⟩+≡
private int endAccess = 0;
private int randomAccess = 0;
private void endAccess() {
  ++endAccess;
  evaluateImpl();
}
private void accessAt(int index) {
  if (index == 0 || index == size()) {
    ++endAccess;
  } else {
    ++randomAccess;
  }
  evaluateImpl();
}

Now to the first really interesting problem. The evaluateImpl method might be quite expensive (especially if it decides we need to change the implementation we're using), and once the pattern of access behaviour is obvious we don't really need the evaluation overhead when we're not going to do anything with it. Also, if the access pattern is right on the border of the trade-off between implementations we don't want to end up changing implementations backwards and forwards on every access. We'll therefore want some degree of hysterisis - I'll go for an exponential back-off (so only consider our choice of implementation on the 1st, 2nd, 4th, 8th, 16th, &c. access). This means if the access behaviour changes later in the list's lifetime we'll only react to it very slowly. It's also likely that the pattern of accesses just after the list is created won't be representative - consider a read-only list, where the first n accesses are insertions at the end (so a LinkedList would be ideal) but all accesses after that are random. I'll therefore make the exponential backoff start at 32 - that is the first evaluation of the implementation's suitability will be after 32 accesses:

90
91
92
93
94
95
96
97
⟨Book keeping⟩+≡
private int nextEvaluation = 32;
private void evaluateImpl() {
  if (randomAccess + endAccess == nextEvaluation) {
    ⟨Evaluate Suitability⟩
    nextEvaluation = Math.max(nextEvaluation << 1, 0x7FFFFFFF); // don't overflow
  }
}

And now for the key algorithm - what implementation best fits the metrics we've gathered? I'll do a trivial implementation in-line; you could probably abstract the logic behind a strategy pattern to keep it tidier. I've tried to choose numbers that give sensible results, so if the ratio of random accesses to end accesses is greater than 1:4 I believe an ArrayList will probably perform better.

103
104
105
106
107
108
⟨Evaluate Suitability⟩+≡
if(randomAccess > endAccess / 4) {
  useArrayList();
} else {
  useLinkedList();
}

Writing meaningful micro-benchmarks in any statically-compiled language is very hard, and in Java it's virtually impossible. Despite this, I wrote a loop that first inserts 50,000 random numbers into a list at a random index, then inserts 50,000 (different) random numbers at index 0 of a new list - to get a rough idea about whether my implementation-choosing algorithm above "works". The (meaningless) results are below, with total loop times in milliseconds:

Test ArrayList LinkedList DynamicList
Inserting at random index 613 3,245 648
Inserting at index 0 1,126 11 22

I (rather unethically) adjusted the ratio in my algorithm until the DynamicList picked the right implementation in each benchmark - yet I still think this shows the idea of a dynamically-chosen data structure implementation is workable.

Back to the code - the switches avoid unnecessary work:

123
124
125
126
127
128
⟨Other Methods⟩+≡
private void useLinkedList() {
  if (implementation instanceof LinkedList) return;
  implementation = new LinkedList<E>(implementation);
  ⟨Changed Implementation⟩
}

As evaluating an ArrayList happens relatively infrequently you can do a bit of maintenance. I'll compact the backing array down to remove any unused trailing nulls, which should (marginally) reduce the memory footprint of my process.

134
135
136
137
138
139
140
141
142
⟨Other Methods⟩+≡
private void useArrayList() {
  if (implementation instanceof ArrayList) {
    ((ArrayList<E>)implementation).trimToSize();            
    return;
  }
  implementation = new ArrayList<E>(implementation);
  ⟨Changed Implementation⟩
}

The next, more Java-specific, implementation detail is how do you deal with Iterators (or more generically, long-lived mutable objects that have pointers into the DynamicList's internal structures. I don't want the list itself to keep track of all the objects that have pointers into it as this makes the code less efficient (as every time you change implementation you have to update these pointers - even if these pointers are never going to be dereferenced again) and could lead to memory leaks (as there are hard-to-debug circular references that could even fool a garbage-collector). Instead I'll make all the references one-way, from the Iterators to the DynamicList. Before accessing pointers into the DynamicList's internal state they will check if the DynamicList has changed implementation since the last time they checked: if so their pointers are invalid so they'll have to be refreshed (in this case, by asking the DynamicList for new ones).

148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
⟨Changed Implementation⟩+≡ ++switches;
⟨Book keeping⟩+≡ private int switches = 0;
⟨Other Methods⟩+≡
public ListIterator<E> listIterator(int index) {
  accessAt(index);
  return new DynamicListIterator(implementation.listIterator(index), switches);
}
private class DynamicListIterator extends ForwardingListIterator<E> {
  private ListIterator<E> delegate;
  private int age; // the number of switches the last time we looked at the DynamicList
  DynamicListIterator(ListIterator<E> delegate, int age) {
    this.delegate = delegate; this.age = age;
  }
  protected ListIterator<E> delegate() {
    if (age == switches) { // no changes
      return delegate;
    } else { // the implementation's changed!
      age = switches;
      return delegate = implementation.listIterator(delegate.nextIndex());
    }
  }
}   

I've used an integer counter to keep track of the "age" of the DynamicList, rather than a real timestamp of the last implementation switch. While this has a very unlikely overflow bug (if the implementation switches so many times that the counter wraps round and comes back to the number it started at - the Iterator won't notice its pointers are now invalid), it avoids system calls and is more deterministic (useful when debugging application code using this list).

We'll hide all the implementation details behind a static factory method, just in case we want to change the constructor later (e.g. to pass in hints about the initial size of the backing array of the ArrayList implementation).

177
178
179
180
181
182
183
184
185
⟨com/blogspot/remisthoughts/dynamicoptimisation/DynamicDataStructures.java:*⟩+≡
package com.blogspot.remisthoughts.dynamicoptimisation;
⟨Imports⟩
public final class DynamicDataStructures {
    __lang__Dynamic List Class__rang__
    public static @Nonnull <E> List<E> newList() {
        return new DynamicList<E>();
    }
}

And we'll finish off with all the Java boilerplate needed to make this work:

191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
⟨Imports⟩+≡
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;
import javax.annotation.Nonnull;
import com.google.common.collect.ForwardingList;
import com.google.common.collect.ForwardingListIterator;
⟨Other Methods⟩+≡
public ListIterator<E> listIterator() {
  return listIterator(0);
}
public Iterator<E> iterator() {
  return listIterator(0);
}

Unit testing this code will be quite difficult as there are a lot of edge cases - ideally I'd use Google's Collections API compatibility test framework but there doesn't seem to be an easily-accessible (i.e. in the central Maven repository) compiled version of the code.

In the meantime, you can find this code in my Github repo...

4 comments:

  1. "Unit testing this code will be quite difficult as there are a lot of edge cases - ideally I'd use Google's Collections API compatibility test framework but there doesn't seem to be an easily-accessible (i.e. in the central Maven repository) compiled version of the code."

    In the Guava repository, guava-testlib is where this stuff lives. I don't know if you can use it as a Maven dependency directly, but certainly you can just check it out of the repository.

    ReplyDelete
  2. Interesting :) As for testing, a property based approach might be the way to go, then you randomly generate lots of tests and check the behaviour of the implementation acts identically to a reference.

    At line 91:
    private int nextEvaluation = 32;

    I think you mean uint, otherwise this:

    nextEvaluation = Math.max(nextEvaluation << 1, 0x7FFFFFFF); // don't overflow

    Will overflow, if I'm thinking about it right. Either that or change it to 0x3FFFFFFF so that it's never more than half the maximum value so when shifted it can't overflow.

    ReplyDelete
  3. I think it would be really nice to have an ArrayList that dynamically specializes to storing a certain kind of primitives without boxing, which is what a number of dynamic language implementations like V8 and PyPy are doing.

    BTW: TimSort, the sorting algorithm that is used Python and in Java since Java 7 is using dynamic optimization in its merging subcomponent to adapt to the concrete data in the list at hand: http://en.wikipedia.org/wiki/Timsort#Galloping_Mode

    ReplyDelete