Caching asynchronous calculations // Noizeramp

Have you ever seen how file managers preparing the thumbnails in background while you are browsing some folder on your drive? If you will switch to the other folder and then get back to first you will see the thumbnails almost instantly. They were calculated asynchronously and cached. Here’s what the text is about.

This time we will talk a bit about object attributes, caching and asynchronous computations. It is quite a common situation when you have some attribute or set of attributes which are property of the object from one side and aren’t from the other. Let’s examine simple example.

Imagine that you have a list of letters in your mailbox. You wish to calculate the rating of each letter using number of keywords in its text, the fact of whether it’s read or not and some other parameters. The part of the score which is taken for keywords presence relies on keywords list which aren’t the property of each letter. So, in terms of good design there should be a score calculator class which will know about current keywords list and will return the score for any given letter.

We are clear with design. Each time the application needs a score it calls the calculator with letter object passed as parameter and gets the value back. But what if the calculations are expensive? In example above we have the counting of keywords in letter text as part of score computation. It’s pretty expensive task to perform absolutely each time we need the score.

To resolve this performance bottleneck we need some sort of cache which stores the calculated values and invalidates them at the known moments. If the value isn’t in the cache yet it will be calculated and returned, if it is then it will simply will be questioned and returned.

The other tricky part is that if you have lots of objects (letters in your mailbox) waiting for calculator to evaluate new score for each object may be too expensive again. Imagine that you have two sub-folders in your mailbox each of which has thousands of letters. Should your application freeze for a scores recalculation period once user have changed the list of keywords? No way, absolutely not. Here comes another interesting part.

When you have a set of cached values (scores) for your key objects (letters) and the time of recalculation comes (the moment when they become invalid – the moment when the list of keywords changes, for example) several threads can be involved in the scores invalidation procedure. So, when you tell the calculator to invalidate all cached values it orders all worker threads to recalculate values. To maintain the blocking nature of original idea – when you ask for a score the calculator blocks until it’s ready – we need to use the already known SimpleLock introduced in previous article (“Simple cross-thread locks”). Why? Because we have to lock main thread asking for value until the calculation is finished and the calculation lock should start from invalidation command (not from the moment when the object is fetched by the worker thread for processing, because there could be thousands of objects in queue and delay between invalidation command and actual processing could be significant). So, we lock the object in one thread and unlock it after recalculation is finished. It’s simple. Now, let’s put everything together.

What we have by now?

Abstract calculator class which returns some value for some key and knows nothing about actual calculation algorithm.
Set of worker threads observing the queue of recalculation tasks.
Several methods (non-blocking) for invalidation: invalidate(key), invalidateAll()
Method (blocking) for getting value for the key: getValue(key)
Method for removing keys we don’t need any more: remove(key)

That’s all. Let’s start with implementation?

Let us begin with the simple stub for the abstract CachingCalculator:

/**
 * <p>Abstract caching calculator. The calculator knows nothing
 * about how to calculate actual value for a given key, but
 * knows how to take care of caching, multi-threaded computation
 * and invalidation.</p>
 */
public abstract class CachingCalculator<Key, Value>
{
    /**
     * Creates calculator.
     *
     * @param threads   number of invalidation threads.
     * @param queueSize invalidation queue size.
     */
    public CachingCalculator(int threads, int queueSize)
    {
    }

    /**
     * Returns the value corresponding to the key.
     *
     * @param key key.
     *
     * @return value.
     */
    public synchronized Value getValue(Key key)
    {
    }

    /**
     * Deletes key from cache.
     *
     * @param key key.
     */
    public synchronized void deleteKey(Key key)
    {
    }

    /**
     * Notifies the calculator that the cache should be invalidated.
     */
    public synchronized void invalidateAll()
    {
    }

    /**
     * Notifies the calculator that the value of the key
     * should be invalidated.
     *
     * @param key key.
     */
    public synchronized void invalidateKey(Key key)
    {
    }
}

As you can see the, the stub is pretty straight forward. The constructor takes number of worker threads and approximate queue size as parameter. Next, we have method for getting the value for the key and means for telling that the key has gone, requires to be invalidated or whole keys set should be invalidated.

Now, let’s think about the internals. We need the keys be associated with the values – the map will do the job. Next, we need the locking of questioning thread while the value of the key is being computed – we need a cross-thread lock (SimpleLock, for example). As we can’t lock the keys (it will affect the external application too much) and can’t lock the values (we might not have one at the moment of locking) we need some carrier for key-value pair we could lock safely. Here it comes:

/**
 * Lockable holder of the value associated with the key.
 */
private static class Holder<Key, Value> extends SimpleLock
{
    private Key     key;
    private Value   value;
}

Now we can define our map:

private Map<Key, Holder<Key, Value>>        keyHolderMap;

Now we need to think about the actual computation. As we wish our CachingCalculator to be customizable we need it to have abstract method which will be overridden by exact implementation.

/**
 * Calculates the value of the key.
 *
 * @param key key.
 *
 * @return value.
 */
protected abstract Value calculate(Key key);

Next, comes the definition of the ComputationThread. We will have several such threads spinning all the time. Each of them will monitor the queue of tasks and take them from it. Once the thread has a task it delegates the computation to the abstract method above and unlocks the task on completion, so, the threads waiting for a value will be released and continue execution.

/**
 * Calculation thread. Takes holders from the queue and asks calculator to
 * recalculate the value.
 */
private class CalculationThread extends Thread
{
    private BlockingQueue<Holder<Key, Value>> queue;

    /**
     * Creates thread with the queue.
     *
     * @param queue queue of holders for invalidation.
     */
    public CalculationThread(BlockingQueue<Holder<Key, Value>> queue)
    {
        setDaemon(true);
        this.queue = queue;
    }

    /**
     * Main loop.
     */
    public void run()
    {
        boolean interrupted = false;
        while (!interrupted)
        {
            Holder<Key, Value> holder = null;
            try
            {
                holder = queue.take();
                holder.value = calculate(holder.key);
            } catch (InterruptedException e)
            {
                // Someone wishes us to interrupt
                interrupted = true;
            } finally
            {
                holder.unlock();
            }
        }
    }
}

This class is really simple. It’s main purpose is take the holders from queue, recalculate and unlock. Then the cycle repeats until it gets interrupted. The most interesting part is a queue. The queue should be blocking which means that method take() should not return without a value – it should wait for new value to be added to the queue to return. The definition of the queue will be:

private BlockingQueue<Holder<Key, Value>>   invalidationQueue;

Now let’s check the most interesting parts and the rest of implementation can be found in the attached sources.

Here is what the getValue(Key) method looks like:

/**
 * Returns the value corresponding to the key.
 *
 * @param key key.
 *
 * @return value.
 */
public synchronized Value getValue(Key key)
{
    Holder<Key, Value> holder = keyHolderMap.get(key);
    if (holder == null)
    {
        holder = new Holder<Key, Value>();
        holder.key = key;
        invalidateHolder(holder);
        keyHolderMap.put(key, holder);
    }

    holder.lock();
    Value result = holder.value;
    holder.unlock();

    return result;
}

It takes the holder from map and if the holder was there, it copies the value while holding the lock and returns. If the holder isn’t there yet, it means that the key is new. In this case the module creates fresh holder and submits the invalidation request which puts the task into recalculation queue with locking it. The consequent lock (after the if block) will not be established until the calculation is finished and guaranties that calculator behaves exactly as if the value was computed right in this method. You may ask, what’s the benefit? In this case, there are no benefits as we still block until the result is there, but when we deal with invalidation methods (invalidateKey() and invalidateAll()) most probably there will be very modest locking times as several threads will be working at producing updated values for cached keys.

The invalidation part is also very simple:

/**
 * Notifies the calculator that the value of the key
 * should be invalidated.
 *
 * @param key key.
 */
public synchronized void invalidateKey(Key key)
{
    invalidateHolder(keyHolderMap.get(key));
}

/**
 * Invalidates the holder of the value.
 *
 * @param holder holder.
 */
private void invalidateHolder(Holder<Key, Value> holder)
{
    holder.lock();
    try
    {
        invalidationQueue.put(holder);
    } catch (InterruptedException e)
    {
        // Someone wants us to not to do so
        holder.value = calculate(holder.key);
        holder.unlock();
    }
}

We lock holder before doing everything else with it to be sure in exclusive access to the key-value pair. Later we submit it to the queue. It may come that the queue is full and there’s no way to put our task in, then the invalidation will happen in this particular thread and no data will be lost.

The full source can be found in attached archive at the end of the article. Below is a simple usage sample, which illustrates some of the major benefits of this module.

public class Example
{
    public static void main(String[] args)
    {
        CachingCalculator<String, Integer> calc =
            new CachingCalculator<String, Integer>()
        {
            public Integer calculate(String str)
            {
                System.out.println("Calculating: " + str);
                return str == null ? 0 : str.hashCode();
            }
        };

        System.out.println(calc.getValue(null));
        System.out.println(calc.getValue("a"));
        System.out.println(calc.getValue("a"));
        calc.invalidateKey("a");
        System.out.println("Invalidate key");
        System.out.println(calc.getValue("a"));
        System.out.println(calc.getValue("a"));
        calc.invalidateAll();
        System.out.println("Invalidate all");
        System.out.println(calc.getValue("a"));
        System.out.println(calc.getValue("a"));
        calc.deleteKey("a");
        System.out.println("Delete key");
        System.out.println(calc.getValue("a"));
        System.out.println(calc.getValue("a"));
    }
}

Until the next time!