Nov 9 2011

Remote controls under linux still aren’t easy

Jonathan Ricketson

Perhaps unfairly, I had a lot of trouble getting a new remote control infrared receiver set up to control my XBMC media center.
I eventually ended up using some advice from this guide. But getting there wasn’t easy.

Part of the problem was that this remote control receiver registers itself as a keyboard. That isn’t really a problem; it just confused me.

Another part of the problem is that most of lirc is now in the kernel. But, I guess this is pretty recent, and no one talks about it. So most of the lirc adventure was a waste of time (I sure understand it a lot better now though).

Anyway, it is all running now, and it ended up being super simple (isn’t everything after you figure it out?)


May 21 2011

Coffeescript equivalent of Ruby’s ||= (conditional assignment)

Jonathan Ricketson

Ruby has a nice syntax for replacing the value of a variable if the variable is ‘falsy’. It is often used to cache expensive calculations.
It looks like:

def get_expensive_value
  @value ||= begin
    # some expensive calculation
  end
end

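The closest equivalent in Coffeescript is the existential assignment ?=, which assigns only when the variable is null or undefined (use or= if you want to replace any falsy value):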

getExpensiveValue: ->
  @value ?= ( =>
    # some expensive calculation
  )()

Nov 21 2010

Dijkstra’s shortest path algorithm in Coffeescript

Jonathan Ricketson

I thought I had a need for a shortest path implementation across a graph. Turns out I don’t, but here it is in case it is useful to someone else.

It is adapted from a javascript implementation.

To use: new Dijkstra().find_path(source_node, dest_node)
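source_node and dest_node should each have a method called “adjacent_nodes” that returns the nodes they are attached to. Nodes are used as object keys, so each node also needs a unique toString() and a “hash” property that identifies it.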
This implementation assumes equal weighting on all links.

class Dijkstra
 
  constructor: () ->
    @predecessors = {}
 
  single_source_shortest_paths: (s, d) ->
    costs = {}
    costs[s] = 0
 
    open = {'0': [s]}
 
    keys = (obj) ->
      # use a separate name here: assigning to `keys` would overwrite this function
      result = []
      for key of obj
        result.push(key)
      result.sort(sorter)
      result
 
    sorter = (a, b) ->
      parseFloat(a) - parseFloat(b);
 
    add_to_open = (cost, v) ->
      key = '' + cost;
      open[key] ?= [];
      open[key].push(v);
 
    while (open)
      # In the nodes remaining in graph,
      # find the node, u, that currently has the shortest path from s.
      # keys are sorted ascending, so the cheapest bucket is the first one
      key = keys(open).shift()
      break if ! key?
      bucket = open[key]
      u = bucket.shift()
 
      delete open[key] if (bucket.length == 0)
 
      # Current cost of path from s to u.
      w_of_s_to_u = parseFloat(key)
 
      # v is the node across the current edge from u.
      for v in u.adjacent_nodes()
        w_of_s_to_v = w_of_s_to_u + 1
 
        costs[v] ?= Infinity
 
        if costs[v] > w_of_s_to_v
          costs[v] = w_of_s_to_v
          add_to_open(w_of_s_to_v, v)
          @predecessors[v] = u
 
        # If a destination node was specified and we reached it, we're done.
        if v.hash == d.hash
          open = null
          break
    costs[d] ? Infinity
 
  extract_shortest_path_from_predecessor_list: (d) ->
    nodes = [];
    u = d
    while (u)
      nodes.push(u)
      u = @predecessors[u]
    nodes.reverse()
 
  find_cost: (s, d) ->
    @single_source_shortest_paths(s, d)
 
  find_path: (s, d) ->
    cost = @single_source_shortest_paths(s, d)
    if cost < Infinity
      return @extract_shortest_path_from_predecessor_list(d)
    throw new Error("Could not find a path from #{s} to #{d}.")
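
Because every link has the same weight, the search above behaves like a breadth-first search. For comparison, here is a minimal Python sketch of that idea. The Node class and the find_path function are hypothetical names made up for this example; they are not a port of the Coffeescript class.

```python
from collections import deque


class Node(object):
    """Hypothetical graph node exposing the adjacent_nodes() method described above."""

    def __init__(self, name):
        self.name = name
        self.links = []

    def adjacent_nodes(self):
        return self.links


def find_path(source, dest):
    """Return the nodes on a shortest path, assuming equal weight on all links."""
    predecessors = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u is dest:
            break
        for v in u.adjacent_nodes():
            if v not in predecessors:  # with unit weights the first visit is the shortest
                predecessors[v] = u
                queue.append(v)
    if dest not in predecessors:
        raise ValueError("Could not find a path")
    # Walk the predecessor links back from the destination.
    path = []
    node = dest
    while node is not None:
        path.append(node)
        node = predecessors[node]
    path.reverse()
    return path


# Two routes from a to d: a-b-d (short) and a-c-e-d (long).
a, b, c, d, e = [Node(n) for n in "abcde"]
a.links = [b, c]
b.links = [a, d]
c.links = [a, e]
e.links = [c, d]
d.links = [b, e]
print([n.name for n in find_path(a, d)])  # ['a', 'b', 'd']
```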

Feb 19 2010

Evolvethefuture is live

Jonathan Ricketson

EvolveTheFuture is finally live. There is still plenty to do, but it now has a reasonably stable API and enough documentation for people other than me to get started.

I am going to work next on community building, because it is going to be a lot more interesting if there are more people involved.

An excerpt from the EvolveTheFuture homepage:

EvolveTheFuture is an artificial life (Alife) environment that challenges you to write a better animal. It is a simple challenge with infinitely complex and interesting solutions.

Animals can move about eating other animals and are in turn eaten. Animals can reproduce and the children are usually like the parent, but sometimes they are a little different. And in that little difference evolution is allowed.

Evolution happens slowly though, so more interesting things might be able to be created through the imagination of people interested in natural competition. Animals are written in a custom assembly code and run in a javascript environment in your browser – and everyone else’s who opens this page.

The best animals are those that create a more successful species.

The best creators are those that create a successful dynasty.

Please note that this site works best in Firefox 3.5 and above, Chrome 3 and above, and Safari 4 and above. It may not work at all in other browsers.

So, please come along and see what is happening. If you have any friends that you think might be interested, tell them too. And please tell me if you have any ideas or problems.


Feb 8 2010

Memcache lockless queue implementation (v2)

Jonathan Ricketson

The new Google App Engine 1.3.1 SDK has an added method on memcache called “grab_tail”. This is a great little method that removes the need for my lockless queue to manage the read counter. We still need to manage the write counter though, because there doesn’t seem to be a way to add items to a memcache namespace with an arbitrary name.
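
To illustrate what managing the write counter involves, here is a minimal sketch of reserving a block of consecutive counters with a single atomic increment. FakeCounterStore is a hypothetical in-memory stand-in written for this example, not the memcache API: its incr starts missing keys at zero, whereas the real memcache.incr returns None for a missing key.

```python
import threading


class FakeCounterStore(object):
    """Hypothetical in-memory stand-in for an atomic counter store."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def incr(self, key, delta=1):
        # Atomically add delta and return the new value (missing keys start at 0).
        with self._lock:
            self._values[key] = self._values.get(key, 0) + delta
            return self._values[key]


def reserve_counters(store, key, howmany=1):
    """Reserve `howmany` consecutive counters using one atomic increment."""
    last = store.incr(key, delta=howmany)
    return list(range(last - (howmany - 1), last + 1))


store = FakeCounterStore()
first = reserve_counters(store, "writeCounter", 3)
second = reserve_counters(store, "writeCounter", 2)
print(first, second)  # [1, 2, 3] [4, 5]
```

Because the increment is atomic, two writers can never be handed the same counter, which is what lets the queue work without a lock.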

I have also added a way to rollback in the case that your processing of queue items encounters an error. I was doing the reads non-destructively earlier, so I just needed to reset the read-counter to an appropriate point, but now (because grab_tail is destructive), we need to re-queue the messages.

<rant>

In this implementation, the method that gets the next write counter, “__nextCounter”, is far more complicated than it should be because of an outstanding defect in memcache. This defect is probably quite simple to fix, but although I reported it almost 6 months ago, it hasn’t even been acknowledged. I do really wonder about the value of an issue tracker to a community that doesn’t use it. If the Google team are not going to use the issue tracker, it would be far better to acknowledge that and get rid of it completely.

</rant>

Here it is:

from google.appengine.api import memcache
import logging
 
class Queue(object):
    itemPrefix="queueItem"
    writeCounter="writeCounter"
    def __init__(self, queueName):
        self.name = queueName
 
    def write(self, msg):
        counter = self.__nextWriteCounter()[0]
        msgKey = self.itemPrefix + str(counter)
        if not memcache.add(msgKey, msg, namespace=self.name):
            raise QueueException("msg key already existed: %s" % msgKey)
        logging.debug("wrote to %s:%s" % (self.name, msgKey))
 
    def writeMulti(self, messages):
        if len(messages) == 0:
            return
        mapping={}
        counters = self.__nextWriteCounter(len(messages))
        for msg,counter in zip(messages,counters):
            mapping[self.itemPrefix + str(counter)]=msg
 
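        # add_multi returns the keys that could not be set; retry those messages
        failedKeys = memcache.add_multi(mapping, namespace=self.name)
        if failedKeys:
            self.writeMulti([mapping[key] for key in failedKeys])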
 
    def read(self):
        result = memcache.grab_tail(item_count=1, namespace=self.name)
        if len(result) > 0:
            return result[0]
        else:
            return None
 
    def readMulti(self, maxItems=100):
        return memcache.grab_tail(item_count=maxItems, namespace=self.name)
 
    def requeueMessages(self, messages):
        self.writeMulti(messages)
 
    def __currentWriteCounter(self):
        return self.__currentCounter(self.writeCounter, 0)
    def __nextWriteCounter(self,howmany=1):
        return self.__nextCounter(self.writeCounter, howmany=howmany)
 
    def __currentCounter(self, key, default):
        # counters live in their own namespace so that grab_tail never returns them
        counter = memcache.get(key, namespace=self.name+"c")
        if counter is None:
            memcache.set(key, default, namespace=self.name+"c")
            counter = default
        return counter
 
    def __nextCounter(self, key, howmany):
        counter = memcache.incr(key, namespace=self.name+"c", delta=howmany)
        if counter is None:
            if not memcache.add(key, howmany, namespace=self.name+"c"):
                # handles the case where another thread got in first with the add
                counter = memcache.incr(key, namespace=self.name+"c",delta=howmany)
                if counter is None:
                    raise QueueException("could not increment counter: %s" % key)
            else:
                counter = howmany
        return range(counter-(howmany-1),counter+1)
 
class QueueException(Exception):
    """Raised when a queue operation fails."""

and to use it:

clicksAndViewsQueue = queue.Queue(clientViews.QUEUE_NAME)
def processMessages(request):
    msgs=[]
    try:
        msgs = clicksAndViewsQueue.readMulti(maxItems=100)
        if len(msgs) > 0:
            for msg in msgs:
                processMessage(msg)
 
            taskqueue.addTask(url=viewUrl, queueName=taskqueue.BACKGROUND_QUEUE)
    except Exception, e:
        clicksAndViewsQueue.requeueMessages(msgs)
        raise
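
The read-then-requeue pattern can be tried without App Engine. The following is a minimal sketch using a hypothetical InMemoryQueue backed by a deque; the class and the process_batch helper are invented for this example and only mimic the destructive read described above.

```python
from collections import deque


class InMemoryQueue(object):
    """Hypothetical stand-in for the Queue class, backed by a deque."""

    def __init__(self):
        self._items = deque()

    def write_multi(self, messages):
        self._items.extend(messages)

    def read_multi(self, max_items=100):
        # Destructive read: items are removed as they are returned.
        count = min(max_items, len(self._items))
        return [self._items.popleft() for _ in range(count)]

    def __len__(self):
        return len(self._items)


def process_batch(queue, handler, max_items=100):
    """Read a batch and put it back if the handler fails part-way through."""
    msgs = queue.read_multi(max_items)
    try:
        for msg in msgs:
            handler(msg)
    except Exception:
        queue.write_multi(msgs)  # re-queue the whole batch
        raise
    return len(msgs)


def failing_handler(msg):
    if msg == "b":
        raise RuntimeError("boom")


q = InMemoryQueue()
q.write_multi(["a", "b", "c"])
try:
    process_batch(q, failing_handler)
except RuntimeError:
    pass
print(len(q))  # 3
```

Note that the whole batch goes back on the queue, so a message that was handled before the error will be seen again. The processing code has to tolerate that.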

Oct 14 2009

Money database property for Google App Engine

Jonathan Ricketson

Calculating money is a tricky thing. Your calculations have to be ultra-precise, so no storing things as floats where $1.33 might actually be stored as 1.3299999999. That is no good for calculations… But equally it is no good storing in cents either: what is 133c /2?
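
Both problems are easy to reproduce in a plain Python session. This is only an illustration of the point above; the variable names are made up for the example.

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly.
float_sum_is_exact = (0.1 + 0.2 == 0.3)
print(float_sum_is_exact)  # False

# The float written as 1.33 is only close to the decimal value 1.33.
float_matches_decimal = (Decimal(1.33) == Decimal("1.33"))
print(float_matches_decimal)  # False

# Whole cents avoid that, but cannot hold half a cent.
half_in_cents = 133 // 2
print(half_in_cents)  # 66, so half a cent is lost
```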

Python has a Decimal datatype, but this requires serialisation to String. Nick Johnson from the Google App Engine team wrote a post about how to write a Decimal property for Google App Engine. Unfortunately, because this serialises to String, any sorting you do is String sorting: 100 comes before 11, which comes before 20. Bummer. I played around with storing numbers with a bunch of leading zeroes, i.e. 0000000010, but that starts to feel a bit hacky.
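
The sorting problem, and the zero-padding workaround, look like this in plain Python (an illustration only, with made-up values):

```python
amounts = ["100", "11", "20"]

# Strings sort character by character, not by numeric value.
as_strings = sorted(amounts)
print(as_strings)  # ['100', '11', '20']

# Numeric sorting gives the order you actually want.
as_numbers = sorted(amounts, key=int)
print(as_numbers)  # ['11', '20', '100']

# Padding with leading zeroes makes string order match numeric order.
padded = sorted(a.zfill(10) for a in amounts)
print(padded)  # ['0000000011', '0000000020', '0000000100']
```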

So I wrote a Money database property that has 6 places of precision, and works as a normal Python numeric type. I haven’t implemented all of those methods, just the ones that I needed. I would be happy to take feedback though.
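
The underlying idea is to store the amount as an integer number of millionths. Here is a minimal sketch of that idea on its own, without App Engine. The helper names to_scaled and from_scaled are made up for this example, and unlike the Money class it parses decimal strings rather than floats.

```python
from decimal import Decimal

SCALE = 10 ** 6  # six decimal places of precision


def to_scaled(amount):
    """Convert a decimal string such as "10.34" to an integer number of millionths."""
    return int(Decimal(amount) * SCALE)


def from_scaled(value, places=2):
    """Format a scaled integer back into a decimal string."""
    return "%.*f" % (places, value / float(SCALE))


# Integers sort numerically, so ordering works as expected.
stored = sorted(to_scaled(a) for a in ["100", "11", "20.5"])
print(stored)  # [11000000, 20500000, 100000000]

# Half of 1.33 keeps its half cent: 665000 millionths is 0.665.
half = to_scaled("1.33") // 2
print(half)  # 665000

print(from_scaled(to_scaled("10.34")))  # 10.34
```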

To use:

class Transaction(db.Model):
    dateOccurred = db.DateProperty(auto_now_add=True)
    description = db.StringProperty(required=True)
    amount = MoneyProperty(required=True)
 
t = Transaction(description="I got some money", amount=Money(10.34))

The implementation:

from google.appengine.ext import db
 
class Money(object):
	multiple = 1000000.0
	def __init__(self, val, multiply=True):
		if multiply:
			# round before truncating so float error cannot drop the last digit
			self._intVal = int(round(float(val) * self.multiple))
		else:
			self._intVal = int(val)
 
	def format(self, places=2):
		return "%.*f" % (places, float(self))
 
	def __float__(self):
		return self._intVal / self.multiple
 
	def __repr__(self):
		return "%.06f" % (self._intVal / self.multiple)
 
	def __mul__(self, other):
		if type(other) == Money:
 			return Money((self._intVal * other._intVal) / self.multiple, False)
		else:
			return Money(self._intVal * other, False)
 
	__rmul__ = __mul__
	def __add__(self, other):
		if type(other) == Money:
 			return Money(self._intVal + other._intVal, False)
		else:
			return Money(self._intVal + (other * self.multiple), False)
 
	__radd__ = __add__
 
	def __cmp__(self, other):
		# __cmp__ must return an int, so use cmp() rather than a float difference
		if other == None:
			return 1
		elif other == "":
			return 1
		elif type(other) != Money:
			return cmp(self._intVal, other * self.multiple)
		return cmp(self._intVal, other._intVal)
 
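	def __sub__(self, other):
		if type(other) == Money:
			return Money(self._intVal - other._intVal, False)
		else:
			return Money(self._intVal - (other * self.multiple), False)

	def __rsub__(self, other):
		# called for "number - Money", so other is usually a plain number
		if type(other) == Money:
			return Money(other._intVal - self._intVal, False)
		else:
			return Money((other * self.multiple) - self._intVal, False)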
 
	def __div__(self, other): 
		if type(other) == Money:
			return Money((self._intVal * self.multiple) / other._intVal, False)
		else:
			return Money(self._intVal / other, False)
 
	def __rdiv__(self, other): 
		if type(other) == Money:
			return Money((other._intVal * self.multiple) / self._intVal, False)
		else:
			return Money((other * self.multiple * self.multiple) / self._intVal, False)
 
	def __neg__(self): 
		return Money(self._intVal * -1, False)
 
class MoneyProperty(db.Property):
    data_type = Money

    def get_value_for_datastore(self, model_instance):
        # stored as the scaled integer so that datastore sorting is numeric
        value = super(MoneyProperty, self).get_value_for_datastore(model_instance)
        if value is None:
            return None
        elif isinstance(value, Money):
            return value._intVal
        else:
            return Money(value)._intVal

    def make_value_from_datastore(self, value):
        if value is None:
            return None
        else:
            return Money(value, False)

    def empty(self, value):
        return value is None

    def get_value_for_form(self, instance):
        value = super(MoneyProperty, self).get_value_for_form(instance)
        if not value:
            return None
        if isinstance(value, Money):
            return float(value)
        return value

    def make_value_from_form(self, value):
        if not value:
            return None
        # form input arrives as a string or number, so convert it to Money
        if not isinstance(value, Money):
            return Money(value)
        return value

Sep 11 2009

Google App Engine migration script (v2)

Jonathan Ricketson

After talking about the previous version of my migration script, I needed to make some significant changes to it. These changes support loading entities that do not validate against the current version of their model class, i.e. if you have added a new required field or renamed a field, you can make these changes using this script.

This uses the underlying Query and Entity types, so no Model constraints are enforced. But after you make all the changes to the entity, it is then loaded into the Model class to get any defaults applied and validation.

So the main features of this migration approach are:

  • Uses the task queue for paging over all specified Model classes
  • Sets defaults from the Model definition
  • Allows adding, removing and modifying fields through a dictionary-like object (the Entity)
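
The paging in the first point works by remembering the last key processed and asking for the next batch of keys after it. Here is a minimal sketch of that idea using a plain dict in place of the datastore. The functions fetch_batch and migrate_all are hypothetical names for this example, and a loop stands in for the chain of task queue tasks.

```python
def fetch_batch(store, start_key=None, limit=10):
    """Return up to `limit` (key, entity) pairs with key > start_key, in key order."""
    keys = sorted(k for k in store if start_key is None or k > start_key)
    return [(k, store[k]) for k in keys[:limit]]


def migrate_all(store, process, limit=10):
    """Walk the whole store one batch at a time, resuming after the last key seen."""
    start_key = None
    batches = 0
    while True:
        batch = fetch_batch(store, start_key, limit)
        if not batch:
            return batches
        for key, entity in batch:
            process(entity)
        start_key = batch[-1][0]  # the next batch starts after this key
        batches += 1


store = dict(("key%02d" % i, {"views": [1, 2]}) for i in range(25))
seen = []
batch_count = migrate_all(store, seen.append, limit=10)
print(batch_count, len(seen))  # 3 25
```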

If you just want to apply any new defaults, you don’t need to do anything, other than specify the model class in the list to migrate. Any classes without a MigrationWorker are loaded into their Model class and have defaults and validation applied through that mechanism.

If there are specific things that you want to do to the object then you need to create your own MigrationWorker.

class ads_fooWorker(MigrationWorker):
    kind = "ads_foo"
    def processItem(self, item):
        logging.info("processing %s %s" % (item.kind(), item.key()))
        logging.info(item)
        if "views" in item and type(item['views']) == type([]):
            item['views'] = sum(item['views'])
        elif "views" not in item:
            item['views'] = 0
        if "clicks" in item and type(item['clicks']) == type([]):
            item['clicks'] = sum(item['clicks'])
        elif "clicks" not in item:
            item['clicks'] = 0
        super(ads_fooWorker, self).processItem(item)

In this class we are migrating a model class that (because we are using AEP) is called ads_foo. The underlying Entity object that we are operating on is a Dictionary like object, so we can look for keys, add keys and ‘del’ keys. As you can see here, we are changing a field from being a List of Integers to being a single Integer. This would not be possible if the object were loaded into the Model class.

The thing to be careful of (if using the task queue) is that you are not guaranteed that your task will run only once. So make sure that your MigrationWorker checks whether it still needs to do its work before changing anything.
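
A simple way to check this is to run the migration logic twice over the same data and confirm that the second run changes nothing. Here is a minimal sketch using a plain dict in place of the Entity; migrate_counts is a hypothetical helper that mirrors the worker above.

```python
def migrate_counts(item):
    """Collapse list-valued 'views' and 'clicks' into totals; safe to run twice."""
    for field in ("views", "clicks"):
        value = item.get(field)
        if isinstance(value, list):
            item[field] = sum(value)
        elif value is None:
            item[field] = 0
        # Already a single integer: nothing to do.
    return item


item = {"views": [1, 2, 3]}
once = dict(migrate_counts(item))
twice = dict(migrate_counts(item))
print(once)           # {'views': 6, 'clicks': 0}
print(once == twice)  # True
```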

The full code is here. To do my migration I usually add my MigrationWorkers directly to this script.

import logging
 
from django.http import HttpResponse
 
from google.appengine.api import datastore
from google.appengine.ext import db
from google.appengine.api.labs import taskqueue
from google.appengine.runtime import apiproxy_errors
 
"""migrate from version x to version y"""
def migrate(request):
    modelsToMigrate = ["ads_foo",
                     "ads_bar"]
    [_addTask(url=MigrationWorker.worker_url % i) for i in modelsToMigrate]
    return HttpResponse("created all tasks for processing")
 
#General Migration
def migrateModel(request, model_name):
    #create the worker class and tell it to work
    workerName = '%sWorker' % model_name
    if not workerName in globals():
        logging.info("no worker for %s" % model_name)
        Worker = generateWorker(model_name)
    else:
        Worker = globals()[workerName]
    if "start" in request.REQUEST:
        Worker(request.REQUEST['start']).work()
    else:
        Worker().work()
    return HttpResponse("ok")
 
class MigrationWorker(object):
    ITEMS_TO_FETCH = 10
    worker_url = "/worker/migrate/%s"
    def __init__(self, startKey=None):
        self.startKey = startKey
 
    def work(self):
        query = datastore.Query(self.kind)
        if self.startKey:
            query['__key__ >'] = db.Key(self.startKey)
        items = query.Get(self.ITEMS_TO_FETCH)
        if not items:
            logging.info('Finished migrating %s' % self.kind)
            return
 
        last_key = items[-1].key()
        [self.processItem(x) for x in items]
 
        _addTask(url=self.worker_url % self.kind, params=dict(start=last_key))
        logging.info('Added another task to queue for %s starting at %s' %
                     (self.kind, last_key))
 
    """Override this method to do some work for each item
    """
    def processItem(self, item):
        logging.info("processing %s %s" % (item.kind(), item.key()))
        modelClass = db.class_for_kind(item.kind()).from_entity(item)
        modelClass.put()
 
def generateWorker(kind_name):
    class DynamicClass(MigrationWorker):
        kind = kind_name
    return DynamicClass
 
class ads_fooWorker(MigrationWorker):
    kind = "ads_foo"
    def processItem(self, item):
        logging.info("processing %s %s" % (item.kind(), item.key()))
        logging.info(item)
        if "views" in item and type(item['views']) == type([]):
            item['views'] = sum(item['views'])
        elif "views" not in item:
            item['views'] = 0
        if "clicks" in item and type(item['clicks']) == type([]):
            item['clicks'] = sum(item['clicks'])
        elif "clicks" not in item:
            item['clicks'] = 0
        super(ads_fooWorker, self).processItem(item)
 
def _addTask(url, params={}, queueName='default'):
    try:
        logging.info("add task to %s [%s]" % (queueName, (url, params)))
        task = taskqueue.Task(url=url, params=params)
        task.add(queueName)
    except taskqueue.TransientError, e:
        logging.exception("adding Task failed with a TransientError")
        _addTask(url, params, queueName)
    except apiproxy_errors.OverQuotaError, e:
        #but keep going
        logging.exception("adding Task failed with an OverQuotaError")

Jul 22 2009

The people around you make the difference

rob

Over at 37signals, Matt blogged on a topic recently that really resonated with me… the gist of what he said being that if a project or company is made up of a whole lot of people who don’t really know each other, individuals are generally going to play it safer than a group of people who are comfortable with one another who might fight harder to get their point heard. It doesn’t need to be a shit-fight, just an environment where people can be freely passionate and walk away as friends.  Obviously this is a generalisation and there are always people who will say what they feel – I tend to be one of them although that’s somewhat mood dependent. Anyway, the net effect of this can often be mediocrity which can be damaging or at least limiting to a project or company.

Something else that I find I’m often up against is trying to consider how important someone’s job is to them when I set expectations around quality or general awareness of what’s going on in their professional world. As someone who spends silly amounts of time working on code, reading about development / design etc at all hours of the day, I need to keep reminding myself that for many others it’s just a job and they’re happy for it to start at nine and end at five. To loosely tie this in with Matt’s point, it’s about where the line is between profession and passion and the effect of having people around that don’t necessarily care much or are indifferent to what they do. Personally I find it frustrating and draining. Vigorous debate over things that I truly believe in (software or not) are moments that I live for, so being in situations where that can’t happen is just a little bit soul destroying. The thing with this though is that there are so many levels that you can deliver software on that are all based on the context of the business / project, cost, quality, target audience etc. There is far more work for developers than there are ‘good developers’ to do that work and the fact is that for many situations, near enough just has to be good enough. Personally, that’s just not for me though. Not that I necessarily fall into this category (yet), but any serious product or company that excels at what they do has no time for that mentality. That’s what sets them apart.


Jun 28 2009

Google App Engine version migration

Jonathan Ricketson

When writing your application on Google App Engine, you are inevitably going to deploy a version to production that does not have a final set of features. This is (of course) unavoidable. So, then for version 2, when creating new features you will also (probably) have to refactor the data model, at least adding new fields, and potentially renaming fields, or creating composed entities. So, when putting version 2 into production the data that all of your users have created in version 1 will need to be migrated to the new data-model.

Nothing exists as of now to do something similar to Rails Migrations, so everything pretty much needs to be done yourself.

The solution that I outline here is an automated way of processing over all entities in a list of Model classes. This solution loads the entities up into the v2 model classes and then calls your code to modify them (and create any new objects that might be required). This solution will not work for the case where the v2 model is missing fields that need to be read to migrate to the new state. That solution would need to be written using the underlying Entity classes and Query. (I haven’t needed to do that yet). This is really more of a recipe that can be modified to your purposes, than a generic drop-in migration tool.

The idea came from code copied from a google demo.

Here is the file: migrate.py

Directions for use (these directions are for Django, but if you are using the basic handler it would look very similar):

1. Rename migrate.py to migrateXToY.py (for your X and your Y)

2. Modify your urls.py to contain:

patterns("migrateXToY",
                       (r'^migrate/migrateXToY$', 'migrateXToY'),
                       (r'^worker/migrateXToY/(?P<model_name>[^/]+)$', 'migrateModel'),
)

3. Create a migrateXToY method in the migrate.py like the following (modelsToMigrate is the list of model names that will be processed):

def migrate1To2(request):
    modelsToMigrate = ['User', 'Foo', 'Bar', 'BarDetail', 'Image', 'Report']
    [taskqueue.add(url='/worker/migrateXToY/%s' % i) for i in modelsToMigrate]
    return HttpResponse("created all tasks for processing")

4. Create a FooWorker class like the following (the migrateItem method will receive each item that is being migrated, the method should return the item after processing and should not put it. For efficiency puts are batched):

class FooWorker(MigrationWorker):
    kind=Foo
    kindName="Foo"
    def migrateItem(self, item):
        logging.info("processing %s %s" % (self.kindName, item.key()))
        #whatever processing you need to do
        item.cancelled=False
        return item

Jun 24 2009

Memcache lockless queue implementation

Jonathan Ricketson

For an application that I am writing on Google App Engine, I needed a way to store jobs and then process them all at once. I found the idea for a Memcached lockless queue and created an implementation of it: queue.py
To write to it (I do this from a view where I want to store some stats about the data that was shown in the view):

thisQueue=queue.Queue(QUEUE_NAME)
def method():
    thisQueue.write(data)

Then, later on, read from it. I created a cron job that is executed often:

thisQueue=queue.Queue(QUEUE_NAME)
def cronMethod():
    msg = thisQueue.read()
    while msg:
        processMessage(msg)
        msg = thisQueue.read()

If you are using this on Google App Engine, you should also be careful that you don’t run out of time to execute. If you get a DeadlineExceededError after you have read, but before you have finished processing, then the message might get lost.
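
One way to reduce that risk is to give the loop a time budget and check it before each read, so that a message is only taken off the queue while there is still time to handle it. The following is a minimal sketch with a plain deque standing in for the queue. The drain function and its parameters are hypothetical, and the budget needs to leave enough margin for the slowest message.

```python
import time
from collections import deque


def drain(queue, process, budget_seconds, clock=time.time):
    """Process messages until the queue is empty or the time budget is spent.

    The budget is checked before each read, so nothing is removed from the
    queue once the budget has run out.
    """
    deadline = clock() + budget_seconds
    handled = 0
    while queue and clock() < deadline:
        process(queue.popleft())
        handled += 1
    return handled


# A fake clock that advances one "second" per call keeps the example deterministic.
ticks = iter(range(100))
fake_clock = lambda: next(ticks)

pending = deque(["a", "b", "c", "d", "e"])
handled = drain(pending, lambda msg: None, budget_seconds=3, clock=fake_clock)
print(handled, list(pending))  # 2 ['c', 'd', 'e']
```

The messages left behind stay on the queue for the next cron run.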