Saturday, July 4, 2015

The EveKit Wayback Machine

We've posted in a few places that we're changing EveKit to implement something called the "Wayback Machine".  That work is finally happening on the Beta site starting tonight.  This post describes in more detail what these changes are and how they work.  It assumes you already know what EveKit is and why you might use it.  If you're unfamiliar, the EveKit front page has a decent summary of the main features.

What is the Wayback Machine?


Simply put, in Wayback Machine mode, EveKit will now retain a history of all changes to any XML API data we're downloading for you.  Previously, the only data which really had a history were things like industry jobs, market orders and the wallet.  Now, every time some piece of data changes we'll version it and save the old version for later reference.  We're also changing the API so that you can view the data at any time in the past.

The main reason we implemented this mode is to get better tracking of account assets over time.  When you're trying to manage a modest-sized EVE corporation, it's useful to track how fast you're burning through key resources needed for production.  There are several other interesting use cases as well.  For example, you can view corporation member tracking information over time to follow member movements, recruitment, etc.  If the feature catches on, I'm sure clever people in the community will come up with interesting use cases of their own.

How It Works (Technical Details)


The main change to support versioned data is to assign every object a "lifeline".  A lifeline is a series of time stamps indicating the intervals during which an object was live in the system.  To simplify the implementation, we make copies of objects as they change, so a single logical object is actually represented as a sequence of physical objects reflecting the logical object's state as it changed over time.  We implement lifelines by assigning two values to each object in the sequence:

  • lifeStart: the time stamp (inclusive) at which the object first became live; and
  • lifeEnd: the time stamp (exclusive) at which the object was deleted or replaced.
The period during which a given physical object is live is therefore the half-open interval [lifeStart, lifeEnd).  The latest live version of an object has lifeEnd set to +infinity.
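To make the half-open interval concrete, here is a minimal Python sketch of a single physical object.  The class name PhysicalVersion and the use of math.inf for "+infinity" are assumptions for illustration, not EveKit's actual internals.

```python
import math
from dataclasses import dataclass

# Hypothetical stand-in for "+infinity": the lifeEnd of the latest live version.
INFINITY = math.inf


@dataclass
class PhysicalVersion:
    """One physical copy of a logical object, live during [life_start, life_end)."""
    data: dict
    life_start: int              # inclusive, milliseconds since the epoch
    life_end: float = INFINITY   # exclusive; INFINITY while this is the latest version

    def is_live_at(self, t: int) -> bool:
        # Half-open interval: live at life_start, no longer live at life_end.
        return self.life_start <= t < self.life_end


v = PhysicalVersion({"quantity": 10}, life_start=1000, life_end=2000)
print(v.is_live_at(1000), v.is_live_at(1999), v.is_live_at(2000))  # True True False
```

Making lifeEnd exclusive means a replaced object and its replacement never overlap: the old copy stops being live at exactly the instant the new one starts.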

There are two ways in which a (logical) object can change, each of which modifies its lifeline:
  • Delete: if an object is deleted, then we "end of life" the last object in the sequence, which means we set lifeEnd to the delete time;
  • Update: if an object is updated, then we "end of life" the last object in the sequence and create a new object with lifeStart set to the update time, and lifeEnd set to +infinity. 
We've changed the EveKit API (see below) to make it possible to query the version of a logical object at any given time.  If the query time is t, then this query is just a scan through the object sequence for an object with lifeStart <= t and t < lifeEnd.
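The delete, update and point-in-time query rules above can be sketched as follows.  This is a hypothetical in-memory model (the LogicalObject class and its add, update, delete and at methods are illustrative names), not EveKit's storage code; add covers the initial creation or re-adding of an object, which the rules above take for granted.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

INF = math.inf  # hypothetical stand-in for "+infinity"


@dataclass
class Version:
    data: dict
    life_start: int          # inclusive
    life_end: float = INF    # exclusive


@dataclass
class LogicalObject:
    versions: list = field(default_factory=list)

    def latest(self) -> Optional[Version]:
        # Only the last physical object in the sequence can still be live.
        if self.versions and self.versions[-1].life_end == INF:
            return self.versions[-1]
        return None

    def add(self, data: dict, t: int) -> None:
        if self.latest() is not None:
            raise ValueError("object is already live")
        self.versions.append(Version(dict(data), t))

    def delete(self, t: int) -> None:
        latest = self.latest()
        if latest is None:
            raise ValueError("object is not live")
        latest.life_end = t  # "end of life" the last object in the sequence

    def update(self, data: dict, t: int) -> None:
        self.delete(t)                                # end of life the current copy...
        self.versions.append(Version(dict(data), t))  # ...and start a new one at t

    def at(self, t: int) -> Optional[dict]:
        # Scan for the copy with life_start <= t < life_end.
        for v in self.versions:
            if v.life_start <= t < v.life_end:
                return v.data
        return None


obj = LogicalObject()
obj.add({"quantity": 10}, 100)
obj.update({"quantity": 7}, 200)
obj.delete(300)
print(obj.at(150), obj.at(250), obj.at(350))  # {'quantity': 10} {'quantity': 7} None
```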

Versioning everything raises some efficiency concerns, so we've made two main changes to try to keep things performing well:
  1. Queries are optimized for returning the latest live data.  We're assuming the vast majority of API calls will be for current live data, so we've optimized our queries to work best for that case.
  2. We've internally refactored data which changes frequently. For example, most of a Character Sheet is relatively static except for things like "balance".  So we version balance data separately from the rest of the character sheet. This change minimizes the amount of data we have to copy when versioning.
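The following Python sketch illustrates the second point under simple assumptions.  The record and sync_character_sheet helpers and the character sheet fields are hypothetical, and the static fields are stored in one history while balance has its own.  A new copy is made only when a part actually changes, so a balance change copies one small record rather than the whole sheet.  Keeping the latest version at the end of each history also makes the common "latest live data" lookup cheap.

```python
import math
from dataclasses import dataclass

INF = math.inf


@dataclass
class Version:
    data: dict
    life_start: int
    life_end: float = INF


def record(history: list, new_data: dict, t: int) -> bool:
    """Append a new version only if the data changed; return True if a copy was made."""
    latest = history[-1]          # latest live version is always last: cheap lookup
    if latest.data == new_data:
        return False
    latest.life_end = t           # end of life the old copy
    history.append(Version(dict(new_data), t))
    return True


# Hypothetical character sheet: mostly static fields plus a frequently changing balance.
sheet_history = [Version({"name": "Example Pilot", "race": "Caldari", "corporation": "Example Corp"}, 0)]
balance_history = [Version({"balance": 1000.0}, 0)]


def sync_character_sheet(downloaded: dict, t: int) -> int:
    """Split a downloaded sheet, version each part separately, return fields copied."""
    static_part = {k: v for k, v in downloaded.items() if k != "balance"}
    balance_part = {"balance": downloaded["balance"]}
    copied = 0
    if record(sheet_history, static_part, t):
        copied += len(static_part)
    if record(balance_history, balance_part, t):
        copied += len(balance_part)
    return copied


download = {"name": "Example Pilot", "race": "Caldari", "corporation": "Example Corp", "balance": 900.0}
print(sync_character_sheet(download, 100))        # 1: only the balance record was copied
print(len(sheet_history), len(balance_history))   # 1 2
```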

What Happens to Meta Data? 


EveKit allows user-defined meta data to be assigned to any object we sync through the XML API.  Meta data is just a set of key/value pairs made up of bounded-length strings.  We've chosen not to version meta data, which means the following:
  • Meta data is associated individually with the physical objects which make up the lifeline of a logical object.  That is, meta data is a non-versioned property of each physical object;
  • Meta data may be copied forward depending on how the lifeline changes.  Specifically:
    • If an object is updated, then the meta data for the old object is copied and becomes the meta data for the new object.  This copy is completely independent of the meta data for the old object.
    • If an object is deleted and later re-added, the new object will have empty meta data (meta data is not copied from the last live version of the object).
  • Meta data changes independently for all physical objects in the lifeline for a given logical object.  This means you can modify the meta data for a given object without affecting the meta data for any other object currently in the same lifeline. 
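The meta data rules can be modeled in a few lines of Python.  In this hypothetical sketch, each physical Version carries its own meta dict.  On update, the dict is copied forward with dict(), and because keys and values are strings, the copy is fully independent of the old one.  On delete followed by re-add, the new version starts empty.  The class and method names are illustrative assumptions.

```python
import math
from dataclasses import dataclass, field

INF = math.inf


@dataclass
class Version:
    data: dict
    life_start: int
    life_end: float = INF
    meta: dict = field(default_factory=dict)  # not versioned: owned by this physical copy


class LogicalObject:
    def __init__(self):
        self.versions = []

    def latest(self):
        if self.versions and self.versions[-1].life_end == INF:
            return self.versions[-1]
        return None

    def add(self, data: dict, t: int):
        # A new (or re-added) object always starts with empty meta data.
        if self.latest() is not None:
            raise ValueError("object is already live")
        self.versions.append(Version(dict(data), t))

    def update(self, data: dict, t: int):
        old = self.latest()
        if old is None:
            raise ValueError("object is not live")
        old.life_end = t
        # Copy meta data forward; dict() gives the new version its own independent copy.
        self.versions.append(Version(dict(data), t, meta=dict(old.meta)))

    def delete(self, t: int):
        old = self.latest()
        if old is None:
            raise ValueError("object is not live")
        old.life_end = t


obj = LogicalObject()
obj.add({"quantity": 10}, 100)
obj.versions[0].meta["purpose"] = "fuel"
obj.update({"quantity": 7}, 200)              # meta data copied forward
obj.versions[1].meta["purpose"] = "reserve"   # does not affect version 0
obj.delete(300)
obj.add({"quantity": 7}, 400)                 # re-added: empty meta data
print([v.meta for v in obj.versions])         # [{'purpose': 'fuel'}, {'purpose': 'reserve'}, {}]
```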


API Changes


There are two basic changes to the EveKit client APIs:

  1. Every API call for synced data now includes an optional "at" parameter.  If you specify this parameter, it should contain the time stamp (in milliseconds UTC since the epoch) at which you want to view your data.
  2. Every API result now includes two new fields:
    • lifeStart: the time stamp (in milliseconds UTC since the epoch) at which the returned data became live, or 0 if this data was live when the site was switched over (see below).
    • lifeEnd: the time stamp (in milliseconds UTC since the epoch) at which the returned data was deleted or replaced by a more recent version, or -1 if this data represents the latest live version.
If you omit the "at" parameter, EveKit will return the latest live version of the requested data.  An API call will either return valid data or return an error code if no data was live at the specified time.
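On the client side, the main chores are producing millisecond UTC time stamps for "at" and decoding the sentinel values in lifeStart and lifeEnd.  Below is a small Python sketch using only the standard library.  The helper names at_param and live_interval, and the params dictionary, are hypothetical; how you attach the parameter to a request depends on your HTTP client.

```python
import math
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def at_param(when: datetime) -> int:
    """Milliseconds UTC since the epoch, the unit the "at" parameter expects."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    # Integer timedelta division avoids float rounding on the millisecond value.
    return (when - EPOCH) // timedelta(milliseconds=1)


def live_interval(life_start: int, life_end: int):
    """Turn the lifeStart/lifeEnd fields of a result into a half-open interval.

    lifeStart 0 means the data was live when the site was switched over;
    lifeEnd -1 marks the latest live version, mapped here to math.inf.
    """
    return life_start, (math.inf if life_end == -1 else life_end)


# Hypothetical query parameters: include "at" to look back, omit it for the latest data.
params = {"at": at_param(datetime(2015, 7, 4, tzinfo=timezone.utc))}
print(params)                  # {'at': 1435968000000}
print(live_interval(0, -1))    # (0, inf)
```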

For API calls which return a single data item (e.g. a single asset), lifeStart and lifeEnd are a direct view of the lifeline of the returned version of the object.  For API calls which return multiple data items (e.g. a list of all assets), lifeStart and lifeEnd are bounds:
  • lifeStart = max(lifeStart) for all items returned by the call
  • lifeEnd = min(lifeEnd) for all items returned by the call (treating a lifeEnd of -1 as +infinity)
The individual items within a returned list may have different individual lifelines, but every item in the result is guaranteed to be live throughout the interval [lifeStart, lifeEnd).
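The bound computation for list results can be reproduced client-side with the sketch below.  list_bounds is a hypothetical helper, and the per-item pairs use the API encoding described earlier.  The one subtlety is lifeEnd = -1: taking a raw min() would always yield -1, so the sketch maps it to infinity first.

```python
import math


def list_bounds(items):
    """Combine per-item (lifeStart, lifeEnd) pairs into the bounds for a list result.

    Each pair uses the API encoding, where lifeEnd == -1 means "latest live version",
    so -1 is treated as +infinity before taking the minimum.
    """
    items = list(items)
    if not items:
        raise ValueError("no items")
    start = max(s for s, _ in items)
    end = min(math.inf if e == -1 else e for _, e in items)
    return start, (-1 if end == math.inf else end)


print(list_bounds([(0, -1), (100, -1), (50, 300)]))   # (100, 300)
print(list_bounds([(0, -1), (20, -1)]))                # (20, -1)
```

Because each item's own interval contains [max lifeStart, min lifeEnd), every item is live at any time inside the returned bounds.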

A Special Note About Meta Data


As described in the technical details above, EveKit meta data is not actually versioned.  So there are no API changes for meta data, and meta data calls do not take an "at" parameter.  The lifeStart and lifeEnd values in a meta data response can be safely ignored.


How the Rollout Will Work


The rollout is a multi-step process, all occurring during downtime:

  1. The data sync scheduler is disabled.  No new API data will be synchronized during downtime.  You can continue to make API calls against EveKit, but the results will be unpredictable.
  2. All current sync data is backed up.
  3. All current sync data is converted to Wayback Machine mode.  Each current data item will be assigned a lifeStart of 0 and a lifeEnd of +infinity.
  4. After a few basic tests, the data sync scheduler will be re-enabled and downtime will end.
All current data is therefore considered live at the time of conversion.  The conversion time also places a natural lower bound on the "at" parameter: a query before the time of conversion will always return all the data which was live at conversion time.  As new data is downloaded from the XML API, current data will be versioned as described above.
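A short Python sketch shows why conversion acts as a lower bound on "at".  This is a hypothetical model: convert and at are illustrative helpers, and CONVERSION_TIME is a made-up time stamp.  Every pre-existing row becomes a single version live over [0, +infinity).  As a result, any query earlier than the conversion still finds the conversion-time data, while versions created by later syncs take over from their own lifeStart.

```python
import math
from dataclasses import dataclass

INF = math.inf


@dataclass
class Version:
    data: dict
    life_start: int
    life_end: float = INF


def convert(current_rows):
    """Give every pre-existing row a one-version lifeline over [0, +infinity)."""
    return [[Version(dict(row), 0)] for row in current_rows]


def at(history, t):
    # Return the data of the version live at t, or None if nothing was live.
    for v in history:
        if v.life_start <= t < v.life_end:
            return v.data
    return None


CONVERSION_TIME = 1_000          # hypothetical conversion time stamp (ms)
(history,) = convert([{"quantity": 10}])

# A later sync (after conversion) updates the object.
history[-1].life_end = 2_000
history.append(Version({"quantity": 7}, 2_000))

print(at(history, CONVERSION_TIME - 500))   # {'quantity': 10}
print(at(history, 2_500))                   # {'quantity': 7}
```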

Final Thoughts


We're very excited about this change, but we're also a bit nervous about any change that touches everyone's data.  So we'll be watching the Beta site pretty carefully over the next week before we decide to make the change to the main site.  We'll continue to post updates to the forums, this blog and the EveKit main site as we get closer to releasing on the main site.
