Event sourcing: rainbows and unicorns
A not-so-short overview of what makes Event Sourcing a technique worth trying out.
After learning about Event Sourcing a little less than two years ago, I spent a lot of time absorbing as much information as possible about the topic and gaining first-hand experience by building a few systems using this approach. Now seems like a good time to summarize that knowledge and hopefully make the topic accessible to more people.
If you haven't heard about Event Sourcing before, you should stay and read this article. It'll give you an overview of the idea behind Event Sourcing (spoiler: it works like git), a more or less balanced opinion about why (spoiler: it solves a lot of common problems), and when you should use it (spoiler: whenever possible).
If you've already used Event Sourcing in practice, maybe with one of the big frameworks, you are welcome to stop reading now, though maybe you'll find a different take on how to structure things in this article.
A little bit of history
Event sourcing is based on a very simple idea: keeping the history around instead of only the current state of things. Most likely you have already worked with a relational database management system such as PostgreSQL or MySQL. Every time you run an UPDATE statement, you are actually throwing data away: whatever data was in the columns you just updated is gone forever after the UPDATE and cannot be retrieved.¹
Under the Event Sourcing model, you wouldn't execute an UPDATE statement (because that updates the current state, i.e. the current value in a row/column). Instead, you'd only be allowed to execute INSERT statements, recording each new change somewhere. After all, if you have a complete list of all the changes that have been recorded, it's easy to derive the current state of a given row in your database: you just take the latest version.
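As a minimal sketch of this idea, deriving the current value of a row from an insert-only change log could look like the following in Ruby. The field names and versioning scheme are invented for illustration:

```ruby
# An insert-only change log: instead of UPDATE, every change is a new record.
changes = [
  { row_id: 1, column: "email", value: "old@example.com",   version: 1 },
  { row_id: 1, column: "email", value: "new@example.com",   version: 2 },
  { row_id: 2, column: "email", value: "other@example.com", version: 1 }
]

# Current state of a row/column = the recorded value with the highest version.
def current_value(changes, row_id, column)
  latest = changes
    .select { |c| c[:row_id] == row_id && c[:column] == column }
    .max_by { |c| c[:version] }
  latest && latest[:value]
end

current_value(changes, 1, "email") # => "new@example.com"
```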
With a little bit of hand-waving at (most of) the details, we have a working definition for this article:
Event Sourcing means recording all changes to your data, instead of keeping only the current state of your data.
An event is a record of how your data changed. It maps to user intent, e.g "user logged in".
The classic example of a system that already works this way is your bank account: your bank does not keep track of only your current balance, but rather keeps a ledger of all transactions touching your account. The current balance is then derived by looking at all transactions, from the moment you opened your account to the moment you want to check your balance.
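A toy version of the ledger idea, with made-up transaction amounts — the balance is never stored, only derived from the history:

```ruby
# A ledger of transactions; deposits are positive, withdrawals negative.
transactions = [
  { amount: 1000 },  # deposit
  { amount: -250 },  # withdrawal
  { amount: 125 }    # deposit
]

# The current balance is derived by folding over the whole history.
balance = transactions.sum { |t| t[:amount] }
balance # => 875
```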
Now that we have a simple definition of what Event Sourcing is, we can look at some of the theoretical benefits:
- time-travel: since we have a full history of all the changes in an event-sourced system, we can inspect the state of the system at any prior point in time. A real-world example: git keeps a history of all the changes to your source code and thus allows you to look at your code from a year ago.
- business intelligence: more often than not, we're interested in how the system arrived at a specific state, not so much in what actual state it is in. Real world example: something is wrong with your application (= invalid/broken state), you check the logs (= record of state changes) to find out how the application ended up in the broken state.
- no mapping required: in an event-sourced system you don't need to map user intent to tables and rows; just log things as they happen. Real world example: posting to a forum with an image needn't be translated into changes across several tables, but can simply be logged² as what it is: "User X posted the following text into forum Y, file Z was attached".
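Time-travel in particular falls out of the model almost for free. A small sketch (event shapes and dates are invented): to inspect the state at an earlier point in time, replay only the events recorded up to that point:

```ruby
require "time"

events = [
  { at: Time.parse("2019-01-01"), type: "deposit",    amount: 100 },
  { at: Time.parse("2019-06-01"), type: "withdrawal", amount: 40 },
  { at: Time.parse("2020-01-01"), type: "deposit",    amount: 60 }
]

# Balance as of a given moment: ignore everything that happened afterwards.
def balance_at(events, moment)
  events
    .select { |e| e[:at] <= moment }
    .sum { |e| e[:type] == "deposit" ? e[:amount] : -e[:amount] }
end

balance_at(events, Time.parse("2019-07-01")) # => 60
```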
These benefits not only look really nice on paper, but can be a life saver in practice too.
Everything has a price, so unfortunately those benefits don't come for free:
- growing storage needs: never deleting data obviously means that your application will keep using more and more disk space. However, disk space is cheap.
- querying data is slow: if every state change in your application results in an event, you'll end up with a lot of events. Iterating over all of them every time you want to show some data to a user is obviously going to be slow when you have more than a handful of events to iterate over. The solution is to never iterate over the event log when displaying data. How this works is covered later in this article.
- nobody is familiar with this: true, only a few people are familiar with this approach. However, most developers have already been exposed to the concept through the use of version control systems.
So, Event Sourcing has downsides, but they are not very grave. In my experience the biggest hurdle seems to be the lack of familiarity, but "biggest" is relative, and in absolute terms this has been only a small hurdle: developers could be onboarded onto projects using this structure in one or two days.
...how do you query data?
The answer is simple: you don't! Instead of querying the event log for the data you need, you just expect the data to be prepared for you, in the shape you need.
But where does that data come from? It has to come from the event log of course, because that is the system of record, the source of truth for the application. The secret lies in inverting the relationship between the client needing data and the source of truth providing the data. Instead of the client asking for data when necessary (pull), we update secondary data structures every time a new event is added to the log (push).
Coming back to the bank account example: once calculated, there is no need to recalculate your account balance until a new transaction happens. Every time a new transaction is recorded, the balance is updated. When you log into your online banking system, you'll just find the balance precalculated for you.
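The push model can be sketched as a small read model object that is fed every new event as it is appended to the log; the class and event names here are illustrative, not a fixed API:

```ruby
# A precalculated balance, kept up to date by pushing each new event
# into it instead of recomputing from the full log on every read.
class BalanceViewModel
  attr_reader :balance

  def initialize
    @balance = 0
  end

  # Called once for every new event appended to the log.
  def apply(event)
    case event[:type]
    when "deposited" then @balance += event[:amount]
    when "withdrawn" then @balance -= event[:amount]
    end
  end
end

view = BalanceViewModel.new
[{ type: "deposited", amount: 500 },
 { type: "withdrawn", amount: 120 }].each { |e| view.apply(e) }
view.balance # => 380
```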
Having a separate set of data structures (let's call them view models) for serving queries has a surprising number of benefits:
- great performance becomes easy: since the source of truth is still the event log, we can design view models as we see fit. For example, it becomes feasible to have a separate view model for each screen in your application, holding exactly the data necessary for displaying that screen.
- flexibility: because view models are derived from the event log, they are expendable. Want to see whether Redis or memcached is faster for your particular workload? No problem! Build two view models, one with Redis, the other with memcached, and compare the results. Need a relational database for convenient ad-hoc queries? Another view model!
- robustness: nothing is lost if there is an error in the code building your view model. Fix the error, process all events again, and your view model looks as if there never was an error in the first place.
- decouples front- and backend: if you're working in Web development, supporting the data needs for many different frontends can be quite a challenge. By having cheap view models, frontend developers get the freedom to decide on the data they need (format, contents, etc) and can work with mock data until the backend developers have implemented a matching view model.
Writing data is another matter, and it is best approached in a structured way, lest you end up with all kinds of different data in your event log.
One way that is often described in articles on the topic is using Command objects to explicitly model the inputs your system accepts. One of the nice things about command objects is that they give actions a meaningful name (such as "LogUserIn" or "InviteToTeam") and map directly to what your users want to do with your application. This helps with focusing on the end users of your software and what they are trying to achieve.³
Besides helping with communicating intent, command objects also provide a natural place in the code for handling input parameter validation, cross-cutting concerns, and documentation. Being an explicitly modeled object, commands can be queued for later processing (e.g. in the client, when the client is offline).
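A command object might look roughly like this; the command name, fields, and validation rules are just examples:

```ruby
# A command: a named, validated representation of user intent.
class InviteToTeam
  attr_reader :team_id, :email

  def initialize(team_id:, email:)
    @team_id = team_id
    @email = email
  end

  def valid?
    errors.empty?
  end

  # Input parameter validation lives right on the command.
  def errors
    errs = []
    errs << "team_id missing" if team_id.nil?
    errs << "email looks invalid" unless email.to_s.include?("@")
    errs
  end
end

command = InviteToTeam.new(team_id: 42, email: "jane@example.com")
command.valid? # => true
```

Because the command is a plain object, it can also be serialized and queued for later processing, e.g. while a client is offline.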
When a command is submitted to the application, the application checks whether the command is valid or not. For example, a login attempt for a user that has never signed up should fail for obvious reasons. Note the phrase "a user that has never signed up" — this refers to the history / current state for a given user again.
The requirement to look at the current state of an entity (e.g. a user) when processing a command makes it very tempting to just look at a view model which contains the necessary information and make the decision about accepting or refusing the command. However, the contract between the event log and the view models is loose: a view model is considered discardable, but it cannot be discarded if the application needs it to process commands. This problem is addressed by introducing another component:
Aggregates maintain the necessary state for validating commands. This state is again derived from the event log, albeit with a little twist: the state is derived every time a command that needs the aggregate is executed. This ensures that modifications to the event log are validated against a consistent view of the event log.
It makes sense to scope aggregates narrowly, so that the number of events that pertain to an aggregate stays as small as possible. Consider the following example:
- you have 10 000 "user signed up" events in your event log
- a user wants to sign up
- you need to find out whether the email address the user wants to use for sign up is already taken or not
If your aggregate is "Users", i.e. the set of all users in your application, you need to iterate over 10 000 events to get an answer to the question of whether the new user's email address has already been taken.
Contrast this with modeling "User", i.e. a single user, as an aggregate. In this case you'd only need to look at the N (possibly 0) events for the user with the email address supplied in the "Sign up" command.
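A sketch of such a narrowly scoped aggregate (all names and event shapes are invented): it rebuilds its state only from the events that mention the email address in question:

```ruby
# An aggregate for a single user, identified by email address.
# Its state is derived by replaying events, not stored anywhere.
class UserAggregate
  def initialize(email)
    @email = email
    @signed_up = false
  end

  # Only events concerning this user's email affect the state.
  def apply(event)
    if event[:type] == "user_signed_up" && event[:email] == @email
      @signed_up = true
    end
  end

  def signed_up?
    @signed_up
  end
end

log = [
  { type: "user_signed_up", email: "a@example.com" },
  { type: "user_signed_up", email: "b@example.com" }
]

aggregate = UserAggregate.new("a@example.com")
log.each { |e| aggregate.apply(e) }
aggregate.signed_up? # => true
```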
Putting it all together
Aggregates are the final piece of the puzzle. With that piece in place, we can formulate the general mode of operation for an event-sourced application:
- read a command from a client
- find the aggregate(s) the command targets
- replay a subset of the event log through the aggregates
- execute the command
- persist the events resulting from the command in the event store
- return the result to the client (either an error or a set of events)
- notify view models about new events
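The steps above can be compressed into one illustrative sketch. Every class, event, and method name here is made up, and a real system would persist the log instead of keeping it in an array:

```ruby
# Aggregate: guards the invariant "one sign-up per email address".
class SignUpAggregate
  def initialize(email)
    @email = email
    @taken = false
  end

  def apply(event)
    @taken = true if event[:type] == "user_signed_up" && event[:email] == @email
  end

  def email_taken?
    @taken
  end
end

# View model: a precalculated count of all signed-up users.
class UserCountViewModel
  attr_reader :count

  def initialize
    @count = 0
  end

  def apply(event)
    @count += 1 if event[:type] == "user_signed_up"
  end
end

def handle_sign_up(event_log, view_model, command)
  # Steps 2-3: find the aggregate and replay the relevant events through it.
  aggregate = SignUpAggregate.new(command[:email])
  event_log.each { |e| aggregate.apply(e) }

  # Step 4: execute the command against the rebuilt state.
  return [:error, "email already taken"] if aggregate.email_taken?

  # Step 5: persist the resulting events in the event store.
  new_events = [{ type: "user_signed_up", email: command[:email] }]
  event_log.concat(new_events)

  # Step 7: notify view models about the new events (push).
  new_events.each { |e| view_model.apply(e) }

  # Step 6: return the result to the client.
  [:ok, new_events]
end

log  = []
view = UserCountViewModel.new

first  = handle_sign_up(log, view, { email: "a@example.com" })
second = handle_sign_up(log, view, { email: "a@example.com" })

first.first  # => :ok
second.first # => :error
view.count   # => 1
```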
And that's it! It looks surprisingly similar to how HTTP POST/PUT requests already work in a traditional N-tier architecture:
- accept a request from the client
- look up some information from the persistence layer (usually a relational database)
- run some domain logic using information from the request and from the database lookup
- make changes to the data in the persistence layer
- return result to client
While none of this is fundamentally new, it might take some time to get accustomed to all the different terms and the relationships between them. The best way to learn more is to gain your own experience. Play around with the idea and build a few toy projects; see whether you run into problems and whether this way of building things is more complex than what you are used to.
In a follow-up article we'll explore how to implement an event-sourced system in Ruby (without Rails) and how that affects testing and code clarity.
1: Technically, you'd still have that data in your database backups. Pause for a moment to think about how much time it would take you to look up the value of a specific row from two years ago (or from last month, for that matter; it doesn't make a difference).
2: The word "logged" here doesn't mean "written to your application's log file" but rather "written to an append-only data store".
3: Users don't think in relational database tables. They want to "log in", not "add a row to the sessions table". Developers are prone to thinking in terms of implementation only, which makes it easy to lose sight of the actual problem the software is trying to solve for the user.