I’m a data stream believer.
Let me explain what that means.
I see data arriving in a stream: I either process and use it, or I let it pass me by. I don’t bother storing it.
Storing data is a popular paradigm (the “data ocean”) but I believe that people often fail to really understand that the value of data isn’t fixed – it only really has value in context and time.
For example, if I tweet “hot cross buns going fast. come to the kitchen now”, that data point only has value for a short period of time: a few minutes if you care about your buns being hot, and a little longer if you’re not so bothered. In any case, in an hour there will be no hot cross buns to speak of and the value of that data point will have decayed to zero.
To take advantage of the data stream, I need to listen to it and watch out for key events, like those hot cross buns being available. To do that, I’ll either set up an alert for “hot cross buns” or make sure I follow the right Twitter account. In both cases I have thought ahead and set up a listener.
I am now acting as a data stream believer. I’ve decided what I’m looking out for and set myself up with an alert for when it occurs. When it occurs I’ll then take action.
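To show what setting up a listener might look like in practice, here is a minimal Python sketch. The names `listen` and `sample_stream` and the sample messages are my own illustrations, not any real Twitter or alerting API, and the matching is a plain case-insensitive substring check.

```python
from typing import Callable, Iterable, Iterator


def listen(stream: Iterable[str], keyword: str,
           on_match: Callable[[str], None]) -> int:
    """Read each message once; act on matches and let the rest pass by."""
    matches = 0
    for message in stream:
        if keyword in message.lower():
            on_match(message)  # take action as soon as the event occurs
            matches += 1
        # Non-matching messages are dropped here; nothing is stored.
    return matches


def sample_stream() -> Iterator[str]:
    """Hypothetical stand-in for a live feed of messages."""
    yield "meeting moved to 3pm"
    yield "Hot cross buns going fast. Come to the kitchen now"
    yield "printer is out of toner"


alerts = []
count = listen(sample_stream(), "hot cross buns", alerts.append)
```

The decision about what matters (the keyword) is made before the data arrives, and only the matching message is kept.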
To me, this “data stream believer” approach should be our dominant strategy when working with modern data – particularly big and social data.
The alternative data ocean strategy is a lot harder to make work. Data ocean thinking goes a bit like this – “I don’t know what data I really want right now but I am sure in the future it’ll be valuable. What we’ll do is store all the data and then when we need it we can query it. By storing everything we’ll never be caught short.”
I think there are a couple of issues with this train of thought:
- The total cost of data storage just keeps going up. Over time you’ll always need more storage as more data is created and stored. You’re not recouping the cost of storage anywhere so it’s just a black hole of cost. Admittedly the unit cost of storage is going down, but there’s still a cost somewhere.
- You aren’t using the data you do have. This is more important than cost – data ocean thinking leads to woolly thinking and sloppy processes. You’re storing something but you’re not doing anything with it. It’s just sitting there gathering virtual dust. Even if it’s just a cognitive cost (at the back of your mind you know that data is there), it’s still a waste of your attention.
Let’s make this real, with a consumer example of data stream versus data ocean thinking – digital photo storage.
Family 1 opts for a data ocean approach. Every photo they ever take is stored in the cloud. They never look at their old photos. Scanning through them is a boring chore as there are multiple copies of the same snap, a bunch of out-of-focus ones and some scanned expense receipts in there too.
Family 2 opts for a data stream approach. Every year they review the photos they took over the last year and choose 50 to go in that year’s family album. They get the family album printed, share it in the living room and never look at the cloud folder ever again.
Which family do you think is getting the most out of their data (in this case family photos)?
So we can see – a data stream believer takes the data stream and processes or discards it. Value is added immediately – and often it is pushed straight back into the stream.
This is the philosophy behind how Rise works – data points for each player are pulled from the stream, processed into scores and pushed back to the stream as a release of the leaderboard. This then feeds other people’s data stream activities: how did I do last week, and what should I optimise for next week?
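As a rough illustration of that pull, process and push-back loop, here is a small Python sketch. It is not Rise’s actual scoring: the function name `build_leaderboard`, the `(player, value)` event shape and the simple sum are all assumptions I’ve made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_leaderboard(events: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Consume (player, value) data points once and return a ranked release."""
    scores: Dict[str, float] = defaultdict(float)
    for player, value in events:
        # Only the running score is kept; the raw data point is discarded.
        scores[player] += value
    # Highest score first; this list is what gets pushed back to the stream.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data points for one week.
week_events = [("ana", 3.0), ("ben", 5.0), ("ana", 4.0)]
release = build_leaderboard(iter(week_events))
```

Each data point is read once and folded into a score, so nothing has to be stored beyond the leaderboard that goes back out.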
The data stream creates a faster, more vibrant, feedback loop and uses data well.
How about you? Are you ready to become a data stream believer?