Should have blogged about this last week, but other demands on my time prevailed.
There’s an article on TechCrunch (brought to my attention by my colleague Justin) about the launch of Swivel, which its founders, Dmitry Dimov and Brian Mulloy, describe as the “YouTube of data”. What they mean by this is that they’ve created a place where users can upload interesting data sets and then plot them against data sets from other users to look for correlations, such as the interesting one below:
Unfortunately, I don’t have much interesting data to upload (and the interesting data I do have is confidential), so I wasn’t able to try this with my own data. Apparently, when the site launches you will be able to upload data and keep it private, though I don’t know how many people will be happy to trust their precious data to a relatively unknown third party (not to mention the legal aspects).
If Swivel can overcome this obstacle, however (and they need to, since charging for private data is apparently their main revenue source), then they could be onto something. They’re building out significant data center capability to perform correlations behind the scenes and suggest data sets that you might want to compare. But it will be interesting to see whether the correlations they come up with are anything more than the ‘happy coincidence’ variety (for example, the rising plot of oil prices in the chart above could appear to correlate nicely with the usage of World of Warcraft, if you’re careful to pick the right range, etc.). So perhaps Swivel should have a little tutorial on their home page explaining that correlation does not imply causation.
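To make the ‘happy coincidence’ point concrete, here is a minimal sketch in Python. The numbers are invented purely for illustration (they are not real oil prices or World of Warcraft figures), and the `pearson` and `differences` helpers are my own, not anything Swivel provides. It shows that two series that merely both rise over time produce a very high correlation coefficient, while the correlation between their period-to-period changes can look completely different.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def differences(xs):
    """Change from each period to the next, which strips out the shared trend."""
    return [b - a for a, b in zip(xs, xs[1:])]

# Invented, illustrative values only: both series simply trend upward.
oil_price = [30, 33, 35, 40, 42, 47, 50, 56, 58, 65]
wow_subscribers = [1.0, 1.6, 2.0, 2.4, 3.1, 3.5, 4.2, 4.6, 5.3, 5.7]

raw_r = pearson(oil_price, wow_subscribers)
diff_r = pearson(differences(oil_price), differences(wow_subscribers))

print(f"correlation of raw series:      {raw_r:.2f}")
print(f"correlation of period changes:  {diff_r:.2f}")
```

The raw series correlate almost perfectly simply because both go up. Once the shared trend is removed, the relationship in this made-up example falls apart, which is one quick sanity check before reading anything into a chart like this.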
The site’s other challenge is the cleanliness of the data. Even when I tried to compare date-based data sets, the site choked several times (doubtless these are problems that the team is working out). But there is a larger issue: the ‘standardization’ of axes or segments. Date is (relatively) easy, since you can make some assumptions about the date range that a particular data point relates to, but other ranges/segments are harder, such as:
- Country (problems with old vs new names, regions, etc.)
- Age (lots of data is grouped into age ranges, e.g. 16-24, 25-34, but these are not consistent)
- Income (same problem as above, plus currency fluctuations thrown into the mix)
And that’s just the axes/segments for humans – other entities like companies have their own characteristics which are not measured in a standard way, especially not internationally.
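As a rough illustration of the age-range problem, the sketch below re-buckets counts from one set of age bands into another. Everything here is hypothetical: the `rebucket` function, the band boundaries, and the counts are made up, and the approach leans on a strong assumption that ages are spread evenly within each source band, which real data rarely satisfies.

```python
def rebucket(counts, target_bands):
    """Estimate counts for target age bands from counts in source age bands.

    counts: dict mapping (low, high) inclusive age bands to a count.
    target_bands: list of (low, high) inclusive age bands.
    ASSUMPTION: ages are uniformly distributed within each source band.
    """
    result = {band: 0.0 for band in target_bands}
    for (src_low, src_high), count in counts.items():
        src_width = src_high - src_low + 1
        for tgt_low, tgt_high in target_bands:
            # Number of whole years shared by the source and target bands.
            overlap = min(src_high, tgt_high) - max(src_low, tgt_low) + 1
            if overlap > 0:
                result[(tgt_low, tgt_high)] += count * overlap / src_width
    return result

# Made-up survey using the 16-24 / 25-34 grouping.
survey_a = {(16, 24): 900, (25, 34): 1000}

# A different (also made-up) grouping that another data set might use.
common_bands = [(16, 19), (20, 29), (30, 34)]

estimate = rebucket(survey_a, common_bands)
for band, value in estimate.items():
    print(band, round(value))
```

The totals are preserved, but the split inside each band is an estimate rather than a measurement, so any correlation computed on re-bucketed data inherits that uncertainty. Country names and currencies would each need their own mapping tables on top of this.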
It’ll be interesting to come back to Swivel in a few months when there’s some more data in there (and when they have their private data service up and running). I wish them well.
Ian, I don’t think privacy is the point here. Isn’t this the YouTube of data? How many people post something on YouTube in an effort to keep it private?
The point is to get people to see your stuff. Your stock price. Your feed subscription numbers. Comments per post. EBITDA. Whatever.
Thanks for taking the time to write about Swivel. You have hit upon many of the issues we are working on to ensure Swivel is valuable for folks. Long way to go of course. Thanks for the encouragement too.
Brian Mulloy
CEO & Cofounder
http://www.swivel.com
Robbin, I think the “YouTube of data” moniker may be a little inappropriate (and Swivel only have themselves to blame here), since their only way of making money is to charge for uploading private data. As an aside, it’s enormously refreshing to see a Web 2.0 startup that isn’t dependent on Google AdSense revenues for its business model.
I imagine that the market that Swivel is trying to reach is corporate (or academic) data analysts who have a set of their own private data, but want to correlate that data against publicly available data. If you knew the age distribution of your customer base, for example, there’d be value in correlating that with where in the country/world they’re likely to be. But you’d want to be pretty sure that nobody got their hands on your private customer data in the process.
Actually the challenge may be getting sufficiently interesting public data sets that people can correlate against. But I imagine that Swivel has a team (ok, it’s probably just a guy in a cube) scouring the internet and academia for just such sources.