Tuesday, 1 October 2013

What Does Facebook Do With Your Data?

 
Picture by amateurartguy

The above is the all-seeing eye of Mark Zuckerberg, lord of Facebook (bear with me). Lord Zuckerberg's eye is all-seeing because his computers record every one of your actions in his realm (i.e. Facebook). Every preference you list, every page you like and every profile you view is stored on Lord Zuckerberg's computers. This treasure trove of information about Facebook's users is its Big Data. Now Lord Zuckerberg has promised his people that he will not look at their chats or messages. He doesn't need to. For example, his Zuckerbergness can accurately predict your sexuality from your Facebook activity. And if you've yet to reveal it publicly, then he can send you an ad for help with coming out.

Your Facebook likes provide a huge amount of data about you that can be used to make accurate and novel predictions. A recently published study found that researchers could accurately predict all sorts of things about people from their Facebook likes. The Zuck Man knows -

  • how you vote, 
  • your religion,  
  • whether you smoke and even
  • your I.Q.  
For some reason the aforementioned study also found a correlation between liking curly fries and having a high I.Q. Go figure. 

Facebook is a huge social network where over one billion people interact. The Big Data produced is vast and personal. What does Facebook do with your personal data? No surprises here: they're trying to make money out of it. Obviously advertisers would fall over themselves to get their grubby corporate mitts on all that data. Or not. Facebook is not making as much money as you'd expect. At least, not as much as investors wanted. Last year's initial public offering of Facebook on the stock market saw many investors lose a lot of money by vastly overestimating Facebook's value. Investors and advertisers are all starting to wonder: have they all been taken for zuckers?




Advertisers have had 72 years to perfect the TV ad (for the curious here's the first TV ad to be broadcast). They have very accurate methods for calculating the exact impact of these ads.  They can control an entire screen and a family's attention for 30 seconds of sound and images. They know how many sales an ad will garner and can therefore calculate whether a TV ad campaign is going to get a good return on their investment. What exactly is a Facebook post going to do for them?

Facebook's Advertising Business

Facebook has two goals: users must want to use Facebook more than any other social network, and advertisers must know that advertising on this social network will result in its users buying products. Facebook's big problem is the fundamental disconnect between what its users want to do on its network and what its advertising clients will pay for them to do. Users don't want to socialize in a sea of ads and advertisers are never gonna pay for posts that don't lead to sales. We have a conundrum.

In May last year General Motors' marketing team pulled their $10 million Facebook advertising budget. They publicly announced this just days before Facebook's launch on the stock market, bit of a dick move. GM's beefs were two in number. Beef the first was that they had no concrete proof that these ads were more effective than their free Facebook fan page. Beef the second: Facebook refuses to allow big obnoxious video ads. The Zuck's reasoning for laying down the law on a $10 million client is to ensure that users aren't turned off from Facebook. Remember Myspace? Exactly. Myspace hemorrhaged users because it got greedy and flooded the site with ads. Facebook is not going to let GM ruin its user experience. It is however trying desperately to show its advertisers exactly how effective its ads are, who they're reaching and what those people go on to do. And that is where your data comes in. Facebook is committed to using your data to allow advertisers to specify exactly who sees their ads, and to inform them how many people those ads eventually reached.

Facebook allows its advertisers to target an audience based on age, location, relationship status, sexuality, education level, what they 'like' and who they are connected to. As well as text, links, logos and small photos, Facebook also offers advertisers 'sponsored stories'. A sponsored story is paid for and then pops up in your news feed. If you like it then all of your friends will see it in their news feed. The more likes an ad gets, the more people will see it, and the marketing message's reach can increase exponentially.

It may be a little alarming that Marky-Z and co are doing their best to commercialize a space where you spend a lot of personal contact time with friends and family. However, the Facebook team realize that their only actual resource is their users. They've shown that they are not willing to risk becoming Myspace by turning down GM's insistence on video ads. And guess who has just come crawling back?

Why do your friends' Facebook pages always look odd?

There is no one Facebook. Pretty much every Facebook page in use today is an experiment of some kind. Mark's merry men (and of course worthy women) are constantly testing new and different features for the Facebook page live. Shades, placing, new features, fonts, in fact pretty much everything to do with the page, vary hugely for each user. Each change is a test and vast amounts of data are gathered to see how users react. One of the things Facebook uses your data for is to optimize its website - to increase the time you spend socializing and your propensity to buy what's being advertised. Loads of Internet companies do this. It's called A/B testing. Marissa Mayer, now CEO of Yahoo, claims that her team at Google once tested 41 shades of blue for the Google toolbar to see which was the most clickable. Rock and Roll... Interestingly, Facebook operates so many tests that it uses a programme called Gatekeeper to keep track of them all. Gatekeeper ensures that Facebook knows which bit of data is the result of which change.
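For the curious, here is roughly what the bookkeeping behind an A/B test can look like. This is a minimal sketch in Python and emphatically not how Gatekeeper itself works (I have no idea what is inside it). The function names, the hashing trick and the experiment name are all my own assumptions for illustration.

```python
import hashlib

def assign_variant(user_id, experiment, variants=("A", "B")):
    """Deterministically place a user into one variant of an experiment.

    Hashing the experiment name together with the user id means the same
    user always sees the same variant, with no lookup table to maintain.
    (An assumed approach for illustration, not Facebook's actual method.)
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

def click_rates(events):
    """Turn (variant, clicked) pairs into a click rate per variant."""
    shown, clicked = {}, {}
    for variant, did_click in events:
        shown[variant] = shown.get(variant, 0) + 1
        clicked[variant] = clicked.get(variant, 0) + int(did_click)
    return {variant: clicked[variant] / shown[variant] for variant in shown}

# Hypothetical experiment name and made-up click events.
variant = assign_variant("user-42", "blue-shade-test")
rates = click_rates([("A", True), ("A", False), ("B", True)])
```

The point of tagging every event with its variant is exactly the one made above: you always know which bit of data is the result of which change.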

Facebook has some fierce competition ahead. It needs to compete with traditional advertising to show advertisers that it has an effective advertising product. Twitter, Google+ and Google's Search Ads are all competing with Facebook for advertisers' web spend. And of course there is the example of Myspace. Someday soon there will be another new social network type thing that will try to steal the next generation of users from Facebook. Twitter's already doing its best. Facebook needs to compete with the rest of the internet to keep its users where they are and looking at its ads. Which means that Mark Zuckerberg is working for us, and will probably not do too many horrible things with our data. Disagree? Do you fear the Z-man? Tell me about it below!




Friday, 20 September 2013

How Netflix Turns Big Data Into New TV Shows

 
 This photo is the work of gildas_f.

This story starts with the data that Netflix gathers from its users. Netflix is a company that streams a huge variety of TV shows and films to anything you own with a screen and an internet connection, for £5.99 a month. When I talk about Netflix's 'Big Data' I mean the vast number of facts, mainly about its customers' viewing habits, that it stores on its computers. If you thirst for a deeper explanation of what Big Data is I recommend this utterly magnificent blog post here. Netflix is now using its customer data to inform its quest to create the perfect TV show!

So What Kind Of Data Does Netflix Store?

Now obviously it's not news that a company is using customer data to decide what TV show ideas to throw money at. What is news is the scale Netflix is doing it at. Netflix today has about 30 million subscribers. For the last 13 years they have all been watching, choosing, searching, 'favoriting', rating, switching off, fast forwarding, pausing, streaming on different devices and in different locations, following shows and getting bored half way through seasons. Netflix keeps records of all of this. This is Netflix's big data. In addition, all of Netflix's content has numerous 'tags' attached. These tags are small descriptions of genre, mood, the action, the actors etc. So Netflix has huge amounts of data on exactly how its 30 million subscribers like their TV served. And according to this data, what the people want is -

House of Cards

Image by Zennie Abraham

Set in Washington DC, it follows Representative Frank Underwood's (Kevin Spacey) backstabbing-strewn pursuit of revenge and political power. Netflix invested roughly $100 million for the first two seasons. Why were they so confident about their first big show? They looked to the data. It revealed a large overlap between users who had watched films by director David Fincher (like The Social Network) to the end, users who watched films featuring Kevin Spacey and users who watched the original British TV show House of Cards. There was a recipe for a hit TV show right in Netflix's data. According to that data, viewers prefer binge watching numerous episodes at a time to having them drip fed week after week. So Netflix released the entire season all at once. Netflix's canny use of viewer data gets cannier. They showed different trailers for House of Cards to different users depending on what they knew that user liked. Kevin Spacey fans saw trailers that featured mostly him. 'Thelma and Louise' watchers saw trailers highlighting the female characters. Film aficionados saw trailers that focused on the director. So what's the show like?





The House that big data built turned out to be a huge success. It got great reviews, a bucketful of Emmy nominations and made money. So Netflix are doing it again. 

Orange is The New Black
Authors of image are Netflix Company, Jenji Kohan (Producer) and Jordan Jacbos (Art Director)

A comedy set in a women's prison, Orange is the New Black is based on the autobiography written by Piper Kerman about her 15 months in a women's prison for a decade-old drug crime. The main story follows Piper's struggle to live with the inmates. Flashbacks tell the story of what brought the inmates here and why Piper committed her crime. Just as with House of Cards, the data told Netflix to make this show. A high proportion of Netflix users liked dark comedies, a good-natured female lead and plots involving prison or crime. Unlike House of Cards, however, Netflix didn't give Orange is The New Black a large marketing budget.

Netflix's Recommendations 

Netflix builds up a profile of the TV tastes of all its subscribers. This is used to generate recommendations of shows you might want to watch next. Netflix is getting pretty good at data driven recommendations - around 75% of its views come from its recommendation section. Orange Is The New Black didn't need marketing. Netflix knew which of its subscribers would want to see it and they knew that Netflix's recommendations are usually solid. Orange Is The New Black got more viewings in its first week than House of Cards or Netflix's flagship show 'Arrested Development'.
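To make the idea of a taste profile a bit more concrete, here is a toy sketch in Python. It is not Netflix's actual algorithm, which is far more sophisticated and not public in this form. It simply assumes that every show carries a set of tags, builds a profile by counting the tags of what a subscriber has watched, and ranks unwatched shows by how well their tags match. The show names, tags and function names are all made up for illustration.

```python
from collections import Counter

def taste_profile(watched, catalogue):
    """Count how often each tag appears across the shows a user has watched."""
    profile = Counter()
    for title in watched:
        profile.update(catalogue[title])
    return profile

def recommend(watched, catalogue, top_n=3):
    """Rank unwatched shows by how strongly their tags match the profile."""
    profile = taste_profile(watched, catalogue)
    scores = {
        title: sum(profile[tag] for tag in tags)
        for title, tags in catalogue.items()
        if title not in watched
    }
    # Highest score first; ties are broken alphabetically so results are stable.
    return sorted(scores, key=lambda title: (-scores[title], title))[:top_n]

# A made-up catalogue with made-up tags.
catalogue = {
    "Show A": {"political", "drama"},
    "Show B": {"prison", "dark comedy"},
    "Show C": {"political", "thriller"},
    "Show D": {"cartoon"},
}
suggestions = recommend(["Show A"], catalogue, top_n=2)
```

Someone who has only watched the political drama gets the political thriller suggested first, even though they never searched for it. That is the whole trick.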

The market is getting a lot more competitive for Netflix. It faces competent competitors like Lovefilm and Hulu. Netflix already operates on a very low margin and is facing increasing licensing costs for its content. There are going to be famous TV shows that Netflix subscribers are not going to have access to - Game of Thrones, The Sopranos, South Park etc. So Netflix is going to have to make the most out of the content it does have. You can get more value from a cheaper, unusual 'indie' TV show if you recommend it to people you know would like it, but who might not search for it themselves.

Netflix's strategy is to use its Big Data to find viewers for TV shows that don't exist yet, make them and then recommend them to exactly those subscribers who will love them. Netflix is going to need the customer loyalty that this will create because it will need to increase its subscription fees soon. It's an interesting time to be either a Netflix subscriber or investor. 




Friday, 23 August 2013

Big Data



The above symbolizes just how much data companies have to deal with today. A lot. They're drowning in it. Well, a few are; others are finding ingenious ways to spin data into revenue. The modern deluge of data is a mine of money - understanding it is key to understanding the modern business world, so read on.

Why Is Big Data Important?


This is a blog about the future of business, so first off, show me the money! $300 billion is the amount that the magic of Big Data could provide the US health industry. €250 billion could be conjured up for the public sector of EU countries, and a chunky 60% increase in profit is in store[1] for retailers who convene with the black magic of big data. All these Big Data predictions are produced by those splendid chaps at McKinsey, to illustrate how much money organizations could stand to make with Big Data techniques.  Whatever Big Data is, it’s evidently worth a lot of money.

Big Data isn't just about making existing business models more profitable; it has also generated fancy new ways of making money. There are companies that aim to use social media data to predict political upheaval. Risky and complicated investments are now feasible for companies to evaluate and thus to make. One example of a Big Data backed investment: a Seattle-based health insurer was able to invest in a voluntary wellness and health scheme for its employees and make money from it. The reduction in health claims alone saved the company more than the amount spent on the incentives to join. The company also estimates a gain of $1.5 million from increased productivity, based on its employees’ own estimates of the effect the programme had.

Far from a black future of snooping and omniscient organizations knowing all our personal information, I see a bright future with big data. Annoying ads will be extinct. Persistently plaguing me with images of how happy are those who consume the premium household surface cleaner is a waste of money for the advertiser and tedious for me. However, I wouldn't mind ogling the new smartphones; those I just might buy. If you have access to enough data about me, you just might be able to send ads my way that don’t make me pine for Soviet Russia. Or, what about a smartphone that knows your voice so well that it can tell when you’re upset, and notify your friends?

What Exactly Is Big Data?

Big Data are data sets that are too large for conventional storage hardware and analysis software to handle. Today that typically means between a dozen terabytes and a handful of petabytes. So far so unexciting: a larger spreadsheet doesn't sound revolutionary. Well, Big Data’s fundamental magic property is to allow us to see the future. Induction is a leap of logic that leads us from past observation to future prediction. The more past observations (i.e. data) we have, the more reliable our prediction is.[2] The more varied and voluminous our observations, the more nuance and detail we discover in the properties of things and the laws of nature. The bigger the data, the more reliable our predictions of the future. But what’s so special about this amount of bigness? Why am I bombarded with the Big Data buzz now?

The answer can be found in this paper on ‘The Unreasonable Effectiveness of Data’. In it, Googlers Peter Norvig, Alon Halevy and Fernando Pereira argue that Big Data has a special significance for the social sciences. Where physics and chemistry have had spectacular success with elegant mathematical formulae that precisely describe how a kind of fundamental physical event will happen, our attempts to ‘formalize’ the social sciences in the same way have all failed. The heart of wave mechanics is Schrödinger's one-line equation. A rule book of the grammar of the English language runs to over 1700 pages. Attempting to ape physics is totally the wrong approach, argue Google’s pioneers of machine translation. Attempting to teach the machine the precise rules for which expressions are grammatical or semantic equivalents isn’t going to work. Giving it vast amounts of data and simple prediction rules will make it a far more endearing conversationalist. To illustrate the scale of this shift, our Googlers tell of the excitement of dealing with the Brown Corpus in their undergrad years. The Brown Corpus is a set of over a million words – carefully tagged and catalogued. In 2006 Google released a trillion word corpus with frequency counts for all sequences of up to five words. This was messy data, but vastly more useful for deriving predictions.
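To show what 'frequency counts for word sequences' and 'simple prediction rules' might mean in practice, here is a tiny sketch in Python. It counts every sequence of up to five words in a text and then guesses the next word by picking the most frequent follower. The function names and the example sentence are my own, and the real corpus is about a trillion words bigger than this one.

```python
from collections import Counter

def ngram_counts(text, max_n=5):
    """Count every sequence of 1 to max_n consecutive words in the text."""
    words = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts

def predict_next(counts, word):
    """Guess the next word: the one that most often followed `word`."""
    followers = {
        gram[1]: count
        for gram, count in counts.items()
        if len(gram) == 2 and gram[0] == word
    }
    return max(followers, key=followers.get) if followers else None

counts = ngram_counts("the cat sat on the mat the cat ran")
guess = predict_next(counts, "the")
```

No grammar rules anywhere, just counting. With nine words the guesses are silly; with a trillion they start to look clever.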

All this is relevant for business because the domain of the social sciences is where money is made. Atoms don’t buy anything[3]. Imagine how businesses would act if they could predict with a simple one line equation exactly what percentage productivity increase would result from different management styles. Such an equation has not been found, but perhaps with vast amounts of social media data we have something just as good.

How Much Data Are We Talking Here?

The amount of data businesses are hoarding is vast and it’s increasing rapidly, and the rapidity with which it is increasing is increasing, rapidly. According to McKinsey the amount of global data produced will increase by 40% each year. John Gantz and David Reinsel predict in ‘The Digital Universe’ that the amount of data in the world is currently doubling every two years. 15 out of 17 sectors in the US have more data stored per company than the US Library of Congress: 235 terabytes. So where’s it all coming from? Social media is a big culprit. Every tweet, every photo, every video, every comment and every ‘like’ is being logged. The increased digitization of businesses is another. With computers being used whenever possible in business, every transaction and transportation is being logged and stored as data. Smartphones, now outselling dumb phones, are another fountain of data. Logging every user’s location ensures that there is a huge amount of data to be used. The internet of things, the increasingly large network of sensors built into the billions of devices we use, is yet another source of data sure to surge. Gartner predicts that we will see 50 billion machine voices added to the internet.
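As a quick sanity check on those growth figures, here is the arithmetic in a few lines of Python. The function names are mine; the 40% and the 235 terabytes come from the paragraph above. Compound growth of 40% a year works out to a doubling time of roughly 2.06 years, so the two predictions are telling much the same story.

```python
import math

def doubling_time(annual_growth):
    """Years needed to double at a given compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

def grow(amount, annual_growth, years):
    """Size of `amount` after compounding at `annual_growth` for `years`."""
    return amount * (1 + annual_growth) ** years

years_to_double = doubling_time(0.40)       # roughly 2.06 years
after_two_years = grow(235, 0.40, 2)        # 235 TB compounding at 40% a year
```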

Google's Software Magic

One of the big factors in today’s Big Data revolution is Google’s creation of the MapReduce data processing software, and Yahoo’s re-creation of this technique in the Apache Hadoop programme. Before Hadoop, one’s go-to data analysis solution was a relational database. Relational databases employ what is called a schema-on-write model. Before analyzing any of your data you have to create a schema. You then have to go through a load operation which takes your original data and converts it into that schema. Once your data is in this format, reading it is fast and simple. However, it can only be read in this format. You can only ask it questions that you built into the schema you loaded it with. Asking new questions requires loading the data with a different schema and thus costs money, quite a lot of money. Hadoop is a powerful alternative to relational databases and has complementary properties. Hadoop has a ‘schema-on-read’ model, meaning that it can load the data straight away with no waiting for any schema to be applied. Loading the data with Hadoop is far quicker; reading the data is slower.
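The difference between the two models is easier to see with a toy example. The Python sketch below uses neither Hadoop nor a real database. It is just plain lists and dictionaries standing in for both, and the log lines, field names and function names are invented for illustration.

```python
# Made-up viewing log: date, user, action, title.
RAW_LINES = [
    "2013-08-01,alice,play,Show A",
    "2013-08-01,bob,pause,Show B",
    "2013-08-02,alice,play,Show B",
]

def load_with_schema(lines):
    """Schema-on-write: parse once at load time and keep only the
    columns the schema names. Anything else is gone for good."""
    table = []
    for line in lines:
        _date, user, _action, title = line.split(",")
        table.append({"user": user, "title": title})
    return table

def parse_line(line):
    """Apply structure to one raw line at the moment it is read."""
    return dict(zip(("date", "user", "action", "title"), line.split(",")))

def query_on_read(lines, predicate):
    """Schema-on-read: keep the raw lines and parse them on every query.
    Slower to read, but any question can be asked later."""
    return [record for record in map(parse_line, lines) if predicate(record)]

table = load_with_schema(RAW_LINES)
pauses = query_on_read(RAW_LINES, lambda record: record["action"] == "pause")
```

The schema'd table is quick to read but has thrown the 'action' field away, so 'who pressed pause?' can no longer be asked of it without reloading. The raw lines can answer it straight away, at the cost of parsing everything again on each query.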

The really cool thing about Hadoop is that it functions by distributing the data and the analysis process across numerous different computers, vastly reducing the time necessary to finish the task. This is why far more data is available for cheap analysis today. The Hadoop Distributed File System works by chopping files up into blocks (64 MB by default) and then spreading those blocks across the computers in the cluster, replicating each block a number of times (the default is three). The operation is run across all these computers – carrying out the right operation on the right block at the right stage of the process. The system is built and optimized for putting files in, getting them out and deleting them, but not for updating existing files. These assumptions are what allow it to scale. The more computers you add to the cluster, the briefer the analysis process will be. Even cooler, the code you use for a handful of computers will work just as well on a hundred.
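Here is a toy version of that chop, spread and count routine in Python. It is a sketch of the idea only and makes some loud assumptions: the round-robin placement below is my own simplification (real HDFS placement is smarter about racks and machines), the block size is a few bytes rather than 64 MB, and every function and node name is made up.

```python
from collections import Counter
from functools import reduce

def split_into_blocks(data, block_size):
    """Chop a file's bytes into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, nodes, replication=3):
    """Give each block `replication` distinct nodes, round-robin.
    (A simplification for illustration, not the real HDFS policy.)"""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }

def map_block(block):
    """Map step: count the words in one block, independently of the rest."""
    return Counter(block.decode().split())

def reduce_counts(partials):
    """Reduce step: merge the per-block counts into one total."""
    return reduce(lambda a, b: a + b, partials, Counter())

# A fixed-size split can cut a word in half; this toy input is chosen so the
# block boundaries fall on spaces. Real systems deal with record boundaries.
blocks = split_into_blocks(b"a b a b ", block_size=4)
placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"])
totals = reduce_counts(map(map_block, blocks))
```

Notice that map_block never needs to know how many blocks or computers there are. That is why the same code works on a handful of machines or a hundred.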

Now that I’ve explained the basics, next Friday I will regale you with a list of the most fantastic uses of Big Data to be found in the corporate world.

If you've found anything I've not got quite right, or something I've missed that really ought to have been included, please leave a comment! Be warned I will take your silence as your testimony that I am perfect!






[1] forgive me
[2] Quite why this is so remains a mystery.
[3] Well not on their own they don’t.

The Aim Of This Blog



The aim of this blog is to efficiently explain the new technologies that are going to refashion the entire economy. For every techno buzz word there will be a post and for every post there will be a clear explanation of what this thing is, how you can make money from it and what kind of change it will bring. Terms like ‘The Internet of Things’, ‘Big Data’ and ‘Cloud Computing’ are thrown about like we’re being paid per buzz word and fined for any context or explanation included. However, if you understand these concepts you’re going to understand the directions your company, job and industry are going to be taking in the next decade. I’ll award myself internet points for each of my posts that is the clearest introduction to its topic on the internet.1

Another mission critical aim of this blog is to get you excited by these ideas. If the potential and all round clevercloggery of this stuff is boring you, you’ve not understood. Anywho, that’s gonna be my motto in writing this stuff. Understanding that Big Data just involves big databases is one thing. But unless you know why McKinsey reckons that using Big Data techniques could save the EU’s public sector more money than the GDP of Greece, annually, I have failed in my job of explaining the importance of this stuff.

A further aim is to give you plenty of links to further interesting stuff. I’ve benefited from loads of great pages, articles, blog posts, youtube lectures, interviews and speeches. And I’ve waded through rubbish to get at them. You dear reader will not have to do so. I’ll link you to the best of them on the relevant blog post.

When will such wonders be posted? Every week, at least! Our first stop will be an explanation of everything to do with Big Data. Next I’ll explain how Google’s magic was bottled and put into a piece of software called Hadoop. Then we’ll hit ‘The Internet of Things’, ‘Cloud Computing’, then mobile computing and then, who knows? We’ll see where the night takes us.



After I’ve introduced the basics of these concepts I can start writing about the possibilities that these new techniques open up to us, the new rules to the trillion dollar game of the free market that they introduce.
1 And I live for internet points. Yes I’m afraid this blog will include footnotes. A little stuffy and academic I know. It does however allow my blog posts to be far pithier than they would be otherwise. Plus it means I can pathetically ape my favourite author, David Foster Wallace. Stay tuned down here for stuff that wasn’t good enough to make the post proper!