
Elad Blog: Facebook Must Really Suck At Machine Learning

Facebook recently claimed it is hard[1] to differentiate between fake news[2] and real news. Given how similar fake news detection is to related problems such as search index spam, ads landing page spam, social networking bots, and porn detection, this suggests one of two things: (1) Facebook really sucks at machine learning or (2) Facebook does not want to address the problem. Let's look at each of these:

1. Facebook Sucks At Machine Learning?

Over the course of my career I worked on, amongst other things, Google mobile products (including the mobile search index and looking at items like porting Google News to mobile), Google ads targeting to pages across the web, and Twitter Search (I was Director of Search Product for a time). At both Google and Twitter, the companies had to deal with a large number of ambiguous signals, including:

In all cases, the important thing to do was to understand the content of a tweet, web page, or other content unit, and then to rank the relative quality and importance of that content. Similar problems also exist in areas like Google web index spam and porn detection. In all cases, there are a lot of shades of grey - i.e. there is a fine line between porn and not-porn, or a spammy tweet and a silly or satiric tweet.

Facebook has developed a number of technologies to rank its news feed, to target ads, and to classify its users. However, the claim from Facebook has been that fake news is a complex area, and this complexity makes it difficult to address.

Intriguingly, a group of undergrads at Princeton were able to build a quick and dirty fake news classifier during a 36 hour hackathon. It is possible these Princeton students are a set of once-in-a-generation geniuses. Or, perhaps, fake news is actually tractable as a problem using existing techniques Facebook already has in house.
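To make "existing techniques" concrete, here is a minimal sketch of the kind of text classifier used for spam-style problems: a naive Bayes model with a confidence threshold, so only the most egregious items are flagged and the grey area goes to review. This is an illustration under stated assumptions, not the Princeton students' classifier and not anything Facebook runs. The headlines are invented toy data, and the names `NaiveBayes`, `triage`, and the 0.9 threshold are hypothetical choices.

```python
import math
from collections import Counter


def tokenize(text):
    # Crude whitespace tokenizer; a real system would do far more.
    return text.lower().split()


class NaiveBayes:
    """Two-class multinomial naive Bayes with Laplace smoothing."""

    def __init__(self):
        self.word_counts = {"fake": Counter(), "real": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def prob_fake(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                # Add-one smoothing so unseen words do not zero out a class.
                score += math.log((counts[tok] + 1) / (total_words + len(self.vocab)))
            log_scores[label] = score
        # Normalize the two log scores into a probability for "fake".
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["fake"] / (weights["fake"] + weights["real"])


def triage(model, text, threshold=0.9):
    # Act automatically only on confident calls; everything else is grey area.
    p = model.prob_fake(text)
    if p >= threshold:
        return "flag"
    if p <= 1 - threshold:
        return "pass"
    return "review"


# Invented toy headlines, for illustration only.
model = NaiveBayes()
model.train("shocking miracle cure doctors hate", "fake")
model.train("celebrity secretly endorses candidate shocking", "fake")
model.train("city council approves budget", "real")
model.train("central bank holds interest rates", "real")

for headline in ["shocking miracle cure", "city council approves budget", "shocking budget"]:
    print(headline, "->", triage(model, headline))
```

The threshold is the design choice that matters here: raising it trades coverage for precision, which is how a system can remove the obvious cases first without having to rule on every ambiguous story.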

2. Facebook Does Not Want To Address The Problem?

Facebook's CEO recently posted that 99% of news posts on Facebook are not fake (see below for chart of Facebook user engagement with fake versus non-fake news). Facebook has also been under fire by the right for its "liberal bias". This prompted Facebook to hold a meeting with leading conservative members of the GOP to discuss its newsfeed and how to play a non-partisan role in content.

Fake news is not a partisan issue. It is about ensuring that people are helped to understand what is real and what are lies. A lack of willingness to tackle the issue of fake news is a willingness to accept a lack of truth in our society at mass scale.

Other Companies You Can Work At Instead Of Facebook

Great engineers want to work with other great engineers. If Facebook lacks the talent to address the fake news problem, do you really want to join an organization so poor at machine learning? Alternatively, if Facebook simply lacks the will to address this issue, it might be something worth taking into account as well. A number of talented engineers are also immigrants - a group much maligned in fake news posts. If you are a talented machine learning or AI engineer, there are a number of companies you can work at instead of Facebook. Some potential ideas:

If you work on machine learning or data science and want to work somewhere other than Facebook, feel free to drop me a line. I am happy to refer you to a few dozen companies as alternatives.

Notes

[1] Exact quote from Zuckerberg is:

"This is an area where I believe we must proceed very carefully though. Identifying the "truth" is complicated. While some hoaxes can be completely debunked, a greater amount of content, including from mainstream sources, often gets the basic idea right but some details wrong or omitted. An even greater volume of stories express an opinion that many will disagree with and flag as incorrect even when factual. I am confident we can find ways for our community to tell us what content is most meaningful, but I believe we must be extremely cautious about becoming arbiters of truth ourselves."

This "grey area" argument is made all the time. Yet machine learning classifiers work incredibly well for porn and other areas that have lots of grey. Similarly, getting rid of the easy-to-spot 80%, the most egregious stuff, is a good starting point. This argument strikes me as a red herring.

[2] "Fake news" is a nice way to say lies and propaganda.
