We didn’t set out to build a product for humans; in fact, we assumed humans were already good at this type of task and wouldn’t need such a tool. The original intent was to help our AI understand which articles were credible, and therefore could be used for learning, and which were fake and shouldn’t be used.

Five years ago, for the 2012 election, Unpartial launched as a way to detect lean in political news. At the time, the big problem was that news wasn’t fair and unbiased: so much “slant” was being inserted that a reader couldn’t trust that an article hadn’t been so one-sided as to become a lie of omission.

Today, Fake News is so prevalent that the US Government is investigating Facebook and Twitter for their role in spreading it, and the UK Government is looking at sanctions against Facebook and Twitter for their role in promoting Fake News that influenced the EU Referendum.

But it isn’t limited to politics. Fake News creates strange and dangerous health trends, ranging from anti-vaxxers to people who ditch science-based cancer treatments for home remedies. Fake News is also responsible for a surge in flat-earthers.

Building the “gut instinct” for an AI to detect fake news requires a very powerful set of Natural Language Processing tools and a technology called an epistemology. (See my other LinkedIn articles for more on epistemology.) One of the challenges is that you have a chicken-and-egg problem: epistemologies build relationships between words and establish traits on the noun entities they contain, and they do this by mining the internet. While using “trusted sources” can limit the wrong information the epistemology contains, there is a lot of information that never appears in trusted sources, because it is assumed the reader knows so much about the basic world that it isn’t worth mentioning. The phrase “mules have horizontal pupils” doesn’t ever appear on the internet, but an AI may need to know that it is true. If a fact doesn’t appear anywhere on the internet, consider how unlikely it is to appear in a trusted source like an encyclopedia. Facts also change, so they need to be updated over time: new leaders are elected, populations change, and the understanding of medicine evolves. All of this new information takes a very long time to make it into trusted sources, which is why a “Truthiness” analyzer is needed for an AI to be able to function.
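
To make the mule example concrete, here is a minimal sketch of how an epistemology might store traits on noun entities and infer facts that are never stated outright anywhere on the internet. The class, fields, and inheritance rule are my own illustration, not Recognant's actual schema.

```python
# Hypothetical sketch of an epistemology entry; not Recognant's real data model.
from dataclasses import dataclass, field

@dataclass
class NounEntity:
    name: str
    is_a: list = field(default_factory=list)      # parent concepts ("mule" is_a "equine")
    traits: dict = field(default_factory=dict)     # trait name -> value

def lookup_trait(entity, trait, kb):
    """Return a trait, falling back to parent concepts when the fact
    was never stated explicitly for this entity."""
    if trait in entity.traits:
        return entity.traits[trait]
    for parent in entity.is_a:
        if parent in kb and trait in kb[parent].traits:
            return kb[parent].traits[trait]
    return None

kb = {"equine": NounEntity("equine", traits={"pupil_shape": "horizontal"})}
mule = NounEntity("mule", is_a=["equine"])

# "Mules have horizontal pupils" never appears verbatim online, but it can be inferred.
print(lookup_trait(mule, "pupil_shape", kb))  # horizontal
```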

Unpartial is a wrapper for the truthiness analyzer that Loki and Lobi, Recognant’s AI, use for their fact validation and gathering. Some changes were needed for humans, such as explaining why an article was considered fake news. To an AI, articles are mostly binary, trusted or untrusted, so when it reads an article it doesn’t process the whole thing if it appears to be untrustworthy. When analyzing for humans, the degree of untrustworthiness needs to be shared, so the entire article is processed.
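
Purely as an illustration of that split, the sketch below shows how one analyzer could serve both audiences: the AI path bails out as soon as an article looks untrustworthy, while the human-facing path reads everything and reports a degree. The function names, threshold, and scoring callback are assumptions, not the product's API.

```python
# Illustrative only; score_sentence is a stand-in for the real truthiness analyzer.

def ai_trust_gate(sentences, score_sentence, threshold=0.4):
    """AI path: binary decision, stop reading once the article looks untrustworthy."""
    worst = 1.0
    for s in sentences:
        worst = min(worst, score_sentence(s))
        if worst < threshold:
            return False          # untrusted -- don't ingest the rest
    return True                   # trusted -- safe to learn from

def human_report(sentences, score_sentence):
    """Human path: process the whole article and explain how untrustworthy it is."""
    scores = [score_sentence(s) for s in sentences]
    return {
        "overall": sum(scores) / len(scores),
        "worst_sentences": [s for _, s in sorted(zip(scores, sentences))[:3]],
    }
```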

One of the first things I expect readers to do after reading this article and installing the plugin is to check this article to see if it is trustworthy. It should score poorly. I am only relaying my own account, and I didn’t support any of the above assertions with quantitative data, so you shouldn’t really trust me. Fortunately, there are also very few knols of information in this article, so the AI isn’t really missing out.

Unpartial works by evaluating statements in an article against a small corpus of standard facts, what one might call “common sense,” but this is not the primary method used. Instead, the system looks at things like how much bias is in the article, how much effort is spent on persuading the reader to an opinion, and how well the author has written the article. “Alligators ain’t normally found in Antarctica” may be true, but because of the grammar issues in the statement, you probably wouldn’t treat the source as credible. “Third grade teacher discovers cure for cancer” is unlikely to be true, since that is not the primary goal of teachers. Neither of these approaches requires a huge corpus of facts, or at least not a complete corpus of facts.
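
Here is a toy version of those surface signals, run against the same two example sentences. The word list, the plausibility lookup, and the scores are invented for illustration; the real features are not public.

```python
# Toy surface signals; not the actual feature set.
import re

NONSTANDARD = {"ain't", "gonna", "irregardless"}   # tiny made-up list

def grammar_red_flags(sentence):
    """Count obviously non-standard tokens."""
    tokens = re.findall(r"[\w']+", sentence.lower())
    return sum(1 for t in tokens if t in NONSTANDARD)

def role_plausibility(actor, action, typical_actions):
    """Down-weight claims where the action isn't part of the actor's usual role."""
    return 1.0 if action in typical_actions.get(actor, set()) else 0.2

print(grammar_red_flags("Alligators ain't normally found in Antarctica"))        # 1
print(role_plausibility("third grade teacher", "discovers cure for cancer",
                        {"third grade teacher": {"teaches reading"}}))            # 0.2
```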

Unpartial can do fact validation, but it is expensive, so it wasn’t put into the public offering. Verifying that 14 million Americans did X requires the system to read all of the internet and then coordinate the facts, and deal with some level of rounding: an article that says 14 million may be based on data that says 14,376,047, or 14.3 million, or really even 13.7 million. It is not enough to simply search the internet for the exact fact. All of the internet must be indexed by the fact extractor, and then the “best” version of the fact has to be identified. Unpartial does this for offline documents and fixed corpus/domain searches, but for all of the internet it just isn’t affordable at current levels of traffic.
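
The rounding step is the easiest part to show. A minimal sketch, assuming a simple relative tolerance (the actual matching logic is not documented):

```python
def matches_reported(reported, candidates, rel_tol=0.05):
    """True if any figure found elsewhere is within rel_tol of the reported figure."""
    return any(abs(c - reported) / reported <= rel_tol for c in candidates)

reported = 14_000_000                                   # "14 million Americans did X"
found = [14_376_047, 14_300_000, 13_700_000]            # figures the extractor turned up
print(matches_reported(reported, found))                # True: 14,376,047 is within ~2.7%
```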

What is Unpartial?

Unpartial is an AI-powered article evaluation tool. One of the hardest challenges for an AI trying to learn about the world around it is telling how much it should trust what it is reading. Simply having access to all of the internet for fact finding isn’t enough; a system needs to be able to reconcile conflicting information. Unpartial wasn’t built for humans, but it works well, and so the robot overlords thought we should share.

What isn’t Unpartial?

Fact checking. While fake news can be well written and seem highly credible, 99% of the time it isn’t. While the Recognant AI that powers Unpartial can do fact validation, it is expensive, and 99% of the time it isn’t needed for flagging suspect news. One of the primary reasons this is true is that the goal of fake news is to convince the reader of something. To do that, the fake news is authored with a strong bias towards its position. Good journalism lets the facts speak for themselves. This doesn’t mean that a biased article has false information, but an article that is one-sided is false by omission.

How does it work?

Unpartial uses part of Recognant’s AI to evaluate articles. The evaluation looks at a lot of factors in how the article was written. This includes things like how grammatically correct the article is, how biased it is towards a position, how factually dense it is, and if the article contains subjective statements. There are other factors, but these are some of the primary ones.
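
To make that concrete, here is a hypothetical way those per-article signals could roll up into a single trust score. The factor names, weights, and numbers are invented for illustration and are not the actual model.

```python
# Hypothetical factor weights; the real model and weights are not public.
WEIGHTS = {
    "grammar":      0.25,   # how grammatically correct the article is
    "low_bias":     0.30,   # 1 - bias toward a position
    "fact_density": 0.30,   # facts per sentence, normalized to [0, 1]
    "objectivity":  0.15,   # 1 - share of subjective statements
}

def trust_score(signals):
    """signals: dict mapping factor name to a value in [0, 1]."""
    return sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS)

print(trust_score({"grammar": 0.9, "low_bias": 0.4,
                   "fact_density": 0.5, "objectivity": 0.6}))   # 0.585
```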

What do the results mean?

The system will return several possible results based on what it has determined about an article:

  • Suspect Source: While this article may or may not be true, this author or site has a very, very high number of fake articles.
  • Likely Satire, Parody, or Sarcastic: The tone of this article seems to imply sarcasm or parody. As such, even if some of the sentiment is true, some of the statements may be false or exaggerated to be funny or satirical.
  • Click Bait: The article uses a headline that is designed to be bombastic or draw viewers. This often means it is misleading. The contents of the article may or may not be true, but it should be viewed with skepticism.
  • Opinionated/Biased: The author shows substantial bias towards swaying the reader to a particular view. While statements may be true individually, the result is often false by omission.
  • Author fails to be definitive: The author hedges statements such that they aren’t definitive. This could be because the facts are unknown, or there is speculation. It can also be the result of the story being about future events that may not happen.
  • Limited supporting facts: There aren’t enough facts to support the position of the article, or the article is not a factual report. (It may be an editorial or a press release.)

Based on these and other factors, the system assigns a trustworthiness rating (a toy sketch of such a mapping follows the list):

  • Seems Legit: You can likely cite this as a source.
  • Consider a more reputable source: While there is no strong indication that this is not trustworthy, it doesn’t appear to be a source you would want to cite. Likely there is a better source for any information contained in this article.
  • Seems Sketchy: Several red flags make this article untrustworthy.
  • Super Shady: There are quite a few red flags that make this article untrustworthy.
  • Fake News: This article contains so many red flags the system believes there is no chance the article is true.
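
For illustration only, here is one way a trust score like the one sketched earlier, together with the red-flag categories above, could map onto these five ratings. The thresholds and the penalty per red flag are made up; the real cut-offs are not documented.

```python
# Made-up thresholds; shown only to illustrate score -> label mapping.
LABELS = [
    (0.80, "Seems Legit"),
    (0.60, "Consider a more reputable source"),
    (0.40, "Seems Sketchy"),
    (0.20, "Super Shady"),
]

def label_for(score, red_flags):
    """Map a [0, 1] trust score and a count of red flags to a rating."""
    adjusted = max(0.0, score - 0.1 * red_flags)   # each red flag costs 0.1
    for cutoff, label in LABELS:
        if adjusted >= cutoff:
            return label
    return "Fake News"

print(label_for(0.585, red_flags=1))   # Seems Sketchy (under these invented numbers)
```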

Is this a Neural Network?

No. To make this work with a Neural Network, the network would need to learn English, learn to tell subjective from objective statements, and learn all of the rules of grammar. That’s far beyond what Neural Networks can do, and even if they could, the cost to operate a network of that complexity would be astronomical.

Is it ever wrong?

Yes. Just as humans are fooled by all sorts of things, the AI can be fooled as well. Taking a well-written article about a stock’s earnings report and doubling all of the numbers would fool the AI. Just as a human has no way to know the numbers in advance, the AI doesn’t either. The full system can verify that a fact has been reported more than once and by how reputable a source, but the version made available here doesn’t, because of the cost associated with doing so.
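
As a rough sketch of that corroboration step, the snippet below totals independent reports of a figure, weighted by a made-up reputation table; the domains, weights, and tolerance are all assumptions.

```python
# Invented reputation table for illustration.
SOURCE_REPUTATION = {"wire-service.example.com": 0.9, "random-blog.example.com": 0.2}

def corroboration(fact_value, sightings, rel_tol=0.05):
    """sightings: (source, value) pairs found by the fact extractor.
    Returns the total reputation weight of sources that agree with the fact."""
    weight = 0.0
    for source, value in sightings:
        if abs(value - fact_value) / fact_value <= rel_tol:
            weight += SOURCE_REPUTATION.get(source, 0.1)
    return weight

# A doubled figure finds no agreeing sources, so it contributes nothing.
print(corroboration(14_000_000, [("wire-service.example.com", 14_376_047),
                                 ("random-blog.example.com", 28_000_000)]))   # 0.9
```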

Why is this better than crowdsourcing?

Crowdsourcing is a democracy, and democracies can be bought. Our AI won’t change its vote for cash. It can also make a decision in a matter of seconds, whereas with crowdsourcing, if none of the crowd has visited an article, it won’t have a score. (A human can’t read an article as fast as our AI, either.)

Do you use the domain for determining if an article is fake?

No. While we feel this might increase the accuracy, we also feel it would be cheating. Many questionable sites carry real news from AP and other syndicators, and those stories are often true. While we don’t want to push traffic to these sites or lend them credibility, we decided that a per-article score was the best option.

Do you keep track of which sites are the most trustworthy?

We will. There are some challenges: to do this, we’d need to either analyze all of a site’s past articles, or pick a day and analyze everything it publishes going forward. Also, because some fake news sites get no traffic to their legitimate stories and lots to their fake ones, a simple percentage of stories doesn’t paint an accurate picture. As we start to get data from users, we can use popularity to better assess how trustworthy a site is.
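
A quick sketch, with made-up numbers, of why a plain percentage misleads: weighting each article's score by its views gives a picture closer to what readers actually encounter.

```python
# Made-up example: one obscure legitimate story, one viral fake story.
articles = [
    {"score": 0.9, "views": 100},      # syndicated wire story, little traffic
    {"score": 0.1, "views": 50_000},   # viral fake story
]

unweighted = sum(a["score"] for a in articles) / len(articles)
weighted = (sum(a["score"] * a["views"] for a in articles)
            / sum(a["views"] for a in articles))

print(round(unweighted, 2))   # 0.5  -- the site looks half-trustworthy
print(round(weighted, 2))     # 0.1  -- what readers actually see is mostly fake
```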

Can I license the AI as an API?

Most of the technology for this product is built into the Recognant API available through Mashape/Rapid API. If you want the full API, we can make that available, and if you want the version with fact checking, that is available as well.