Natural Language Processing and Understanding Applied to Search

Google focuses on Internet search because its algorithm doesn’t really work for internal documents; it relies on the sheer volume of documents to provide answers to questions. Essentially, it relies on the fact that everything is written on the Internet multiple times, so the various ways you might search for it are all likely to find something. This is not true of most corporate intranets, nor is it true of topics of research.

Often what we are searching for and what someone wrote are very different. Consider the example of a computer processor. Say you work at Intel and someone asks how fast the new Octium processor is. “Fast” could be the clock speed (6.2 GHz), the number of petaflops (1.7), or a multiple of the Pentium it replaced (9.5x). In the docs it is unlikely that the word “fast” appears anywhere near these metrics.

Natural Language Processing and Understanding solve this issue. When combing through the docs on the intranet or the Internet, the software indexes sentences that contain speeds, whether expressed in MHz, GHz, petaflops, teraflops, or as “9.5x the speed of”. There are dozens of ways to say something is faster.
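As a rough illustration of indexing sentences that mention a speed, here is a minimal Python sketch. It is not how any real search engine does it: the pattern list, the naive sentence splitter, and the function names (split_sentences, index_speed_sentences) are all assumptions made up for this example, and a real NLP system would recognize far more phrasings than a single regular expression can.

```python
import re

# Hypothetical pattern: a number followed by a speed unit, or "Nx the speed of".
# A real system would cover many more ways of expressing speed.
SPEED_PATTERN = re.compile(
    r"\b\d+(?:\.\d+)?\s*(?:MHz|GHz|teraflops|petaflops)\b"
    r"|\b\d+(?:\.\d+)?x\s+the\s+speed\s+of\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def index_speed_sentences(text):
    """Return (sentence, [matched measures]) for each sentence mentioning a speed."""
    results = []
    for sentence in split_sentences(text):
        measures = [m.group(0) for m in SPEED_PATTERN.finditer(sentence)]
        if measures:
            results.append((sentence, measures))
    return results


doc = (
    "The Octium runs at 6.2 GHz. It ships next year. "
    "It delivers 1.7 petaflops. It is 9.5x the speed of the Pentium."
)
for sentence, measures in index_speed_sentences(doc):
    print(measures, "<-", sentence)
```

A query such as “how fast is the Octium” could then be answered from this index of speed sentences, even though none of them contains the word “fast”.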

When results are returned, the relevant sentences can be highlighted so that the person searching can pick which measure is most appropriate for their needs. They don’t have to run a bunch of searches for things like “Octium GHz”, which isn’t “natural” at all.

This extends to non-technical searches as well. Consider asking “which presidents died in office.” To answer this question you have to look for presidents whose date of death matches the date their term ended. Finding this answer requires either parsing structured data or extracting structured data from documents, and then doing the calculations against it.
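Once the data is structured, the calculation itself is simple. The Python sketch below uses a small hand-entered sample of presidents; the record layout and the function names are assumptions for illustration only. It compares full dates rather than just years, because a year-only match would wrongly include James K. Polk, who left office in March 1849 and died that June.

```python
from datetime import date

# Small hand-entered sample, not a complete list of presidents.
PRESIDENTS = [
    {"name": "George Washington", "term_end": date(1797, 3, 4), "died": date(1799, 12, 14)},
    {"name": "James K. Polk", "term_end": date(1849, 3, 4), "died": date(1849, 6, 15)},
    {"name": "Abraham Lincoln", "term_end": date(1865, 4, 15), "died": date(1865, 4, 15)},
    {"name": "John F. Kennedy", "term_end": date(1963, 11, 22), "died": date(1963, 11, 22)},
]


def died_in_office(records):
    """Presidents whose term ended on the day they died."""
    return [r["name"] for r in records if r["died"] == r["term_end"]]


def died_same_year_as_term_end(records):
    """Looser year-only match, shown for comparison with the exact-date check."""
    return [r["name"] for r in records if r["died"].year == r["term_end"].year]


print(died_in_office(PRESIDENTS))
print(died_same_year_as_term_end(PRESIDENTS))
```

The hard part is not this filter but the step before it: getting term and death dates out of unstructured documents in the first place.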

Currently Google can’t even distinguish between a question and a phrase to be found. “Why does…” is not a phrase a user is seeking in a document; it is the preface to a question that they want answered. Without NLP, answering direct questions is not possible.
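To make the distinction concrete, here is a deliberately crude Python sketch that sorts a query into a question, an exact phrase, or plain keywords. The word list and the classify_query function are hypothetical, invented for this example. Real NLP would parse the sentence instead of checking only the first word, but even this simple check treats “Why does…” as the start of a question rather than text to match.

```python
# Hypothetical list of words that commonly open a question.
QUESTION_WORDS = {
    "who", "what", "when", "where", "why", "how", "which", "whose",
    "is", "are", "does", "do", "did", "can", "could", "should", "will",
}


def classify_query(query):
    """Return 'phrase', 'question', or 'keywords' for a raw search query."""
    text = query.strip()
    # Explicit double quotes signal that the user wants the exact phrase found.
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return "phrase"
    words = text.lower().split()
    # A trailing question mark or a leading question word signals a question.
    if text.endswith("?") or (words and words[0] in QUESTION_WORDS):
        return "question"
    return "keywords"


for q in ["Why does the fan spin up at idle", '"to be or not to be"', "Octium GHz"]:
    print(q, "->", classify_query(q))
```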