March 2017

Narrative Science Saves the World

NLP Tools for the Data Miner

Big Data is all well and good. Extracting meaning from Big Data is another story. That's where Natural Language Processing (NLP) comes into play. Spreadsheets and relational databases account for only 20% of all available data. The rest is what we call unstructured data: social media posts, images, email, text messages, audio files, Word documents, PDFs and other sources.

Simply stated, NLP is a tool for uncovering and analyzing the "signals" buried in unstructured data. By using machine learning and artificial intelligence to make inferences and provide context to language, NLP enables computer programs to make sense of natural language (unstructured) text. Companies that want to understand what customers think of their products, services and brand need the kind of information NLP uncovers. So there's a lot of interest out there. Some analysts predict that the NLP market will reach $13.4 billion by 2020. That's a compound annual growth rate of 18.4 percent.

It makes you wonder: what are the tools for effective NLP, and who are the leading vendors? These are the kinds of products we need and use as building blocks for Narrative Analytics. They are foundational technology for us, like the bricks you'd use to build a house. The question remains: do we want to manufacture our own bricks, or do we want to leave that to someone else and use our time more productively and strategically?

You might say we went looking for the best brick manufacturers. The chart shows the companies we examined, their products, and the product features. We are compiling this information independently, and our aim, admittedly in our own self-interest, is to discover which "bricks" are best. We are actively testing these tools and will publish the results in a future column.

Here's a breakdown of the tool features you'll find in the chart:

Emotion and sentiment analysis are features that are directly useful to clients:

  • Sentence/Document Level Sentiment/Emotion (DLS/E)
  • Entity Level Sentiment/Emotion (ELS/E)
  • Concept Level Sentiment/Emotion (CLS/E)

These tool features have value in relation to a user-facing feature, such as sentiment analysis or a knowledge graph. They add accuracy and context:

  • Relation Extraction (RelEx)
  • Paragraph Splitting (PSplit)
  • Co-reference Resolution (CoRef)
  • Named Entity Recognition (NER)
  • Named Entity Disambiguation (NED)
  • Topic Extraction (TopEx)
  • Summarization (Summ)
  • coreNLP - morphological analyzer, n-gram extraction, lemmatization, PoS tagging, stop-word list, etc.
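
Two of the core features listed above, stop-word filtering and n-gram extraction, are simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not any vendor's implementation: the function names, the regular expression used for tokenizing, and the tiny stop-word list are all assumptions made for the example.

```python
import re

# Tiny illustrative stop-word list (an assumption for this sketch);
# real toolkits ship much longer, language-specific lists.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in stop_words]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = remove_stop_words(tokenize("The service is great and the price is fair"))
bigrams = ngrams(tokens, 2)
print(tokens)
print(bigrams)
```

Features like these rarely face the user directly, but they feed the higher-level ones: a sentiment or topic model typically works on the filtered tokens and n-grams rather than on raw text.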

The vendor list:

Vendor-by-vendor feature matrix:

  Columns: DLS, ELS, CLS, RelEx, PSplit, CoRef, NER, NED, TopEx, Summ, CoreNLP
  Vendors: IBM Watson, Lexalytics

We will evaluate for these factors:

  • Languages supported (list of languages)
  • Accuracy (F1-score, Precision & Recall)
  • Performance (Memory & Speed)
  • Price ($)
  • Customization (F1-score, Precision & Recall)
    • Ability to customize the corpus, sentiment dictionary, etc.
  • Configurable rules engine (yes/no)
  • Business continuity (A-F)
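
For readers less familiar with the accuracy measures in the list above, here is a minimal Python sketch of how precision, recall and F1-score are computed from a tool's output and a hand-labeled reference set. The function name and the sample entities are hypothetical and chosen only for illustration; they are not results from any vendor.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted items against a hand-labeled gold set.

    precision = correct predictions / all predictions
    recall    = correct predictions / all gold items
    F1        = harmonic mean of precision and recall
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical named-entity output: (entity text, entity type) pairs.
gold = {("IBM", "ORG"), ("Chicago", "LOC"), ("Watson", "PRODUCT")}
predicted = {("IBM", "ORG"), ("Chicago", "ORG"),
             ("Watson", "PRODUCT"), ("Big Data", "ORG")}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this made-up case the tool finds two of the three reference entities (recall of about 0.67), but only two of its four predictions are correct (precision of 0.50), which gives an F1-score of about 0.57.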


Written by

Babak Rasolzadeh, Director of Data Science
