People have always been interested in what others think and what opinions they hold. Since the inception of the internet, increasing numbers of people have been using websites and social media platforms to express their opinions. Platforms such as Facebook and Twitter have made it feasible to analyze and extract public opinion on a given topic, news story, product, or brand. Opinions mined from such services can be valuable: the data can be analyzed and presented so that the online mood (positive, negative or neutral) is easy to identify. This allows individuals or businesses to be proactive rather than reactive when a negative conversational thread is emerging. Conversely, positive sentiment can be leveraged to identify product advocates, as well as to shape business strategy by showing which parts of it are working.
Author Archives: Shubham Jha
Have you ever been in a situation where you need to store a gigantic amount of data in your Salesforce org? Are you tired of using third-party storage systems and writing web service calls to retrieve that data? Having to crunch such large volumes, though a reflection of your own success, can be troublesome and can degrade performance. Salesforce has come to the rescue again by introducing big objects, which are objects with massive storage capabilities on the Salesforce platform itself. They provide consistent performance over data sets on the order of billions of records and are accessible through a standard set of APIs to your org or to an external system.
“Anything that can be connected, will be connected.” This quote from Jacob Morgan, writing for Forbes.com, describes what the future holds as the Internet of Things (IoT) continues to grow. Simply put, the Internet of Things is a system of electronic devices or components that are interconnected through the Internet and capable of exchanging data and information. The devices range from cellphones, coffee makers, headphones and smart wearables to machine components such as an airplane engine. IoT allows devices to be sensed and controlled remotely over existing network infrastructure, which creates the opportunity to integrate them directly with other devices or networks. In short, IoT is a giant network of connected things.
Why use IoT & How big is it?
A survey conducted by HP estimated that the growth of IoT will be exponential and that by 2025 over one trillion devices will be connected through IoT. Another report by Cisco predicts that IoT will generate $14.4 trillion in value across all industries in the next decade. These estimates point toward a fully automated future.
For a long time, engineers have been striving to make machines perform tasks that human beings do, which has led to the birth of the field of machine learning. Understanding the language humans speak constitutes a vital part of this field. The branch of computer science that deals with human-machine interaction, and in particular with programs that can process natural language efficiently, is known as Natural Language Processing, usually abbreviated as NLP.
NLP sits at the intersection of computer science, artificial intelligence and computational linguistics. “By utilizing Natural Language Processing algorithms, developers can organize and structure textual data to perform tasks such as automatic summarization, translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation.” (En.wikipedia.org, 2017)
Natural Language Processing is characterized as a hard problem in computer science because human language is rarely precise or plainly spoken. To understand human language, one must understand not only the words but also their meaning and context, and how they connect to form meaning. The vagueness and ambiguity of human language make it difficult for computers to learn, even though humans learn it easily.
Components of NLP
NLP has two components:
- Natural Language Understanding (NLU)
This involves understanding the different aspects of the language and mapping natural language input text to useful representations. It is the harder of the two components, since it has to deal with the ambiguity and complexity of the language. There are three main levels of ambiguity:
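- Word-level or Lexical Ambiguity: a single word can carry more than one meaning, e.g. "bank" as a riverbank or a financial institution
- Syntax Level or Parsing Ambiguity: a sentence can be parsed in more than one way, e.g. "I saw the man with the telescope"
- Referential Ambiguity: it is unclear what a pronoun or phrase refers to, e.g. "it" in "The trophy did not fit in the suitcase because it was too big"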
- Natural Language Generation (NLG)
As evident from the name, NLG is the process of producing or generating meaningful phrases and sentences in the form of natural language. It involves text planning, sentence planning and text realization.
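To make the three NLG stages a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Reading` record, the weather data, and the function names `plan_text`, `plan_sentences` and `realize` are hypothetical and do not come from any particular NLG library. Real systems are far more elaborate, but the division of labour is roughly the same.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One fact to be verbalized (hypothetical input format)."""
    city: str
    temp_c: float
    condition: str


def plan_text(readings, max_items=2):
    """Text planning: decide WHAT to say (here, the warmest cities first)."""
    ranked = sorted(readings, key=lambda r: r.temp_c, reverse=True)
    return ranked[:max_items]


def plan_sentences(selected):
    """Sentence planning: decide HOW each fact is packaged into a sentence spec."""
    return [
        {"subject": r.city, "temp": round(r.temp_c), "condition": r.condition}
        for r in selected
    ]


def realize(specs):
    """Text realization: turn each sentence spec into a grammatical string."""
    sentences = [
        f"{s['subject']} is {s['condition']} at {s['temp']} degrees Celsius."
        for s in specs
    ]
    return " ".join(sentences)


def generate(readings):
    """Run the three stages in order."""
    return realize(plan_sentences(plan_text(readings)))


readings = [
    Reading("Oslo", 4.2, "cloudy"),
    Reading("Cairo", 31.6, "sunny"),
    Reading("Lima", 19.0, "foggy"),
]
report = generate(readings)
print(report)
```

Keeping the stages as separate functions means the content selection rule can change without touching the wording templates, and vice versa.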
Syntax: It refers to the arrangement of words that form a sentence. It also involves determining the structural role of each word in the sentence.
Phonology: It is the study of how sounds are organized systematically.
Morphology: It is the study of how words are constructed from primitive meaningful units.
Semantics: It deals with the meaning of words and how they can be combined to form meaningful sentences.
Discourse: This deals with how the immediately preceding sentence can affect the interpretation of the next sentence.
Pragmatics: This deals with how the interpretation of a sentence changes according to the situation.
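As a small illustration of the morphology idea above, the following Python sketch splits a word into a prefix, a stem and a suffix. It is only a toy under stated assumptions: the affix lists and the `segment` function are made up for this example, and real morphological analyzers rely on much larger lexicons and language-specific rules.

```python
# Toy affix inventories (assumed for illustration, far from complete).
PREFIXES = ("un", "re", "dis")
SUFFIXES = ("able", "ness", "ing", "ed", "s")


def segment(word, min_stem=3):
    """Split a word into [prefix?, stem, suffix?], stripping at most one affix per side."""
    word = word.lower()
    before, after = [], []

    # Strip the first matching prefix, as long as enough of the word remains.
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            before.append(prefix)
            word = word[len(prefix):]
            break

    # Strip the first matching suffix under the same minimum-stem condition.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            after.append(suffix)
            word = word[:-len(suffix)]
            break

    return before + [word] + after


for example in ("unbreakable", "replayed", "cats", "red"):
    print(example, "->", segment(example))
```

A rule-based splitter like this is easily fooled (for instance, it would wrongly treat the "re" in "reading" as a prefix), which is one reason practical toolkits combine rules with dictionaries or learned models.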
What can developers use NLP algorithms for?
- Summarizing blocks of text to extract the meaningful information and ignore the remaining non-relevant text
- Understanding the input and generating the output in chatbots
- Deriving the sentiment of a piece of text using sentiment analysis
- Breaking up large text into simpler tokens such as sentences or words
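Two of the tasks above, tokenization and sentiment analysis, can be sketched with nothing more than the Python standard library. The word lists and the function names `split_sentences`, `tokenize` and `sentiment` are assumptions made up for this example; production systems use trained models or much larger weighted lexicons, so treat this as a rough sketch rather than a recommended approach.

```python
import re

# Hypothetical mini-lexicon; real systems use far larger, weighted word lists.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "slow"}


def split_sentences(text):
    """Break text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Break a sentence into lowercase word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", sentence.lower())


def sentiment(sentence):
    """Label a sentence by counting positive and negative lexicon hits."""
    tokens = tokenize(sentence)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = "I love the camera. The battery is terrible! It ships in a box."
for sentence in split_sentences(review):
    print(sentiment(sentence), "<-", sentence)
```

Counting lexicon hits ignores negation and sarcasm ("not good" would still score as positive), which is exactly the kind of ambiguity that makes NLP hard.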
Some Open Source NLP Libraries
- Apache OpenNLP
It is a Java-based machine learning toolkit provided by Apache that supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, language detection and coreference resolution. OpenNLP also includes maximum entropy and perceptron-based machine learning. It provides built-in Java classes for each functionality as well as a command line interface for testing the pre-built agents.
- Natural Language Toolkit (NLTK)
It is a platform for building Python programs to read and process human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.
- Stanford CoreNLP
Stanford CoreNLP provides a set of human language technology tools. It can give the base forms of words and their parts of speech, mark up the structure of sentences in terms of phrases and syntactic dependencies, indicate which noun phrases refer to the same entities, indicate sentiment, extract particular or open-class relations between entity mentions, get the quotes people said, etc.
- MALLET
MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. Apart from classification, MALLET includes tools for sequence tagging for applications such as named-entity extraction from text. Algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields.
These are a few of the many open source libraries and toolkits for Natural Language Processing that developers can use in their applications.
In conclusion, Natural Language Processing is an important part of the artificial intelligence field and deserves attention from anyone who wants to master Machine Learning or Artificial Intelligence.