Machine Learning with Python

Machine Learning and Artificial Intelligence are considered integral parts of future technology.

Artificial Intelligence is an area focused on developing intelligent machines that work and react like humans. To achieve this, Artificial Intelligence draws on the traits that make such behavior possible, including perception, learning and planning. Machine learning, on the other hand, focuses on developing programs that can access data and use it to learn for themselves. In short, Artificial Intelligence aims to make machines smart, i.e. able to react as the situation demands, whereas machine learning gives machines access to data and lets them learn from it, so that their decisions are learned rather than engineered to be smart.
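To make the difference between a hand-written rule and a learned one concrete, here is a minimal Python sketch. The task (labelling temperatures as "hot" or not), the data and the function names are hypothetical; the "learning" is a one-parameter decision stump that picks whichever threshold makes the fewest mistakes on labelled examples, rather than a threshold a programmer chose in advance.

```python
from __future__ import annotations


def hand_coded_rule(temperature: float) -> bool:
    """Traditional programming: a person picks the threshold (30.0) by hand."""
    return temperature > 30.0


def learn_threshold(examples: list[tuple[float, bool]]) -> float:
    """Machine learning in miniature: try each observed value as a threshold
    and keep the one that misclassifies the fewest labelled examples."""
    best_threshold, best_errors = examples[0][0], len(examples) + 1
    for candidate in sorted(x for x, _ in examples):
        errors = sum((x > candidate) != label for x, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


# Hypothetical labelled data: (temperature in degrees Celsius, felt "hot"?)
training_data = [(18.0, False), (22.0, False), (25.0, False),
                 (27.0, True), (31.0, True), (35.0, True)]

threshold = learn_threshold(training_data)
print(threshold)              # 25.0
print(hand_coded_rule(27.0))  # False: the fixed rule misses this example
print(27.0 > threshold)       # True: the learned rule fits the data
```

If the labelled data changes, the learned threshold changes with it, while the hand-coded rule stays wrong until someone edits it.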

For the purposes of this article, let's focus on Machine Learning.

Machine Learning vs. Artificial Intelligence: The Identical Twins, or Are They Really?

Since you are reading this, I assume you are aware of, or have at least heard about, Machine Learning and Artificial Intelligence. As two of the hottest buzzwords in the industry right now, the terms are often used interchangeably, which leads to some confusion. However, they have different meanings and applications. The two are still very strongly related: one contains the other, since machine learning is a subset of artificial intelligence. Let's dive into these topics and try to find the reason for this confusion and how to resolve it.

Why this confusion?

The main culprit behind this confusion is the interchangeable use of the two terms, combined with limited knowledge of the subject in both the developer and user communities. Artificial intelligence depends heavily on machine learning, which has led to the perception that both terms refer to the same thing. This confusion has spread like wildfire in the industry, and only experts in the field know the clear distinction between the terms.

[Figure: Machine Learning and AI confusion]

Artificial Intelligence: The Big Brother

Artificial Intelligence is intelligence demonstrated by machines that emulate human-like thinking and behavior, allowing them to make their own decisions in real-life situations. In computer science, AI is defined as the study of intelligent agents: devices that perceive their environment and take actions that maximize their chances of achieving their goals. These agents mimic cognitive functions that humans associate with the human mind, such as problem solving and learning. Traditionally, AI tackles problems such as reasoning, knowledge representation, learning, planning and natural language processing. Building an intelligent agent that can think like a human remains the long-term goal, since such an agent would draw on all of these techniques.
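The agent definition above can be illustrated with a minimal Python sketch of a simple reflex agent in a two-square "vacuum world", a classic textbook toy. The world, the percepts and the action names are all hypothetical: the agent perceives its location and whether that square is dirty, picks an action from that percept alone, and its goal is to leave both squares clean.

```python
from __future__ import annotations


def perceive(world: dict[str, str], location: str) -> tuple[str, str]:
    """The agent's percept: where it is and the state of that square."""
    return location, world[location]


def reflex_vacuum_agent(percept: tuple[str, str]) -> str:
    """Simple reflex agent: choose an action from the current percept only."""
    location, status = percept
    if status == "dirty":
        return "suck"
    return "right" if location == "A" else "left"


def run(world: dict[str, str], location: str, steps: int) -> list[str]:
    """Perceive-act loop: sense the environment, act, then update the world.
    Note: `world` is modified in place as squares are cleaned."""
    actions = []
    for _ in range(steps):
        action = reflex_vacuum_agent(perceive(world, location))
        actions.append(action)
        if action == "suck":
            world[location] = "clean"
        elif action == "right":
            location = "B"
        else:
            location = "A"
    return actions


# Hypothetical starting state: both squares dirty, agent on square A.
world = {"A": "dirty", "B": "dirty"}
print(run(world, "A", 4))  # ['suck', 'right', 'suck', 'left']
print(world)               # {'A': 'clean', 'B': 'clean'}
```

A reflex agent like this has no memory or planning; it works only because the toy world is trivial, which is why real AI systems add the reasoning, learning and planning capabilities mentioned above.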

NLP – Natural Language Processing

For a long time, engineers have striven to make machines perform tasks that human beings do, which has led to the birth of the field of machine learning. Understanding the language humans speak constitutes a vital part of this field. The branch of computer science that deals with human-machine interaction, in particular with computer programs that can process natural language efficiently, is known as Natural Language Processing, usually abbreviated to NLP.

NLP sits at the intersection of computer science, artificial intelligence and computational linguistics. “By utilizing Natural Language Processing algorithms, developers can organize and structure textual data to perform tasks such as automatic summarization, translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation.” (En.wikipedia.org, 2017)

Natural Language Processing is considered a hard problem in computer science because human language is rarely precise or plainly spoken. To understand human language, a program must understand not only the words but also their meaning and context, and how they connect to form meaning. The vagueness and ambiguity of human language make it difficult for computers to learn, even though humans learn it easily.

Components of NLP

NLP has two components:

    • Natural Language Understanding (NLU)
      This involves understanding the different aspects of the language and mapping input text in natural language to useful representations. It is the harder of the two components, since it has to deal with the ambiguity and complexity of language. There are three main levels of ambiguity:

          1. Word-level or Lexical Ambiguity
          2. Syntax-level or Parsing Ambiguity
          3. Referential Ambiguity


    • Natural Language Generation (NLG)
      As evident from the name, NLG is the process of producing or generating meaningful phrases and sentences in the form of natural language. It involves text planning, sentence planning and text realization.
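Lexical ambiguity, the first level listed under NLU, can be illustrated with a simplified version of the Lesk algorithm, a classic word-sense disambiguation technique: choose the sense whose dictionary gloss shares the most words with the sentence. The Python sketch below assumes a hypothetical two-sense inventory for "bank" and a tiny hypothetical stopword list; real systems draw glosses from lexical resources such as WordNet.

```python
from __future__ import annotations

import re

# Hypothetical mini sense inventory; real systems take glosses from WordNet etc.
SENSES = {
    "bank": {
        "bank/finance": "an institution that accepts deposits of money and makes loans",
        "bank/river": "the sloping land beside a body of water such as a river",
    },
}

# Hypothetical stopword list: function words that carry little sense information.
STOPWORDS = {"a", "an", "the", "of", "and", "that", "such", "as",
             "on", "in", "to", "at", "i", "my", "she"}


def content_words(text: str) -> set[str]:
    """Lowercase alphabetic tokens of `text`, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word: str, sentence: str) -> str:
    """Return the sense of `word` whose gloss overlaps most with `sentence`.
    On a tie, max() keeps the sense listed first."""
    context = content_words(sentence)
    glosses = SENSES[word]
    return max(glosses, key=lambda sense: len(context & content_words(glosses[sense])))


print(simplified_lesk("bank", "She sat on the bank of the river and watched the water"))
# bank/river
print(simplified_lesk("bank", "The bank approved my loans after I made deposits of money"))
# bank/finance
```

Exact word overlap is brittle ("loan" would not match "loans"), which is one reason practical systems normalize words first or use learned models instead.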


NLP Terminology

Syntax: It refers to the arrangement of words to form a sentence. It also involves determining the structural role of each word in the sentence.

Phonology: It is the study of organizing sounds systematically.

Morphology: It is the study of how words are constructed from primitive meaningful units.

Semantics: It deals with the meaning of words and how they can be combined to form meaningful sentences.

Discourse: It deals with how the immediately preceding sentence can affect the interpretation of the next sentence.

Pragmatics: This deals with how the interpretation of a sentence changes according to the situation.
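Morphology can be illustrated with a deliberately naive suffix stripper that splits a word into a stem and a suffix. This Python sketch is not the Porter stemmer or any library's algorithm; the suffix list and the minimum stem length are hypothetical choices, and it will mishandle many English words.

```python
from __future__ import annotations

# Hypothetical suffix list, ordered longest first so "ness" wins over "s".
SUFFIXES = ("ations", "ation", "ness", "ing", "ed", "ly", "es", "s")


def strip_suffix(word: str, min_stem: int = 3) -> tuple[str, str]:
    """Split `word` into (stem, suffix) using the first matching suffix,
    but only if at least `min_stem` characters of stem remain."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)], suffix
    return w, ""


for word in ["kindness", "walking", "boxes", "sing"]:
    print(word, "->", strip_suffix(word))
# kindness -> ('kind', 'ness')
# walking -> ('walk', 'ing')
# boxes -> ('box', 'es')
# sing -> ('sing', '')
```

The `min_stem` guard is what keeps "sing" intact; without it the stripper would reduce it to "s". Irregular forms such as "ran" are still out of reach, which is why production systems rely on dictionary-based lemmatizers.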


What can developers use NLP algorithms for?

    • Summarizing blocks of text to extract the meaningful information while ignoring non-relevant text
    • Understanding the input and generating the output in chatbots
    • Deriving the sentiment of a piece of text using sentiment analysis
    • Breaking up large text into simpler tokens such as sentences or words
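Two of the tasks above, tokenization and summarization, can be sketched together in Python: split text into sentences and words with regular expressions, then keep the sentences whose words occur most often in the whole text, a basic frequency-based extractive summarizer. The stopword list, the regex tokenizers and the scoring rule are simplifying assumptions; libraries such as NLTK ship far more robust tokenizers.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical stopword list; real ones are much longer.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "it",
             "for", "on", "with", "that", "this"}


def split_sentences(text: str) -> list[str]:
    """Naive sentence tokenizer: split after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(sentence: str) -> list[str]:
    """Naive word tokenizer: lowercase runs of letters."""
    return re.findall(r"[a-z]+", sentence.lower())


def summarize(text: str, n: int = 1) -> list[str]:
    """Frequency-based extractive summary: score each sentence by the summed
    whole-text frequency of its non-stopword words, keep the top n sentences
    and return them in their original order."""
    sentences = split_sentences(text)
    content = [[w for w in split_words(s) if w not in STOPWORDS] for s in sentences]
    freq = Counter(w for words in content for w in words)
    scores = [sum(freq[w] for w in words) for words in content]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]


text = "NLP helps computers read text. Computers process text quickly. Cats sleep."
print(split_sentences(text))
# ['NLP helps computers read text.', 'Computers process text quickly.', 'Cats sleep.']
print(summarize(text, n=1))
# ['NLP helps computers read text.']
```

Summing frequencies favors longer sentences; dividing each score by the sentence's word count is a common variation. The sentence splitter also breaks on abbreviations such as "e.g.", a known weakness of regex tokenizers.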


Some Open Source NLP Libraries

    • Apache OpenNLP
      It is a Java-based machine learning toolkit from Apache that supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, language detection and coreference resolution. OpenNLP also includes maximum entropy and perceptron-based machine learning. It provides built-in Java classes for each functionality, as well as a command-line interface for testing the pre-built agents.


    • Natural Language Toolkit (NLTK)
      It is a platform for building Python programs to read and process human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.


    • Stanford CoreNLP
      Stanford CoreNLP provides a set of human language technology tools. It can give the base forms of words and their parts of speech, mark up the structure of sentences in terms of phrases and syntactic dependencies, indicate which noun phrases refer to the same entities, indicate sentiment, extract particular or open-class relations between entity mentions, get the quotes people said, etc.


    • MALLET
      MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. Apart from classification, MALLET includes tools for sequence tagging for applications such as named-entity extraction from text. Algorithms include Hidden Markov Models, Maximum Entropy Markov Models, and Conditional Random Fields.
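Hidden Markov Models, mentioned above for sequence tagging, assign a tag to each word by finding the most probable tag sequence with the Viterbi algorithm. The Python sketch below is not MALLET's API (MALLET is written in Java); it decodes a hypothetical two-tag HMM whose probabilities are made up for illustration rather than trained on data.

```python
from __future__ import annotations

# Hypothetical toy HMM: two tags and hand-picked (untrained) probabilities.
STATES = ("NOUN", "VERB")
START = {"NOUN": 0.7, "VERB": 0.3}
TRANS = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.8, "VERB": 0.2},
}
EMIT = {
    "NOUN": {"fish": 0.6, "swim": 0.1, "people": 0.3},
    "VERB": {"fish": 0.3, "swim": 0.6, "people": 0.1},
}


def viterbi(words: list[str]) -> list[str]:
    """Return the most probable tag sequence for a non-empty list of words."""
    # best[t][s]: probability of the best path that ends in tag s at position t
    best = [{s: START[s] * EMIT[s].get(words[0], 0.0) for s in STATES}]
    back: list[dict[str, str]] = [{}]  # back[t][s]: previous tag on that path
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for s in STATES:
            prev, score = max(
                ((p, best[t - 1][p] * TRANS[p][s]) for p in STATES),
                key=lambda pair: pair[1],
            )
            best[t][s] = score * EMIT[s].get(words[t], 0.0)
            back[t][s] = prev
    # Trace back from the most probable final tag.
    tags = [max(STATES, key=lambda s: best[-1][s])]
    for t in range(len(words) - 1, 0, -1):
        tags.append(back[t][tags[-1]])
    return tags[::-1]


print(viterbi(["people", "fish"]))  # ['NOUN', 'VERB']
print(viterbi(["fish", "swim"]))    # ['NOUN', 'VERB']
```

In "people fish", the word "fish" on its own favors NOUN (emission 0.6 vs 0.3), but the strong NOUN to VERB transition makes VERB the better overall path. Weighing context against individual word statistics in this way is what sequence taggers add over tagging each word independently.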


These are a few of the many open source libraries and toolkits for Natural Language Processing that developers can use in their applications.

In conclusion, Natural Language Processing is an important part of the artificial intelligence field and deserves attention from anyone who wants to master Machine Learning or Artificial Intelligence.




