Machine Learning with Python

Machine Learning and Artificial Intelligence are widely regarded as an integral part of future technology.

Artificial Intelligence is the field focused on developing intelligent machines that work and react like humans. To achieve this, it draws on every trait that can help, including perception, learning and planning. Machine learning, on the other hand, focuses on developing programs in such a way that systems can access data and use it to learn for themselves. In short, Artificial Intelligence focuses on making machines smart, i.e. able to react as the situation demands, whereas machine learning is based on giving machines access to data and letting them learn from it, which makes their decisions learnt rather than smart.

For the purposes of this post, let’s focus on Machine Learning.

Machine Learning can be implemented in many programming languages, and Python is among the most widely used of them. Tech giants are investing a major share of their resources in these fields and are looking for fresh minds to work in them.

The most important question: “How do I get started?”

We’ll come to that in a moment.

For now, let’s focus on “What’s needed to get started”.

First, we need to know some common terms used in machine learning.

Algorithm

An algorithm is a set of rules followed to solve a problem. Machine learning has a host of algorithms, and which ones apply depends on the type of machine learning.

Model

A machine learning model is what we get when a learning algorithm is applied to training data, i.e. a model is developed using an algorithm with suitable training data.

Feature

Features are the individual, independent input variables that are fed to the model.

Label

A label is the final output. While training a model (in supervised machine learning), the label is the known answer attached to each set of features, and the model learns the pattern that links the two. During prediction, the label is the predicted value (output).

Pre-Processing

Pre-processing is any step needed to prepare raw data for a later processing stage.

Now let’s get started!

First and foremost, we need to start with the building block of ML, and that’s DATA.

Whoa! You knew that? Great!

So, we said data, and believe me, we need a lot of it. Data often comes from years of study and observation, and it may still be erroneous or incomplete.

To handle that we need a ‘miracle’! Really? Nah! We have a concept known as pre-processing (which includes data encoding). With it we convert textual data into numerical data, because machine learning algorithms are mathematical functions, and mathematical functions do not accept text as input. For features that are already numerical, we use the mean, median or mode to fill in the missing values. And yes, it’s true that the amount of makeup needed depends on the variance of the data set.
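Here is a minimal sketch of that fill-in step using pandas. The column names and values are made up for illustration, and using the mode for a text column is an assumption on our part (the paragraph above only talks about numerical features).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps; names and values are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 35.0, 40.0],
    "income": [30000.0, 42000.0, np.nan, 1000000.0],
    "city": ["Delhi", "Pune", None, "Delhi"],
})

# Mean: a reasonable default for a numeric column without extreme values.
df["age"] = df["age"].fillna(df["age"].mean())

# Median: less affected by outliers such as the very large income above.
df["income"] = df["income"].fillna(df["income"].median())

# Mode: the most frequent value, usable for categorical columns too.
df["city"] = df["city"].fillna(df["city"].mode()[0])

print(df)
```

Which statistic to pick is a judgment call: the more spread out a column is, the more the choice between mean and median matters.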

ML can be sub-categorized into three types: supervised, unsupervised and reinforcement learning. For simplicity we’ll stick with supervised learning for now. Here we divide the data into an input data set (features) and an output data set (labels). Depending on the ML algorithm we choose, we also need to decide how the output labels are represented, for example as categories for a classifier or as continuous values for a regressor.

Now we move on to recursive feature elimination, where we select the ML algorithm and specify the number of features to keep when the model is fitted.

Following me so far? Cool! Now the data set needs to be further divided into training and test data. We should always follow best practice, so we’ll go for K-fold cross-validation, which not only divides the data but also tells us how well the model fits across different splits.

K-fold validation splits the data into K folds, and these are used to train the model. In each round, one fold is used as test data and the remaining folds are used to train the model, so every fold serves as test data exactly once. Each round of the iteration is given a score.
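To see the fold rotation in action, here is a small sketch with scikit-learn’s KFold. The ten-sample array and the choice of five folds are arbitrary assumptions for illustration; no model is trained here, we only count how often each sample lands in a test fold.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten hypothetical samples with a single feature each.
X = np.arange(10).reshape(-1, 1)

# Five folds; shuffling with a fixed seed keeps the split reproducible.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

# Count how many times each sample ends up in a test fold.
test_counts = np.zeros(len(X), dtype=int)

for fold, (train_idx, test_idx) in enumerate(kfold.split(X), start=1):
    test_counts[test_idx] += 1
    print(f"fold {fold}: {len(train_idx)} training samples, "
          f"{len(test_idx)} test samples")

print(test_counts)
```

Every entry of `test_counts` ends up as 1, which is the "each fold once forms the test data" property described above.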

Based on the scores, we pick the combination of training and test folds that gives the best-fitting model. Once we have the best-fit model for the algorithm on the data provided, it can be persisted to disk and used in future to predict output labels.

Let’s get a better understanding of this:

Machine Learning Steps

Note that all the steps listed below are similar for any language but the tools and software that have been mentioned are for Python developers.

1) Dataframe Creation

Data is captured using N-dimensional arrays (ndarrays) from ‘numpy’ and pandas dataframes.

Dataframes are like Excel sheets in which we can assign indexes or names to rows and columns. Each column in a dataframe represents a feature.
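As a quick sketch, this is how an ndarray can be wrapped in a dataframe with named rows and columns. The measurements, column names and row labels below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Raw numeric data as a 2-D ndarray: one row per sample, one column per feature.
values = np.array([
    [5.1, 3.5],
    [4.9, 3.0],
    [6.2, 3.4],
])

# Wrap it in a dataframe, naming the columns (features) and the rows (index).
df = pd.DataFrame(
    values,
    columns=["sepal_length", "sepal_width"],
    index=["s1", "s2", "s3"],
)

# Columns can be added later, just like adding a column to a spreadsheet.
df["species"] = ["setosa", "setosa", "virginica"]

print(df)
print(df.loc["s2", "sepal_width"])  # look a cell up by row and column name
```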

2) Data-preprocessing

To convert textual data into numerical data it is preferable to use OneHotEncoder or LabelEncoder, but the choice is entirely up to the developer; for free-form text, some prefer CountVectorizer, TfidfVectorizer or HashingVectorizer.

Now, to deal with blank values we can use SimpleImputer, or Imputer on older scikit-learn releases that still include it.
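A minimal sketch of both steps with scikit-learn follows. The tiny dataframe is made up, and choosing the mean strategy and OneHotEncoder here is just one reasonable option, not the only one.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: one text column and one numeric column with a gap.
df = pd.DataFrame({
    "colour": ["red", "green", "red", "blue"],
    "weight": [1.0, np.nan, 3.0, 2.0],
})

# Fill the blank weight with the mean of the known weights.
imputer = SimpleImputer(strategy="mean")
weight = imputer.fit_transform(df[["weight"]])

# Turn each colour into its own 0/1 column.
encoder = OneHotEncoder(handle_unknown="ignore")
colour = encoder.fit_transform(df[["colour"]]).toarray()

# Stack everything into one purely numeric feature matrix.
X = np.hstack([weight, colour])

print(encoder.categories_[0])
print(X)
```

LabelEncoder would instead map each colour to a single integer; scikit-learn documents it as a tool for encoding target labels, so one-hot encoding is used for the input column here.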

3) Splitting of dataframe

Dataframes, as mentioned above, can easily be split into input features and output labels.
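In pandas the split is a couple of lines. The column names below are hypothetical, with "passed" standing in for the label column.

```python
import pandas as pd

# Hypothetical dataset: two feature columns and one label column.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4],
    "classes_attended": [3, 5, 8, 9],
    "passed": [0, 0, 1, 1],
})

# Everything except the label column is input (features).
X = df.drop(columns=["passed"])

# The label column on its own is the output.
y = df["passed"]

print(X.columns.tolist())
print(y.tolist())
```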

4) Recursive Feature Elimination

Recursive feature elimination is the process of repeatedly fitting a model and removing the weakest features until the specified number of features remains. In this step, after splitting the dataframe, we create a recursive feature elimination instance with the selected machine learning algorithm and the number of features to keep.
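Here is a sketch using scikit-learn’s RFE. The synthetic dataset, the choice of LogisticRegression as the algorithm and keeping three features are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 8 features, of which only 3 actually carry signal.
X, y = make_classification(
    n_samples=200,
    n_features=8,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)

# Pick the algorithm and tell RFE how many features to keep.
selector = RFE(
    estimator=LogisticRegression(max_iter=1000),
    n_features_to_select=3,
)
selector.fit(X, y)

print(selector.support_)   # True for the features that were kept
print(selector.ranking_)   # 1 = kept; larger numbers were dropped earlier

# Reduce the data to the selected features only.
X_selected = selector.transform(X)
print(X_selected.shape)
```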

5) K-Fold Cross Validation and model fit

Once all of that is complete, we move on to K-fold validation. K-fold cross-validation is a resampling procedure used to evaluate a model on a limited set of data. Any type of K-fold validation can be used, but we prefer the Stratified K-fold algorithm to divide the data into different pairs of training and test datasets and then fit each pair to the instance of Recursive Feature Elimination created above. The pair with the best score is used to obtain the model.
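The step can be sketched roughly as below. The synthetic data, LogisticRegression, five folds and three features are illustrative assumptions, and the score is whatever the wrapped estimator reports by default (accuracy for a classifier).

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the pre-processed features and labels.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0,
    random_state=0,
)

# The RFE instance from the previous step (algorithm + number of features).
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)

# Stratified folds keep the class proportions similar in every fold.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = []
best_score, best_model = -1.0, None

for train_idx, test_idx in skf.split(X, y):
    # Fit a fresh copy on this training split, score it on the held-out fold.
    model = clone(rfe).fit(X[train_idx], y[train_idx])
    score = model.score(X[test_idx], y[test_idx])
    scores.append(score)

    # Keep the model from the best-scoring train/test pair.
    if score > best_score:
        best_score, best_model = score, model

print(scores)
print(best_score)
```

Keeping the best-scoring pair follows the approach described above. A common alternative is to report the mean of `scores` and then refit the model on all of the data.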

6) Model Persistence

Finally, the model is persisted using the pickle library so that it can be loaded later for future predictions.
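A minimal sketch of saving and reloading a fitted model with pickle is shown below. The model, the synthetic data and the file name `model.pkl` are placeholders chosen for illustration.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# A small fitted model standing in for the best-fit model from step 5.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Hypothetical location; a temp directory keeps the example tidy.
path = os.path.join(tempfile.gettempdir(), "model.pkl")

# Persist the model to disk.
with open(path, "wb") as f:
    pickle.dump(model, f)

# Later (possibly in another process): load it and predict.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored.predict(X[:5]))
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.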

What Next?

This should help beginners strengthen their initial concepts of machine learning and act as a jump start for your endeavors in machine learning with Python. Of course, there are alternatives to Python, the popular ones being Scala, Java and Go. But Python dominates both the market and the library ecosystem. If you are someone who wants to try new things, the options are open. But if you want to save yourself the pain and focus only on the product, then Python is always in your arsenal.

We at Mirketa are focused on providing the best ML and AI solutions to any on-board problem. We are developing a dedicated forecasting application based on machine learning, so feel free to reach out to us for your AI needs. Do drop us your queries in the chat section on our website and we’ll be happy to respond.

Vipul Goyal

Working as a Salesforce Engineer on AI and ML projects at Mirketa.
Posted in Artificial Intelligence, Machine Learning, Salesforce AI, Salesforce Machine Learning.
