Top Python Libraries for Machine Learning

Today, Python is one of the most popular programming languages, and it has replaced many other languages in industry. One reason for its popularity is its large collection of libraries. With Python, data scientists can spend less time writing and debugging low-level code and more time deciding which library best fits the project at hand. So, what is a Python library? It is a collection of functions and methods that lets you carry out many tasks without writing the code yourself.

Python is also one of the most popular languages for solving machine learning problems. Professionals from many other backgrounds are learning Python because of the lucrative careers associated with it. Libraries like Keras, Theano, TensorFlow, and Scikit-Learn have made machine learning programming relatively easy. In this post, we will talk about the most popular Python libraries for machine learning.

Theano

Theano is a Python library that enables you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It is one of the most heavily used deep learning libraries to date. Theano is similar to NumPy (the fundamental package for scientific computing with Python), but it works with symbolic mathematical expressions and operations.

This library optimizes the use of the CPU and GPU and improves the performance of data-intensive computation. Theano code is written in such a way that it takes advantage of how a compiler works. The library serves as a building block for neural networks. You can use it directly if you need flexibility and fine-grained customization. For more advanced concepts in Theano, you can refer to the Theano tutorial.

Let’s look at an example in which a Theano function has multiple outputs. We can compute the element-wise difference, absolute difference, and squared difference between two matrices, a and b, at the same time.

import theano
import theano.tensor as T
a, b = T.dmatrices('a', 'b')
diff = a - b
abs_diff = abs(diff)
diff_squared = diff**2
f = theano.function([a, b], [diff, abs_diff, diff_squared])

When called, the function f returns all three results.

f([[1, 1], [1, 1]], [[0, 1], [2, 3]])
[array([[ 1.,  0.],
       [-1., -2.]]), array([[ 1.,  0.],
       [ 1.,  2.]]), array([[ 1.,  0.],
       [ 1.,  4.]])]
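To make the idea of "define the expression first, evaluate it later" more concrete, here is a minimal sketch in plain Python. It is not Theano's API: the names Expr, Var, and compile_function are hypothetical, and the sketch works on scalars rather than matrices. It only illustrates how a graph of expressions can be built once and then evaluated for several outputs in a single call.

```python
# Toy symbolic-expression sketch in plain Python (not Theano's API).
# Expr, Var, and compile_function are hypothetical names for illustration.
import operator


class Expr:
    """A node in an expression graph; nothing is computed until evaluate()."""

    def __init__(self, op=None, args=()):
        self.op = op
        self.args = args

    def __sub__(self, other):
        return Expr(operator.sub, (self, other))

    def __abs__(self):
        return Expr(abs, (self,))

    def __pow__(self, power):
        return Expr(lambda value: value ** power, (self,))

    def evaluate(self, env):
        values = [arg.evaluate(env) for arg in self.args]
        return self.op(*values)


class Var(Expr):
    """A named placeholder whose value is supplied at call time."""

    def __init__(self, name):
        super().__init__()
        self.name = name

    def evaluate(self, env):
        return env[self.name]


def compile_function(inputs, outputs):
    """Return a callable that binds the inputs and evaluates every output."""
    def run(*values):
        env = {var.name: value for var, value in zip(inputs, values)}
        return [out.evaluate(env) for out in outputs]
    return run


a, b = Var("a"), Var("b")
diff = a - b                      # builds a graph node, computes nothing yet
f = compile_function([a, b], [diff, abs(diff), diff ** 2])
results = f(1.0, 3.0)
print(results)  # [-2.0, 2.0, 4.0]
```

A real symbolic library can also rewrite and optimize the graph before running it, which this sketch does not attempt.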

Scikit-Learn

Scikit-Learn is a machine learning library for Python, designed to interoperate with Python's scientific and numerical libraries such as SciPy and NumPy. It is widely considered a good choice for bringing machine learning into a production system.

Scikit-learn offers a range of supervised and unsupervised learning algorithms through a consistent interface in Python. It is built on SciPy, so you have to install SciPy before you can use scikit-learn. Modules or extensions for SciPy are conventionally named SciKits; this module provides learning algorithms and is therefore known as scikit-learn.
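The "consistent interface" mentioned above largely comes down to every estimator exposing the same fit and predict methods. The following is a rough sketch of that convention using a toy nearest-centroid classifier written in plain Python. The class name NearestCentroidSketch and the sample data are made up for illustration, and the class is not part of scikit-learn.

```python
# Hypothetical toy classifier illustrating the fit/predict convention.
# It is not part of scikit-learn.
class NearestCentroidSketch:
    def fit(self, X, y):
        """Store the mean feature vector (centroid) of each class."""
        sums, counts = {}, {}
        for row, label in zip(X, y):
            total = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                total[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids_ = {
            label: [s / counts[label] for s in total]
            for label, total in sums.items()
        }
        return self  # returning self allows fit(...).predict(...) chaining

    def predict(self, X):
        """Assign each row to the class with the closest centroid."""
        def sq_dist(p, q):
            return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

        return [
            min(self.centroids_,
                key=lambda label: sq_dist(row, self.centroids_[label]))
            for row in X
        ]


X = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2]]
y = [0, 0, 1, 1]
model = NearestCentroidSketch().fit(X, y)
predictions = model.predict([[0.9, 1.1], [5.1, 4.9]])
print(predictions)  # [0, 1]
```

Because every model follows the same two-step pattern, swapping one algorithm for another usually means changing only the line that constructs the model.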

The vision for the library is a level of robustness and support needed for use in production systems. This means an in-depth focus on concerns like performance, documentation, collaboration, code quality, and ease of use. Scikit-learn focuses on modeling data rather than on loading, manipulating, and summarizing it.

Some popular model groups provided by scikit-learn include supervised models, manifold learning, parameter tuning, feature selection, dimensionality reduction, clustering, and cross validation. For a deeper understanding of scikit-learn, you can check out the scikit-learn tutorials.
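As an illustration of one item in that list, cross validation, here is a small plain-Python sketch of how a dataset can be divided into k folds so that each sample is used for testing exactly once. The function name k_fold_indices is hypothetical, and the sketch uses consecutive, unshuffled folds for simplicity; it is not scikit-learn's implementation.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) for consecutive, unshuffled folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

A model would be fit on each train split and scored on the matching test split, and the scores would then be averaged.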

Let’s consider a CART (Classification and Regression Trees) example using the scikit-learn library. In this example, we use the CART decision tree algorithm to model the Iris flower dataset, which ships with the library as an example dataset. The classifier is fit on the data, and predictions are then made on the same training data. Finally, a classification report and a confusion matrix are printed.

# Sample Decision Tree Classifier
from sklearn import datasets
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier
# load the iris datasets
dataset = datasets.load_iris()
# fit a CART model to the data
model = DecisionTreeClassifier()
model.fit(dataset.data, dataset.target)
print(model)
# make predictions
expected = dataset.target
predicted = model.predict(dataset.data)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
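print(metrics.confusion_matrix(expected, predicted))

Running the example prints output similar to the following (the exact formatting varies by scikit-learn version):

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,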
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='best')
             precision    recall  f1-score   support
          0       1.00      1.00      1.00        50
          1       1.00      1.00      1.00        50
          2       1.00      1.00      1.00        50
avg / total       1.00      1.00      1.00       150
[[50  0  0]
 [ 0 50  0]
 [ 0  0 50]]

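This shows you the trained model’s details, its skill according to a few common metrics, and the confusion matrix. Because the predictions are made on the same data the model was trained on, the perfect scores say little about how well the model generalizes to unseen data.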
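To see what the confusion matrix and accuracy figures mean, the following plain-Python sketch computes them by hand from two label lists. The helper names and the six sample labels are made up for illustration; rows are true classes and columns are predicted classes, which matches the layout of the matrix printed above.

```python
def confusion_matrix(expected, predicted, n_classes):
    """Count how often each true class (row) was predicted as each class (column)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, pred_label in zip(expected, predicted):
        matrix[true_label][pred_label] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. predicted correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


expected = [0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 2, 2, 2]   # one class-1 sample is mistaken for class 2
cm = confusion_matrix(expected, predicted, 3)
print(cm)            # [[2, 0, 0], [0, 1, 1], [0, 0, 2]]
print(accuracy(cm))  # 5 correct out of 6
```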

TensorFlow

TensorFlow is a well-known open source deep learning library developed by the Google Brain team within Google's Machine Intelligence Research organization. It blends features of network specification libraries such as Lasagne and Blocks with those of symbolic computation libraries such as Theano. If you use Google voice search or Google Photos, you are using models built with TensorFlow.

TensorFlow is a computational framework for expressing algorithms that involve numerous tensor operations. Because neural networks can be expressed as computational graphs, they are implemented in TensorFlow as a series of operations on tensors. Tensors are simply N-dimensional arrays that represent our data.
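As a quick illustration of "N-dimensional", the sketch below uses nested Python lists to stand in for tensors and reports their rank (number of dimensions) and shape. The helper name shape is hypothetical, and the sketch assumes regular, non-empty nesting; real tensor libraries store data far more efficiently than nested lists.

```python
def shape(tensor):
    """Return the shape of a regularly nested, non-empty list as a tuple."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


scalar = 7.0                                                  # rank 0
vector = [1.0, 2.0, 3.0]                                      # rank 1
matrix = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]                 # rank 2
batch = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]     # rank 3

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("batch", batch)]:
    print(name, "rank:", len(shape(value)), "shape:", shape(value))
```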

Distributed computing, especially across multiple GPUs, is a major benefit of TensorFlow. TensorFlow offers utilities for efficient data pipelining and includes built-in modules for serialization, visualization, and inspection of models. Recently, the TensorFlow team started incorporating support for Keras (a deep learning library). To understand how to accomplish a specific task in TensorFlow, you can refer to the TensorFlow tutorials.

Let’s have a look at computation and linear regression with TensorFlow.

Computation

The example below shows how to define constants, create a session, and perform a computation on the constants using the session. It uses the session API of TensorFlow 1.x.

import tensorflow as tf
sess = tf.Session()
a = tf.constant(10)
b = tf.constant(32)
print(sess.run(a+b))

If you run the above code, it prints 42.

Linear Regression

The example below shows how to define trainable variables (W and b) and a variable that is the result of a computation (y).

import tensorflow as tf
import numpy as np
# Create 100 phony x, y data points in NumPy, y = x * 0.1 + 0.3
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3
# Try to find values for W and b that compute y_data = W * x_data + b
# (We know that W should be 0.1 and b 0.3, but Tensorflow will
# figure that out for us.)
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b
# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)
# Before starting, initialize the variables.  We will 'run' this first.
init = tf.global_variables_initializer()
# Launch the graph.
sess = tf.Session()
sess.run(init)
# Fit the line.
for step in range(201):
    sess.run(train)
    if step % 20 == 0:
        print(step, sess.run(W), sess.run(b))
# Learns best fit is W: [0.1], b: [0.3]
(0, array([ 0.2629351], dtype=float32), array([ 0.28697217], dtype=float32))
(20, array([ 0.13929555], dtype=float32), array([ 0.27992988], dtype=float32))
(40, array([ 0.11148042], dtype=float32), array([ 0.2941364], dtype=float32))
(60, array([ 0.10335406], dtype=float32), array([ 0.29828694], dtype=float32))
(80, array([ 0.1009799], dtype=float32), array([ 0.29949954], dtype=float32))
(100, array([ 0.10028629], dtype=float32), array([ 0.2998538], dtype=float32))
(120, array([ 0.10008363], dtype=float32), array([ 0.29995731], dtype=float32))
(140, array([ 0.10002445], dtype=float32), array([ 0.29998752], dtype=float32))
(160, array([ 0.10000713], dtype=float32), array([ 0.29999638], dtype=float32))
(180, array([ 0.10000207], dtype=float32), array([ 0.29999897], dtype=float32))
(200, array([ 0.1000006], dtype=float32), array([ 0.29999971], dtype=float32))
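To show what the optimizer is doing at each training step, here is a plain-Python sketch of the same fit using hand-written gradient descent on the mean squared error. It assumes the standard update rule (parameter minus learning rate times gradient) and uses Python's random module instead of NumPy, so the printed numbers will differ from the TensorFlow run above.

```python
import random

random.seed(0)  # fixed seed so repeated runs behave the same way

# 100 synthetic points on the line y = 0.1 * x + 0.3
x_data = [random.random() for _ in range(100)]
y_data = [x * 0.1 + 0.3 for x in x_data]

W = random.uniform(-1.0, 1.0)  # random starting slope
b = 0.0                        # starting intercept
learning_rate = 0.5
n = len(x_data)

for step in range(201):
    # Prediction error for every point under the current W and b
    errors = [W * x + b - y for x, y in zip(x_data, y_data)]
    # Gradients of mean((W*x + b - y)^2) with respect to W and b
    grad_W = 2.0 * sum(e * x for e, x in zip(errors, x_data)) / n
    grad_b = 2.0 * sum(errors) / n
    # Gradient descent update
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b
    if step % 50 == 0:
        print(step, W, b)
```

As in the TensorFlow version, W and b should approach 0.1 and 0.3 as training proceeds.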

Keras

Keras is a high-level neural networks application programming interface (API) written in Python. It is one of the most user-friendly libraries for building neural networks and runs on top of Theano, Cognitive Toolkit, or TensorFlow. The main goal behind its development is to enable fast experimentation.

If you need a Python library that runs seamlessly on GPU and CPU, supports convolutional networks, recurrent networks, or combinations of the two, and enables fast and easy prototyping (through user-friendliness, modularity, and extensibility), you can use Keras.

It lets users choose whether the models they build are executed on the symbolic graph of Theano or TensorFlow. The user interface of Keras is Torch-inspired, so Keras is definitely worth a look if you have previous experience with machine learning in Lua. The Keras community is large and very active, thanks to the library's relative ease of use and excellent documentation.

This library includes a large number of implementations of commonly used neural network building blocks such as optimizers (RMSProp, Adam), activation functions, objectives, and layers, along with a host of tools that make working with image and text data easier.
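To give a feel for what an activation function building block computes, here is a plain-Python sketch of two common ones, ReLU and softmax. These are simple reference versions written for illustration, not the Keras implementations.

```python
import math


def relu(values):
    """Replace negative inputs with zero and pass positive inputs through."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn arbitrary scores into probabilities that sum to one."""
    largest = max(values)
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


hidden = relu([-1.5, 0.0, 2.0])
probs = softmax([1.0, 2.0, 3.0])
print(hidden)  # [0.0, 0.0, 2.0]
print(probs, sum(probs))
```

ReLU is typically used in hidden layers, while softmax is typically used on the final layer of a classifier so that the outputs can be read as class probabilities.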

You can build both sequence-based and graph-based networks with Keras, which makes implementing complex network architectures like SqueezeNet and GoogLeNet easier. You can also refer to the other example models in Keras and to the computer vision class from Stanford.

The following code shows how to build a convolutional neural network with Keras once Keras is installed. Here, we train a classifier for handwritten digits that achieves over 99 percent accuracy on the Modified National Institute of Standards and Technology (MNIST) dataset. The code uses the Keras 1.x API with channels-first image ordering, as in the Theano backend.

# 1. Import libraries and modules
import numpy as np
np.random.seed(123)  # for reproducibility
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras.datasets import mnist
# 2. Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# 3. Preprocess input data
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
# 4. Preprocess class labels
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
# 5. Define model architecture
model = Sequential()
model.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(1,28,28)))
model.add(Convolution2D(32, 3, 3, activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
# 6. Compile model
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
# 7. Fit model on training data
model.fit(X_train, Y_train,
          batch_size=32, nb_epoch=10, verbose=1)
# 8. Evaluate model on test data
score = model.evaluate(X_test, Y_test, verbose=0)
print(score)  # [test loss, test accuracy]
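It can help to trace how the image size changes as data flows through the layers in step 5. The sketch below works through that arithmetic in plain Python, assuming the layer defaults of no padding and a stride of 1 for the convolutions and non-overlapping 2x2 pooling. The helper names are hypothetical.

```python
def conv_output(size, kernel):
    """Spatial size after a stride-1 convolution with no padding."""
    return size - kernel + 1


def pool_output(size, pool):
    """Spatial size after non-overlapping pooling."""
    return size // pool


size = 28                      # MNIST images are 28 x 28
size = conv_output(size, 3)    # first 3x3 convolution  -> 26
size = conv_output(size, 3)    # second 3x3 convolution -> 24
size = pool_output(size, 2)    # 2x2 max pooling        -> 12

filters = 32
flattened = filters * size * size   # input width of the first Dense layer
print(size, flattened)  # 12 4608
```

Under these assumptions, Flatten hands a vector of 4608 values to the Dense(128) layer.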

PyTorch

Among Python deep learning libraries, PyTorch is relatively new; it is a loose port of the Torch library to Python. It is notable because it is backed by FAIR (the Facebook AI Research team) and because it can handle dynamic computation graphs, a feature absent from static-graph frameworks such as Theano and TensorFlow 1.x. PyTorch provides flexibility as a deep learning development platform.

PyTorch has an easy-to-use API and integrates smoothly with the Python data science stack. It is quite similar to NumPy. PyTorch offers a framework to build computational graphs on the go and even alter them at runtime. This is valuable in situations where we don’t know in advance how much memory we will need to build a neural network. Other benefits of PyTorch include simplified preprocessors, custom data loaders, and multi-GPU support.

The multidimensional arrays in PyTorch are known as tensors, and PyTorch supports various tensor types. PyTorch uses its autograd module (automatic differentiation) when training neural networks. Autograd records the operations performed during the forward pass and then replays them in reverse to compute the gradients of the parameters, which saves you from deriving and coding gradients by hand. The optim module implements the optimization algorithms used for training neural networks. The nn module defines a set of modules; we can think of each module as a neural network layer that produces output from input and may hold a few trainable weights. You can refer to the PyTorch tutorials for other details.
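To illustrate the idea behind automatic differentiation, here is a minimal scalar sketch in plain Python. The Value class is hypothetical and supports only addition and multiplication; it is not how PyTorch is implemented, but it shows how recording each operation during the forward pass makes it possible to propagate gradients backward afterwards.

```python
class Value:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents  # pairs of (parent Value, local gradient)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self):
        # Order the graph so each node is processed only after every node
        # that depends on it, then apply the chain rule node by node.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local_grad in node._parents:
                parent.grad += local_grad * node.grad


x = Value(2.0)
z = 2 * (x * x) + 5 * x   # z = 2x^2 + 5x
z.backward()
print(z.data, x.grad)     # 18.0 13.0, since dz/dx = 4x + 5 = 13 at x = 2
```

Because the graph is rebuilt on every forward pass, ordinary Python control flow such as loops and conditionals can change its structure from one run to the next, which is the essence of a dynamic computation graph.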

Now, let’s have a look at the code for building a simple fully connected neural network in PyTorch.

import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
def simple_gradient():
    # print the gradient of 2x^2 + 5x
    x = Variable(torch.ones(2, 2) * 2, requires_grad=True)
    z = 2 * (x * x) + 5 * x
    # run the backpropagation
    z.backward(torch.ones(2, 2))
    print(x.grad)
def create_nn(batch_size=200, learning_rate=0.01, epochs=10,
              log_interval=10):

    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=batch_size, shuffle=True)
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=False, transform=transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])),
        batch_size=batch_size, shuffle=True)
    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.fc1 = nn.Linear(28 * 28, 200)
            self.fc2 = nn.Linear(200, 200)
            self.fc3 = nn.Linear(200, 10)

        def forward(self, x):
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return F.log_softmax(x, dim=1)
    net = Net()
    print(net)
    # create a stochastic gradient descent optimizer
    optimizer = optim.SGD(net.parameters(), lr=learning_rate, momentum=0.9)
    # create a loss function
    criterion = nn.NLLLoss()
    # run the main training loop
    for epoch in range(epochs):
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = Variable(data), Variable(target)
            # resize data from (batch_size, 1, 28, 28) to (batch_size, 28*28)
            data = data.view(-1, 28*28)
            optimizer.zero_grad()
            net_out = net(data)
            loss = criterion(net_out, target)
            loss.backward()
            optimizer.step()
            if batch_idx % log_interval == 0:
                print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                    epoch, batch_idx * len(data), len(train_loader.dataset),
                           100. * batch_idx / len(train_loader), loss.item()))
    # run a test loop
    test_loss = 0
    correct = 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for data, target in test_loader:
            data = data.view(-1, 28 * 28)
            net_out = net(data)
            # sum up batch loss
            test_loss += criterion(net_out, target).item()
            pred = net_out.max(1)[1]  # get the index of the max log-probability
            correct += pred.eq(target).sum().item()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
if __name__ == "__main__":
    run_opt = 2
    if run_opt == 1:
        simple_gradient()
    elif run_opt == 2:
        create_nn()
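The optimizer in the listing above is stochastic gradient descent with momentum. As a rough sketch of what one such update does, the plain-Python code below applies the commonly documented rule (accumulate the gradient into a velocity, then step along the velocity) to a single parameter. The function name sgd_momentum_step and the toy objective are made up for illustration, and options such as dampening, Nesterov momentum, and weight decay are left out.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.01, momentum=0.9):
    """One update: fold the gradient into the velocity, then take a step."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


# Toy objective: minimize f(p) = (p - 3)^2, whose gradient is 2 * (p - 3).
param, velocity = 0.0, 0.0
for step in range(300):
    grad = 2.0 * (param - 3.0)
    param, velocity = sgd_momentum_step(param, grad, velocity)
print(param)
```

The velocity term lets consistent gradients build up speed, which is why momentum often reaches a minimum in fewer steps than plain gradient descent with the same learning rate.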

Conclusion

The libraries above top the list of Python libraries that data scientists and engineers use extensively today. It’s worth gaining in-depth knowledge of machine learning libraries, or at least becoming familiar with them. There are many other frameworks and libraries that deserve attention for particular tasks. So, if you have a library in mind other than the ones mentioned above, you can let our audience know in the comments section.
