Getting Started on Deep Learning with Python

 

An Introduction to Deep Learning

In his wonderfully titled paper ‘The history of the future of the Bayesian brain’, Karl Friston recalls working with Geoffrey Hinton, how Hinton emphasized Bayesian formulations and generative models, and how Friston developed his theory of biological minimization of ‘Variational Free Energy’ from Hinton’s ideas, adopting Hinton’s references to free energy, Kullback–Leibler divergence, and Helmholtz and Boltzmann Machines from the field of artificial neural networks.

Hinton co-invented ‘Boltzmann Machines’, which are recurrent artificial neural networks whose neurons behave randomly (i.e. they are ‘stochastic’), and he also invented fast learning algorithms for ‘Restricted Boltzmann Machines’ (where neurons have connections to neurons in other layers but not to those in the same layer).
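
To make the ‘restricted’ connectivity concrete, here is a minimal NumPy sketch of one sampling step in a small binary Restricted Boltzmann Machine. The layer sizes, the random weights and the function name gibbs_step are all made up for illustration; this is not Hinton’s training algorithm, only the stochastic update that such algorithms build on.

```python
from __future__ import print_function
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b_visible, b_hidden, rng):
    """One visible -> hidden -> visible sampling step in a binary RBM."""
    # Hidden units connect only to visible units, never to each other,
    # so every hidden unit can be sampled at once given the visible layer.
    p_h = sigmoid(np.dot(v, W) + b_hidden)
    h = (rng.random_sample(p_h.shape) < p_h).astype(float)
    # The same holds in the other direction for the visible units.
    p_v = sigmoid(np.dot(h, W.T) + b_visible)
    v_new = (rng.random_sample(p_v.shape) < p_v).astype(float)
    return v_new, h

rng = np.random.RandomState(0)
n_visible, n_hidden = 6, 3               # illustrative sizes only
W = 0.1 * rng.randn(n_visible, n_hidden)  # one weight per visible-hidden pair
b_visible = np.zeros(n_visible)
b_hidden = np.zeros(n_hidden)

v0 = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])
v1, h0 = gibbs_step(v0, W, b_visible, b_hidden, rng)
print(v1, h0)
```

Because there are no connections within a layer, the weights fit in a single matrix W and each layer is updated with one matrix product.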

He modestly claims that his efforts over the decades led to a 10-fold increase in performance but that, during this time, Moore’s Law increased computing power by a factor of 100,000! Added to that was the new availability of large data sets with which to train networks.

But the result of all this was that ‘deep’ neural networks (those with more than 1 hidden layer, i.e. those with more than 3 layers in total) were able to perform very good feature extraction in a reasonable time. Lower layers in the hierarchy extract simple features, upon which the higher layers can extract more and more elaborate features. This then resulted in a rapid commercialization of such algorithms for applications like speech recognition, as used in Google Voice Search and Apple’s Siri.
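
As a rough illustration of layers feeding on each other’s features, the sketch below pushes one input through a network with two hidden layers using plain NumPy. The layer sizes and random weights are assumptions chosen for the example (784 matches a 28x28 character bitmap); nothing here is trained, and a real classifier would normally finish with a softmax rather than the rectifier used on every layer here.

```python
from __future__ import print_function
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, layers):
    """Pass x through each (weights, bias) pair and keep every layer's output."""
    activations = []
    for W, b in layers:
        # Each layer sees only the features produced by the layer below it.
        x = relu(np.dot(x, W) + b)
        activations.append(x)
    return activations

rng = np.random.RandomState(0)
sizes = [784, 128, 64, 10]  # input, two hidden layers, output (illustrative)
layers = [(0.01 * rng.randn(n_in, n_out), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = rng.rand(1, 784)        # one fake flattened 28x28 image
acts = forward(x, layers)
print([a.shape for a in acts])
```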

So now the emeritus Professor Hinton is a founding father of ‘Deep Learning’ and works part-time at Google.

A new strand of posts here will look at Deep Learning and how it works. These will be based around the Python computer language. This ‘Introduction to Deep Learning with Python’ video by Alec Radford at indico talks through some Python code for optical character recognition. Below, I cover installing all the code and applications to be able to run the code shown in the video, to get us started.

 

Overview of Installing Python

To get this code running on a Windows PC, we need:

  1. The Python source code.
  2. Python itself.
  3. The NumPy maths package, required by the source code.
  4. The Theano numerical methods Python package, required by the source code.
  5. ‘Pip’ (‘Pip Installs Packages’) – for installing Python packages!
  6. The ‘MinGW’ gcc compiler, for compiling the Theano package for much faster execution times.
  7. The MNIST data set of training and test character bitmaps.
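
Once the steps below are done, a short script along these lines can report which of the Python packages in the list are actually importable. It is only a sketch: the function name check_modules is made up, and the module names are assumed to be the usual import names of the packages listed above.

```python
from __future__ import print_function
import importlib

def check_modules(names):
    """Return {module name: version string, or None if it cannot be imported}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # A missing package raises ImportError; a broken install
            # can fail in other ways, so treat any failure as 'absent'.
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", "unknown")
    return found

status = check_modules(["numpy", "scipy", "theano", "pip"])
for name in sorted(status):
    print("%-8s %s" % (name, status[name] or "NOT INSTALLED"))
```

Save it anywhere and run it with python from the Anaconda Prompt; anything reported as NOT INSTALLED points at the step that needs repeating.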

 

Installing Anaconda

Anaconda2 provides 3 of the above:

  • Python 2.7
  • NumPy
  • Pip

Go to:

https://www.continuum.io/downloads

and go to the ‘zipped Windows installers’ (to work whether behind a firewall or not).

Download the latest 32-bit version for Python 2:

Anaconda2-2.5.0-Windows-x86.zip

Double-clicking on the downloaded ZIP file opens it to show the Anaconda2-2.5.0-Windows-x86 application (Windows understands the ZIP compression format). Double-click on this Anaconda2-2.5.0-Windows-x86 application to install Anaconda. Selecting to install ‘just for me’ will probably be easier, so install to the user area: C:\Users\User\Anaconda2_32. (Add the ‘_32’ suffix in case we need to install a 64-bit version later on.)

Tick the option to add Anaconda to PATH, and tick the option to make Anaconda the default Python 2.7. Installation then takes a while.

 

Installing the Main Python Packages

Locate the ‘Anaconda Prompt’ – easiest through the Windows search. This opens a command shell.

Go to the Anaconda2_32\Scripts directory:

cd Anaconda2_32\Scripts

‘Pip’ (pip.exe) and ‘Conda’ (conda.exe) will be in here.

Installation will generally use Conda rather than Pip. Make sure the installed packages are up to date, but first make sure Conda itself is up to date:

conda update conda

Select ‘y’ if not up to date. Continue:

conda update --all

Finally, install the desired packages:

conda install scipy

conda install numpy

 

Installing GCC for Compiling the Theano Package

The Theano numerical methods package can run as interpreted Python, but this will be very slow. Instead, the package should be compiled. For this, the MinGW (‘Minimalist GNU for Windows’) compiler should be installed. Follow the link from:

http://www.mingw.org/wiki/Getting_Started

to SourceForge to automatically download the setup executable:

mingw-get-setup.exe

into the Downloads directory.

Double-click this to run the installer. Select

C:\Users\User\MinGW

as the install directory (for consistency with the Anaconda2_32 installation).

 

Setting the Path to point to GCC

To ensure that Conda will ‘see’ the compiler when doing the Theano installation, confirm that the PATH environment variable points to it. Select:

Start -> Control Panel -> System -> Advanced -> Environment Variables

(Alternatively, in the Search window, type Environment and select ‘Edit the Environment Variables’.)

Double-click on ‘PATH’ and add MinGW to the start/top of the list. It should point to:

C:\Users\User\MinGW\lib

C:\Users\User\Anaconda2_32

C:\Users\User\Anaconda2_32\Scripts

C:\Users\User\Anaconda2_32\Library\bin
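
To confirm that the PATH change has taken effect, open a new Anaconda Prompt and run a small check like the one below. It is a sketch only: find_on_path is a made-up helper that walks the PATH entries by hand, and the executable names are the ones I would expect from the installations described above.

```python
from __future__ import print_function
import os

def find_on_path(program, path=None):
    """Return the full path of the first match for `program` on PATH, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, program)
        # Skip empty PATH entries; accept the first real file found.
        if directory and os.path.isfile(candidate):
            return candidate
    return None

for exe in ("gcc.exe", "g++.exe", "python.exe", "conda.exe"):
    print("%-12s %s" % (exe, find_on_path(exe) or "not found on PATH"))
```

If gcc.exe is reported as not found, check which MinGW subdirectory actually contains it (in a standard MinGW layout the executables sit in the bin directory) and add that directory to PATH.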

 

Installing the Theano Package

Then install the GNU C++ (g++) compiler to speed-optimize the Theano library. In the ‘Anaconda Prompt’ shell, ensure that you are in the correct directory:

cd \Users\User\Anaconda2_32\Scripts

and type:

conda install mingw libpython

And finally install the numerical methods python library ‘Theano’:

pip install theano

 

Download the Example Python Code

The text accompanying the YouTube video points to the code at:

https://github.com/Newmu/Theano-Tutorials

Go there and click ‘Download ZIP’. Double-click on the downloaded ZIP and copy the Theano-Tutorials-master directory to C:\Users\User\Anaconda2_32.

 

Downloading the MNIST Character Dataset

The MNIST character dataset is available through Yann LeCun’s personal website.

Windows cannot unzip ‘gzip’ (*.gz) files directly. If you don’t have an application to do this, download and run ‘7zip’:

http://www.7-zip.org/

Gzip (*.gz) files need to be associated with ‘7zip’. Then double-click on each gzip file in turn and ‘extract’ the uncompressed files from them. These should all be placed under:

C:\Users\User\Anaconda2_32\Theano-Tutorials-master\media\datasets\mnist

There is a mismatch between the filenames of the extracted MNIST files and the file references in the Python code. Using Windows Explorer, change the ‘.’ in all the filenames to a ‘-’, e.g. rename train-images.idx3-ubyte to train-images-idx3-ubyte.
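
If you would rather not do this by hand, the extraction and the renaming can both be scripted with the Python standard library. This is a sketch under assumptions: the helper names gunzip_all and dots_to_dashes are made up, the directory is the one used in this post, and the downloaded archives are assumed to be named like train-images-idx3-ubyte.gz (in which case extracting from Python already gives the dashed names).

```python
from __future__ import print_function
import gzip
import os
import shutil

def gunzip_all(directory):
    """Decompress every *.gz file in `directory`, writing the result next to it."""
    written = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".gz"):
            continue
        target = os.path.join(directory, name[:-3])  # drop the '.gz' suffix
        with gzip.open(os.path.join(directory, name), "rb") as src:
            with open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
        written.append(target)
    return written

def dots_to_dashes(directory):
    """Rename e.g. train-images.idx3-ubyte to train-images-idx3-ubyte."""
    renamed = []
    for name in sorted(os.listdir(directory)):
        if name.endswith("-ubyte") and "." in name:
            new_name = name.replace(".", "-")
            new_path = os.path.join(directory, new_name)
            if not os.path.exists(new_path):  # never overwrite an existing file
                os.rename(os.path.join(directory, name), new_path)
                renamed.append(new_name)
    return renamed

# Assumed location, following the directory used in this post.
mnist_dir = r"C:\Users\User\Anaconda2_32\Theano-Tutorials-master\media\datasets\mnist"
if os.path.isdir(mnist_dir):
    print(gunzip_all(mnist_dir))
    print(dots_to_dashes(mnist_dir))
```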

 

Running the Code

The Anaconda installation includes the ‘Spyder’ IDE for Python. Search for ‘Spyder Desktop App’ and run.

Browse to set the working directory (top right) to:

C:\Users\User\Anaconda2_32\Theano-Tutorials-master

And open the first Python script (File -> Open):

0_multiply.py

This shows the source code.

Select  Run -> Run (F5) to execute this code.

Selecting other programs is likely to result in either a ‘memory error’ or ‘No module named foxhound.utils.vis’.

The memory error issue can be overcome by running the code from the Anaconda Prompt:

cd C:\Users\User\Anaconda2_32\Theano-Tutorials-master

python 4_modern_net.py

This still means that 3_net.py and 5_convolutional_net.py cannot be run, and what the other programs are actually doing hasn’t been discussed. That is left for another time.
