NEW: RESOURCE PAGE

Searching for tutorials and software about Deep Learning and Neural Nets? Be sure to look at my Resource Page!
Looking for Octave? Go to my Easy Octave on Mac page!

Thursday, April 14, 2016

Numerai stuff.

We all know and love Kaggle, but it has a number of faults, the worst of which in my eyes is the dirty data that gets dumped there. People who are good at AI, I mean people like me (smile), are often not very good at the mundane but necessary potato peeling with a blunt knife that Kaggle's data cleaning seems to require.

So, to summarise, Numerai seems to provide cleaner data and more winners. You can go to their site, to FastML, or read the Reddit AMA for more info; I will add links in here as long as the topic interests me.

http://fastml.com/numerai-like-kaggle-but-with-a-clean-dataset-top-ten-in-the-money-and-recurring-payouts/

https://www.reddit.com/r/MachineLearning/comments/3wdr9e/numerai_a_global_ai_tournament_to_predict_the/

http://fastml.com/what-you-wanted-to-know-about-auc/

Numerai seems to be supporting an interesting Python multicore framework called machineJS, which is optimised for Macs.
https://blog.numer.ai/2016/02/25/machineJS

You may wonder how Numerai can afford to give their data away. The answer is they encrypt it first, using a process named homomorphic encryption, which allows one to obfuscate data while preserving one's ability to work with it.
https://medium.com/@Numerai/encrypted-data-for-efficient-markets-fffbe9743ba8#.d1zxehecc
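As a toy illustration of the idea (this is not Numerai's actual scheme, and it uses insecure textbook parameters), unpadded RSA has a homomorphic property: multiplying two ciphertexts yields a ciphertext of the product of the two plaintexts, so you can do arithmetic on data you cannot read.

```python
# Toy demo of a homomorphic property: textbook RSA (insecure, illustration only).
# Multiplying ciphertexts corresponds to multiplying the hidden plaintexts.

def modinv(a, m):
    # Modular inverse via Python's built-in pow (Python 3.8+).
    return pow(a, -1, m)

# Tiny hand-picked RSA key (never use sizes like this for real data).
p, q = 61, 53
n = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17
d = modinv(e, phi)             # private exponent

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

m1, m2 = 12, 7
c1, c2 = encrypt(m1), encrypt(m2)

# Work directly on the encrypted values: multiply ciphertexts mod n...
c_prod = (c1 * c2) % n

# ...and decryption reveals the product of the original plaintexts.
print(decrypt(c_prod))  # 84
```

Real homomorphic schemes are of course far more sophisticated, but the principle of "compute on data without seeing it" is the same.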


-----------
An interesting link to some recurrent net explanations and examples
http://karpathy.github.io/2015/05/21/rnn-effectiveness/

Wednesday, February 10, 2016

Simple plotting with Python 2.7 in Jupyter

If you're as old as me (ancient), every few years you have to go through the graphical equivalent of Hello World. As I'm a Python beginner, I decided to save all the necessary Python incantations and Jupyter magics for choosing axis labels, a line style, color, legend box etc. here, so I can do it again in a few months when I'll have forgotten. And sorry, yes, it's a screenshot.

The one thing really, really needed for iPython/Jupyter is the %matplotlib inline magic. Although there are cuter alternatives, it gets the images into the notebook in an acceptable way.
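Since screenshots don't paste well, here is a sketch of the kind of incantations I mean; the data and labels are just placeholders (in a notebook you'd use %matplotlib inline instead of the Agg backend line):

```python
# Minimal matplotlib "Hello World": axis labels, line styles, colors, legend.
import matplotlib
matplotlib.use("Agg")          # non-interactive backend; in Jupyter use %matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), color="blue", linestyle="-", label="sin(x)")
ax.plot(x, np.cos(x), color="red", linestyle="--", label="cos(x)")
ax.set_xlabel("x (radians)")
ax.set_ylabel("amplitude")
ax.set_title("Hello World plot")
ax.legend(loc="upper right")   # the legend box
fig.savefig("hello_plot.png")
```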




Sunday, January 31, 2016

Still working away at the coalface

When you don't know a programming environment, your life consists of using Google to locate code fragments, modifying them to try and get them to work, and then trying to figure out how they work.

At some point I guess I'll have learnt some Python, will feel comfortable with Numpy broadcasting and with slicing and dicing arrays, and will be able to write my own code without checking the shape() of each array every minute. That day I'll probably be told that Python is obsolete ...
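For future me, a minimal sketch of the broadcasting and slicing behaviour I keep having to re-check (the array values are arbitrary):

```python
# NumPy broadcasting and slicing quick reference.
import numpy as np

a = np.arange(12).reshape(3, 4)      # shape (3, 4)
row_means = a.mean(axis=1)           # shape (3,)

# Broadcasting: a (3, 1) column stretches across the 4 columns of a.
centered = a - row_means[:, np.newaxis]

# Slicing: rows 0..1, every other column.
sub = a[:2, ::2]

# The shape() check I do every minute:
print(a.shape, row_means.shape, centered.shape, sub.shape)
```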

At the moment, my hardest problem with iPython/Jupyter has been finding a clean way to inline plots.

Either of the following two lines seems to work ok for now.
%matplotlib inline
%matplotlib notebook

I used a Hinton diagram module to display character images from the in-memory matrix. 

I'm starting to think one might profit from keeping the original file data around before it gets normalized and shuffled.

Anyway, here is today's proof of work.




Friday, January 29, 2016

So I killed my kernel. Maybe a VM with checkpointing is a good way to work?

If anyone knows a clean way to checkpoint computational state in iPython/Jupyter, could they leave me a comment, or drop me an email ...please?

As we're all aware, I'm not the sharpest knife in the box. I killed my Python kernel while doing the Udacity assignments. The fact that I know no Python makes this almost predictable, I guess :)

I need to quickly figure out a way of checkpointing my work. This is the second time I've lost state, and that list of 529114 training exemplars takes a LOOONG time to uncompress. Suddenly, running the notebook fully inside a VM sounds like a timesaver.
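For now, a crude workaround I'm considering (just a sketch, not a real iPython checkpoint; the variable names are made up): pickle the expensive-to-rebuild objects to disk, and reload them instead of recomputing after the kernel dies.

```python
# Poor man's notebook checkpointing: pickle expensive results to disk,
# reload them instead of recomputing after a kernel restart.
import os
import pickle

CHECKPOINT = "training_data.pickle"

def slow_build():
    # Stand-in for the long uncompress/preprocess step.
    return {"train": list(range(1000)), "labels": [i % 10 for i in range(1000)]}

if os.path.exists(CHECKPOINT):
    with open(CHECKPOINT, "rb") as f:
        data = pickle.load(f)       # fast path after a crash
else:
    data = slow_build()             # slow path, done once
    with open(CHECKPOINT, "wb") as f:
        pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

print(len(data["train"]))           # 1000
```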





Wednesday, January 27, 2016

Getting the Udacity assignment up.

Lucky I got the stuff from GitHub yesterday; the Register informs me today was an unscheduled holiday for OSS developers :)


Unfortunately, no holiday here. To do the first assignment using my conda installation of TensorFlow, I now have to chase down a bunch of modules and do "conda install" for each. Ok, I've downloaded the example files in the notebook and everything is copacetic. But maybe, just maybe, getting the Docker image would have been smarter, if all the install work is done there for the student.


Tuesday, January 26, 2016

Setting up the Udacity TensorFlow examples.

These are my worknotes. I'm an old geek, few skills, gray hair. So embarrassed. But at least I cannot lose this lab book :)

I posted a short test of TensorFlow last year, a Julia curve done in an iPython notebook. Now I'm on a different machine, and trying to set up the Udacity assignments. And I need to use version control.

Anaconda is installed. Python 2.7.
The iPython Launcher is up for 2.7. Notebooks work.
TensorFlow is installed and working in the notebook.

Command-line Git is installed on my machine. Hey, I've never used this thing.
$ git --version
git version 2.5.4 (Apple Git-61)

Wait! There's a book on Git here. Ok, I've scanned thru a couple of chapters.

Hmmm, better set up a decent text editor. Let's try the excellent and free TextWrangler, which is a cut-down version of BBEdit. Note that Git needs an editor command that blocks until the file is closed, so point it at TextWrangler's command-line tool with the wait flag rather than at the .app bundle:

git config --global core.editor "edit -w"

Ok. Now we go and try to pull down the full examples folder. Wait - there is now also a new thing called skflow!

$ git clone https://github.com/tensorflow/tensorflow.git
Cloning into 'tensorflow'...
remote: Counting objects: 14815, done.
remote: Compressing objects: 100% (77/77), done.
remote: Total 14815 (delta 33), reused 0 (delta 0), pack-reused 14738
Receiving objects: 100% (14815/14815), 21.86 MiB | 1.78 MiB/s, done.
Resolving deltas: 100% (10390/10390), done.

Checking connectivity... done.

$ cd tensorflow
Edmunds-MBP:tensorflow edmundronald$ ls
ACKNOWLEDGMENTS README.md configure navbar.md third_party
AUTHORS RELEASE.md eigen.BUILD png.BUILD tools
CONTRIBUTING.md WORKSPACE google six.BUILD util
LICENSE bower.BUILD jpeg.BUILD tensorflow

Phew. I've survived my first interaction with Git. More tomorrow.












Monday, January 25, 2016

TensorFlow DeepLearning course on Udacity

As usual I'm blogging about what I'm viewing. There's a lot of needless detail because I'm an idiot, and I need a notebook or I forget everything.

Google has put up a new free course on Deep Learning on Udacity, which I'm auditing. Credit where credit is due: the instructor is Vincent Vanhoucke and the course was developed with Arpan Chakraborty. TensorFlow is used for the practical work.

Unlike Professor Hinton's Coursera materials, this course is practical and seriously quick-paced; the video immediately starts off with AX + b, softmax and cross-entropy, so you should have some knowledge of linear algebra and Python.
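For reference, here is my own numpy sketch of those first ingredients (the logits are made-up numbers; this is my rendering, not the course code):

```python
# Softmax turns scores (logits) into probabilities; cross-entropy measures
# how far those probabilities are from the one-hot true label.
import numpy as np

def softmax(z):
    z = z - z.max()                  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(probs, one_hot):
    return -np.sum(one_hot * np.log(probs))

logits = np.array([2.0, 1.0, 0.1])   # scores for 3 classes, e.g. from AX + b
probs = softmax(logits)
label = np.array([1.0, 0.0, 0.0])    # true class is class 0

print(probs.round(3))                # probabilities summing to 1
print(cross_entropy(probs, label))   # small when the right class dominates
```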

To give you an idea of the course difficulty, here is the github repo with the assignments. You hit the first one after about an hour in. 

For rusty old geeks like me, I recommend finding a cheatsheet for Python and Numpy. Google is your friend. It seems that Python will be the unavoidable scripting-language workhorse for the rest of the decade, superseding things like Matlab and Octave. Actually I quite like iPython's notebook interface.

Readers may be surprised that programming fluency in whatever language is being used is not a requisite for AI work, but IMHO it really isn't. Getting a model that works is usually the hard part. 

I will blog my experiences later. For now, you can get a headstart by reading some of my old posts on TensorFlow, and especially the one detailing  TensorFlow installation with Anaconda. I just followed my own advice and it works well. BTW, the Anaconda graphical launcher is great, but it writes notebooks directly into your home directory at the top level. 

BTW, here is a trick to figure out where TensorFlow lives on your machine. Note that the base install contains MNIST data.

$ python -c 'import os; import inspect; import tensorflow; print(os.path.dirname(inspect.getfile(tensorflow)))'
I copied the trick from this install and getting started page, linked by Vanhoucke, but I still recommend conda, unless you want to run one of the supplied Docker images.

Monday, November 16, 2015

Conference papers are the natural prey of the iPad Pro.

Yesterday I went through the Mandelbrot Tensorflow tutorial.

Today, I went out to get an iPad Pro. It makes a wonderful PDF reader. I find PDFs are really hard to read on a laptop or small iPad,  but the iPad Pro and the Surface Pro have nice screens and can use pens for annotation. 

PDF Expert from Readdle is free this week, and it's my solution to the annotation problem on the iPad Pro. Of course if you don't need to annotate, you can just use iBooks to ingest the PDFs and file them away; the trick is to click at the top right of Safari after the download finishes, and choose "open in iBooks". However, PDF Expert does have one trump card: it can index and search through your whole PDF library!

Others may prefer to use OneNote on the Surface Pro: Here the trick is to run a book or paper through the "Print to OneNote" device, at which point you can drop it into your notebook and use all of OneNote's tools on the text, including and especially search and annotation.  In theory this allows you to quickly search a whole library of saved papers. 

The Apple Pencil is hard to find, but I have the one from 53, it works great, and I love the soft tip when using it as a typing and selection tool. Here is the Amazon link, they also have them at Apple retail stores. By the way, the Paper app from 53 is really really great; it was a great art app before, now it's a great notetaking app. 

By the way, I would be willing to bet that the electronics in the Apple Pencil and the 53 pencil are very similar. 

The improvements in stylus reactivity on the iPad Pro are spectacular even with a dumb stylus, and seem mostly due to a scan-rate speedup in the display's touch-sensor system. This means that any capacitive-display dumb stylus should work well on the iPad Pro.

Certainly the cheap stylus I also tested allowed me to write and draw fluidly with Paper.

If you are looking for test PDFs, there is always the TensorFlow white paper which describes Google's Artificial Neural Network simulation system in design, architecture, and implementation.  In addition to the ANN-specific stuff, it has a lot of abstruse details on the implementation issues for a networked dataflow simulator, and presumably points to references on the dataflow concept which might be worth chasing up. If Google have their way, the dataflow concept is due for a comeback.

The convincing argument that makes it worth learning TensorFlow is that for Google it serves as research environment, training software and production tool all at once. This is software which has been tested, has seen production, and will not turn into abandonware come the morning.

Saturday, November 14, 2015

Hello World with TensorFlow, Conda.

This continues my working notes. The last post was about installing TensorFlow; this one is about starting to work with it. I'm using Conda. Select Python 2.7, launch a notebook, and copy and paste from the tutorial. I've jumped straight to the interactive usage part of the tutorial: note the tf.InteractiveSession()



As I parse this, we create a session sess that sits there in the background but which we don't need to refer to explicitly, and we create a constant and a variable node which become part of the default graph; then we ask the variable to run its initializer (resetting it to its default starting state), declare an operator node which is also added to the default graph, run it with its eval method and print the result. I'm not yet sure of the semantics, but at least I have this thing running and can play with it.
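To make sure I understood those semantics, I mocked them up in plain Python: a toy deferred-evaluation graph, entirely my own sketch (the class names are invented, this is not TensorFlow's API), mirroring the same steps of declaring nodes first and only producing values on eval.

```python
# Toy dataflow graph mimicking the tutorial steps: nodes are declared
# up front and only produce values when eval() is called.

class Constant:
    def __init__(self, value):
        self.value = value
    def eval(self):
        return self.value

class Variable:
    def __init__(self, initial):
        self.initial = initial
        self.value = None            # unset until the initializer runs
    def run_initializer(self):
        self.value = self.initial    # reset to default starting state
    def eval(self):
        assert self.value is not None, "initializer has not been run"
        return self.value

class Sub:
    # Operator node: subtracts b from a, element-wise.
    def __init__(self, a, b):
        self.a, self.b = a, b
    def eval(self):
        return [x - y for x, y in zip(self.a.eval(), self.b.eval())]

# Mirror of the tutorial flow: a variable, a constant, an operator node.
x = Variable([1.0, 2.0])
a = Constant([3.0, 3.0])
x.run_initializer()
sub = Sub(x, a)
print(sub.eval())                    # [-2.0, -1.0]
```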

As of a few days ago, I didn't know anything about Python, Conda or TensorFlow, so I think this is a nice HelloWorld run, and I'll go have a drink.