This is a short tutorial to compare the outcome of applying Deep Learning techniques to a text classification problem, using word embeddings and a convolutional neural network (CNN), via Keras (with Theano, for simplicity, but any Keras backend will do), GloVe embeddings, and a SciKit-Learn dataset. The original tutorial is taken from a very useful blog.
Given the last few years of hype around Deep Learning, knowing one of those frameworks is probably no longer optional, at least if you are a professional Machine Learning engineer. Personally, I always favor free, open source solutions, so Apache MXNet would be the natural fit. However, I …
This is a question I frequently get from colleagues who are serious programmers or software developers and are planning to pick up data analytics. There are three fundamental topics in mathematics that you need to cover, assuming that you are an expert in software development and …
tl;dr Surprisingly, it is hard to find a good command-line tool for sentence segmentation and word tokenization that works well with European languages. Here, I present segtok, a Python 2.7 and 3 package, API, and Unix command-line tool to remedy this shortcoming.
Text processing pipelines
This is the …
tl;dr Right now, use Wapiti unless you want to go beyond first-order and/or linear models, need the fastest possible training cycles, or are a Scala programmer, in which case you would be best advised to choose Factorie. OK, so that's that for a stressed out generation; Read …
If you are a computational linguist, data analyst, or bioinformatician working with biological text corpora (on medicine, neuroscience, molecular biology, etc.), you will sooner rather than later need access to MEDLINE. Right now, the MEDLINE subset ("baseline") of PubMed contains nearly 23 million records, all with titles, author names, etc …
Update 2015-07-24: Please check out the latest slides for this course, which now includes a quick introduction to dependency parsing, a more terse start, and several corrections and improvements all over.
Last week we had a really great time at the first hands-on text mining workshop in the context of …
Recently, a colleague of mine asked me to introduce the most important concepts of Node programming to a flock of interested people in our research group. Initially, I declined, considering the vast number of tutorials and books, but then thought it might be quite an interesting challenge: Is there …
UPDATE: Installing the Scientific Python stack from "source" has become a lot simpler recently and this tutorial was updated accordingly in November 2013 for use with OSX Mavericks and, in particular, Python 3.
Installing a full-stack scientific data analysis environment on Mac OSX for Python 3 and making sure the …
I just had a rough time figuring out how to bypass all the security features of the Rails project I am developing to write decent controller specs with RSpec. I am using AuthLogic as the authentication module and declarative authorization (DA) for exactly that. However, when I started to write controller …
I have now tested MobileMe, SugarSync, and DropBox for quite a while to decide which service to buy for syncing my “electronic life” between my Macs (soon I’ll be managing two OSX Server blades, one Mini, and two MBPs!). After this period, there is no doubt in my mind: I …
Actually, I already considered myself very lucky this year for visiting the jungle in the Amazon. I would not have thought I could repeat such an experience that soon again, but was proven wrong shortly after. A few weeks ago, my girlfriend spent some weeks in Spain and we used …
Usually, I prefer to steer clear of the day-to-day mainstream news, yet even I have to accept a low level of "noise" if I want to know at least something about the most significant things going on. However, currently I get the overwhelming feeling that the whole news world is …
| Python pre-3.0 | Python post-3.0 |
| str.encode | bytes.translate or (new) str.encode |
| str("x") == unicode("x") | bytes("x") != str("x") |
This change in Python 3.0 might be more than useful …
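The split that the table above summarizes can be tried out directly. What follows is only a minimal sketch, assuming Python 3 and UTF-8 as the encoding; the variable names and the one-character translation table are made up for illustration and are not taken from the original post.

```python
# Python 3 keeps text (str) and binary data (bytes) as two distinct types.
text = "x"                     # str: a sequence of Unicode code points
data = text.encode("utf-8")    # str.encode turns text into bytes

# The two never compare equal, even when they "look" the same.
differs = data != text

# Decoding with the same codec recovers the original text.
round_trip = data.decode("utf-8")

# bytes.translate works on byte-level tables (illustrative table: x -> y).
table = bytes.maketrans(b"x", b"y")
translated = data.translate(table)

print(type(text).__name__, type(data).__name__, differs, round_trip, translated)
```

Note that going between the two types always takes an explicit `encode` or `decode` step with a named codec; nothing is converted implicitly.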
Colombia is one of the most magnificent countries I have been to so far. If you know a tiny bit of its history with all the bloody civil wars which have been almost continuously tormenting the country since the 40s, it is more than astonishing to find that the people themselves …
In the last few weeks I have been to Cadiz and Almeria, both in the south of Spain. Well, actually I was in none of those cities, but in places close by.
The first trip was down to Cadiz, about 30 km south of the city some of our friends …
When I decided to move to Spain, I would never have thought of skiing here. There are the Pyrenees, where you could expect some areas, but I was not expecting anything anywhere else. Meanwhile, I have been skiing in Asturias (north, center) and Sierra Nevada (south, with a view of …