This is a short tutorial to compare the outcome of applying Deep Learning techniques to a text classification problem, using word embeddings and a convolutional neural network (CNN), via Keras (with Theano, for simplicity - but any Keras backend will do), GloVe embeddings, and a scikit-learn dataset. The original tutorial is taken from a very useful blog …
A quick reference for working with TensorFlow
Introduction
Given the last few years of hype around Deep Learning, knowing one of those frameworks is probably no longer optional, at least if you are a professional Machine Learning engineer. Personally, I always favor free, open source solutions, so Apache MXNet would be the natural fit. However, I …
What are the best books [for programmers] to get into Data Science?
Introduction
This is a question I frequently get from colleagues who are serious programmers or software developers and are planning to pick up data analytics. There are three fundamental topics in mathematics that you need to cover, assuming that you are an expert in software development and …
An efficient online sequence tagger resource for GATE
tl;dr for a stressed out generation: GATE's Generic Tagger framework is a CREOLE plug-in that allows you to wrap any existing sequence tagger and use it to create annotations in your pipeline, but it is a bit slow. Therefore, I have created the Online Tagger GATE plug-in that …
segtok - a segmentation and tokenization library
tl;dr Surprisingly, it is hard to find a good command-line tool for sentence segmentation and word tokenization that works well with European languages. Here, I present segtok, a Python 2.7 and 3 package, API, and Unix command-line tool to remedy this shortcoming.
Text processing pipelines
This is the …
A review of sparse sequence taggers
Introduction
tl;dr Right now, use Wapiti unless you want to go beyond first-order and/or linear models, need the fastest possible training cycles, or are a Scala programmer, in which case you would be best advised to choose Factorie. OK, so that's that for a stressed out generation; Read …
MEDLINE Kung-Fu
If you are a computational linguist, data analyst, or bioinformatician working with biological text corpora (on medicine, neuroscience, molecular biology, etc.), you will sooner rather than later need access to MEDLINE. Right now, the MEDLINE subset ("baseline") of PubMed contains nearly 23 million records, all with titles, author names, etc …
An Introduction to Statistical Text Mining
Update 2015-07-24: Please check out the latest slides for this course, which now include a quick introduction to dependency parsing, a more terse start, and several corrections and improvements all over.
Last week we had a really great time at the first hands-on text mining workshop in the context of …
Getting started with a "virtual" Go environment
Concurrent Node.js
Introduction
Recently, a colleague of mine asked me to introduce the most important concepts of Node programming to a flock of interested people in our research group. Initially, I declined, considering the vast number of tutorials and books, but then thought it might be quite an interesting challenge: Is there …
Installing a full stack Python data analysis environment on OSX
UPDATE: Installing the Scientific Python stack from "source" has become a lot simpler recently, and this tutorial was updated accordingly in November 2013 for use with OSX Mavericks and, in particular, Python 3.
Installing a full-stack scientific data analysis environment on Mac OSX for Python 3 and making sure the …
Rails: RSpec'ing controllers with declarative authorization AND AuthLogic
I just had a rough time figuring out how to bypass all the security features of the Rails project I am developing to write decent controller specs with RSpec. I am using AuthLogic as authentication module and declarative authorization (DA) for exactly that. However, when I started to write controller …
MobileMe vs. SugarSync vs. DropBox
I have now tested MobileMe, SugarSync, and DropBox for quite a while to decide which service to buy for syncing my “electronic life” between my Macs (soon I’ll be managing two OSX Server blades, one Mini, and two MBPs!). After this period, there is no doubt in my mind: I …
My first visit to a volcano
Actually, I already considered myself very lucky this year for visiting the jungle in the Amazon. I would not have thought I could repeat such an experience that soon again, but was proven wrong shortly after. A few weeks ago, my girlfriend spent some weeks in Spain and we used …
News, Swines & Pigs
Usually, I prefer to steer clear of the day-to-day mainstream news, yet even I have to accept a low level of "noise" if I want to know at least something about the most significant things going on. However, currently I get the overwhelming feeling that the whole news world is …
Why I love Python 3.0: Unicode + UTF-8
tl;dr summary
Python pre-3.0 | Python post-3.0 |
---|---|
str.encode | bytes.translate or (new) str.encode |
str.decode | bytes.decode |
unicode | str |
unicode.encode | str.encode |
unicode.decode | *n/a* |
str("x") == unicode("x") | bytes("x") != str("x") |
This change in Python 3.0 might be more than useful …
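The Python 3 column of the table above can be illustrated with a minimal sketch. The sample string and the variable names below are illustrative assumptions and are not taken from the original post.

```python
# Illustrative sample text (an assumption, not from the original post);
# written with escapes so the snippet does not depend on the file encoding.
text = "na\u00efve caf\u00e9"       # str: a sequence of Unicode code points

data = text.encode("utf-8")         # str.encode: str -> bytes
restored = data.decode("utf-8")     # bytes.decode: bytes -> str

# In Python 3, bytes and str never compare equal, even for plain ASCII.
same = (b"x" == "x")

print(type(text).__name__, len(text))   # number of characters
print(type(data).__name__, len(data))   # number of UTF-8 bytes
print(restored == text)
print(same)
print(hasattr(str, "decode"))           # str no longer has a decode method
```

Keeping encoding and decoding as explicit, separate steps is what the rows of the table summarize: text lives in `str`, binary data lives in `bytes`, and conversion between the two always names an encoding.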
Amazonas 101
Colombia is one of the most magnificent countries I have been to so far. If you know a tiny bit of its history with all the bloody civil wars which have been almost continuously tormenting the country since the 40s, it is more than astonishing to find that the people themselves …
TextMate Python and Django cheat sheet
Travelling around Spain
In the last few weeks I have been to Cadiz and Almeria, both in the south of Spain. Well, actually I was in neither of those cities, but in places close by.
The first trip was down to Cadiz, about 30 km south of the city some of our friends …
Skiing in Spain
When I decided to move to Spain, I would never have thought of skiing here. There are the Pyrenees, where you could expect some areas, but I was not expecting anything anywhere else. Meanwhile, I have been skiing in Asturias (north, center) and Sierra Nevada (south, with a view of …