Great! So we've built a neural network that learns to classify documents.
Yet, as we have seen, it takes ages to train (even with a GPU) compared to all the earlier models. Worse, it does not reach the accuracy of the best off-the-shelf models from scikit-learn. The above model can be pushed to roughly 80% accuracy at most, if you use 100- or 300-dimensional word embeddings and train the model long enough (15-20 epochs). With 50-dimensional embeddings and no training of the embedding layer, you need to run for about 20-30 epochs to converge at around 67% accuracy. These results are a far cry from the 90% state-of-the-art accuracy that is possible on this dataset, and even from the 85% we can achieve with the very ad-hoc "blitz classification experiment" from scikit-learn's own tutorial.
Overall, I wrote this "tutorial" mostly to demonstrate that you should probably focus on simple things first, before you dive head-first into Deep Learning:
- Learn your own embeddings (ideally from a text collection matching your target domain), and make sure to add those mission-critical collocations that Mikolov points out in his famous word2vec NIPS paper (pro tip: all of this is trivial with Gensim; see the first sketch after this list).
- It is cool that we can now replace old-school TF-IDF vectors with modern neural word embeddings; but do use them in an "old school" ML model first, because it's simple and fast, if for nothing else than to make sure you actually need (and can even expect) more performance from a Deep Learning classifier at an economically viable expense (see the second sketch below).
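For the first point, here is a minimal sketch of how learning in-domain embeddings with collocation detection might look in Gensim. The corpus file name and all hyperparameters are assumptions, and the API shown is Gensim 4.x:

```python
# A minimal sketch of learning in-domain embeddings with Gensim, including
# Mikolov-style collocation detection via the Phrases model.
# `corpus.txt` is a hypothetical file with one pre-tokenized sentence per line.
from gensim.models import Word2Vec
from gensim.models.phrases import Phrases, Phraser

sentences = [line.split() for line in open("corpus.txt", encoding="utf-8")]

# Detect collocations ("new_york", "machine_learning", ...) and merge them
# into single tokens before training the embeddings.
bigrams = Phraser(Phrases(sentences, min_count=5, threshold=10.0))
sentences = [bigrams[s] for s in sentences]

# Train 100-dimensional skip-gram embeddings on the phrased corpus.
model = Word2Vec(sentences, vector_size=100, sg=1, min_count=5, workers=4)
model.wv.save("domain_embeddings.kv")
```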
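And for the second point, a sketch of plugging those embeddings into an "old school" scikit-learn model by simply averaging each document's word vectors; the toy documents and labels are placeholders for your own data:

```python
# A minimal sketch: represent each document as the average of its word
# vectors and feed that to a plain scikit-learn classifier.
import numpy as np
from gensim.models import KeyedVectors
from sklearn.linear_model import LogisticRegression

wv = KeyedVectors.load("domain_embeddings.kv")

def doc_vector(tokens):
    """Average the embeddings of all in-vocabulary tokens."""
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

docs = [["a", "tokenized", "document"], ["another", "one"]]  # placeholders
labels = [0, 1]  # placeholders

X = np.vstack([doc_vector(d) for d in docs])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```

This whole pipeline trains in seconds, which makes it a cheap baseline to beat before committing to anything deeper.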
And so, the challenge remains: how do you actually beat the state of the art in text classification with Deep Learning?
At the very least: "It's tricky"!
Only in the most recent years have very well-designed conv nets and recent research on belief networks managed to land in the same ballpark as the state-of-the-art results of "old school" ML models. And even most current Deep Learning research does not actually beat those "old" models on these two datasets.
(Which is not to say that stuff like VAEs isn't very cool - they probably would unfold their "full beauty" if you had a much larger dataset - see below.)
And using LSTMs, GRU-RNNs, and (IMO, in particular) CharCNNs to train sequence models might even get you beyond the state of the art - if you have the resources and the data to even think of that, plus an extensive amount of time to develop your specialized classifier/system.
(And the resources to run the inference on that mega-model in production, too, by the way...)
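To make "training a sequence model" concrete, here is a minimal sketch of such an architecture in Keras - an embedding layer feeding an LSTM, with a softmax over the document classes. This is not a tuned design, and all sizes are assumptions:

```python
# A minimal sequence-model sketch for text classification (tf.keras API).
# Swapping layers.LSTM for layers.GRU, or prepending Conv1D layers over
# character inputs (a CharCNN), follows the same pattern.
from tensorflow.keras import layers, models

vocab_size, embedding_dim, max_len, num_classes = 20000, 100, 400, 20  # assumptions

model = models.Sequential([
    layers.Input(shape=(max_len,)),                    # padded token-id sequences
    layers.Embedding(vocab_size, embedding_dim),       # could be initialized from word2vec
    layers.LSTM(128),                                  # the actual sequence model
    layers.Dense(num_classes, activation="softmax"),   # one output per document class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Even this toy version is far slower to train than the feed-forward model above, which is exactly the trade-off being discussed here.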
That is to say, yes, Deep Learning can claim to beat standard ML methods on this task (text classification), but the effort to do so is highly disproportionate compared to "traditional" ML methods: you need very large datasets, designing the model is incredibly complex and time-consuming, and training and using the setup is several orders of magnitude more costly.
In my opinion, the Deep Learning literature is littered with evaluation results that claim to beat all former state of the art, but quite frequently are not much better (or ex aequo, and often even worse). Computer vision, machine translation, and dependency parsing are the now-famous cases where Deep Learning has indeed "pushed the envelope" by a substantial margin on the same public (and often small) community datasets used to evaluate an approach and compare it to existing methods. And nearly no paper discusses how many more resources go into setting up, developing, training, and using Deep Learning models as opposed to traditional Machine Learning.
That being said, many other applications (apart from CV, MT, and DP) can profit from Deep Learning for the following reason:
If (and only if) you have much more training data (thousands, or even millions of examples per label), then, because Deep Learning easily scales to such gigantic datasets, it indeed beats other methods (Support Vector Machines, Random Forests, Nearest Neighbours, Gradient Boosting, etc.).
At the end of the day, I think Deep Learning is a very exciting technology you should learn to master, but you should take much of it with a very large grain of salt due to how much time and money you will need to invest.