Visualizing Machine Learning Tasks With Word Embeddings

Organizing over 400 concepts in space by semantic similarity (using a GloVe model pre-trained on 2 billion Tweets).

Machine learning is delivering value to a rapidly increasing breadth of industries, often in the form of clever, novel use cases that leverage its remarkably horizontal application potential. Contrary to intuition, what lies behind the innovation is quite often a set of well-understood tasks; the creative leap lies in identifying them and integrating them impactfully into the context of the unique problem at hand. We will look into the landscape of those tasks as they are categorised under machine learning research datasets by Papers With Code (an amazing resource compiling research papers along with the supplied code and data).

We've web-scraped close to 500 machine learning tasks and embedded them into a 25-dimensional space (based on a semantic understanding of words) using a GloVe (Global Vectors for Word Representation) model pre-trained on 2 billion Tweets. After reducing the dimensionality from 25D to 3D and 2D using PCA (Principal Component Analysis), we've been able to visualize the tasks in a way that captures meaning within their spatial interrelations. You can, for instance, explore the neighbours of a chosen task to discover other semantically similar tasks.
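The pipeline described above (word vectors → one vector per task name → PCA down to 2D) can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: a tiny hand-made 4-dimensional vocabulary stands in for the real 25-dimensional GloVe Twitter vectors (which one would typically load via gensim's `glove-twitter-25` download), and multi-word task names are embedded by averaging their word vectors.

```python
import numpy as np

# Toy stand-in for the 25-dim GloVe Twitter vectors (assumption: the real
# model would be loaded, e.g., with gensim's "glove-twitter-25" download).
WORD_VECS = {
    "hate":      np.array([0.90, 0.10, 0.00, 0.20]),
    "speech":    np.array([0.80, 0.20, 0.10, 0.10]),
    "abusive":   np.array([0.85, 0.15, 0.05, 0.20]),
    "language":  np.array([0.70, 0.30, 0.10, 0.00]),
    "image":     np.array([0.10, 0.90, 0.80, 0.10]),
    "detection": np.array([0.30, 0.60, 0.50, 0.30]),
}

def embed_task(name: str) -> np.ndarray:
    """Embed a multi-word task name by averaging its word vectors."""
    words = [w for w in name.lower().split() if w in WORD_VECS]
    return np.mean([WORD_VECS[w] for w in words], axis=0)

def pca_project(X: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project the rows of X onto their top principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

tasks = ["hate speech detection", "abusive language detection", "image detection"]
X = np.vstack([embed_task(t) for t in tasks])
coords_2d = pca_project(X, n_components=2)  # one (x, y) point per task
print(coords_2d.shape)  # (3, 2)
```

With real GloVe vectors, the resulting 2D (or 3D) coordinates are what gets plotted, so tasks with similar wording land near each other.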

👇 Check out visualizations below and this Jupyter notebook 📔 with the process walk-through 👇

At second glance, the scary-looking plot uncovers interesting, meaningful relationships learned by processing 2 billion Tweets (e.g. the tasks of detecting hate speech, abusive language and emotions are embedded close to each other).
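Exploring the neighbours of a chosen task boils down to ranking all other task vectors by cosine similarity to it. A minimal sketch, again with made-up 4-dimensional toy vectors in place of the real averaged GloVe embeddings:

```python
import numpy as np

# Toy embeddings for a few task names (assumption: the real vectors come
# from averaging 25-dim GloVe Twitter word vectors per task name).
TASK_VECS = {
    "hate speech detection":      np.array([0.90, 0.10, 0.00, 0.20]),
    "abusive language detection": np.array([0.85, 0.20, 0.05, 0.20]),
    "emotion recognition":        np.array([0.70, 0.30, 0.10, 0.30]),
    "image classification":       np.array([0.10, 0.90, 0.80, 0.10]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_neighbors(query: str, k: int = 2) -> list[tuple[str, float]]:
    """Rank the other tasks by cosine similarity to the query task."""
    q = TASK_VECS[query]
    scored = [(name, cosine(q, v)) for name, v in TASK_VECS.items() if name != query]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

neighbors = nearest_neighbors("hate speech detection")
print(neighbors)  # abuse-related tasks rank above "image classification"
```

This is the same kind of lookup the interactive plots support visually: hovering near a point and seeing which tasks cluster around it.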

Interactive Visualizations

Interactive Visualization - Machine learning tasks embedded in 2D space.

Interactive Visualization - Machine learning tasks embedded in 3D space (zoom, rotate and pan around).

Code Deep Dive

Follow this link to the Jupyter notebook to get hands-on with the visualizations and to explore the code! 🙏