Why You Should Always Use Feature Embeddings With Structured Datasets

A simple technique for boosting accuracy on ANY model you use
Machine Learning
Michael Malin
February 10, 2021

Feature embeddings are one of the most important steps when training neural networks on tabular data. Unfortunately, the technique is rarely taught outside of natural language processing (NLP) settings, so it is almost completely ignored for structured datasets. But skipping this step can lead to significant drops in model accuracy, which has fed the false belief that gradient-boosted methods like XGBoost are always superior for structured dataset problems. Not only do embedding-enhanced neural networks often beat gradient-boosted methods, but both kinds of model can see major improvements when the learned embeddings are extracted and reused as features. This article will answer the following questions:

  • What are feature embeddings?
  • How are they used with structured data?
  • If they are so powerful, why are they not more common?
  • How are embeddings implemented?
  • How do I use these embeddings to enhance other models?
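Before answering these in detail, here is a minimal sketch of the core idea in NumPy. All names (`embedding_table`, `embed`, the example categories) are illustrative, not from any particular library; in a real network the table would be a trainable layer, not random values.

```python
import numpy as np

rng = np.random.default_rng(0)

n_categories = 4    # e.g. the distinct levels of a categorical "city" column
embedding_dim = 3   # length of the dense vector learned for each level

# The embedding is just a lookup table with one row per category.
# In a neural network these weights are learned during training;
# here they are random only so the sketch runs standalone.
embedding_table = rng.normal(size=(n_categories, embedding_dim))

def embed(category_ids):
    """Look up the dense vector for each integer-encoded category."""
    return embedding_table[np.asarray(category_ids)]

batch = [0, 2, 2, 1]    # an integer-encoded categorical feature column
vectors = embed(batch)  # shape (4, 3): one dense vector per input row
```

After training, rows of the table that correspond to similar categories end up close together, and the table itself can be exported and fed to other models.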

With over 12 years of experience, I have deployed over 40 successful projects across all AI domains, including computer vision, NLP, GNNs, and forecasting. I specialize in TensorFlow, graph networks, and deep learning.