Why You Should Always Use Feature Embeddings With Structured Datasets
A simple technique for boosting the accuracy of ANY model you use
Machine Learning
February 10, 2021
Feature embeddings are one of the most important techniques for training neural networks on tabular data. Unfortunately, they are seldom taught outside of natural language processing (NLP) settings and are consequently almost completely ignored for structured datasets. Skipping this step can lead to significant drops in model accuracy, which has fostered the misconception that gradient-boosted methods like XGBoost are always superior for structured data problems. Not only will embedding-enhanced neural networks often beat gradient-boosted methods, but both kinds of models can see major improvements when the learned embeddings are extracted and reused as input features (a minimal code sketch follows the list below). This article will answer the following questions:
- What are feature embeddings?
- How are they used with structured data?
- If they are so powerful, why are they not more common?
- How are embeddings implemented?
- How do I use these embeddings to enhance other models?
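To make the idea concrete before the full walkthrough, here is a minimal sketch of an embedding layer for a single categorical column in a tabular model. It assumes PyTorch; the class name `TabularNet`, the layer sizes, and the dimensions are illustrative choices, not the article's actual implementation.

```python
# Minimal sketch (PyTorch assumed): embed one categorical column,
# concatenate it with the numeric columns, then apply dense layers.
import torch
import torch.nn as nn

class TabularNet(nn.Module):  # hypothetical name for illustration
    def __init__(self, n_categories: int, embed_dim: int, n_numeric: int):
        super().__init__()
        # Each integer category code maps to a learned dense vector.
        self.embed = nn.Embedding(n_categories, embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim + n_numeric, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, cat_idx: torch.Tensor, numeric: torch.Tensor) -> torch.Tensor:
        # cat_idx: (batch,) integer codes; numeric: (batch, n_numeric) floats.
        x = torch.cat([self.embed(cat_idx), numeric], dim=1)
        return self.mlp(x)

# After training, the learned vectors can be pulled out of the layer:
# each row of the weight matrix is the dense representation of one
# category level, and can replace one-hot columns in another model
# such as XGBoost.
# embeddings = model.embed.weight.detach().cpu().numpy()
```

The design point this sketch illustrates is that the embedding is trained jointly with the rest of the network, so the extracted vectors encode task-relevant similarity between category levels rather than arbitrary integer codes.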
Read more on Towards Data Science