TabStruct: Measuring Structural Fidelity of Tabular Data
Overview
TabStruct is an end-to-end benchmark for tabular data generation, prediction, and evaluation. It ships with ready-to-use pipelines for:
generating high-quality synthetic tables
training predictive models
analysing results with a rich suite of metrics, especially those that quantify structural fidelity
All components are designed to be plug-and-play, so you can mix, match, and extend them to suit your own workflow.
Key Features
Data generation
Out-of-the-box support for popular tabular generators: SMOTE, TVAE, CTGAN, NFlow, TabDDPM, ARF, and more.
Evaluation dimensions
Density estimation – How well does the synthetic data approximate the real distribution?
Privacy preservation – Does the generator leak sensitive records?
ML efficacy – How do models trained on synthetic data perform compared to real data?
Structural fidelity – Does the generator respect the causal structures of real data?
Predictive tasks
Classification & regression pipelines built on scikit-learn, with optional neural-network backbones.
Installation
We recommend managing dependencies with conda + mamba.
```bash
# 1️⃣ Upgrade conda and activate the base env
conda update -n base -c conda-forge conda
conda activate base
# 2️⃣ Install the high-performance dependency resolver
conda install conda-libmamba-solver --yes
conda config --set solver libmamba
conda install -c conda-forge mamba --yes
# 3️⃣ Create a new conda env
conda create --name tabstruct python=3.10.18 --no-default-packages
conda activate tabstruct
# 4️⃣ Set up the env
bash scripts/utils/install.sh
```
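After the setup script finishes, a quick sanity check can confirm that you are inside the environment created in step 3 and that it runs the pinned Python version. The Bash function below is a hypothetical helper, not part of TabStruct: it only checks the environment name and Python version used in the commands above, not whatever `scripts/utils/install.sh` installs. The `PYTHON` variable is an assumed override for pointing the check at a specific interpreter.

```shell
#!/usr/bin/env bash
# Hypothetical post-install check; the expected values mirror step 3 above.
check_tabstruct_env() {
  local want_env="tabstruct" want_py="3.10.18" got_py
  # conda sets CONDA_DEFAULT_ENV to the name of the active environment.
  if [ "${CONDA_DEFAULT_ENV:-}" != "$want_env" ]; then
    echo "active env is '${CONDA_DEFAULT_ENV:-none}', expected '$want_env'" >&2
    return 1
  fi
  # PYTHON (assumed override) selects the interpreter; defaults to the one on PATH.
  got_py="$("${PYTHON:-python}" -c 'import platform; print(platform.python_version())')" || return 1
  if [ "$got_py" != "$want_py" ]; then
    echo "Python is $got_py, expected $want_py" >&2
    return 1
  fi
  echo "ok: env '$want_env' with Python $got_py"
}
```

Run `check_tabstruct_env` with the environment activated; on a mismatch it prints what it found to stderr and returns a non-zero status.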
Note
The codebase contains hard-coded absolute paths. Search for them and replace each one with the corresponding path on your machine.
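As a rough aid for the note above, the Bash sketch below lists candidate absolute paths and rewrites one prefix at a time. The helper names `find_abs_paths` and `replace_prefix`, the path prefixes it searches for (`/home/`, `/Users/`, `/mnt/`, `/scratch/`), and the file extensions are assumptions for illustration, not part of TabStruct; adjust them to whatever the search actually turns up.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, path prefixes and file types are assumptions.

# Print "file:line:text" for every line containing a typical absolute
# user/cluster path. grep exits with status 1 when nothing is found.
find_abs_paths() {
  local root="${1:-.}"
  grep -rnE \
    --include='*.py' --include='*.sh' \
    --include='*.yaml' --include='*.yml' --include='*.json' \
    '(/home/|/Users/|/mnt/|/scratch/)' \
    "$root"
}

# Replace every occurrence of OLD with NEW in FILE, keeping a FILE.bak copy.
# "-i.bak" (suffix attached to -i) is accepted by both GNU and BSD sed.
# OLD is read as a sed regex and must not contain "|";
# NEW must not contain "|", "&" or "\".
replace_prefix() {
  local old="$1" new="$2" file="$3"
  sed -i.bak "s|${old}|${new}|g" "$file"
}
```

For example, `find_abs_paths .` run from the repository root prints each hit with its file and line number; once you know which prefix a file uses, `replace_prefix <old-prefix> "$PWD" <file>` rewrites it in place and leaves a `.bak` copy for review.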
Citation
For attribution in academic contexts, please cite this work as:
```bibtex
@inproceedings{jiang2025well,
  title={How Well Does Your Tabular Generator Learn the Structure of Tabular Data?},
  author={Jiang, Xiangjian and Simidjievski, Nikola and Jamnik, Mateja},
  booktitle={ICLR 2025 Workshop on Deep Generative Model in Machine Learning: Theory, Principle and Efficacy},
  year={2025}
}
```
Contents
Guide
Reference