TabStruct: Measuring Structural Fidelity of Tabular Data

TabStruct: Measuring Structural Fidelity of Tabular Data#

arxiv license python

Overview#

TabStruct is an end-to-end benchmark for tabular data generation, prediction, and evaluation. It ships with ready-to-use pipelines for:

  • generating high-quality synthetic tables

  • training predictive models

  • analysing results with a rich suite of metrics - especially those that quantify structural fidelity

All components are designed to plug-and-play, so you can mix, match, and extend them to suit your own workflow.

Key Features#

Data generation

  • Out-of-the-box support for popular tabular generators: SMOTE, TVAE, CTGAN, NFlow, TabDDPM, ARF, and more.

Evaluation dimensions

  • Density estimation – How well does the synthetic data approximate the real distribution?

  • Privacy preservation – Does the generator leak sensitive records?

  • ML efficacy – How do models trained on synthetic data perform compared to real data?

  • Structural fidelity – Does the generator respect the causal structures of real data?

Predictive tasks

  • Classification & regression pipelines built on scikit-learn, with optional neural-network backbones.

Installation#

We recommend managing dependencies with conda + mamba.

# 1️⃣ Upgrade conda and activate the base env
conda update -n base -c conda-forge conda
conda activate base

# 2️⃣ Install the high-performance dependency resolver
conda install conda-libmamba-solver --yes
conda config --set solver libmamba
conda install -c conda-forge mamba --yes

# 3️⃣ Create a new conda env
conda create --name tabstruct python=3.10.18 --no-default-packages
conda activate tabstruct

# 4️⃣ Set up the env
bash scripts/utils/install.sh

Note

Search the codebase for absolute paths and replace them with paths on your machine.

Citation#

For attribution in academic contexts, please cite this work as:

@inproceedings{jiang2025well,
  title={How Well Does Your Tabular Generator Learn the Structure of Tabular Data?},
  author={Jiang, Xiangjian and Simidjievski, Nikola and Jamnik, Mateja},
  booktitle={ICLR 2025 Workshop on Deep Generative Model in Machine Learning: Theory, Principle and Efficacy}
}

Contents#