Overview

TabStruct is a modular framework for tabular machine learning with two pipelines:

Prediction: supervised learning on tabular data
Generation: synthesise tabular data and evaluate it

Key components

Experiment runner (experiment/run_experiment.py): CLI entrypoint
Pipelines (experiment/pipeline): orchestrate model training and evaluation
Data layer (DataHelper, DataModule): split, curate, preprocess, and load data
Model layer (prediction/models, generation/models): sklearn baselines, Lightning models and other tabular models
Tuning (experiment/tune): Optuna sweeps

Metrics

Classification: balanced_accuracy, F1_weighted, precision, recall, AUROC_weighted, ECE, cross_entropy_loss
Regression: rmse, mse, r2
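
For readers unfamiliar with some of these metrics, the following is a minimal, dependency-free sketch of how a few of them are commonly defined. It is not TabStruct's implementation: the function names are illustrative, and the ECE variant shown assumes equal-width confidence bins with a default of 10 bins, which may differ from what the framework uses.

```python
import math
from collections import defaultdict


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def ece(confidences, correct, n_bins=10):
    """Expected calibration error.

    Assumption: equal-width bins over [0, 1]; each sample is binned by the
    confidence of its predicted class, and `correct` flags whether that
    prediction was right.
    """
    bins = defaultdict(list)
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    total = 0.0
    for members in bins.values():
        avg_conf = sum(c for c, _ in members) / len(members)
        acc = sum(ok for _, ok in members) / len(members)
        # Weight each bin's |confidence - accuracy| gap by its share of samples.
        total += len(members) / n * abs(avg_conf - acc)
    return total
```

Balanced accuracy averages recall over classes, so a model that always predicts the majority class scores 1 / (number of classes) regardless of class imbalance. ECE is lower for better-calibrated models, and r2 can be negative when predictions are worse than always predicting the mean.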

W&B

Project and entity are defined in src/tabstruct/common/__init__.py
All runs log summaries and artifacts both locally under logs/ and to W&B