Command line reference#

Entry point: python -m src.tabstruct.experiment.run_experiment

Core arguments#

  • --pipeline: prediction | generation

  • --model: see the Models section

  • --task: classification | regression (prediction pipeline); the generation pipeline infers the task from the dataset

  • --dataset: dataset name (as supported by tabcamel)

  • --test_size, --valid_size: float in (0, 1] (fraction of the data) or an integer count of samples

  • --split_mode: stratified | random (regression supports random only)

  • --seed: int

  • --device: cpu | cuda
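
A minimal prediction run combining the core arguments might look like the sketch below; the model and dataset names are placeholders, so substitute any model from the Models section and any dataset supported by tabcamel:

    python -m src.tabstruct.experiment.run_experiment \
        --pipeline prediction \
        --model <model-name> \
        --task classification \
        --dataset <dataset-name> \
        --test_size 0.2 \
        --valid_size 0.1 \
        --split_mode stratified \
        --seed 42 \
        --device cuda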

Data curation#

  • --curate_mode: sharing

  • --curate_ratio: float, number of curated samples per real sample

  • --generator, --generator_tags: reference a past generation run logged in W&B

  • --synthetic_data_path: explicit path to synthetic data
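
As a sketch, a prediction run trained with curated data that references a previously logged generation run in W&B (the model, dataset, generator, and tag values are placeholders, and the curate ratio is illustrative):

    python -m src.tabstruct.experiment.run_experiment \
        --pipeline prediction \
        --model <model-name> \
        --task classification \
        --dataset <dataset-name> \
        --curate_mode sharing \
        --curate_ratio 1.0 \
        --generator <generator-name> \
        --generator_tags <wandb-tag>

If the synthetic data is stored locally instead, --synthetic_data_path can point the run at it in place of the --generator/--generator_tags pair.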

Lightning training#

  • --max_steps_tentative, --batch_size_tentative, --full_batch_training

  • --optimizer [adam|adamw|sgd], --gradient_clip_val

  • --lr_scheduler [none|plateau|cosine_warm_restart|linear|lambda]

  • --metric_model_selection, --patience_early_stopping

  • --log_every_n_steps_tentative, --check_val_every_n_epoch_tentative
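
The training-loop flags combine with the core arguments; a sketch with illustrative values (model, dataset, and metric names are placeholders):

    python -m src.tabstruct.experiment.run_experiment \
        --pipeline prediction \
        --model <model-name> \
        --task classification \
        --dataset <dataset-name> \
        --max_steps_tentative 10000 \
        --batch_size_tentative 256 \
        --optimizer adamw \
        --gradient_clip_val 1.0 \
        --lr_scheduler cosine_warm_restart \
        --metric_model_selection <metric-name> \
        --patience_early_stopping 20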

Evaluation toggles#

  • --eval_only, --disable_eval_density, --disable_eval_privacy, --enable_eval_structure
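
Assuming these toggles are plain boolean switches, an eval-only generation run that skips the privacy evaluation might look like the sketch below (the model, dataset, and path values are placeholders; see also the Notes section):

    python -m src.tabstruct.experiment.run_experiment \
        --pipeline generation \
        --model <model-name> \
        --dataset <dataset-name> \
        --eval_only \
        --disable_eval_privacy \
        --synthetic_data_path <path/to/synthetic-data>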

Tuning#

  • --enable_optuna, --optuna_trial, --disable_optuna_pruning, --tune_reduction, --tune_max_workers
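
A tuning sketch, assuming --enable_optuna is a boolean switch and --optuna_trial takes the number of trials (the counts shown are illustrative):

    python -m src.tabstruct.experiment.run_experiment \
        --pipeline prediction \
        --model <model-name> \
        --task classification \
        --dataset <dataset-name> \
        --enable_optuna \
        --optuna_trial 50 \
        --tune_max_workers 4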

W&B#

  • --tags, --wandb_log_model, --disable_wandb, --checkpoint_tags
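
A sketch of the logging flags (tag values are placeholders, and --wandb_log_model is assumed to be a boolean switch):

    python -m src.tabstruct.experiment.run_experiment \
        --pipeline prediction \
        --model <model-name> \
        --dataset <dataset-name> \
        --tags <experiment-tag> \
        --wandb_log_model

For runs without W&B logging, e.g. local debugging, pass --disable_wandb instead.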

Notes#

  • For eval-only runs of the generation pipeline, either provide --synthetic_data_path or ensure a matching generation run is retrievable via --generator_tags.

  • Regression tasks require --split_mode random.