Social Foundations of Computation

Predictors from Causal Features Do Not Generalize Better to New Domains

2024

Conference Paper


We study how well machine learning models trained on causal features generalize across domains. We consider 16 prediction tasks on tabular datasets covering applications in health, employment, education, social benefits, and politics. Each dataset comes with multiple domains, allowing us to test how well a model trained in one domain performs in another. For each prediction task, we select features that have a causal influence on the target of prediction. Our goal is to test the hypothesis that models trained on causal features generalize better across domains. Without exception, we find that predictors using all available features, regardless of causality, have better in-domain and out-of-domain accuracy than predictors using causal features. Moreover, even the absolute drop in accuracy from one domain to the other is no better for causal predictors than for models that use all features. If the goal is to generalize to new domains, practitioners might as well train the best possible model on all available features.
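The evaluation protocol described above can be sketched in a few lines: fit one predictor on a hand-picked causal feature subset and one on all features in a source domain, then score both in a held-out target domain. This is a minimal illustrative sketch assuming scikit-learn, not the authors' code; the synthetic data, the feature split, and the resulting numbers are made up and carry no implication about the paper's empirical findings.

```python
# Hypothetical sketch of the train-in-one-domain, test-in-another protocol.
# Assumptions (not from the paper): synthetic data, scikit-learn models,
# and a hard-coded "causal" feature subset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_domain(n, shift):
    # Toy tabular data: the first two columns play the role of causal
    # features; the remaining three are merely correlated with the label,
    # and the strength of that correlation varies across domains.
    causal = rng.normal(size=(n, 2))
    y = (causal.sum(axis=1) + rng.normal(scale=0.5, size=n) > 0).astype(int)
    correlated = y[:, None] * shift + rng.normal(size=(n, 3))
    return np.hstack([causal, correlated]), y

X_src, y_src = make_domain(2000, shift=2.0)  # source domain
X_tgt, y_tgt = make_domain(2000, shift=1.0)  # target domain, shifted correlations

causal_cols = [0, 1]                          # assumed causal feature subset
all_cols = list(range(X_src.shape[1]))        # all available features

results = {}
for name, cols in [("causal", causal_cols), ("all", all_cols)]:
    model = LogisticRegression().fit(X_src[:, cols], y_src)
    in_acc = accuracy_score(y_src, model.predict(X_src[:, cols]))
    out_acc = accuracy_score(y_tgt, model.predict(X_tgt[:, cols]))
    results[name] = (in_acc, out_acc)
    print(f"{name:>6}: in-domain {in_acc:.3f}, out-of-domain {out_acc:.3f}")
```

The paper's comparison additionally tracks the absolute in-domain to out-of-domain accuracy drop per feature set, which here would just be `in_acc - out_acc` for each entry of `results`.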

Author(s): Vivian Y. Nastl and Moritz Hardt
Book Title: arXiv preprint arXiv:2402.09891
Year: 2024

Department(s): Social Foundations of Computation
Bibtex Type: Conference Paper (conference)

State: Submitted

Links: arXiv

BibTeX

@conference{nastl2024predictorscausalfeaturesgeneralize,
  title = {Predictors from Causal Features Do Not Generalize Better to New Domains},
  author = {Nastl, Vivian Y. and Hardt, Moritz},
  booktitle = {arXiv preprint arXiv:2402.09891},
  year = {2024}
}