Augmentation as Simulation

Version 1.0 – Public

Data augmentation has evolved from a technique for improving model performance into a foundational part of many modern machine learning pipelines. Its benefits are well recognized. This chapter examines the audit implications that arise when augmentation begins to substitute for real-world data collection.

Understanding Augmentation

Augmentation is the process of creating modified versions of existing data to improve model robustness or generalization. Common techniques include rotation and cropping for images, noise injection for numeric, image, or audio inputs, and synonym replacement for text. In some domains, generative methods are now used to simulate entirely new examples rather than perturb existing ones.
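To make the distinction between perturbing existing data and collecting new data concrete, here is a minimal sketch of two of the techniques named above: Gaussian noise injection on numeric features and synonym replacement on text. It uses only the Python standard library. The function names and the small `SYNONYMS` table are hypothetical and exist only for illustration. Real pipelines typically rely on domain-specific libraries and larger lexical resources.

```python
import random

# Hypothetical synonym table for illustration only.
SYNONYMS = {
    "quick": ["fast", "rapid"],
    "error": ["mistake", "fault"],
}

def inject_noise(values, sigma=0.1, rng=None):
    """Return a new list with Gaussian noise (mean 0, std sigma) added to each value."""
    if rng is None:
        rng = random.Random()
    return [v + rng.gauss(0.0, sigma) for v in values]

def replace_synonyms(text, prob=0.5, rng=None):
    """Replace each word that has a known synonym with probability `prob`."""
    if rng is None:
        rng = random.Random()
    out = []
    for word in text.split():
        choices = SYNONYMS.get(word.lower())
        if choices and rng.random() < prob:
            out.append(rng.choice(choices))
        else:
            out.append(word)
    return " ".join(out)

# A fixed seed makes the augmented output reproducible, which matters for audit.
rng = random.Random(0)
original = [1.0, 2.0, 3.0]
noisy = inject_noise(original, sigma=0.1, rng=rng)
sentence = replace_synonyms("a quick error report", prob=1.0, rng=rng)
print(noisy)
print(sentence)
```

Each augmented example here is still derived from a real one. Generative methods, by contrast, produce examples with no single real counterpart, which is where augmentation starts to function as simulation.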

Why It Matters in Audit

Common Issues Observed

Audit Considerations

Policy Recommendations

When augmentation contributes a significant share of training volume, it should be disclosed as a synthetic data process. Organizations are encouraged to version their augmentation pipelines and to document their effects for audit review. One such effect is the proportion of training examples that were produced by augmentation.
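One way to support both recommendations is to record two values for each training run. The first is the fraction of examples produced by augmentation. The second is a fingerprint of the pipeline configuration that produced them. The sketch below makes three assumptions. Each record carries a hypothetical `source` provenance tag, either `"real"` or `"augmented"`. The pipeline configuration is a JSON-serializable dict. The `disclosure_threshold` of 0.5 is an illustrative placeholder, not a figure prescribed by this framework.

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass
class Record:
    features: list
    source: str  # hypothetical provenance tag: "real" or "augmented"

def augmented_share(records):
    """Fraction of training records produced by augmentation rather than collected."""
    if not records:
        return 0.0
    return sum(r.source == "augmented" for r in records) / len(records)

def pipeline_fingerprint(config):
    """SHA-256 of a canonical JSON form of the config, usable as a version identifier."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def audit_summary(records, config, disclosure_threshold=0.5):
    """Summarize augmentation's contribution for an audit log.

    disclosure_threshold is an illustrative placeholder, not a prescribed value.
    """
    share = augmented_share(records)
    return {
        "pipeline_version": pipeline_fingerprint(config),
        "augmented_share": share,
        "disclose_as_synthetic": share >= disclosure_threshold,
    }

config = {"noise_sigma": 0.1, "synonym_prob": 0.3, "generator": None}
records = [
    Record([1.0], "real"),
    Record([1.1], "augmented"),
    Record([0.9], "augmented"),
    Record([2.0], "real"),
]
summary = audit_summary(records, config)
print(json.dumps(summary, indent=2))
```

The fingerprint is computed over keys sorted into a fixed order, so reordering the config does not change it. Changing any parameter value does change it. This property lets an auditor tie a training run to an exact pipeline version.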

Maintainer: aiauditframework.com