AI Data Preprocessing Audit Workpaper: NumPy Data Handling Risks

Audit Focus Area

Evaluating potential data integrity risks in NumPy-based preprocessing pipelines that may lead to:

Evidence Collection Procedures

1. Low Semantic Awareness

Where to Find Evidence:
Example Tests:
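import numpy as np

# A float dtype on an ID column means the IDs were coerced (e.g. by a missing value)
if np.issubdtype(df['patient_id'].dtype, np.floating):
    raise ValueError("Numeric IDs coerced to float - may lose precision")

# Valid Likert responses are the integers 1-5; any other value (e.g. 2.5) means
# the ordinal column was averaged or rescaled as if it were interval data
if 'Likert_scale' in df.columns and not df['Likert_scale'].isin([1, 2, 3, 4, 5]).all():
    print("Warning: Ordinal data treated as interval")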
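
The ID check above treats float coercion as a failure, not a cosmetic issue, because float64 has a 53-bit significand and cannot represent every integer above 2**53 exactly. A minimal sketch (NumPy only; the ID values are made up for illustration) showing two distinct IDs collapsing into one:

```python
import numpy as np

# Two distinct patient IDs just above 2**53 (illustrative values only)
ids = np.array([2**53, 2**53 + 1], dtype=np.int64)

# float64 has a 53-bit significand, so 2**53 + 1 rounds to 2**53
as_float = ids.astype(np.float64)

print(ids[0] == ids[1])            # False: the original IDs differ
print(as_float[0] == as_float[1])  # True: the float copies are identical

# Casting back to int64 does not recover the lost ID
recovered = as_float.astype(np.int64)
print(recovered.tolist())          # [9007199254740992, 9007199254740992]
```

In pandas, a common trigger is a single missing value in an integer column, which upcasts the whole column to float64 unless a nullable integer dtype such as "Int64" is used.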

2. No Audit Trail

Where to Find Evidence:
Example Tests:
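import hashlib
assert 'input_mean' in transformation_metadata, "No record of pre-normalization stats"
# tobytes() also works for non-contiguous arrays; SHA-256 matches logged_operation's hashes
assert hashlib.sha256(input_array.tobytes()).hexdigest() == saved_checksum, "Input altered before processing"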
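
These checks presuppose that the pipeline records pre-normalization statistics and an input checksum in the first place. A minimal sketch of a normalization step that does so, assuming z-score normalization; `normalize_with_metadata` and its metadata keys are hypothetical names chosen to line up with the `transformation_metadata` and `saved_checksum` checks, not an existing API:

```python
import hashlib
import numpy as np

def normalize_with_metadata(input_array):
    """Z-score normalize and return (output, metadata) for the audit trail.

    Hypothetical helper: stats and checksum are taken from the untouched input.
    """
    arr = np.asarray(input_array)
    # Capture a checksum and summary stats of the input *before* transforming it
    metadata = {
        "input_sha256": hashlib.sha256(arr.tobytes()).hexdigest(),
        "input_dtype": str(arr.dtype),
        "input_shape": arr.shape,
        "input_mean": float(arr.mean()),
        "input_std": float(arr.std()),
    }
    if metadata["input_std"] == 0:
        raise ValueError("Zero variance: z-score normalization is undefined")
    output = (arr - metadata["input_mean"]) / metadata["input_std"]
    return output, metadata

data = np.array([1.0, 2.0, 3.0, 4.0])
normalized, transformation_metadata = normalize_with_metadata(data)
saved_checksum = transformation_metadata["input_sha256"]
print(transformation_metadata["input_mean"])  # 2.5
```

Storing the metadata next to the output (rather than only printing it) is what lets a later audit re-run the checksum comparison.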

3. Silent Value Coercion/Clipping

Where to Find Evidence:
Example Tests:
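import numpy as np

# A maximum exactly equal to the clipping bound hints that values were clipped upstream
if np.max(output_array) == upper_bound:
    print("Warning: Values may have been silently clipped")

# int32 max is 2**31 - 1; magnitudes above 2**30 leave less than 2x headroom, so
# doubling or summing may wrap around (cast to int64 first so np.abs cannot overflow)
if input_array.dtype == np.int32 and np.max(np.abs(input_array.astype(np.int64))) > 2**30:
    print("Warning: Potential int32 overflow risk")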
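
The int32 check can only warn, because NumPy does not raise when integer array arithmetic or an integer-to-integer cast overflows; the result simply wraps around. A minimal sketch of both cases (values chosen for illustration):

```python
import numpy as np

# Illustrative int32 values just beyond the 2**30 threshold used above
arr = np.array([2**30 + 5, -(2**30 + 5)], dtype=np.int32)

# Integer array arithmetic wraps on overflow without raising or warning
doubled = arr * 2
print(doubled.tolist())  # [-2147483638, 2147483638]

# Integer-to-integer casts wrap too: 300 does not fit in uint8 (max 255)
small = np.array([300, 255], dtype=np.int64).astype(np.uint8)
print(small.tolist())  # [44, 255]
```

Both results look like ordinary numbers downstream, which is why this risk needs an explicit test rather than reliance on runtime errors.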

Workpaper Template

Risk | Test Performed | Evidence Location | Result
Semantic unawareness | Ordinal data averaged | normalize.py line 22 | FAIL
No audit trail | Missing input checksums | preprocess_data.ipynb | FAIL
Silent value clipping | 5% of values at upper bound | scale_features() output | WARNING
Type coercion | ID field converted to float64 | Git commit #a1b3d4 | FAIL
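
The "Silent value clipping" row records the share of values sitting at a bound rather than a single maximum check. One way such a figure could be computed is sketched below; `share_at_bounds`, the synthetic data and the 1% warning threshold are illustrative assumptions, not values taken from the engagement:

```python
import numpy as np

def share_at_bounds(arr, lower, upper):
    """Fraction of values sitting exactly at either clipping bound.

    Hypothetical helper: raw data rarely piles up exactly at a bound,
    so a high share suggests values were clipped upstream.
    """
    arr = np.asarray(arr)
    return float(np.mean((arr == lower) | (arr == upper)))

# Illustrative scaled output: 95 interior values plus 5 at the upper bound
output_array = np.concatenate([np.linspace(0.1, 0.9, 95), np.ones(5)])
share = share_at_bounds(output_array, lower=0.0, upper=1.0)

# The 1% threshold is an assumption; set it per feature during fieldwork
result = "WARNING" if share > 0.01 else "PASS"
print(f"{share:.0%} of values at bounds -> {result}")  # 5% of values at bounds -> WARNING
```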

Key Findings

Recommendations

Semantic Guards

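import numpy as np

def safe_convert_to_int(arr):
    # Check finiteness first: np.modf(inf) has a zero fractional part, so inf would pass
    if not np.all(np.isfinite(arr)) or not np.all(np.modf(arr)[0] == 0):
        raise ValueError("Float-to-int conversion would lose precision")
    # Explicit int64 rather than the platform-dependent default int
    return np.asarray(arr).astype(np.int64)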
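
safe_convert_to_int guards the float-to-int direction. The "Type coercion" finding in the template (an ID field converted to float64) is the opposite direction, which a similar guard could cover by requiring an exact round trip. A minimal sketch; `safe_convert_to_float` is a hypothetical name:

```python
import numpy as np

def safe_convert_to_float(arr):
    """Convert integers to float64 only if every value survives the round trip.

    Hypothetical counterpart to safe_convert_to_int for ID-like columns:
    float64 represents integers exactly only up to 2**53 in magnitude.
    """
    arr = np.asarray(arr)
    as_float = arr.astype(np.float64)
    # Any value that changes when cast back has lost precision
    if not np.array_equal(as_float.astype(arr.dtype), arr):
        raise ValueError("Int-to-float conversion would lose precision")
    return as_float

ok = safe_convert_to_float(np.array([101, 102], dtype=np.int64))
try:
    safe_convert_to_float(np.array([2**53 + 1], dtype=np.int64))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(ok.tolist())     # [101.0, 102.0]
print(error_message)   # Int-to-float conversion would lose precision
```

For genuine identifiers, keeping them as integers or strings avoids the question entirely; the guard is for pipelines that must produce a float matrix.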

Audit Trail

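import hashlib

def logged_operation(arr, op_name):
    # The hash covers raw bytes only, so dtype and shape are logged alongside it
    print(f"{op_name} - Input hash: {hashlib.sha256(arr.tobytes()).hexdigest()} (dtype={arr.dtype}, shape={arr.shape})")
    return arr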
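
logged_operation hashes the input only, and only where it is called explicitly. A decorator can attach a record to every step automatically and capture the output hash too, so each transformation links an input checksum to an output checksum. A minimal sketch; `audited`, `AUDIT_LOG` and `scale_to_unit` are hypothetical names, and the in-memory list stands in for whatever persistent store the pipeline uses:

```python
import functools
import hashlib
import numpy as np

# Hypothetical in-memory audit log; a real pipeline would persist these records
AUDIT_LOG = []

def _fingerprint(arr):
    """SHA-256 of the raw bytes plus dtype and shape, which the bytes do not encode."""
    arr = np.asarray(arr)
    return {
        "sha256": hashlib.sha256(arr.tobytes()).hexdigest(),
        "dtype": str(arr.dtype),
        "shape": arr.shape,
    }

def audited(func):
    """Record input and output fingerprints every time func runs."""
    @functools.wraps(func)
    def wrapper(arr, *args, **kwargs):
        before = _fingerprint(arr)
        result = func(arr, *args, **kwargs)
        AUDIT_LOG.append({"op": func.__name__, "input": before, "output": _fingerprint(result)})
        return result
    return wrapper

@audited
def scale_to_unit(arr):
    # Example transformation: min-max scaling to [0, 1]
    arr = np.asarray(arr, dtype=np.float64)
    return (arr - arr.min()) / (arr.max() - arr.min())

scaled = scale_to_unit(np.array([2.0, 4.0, 6.0]))
print(scaled.tolist())  # [0.0, 0.5, 1.0]
print(AUDIT_LOG[0]["op"], AUDIT_LOG[0]["input"]["sha256"][:12])
```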

Bounds Checking

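import warnings
import numpy as np

def safe_clip(arr, min_val, max_val):
    # Count out-of-range values before clipping so the change is never silent
    n_clipped = int(np.sum((arr < min_val) | (arr > max_val)))
    if n_clipped > 0:
        warnings.warn(f"Clipped {n_clipped} values to [{min_val}, {max_val}]")
    return np.clip(arr, min_val, max_val)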
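
safe_clip reports value clipping, but narrowing an integer dtype is a second bounds problem: astype() wraps out-of-range integers instead of clipping them. A similar guard could compare values against the target dtype's range from np.iinfo before casting. A minimal sketch; `safe_downcast` is a hypothetical name:

```python
import numpy as np

def safe_downcast(arr, dtype):
    """Cast to a narrower integer dtype, refusing values outside its range.

    Hypothetical helper: astype() itself wraps out-of-range integers silently.
    """
    arr = np.asarray(arr)
    if not np.issubdtype(arr.dtype, np.integer):
        raise TypeError("safe_downcast expects an integer array")
    info = np.iinfo(dtype)
    # Compare the observed range with the target dtype's representable range
    if arr.size and (arr.min() < info.min or arr.max() > info.max):
        raise OverflowError(
            f"Values in [{arr.min()}, {arr.max()}] do not fit "
            f"{np.dtype(dtype).name} range [{info.min}, {info.max}]"
        )
    return arr.astype(dtype)

print(safe_downcast(np.array([0, 200, 255]), np.uint8).tolist())  # [0, 200, 255]
try:
    safe_downcast(np.array([300]), np.uint8)
    downcast_error = None
except OverflowError as exc:
    downcast_error = str(exc)
print(downcast_error)
```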

Auditor Notes

Attach:

Sign-off:

Auditor | Date | Reviewer
[Your Name] | [Date] | [AI Governance Lead]

Framework References