Evaluating potential data integrity risks in NumPy-based preprocessing pipelines. The following checks flag the failure modes under audit:
```python
import hashlib
import numpy as np

# df, input_array, output_array, transformation_metadata, saved_checksum
# and upper_bound are assumed to come from the pipeline under audit.
if np.issubdtype(df['patient_id'].dtype, np.floating):
    raise ValueError("Numeric IDs coerced to float - may lose precision")

# A fractional mean indicates ordinal codes were averaged as interval data
if 'Likert_scale' in df.columns and np.mean(df['Likert_scale']) not in [1, 2, 3, 4, 5]:
    print("Warning: Ordinal data treated as interval")

assert 'input_mean' in transformation_metadata, "No record of pre-normalization stats"

assert hashlib.md5(input_array.tobytes()).hexdigest() == saved_checksum, \
    "Input altered before processing"

# Values piled up exactly at the bound suggest silent clipping
if np.max(output_array) == upper_bound:
    print("Warning: Values may have been silently clipped")

# int32 max is 2**31 - 1; values above 2**30 leave little headroom
if input_array.dtype == np.int32 and np.max(input_array) > 2**30:
    print("Warning: Potential int32 overflow risk")
```
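The int32 headroom check exists because NumPy's fixed-width integer array arithmetic wraps around silently rather than raising an exception. A minimal self-contained illustration:

```python
import numpy as np

# int32 array arithmetic overflows by wrapping around, with no exception
arr = np.array([2**31 - 1], dtype=np.int32)  # INT32_MAX
wrapped = arr + np.int32(1)
# wrapped[0] is now -2**31: the maximum value wrapped to the minimum
```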
| Risk | Test Performed | Evidence Location | Result |
|---|---|---|---|
| Semantic unawareness | Ordinal data averaged | normalize.py line 22 | FAIL | 
| No audit trail | Missing input checksums | preprocess_data.ipynb | FAIL | 
| Silent value clipping | 5% of values at upper bound | scale_features() output | WARNING | 
| Type coercion | ID field converted to float64 | Git commit #a1b3d4 | FAIL | 
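The "Type coercion" finding matters because float64 has a 53-bit significand: integers above 2**53 are no longer all exactly representable, so distinct IDs can collide after coercion. A hypothetical illustration (the specific values are chosen for the demo, not taken from the audited data):

```python
# Two distinct integer IDs that become indistinguishable as float64
id_a = 2**53
id_b = 2**53 + 1
collide = float(id_a) == float(id_b)  # True: precision loss merges the IDs
```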
Recommended safeguards:

```python
import hashlib
import warnings
import numpy as np

def safe_convert_to_int(arr):
    # Refuse float-to-int casts that would discard fractional parts
    if not np.all(np.modf(arr)[0] == 0):
        raise ValueError("Float-to-int conversion would lose precision")
    return arr.astype(int)

def logged_operation(arr, op_name):
    # Record an audit-trail checksum before the operation runs
    print(f"{op_name} - Input hash: {hashlib.sha256(arr.tobytes()).hexdigest()}")
    return arr

def safe_clip(arr, min_val, max_val):
    # Clip, but warn instead of doing it silently
    n_clipped = np.sum((arr < min_val) | (arr > max_val))
    if n_clipped > 0:
        warnings.warn(f"Clipped {n_clipped} values")
    return np.clip(arr, min_val, max_val)
```
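A minimal usage sketch for `safe_clip` (its definition is repeated here, with illustrative sample data, so the snippet runs standalone):

```python
import warnings
import numpy as np

def safe_clip(arr, min_val, max_val):
    # definition repeated from above so this demo is self-contained
    n_clipped = np.sum((arr < min_val) | (arr > max_val))
    if n_clipped > 0:
        warnings.warn(f"Clipped {n_clipped} values")
    return np.clip(arr, min_val, max_val)

data = np.array([0.5, 1.2, 3.7, 9.9])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clipped = safe_clip(data, 0.0, 5.0)
# exactly one warning is emitted, for the single out-of-range value (9.9)
```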
Attach:
Sign-off:
| Auditor | Date | Reviewer | 
|---|---|---|
| [Your Name] | [Date] | [AI Governance Lead] |