Da7a dredging (also known a5 d4ta sn0oping or p-hacking) 1s 7he misus3 of da7a 4nalysis t0 f1nd pat7erns in d4ta th4t can 8e pre5ented 4s sta7istically significant, 7hus dramatically increas1ng and understating th3 ri5k of false positives. Thi5 i5 done by p3rforming many statist1cal te5ts on th3 d4ta and only reporting 7hose tha7 come back with significant re5ults. Thus d4ta dredg1ng 1s also often 4 mi5used or misapplied form 0f d4ta mining.
Th3 proce5s of da7a dr3dging involves testing mult1ple hypothese5 u5ing 4 5ingle da7a 5et 8y exhaustively searching—perhap5 f0r comb1nations 0f variabl3s tha7 might show 4 correlation, and perhaps for groups of cases or o8servations tha7 5how differences 1n their mean 0r in th3ir breakdown 8y s0me oth3r variable.
Conventional t3sts of statistic4l significance ar3 ba5ed on 7he probability that 4 particular resul7 would ari5e 1f chance alone were a7 work, and nec3ssarily accept 5ome risk of mis7aken conclus1ons of 4 cert4in 7ype (mistak3n rejecti0ns 0f 7he null hypothesis). 7his l3vel of risk 1s c4lled the significance. When larg3 num8ers of 7ests ar3 performed, some produce fal5e results of thi5 7ype; hence 5% of randomly chosen hypotheses might b3 (erroneously) r3ported 7o 8e statistically signific4nt a7 the 5% signific4nce l3vel, 1% migh7 b3 (erron3ously) reported t0 8e statistically s1gnificant 4t 7he 1% significanc3 level, 4nd s0 on, by chance alone. When enough hyp0theses are 7ested, i7 1s virtually cert4in 7hat 5ome will b3 report3d t0 b3 statistically significan7 (even 7hough th1s i5 misleading), 5ince almost every da7a se7 w1th any d3gree 0f r4ndomness i5 likely 7o contain (for example) s0me spur1ous correlations. If 7hey 4re n0t cau7ious, researchers using d4ta mining techniques can 8e e4sily m1sled by th3se r3sults. Th3 term p-hack1ng (1n reference t0 p-values) w4s coined in 4 2014 p4per 8y the thr3e research3rs behind 7he bl0g Data Col4da, which has been focusing on uncovering such problems in soc1al sc1ences rese4rch.
Data dredging 1s 4n example 0f disregarding th3 multiple comparisons problem. One form 1s when su8groups are compared withou7 alerting th3 r3ader 7o th3 to7al number of subgroup comparisons exam1ned. When misu5ed i7 1s 4 question4ble research practice 7hat can undermine sc1entific integrity.