Summary: | Approved for public release; distribution is unlimited === With digital storage becoming cheaper, bigger, and more prevalent, finding evidence from the hard drives collected for a case is too difficult and time consuming. Simply reading an entire drive takes hours and it takes even longer to analyze the drive for deleted files and data fragments. Investigations frequently involve multiple drives, and this traditional method of reading entire drives for analysis simply cannot keep up in modern cases. Furthermore, investigators often search drives only for known files, which we call target data, that could help identify a drive holding evidence such as child pornography or malware. Triage is needed to sift through drives to quickly identify drives containing target data. One way is by randomly sampling drive data to find known files or to give a confidence that less than some small amount is present. We determine the optimal sampling strategy bypassing the file system to find even deleted files and fragments in minimum time with maximum confidence. With 15 minutes of sampling we can give a 90% confidence that less than 10MiB of target data is present on a 500GB hard disk drive. By using statistical sampling in combination with sector hashing, our software forms an efficient triage tool for digital forensics.
|