| Abstract | Reconstructing past events in IT systems is a critical bottleneck in forensic investigations and consumes valuable investigator time. It requires meticulous analysis of complex digital traces in an environment where attackers may try to erase them. For example, deleting digital artifacts is an anti-forensic technique used to jeopardize the success of forensic investigations. To address these challenges, we introduce Investigator Copilot, a novel framework that automates post-mortem event reconstruction using explainable machine learning. To overcome the general scarcity of datasets, Investigator Copilot replays realistic events on virtual machines and creates datasets by extracting, normalizing, and labeling traces from the corresponding hard disks. Using these datasets, Investigator Copilot trains human-interpretable decision tree stumps that evaluate digital evidence and combines these binary classifiers into Forensic Forests. Forensic Forests use an adjusted voting scheme to provide robust event reconstruction even when evidence has been deleted. We evaluate our approach by executing 2100 events on 50 virtual machines, training Forensic Forests, and measuring their event reconstruction performance on previously unseen data. Our results demonstrate that tree-based classifiers perform exceedingly well in event reconstruction. When measuring reconstruction performance on manipulated evidence, we observe that Forensic Forests significantly outperform the state of the art, which positions them as a valuable tool for investigators. Our findings indicate that automated frameworks such as Investigator Copilot can contribute to the efficiency and robustness of forensic analyses and may save the scarce resources of human investigators. |
|---|
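
To make the idea of combining decision stumps under a deletion-tolerant vote more concrete, the following is a minimal sketch and not the implementation used by Investigator Copilot. The abstract does not specify the adjusted voting scheme, so the fixed vote threshold used here is an assumption chosen purely for illustration. The trace names, the `Stump` class, and both voting functions are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stump:
    """Hypothetical one-split classifier: votes 'event occurred' if one trace is present."""
    trace: str

    def vote(self, evidence):
        return self.trace in evidence


def majority_vote(stumps, evidence):
    """Plain majority: more than half of the stumps must vote for the event."""
    yes = sum(s.vote(evidence) for s in stumps)
    return yes * 2 > len(stumps)


def threshold_vote(stumps, evidence, min_votes):
    """Assumed adjustment: a fixed number of positive votes suffices,
    so deleting some traces does not automatically flip the decision."""
    yes = sum(s.vote(evidence) for s in stumps)
    return yes >= min_votes


# Hypothetical traces left behind by a single event.
stumps = [
    Stump("prefetch_entry"),
    Stump("registry_key"),
    Stump("browser_history"),
    Stump("log_line"),
    Stump("temp_file"),
]

full_evidence = {s.trace for s in stumps}
tampered_evidence = {"registry_key", "temp_file"}  # three traces deleted

result_full = majority_vote(stumps, full_evidence)
result_tampered_majority = majority_vote(stumps, tampered_evidence)
result_tampered_threshold = threshold_vote(stumps, tampered_evidence, min_votes=2)
result_empty_threshold = threshold_vote(stumps, set(), min_votes=2)
```

In this toy setup, plain majority voting misses the event once three of the five traces are deleted, while the threshold variant still reconstructs it from the two remaining traces. A lower threshold trades robustness against deletion for a higher risk of false positives, which is why any real scheme would need to be tuned and evaluated on labeled data.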