| Abstract | Image generation models such as Stable Diffusion have recently gained significant popularity due to their remarkable achievements. However, their widespread use has raised concerns about potential misuse, particularly in how training data is acquired, including the use of copyright-protected material. Various schemes have been proposed to address these concerns by introducing inconspicuous perturbations (poisons) into images to prevent models from using these samples for training.
We present LightShed, a generalizable depoisoning attack that effectively identifies poisoned images and removes adversarial perturbations, exposing the limitations of current protection schemes. LightShed exploits the wide availability of these protection schemes to generate poisoned examples, and then models their characteristics. The fingerprints derived from this process enable LightShed to efficiently extract and neutralize the perturbation from a protected image. We demonstrate the effectiveness of LightShed against several popular perturbation-based image protection schemes, including NightShade, recently presented at IEEE S&P 2024, and Glaze, published at USENIX Security 2023. Our results show that LightShed accurately identifies poisoned samples, achieving a true positive rate (TPR) of 99.98% and a true negative rate (TNR) of 100% when detecting NightShade-poisoned images, and that it effectively depoisons them. We also show that LightShed generalizes across perturbation techniques, enabling a single model to recognize poisoned images. |
|---|