My current mission, other than selling all my old tech crap on eBay, is to finally get my data backup strategy in order.

I’ve moved all my files to my Synology DS212 NAS and along the way I’ve found many redundant copies of my music collection. Synology has the pretty nice Storage Analyzer package for analysing disk usage and the like. Unfortunately, it is not well suited to bulk file deduplication, as files must be deleted one by one.

So, I present Deduper. This is a minimal-dependency Python script that can run directly on the NAS (via SSH) without requiring the device to be modded in any way.

The script performs SHA-1 hash-based duplicate detection and then (optionally) deletes duplicates using a “keep-first-in-copy-aware-order” strategy, which seems to work pretty well for music files.
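
To give a feel for the approach, here is a minimal sketch of hash-based duplicate detection followed by a keep-first selection. It is not Deduper's actual code: the function names are made up for illustration, and `copy_aware_key` in particular is only my guess at what a "copy-aware" ordering could look like (names that look like copies sort after the original). It uses only the standard library and only plans deletions, it never removes anything.

```python
import hashlib
import os
from collections import defaultdict


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group regular files under root by SHA-1 digest; keep groups of 2+ files."""
    by_hash = defaultdict(list)
    for dir_path, _dir_names, file_names in os.walk(root):
        for name in file_names:
            path = os.path.join(dir_path, name)
            # Skip symlinks so the same file is not counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                by_hash[sha1_of_file(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}


def copy_aware_key(path):
    """Hypothetical ordering: names that look like copies sort after originals."""
    name = os.path.basename(path).lower()
    stem = name.rsplit(".", 1)[0]
    looks_like_copy = "copy" in name or stem.endswith(")")
    # False sorts before True, so likely originals come first;
    # shorter paths, then alphabetical order, break ties.
    return (looks_like_copy, len(path), path)


def plan_deletions(duplicates):
    """For each duplicate group, keep the first path in order; list the rest."""
    to_delete = []
    for paths in duplicates.values():
        ordered = sorted(paths, key=copy_aware_key)
        to_delete.extend(ordered[1:])
    return to_delete
```

Calling `plan_deletions(find_duplicates(some_folder))` returns the paths that would be removed, so the list can be reviewed before anything is actually deleted. Hashing every file is slow on a big collection; grouping by file size first and hashing only the size collisions would be an obvious optimisation.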

Tags: Python, Deduplication, Synology

All content © 2018 Richard Cook. All rights reserved.