Thanks. This clarifies one point that I failed to get from earlier reporting: the actual "scanning" is done in the cloud. It is only the hash generation that happens on-device.
In other words, this is what’s actually happening:
- When your phone uploads an image to iCloud, a Neural Hash of the image is generated and wrapped in a “safety voucher” object that is uploaded along with the image.
- Apple’s iCloud system does not look at the uploaded images, nor does it look directly at the safety vouchers or hashes. Instead, they have a cryptographic matching algorithm (apparently a form of private set intersection, which takes some expertise in cryptography to fully understand) that can bulk-analyze all of your images’ safety vouchers and determine which of them contain Neural Hashes matching images in the CSAM database.
- When this algorithm reports more than a threshold number of matching images (Craig said 30), someone from Apple looks more closely at the flagged images to see whether they really are CSAM or are false positives. Law enforcement is manually notified if it really is CSAM.
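The logical flow of the three steps above can be sketched in Python. To be clear, this is only the *logic*, not the cryptography: in the real system the matching happens under private set intersection, so the server never sees raw hashes or sub-threshold match counts, and `neural_hash` here is a hypothetical stand-in (the real NeuralHash is a perceptual hash that matches visually similar images, not an exact byte hash like SHA-256).

```python
import hashlib

THRESHOLD = 30  # the number Craig gave in the interview

def neural_hash(image_bytes: bytes) -> str:
    # Hypothetical stand-in for NeuralHash; a real perceptual hash
    # tolerates resizing/recompression, which SHA-256 does not.
    return hashlib.sha256(image_bytes).hexdigest()

def count_matches(uploaded_images, csam_hash_db: set) -> int:
    # Logical equivalent of what server-side matching computes.
    # In the real system this runs under PSI, so nobody learns
    # which vouchers matched until the threshold is crossed.
    return sum(1 for img in uploaded_images
               if neural_hash(img) in csam_hash_db)

def should_escalate(match_count: int) -> bool:
    # Only once the threshold is crossed can the vouchers be opened
    # and a human reviewer look at the flagged images.
    return match_count >= THRESHOLD
```

The important property the sketch cannot show is that, below the threshold, even Apple learns nothing about individual matches.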
The interesting part (which I’ve read in a few technical articles elsewhere) is that the iCloud servers are not looking at the raw Neural Hashes of the images, and they might not even be able to see them. Instead, their system uses the safety vouchers alone to determine whether they match the CSAM database.
I assume this means the Neural Hash (along with other metadata) is encrypted with some kind of public/private key pair in order to produce the safety voucher. If they are (for example) encrypted with Apple’s public key, only code in possession of the private key (e.g. Apple’s iCloud servers) could read the contents.
Similarly, if the master database of CSAM hashes (stored on iCloud servers) is signed with Apple’s private key (in textbook RSA, "encrypting" with the private key), anybody could read and verify it, but nobody else could modify it or replace it (protecting against third parties trying to scan for images not coming from NCMEC).
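The speculation in the last two paragraphs maps onto the two classic uses of a single keypair. Here is a toy textbook-RSA illustration with tiny hardcoded primes and no padding, purely to show the asymmetry; nothing like production cryptography, and the numbers are arbitrary stand-ins:

```python
# Toy textbook RSA; illustrative only, never use for real crypto.
p, q = 61, 53
n = p * q                   # public modulus (3233)
phi = (p - 1) * (q - 1)     # 3120
e = 17                      # public exponent
d = pow(e, -1, phi)         # private exponent (Python 3.8+ modular inverse)

# 1) Voucher direction: encrypt with the PUBLIC key, so only the
#    private-key holder (Apple's servers) can read the contents.
voucher_plain = 42                            # stand-in for a Neural Hash
voucher_ct = pow(voucher_plain, e, n)         # anyone can produce this
assert pow(voucher_ct, d, n) == voucher_plain # only d recovers it

# 2) Database direction: sign with the PRIVATE key, so anyone can
#    read and verify it, but nobody without d can forge or replace it.
db_digest = 99                                # stand-in for a database digest
signature = pow(db_digest, d, n)              # only the key holder makes this
assert pow(signature, e, n) == db_digest      # anyone with e can check it
```

The same keypair thus gives confidentiality in one direction and authenticity in the other, which is exactly the split the voucher and database would need.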
Now, if this is all they’re doing, anyone with Apple’s private key could run the comparisons. But if Apple thinks they may want to (or be forced to) run this algorithm for Chinese customers (where everything will be stored on China-run servers), there must be a mechanism where scans can be done without divulging their private key (otherwise, China could look directly at the hashes in the vouchers or could generate new master databases).
I think this is being done, but I haven’t studied the published documentation enough to be certain.