FAQ about Apple's Expanded Protections for Children

We’ll see what happens if and when it happens. So far Apple hasn’t mentioned any details about expanding the program beyond the US. Maybe they will, or maybe they won’t want to initiate it in China. Facebook was banned in certain areas of China for hosting posts about anti-government protests, though people got around the ban using VPNs.

News like this, which I read on a regular basis, bothers me a lot more than anything so hypothetical:

My guess is that Apple will not want to lose the right to sell in China. If that is the case, they most likely won’t offer it there. But if the database of hashes is identical to the US version, China might agree to participate on Apple’s terms, particularly if it wants Apple to keep manufacturing so many of its products there.

1 Like

Nope, thanks! I’m quite comfortable with the conclusion I drew above.

1 Like

Just a reminder: while there are certainly concerns associated with what/how/when Apple will apply its CSAM scanning algorithms/policies, there is (in the minds of some) the larger issue of Apple turning 180 degrees from the position Tim Cook took in 2016 when the FBI was trying to strong-arm Apple into installing a back door in the iPhone. Many people find that Apple’s change in position on tinkering with access to our individual devices is deeply concerning.

And as far as “if you believe that Apple will violate the law and their own policies despite their statements to the contrary,” a) no one in this discussion has expressed concerns about Apple “violating the law” (US law, that is) and b) Apple is now ON RECORD as violating their own policies, if you consider a CEO’s public assertions as constituting corporate policy.

1 Like

Joanna Stern of the Wall Street Journal has a good interview with Craig Federighi of Apple about the CSAM Detection and Communications Safety in Messages features. I strongly recommend that everyone watch it.

To my mind, the most interesting comment from Federighi that wasn’t explored was the claim that the system is auditable in multiple ways and places.

3 Likes

Thanks. This clarifies one point that I failed to get from earlier reporting: the actual “scanning” is done in the cloud. It is the hash generation that is on-device.

In other words, this is what’s actually happening:

  • When your phone uploads an image to iCloud, a Neural Hash of the image is generated and wrapped in a “safety voucher” object that is uploaded along with the image.
  • Apple’s iCloud system does not look at the uploaded images, nor does it look directly at the safety vouchers or hashes. Instead, they have an algorithm (which seems to require some expertise in cryptography to completely understand), which can bulk-analyze all of your images’ safety vouchers and determine which of them contain Neural Hashes matching images in the CSAM database.
  • When this algorithm reports more than a threshold number of images (Craig said 30), then someone from Apple goes and looks more closely at the flagged images to see if there really is CSAM material or if they are false positives. Law enforcement is manually notified if it really is CSAM.
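
To make that flow concrete, here’s a rough Python sketch of the server-side logic as I understand it. The function names and the voucher representation are my own placeholders, not anything Apple has published.

```python
# Rough sketch of the server-side flow described above (placeholders only).
THRESHOLD = 30  # the figure Federighi cited in the interview

def voucher_matches_csam(voucher: dict) -> bool:
    """Placeholder for the cryptographic matching described later in this thread."""
    return voucher.get("matches_csam", False)

def human_review(vouchers: list) -> None:
    """Placeholder for Apple's manual check for false positives."""
    print(f"Manually reviewing {len(vouchers)} flagged images")

def process_account(vouchers: list) -> None:
    flagged = [v for v in vouchers if voucher_matches_csam(v)]
    if len(flagged) >= THRESHOLD:
        human_review(flagged)  # only after this review would a report be made
```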

The interesting part (which I’ve read in a few technical articles elsewhere) is that the iCloud servers are not looking at the raw Neural Hashes in the images, and they might not even be able to see them. Instead, their system uses the safety vouchers alone to determine if they match the CSAM database.

I assume this means the Neural Hash (along with other metadata) is encrypted with some kind of public/private key pair in order to produce the safety voucher. If they are (for example) encrypted with Apple’s public key, only code in possession of the private key (e.g. Apple’s iCloud servers) could read the contents.

Similarly, if the master database of CSAM hashes (stored on iCloud servers) is encrypted with Apple’s private key, anybody could read it, but nobody else could modify it or replace it (protecting against third parties trying to scan for images not coming from NCMEC).

Now, if this is all they’re doing, anyone with Apple’s private key could run the comparisons. But if Apple thinks they may want to (or be forced to) run this algorithm for Chinese customers (where everything will be stored on China-run servers), there must be a mechanism where scans can be done without divulging their private key (otherwise, China could look directly at the hashes in the vouchers or could generate new master databases).

I think this is being done, but I haven’t studied the published documentation enough to be certain.

1 Like

After re-reading this document, I think I understand what’s going on. For those who can follow a technical document, I strongly recommend reading it, since there are a lot of important details that I don’t think anyone will be able to summarize accurately (and I’m including myself and this message in that category).

That being said, let me see if I can provide a good-enough summary for everybody else.

Core technologies

These three cryptographic technologies are all used together. Individually, any one could be vulnerable, and I think this has led to a lot of confusion among people discussing the subject.

  • NeuralHash

    This is a neural network algorithm designed to generate a hash (a large number) based on an image’s content. The algorithm is designed to minimize the possibility that basic transformations will change the hash. Apple’s example shows an image converted from color to grayscale (color space conversion). It stands to reason that they also trained the neural net to deal with other similarly basic transformations like rotation, reflection and cropping out background.

    Apple doesn’t describe all the details in their document, but it must be something they have published because they have said that NCMEC is generating the hashes of the CSAM database images. (Apple isn’t receiving a copy of the images to generate the master database).

  • Private Set Intersection (PSI)

    This is a very interesting cryptographic protocol that allows two parties (e.g. you and Apple) to compare data sets without either party knowing what the other has. In this case, Apple has the master set of NeuralHashes for the CSAM data and you have the set of NeuralHashes for all your images, both of which are encrypted. The PSI algorithm allows the server to know where these two sets intersect (that is, which of your images match the CSAM database) without gaining any information about non-intersecting data (any other images you have).

  • Threshold Secret Sharing (TSS)

    This is another cryptographic technique that allows a critical secret to be split into many pieces, where someone receiving the data needs a minimum number of pieces in order to reconstruct the secret. For example, you may distribute your secret in 1000 pieces with a threshold of 10. Someone with any 10 pieces can reconstruct the secret, but someone with only 9 can’t.

    In this case, your phone is generating a secret encryption key that is split among all your images. Apple needs a minimum number of pieces (Craig said 30 in the interview) in order to reconstruct the key.
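
To see the threshold property in action, here is a minimal Python sketch of Shamir’s secret sharing with a threshold of 30. This is a toy illustration of the TSS idea described above, not Apple’s actual construction.

```python
# Toy Shamir secret sharing: any `threshold` shares recover the secret,
# fewer reveal nothing useful. Illustration only, not Apple's implementation.
import random

PRIME = 2**127 - 1  # a Mersenne prime, large enough for a demo

def split_secret(secret: int, total_shares: int, threshold: int):
    """Split `secret` into `total_shares` shares; any `threshold` of them suffice."""
    # Random polynomial of degree threshold-1 with the secret as constant term.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, total_shares + 1):
        y = sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term (the secret)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

key = random.randrange(PRIME)                  # the per-device secret key
shares = split_secret(key, total_shares=1000, threshold=30)
assert reconstruct(shares[:30]) == key         # 30 shares are enough
assert reconstruct(shares[:29]) != key         # 29 shares reveal nothing useful
```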

Creating the master CSAM database

Apple receives NeuralHashes from NCMEC (and other related organizations). They perform a number of (unspecified) mathematical transformations, the last of which is a blinding step that is based on a cryptographic secret known only to Apple.

The result of this is that the set of NeuralHashes is transformed into a “blinded hash” database, which is stored on everybody’s phones. The nature of this database is that you can check to see if a given NeuralHash is in it (if you have Apple’s secret key), but you can’t determine which entry in the database actually matches. You can also use it (without Apple’s secret key) to generate encryption keys from NeuralHash values, which will only be valid for NeuralHash values that are in the database.

On-device PSI processing

Note that this is only the PSI part of the on-device processing. Additional processing happens first (see below).

The security voucher’s payload (Neural Hash and other data, see below) is encrypted and wrapped as follows:

  • A cryptographic header (including a derived encryption key) is computed from the image’s NeuralHash and the local copy of the blinded hash.
  • The derived key is used to encrypt the voucher’s payload.
    • If there was a database match, then the server can use the cryptographic header and the server-side secret to compute the derived encryption key and decrypt the payload.
    • If there was not a database match, then bogus key data will be generated. The server won’t be able to decrypt the payload (so Apple learns nothing about non-matching images).
    • The device doesn’t know if the image matches or not. It just gets an encryption key from the blinded hash.

The server attempts to compute the decryption key using the cryptographic header in the voucher and the server-side blinding secret key. If the image matches something in the master database, the decryption succeeds and the payload is visible. If the image is not in the database, the decryption fails and the payload remains opaque.
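
To illustrate the blinded-match idea, here is a heavily simplified Python sketch using plain modular exponentiation. Apple’s real construction uses elliptic-curve cryptography and a table layout that keeps the device from learning whether its image matched; every name and parameter below is an assumption made for illustration only.

```python
# Toy Diffie-Hellman-style blinding over integers (NOT Apple's EC-based scheme).
import hashlib
import secrets

P = 2**521 - 1                                # toy prime modulus
SERVER_SECRET = secrets.randbelow(P - 2) + 2  # stand-in for Apple's blinding secret

def hash_to_group(neural_hash: bytes) -> int:
    """Map a NeuralHash to a group element (stand-in for hash-to-curve)."""
    return int.from_bytes(hashlib.sha256(neural_hash).digest(), "big") % P

def blind(neural_hash: bytes, secret: int) -> int:
    """Server-side blinding: only the holder of `secret` can produce these."""
    return pow(hash_to_group(neural_hash), secret, P)

def derive_key(group_element: int) -> bytes:
    return hashlib.sha256(group_element.to_bytes(66, "big")).digest()

# Server builds the blinded database once and ships it to devices.
csam_hashes = [b"csam-hash-1", b"csam-hash-2"]                 # placeholders
blinded_db = {h: blind(h, SERVER_SECRET) for h in csam_hashes}

def device_voucher(image_hash: bytes, blinded_entry: int):
    """Device side: the header goes into the voucher; the derived key encrypts
    the payload. In the real system the device picks `blinded_entry` by table
    position and cannot tell whether it truly corresponds to its image."""
    r = secrets.randbelow(P - 2) + 2
    header = pow(hash_to_group(image_hash), r, P)
    return header, derive_key(pow(blinded_entry, r, P))

def server_key(header: int) -> bytes:
    """Server side: recompute the key from the header and the blinding secret."""
    return derive_key(pow(header, SERVER_SECRET, P))

# If (and only if) the image's hash is behind the blinded entry the device used,
# the two derived keys agree and the payload decrypts.
hdr, device_side_key = device_voucher(b"csam-hash-1", blinded_db[b"csam-hash-1"])
assert server_key(hdr) == device_side_key
```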

Voucher generation

If only PSI were used, Apple could immediately detect even a single matching image, but they chose to limit the system so that Apple can only act once a threshold number of matches (30, according to the interview) has been received.

This is done by using TSS on the payload before passing it to the PSI processing layer.

  • A per-device encryption key (unknown to Apple) is generated.
  • The payload starts as the image’s NeuralHash and a “visual derivative”.
    They don’t say what this derivative is, but I assume it’s some kind of low-resolution small-size version of the original image, so humans can review it without seeing the original image.
  • The payload is encrypted with the per-device encryption key.
  • A cryptographic header containing a piece of the per-device key (according to the TSS algorithm) is also stored in the payload. So Apple can’t decrypt the payload from just one image - they need 30 or more in order to reconstruct the key.
  • This payload is then passed to the PSI algorithm to create the security voucher, which is uploaded to iCloud.
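
Putting those steps together, here is a simplified sketch of the inner (TSS) layer of a voucher. The toy stream cipher and the placeholder key share are my own; the resulting blob is then wrapped by the PSI step sketched earlier.

```python
# Simplified inner-layer voucher construction (illustrative placeholders only).
import hashlib
import os
import secrets

def xor_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy stream cipher: XOR with a SHA-256-derived keystream (demo only)."""
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(plaintext):
        keystream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(p ^ k for p, k in zip(plaintext, keystream))

device_key = secrets.token_bytes(32)  # one secret key per device, unknown to Apple

def make_inner_payload(neural_hash: bytes, visual_derivative: bytes,
                       key_share: bytes) -> bytes:
    """Encrypt the payload under the device key and attach one TSS share of that
    key; Apple needs 30 such shares before it can rebuild the key and decrypt."""
    ciphertext = xor_encrypt(device_key, neural_hash + visual_derivative)
    return key_share + ciphertext     # this blob is then handed to the PSI layer

inner = make_inner_payload(b"neuralhash-bytes", b"tiny-thumbnail-bytes",
                           key_share=os.urandom(48))  # placeholder TSS share
```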

The full set of server-side processing

  • Apple receives the security voucher along with the image.
  • The PSI algorithm is used to try and decrypt the security voucher.
    • If decryption fails, then the image is not in the CSAM database, Apple can’t view the contents of the security voucher and processing stops.
    • If decryption succeeds, then the image is in the CSAM database, but the payload is further encrypted by the TSS algorithm.
  • If Apple receives more than a threshold number of matching images (30), then it can use the full set of the (PSI-decrypted) voucher payloads to derive the per-device encryption key used to create those payloads. It can then decrypt them all, revealing the original NeuralHashes and the visual derivative images, which Apple people can review before referring the account to law enforcement.
  • If Apple receives fewer than the threshold number of matching images, then the TSS algorithm can’t reconstruct the per-device key and Apple can’t view the payloads.

Synthetic vouchers

So far, although Apple can’t view the hashes until a threshold of matches has been received, they could still know how many matches there are, which could itself be problematic. It might even be worse to have a report that there are (for example) 5 matches, but no ability to check whether they are true or false positives.

To deal with this, your device also occasionally produces synthetic vouchers for images (instead of ones that correspond to the image). A synthetic voucher:

  • Uses dummy data for the underlying image information
  • The secret share of the inner key (for the TSS algorithm) is replaced with random data
  • The cryptographic header and outer encryption key are chosen so they always make a PSI match on the server.

The fraction of synthetic vouchers generated is designed so that the number of synthetic matches is on the same order of magnitude as the TSS threshold.

So, everybody with a non-trivial number of images will have at least 30 PSI matches on the database, but the TSS layer can’t be decrypted using data from synthetic vouchers because the key fragments are random data.

If the TSS decryption succeeds, then we know that at least 30 of the images are real (not synthetic) matches, because it was able to reconstruct the device key. Then Apple can review the payloads and take action.

In other words, Apple knows absolutely nothing about whether you have CSAM data or not until 30 or more matching images have been uploaded, but once that threshold is crossed, they can immediately identify them all for manual review (but still without having any ability to access non-matching images).

It’s a really slick algorithm. I’m quite impressed.

4 Likes

My final comments, based on the above summary of Apple’s CSAM Detection Technical Summary:

  • I think Apple’s algorithms do a remarkably good job of satisfying all parties. Apple can detect when a threshold number of CSAM images is uploaded, but they know nothing until that point. (Especially if, in the future, the original images are E2E encrypted.)

  • The on-device processing is meaningless without the server because the generated security vouchers can only be decrypted using the server-side PSI algorithm, which requires Apple’s secret key.

    So even if Apple wanted to scan all of the photos on your device, they can’t do it with this algorithm.

  • Third parties can’t interfere with the system, because the blinded hash used on the server needs to match the blinded hash on your phone. If they aren’t the same, the whole system falls apart.

    If any third party tries tampering with the hash on phones, the server won’t be able to detect anything. If a third party tries to replace the server-side database, they’ll need to use Apple’s secret key to generate that blinded hash and they’ll still need to push it into everybody’s phone.

    In other words, only Apple can change this database, and they can’t do it in secret because whenever they do, they need to push out updates to everybody (I assume in an iOS update). Security researchers worldwide will immediately know whenever this happens.

I don’t think this will satisfy everybody (nothing possibly could), but it’s a remarkably good approach to the problem.

6 Likes

And Apple has now released Yet Another Document, this time focusing on the security threat model.

Interesting summary, thanks.

How?

(Also, I’m not sure it makes an enormous difference. We already know that China can access its users’ iCloud stores, so that hasn’t been hidden.)

Thank you, Adam, that was very interesting. I appreciate your comment about what was not explored, too. Federighi helped to clarify significant issues and, again, for me the concern to protect children is very, very important. So too are the privacy issues, but for the present I think he makes a good case.

1 Like

Because the database is on every phone. If it changes, anybody bothering to look for it will see that it has changed.

Even if Apple doesn’t put the file in a place that is easy to download, anyone who jailbreaks his phone (and therefore has access to the real file system) will be able to find and download it. Well within the capability of any security researcher.

Additionally, Apple’s just-published threat-review document (see @ace’s post) says that the database is shipped as a part of the OS itself (not a separate download) and that they will be publishing hashes of it so you can confirm that the one you have hasn’t been tampered with or replaced.
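
Here’s a minimal sketch of the kind of check that would enable, assuming Apple publishes a SHA-256 (or similar) digest of the on-device database; the file path and the published value below are made up for illustration.

```python
# Verify a local database file against a published digest (illustrative values).
import hashlib

def file_digest(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

PUBLISHED_DIGEST = "hypothetical-digest-published-by-apple"
local = file_digest("/path/to/blinded_csam_db")  # hypothetical on-device path
print("matches published hash" if local == PUBLISHED_DIGEST else "MISMATCH")
```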

1 Like

That file will not allow you to see what’s being checked against. The hashes have deliberately been created that way.

That file will also have to change routinely so as to keep up with an expanding database, unless you assume that for some reason no new child porn will ever be added from here on out.

With time we’ll see this file change, and we won’t be able to tell if it’s because NCMEC added to their database or because Apple was forced to add material to check against while under a gag order not to disclose it. I’d like to say this is harder in the States and much more likely in China, but with all the crap we saw after 9/11, especially with FISA & FISC, I have my doubts here as well.

1 Like

Yes, the blinded hash will be updated. Apple says it’s part of the signed OS image, so it will probably be updated whenever you get a new OS update. But this also means nobody else can replace it. The hash table on your phone (which has to match the one on the server) will be the official one Apple publishes.

If you believe Apple can’t be trusted, then this entire discussion is moot.

But this is an incredible amount of software development for a company that isn’t serious about privacy. They could have done what Facebook and Google do (just scan everything on the cloud servers and not tell anyone), if that was their intent.

3 Likes

I’ll repeat this yet again. It’s not about trusting Apple. It’s about Apple putting itself in a position where they are exposed to coercion.

I have no doubt that what Apple pulled off here is technically good. I wouldn’t be surprised to learn it’s better than what MS or Google have implemented for their cloud searches.

But that’s beside the point. This is not a problem that can be circumvented by an elegant engineering solution. It reflects a fallacy often observed in tech these days: not everything has an engineering solution (insert joke here about when you have a hammer…). The threats Apple will now come under, despite excellent engineering, are a perfect example of that.

1 Like

There’s lots more detail in the new Security Threat Model Review document that answers this and similar concerns about coercion. It’s essential reading for anyone participating in this thread—I’ll have coverage up shortly.

1 Like

Unfortunately, reading through that document does not indicate to me that Apple has yet properly understood that they cannot engineer themselves out of this.

I’ve pointed this out before, but in principle, the damage is already done. They have demonstrated that they have created this tech and that they can deploy and use it. That is now public knowledge. They can swear they have good intentions all day long (and I’m inclined to actually believe that), but that won’t shield them from the pressure they can now be exposed to by authorities in countries where they can’t just walk away (e.g., certainly China and the US).

3 Likes

If Apple changes the database because China or the US is forcing them, and they publish a matching number, how will the security researchers know, again?

You’re fixating on a third party doing this against Apple’s will. I’m worried about state actors forcing Apple to do this. In the latter case, I don’t see how people notice easily.

It’s not that Apple can’t be trusted, it’s that state level actors have a whole different level of power and you don’t create tools for them to use easily, ringed around with promises like “we will refuse” which are not possible to keep.

2 Likes

My main questions haven’t been answered: Why is Apple doing this? Why are they risking losing my trust?

Go read my new article on this and the first comment. I think @xdev hits it on the head. Apple didn’t intend to risk anything—they just completely failed to anticipate how it would be received.

Now that we know the details explained in that article, let’s discuss the state actor concerns over there.

2 Likes

Don’t get carried away by Apple’s sophisticated implementation.

My question remains: Why does Apple do CSAM detection at all? CSAM is awful, but there are many similarly awful things in our societies.