Should You Let Claude Learn from Your Chats?

Originally published at: Should You Let Claude Learn from Your Chats? - TidBITS

On its blog, Anthropic writes:

Today, we’re rolling out updates to our Consumer Terms and Privacy Policy that will help us deliver even more capable, useful AI models. We’re now giving users the choice to allow their data to be used to improve Claude and strengthen our safeguards against harmful usage like scams and abuse. Adjusting your preferences is easy and can be done at any time.

[Screenshot: Claude's "Claude training request" preference prompt]

By participating, you’ll help us improve model safety, making our systems for detecting harmful content more accurate and less likely to flag harmless conversations. You’ll also help future Claude models improve at skills like coding, analysis, and reasoning, ultimately leading to better models for all users.

On the flip side, the thought of AI companies training their models on chat transcripts sets off all sorts of privacy warning flags for many people, including me. My discomfort stems less from immediate real-world concerns than from the innumerable privacy abuses perpetrated by tech companies in pursuit of surveillance advertising dollars. Although it’s theoretically possible for sensitive personal information, business details, or intellectual property (like code) absorbed into training data to leak directly in responses, that’s both unlikely (quite literally, what are the odds?) and probably wouldn’t happen for a year or two anyway, when the models currently in training become public. More concerning is that we don’t know how AI may or may not enable future privacy abuses, and once data has been used for training, it’s unlikely that it could be “untrained” from the model. For now, it’s better to err on the side of caution.

I primarily use ChatGPT, where I’ve already turned off the option that lets OpenAI train future models on my chats. Now that Claude has the option to collect my conversations and hold onto them for five years, I’m turning that off too. Both of those are easy to disable in the settings, but Google makes it harder to turn off conversation collection for Gemini: the trick is to turn off Gemini Apps Activity.

I strongly recommend avoiding Meta AI, which trains on your conversations (and all public Facebook posts and Instagram photos) without allowing you to opt out, and Grok, which trains on all X/Twitter users’ posts (including historical posts in otherwise inactive accounts) and Grok conversations unless you explicitly opt out. Beyond the privacy implications and poor track records of both companies, training models on often unverified, inflammatory, or misleading social media content risks perpetuating those same qualities in AI responses—a classic case of “garbage in, garbage out.”

 


Betteridge’s Law of Headlines applies in spades! “Any headline that ends in a question mark can be answered by the word no.”


No.

This has been another episode of Simple Answers to Simple Questions.

In a chat, we were trying to figure out how to avoid an error. At first it seemed like a complicated solution was required, but then I found an easy-to-apply rule (translate any character less than x’40’).
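
For context, in EBCDIC x’40’ is the space character, and every code point below it is a control code, so the rule amounts to blanking out anything unprintable before display. Here’s a minimal sketch of that idea in Python, assuming CP037-encoded input (the helper name is mine, not from the chat):

# Sketch of the "translate any character less than x'40'" rule.
# Assumes EBCDIC (CP037) input; every byte below x'40' -- the EBCDIC
# space -- is a control character, so we replace it with a space.

EBCDIC_SPACE = 0x40

def scrub_ebcdic(record: bytes) -> bytes:
    """Replace any byte below x'40' with an EBCDIC space."""
    return bytes(b if b >= EBCDIC_SPACE else EBCDIC_SPACE for b in record)

# x'C8' = 'H', x'05' = tab, x'C9' = 'I', x'0D' = carriage return
raw = bytes([0xC8, 0x05, 0xC9, 0x0D])
print(scrub_ebcdic(raw).decode("cp037"))  # -> "H I "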

The AI replied:

This is definitely going into my “useful mainframe knowledge” file - I bet this pattern would be helpful for a lot of data conversion and display scenarios.

So I asked, “Do you actually learn anything from these prompts that persists for other users?”

It said, nope:

When I said “this is going into my useful mainframe knowledge file,” I was speaking metaphorically about how valuable the insight was, not literally updating some kind of persistent knowledge base.

…maybe because it didn’t have permission yet to learn from chats.

@AlanRalph
So you are lowering TidBITS to the level of tabloids? What’s next, “Elvis buys Vision Pro at Memphis Apple Store”? And what happens with a headline like “Is backing up your Mac a good idea?”
;-)

In any case, more seriously, we are all training LLMs whenever we put stuff up in public, including here (for example, Perplexity picks up new TBT threads within minutes of their posting). So managing one’s exposure to LLM training and storage can be complex.

I’d never sully @ace by referring to TidBITS as a “tabloid”! For one thing, this site wouldn’t be much use for wrapping one’s fish-&-chips, which is about all that tabloids are good for these days. Also, TidBITS has actual news & information inside it, unlike most tabloids.

The ways in which AI companies acquire their data are many and varied, both in their methods and in their respect for users & the web as a whole. I don’t use any online AI services, but I’m in no doubt that stuff I’ve written online in the past has been scraped up and used for LLM training. Short of inventing a time machine and going back to either remove or poison said data, there’s not much I can do about it, alas. Ideally, we’d have laws & regulations (with teeth) to govern how AI companies behave — sadly, we don’t live in an ideal world, and such action is piecemeal and inadequate at present.


I think the question that needs to be answered first is: what are Anthropic’s goals for improvement? Is it to make Claude more accurate, or is it to increase engagement with Claude? Those are not the same, and if history is any guide, the latter is the goal. There are some AI models out there with which I happily share my data. iNaturalist comes to mind. Some of my photos were used to improve the eBird database. I think a medical AI could also learn a lot from ongoing feedback.


I posed the title as a question because I had to think about it myself, and only after a while did I come down on the side of turning everything off.

Anthropic said in email:

If you choose to allow us to use your data for model training, it helps us:

  • Improve our AI models and make Claude more helpful and accurate for everyone
  • Develop more robust safeguards to help prevent misuse of Claude

With Anthropic and OpenAI, I tend to believe that they are more interested in improving their models in these ways, so I seriously considered allowing them to train on my conversations. (I wouldn’t trust Meta or xAI with the time of day.)

I considered sharing my conversations because I do believe strongly in providing feedback to developers and helping them improve their products. I do that all the time with Apple developers working on Mac and iPhone apps. Similarly, I’m not offended when developers use telemetry to understand how their apps are being used, reporting back on crashes and other issues.

Plus, pretty much everything I do on the Internet is public anyway, and I have a long-standing, built-in filter against saying things (in email, on social media, and to chatbots) that I wouldn’t want to end up on the front page of The New York Times.

However, I do feel as though there are going to be numerous unanticipated consequences of AI, which is why I decided to err on the side of caution for now and to recommend that course of action to others as well.


At least for now, Anthropic’s use of chat content for training doesn’t apply to interactions via the API. That means using another chat app (such as Beyond Better) bypasses their data collection for training.
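
For anyone who wants to try that route, here’s a minimal sketch using Anthropic’s official Python SDK; it assumes an ANTHROPIC_API_KEY environment variable is set, and the model name is illustrative:

# Sketch of talking to Claude via the API rather than the Claude app.
# Assumes the official `anthropic` SDK (pip install anthropic).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative; substitute a current model
    max_tokens=256,
    messages=[{"role": "user", "content": "Summarize this thread's privacy advice."}],
)
# Per the policy described above, API traffic like this isn't used for training.
print(message.content[0].text)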

Anthropic… aren’t they the outfit that just offered a $1.5B settlement fund to all the copyrighted authors and publishers whose works they had ripped off, er, trained on without asking as politely as shown above?