Visual Intelligence: Occasionally Useful, but Often Flawed
I’ve been exploring the “visual intelligence” aspect of Apple Intelligence in iOS 26 on my iPhone 17 lately, and while it’s not game-changing, it is occasionally useful and can be faster than using a dedicated app. It could be even more successful if it worked more reliably and wasn’t deliberately limited by Apple.
(As an aside, it’s odd that Apple, never a company to shy away from attempting to claim everyday words as product or feature names, doesn’t capitalize “visual intelligence” and treat it as a proper noun, but perhaps there was too much conflicting intellectual property.)
What Is Visual Intelligence?
An iPhone-only feature, visual intelligence leverages the artificial intelligence capabilities of iOS to make what you see through the camera or on the screen interactive and actionable in ways that weren’t previously possible. Visual intelligence debuted in iOS 18 and has gained a few features in iOS 26. Everything I say below is accurate for iOS 26, but a few options might not be available if you’re still using iOS 18.
Before I examine the various things you can learn from or do with visual intelligence and assess their effectiveness, let’s ensure we’re all on the same page about how to use it.
First, you must turn on Settings > Apple Intelligence & Siri > Use Apple Intelligence, and you’ll probably want to enable Settings > Apple Intelligence & Siri > ChatGPT > Use ChatGPT to make the Ask feature functional. An account is helpful for continuing conversations within ChatGPT, but not necessary for basic responses.
Visual intelligence has two modes: camera mode and screenshot mode. In camera mode, you can learn more about the world around you as viewed through the camera’s viewfinder; in screenshot mode, you can work with anything that appears on your iPhone screen.
- Camera mode (iPhone models with the Camera Control): To invoke visual intelligence in camera mode, make sure you’re not in the Camera app, and then press and hold the Camera Control button on all iPhone 16 models (except the iPhone 16e), all iPhone 17 models, and the iPhone Air. (If you’re in the Camera app, pressing and holding the Camera Control starts taking a video.) Although you can trigger visual intelligence directly from the Lock Screen, I found that it behaves inconsistently when Face ID hasn’t yet unlocked the iPhone, often showing a black screen that requires some button pressing to escape. It’s better to start from the Home Screen or within an app once the iPhone is unlocked.
- Camera mode (iPhone models without the Camera Control): To activate visual intelligence on the iPhone 15 Pro, iPhone 15 Pro Max, and iPhone 16e—which support Apple Intelligence but lack the Camera Control—you need to use the Action button (configure it in Settings > Action Button), a Lock Screen button (press and hold the Lock Screen to customize), or a Control Center button (open Control Center, tap + in the upper left, and add a control).
- Screenshot mode: To trigger visual intelligence for what you see on the iPhone screen in screenshot mode, press the side button and volume up button simultaneously. In iOS 26, an interactive preview of the screenshot appears, but only if Settings > General > Screen Capture > Full-Screen Previews is enabled. This setting increases the number of steps needed to save a screenshot (which I do frequently), so you might prefer to turn off full-screen previews to revert to temporary screenshot thumbnails. If you use thumbnails, access visual intelligence’s screenshot mode by tapping the thumbnail before it disappears.
Once visual intelligence has an image to work from, you have a variety of paths forward:
- Ask, Search, and dedicated actions: For every use of visual intelligence, an Ask button lets you discuss the image with ChatGPT, a Search button sends it to Google and other apps, and additional buttons appear depending on what visual intelligence has identified in the image. If you plan to use visual intelligence at all, make sure that Settings > General > Screen Capture > Automatic Visual Look Up is turned on.
- Lock the image: When in camera mode, lock the live image by pressing the Camera Control a second time or tapping the shutter button. Locking is also necessary for all visual intelligence actions other than Ask and Search. Screenshot mode locks the image by definition. When an image is locked, the shutter button displays an X; tap it to return to the live preview.
- Exit visual intelligence: To exit the visual intelligence preview screen, swipe up from the bottom of the iPhone screen or tap the X button in the upper-left corner of a screenshot mode preview. You can also press the side button.
- Return to the last visual intelligence screen: Even after you leave visual intelligence or are bounced out to another app, for a short time, you can return to the last locked image by pressing and holding the Camera Control again.
So, what can you do with visual intelligence?
Ask: Visual Intelligence as a Conduit to ChatGPT
Tapping the Ask button initiates a conversation with ChatGPT, but one mediated through Apple that’s limited to terse answers. Apple’s initial prompt to ChatGPT appears to be something like “Give a concise description of this image,” and if you don’t immediately continue with a specific question, that’s what ChatGPT will respond to on its own. It’s a more-or-less reasonable default prompt, though I’ve seldom found it useful.
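For the technically curious, here is a rough sketch of what that sort of mediation could look like from the client side, using OpenAI’s public Chat Completions API. To be clear, this is purely illustrative and is not Apple’s code; the model name, prompt, and token cap are my own assumptions.

```swift
import Foundation

// Illustrative only: a client that wants short, screen-sized answers about an
// image can send the image with a terse prompt and a low token cap.
// The model, prompt wording, and max_tokens value are assumptions, not Apple's.
func askAboutImage(jpegData: Data, question: String, apiKey: String) async throws -> String {
    let textPart: [String: Any] = [
        "type": "text",
        "text": "Give a concise description of this image. \(question)"
    ]
    let imagePart: [String: Any] = [
        "type": "image_url",
        "image_url": ["url": "data:image/jpeg;base64,\(jpegData.base64EncodedString())"]
    ]
    let message: [String: Any] = ["role": "user", "content": [textPart, imagePart]]
    let body: [String: Any] = [
        "model": "gpt-4o-mini",   // assumed model
        "max_tokens": 150,        // keeps answers terse
        "messages": [message]
    ]

    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    let (data, _) = try await URLSession.shared.data(for: request)
    let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
    let choices = json?["choices"] as? [[String: Any]]
    let reply = choices?.first?["message"] as? [String: Any]
    return reply?["content"] as? String ?? ""
}
```

The point is simply that a thin wrapper like this can cap the length and tone of every reply, which matches the terse answers the Ask panel returns.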
Instead, tap the text entry field at the bottom of the screen to pose a question to ChatGPT. The brief answers appear in a response window that never occupies more than two-thirds of the screen; occasionally, you’ll need to scroll a bit to see the entire response. You can also ask follow-up questions using the same field at the bottom.
For a more fluid interaction, tap the microphone icon on the keyboard to start dictation. Unusually, that causes the keyboard to hide and shows the glowing Siri animation, with your dictation appearing at the top of the screen. You can continue speaking to ask follow-up questions without needing to tap or type.
At the end of each response (the image just above requires scrolling), you’ll see “ChatGPT—Check important info for mistakes” with “ChatGPT” in blue. That’s actually a link, and tapping it opens the conversation in the ChatGPT website or, if you have it, the ChatGPT app. If you’re logged into the site or app, or if you’ve connected Apple Intelligence to your ChatGPT account in Settings > Apple Intelligence & Siri > ChatGPT, the conversation will also be saved in your ChatGPT library.
Switching to the ChatGPT site or app reveals just how heavily Apple is mediating the inputs and outputs from ChatGPT. The left image shows two responses mediated by Apple: the initial one triggered by the image upload, plus an answer to my follow-up question. The right image shows the beginning of a much more detailed, conversational reply that came after I took the same photo using the ChatGPT app. In other words, although visual intelligence may rely on ChatGPT technology, it’s not the same as talking to ChatGPT itself.
I can think of several reasons Apple does this mediation. To start, Apple wants responses to be short enough to be read on a small panel with a relatively large font. That’s a reasonable user experience goal, but it conflicts with how most chatbot conversations typically unfold. Additionally, I believe Apple is fundamentally uncomfortable with ChatGPT in particular and with handing over functionality to third parties in general. By requiring ChatGPT to provide terse answers and stripping them of all conversational warmth and depth, Apple maintains more control over the interaction. Apple is nothing if not a control freak.
Unfortunately, Apple takes this control freakery too far, refusing to engage on topics that wouldn’t go beyond the usual chatbot guardrails, such as preventing harmful or illegal content, protecting intellectual property, and safeguarding personal privacy. Visual intelligence won’t accept many questions related to health, politics, financial matters, and other potentially tricky topics. And don’t even try to invoke visual intelligence on any naughty bits.
If you ask a question that Apple considers off-limits, visual intelligence pretends it didn’t hear you, responding, “ChatGPT asks: What would you like to know?” Don’t blame ChatGPT; that’s actually Apple saying, “Apple Legal worries that answering could lead to a lawsuit. Try again.” Needless to say, ChatGPT itself will address many of these questions, such as this innocent request to identify several old tubes of prescription creams I found in the drawer. (The bottom one was for our cat, and ChatGPT identified it accurately.)
If you feel that Apple’s responses are unsatisfying or that Apple is unreasonably refusing to answer an innocuous question (visual intelligence wouldn’t even comment on whether a pecan pie recipe might cause weight gain), you can switch to the ChatGPT website or app and continue the conversation in the native chatbot interface.
Accessing ChatGPT through visual intelligence becomes non-trivially useful when you ask it to manipulate the information shown in the image. For example, I took a screenshot of a recipe and asked it to calculate the calorie count per serving, which would take longer to do manually.
Here’s another example. I have a lot of old computer books, and I could imagine wanting to build a list of them. It would be way too much work to type in all their titles manually, but by using visual intelligence to send an image to ChatGPT, I could get it to read the titles and generate a list. The OCR needs to be checked because mistakes happen, and I’d have to tap the ChatGPT link to get to the information in ChatGPT, where I could work with it (you can’t copy from visual intelligence’s responses), but it’s a start.
Frankly, for anything more than a simple response or two, talking directly to ChatGPT is a much more satisfying experience than using visual intelligence. (In the example above, I’d probably want a book list from more than one shelf, and it would be much easier to take a series of photos and send them to ChatGPT all at once.) However, visual intelligence remains useful for initiating a ChatGPT conversation by using the Camera Control or a screenshot—it’s a bit easier than taking a photo in the ChatGPT app or uploading a screenshot, particularly if you have already become accustomed to using the Camera Control.
Search: Using Visual Intelligence to Shop
The other standard button that appears in the visual intelligence interface is Search, and tapping it initiates a Google search for similar images. You can also circle part of a locked image or screenshot to limit the search to your selected area. Visual intelligence may also display tabs for search results from other apps—I get eBay and Etsy. Apple seems to believe that image search is primarily useful for shopping.
Google search results can be more general, but eBay and Etsy are entirely focused on shopping. Invoking visual intelligence on the snake plant that shares my office brought up some random articles with similar photos in Google, but only purchasing options in the other two. I can’t see the utility here—the Google search results are based purely on what my plant looks like, and since I already have one, I don’t want to buy another. Maybe if I were admiring someone else’s houseplant and wanted to get my own?
I also tried using visual intelligence on my Garmin Forerunner 645 Music watch. This time, the Google search results focused more on buying that particular watch, whereas Etsy offered bands for Garmin watches, and eBay displayed other Garmin watches. Again, I’m uncertain of the utility here, other than learning that Etsy might be a source for better watch bands (I’ve never liked Garmin’s bands).
To be fair, I’m not interested in buying anything related to a snake plant or a Garmin watch—I was just casting around for examples. However, I might be interested in a vinyl skin for my new 14-inch MacBook Pro because I’m fond of the skin I got for my 13-inch MacBook Air some years ago, which lets Tonya and me keep our laptops straight.
This time, all three search tabs were populated at least reasonably, although I was amused that two of the Google search results pointed to eBay and Etsy items. Etsy did a better job than eBay at finding items visually similar to my existing skin.
However, the experience still wasn’t satisfying. Several times, visual intelligence reported that Google was unavailable, which is almost impossible to believe on Google’s side—I’d blame Apple. In many of my tests, everything I tapped in the Etsy list failed with an error, so I couldn’t see more details. In other tests, Etsy wouldn’t load results at all. I was able to tap through to eBay listings successfully, but tapping the back button merely took me to another screen in the eBay app. Pressing and holding the Camera Control sometimes returned me to my results, but other times forced me to start over with a new image. In short, using visual intelligence search results as a springboard to look at and potentially compare multiple items is frustrating at best.
Ultimately, I’m not convinced that using the Search button in visual intelligence solves many real-world problems or even informs any real-world desires. To be fair, I’m old enough that I generally know what I’m looking at, and I buy things judiciously since I already have a lifetime of possessions. Perhaps the feature would appeal more to a younger person who is more interested in general information about the world around them and spends a lot of time shopping.
Visual Look Up, Text Analysis, and Data Detectors
Along with the Ask and Search buttons, visual intelligence aims to display contextually relevant buttons that offer more information or actions related to what appears in the image. Some of these function like Visual Look Up in Photos, where Apple’s machine learning identifies items in images such as plants, animals, landmarks, artworks, books, food dishes, symbols (such as laundry, dashboard lights, and utility symbols), and more. Other buttons activate common AI text features—summarization, speech synthesis, and translation—or harken back to the data detectors that have long provided shortcuts for detected phone numbers, addresses, and other information.
Visual Look Up
To use Visual Look Up within visual intelligence, point your iPhone at an object, lock the image, and tap the name that appears above the shutter button. Visual intelligence then displays information about the object, often sourced from Wikipedia. Within visual intelligence, Visual Look Up on objects is roughly as accurate and useful as it is in Photos.
I have not attempted to compare visual intelligence’s lookups against a dedicated identification app like iNaturalist’s Seek, which claims to be able to identify over 80,000 species. My suspicion is that visual intelligence is sufficient for most things, but may not perform as well with objects that could plausibly be any one of several things.
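As an aside for developers, Visual Look Up isn’t something you can call directly, but Apple’s Vision framework includes a general-purpose image classifier that gives a flavor of this kind of on-device identification. Here is a minimal sketch; it uses the closest public API rather than the Visual Look Up engine itself, and the confidence cutoff is an arbitrary choice.

```swift
import Vision
import UIKit

// A rough approximation of on-device identification using Vision's built-in
// image classifier. This is not the Visual Look Up engine, just the nearest
// public API; the 0.3 confidence cutoff is an arbitrary choice.
func classify(_ image: UIImage) throws -> [String] {
    guard let cgImage = image.cgImage else { return [] }
    let request = VNClassifyImageRequest()
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])
    return (request.results ?? [])
        .filter { $0.confidence > 0.3 }
        .map { "\($0.identifier) (\(Int($0.confidence * 100))%)" }
}
```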
Visual intelligence has another capability related to Visual Look Up—if you take a picture of a business or location that’s a known place in Maps, a button with its name appears. Tapping that button shows the same information you would find in Maps. Interestingly, this feature seems to depend on both geolocation and the image; I couldn’t get it to identify businesses just by placing a picture of the business in the viewfinder. I suspect this feature is most useful for city dwellers who may need to evaluate many more businesses on a regular basis than those of us in less urban areas.
AI Text Analysis
For images that contain text, visual intelligence relies on Apple’s Live Text technology to perform OCR and feed the results to other Apple Intelligence-powered features. In most situations, you’ll need to lock the image by pressing the shutter button or the Camera Control a second time before any of these dedicated buttons appear:
- Summarize: When visual intelligence detects a block of text, it displays a Summarize button. Visual intelligence’s summarization works, in the sense that it compresses whatever text can fit on a single iPhone screen into a very short, coherent summary. But how often would you want a two-sentence summary of a page of text? I can understand wanting a high-level overview summary of a dense 30-page PDF, but as I wrote in “When Are Summaries Valuable?” (10 January 2025), “The value of a summary is, within limits, proportional to the difference in length between the source and the summary.” At this point and in the context of visual intelligence, summarization feels like a stale AI gimmick.

- Read Aloud: When viewing text in your iPhone’s system language, the Summarize button is always paired with a Read Aloud button, which seems like an accessibility boon. On-demand speech synthesis might be extremely helpful to some people if it weren’t so terribly implemented. Read Aloud poorly identifies where the text starts, often skips entire lines, pauses awkwardly at the end of every line (it’s particularly jarring with hyphenated words), sometimes just reads random words, and, in all but one of my many tests, stopped somewhere between 5 and 45 seconds in. It’s utterly useless at the moment; hopefully, Apple will fix it.

- Translate: If visual intelligence detects text in another language, it offers a Translate button. Apple offers various methods of translating text in iOS, including a dedicated Translate app, but since they all use the same translation engine, visual intelligence is just as effective for quick translations as any other option. Since I am not multilingual, I can’t comment on the quality of this English translation of the Japanese translation of my book, but it at least conveys the sense of the original.
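Before moving on, a brief aside for the technically inclined: all three of those buttons start from the same OCR pass. Live Text isn’t exposed to developers under that name, but Vision’s text recognition request is the closest public equivalent. Here is a minimal sketch; whether visual intelligence follows exactly this path is my guess, not something Apple documents.

```swift
import Vision
import UIKit

// On-device OCR along the lines of what feeds Summarize, Read Aloud, and
// Translate. This uses Vision's text recognizer, the closest public API to
// Live Text; whether visual intelligence uses exactly this path is a guess.
func recognizeText(in image: UIImage) throws -> [String] {
    guard let cgImage = image.cgImage else { return [] }
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate      // slower but better for dense pages
    request.usesLanguageCorrection = true
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])
    return (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }
}
```

The recognizer returns text line by line, which may help explain why Read Aloud pauses awkwardly at line breaks.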

Data Detectors
Once visual intelligence uses Live Text to OCR any text in your image, it also searches for specific types of text—dates, URLs, phone numbers, email addresses, physical addresses—and displays buttons with suitable actions.
- Add to Calendar: If you see a flyer advertising an upcoming event, visual intelligence can help you quickly add it to your calendar. Simply tap the Add to Calendar button. Be sure to verify that the detected date and time are correct because I have seen it make mistakes. If necessary, you can edit the event before adding it. I haven’t encountered a real-world use for this feature, but it works pretty well in my testing and seems like it could be useful.

- URLs: In many cases, flyers will also sport URLs, as shown in the example above. When visual intelligence detects their presence, it presents a button for each; tapping the button opens the link in your default Web browser. You’ll notice that this flyer also has a QR code; visual intelligence works just like the Camera, allowing you to open a QR code with a tap. Using visual intelligence to open a URL is definitely easier than manually typing the URL.
- Call: When visual intelligence detects a phone number, it displays a button with the name or number to call, and tapping it… results in a black screen? Every time I tried a phone number, my iPhone 17’s screen would go completely black, and only pressing the side button a few times would bring it back, showing the confirmation dialog from which the number could be dialed. Perhaps this is a bug in iOS 26.1 beta 3? (Phone number detection didn’t exist in iOS 18.) Although most numbers were recognized accurately, the detection in the screenshot below is particularly poor, so I strongly recommend verifying numbers before calling.
(John isn’t at DriveSavers anymore.)
- Email: As shown above, detected email addresses activate a button with a tiny envelope icon that, when tapped, starts a new email message with that address in your default email app. Be very careful with this feature, as I noticed several OCR mistakes during my testing that would have rendered the email address incorrect.
- Open in Maps: More successful, and just in view on the right in the screenshot above, is a button with a location arrow that appears when visual intelligence detects an address. It opens the address in Maps for quick directions. As with opening URLs, it’s easier to send an address to Maps using visual intelligence than it is to enter an address manually or even copy and paste it into Maps.
With a few exceptions, these features are the most successful parts of visual intelligence, perhaps because they’re less ambitious than handing tasks off to ChatGPT and Google.
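If you’re curious about the plumbing, the data detectors Apple has offered for years are available to developers in Foundation, and running them over OCR output is roughly all these buttons require. Here is a minimal sketch of the general technique, not a peek at Apple’s actual implementation.

```swift
import Foundation

// Run Foundation's long-standing data detectors over OCR'd text to find the
// kinds of items visual intelligence turns into buttons. A sketch of the
// general technique only, not Apple's implementation.
func detectActions(in ocrText: String) throws {
    let types: NSTextCheckingResult.CheckingType = [.date, .link, .phoneNumber, .address]
    let detector = try NSDataDetector(types: types.rawValue)
    let range = NSRange(ocrText.startIndex..., in: ocrText)

    for match in detector.matches(in: ocrText, options: [], range: range) {
        switch match.resultType {
        case .date:
            if let date = match.date { print("Add to Calendar:", date) }
        case .link:
            if let url = match.url { print("Open URL:", url) }  // mailto: links cover email addresses
        case .phoneNumber:
            if let number = match.phoneNumber { print("Call:", number) }
        case .address:
            if let parts = match.addressComponents { print("Open in Maps:", parts) }
        default:
            break
        }
    }
}
```

As the Call and Email items above suggest, the OCR feeding these detectors can be wrong, so verifying what was detected before acting on it remains essential.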
Should You Try Visual Intelligence?
Visual intelligence demos well. You really can point your iPhone at an object, press and hold the Camera Control, and quickly learn more about what appears onscreen. However, in more involved usage, you’ll likely experience erratic behavior and results that Apple has intentionally limited in scope and topic. My guess is that those who use visual intelligence regularly have identified specific use cases where it works best for them. In particular, the Visual Look Up and data detector features are the most valuable, along with translation, although that can also be easily accessed in the Translate app.
Overall, visual intelligence is more of a party trick than a trustworthy tool right now. If you find its larger promise compelling, get the app for the chatbot you prefer: ChatGPT, Claude, or Gemini—they’ll all provide a more comprehensive and conversational experience than visual intelligence’s Ask and Search buttons.
What I find most intriguing—and a little worrying—about visual intelligence is that it feels like a preview of what we might experience with augmented reality glasses from Apple. For many people, the promise of AR glasses is they’d deliver information about what you see with minimal interaction—visual intelligence in your field of view. But if today’s visual intelligence feature is any indication, Apple will need to put much more thought into the design and technology of AR glasses to identify objects and text reliably, offer a fluid user experience, go beyond trivial details, and let people customize the depth and style of responses.