Background: What are ALT texts?

An "ALT text" is a short text that can replace the image for people who cannot see. The first rule of most accessibility standards is that appropriate ALT texts are mandatory. Simple rule, right?!

When I teach digital accessibility, most students already know this rule. (Unfortunately, this is the ONLY thing some web people know about accessibility, despite it being a very broad and important topic.)

Why are ALT texts challenging?

Even though the basic requirement is simple, ALT texts can actually be quite a challenge. When we audit websites, apps and PDF documents for accessibility, ALT text failures are still common.

There are many reasons for this, including the following:

  • Some editors have not understood the role of the ALT text, so they may, for example, duplicate text that is already presented next to the image, or put copyright information in the ALT text instead of a description of the image.
  • Lack of support in publishing tools. Many content management systems fail to remind editors when they forget ALT texts. Or even worse: Some tools do not even have a field for image descriptions.
  • Tight deadlines for editors, who cannot find the time to formulate a useful image description.
  • It is not always easy to describe an image briefly. After all, some images say more than a thousand words. So you may need to put a longer description outside the ALT text, elsewhere on the page. This can be difficult if the page template offers no suitable place (such as an expandable area).
  • It is not always easy to determine what part or aspect of an image is relevant, or even if the image has a meaning at all or is only meant for decoration. Should an ALT text describe what people look like, for example? Yes, sometimes, but it is quite a delicate task to formulate such a description wisely.
  • Some sites (e.g. product catalogs in e-commerce) contain thousands of images, many of which are replaced quite frequently. This makes it hard to keep up.
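
Where the publishing tool gives no reminder, a small script can at least flag images that lack an ALT text. The following is a minimal sketch using only Python's standard library. The names `MissingAltFinder` and `audit_alt_texts` are made up for this illustration, and the check only looks at whether an `alt` attribute exists, not whether the text is any good.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects <img> tags that lack an alt attribute or have an empty one."""

    def __init__(self):
        super().__init__()
        self.missing = []     # src of images with no alt attribute at all
        self.decorative = []  # src of images with an explicitly empty alt=""

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "(no src)")
        if "alt" not in attributes:
            self.missing.append(src)
        elif not (attributes["alt"] or "").strip():
            # alt="" is valid HTML for decorative images, so list it separately
            self.decorative.append(src)


def audit_alt_texts(html):
    """Return (missing, decorative) lists of image sources found in the HTML."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing, finder.decorative


page = '<p><img src="a.png" alt="A red bicycle"><img src="b.png"><img src="c.png" alt=""></p>'
missing, decorative = audit_alt_texts(page)
print(missing)     # ['b.png']
print(decorative)  # ['c.png']
```

A check like this only catches the easiest failures. It cannot tell whether an ALT text merely repeats the caption or contains copyright information, so it complements manual review rather than replacing it.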

ALT texts can be expensive

In fact, in the Swedish government's assessment of the cost of implementing the European Accessibility Act, the production of ALT texts in e-commerce stores was by far the largest annually recurring cost, amounting to almost a billion SEK every year (almost 100 million EUR). Source: SOU 2021:44 (pdf), page 582.

Can AI help?

Early this year, I explored the automatic image analysis features already present in common software such as Microsoft Office, Edge and some screen readers. For example, I asked one of these systems to describe the following photo by Anders Hellberg (cc-by-sa):

[Photo: Greta Thunberg in front of a sign saying "Skolstrejk för klimatet"]

The result:

"An image showing clothes, yellow, orange."

I showed this example during one of my accessibility courses, concluding that automated image analysis is far from mature enough to replace manual ALT text writing.

However, one of the participants picked up their cellphone, captured the image and sent it to ChatGPT, which immediately responded with a much better description:

"The picture shows Greta Thunberg, a Swedish climate activist who gained international recognition for her efforts to raise awareness about climate change. In the photo, she is standing in front of a gray background and looking directly at the camera. She has short, light brown hair and is wearing a yellow raincoat with a hood pulled up over her head. She appears to be holding a sign that is partially visible in the bottom left corner of the image."

This was not perfect. For example, it failed to include the text shown on the banner in the background, and it was a bit too lengthy in my opinion. But still. It's a million times more useful than the auto-generated ALT!

AI really seems to be the Santa Claus of our time!

AI eyes in the cloud

Not surprisingly, there are already several cloud services offering AI generated ALT texts for all of our images. Some of them even offer plug-ins for popular publishing platforms.

Two examples are EveryAlt and Alttext.ai, and I am sure there are others, too. I have not made a detailed comparison of their respective strengths and weaknesses. My guess is that they write better texts in English than in smaller languages such as Swedish, but also that the quality keeps improving at a rapid pace.

I did try to find out about the cost, though. The services all have different pricing models, but let's say you have a budget of 10 EUR per month. That should be enough for about 1,000 images.

So, it's not free, but not extremely expensive either. Considering all of the challenges with ALT texting (see above), and the salaries of competent human writers, an organization publishing many images should be able to save substantial amounts.
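
To make the arithmetic concrete, here is a rough sketch of a monthly estimate. The price per image comes from the ballpark above (10 EUR for about 1,000 images). The function name and the review-time and hourly-rate parameters are my own assumptions, added because a human still has to check each text. The example values for them are invented, so plug in your own numbers.

```python
def monthly_cost_eur(images_per_month, price_per_image_eur,
                     review_minutes_per_image=0.0, editor_hourly_rate_eur=0.0):
    """Estimate the monthly cost of AI-generated ALT texts plus human review."""
    generation = images_per_month * price_per_image_eur
    review_hours = images_per_month * review_minutes_per_image / 60
    return generation + review_hours * editor_hourly_rate_eur


# From the text: roughly 10 EUR buys about 1,000 images.
PRICE_PER_IMAGE_EUR = 10 / 1000

# Generation only, for 1,000 images a month.
print(monthly_cost_eur(1000, PRICE_PER_IMAGE_EUR))

# Hypothetical review effort: half a minute per image at 40 EUR per hour.
print(monthly_cost_eur(1000, PRICE_PER_IMAGE_EUR,
                       review_minutes_per_image=0.5,
                       editor_hourly_rate_eur=40.0))
```

With almost any realistic review time, the human check is likely to cost more than the generation itself, so that is the number to keep an eye on.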

Warning!

As with all machine learning and AI tools today, you often get impressive results. But also, from time to time, you get disastrous results.

Usually, the errors are easy to spot and sometimes even amusing. But in a worst-case scenario (which is, unfortunately, fully realistic), a misleading ALT text can lead to disaster.

A user who trusts a website or app will sometimes make decisions based on information they find there, even when those decisions have very serious implications.

For example, an ALT text saying that a child is taking a certain medicine could make a blind parent feel confident about giving that medicine to their own child. This could be very dangerous if the medicine was in fact only intended for adults.

And it's very hard to know what information will be used as the basis for high-stakes decisions.

Conclusion

Go ahead and explore AI-generated ALT text writing. But never allow an AI-generated ALT text to be published without manual supervision!

Users who want unsupervised, auto-generated ALT texts can request such descriptions themselves, from elsewhere. There are tools for that (e.g. Picture Smart in the JAWS screen reader and the GPT-4 based Be My AI). When the user has taken the initiative, they will (hopefully) be aware that such descriptions may contain errors, so they will be somewhat critical of the information obtained and adapt their risk-taking accordingly. But if they read an ALT text directly from you as a publisher, they should be able to rely on it.

Recommendation

Ideally, a form for editing ALT texts should have at least four parts:

  • A text entry field for displaying and editing the ALT text
  • A checkbox called something like "Purely decorative, hide this image from screen readers"
  • An indication of whether the ALT text has been approved by a human or not
  • An "Approve" button

No image should be published unless it either has the checkbox checked or has a proper ALT text. And no ALT text should reach the end user unless it has been approved by a human editor.
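
The publishing rule above could be enforced in code roughly like this. It is only a sketch: `ImageRecord`, `can_publish` and `render_img` are hypothetical names, not part of any real publishing platform. It relies on the standard HTML practice that an empty `alt=""` makes screen readers skip a decorative image.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class ImageRecord:
    src: str
    alt_text: str = ""
    decorative: bool = False         # the "purely decorative" checkbox
    approved_by_human: bool = False  # set by the "Approve" button


def can_publish(image):
    """Return True only if the image is decorative or has an approved ALT text."""
    if image.decorative:
        return True
    return bool(image.alt_text.strip()) and image.approved_by_human


def render_img(image):
    """Build the <img> tag, refusing images that break the publishing rule."""
    if not can_publish(image):
        raise ValueError(f"{image.src}: ALT text missing or not approved by a human")
    # Decorative images get an empty alt so that screen readers skip them.
    alt = "" if image.decorative else image.alt_text.strip()
    return f'<img src="{escape(image.src)}" alt="{escape(alt)}">'


draft = ImageRecord(src="santa.png", alt_text="Santa stuck in a chimney")
print(can_publish(draft))  # False, because no human has approved the text yet
draft.approved_by_human = True
print(render_img(draft))
```

A real editor would probably also need to reset the approval whenever the ALT text changes, so that a regenerated text cannot inherit an earlier approval.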

EDIT 2023-12-28: After suggestions from Tobias Christian Jensen, I'd like to add that the quality of generated texts can be greatly improved if you can have a dialogue with the AI system, perhaps prompting it to include some important detail or take a specific viewpoint. So a few more fields would probably be needed to make the ALT text editor better.

Perhaps something like this:

[Image: Mock-up of ALT text editor widget, with the UI components described in this section. Proposed ALT text is: "Santa stuck in a chimney too narrow for his belly."]

Today is December 23, 2023

The date is important, because at the rate artificial intelligence is currently developing, many quite recent recommendations have already become outdated. The above recommendation has been designed not to expire anytime soon, but it is hard to know for sure.

Update 2024-01-04: Thanks to Sander Nijsingh for feedback, helping me to improve the conclusion section.