By Pär Lannerö

A little more than a week ago, the W3C requested comments on a whole group of new draft standards: The Personalization Documents. They define semantics for personalization of web content. Towards the end of this blog post I provide comments on the proposals, but first an introduction.

Why should you care? And what's semantics?

Without semantics, data is meaningless

A number or a word all by itself has little value:

2392

But if you know the meaning of this data - its semantics - it can be extremely valuable.

Your PIN code: 2392

Semantics can help you open doors, get money, make good decisions and connect with people.

Visual design can communicate meaning

[Image: Finger entering code 2392 on keypad]

With good visual design, we can often understand the meaning of a number or a sentence by looking at it. We can extract meaning from colors, sizes, positions, adjacent text, lines, arrows, headers, symbols and diagrams.

At least, we can if we are digitally literate human beings with sufficient eyesight, language skills, background knowledge, and the ability to focus, process information and remember.

Visual design does not meet all users' needs

Some users do not see at all - but listen or feel web pages. Others, on the contrary, can only understand images. Some get frustrated by simplification, others require it. Some users prefer details, but have too little time to read. Others again need translation. Some struggle with words. Others with numbers. Some visitors are bots, perhaps only looking for numbers.

How can design accommodate such a wide range of different needs?

Invisible design can meet diverse needs

Just as responsive web design can optimize content to screen width, personalized web design can optimize content to individual user needs!

The key to personalized web design is semantics, hidden invisibly as in-page metadata.

Being a markup language, HTML has always been used to assign standardized semantics to page content. The A element for links and H# elements for headings have been around since the birth of the web. HTML5 further standardized the semantics of page structure (nav, main, footer...). The ARIA specification adds even more standard semantics, primarily used by screen readers (aria-expanded, aria-current, aria-invalid...).

What's in the new proposed standard?

The new specifications propose a number of new HTML attributes, all prefixed with data-. They also list values to be used for those attributes.

Here are a few examples:

Let the user choose level of detail

[Image: Web browsers with a slider going from Display details to Simplify.]

Browsers can feature a simplification switch if content authors add simplification semantics.

A page section containing non-essential but nice-to-have information could be decorated with the HTML attribute data-simplification="low" so that it can be hidden by the browser from users who need to reduce complexity. We already have something like this in the "reader mode" views offered by many web browsers, but the behaviour of those views is determined by their developers and is not easy to predict. The new specifications could give control over simplification to content authors.
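As a rough sketch of what such markup might look like (the page content below is invented for illustration; only the data-simplification="low" attribute is taken from the draft as described above):

```html
<article>
  <h1>Opening hours</h1>
  <!-- Essential information: carries no simplification attribute in this sketch -->
  <p>We are open Monday to Friday, 9 to 17.</p>

  <!-- Nice-to-have background that a user agent could hide on request -->
  <section data-simplification="low">
    <h2>History of the building</h2>
    <p>The library moved to its current address in 1987.</p>
  </section>
</article>
```

A browser or extension that supports the attribute could hide the second section when the user asks for a simplified view. Browsers that do not know the attribute simply ignore it, so the page still renders as before.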

Let the user choose familiar icons

If a button with the text "Checkout" (proceed to payment) is decorated with the HTML attribute data-action="checkout", a web browser can present an icon that the user can recognize.
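In markup, this might look like the following sketch. The form and its action URL are hypothetical; only data-action="checkout" comes from the example above.

```html
<!-- Hypothetical form; the action URL is an assumption for illustration -->
<form action="/cart/checkout" method="post">
  <!-- The visible label stays as it is; the attribute adds machine-readable meaning -->
  <button type="submit" data-action="checkout">Checkout</button>
</form>
```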

[Image: Cash register pictogram]

Checkout symbol from the library

At least a handful of different languages exist that consist of images, for example PCS, Bliss, Widgit and Pictogram. The genius of standardizing on terms rather than on icons is that the user can configure which language the browser should pick icons from to include as illustrations within web pages.

And of course, instead of images, the web browser could present translations into other languages. As long as the list of standardized values is kept reasonably short, maintaining a centralized repository of translations into most languages is absolutely feasible.

Make shortcuts possible

Screen readers already offer users shortcuts to landmark regions and page headings. With the data-destination attribute, it will be possible to offer even more standardized shortcuts, potentially saving time and facilitating orientation.

Important destinations/actions in the page can be linked to buttons with standard positions in the browser chrome, and/or to shortcut commands.
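A minimal sketch of how links could be annotated follows. The link targets and the values "home" and "help" are assumptions made for illustration; the draft's own list of standardized values should be consulted before use.

```html
<nav>
  <!-- Attribute values are illustrative, not confirmed against the draft -->
  <a href="/" data-destination="home">Start page</a>
  <a href="/support" data-destination="help">Questions and answers</a>
</nav>
```

With markup like this, a user agent could map each destination to the same button or shortcut on every site, regardless of how the individual site words its link text.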

Let the user choose between alternative expressions

A web page could contain several different versions of the same information, each marked up with standardized semantics about its properties. For example: 
  • Easy to read version
  • Version with fewer numbers (for dyscalculia)
  • Version adapted to person with a certain vocabulary size (for language learners etc)
  • Literal meaning of idiomatic expressions:

It is <span data-literal="raining hard">raining cats and dogs</span>

My comments

In general, I would like to encourage this initiative. It would be fantastic to see it widely implemented. My major concern is complexity. Web accessibility is still a challenge to many developers and content authors. Adding to the already substantial volume of specifications and recommendations means even more work for them. But challenges are stimulating, especially when there are obvious benefits!

Data-purpose should be standardized ASAP

In Europe, WCAG 2.1 AA is sort of mandatory, at least for public sector websites. This has many benefits. (I have spent large parts of the last five years advocating for this and helping organizations live up to the requirements.) But there are a few problems when it comes to implementation. One of them relates to Success Criterion 1.3.5 Identify Input Purpose.

The idea behind this criterion is great. It says that the purpose of common input fields (such as email, name, street address and so on) should be specified with standard semantics. This can ensure better precision when web browsers offer auto-fill suggestions based on previously entered information. This could save time and energy for many users, especially users with some motor or cognitive impairments. It could also reduce the number of input errors. In theory (but implementations are still rare), the browser can also, thanks to this criterion, present icons, labels or translations that are familiar to the current user. This can make forms much easier for users who struggle to understand the default written labels.

Currently, however, the only recommended way to fulfill this criterion is by using the autocomplete attribute. This means that web browsers can remember data that the user enters. The criterion expressly states that it applies to information about the user, so the data will often be Personally Identifiable Information. Ideally, users do not share the same browser storage, but in reality, users borrow computers from each other, from libraries and from other places, sometimes without logging in with their own account. Thus data in autocomplete fields can leak to unintended recipients. Therefore, we as consultants must advise web developers to always consider using autocomplete="off", despite the many benefits of autocomplete. (By the way, this caution, in my opinion, should be written into the Understanding document.) The new specification proposes a new attribute, data-purpose, that will make it possible to combine the advantages of personalized presentation (e.g. familiar icons) with privacy protection:

autocomplete="off" data-purpose="cc-number"

This is, in my opinion, a significant contribution in the proposed specification.

The only trouble: It adds both to code complexity and to the body of knowledge that web developers need to be familiar with.
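In a complete form field, the attribute pair might be used as in the sketch below. The label, id and name are hypothetical; the two attributes autocomplete="off" and data-purpose="cc-number" are the ones discussed above.

```html
<label for="card">Card number</label>
<!-- autocomplete="off" asks the browser not to store or suggest the value;
     data-purpose still states what the field is for -->
<input id="card" name="card" type="text" inputmode="numeric"
       autocomplete="off" data-purpose="cc-number">
```

A user agent that understands data-purpose could then show a familiar icon or a translated label next to the field without retaining the card number itself.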

The data-action, data-destination and data-simplification attributes seem like great ideas

While all standardization adds complexity, I can see great value in enabling content editors to add standardized in-page semantics for link destinations and button actions. Apart from explanatory symbols and tooltips, such metadata could be used to duplicate essential features from the page in the browser chrome, and/or enable web browsers to offer shortcuts similar to the old accesskey attribute, but in a more consistent manner across the web.

The data-simplification attribute is another great possibility. Let’s hope it will be widely implemented, both by content authors and user agent designers!

How are the lists of standard values to be maintained?

It can be expected that new values will be needed, as technologies and user behaviors change over time. Should the W3C regularly update the specification, or is it better to point at a list maintained externally by, for example, IANA or some community effort that could update the list continuously?

Update: This question has been answered by a member of the Task Force.

I hesitate about supporting data-distraction and data-symbol

The data-distraction attribute is proposed as a way to declare, for example, when non-essential content includes distracting animation or sound. Intentions are good, because many users struggle with focus, but there are already requirements in WCAG that address such needs. I doubt many content authors will happily declare part of their design as non-essential distractions. This reminds me of the old P3P standard, intended to facilitate configuration-based filtering of web content based on privacy protection. Many websites published empty or misleading P3P metadata with the sole intention of NOT being blocked by user preferences.

An attribute not expected to be used as intended probably does more harm than good, in part because it adds complexity to the overall specification. Developers and content authors already have a lot to learn. You should be able to publish on the web without first getting a degree in information modelling.

The data-symbol attribute is meant to enable editors to indicate that a certain symbol could be used to present the element. Again, I support the underlying objective here: Supporting users who need to complement text or layout with symbols. But this objective, as we have seen already, is partially addressed by the data-purpose, data-destination and data-action attributes. In order not to make the specification more complex than needed, perhaps the symbol attribute could be postponed to a potential future version of the specification, in case the use of complementary symbols proves popular. (I am not sure if this is a good idea, but would like to ask the question.)

There’s more

One of the specifications, Personalization Help and Support 1.0, includes a number of other great proposals for in-page metadata. This includes, among other things, a way to offer alternative content for users struggling with numbers or requiring easy-reading or literal language (no idiomatic expressions, for example). The comment period for this specification ended last summer.

Some personal reflections

As you may have guessed from our name, Metamatrix, the company was founded on the belief that metadata is fundamental to making the matrix (an early name for the entire online world) a better place. This was back in 1999. Since then our focus has slowly shifted from metadata and semantics towards accessibility. I am very happy to see how these two themes increasingly overlap!

A11y - a road to a more semantic web?

There is nothing new about trying to add more standard semantics to the web. It has been done many times. Microformats, RDFa, and Microdata (e.g. itemprop) are all examples of schemes intended to make web pages more semantically rich and machine readable.

As you may remember, Web inventor Sir Tim Berners-Lee proposed not only a web of linked documents, but also a semantic web of linked data. Under his leadership the W3C published a whole family of specifications aimed at making this dream come true.

However, the vision of an open semantic web has been hard to realize. It exists, but is quite fragmented. There are many reasons for this. For example:

  • The vision has always been quite theoretical, and, if you like, invisible. Not many people have been able to see the beauty of it, let alone understood how to contribute.
  • Platform companies have cherry-picked the most commercially interesting semantic concepts. For example, Facebook quickly managed to lay hands on the concepts "knows" and "likes". Google, after cleverly exploiting the A element (links), has claimed a lot of information about physical locations and many other meaningful aspects of our data.
  • Explicit semantics paves the way for automation. Not all organizations want to facilitate this. Perhaps they prefer addressing their audiences with colors, feelings and copywriting to getting blocked by a stupid "smart filter" that only cares about price.
  • There’s also the chicken-and-egg dilemma: You need useful applications to be able (technically and financially) to publish semantically rich data. You need semantically rich data to be motivated to build useful applications that can realize the benefits.

Accessibility has lately gained a lot of momentum thanks to regulation. It is possible that we may get a more semantic web thanks to accessibility regulation, and because users want to personalize their web experience, using the invisible semantics to paint the web in their preferred colours.