This is a technique pulled from KnowledgeBase, a digital accessibility repository available through TPGi’s ARC Platform. KnowledgeBase is maintained and consistently updated by our experts, and can be accessed by anyone with an ARC Essentials or Enterprise tier subscription. Contact us to learn more about KnowledgeBase or the ARC Platform.
Introduction
When you’re scripting for a screen reader, you’re scripting for the browser, with exactly the same level of JavaScript support. JAWS with Chrome, or NVDA with Firefox, is simply Chrome or Firefox being mediated by external software.
However, complications can arise with event handling, because screen readers intercept events, and don’t necessarily pass them on to the browser. Even when they do, the events don’t always match the input type: desktop screen readers are typically controlled with a keyboard, yet keyboard actions might fire mouse events.
To understand this, we first need to understand screen reader interaction modes.
This is a guide to JavaScript event handling in the popular Windows screen readers, looking at differences you may encounter compared with vanilla browsers (when a screen reader isn’t running):
- Screen reader interaction modes
- The curious case of buttons
- Interaction modes in practice
- Scripting outside the role
- Notes on re-purposing HTML
VoiceOver and mobile will be covered in a separate article.
Screen reader interaction modes
JAWS and NVDA have two main interaction modes, commonly known as “virtual cursor mode” and “forms mode” (although the specific terminology varies between them, I’ll only be using the terms above for simplicity).
Virtual cursor mode
This is the default interaction mode, which allows users to reach all the content on a page via custom keystrokes. For example, ArrowDown navigates lines of text, while ArrowRight navigates letters or words. A range of hotkeys also allow for content-specific navigation, such as H to move between headings, or L for lists.
The basic interactions are similar to navigating a `<textarea>`, where you can use arrow keys to move the caret in any direction. Screen reader users can navigate whole pages this way. But what if the user wants to actually type into a textarea — they would need the H key to produce a literal “H”, rather than navigating to the next heading, and this is where forms mode comes in.
Forms mode
With this interaction mode, the custom keystrokes are no longer available, and keys are passed back to whatever’s applicable to the focused element. This is essentially the same as using a vanilla browser, where elements are navigated using Tab and Shift+Tab, and only focusable elements can be reached. Arrow keys are then used for standard actions, like moving around a textarea, or changing the value of a slider.
Mode switching
Screen reader users can switch the mode at any time, but they often don’t need to, because the screen reader itself will automatically switch, based on how they’re interacting. For example, if the user presses Tab to reach a `<textarea>`, then the screen reader will automatically switch into forms mode. The same is true with most form controls, hence the name.
The curious case of buttons
The `<button>` element is an unusual case, in that it is a form control, yet it doesn’t trigger forms mode. This difference can be a source of some confusion for developers, but also makes it a perfect way to illustrate how interaction modes affect event handling (and hopefully dispel some confusion).
We can test button behavior using five core events:
- `keydown`
- `keyup`
- `mousedown`
- `mouseup`
- `click`
The test involves navigating to the button using Tab, then pressing Enter followed by ArrowDown, to see what events we get. The button markup itself is very simple:

```html
<button>Call me Benjamin</button>
```
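If you want to run this test yourself, a minimal event logger might look like the sketch below. The `logEvents` helper and the `button` lookup are illustrative, not part of the original article:

```javascript
// Minimal sketch of the test harness: record the five core events
// in the order they fire on a given target.
const CORE_EVENTS = ['keydown', 'keyup', 'mousedown', 'mouseup', 'click'];

function logEvents(target, log = []) {
  for (const type of CORE_EVENTS) {
    target.addEventListener(type, (e) => log.push(e.type));
  }
  return log;
}

// In the browser:
// const log = logEvents(document.querySelector('button'));
// ...press Enter, then ArrowDown, then inspect "log"
```

Inspecting the array after each keypress shows exactly which events reached the page, which is how the results in the tables below can be reproduced.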
So firstly, using a vanilla browser produces the events you’d expect:
| Key | Events |
| --- | --- |
| Enter | `keydown` `click` `keyup` |
| ArrowDown | `keydown` `keyup` |
The `keydown` and `keyup` events correspond with the action of pressing a key, while a `click` event is additionally fired when pressing Enter (or Space). The `click` event is mode-agnostic, and fires for any corresponding input (e.g., keyboard action, mouse click, finger tap).
Now when we do the same test with a screen reader running, we get notably different results (remember that we’re still in virtual cursor mode):
| Key | Events |
| --- | --- |
| Enter | `mousedown` `mouseup` `click` |
| ArrowDown | (none) |
Pressing Enter fires mouse events, not keyboard events, while ArrowDown fires neither.
When is a keyboard not a keyboard?
We don’t get events for ArrowDown because that key is used by the screen reader’s virtual cursor, to navigate lines of text. If its key events were passed directly to JavaScript, then scripted event handlers might interfere with or block the key entirely, making it impossible for screen reader users to navigate that way. The equivalent for a mouse user would be like locking the cursor position so they can’t move it away from the button.
We get mouse events for Enter because the virtual cursor is a kind of pointer input, which fires a subset of pointer events that relate to user actions. It’s a “virtual” pointer because its events are simulated, rather than coming from a literal pointing device, but if we clicked that button with a mouse, we would get exactly the same events.
When is a pointer not a pointer?
The test results above don’t include events from the preceding Tab key. For vanilla browsers, a `keyup` event is fired when the key is released, and this is also true with a screen reader in virtual cursor mode — we still get the Tab `keyup` event. We also get a preceding `focus` event in both cases.
However, those events wouldn’t fire if we reached the button using virtual cursor keystrokes (rather than pressing Tab). None of them occur until the Enter key is pressed, which then fires a `focus` event along with the pointer events, simulating the focusing behavior of mouse clicks.
When it’s both!
Screen readers simulate pointer events for the sake of compatibility, for access to functionality which isn’t otherwise keyboard accessible. In most cases, only actuation events are fired (`mousedown` and `mouseup`, `pointerdown` and `pointerup`), however screen readers can fire other pointer events in certain cases. For example, JAWS users can manually trigger `mouseover` events, with content that’s announced as having `OnMouseOver` functionality.
The purpose of interaction modes is to give the user more control. Simulated pointer events support that intent, and this is the larger point to understand in development terms — screen reader input is a hybrid model, which can generate both mouse and keyboard events, depending on how the user is interacting.
Interaction modes in practice
It’s useful to understand how virtual cursor mode behaves, but in practice, you mostly won’t need to account for this:
- For basic interactivity, the mode doesn’t matter — workaday things like a button `click`, or `focus` on a link, will trigger those events either way.
- For complex interactivity, forms mode is dependable — patterns that require forms mode can trigger automatic mode switching (more on that in a moment).
With the screen reader in forms mode, our button test produces the same results as vanilla browsers:
| Key | Events |
| --- | --- |
| Enter | `keydown` `click` `keyup` |
| ArrowDown | `keydown` `keyup` |
Forms mode makes it possible to create sophisticated applications that work for screen reader users, in essentially the same way as vanilla keyboard users.
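As a hedged sketch of what forms-mode scripting enables, a custom `role="slider"` could handle arrow keys like this. The `slider` variable and the options object are illustrative assumptions, not the article’s own code:

```javascript
// Compute the next slider value for an arrow keypress,
// clamped to the min/max range.
function adjustValue(value, key, { min = 0, max = 100, step = 1 } = {}) {
  if (key === 'ArrowUp' || key === 'ArrowRight') {
    return Math.min(value + step, max);
  }
  if (key === 'ArrowDown' || key === 'ArrowLeft') {
    return Math.max(value - step, min);
  }
  return value;
}

// Browser wiring (assumes a focusable element with role="slider"):
// slider.addEventListener('keydown', (e) => {
//   const next = adjustValue(Number(slider.getAttribute('aria-valuenow')), e.key);
//   slider.setAttribute('aria-valuenow', String(next));
//   e.preventDefault(); // the arrow keys are ours while in forms mode
// });
```

This kind of `keydown` handling is exactly what only works once the screen reader has switched into forms mode.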
Mode switching with ARIA
Automatic mode switching isn’t limited to form controls; it can also be triggered by certain ARIA role attributes.
ARIA roles can be used to create custom versions of native controls, such as `"button"` and `"checkbox"`, but are more often needed for complex widgets that don’t exist in HTML, like `"tablist"` and `"tree"`.
Using `role="tree"` doesn’t create a functional tree view, it only tells the user to expect that. The role implies a set of expected interactions, such as navigating the tree using arrow keys, which a screen reader might announce as part of the role description (depending on user settings). These still have to be scripted, but scripting arrow keys is only possible in forms mode, hence the `role` itself would need to trigger that.
And this is what happens — interactive roles that imply forms-mode scripting, also trigger an automatic mode switch.
Examples of roles and their switching behavior
Widget roles like `"slider"` and `"tab"` imply the use of arrow keys, so focusing those elements switches to forms mode. The same is true for composite roles where the whole thing might take focus, like `"tree"`, `"menubar"`, and `"combobox"`. Composite roles that are not interactive but have interactive descendants will generally trigger the mode switch only when that descendant is focused, such as a `"gridcell"` within a `"grid"`.
Conversely, widget roles like `"button"` and `"checkbox"` only have simple click actions, and don’t trigger forms mode because they don’t need to. Non-interactive widget roles like `"tabpanel"` and `"progressbar"` don’t need interaction events at all, so they don’t trigger forms mode, and the same is true for landmark roles, live regions, and window roles.
Switching back to virtual cursor mode is also automatic. When focus moves from a forms-mode element to a non-forms-mode element, the mode will usually switch back to virtual cursor — for example, when tabbing from a `"textbox"` to a `"button"`. However, this won’t happen in some cases where the second element is validly contained by the first: tabbing from an `"application"` element to a `"button"` inside it won’t switch back to virtual cursor mode (not until focus leaves the application).
Documenting this stuff gets kinda convoluted, yet it’s actually quite intuitive in practice, and you generally won’t have to think about it; it just happens when it’s supposed to.
You’ll find out as you go along anyway — if you’re scripting for JAWS and NVDA, then you obviously need to test with them, so that’s when you’ll know what works for whatever you’re building.
But what if it doesn’t work?
Scripting outside the role
Sooner or later, you might want to script functionality that relies on forms mode, within a role context that doesn’t trigger the mode switch. These situations might be a technical problem, or they might point to a deeper conceptual problem.
Missing or invalid roles
Mode switching can’t be relied on for elements with invalid role combinations, or which don’t have any role context at all, for example:
```html
<div>
  <span tabindex="0">May Day</span>
  <span tabindex="-1">Winter Dreams</span>
  <span tabindex="-1">Absolution</span>
  ...
</div>
```
We could define a `keydown` listener to move the roving focus using `ArrowDown` and `ArrowUp` events, and that would work fine in vanilla browsers. However, it won’t work with a screen reader in virtual cursor mode, nor will it trigger forms mode, because the `<span>` elements have no determinable role.
But this particular problem is deeper than scripting events — the markup doesn’t have any meaningful semantics, so even the static content isn’t useful for screen reader users. There’s no point trying to script functionality for content that doesn’t mean anything, so before scripting any interactions, you must define appropriate semantics.
This example should be an interactive list, which can be defined using the `"listbox"` and `"option"` roles. Since `tabindex` makes them focusable, it’s now enough to trigger forms mode:
```html
<div role="listbox">
  <span role="option" tabindex="0">May Day</span>
  <span role="option" tabindex="-1">Winter Dreams</span>
  <span role="option" tabindex="-1">Absolution</span>
  ...
</div>
```
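The roving-focus script itself might then be sketched as follows. The `listbox` variable and the helper name are illustrative assumptions, not a definitive implementation:

```javascript
// Pure helper: which option index should receive focus next,
// given the option count, the current index, and the pressed key.
function nextOptionIndex(count, current, key) {
  if (key === 'ArrowDown') return Math.min(current + 1, count - 1);
  if (key === 'ArrowUp') return Math.max(current - 1, 0);
  return current;
}

// Browser wiring (assumes the listbox markup above):
// const options = [...listbox.querySelectorAll('[role="option"]')];
// listbox.addEventListener('keydown', (e) => {
//   const current = options.indexOf(document.activeElement);
//   const next = nextOptionIndex(options.length, current, e.key);
//   if (next !== current) {
//     e.preventDefault(); // keep the page from scrolling
//     options[current].tabIndex = -1; // roving tabindex: only one stop
//     options[next].tabIndex = 0;
//     options[next].focus();
//   }
// });
```

With the roles in place, focusing an option triggers forms mode, so the arrow keys actually reach this handler instead of being consumed by the virtual cursor.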
Sometimes though, single widgets aren’t enough on their own, and you might need forms-mode scripting for more extensive applications. ARIA has you covered here too.
Using the application role
The `"application"` role is like an ARIA secret password that switches into forms mode for itself and all its descendants, and is designed to be used for desktop-style applications. It doesn’t have any other semantic meaning; all it does is provide a scripting landmark.
But as Spider-Man soon realized, with great power comes great responsibility — regular static markup isn’t accessible at all without switching back to virtual cursor navigation. We can’t expect users to manually switch back and forth, since that expectation undermines the very purpose of this role. Therefore, everything within an application region must be either:
- a focusable element with a determinable role; or
- programmatically associated with such an element.
If that seems unreasonable or impractical, then don’t use the application role.
Notes on re-purposing HTML
ARIA roles override native element semantics, but they don’t override native behavior. Re-purposing elements for their behavior can be very useful, but could be problematic if the behavior and semantics aren’t compatible.
Native behaviors can save you having to script custom multi-modal events, like the `click` event of a `<button>`. For example, ARIA tabs could use buttons for the `"tab"` elements:
```html
<div role="tablist">
  <button role="tab" aria-selected="false">Roger</button>
  <button role="tab" aria-selected="true">Ben</button>
  ...
</div>
```
Then a single `click` event could handle basic tab selection, for any kind of input including virtual cursors. To script that behavior manually takes half-a-dozen synchronized events. And since the `"tab"` role also triggers forms mode, you can further refine their behavior with additional interactions, such as arrow `keydown` navigation.
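A sketch of that single-handler approach might look like this; the `tablist` variable and the `selectTab` helper are illustrative assumptions:

```javascript
// Mark one tab as selected and deselect its siblings.
function selectTab(tabs, selected) {
  for (const tab of tabs) {
    tab.setAttribute('aria-selected', tab === selected ? 'true' : 'false');
  }
}

// Browser wiring: one click listener covers keyboard, mouse,
// touch, and virtual cursor activation alike.
// tablist.addEventListener('click', (e) => {
//   const tab = e.target.closest('[role="tab"]');
//   if (tab) {
//     selectTab([...tablist.querySelectorAll('[role="tab"]')], tab);
//   }
// });
```

Because the native `<button>` fires `click` for every input modality, none of the keyboard plumbing shown later in this article is needed here.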
But it’s very important to only use compatible combinations — don’t use non-interactive semantics on interactive elements, and don’t use interactive semantics on elements with conflicting behavior:
```html
<button role="caption">Who am I?</button>
...
<summary role="radio">What am I for?</summary>
```
Unfortunately, there’s one common contradiction that refuses to go away — a pattern so ubiquitous, I feel obliged to demonstrate how to make it right (if you have to):
```html
<a href="#" role="button">Call me Ersatz</a>
```
In vanilla browsers, links can only be activated with Enter, whereas buttons are expected to also support Space activation (which only fires the `click` event on `keyup`). This will have to be scripted, ensuring that the `keyup` target matches the preceding `keydown` (and doesn’t cause scrolling).
The native link behavior also has to be suppressed, to stop the browser following that junk `href`. But on the plus side, the lack of forms mode doesn’t matter here, since screen readers already support using Space to activate links, in either mode.
So overall, this is what’s required:
```js
let keydownTarget = null;

// Space keydown: remember the target, and stop the page from scrolling
button.addEventListener('keydown', (e) => {
  if (e.key === ' ') {
    keydownTarget = e.target;
    e.preventDefault();
  }
});

// Space keyup: only synthesize the click if it matches the preceding keydown
document.addEventListener('keyup', (e) => {
  if (e.key === ' ' && keydownTarget === e.target) {
    keydownTarget.click();
  }
  keydownTarget = null;
});

button.addEventListener('click', (e) => {
  console.log('button was clicked');
  e.preventDefault(); // stop the browser following the junk href
});
```
Whereas a real `<button>` would only need this:

```js
button.addEventListener('click', (e) => {
  console.log('button was clicked');
});
```
Ain’t buttons great.
They never get old.
Further reading
- Understanding screen reader interaction modes (tink)
- Keyboard Shortcuts for JAWS (WebAIM)
- Keyboard Shortcuts for NVDA (WebAIM)
- WAI-ARIA Roles (MDN)
- ARIA states and properties (MDN)
- Using the ARIA application role (tink)
- Author guidance to avoid incorrect use of ARIA (W3C)
- ARIA Authoring Practices Guide (W3C)
Image credit: Zyanya Citlalli.