AI/ML – Why all the Hype?
There is a lot of optimism, enthusiasm and trepidation on the interwebs about the transformative effect Artificial Intelligence (AI) will have on everything from marketing to the creative arts.
I often say that I am not an expert, but I know people who are. This is very much the case in terms of AI. Fortuitously, I have direct access to someone who is steeped in both accessibility knowledge and AI — or should I say Machine Learning (ML)? I am referring to David O’Neill, a Research Fellow at Vispero (the parent company of TPGi). He has a low social media profile but has quietly been driving improvements in automated accessibility testing, and accessibility testing in general, for decades.
In the following interview, we will learn about David, and how he envisions AI/ML to be a force for major improvements in how we tackle the thorny problems of making technology work better for people — all people.
David, thank you for talking with me. Please provide a little history of your work in accessibility:
My pleasure, Steve. I have worked in the field of accessibility since the late 1980s. I began my career on the hardware side, designing microprocessor-based speech devices for people living with vision impairment. We were a small group of good people solving important problems with cool technology. My career in accessibility evolved alongside broader innovations in the technology landscape. Most of my contributions have been in accessible software development. I have been fortunate enough to build accessible solutions for leadership organizations, government agencies, commercial enterprises, and even some start-ups. In the last 10 years or so, I have been focused on building accessibility platforms and tooling to help product teams identify, understand, and remediate accessibility issues. I am now a Technical Fellow at Vispero, where my role is largely focused on researching and developing unique technology solutions for our customers and the field.
Please provide a little history of your work in AI/ML:
AI/ML has evolved much as any other sub-field of computer science has, often accompanied by much hype followed by disillusionment. In my early days, I worked for a large petrochemical firm and built several “Expert Systems” for their engineering functions. These were rudimentary neural networks meant to “automate” complicated decision trees in highly specialized domains. In the field of accessibility, I’ve built embedded solutions for image-to-speech conversion as well as ML regression modelling to “judge/predict” accessibility of web applications across the depth of user flows. However, my expertise and passion encompass the broader field of Data Science, specifically Knowledge Engineering for ML models, which is a prime focus of my current work at Vispero.
Words have meaning — what are the differences between AI and ML, and in what terms do you think of this subject?
The classic response to that question is that machine learning (ML) is a sub-field of artificial intelligence (AI). That’s fine; however, it does not really explain the difference. At the heart of this discussion is the “meaning of intelligence”. Intelligence is generally accepted as the ability to acquire and apply knowledge to solve new problems. That involves cognitive ability and some apparent reasoning. So, I think of AI as a field of science dedicated to enabling “cognitive reasoning” in computing. This is certainly an oversimplification, as there are tons of objectives and sub-disciplines under AI, including ML.
ML is a sub-field of AI that focuses on the “learning” portion of the “cognitive reasoning” equation. ML is powered by statistics, probability, linear algebra, and computer algorithms that have existed for quite some time. Using these forms of mathematics on a large enough dataset, ML “models” can be “trained” to “predict” outcomes for specific “tasks”. Given the advances in computing and storage capabilities we have today, we are able to create some highly performant and accurate ML-based implementations that “predict” outputs really well.
When we combine many of these ML tasks as “nodes” in a larger “pipeline”, we can power some very exciting applications that appear to be intelligent. ChatGPT is, of course, a popular example. ChatGPT relies on a large language model (LLM) trained for a variety of “generative” tasks, such as conversational text generation. It is so good at these tasks that we have a hard time believing it is not human. However, is it intelligent? Buy me a beer and we can discuss it further.
What are some of the other terms we should be aware of when talking about AI/ML?
I think this question comes up a lot now as a result of ChatGPT, so with that in mind I think folks should explore several topics to better understand what ChatGPT is and how it works:
AGI – Artificial General Intelligence
This is a good topic for those interested in the “AI sentient being” discussion. It is the stated end goal for many of the organizations producing these ML models today, including OpenAI. The goal is to train a machine to match or exceed human intelligence in all cognitive tasks. This is a very fuzzy topic, as it brings into question all that we believe we know today about human intelligence. It raises the question: what is the actual difference between a human and a machine? So, look this up to gain some insight into where these advances are headed and what some of our non-technical challenges will be as a result.
Large Language Models (LLM)
ChatGPT is powered by an LLM built by OpenAI. LLMs are neural networks that have been trained on large bodies of text. These networks are capable of generating human-like text responses (sequences) to prompts. They can be fine-tuned for specific tasks, such as powering a chatbot. GPT-2, GPT-3, and GPT-4 are examples. However, there are literally hundreds of LLMs, trained on a variety of datasets, that can be implemented in many different ways to accomplish many different tasks. If you want to navigate this subject, you’ll want a clear understanding of what an LLM is and how it differs from traditional natural language processing (NLP) approaches.
Ultimately, these LLMs produce text sequences in response to text prompts. The stuff that happens in between is pretty important. The Transformer model architecture is the current rage, for very good reason. It is based on a neural network design that leverages a highly performant “self-attention” mechanism and supports “transfer learning”. Deep dive into this to better understand why these two characteristics are revolutionary when compared to RNN/CNN (recurrent/convolutional) approaches, and how neural networks are used in foundational LLMs such as BERT, T5, and the GPT-based models.
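To make “self-attention” a little more concrete, here is a minimal sketch of single-head scaled dot-product attention, softmax(QKᵀ/√d_k)·V, as described in “Attention Is All You Need”. It is written in plain Python with toy two-dimensional token vectors, and identity matrices stand in for the learned query/key/value projections; real Transformers add multiple heads, positional encodings, masking, and trained weights. Notice that every token is scored against every other token from the same matrices in one pass, with no step-by-step recurrence as in an RNN, which is a large part of why the architecture parallelizes so well.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows.
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: n token vectors of size d_model.
    w_q, w_k, w_v: (d_model x d_k) projections (learned in a real model).
    Returns the n output vectors and the n x n attention weights.
    """
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    weights = []
    for qi in q:
        # Each token scores all tokens at once; no recurrence over positions.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights.append(softmax(scores))
    # Each output is a weighted mix of the value vectors.
    return matmul(weights, v), weights


# Toy usage: three 2-d "tokens" with identity projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` is a probability distribution describing how much that token “attends” to every token in the sequence.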
Everything beyond those fundamental concepts is part of the rabbit hole. Enter at your own risk!
Some rabbit holes:
- Artificial general intelligence
- Planning for AGI and beyond
- When will singularity happen? 1700 expert opinions of AGI 
- Large language model
- Understanding Large Language Models
- What are large language models (LLMs) and how do they work?
- Transformer (machine learning model)
- What Is a Transformer Model
- Attention Is All You Need (PDF, 2Mb)
- Transformers, explained: Understand the model behind GPT, BERT, and T5
What you need to know is that ChatGPT is an application built upon an LLM that uses a Transformer-based neural network architecture to generate human-like text responses to prompts. It does not think, it does not know, it does not feel. It is simply an amazing stochastic parrot capable of appearing sentient during conversations.
Questions provided by ChatGPT-4
Since we are on the topic of AI/ML, and ChatGPT is a very visible example, let’s see what questions it has for you…
Prompt: What questions would you ask a person, who has both machine learning and web accessibility expertise, about how AI will improve accessibility and accessibility testing?
- How can AI be integrated with web accessibility tools to automatically identify and correct accessibility issues in websites and applications?
- In what ways can machine learning algorithms be used to predict and prioritize common accessibility issues, so developers can address them more efficiently?
- How can AI-powered natural language processing (NLP) be utilized to improve the screen reader experience for visually impaired users and assist in better understanding complex content?
- Can AI be used to create more dynamic and context-aware alternative text descriptions for images, charts, and other multimedia content, enhancing the user experience for those relying on assistive technologies?
- How can AI-driven tools assist in automating the process of testing web accessibility, ensuring compliance with WCAG and other accessibility standards?
- What role can machine learning play in personalizing the accessibility experience for users with different needs and preferences?
- Are there any AI-based technologies that can help create more accessible user interfaces, such as predicting optimal color contrasts, font sizes, and layout choices?
- What advancements in AI do you foresee that could revolutionize web accessibility, and how can developers prepare for these changes?
- How can AI be used to improve accessibility for users with cognitive disabilities, such as dyslexia or attention deficit disorder, by adapting content and interface elements in real-time?
- What challenges do you see in implementing AI-based solutions for accessibility testing and improvements, and how can they be mitigated?
I love that you used ChatGPT to make your job as the interviewer easier! That is a phenomenon the world is experiencing as a result of these LLM advancements. Though each question is worthy of discussion, I would recommend using one or two more nodes in your ML pipeline: perhaps a Summarization node that distils these questions down a bit, and then either a custom-trained Named Entity Recognition (NER) node or a Classification node to nail down specific topics/labels in the ChatGPT response. From that we can infer a simplified set of broad use cases to explore.
I’ll give that a shot using whatever “Natural Intelligence” I have access to at the moment.
I’m going to break down the GPT responses into a few key use-cases that are good discussion candidates for AI/ML models:
- Detecting accessibility issues
- Remediating accessibility issues
- Transforming user experiences for universal accessibility
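The “one or two more nodes” idea can be illustrated with a toy pipeline in which each node is a function whose output feeds the next. This is only a sketch: the keyword lists, the first-clause “summarizer”, and the keyword-count “classifier” are hypothetical stand-ins for trained Summarization and Classification models, chosen so the wiring is easy to follow.

```python
from typing import Callable, Dict, List

# Hypothetical keyword lists, one per use case identified above.
USE_CASE_KEYWORDS: Dict[str, List[str]] = {
    "Detecting accessibility issues": ["identify", "predict", "prioritize", "testing", "compliance"],
    "Remediating accessibility issues": ["correct", "remediat", "fix", "alternative text"],
    "Transforming user experiences": ["personaliz", "screen reader", "adapt", "experience"],
}


def summarize_node(question: str) -> str:
    # Stand-in for a summarization model: keep only the first clause, lowercased.
    return question.split(",")[0].lower()


def classify_node(text: str) -> str:
    # Stand-in for a trained classifier: choose the label with the most keyword hits.
    scores = {label: sum(kw in text for kw in kws) for label, kws in USE_CASE_KEYWORDS.items()}
    return max(scores, key=scores.get)


def run_pipeline(items: List[str], nodes: List[Callable[[str], str]]) -> List[str]:
    # Feed each item through the nodes in order; each output becomes the next input.
    results = []
    for item in items:
        value = item
        for node in nodes:
            value = node(value)
        results.append(value)
    return results


questions = [
    "How can AI-driven tools assist in automating the process of testing web accessibility, "
    "ensuring compliance with WCAG and other accessibility standards?",
    "Can AI be used to create more dynamic and context-aware alternative text descriptions for "
    "images, charts, and other multimedia content, enhancing the user experience for those "
    "relying on assistive technologies?",
    "What role can machine learning play in personalizing the accessibility experience for users "
    "with different needs and preferences?",
]
labels = run_pipeline(questions, [summarize_node, classify_node])
```

Swapping either stand-in for a real model changes only that node; the pipeline wiring stays the same.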
First off, it is a common misconception that AI/ML can solve all problems. That simply is not the case. The efficacy of AI/ML is a function of the use-case, the availability of applicable models and tasks, and the volume/quality of available data for training and/or semantic search. So, evaluating these use-cases involves matching each one to proven ML tasks and inventorying your data.
Detecting Accessibility Issues
Detecting accessibility issues is largely a classification task. Today, we perform accessibility issue detection in a highly deterministic manner. We have functions that accept code as an input and use rule-based logic to assert a “pass or fail” outcome. We can say that the following code snippet is not accessible because it is an image with no alt attribute:
<img src="someimage.jpg">
The missing alt text is easy to detect, which allows us to classify the <img> element as “Inaccessible” with 100% confidence.
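A deterministic rule of this kind fits in a few lines. The sketch below uses Python’s standard `html.parser` and checks only the condition described above: an `<img>` with no `alt` attribute fails. An empty `alt=""` (the convention for decorative images) passes this particular check, and production rule engines also account for other labelling mechanisms such as ARIA, which this sketch deliberately ignores.

```python
from html.parser import HTMLParser
from typing import List, Optional


class MissingAltChecker(HTMLParser):
    """Deterministic rule: every <img> element must carry an alt attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.failures: List[Optional[str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing <img/> lands here too.
        if tag == "img" and "alt" not in {name for name, _ in attrs}:
            # Record the src so the failure can be reported back to the developer.
            self.failures.append(dict(attrs).get("src"))


def find_images_missing_alt(html: str) -> List[Optional[str]]:
    checker = MissingAltChecker()
    checker.feed(html)
    checker.close()
    return checker.failures


print(find_images_missing_alt('<img src="someimage.jpg">'))  # ['someimage.jpg']
```

Given the same markup, this function returns the same verdict every time, which is exactly the “sure, easy thing” a probabilistic model would have to compete with.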
Can we do that specific test better with ML using a probabilistic model? And if so, is it worth it? The answer is probably not. We can train a classifier on a million code examples of images that are not accessible but have no guarantee that it will learn enough to predict “Inaccessible” on future examples with 100% accuracy. A simple rule of thumb is to use conventional deterministic algorithms whenever you can – provided the algorithms work, of course! The reasoning is simple: why trade in a sure, easy thing for a hard and costly result that only has some “probability” of being correct?
The key point here is that there is no benefit in using ML with fuzzy results and <100% accuracy on tasks that already have 100% accuracy with a traditionally programmed, non-learned algorithm.
Does that mean there is no room for ML in detection of accessibility issues? Absolutely not! We just have to approach the problem differently. We could begin by asking ourselves, what are the issues that we cannot detect today with the deterministic “code only” approach? Do we have access to a large body of example code exhibiting these accessibility issues? Are there more input dimensions (features) available to solve the problem, beyond the code? How can those new features be extracted from our tech stack and integrated into an ML pipeline?
We do extremely well as accessibility professionals at inspecting/evaluating the actual experiential cues that result from code. What does it look like? How does it behave? How does it sound? These experiential cues can be thought of as “features” and they weigh in on our brain’s “classification” mechanism, resulting in an assertion that the code is either accessible or inaccessible.
At Vispero, we are working on how to capture these “experiential cues”, along with the underlying code, for a variety of ML use cases. We have already curated an enormous amount of data related to the “technical conformance” and “usability” of code samples. We are actively integrating this data into our ML pipelines and performing feature engineering.
So, yes, accessibility issue detection is a good ML use case candidate in my view.
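As a rough illustration of experiential cues acting as features, the sketch below trains a tiny logistic regression classifier with plain stochastic gradient descent. Everything in it is hypothetical: the three features (rendered contrast ratio, whether a screen reader announces a name, whether keyboard focus is visible), the five labelled samples, and the hyperparameters are invented for the example and do not represent Vispero’s data or models.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ComponentSample:
    # Hypothetical "experiential cue" features for one rendered UI component.
    contrast_ratio: float   # how it looks: text/background contrast (1 to 21)
    announces_name: bool    # how it sounds: does a screen reader announce a name?
    focus_visible: bool     # how it behaves: is keyboard focus visible?


def to_features(s: ComponentSample) -> List[float]:
    # Scale contrast into [0, 1] so it is comparable with the boolean features.
    return [s.contrast_ratio / 21.0, float(s.announces_name), float(s.focus_visible)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, lr=0.5, epochs=2000) -> Tuple[List[float], float]:
    """Fit logistic regression with SGD. Label 1 = judged inaccessible, 0 = accessible."""
    w = [0.0] * len(to_features(samples[0]))
    b = 0.0
    for _ in range(epochs):
        for s, y in zip(samples, labels):
            x = to_features(s)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log-loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def prob_inaccessible(model, s: ComponentSample) -> float:
    w, b = model
    x = to_features(s)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical labelled examples (e.g. judgements from expert review).
samples = [
    ComponentSample(21.0, True, True),
    ComponentSample(7.0, True, True),
    ComponentSample(1.5, False, False),
    ComponentSample(2.0, True, False),
    ComponentSample(4.5, False, True),
]
labels = [0, 0, 1, 1, 1]
model = train_logistic(samples, labels)
```

Unlike the deterministic `alt` rule, the output here is a probability, which is the trade-off discussed above: it is worth accepting only for issues that no simple rule can catch.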
Remediating Accessibility Issues
The holy grail of Remediating Accessibility Issues is automatically fixing inaccessible code deployed in production. That is a fine goal, given the level of inaccessibility we see on the web today. With that goal in mind, it is probably best to view this as a “machine translation” task supported by a Sequence to Sequence (seq2seq) LLM. Yes, there are other related ML and non-ML tasks that are in fact employed for aspects of this use case today; however, “machine translation” is the essential task for achieving this unicorn of a use-case.
Yes, I did call it a unicorn, but let me rant for a second. Fundamentally, translating bad code into good code is really challenging – especially as it relates to accessibility. Anyone who has worked in the accessibility field will attest to the variety and complexity of remediation needed for any given accessibility code issue. This is not just a matter of syntax and semantics; there is also the matter of intent, and ultimately, the actual vs. ideal user experience. As of today, I don’t have confidence that we can easily fine-tune existing LLMs to automatically remediate all types of accessibility issues found in the wild. We simply don’t have a model trained on the needed features, and we don’t yet have all the data needed to fine-tune for these tasks. That does not mean we won’t be able to do this at some point in the future. Given an enormous amount of data, a well-engineered feature space, and lots of GPUs on hand for training and inference, perhaps anything is possible.
Note: I am commenting on LLM applications in this rant; do not take what I am saying as a rejection of other conventional and ML approaches that can be used to provide “auto-remediation” of poor HTML implementations. A long-tail strategy for universal accessibility will undoubtedly require a blending of all the techniques we see today.
Testing Hypotheses Can Be Fun
A fun exercise to test my hypothesis would be to run some tests with ChatGPT on simple remediation work, such as the <img src="someimage.jpg"> example I referenced earlier in this discussion. Ask ChatGPT to make it accessible. I bet it will add the alt attribute to the code snippet, perhaps hallucinate a value for that attribute, and then spit out a rational explanation of why it is needed for accessibility. That alone is impressive, and it certainly feels like intelligence.
However, if you drop in a code block with markup for, say, a carousel or tab control with multiple content formats in each panel (images, videos, tabular data, etc.), see how well it detects and remediates issues. I suspect the result will be far from complete, and that is using one of the best LLMs available to the ML community today. So, it is safe to conclude that we have some work to do.
So, unicorn aside, why not approach the use-case of accessibility issue remediation more practically? In my view, this is a classic knowledge integration problem. Given the tools that development teams use throughout the product life-cycle, we have the opportunity to supply focused accessibility knowledge at critical points during design and development. If we do this early enough, the need for remediation will be less frequent, the total cost of accessibility will be much lower and, most importantly, the quality of the user experience will be much higher.
To be perfectly clear, knowledge is more than “documented know-how”. It includes experiential awareness, inference-based recommendations, conversational challenges, and many other forms of “micro consultations”. Some specific examples might be a machine translation task to deliver the JAWS experience for components as they are designed in Figma, a GitHub Copilot-like Codex model to make inline syntax and accessibility attribution recommendations while you code in your IDE, extractive and abstractive “question/answer” chatbots that help developers understand and process their accessibility backlog items efficiently, and yes – dynamic remediation recommendations driven by classifiers and code generators for specific conditions detectable by automated accessibility tools.
These are just a few of the practical ML implementations under investigation and development here at Vispero.
Transforming User Experiences for Universal Accessibility
The last use case, Transforming User Experiences for Universal Accessibility, is perhaps the most important and the one that might get the most attention going forward, not just from the accessibility community but from the Tech Giants and emerging start-ups. In a nutshell, we have an enormous corpus of “knowledge” and “transaction potential” encoded and hosted on what we know today as the web. Our primary user agent to interact with this corpus is a browser with many hardware and software adaptations that enable a variety of modalities. But these mechanics are dated when you think about it. The first browser debuted in 1990, a year after my start in accessibility!
I postulate that LLMs will render direct browsing of HTML obsolete in time. LLMs can easily treat the web as a very large semi-structured document store, which can be thoroughly translated into token embeddings, then stored in vector databases, which can be easily searched and used to feed semantically relevant “context” into “high-order” AI/ML applications for multi-directional exchange. The context will natively consider the personalization needs of all users and render a user experience in new multi-modal user “Agents”. This might sound utopian, however the capability to do all of this already exists and most of these steps have been taken or are already underway.
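The retrieval half of that chain (embed chunks, store the vectors, search them, hand the best matches on as “context”) can be sketched in a few lines. In this illustration a bag-of-words term-count vector is a hypothetical stand-in for a learned embedding model, and a Python list stands in for a vector database; real systems use neural embeddings and approximate nearest-neighbour indexes, but the flow is the same.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Stand-in for a learned embedding model: a bag-of-words term-count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: List[Tuple[Counter, str]] = []  # (vector, original chunk) pairs

    def add(self, text: str) -> None:
        self.items.append((embed(text), text))

    def search(self, query: str, k: int = 1) -> List[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_context(store: TinyVectorStore, query: str, k: int = 2) -> str:
    # The retrieved chunks become the "context" handed to a downstream model or agent.
    return "\n".join(store.search(query, k))


# Hypothetical chunks from a web page, treated as a semi-structured document store.
store = TinyVectorStore()
store.add("Opening hours: the library is open 9am to 5pm on weekdays.")
store.add("Accessibility: the main entrance has a ramp and automatic doors.")
store.add("Parking: visitor parking is available behind the building.")
```

A query such as "Is there a ramp at the entrance?" retrieves the accessibility chunk, regardless of where that text sat in the page’s original HTML layout.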
As platforms, pipelines, model hubs, vector stores and ML workflow tools proliferate, we will see a wave of application innovation like none other since the launch of the web 34 years ago. For those who don’t remember the “dot com boom”, it was wild with innovation and even absurd at times. And although there was a bust, the web changed every aspect of our lives. It was a profound disruption. I believe we are in a similar moment with AI/ML. The very way we do things with technology is going to change because of what technology is now able to do for us (and to us). Let me simplify – be prepared for a radical change in the technological user experience as a result of AI/ML.
I can’t say I know exactly what this new UX will be. I would argue that predicting the future in the “probabilistic computing age” is harder than ever. I do believe terms like “easier”, “more efficient”, “ubiquitous” and “multi-modal” are likely to be mantras for the future. Our role as accessibility professionals will be to ensure the lessons of the present are deeply integrated into this future. As a provider of AT and home of some of our brightest accessibility engineers, Vispero is a perfect advocate and qualified influencer in this space.
A few more questions from Steve:
What are the properties of Vispero, with its history of Assistive Technology (software and hardware) development, and its depth of knowledge from Accessibility Consultancy and accessibility testing tool development, that give it a unique advantage in making use of AI/ML?
I believe companies that have traditionally valued knowledge and treated it with formal “data as a product” strategies will be best positioned to leverage AI/ML to create enterprise value. Vispero has done this very well. We have curated decades of knowledge in both our AT software business and accessibility consulting practices. We also established a dedicated center of excellence last year and staffed it with our most senior experts who have been entirely focused on new knowledge generation. Our corpus of knowledge is integrated under our ARC Platform which now directly feeds our ML pipelines, which will ultimately power applications that create unique customer value.
There are many questions around the ethics of AI/ML. How do you envisage Vispero’s use of AI/ML to mitigate ethical concerns?
For example: SyntheticUsers.com “Test your idea or product with AI participants and take decisions with confidence.”
On the one hand, this evolution is amazing, and I am certain it will dramatically advance accessibility. On the other hand, it is terrifying how quickly these changes are taking place. This is a fast-moving disruption and I think there is a general consensus that the ethical issues are not yet fully understood. But I guess that’s to be expected given how large-scale disruption always works.
Safe to say, the voice of ethical concern is rightfully growing louder. My personal concerns are not for Vispero. Our values, our mission, our culture, and our people, all lead to a very strong ethical footprint. That does not change with innovation.
My concerns are broader and involve the societal, political and economic disruption resulting from even the most altruistic adoption of these innovations. Existing systems of production based on knowledge and reason are going to change radically. Some experts are predicting that almost half of the jobs held by humans today will be performed by machines in the not-too-distant future. And unlike past labor-impacting revolutions, the ML revolution will impact a much larger share of the workforce and, for the first time, target workers with relatively high positions in the economic value chain. This is going to happen, and it will be interesting to see how the world wrestles with it.
A brief note on bad actors. There are always bad actors seeking to weaponize innovation. So how do we handle that likelihood? Who is talking about it, and what is being done in the ML ecosystem to combat it? I would argue not enough, given the rate of innovation and adoption. The issues of misinformation and privacy invasion are real and become amplified rapidly in an autonomous technosociety. As we yield the power of reason to machines, we give up a big part of what makes us human. We create conscience voids that gradually become “normal”. At present, we have no framework to judge where the balance of power sits between humanity and technology. This is a fundamental concern that needs discussion, and I don’t want to hear what ChatGPT has to say.
I am seeing a lot of hype about autonomous agents, e.g., AutoGPT. How does this type of development factor into what Vispero is building and exploring in the AI/ML space?
This question ties back to the postulation of high-order AI/ML applications rendering UX through new multi-modal user “Agents”.
An “Agent” in AI/ML speak is a problem solver. It is able to take a broad prompt and break it down into the tasks it determines are needed to solve a specific problem. Here is a very simple example:
“User: When was the founder of Google born?”
To answer this question, the Agent would likely break this down into a set of iterative “internal” question answer tasks.
- Internal Question: Who is the founder of Google?
- Internal Answer: Google was founded by Larry Page and Sergey Brin
- Internal Question: When was Larry Page born?
- Internal Answer: Larry Page was born on March 26, 1973
- Internal Question: When was Sergey Brin born?
- Internal Answer: Sergey Brin was born on August 21, 1973
It would likely feed the answers back to a summarization task and the ultimate prompt/response mechanism would look something like this:
- “User: When was the founder of Google born?”
- “Agent: Google was founded by Larry Page, born on 3/26/1973, and Sergey Brin, born on 8/21/1973.”
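The decomposition above can be mirrored in a small sketch. Here the plan is hard-coded for this one prompt, and a dictionary of facts is a hypothetical stand-in for the LLM’s internal question/answer tasks; a real agent would have the model generate the sub-questions itself and call search or other tools to answer them.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical knowledge source standing in for the agent's internal QA tasks.
FACTS: Dict[str, Union[str, List[str]]] = {
    "Who is the founder of Google?": ["Larry Page", "Sergey Brin"],
    "When was Larry Page born?": "3/26/1973",
    "When was Sergey Brin born?": "8/21/1973",
}


def answer(question: str) -> Union[str, List[str]]:
    # Stand-in for one internal question-answering task.
    return FACTS[question]


def agent(user_prompt: str) -> Tuple[str, List[Tuple[str, object]]]:
    if "founder of Google" not in user_prompt:
        raise ValueError("this sketch only has a plan for the example prompt")
    trace: List[Tuple[str, object]] = []
    # Step 1: resolve the entity the user is asking about.
    founders_q = "Who is the founder of Google?"
    founders = answer(founders_q)
    trace.append((founders_q, founders))
    # Step 2: ask one follow-up question per founder.
    parts = []
    for name in founders:
        q = f"When was {name} born?"
        born = answer(q)
        trace.append((q, born))
        parts.append(f"{name}, born on {born}")
    # Step 3: "summarize" the internal answers into a single response.
    return "Google was founded by " + ", and ".join(parts) + ".", trace


response, trace = agent("When was the founder of Google born?")
```

The `trace` list holds the three internal question/answer pairs, and `response` reproduces the final Agent reply shown above.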
It is clear that LLM-based Agents will evolve to include “action augmentation”. Today, we see Agents at work in applications that can carry on conversations – tomorrow they will begin to take direction and execute tasks on your behalf.
- “User: I’m hungry.”
- “Agent: Cool, based on your meeting schedule, I suggest you order something now. Uber Eats can get you a spicy Italian sandwich from Joe’s Subs in the next 15 minutes. Does that work?”
- “User: Perfect. Include a diet soda and 20% gratuity.”
- “Agent: Done. One spicy Italian sandwich with a medium diet Pepsi arriving in 15 minutes. It will be delivered by John in a blue Prius.”
The Agent did a heck of a lot in that exchange, and you can easily speculate on what the internal tasks were and what the sources of data were to support the ultimate action.
The key here is to acknowledge that the Agent enabled an entirely new user experience, one that was easier, more efficient, and ubiquitous. Our job at Vispero is to ensure the modalities for engaging Agents are accessible to our user community.
An extremely exciting area for us as a company, given our unique understanding and history with this challenge!