Automating Alt Text Generation in Prismic
Andri | June 19, 2024 | 4 mins
In today’s digital age, making accessibility a priority for businesses is essential. Websites should comply with the Web Content Accessibility Guidelines (WCAG), which set global standards for creating online content that is inclusive of people with disabilities.
Alternative text (ALT text) is crucial for helping people with visual impairments understand images on a website, as it’s the ALT text that “screen readers” read aloud. It also matters for SEO and for ranking in Google search results.
That said, ALT text is often neglected: 58.2% of homepages were missing ALT text in 2023! One of the main reasons it is overlooked is that writing it has traditionally been a labour-intensive, manual process.
However, what if AI could put 'humans first' and help brands deliver a better experience to people with visual impairments? What if we could automate the process? That sparked an idea!
During one of our Human First Collective AI workshops, we discussed: “Wouldn’t it be great to use AI to generate ALT text within a CMS?”
From this conversation we started the development of a new tool that not only automates ALT text generation using AI… but also integrates smoothly with Prismic CMS, enhancing both functionality and accessibility!
So, how does it all work?
The Technical Solution
The core of this solution is BLIP (Bootstrapping Language-Image Pre-training), an AI model developed by Salesforce, a global leader in cloud-based software solutions. Using deep learning techniques, the model transforms images into descriptive and contextually relevant captions.
Replicate is a platform for deploying and scaling AI models like BLIP. It provides the infrastructure needed to handle requests at scale, so that ALT text generation is both swift and reliable.
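As a rough illustration of this step, the sketch below sends one image URL to BLIP through Replicate's HTTP predictions endpoint and tidies up the result. It is not the code used in the tool. The version ID placeholder, the `task` input value, the `Prefer: wait` behaviour, and the "Caption: " prefix on the output are all assumptions to check against the model's page and API documentation on Replicate.

```javascript
// Sketch only: caption one image with BLIP hosted on Replicate.
// Assumption: BLIP_VERSION is copied from the model's page on Replicate.
const BLIP_VERSION = "<blip-version-id>";

// Build the HTTP request for Replicate's predictions endpoint.
function buildCaptionRequest(imageUrl, token) {
  return {
    url: "https://api.replicate.com/v1/predictions",
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
        // Assumption: asks Replicate to hold the connection until the
        // prediction finishes instead of returning immediately.
        Prefer: "wait",
      },
      body: JSON.stringify({
        version: BLIP_VERSION,
        input: { image: imageUrl, task: "image_captioning" },
      }),
    },
  };
}

// Assumption: BLIP output looks like "Caption: a dog on a sofa".
// Strip the prefix and capitalise the first letter.
function toAltText(rawOutput) {
  const text = String(rawOutput ?? "").replace(/^caption:\s*/i, "").trim();
  if (!text) return "";
  return text.charAt(0).toUpperCase() + text.slice(1);
}

// fetchImpl is injectable so the function can be tried without network access.
async function captionImage(imageUrl, token, fetchImpl = fetch) {
  const { url, options } = buildCaptionRequest(imageUrl, token);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Replicate request failed with status ${response.status}`);
  }
  const prediction = await response.json();
  if (prediction.output == null) {
    // A real script would poll the prediction until it completes.
    throw new Error("Prediction returned no output yet");
  }
  return toAltText(prediction.output);
}
```

Keeping the request builder and the output clean-up as separate functions makes it easier to swap in a different model or hosting service later without touching the rest of the script.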
Solution Architecture
The architecture for this solution consists of three primary components:
Prismic CMS: The central repository for managing and storing image assets. Prismic’s API facilitates external communications, such as retrieving and updating assets.
BLIP on Replicate: This is the AI model responsible for analysing images and generating ALT text. It processes images through their URLs (a URL is a web address that points directly to the image file hosted on the internet) and outputs descriptive text for each.
Integration Script: A Node.js script orchestrates interactions between the Prismic CMS and BLIP on Replicate. It retrieves images from Prismic, identifies those lacking ALT text, and utilises BLIP on Replicate to generate and subsequently update ALT text in Prismic.
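To make the Prismic side of these components more concrete, here is a minimal sketch of the helpers such a script might use. The article does not name the exact endpoints, so the base URL, the `repository` header, the `assetType` and `limit` query parameters, and the `alt` field are assumptions modelled on Prismic's Asset API; check them against the Prismic documentation before relying on them.

```javascript
// Sketch only: the Prismic side of the integration.
// Assumptions: base URL, headers, query parameters and the `alt` field
// follow Prismic's Asset API; verify against the official docs.
const PRISMIC_ASSET_API = "https://asset-api.prismic.io";

function prismicHeaders(repository, token) {
  return {
    repository,
    Authorization: `Bearer ${token}`,
    "Content-Type": "application/json",
  };
}

// An asset needs ALT text when `alt` is missing, empty, or only whitespace.
function needsAltText(asset) {
  return typeof asset.alt !== "string" || asset.alt.trim() === "";
}

// Request that lists image assets (one page of up to `limit` items).
function buildListAssetsRequest(repository, token, limit = 100) {
  return {
    url: `${PRISMIC_ASSET_API}/assets?assetType=image&limit=${limit}`,
    options: { method: "GET", headers: prismicHeaders(repository, token) },
  };
}

// Request that writes new ALT text onto a single asset.
function buildUpdateAltRequest(repository, token, assetId, alt) {
  return {
    url: `${PRISMIC_ASSET_API}/assets/${encodeURIComponent(assetId)}`,
    options: {
      method: "PATCH",
      headers: prismicHeaders(repository, token),
      body: JSON.stringify({ alt }),
    },
  };
}
```

Each builder returns a `{ url, options }` pair that can be passed straight to `fetch(url, options)`, which keeps the network calls in one place and the filtering logic easy to check on its own.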
Image 1 - High-level context diagram illustrating the system’s design
The Workflow
The Node.js script starts by calling the Prismic API and pulling image assets from the CMS by their URLs. Images without ALT text are flagged and sent to BLIP on Replicate, which analyses the content and generates appropriate ALT text for each image. The script then writes the newly generated ALT text back to each image in Prismic, using the Prismic API’s update functionality.
The sequence diagram below illustrates the interactions between the different parts of the system:
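The workflow described above can be sketched as a single loop. This is an illustration under stated assumptions rather than the tool's actual code: `listAssets`, `generateAltText` and `updateAltText` are hypothetical functions standing in for the Prismic and Replicate calls, and the `dryRun` flag is an added convenience that is not part of the original description.

```javascript
// Sketch only: the workflow as one loop, with each external call injected.
// listAssets, generateAltText and updateAltText are hypothetical helpers
// standing in for the Prismic and Replicate API calls.
async function fillMissingAltText({ listAssets, generateAltText, updateAltText, dryRun = false }) {
  const assets = await listAssets();

  // Flag images whose ALT text is missing, empty, or only whitespace.
  const missing = assets.filter(
    (asset) => typeof asset.alt !== "string" || asset.alt.trim() === ""
  );

  const report = { checked: assets.length, captioned: [], failed: [] };

  // Process one image at a time to keep the load on both APIs low.
  for (const asset of missing) {
    try {
      const alt = await generateAltText(asset.url);
      if (!alt) throw new Error("empty caption");
      // In a dry run, captions are collected but nothing is written back.
      if (!dryRun) await updateAltText(asset.id, alt);
      report.captioned.push({ id: asset.id, alt });
    } catch (error) {
      // One bad image should not stop the rest of the run.
      report.failed.push({ id: asset.id, reason: error.message });
    }
  }
  return report;
}
```

Running with `dryRun: true` first gives an editor the chance to read the generated captions in the returned report before anything is changed in Prismic.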
Image 2 - Sequence Diagram
Let's see it in action…
Image 3 - Animation showing the AI-generated alt text being added to Prismic
The Impact and Considerations
This AI integration provides numerous advantages. It improves content accessibility by generating precise ALT text, making digital environments more inclusive for individuals with visual impairments. It automates the creation of ALT text, allowing content creators to concentrate on other facets of digital engagement. It ensures the maintenance of high accessibility standards as the volume of digital content increases.
While BLIP on Replicate is integral to this solution, other AI services such as Google Cloud Vision API, Microsoft Azure AI Vision, and Amazon Rekognition also offer robust image recognition capabilities. The right choice depends on performance, compatibility, and the specific demands of your project, so it is worth weighing the options to see which best meets your needs.
Conclusion
Integrating AI technologies such as ‘BLIP on Replicate’ with content management systems like Prismic demonstrates the possibilities for progress in web accessibility. Automating ALT text generation not only meets a specific accessibility requirement but also reinforces a commitment to a more inclusive digital sphere. As technology progresses, it remains essential to harness these capabilities to create impactful societal solutions.
LAB is well-positioned to aid businesses in harnessing AI and other web technologies to improve their digital strategies. If you're interested in discovering how LAB can advance your business, please feel free to reach out.