Frictionless PDF accessibility with ML and OCR

Accessibility is about more than a one-to-one translation of features: It’s an entire system of engineered support, intended to create a customized user experience. Last year, the Chrome & ChromeOS Accessibility Team partnered with the Google OCR team to provide democratized accessibility to PDFs: screen-readable, navigable, and easy-to-launch.

12%: PDF usage growth
60%: A11y user reach
1M+: Weekly pages OCRed

The problem of PDFs

If you use a screen reader, you know: 360+ billion PDFs (12% of all PDFs on the web) today are inaccessible [1]. While PDF accessibility has been improving, it’s still frustrating to encounter a necessary document that has not been properly processed for screen reading, and even documents that are processed via OCR may not be easy to navigate.

Machine-learning Optical Character Recognition (ML OCR) is one of the earliest applied forms of modern AI. But rudimentary OCR systems simply provide a direct read of the text on the screen—absent information architecture, meta information, and contextual clues. Overall, this provides poor UX even when OCR is supported.

While there are systems designed for greater levels of PDF accessibility, most of them are paid and/or external services: you must transfer the document to another app to read it, which creates friction. For users of assistive technology, having ML OCR built into the default PDF reader brings the experience closer to that of any other reader. To create a truly accessible experience, the functionality must always be readily available and free.

Developing a truly accessible system

For the best UX, the Chrome & ChromeOS Accessibility Team wanted to use raw ML OCR data to create a framework navigable by users with low or no vision—not just displaying the information on the screen, but automatically generating navigation and landmarks.

By post-processing the data, the Accessibility Team was able to build navigation trees and landmarks, such as page numbers, on the fly. PDFs could not only be read; the process of reading them was made easier.
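To make the idea of post-processing concrete, here is a minimal sketch of how raw OCR lines might be grouped into a small navigation tree with a page-number landmark. Everything in it is an assumption for illustration: the `OcrLine` shape, the role names, the `build_page_tree` helper, and the heuristics (taller-than-typical text is treated as a heading, a short numeric line near the bottom of the page as a page number). It is not the team's actual pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import median


@dataclass
class OcrLine:
    """One recognized line of text (hypothetical OCR output shape)."""
    text: str
    top: float     # distance from the top of the page, 0.0 to 1.0
    height: float  # glyph height as a fraction of page height


@dataclass
class Node:
    """A node in the navigation tree exposed to assistive technology."""
    role: str  # "page", "heading", "paragraph", or "pageNumber"
    text: str = ""
    children: list[Node] = field(default_factory=list)


def build_page_tree(lines: list[OcrLine], heading_ratio: float = 1.3) -> Node:
    """Group raw OCR lines into headings, paragraphs, and a page-number landmark."""
    page = Node(role="page")
    if not lines:
        return page

    # Assumption: the median line height approximates body text size.
    body_height = median(line.height for line in lines)
    current = page  # paragraphs attach to the most recent heading, if any

    for line in sorted(lines, key=lambda item: item.top):
        text = line.text.strip()
        if not text:
            continue
        if text.isdigit() and line.top > 0.9:
            # Short numeric line near the bottom edge: treat as a landmark.
            page.children.append(Node(role="pageNumber", text=text))
        elif line.height >= heading_ratio * body_height:
            current = Node(role="heading", text=text)
            page.children.append(current)
        else:
            current.children.append(Node(role="paragraph", text=text))
    return page


sample = [
    OcrLine("Annual Report", top=0.05, height=0.04),
    OcrLine("Revenue grew.", top=0.20, height=0.02),
    OcrLine("Costs fell.", top=0.25, height=0.02),
    OcrLine("3", top=0.95, height=0.02),
]
tree = build_page_tree(sample)
```

With a structure like this, a screen reader could jump between headings or straight to the page number instead of reading the page as one undifferentiated run of text.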

Processing on any hardware and any device

However, the process of ML OCR comes with a fairly hefty computational cost. The team had to provide computationally expensive OCR and OCR post-processing on many different platforms and hardware architectures, so users could easily use the features on their own device without an active internet connection and without privacy concerns.

To achieve this, the team had to migrate code originally developed to run on Google’s Linux servers, decouple it from Google’s operational environment, and make it compatible with all of the target platforms (macOS, Windows, and ChromeOS) and all possible hardware architectures.

Further, the code that runs on Google servers assumes a certain level of security in its environment—but when the code is run on users’ computers, this can’t be assumed. Consequently, the team also needed to make their code secure enough that a malicious agent could not use it to compromise Chrome or the user’s computer.

Since the feature was not needed by all users, the team did not make it an essential part of Chrome. Instead, the team chose to deliver the feature on demand, based on the hardware and software configuration of the user’s device.
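One way to picture on-demand delivery is as a small eligibility check that runs before any download. The sketch below is purely illustrative: the `DeviceConfig` fields, the `SUPPORTED_TARGETS` matrix, and the `should_fetch_ocr_component` helper are hypothetical names, and the article does not describe the actual criteria Chrome uses.

```python
from dataclasses import dataclass

# Hypothetical support matrix. The platforms come from the article;
# the architecture names are assumptions for the sake of the example.
SUPPORTED_TARGETS = {
    ("chromeos", "x86_64"),
    ("chromeos", "arm64"),
    ("windows", "x86_64"),
    ("macos", "x86_64"),
    ("macos", "arm64"),
}


@dataclass(frozen=True)
class DeviceConfig:
    """Hypothetical snapshot of the device state relevant to delivery."""
    os: str
    arch: str
    assistive_tech_active: bool
    component_installed: bool


def should_fetch_ocr_component(config: DeviceConfig) -> bool:
    """Return True only when the OCR component is wanted and can run here."""
    if config.component_installed:
        return False  # already on the device, nothing to download
    if not config.assistive_tech_active:
        return False  # the feature is not needed by this user right now
    return (config.os, config.arch) in SUPPORTED_TARGETS


chromebook = DeviceConfig("chromeos", "arm64", True, False)
needs_download = should_fetch_ocr_component(chromebook)
```

Keeping the check separate from the download itself means users who never need OCR pay no cost for it in disk space or bandwidth.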

Broader cross-platform accessibility on ChromeOS

Accessibility is never complete; it’s in a continual state of improvement. Looking to the future, the Accessibility Team hopes to improve bounding, UX, and fidelity, while scaling PDF accessibility to all Chrome browser users across every platform—and adding OCR to other Chrome devices that may benefit.

Since releasing ML OCR for PDFs, the team has expanded OCR support to 77 languages and seven additional scripts: Arabic, Bengali, Cyrillic, Devanagari, Chinese, Japanese, and Korean. Users who want a more focused and accessible view of the text they read on the web can now have scanned documents distilled via OCR in Chrome’s reading mode.

For the first time, screen reader users can also read PDFs on their Chromebook in the native Media / Gallery app. The Accessibility Team has built OCR into this native app so that users can read PDFs offline and without opening the browser, unlocking billions of previously inaccessible PDFs that can now be read directly on a Chromebook.

[1] Google internal study.