A Chrome Extension that understands emoji art for screen readers

4 min read · Mar 1, 2022

We are in the era of emoji art.

It getting dark at 4:30pm
○
く|)へ
〉
￣￣┗ My ability to stay awake
    ┗┓  ヾ○ｼ
   ┗┓ ヘ/    
  ┗┓ノ
          ┗┓

Person is kicked down the stairs with the caption “It getting dark at 4:30pm My ability to stay awake”

As fun as these can be to look at, they are an awful experience for people who are blind or who otherwise use screen readers. That accessibility software is fairly simple: it converts text into speech. Every emoji character is read out individually, and other non-alphanumeric characters make the result even harder to understand.

Tweet showing the problem with emoji art

At the same time I am not convinced the solution is to stop posting them. New art forms are good things and previous forms of art have not had to go away due to accessibility. Rather, they adjusted to expand how they could be consumed.

Television has captions. Photos have descriptions which can be read aloud. Books can be printed in braille.

Machine Learning to Identify Emoji Patterns

I started with a machine learning model using TensorFlow for JavaScript, since I’d need JavaScript later for a Chrome Extension.

Each classification is placed as a pair of files in a data/ directory structure. The first file contains a list of data for training, and the second contains a data structure representing the transformation from the original text to a usable caption.

/data
    emoji-art.regex.txt
    _training.json
    none.training.json
    none.regex.txt
    falling-down-stairs.training.json
    falling-down-stairs.regex.txt
    ...

A training file named <art-name>.training.json contains a series of examples for a given art classification. The file contains a JSON array of objects, each with two properties:

text — The original text art
attribution — A link to the source material
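As a rough illustration of that format, here is a minimal sketch of a training file and a loader that checks it. The example entry, the placeholder URL, and the parseTrainingFile helper are assumptions made for this sketch; they are not taken from the project itself.

```javascript
// Hypothetical contents of a file such as falling-down-stairs.training.json.
// The text and the URL are placeholders, not real training data.
const exampleFile = JSON.stringify([
  {
    text: "It getting dark at 4:30pm\n○\nく|)へ\n〉\n￣￣┗ My ability to stay awake",
    attribution: "https://example.com/source-of-the-art",
  },
]);

// Parse a training file and check that every entry has the two
// documented properties: "text" and "attribution".
function parseTrainingFile(json) {
  const entries = JSON.parse(json);
  if (!Array.isArray(entries)) {
    throw new TypeError("Training file must contain a JSON array");
  }
  entries.forEach((entry, i) => {
    if (typeof entry.text !== "string" || typeof entry.attribution !== "string") {
      throw new TypeError(`Entry ${i} needs string "text" and "attribution" properties`);
    }
  });
  return entries;
}

const entries = parseTrainingFile(exampleFile);
console.log(`${entries.length} training example(s) loaded`);
```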

A caption file named <art-name>.regex.txt contains the substitution and replacement values separated by a newline. This allows the system to generate an appropriate caption by capturing specific key information. Special keywords may also be used in the file, like ALL_TEXT, which will return all of the readable text on the page.

Using regular expressions here is a bit tricky and may change in the future, but it was an easy way to encode the necessary transformations from the original text into something readable.
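To make the caption files more concrete, here is a minimal sketch of how one might be applied. Everything specific in it is an assumption for illustration: the pattern and replacement lines are invented, the ALL_TEXT keyword is assumed to appear in the replacement line, and "readable text" is approximated as runs of plain ASCII letters, digits, and basic punctuation. The real project may do all of this differently.

```javascript
// The art from the top of the post, used as sample input.
const stairsArt = [
  "It getting dark at 4:30pm",
  "○",
  "く|)へ",
  "〉",
  "￣￣┗ My ability to stay awake",
  "    ┗┓  ヾ○ｼ",
  "   ┗┓ ヘ/",
  "  ┗┓ノ",
  "          ┗┓",
].join("\n");

// Assumed definition of "readable text": runs of ASCII letters, digits,
// spaces, and basic punctuation, joined with single spaces.
function readableText(text) {
  const runs = text.match(/[A-Za-z0-9][A-Za-z0-9 ,.'!?:#@-]*/g) || [];
  return runs.map((run) => run.trim()).join(" ");
}

// Assumed file layout: line 1 is the pattern, line 2 is the replacement.
// Returns null when the pattern does not match the text at all.
function generateCaption(captionFile, text) {
  const [pattern, replacement] = captionFile.split("\n");
  const regex = new RegExp(pattern, "su"); // "s" lets "." match newlines
  if (!regex.test(text)) return null;
  return text
    .replace(regex, replacement)
    // A function replacer avoids "$" in the tweet being treated specially.
    .replace(/ALL_TEXT/g, () => readableText(text));
}

// Hypothetical falling-down-stairs.regex.txt: capture the two text fragments.
const stairsCaptionFile = [
  String.raw`^(.*?)\n.*?┗ (.*?)\n.*$`,
  "Person is kicked down the stairs with the caption “$1 $2”",
].join("\n");

// Hypothetical emoji-art.regex.txt: fall back to whatever text is readable.
const fallbackCaptionFile = [
  String.raw`^.*$`,
  "Text art with the words: ALL_TEXT",
].join("\n");

const stairsCaption = generateCaption(stairsCaptionFile, stairsArt);
const fallbackCaption = generateCaption(fallbackCaptionFile, stairsArt);
console.log(stairsCaption);
console.log(fallbackCaption);
```

Capture groups pull out the parts of the art that vary from tweet to tweet, while the fixed part of the caption lives in the replacement line.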

There are a handful of exceptions. A _training.json file contains a set of training data to verify the ML model. An emoji-art.regex.txt file represents the caption for an unidentified text art input. Later this unidentified data can be crowdsourced into useful labels.
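The naming convention can be sketched as a small function that sorts file names into per-label pairs while handling the two exceptions. The groupDataFiles helper and the shape of its return value are assumptions for illustration, not the project's actual loader.

```javascript
// Group file names from the data/ directory by classification label.
// "_training.json" is treated as the verification set, and a label with
// only a caption file (such as "emoji-art") simply has no training file.
function groupDataFiles(fileNames) {
  const labels = new Map();
  let verificationFile = null;
  for (const name of fileNames) {
    if (name === "_training.json") {
      verificationFile = name;
      continue;
    }
    const match = name.match(/^(.+)\.(training\.json|regex\.txt)$/);
    if (!match) continue; // ignore anything outside the naming convention
    const [, label, suffix] = match;
    const entry = labels.get(label) || { training: null, caption: null };
    if (suffix === "training.json") {
      entry.training = name;
    } else {
      entry.caption = name;
    }
    labels.set(label, entry);
  }
  return { labels, verificationFile };
}

const grouped = groupDataFiles([
  "emoji-art.regex.txt",
  "_training.json",
  "none.training.json",
  "none.regex.txt",
  "falling-down-stairs.training.json",
  "falling-down-stairs.regex.txt",
]);
console.log([...grouped.labels.keys()]);
```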

Implementation

After going through the work of training the model, the first implementation was in a command-line program just to verify that the inference and captioning worked.

A text art has been pasted into the terminal. Following that, it is classified in the category “path-less-traveled” with a score of 0.49 and the caption is displayed: “A winding trail starting from me, skipping making a tweet without a typo, and ending up at regret”

From here, I wrapped up the TensorFlow model into a Chrome Extension. The extension will, when on Twitter, check each tweet against the model. If the ‘none’ label is inferred, no action is taken; any ordinary sentence is already readable.

For those with a different label, the contents of the tweet are replaced inline with the caption. That way, when a user with a screen reader listens to that tweet, they get something that is contextually the same in a format they understand.
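The decision described above can be sketched as a single function. The classify and captionFor arguments are hypothetical stand-ins for the TensorFlow model and the caption step, and the exact wording and placement of the "alt text autogenerated" note is an assumption. A real extension would presumably apply the result to each tweet's element on the page.

```javascript
// Decide what a screen reader should hear for one tweet.
// classify(text) is assumed to return an object like { label, score }.
// captionFor(label, text) is assumed to return a readable caption.
function accessibleTweetText(tweetText, classify, captionFor) {
  const { label } = classify(tweetText);
  if (label === "none") {
    return tweetText; // ordinary sentences are already readable
  }
  const caption = captionFor(label, tweetText);
  return `${caption} (alt text autogenerated)`;
}

// Placeholder classifier for the sketch only; the real one is an ML model.
function stubClassify(text) {
  return text.includes("┗┓")
    ? { label: "falling-down-stairs", score: 0.9 }
    : { label: "none", score: 0.9 };
}

// Placeholder caption step for the sketch only.
function stubCaption(label, text) {
  return "Person is kicked down the stairs";
}

console.log(accessibleTweetText("Just an ordinary sentence.", stubClassify, stubCaption));
console.log(accessibleTweetText("Stairs\n┗┓\n ┗┓", stubClassify, stubCaption));
```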

An example of a tweet with a known emoji art format

The same tweet but with a readable caption: “A person performing a pose while saying Hey Google Talk to my test app #aogdevs”. The tweet is appended with “alt text autogenerated”.

Next Steps

This extension works fairly well, though I’m sure there are some optimizations that would make it work better in parallel. Running ML inference synchronously is not ideal, since it can take a second or longer to interpret each tweet.

But with some headway it can be a useful way to greatly improve the accessibility of any text art without each person remembering to provide alternate text.

As alluded to earlier, having to provide text for every type of emoji art is not easily scalable by one person. Ideally this technology can be implemented by a company providing accessibility software or moved to a crowdsourced model so that maintenance can be sustained.

Since TensorFlow is fairly adaptable, this model could undergo just a few tweaks and be easily embedded into your phone’s accessibility software or integrated more deeply into an operating system.

https://twitter.com/Conundrum9999/status/1494187346102607876

This isn’t just limited to Twitter and text art either. Unicode more generally has a wide set of alphanumeric characters in slightly different text styles that cannot be interpreted by screen readers. This Chrome Extension can additionally replace 𝕁 with J, 𝕒 with a, and so on.
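One way to get this kind of replacement in JavaScript is Unicode compatibility normalization (NFKC), which maps styled mathematical letters back to their plain forms. This is a minimal sketch and not necessarily how the extension does it; the plainText name is made up for the example.

```javascript
// Map stylized Unicode letters (such as double-struck 𝕁) to plain
// characters using the built-in NFKC normalization form.
function plainText(text) {
  return text.normalize("NFKC");
}

console.log(plainText("𝕁𝕒𝕧𝕒")); // → Java
```

NFKC is broader than styled letters alone. It also rewrites ligatures, superscripts, and full-width or half-width forms, so it would change some of the characters used in text art. A narrower lookup table may be a better fit where only the letters should change.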

Unlike photos, which are made up of thousands of obscure pixels, text is smaller in size and is encoded more precisely. As such, getting high accuracy requires fewer examples.

After putting together this proof of concept I did stop working on it. I just pushed the project to GitHub for anyone who might be interested.

Written by Nick Felker
