Base256Emoji
Saturday, March 22, 2025
While looking into the multibase group of self-identifying base encodings, I discovered Base256Emoji, an encoding format that specifies one emoji to use for each possible byte value of an input buffer. The spec is implemented, for example, in both Go and Rust.
Each byte is replaced with a UTF-8 sequence that is typically 3 or 4 bytes long, which is unusual for a binary-to-text encoding (these usually try to keep the encoded output compact), but there are still some nice use cases.
We’re going to implement this in Factor and then discuss some variants.
Implementation
First, we define a 256-item sequence of emojis, one for each byte value:
CONSTANT: base256>emoji "🚀🪐☄🛰🌌🌑🌒🌓🌔🌕🌖🌗🌘🌍🌏🌎🐉☀💻🖥\
💾💿😂❤😍🤣😊🙏💕😭😘👍😅👏😁🔥🥰💔💖💙😢🤔😆🙄💪😉☺👌🤗💜😔😎😇\
🌹🤦🎉💞✌✨🤷😱😌🌸🙌😋💗💚😏💛🙂💓🤩😄😀🖤😃💯🙈👇🎶😒🤭❣😜💋\
👀😪😑💥🙋😞😩😡🤪👊🥳😥🤤👉💃😳✋😚😝😴🌟😬🙃🍀🌷😻😓⭐✅🥺🌈😈\
🤘💦✔😣🏃💐☹🎊💘😠☝😕🌺🎂🌻😐🖕💝🙊😹🗣💫💀👑🎵🤞😛🔴😤🌼😫⚽🤙\
☕🏆🤫👈😮🙆🍻🍃🐶💁😲🌿🧡🎁⚡🌞🎈❌✊👋😰🤨😶🤝🚶💰🍓💢🤟🙁🚨💨\
🤬✈🎀🍺🤓😙💟🌱😖👶🥴▶➡❓💎💸⬇😨🌚🦋😷🕺⚠🙅😟😵👎🤲🤠🤧📌🔵💅🧐\
🐾🍒😗🤑🌊🤯🐷☎💧😯💆👆🎤🙇🍑❄🌴💣🐸💌📍🥀🤢👅💡💩👐📸👻🤐🤮🎼🥵\
🚩🍎🍊👼💍📣🥂"
And then we compute the reverse mapping:
CONSTANT: emoji>base256 $[ base256>emoji H{ } zip-index-as ]
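Before building on the two tables, it can be worth checking that they line up. The snippet below is a minimal sketch for the listener, assuming base256>emoji and emoji>base256 are already defined as above; the vocabulary list is my best guess at what these words need.

```factor
USING: assocs kernel prettyprint sequences sets ;

! Assumes base256>emoji and emoji>base256 from above are in scope.
base256>emoji length .         ! number of entries in the alphabet
base256>emoji all-unique? .    ! t if no emoji appears twice
emoji>base256 assoc-size .     ! number of keys in the reverse mapping
CHAR: 🚀 emoji>base256 at .    ! byte value assigned to the first emoji
```

If the table is intact, the first and third numbers should agree: a duplicated emoji would collapse two entries into one key of the reverse mapping and make decoding ambiguous.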
With those two data blocks, we can define words to convert into and out of base256emoji:
: >base256emoji ( bytes -- str )
    [ base256>emoji nth ] "" map-as ;
: base256emoji> ( str -- bytes )
    [ emoji>base256 at ] B{ } map-as ;
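Since the two words are meant to be inverses, a round trip over every byte value is a cheap way to exercise both of them. This is a sketch using tools.test, assuming the definitions above are loaded; it is not taken from the original implementation.

```factor
USING: byte-arrays kernel sequences tools.test ;

! Every byte value 0..255 should survive an encode/decode round trip.
{ t } [
    256 <iota> >byte-array
    dup >base256emoji base256emoji> =
] unit-test
```

One design consequence of using at for decoding: a character outside the alphabet looks up as f rather than raising an error at that point, so stricter input validation would need to be added separately.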
You can try it out:
IN: scratchpad "Hello, Factor!" >base256emoji .
"😄✋🍀🍀😓💪😅💓🤤💃😈😓🥺👏"
IN: scratchpad "😄✋🍀🍀😓💪😅💓🤤💃😈😓🥺👏" base256emoji> "" like .
"Hello, Factor!"
Use Cases
The most interesting use case for base256emoji seems to be the ERC-7673: Distinguishable base256emoji Addresses proposal for Ethereum. This proposal seeks to “address spoofing attacks [that] have mislead tens of thousands of ether, and countless other tokens” by using visual emoji-based strings – a similar justification to the Drunken Bishop algorithm.
We can try base256emoji out with checksums to yield a similar benefit, displaying 16 emojis instead of a 32-character hex string:
IN: scratchpad "resource:license.txt" md5 checksum-bytes
               bytes>hex-string .
"ebb5ab617e3a88ed43f7d247c6466d95"
IN: scratchpad "resource:license.txt" md5 checksum-bytes
               >base256emoji .
"💌💨🤨🤤😠✨😹🥀😏🎼🤠🤩⬇💓🌷🤙"
Visual Collisions
Given a desire to use visually dissimilar emojis in identities, it is worth thinking about the chosen emoji set and how it might be read in a visual context. Criticism of this particular set, which smaller font sizes tend to magnify, might focus on the visual similarity of:
- Similar “globe” emojis (🌍 🌏 🌎)
- Similar “face” emojis
- Similar “kiss” emojis (😘 😗 😙 😚)
- Similar “star” emojis (⭐ 🌟)
- Similar “grin” emojis (😀 😃 😄 😁 😆 😅)
- Similar “heart” emojis (including 🖤)
- Similar “hand” emojis (including 🤲)
It might be desirable to choose emojis not based on community membership or common usage, but for how visually dissimilar they are, to make it even harder for scammers to deliberately pick lookalike emojis and rely on small text sizes, platform font differences, or user inattention.
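Whatever set is chosen, it still has to work as an alphabet. The word below is a hypothetical helper (valid-alphabet? is a name I made up, not part of any spec) that checks only the structural requirements, exactly 256 entries and no repeats. It says nothing about visual similarity, which still needs a human eye.

```factor
USING: kernel sequences sets ;

! Hypothetical helper: a usable alphabet has exactly 256 code points
! and no repeats, so each byte maps to one emoji and back again.
: valid-alphabet? ( str -- ? )
    [ length 256 = ] [ all-unique? ] bi and ;
```

For example, base256>emoji valid-alphabet? . could be run in the listener against the table above. Because Factor strings are sequences of code points, an emoji built from several code points (a flag, a skin-tone variant, or one followed by a variation selector) would inflate the length and fail this check.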
A couple of other emoji sets can be found, for example @Equim-chan/base256 (which uses one of my favorite emojis: ๐พ), npm/base-emoji, or even the KittenMoji crate.
Fun!
This is available on my GitHub.