Re: Factor

Factor: the language, the theory, and the practice.

Base256Emoji

Saturday, March 22, 2025

#unicode

While looking into the multibase group of self-identifying base encodings, I discovered Base256Emoji, an encoding format that assigns one emoji to each possible byte value of an input buffer. The spec is implemented, for example, in both Go and Rust.

Despite replacing each byte with a typically 3-byte or 4-byte UTF-8 sequence – which is unusual for byte encodings (they often seek to reduce the length of an input sequence) – there are some nice use cases.
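To make the size overhead concrete, here is a small sketch that counts the UTF-8 bytes behind two emojis from the alphabet. It assumes the standard io.encodings.string and io.encodings.utf8 vocabularies; the two emojis are picked only as examples of the 4-byte and 3-byte cases.

```factor
USING: io.encodings.string io.encodings.utf8 prettyprint sequences ;

! A code point above U+FFFF takes four UTF-8 bytes: prints 4
"🚀" utf8 encode length .

! A code point below U+10000 (here U+2604) takes three: prints 3
"☄" utf8 encode length .
```

So an encoded string costs roughly three to four times the storage of the bytes it represents.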

We’re going to implement this in Factor and then discuss some variants.

Implementation

First, we define a 256 item sequence of emojis, one for each byte:

CONSTANT: base256>emoji "🚀🪐☄🛰🌌🌑🌒🌓🌔🌕🌖🌗🌘🌍🌏🌎🐉☀💻🖥\
💾💿😂❤😍🤣😊🙏💕😭😘👍😅👏😁🔥🥰💔💖💙😢🤔😆🙄💪😉☺👌🤗💜😔😎😇\
🌹🤦🎉💞✌✨🤷😱😌🌸🙌😋💗💚😏💛🙂💓🤩😄😀🖤😃💯🙈👇🎶😒🤭❣😜💋\
👀😪😑💥🙋😞😩😡🤪👊🥳😥🤤👉💃😳✋😚😝😴🌟😬🙃🍀🌷😻😓⭐✅🥺🌈😈\
🤘💦✔😣🏃💐☹🎊💘😠☝😕🌺🎂🌻😐🖕💝🙊😹🗣💫💀👑🎵🤞😛🔴😤🌼😫⚽🤙\
☕🏆🤫👈😮🙆🍻🍃🐶💁😲🌿🧡🎁⚡🌞🎈❌✊👋😰🤨😶🤝🚶💰🍓💢🤟🙁🚨💨\
🤬✈🎀🍺🤓😙💟🌱😖👶🥴▶➡❓💎💸⬇😨🌚🦋😷🕺⚠🙅😟😵👎🤲🤠🤧📌🔵💅🧐\
🐾🍒😗🤑🌊🤯🐷☎💧😯💆👆🎤🙇🍑❄🌴💣🐸💌📍🥀🤢👅💡💩👐📸👻🤐🤮🎼🥵\
🚩🍎🍊👼💍📣🥂"

And then we compute the reverse mapping:

CONSTANT: emoji>base256 $[ base256>emoji H{ } zip-index-as ]
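As a small illustration of what the reverse mapping holds, the sketch below applies the same zip-index-as call to a three-character stand-in alphabet. It assumes zip-index-as lives in the assocs vocabulary; "abc" is only a placeholder for the emoji table.

```factor
USING: assocs prettyprint ;

! Each code point becomes a key and its position in the string the value,
! so looking up the third character yields index 2: prints 2
CHAR: c "abc" H{ } zip-index-as at .
```

With the real table, the keys are emoji code points and the values are the byte values 0 through 255.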

With those two data blocks, we can define words to convert into and out of base256emoji:

: >base256emoji ( bytes -- str )
    [ base256>emoji nth ] "" map-as ;

: base256emoji> ( str -- bytes )
    [ emoji>base256 at ] B{ } map-as ;
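One thing to note is that at returns f for a key it does not know, so a stray character in the input would presumably only surface as a low-level error when f is stored into the byte array. A stricter variant might look like the sketch below; unknown-emoji, lookup-strict, and decode-strict are hypothetical names, not part of the spec or of the code above.

```factor
USING: assocs kernel sequences ;

! Hypothetical error class that carries the offending code point.
ERROR: unknown-emoji ch ;

! Look up one code point, throwing instead of returning f.
! Note that 0 is a valid byte and is truthy in Factor; only f is false.
: lookup-strict ( ch assoc -- byte )
    dupd at [ nip ] [ unknown-emoji ] if* ;

! Decode a whole string against a reverse-mapping assoc.
: decode-strict ( str assoc -- bytes )
    [ lookup-strict ] curry B{ } map-as ;
```

With the definitions above loaded, "😄✋" emoji>base256 decode-strict should give B{ 72 101 }, while a string containing a character outside the alphabet throws unknown-emoji.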

You can try it out:

IN: scratchpad "Hello, Factor!" >base256emoji .
"😄✋🍀🍀😓💪😅💓🤤💃😈😓🥺👏"

IN: scratchpad "😄✋🍀🍀😓💪😅💓🤤💃😈😓🥺👏" base256emoji> "" like .
"Hello, Factor!"

Use Cases

The most interesting use case for base256emoji seems to be the ERC-7673: Distinguishable base256emoji Addresses proposal for Ethereum. This proposal seeks to “address spoofing attacks [that] have mislead tens of thousands of ether, and countless other tokens” by using visual emoji-based strings – a similar justification to the Drunken Bishop algorithm.

We can try base256emoji out with checksums to yield a similar benefit, displaying 16 emojis instead of a 32-character hex string:

IN: scratchpad "resource:license.txt" md5 checksum-bytes
               bytes>hex-string .
"ebb5ab617e3a88ed43f7d247c6466d95"

IN: scratchpad "resource:license.txt" md5 checksum-bytes
               >base256emoji .
"💌💨🤨🤤😠✨😹🥀😏🎼🤠🤩⬇💓🌷🤙"

Visual Collisions

Given a desire to use visually dissimilar emojis in identities, it would be useful to think about the chosen emoji set and how it might be interpreted in a visual context. Some criticism of this particular group of emojis might focus on the following visual similarities, which smaller font sizes tend to make worse:

  1. Similar “globe” emojis (🌍 🌏 🌎)
  2. Similar “face” emojis (🙂 😐 😑 🙁)
  3. Similar “kiss” emojis (😙 😚 😗 😘)
  4. Similar “star” emojis (⭐ 🌟)
  5. Similar “grin” emojis (😀 😃 😄 😁 😆 😅)
  6. Similar “heart” emojis (💔 💗 💕 💖 💘 💙 💜 💚 💛 🖤)
  7. Similar “hand” emojis (🤞 👋 ✋ 👊 ✊ 🤝 🤲)

It might be desirable to choose emojis not based on community membership or common usage, but on how visually dissimilar they are, to make it even harder for scammers to deliberately pick lookalike emojis and rely on small text sizes, platform font differences, or user inattention.
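Whatever set is chosen, a replacement alphabet still has to be usable by the words above: 256 entries, each a single code point, with no repeats. Here is a minimal sketch of such a check, assuming all-unique? from the sets vocabulary; valid-alphabet? is a hypothetical helper name.

```factor
USING: kernel sequences sets ;

! Hypothetical helper: a candidate alphabet needs exactly 256 code points,
! all distinct, so that every byte has one emoji and decoding is unambiguous.
: valid-alphabet? ( str -- ? )
    [ length 256 = ] [ all-unique? ] bi and ;
```

Running base256>emoji valid-alphabet? . should print t for the table above. Emojis written with a variation selector or built from several joined code points would throw the length off, since Factor strings are sequences of code points.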

A couple of other emoji sets can be found, for example @Equim-chan/base256 (which uses one of my favorite emojis: 👾), npm/base-emoji, or even the KittenMoji crate.

Fun!

This is available on my GitHub.