Creating Fake Data
Monday, May 17, 2010
A few days ago, there was a post on Hacker News about a project called Faker.js. It is a JavaScript clone of similar Ruby and Perl libraries for generating fake data: names, phone numbers, e-mail addresses, street addresses, company information, and so on.
I took a look at the code and was inspired to create a Factor version.
First, some useful vocabularies:
USING: ascii combinators fry kernel make math random sequences ;
All three implementations seem to use simple random selection to generate the “fake” information. For example, each has a long list of valid first and last names, which looks something like this:
CONSTANT: FIRST-NAME {
"Aaliyah" "Aaron" "Abagail" "Abbey" "Abbie" "Abbigail"
"Abby" "Abdiel" "Abdul" "Abdullah" "Abe" "Abel" "Abelardo"
"Abigail" "Abigale" "Abigayle" "Abner" "Abraham" "Ada"
"Adah" "Adalberto" "Adaline" "Adam" "Adan" "Addie" "Addison"
"Adela" "Adelbert" "Adele" "Adelia" "Adeline" "Adell"
! ...
}
CONSTANT: LAST-NAME {
"Abbott" "Abernathy" "Abshire" "Adams" "Altenwerth"
"Anderson" "Ankunding" "Armstrong" "Auer" "Aufderhar"
"Bahringer" "Bailey" "Balistreri" "Barrows" "Bartell"
"Bartoletti" "Barton" "Bashirian" "Batz" "Bauch" "Baumbach"
! ...
}
Given a long list of possible names, generating a fake name is no more complicated than:
IN: scratchpad FIRST-NAME random LAST-NAME random " " glue .
"Greyson Barrows"
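To reuse that expression, it can be wrapped in a word. This is a minimal sketch: the word name fake-name is a hypothetical choice, and the name lists are trimmed to a few entries from above so the snippet runs on its own.

```factor
USING: random sequences ;
IN: scratchpad

! Trimmed-down lists, using only entries shown above.
CONSTANT: FIRST-NAME { "Aaliyah" "Aaron" "Abagail" "Abbey" "Abbie" }
CONSTANT: LAST-NAME { "Abbott" "Abernathy" "Abshire" "Adams" "Anderson" }

! Hypothetical word: a random first name and a random last name,
! joined with a single space by glue.
: fake-name ( -- str )
    FIRST-NAME random LAST-NAME random " " glue ;
```

Because the lists are plain constants, swapping in longer lists changes the output without touching fake-name itself.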
Similarly, generating phone numbers takes nothing more than a list of typical phone number patterns and a word that replaces each # with a random digit:
CONSTANT: PHONE-NUMBER {
"###-###-####"
"(###)###-####"
"1-###-###-####"
"###.###.####"
"###-###-####"
"(###)###-####"
"1-###-###-####"
! ...
}
: (numbers) ( str -- str' )
[ dup CHAR: # = [ drop "0123456789" random ] when ] map ;
Generating a fake phone number (without performing any kind of validation on area codes or local numbers) is as easy as:
IN: scratchpad PHONE-NUMBER random (numbers) .
"352-327-9815"
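The two steps can also be combined into a single word. In this sketch, fake-phone-number is a hypothetical name, and the pattern list is trimmed (with (numbers) repeated) so the snippet is self-contained:

```factor
USING: kernel random sequences ;
IN: scratchpad

! A few of the patterns listed above.
CONSTANT: PHONE-NUMBER {
    "###-###-####" "(###)###-####" "1-###-###-####" "###.###.####"
}

! Replace each # with a random decimal digit; other characters pass through.
: (numbers) ( str -- str' )
    [ dup CHAR: # = [ drop "0123456789" random ] when ] map ;

! Hypothetical word: choose a pattern at random, then fill in its digits.
: fake-phone-number ( -- str )
    PHONE-NUMBER random (numbers) ;
```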
For added flavor, the author chose to include “business bullshit” generation:
IN: scratchpad 5 [ fake-bs . ] times
"leverage 24/7 models"
"deploy ubiquitous vortals"
"maximize holistic channels"
"exploit real-time niches"
"unleash proactive mindshare"
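The definition of fake-bs isn't shown here, but judging from the output, each phrase looks like a verb, an adjective, and a noun. A minimal sketch under that assumption: the BS constant and its short lists are hypothetical and built only from the sample output above.

```factor
USING: random sequences ;
IN: scratchpad

! Hypothetical word lists, one per position in the phrase, filled
! only with words from the sample output above.
CONSTANT: BS {
    { "leverage" "deploy" "maximize" "exploit" "unleash" }
    { "24/7" "ubiquitous" "holistic" "real-time" "proactive" }
    { "models" "vortals" "channels" "niches" "mindshare" }
}

! Pick one random word from each list, then join them with spaces.
: fake-bs ( -- str )
    BS [ random ] map " " join ;
```

Keeping the lists in one nested constant means the phrase shape is just [ random ] map over it; adding another position would only need another list.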
And “product catch phrase” generation:
IN: scratchpad 5 [ fake-catch-phrase . ] times
"Reverse-engineered value-added toolset"
"Diverse systemic concept"
"Ergonomic holistic pricing structure"
"Persevering local interface"
"Intuitive human-resource time-frame"
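A catch phrase can be sketched the same way, assuming it is an adjective, a descriptor, and a noun. One sample noun ("pricing structure") contains a space, which is fine here because each list entry is picked as a single unit. As before, the CATCH-PHRASE constant and its lists are hypothetical and drawn only from the output above.

```factor
USING: random sequences ;
IN: scratchpad

! Hypothetical lists: adjectives, descriptors, and nouns taken
! from the sample catch phrases above.
CONSTANT: CATCH-PHRASE {
    { "Reverse-engineered" "Diverse" "Ergonomic" "Persevering" "Intuitive" }
    { "value-added" "systemic" "holistic" "local" "human-resource" }
    { "toolset" "concept" "pricing structure" "interface" "time-frame" }
}

! One random entry from each list, joined with spaces.
: fake-catch-phrase ( -- str )
    CATCH-PHRASE [ random ] map " " join ;
```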
This is useful for scale-testing websites, for practical jokes, and probably for less innocent purposes. You can see the full version on my GitHub account.