# Re: Factor

Factor: the language, the theory, and the practice.

## Substrings

Sunday, July 10, 2011

One year ago, I wrote about some new combinatoric functions that I wrote. I had a recent need for a couple of “substring” functions. Since I couldn’t find them in Factor’s standard library, I thought I would contribute these:

### all-subseqs

Our first word takes a sequence, and then returns all (consecutive) subsequences that can be found. In Factor, the grouping vocabulary provides a clumping feature that splits a sequence into overlapping, fixed-length, subsequences. We can use this to find clumps of every possible length to solve this problem:

``````USING: grouping kernel ranges sequences ;

: all-subseqs ( seq -- seqs )
dup length [1..b] [ <clumps> ] with map concat ;
``````

You can see how this works:

``````IN: scratchpad "abcd" all-subseqs .
{ "a" "b" "c" "d" "ab" "bc" "cd" "abc" "bcd" "abcd" }
``````

Note: we specifically don’t include the “empty” string in the results.

### longest-subseq

Several algorithms and implementations can be found for the longest common substring problem. Basically, we want a word that returns the longest (consecutive) substring that is common between two strings.

Using locals, I translated the Python solution:

``````USING: arrays kernel locals math ranges sequences ;

:: longest-subseq ( seq1 seq2 -- subseq )
seq1 length :> len1
seq2 length :> len2
0 :> n!
0 :> end!
len1 1 + [ len2 1 + 0 <array> ] replicate :> table
len1 [1..b] [| x |
len2 [1..b] [| y |
x 1 - seq1 nth
y 1 - seq2 nth = [
y 1 - x 1 - table nth nth 1 + :> len
len y x table nth set-nth
len n > [ len n! x end! ] when
] [ 0 y x table nth set-nth ] if
] each
] each end n - end seq1 subseq ;
``````

Below, you can see how it works:

``````IN: scratchpad "abc" "def" longest-subseq .
""

IN: scratchpad "abcd" "abcde" longest-subseq .
"abcd"
``````

Note: don’t confuse this with the longest common subsequence problem (see substring vs. subsequence for more details), which is implemented in Factor by the lcs vocabulary.