# Doc/Articles/Play204

**At Play With J**
... Table of Contents ... Previous Chapter ... Next Chapter

# 37. Jacobâs Ladder

. *By* Eugene McDonnell. *First published in* Vector, 20, 4, *(April 2004), 84-97*.

*And he dreamed, and behold a ladder set up on the earth, and the top of it reached to heaven: and behold the angels of God ascending and descending on it.*

Genesis 28:12

Dedicated to my grandson Jacob (15 months old).

### The Name of the Game

Lewis Carroll invented a game he called *Doublets* in 1879. He used *doublet* to describe two words of the same length, which were to be connected by a *chain* of other words, called *links*. Two words are linked if they use the same letters in every position but one, like *rota* and *iota*. As an example, he gave as a doublet the words *head* and *tail*, and for the links the words *heal*, *teal*, *tell* and *tall*, so that the entire chain was *head*, *heal*, *teal*, *tell*, *tall* and *tail*. The word Doublet hasn't stuck, however, and the game is now usually called *Word Ladders*.

The Word Ladder game can be played mentally, and many people enjoy playing in their head, sometimes making it a game for two or more people, to see who can find a ladder quickest. This article, however, treats computer solutions to the problem, which now asks that the chain be as short as possible. There may be more than one shortest solution. For example,

+----+----+ |head|head| |heal|heal| |teal|heil| |taal|hail| |tail|tail| +----+----+

are two solutions shorter by one link than Carroll's. Carroll would probably point out that *taal* is usually capitalized, in phrases like *the Taal* ; it is a name for a language, like *English* or *Italian*, and is another name for *Afrikaans*, one of the official languages of South Africa; and *heil* is a German interjection used infamously by the Nazis in phrases like *Heil Hitler*. These words let me point out that *all* the words in my word tables are in lower case, even names like *Hugo* and *Clive* ; and that they include numerous words from foreign languages that have gained currency in English, like Russian *dvor* and Spanish *amigo*. Different word lists will give different results.

Some doublets give rise to eight or more solutions ' here are the solutions for the doublet *white* and *black*.

+-----+-----+-----+-----+-----+-----+-----+-----+-----+ |white|white|white|white|white|white|white|white|white| |whine|whine|whine|whine|whine|whine|whine|whine|whine| |chine|chine|chine|chine|chine|chine|chine|chine|chine| |chink|chink|chink|chink|chink|cline|cline|cline|cline| |chick|clink|clink|clink|clink|clink|clink|clink|clink| |click|blink|clank|clank|click|blink|clank|clank|click| |clack|blank|blank|clack|clack|blank|blank|clack|clack| |black|black|black|black|black|black|black|black|black| +-----+-----+-----+-----+-----+-----+-----+-----+-----+

Later on I'll bring to your attention the ladders for the doublet *dvor* and *lade*: there are eighty solutions to it, each nine words long!

My word tables have a history. Many years ago, in our IPSA office in Palo Alto, Joey Tuttle acquired a tape from Houghton Mifflin, the publishers of *The American Heritage Dictionary*, that contained an alphabetical listing of all the words in their dictionary. Joey extracted a number of files, each file giving all of the words having the same length, and mounted these on the I. P. Sharp computer in Toronto. I found the list useful in a number of ways, and one of the things I did was to write an APL program to form Word Ladders, of which more later. I have these word files now on my personal computer.

### Structure of the Game

Looked at from the point of view of the game, a table of words is seen as an undirected graph, where the nodes are the words, and the edges are the links for each word. Linkness is symmetric: if *iota* is a link of *rota*, then vice-versa. This being the case, the graph is also symmetric, and the counterdiagonal is all zero - a word is not a link of itself. A word may have many links. For example, *bare* has 26, using my table. They are:

+----+----+----+----+----+----+----+----+----+----+----+----+----+ |babe|bake|bale|bane|base|bate|barb|bard|bari|bark|barm|barn|bars| +----+----+----+----+----+----+----+----+----+----+----+----+----+ |care|dare|fare|hare|mare|pare|rare|tare|vare|ware|yare|bore|byre| +----+----+----+----+----+----+----+----+----+----+----+----+----+

A word may have no links, as well. Some four-letter hermits are *agog, ecru, idol, ugly, xmas, yeti* and *zarf*.

I've adopted some naming conventions. A word list - actually a character table - is called *Tn*, where *n* is the length of the words in the table. Thus my four-letter word table is *T4*. A link list, one that gives the links for each word, is called *En*. Thus, the link list for four-letter words is *E4*. Later on, in the discussion of J functions, I'll introduce some more conventions.

### The Link List

There's a story to do with the creation of link lists from a word table. I wrote one for myself that took a *Tn* as argument, and produced the corresponding *En*. I showed this to Roger Hui, who noted that it had quadratic time. He thought it would be possible to make one that had linear time. At the time of this message we were still calling link lists *neighbor lists*.

From: rhui000@shaw.ca Subject: Re: Word Ladders Date: February 14, 2004 8:10:30 AM PST To: eemcd@mac.com There is indeed a much faster, linear, method for generating the neighbors list. The idea is this: for each word, blank out successive letters, and use these blanked out words to match for neighbors. e.g. if the words are abba and abbe, _bba _bbe a_ba a_be ab_a ab_e abb_ abb_ So if you have a (m,n) matrix of words, in the matching you'd be dealing with a ((n*m),n) matrix and doing linear operations on it, rather than the (m,m,n) outer product. mlx =: <@I.@(<:@#@[ = +/@|:@:="1)"1 2~ NB. eemcd mlx1=: [: <@I."1 <:@{:@$ = +/ @: ="1 / ~ NB. eemcd mlx2=: 3 : 0 NB. hui 'm n'=. $y NB. # of words, # letters i=. n (* + i.@[ *"1 -.@])=i.n NB. indices for blanking successive positions t=. ,/ i{"_ 1 y,.'_' NB. blanked words p=. ,/@:(([ ,. -.~)"0 1)~ NB. verb for pairing j=. ; t <@p/. n#i.#y NB. group word indices per blanked words h=. ({."1 </. {:"1) j NB. unordered neighbors k=. ~. {."1 j NB. word indices corresp. to h (m{.h-.&.>k) /: k,(i.m)-.k NB. reorder ) mlx and mlx1 give identical results. mlx2 gives neighbor indices that are unordered, but is otherwise identical. The improved efficiency in mlx2 is more pronounced as the number of words gets large. alp=: a.{~97+i.26 x=: ~. alp {~ 4000 4 ?@$ #alp NB. table of 4000 'words' $x NB. 3980 after duplicates removed 3980 4 (mlx -: /:~&.>@mlx2) x 1 (mlx1 -: /:~&.>@mlx2) x 1 ts=: 6!:2 , 7!:2@] NB. to find time and space used ts 'mlx x' 81.9158 608640 ts 'mlx1 x' 12.1867 8.3887e7 ts 'mlx2 x' 0.490401 1.47846e6

As you can see, for a table of 3980 words, Hui's program is about 160 times faster. That's a lot. It does use twice the space, but it's worth it. I've been happily using `mlx2` since to form my link lists. It's always a joy to get instant results rather than "start the process, go out and mow the lawn, then come back and maybe it will be done."

### Two Schools of Thought

I'll discuss two different ways of building ladders, given a table `Tn`, and a links list `En`. The first way is the standard approach to finding paths in a graph: search forward from one of the words, the *starting* word, and work one's way through until the other word, the *goal* word is reached, or it is found that there is no path. This is the approach used in Edsgar Dijkstra's well-known algorithm. The function that uses this forward search I call `FL`, for 'forward ladders'.

The second way, one I used a quarter-century ago, started searching from both ends of the problem. This can be done because of the problem's symmetry: if I find the shortest path from white to black, I've also found the shortest path from black to white. My intuition told me that a forward search algorithm would take significantly more space, and possibly more time, than a two-way search. Somewhat fortuitously, the two-way search can be identified with the biblical ladder in Jacob's dream, with angels going up and down. Because of this, and in honor of my grandson Jacob, I'll call the two-way search the *Jacob's Ladder* way, and the associated function `JL`, for "Jacob's Ladder".

My reasoning was this: Suppose we use `FL`, starting with a one-by-one matrix, and that there are three links in every item of `En`. After one step, we have at most a three-by-two matrix; after two a nine-by-three ... and after eight we have at most a 6561-by-9 matrix. Suppose that we have now found the goal. We've looked at 6561 nodes.

If, on the other hand, we use `JL`, after four iterations forward and backward, we have at most two 81-by-5 matrices, meaning that we have, at most, 162 nodes. In each case we've taken eight steps. The backward steps begin with a node that is necessarily one of the 6561. The next backward step continue with 3 nodes, one of which is in that of the 2187 of the forward search. Likewise, the next backward step finds 9 nodes, one of which is in the 729 of the forward search. Next 27, one included in the 243. By the next step, finding 81 nodes, the forward search has also found 81, and one of the backward 81 nodes must be among the 81 in the forward search. Thus the forward search has worked with 40 times more nodes than the forward and backward search. I had to conclude that it would take less time and less space for the two-way search.

### The Case of the Mysterious Test

I wrote `FL` and `JL`, tested them, and found that my reasoning was correct, and communicated the results to Roger Hui. His response was dumbfounding. Earlier I had sent Roger a copy of the first fifty words in T4. He generated a two-column table called pairs giving all 1225 combinations of two out of fifty, and used this as the right argument in a timing test. He made the left argument `E` using `mlx2` on the first fifty words. This is what he saw:

$pairs=: 2 comb 50 1225 2 ts 'E FL"1 pairs' 0.94485 1.1991e6 ts 'E JL"1 pairs' 0.90665 1.1991e6

I tried this test on my machine and got the same kind of result: the forward search had essentially the same speed and the same use of space as Jacob's Ladder. This was completely contrary to all the earlier measurements I had made, but I couldn't deny what my own senses told me. For several days I was at a standstill - I had no idea what was wrong with my earlier measurements - or, less likely, whether there was anything wrong with Roger's measurements that I didn't understand.

### The Case Solved

There was to be a happy ending however. Roger had earlier told me that he knew a way to do pathfinding that was yards better than mine, and not only that, but would find the shortest paths for all possible pairs. I asked him if he had experimented with this yet. Then this message came:

A few minutes of experimentation with the 4-letter words reviews the fallacy of my approach. The transitive closure of that neighbors list took so long that I interrupted it after about 10 minutes. Further investigation reveals that most words are reachable from most other words. Starting from every way possible generates too much information (and too much redundant information).

If I had to choose, I'd choose the your original FL approach. JL is not sufficiently faster enough to warrant the extra complications.

This was interesting, but still left me baffled by the anomalous timing we both had seen. But then, the same day, came a new message:

Further ... The up-and-down approach is enough faster after all...

ts 'i=: E FL 704 1407' 11.9041 3.04531e6 ts 'i=: E FL2 704 1407' 2.27514 1.66336e6 ts 'i=: E JL 704 1407' 0.881336 515264

The test he now used, timing for the pair 704 1407, was one that had eighty(!) nine-node solutions. The first test above used my original `FL`; the second used `FL2`, his rewriting of `FL`; the third used my original `JL`. This was very welcome news, but what about those anomalous timings? How explain them? I looked at all aspects of the data and I believe I can now explain it. The anomalous results occurred because the 1st 50 words had no link list longer than 5, and most were 3 or less; the average number of links per node was 1.8; 18 out of the 50 link lists were empty, a very large percentage; thus the timings reflected an atypical set, one in which the up-and-down program couldn't show itself better than the forward search. The best demonstration is to compare the results of a test using `E` which has only 50 items, with tests using `E4`, which has 2962 items. Here are the tests using `E`, showing `FL` and `JL` essentially equal in time and space:

ts 'E FL"1 pairs' 0.94485 1.1991e6 ts 'E JL"1 pairs' 0.90665 1.1991e6

The only change in the next tests is using the `E4` of 2962 nodes. This averages 8.1 links per node, as many as 31; 91 are empty, only 3.2%. The distribution of links in the first 50 items of `E4` is also quite different from that of `E`. There are 155 links, an average of 3.1 per item, and one has as many as thirteen. Only seven are empty. Here are the tests using `E4`:

ts 'E4 FL"1 pairs' 674.971 6.14182e6 ts 'E4 JL"1 pairs' 24.1225 2.28742e6

To me, this is conclusive. With this one change, `JL` is 28 times faster than `FL`, instead of being equal. It uses less space, less than half as much as `FL`.

675%24 NB. time ratio 28.125 6.142%2.287 NB. space ratio 2.69

Just for the purpose of this paper I've made some more tests, using both `E4` and `E5`, with right argument 100 random pairs of nodes, with no node repeated. The results are what I have come to expect to get.

Tests of 100 random pairs from `T4` and `T5`, no node repeated,

2004 03 05

$E4 2962 pr4=: 100 2$200?#E4 NB. 200 distinct numbers from i. 2962 f4 =: ts 'E4 FL"1 pr4' j4 =: ts 'E4 JL"1 pr4' f4,:j4 58.3915 4.19264e6 NB. FL numbers 4.08996 2.68454e6 NB. JL numbers f4%j4 NB. ratio of test numbers 14.2768 1.56177 NB. JL 14 times faster, 1.5 times less space $E5 5604 pr5=: 100 2$200?#E5 NB. 200 distinct numbers from i. 5604 f5 =: ts 'E5 FL"1 pr5' j5 =: ts 'E5 JL"1 pr5' f5,:j5 69.0011 2.84186e6 NB. FL numbers 4.43224 843456 NB. JL numbers f5%j5 NB. ratio of test numbers 15.568 3.3693 NB. JL 16 times faster. 3.4 times less space

### Function Syntax and Use

The versions of `FL` and `JL` I wrote were gone over and tightened up by Roger Hui. Their syntax is:

R =: En FL a,b R =: En JL a,b

Where `a` and `b` are indices of words in some `Tn`. The result is an integer table, where each row is distinct, and the successive values in a row are pairwise links, with first item `a` and last item `b`.

] R =: E4 JL 2182 861 2182 628 617 616 589 810 859 861 2182 1505 1483 1455 812 810 859 861 2182 2619 2608 2573 812 810 859 861

To obtain the desired word ladder, use this result as an index to `T4`:

` <"2 T4 {~ R`

For example,

<"2 T4 {~ E4 JL 2182 861 +----+----+----+ |rips|rips|rips| |dips|lips|tips| |dies|lies|ties| |died|lees|tees| |deed|fees|fees| |feed|feed|feed| |fled|fled|fled| |flew|flew|flew| +----+----+----+

### Forward March

The program `FL` is easier to explain than the more intricate `JL`.

FL =: 4 : 0 's e'=. y u=.d=. ,s c=. ,.s while. -. e e. d do. d=. ; v=. (d{x.) -. &.> < u if. '' -: d do. _1 return. end. u=. u , ~. d c=. ((#&>v)#c) ,. d end. c #~ e=d )

The two word indices are `s` and `e`. In `FL`, these signify start and end, but in `JL` they don't have that mnemonic significance, since they start separate chains. The variable `x` is the links list, some `En`.

Variable `d` is an integer list, initially `,s`. It is used to select boxed lists of potential new links. The first time through it gets all the links of `s`. Next time it gets all the links of those links, and so forth. Variable `u` contains all of the links already seen. No link appears in `u` more than once. Initially it is `,s`. It is used to ensure that no later use is made of a link that has already been used. This is because once a link is used in any step, there is no point to using it again in a new step - any chain with a later appearance of an earlier link must be longer than one with an earlier appearance.

Variable `c` is the table of chains, initially with `s` as its only value. Within the `while.` loop it will be extended. Each of its rows represents a potential shortest path.

The `while.` loop continues until `d` contains `e` as an item; when it does, it means that one or more shortest paths have been found.

Variable `d` is used to select boxed lists of links from `x` . Before further use, each box has removed from it all links that have already been used. These cleaned-up boxes are assigned to `v`, the raze of which becomes the new link selector list `d`.

If `d` is empty, there are no new links, and since `e` hasn't been found, we'll have to admit that there are no paths between `s` and `e`. When this happens, the scalar `_1` is returned.

Variable `u` is updated with all the new links, by appending `d` to it. Because `u` should never contain two appearance of the same link, duplicates are removed from `d` before appending it.

Table `c` is updated by adding a new column, with the items of `d`.

When the `while.` loop ends, a selecting mask is formed by the equality of scalar `e` and list `d`, and this mask is used to remove from `c` all those rows not ending in `e`. This gives the desired result.

### The Angels of God Ascending and Descending

Since `JL` goes forwards and backwards, there are separate variables for the forward and backward sequences. Instead of `c` we have `sc` and `ec` - two chains; instead of `u` and `d` we have `su` and `eu`, `sd` and `ed`.

The `while.` is different - it says, effectively, "while forever", since the loop continues as long as the value of the `while.` phrase is 1. The exiting from the loop will take place by way of `if.` statements.

JL =: 4 : 0 sc=. ,. su=.sd=. ,{.y ec=. ,. eu=.ed=. ,{:y while. 1 do. if. +./ sd e. ed do. break. end. if. '' -: sd=. ; v=. (sd{x.) -.&.> <su do. _1 return. end. su=. ~.su,sd [ sc=. sd ,.~ (#&>v)#sc if. +./ sd e. ed do. break. end. if. '' -: ed=. ; v=. (ed{x.) -.&.> <eu do. _1 return. end. eu=. ~.eu,ed [ ec=. ed ,. (#&>v)#ec end. sc join ec )

The variables ending in `u`, `d` and `c` have the same functions as the `u`, `d` and `c` variables in FL. The first and the last three statements in the `while.` loop have almost identical structure. With two path tables being built in the same loop, the test for termination is by finding that the same item or items appear in `sd` and `ed`. The test for "no path found" is essentially the same as in `FL`, but there are separate ones for `sd` and `ed`. An important difference is that `sc` is built from left to right, but `ec` is built from right to left. This makes joining the two easier.

### Linking the Chains

When the `while.` of `JL` terminates, the fitting together of the two path tables requires some agility. It is complicated enough so that a special join function has been made. It was written by Roger Hui as a rewrite of a `joinends` function provided to me by R. E. Boss when I sent a message to the J forum explaining the problem and asking for a solution.

join=: 4 : 0 x=. x#~ ({:"1 x) e. {."1 y y=. y#~ ({."1 y) e. {:"1 x (({."1 i){}:"1 x) ,. ({:"1 i){y [ i=. (0,#y)#:I.,({:"1 x)=/{."1 y )

The variables `x` and `y` are the forward and backward chains, respectively. We have reached this point because one or more items of the last column of `x` and the first column of `y` match. We want to keep only those rows containing these matching items. The first two lines remove from `x` and `y` all the rows that don't have matching values in them. I'll invent an `x` and a `y` and go slowly through the steps that lead to the desired result.

Here are the two:

x 200 300 400 0 1 130 2 3 120 4 5 130 6 7 120 8 9 130 500 600 700 y 500 600 700 800 130 2 1 0 120 5 4 3 120 8 7 6 120 11 10 9 120 14 13 12 130 17 16 15 900 1000 1100 1200

Only five rows of `x` and six of `y` match. First we remove the rows of `x` that don't end in one of the matching values:

] x=. x#~ ({:"1 x) e. {."1 y 0 1 130 2 3 120 4 5 130 6 7 120 8 9 130

And similarly, remove the rows of `y` that don't begin with one of the matching values.

] y=. y#~ ({."1 y) e. {:"1 x 130 2 1 0 120 5 4 3 120 8 7 6 120 11 10 9 120 14 13 12 130 17 16 15

Now we have to maneuver to get the rows of `x` ending in 120 in line with those of `y` beginning with 120, and similarly for 130. First we compare for equality the tail of each row of `x` with the head of each row of `y`:

({:"1 x)=/{."1 y 1 0 0 0 0 1 0 1 1 1 1 0 1 0 0 0 0 1 0 1 1 1 1 0 1 0 0 0 0 1

This is raveled and the indices of 1s found:

I.,({:"1 x)=/{."1 y 0 5 7 8 9 10 12 17 19 20 21 22 24 29

We convert this into their base `#y` representation.

] i=. (0,#y)#:I.,({:"1 x)=/{."1 y 0 0 0 5 1 1 1 2 1 3 1 4 2 0 2 5 3 1 3 2 3 3 3 4 4 0 4 5

It's another Magical Matrix^[1]^. The first column of `i` is used to select rows from `x` and the second to select rows from `y`.

Use the second column to select rows from `y` in the right quantity and in the right order:

({:"1 i){y 130 2 1 0 130 17 16 15 120 5 4 3 120 8 7 6 120 11 10 9 120 14 13 12 130 2 1 0 130 17 16 15 120 5 4 3 120 8 7 6 120 11 10 9 120 14 13 12 130 2 1 0 130 17 16 15

and use the first column to select rows of `x`, at the same time removing the last column; it merely repeats the first column of `y`.

(({."1 i){}:"1 x) 0 1 0 1 2 3 2 3 2 3 2 3 4 5 4 5 6 7 6 7 6 7 6 7 8 9 8 9

Lastly, stitch these together, and we have the desired result:

(({."1 i){}:"1 x) ,. ({:"1 i){y [ i=. (0,#y)#:I.,({:"1 x)=/{."1 y 0 1 130 2 1 0 0 1 130 17 16 15 2 3 120 5 4 3 2 3 120 8 7 6 2 3 120 11 10 9 2 3 120 14 13 12 4 5 130 2 1 0 4 5 130 17 16 15 6 7 120 5 4 3 6 7 120 8 7 6 6 7 120 11 10 9 6 7 120 14 13 12 8 9 130 2 1 0 8 9 130 17 16 15

And we've matched the three 130s in the last column of `x` with the two 130s in the first column of `y`, giving six rows; and matched the two 120s in the last column of `x` with the four 120s in the first column of `y`, giving eight rows, which makes fourteen rows altogether. This contrived example shows a solution where there are fourteen different ladders with six links each, giving the links we can use to select the words that make Word Ladders.

### Acknowledgements

R.E. Boss provided a workable joining function on the same day that I sent a message to the J forum asking for one. Norman Thompson gave me many ideas about finding paths in graphs. Roger Hui took my ugly ducklings and turned them into beautiful swans, and, by jumping to a conclusion, made it possible for me to understand better the structure of the problem.

### Reference

^[1]^ McDonnell, E. E., The Magical Matrix. *Vector* 20, 2, (2003-08), 122-126.

### Errata

The verb `comb` is used in the article but not defined. Below is a suitable definition, for others see Essays/Combinations

comb=: 4 : 0 c=. 1 {.~ - d=. 1+y-x z=. i.1 0 for_j. (d-1+y)+/&i.d do. z=. (c#j) ,. z{~;(-c){.&.><i.{.c=. +/\.c end. )

The list of 50 words from Eugene's word table `T4` that he sent to Roger Hui was:

T4=: _4]\ (' ',LF)-.~ ]0 : 0 'tis abba abbe abed abel abet abib able ably abut ache acid acme acne acre acts acyl adak adam adar adds aden adit aery afar afro agar aged ages agio agni agog agra ague ahab ahem ahoy aide aids aims ainu aire airs airt airy ajar ajax akee akin alai )

Download all Eugene's zipped Tn files - courtesy of Joey Tuttle.