?sdrawkcab ti gnitteg ew erA

So, Voynicheros know well enough that the composition of Voynichese words is governed by a set of rules which determines which characters are allowed or required to follow which. These rules form either a kind of “grammar”, or a set of building blocks which make up the VM words, depending on which way you look at it. Of course everybody has their own ideas of what these rules actually look like, from the Core-Mantle-Crust paradigm of Jorge Stolfi to the feeble attempts of yours truly. As usual, there is no universally accepted view of what the grammar actually looks like, and for every set of rules there appears to be any number of counterexamples.

Either way, it appears that there are “mandatory” letter groups in each word, and “optional” letter groups, and, at least as far as text in Currier B language is concerned, it appears to me that the optional groups precede the mandatory groups in each word. In other words, an optional start group is followed by a mandatory end group.1

Now, combine this with the observation that Voynichese entropy is below that of natural languages, i.e. the “predictability” of the next letter in a word is higher than would be anticipated.2 This in turn indicates to me that a Voynichese “word” is not equivalent to a plaintext word (as I have pointed out in my Face value fallacy page), but rather “less”: a number of letters only? (This would fit in with my Strokes theory, in case you haven’t noticed.)

Let’s assume for the moment that each VM “word” represents two plaintext letters — one is enciphered in the start group, and one in the end group — but the letters are sorted backward. This backward sorting could happen for the whole text (the whole paragraph, the whole line, …), or it could happen only within one VM word. So, for example, let’s encipher “voynich” this way: We reverse letter order and group the letters into “chunks” of two. The cipher sequence could in this case be

hc in yo v (reverse word-wise)

or

ov ny ci h (reverse chunk-wise)

From here we proceed with individually enciphering each letter into one Voynichese group (start group or end group), however that is actually done.
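The two reversal variants can be sketched in a few lines of Python. This is only an illustration of the chunking step; the function names are made up for this sketch, and the actual mapping of letters to Voynichese groups is left open, as in the text.

```python
def chunks_of_two(text):
    """Split text into consecutive two-letter chunks; the last one may be a single letter."""
    return [text[i:i + 2] for i in range(0, len(text), 2)]


def reverse_word_wise(word):
    """Reverse the whole word first, then cut it into chunks of two."""
    return chunks_of_two(word[::-1])


def reverse_chunk_wise(word):
    """Cut the word into chunks of two first, then reverse each chunk."""
    return [chunk[::-1] for chunk in chunks_of_two(word)]


print(" ".join(reverse_word_wise("voynich")))   # hc in yo v
print(" ".join(reverse_chunk_wise("voynich")))  # ov ny ci h
```

Either way, a word with an odd number of letters leaves one single-letter chunk at the end.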

But we notice that, since “voynich” consists of an odd number of letters, besides three two-letter chunks there is also one single-letter chunk. This is in accordance with our observation (hoorah!), but unfortunately it doesn’t explain why the end groups should be any different from the start groups (boo!), as both should be governed by the same set of rules. It only explains why some words consist of two groups, and some of only one.3

But what if odd and even letters in the plaintext were enciphered differently? For example, the plaintext letters could be converted to CaMeLcAsE before enciphering. In the case of the chunk-wise reversal (second example above) that would render

oV nY cI H

And with one final leap of faith, namely, that capital letters are enciphered somehow differently from lower case letters, ta-dah, we’re there: We have words of one start group and one end group, and words of only an end group, and the start and end groups are distinctly different from each other. Note that the difference doesn’t mean that start and end groups are made up of completely different character sets. Rather, there is considerable overlap between the VM characters used in start and end groups. The difference lies in the possible (“legal”) arrangements/sequences of the characters.
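As a rough sketch of this idea in Python: alternate the case, reverse chunk-wise, and then label each letter by the kind of group it would become. The labels “start” and “end” and all function names are assumptions for illustration only; how a letter actually turns into a Voynichese group remains unknown.

```python
def alternate_case(word):
    """Upper-case the 1st, 3rd, 5th, ... letter and lower-case the rest."""
    return "".join(
        ch.upper() if i % 2 == 0 else ch.lower()
        for i, ch in enumerate(word)
    )


def camel_chunks(word):
    """Alternate the case, cut into chunks of two, and reverse each chunk."""
    cased = alternate_case(word)
    chunks = [cased[i:i + 2] for i in range(0, len(cased), 2)]
    return [chunk[::-1] for chunk in chunks]


def group_kinds(chunk):
    """Lower-case letters would become start groups, capitals end groups."""
    return ["end" if ch.isupper() else "start" for ch in chunk]


for chunk in camel_chunks("voynich"):
    print(chunk, group_kinds(chunk))
# oV ['start', 'end']
# nY ['start', 'end']
# cI ['start', 'end']
# H ['end']
```

Every chunk ends in a capital, so every resulting “word” ends in an end group, and the single-letter chunk consists of an end group only.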

This could be explained in view of the strokes: some strokes occur only in upper case letters, some occur in both upper and lower case letters. Slashes “/” and backslashes “\” for example show up mostly in capital letters (“M”, “N”, “W”, “V”) and rarely or not at all in lower case letters. A vertical line “I” will show up in both sets of letters (“B”, “D”, “E”, … as well as “b”, “d”, “l”, …), but more frequently in upper case. A tiny circle “o” is mostly relegated to lower case letters, and so on.

And, once more here I’m at the end of my wisdom. This all looks fine and reasonable to me, but how to put it to the test? The “forward” way of synthesizing the way the plaintext letters were broken up into their constituent strokes, and how these strokes were in turn substituted with VM letters, offers too many alternatives. The “backward” way of deducing what exactly constitutes a start and an end group, is difficult — lots of people have tried, but none seems to have come to a convincing conclusion.

Hm.


P.S.: Just when I hit the send button, it occurred to me that letter reversal isn’t even necessary. As long as you make sure that the last letter in a word is enciphered as a capital letter, you’re fine. In other words, encipher your plaintext such that it is “lower case, upper case”, and only at the very last chunk make it “lower, upper” if there are two letters left to encipher, or only “upper” if it’s only one letter.

  1. This determination was mostly made by “gut feeling”. I haven’t given it a thorough statistical check. ↩︎
  2. This can be attributed to any number of reasons, starting with a faulty recognition of the VM character set. ↩︎
  3. One could ask: Why even bother with the single letter chunks? They could be introduced to denote plaintext word boundaries, but obviously they would fulfill that purpose only in 50% of all cases. ↩︎

Margin of Error

A little while ago I presented my “master plan” on how to crack the VM, and top priority on the list was a clear definition of the character set that was used when writing the VM.

Obviously, all other methods of analysis must follow this definition, and any mistake in defining this set will have grave consequences. The fiendishness of the VM lies, among other things, in the very fact that we are in the dark about whether two similar letters are identical or distinct, and whether they even are two letters or a single one.1 Any statistical attack on the VM will depend on the accuracy of those (let’s face it) educated guesses. We can only answer how frequent individual letters are if we know which individual letters exist, and the adage that the VM words show less entropy and are more “predictable” in their letter sequence than natural language words will depend on the assessment of how many letters there really are.

As an example, take this screenshot from f103r. (You can peruse it for yourself in context with Jason Davies’ Voyager at the coordinates https://www.jasondavies.com/voynich/#f103r/0.275/0.391/5.00)

In EVA transcription, the top two words will read <okain ShcKhody>, the two words in the row below will be <ShecKhy lchedy>. The word in the third row is <okeal>.

We find that the second word begins with two (barely) connected “c”-shaped letters with a teardrop above them. The same combination is found at the beginning of the first word in the second row. The connected “cc”s are repeated in the second word in the second row, but without the teardrop. Once a “lone” “c” follows the “cc” with the teardrop (first word, second row), and once the “cc” without teardrop.2 In the center of three words there is a gallows character extending far above the baseline (<k> or <K> in EVA3). In the first word, first row, it stands alone, but in the second word in that row and in the first word on the second row, the gallows character is “flanked” on both sides by (and connected to) “c” shapes.

So, what does this all mean? Are all these “c” shapes really just the same character? Are the two connected “c”s a ligature of two individual “c”s, or something different? The situation is similar to handwriting of the Latin alphabet, where two connected vertical arches could mean “nn” or “m”, or “uu” or “w”. Is the teardrop above the “cc” just a modifier of the “real” “cc”, or does it make the assembly a completely new character? The first case would mean the two relate to each other like vowels to their umlaut (“A”/“Ä”, where the differences are negligible), in the second case they would be two completely unrelated entities (“O”/“Q”). Likewise, is the gallows flanked by the “c”s really just a sequence of “c”, “gallows”, “c” (and thus three letters) or is it one letter only, something completely new and unrelated to the standalone gallows? We don’t know, and the number of possibilities is what makes the VM so maddeningly frustrating.

But. There might be a crib into the system. Namely, we have a number of “key” sequences in the VM where there are strings of (what appear to be) isolated letters, mostly in the page margins. Could we analyse those in the hope that they will give us a hint to the actual character set?4 Let’s have a look at what we have:5

  • f49v: A column of letters to the left of the body of the text, one letter per row of text body, 26 characters. Arabic numbers 1 to 5 to the left of the “keys”.
  • f57v: Four concentric rings with inscriptions. Counting from the center, rings 1 and 3 contain (mostly) only what appear to be individual letters, rings 2 and 4 contain combinations of letters and words. The sequence in ring 3 is repeated several times.
  • f66r: Similar to f49v, but 35 letters. To the left of that there is a column of complete VM words.
  • f69r: At the center of a star-shape (diatom?), five slots with one letter, one slot with two characters.
  • f75v: In the top right of the page, a column of six letters (five different) next to the body of the text.
  • f76r: Nine letters in a column to the left of the body.

All but the last two occurrences of the “keys” were written in Currier language A.

Here is my overview of which characters appear in which key sequence:

There are also a few special characters and sequences to be found within the keys:

The special characters are most prominent in the f57v-ring sequences:

This is a magnified section of ring 3 on f57v. From left to right it shows: the “inverted v” (<v>), the “picnic table”6 (<x>), three gallows characters, the “reclining figure”7, two more characters and the “switch” character.8

A few observations here: (As usual with the VM, these are much more questions than answers.)

  • If you just spell out the letters of a key, they don’t render typical Voynichese words. In other words, the sequences are not simply regular VM text written top to bottom.
  • One thing which immediately struck me is the odd distribution of letters. Many very frequent characters in the body of the VM are poorly represented in the keys, and vice versa. Unfortunately I don’t have the slightest idea what this might mean. Maybe the keys are really “titles”, and the special characters are ornamental “initials” of sorts? But the keys aren’t displayed very prominently and are sometimes hidden within the body of the page (like f69r and f75v), so they don’t stand out well.
  • The gallows characters in general exhibit a behaviour distinct from “regular” characters. Here, at first glance they appear like any other letter. On f57v they are drawn in very peculiar shapes though.
  • What are <air> and <aiin> doing here? Are they really individual characters, rather than composite words?
  • Letters <c> (“c” with a long top stroke towards the right) and <S> (“c” with a teardrop) appear isolated here. This is particularly odd, because outside the key sequences they’re mostly confined to combinations with <h> (a “c” with a connection to the preceding letter). <S> resembles <s> (which looks like Latin “S”). If they are really the same, it would explain why <S> should show up standalone in the key sequence, but it would not explain why it’s otherwise paired with <h>.

What to make of it? I don’t have the slightest idea. After a thorough perusal of the key sequences, I’m still further from arriving at a definitive glyph set than ever, and the questions have multiplied rather than gone away.

I’m still intrigued by the Stroke idea, which basically says that each Voynichese letter does not represent one letter from the plaintext, but one penstroke of a plaintext letter — namely, a vertical line, a small circle (“o”), a dash, a slash or such which, when combined, render complete letters again. In that case, the rare VM letters (like the reclining figure or the switch) might represent graphical elements which are very rare in regular text, like a degree symbol “°”, or a caret “^”, and would be used only alongside rare plaintext letters. But why should these show up dominating the key sequences?

Your thoughts?


  1. Compare this to the ciphers of the Zodiac killer or even the Rohonc codex, which at least don’t present that problem. ↩︎
  2. In EVA, the curious case of the connected “c”s is treated thus:
    A single “c” is transcribed as <e>
    Two connected “c”s are transcribed as <ch>
    A single “c” with a teardrop is <S>, and
    Consequently the “c”-teardrop-“c” combination is rendered <Sh>. ↩︎
  3. I’ll use angled brackets and bold typeface for characters in EVA representation throughout. ↩︎
  4. Under the explicit assumption that these “keys” were not later additions from different scribes like the rest of the marginalia, but “original” emendations by the VM author. Only if these keys are authentic then may we assume that they’ll hold relevant information. ↩︎
  5. As always, Julian Bunn also has given this topic a brief treatment in his book “Puzzles of the VM”, p. 24ff. ↩︎
  6. Despite being comparatively rare, there are possibly different “sub-shapes” of the picnic table, namely “walking left” and “walking right”, depending on the direction of tiny serifs at the bottom of its legs. You can find both variations towards the bottom of the key sequence in As usual, this could be different characters, or it could be simply variations in writing. In the latter case this would be odd, because it could mean that there was one left-handed and one right-handed scribe at work even within the same key sequence. ↩︎
  7. That’s the name given by Julian to this character. I’m pretty certain he was inspired by the Mayan Chacmool statues. ↩︎
  8. AFAIK the reclining figure and the switch are so rare that they don’t even have their own EVA representation. ↩︎

A simple verbose cipher

Lately I was invited to hold a presentation for lay people about the Voynich manuscript, and I felt I needed to explain the concept of “verbose ciphers”. Impromptu I came up with a little cipher idea:

  1. Write down your plaintext, like
    ATTACK AT DAWN
  2. Introduce a space after each of the plaintext letters:
    A T T A C K A T D A W N
  3. Into the spaces, write the letter which is next in the alphabet to the preceding letter:
    ABTUTUABCDKL ABTU DEABWXNO
  4. Run this through a Caesar cipher1 (in this case with a “+7” key), and voila:
    HIABABHIJKRS HIAB KLHIDEUV

Deciphering is super easy, because you simply discard every other letter from the ciphertext, and apply a reversed Caesar cipher to what is left.2
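The four steps and the deciphering can be written down as a short Python sketch. The function names are made up for illustration, and one detail is my own assumption because the recipe doesn’t cover it: the letter “after” Z wraps around to A.

```python
import string

ALPHABET = string.ascii_uppercase


def shift(letter, key):
    """Move a letter `key` places through the alphabet, wrapping around at Z."""
    return ALPHABET[(ALPHABET.index(letter) + key) % 26]


def pad(plaintext):
    """Steps 2 and 3: follow every letter with its successor in the alphabet."""
    return "".join(
        ch + shift(ch, 1) if ch in ALPHABET else ch
        for ch in plaintext.upper()
    )


def caesar(text, key):
    """Step 4: shift every letter by `key`, leaving spaces untouched."""
    return "".join(shift(ch, key) if ch in ALPHABET else ch for ch in text)


def encipher(plaintext, key=7):
    return caesar(pad(plaintext), key)


def decipher(ciphertext, key=7):
    """Keep every other letter of each word, then undo the Caesar shift."""
    return " ".join(caesar(word[::2], -key) for word in ciphertext.split(" "))


print(pad("ATTACK AT DAWN"))                # ABTUTUABCDKL ABTU DEABWXNO
print(encipher("ATTACK AT DAWN"))           # HIABABHIJKRS HIAB KLHIDEUV
print(decipher(encipher("ATTACK AT DAWN"))) # ATTACK AT DAWN
```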

This scheme has a number of interesting properties:

  • It reduces entropy (“information content”)
  • It increases word length
  • It smears frequency distribution, since the relative frequency of any letter is now its “original” frequency plus that of the letter preceding it in the alphabet, divided by 2
  • It would have been well within the scope of a medieval cryptographer
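The claim about the smeared frequency distribution can be checked numerically. The following is a small sketch under the same assumption as before (the successor of Z is A, so the “preceding” letter of A is Z); it looks at the padded text before the Caesar step, since the Caesar shift only relabels the letters.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase


def pad(text):
    """Follow every letter with its successor in the alphabet (Z wraps to A)."""
    return "".join(
        ch + ALPHABET[(ALPHABET.index(ch) + 1) % 26]
        for ch in text
        if ch in ALPHABET
    )


def relative_frequencies(text):
    """Relative frequency of each alphabet letter in the text."""
    counts = Counter(text)
    total = sum(counts.values())
    return {letter: counts[letter] / total for letter in ALPHABET}


plain = "ATTACKATDAWN"
before = relative_frequencies(plain)
after = relative_frequencies(pad(plain))

for i, letter in enumerate(ALPHABET):
    predecessor = ALPHABET[i - 1]  # index -1 wraps around to "Z"
    expected = (before[letter] + before[predecessor]) / 2
    assert abs(after[letter] - expected) < 1e-12

print(before["T"], after["T"], after["U"])
```

In the sample, “T” makes up a quarter of the plaintext but only an eighth of the padded text, and “U”, which never occurs in the plaintext, is suddenly just as frequent as “T”.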

I’m not sure if I’m onto something here. The above features certainly sound familiar to Voynicheros, and what I particularly like about it is that a 15th century author could have come up with an algorithm like this, and it requires no stretches of the imagination on that part. OTOH, a lot of features of the VM remain unexplained. (Word-initial/-terminal letter sequences, gallows, strings of similar words, …)

Your thoughts? Is this algorithm already known, and was I just unaware of it?

  1. In my class last year my topic was “Cryptography in the middle ages”, so they had better have paid attention. ↩︎
  2. In the case of the VM, you wouldn’t even have to do that, because the use of the cipher alphabet would already have done that for you. ↩︎

Stories of a middle-aged man

Last weekend I attended an event of my local medieval reenactment group, Knights Crossing, which is a chapter of the international Society for Creative Anachronism. Aside from having the honour of heralding court for our baronesses Swanhilde and Anke, I also held a little class on the Voynich manuscript, which I used to gather the current state of research into the manuscript, and also tout my personal pet, the Stroke theory, again.

Here is the slide set, I hope it’s comprehensible even without my running commentary. (Within the SCA, I’m known as “Agilmar”, so this is where mention of the name comes from.) Enjoy!

Thanks to my wife Sina for cleaning up the mess I had made of my PPT!

Voynich’s Third Column

And again, despite my constant pledges to be more active, it’s been some time since my last post, mostly because I was kept busy by mundane affairs.

Nevertheless, despite my respite, the world hasn’t stopped turning, and I lately came across the blogpost by Lisa Fagin Davis, who announced that finally, after about a decade of delays,1 the multispectral images taken from the VM have been released to the public!

Read Lisa’s article by all means, since it contains a wealth of information. One feature which struck me is the examination of f1r, the first page of the VM, which contains the notorious “de Tepenecz” signature. Less attention has been given to some other marginalia on this folio, namely two columns of extraneous writing on the outer edge of the folio: One is a sorted list of letters of the Latin alphabet, the other is a list of different letters from the VM’s character set. These lists have been rendered partly illegible, and the general assumption is that they are remnants of some (hapless, as we all are) sod’s attempts at deciphering the VM.2

Top right of f1r with beginning of the column of Latin letters, and the (barely legible) first VM letters in visible light

All of this is nothing new and short of spectacular. But what the multi-spectral imaging reveals is that there is actually a third column of letters, invisible to the naked eye.

f1r after multispectral enhancement. The third column of letters is clearly visible. (Sorry for the inconvenient format.)

This is another sorted list of Latin letters, shifted by one place compared to the first one. (A Caesar cipher comes to mind.) This in itself is not spectacular and only shows the tenacity of our decipherer. But in a greater context it has implications for Rich Santa Coloma’s theory that Voynich himself forged the manuscript.

Rich has over the years gathered circumstantial evidence to show that Voynich could possibly be the real author behind the MS, and has shown how it could have been done. In his latest blog post, he has tackled the question whether Voynich could have been behind the letters which make up the provenance of the VM.3 At the same time, there is no “smoking gun” which would clearly point to a forgery — the C14 implies that the vellum was spot on for the early 15th century provenance, and Voynich made no mistakes in mixing his ink, although he couldn’t have known at the time that there would be later examining technologies for that. So, at the very least Voynich made some lucky shots in composing the VM.

But the “third column” in my understanding poses a real problem for the forgery theory — the question is not so much why Voynich would have included it, but why did he make efforts to efface it later on?

A reason for the inclusion is readily conceivable, namely the same as the first two columns, to make it look like someone had tried to crack the VM, and make the MS appear older than it actually is. But why eradicate part of the marginalia?

They contain no damning information that would have exposed the fraud, no internal inconsistencies, nothing incriminating — it’s just a string of letters. If the VM is genuine, then it is plausible that different parts of the page were exposed to sunlight/chemicals/… whatever differently over the course of their six century run, and that parts have faded, and some have been completely obliterated. But this is much more difficult to explain for a forgery. Voynich may have experimented with different procedures to make his forgery look weathered, but it’s a stretch to assume he did so with the final VM pages, rather than with a test scrap.

It boils down to the one problem I have in general with the idea of a Voynich forgery, and that is the incomprehensible text.

If I wanted to forge and sell an MS, I would make it spectacular. If it’s supposed to be a Roger Bacon manuscript, make it abound with steam engines, telescopes, and prophecies, everything to attract the curious eye.4 Make the illusion perfect by putting this information in the text as well. But what is the point of devising an uncanny scheme of enciphering instead which over a century later still hasn’t been cracked?5 Or, on the other hand, let the pictures do the talking, and don’t bother with the text, but simply write down gibberish. But for that, the VM text is much too structured, with all the intricate mechanisms we have observed in the structure and distribution of words.

And I have the same feeling about Voynich’s “third column.” It can be explained how it came to be there (even in the case of a forgery), but it’s difficult to establish a motive for Voynich to put it there. Which is why I think that the third column is a serious problem for the forgery theory.

  1. I sense a conspiracy theory in the making. ↩︎
  2. One observation by Lisa is noteworthy in this context, namely that the VM letters appear in no particular order in their column. This is peculiar, because there are a few “lists” of different VM letters throughout the manuscript. If this truly is a deciphering attempt, then why did our wanna-be codebreaker not start from one of those given lists, assuming they were “alphabetically” ordered? How did he arrive at the presumed correspondence of VM and Latin letters? ↩︎
  3. Which is where I know very little about the VM, and here I still owe you a reply, Rich! ↩︎
  4. While we do have a good number of odd images in the VM, none of them unequivocally point to later era technologies. ↩︎
  5. It’s just conceivable that Voynich bungled here — he could have accidentally invented a “one-way enciphering” mechanism where the plaintext simply isn’t retrievable in an unambiguous manner — anagramming mechanisms come to mind. So while the information would still be there, it would be as unobtainable as if hidden in a Black Hole. But that blunder would mean that Voynich went on such an ambiguous enterprise without once testing whether his enciphering scheme actually worked both ways. It also leaves open the question why nobody has been able to even understand Voynich’s enciphering mechanism. ↩︎

Julian Bunn’s Introduction Booklet

In the course of my renewed interest in the Voynich, I browsed also through my old notes and literature. On my bookshelves, I was reminded of Julian Bunn’s small booklet, Puzzles of the Voynich Manuscript.

It’s a most readable introduction to the Voynich from the point of view of a new acolyte who wants to tackle the VM. Julian covers most of the relevant features of the VM in a concise and easy-to-grasp format, so you will know what you’re up against if you attempt any decipherment. Checking off this list of features will also help you in a first “sanity check” if you have come up with an idea: if your theory doesn’t accommodate the observed effects, it probably won’t hold water.

At the same time Julian isn’t championing any particular idea, but in a most unbiased manner presents ideas and arguments from other people and weighs their pros and cons. If one had to find any flaws with it, I’d say the layout could have done with a little more TLC. The font is a little too big, the pictures are a little too small, the pagination sometimes is awkward. At A4 the book is a bit unwieldy, a more compact pocket format “vademecum” might have served better, but YMMV. Quite obviously Julian prioritized content over presentation, which need not be a bad thing.

Overall, get the book. You won’t regret it, it’s a short read of highly concentrated insight to kickstart your VM research career.

You may obtain it from Amazon through the link above, for free as a digital version, or for a small obol in printed form.

Putting the Cart Before the Horse

Lately I complained about the fact that all the “old guard” of the VM apparently had quit their occupation, but I found out that I have thoroughly misunderstood the situation. The venerable Julian Bunn for example is still as number crunching as ever, and the last time I checked on his blog I simply had the bad luck of visiting him when he was in a period of inactivity. Likewise the highly esteemed Rene Zandbergen has published a new paper about the VM, and both seem to be working on a method of recreating the VM content semi-automatically by use of a number of wheels containing “syllables” which are constantly re-arranged to form VM words, a “Voynich slot machine”, if you pardon the expression. So, their work goes in the same direction as mine, though Rene’s approach appears to be more focussed on a three-component approach to word composition, rather than the two-component of Robert Firth and me.

Regardless, both approaches try to compose the VM vocabulary from a limited number of building blocks. When perusing Rene’s essay, it occurred to me that it might be helpful “to put the cart before the horse”.

What if we used these composition methods to see which of the words they could compose do not actually occur in the VM? Would we gain insight from that? Robert’s method, taken naively, will only compose some 500 different words (compare to some 3000 different words in total in the VM, but with many being very rare). Are all 500 of these frequent in the VM? Is it only 250, and the other 250 are never used in the VM?

If there were two sets of 25 building blocks, they could compose between them 625 different words. Are 500 of them frequent in the VM, and the other 125 non-existent? If we also measure the frequency of two-letter combinations in a natural language (say, Italian), and find that 125 are “forbidden” and don’t occur,1 do we have a strong indication that each VM word represents two letters from an Italian plaintext…?
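A check for exclusion of this kind is easy to sketch in Python. The building blocks and the word list below are invented toy data, not Robert’s actual blocks and not a real transcription; the function name is hypothetical as well.

```python
from itertools import product


def unattested(starts, ends, observed_words):
    """Words the two block sets could compose but which never occur in the text."""
    composable = {start + end for start, end in product(starts, ends)}
    return sorted(composable - set(observed_words))


# Toy data for illustration only.
starts = ["qo", "o", "ch"]
ends = ["dy", "kedy", "l"]
observed = ["qody", "okedy", "chl", "ol", "qokedy", "daiin"]

print(unattested(starts, ends, observed))
# ['chdy', 'chkedy', 'ody', 'qol']
```

With two real sets of 25 blocks each, `composable` would hold at most 625 words (fewer if two different block pairs happen to spell the same word), and the length of the returned list would answer the question of how many of them are “forbidden”.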

It hadn’t occurred to me before that checking for exclusion rather than inclusion might be useful.


  1. E.g. in German, “c” is almost always followed by “h” or “k”, but not by any other characters. ↩︎

Would Elias Schwerdtfeger come to the courtesy phone, please?

Elias, if you read this (or if anybody else knows his whereabouts): I lately tried to access your wonderful and convenient Voynich transcription extractor (http://vib.tamagothi.de/index.php), but while it seems to still be up, it seems to have issues with the separation of Currier languages. (My guess is, the corresponding php script is no longer supported.) Can you fix this?

Or does anybody know of a comparable tool which lets you filter for certain VM transcriptions? Any help welcome.

Time for an Inventory

It’s that time of the year again — Christmas season. I’ve been very quiet on this blog for some time, but I’ve lately picked up some interest in the Voynich again, and I thought I’d use the opportunity to make an inventory of sorts — where do we stand, where do we want to go, and how will we get there? Other things aside, I opened this blog in January 2009, so it’s been around close to 15 years, and perhaps it’s time to reflect.

Oddly enough, since the hi-res scans from Beinecke became available and in the wake of the McCrone analysis of the VM’s physical properties, very little progress appears to have been made. IIRC I took an interest in the VM, and the mailing list back then was brimming with ideas and competent analysis, but AFAICT this has ebbed over the last years. The voices of Rene Zandbergen, Julian Bunn, Jorge Stolfi, Marke Fincher and many of the others of the “early days” of computer power being available for number crunching seem mostly quiet these days. Rich SantaColoma is still on his theory that the VM is a fake by Voynich himself, but AIUI he’s also kind of stalled in finding definite proof for this assumption. (Or did I miss breaking news, Rich?) Other than that, the same old ideas are being re-hashed endlessly, namely, that the VM is in one way or another the phonetic transcription of some kind of spoken language, usually Proto-Etruscan with a hint of Rongorongo, and the VM contains a selection of cocktail recipes for the Antipope’s personal barkeeper. Or some such thing.

This is even more frustrating since the very first thing you notice about the VM writing is that it is obviously different from all known natural languages. (At the same time, it can be transcribed so that it is almost pronounceable, which is what Rene and Gabriel Landini did with the EVA transcription, and this is a most remarkable thing, but nobody seems to have taken up the hint.) So, the people running down that alley simply didn’t even invest the time and effort to get their basics right, and I’m under the impression that all the serious researchers have given up by now (or at least continue their work in seclusion), and the field is left to the clueless.

Personally, I think it was as early as 2010 that I had the idea that the VM’s encipherment may be based on the graphic dissection of the source letters. Namely, the letters of the plaintext would be decomposed into their original graphic elements (lines, circles, arcs), or correspondingly into their penstrokes. Thus, the letter “A” would be disassembled into three elements, namely a slash “/”, a horizontal hyphen “-”, and a backslash “\”. These three elements could be reassembled into the original plaintext letter: “/-\”. Now all the VM author had to do was assign to each element one ciphertext letter and write that down: “/” might be represented by <q>, “-” by <o>, and “\” by <c>, so the plaintext letter “A” would turn out <qoc> in the VM ciphertext. And of course the graphical elements would turn up in different letters of the plaintext alphabet. “V” might be decomposed into “\/” and “W” into “\/\/”, so the ciphertext for “V” would be <cq>, and that for “W” <cqcq>. (Read the whole lengthy story here.) This scheme would have accounted for a large number of features of Voynichese, though not all of them.
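As a toy sketch in Python, the scheme boils down to two lookup tables. Both tables contain only the arbitrary example values from the paragraph above; they are illustrations, not a proposed solution. Applied stroke by stroke from left to right, the mapping turns “\/” into <cq>.

```python
# Plaintext letter -> sequence of penstrokes (illustrative examples only).
STROKES = {
    "A": "/-\\",
    "V": "\\/",
    "W": "\\/\\/",
}

# Penstroke -> ciphertext letter in EVA (arbitrary example assignment).
CIPHER = {"/": "q", "-": "o", "\\": "c"}


def encipher(plaintext):
    """Replace every letter by its strokes, and every stroke by its cipher letter."""
    return "".join(
        CIPHER[stroke]
        for letter in plaintext
        for stroke in STROKES[letter]
    )


print(encipher("A"))   # qoc
print(encipher("V"))   # cq
print(encipher("W"))   # cqcq
print(encipher("VV"))  # cqcq
```

The last two lines show the kind of ambiguity the scheme produces: “VV” and “W” end up as the same ciphertext.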

In any case, having built upon Robert Firth’s observation, I ran into a dead end. The scheme as suggested leaves open a huge number of degrees of freedom and ambiguities.1 Most naturally, we don’t know the source language of the VM, hence we can’t directly apply anything like frequency counts to the statistics. There are many different ways to decompose letters, and there are many different letter fonts you can choose from to start with (compare printed letters and cursive). So the apparently easy task of finding the 50 or so building blocks which make up the majority of the ciphertext leads to a virtually unlimited number of possible combinations one would have to try. So, I wasn’t even able to verify whether the Stroke theory was the right idea or whether I was barking up the wrong tree. And even if a Voynich fairy had told me, “Elmar, you’re on the right track”, I wouldn’t have known how to go on from there. This is where I stalled.

Anyway, maybe with a fresh look at things, I’ll finally have the grand idea which allows me to write a few lines of code, let the computer crunch away at the numbers for a few hours while I recline in my chair, and then let me reap success, fame and fortune. So, what needs to be done to reach that stage? Here are a few steps which I might tackle in the coming year, and see where this leads me:

  1. Check the VM character set. Of course, this lies at the heart of everything. Though the Stroke theory should not respond too sensitively to errors in the character set (thanks to Rene’s and Gabriel’s foresight, and such errors could be worked around), of course it would be best to set out with the correct set of characters from the start.
  2. Define a vocabulary for working with the Stroke theory. I feel in the past I have confused people tremendously with the assumption that everybody uses the same definitions for terms like “word”, “token” or “syllable.” I need to clarify the vocabulary to be able to lead meaningful discussions with people.
  3. Check Robert Firth’s assumptions about the building blocks. Hitherto I accepted the “set” of building blocks Robert claimed would constitute the majority of the VM text. But it certainly wouldn’t hurt to see if this is really the optimum building block set. Likewise, it would be reasonable to extend Robert’s work, which has been done on Currier language “A” to my knowledge, to Currier “B” as well.
  4. Check two-letter statistics. Under the original Stroke theory2, VM words can represent any number of plaintext letters. Robert claimed that most VM words represent a single or two plaintext letters.3 So it would be reasonable to compare plaintext language statistics of two-letter groups with the statistics for VM words. If, for example, “st” and “rt” turned out to be the most frequent letter pairs in a certain language, and if the most frequent VM words were <qocheedy> and <qokeedy>, it would be reasonable to assume that “s”, “r” and “t” were represented by the Firth blocks <qoch>, <qok>, and <eedy>, respectively.4 (Of course, not knowing which is the VM plaintext language complicates matters here once more.)
  5. Do more menial labor. Yes, this is what I like to do least. But perhaps I should finally get around to delve into the chores of finding out what the Stroke theory means for the VM language. If there is something to it, what can we deduce from that fact? For example, if we assume that a vertical line “|”, a horizontal hyphen “-”, and a circle “o” were used in the “(de-)construction set” it would immediately follow that sequences around “|” should abound. “I”, “P”, “B”, and “d” could be disassembled to “|”, “|o”, “|oo”, and “o|”, while “L”, “F”, and “E” would turn to “|-”, “|--”, and “|---”. The latter three groups could be equivalent to <in>, <iin>, and <iiin> (just in reverse order), but what about the others? Can they be found as well?
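The two-letter comparison from step 4 could start out as something like the following Python sketch. It rests on assumptions of my own: plaintext words are cut into non-overlapping pairs (a trailing single letter is ignored), and pairs and VM words are simply lined up by frequency rank. The sample text and the VM word list are toy data, not real statistics.

```python
from collections import Counter


def two_letter_groups(text):
    """Count non-overlapping letter pairs inside each plaintext word."""
    counts = Counter()
    for word in text.lower().split():
        letters = "".join(ch for ch in word if ch.isalpha())
        # A trailing single letter of an odd-length word is skipped here.
        for i in range(0, len(letters) - 1, 2):
            counts[letters[i:i + 2]] += 1
    return counts


def align_by_rank(pair_counts, word_counts, top=5):
    """Line up the most frequent pairs with the most frequent VM words."""
    pairs = [pair for pair, _ in pair_counts.most_common(top)]
    words = [word for word, _ in word_counts.most_common(top)]
    return list(zip(pairs, words))


# Toy data for illustration only.
plaintext = "most lost part"
vm_words = Counter(["chedy", "chedy", "chedy", "daiin", "daiin", "okal"])

print(align_by_rank(two_letter_groups(plaintext), vm_words, top=1))
# [('st', 'chedy')]
```

A plain rank alignment is of course far too naive to prove anything, but it would show quickly whether the two frequency curves have a similar shape at all.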

So, this might actually be my homework for the next few weeks and months. Let’s see where we’ll arrive at.


  1. E.g., as you see above, both “VV” and “W” in plaintext would lead to <cqcq> in the ciphertext, and it would be impossible to resolve this upon deciphering, though there would be ways around it. ↩︎
  2. It would probably be better to call it the “Stroke hypothesis”. ↩︎
  3. This would also fit in with the observation that most Currier B words can be split up in one mandatory and one optional part. See Grammar. ↩︎
  4. Please note that this and all other examples here are completely arbitrary, and were just the first things that came to my mind. I don’t expect any of this to hit on the actual truth. ↩︎