Jots by Kunal Sen

Kunal Sen

I was born in Calcutta and spent most of my youth out there, before moving to Chicago for my Ph.D. I still live in Chicago and work for Encyclopaedia Britannica. I met my wife, Nisha Ruparel, in high school. Nisha teaches Kindergarten at the University of Chicago Laboratory Schools.

A Dictionary for ET

Jun 28, 2010 03:04 AM UTC   submitted by Kunal Sen
I was in Springfield, Massachusetts, visiting the headquarters of Merriam-Webster, the oldest dictionary publisher in America and one of Britannica’s sister companies. While waiting for a meeting, I was paging through their most elaborate version, the Third International Edition, with almost half a million entries covering three thousand pages. As I leafed through the densely printed pages, an old thought came back to me:

What if we make radio contact with an extraterrestrial civilization, the only thing we can transmit is text, and we transmit the entire text of this dictionary? What could they learn from it?

A dictionary is a strange thing: it defines each word in terms of other words, all of which can also be found in the same dictionary. It is a perfect example of a totally closed system. Without the illustrations, it is as airtight as a closed system can be. Does such a system have any intrinsic information content? In other words, what can our extraterrestrial friends learn from this huge book? Anything? Something?
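The idea of a dictionary as a closed system can be made concrete with a small sketch. The entries below are invented for illustration and are not taken from any real dictionary. The check only asks whether every word used in a definition has an entry of its own, which is the sense of "closed" used above.

```python
# Toy dictionary: invented entries, not taken from any real dictionary.
TOY_DICTIONARY = {
    "big": "not small",
    "small": "not big",
    "not": "opposite",
    "opposite": "not same",
    "same": "not opposite",
}


def undefined_words(dictionary):
    """Return the words used in definitions that have no entry of their own."""
    headwords = set(dictionary)
    used = {word for definition in dictionary.values() for word in definition.split()}
    return used - headwords


def is_closed(dictionary):
    """A dictionary is closed if every word it uses is also a headword."""
    return not undefined_words(dictionary)
```

In this toy dictionary every definition leads only to other entries, so a reader can follow the definitions forever without ever reaching anything outside the book.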

What they can definitely learn, by analyzing all the sentence structures, is the syntax of the English language. There are known techniques to derive the syntax of a language from a large collection of sample sentences, and the dictionary is full of sample sentences. Moreover, it has definitions of each and every word used in the dictionary. Therefore it should not be too difficult for an intelligent race to figure the syntax out. With this knowledge the aliens can write an endless variety of perfectly correct English sentences. The question is, will they know anything about what the sentences mean? Most probably not, since there is no clue in the dictionary to figure that out. The illustrations could have been a clue, even just a few of them, but they were not part of our transmission. The closed system has no leaks through which the real universe can enter the closed world of tangled words. If we include the page numbers, though, it is almost certain that they can figure out our number system.

Taking it a step further, let’s say we transmit all the English language books in all the libraries of the world and, just to make sure we got it all, let’s also add the entire web; once again, just the text and nothing else. Will that give them any more to work with? Of course, now they have everything we have ever written in the English language: all of our literature, science, religion, philosophy, history, plus the mountain-load of web content we are creating every day, including this blog post. There are still no external clues. Our alien friends may be able to write flawless English now, and this time the text they produce will not only be grammatically correct; through clever statistical analysis of the vast collection, they may even be able to write more “meaningful” and better quality English. But still they will probably have no idea what they are talking about.
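A very small sketch of what purely statistical writing could look like is a bigram model: it records which word follows which in the received text and then produces new text by walking those records. The corpus, the function names, and the fixed random seed here are assumptions made for illustration; real statistical techniques are far more elaborate. The point is only that nothing in the code refers to what any word means.

```python
import random
from collections import defaultdict

# Invented stand-in for the transmitted text.
CORPUS = "the cat sat on the mat and the dog sat on the rug"


def build_bigram_model(text):
    """Map each word to the list of words seen immediately after it."""
    words = text.split()
    followers = defaultdict(list)
    for current, following in zip(words, words[1:]):
        followers[current].append(following)
    return followers


def generate(model, start, length, seed=0):
    """Produce up to `length` words by repeatedly picking an observed follower."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length - 1):
        choices = model.get(output[-1])
        if not choices:  # the last word was never followed by anything
            break
        output.append(rng.choice(choices))
    return " ".join(output)
```

Calling `generate(build_bigram_model(CORPUS), "the", 8)` yields a word sequence in which every adjacent pair was seen in the corpus, yet the program has no notion of cats, mats, or sitting.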

Let’s imagine we extend it even further by including all text written in all human languages, including all the side-by-side bilingual books and bilingual dictionaries. Now they may be able to work out the grammar of all known languages, and even be able to translate a piece of text from one language to another. But still they probably won’t understand a thing. It will not be too different from the automated translators that we use on the web: they do translate, purely on the basis of logic and statistics, without any understanding of the content.

However, if our text included mathematical texts, then it should be possible for them to get some very significant clues. A school arithmetic text that includes a few equations like “2 + 3 = 5” would let them figure out our number system and the meanings of the mathematical operators. This is not just because mathematical language is very precise, but because mathematics deals with universal and self-consistent truths. With that starting point, it is not only possible to figure out the rest of our mathematical literature, but it may also provide clues to some of the English words that are often used in mathematical texts, such as “if” and “then”. As in rock climbing, once you have a toehold, it is possible to conquer a lot more.
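Here is a hedged sketch of how a handful of arithmetic equations could pin down meanings by self-consistency alone. The glyphs and equations are invented, and the sketch assumes a good deal that a real receiver would have to work out first: that each line has the shape "operand, operator, operand, result", that each number glyph stands for a distinct single digit from 0 to 9, and that each operator glyph is one of addition, subtraction, or multiplication. Under those assumptions a brute-force search tries every assignment and keeps the ones that make all equations true.

```python
import itertools
import operator

# Hypothetical transmission: each equation is (operand, operator, operand, result).
# The glyphs are invented; the receiver does not know what any of them mean.
EQUATIONS = [
    ("q", "@", "w", "e"),
    ("r", "@", "r", "q"),
    ("q", "#", "w", "t"),
    ("r", "@", "q", "w"),
    ("r", "#", "e", "e"),
    ("w", "@", "w", "t"),
]

# Assumed candidate meanings for an operator glyph.
CANDIDATE_OPERATIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def decode(equations):
    """Return every (digits, operators) assignment consistent with all equations."""
    number_glyphs = sorted({g for a, _, b, c in equations for g in (a, b, c)})
    operator_glyphs = sorted({op for _, op, _, _ in equations})
    solutions = []
    for names in itertools.product(CANDIDATE_OPERATIONS, repeat=len(operator_glyphs)):
        ops = dict(zip(operator_glyphs, names))
        # permutations() gives each glyph a different digit.
        for values in itertools.permutations(range(10), len(number_glyphs)):
            digits = dict(zip(number_glyphs, values))
            if all(
                CANDIDATE_OPERATIONS[ops[op]](digits[a], digits[b]) == digits[c]
                for a, op, b, c in equations
            ):
                solutions.append((digits, ops))
    return solutions
```

For these six equations the search leaves exactly one consistent reading: "@" is addition, "#" is multiplication, and the glyphs q, w, e, r, t stand for 2, 3, 5, 1, 6. No such anchor exists for ordinary words, which is why the arithmetic text gives a toehold that the dictionary does not.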

If my conjecture is correct, then this is a bit counter-intuitive. The sum total of all the text we have collectively produced over the ages does not add up to anything more than a gigantic closed system with no real information value outside of itself. It is also interesting to contemplate the opposite scenario. If we receive a massive amount of text from somewhere else, a very long series of symbols, we may not be able to extract any real semantic meaning out of it, only the syntactic structure of the language. It is difficult to imagine that with all our intelligence and ingenuity, and all of our code breaking skills, we would still fail to make any sense of anything. What makes code breaking possible is some common experience between the writer and the reader. In our scenario the only common experience is universal truths such as mathematics.
 
Comments (4)
Rahul Bose   on Jul 30, 2010 12:01 PM UTC
Kunal, you've opened a Pandora's box. As you say it is counter intuitive but as you put it, very knowledge specific like maths. What might be missing is the cultural link.... I'll have to ponder over it though, but my experience with tribal kids sometimes makes me wonder over the 'notion of knowledge', whose knowledge and what knowledge and how do you actually disseminate knowledge.
Anonymous User   on May 31, 2011 05:18 PM UTC
I encountered the same problem when I tried to learn the German language. Given that just like any language it is a closed system, the only way someone like me can access its meaning is if the meanings of the words are translated for me into either Bangla or English. But the philosophy of language-teaching at the Max Mueller Bhavan is that they want the students to learn German as a child learns it, without reference to any other language. I suppose they believe that if one uses any other language as a medium that will spoil the essence of the German language. But of course it is not possible to teach a language without using SOME medium, so the teachers used body language and pointed out objects to us to explain the meanings of German words. It was a very strange experience. I lost interest after the first semester.
Kunal Sen   on May 31, 2011 09:36 PM UTC
In this case the system is not closed because of the hand gestures and pointing at known objects. If the teacher was placed inside a black box and was not allowed to make any reference to any real objects (e.g. say "nicht" only after sunset) then it would be impossible to teach anything at all.
Dulali Nag   on Jun 1, 2011 07:04 AM UTC
No, it is NOT. That was my point. Communication and language teaching would have been impossible if they had adopted a completely closed system. Their ambition, that the students would learn the German language as a baby who is growing up in Germany does, just doesn't work. They HAD to take recourse to a shared system of meaning by pointing at objects and by using body language. Without that there would have been no dissemination of any meaning at all. I narrated my experience because it shows how communication DEMANDS some a priori shared meaning, a platform on which it can then unfold and be carried forward.