Have you ever stared blankly at a block of text, wondering what language it was? With the Internet as powerful as it is nowadays, you can translate almost any language into any other lickety-split. But there is one caveat: you need to know what language you are starting with. So how can you identify the language of a text?
I use Yahoo’s Babel Fish almost daily to translate text or web pages. But if I don’t know what language I am looking at to begin with, I am out of luck. Plain and simple. I have tried many things to identify the language of a text over the years. I have Googled individual words and looked them up in multi-language dictionaries, but the tool I found is hands down as simple as it can get.
If I were Yahoo, I would look into buying this technology ASAP!
Alrighty then, let’s check out how it works. The software is called Polyglot 3000. A very nifty name, I might add: a polyglot is someone who speaks multiple languages.
This language identifier application is only 2.2 MB in size and runs on Windows 95, 98, ME, NT, 2000, XP or 2003.
Simply fire it up when it finishes downloading and you will see this:
It looks pretty straightforward… You type in or paste the text you want identified, hit that magic “Recognize Language” button or the F9 hot key, and bingo bango, your language is recognized:
It came back with an answer super quick, and the answer was correct, with a confidence score of 62%. Not bad for a few short sentences. Let’s try another language. Do you know what it is?
I hit F9 and Polyglot not only knows that it is Russian, it is 100% sure of it and even narrows it down to a more specific variant: Pre-Reform.
This is pretty damn impressive. There is only one real preference or option you can modify: the number of languages it compares your text or document against. Let’s take a look at it:
By selecting fewer languages you can speed things up a bit, although in all my tests I never had to wait more than 30 seconds.
But I guess if you have wild foreigners yelling at you, time is probably of the essence :)
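I have no idea what Polyglot 3000 actually does under the hood, but if you are curious why trimming the language list would speed things up, here is a rough Python sketch of one common approach: comparing character trigram counts against a profile for each language. Everything in it is my own assumption for illustration, including the tiny `SAMPLES` texts, the `identify` function and its `candidates` parameter, and the way the confidence percentage is calculated.

```python
from collections import Counter
from math import sqrt

# Tiny made-up training samples (an assumption for this sketch);
# a real tool would build its profiles from far larger corpora.
SAMPLES = {
    "English": "the quick brown fox jumps over the lazy dog and then the cat sleeps in the sun",
    "Spanish": "el rápido zorro marrón salta sobre el perro perezoso y luego el gato duerme al sol",
    "German": "der schnelle braune fuchs springt über den faulen hund und dann schläft die katze in der sonne",
}


def trigrams(text):
    """Count overlapping three-character sequences in the lowercased text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# One trigram profile per known language.
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def identify(text, candidates=None):
    """Return (best_language, confidence_percent) among the candidate languages.

    Confidence here is just the winner's share of all similarity scores,
    which is a hypothetical stand-in for whatever a real tool reports.
    """
    langs = candidates or list(PROFILES)
    target = trigrams(text)
    # One comparison per candidate: fewer candidates means less work.
    scores = {lang: cosine(target, PROFILES[lang]) for lang in langs}
    best = max(scores, key=scores.get)
    total = sum(scores.values())
    confidence = 100 * scores[best] / total if total else 0.0
    return best, round(confidence)
```

Calling `identify("el gato duerme al sol")` checks the text against every profile, while `identify("el gato duerme al sol", candidates=["Spanish", "German"])` only does two comparisons. With 400-plus languages in the mix, it is easy to see how a shorter list could shave off some time.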
According to their website over 400 languages are supported:
The current version of Polyglot 3000 distinguishes 474 languages and dialects. This is the largest number of recognized languages for any language identification software to date.
Among the more than 400 supported languages only about 110 languages can be called popular. The others are very rare or even already extinct.
One of the rarest and, unfortunately, dying languages is Pipil. In 1970 about 40 people spoke it. Now only about 20 remain.
Another rare language is Yukaghir, spoken by about 170 people. The Yukaghir live in the northeast of Russia, in the Republic of Yakutia, above the Arctic Circle. One of the developers of Polyglot 3000 lived near that region for a while.
In this list you can see all supported languages. Some languages have several possible names which differ in spelling, but coincide in pronunciation. In the given list all variants of the name are listed wherever possible.
Do you use something similar? How do you get your translating or language distinguishing on? Let us know in the comments, kiddies!