What Language Is This? 5 Tools to Identify Unknown Languages

이 웹사이트에 환영. 이것은 보기 원본이다

What language is this? Chinese? Japanese?

It’s Korean actually. Detecting this manually would have taken me a lot of time. Fortunately, I found some very accurate tools that can do this automatically. They are all listed below.
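For this particular sample, the writing system alone gives the answer. Here is a minimal sketch in Python, using only the standard library, that counts which script each letter belongs to by reading the first word of its Unicode character name. The function name `dominant_script` is made up for illustration, and this is not how any of the tools below work: it only tells scripts apart (Hangul, CJK, Cyrillic, Latin and so on), not languages that share a script.

```python
import unicodedata
from collections import Counter

def dominant_script(text):
    """Return the most common script prefix among the letters in text, or None."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits and punctuation
        name = unicodedata.name(ch, "")
        if name:
            # e.g. "HANGUL SYLLABLE I" -> "HANGUL", "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
            counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None

sample = "이 웹사이트에 환영. 이것은 보기 원본이다"
print(dominant_script(sample))  # prints HANGUL
```

Chinese characters would come back as CJK and Japanese kana as HIRAGANA or KATAKANA, which is enough to rule those two out here.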

The experiment: I tested the websites using sample text (1-2 sentences with 8 words) from the following languages: Portuguese, Russian, Korean, Vietnamese, Italian, Turkish, Polish, Ukrainian, Azerbaijani, Slovenian, Macedonian, Dutch, Filipino (Tagalog), Greek, Galician, Czech, Belorussian, Finnish, Tatar and Norwegian.

Overall, I tested 20 different languages.
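I did the tallying by hand, but the same bookkeeping can be sketched in a few lines of Python. Everything here is an assumption for illustration: `score_detector` and `stub_detect` are hypothetical names, the sample strings are placeholders rather than the sentences I actually used, and `stub_detect` stands in for whatever tool is being tested.

```python
def score_detector(detect, samples):
    """Run detect(text) on each (expected_language, text) pair.

    Returns (number_passed, list_of_languages_that_failed).
    """
    failed = []
    for expected, text in samples:
        if detect(text) != expected:
            failed.append(expected)
    return len(samples) - len(failed), failed

def stub_detect(text):
    # Stand-in for a real tool: knows two placeholder strings, guesses "Russian" otherwise.
    known = {"sample-ko": "Korean", "sample-pt": "Portuguese"}
    return known.get(text, "Russian")

# Placeholder samples, not the real test sentences.
samples = [("Korean", "sample-ko"), ("Portuguese", "sample-pt"), ("Tatar", "sample-tt")]

passed, failed = score_detector(stub_detect, samples)
print(f"passed {passed} out of {len(samples)}, didn't pass {', '.join(failed)}")
# prints: passed 2 out of 3, didn't pass Tatar
```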

3 Tools to Detect Unknown Language Text

1. LangId (passed 18 out of 20 tests, didn’t pass Tatar and Belorussian)

Pros: Overall, great online tool. It offers basic text detection functionality and they also have Twitter and email-detection bots for even quicker results.

Cons: Their engine is based on the Google API, yet they seem to get better results than the Google detector described below, so they evidently know how to use it well. I didn’t like that they don’t have their own algorithm for detecting languages.

2. Google Language Detector (passed 17 out of 20 tests, didn’t pass Portuguese, Tagalog and Belorussian)

Pros: Google has one of the world’s best APIs for language detection. The good thing is that you can see the probability that the displayed result is correct. It passed most of the sample tests.

Cons: I was quite surprised that it didn’t pass the Portuguese test. It seems to have a (I hope temporary) bug with this language. The page design could also be improved.

3. What Language Is This (passed 11 out of 20 tests, didn’t pass Russian, Korean, Ukrainian, Azerbaijani, Macedonian, Tagalog, Greek, Galician and Tatar)

Pros: Some languages like the South Slavic ones (Serbian, Croatian, Slovenian) are quite similar. In case you enter some Croatian text, let’s say, this website will tell you that the text could also be Serbian or Slovenian.
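I don’t know how this website produces its alternatives, but the general idea of reporting every language that scores close to the top guess can be sketched as follows. The function name `close_candidates`, the `margin` parameter and the scores are all made up for illustration.

```python
def close_candidates(scores, margin=0.1):
    """Return the languages whose score is within `margin` of the best one, best first."""
    if not scores:
        return []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best = ranked[0][1]
    return [lang for lang, score in ranked if best - score <= margin]

# Made-up scores for a piece of Croatian text.
scores = {"Croatian": 0.41, "Serbian": 0.36, "Slovenian": 0.33, "Polish": 0.05}
print(close_candidates(scores))  # prints ['Croatian', 'Serbian', 'Slovenian']
```

A wider margin lists more alternatives; a margin of 0 returns only the top guess and anything tied with it.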

Cons: They need to make their detection system more sophisticated. I considered listing Translated.net (another language detection website) instead of this one, but although Translated promises to detect more languages, it actually did worse than WhatLanguageIsThis.com.

2 Tools to Detect Websites in Unknown Languages

4. Google Translate with Detect Language as the first option

Passed: 18 out of 20, didn’t pass Belorussian and Tatar.

Pros: This tool does its job very well. The thing I like about Google Translate is that if it doesn’t support a specific language it gives you the following screen:

[Screenshot: Identify Unknown Languages]

That’s a great language detector if you ask me!

5. Microsoft Bing Translator with Auto-Detect as the first option

Passed: 8 out of 20, didn’t pass Dutch, Vietnamese, Turkish, Ukrainian, Azerbaijani, Slovenian, Macedonian, Tagalog, Greek, Galician, Czech and Belorussian

Pros: It supports a limited number of languages. For those languages, it does its job well.

Cons: I am very disappointed with Microsoft. They support a very limited number of languages for detection and translation, and their Auto-Detect feature is terrible. If you enter text in a language they don’t support, you get a wrong result instead of a message saying the language isn’t supported.

Thoughts

Overall, my opinion is that the above tools are heading in a good direction. They are currently the best ones for detecting languages online and do their job pretty well when it comes to popular languages. However, they must work on adding more obscure languages (none of the tools were able to recognize Tatar), and I’m sure that all of them, especially Google, will go in that direction in the near future.

Image credit: Kanko*

26 Comments

Niefer

You can try standalone Polyglot3000 – an excellent program: http://www.polyglot3000.com/

Also Opera has a widget – Wørd – very good one too.

qwertyweb

Polyglot3000 is a desktop-based application for Windows that doesn’t require web access for language detection!!
Polyglot recognizes more than 400 languages.

Darko

Polyglot is flagged as a potentially dangerous application by my antivirus. Don’t try these old ways to promote your apps, guys…

Mr On Line

You haven’t tested Arabic.

You should have; it’s very common.

Good luck!

Sprachreisen

Thanks for the links to these excellent tools.
Have to tell the readers of my newsletter about them.

citizenearth

Hey, what about Polyglot 3000? I think it is the best tool to identify languages out there.

laptoptamiri

I have a serious problem. Google thinks that my Turkish site is in English. Can someone help me?
