What Language Is This? 5 Tools to Identify Unknown Languages

이 웹사이트에 환영. 이것은 보기 원본이다

What language is this? Chinese? Japanese?

It’s Korean actually. Detecting this manually would have taken me a lot of time. Fortunately, I found some very accurate tools that can do this automatically. They are all listed below.
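For a quick first guess before reaching for any of the tools, the writing system alone often narrows things down. The snippet below is only a rough sketch using Python's standard `unicodedata` module: it counts the first word of each letter's Unicode name (for example HANGUL, CJK, HIRAGANA, LATIN) and reports the most common one. The function name `dominant_script` is made up for this illustration. It identifies the script, not the language, so it can separate Korean from Chinese, but it cannot tell Portuguese from Italian.

```python
import unicodedata
from collections import Counter

def dominant_script(text):
    """Return the most common script prefix among the letters in text, or None."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits and punctuation
        name = unicodedata.name(ch, "")
        if name:
            # Unicode names start with the script, e.g. "HANGUL SYLLABLE I"
            counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None

sample = "이 웹사이트에 환영. 이것은 보기 원본이다"
print(dominant_script(sample))  # HANGUL
```

Hangul points to Korean. Japanese text usually mixes kana with Chinese characters, so a result of CJK on its own does not settle Chinese versus Japanese.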

The experiment: I tested the websites using sample text (1-2 sentences with 8 words) from the following languages: Portuguese, Russian, Korean, Vietnamese, Italian, Turkish, Polish, Ukrainian, Azerbaijani, Slovenian, Macedonian, Dutch, Filipino (Tagalog), Greek, Galician, Czech, Belorussian, Finnish, Tatar and Norwegian.

Overall, I tested 20 different languages.
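The scoring used in this experiment is simple enough to sketch in a few lines of Python. This is only an illustration of the tallying, not the way the tests were actually run (they were done by hand on each website). The names `score_detector` and `toy_detect` and the placeholder sample strings are assumptions made up for the example; a real run would plug in actual sample sentences and a call to the tool being tested.

```python
def score_detector(detect, samples):
    """Run detect(text) on each (language, text) pair.

    Returns (number passed, list of languages that were not detected).
    """
    failed = [lang for lang, text in samples.items() if detect(text) != lang]
    return len(samples) - len(failed), failed

# Placeholder data: a real run would use 1-2 sentence samples per language.
samples = {
    "Korean": "이 웹사이트에 환영.",
    "Dutch": "<Dutch sample>",
    "Tatar": "<Tatar sample>",
}

def toy_detect(text):
    # Hypothetical stand-in for one of the online tools.
    known = {"이 웹사이트에 환영.": "Korean", "<Dutch sample>": "Dutch"}
    return known.get(text, "unknown")

passed, failed = score_detector(toy_detect, samples)
print(f"passed {passed} out of {len(samples)}, didn't pass {', '.join(failed)}")
# passed 2 out of 3, didn't pass Tatar
```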

3 Tools to Detect Unknown Language Text

1. LangId (passed 18 out of 20 tests, didn’t pass Tatar and Belorussian)

Pros: Overall, a great online tool. It offers basic text detection, and it also has Twitter and email detection bots for even quicker results.

Cons: Their engine is based on the Google API, yet they seem to get better results than the Google detector described below, so they clearly know how to put it to good use. I didn’t like that they don’t have their own algorithm for detecting languages.

2. Google Language Detector (passed 17 out of 20 tests, didn’t pass Portuguese, Tagalog and Belorussian)

Pros: Google has one of the world’s best APIs for language detection. The good thing is that you can see the probability that the result they display is correct. They passed most of the sample tests.

Cons: I was quite surprised they didn’t pass the Portuguese test. It seems they have a (I hope temporary) bug with this language. They could also do a better job on the page design.

3. What Language Is This (passed 11 out of 20 tests, didn’t pass Russian, Korean, Ukrainian, Azerbaijani, Macedonian, Tagalog, Greek, Galician and Tatar)

Pros: Some languages like the South Slavic ones (Serbian, Croatian, Slovenian) are quite similar. In case you enter some Croatian text, let’s say, this website will tell you that the text could also be Serbian or Slovenian.

Cons: They need to make their detection system more sophisticated. I considered listing Translated.net (another language detection website) instead of this one, but although Translated promises detection of more languages, it actually did worse than WhatLanguageIsThis.com.
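The "could also be Serbian or Slovenian" behaviour is easy to picture if you assume a detector produces a score per language. The Python sketch below is a guess at the general idea, not how WhatLanguageIsThis.com actually works: it returns the top-scoring language plus any others that score within a small margin of it. The function name `likely_languages`, the `margin` parameter and the scores are all invented for the illustration.

```python
def likely_languages(scores, margin=0.1):
    """Return the best-scoring language and any others within margin of it."""
    if not scores:
        return []
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best = ranked[0][1]
    return [lang for lang, score in ranked if best - score <= margin]

# Hypothetical scores for a short Croatian sentence.
scores = {"Croatian": 0.46, "Serbian": 0.41, "Slovenian": 0.09, "Polish": 0.04}
print(likely_languages(scores))  # ['Croatian', 'Serbian']
```

A wider margin lists more candidates for closely related languages, at the cost of less decisive answers.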

2 Tools To Detect Websites In Unknown Languages

4. Google Translate with Detect Language as the first option

Passed: 18 out of 20, didn’t pass Belorussian and Tatar.

Pros: This tool does its job very well. The thing I like about Google Translate is that if it doesn’t support a specific language it gives you the following screen:

[Screenshot: Google error screen]

That’s a great language detector if you ask me!

5. Microsoft Bing Translator with Auto-Detect as the first option

Passed: 8 out of 20, didn’t pass Dutch, Vietnamese, Turkish, Ukrainian, Azerbaijani, Slovenian, Macedonian, Tagalog, Greek, Galician, Czech and Belorussian.

Pros: It supports a limited number of languages. For those languages, it does its job well.

Cons: I am very disappointed with Microsoft. They support a very limited number of languages for detection and translation, and their Auto-Detect feature is terrible. If you enter a language they don’t support, you get a wrong result instead of a message telling you that the language isn’t supported.
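What I would have preferred here can be sketched in Python, assuming a detector that returns a best guess together with a confidence value between 0 and 1. This is not Bing's or Google's actual logic; the `SUPPORTED` set, the `report` function and the 0.5 threshold are all hypothetical. The point is only that a guess outside the supported set, or one with low confidence, should produce an honest "not supported" message rather than a wrong answer.

```python
# Hypothetical set of languages the tool can handle.
SUPPORTED = {"English", "Russian", "Korean"}

def report(language, confidence, threshold=0.5):
    """Turn a (guess, confidence) pair into a message for the user."""
    if language not in SUPPORTED or confidence < threshold:
        return "Sorry, this language is not supported or could not be identified."
    return f"Detected: {language}"

print(report("Korean", 0.93))   # Detected: Korean
print(report("Russian", 0.21))  # Sorry, this language is not supported or could not be identified.
```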

Thoughts

Overall, I think the above tools are heading in a good direction. They are currently the best ones for detecting languages online, and they do their job pretty well when it comes to popular languages. However, they need to add more obscure languages (none of the tools were able to recognize Tatar), and I’m sure that all of them, especially Google, will go in that direction in the near future.

Image credit: Kanko*

26 Comments

Niefer

You can try standalone Polyglot3000 – an excellent program: http://www.polyglot3000.com/

Also Opera has a widget – Wørd – very good one too.

qwertyweb

Polyglot3000 is a desktop application for Windows that doesn’t require web access for language detection!!
Polyglot recognizes more than 400 languages

Darko

Polyglot is flagged as a potentially dangerous application by my antivirus. Don’t try these old ways to promote your apps, guys…

Mr On Line

you haven’t tested for Arabic ..

Well you should have .. it’s very common ..

Good Luck !

Sprachreisen

Thanks for the links to these excellent tools.
Have to tell the readers of my newsletter about them.

citizenearth

Hey, what about Polyglot 3000? I think it is the best tool to identify languages out there.

laptoptamiri

I have a serious problem. Google thinks that my Turkish site is in English. Someone help me