Monday, April 30, 2012

Duolingo




Duolingo: Learn A Language While Translating The Web

Currently, there are over 2 billion internet users. Only 26.8% of these users are English speakers, yet 55.9% of web content is in English. This alienates the 1,657,347,866 users who do not speak English and therefore cannot use these sites. Machine translation is not yet good enough to translate the sites automatically. If you look at translations produced by a computer, you will see that they are incorrect most of the time. And how would you identify the few times that the machine translation is correct? Consequently, the best solution right now is to have people complete these translations.

Duolingo is a recent project started at CMU by Professor Luis Von Ahn and his team that uses crowd-sourcing for text translation. Their goal is to use people to translate the web. Von Ahn saw two problems with translating the web: there are not enough bilingual people to do all this work, and people need some motivation to do it for free. The team thought of a solution that addresses both problems: the project would translate the web through education. Duolingo is a way for people to learn a new language. Users are given sentences matched to their level of proficiency in a language, which they then translate. I will discuss more details of how Duolingo works later.

First, I want to talk about why Von Ahn and his team think that this project has incredible potential to be successful. Translating all of Wikipedia into Spanish would cost about $50 million using professional translators. Translating the web is therefore only feasible if people volunteer their time and the work incurs no other costs. Von Ahn’s previous project was ReCaptcha. That project used the same idea of crowd-sourcing at no additional cost: people digitize books as they enter the Captchas that certain websites require.
Figure 1: Captcha

(If you do not know what a Captcha is, it is shown in Figure 1. It is the distorted text image that users have to type out to prove that they are not a computer. Captchas are often used when you register for a website, or when you buy tickets from Ticketmaster, where they prevent scalpers from taking advantage of the system.) The ReCaptcha technology takes work that people already have to do and adds value to it. Many books are not in good enough condition for computers to identify the text, whereas humans can still read the words and type them out. By combining many responses, ReCaptcha can create fairly accurate digital versions of the text.
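The idea of combining many responses can be sketched as a simple majority vote. This is only an illustration, not ReCaptcha's actual algorithm: the function name combine_responses and the 60% agreement threshold are assumptions made for the example.

```python
from collections import Counter

def combine_responses(responses, min_agreement=0.6):
    """Return the most common transcription if enough typists agree, else None.

    Hypothetical helper: the 0.6 threshold is an assumed value, not a
    figure from ReCaptcha.
    """
    # Normalize so that case and stray whitespace do not split the vote.
    cleaned = [r.strip().lower() for r in responses if r.strip()]
    if not cleaned:
        return None
    word, votes = Counter(cleaned).most_common(1)[0]
    # Accept the winner only when a large enough share of typists chose it.
    return word if votes / len(cleaned) >= min_agreement else None

# Three of four typists agree, so the word is accepted.
print(combine_responses(["morning", "Morning ", "morning", "mourning"]))
```

When the typists disagree too much, the sketch returns None, which in practice would mean showing the same word to more people before trusting any answer.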

The ReCaptcha project has had billions of users and has been extremely successful. Von Ahn hopes that Duolingo will follow this same success and once again have billions of users. In terms of numbers, approximately 1.2 billion people want to learn a foreign language each year. Language software can cost up to $500, which puts it out of reach of people who cannot afford it. Duolingo is a free alternative that can attract all of these people. It is similar to ReCaptcha in the sense that it takes something that people already do, in this case attempting to learn a new language, and adds more value to it by applying the translations to real content and translating the web.

Now, if we think back to the task of translating Wikipedia into Spanish, with 100,000 users Duolingo would be able to complete this task in just 5 weeks. With 1 million users, this can be done in just 80 hours. The question then becomes: how accurate would these translations be? If beginners are translating websites into a language that they have never seen before, their sentences can be entirely incorrect. In testing, however, the translations that Duolingo creates are as good as those of professionals, without sacrificing speed. This is because it combines many of the users’ translations and has many learning tools.
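The two time estimates above are roughly consistent with each other under one simple assumption of mine (not a claim from the talk): the total amount of work is fixed and divides evenly among users, so the time needed is inversely proportional to the number of users. A quick check:

```python
HOURS_PER_WEEK = 7 * 24

def hours_needed(reference_users, reference_hours, users):
    """Scale a known (users, hours) estimate to a different number of users.

    Assumption: total work is fixed, so time is inversely proportional
    to the number of users.
    """
    return reference_users * reference_hours / users

baseline_hours = 5 * HOURS_PER_WEEK  # 5 weeks with 100,000 users
print(hours_needed(100_000, baseline_hours, 1_000_000))  # 84.0
```

Ten times as many users gives about 84 hours, which is in line with the quoted figure of 80 hours.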

Next, I want to look more specifically into how Duolingo works. Pretend that you are a user of Duolingo, and I will walk you through the website. First, you are provided with a sentence that fits your language level. Then, you have the option to see the context that this sentence was taken from. Because all users are translating real content, the exercise is more interesting and encourages you to keep learning. It is a practical application of language skills.
Figure 2: Duolingo question

When looking at the specific sentence, you can hover over a word to see other users’ translations for help, as illustrated in Figure 2. Once you submit a translation, you are given feedback. Duolingo will either congratulate you for being correct or indicate what is wrong with your translation. For example, if there was a simple typo that it could detect (e.g. typing “epro” instead of “pero”), Duolingo will say that your translation was correct, but inform you of the typo. It will also indicate whether there were more significant errors (e.g. using a masculine adjective instead of a feminine one). Duolingo also uses educational examples to help you understand and memorize the words that you did not know and hovered over. You can then vote on the quality of the other translations displayed, providing the site with feedback on other users’ efforts.
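To make the typo example concrete, here is a minimal sketch of how a checker could tell “epro” apart from a genuinely wrong word. It is not Duolingo's implementation: it uses a standard edit distance that counts swapping two adjacent letters as a single edit, and the names osa_distance, classify, and known_words are hypothetical.

```python
def osa_distance(a, b):
    """Edit distance where insert, delete, substitute, and adjacent swap each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete
                          d[i][j - 1] + 1,         # insert
                          d[i - 1][j - 1] + cost)  # substitute or match
            # Two adjacent letters swapped, as in "ep" vs "pe".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]

def classify(answer_word, expected_word, known_words=frozenset()):
    """Label a single word as 'correct', 'typo', or 'wrong' (assumed categories)."""
    answer, expected = answer_word.lower(), expected_word.lower()
    if answer == expected:
        return "correct"
    # A real word that differs from the expected one (for example the
    # wrong gender) is a language error, not a slip of the fingers.
    if answer in known_words:
        return "wrong"
    return "typo" if osa_distance(answer, expected) == 1 else "wrong"

print(classify("epro", "pero"))                    # typo
print(classify("rojo", "roja", {"rojo", "roja"}))  # wrong
```

The dictionary check matters because “rojo” and “roja” are also only one edit apart; without it, a gender mistake would be forgiven as a typo.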

The feedback feature I just mentioned is also one of the motivating factors for users. Most language learning methods outside of a class cannot give you feedback on your language skills. A workbook cannot tell you whether your answer is incorrect when there are so many possible translations of the same sentence. Other software usually provides feedback only for multiple-choice questions or single words. According to Von Ahn, feedback is crucial to learning, and Duolingo attempts to provide feedback on entire sentences. Of course, as mentioned, the same sentence can have multiple valid translations, which makes providing this feedback a very difficult task.

I have discussed some of the background and intentions of Duolingo, but not as much about its results in reality. This is because Duolingo was launched privately only in November 2011. By January, 45,000 sentences had been translated during this private launch, giving a lot of hope for the success of Duolingo in the future. The Spanish and German language options were released in a beta version in March. There is an extremely long waiting list of people who want accounts for the beta version.

Some people who have already obtained beta accounts have written about them on their blogs. From what I have researched, people have found the user interface and layout of the lessons to be “flawless and visually appealing” (Classical Bookworm). Regarding the educational functions, the lessons and translations earn you points that allow you to progress to further levels.
Figure 3: Duolingo lesson

The layout of the lessons is illustrated in Figure 3. If you make too many mistakes, you are forced to redo the lesson. Duolingo has audio features, images for people who learn better visually, and even social features where you can ask questions for others to answer. Overall, the feedback for Duolingo has been very positive. Duolingo has great potential to impact the web and already has a lot of user interest; I cannot wait to see what the future has in store.

To learn more, watch the Duolingo intro video below, or visit the official site:  http://duolingo.com/


Works Cited
"Classical Bookworm: Duolingo: First Impressions." Classical Bookworm. Web. <http://classical-bookworm.blogspot.com/2012/01/duolingo-first-impressions.html>.
"Duolingo | Learn English, Spanish and German for Free." Duolingo. Web. <http://duolingo.com/>.
Luis Von Ahn -- Duolingo: The Next Chapter in Human Computation. TEDxCMU, 25 Apr. 2011. Web. <http://www.youtube.com/watch?v=cQl6jUjFjp4>.
"Top Ten Internet Languages - World Internet Statistics." Internet World Stats. Web. <http://www.internetworldstats.com/stats7.htm>.
"Usage of Content Languages for Websites." Usage Statistics of Content Languages for Websites, April 2012. Web. <http://w3techs.com/technologies/overview/content_language/all>.
"World Internet Usage Statistics News and World Population Stats." Internet World Stats. Web. <http://www.internetworldstats.com/stats.htm>.

