Multilingual Interactions through Machine Translation—Numbers from Socl

By andresmh, 12 months ago

Tags: socl translation multilanguage

For the past two years, social media platforms have been rolling out machine translation, enabling multilingual interactions. However, the people interacting on these platforms often already know each other (e.g., as friends) and share a language. But what happens when machine translation is used to facilitate interactions among strangers, who may share interests but not a language?

The earliest social media platform to enable machine translation was probably Facebook, which began auto-translating conversations on Facebook Pages (a good place to start, given that Pages are more likely to bring together people who speak different languages). Google+ and Twitter later released similar features, enabling, for example, Spanish-speaking Twitter users to read tweets from the now-toppled Egyptian president Muhammad Morsi, translated from Arabic to Spanish:

image 

How often do these types of multilingual interactions occur? Ethan Zuckerman posed a similar question when wondering what the numbers were for machine translations, in the context of a discussion about the challenges of having people pay attention to content outside their immediate reach.

With that in mind, we decided to look into some numbers using data from our own social media platform, Socl, which started offering machine translation last year. Socl, like Twitter, often brings together strangers who might not speak the same language. For example:

image

Multilingualism in Socl
In the 3 months of Socl data we looked at, we found more than 6,000 multilingual posts: threads like the one above, in which the thread-starter or at least one comment is in a different language from the rest, presumably reflecting people communicating across languages through machine translation.
We found that most multilingual threads on Socl (85%) contain two languages; the remaining 15% contain three or more, up to a handful of threads with five languages.
image
Furthermore, the majority of multilingual posts involved English and some other language, with English-Portuguese and English-Spanish being the most common pairings among bilingual threads:
 
image
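
For readers who want to run a similar count on their own data, here is a minimal sketch of the tallies described above. It is not the code used for this post. It assumes each thread is a list of posts (thread-starter first, then comments) that an automatic language detector has already labeled, with the label stored in a hypothetical `lang` field; the sample threads are invented for illustration.

```python
from collections import Counter


def thread_languages(thread):
    """Return the set of detected language codes in one thread."""
    # Posts with no detected language (e.g. emoticon-only) are skipped.
    return {post["lang"] for post in thread if post.get("lang")}


def summarize(threads):
    """Tally multilingual threads by language count and by language pair."""
    size_counts = Counter()  # number of languages -> number of threads
    pair_counts = Counter()  # (lang_a, lang_b) -> number of bilingual threads
    for thread in threads:
        langs = thread_languages(thread)
        if len(langs) < 2:
            continue  # monolingual thread, not counted
        size_counts[len(langs)] += 1
        if len(langs) == 2:
            # Sort so that en/pt and pt/en count as the same pairing.
            pair_counts[tuple(sorted(langs))] += 1
    return size_counts, pair_counts


# Hypothetical sample data: four threads with pre-detected languages.
sample_threads = [
    [{"lang": "en"}, {"lang": "pt"}],
    [{"lang": "en"}, {"lang": "es"}, {"lang": "en"}],
    [{"lang": "en"}, {"lang": "en"}],
    [{"lang": "en"}, {"lang": "pt"}, {"lang": "de"}],
]

size_counts, pair_counts = summarize(sample_threads)
total = sum(size_counts.values())
for n_langs, count in sorted(size_counts.items()):
    print(f"{n_langs} languages: {count / total:.0%} of multilingual threads")
print(pair_counts.most_common())
```

Pairings are tallied only for threads with exactly two languages, matching the "among bilingual threads" framing above; threads with three or more languages contribute only to the language-count tally.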

These numbers reflect the demographics of Socl itself, as almost half of the visitors come from outside the US, mainly from Brazil, India, and Germany.
It is important to note that these numbers are produced using automatic language detection, which, although it has improved a lot in the past few years, still fails when dealing with emoticons and other unusual Internet lingo.
 
More work is needed to understand the degree to which machine learning can support deep cross-language communication, and, perhaps more difficult, cross-cultural communication.


Many thanks to Elena Agapie, James van Eaton, and Bruce Haly for helping with this post.
