Translate Wikipedia in 80 hours for free (Duolingo)
Thread poster: Jeff Whittaker
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 03:59
Member (2006)
English to Afrikaans
@Attila May 22, 2012

Attila Piróth wrote:
A lot of web content is completely irrelevant in most languages.


I agree. What concerns me about this (and about reCaptcha) is that only the patent holder can decide which materials get digitised or translated. Now the New York Times archives have been digitised, but that just serves to give the New York Times view of things a greater say in how we interpret the world. Why can't owners of sites that use reCaptcha choose which books they wish to be digitised? Why are only English sources digitised? Surely there is a much greater need for digitisation in marginalised languages where less money is available for paid digitisation services.

Duolingo's approach denies that source-language competence is necessary for translation. (Or target-language competence, if users translate out of their native language.) As a professional translator, I have to disprove this statement day after day. (Would anyone allow their name to be published on the site: "sentence translated by NN"?)


Well, it would be an interesting experiment to prove that you're right (and they're wrong). If their computer can statistically calculate a translation that is essentially the same as a human translation (or that is approved by a human translator), then it would prove their point.

I got the impression (from the TED Talk) that their non-public trials involved non-translator users who did not know the target language, and that this produced translations that practically match those of human translators.

Remember, a statistical machine translation system does not translate texts -- it calculates them. Your name would not be published next to a sentence that you translated because, even if the sentence produced by the MT system is exactly the same as yours, the translation is published not because you translated it but because the machine calculated its likelihood.

"80 hours for free" - catchy tagline. What is reality? The 80 hours are well over, where is the result? Even a partial one?


The "80 hours" is an estimate used with the figure "1 million users".


 
Attila Piróth
Attila Piróth  Identity Verified
France
Local time: 03:59
Member
English to Hungarian
Testing the concept May 22, 2012

Samuel Murray wrote:

Well, it would be an interesting experiment to prove that you're right (and they're wrong). If their computer can statistically calculate a translation that is essentially the same as a human translation (or that is approved by a human translator), then it would prove their point.

I got the impression (from the TED Talk) that their non-public trials involved non-translator users who did not know the target language, and that this produced translations that practically match those of human translators.


"Here is a typical set of experimental data"... In experimental science, all such assertions are taken with a grain of salt: there have been lots of cases where the data that fits the researcher's point best was used as an illustrative example. That is why reproducibility is such a cherished concept.

The examples mentioned in the TED talk are very much hand-picked to illustrate their point; one was an idiom that requires little language-transfer skill: a native speaker who hears the two key words of an idiom is very likely to reproduce the whole correctly. This is not a representative example at all.

So, how could it be decided? They alone have access to their data - and as they could in principle modify it at will, quoting odd examples from it does not stand up to any scientific scrutiny. The only way to draw some useful conclusions would be this, IMO: they publish a big corpus (produced under controlled conditions), and a randomly chosen part is compared with human translation or reviewed by professional reviewers.

Such quality audit procedures are quite common in the translation industry. As long as Duolingo showcases only the tiny fragments that suit their purpose, no serious conclusion can be drawn. Of course, they will find sentences where source-language competence is of little relevance - or where the crowd produced a correct solution. I dunno, perhaps on a large sample they would even find that 10-20% of the translated sentences are on a par with professional translation. This would be an interesting piece of information. But the bar must be set incomparably higher before one can conclude that this approach can, in general, produce something that is on a par with professional work. (The same applies to machine translation.)
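
To make the audit idea above concrete, here is a minimal sketch in Python of how a randomly chosen part of a published corpus could be drawn and scored. Everything in it is a hypothetical illustration: the function names, the 5% fraction, the seed and the reviewer verdicts are assumptions, not anything Duolingo or any audit standard prescribes.

```python
import random


def draw_audit_sample(segment_ids, fraction, seed):
    """Pick a reproducible random subset of segment IDs for review.

    A fixed seed lets a third party re-draw exactly the same sample,
    so the publisher cannot quietly choose which segments get audited.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ids = list(segment_ids)
    rng = random.Random(seed)
    k = max(1, round(len(ids) * fraction))
    return sorted(rng.sample(ids, k))


def pass_rate(verdicts):
    """Share of audited segments that a reviewer judged acceptable.

    `verdicts` maps segment ID -> True (acceptable) or False.
    """
    if not verdicts:
        raise ValueError("no verdicts to score")
    return sum(1 for ok in verdicts.values() if ok) / len(verdicts)


# Hypothetical corpus of 1000 segments, 5% of which are sent to reviewers.
sample = draw_audit_sample(range(1000), fraction=0.05, seed=42)

# Made-up reviewer verdicts for four segments, purely for illustration.
example_verdicts = {1: True, 2: False, 3: True, 4: True}
example_rate = pass_rate(example_verdicts)
```

The point of the seed is reproducibility: anyone holding the same corpus and seed obtains the same sample, which is what separates an audit from a showcase.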

If conditions were appropriately controlled, I would safely bet a pretty sum that the translations of moderately challenging texts produced this way are not fit for any professional purpose. (Of course, I realize that "practically match those of human translators" can mean something quite different.)

"80 hours for free" - catchy tagline. What is reality? The 80 hours are well over, where is the result? Even a partial one?


The "80 hours" is an estimate used with the figure "1 million users".


OK, so what? Their estimate of "1 million" turned out to be unrealistic? That would not be the only one, IMO. Why don't they reiterate it then? "10 000 users, 8000 hours" does not sound the same...
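
The arithmetic behind that remark can be sketched in a few lines of Python. It assumes that the total effort stays fixed at the tagline's own figures (1 million users times 80 hours) and is simply spread over fewer people; the function name is hypothetical.

```python
def hours_per_user(total_person_hours, users):
    """Hours each user must contribute if the total effort is fixed."""
    if users <= 0:
        raise ValueError("users must be positive")
    return total_person_hours / users


# Total effort implied by the tagline: 1 million users for 80 hours each.
TOTAL_PERSON_HOURS = 1_000_000 * 80

# The same workload spread over only 10 000 users.
hours_with_10k_users = hours_per_user(TOTAL_PERSON_HOURS, 10_000)
```

Under that assumption the workload is 80 million person-hours, so 10 000 users would each need 8000 hours.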

Best,
Attila


 
Stefano Papaleo
Stefano Papaleo  Identity Verified
Italy
Local time: 03:59
Member (2005)
English to Italian
Smoke screens May 22, 2012

Samuel Murray wrote:



This person's objections to reCaptcha are all faulty. The "About" button on the reCaptcha is clear as day, and if you click on it, it explains very clearly what is going on. And no-one is forced to use the reCaptcha -- if you object to using it, simply surf to some other web site that doesn't use it.


Sorry, but I have to disagree. If you use somebody's input & time for something else you have to state it right away IN PLAIN SIGHT with something like: "WARNING/NOTE/IMPORTANT: the input you provide will be used to.... If you do not agree you are kindly asked to leave the website." And not hide it behind an anonymous "About" link - where usually one just finds credits to whoever developed the thing. It is called honesty, transparency, fairness; call it what you wish, you get the drift. Just like you do with a privacy note. This is particularly important when your work will be used FOR PROFIT without anyone having asked you if you are willing to do so and without anything in return.

There are plenty of ways to learn languages for free (or a few bucks) on the Internet and they are so much better and you really learn something.


Well, any novel way of learning a language is worth having a look at. The theory of DuoLingo seems solid -- you basically learn the way people used to learn before there were grammar books available.


You can find plenty of comments against this in other threads on Duolingo here & elsewhere. This is not grammar vs. vocabulary (why the 2 should be put against one another is beyond me...)... the guy could not care less about people learning languages. It is 16 minutes and 40 seconds of bad stand-up comedy masking exploitation dressed as "we're here to help humanity". The trick is "combining" the translations etc. Who does that? Computers? Editors? Who judges which is an easy sentence and which a complicated one? Who has a say on which is a good translation and which is not? THAT is where the real business lies. They monetize (his own words, not mine) but you work for free... where have I already seen that.. ah yes.. it is called slavery.


Some learning methods focus on grammar before vocabulary, and some focus on vocabulary (and simple sentence construction) before moving on to grammar. DuoLingo seems to be the latter type -- you learn by looking up words in a dictionary, and eventually you learn to recognise patterns in the language.


And.... do we need Duolingo for that? Any person who really wants to learn a foreign language and/or contribute to translating content already has thousands of ways to do so.


So with DuoLingo you don't learn grammar from a book, but rather by example. Ordinarily, if you want to learn this way, you need to pay hourly for language tutors (and get tutored many hours).


I have nothing against self-teaching per se but language tutors aren't there just for decoration, are they?

The fact that DuoLingo also helps translate stuff is of no concern to me (though I don't object to being part of a crowd whose efforts are used to statistically translate stuff).


Good to know that


 
Stefano Papaleo
Stefano Papaleo  Identity Verified
Italy
Local time: 03:59
Member (2005)
English to Italian
@Attila: Scientific approach May 22, 2012

Exactly. I could not agree more. Everything was carefully hand-picked and conceived to convey an idea of success, cultural revolution, charity and game changer: First he charms the audience with some useless funny examples from captcha, then he throws in some numbers to impress them, pleads guilty to taking up so much time with his invention, and then comes the conversion on the road to Damascus, i.e. reCaptcha used to digitize books (another thing which usually takes lots of time & money). Then he SELLS the company to none other than Google - it makes you wonder, if he is really so concerned about poverty and education, why he didn't just give it away for free?

Then, like you correctly pointed out, come the so-called examples, which are either the only few cases of good results or just plain fakes. He does not mention that the 100,000 people (says who?) involved in the pyramids, the mission to the moon, the Suez canal etc. WERE PAID. He does not mention that no one prevents you from translating Wikipedia by simply taking part in it without him meddling in that. Just like he carefully picks Rosetta Stone as "the enemy" ("not fair" and all that mumbo jumbo)... $500 is just too much and so on and so forth, it makes you cry because the poor can't afford it blah blah blah...

Well, a) behind that price are hours of work by a whole series of experts so guess what... they decide to sell it (I'm sure he works for free, right?) and b) he doesn't mention the countless other options for language education which are available either for free or by paying a fee. They are available both on and offline and many of them have a reputation, a method, results and visible real people behind them. But of course the real objective (actually not) is translating the web...

Who says you have to translate the web? Is there a new universal law we don't know about? Did your doctor tell you to? Ever heard of voluntary projects where people do get together and do it for free because they believe in the cause and because it is an advantage to them and others? What does it mean anyway? The web is big... veeeeery big. He masks this under the cloak of "let's spread education and make it available for free" but then he talks about.. monetizing.. on his side of course;)

He is giving candies to kids to lure them into his own agenda. Ma knows better and used to say: "Don't take candy from strangers"


 