In this text we discuss the superpower that automatic translation has for scrambling a text, and how this is the secret that lets “lazy” people (those who copy their friends’ work) avoid being caught by similarity checkers.
A similarity checker is software that measures how similar a text is to others already published. It is used to evaluate academic work and to judge whether the similarities are natural or the result of plagiarism. For almost a year I was responsible for checking similarities and helping to correct course completion works, so that similarities would not appear or be interpreted as plagiarism. This taught me something very interesting: automatic translation can be used to clean up texts and thus make them less similar.
The reason automatic translators scramble text is not a software problem but a language problem. To understand this, let’s treat the set of words that make up a language as the Domain of a function and their respective meanings as the Image of that function. Thus each word, say “blablublau”, is associated with a meaning in that language, for example “the effect of osteoporosis on Jamaican raccoons”, and another language may associate this exact meaning with a different word, say “parapapapapa”.
Suppose language X has M word-meanings and language Y has N word-meanings, and that L of the words of language X have exactly the same meaning as some word of language Y. Then, translating from X to Y or the other way around, there are L words whose translations will be exact.
The translation in this case is accurate, since each of these words in the Domain of one language has an Image (meaning) that is also the Image of a word in the Domain of the other language. If the word is the “code” that a language uses to label a meaning, we can say that in both languages these L words are equal up to the “code” used as a label, for example:
Portuguese → quadrado → □ ← square ← English
However, the mess in machine translation between X and Y comes from the remaining M-L or N-L words. If the meaning of a word of language X is not the Image of any single word of language Y, then two or more word-meanings of language Y must be combined to be equivalent to the word-meaning of language X, for example:
Portuguese → anteontem || day before yesterday ← English
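The counting argument above can be sketched with sets. The tiny vocabularies and the meaning labels below are invented for illustration (they are not real lexicon data), and the helper name `shared_meanings` is hypothetical.

```python
# Toy model: each language is a function from words (Domain) to meanings (Image).
# The vocabularies and meaning labels are invented for illustration only.
portuguese = {
    "quadrado": "SQUARE_SHAPE",
    "ontem": "PREVIOUS_DAY",
    "anteontem": "TWO_DAYS_AGO",
}
english = {
    "square": "SQUARE_SHAPE",
    "yesterday": "PREVIOUS_DAY",
    "day": "DAY",
    "before": "BEFORE",
}


def shared_meanings(lang_x, lang_y):
    """Meanings that are the Image of a single word in both languages."""
    return set(lang_x.values()) & set(lang_y.values())


common = shared_meanings(portuguese, english)
M, N, L = len(portuguese), len(english), len(common)

# Words of X with no single-word equivalent in Y (the M - L words).
needs_paraphrase = sorted(w for w, m in portuguese.items() if m not in common)

print(M, N, L)           # 3 4 2
print(needs_paraphrase)  # ['anteontem']
```

In this toy setting "quadrado" and "ontem" translate exactly, while "anteontem" is one of the M-L words that must be paraphrased.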
Returning to Domain and Image, we will call D(xi) a word in the Domain of language X and I(xi) the Image of that word, and likewise D(yj) a word in the Domain of language Y and I(yj) its Image. Below, N counts the words of language Y that are combined to express the meaning of x0. Thus, for a first translation we will have:
D(x0) = I(x0) = I(y1) + I(y2) + … + I(yN) = D(y1) + D(y2) + … + D(yN).
(Pt) anteontem → (En) day before yesterday → (Pt) dia antes de ontem
Thus, if any of these N meanings in language Y, for example the Image of the word yk, is not among the L meanings common to the two languages, the reverse translation from language Y to language X will have the following structure:
D(y1) + D(y2) + … + D(yk) + … + D(yN) =
= I(y1) + I(y2) + … + I(yk) + … + I(yN) =
= I(x1) + I(x2) + … + I(xk1) + I(xk2) + … + I(xkN) + … + I(xN) =
= D(x1) + D(x2) + … + D(xk1) + D(xk2) + … + D(xkN) + … + D(xN).
In this way, each word-meaning that has no single-word equivalent in the other language generates at least two words in translation, which together try to explain its meaning. But if one of these explanatory words itself has no single-word equivalent in the original language, reversing the translation generates at least two more words to explain it. The process goes on, producing the machine translation mess for which we blame the innocent software.
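A minimal sketch of this cascade, assuming a toy word-by-word translator. The two lookup tables are hypothetical and hand-written to reproduce the anteontem example; real translation systems do not work word by word.

```python
# Toy word-by-word "translator". The tables are hand-written assumptions
# chosen to mirror the anteontem example, not real translation data.
PT_TO_EN = {
    "anteontem": ["day", "before", "yesterday"],  # no single English word
    "ontem": ["yesterday"],
}
EN_TO_PT = {
    "day": ["dia"],
    "before": ["antes", "de"],  # paraphrased with two Portuguese words
    "yesterday": ["ontem"],
}


def translate(words, table):
    """Replace each word by its (possibly multi-word) equivalent."""
    out = []
    for word in words:
        out.extend(table.get(word, [word]))  # unknown words pass through
    return out


def round_trip(words):
    """Portuguese -> English -> Portuguese."""
    return translate(translate(words, PT_TO_EN), EN_TO_PT)


original = ["anteontem"]
english = translate(original, PT_TO_EN)
back = round_trip(original)

print(" ".join(english))  # day before yesterday
print(" ".join(back))     # dia antes de ontem
```

One word goes in and four come out: each word without an exact counterpart is expanded, and the expansion is never undone on the way back.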
Now let’s do an experiment that shows how these messy translations can be used to avoid getting caught by similarity detectors even when copying a friend’s text. We will do this with a text by our friend William Shakespeare, taking six lines from the well-known “to be or not to be” soliloquy.
To be, or not to be, that is the question:
Whether ’tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles
And by opposing end them. To die—to sleep,
No more; and by a sleep to say we end (William Shakespeare).
Now let’s translate it into Portuguese and then back into English.
To be or not to be, that’s the question:
If it is nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take up arms against a sea of troubles
And as opposed to finish them. To die – to sleep,
No longer; and for a sleep to say that we broke up (probably by William Shakespeare)
A relatively simple process, yet several passages differ from the initial text: for example, “Whether ’tis” became “If it is” and “we end” became “we broke up”. Now a slightly bolder experiment. We pass the original text through English → French → Greek → Russian → Japanese → Swahili → Hungarian → English and arrive at the following result, in which almost every line has changed:
The question is, are there:
Be gentle with bitter ghosts
Fantastic site and arrows
Or his hand against the sea in question
On the contrary, it is complete. Die-bed
She no longer sleeps. (unknown author)
Reading the result, we see that it is very different from the original; in fact, it is another text. Yet in a sense it is still related to the initial one, and the most incredible thing is that it was generated fully automatically. The process distorts the original enough to make the new text look original while preserving, at best, part of its meaning.
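How far each version drifts from the original can be estimated with two simple measures from Python's standard library. This is only a rough sketch: real similarity checkers use their own, more elaborate methods, and the two measures here (word-set overlap and difflib's sequence ratio) are stand-ins chosen for illustration. The function names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def words(text):
    """Lowercase word tokens; punctuation is ignored."""
    return re.findall(r"[a-z]+", text.lower())


def jaccard(a, b):
    """Share of distinct words the two texts have in common."""
    sa, sb = set(words(a)), set(words(b))
    return len(sa & sb) / len(sa | sb)


def sequence_ratio(a, b):
    """difflib's similarity ratio over the word sequences (order matters)."""
    return SequenceMatcher(None, words(a), words(b)).ratio()


original = """To be, or not to be, that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles
And by opposing end them. To die - to sleep,
No more; and by a sleep to say we end"""

via_portuguese = """To be or not to be, that's the question:
If it is nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take up arms against a sea of troubles
And as opposed to finish them. To die - to sleep,
No longer; and for a sleep to say that we broke up"""

via_six_languages = """The question is, are there:
Be gentle with bitter ghosts
Fantastic site and arrows
Or his hand against the sea in question
On the contrary, it is complete. Die-bed
She no longer sleeps."""

for name, text in [("via Portuguese", via_portuguese),
                   ("via six languages", via_six_languages)]:
    print(f"{name}: jaccard={jaccard(original, text):.2f} "
          f"sequence={sequence_ratio(original, text):.2f}")
```

Both scores lie between 0 and 1, with 1 meaning identical word content. The single round trip through Portuguese should score much closer to the original than the chain through six languages.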