The paper proposes a new method for training neural machine translation models. The method is based on the idea of self-supervised learning, where the model is trained on a large corpus of unlabeled data. The model is then fine-tuned on a small corpus of labeled data. The authors show that their method achieves state-of-the-art results on several machine translation tasks.
What was the result of the First World War?
Which of these countries was not part of the Allied Powers: France, Russia, Australia or Italy?
Which of these countries did not join the Allied Powers at the start of the First World War: France, Britain, Russia or Japan?