The paper proposes a new method for training language models on large unlabeled datasets. The method, called MASS, uses a masked self-attention mechanism to learn representations of words that are robust to the context in which they appear. MASS is shown to outperform previous methods on a variety of natural language processing tasks, including text classification, question answering, and summarization.
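The core idea of corrupting input text with masks and training a model to recover the original words can be sketched as follows. This is a minimal illustration of span masking, not the paper's actual implementation: the `[MASK]` placeholder, the `span_frac` hyperparameter, and the helper name are all assumptions chosen for the example.

```python
import random

MASK = "[MASK]"  # illustrative placeholder token, not the paper's vocabulary

def mask_span(tokens, span_frac=0.5, seed=0):
    """Replace one contiguous span of tokens with [MASK] placeholders.

    Returns the corrupted input (what the model sees) and the original
    span (what the model is trained to predict). This mirrors the
    general masked pre-training setup in spirit only.
    """
    rng = random.Random(seed)
    span_len = max(1, int(len(tokens) * span_frac))
    start = rng.randrange(0, len(tokens) - span_len + 1)
    corrupted = tokens[:start] + [MASK] * span_len + tokens[start + span_len:]
    target = tokens[start : start + span_len]
    return corrupted, target

tokens = "the cat sat on the mat".split()
corrupted, target = mask_span(tokens)
print(corrupted)
print(target)
```

Splicing `target` back into `corrupted` at the masked positions recovers the original sentence, which is exactly the supervision signal this style of pre-training exploits.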