From the thesis:
We further define a pre-training objective, “naturalization”, to pre-train models on the natural coding conventions that developers follow. In this work, we take code written by developers and apply semantic-preserving code transformations to make it “unnatural” and “weird”. We then ask a model to transform this unnatural code back into its original form. With such a training mechanism, we aim to make the model implicitly biased towards natural coding...
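One common semantic-preserving transformation of this kind is variable renaming: replacing meaningful identifiers with arbitrary ones leaves the program's behavior unchanged but makes it look “unnatural”. The thesis does not specify its exact transformations, so the sketch below is only an illustrative assumption, using Python's `ast` module to rename function parameters and produce an (unnatural, original) training pair.

```python
import ast


class Renamer(ast.NodeTransformer):
    """Semantic-preserving transformation (illustrative assumption):
    rename function parameters to meaningless identifiers v0, v1, ...
    so the code becomes 'unnatural' while behaving identically."""

    def __init__(self):
        self.mapping = {}

    def visit_arg(self, node):
        # Assign each parameter a fresh opaque name.
        self.mapping[node.arg] = f"v{len(self.mapping)}"
        node.arg = self.mapping[node.arg]
        return node

    def visit_Name(self, node):
        # Rename only identifiers we introduced above, so builtins
        # and globals keep their semantics.
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node


def denaturalize(source: str) -> str:
    """Return an 'unnatural' but semantically equivalent version of source."""
    tree = ast.parse(source)
    return ast.unparse(Renamer().visit(tree))


natural = "def area(width, height):\n    return width * height"
unnatural = denaturalize(natural)
# The model is then trained on the pair (unnatural -> natural),
# i.e., to recover the developer's original, natural form.
```

In this sketch, `width` and `height` become `v0` and `v1`; the naturalization objective asks the model to invert that mapping, recovering the conventional names from context.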