From the thesis:
We further define a pre-training objective, “naturalization”, to pre-train models on the natural coding conventions followed by developers. In this work, we take developer-written code and apply semantic-preserving code transformations to make the code “unnatural” and “weird”. We then ask a model to transform this unnatural code back into its original form. With such a training mechanism, we aim to make the model implicitly biased towards the natural coding conventions followed by developers. With these insights, we developed NatGen – a pre-trained model for source code generation (details in Chapter 7).
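To make the idea concrete, here is a minimal, hypothetical sketch of one semantic-preserving transformation of the kind described: rewriting an augmented assignment (`x += e`) into its longer equivalent (`x = x + e`). The transformed code behaves identically but deviates from the convention developers typically write, and the (unnatural, original) pair can serve as a training example. This is an illustration only, not the thesis’s actual transformation set.

```python
import ast

class AugAssignToAssign(ast.NodeTransformer):
    """Rewrite `x += e` as `x = x + e` -- semantics preserved, style made less natural."""
    def visit_AugAssign(self, node):
        new = ast.Assign(
            targets=[ast.Name(id=node.target.id, ctx=ast.Store())],
            value=ast.BinOp(
                left=ast.Name(id=node.target.id, ctx=ast.Load()),
                op=node.op,
                right=node.value,
            ),
        )
        return ast.copy_location(new, node)

# Developer-written ("natural") code
natural = "total += price * qty"

# Apply the semantic-preserving transformation to get "unnatural" code
tree = AugAssignToAssign().visit(ast.parse(natural))
unnatural = ast.unparse(ast.fix_missing_locations(tree))

# Training pair for naturalization: input = unnatural, target = natural
print(unnatural)  # total = total + price * qty
```

A naturalization model would be trained on many such pairs, learning to map the transformed variant back to the form developers actually write.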