Model card for detoxified gpt-j-6b

The model run can be found here.

The main difference is that I used mini_batch_size=1.
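As a rough illustration of what this setting implies, here is a minimal sketch. It assumes mini_batch_size plays its usual role of splitting each training batch into smaller chunks that are processed one at a time. The helper split_into_mini_batches and the sample names are hypothetical and are not taken from the actual training script. The real batch size is not stated in this card.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_mini_batches(batch: Sequence[T], mini_batch_size: int) -> List[List[T]]:
    """Split one batch into consecutive chunks of at most mini_batch_size items."""
    if mini_batch_size < 1:
        raise ValueError("mini_batch_size must be at least 1")
    return [
        list(batch[start:start + mini_batch_size])
        for start in range(0, len(batch), mini_batch_size)
    ]


# Hypothetical batch of four samples; the batch size here is an assumption.
batch = ["sample_0", "sample_1", "sample_2", "sample_3"]

# With mini_batch_size=1, every sample ends up in its own mini-batch.
mini_batches = split_into_mini_batches(batch, mini_batch_size=1)
print(mini_batches)
```

With a mini-batch size of 1, each sample is handled on its own. This generally lowers peak memory use at the cost of more, smaller steps, which can matter for a model of this size.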