likely unstable due to small batch size

checkpoint from halfway through the run