LeanDojo: Theorem Proving with Retrieval-Augmented Language Models
Under review, NeurIPS (Datasets and Benchmarks Track), 2023
Kaiyu Yang, Aidan Swope, Alex Gu, Rahul Chalamala,
Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, Anima Anandkumar

@article{yang2023leandojo,
  title={{LeanDojo}: Theorem Proving with Retrieval-Augmented Language Models},
  author={Yang, Kaiyu and Swope, Aidan and Gu, Alex and Chalamala, Rahul and Song, Peiyang and Yu, Shixing and Godil, Saad and Prenger, Ryan and Anandkumar, Anima},
  journal={arXiv preprint arXiv:2306.15626},
  year={2023}
}

Please visit LeanDojo Website for details.

Input Format

The model's input consists of retrieved premises concatenated with the current proof state and truncated to 2300 UTF-8 bytes. The proof state is formatted by Lean's pretty printer, the same as the input format of our model w/o retrieval. Retrieved premises are in the form of Lean code, except that proofs are removed and fully qualified names (marked by <a>...</a>) are used. Please see the example on the right.