SantaCoder 🎅 fine-tuned on Swift 🍏

This model is a fine-tuned version of bigcode/santacoder on Swift code from The Stack (see Training and evaluation data). It achieves the following results on the evaluation set:

Loss: 0.8353

Model description

The SantaCoder models are a series of 1.1B-parameter models trained on the Python, Java, and JavaScript subsets of The Stack (v1.1), which excluded opt-out requests. The main model uses Multi Query Attention and was trained with the Fill-in-the-Middle objective on data filtered by near-deduplication and comment-to-code ratio. In addition, several models were trained on datasets with different filter parameters and with architecture and objective variations.
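Because the base model was trained with the Fill-in-the-Middle objective, an infilling prompt can be assembled from the code before and after a gap. The sketch below is a minimal illustration, assuming this fine-tuned checkpoint keeps the sentinel tokens of the base bigcode/santacoder tokenizer (`<fim-prefix>`, `<fim-suffix>`, `<fim-middle>`). The helper name `build_fim_prompt` and the Swift snippet are hypothetical.

```python
# Sentinel tokens as documented for the base bigcode/santacoder tokenizer.
# Assumption: this fine-tuned checkpoint keeps the same special tokens.
FIM_PREFIX = "<fim-prefix>"
FIM_SUFFIX = "<fim-suffix>"
FIM_MIDDLE = "<fim-middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange prefix and suffix so the model is asked to generate the middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Hypothetical Swift function with its body left out for the model to fill in.
prompt = build_fim_prompt(
    prefix="func greet(name: String) -> String {\n    ",
    suffix="\n}\n",
)
print(prompt)
```

The resulting string would be tokenized and passed to the model's generate call. Whether infilling quality survives fine-tuning on Swift is not reported in this card.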

Intended uses & limitations

More information needed

Training and evaluation data

The Stack contains over 6TB of permissively-licensed source code files covering 358 programming languages. The dataset was created as part of the BigCode Project, an open scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs). The Stack serves as a pre-training dataset for Code LLMs, i.e., code-generating AI systems which enable the synthesis of programs from natural language descriptions as well as from other code snippets. The near-deduplicated version, used here, contains 3TB of data.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
training_steps: 10000
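The listed values are related to each other: the total train batch size is the per-device batch size times the gradient accumulation steps, and the learning rate follows a warmup and then a cosine decay. The sketch below reproduces that arithmetic in plain Python. It assumes a single training device (consistent with 2 × 4 = 8) and the common linear-warmup plus half-cosine schedule. The exact scheduler implementation is not stated in this card, and the function names are illustrative.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-5
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 100
TRAINING_STEPS = 10_000
NUM_DEVICES = 1  # assumption: one device, consistent with 2 * 4 = 8


def effective_batch_size() -> int:
    """Number of sequences contributing to each optimizer update."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def cosine_lr(step: int) -> float:
    """Linear warmup to the peak rate, then half-cosine decay towards zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size())
for step in (0, 50, 100, 5_050, 10_000):
    print(step, cosine_lr(step))
```

Under these assumptions the rate peaks at step 100 and is at half its peak at step 5050, the midpoint of the decay phase.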

Training results

Training Loss	Epoch	Step	Validation Loss
1.1132	0.05	500	1.0496
1.0077	0.1	1000	1.0245
1.0109	0.15	1500	1.0111
1.1106	0.2	2000	1.0025
0.5083	0.25	2500	1.0163
0.2996	0.3	3000	1.0339
1.0745	0.35	3500	0.9682
1.0355	0.4	4000	0.9467
0.9156	0.45	4500	0.9229
0.8834	0.5	5000	0.9199
0.6363	0.55	5500	0.9048
0.8771	0.6	6000	0.8899
1.9208	0.65	6500	0.8727
0.8816	0.7	7000	0.8633
0.8918	0.75	7500	0.8543
0.8714	0.8	8000	0.8454
0.9486	0.85	8500	0.8402
1.0609	0.9	9000	0.8364
0.9124	0.95	9500	0.8356
0.9743	1.0	10000	0.8353
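To make the validation losses easier to interpret, they can be converted to perplexity. The sketch below picks the step with the lowest validation loss from a few rows of the table above and exponentiates it. It assumes the reported loss is a mean per-token cross-entropy in nats, which is typical for causal language model training but is not stated in this card.

```python
import math

# (step -> validation loss) pairs copied from selected rows of the table above.
validation_loss = {
    500: 1.0496,
    2000: 1.0025,
    3000: 1.0339,
    5000: 0.9199,
    10000: 0.8353,
}

# Step with the lowest validation loss among the selected rows.
best_step = min(validation_loss, key=validation_loss.get)

# Perplexity = exp(mean cross-entropy), assuming the loss is measured in nats.
perplexity = math.exp(validation_loss[best_step])

print(best_step, round(perplexity, 3))
```

Among these rows the final step has the lowest validation loss, matching the evaluation loss reported at the top of the card.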

Framework versions

Transformers 4.26.0.dev0
PyTorch 1.13.1+cu116
Datasets 2.7.1
Tokenizers 0.13.2

Citation

@misc {manuel_romero_2023,
	author       = { {Manuel Romero} },
	title        = { santacoder-finetuned-the-stack-swift (Revision 99b9470) },
	year         = 2023,
	url          = { https://huggingface.co/mrm8488/santacoder-finetuned-the-stack-swift },
	doi          = { 10.57967/hf/0348 },
	publisher    = { Hugging Face }
}