# gpt_bigcode-santacoder-GGML

Quantized version of https://huggingface.co/bigcode/gpt_bigcode-santacoder
## Which one should I use?

fp16 (best quality) and q8_0 (~80% faster) are preferred; consider q5_0 the low-RAM version (it should work on a 2 GB RPi).
## How to Use

Use https://github.com/the-crypt-keeper/ggml/tree/starcoder_repeat_penalty until https://github.com/ggerganov/ggml/pull/311 is merged.
Run inference with `--top_k 1 --repeat-penalty 1.176` for best results. For example:
```
$ ./build/bin/starcoder -m ~/ai/models/santacoder-q8_0.bin -p 'def fib(' --top_k 1 --repeat-penalty 1.176
main: seed = 1687821778
starcoder_model_load: loading model from '/home/miner/ai/models/santacoder-q8_0.bin'
starcoder_model_load: n_vocab = 49280
starcoder_model_load: n_ctx   = 2048
starcoder_model_load: n_embd  = 2048
starcoder_model_load: n_head  = 16
starcoder_model_load: n_layer = 24
starcoder_model_load: ftype   = 2007
starcoder_model_load: qntvr   = 2
starcoder_model_load: ggml ctx size = 2215.13 MB
starcoder_model_load: memory size = 768.00 MB, n_mem = 49152
starcoder_model_load: model size = 1446.98 MB
extract_tests_from_file : No test file found.
test_gpt_tokenizer : 0 tests failed out of 0 tests.
main: temp           = 0.900
main: top_k          = 1
main: top_p          = 0.900
main: repeat_last_n  = 64
main: repeat_penalty = 1.176
main: prompt: 'def fib('
main: number of tokens in prompt = 3
main: token[0] =    563, def
main: token[1] =  24240,  fib
main: token[2] =      7, (

def fib(n):
    if n == 0:
        return 1
    elif n == 1 or n == 2:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)
print("fibonacci series:")
for i in range(5, 36):
    print("{0} = {1}".format(i, fib(i)))<|endoftext|>

main: mem per token =   324248 bytes
main:     load time =  2088.62 ms
main:   sample time =    81.27 ms
main:  predict time =  5338.73 ms / 59.99 ms per token
main:    total time =  7617.71 ms
```
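Why `--repeat-penalty` matters with `--top_k 1`: greedy decoding alone tends to loop on repeated tokens, and the penalty is what breaks those loops. A minimal sketch of the usual CTRL-style penalty as implemented in the ggml examples (the function name here is hypothetical):

```python
def apply_repeat_penalty(logits, last_tokens, penalty=1.176):
    """Penalize tokens seen in the recent window (repeat_last_n).

    Sketch of the common CTRL-style rule: positive logits are divided
    by the penalty, negative ones multiplied, making repeats less likely.
    """
    out = list(logits)
    for t in set(last_tokens):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

# With --top_k 1 (greedy), the penalty can flip the argmax:
logits = [2.0, 1.9, 0.5, -1.0]
penalized = apply_repeat_penalty(logits, last_tokens=[0], penalty=1.176)
# token 0 (just emitted) drops below token 1, so greedy picks token 1 next
```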
## Memory Usage

### fp16

```
starcoder_model_load: ggml ctx size = 3475.60 MB
starcoder_model_load: memory size = 768.00 MB, n_mem = 49152
starcoder_model_load: model size = 2707.45 MB
```

### q8_0

```
starcoder_model_load: ggml ctx size = 2215.13 MB
starcoder_model_load: memory size = 768.00 MB, n_mem = 49152
starcoder_model_load: model size = 1446.98 MB
```

### q5_0

```
starcoder_model_load: ggml ctx size = 1710.94 MB
starcoder_model_load: memory size = 768.00 MB, n_mem = 49152
starcoder_model_load: model size = 942.80 MB
```
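A quick sanity check on the figures above (a sketch, assuming ggml ctx size ≈ model weights + the 768 MB KV cache, plus a small allocator overhead):

```python
# Figures taken from the load logs above, in MB.
ctx_size_mb = {"fp16": 3475.60, "q8_0": 2215.13, "q5_0": 1710.94}
model_size_mb = {"fp16": 2707.45, "q8_0": 1446.98, "q5_0": 942.80}
kv_cache_mb = 768.00  # "memory size" reported identically for all three

for name in ("fp16", "q8_0", "q5_0"):
    # ctx size ≈ model size + KV cache; the remainder is allocator overhead
    overhead = ctx_size_mb[name] - model_size_mb[name] - kv_cache_mb
    print(f"{name}: {ctx_size_mb[name]:8.2f} MB total, {overhead:.2f} MB overhead")
```

Note the KV cache is a fixed cost regardless of quantization, so quantizing shrinks only the weights: q5_0 roughly halves the total footprint versus fp16, not a full 65% reduction.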