gpt_bigcode-santacoder-GGML

Quantized version of https://huggingface.co/bigcode/gpt_bigcode-santacoder

Which one should I use?

fp16 (best quality) and q8_0 (~80% faster) are the preferred choices; consider q5_0 the low-RAM option (it should work on a 2GB Raspberry Pi).

How to Use

Use https://github.com/the-crypt-keeper/ggml/tree/starcoder_repeat_penalty until https://github.com/ggerganov/ggml/pull/311 is merged

Run inference with --top_k 1 --repeat-penalty 1.176 for best results. For example:

$ ./build/bin/starcoder -m ~/ai/models/santacoder-q8_0.bin -p 'def fib(' --top_k 1 --repeat-penalty 1.176
main: seed = 1687821778
starcoder_model_load: loading model from '/home/miner/ai/models/santacoder-q8_0.bin'
starcoder_model_load: n_vocab = 49280
starcoder_model_load: n_ctx   = 2048
starcoder_model_load: n_embd  = 2048
starcoder_model_load: n_head  = 16
starcoder_model_load: n_layer = 24
starcoder_model_load: ftype   = 2007
starcoder_model_load: qntvr   = 2
starcoder_model_load: ggml ctx size = 2215.13 MB
starcoder_model_load: memory size =   768.00 MB, n_mem = 49152
starcoder_model_load: model size  =  1446.98 MB
extract_tests_from_file : No test file found.
test_gpt_tokenizer : 0 tests failed out of 0 tests.

main: temp           = 0.900
main: top_k          = 1
main: top_p          = 0.900
main: repeat_last_n  = 64
main: repeat_penalty = 1.176
main: prompt: 'def fib('
main: number of tokens in prompt = 3
main: token[0] =    563, def
main: token[1] =  24240,  fib
main: token[2] =      7, (


def fib(n):
    if n == 0:
        return 1

    elif n == 1 or n == 2:
        return 1

    else:
        return fib(n - 1) + fib(n - 2)


print("fibonacci series:")
for i in range(5, 36):
    print("{0} = {1}".format(i, fib(i)))<|endoftext|>

main: mem per token =   324248 bytes
main:     load time =  2088.62 ms
main:   sample time =    81.27 ms
main:  predict time =  5338.73 ms / 59.99 ms per token
main:    total time =  7617.71 ms
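The --repeat-penalty flag works by discouraging tokens that were generated recently. As a rough sketch (assuming the CTRL-style scheme used by the ggml/llama.cpp samplers, where a positive logit is divided by the penalty and a negative logit is multiplied by it), combined with --top_k 1 greedy decoding:

```python
# Hypothetical sketch: penalize logits of recently seen tokens, then pick
# greedily (--top_k 1). The exact behavior of the linked fork may differ.

def apply_repeat_penalty(logits, recent_tokens, penalty=1.176):
    """Return a copy of `logits` with recently generated tokens penalized."""
    out = list(logits)
    for tok in set(recent_tokens):
        if out[tok] > 0:
            out[tok] /= penalty   # positive logit: make it less likely
        else:
            out[tok] *= penalty   # negative logit: push it further down
    return out

def greedy_token(logits):
    """--top_k 1 amounts to greedy decoding: always take the best logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [2.0, 1.9, -0.5]
penalized = apply_repeat_penalty(logits, recent_tokens=[0])
print(greedy_token(logits))     # 0: token 0 wins without the penalty
print(greedy_token(penalized))  # 1: after penalizing token 0, token 1 wins
```

With --top_k 1 the model would otherwise repeat its highest-probability continuation forever; the penalty is what breaks those loops, which is why the two flags are used together.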

Memory Usage

fp16

starcoder_model_load: ggml ctx size = 3475.60 MB
starcoder_model_load: memory size =   768.00 MB, n_mem = 49152
starcoder_model_load: model size  =  2707.45 MB

q8_0

starcoder_model_load: ggml ctx size = 2215.13 MB
starcoder_model_load: memory size =   768.00 MB, n_mem = 49152
starcoder_model_load: model size  =  1446.98 MB

q5_0

starcoder_model_load: ggml ctx size = 1710.94 MB
starcoder_model_load: memory size =   768.00 MB, n_mem = 49152
starcoder_model_load: model size  =   942.80 MB
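The model sizes above track the bits-per-weight of each GGML storage format: fp16 uses 16 bits per weight, q8_0 packs 32 weights into a 34-byte block (8.5 bits each), and q5_0 packs 32 weights into a 22-byte block (5.5 bits each). A sketch of the estimate, assuming those block layouts:

```python
# Rough size estimate from GGML block layouts (assumption: q8_0 = fp16 scale
# + 32 int8 quants = 34 bytes per 32 weights; q5_0 = fp16 scale + 4 bytes of
# high bits + 16 bytes of low nibbles = 22 bytes per 32 weights).
BITS_PER_WEIGHT = {
    "fp16": 16.0,
    "q8_0": 34 * 8 / 32,  # 8.5 bits per weight
    "q5_0": 22 * 8 / 32,  # 5.5 bits per weight
}

FP16_SIZE_MB = 2707.45  # fp16 model size reported above

for fmt, bits in BITS_PER_WEIGHT.items():
    est = FP16_SIZE_MB * bits / 16.0
    print(f"{fmt}: ~{est:.0f} MB")
```

The estimates (~1438 MB for q8_0, ~931 MB for q5_0) land slightly below the reported 1446.98 MB and 942.80 MB, because some tensors (e.g. norms and biases) are not quantized.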