tiny_starcoder_py-GGML

A quantized GGML version of https://huggingface.co/bigcode/tiny_starcoder_py

Which one should I use?

fp16 for best quality, or q8_0 for roughly 80% faster inference at a small quality cost.

How to Use

Use the fork at https://github.com/the-crypt-keeper/ggml/tree/starcoder_repeat_penalty until https://github.com/ggerganov/ggml/pull/311 is merged upstream.

Run inference with --top_k 1 --repeat-penalty 1.176 for best results. For example:

$ ./bin/starcoder -m ~/ai/models/tiny_starcoder_py-fp16.bin -p 'def fibonnaci' --repeat-penalty 1.176 --top_k 1
main: seed = 1687866970
starcoder_model_load: loading model from '/home/miner/ai/models/tiny_starcoder_py-fp16.bin'
starcoder_model_load: n_vocab = 49152
starcoder_model_load: n_ctx   = 8192
starcoder_model_load: n_embd  = 768
starcoder_model_load: n_head  = 12
starcoder_model_load: n_layer = 20
starcoder_model_load: ftype   = 1
starcoder_model_load: qntvr   = 0
starcoder_model_load: ggml ctx size = 1398.89 MB
starcoder_model_load: memory size =   960.00 MB, n_mem = 163840
starcoder_model_load: model size  =   438.77 MB
extract_tests_from_file : No test file found.
test_gpt_tokenizer : 0 tests failed out of 0 tests.

main: temp           = 0.900
main: top_k          = 1
main: top_p          = 0.900
main: repeat_last_n  = 64
main: repeat_penalty = 1.176
main: prompt: 'def fibonnaci'
main: number of tokens in prompt = 5
main: token[0] =    589, def
main: token[1] =  28176,  fib
main: token[2] =    267, on
main: token[3] =  46278, nac
main: token[4] =     91, i


def fibonnaci_2(n):
    """Fibonacci series of n."""

    if n == 0:
        return 1
    else:
        return fibonnaci_2(n-1) + fibonnaci_2(n-2)


if __name__ == '__main__':
    print(fibonnaci_2(5))<|endoftext|>

main: mem per token =   290312 bytes
main:     load time =   203.32 ms
main:   sample time =    69.94 ms
main:  predict time =  1262.22 ms / 15.98 ms per token
main:    total time =  1560.19 ms
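The repeat-penalty used above follows the usual CTRL-style scheme: the logit of every recently generated token is divided by the penalty when positive and multiplied by it when negative, making repeats less likely. A minimal Python sketch of that mechanism (the function name is illustrative, not part of the ggml API):

```python
def apply_repeat_penalty(logits, recent_tokens, penalty=1.176):
    """Down-weight tokens seen in the last repeat_last_n outputs.

    Dividing a positive logit (or multiplying a negative one) by the
    penalty makes recently generated tokens less likely to repeat.
    """
    out = list(logits)
    for tok in set(recent_tokens):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

# With --top_k 1, sampling is greedy: pick the argmax of the penalized logits.
logits = [2.0, 1.9, -0.5]
penalized = apply_repeat_penalty(logits, recent_tokens=[0], penalty=1.176)
best = max(range(len(penalized)), key=penalized.__getitem__)
print(best)  # token 1 now wins over the recently used token 0
```

A penalty of 1.176 (≈ 1/0.85) is a commonly used value; with greedy top_k 1 decoding it is what breaks the loops the model would otherwise fall into.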

Memory Usage

fp16

starcoder_model_load: ggml ctx size = 1398.89 MB
starcoder_model_load: memory size =   960.00 MB, n_mem = 163840
starcoder_model_load: model size  =   438.77 MB

q8_0

starcoder_model_load: ggml ctx size = 1204.83 MB
starcoder_model_load: memory size =   960.00 MB, n_mem = 163840
starcoder_model_load: model size  =   244.71 MB
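As a rough sanity check on these numbers: assuming q8_0 stores weights in 32-element blocks of int8 values plus a 2-byte scale per block (an assumption about the on-disk layout), while fp16 spends 2 bytes per weight, the expected size ratio can be estimated directly:

```python
# Bytes per weight under each format (q8_0 block layout is an assumption:
# 32 int8 weights + one 2-byte scale per block).
FP16_BYTES = 2.0
Q8_0_BYTES = (32 * 1 + 2) / 32  # ~1.0625 bytes per weight

ratio = FP16_BYTES / Q8_0_BYTES
print(f"theoretical fp16/q8_0 size ratio: {ratio:.2f}")

# Observed from the logs above: 438.77 MB (fp16) vs 244.71 MB (q8_0).
observed = 438.77 / 244.71
print(f"observed ratio: {observed:.2f}")
```

The observed ratio (~1.79) comes in slightly under the theoretical ~1.88 because some tensors (e.g. norms) are typically kept at full precision rather than quantized.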