## 🔥 Good news

You can download the model from PY007 without any changes to llama.cpp.

Here is a demo.

## Pay attention

To use this model, you need to change the RoPE calls in llama.cpp's `llama.cpp` source file from mode 0 to mode 2 (the GPT-NeoX-style RoPE in ggml's conventions).

Change lines 2568 and 2572 (line numbers may differ in other versions) from:

```cpp
struct ggml_tensor * Kcur = ggml_rope_custom_inplace(ctx0, ggml_reshape_3d(ctx0, tmpk, n_embd_head, n_head_kv, N), n_past, n_embd_head, 0, 0, freq_base, freq_scale);
struct ggml_tensor * Qcur = ggml_rope_custom_inplace(ctx0, ggml_reshape_3d(ctx0, tmpq, n_embd_head, n_head, N),    n_past, n_embd_head, 0, 0, freq_base, freq_scale);
```

to

```cpp
struct ggml_tensor * Kcur = ggml_rope_custom_inplace(ctx0, ggml_reshape_3d(ctx0, tmpk, n_embd_head, n_head_kv, N), n_past, n_embd_head, 2, 0, freq_base, freq_scale);
struct ggml_tensor * Qcur = ggml_rope_custom_inplace(ctx0, ggml_reshape_3d(ctx0, tmpq, n_embd_head, n_head, N),    n_past, n_embd_head, 2, 0, freq_base, freq_scale);
```

# TinyLlama-1.1B Chat v0.2 GGUF

## Description

This repo contains GGUF format model files for PY007's TinyLlama 1.1B Chat v0.2.

<!-- README_GGUF.md-about-gguf start -->

## About GGUF

GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.

The key benefit of GGUF is that it is an extensible, future-proof format which stores more information about the model as metadata. It also includes significantly improved tokenization code, including, for the first time, full support for special tokens. This should improve performance, especially with models that use new special tokens and implement custom prompt templates.

Here is a list of clients and libraries that are known to support GGUF:

<!-- README_GGUF.md-about-gguf end -->

<!-- prompt-template start -->

## Prompt template: TinyLlama chat

```
<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n
```

Example:

```
<|im_start|>user
Explain huggingface.<|im_end|>
<|im_start|>assistant
Hugging Face is a platform for building and hosting open-source applications. It provides a simple interface for developers to build, deploy, and host any application on the web. Hugging Face offers a wide range of services, including:

1. API Gateway: This service allows developers to create REST APIs that can be accessed by other Hugging Face services.

2. Functions: This service provides functions that can be used for processing data and making predictions.

3. Transformers: These are a set of algorithms that allow developers to process large amounts of text data and generate new content.

4. Datasets: Hugging Face provides datasets that can be used to train models, evaluate them, and make predictions.

5. CLI: This service provides a command-line interface for developers to build, deploy, and manage their applications.

6. Documentation: This service provides documentation for the different services and features available on Hugging Face's platform.

7. Community: The Hugging Face community is made up of developers, data scientists, and other experts who can provide support and resources for using and building on Hugging Face's platforms.<|im_end|>
```

<!-- prompt-template end -->

<!-- compatibility_gguf start -->

## Compatibility

These quantised GGUF files are compatible with llama.cpp from August 21st 2023 onwards, as of commit `6381d4e110bd0ec02843a60bbeb8b6fc37a9ace9`.

They are now also compatible with many third party UIs and libraries - please see the list at the top of the README.

### Explanation of quantisation methods

<details>
<summary>Click to see details</summary>

The new methods available are:

Refer to the Provided Files table below to see what files use which methods, and how.

</details>
<!-- compatibility_gguf end -->

<!-- README_GGUF.md-how-to-run start -->

## Example llama.cpp command

For compatibility with older versions of llama.cpp, or for any third-party libraries or clients that haven't yet updated for GGUF, please use GGML files instead.

```shell
./main -m ./models/ggml-model-q4_k_m.gguf \
        -n 512 --color --temp 0 -e \
        -p "<|im_start|>user\nExplain huggingface.<|im_end|>\n<|im_start|>assistant\n"
```