
This is Llama2-22b by chargoddard in a couple of GGML formats. I have no idea what I'm doing, so if something doesn't work as it should, or not at all, that's likely on me, not the models themselves.<br> A second model merge has been released, and the GGML conversions for that can be found here.

While I haven't had any issues so far, do note that the original repo states <i>"Not intended for use as-is - this model is meant to serve as a base for further tuning"</i>.

Approximate VRAM requirements at 4K context:

<table style='border: 2px #000000 solid; width: 50%' align='left' border='2'>
  <tbody>
    <tr>
      <th style='text-align: center'>Model</th>
      <th style='text-align: center'>File size</th>
      <th style='text-align: center'>VRAM</th>
    </tr>
    <tr>
      <td style='text-align: center'>q5_1</td>
      <td style='text-align: center'>16.4 GB</td>
      <td style='text-align: center'>21.5 GB</td>
    </tr>
    <tr>
      <td style='text-align: center'>q4_K_M</td>
      <td style='text-align: center'>13.2 GB</td>
      <td style='text-align: center'>18.3 GB</td>
    </tr>
    <tr>
      <td style='text-align: center'>q3_K_M</td>
      <td style='text-align: center'>10.6 GB</td>
      <td style='text-align: center'>16.1 GB</td>
    </tr>
    <tr>
      <td style='text-align: center'>q2_K</td>
      <td style='text-align: center'>9.2 GB</td>
      <td style='text-align: center'>14.5 GB</td>
    </tr>
  </tbody>
</table>
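If you want to try the files directly, here's a minimal loading sketch using llama-cpp-python. The filename is an assumption based on the quant names above, and since these are GGML (not GGUF) files you'll need a release of the library from before the GGUF switch (0.1.78 or earlier):

```python
from llama_cpp import Llama

# Assumed filename -- substitute whichever quant you actually downloaded.
llm = Llama(
    model_path="llama2-22b.ggmlv3.q4_K_M.bin",
    n_ctx=4096,        # the 4K context the VRAM table above assumes
    n_gpu_layers=100,  # offload as many layers as fit; lower this if you run out of VRAM
)

out = llm("Once upon a time", max_tokens=64)
print(out["choices"][0]["text"])
```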