DictaLM: A Large Generative Language Model for Modern Hebrew

A large generative pretrained transformer (GPT) language model for Hebrew, released here.

This is the base model, pretrained on general text completion. On its own it is not very useful, but it can be fine-tuned for specific tasks (instruct, chat, QA, and more).

You can access the instruct-tuned model here.

Sample usage (for text completion):

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained('dicta-il/dictalm-7b')
model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-7b', trust_remote_code=True).cuda()

model.eval()

with torch.inference_mode():
    # this prompt was taken from the headline of a [YNet](https://www.ynet.co.il/architecture/article/b1j3bzcrn) article.
    prompt = 'מנורה מכובע ים וכוסות מבקבוקי פלסטיק: הצצה'
    kwargs = dict(
        inputs=tokenizer(prompt, return_tensors='pt').input_ids.to(model.device),
        do_sample=True,      # sample from the distribution instead of decoding greedily
        top_k=50,            # restrict sampling to the 50 most likely tokens
        top_p=0.95,          # nucleus sampling threshold
        temperature=0.75,    # soften the distribution before sampling
        max_length=100,      # total length (prompt + completion) in tokens
        min_new_tokens=5     # generate at least 5 new tokens
    )
    
    print(tokenizer.batch_decode(model.generate(**kwargs), skip_special_tokens=True))

There are many different parameters you can pass in kwargs for different results (greedy decoding, beam search, different sampling configurations, longer/shorter responses, etc.).
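For example, here are two alternative configurations for the same prompt, reusing the tokenizer, model and prompt from the sample above and using only standard arguments of the generate function (the values are illustrative, not tuned):

# Greedy decoding: deterministic, always picks the highest-probability next token
greedy_kwargs = dict(
    inputs=tokenizer(prompt, return_tensors='pt').input_ids.to(model.device),
    do_sample=False,
    max_new_tokens=50
)

# Beam search: keeps several candidate continuations and returns the highest-scoring one
beam_kwargs = dict(
    inputs=tokenizer(prompt, return_tensors='pt').input_ids.to(model.device),
    do_sample=False,
    num_beams=4,
    max_new_tokens=50
)

print(tokenizer.batch_decode(model.generate(**greedy_kwargs), skip_special_tokens=True))
print(tokenizer.batch_decode(model.generate(**beam_kwargs), skip_special_tokens=True))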

You can view the full list of parameters you can pass to the generate function here.

Alternative ways to initialize the model:

If you have multiple smaller GPUs and the accelerate package is installed, you can initialize the model split across the devices:

model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-7b', trust_remote_code=True, device_map='auto')
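After loading this way, accelerate records where each module was placed; a minimal check (hf_device_map is only populated when device_map is used):

# Shows which device each part of the model was assigned to, e.g. {'model.layers.0': 0, ...}
print(model.hf_device_map)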

If you are running on Linux and have the bitsandbytes package installed, you can initialize the model in 4-bit or 8-bit quantized inference mode:

model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-7b', trust_remote_code=True, load_in_8bit=True)
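For 4-bit loading, recent versions of transformers expect a BitsAndBytesConfig rather than a bare flag; a minimal sketch, assuming bitsandbytes and a CUDA GPU are available:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute, a common bitsandbytes setup
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    'dicta-il/dictalm-7b',
    trust_remote_code=True,
    quantization_config=quant_config
)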

If you have FlashAttention installed in your environment, you can instruct the model to use the flash attention implementation (either V1 or V2, whichever is installed):

model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-7b', trust_remote_code=True, use_flash_attention=True)
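FlashAttention kernels only run in half precision, so it usually makes sense to load the weights in fp16 or bf16 as well; a sketch, assuming a bf16-capable GPU (torch_dtype is a standard from_pretrained argument, while use_flash_attention is specific to this model's remote code):

import torch

model = AutoModelForCausalLM.from_pretrained(
    'dicta-il/dictalm-7b',
    trust_remote_code=True,
    use_flash_attention=True,
    torch_dtype=torch.bfloat16   # FlashAttention kernels support fp16/bf16 only
).cuda()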

Citation

If you use DictaLM in your research, please cite Introducing DictaLM -- A Large Generative Language Model for Modern Hebrew:

BibTeX:

@misc{shmidman2023introducing,
      title={Introducing DictaLM -- A Large Generative Language Model for Modern Hebrew}, 
      author={Shaltiel Shmidman and Avi Shmidman and Amir David Nissan Cohen and Moshe Koppel},
      year={2023},
      eprint={2309.14568},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

License


This work is licensed under a Creative Commons Attribution 4.0 International License.
