Vietnamese Legal Text BERT
<a name="introduction"></a> Using Vietnamese Legal Text BERT `hmthanh/VietnamLegalText-SBERT`
<a name="transformers"></a> Using Vietnamese Legal Text BERT with `transformers`
Installation <a name="install2"></a>
- Install `transformers` with pip: `pip install transformers`
<br />
- Install `tokenizers` with pip: `pip install tokenizers`
Pre-trained models <a name="models2"></a>
Model | #params | Arch. | Max length | Pre-training data |
---|---|---|---|---|
`hmthanh/VietnamLegalText-SBERT` | 135M | base | 256 | 20GB of texts |
Example usage <a name="usage2"></a>
import torch
from transformers import AutoModel, AutoTokenizer
phobert = AutoModel.from_pretrained("hmthanh/VietnamLegalText-SBERT")
tokenizer = AutoTokenizer.from_pretrained("hmthanh/VietnamLegalText-SBERT")
sentence = 'Vượt đèn đỏ bị phạt bao nhiêu tiền?'
input_ids = torch.tensor([tokenizer.encode(sentence)])
with torch.no_grad():
    features = phobert(input_ids)  # Model outputs are tuple-like; features[0] holds the last hidden states
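
The example above returns one vector per token. To compare whole sentences, for instance a question against candidate legal passages, those token vectors are usually pooled into a single sentence vector. The sketch below assumes mean pooling over non-padding tokens, which is common for SBERT-style models but is not confirmed by this README as the pooling this model was trained with. The helpers `mean_pool`, `embed`, and `cosine_scores` are hypothetical names introduced for illustration only; `max_length=256` follows the table above.

```python
import torch
import torch.nn.functional as F


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors per sentence, ignoring padding positions."""
    # (batch, seq_len) -> (batch, seq_len, 1) so the mask broadcasts over the hidden dim
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # (batch, 1), avoids division by zero
    return summed / counts


def embed(sentences, model, tokenizer, max_length=256):
    """Hypothetical helper: encode a list of sentences into one vector each.

    Assumes mean pooling; the model card may prescribe a different strategy.
    """
    batch = tokenizer(
        sentences,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**batch)
    # outputs[0] is the last hidden state: (batch, seq_len, hidden)
    return mean_pool(outputs[0], batch["attention_mask"])


def cosine_scores(query_vec: torch.Tensor, doc_vecs: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between one query vector and each row of doc_vecs."""
    return F.cosine_similarity(query_vec.unsqueeze(0), doc_vecs, dim=1)
```

With the `phobert` model and `tokenizer` loaded as in the example above, `vecs = embed([question, passage_1, passage_2], phobert, tokenizer)` followed by `cosine_scores(vecs[0], vecs[1:])` would give one similarity score per passage. Padding is masked out so that shorter sentences in a batch are not diluted by pad tokens.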