<br /> <p align="center"> <h1 align="center">Swe-CLIP 500k</h1>
<p align="center">
<a href="https://github.com/FreddeFrallan/Multilingual-CLIP/tree/main/Model%20Cards/Swe-CLIP%20500k">Github Model Card</a>
</p>
</p>
## Usage

To use this model together with the original CLIP vision encoder, you need to download the code and the additional linear weights from the [Multilingual-CLIP Github](https://github.com/FreddeFrallan/Multilingual-CLIP). Once this is done, you can load and use the model with the following code:
```python
from src import multilingual_clip

# Load the Swedish text encoder
model = multilingual_clip.load_model('Swe-CLIP-500k')

# "The moose is the king of the forest!", "All polar bears are left-handed"
embeddings = model(['Älgen är skogens konung!', 'Alla isbjörnar är vänsterhänta'])
print(embeddings.shape)
# Yields: torch.Size([2, 640])
```
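To obtain image embeddings in the same space, the text encoder is meant to be paired with the original CLIP RN50x4 vision encoder. Below is a minimal sketch, assuming OpenAI's `clip` package is installed and a local image file `dog.jpg` exists (both are illustrative assumptions):

```python
import clip
import torch
from PIL import Image

# Load the CLIP RN50x4 vision encoder that this text encoder was tuned against
device = 'cuda' if torch.cuda.is_available() else 'cpu'
clip_model, preprocess = clip.load('RN50x4', device=device)

# Encode an image into the shared 640-dimensional embedding space
image = preprocess(Image.open('dog.jpg')).unsqueeze(0).to(device)
with torch.no_grad():
    image_embedding = clip_model.encode_image(image)
print(image_embedding.shape)  # torch.Size([1, 640])
```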
<!-- ABOUT THE PROJECT -->
## About
A KB/bert-base-swedish-cased model fine-tuned to match the embedding space of the CLIP text encoder that accompanies the RN50x4 vision encoder. <br>
Training data pairs were generated by sampling 500k sentences from the combined descriptions of GCC + MSCOCO + VizWiz and translating them into Swedish. All translation was done with the Hugging Face Opus model, which appears to produce higher-quality translations than the AWS Translate service.
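For reference, English-to-Swedish translation with an Opus model can be run through the `transformers` library. A minimal sketch, assuming the `Helsinki-NLP/opus-mt-en-sv` checkpoint (the exact checkpoint used for the training data is an assumption):

```python
from transformers import MarianMTModel, MarianTokenizer

# Assumed checkpoint; the exact Opus model used to build the training data is not specified
model_name = 'Helsinki-NLP/opus-mt-en-sv'
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Translate a batch of English captions into Swedish
captions = ['A dog playing in the snow.', 'Two people riding bicycles on a street.']
batch = tokenizer(captions, return_tensors='pt', padding=True)
translated = model.generate(**batch)
print([tokenizer.decode(t, skip_special_tokens=True) for t in translated])
```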