Model Card for Llama2_Finetuned_SEO_Instruction_Set
Attempts to extract website metadata: keywords, description, and header counts.
Model Details
Model Description
- Developed by: Israel N.
- Model type: Llama-2-7B
- Language(s) (NLP): English
- License: Apache-2.0
- Finetuned from model: TinyPixel/Llama-2-7B-bf16-sharded
Uses
Direct Use
Expediting offline SEO analysis of scraped web pages.
Bias, Risks, and Limitations
The model currently does not respond well to site content or metadata prompts; a more refined dataset may be needed for it to work as intended.
How to Get Started with the Model
```
!pip install -q -U trl transformers accelerate git+https://github.com/huggingface/peft.git
!pip install -q datasets bitsandbytes einops
```
Then import `AutoModelForCausalLM` and use `AutoModelForCausalLM.from_pretrained` to load the model from "israelNwokedi/Llama2_Finetuned_SEO_Instruction_Set", as in the sketch below.
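A minimal loading and generation sketch, assuming the repository can be loaded directly with transformers; if it only contains a PEFT/LoRA adapter, it would instead need to be attached to the base model via peft's `PeftModel.from_pretrained`. The prompt wording is illustrative, not a documented instruction format.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

repo_id = "israelNwokedi/Llama2_Finetuned_SEO_Instruction_Set"

# Optional 4-bit quantization so the 7B model fits on a single T4-class GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Illustrative prompt only; the exact instruction format is not documented here.
prompt = (
    "Extract the keywords, description and header counts for the following site:\n"
    "<scraped page text here>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```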
Training Details
Training Data
- Prompts: entire sites and backlinks scraped from the web.
- Outputs: keywords, descriptions, and header counts (h1-h6).
These are the main components of the dataset. Additional samples use ChatGPT-generated metadata as prompts with the corresponding outputs; a hypothetical sample is sketched below.
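The exact schema and prompt wording of the dataset are not published; the hypothetical record below only illustrates the prompt/output pairing described above.

```python
# Hypothetical record; field names and wording are assumptions, not the
# published dataset schema.
sample = {
    "prompt": "Site: https://example.com\n<scraped page text and backlinks>",
    "output": (
        "Keywords: example, demo, placeholder\n"
        "Description: A short example page used for illustration.\n"
        "Header counts: h1=1, h2=3, h3=0, h4=0, h5=0, h6=0"
    ),
}
```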
Training Procedure
Fine-tuning of the pre-trained "TinyPixel/Llama-2-7B-bf16-sharded" Hugging Face model using LoRA and QLoRA, as sketched below.
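A sketch of the QLoRA setup under typical assumptions: the base checkpoint is loaded in 4-bit and a LoRA adapter is attached with peft. The rank, alpha, dropout, and target modules shown are common defaults, not values recorded for this run.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "TinyPixel/Llama-2-7B-bf16-sharded"

# Load the base model in 4-bit (QLoRA-style) to keep memory usage low.
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)

# Illustrative LoRA settings; the exact rank/alpha/dropout and target modules
# used for this fine-tune are not documented in the card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],
)

base_model = prepare_model_for_kbit_training(base_model)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```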
Preprocessing
Used Transformers' BitsAndBytesConfig for lightweight model loading and training, and the "TinyPixel/Llama-2-7B-bf16-sharded" tokenizer for encoding/decoding.
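A sketch of the quantization and tokenizer setup. The nf4/double-quant/bfloat16 settings are typical QLoRA defaults and are assumptions; reusing the EOS token as the pad token is a common Llama-2 workaround, also assumed here.

```python
import torch
from transformers import AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization config for lightweight loading and training
# (settings are typical QLoRA defaults, assumed here).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Tokenizer from the base checkpoint; Llama-2 tokenizers ship without a pad
# token, so the EOS token is commonly reused for padding during fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("TinyPixel/Llama-2-7B-bf16-sharded")
tokenizer.pad_token = tokenizer.eos_token
```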
Training Hyperparameters
- Training regime: 4-bit precision (QLoRA)
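A training-loop sketch with trl's SFTTrainer. Apart from the 4-bit regime, none of these hyperparameters are reported in this card; every value below is an assumption, and the keyword arguments follow the older 0.x SFTTrainer API that was current when this model was trained.

```python
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Toy stand-in for the SEO instruction set, which is not published separately.
train_ds = Dataset.from_list(
    [{"text": "Prompt: <scraped site>\nOutput: Keywords: ...; Description: ...; h1=1"}]
)

# Illustrative hyperparameters only; the actual values are not documented.
training_args = TrainingArguments(
    output_dir="./llama2-seo-finetune",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    max_steps=500,
    logging_steps=10,
    fp16=True,
    optim="paged_adamw_8bit",
)

# `model`, `lora_config` and `tokenizer` are assumed from the sketches above.
trainer = SFTTrainer(
    model=model,
    train_dataset=train_ds,
    peft_config=lora_config,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_args,
)
trainer.train()
```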
Testing Data, Factors & Metrics
Testing Data
Sampled from training data.
Metrics
Not yet computed.
Results
In initial tests the model attempted to reconstruct additional artificial metadata as part of its text generation; however, this was not the intended use case.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: Tesla T4
- Hours used: 0.5
- Cloud Provider: Google Colaboratory
- Compute Region: Europe
- Carbon Emitted: 0.08 g CO2eq (estimated)