Model Card for Llama2_Finetuned_SEO_Instruction_Set
Attempts to extract website metadata: keywords, description, and header counts.
Model Details
Model Description
- Developed by: Israel N.
- Model type: Llama-2-7B
- Language(s) (NLP): English
- License: Apache-2.0
- Finetuned from model: TinyPixel/Llama-2-7B-bf16-sharded
Uses
Direct Use
Expediting offline SEO analysis of scraped web pages.
Bias, Risks, and Limitations
The model currently does not respond well to site content or metadata prompts; a more refined dataset may be needed for it to work as intended.
How to Get Started with the Model
```
!pip install -q -U trl transformers accelerate git+https://github.com/huggingface/peft.git
!pip install -q datasets bitsandbytes einops
```
Then import `AutoModelForCausalLM` and use `AutoModelForCausalLM.from_pretrained` to load the model from "israelNwokedi/Llama2_Finetuned_SEO_Instruction_Set", as in the sketch below.
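A minimal loading and generation sketch, assuming the repository can be loaded directly with transformers; if it only contains a PEFT/LoRA adapter, it would instead need to be attached to the base model via peft's `PeftModel.from_pretrained`. The prompt wording is illustrative, not a documented instruction format.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

repo_id = "israelNwokedi/Llama2_Finetuned_SEO_Instruction_Set"

# Optional 4-bit quantization so the 7B model fits on a single T4-class GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Illustrative prompt only; the exact instruction format is not documented here.
prompt = (
    "Extract the keywords, description and header counts for the following site:\n"
    "<scraped page text here>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```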
Training Details
Training Data
- Prompts: entire sites and backlinks scraped from the web.
- Outputs: keywords, descriptions, and header counts (h1-h6).
These are the main components of the dataset. Additional samples use ChatGPT-generated metadata as prompts with the corresponding outputs; a hypothetical sample is sketched below.
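The exact schema and prompt wording of the dataset are not published; the hypothetical record below only illustrates the prompt/output pairing described above.

```python
# Hypothetical record; field names and wording are assumptions, not the
# published dataset schema.
sample = {
    "prompt": "Site: https://example.com\n<scraped page text and backlinks>",
    "output": (
        "Keywords: example, demo, placeholder\n"
        "Description: A short example page used for illustration.\n"
        "Header counts: h1=1, h2=3, h3=0, h4=0, h5=0, h6=0"
    ),
}
```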
Training Procedure
Fine-tuning of the pre-trained "TinyPixel/Llama-2-7B-bf16-sharded" Hugging Face model using LoRA and QLoRA, as sketched below.
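A sketch of the QLoRA setup under typical assumptions: the base checkpoint is loaded in 4-bit and a LoRA adapter is attached with peft. The rank, alpha, dropout, and target modules shown are common defaults, not values recorded for this run.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "TinyPixel/Llama-2-7B-bf16-sharded"

# Load the base model in 4-bit (QLoRA-style) to keep memory usage low.
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)

# Illustrative LoRA settings; the exact rank/alpha/dropout and target modules
# used for this fine-tune are not documented in the card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],
)

base_model = prepare_model_for_kbit_training(base_model)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```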
Preprocessing
Used Transformers' BitsAndBytesConfig for lightweight model loading and training, and the "TinyPixel/Llama-2-7B-bf16-sharded" tokenizer for encoding/decoding.
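A sketch of the quantization and tokenizer setup. The nf4/double-quant/bfloat16 settings are typical QLoRA defaults and are assumptions; reusing the EOS token as the pad token is a common Llama-2 workaround, also assumed here.

```python
import torch
from transformers import AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization config for lightweight loading and training
# (settings are typical QLoRA defaults, assumed here).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Tokenizer from the base checkpoint; Llama-2 tokenizers ship without a pad
# token, so the EOS token is commonly reused for padding during fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("TinyPixel/Llama-2-7B-bf16-sharded")
tokenizer.pad_token = tokenizer.eos_token
```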
Training Hyperparameters
- Training regime: 4-bit precision (QLoRA)
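A training-loop sketch with trl's SFTTrainer. Apart from the 4-bit regime, none of these hyperparameters are reported in this card; every value below is an assumption, and the keyword arguments follow the older 0.x SFTTrainer API that was current when this model was trained.

```python
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Toy stand-in for the SEO instruction set, which is not published separately.
train_ds = Dataset.from_list(
    [{"text": "Prompt: <scraped site>\nOutput: Keywords: ...; Description: ...; h1=1"}]
)

# Illustrative hyperparameters only; the actual values are not documented.
training_args = TrainingArguments(
    output_dir="./llama2-seo-finetune",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    max_steps=500,
    logging_steps=10,
    fp16=True,
    optim="paged_adamw_8bit",
)

# `model`, `lora_config` and `tokenizer` are assumed from the sketches above.
trainer = SFTTrainer(
    model=model,
    train_dataset=train_ds,
    peft_config=lora_config,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_args,
)
trainer.train()
```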
Testing Data, Factors & Metrics
Testing Data
Sampled from training data.
Metrics
Not yet computed.
Results
In initial tests the model attempted to reconstruct additional artificial metadata as part of its text generation; however, this was not the intended use case.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: Tesla T4
- Hours used: 0.5
- Cloud Provider: Google Colaboratory
- Compute Region: Europe
- Carbon Emitted: 0.08 g CO2eq (estimated)