causal_lm

# Model Card for GPT-6B_Tuned_small_pile

<!-- Provide a quick summary of what the model is/does. -->

GPT-6B_Tuned_small_pile is a GPT-J-6B model fine-tuned on 0.1 million examples from the Pile dataset.

Architecture:

- n_embd: 4096
- n_layer: 28
- n_positions: 2048
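These dimensions match the standard GPT-J-6B architecture. As an illustrative sketch (assuming the Hugging Face `transformers` library), they can be expressed as a `GPTJConfig`; fields not listed above are left at their defaults:

```python
from transformers import GPTJConfig

# Sketch only: builds a config object with the dimensions stated above.
# Other GPT-J-6B fields (e.g. number of attention heads) keep their defaults.
config = GPTJConfig(
    n_embd=4096,       # hidden size
    n_layer=28,        # number of transformer layers
    n_positions=2048,  # maximum context length
)
```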

Tuning Parameters:

- val_split_percent: 20
- momentum: 0.9
- train_batch_size (effective): 32
- train_micro_batch: 16
- gradient_accumulation_steps: 2
- gradient_clipping: 0.5
- learning_rate: 0.00001
- weight_decay: 0.01
- lr_scheduler: cosine
- lr_warmup_steps: 1000
- lr_decay: 0.1
- lr_decay_step: 2000
- mixed_precision: bf16
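As a worked check on the numbers above (a minimal sketch in plain Python, not the actual training script): the effective train batch size of 32 follows from the micro-batch size multiplied by the gradient accumulation steps.

```python
# Hypothetical reconstruction of the tuning configuration as a plain dict;
# the field names mirror the list above.
tuning_config = {
    "val_split_percent": 20,
    "momentum": 0.9,
    "train_micro_batch": 16,
    "gradient_accumulation_steps": 2,
    "gradient_clipping": 0.5,
    "learning_rate": 1e-5,
    "weight_decay": 0.01,
    "lr_scheduler": "cosine",
    "lr_warmup_steps": 1000,
    "lr_decay": 0.1,
    "lr_decay_step": 2000,
    "mixed_precision": "bf16",
}

# Effective batch size = micro-batch size * gradient accumulation steps
# = 16 * 2 = 32, matching the listed train_batch_size (effective).
effective_batch_size = (
    tuning_config["train_micro_batch"]
    * tuning_config["gradient_accumulation_steps"]
)
```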

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->