This model was obtained in two stages:
- Feature alignment based on SciCap. The intermediate output is available at https://huggingface.co/alexshengzhili/llava-7bv0-mm-projector-ft-with-ocr-caption-prompted-paragraph
- Instruction tuning based on the original LLaVA paper
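
To inspect the intermediate feature-alignment output linked in the list, something like the following sketch may help. It assumes the `huggingface_hub` package is installed. The helper names (`repo_id_from_url`, `download_projector`) and the default `mm-projector` directory are hypothetical and are not part of this repository.

```python
from urllib.parse import urlparse

# URL of the intermediate feature-alignment output, copied from the list above.
PROJECTOR_URL = (
    "https://huggingface.co/alexshengzhili/"
    "llava-7bv0-mm-projector-ft-with-ocr-caption-prompted-paragraph"
)


def repo_id_from_url(url: str) -> str:
    """Return the "<user>/<repo>" part of a Hugging Face model URL (hypothetical helper)."""
    parts = urlparse(url).path.strip("/").split("/")
    return "/".join(parts[:2])


def download_projector(local_dir: str = "mm-projector") -> str:
    """Download all files of the intermediate repo and return the local path.

    Assumption: the repo is public and reachable from this machine.
    """
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id_from_url(PROJECTOR_URL),
        local_dir=local_dir,
    )
```

Calling `download_projector()` requires network access. How the downloaded files are plugged into the second (instruction tuning) stage is not specified here, so the sketch stops at the download.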