LLM2CLIP
August 2024
LLM2CLIP is a novel approach that embraces the power of LLMs to unlock CLIP’s potential. By fine-tuning the LLM in the caption space with contrastive learning, we extract its textual capabilities into the output embeddings, significantly improving the output layer’s…
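The contrastive fine-tuning described above can be sketched with a symmetric InfoNCE objective, as used in CLIP-style training. This is a minimal NumPy illustration, not the LLM2CLIP implementation; the batch size, embedding dimension, and temperature are arbitrary assumptions.

```python
# Minimal sketch (NOT the official LLM2CLIP code): symmetric contrastive
# (InfoNCE) loss over paired text embeddings, the objective family used to
# fine-tune an LLM in caption space.
import numpy as np

def info_nce_loss(emb_a, emb_b, temperature=0.07):
    """Symmetric contrastive loss over two batches of paired embeddings."""
    # L2-normalize so dot products become cosine similarities.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature      # (N, N) similarity matrix
    labels = np.arange(len(a))          # matching pairs sit on the diagonal

    def xent(l):
        # Numerically stable cross-entropy against the diagonal labels.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average both directions (a -> b and b -> a), as in CLIP.
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
paired = rng.normal(size=(8, 32))
# Perfectly aligned pairs should score a much lower loss than random pairs.
aligned_loss = info_nce_loss(paired, paired)
random_loss = info_nce_loss(paired, rng.normal(size=(8, 32)))
```

Minimizing this loss pulls each caption embedding toward its paired counterpart and pushes it away from the rest of the batch, which is what sharpens the LLM's output embeddings for CLIP.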
MInference: Accelerating Pre-filling for Long-context LLMs via Dynamic Sparse Attention
May 2024
MInference 1.0 leverages the dynamic sparse nature of LLMs’ attention, which exhibits some static patterns, to speed up the pre-filling for long-context LLMs. It first determines offline which sparse pattern each head belongs to, then approximates the sparse index online…
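As a toy stand-in for the head-specific sparse patterns that MInference estimates online, the sketch below keeps only the top-k attention logits per query and masks out the rest. The shapes, `keep` parameter, and top-k pattern are illustrative assumptions, not the library's actual implementation.

```python
# Minimal sketch (NOT the MInference implementation): sparse attention that
# keeps only the top-k logits per query, illustrating how a sparse index
# approximates full attention during pre-filling.
import numpy as np

def dense_attention(q, k, v):
    """Ordinary softmax attention, used here as the reference."""
    s = q @ k.T / np.sqrt(q.shape[1])
    w = np.exp(s - s.max(axis=1, keepdims=True))
    return (w / w.sum(axis=1, keepdims=True)) @ v

def sparse_attention(q, k, v, keep=8):
    """Attend only to each query's `keep` highest-scoring keys."""
    scores = q @ k.T / np.sqrt(q.shape[1])          # (Nq, Nk) logits
    # Per-query threshold: the keep-th largest logit.
    thresh = np.sort(scores, axis=1)[:, -keep][:, None]
    masked = np.where(scores >= thresh, scores, -np.inf)
    w = np.exp(masked - masked.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(16, 64)) for _ in range(3))
out_sparse = sparse_attention(q, k, v, keep=8)
# With keep equal to the number of keys, the mask is a no-op and the
# sparse result matches dense attention exactly.
out_full = sparse_attention(q, k, v, keep=16)
```

In the real system the sparse pattern per head (e.g. vertical-slash or block-sparse) is chosen offline, and only the concrete indices are re-estimated online; the top-k mask here merely shows why a good sparse index can approximate the dense result.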