
Your script to quantize the instructor models simply doesn't work. #85

Open
BBC-Esq opened this issue Sep 17, 2023 · 7 comments
Comments

@BBC-Esq
Contributor

BBC-Esq commented Sep 17, 2023

I've tried every which way to get this to work, but I just can't. I haven't found any working examples on the Internet either.

@AIApprentice101

AIApprentice101 commented Sep 18, 2023

It works just fine for me. For GPU, you need to seek other solutions (https://discuss.pytorch.org/t/does-dynamic-quantization-support-gpu/119231).
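
To illustrate the CPU-only point, here is a minimal self-contained sketch (a toy model, not the instructor code): PyTorch dynamic quantization swaps eligible Linear layers for int8 dynamic variants whose kernels run on the CPU, so the quantized model is meant to stay on the CPU.

import torch
from torch.nn import Linear, ReLU, Sequential
from torch.quantization import quantize_dynamic

# Toy model for illustration only.
model = Sequential(Linear(16, 16), ReLU(), Linear(16, 4))

# Replace the Linear layers with dynamically quantized int8 versions.
qmodel = quantize_dynamic(model, {Linear}, dtype=torch.qint8)
print(qmodel)  # the Linear layers now appear as DynamicQuantizedLinear

# Inference works on CPU tensors; the int8 kernels have no CUDA backend.
out = qmodel(torch.randn(2, 16))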

@BBC-Esq
Contributor Author

BBC-Esq commented Sep 18, 2023

> It works just fine for me. For GPU, you need to seek other solutions (https://discuss.pytorch.org/t/does-dynamic-quantization-support-gpu/119231).

Do you have a sample script or the series of commands you used? I've tried the instructions verbatim.

@AIApprentice101

I just used the code from the quantization section in the readme of this repo.

@BBC-Esq
Contributor Author

BBC-Esq commented Sep 18, 2023

> I just used the code from the quantization section in the readme of this repo.

Thanks, but I still can't get it to work. If I gather the error and log messages, would you be willing to help me a little bit?

@BBC-Esq
Contributor Author

BBC-Esq commented Sep 29, 2023

I figured out how to dynamically quantize the instructor-xl model, but at the point where it creates the embeddings, I want it to use GPU acceleration (CUDA), just like it does when I use the float32 version of the model. Is that possible? If I understand the comments above, it's not? What about quantizing the model beforehand with a method other than "dynamic" quantization? I've been struggling with this for months, so any help would be much appreciated. The link above points to a discussion from 2021, and "seek other solutions" doesn't point me in the right direction. I've also looked at bitsandbytes but couldn't find a solution there either. Here is the portion of the script I'm trying to use:

    if "instructor" in EMBEDDING_MODEL_NAME:
        # Create the instructor embeddings object
        embeddings = HuggingFaceInstructEmbeddings(
            model_name=EMBEDDING_MODEL_NAME,
            model_kwargs={"device": COMPUTE_DEVICE},
            query_instruction="Represent the document for retrieval."
        )
        
        # Quantize the instructor model on the CPU
        embeddings.client = quantize_dynamic(embeddings.client, dtype=torch.qint8)
        
        # Move the quantized model to the GPU
        embeddings.client = embeddings.client.to('cuda')
    elif "bge" in EMBEDDING_MODEL_NAME and "large-en-v1.5" not in EMBEDDING_MODEL_NAME:
        embeddings = HuggingFaceBgeEmbeddings(
            model_name=EMBEDDING_MODEL_NAME,
            model_kwargs={"device": COMPUTE_DEVICE},
            encode_kwargs={'normalize_embeddings': True}
        )
    else:
        embeddings = HuggingFaceEmbeddings(
            model_name=EMBEDDING_MODEL_NAME,
            model_kwargs={"device": COMPUTE_DEVICE},
        )
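
For comparison, here is a minimal sketch of a GPU-side alternative (an assumption, not something confirmed in this thread): skip int8 dynamic quantization entirely and run the underlying INSTRUCTOR model in half precision (fp16) on CUDA through the same wrapper. The import path and the model name are illustrative placeholders, and whether fp16 inference is acceptable for your use case is untested here.

import torch
from langchain.embeddings import HuggingFaceInstructEmbeddings  # assumed import path

# Placeholder values for illustration; substitute your own.
EMBEDDING_MODEL_NAME = "hkunlp/instructor-xl"

embeddings = HuggingFaceInstructEmbeddings(
    model_name=EMBEDDING_MODEL_NAME,
    model_kwargs={"device": "cuda"},
    query_instruction="Represent the document for retrieval."
)

# Instead of int8 dynamic quantization (CPU-only kernels), convert the
# underlying SentenceTransformer/INSTRUCTOR model to fp16 on the GPU.
embeddings.client = embeddings.client.half()

vectors = embeddings.embed_documents(["Parton energy loss in QCD matter"])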

@hongjin-su
Collaborator

Hi, thanks a lot for your interest in INSTRUCTOR!

The following seems to work for me:

import torch
from InstructorEmbedding import INSTRUCTOR
from torch.nn import Embedding, Linear
from torch.quantization import quantize_dynamic
from sklearn.metrics.pairwise import cosine_similarity

model = INSTRUCTOR('hkunlp/instructor-large', device='cpu')
qconfig_dict = {
    Embedding: torch.ao.quantization.qconfig.float_qparams_weight_only_qconfig,
    Linear: torch.ao.quantization.qconfig.default_dynamic_qconfig,
}
qmodel = quantize_dynamic(model, qconfig_dict)

sentences_a = [['Represent the Science sentence: ','Parton energy loss in QCD matter'], 
               ['Represent the Financial statement: ','The Federal Reserve on Wednesday raised its benchmark interest rate.']]
sentences_b = [['Represent the Science sentence: ','The Chiral Phase Transition in Dissipative Dynamics'],
               ['Represent the Financial statement: ','The funds rose less than 0.5 per cent on Friday']]
embeddings_a = qmodel.encode(sentences_a)
embeddings_b = qmodel.encode(sentences_b)
similarities = cosine_similarity(embeddings_a,embeddings_b)

torch.save(qmodel.state_dict(),'state.pt')

Hope this helps!
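
As a follow-up, here is a minimal sketch of how the saved state might be loaded back. This reload pattern is an assumption rather than something spelled out above: it rebuilds the same model on the CPU, re-applies the same qconfig mapping, and only then restores the saved weights.

import torch
from InstructorEmbedding import INSTRUCTOR
from torch.nn import Embedding, Linear
from torch.quantization import quantize_dynamic

# Rebuild the same architecture on the CPU and quantize it with the same
# qconfig mapping before loading the saved weights.
model = INSTRUCTOR('hkunlp/instructor-large', device='cpu')
qconfig_dict = {
    Embedding: torch.ao.quantization.qconfig.float_qparams_weight_only_qconfig,
    Linear: torch.ao.quantization.qconfig.default_dynamic_qconfig,
}
qmodel = quantize_dynamic(model, qconfig_dict)

# Restore the quantized weights saved earlier to 'state.pt'.
qmodel.load_state_dict(torch.load('state.pt'))

embeddings = qmodel.encode([['Represent the Science sentence: ',
                             'Parton energy loss in QCD matter']])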

@yonalasso

With this script I get:

[1]    12026 illegal hardware instruction  python3
/usr/local/Cellar/[email protected]/3.10.12/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

I'm on an M1 Mac with sentence_transformers==2.2.2 (I also had the problem with the token, see #106).
