Hi authors,
I’m using the pretrained MedCLIP model to compute the CLIP score between chest X-rays from the MIMIC-CXR dataset and their associated descriptions, but I’m observing very low scores.
Example:
File path: files/p10/p10000032/s53911762/68b5c4b1-227d0485-9cc38c3f-7b84ab51-4b472714.jpg
Description:
"Single frontal view of the chest provided. There is no focal consolidation, effusion, or pneumothorax. The cardiomediastinal silhouette is normal. Again seen are multiple clips projecting over the left breast and remote left-sided rib fractures. No free air below the right hemidiaphragm is seen. No acute intrathoracic process."
This results in a very low CLIP score with MedCLIP.
However, when using the OpenAI CLIP base model, the score is significantly higher.
I have observed this behavior consistently across multiple cases. Is this expected? If I plan to use CLIP models for downstream tasks, would it be accurate to conclude that the OpenAI model might be more suitable for medical CXR-related applications? How come?
BTW, I'm using the CLIP score implementation from TorchMetrics (see the TorchMetrics CLIP score documentation).
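For context, the TorchMetrics CLIP score is essentially a scaled cosine similarity between the image and text embeddings, clamped at zero. Here is a minimal NumPy sketch of that formula; the embeddings below are random placeholders, not actual MedCLIP or OpenAI CLIP encoder outputs:

```python
import numpy as np

def clip_score(img_emb: np.ndarray, txt_emb: np.ndarray) -> float:
    """CLIP score as defined by TorchMetrics: max(100 * cosine_sim, 0)."""
    img = img_emb / np.linalg.norm(img_emb)
    txt = txt_emb / np.linalg.norm(txt_emb)
    return float(max(100.0 * np.dot(img, txt), 0.0))

# Placeholder embeddings; real ones would come from the model's
# image and text encoders for the X-ray and its report.
rng = np.random.default_rng(0)
img_emb = rng.standard_normal(512)
txt_emb = img_emb + 0.5 * rng.standard_normal(512)  # partially aligned pair
score = clip_score(img_emb, txt_emb)
```

Because the score is just a cosine similarity in the model's own embedding space, its absolute magnitude is not comparable across models: two encoders can rank image-text pairs equally well while producing very different raw similarity values.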