RASO (Recognize Any Surgical Object) is a vision-language model for recognizing and detecting surgical instruments and objects in surgical images and videos.
```bash
# Clone the repository
git clone https://github.com/ntlm1686/raso.git
cd raso

# Install dependencies
pip install -r requirements.txt

# Install the package
pip install -e .
```
Pre-trained model weights are available on Hugging Face: https://huggingface.co/Mumon/raso

Download the weights and place them in the `MODEL/` directory:

- `MODEL/raso_zeroshot.pth`: Zero-shot recognition model
- `MODEL/raso_cholect50_ft.pth`: Model fine-tuned on the CholecT50 dataset
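If you prefer to fetch the weights programmatically, the sketch below uses `huggingface_hub` (assumptions: the package is installed, and the checkpoints are stored under these filenames at the root of the Mumon/raso repository):

```python
# Minimal sketch: download both checkpoints into ./MODEL.
from huggingface_hub import hf_hub_download

for filename in ["raso_zeroshot.pth", "raso_cholect50_ft.pth"]:
    path = hf_hub_download(repo_id="Mumon/raso", filename=filename, local_dir="MODEL")
    print("Downloaded", path)
```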
The example below loads the zero-shot model and runs tag inference on a single image:

```python
import torch
from PIL import Image
from raso.models import raso
from raso import inference, get_transform
# Load model
model = raso(pretrained='./MODEL/raso_zeroshot.pth',
image_size=384,
vit='swin_l')
model.eval()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)
transform = get_transform(image_size=384)
# Load and preprocess image
image_path = "./examples/img_01.png"
image_pil = Image.open(image_path).convert("RGB")  # ensure a 3-channel input
image = transform(image_pil).unsqueeze(0).to(device)
# Run inference: returns the predicted tags and per-tag logits
tags, logits = inference(image, model)
print("Results with default threshold (0.65):", tags)
```
If you use RASO in your research, please cite the following paper:
```bibtex
@misc{li2025recognizesurgicalobjectunleashing,
  title={Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data},
  author={Jiajie Li and Brian R Quaranto and Chenhui Xu and Ishan Mishra and Ruiyang Qin and Dancheng Liu and Peter C W Kim and Jinjun Xiong},
  year={2025},
  eprint={2501.15326},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2501.15326},
}
```
This project is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0) - see the LICENSE file for details.
This project builds upon the Recognize Anything repository. We acknowledge and thank the authors for their foundational work on the Recognize Anything Model (RAM) architecture that made RASO possible.