Introduction to surgical AI
Artificial intelligence for surgical applications is a growing field, bolstered both by the R&D efforts of private companies and by research at academic institutions and hospitals around the world. In recent years, computer vision in this field has advanced considerably, particularly in improving perception for robotic surgery. There is also important work being done to facilitate the work of surgeons through smart predictions, better visualization, and the optimization of surgical operations.
To ensure that AI applications in this high-stakes field are trustworthy, it’s important to pay attention to the way data is annotated and iteratively improved. Most of the available data comes in the form of videos, which bring challenges of their own: annotations must stay consistent across frames and across videos, and automation is needed to make the annotation work cost-effective. Thanks to our extensive experience with video annotation for medical AI, we can support your project, no matter how complex!
Challenges and best practices
Based on our experience annotating video data for medical AI applications, we’d like to share some of the most frequent challenges in this type of work, along with our best practices for dealing with them:
| Challenge | Best Practice |
| --- | --- |
| The data for surgical AI applications frequently comes in the form of videos, some of which can be very long; a colonoscopy video, for example, may run for more than two hours. This makes detailed frame-by-frame annotation time-consuming, and splitting the videos into shorter clips may backfire: if the clips are annotated by different people who lack the full context, their interpretations of the content may differ. | To perform video annotation efficiently without sacrificing quality and consistency, our annotators use a variety of automation techniques that let them interpolate annotations from one frame to the next (see the interpolation sketch after this table). In this way, a single annotator can annotate an entire video in a short period of time, with access to its entire duration in order to make better judgments. Many platforms on the market support interpolation with bounding boxes, while the best ones provide this feature even with polygons and semantic segmentation. In addition, once you have a trained model, you can use it to generate pre-annotations for new datasets, which optimizes the annotators’ manual work: they only check and validate the outputs rather than annotating from scratch! |
| Each person’s anatomy differs, and internal organs can vary greatly in position, size, and orientation. Moreover, many abnormalities, such as polyps or cancerous formations, can be difficult to detect even for human experts, who may disagree on their presence, size, and type. | We are conscious that medical AI annotation involves a lot of subjective judgment, which is why we work with certified medical professionals. Our roster covers more than 20 specialties, including radiologists, surgeons, dentists, and ophthalmologists, who work closely with you to agree on standards for interpreting the data and to make sure the entire annotation team interprets it in the same way, according to industry standards and widely accepted taxonomies. |
| Medical and dental data are not as readily available and accessible as other types of computer vision data. Data from different hospitals is frequently locked away in siloed systems that are not interoperable, and there is an acute lack of diversity in patient demographics and in the geographic locations the data comes from. As a result, datasets for surgical AI applications frequently carry systematic biases. | As a company committed to mitigating harmful biases in AI systems, we dedicate a lot of time to finding workable solutions that make datasets more representative and diverse. The most effective solution is an iterative approach: a trained AI model is tested on new data during deployment and its performance is evaluated continuously. Whenever data drift is detected, the outliers are sent to our humans-in-the-loop, who label the ground truth so that the model’s predictions can be assessed. This ensures the gradual improvement of models and the mitigation of harmful bias. |
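To make the interpolation technique from the first row of the table concrete, here is a minimal sketch of linear interpolation between two keyframe bounding boxes. The `Box` structure and the frame numbers are illustrative and not tied to any particular platform’s format:

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned bounding box: top-left and bottom-right corners."""
    x1: float
    y1: float
    x2: float
    y2: float

def interpolate_box(start: Box, end: Box, start_frame: int,
                    end_frame: int, frame: int) -> Box:
    """Linearly interpolate a box between two annotated keyframes.

    The annotator only draws boxes at start_frame and end_frame;
    every frame in between gets an automatically generated box.
    """
    t = (frame - start_frame) / (end_frame - start_frame)
    return Box(
        x1=start.x1 + t * (end.x1 - start.x1),
        y1=start.y1 + t * (end.y1 - start.y1),
        x2=start.x2 + t * (end.x2 - start.x2),
        y2=start.y2 + t * (end.y2 - start.y2),
    )

# Example: a tool annotated at frames 100 and 130; frame 115 is halfway.
box_115 = interpolate_box(Box(50, 60, 120, 140), Box(80, 90, 150, 170),
                          100, 130, 115)
```

Platforms apply the same idea to polygon vertices and keypoints, and let annotators correct any in-between frame where the linear assumption breaks down.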
This is why it’s important to start with the right training data and to adopt a human-in-the-loop approach: it lets you sift through large quantities of video data and continuously improve your medical AI models by annotating the most valuable clips.
Types of annotation for Surgical AI
Below, we feature several use cases of video annotation for surgical purposes, along with best practices for each one.
Surgical steps tracking
Frames are labeled at the frame or object level based on the surgical action being performed: incision, suction, etc. These labels later support a detailed analysis and breakdown of hours-long surgical operations, the assessment of surgeons’ performance, and insights for optimization.
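As a small illustration of what happens downstream, per-frame step labels can be collapsed into (start, end, step) segments for reporting. A minimal sketch; the step names are hypothetical:

```python
def frames_to_segments(labels):
    """Collapse per-frame step labels into (start_frame, end_frame, step) runs."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the list or on a label change.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i - 1, labels[start]))
            start = i
    return segments

# Hypothetical per-frame labels from a short clip:
labels = ["incision"] * 3 + ["suction"] * 4 + ["suturing"] * 2
print(frames_to_segments(labels))
# [(0, 2, 'incision'), (3, 6, 'suction'), (7, 8, 'suturing')]
```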
Surgical tool tracking
For this use case, the surgical tools are labeled based on their type and segmented with a brush, polygon, or keypoints across frames. The purpose of this annotation is to track their position and actions, frequently for the training of robotic surgery AI systems.
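For a concrete picture of the tracking side, per-frame tool detections are commonly linked into tracks by matching boxes across consecutive frames via Intersection-over-Union (IoU). Below is a minimal greedy sketch, assuming boxes come as (x1, y1, x2, y2) tuples; production trackers use more robust matching:

```python
def iou(a, b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def link(prev_boxes, curr_boxes, threshold=0.3):
    """Greedily match current-frame boxes to previous-frame boxes by IoU."""
    matches, used = [], set()
    for i, p in enumerate(prev_boxes):
        best_j, best_iou = None, threshold
        for j, c in enumerate(curr_boxes):
            if j not in used and iou(p, c) > best_iou:
                best_j, best_iou = j, iou(p, c)
        if best_j is not None:
            matches.append((i, best_j))
            used.add(best_j)
    return matches  # (prev_index, curr_index) pairs continuing a track

prev = [(50, 60, 120, 140)]
curr = [(55, 63, 124, 145), (300, 300, 340, 350)]
print(link(prev, curr))  # [(0, 0)]: the first current box continues track 0
```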
Anatomy labeling
In order to give trustworthy diagnoses at scale, medical professionals may use AI tools which highlight the anatomy of a particular organ. This can be done on 2D, 3D, and video data alike, using bounding boxes, polygons, keypoints, or full semantic segmentation.
Polyp and abnormalities labeling
For the enhanced detection of abnormalities, AI can help practitioners avoid false negatives and screen large amounts of data in very little time. Such systems require internal organs, polyps, cancerous cells, or other formations to be labeled with a precise segmentation mask or polygon.
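To show what such a label looks like in practice, a polygon outlining a polyp can be rasterized into a binary mask for training a segmentation model. A sketch using OpenCV’s fillPoly; the polygon coordinates and frame size are made up:

```python
import numpy as np
import cv2  # pip install opencv-python

# Hypothetical polygon outlining a polyp, in (x, y) pixel coordinates.
polygon = np.array([[120, 80], [160, 70], [190, 110], [170, 150], [125, 130]],
                   dtype=np.int32)

# Rasterize the polygon into a binary mask matching the frame size.
mask = np.zeros((480, 640), dtype=np.uint8)
cv2.fillPoly(mask, [polygon], color=1)

print("polyp pixels:", int(mask.sum()))
```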
Dental surgery
Oral surgery using AI has been gaining a lot of traction in procedures such as dental implants, endodontic surgery, TMJ surgery, and others. As in other types of robotic surgery, the surgical steps, instruments, and anatomy are tracked to enable automated or semi-automated surgical procedures.
Depth labeling
For the purposes of robotic surgery, it is very important to gain a 3D understanding of the environment and the options for navigation from just a 2D image. This can be achieved by labeling the foreground versus the background with a full semantic segmentation mask.
Tools we love
Below are some of our absolute favorite tools for surgical video annotation: a free, open-source tool for simpler projects and a paid one that offers more functionality for sophisticated projects.
CVAT is the go-to tool for video annotation using bounding boxes or frame-level labels. It offers simple but useful functionalities for interpolation, dataset management, user management, and quality control.
For more advanced projects which require video labeling with polygons, keypoints, or semantic segmentation, V7 is the way to go. It offers more robust features for labeling pipeline management and even custom model training.
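As a small example of working with CVAT exports, the sketch below parses interpolated box tracks from the “CVAT for video 1.1” XML format. The attribute names reflect that format as we understand it; verify them against your own export:

```python
import xml.etree.ElementTree as ET

def load_tracks(path):
    """Parse box tracks from a 'CVAT for video' XML export.

    Returns {track_id: [(frame, x1, y1, x2, y2), ...]} with interpolated
    (non-keyframe) boxes included and frames marked 'outside' skipped.
    """
    tracks = {}
    root = ET.parse(path).getroot()
    for track in root.iter("track"):
        boxes = []
        for box in track.iter("box"):
            if box.get("outside") == "1":
                continue  # object not visible in this frame
            boxes.append((
                int(box.get("frame")),
                float(box.get("xtl")), float(box.get("ytl")),
                float(box.get("xbr")), float(box.get("ybr")),
            ))
        tracks[track.get("id")] = sorted(boxes)
    return tracks

# tracks = load_tracks("annotations.xml")
```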
How to use a human-in-the-loop for surgical video AI
Medical AI applications are extremely high-risk and it’s best practice to use a human-in-the-loop in order to ensure their trustworthiness and to win the favor of practitioners who will be using them. Here are some of the ways in which humans can be plugged into the entire MLOps cycle in order to provide human input and verification on a continuous basis:
- Ground truth annotation: in order to train your initial models, we offer full dataset annotation from scratch by qualified medical professionals. They use a variety of video labeling tools with interpolation features in order to reduce annotation time and make the process more cost-effective.
- Output validation with active learning: once you’ve trained an initial model, we can use it to pre-annotate a large part of the dataset, which both speeds up the annotators and increases the impact of their work; by setting up an active learning workflow, we prioritize the frames or videos where your model is least certain (see the uncertainty-sampling sketch after this list).
- Real-time edge case handling: once you have a model in deployment, our humans-in-the-loop are available 24/7 to monitor live video streams coming from your systems or to provide alert handling whenever a specific alert is triggered, in order to avoid alert fatigue for the end user. In this way, we provide a second, human layer of verification for your model’s most critical responses.
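To illustrate the active learning step above, frames can be ranked by the entropy of the model’s predicted class probabilities, with the most uncertain ones routed to annotators first. A minimal sketch, assuming you already have per-frame softmax outputs as a NumPy array (the class counts and numbers below are made up):

```python
import numpy as np

def entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy per row; higher means the model is less certain."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_for_review(probs: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k most uncertain frames, to be sent to annotators."""
    return np.argsort(entropy(probs))[::-1][:k]

# Hypothetical softmax outputs for 4 frames over 3 surgical step classes.
probs = np.array([
    [0.98, 0.01, 0.01],  # confident
    [0.40, 0.35, 0.25],  # uncertain -> good candidate for human review
    [0.90, 0.05, 0.05],
    [0.34, 0.33, 0.33],  # most uncertain
])
print(select_for_review(probs, k=2))  # [3 1]
```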
The right annotation team for your medical AI project
Get in touch with us and our team will assist you in finding the best solution.