Implementation Details
This document provides an in-depth explanation of how the point-based and box-based annotation features are implemented in the SAM Annotator tool.
Table of Contents
- Overview
- Components
- Box-Based Annotation
- Point-Based Annotation
- Annotation Data Structure
- Mask Processing
- Saving and Loading Annotations
- Visualization
- Command Pattern Implementation
Overview
The SAM Annotator is built around the Segment Anything Model (SAM), developed by Meta AI. SAM is designed to generate segmentation masks from various prompts including points and bounding boxes. Our application provides an interface to interact with SAM for efficient image annotation.
Two primary annotation methods are implemented:
1. Box-based annotation: Drawing a bounding box around an object to generate a segmentation mask
2. Point-based annotation: Placing foreground and background points to guide the segmentation
Components
The annotation system is composed of several interacting components:
- SAMAnnotator: Main orchestrator class that coordinates the entire annotation workflow
- EventHandler: Manages user interactions with the interface
- WindowManager: Handles window operations and visualization
- Predictor: Interfaces with the SAM model to generate masks
- FileManager: Manages loading and saving of annotations
- CommandManager: Implements undo/redo functionality for annotation operations
Box-Based Annotation
Implementation Flow
- User Interaction:
  - The user draws a box by clicking and dragging
  - EventHandler.handle_mouse_event() captures the mouse events
  - Box coordinates are stored in the box_start and box_end variables
- Mask Prediction:
  - Upon mouse release, _handle_mask_prediction() is called
  - The method scales the box coordinates from display size to original image size
  - A center point is calculated from the box for additional reference
  - The predictor is called with both the box and center point:
    masks, scores, _ = self.predictor.predict(
        point_coords=input_points,
        point_labels=input_labels,
        box=input_box,
        multimask_output=True
    )
- Processing Results:
  - The best mask is selected based on confidence scores
  - The mask is resized to match display dimensions
  - The mask is set in WindowManager: self.window_manager.set_mask(display_mask)
  - The interface is updated to show the predicted mask
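The coordinate scaling and center-point steps above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name scale_box_to_original and its (width, height) size arguments are assumptions, and the only facts taken from SAM itself are that the box prompt is laid out as [x1, y1, x2, y2] and that label 1 marks a foreground point.

```python
import numpy as np


def scale_box_to_original(box_start, box_end, display_size, original_size):
    """Map a dragged box from display pixels to original-image pixels.

    box_start, box_end: (x, y) corners in display coordinates.
    display_size, original_size: (width, height) tuples.
    Returns (input_box, input_points, input_labels) as NumPy arrays.
    Hypothetical helper; names are illustrative only.
    """
    scale_x = original_size[0] / display_size[0]
    scale_y = original_size[1] / display_size[1]

    # Sort the corners so the box is valid whichever way the user dragged.
    x1, x2 = sorted((box_start[0], box_end[0]))
    y1, y2 = sorted((box_start[1], box_end[1]))

    input_box = np.array(
        [x1 * scale_x, y1 * scale_y, x2 * scale_x, y2 * scale_y],
        dtype=np.float32,
    )

    # The box center is passed as one extra foreground point (label 1).
    center = [(input_box[0] + input_box[2]) / 2, (input_box[1] + input_box[3]) / 2]
    input_points = np.array([center], dtype=np.float32)
    input_labels = np.array([1], dtype=np.int32)
    return input_box, input_points, input_labels


# Box dragged from bottom-right to top-left on an 800x600 view of a 1600x1200 image.
input_box, input_points, input_labels = scale_box_to_original(
    (300, 200), (100, 50), (800, 600), (1600, 1200)
)
print(input_box, input_points, input_labels)
```

Sorting the corners first means a drag in any direction still yields a box with x1 <= x2 and y1 <= y2.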
Key Functions:
- _handle_mask_prediction(): Processes the box input and generates a mask
- EventHandler.handle_mouse_event(): Captures mouse interactions for drawing the box
- EventHandler.reset_state(): Clears the current selection state
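The "Processing Results" step (pick the highest-scoring mask, then resize it for display) might look roughly like the sketch below. The helper names are hypothetical, and the nearest-neighbour resize is written in plain NumPy as a stand-in; the tool may well use an OpenCV resize instead.

```python
import numpy as np


def select_best_mask(masks, scores):
    """Return (mask, score) for the highest-scoring candidate.

    masks: (N, H, W) boolean array; scores: (N,) array of confidences.
    """
    best_idx = int(np.argmax(scores))
    return masks[best_idx], float(scores[best_idx])


def resize_mask_nearest(mask, display_size):
    """Nearest-neighbour resize of a 2-D mask to display_size = (width, height)."""
    out_w, out_h = display_size
    in_h, in_w = mask.shape
    # For each output row/column, pick the source row/column it maps back to.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return mask[rows[:, None], cols[None, :]]


# Three candidate 8x8 masks; the second one has the highest score.
masks = np.zeros((3, 8, 8), dtype=bool)
masks[1, :4, :4] = True
scores = np.array([0.2, 0.9, 0.5])

best_mask, best_score = select_best_mask(masks, scores)
display_mask = resize_mask_nearest(best_mask, (4, 4))
print(best_score, display_mask.shape)
```

Nearest-neighbour sampling keeps the mask strictly boolean, which interpolating resizes would not.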
Point-Based Annotation
Implementation Flow
- User Interaction:
  - The mode is switched to 'point' (using the 'w' key)
  - The user clicks to place foreground points (left click) or background points (right click)
  - EventHandler.handle_mouse_event() captures these points and their labels
  - Points are stored in the points list and labels in the point_labels list
- Mask Prediction:
  - After points are placed, pressing 'space' triggers _handle_point_prediction()
  - The method scales the point coordinates from display size to original image size
  - The predictor is called with the points and their labels:
    masks, scores, _ = self.predictor.predict(
        point_coords=input_points,
        point_labels=input_labels,
        multimask_output=True
    )
- Processing Results:
  - The best mask is selected based on confidence scores
  - The mask is resized to match display dimensions
  - The mask is set in WindowManager: self.window_manager.set_mask(display_mask)
  - The interface is updated to show the predicted mask with the input points
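As a rough sketch of the point-scaling step, the clicked points and their labels could be turned into predictor inputs like this. The function name prepare_point_prompts and the (width, height) size tuples are assumptions made for illustration; the label convention (1 for foreground, 0 for background) is the one SAM uses.

```python
import numpy as np


def prepare_point_prompts(points, point_labels, display_size, original_size):
    """Scale clicked points to original-image pixels and pair them with labels.

    points: list of (x, y) in display coordinates.
    point_labels: list of 1 (foreground) or 0 (background), one per point.
    Hypothetical helper; not taken from the tool's source.
    """
    if len(points) != len(point_labels):
        raise ValueError("each point needs exactly one label")

    scale = np.array(
        [original_size[0] / display_size[0], original_size[1] / display_size[1]],
        dtype=np.float32,
    )
    input_points = np.asarray(points, dtype=np.float32).reshape(-1, 2) * scale
    input_labels = np.asarray(point_labels, dtype=np.int32)
    return input_points, input_labels


# One foreground click and one background click on an 800x600 view.
input_points, input_labels = prepare_point_prompts(
    [(100, 50), (400, 300)], [1, 0], (800, 600), (1600, 1200)
)
print(input_points, input_labels)
```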
Key Functions:
- _handle_point_prediction(): Processes the point inputs and generates a mask
- EventHandler.handle_mouse_event(): Captures mouse interactions for placing points
- VisualizationManager.draw_input_points(): Draws the points with appropriate colors (green for foreground, red for background)
Annotation Data Structure
When an annotation is added using the 'a' key, it is converted to a structured format:
annotation = {
    'id': len(self.annotations),
    'class_id': self.current_class_id,
    'class_name': self.class_names[self.current_class_id],
    'box': original_box,               # Box in original image coordinates
    'display_box': display_box,        # Box in display coordinates
    'contour_points': contour_points,  # OpenCV contour format
    'contour': contour_list,           # Flattened points for visualization
    'mask': clean_mask,                # Boolean mask
    'area': cv2.contourArea(display_contour),
    'metadata': {
        'annotation_mode': self.event_handler.mode,
        'timestamp': time.time()
    }
}
Mask Processing
After a mask is predicted, _add_annotation() handles the following steps:
- Contour Extraction:
  - The boolean mask is converted to uint8
  - Contours are extracted using cv2.findContours()
  - The largest contour is selected
- Bounding Box Calculation:
  - A bounding box is calculated from the contour using cv2.boundingRect()
  - The box is scaled for both display and original image dimensions
- Mask Cleaning:
  - A clean boolean mask is created
  - The contour is processed into two formats:
    - contour_points: Original cv2 contour format
    - contour: Flattened list for visualization
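To make the two contour formats and the box scaling concrete, here is a small NumPy-only sketch. It assumes the contour arrives with shape (N, 1, 2), as cv2.findContours() returns it, and that boxes are stored as (x, y, w, h) in the same layout cv2.boundingRect() uses; the source does not state the box layout, so treat that and the helper name process_contour as assumptions.

```python
import numpy as np


def process_contour(contour_points, display_size, original_size):
    """Derive a flattened point list and scaled boxes from one contour.

    contour_points: (N, 1, 2) integer array in display coordinates.
    display_size, original_size: (width, height) tuples.
    Returns (contour_list, display_box, original_box).
    Hypothetical helper; min/max stands in for cv2.boundingRect().
    """
    pts = np.asarray(contour_points).reshape(-1, 2)
    contour_list = pts.tolist()  # [[x, y], ...] for visualization

    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    # (x, y, w, h) with inclusive pixel extents.
    display_box = [
        int(x_min),
        int(y_min),
        int(x_max - x_min + 1),
        int(y_max - y_min + 1),
    ]

    sx = original_size[0] / display_size[0]
    sy = original_size[1] / display_size[1]
    original_box = [
        int(round(display_box[0] * sx)),
        int(round(display_box[1] * sy)),
        int(round(display_box[2] * sx)),
        int(round(display_box[3] * sy)),
    ]
    return contour_list, display_box, original_box


contour_points = np.array([[[10, 20]], [[50, 20]], [[50, 80]], [[10, 80]]])
contour_list, display_box, original_box = process_contour(
    contour_points, (800, 600), (1600, 1200)
)
print(contour_list, display_box, original_box)
```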
Saving and Loading Annotations
For a user-friendly guide on how annotations are stored and loaded, see the Loading and Saving Annotations documentation.
Saving Process
The _save_annotations() method handles saving annotations to disk:
- Annotations are validated to ensure they have required fields
- Original image dimensions are obtained
- The FileManager's save_annotations() method is called with:
  - The annotations list
  - Image name
  - Original and display dimensions
  - Class names
The FileManager then:
1. Scales contour points back to original image space
2. Writes normalized coordinates to a text file
3. Creates visualization images of the masks
4. Saves metadata about the annotations
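Steps 1 and 2 above can be illustrated with the sketch below. The source only says that normalized coordinates are written to a text file; the specific line layout used here ("class_id x1 y1 x2 y2 ...", with six decimals) is an assumption, as is the function name format_annotation_line.

```python
def format_annotation_line(class_id, contour, display_size, original_size):
    """Build one text line of normalized contour coordinates.

    contour: list of [x, y] points in display coordinates.
    display_size, original_size: (width, height) tuples.
    The output layout is an assumed format, not the tool's documented one.
    """
    disp_w, disp_h = display_size
    orig_w, orig_h = original_size

    values = []
    for x, y in contour:
        # Step 1: scale from display space back to original image space.
        orig_x = x * orig_w / disp_w
        orig_y = y * orig_h / disp_h
        # Step 2: normalize to the 0..1 range by the original dimensions.
        values.append(f"{orig_x / orig_w:.6f}")
        values.append(f"{orig_y / orig_h:.6f}")
    return " ".join([str(class_id)] + values)


line = format_annotation_line(
    2, [[80, 60], [400, 60], [400, 300]], (800, 600), (1600, 1200)
)
print(line)
```

Normalized coordinates make the saved file independent of the display size used during annotation.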
Loading Process
When loading an image with existing annotations via _load_image():
- The image is loaded and processed to display dimensions
- The FileManager's load_annotations() method is called to fetch existing annotations
- Annotations are scaled to match the display dimensions
- The interface is updated to show the annotations
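The scaling step on load is the reverse of the save path. The sketch below assumes a stored line of the form "class_id x1 y1 x2 y2 ..." with coordinates normalized to 0..1; that layout and the helper name parse_annotation_line are illustrative assumptions, not the tool's documented format.

```python
def parse_annotation_line(line, display_size):
    """Parse one stored line and scale its points to display pixels.

    line: "class_id x1 y1 x2 y2 ..." with normalized coordinates (assumed layout).
    display_size: (width, height) of the displayed image.
    Returns (class_id, contour) where contour is a list of [x, y] ints.
    """
    parts = line.split()
    class_id = int(parts[0])
    coords = [float(v) for v in parts[1:]]
    if len(coords) % 2 != 0:
        raise ValueError("expected an even number of coordinates")

    disp_w, disp_h = display_size
    contour = [
        [int(round(coords[i] * disp_w)), int(round(coords[i + 1] * disp_h))]
        for i in range(0, len(coords), 2)
    ]
    return class_id, contour


class_id, contour = parse_annotation_line(
    "2 0.100000 0.100000 0.500000 0.100000 0.500000 0.500000", (800, 600)
)
print(class_id, contour)
```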
Visualization
The VisualizationManager handles all rendering of annotations:
- create_composite_view(): Main method that creates a visualization with:
  - Original image as background
  - Colored mask overlays with adjustable opacity
  - Bounding boxes
  - Class labels
  - Interactive points (when in point mode)
- Drawing Functions:
  - _draw_mask(): Renders a mask with the class color and proper opacity
  - _draw_box(): Draws a bounding box with the class color
  - _draw_label(): Adds a class label with a semi-transparent background
  - draw_input_points(): Visualizes input points with numbers and colors indicating foreground/background
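A colored mask overlay with adjustable opacity is essentially an alpha blend restricted to the masked pixels. The following NumPy sketch shows the idea; overlay_mask is a hypothetical stand-in and is not meant to reproduce _draw_mask() itself.

```python
import numpy as np


def overlay_mask(image, mask, color, opacity=0.5):
    """Blend a class color into the masked pixels of an image.

    image: (H, W, 3) uint8 array; mask: (H, W) boolean array;
    color: 3-tuple in the same channel order as the image.
    Returns a new uint8 image; the input is left unchanged.
    """
    out = image.astype(np.float32)
    color_arr = np.array(color, dtype=np.float32)
    # Only masked pixels are blended; everything else keeps its original value.
    out[mask] = (1.0 - opacity) * out[mask] + opacity * color_arr
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


image = np.full((2, 2, 3), 100, dtype=np.uint8)
mask = np.array([[True, False], [False, False]])
blended = overlay_mask(image, mask, (0, 200, 0), opacity=0.5)
print(blended[0, 0], blended[1, 1])
```

An opacity of 0 leaves the image untouched and an opacity of 1 paints the mask in solid class color.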
Command Pattern Implementation
Annotation operations use a command pattern for undo/redo functionality:
- Add Annotation: AddAnnotationCommand adds a new annotation to the list
- Delete Annotation: DeleteAnnotationCommand removes an annotation
- Modify Annotation: ModifyAnnotationCommand changes properties of an annotation
Each command handles both the execution and its reverse operation, allowing for robust undo/redo capabilities.
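A minimal sketch of how such commands and a manager with undo/redo stacks might fit together is shown below. The class names come from this document, but the constructor arguments and the execute()/undo()/redo() method names are assumptions chosen for illustration.

```python
class AddAnnotationCommand:
    """Appends an annotation; undo removes it from the position it was added at."""

    def __init__(self, annotations, annotation):
        self.annotations = annotations
        self.annotation = annotation
        self.index = None

    def execute(self):
        self.index = len(self.annotations)
        self.annotations.append(self.annotation)

    def undo(self):
        del self.annotations[self.index]


class DeleteAnnotationCommand:
    """Removes the annotation at an index; undo re-inserts it in place."""

    def __init__(self, annotations, index):
        self.annotations = annotations
        self.index = index
        self.removed = None

    def execute(self):
        self.removed = self.annotations.pop(self.index)

    def undo(self):
        self.annotations.insert(self.index, self.removed)


class CommandManager:
    """Runs commands and keeps undo/redo history (hypothetical interface)."""

    def __init__(self):
        self.undo_stack = []
        self.redo_stack = []

    def execute(self, command):
        command.execute()
        self.undo_stack.append(command)
        self.redo_stack.clear()  # a new action invalidates the redo history

    def undo(self):
        if not self.undo_stack:
            return False
        command = self.undo_stack.pop()
        command.undo()
        self.redo_stack.append(command)
        return True

    def redo(self):
        if not self.redo_stack:
            return False
        command = self.redo_stack.pop()
        command.execute()
        self.undo_stack.append(command)
        return True


annotations = []
manager = CommandManager()
manager.execute(AddAnnotationCommand(annotations, {"id": 0}))
manager.execute(AddAnnotationCommand(annotations, {"id": 1}))
manager.execute(DeleteAnnotationCommand(annotations, 0))
manager.undo()  # restores the deleted annotation at index 0
print([a["id"] for a in annotations])
```

Storing the index instead of searching the list by value avoids comparing annotation dictionaries, which can be awkward when they hold NumPy masks.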