Shipwake-YOLO: Ship Wake Detection and Instance Segmentation for Visual Navigational-State Cue Extraction

DOI: 10.3390/jmse14131216 ISSN: 2077-1312

Shaoxi Li, Xingchen Ji, Chuankao Yang, Ruolan Zhang

Visual perception is an important component of close-range maritime situational awareness, particularly when conventional sources such as AIS and radar are delayed, incomplete, or unavailable. Ship wakes provide motion-related visual cues, but their segmentation remains difficult because wake regions are elongated, weakly textured, and frequently mixed with water-surface clutter. This study develops Shipwake-YOLO, a wake-oriented adaptation of YOLOv9-Seg for ship and wake instance segmentation in inland-waterway images. The task is formulated as visual navigational-state cue extraction rather than validated future manoeuvre prediction. The model segments hull and wake instances and provides mask-derived spatial cues for possible downstream state interpretation. The architecture introduces iAFF into cross-scale feature fusion, adapts the high-level SPPELAN aggregation block with SAConv-enhanced convolution, replaces selected downsampling paths with iSACADown, and adopts MPD-IoU as the bounding-box regression loss. On a 2100-image dataset collected from the Wuhu Channel of the Yangtze River, Shipwake-YOLO improves Box-mAP@50 from 77.7% to 84.6% and Mask-mAP@50 from 67.6% to 79.8% relative to the YOLOv9-Seg baseline. Under stricter IoU thresholds, the model reaches 48.9 in Box-mAP@[0.50:0.95] and 46.2 in Mask-mAP@[0.50:0.95]. The parameter count is reduced by 7.5%, and GFLOPs decrease from 144.2 to 137.1. These results indicate that the proposed adaptation improves ship-wake perception within the collected inland-waterway setting and provides a visual basis for downstream navigational-state estimation.
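The abstract names MPD-IoU as the bounding-box regression loss but does not show how it is computed. As a minimal sketch, assuming the commonly cited minimum-point-distance definition (IoU minus the squared distances between the matching top-left and bottom-right corners, each normalised by the squared diagonal of the input image), the loss for one pair of boxes could look like the following. The function names `mpd_iou` and `mpd_iou_loss` and the `(x1, y1, x2, y2)` box layout are illustrative assumptions, not the authors' implementation.

```python
def mpd_iou(box_a, box_b, img_w, img_h):
    """MPD-IoU between two boxes given as (x1, y1, x2, y2) corner coordinates.

    Assumed definition: IoU - d1^2 / (w^2 + h^2) - d2^2 / (w^2 + h^2),
    where d1 and d2 are the distances between the top-left and
    bottom-right corners of the two boxes, and w, h are the image size.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection rectangle, clamped to zero when the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0

    # Squared corner distances: top-left to top-left, bottom-right to bottom-right.
    d1_sq = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2
    d2_sq = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2

    # Normalise by the squared image diagonal so the penalty is scale-free.
    norm = img_w ** 2 + img_h ** 2
    return iou - d1_sq / norm - d2_sq / norm


def mpd_iou_loss(box_a, box_b, img_w, img_h):
    """Loss form: zero for a perfect match, larger as the boxes diverge."""
    return 1.0 - mpd_iou(box_a, box_b, img_w, img_h)


if __name__ == "__main__":
    pred = (10.0, 10.0, 50.0, 40.0)
    target = (12.0, 8.0, 52.0, 42.0)
    print(mpd_iou_loss(pred, target, img_w=640, img_h=640))
```

Unlike plain IoU, the corner-distance terms stay informative when the predicted and target boxes do not overlap at all, which is one plausible reason such a loss suits elongated wake regions where early predictions may miss the target entirely.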
