Tomato is an economically important horticultural crop cultivated worldwide and is rich in nutrients such as lycopene and vitamin C. It grows in both open-field systems and controlled environments such as greenhouses. However, tomatoes mature asynchronously, so fruits at various ripeness stages often coexist on a single plant, and harvested tomatoes have a relatively short postharvest shelf life. Precise and timely ripeness assessment is therefore crucial for minimizing postharvest losses and maintaining product quality throughout the supply chain.
Statistics indicate that average postharvest fruit losses in China have reached 20%, causing annual economic losses exceeding 100 billion CNY, whereas developed countries typically record losses below 5%, with some achieving as low as 1%–2%. A primary cause of this disparity is the asynchronous maturation of fruits, which frequently leads growers to misjudge optimal harvest windows and pick either prematurely or too late. In greenhouse tomato production, labor costs alone account for more than 44.5% of net profit, yet tomato harvesting and sorting still rely predominantly on manual labor, which suffers from low efficiency, high labor intensity, and escalating costs.
Meanwhile, tomato ripeness detection in natural environments faces numerous challenges: fluctuating illumination significantly impairs recognition accuracy, and dense foliage occlusion, together with complex backgrounds of soil and weeds, often obscures fruit features and leads to erroneous ripeness assessments. How, then, can accurate tomato ripeness detection be achieved under such complex conditions while keeping the model lightweight enough for practical deployment?
To address this issue, Professor Zhijie Fang and Associate Researcher Zijun Sun from the School of Electronics Engineering, Guangxi University of Science and Technology, proposed a lightweight tomato ripeness detection model, YOLOv11-MHS. Building on YOLOv11n, the model incorporates three key improvements: (1) a C3k2_MSCB module that integrates a multiscale convolutional block (MSCB) to extract and fuse features across different scales simultaneously, enhancing detection accuracy; (2) a redesigned neck based on a high-level feature screening-fusion pyramid (HS-FPN) structure, which fuses key features to improve robustness in cluttered environments while also reducing model size; and (3) a spatial and channel synergistic attention (SCSA) mechanism introduced into the C2PSA module to strengthen the model's ability to handle complex scenes.
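The article does not include the authors' implementation, but the multiscale idea behind the C3k2_MSCB module can be sketched in PyTorch. The following is a minimal illustration under stated assumptions: parallel depthwise convolution branches with different kernel sizes (3/5/7) whose outputs are concatenated and fused by a 1x1 convolution with a residual connection. The class name MSCBSketch and all layer choices are hypothetical, not the paper's design.

```python
# Minimal sketch of a multiscale convolutional block (MSCB) in the spirit
# of the C3k2_MSCB module described above. NOT the authors' code: the
# branch kernel sizes, depthwise convolutions, and 1x1 fusion layer are
# illustrative assumptions.
import torch
import torch.nn as nn

class MSCBSketch(nn.Module):
    """Extract features at several receptive fields in parallel, then fuse."""

    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # One depthwise conv per scale keeps the block lightweight.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, k, padding=k // 2,
                          groups=channels, bias=False),
                nn.BatchNorm2d(channels),
                nn.SiLU(),
            )
            for k in kernel_sizes
        )
        # Pointwise conv fuses the concatenated multiscale features
        # back down to the input channel count.
        self.fuse = nn.Conv2d(channels * len(kernel_sizes), channels, 1,
                              bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multiscale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.fuse(multiscale) + x  # residual connection

if __name__ == "__main__":
    block = MSCBSketch(channels=64)
    feats = torch.randn(1, 64, 80, 80)   # a typical YOLO feature-map size
    print(block(feats).shape)            # torch.Size([1, 64, 80, 80])
```

Depthwise branches keep the added computation small, which is consistent with the paper's lightweight goal, though the actual module may fuse scales differently.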
Experimental results show that, compared with the baseline YOLOv11n, YOLOv11-MHS improves mAP@0.5 by 1.7% and mAP@0.5:0.95 by 2.9% while reducing the parameter count by 35.2% and the model size by 32.7%. In comparisons with mainstream models such as Faster R-CNN, YOLOv7, and YOLOv8n, YOLOv11-MHS leads in precision, recall, and mean average precision (mAP), with a clear lightweight advantage: its GFLOPs, parameter count, and memory footprint are all lower than those of the comparison models.
In complex-scene tests, only YOLOv11-MHS located the leftmost unripe tomato under backlighting; in shadowy environments, every other model missed several fruits; and under foliage occlusion, YOLOv11-MHS, together with Faster R-CNN and an improved YOLOv5s, achieved zero missed detections, demonstrating superior robustness.
This study provides technical support for tomato ripeness detection, and the model's lightweight design facilitates deployment on resource-constrained devices. In the future, the model could be integrated into autonomous harvesting robots or orchard monitoring systems, advancing precision agriculture, reducing labor costs, and improving the efficiency and intelligence of tomato production.
Source: Newswise