A Universal Railway Obstacle Detection System based on Semi-supervised Segmentation And Optical Flow (2024)

Qiushi Guo
CSRD
guoqiushi@csrd.cn

Abstract

Detecting obstacles in railway scenarios is both crucial and challenging due to the wide range of obstacle categories and varying ambient conditions such as weather and light. Given the impossibility of encompassing all obstacle categories during the training stage, we address this out-of-distribution (OOD) issue with a semi-supervised segmentation approach guided by optical flow cues. We reformulate the task as a binary segmentation problem instead of the traditional object detection approach. To mitigate data shortages, we generate highly realistic synthetic images using Segment Anything (SAM) and YOLO, producing abundant pixel-level annotations without manual effort. Additionally, we leverage optical flow as prior knowledge to train the model effectively. Several experiments demonstrate the feasibility and effectiveness of our approach.

1 Introduction

With the rapid advancement of high-speed trains, ensuring the security of railway systems has emerged as a critical public concern. One of the primary challenges is obstacle detection, which plays a crucial role in railway safety. Developing a reliable and scalable obstacle detection system can empower train operators and dispatchers to take preemptive actions and mitigate potential accidents.

Deep learning techniques have been widely adopted across various security domains, including mobile payments [3], disaster detection [9], and fraud detection [6]. This technology exhibits substantial promise in enhancing railway safety through sophisticated obstacle detection capabilities. Significant efforts have recently been devoted to addressing obstacle detection using deep learning methods. Although these approaches have achieved some success, they also exhibit notable disadvantages:

[Figure 1: Overview of the pseudo-image generation process (see Sec. 3.1).]
  • Fragility to complex ambient conditions

  • Requirement for extensive manual annotations

  • Difficulty in extending to different scenarios

Designing an extendable, annotation-free model with strong generalization ability remains a significant challenge in both industry and academia.

To address the aforementioned issues, we propose a semi-supervised approach guided by optical flow. To mitigate the data shortage problem, we employ SAM [7] and YOLO [8] to generate highly realistic pseudo-images for training. Instead of manually collecting and annotating images pixel by pixel, we prepare two image sets: base images (fewer than 100 background images with only railway areas annotated) and object images. The object images include categories such as pedestrians, animals, and textures. Using SAM and YOLO, we obtain masks for the intended objects in these images, and the objects are then pasted onto the base images according to the masks. The entire process, illustrated in Fig. 1, generates image and mask pairs without manual effort.

To address the challenges posed by varying weather conditions, we implement two complementary strategies. First, we compile a dataset of base images captured under diverse weather conditions, including rainy, foggy, and clear (sunny) environments. Second, we utilize optical flow to provide positional information as prior knowledge. For optical flow predictions, we generate pseudo sequences of obstacles: an initial pseudo frame with the object at $P_i(x, y)$, followed by a new frame with the same object superimposed at $P_{i+1}(x+\delta, y+\delta)$. Experimental results indicate that our approach yields satisfactory performance across different weather scenarios.

Our contributions are summarized as follows:

  • We reformulate the obstacle detection task as a binary segmentation problem, distinguishing between railway areas and non-railway areas.

  • We introduce a simple yet effective data generation mechanism to synthesize realistic images using SAM and YOLO.

  • Optical flow is leveraged to generate prior knowledge that guides the segmentation network.

[Figure 2: Pipeline of the proposed approach.]

2 Related Work

2.1 Obstacle Detection in Railways

Matthias Brucker et al. [2] propose a shallow network to learn railway segmentation from normal railway images, exploring the controlled inclusion of global information by learning to hallucinate obstacle-free images. Qiang Zhang et al. [12] combine a segmentation model with LiDAR in their obstacle detection system. Amine Boussik et al. [1] propose an unsupervised model based on a large set of generated convolutional auto-encoder models to detect obstacles at the railway's track level.

2.2 Segmentation with Optical flow

Laura Sevilla-Lara et al. [10] demonstrate the effectiveness of jointly optimizing optical flow and video segmentation using an iterative scheme. Volodymyr Fedynyak et al. [5] present an architecture for video object segmentation that combines memory-based matching with motion-guided propagation, resulting in stable long-term modeling and strong temporal consistency.

3 Method

The pipeline of our approach is illustrated in Fig. 2. Given a set of base images $B$ and target images $T$, our objective is to identify potential obstacles within specific regions $\eta$. Unlike traditional detection methods that categorically detect each obstacle, we reformulate the problem as a binary segmentation task. Instead of attempting to detect all potential obstacles, which is impractical, our emphasis is on segmenting the railway area, a region that remains consistent over time, unlike obstacles.

To simulate these scenarios effectively, we generate highly realistic pseudo-images using a copy-paste approach. Additionally, to address challenges posed by extreme weather conditions, which can obscure object segmentation, we introduce optical flow to provide prior information that guides the segmentation model. Pseudo images $I_t$ and $I_{t+\delta}$ are generated by applying a small shift $\delta$ to the target object, simulating its movement. The output of the optical flow model is incorporated along with the pseudo images as input to facilitate accurate predictions. This section delves into the detailed methodology employed throughout this process.

3.1 Data Acquisition

Base images used in our experiments are gathered at our facility in Chengdu, which features a railway spanning over 60 meters and includes simulators for fog and rain conditions. To ensure diversity in our dataset, we capture images under different weather scenarios, specifically rainy, foggy, and sunny conditions (Fig. 3). Because the camera position is fixed, only one mask is required for annotation purposes. Importantly, the railway areas in the base images are devoid of any potential obstacles; all obstacles are generated using a copy-paste method.

[Figure 3: Base images captured under different weather conditions.]

Object images are drawn from three sources: PennFudanPed, a subset of Objects365 [11], and DTD [4]. To facilitate fully automated application of our methodology, we proceed under the assumption that no masks are initially available. We focus on selecting categories likely to occur in our scenario, such as animals (e.g., deer, horse, cow) and vehicles (e.g., truck, cart), ensuring our approach handles relevant objects effectively.

[Figure 4: Examples of the generated samples.]

The entire process can be delineated into sequential steps: initially, object images are fed into the YOLO model, which returns a list of bounding boxes identifying detected targets. These bounding boxes serve as prompts for SAM, which generates segmentation masks outlining the object pixels. Finally, the segmented object pixels are pasted into the base images under the guidance of the masks. We elaborate on each step below; a code sketch follows the list.

  • Object Detection with YOLO: Object images are inputted into the YOLO model, specifically trained on Obj365, to detect objects belonging to predefined target categories fitting our scenario.

  • Segmentation with SAM: Bounding boxes from YOLO are used as prompts for SAM to generate segmentation masks. These masks delineate object pixels, facilitating their extraction from the object images.

  • Integration with Base Images: Extracted object pixels are seamlessly integrated into the corresponding regions of base images, aligning with the guidance provided by the segmentation masks.
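
For concreteness, here is a minimal sketch of these three steps, assuming the ultralytics YOLO package and the official segment-anything predictor API; the checkpoint names (yolo_obj365.pt, sam_vit_b.pth) and the paste position are illustrative placeholders, not the paper's released code.

```python
# Sketch of the YOLO -> SAM -> copy-paste pipeline (Sec. 3.1).
# Checkpoint names below are assumptions, not the paper's artifacts.
import numpy as np
from ultralytics import YOLO
from segment_anything import SamPredictor, sam_model_registry

detector = YOLO("yolo_obj365.pt")                       # assumed Obj365-trained weights
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

def extract_objects(object_img, wanted):
    """Steps 1-2: detect target categories, then prompt SAM with each box."""
    result = detector(object_img)[0]
    predictor.set_image(object_img)                     # RGB uint8, HxWx3
    masks = []
    for box, cls in zip(result.boxes.xyxy.cpu().numpy(),
                        result.boxes.cls.cpu().numpy()):
        if result.names[int(cls)] not in wanted:        # e.g. {"deer", "horse", "cow"}
            continue
        m, _, _ = predictor.predict(box=box, multimask_output=False)
        masks.append(m[0])                              # boolean HxW object mask
    return masks

def paste(base_img, obj_img, mask, top, left):
    """Step 3: copy masked pixels onto the base image; the shifted mask
    doubles as the pixel-level annotation we get for free."""
    ys, xs = np.where(mask)
    crop_img = obj_img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    crop_msk = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = crop_msk.shape
    out = base_img.copy()
    label = np.zeros(base_img.shape[:2], dtype=bool)
    out[top:top + h, left:left + w][crop_msk] = crop_img[crop_msk]
    label[top:top + h, left:left + w] = crop_msk
    return out, label
```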

During the SAM stage, not every segmentation mask is perfect, but each contributes to the overall objective of accurately segmenting the railway area rather than the obstacles themselves. To address out-of-distribution (OOD) scenarios, we additionally introduce randomly generated polygons rendered with textures from DTD. Objects are also resized and rescaled to enrich image content and bolster model robustness. The rescaling follows the equations below:

$h = \alpha \cdot y + \beta$ (1)
$w = \frac{h}{H} \cdot W$ (2)

where $h, w$ and $H, W$ are the shapes of the target object and the original object, respectively, and $\alpha$ and $\beta$ are hyper-parameters that adjust the scale. In our project, we set $\alpha$ to 0.6 and $\beta$ to 30; these values should be adjusted according to the camera's position and its parameters. The final generated samples are shown in Fig. 4.
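
As a worked example of Eqs. (1)-(2), the hypothetical helper below computes the pasted object size from its vertical position $y$ using the stated $\alpha = 0.6$ and $\beta = 30$:

```python
# Worked example of Eqs. (1)-(2): objects pasted lower in the frame (larger y,
# i.e. closer to the camera) are scaled up; width preserves the aspect ratio.
def rescale_shape(y, H, W, alpha=0.6, beta=30.0):
    """(H, W): original object shape; returns the target shape (h, w)."""
    h = alpha * y + beta   # Eq. (1)
    w = h / H * W          # Eq. (2)
    return round(h), round(w)

# A 200x100 object pasted at y=300: h = 0.6*300 + 30 = 210, w = 210/200*100 = 105.
print(rescale_shape(300, 200, 100))  # (210, 105)
```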

3.2 Optical-Flow

Optical flow is based on the assumption that the intensity of a point in an image remains constant as it moves from one frame to the next.

$I(x, y, t) = I(x + \Delta x, y + \Delta y, t + \Delta t)$ (3)
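
As background (a standard derivation, not stated in the paper), a first-order Taylor expansion of Eq. (3) yields the classical optical-flow constraint that models such as RAFT build upon:

```latex
% Expand Eq. (3) to first order around (x, y, t):
%   I(x+\Delta x,\, y+\Delta y,\, t+\Delta t)
%     \approx I(x,y,t) + I_x\,\Delta x + I_y\,\Delta y + I_t\,\Delta t
% Subtracting I(x,y,t) and dividing by \Delta t gives the constraint
\[
  I_x u + I_y v + I_t = 0,
  \qquad u = \tfrac{\Delta x}{\Delta t}, \quad v = \tfrac{\Delta y}{\Delta t}.
\]
```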

In our scenario, we employ RAFT (Recurrent All-Pairs Field Transforms), which demonstrates robust performance across a wide range of object scales; the size of obstacles in our dataset varies from hundreds of pixels down to fewer than 50 pixels. RAFT requires two consecutive frames for optical flow estimation. Accordingly, we generate two pseudo images $I_t$ and $I_{t+1}$, in which the same target object is pasted with a slight positional shift $(\Delta x, \Delta y)$.

$Motion = \phi(I_t, I_{t+1})$ (4)
$I_{t+1} = I_t(obj_x + \Delta x, obj_y + \Delta y)$ (5)

We set $\Delta x$ and $\Delta y$ to range between 5 and 10 pixels. The motion prediction is leveraged as prior information and fused with the pseudo image to train the model.
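
The following is a minimal sketch of this step under stated assumptions: RAFT comes from torchvision (the paper does not name an implementation), $I_{t+1}$ is produced by re-pasting the same object shifted by $(\Delta x, \Delta y)$, and the flow is fused with the image by simple channel concatenation, which is our reading since the fusion operator is not specified.

```python
# Sketch of Eqs. (4)-(5): estimate flow between the two pseudo frames with
# RAFT, then fuse it with the pseudo image as the segmentation-model input.
import numpy as np
import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

weights = Raft_Large_Weights.DEFAULT
raft = raft_large(weights=weights).eval()
preprocess = weights.transforms()  # converts to float and normalizes to [-1, 1]

def to_batch(img: np.ndarray) -> torch.Tensor:
    """HxWx3 uint8 image -> 1x3xHxW tensor (RAFT needs H, W divisible by 8)."""
    return torch.from_numpy(img).permute(2, 0, 1)[None]

@torch.no_grad()
def flow_prior(i_t: np.ndarray, i_t1: np.ndarray) -> torch.Tensor:
    """Motion = phi(I_t, I_{t+1}); RAFT returns a list of iterative refinements."""
    a, b = preprocess(to_batch(i_t), to_batch(i_t1))
    return raft(a, b)[-1]                      # final flow field, (1, 2, H, W)

def fuse(i_t: np.ndarray, flow: torch.Tensor) -> torch.Tensor:
    """Concatenate RGB with the 2-channel flow into a 5-channel model input."""
    rgb = to_batch(i_t).float() / 255.0
    return torch.cat([rgb, flow], dim=1)       # (1, 5, H, W)
```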

4 Experiments

4.1 Dataset and Evaluation Metrics

Dataset Our training dataset consists of three parts: obs_person, obs_animal, and obs_texture, namely person obstacles, animal obstacles, and obstacles generated from texture polygons. The details are listed in Table 1. For the test dataset, we collect images with various obstacles under different weather conditions and at different distances from the camera.

Metrics mIoU is used to evaluate the performance of our model. mIoU refers to the mean Intersection over Union, a widely used metric in segmentation tasks. It is calculated as follows:

$IoU_i = \frac{TP_i}{TP_i + FP_i + FN_i}$ (6)
$mIoU = \frac{1}{n}\sum_{i=1}^{n} IoU_i$ (7)

Pixel accuracy is also used to evaluate segmentation models:

$Pixel\_accuracy = \frac{N_{corr}}{N_{total}}$ (8)

where $N_{corr}$ is the number of correctly classified pixels and $N_{total}$ is the total number of pixels.
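
For concreteness, here is a minimal NumPy rendering of Eqs. (6)-(8) for the binary railway/background setting (our own helper functions, not the paper's code):

```python
# pred and gt are boolean HxW masks where True marks railway pixels.
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    tp = np.logical_and(pred, gt).sum()        # true positives
    fp = np.logical_and(pred, ~gt).sum()       # false positives
    fn = np.logical_and(~pred, gt).sum()       # false negatives
    return tp / (tp + fp + fn)                 # Eq. (6)

def miou(pred: np.ndarray, gt: np.ndarray) -> float:
    # Eq. (7) with n = 2 classes: railway and background.
    return 0.5 * (iou(pred, gt) + iou(~pred, ~gt))

def pixel_accuracy(pred: np.ndarray, gt: np.ndarray) -> float:
    return (pred == gt).mean()                 # Eq. (8): N_corr / N_total
```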

Table 1: Details of the training and validation datasets.

| Name | Volume | Dis. (m) | Category |
|---|---|---|---|
| obs_person | 4000 | 0-70 | person |
| obs_animal | 4000 | 0-70 | cow, horse, deer |
| obs_texture | 2000 | 0-70 | see DTD |
| val_near | 200 | 0-20 | person, rock, board |
| val_middle | 200 | 20-50 | person, rock, board |
| val_far | 200 | 50-70 | person, rock, board |

4.2 Implementation Details

Our method is implemented using the PyTorch framework and the model is trained on an RTX 4070Ti. We select Jaccard loss as the loss function and AdamW as the optimizer. The batch size is set to 8 and the number of epochs to 20. Data transformations include horizontal flip, coarse dropout, and random brightness contrast adjustments.
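
For reproducibility, here is a hedged sketch of this setup. The DeepLab variant (DeepLabV3+ via segmentation_models_pytorch), the encoder, the learning rate, and the augmentation probabilities are illustrative assumptions; the loss, optimizer, batch size, and transform list follow the text.

```python
# Hedged training-configuration sketch for Sec. 4.2.
import torch
import albumentations as A
import segmentation_models_pytorch as smp

model = smp.DeepLabV3Plus(
    encoder_name="resnet50",  # encoder assumed; the paper only says "DeepLab"
    in_channels=5,            # RGB + 2-channel optical-flow prior (Sec. 3.2)
    classes=1,                # binary mask: railway area vs. everything else
)
loss_fn = smp.losses.JaccardLoss(mode="binary")            # Jaccard loss
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # lr assumed

# Augmentations named in the text (probabilities assumed):
transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.CoarseDropout(p=0.3),
    A.RandomBrightnessContrast(p=0.3),
])
# Training runs for 20 epochs with batch size 8, as stated above.
```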

4.3 Results

To validate the performance of our approach, we conduct experiments on our three self-collected validation sets: val_near, val_middle, and val_far, detailed in Table 1. The basic training dataset contains 10,000 images (4,000 + 4,000 + 2,000). To assess the impact of the number of generated images, we additionally increase the dataset size by 10% and 50% (the last two rows of Table 2).

The results in Table 2 show that both RAFT and the segmentation-based approach can effectively segment obstacles in the railway area. Combining RAFT with pseudo-images further enhances model performance, and as more generated images are added to the training dataset, the model's performance gradually saturates.

Table 2: mIoU on the three validation sets.

| Method | val_near | val_middle | val_far |
|---|---|---|---|
| YOLOv5 | 0.744 | 0.623 | 0.457 |
| RAFT | 0.717 | 0.636 | 0.585 |
| DeepLab | 0.825 | 0.817 | 0.747 |
| DeepLab+RAFT | 0.843 | 0.828 | 0.709 |
| DeepLab+RAFT+10% | 0.837 | 0.843 | 0.724 |
| DeepLab+RAFT+50% | 0.863 | 0.851 | 0.802 |

4.4 Ablation Study

We conduct an ablation experiment to validate the effect of the different target-object datasets. The results are shown in Table 3. Comparing rows 1-3 with row 4, we find that each obs dataset contributes to improving the robustness and accuracy of the model.

Table 3: Ablation on the training subsets.

| Row | obs_person | obs_animal | obs_texture | mIoU |
|---|---|---|---|---|
| 1 | | | | 0.781 |
| 2 | | | | 0.817 |
| 3 | | | | 0.732 |
| 4 | ✓ | ✓ | ✓ | 0.849 |

5 Conclusion

This paper introduces a universal segmentation model based on a semi-supervised approach. To address out-of-distribution (OOD) challenges, we generate highly realistic pseudo images instead of relying on manual pixel-level annotations. Additionally, we enhance performance by incorporating optical flow techniques. Experimental results demonstrate satisfactory performance across various potential objects.

References

  • [1] Amine Boussik, Wael Ben-Messaoud, Smail Niar, and Abdelmalik Taleb-Ahmed. Railway obstacle detection using unsupervised learning: An exploratory study. In 2021 IEEE Intelligent Vehicles Symposium (IV), pages 660-667. IEEE, 2021.
  • [2] Matthias Brucker, Andrei Cramariuc, Cornelius von Einem, Roland Siegwart, and Cesar Cadena. Local and global information in obstacle detection on railway tracks. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 9049-9056. IEEE, 2023.
  • [3] Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, and Song Han. Enable deep learning on mobile devices: Methods, systems, and applications. ACM Transactions on Design Automation of Electronic Systems (TODAES), 27(3):1-50, 2022.
  • [4] Mircea Cimpoi, Subhransu Maji, Iasonas Kokkinos, Sammy Mohamed, and Andrea Vedaldi. Describing textures in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3606-3613, 2014.
  • [5] Volodymyr Fedynyak, Yaroslav Romanus, Bohdan Hlovatskyi, Bohdan Sydor, Oles Dobosevych, Igor Babin, and Roman Riazantsev. DeVOS: Flow-guided deformable transformer for video object segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 240-249, 2024.
  • [6] Qiushi Guo, Yifan Chen, and Shisha Liao. Enhancing mobile privacy and security: A face skin patch-based anti-spoofing approach. In 2023 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), pages 52-57. IEEE, 2023.
  • [7] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015-4026, 2023.
  • [8] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779-788, 2016.
  • [9] Cem Sazara, Mecit Cetin, and Khan M. Iftekharuddin. Detecting floodwater on roadways from image data with handcrafted features and deep transfer learning. In 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pages 804-809. IEEE, 2019.
  • [10] Laura Sevilla-Lara, Deqing Sun, Varun Jampani, and Michael J. Black. Optical flow with semantic segmentation and localized layers. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3889-3898, 2016.
  • [11] Shuai Shao, Zeming Li, Tianyuan Zhang, Chao Peng, Gang Yu, Xiangyu Zhang, Jing Li, and Jian Sun. Objects365: A large-scale, high-quality dataset for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8430-8439, 2019.
  • [12] Qiang Zhang, Fei Yan, Weina Song, Rui Wang, and Gen Li. Automatic obstacle detection method for the train based on deep learning. Sustainability, 15(2):1184, 2023.