Small Object Detection Without Attention for Aerial Surveillance

Lecture Notes in Networks and Systems (LNNS, volume 882); The 13th Conference on Information Technology and Its Applications (CITA 2024); pp. 372-383.

Bibliographic details
Main authors:	Choi, Yehwan; Nguyen, Duy Linh; Vo, Xuan Thuy; Hyun Jo, Kang
Material type:	Article
Language:	English
Published:	Springer Nature 2024
Subjects:	To improve the detection of small objects, we propose a network incorporating an element-wise multiplication module based on the vanilla Vision Transformer (ViT) architecture. However, traditional transformer models need significant computational resources, which may not be practical for edge devices like CCTV cameras or drones.
Links:	https://elib.vku.udn.vn/handle/123456789/4294 https://doi.org/10.1007/978-3-031-74127-2_31

Holding library:	Trường Đại học Công nghệ Thông tin và Truyền thông Việt Hàn - Đại học Đà Nẵng

id	oai:elib.vku.udn.vn:123456789-4294
record_format	dspace
spelling	oai:elib.vku.udn.vn:123456789-4294 2024-12-06T09:00:47Z 2024-12-06T08:59:19Z 2024-12-06T08:59:19Z 2024-11 978-3-031-74126-5 en application/pdf
abstract	This paper introduces the development of an essential deep-learning model for surveillance systems utilizing high-mounted CCTV or drones. Objects seen from elevated angles often look smaller and may appear at different angles compared to ground-level observations. To improve the detection of small objects, we propose a network incorporating an element-wise multiplication module based on the vanilla Vision Transformer (ViT) architecture. However, traditional transformer models need significant computational resources, which may not be practical for edge devices like CCTV cameras or drones. Therefore, we apply the Attention-Free Transformer (AFT) to reduce computational requirements enabling real-time operation on low-capacity devices. We validate the performance by combining ViT and AFT with the YOLOv5 real-time object detection model. Practical applicability is confirmed by implementing it on the low-capacity device named ODROID H3+. Validation datasets include Autonomous Driving Drone, VisDrone, AerialMaritime, and PKLot, all containing numerous small-sized objects. Experimental results on the VisDrone dataset show that YOLOv5 nano + AFT reduces parameter count by 4.6% while increasing accuracy by 1%, making it an efficient network. The model size is suitable for edge device implementation at 3.7 MB. Similarly, Aerial Maritime and PKLot datasets indicate a decreased amount of parameters and increased accuracy. Hence, the proposed deep learning model is applicable for aerial surveillance systems.
institution	Trường Đại học Công nghệ Thông tin và Truyền thông Việt Hàn - Đại học Đà Nẵng
collection	DSpace
language	English
topic	To improve the detection of small objects, we propose a network incorporating an element-wise multiplication module based on the vanilla Vision Transformer (ViT) architecture. However, traditional transformer models need significant computational resources, which may not be practical for edge devices like CCTV cameras or drones.
description	Lecture Notes in Networks and Systems (LNNS, volume 882); The 13th Conference on Information Technology and Its Applications (CITA 2024); pp. 372-383.
format	Working Paper
author	Choi, Yehwan; Nguyen, Duy Linh; Vo, Xuan Thuy; Hyun Jo, Kang
author_sort	Choi, Yehwan
title	Small Object Detection Without Attention for Aerial Surveillance
title_sort	small object detection without attention for aerial surveillance
publisher	Springer Nature
publishDate	2024
url	https://elib.vku.udn.vn/handle/123456789/4294 https://doi.org/10.1007/978-3-031-74127-2_31
_version_	1849200433936990208


Similar items