Title (dc.title) | Automated Captioning of Image and Audio for Visually and Hearing Impaired |
Author (dc.contributor.author) | Özkan Çaylı |
Thesis Advisor (dc.contributor.advisor) | Volkan Kılıç |
Publisher (dc.publisher) | İzmir Katip Çelebi Üniversitesi Fen Bilimleri Enstitüsü |
Type (dc.type) | Master's |
Abstract (dc.description.abstract) | Generating captions and text descriptions of images can give visually and hearing impaired people greater access to the real world, reducing their social isolation and improving their well-being, employability, and educational experience. This thesis presents advances in algorithmic approaches for generating captions and text descriptions from both image and audio data. The focus on algorithmic innovation is intended to make the platform efficient and adaptable to various types of visual and auditory information, so that it can serve as a versatile tool for aiding those with visual impairments. The thesis addresses this aim in three main contribution chapters: image captioning, video captioning, and audio-visual video captioning. The research begins with image captioning, developing algorithms that accurately interpret and describe still images. This foundational work sets the stage for the subsequent phase, video captioning. |
Accession Date (dc.date.accessioned) | 2024-03-01 |
Open Access Date (dc.date.available) | 2024-09-01 |
Issue Date (dc.date.issued) | 2024 |
Language (dc.language.iso) | eng |
Subject Headings (dc.subject) | Image Captioning |
Subject Headings (dc.subject) | Video Captioning |
URI (dc.identifier.uri) | https://hdl.handle.net/11469/3884 |