A Neural Architecture Search for Automated Multimodal Learning

Authors:

Highlights:

Abstract

The boom of artificial intelligence in the past decade is owed to the research and development of deep learning and, moreover, of accessible deep learning. But the goal of Artificial General Intelligence (AGI) cannot be achieved with application-specific, parameter-sensitive neural networks that must be defined and tuned for every use case. General intelligence also involves understanding different types of data, rather than relying on dedicated models for each functionality. Automating machine learning while also emphasizing generalization over multiple modalities therefore has great potential to move AGI research forward. We propose a generalizable algorithm, Multimodal Neural Architecture Search (MNAS), which operates on multiple modalities and performs architecture search to create neural networks that classify multiple types of data into multiclass outputs. The work automates the development of a fusion architecture by building upon the existing literature on multimodal learning and neural architecture search. The controller network that predicts the architecture is designed around a reward model in which the reward depends on the accuracies of the individual networks corresponding to each modality involved. Experiments on a multiclass classification problem spanning image and text modalities show accuracy comparable to both unimodal classification on the same data and manually designed multimodal architectures. The method also uses a shared-parameter search graph, keeping its computational complexity lower than that of several other neural architecture search algorithms.
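The reward model described above, where the controller's reward depends on the accuracies of the per-modality networks, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the equal-weight averaging scheme and the moving-average baseline (a common variance-reduction choice in REINFORCE-style NAS controllers) are assumptions.

```python
# Hypothetical sketch of a reward model driven by per-modality accuracies.
# The weighting scheme and baseline are illustrative assumptions, not taken
# from the paper.

def multimodal_reward(accuracies, weights=None):
    """Combine per-modality validation accuracies into one scalar reward."""
    if weights is None:
        # Assumed: equal weight per modality when none are given.
        weights = [1.0 / len(accuracies)] * len(accuracies)
    return sum(w * a for w, a in zip(weights, accuracies))

class Baseline:
    """Exponential moving average of past rewards, used to reduce the
    variance of a REINFORCE-style controller update."""
    def __init__(self, decay=0.95):
        self.decay = decay
        self.value = None

    def advantage(self, reward):
        if self.value is None:
            self.value = reward
        adv = reward - self.value
        self.value = self.decay * self.value + (1 - self.decay) * reward
        return adv

if __name__ == "__main__":
    baseline = Baseline()
    # e.g. image-network accuracy 0.82, text-network accuracy 0.76
    r = multimodal_reward([0.82, 0.76])
    print(round(r, 3))                       # 0.79
    print(round(baseline.advantage(r), 3))   # 0.0 on the first step
```

The controller would sample an architecture, train the modality networks it defines, compute this reward from their accuracies, and scale its policy-gradient update by the advantage.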

Keywords: Multimodal learning, Neural architecture search, Deep learning, Automation

Article history: Received 28 May 2020; Revised 24 May 2022; Accepted 1 July 2022; Available online 6 July 2022; Version of Record 12 July 2022.

DOI: https://doi.org/10.1016/j.eswa.2022.118051