nips48

NeurIPS (NIPS) 2019 Paper List

Visually Grounded Interaction and Language (ViGIL), NeurIPS 2019 Workshop, Vancouver, Canada, December 13, 2019.

Commonsense and Semantic-Guided Navigation through Language in Embodied Environment.
Learning Language from Vision.
Cross-Modal Mapping for Generalized Zero-Shot Learning by Soft-Labeling.
A perspective on multi-agent communication for information fusion.
On Agreements in Visual Understanding.
Analyzing Compositionality in Visual Question Answering.
Self-Educated Language Agent with Hindsight Experience Replay for Instruction Following.
Shaping Visual Representations with Language for Few-shot Classification.
Supervised Multimodal Bitransformers for Classifying Images and Text.
Leveraging Topics and Audio Features with Multimodal Attention for Audio Visual Scene-Aware Dialog.
Can adversarial training learn image captioning?
Deep compositional robotic planners that follow natural language commands.
Language Grounding through Social Interactions and Curiosity-Driven Multi-Goal Learning.
A Simple Baseline for Visual Commonsense Reasoning.
General Evaluation for Instruction Conditioned Navigation using Dynamic Time Warping.
Modulated Self-attention Convolutional Network for VQA.
Visually Grounded Video Reasoning in Selective Attention Memory.
Recurrent Instance Segmentation using Sequences of Referring Expressions.
A Comprehensive Analysis of Semantic Compositionality in Text-to-Image Generation.
CLOSURE: Assessing Systematic Generalization of CLEVR Models.
Community size effect in artificial learning systems.
Structural and functional learning for learning language use.
Learning Question-Guided Video Representation for Multi-Turn Video Question Answering.
Multimodal Generative Learning Utilizing Jensen-Shannon-Divergence.
Visual Dialog for Radiology: Data Curation and First Steps.
Contextual Grounding of Natural Language Entities in Images.
Natural Language Grounded Multitask Navigation.
Induced Attention Invariance: Defending VQA Models against Adversarial Attacks.
VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering.
Situated Grounding Facilitates Multimodal Concept Learning for AI.
Hidden State Guidance: Improving Image Captioning Using an Image Conditioned Autoencoder.
Not All Actions Are Equal: Learning to Stop in Language-Grounded Urban Navigation.
Learning from Observation-Only Demonstration for Task-Oriented Language Grounding via Self-Examination.
What is needed for simple spatial language capabilities in VQA?
Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning.