ICCV 2019 Paper List
2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Korea (South), October 27 - November 2, 2019.
GLAMpoints: Greedily Learned Accurate Match Points.
Unsupervised Microvascular Image Segmentation Using an Active Contours Mimicking Neural Network.
Semantic-Transferable Weakly-Supervised Endoscopic Lesions Segmentation.
Multi-Stage Pathological Image Classification Using Semantic Segmentation.
Conditional Recurrent Flow: Conditional Generation of Longitudinal Samples With Applications to Neuroimaging.
CAMEL: A Weakly Supervised Learning Framework for Histopathology Image Segmentation.
Prior-Aware Neural Network for Partially-Supervised Multi-Organ Segmentation.
HistoSegNet: Semantic Segmentation of Histological Tissue Type in Whole Slide Images.
An Alarm System for Segmentation Algorithm Based on Shape Model.
Joint Acne Image Grading and Counting via Label Distribution Learning.
Align, Attend and Locate: Chest X-Ray Diagnosis via Contrast Induced Attention Network With Limited Supervision.
Dilated Convolutional Neural Networks for Sequential Manifold-Valued Data.
DUAL-GLOW: Conditional Flow-Based Generative Model for Modality Transfer.
Recursive Cascaded Networks for Unsupervised Medical Image Registration.
Learning With Unsure Data for Medical Image Diagnosis.
A Deep Cybersickness Predictor Based on Brain Signal Analysis for Virtual Reality Contents.
Scaling Recurrent Models via Orthogonal Approximations in Tensor Trains.
Very Long Natural Scenery Image Prediction by Outpainting.
Few-Shot Unsupervised Image-to-Image Translation.
Attribute Manipulation Generative Adversarial Networks for Fashion Images.
Image Synthesis From Reconfigurable Layout and Style.
Boundless: Generative Adversarial Networks for Image Extension.
VTNFP: An Image-Based Virtual Try-On Network With Body and Clothing Feature Preservation.
Semantics-Enhanced Adversarial Nets for Text-to-Image Synthesis.
Point-to-Point Video Generation.
LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup.
ClothFlow: A Flow-Based Model for Clothed Person Generation.
SME-Net: Sparse Motion Estimation for Parametric Video Prediction Through Reinforcement Learning.
P-MVSNet: Learning Patch-Wise Matching Confidence Aggregation for Multi-View Stereo.
Multi-Angle Point Cloud-VAE: Unsupervised Feature Learning for 3D Point Clouds From Multiple Angles by Joint Self-Reconstruction and Half-to-Half Prediction.
Hierarchical Point-Edge Interaction Network for Point Cloud Semantic Segmentation.
U4D: Unsupervised 4D Dynamic Scene Understanding.
TAPA-MVS: Textureless-Aware PAtchMatch Multi-View Stereo.
VrR-VG: Refocusing Visually-Relevant Relationships.
Learning to Caption Images Through a Lifetime by Asking Questions.
Joint Prediction for Kinematic Trajectories in Vehicle-Pedestrian-Mixed Scenes.
Learning Similarity Conditions Without Explicit Supervision.
Mixture-Kernel Graph Attention Network for Situation Recognition.
Compositional Video Prediction.
Occlusion-Shared and Feature-Separated Network for Occlusion Relationship Reasoning.
Modeling Inter and Intra-Class Relations in the Triplet Loss for Zero-Shot Learning.
Unpaired Image Captioning via Scene Graph Alignments.
Relation-Aware Graph Attention Network for Visual Question Answering.
Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction.
Language-Conditioned Graph Networks for Relational Reasoning.
HiPPI: Higher-Order Projected Power Iterations for Scalable Multi-Matching.
A Bayesian Optimization Framework for Neural Network Compression.
Parametric Majorization for Data-Driven Energy Minimization Methods.
K-Best Transformation Synchronization.
Pareto Meets Huber: Efficiently Avoiding Poor Minima in Robust Estimation.
Convex Relaxations for Consensus and Non-Minimal Problems in 3D Vision.
Deep Tensor ADMM-Net for Snapshot Compressive Imaging.
ARGAN: Attentive Recurrent Generative Adversarial Network for Shadow Detection and Removal.
Physics-Based Rendering for Improving Robustness to Rain.
Deep Optics for Monocular Depth Estimation and 3D Object Detection.
Computational Hyperspectral Imaging Based on Dimension-Discriminative Low-Rank Tensor Recovery.
Flare in Interference-Based Hyperspectral Cameras.
Deep Head Pose Estimation Using Synthetic Images and Partial Adversarial Domain Adaption for Continuous Label Spaces.
Aggregation via Separation: Boosting Facial Landmark Detector With Semi-Supervised Style Translation.
Context-Aware Emotion Recognition Networks.
Camera Distance-Aware Top-Down Approach for 3D Multi-Person Pose Estimation From a Single RGB Image.
Discriminatively Learned Convex Models for Set Based Face Recognition.
Through-Wall Human Mesh Recovery Using Radio Signals.
Laplace Landmark Localization.
End-to-End Learning for Graph Decomposition.
Ego-Pose Estimation and Forecasting As Real-Time PD Control.
Detecting Photoshopped Faces by Scripting Photoshop.
Face De-Occlusion Using 3D Morphable Model and Generative Adversarial Network.
Fair Loss: Margin-Aware Reinforcement Learning for Deep Face Recognition.
M2FPA: A Multi-Yaw Multi-Pitch High-Quality Dataset and Benchmark for Facial Pose Analysis.
Make a Face: Towards Arbitrary High Fidelity Face Manipulation.
Learning Joint 2D-3D Representations for Depth Completion.
CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark.
GEOBIT: A Geodesic-Based Binary Descriptor Invariant to Non-Rigid Deformations for RGB-D Images.
Deep End-to-End Alignment and Refinement for Time-of-Flight RGB-D Module.
Escaping Plato's Cave: 3D Shape From Adversarial Rendering.
UprightNet: Geometry-Aware Camera Orientation Estimation From Single Images.
Deep Mesh Reconstruction From Single RGB Images via Topology Modification Networks.
Analyzing the Variety Loss in the Context of Probabilistic Trajectory Prediction.
Cluster Alignment With a Teacher for Unsupervised Domain Adaptation.
New Convex Relaxations for MRF Inference With Unknown Graphs.
Meta-Learning to Detect Rare Objects.
Is an Affine Constraint Needed for Affine Subspace Clustering?
Robust Variational Bayesian Point Set Registration.
LayoutVAE: Stochastic Scene Layout Generation From a Label Set.
Order-Preserving Wasserstein Discriminant Analysis.
Subspace Structure-Aware Spectral Clustering for Robust Subspace Clustering.
Invariant Information Clustering for Unsupervised Image Classification and Segmentation.
Deep Constrained Dominant Sets for Person Re-Identification.
Learning Feature-to-Feature Translator by Alternating Back-Propagation for Generative Zero-Shot Learning.
C-MIDN: Coupled Multiple Instance Detection Network With Segmentation Guidance for Weakly Supervised Object Detection.
Hierarchical Encoding of Sequential Data With Compact and Sub-Linear Storage Cost.
Re-ID Driven Localization Refinement for Person Search.
Detecting 11K Classes: Large Scale Object Detection Without Fine-Grained Bounding Boxes.
Deep Supervised Hashing With Anchor Graph.
Mesh R-CNN.
Fast Point R-CNN.
Transferable Contrastive Network for Generalized Zero-Shot Learning.
Leveraging Long-Range Temporal Relationships Between Proposals for Video Object Detection.
AutoFocus: Efficient Multi-Scale Inference.
Weakly Supervised Object Detection With Segmentation Collaboration.
Better to Follow, Follow to Be Better: Towards Precise Supervision of Feature Super-Resolution for Small Object Detection.
Few-Shot Learning With Global Class Representations.
Hierarchical Shot Detector.
No Fear of the Dark: Image Retrieval Under Varying Illumination Conditions.
Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection.
No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques.
SegEQA: Video Segmentation Based Visual Attention for Embodied Question Answering.
RepPoints: Point Set Representation for Object Detection.
Temporal Knowledge Propagation for Image-to-Video Person Re-Identification.
Self-Critical Attention Learning for Person Re-Identification.
FCOS: Fully Convolutional One-Stage Object Detection.
Human Uncertainty Makes Classification More Robust.
POD: Practical Object Detection With Scale-Sensitive Network.
Presence-Only Geographical Priors for Fine-Grained Image Classification.
Pyramid Graph Networks With Connection Attentions for Region-Based One-Shot Semantic Segmentation.
Meta R-CNN: Towards General Solver for Instance-Level Low-Shot Learning.
Contextual Attention for Hand Detection in the Wild.
Discriminative Feature Transformation for Occluded Pedestrian Detection.
Deep Meta Metric Learning.
Enriched Feature Guided Refinement Network for Object Detection.
SBSGAN: Suppression of Inter-Domain Background Shift for Person Re-Identification.
Unsupervised Out-of-Distribution Detection by Maximum Classifier Discrepancy.
NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi-Supervised Object Detection.
Cross-Domain Adaptation for Animal Pose Estimation.
Learning Trajectory Dependencies for Human Motion Prediction.
TRB: A Novel Triplet Representation for Understanding 2D Human Body.
Pose-Aware Multi-Level Feature Network for Human Object Interaction Detection.
Few-Shot Adversarial Learning of Realistic Neural Talking Head Models.
PuppetGAN: Cross-Domain Image Manipulation by Demonstration.
S2GAN: Share Aging Factors Across Ages and Share Aging Trends Among Individuals.
Photo-Realistic Facial Details Synthesis From Single Image.
A Decoupled 3D Facial Shape Model by Adversarial Training.
3D Face Modeling From Diverse Raw Scan Data.
Semi-Supervised Monocular 3D Face Reconstruction With End-to-End Shape-Preserved Domain Transfer.
Face Video Deblurring Using 3D Facial Priors.
Live Face De-Identification in Video.
Few-Shot Adaptive Gaze Estimation.
Co-Mining: Deep Face Recognition With Noisy Labels.
Towards Interpretable Face Recognition.
Habitat: A Platform for Embodied AI Research.
Exploring the Limitations of Behavior Cloning for Autonomous Driving.
Scalable Place Recognition Under Appearance Change for Autonomous Driving.
WoodScape: A Multi-Task, Multi-Camera Fisheye Dataset for Autonomous Driving.
SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences.
M3D-RPN: Monocular 3D Region Proposal Network for Object Detection.
Deep Hough Voting for 3D Object Detection in Point Clouds.
DeepGCNs: Can GCNs Go As Deep As CNNs?
3D Instance Segmentation via Multi-Task Metric Learning.
MeteorNet: Deep Learning on Dynamic 3D Point Cloud Sequences.
Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks.
Video Object Segmentation Using Space-Time Memory Networks.
Sequence Level Semantics Aggregation for Video Object Detection.
ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors.
PANet: Few-Shot Image Semantic Segmentation With Prototype Alignment.
Explaining Neural Networks Semantically and Quantitatively.
Multi-Class Part Parsing With Joint Boundary-Semantic Awareness.
Expectation-Maximization Attention Networks for Semantic Segmentation.
YOLACT: Real-Time Instance Segmentation.
Symmetry-Constrained Rectification Network for Scene Text Recognition.
Geometry Normalization Networks for Accurate Scene Text Detection.
Convolutional Character Networks.
Large-Scale Tag-Based Font Retrieval With Generative Feature Learning.
GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition.
Deep Floor Plan Recognition Using a Multi-Task Network With Room-Boundary-Guided Attention.
Chinese Street View Text: Large-Scale Chinese Text Reading With Partially Supervised Learning.
TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting.
Free-Form Video Inpainting With 3D Gated Convolution and Temporal PatchGAN.
Tag2Pix: Line Art Colorization Using Text Tag With SECat and Changing Loss.
Personalized Fashion Design.
Photorealistic Style Transfer via Wavelet Transforms.
Towards Multi-Pose Guided Virtual Try-On Network.
Guided Image-to-Image Translation With Bi-Directional Feature Transformation.
Disentangling Propagation and Generation for Video Prediction.
On the Over-Smoothing Problem of CNN Based Disparity Estimation.
OmniMVS: End-to-End Learning for Omnidirectional Stereo Matching.
Depth From Videos in the Wild: Unsupervised Monocular Depth Learning From Unknown Cameras.
Learning Local RGB-to-CAD Correspondences for Object Pose Estimation.
Fully Convolutional Geometric Features.
nocaps: novel object captioning at scale.
Shapeglot: Learning Language for Shape Differentiation.
Entangled Transformer for Image Captioning.
Joint Syntax Representation Learning and Visual Cue Translation for Video Captioning.
Watch, Listen and Tell: Multi-Modal Weakly Supervised Dense Event Captioning.
Joint Optimization for Cooperative Image Captioning.
Reflective Decoding Network for Image Captioning.
DeblurGAN-v2: Deblurring (Orders-of-Magnitude) Faster and Better.
Joint Demosaicking and Denoising by Fine-Tuning of Bursts of Raw Images.
Image Inpainting With Learnable Bidirectional Attention Maps.
Optimizing the F-Measure for Threshold-Free Salient Object Detection.
Deep Learning for Light Field Saliency Detection.
Guided Super-Resolution As Pixel-to-Pixel Transformation.
Disentangled Image Matting.
Where Is My Mirror?
Two-Stream Action Recognition-Oriented Video Super-Resolution.
SID4VAM: A Benchmark Dataset With Synthetic Images for Visual Attention Modeling.
EGNet: Edge Guidance Network for Salient Object Detection.
DDSL: Deep Differentiable Simplex Layer for Learning Geometric Signals.
CompoNet: Learning to Generate the Unseen by Part Synthesis and Composition.
Joint Embedding of 3D Scan and CAD Objects.
GarNet: A Two-Stream Network for Fast and Accurate 3D Cloth Draping.
Deep Appearance Maps.
Neural Re-Simulation for Generating Bounces in Single Images.
Learning to Paint With Model-Based Deep Reinforcement Learning.
Hallucinating IDT Descriptors and I3D Optical Flow Features for Action Recognition With CNNs.
Grounded Human-Object Interaction Hotspots From Video.
3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization.
HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization.
MMAct: A Large-Scale Dataset for Cross Modal Human Action Understanding.
Holistic++ Scene Understanding: Single-View 3D Holistic Scene Parsing and Human Pose Estimation With Human-Object Interaction and Physical Commonsense.
FrameNet: Learning Local Canonical Frames of 3D Surfaces From a Single RGB Image.
GraphX-Convolution for Point Cloud Deformation in 2D-to-3D Conversion.
Moving Indoor: Unsupervised Video Depth Learning in Challenging Environments.
ForkNet: Multi-Branch Volumetric Semantic Completion From a Single Depth Image.
Neural Inverse Rendering of an Indoor Scene From a Single Image.
OperatorNet: Recovering 3D Shapes From Difference Operators.
Shadow Removal via Shadow Image Decomposition.
Gravity as a Reference for Estimating a Person's Height From Video.
Hyperspectral Image Reconstruction Using Deep External and Internal Learning.
SPLINE-Net: Sparse Photometric Stereo Through Lighting Interpolation and Normal Estimation Networks.
Variational Uncalibrated Photometric Stereo Under General Lighting.
Human Attention in Image Captioning: Dataset and Analysis.
Group-Wise Deep Object Co-Segmentation With Co-Attention Recurrent Neural Network.
Miss Detection vs. False Alarm: Adversarial Learning for Small Object Segmentation in Infrared Images.
VV-Net: Voxel VAE Net With Group Convolutions for Point Cloud Segmentation.
BAE-NET: Branched Autoencoder for Shape Co-Segmentation.
CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing.
Bayesian Adaptive Superpixel Segmentation.
Collect and Select: Semantic Alignment Metric Learning for Few-Shot Learning.
Foreground-Aware Pyramid Reconstruction for Alignment-Free Occluded Person Re-Identification.
Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network.
Objects365: A Large-Scale, High-Quality Dataset for Object Detection.
Few-Shot Object Detection via Feature Reweighting.
AM-LFS: AutoML for Loss Function Search.
Learning to Discover Novel Visual Categories via Deep Transfer Clustering.
Ground-to-Aerial Image Geo-Localization With a Hard Exemplar Reweighting Triplet Loss.
Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting.
Towards Precise End-to-End Weakly Supervised Object Detection Network.
From Open Set to Closed Set: Counting Objects by Spatial Divide-and-Conquer.
ABD-Net: Attentive but Diverse Person Re-Identification.
advPattern: Physical-World Attacks on Deep Person Re-Identification via Adversarially Transformable Patterns.
Learning a Mixture of Granularity-Specific Experts for Fine-Grained Categorization.
Unsupervised Graph Association for Person Re-Identification.
Clustered Object Detection in Aerial Images.
Localization of Deep Inpainting Using High-Pass Fully Convolutional Network.
WSOD2: Learning Bottom-Up and Top-Down Objectness Distillation for Weakly-Supervised Object Detection.
Vehicle Re-Identification With Viewpoint-Aware Metric Learning.
Learning to Rank Proposals for Object Detection.
Conservative Wasserstein Training for Pose Estimation.
Maximum-Margin Hamming Hashing.
Cross-X Learning for Fine-Grained Visual Categorization.
SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects.
Self-Training With Progressive Augmentation for Unsupervised Cross-Domain Person Re-Identification.
Neighborhood Preserving Hashing for Scalable Video Retrieval.
GODS: Generalized One-Class Discriminative Subspaces for Anomaly Detection.
GAN-Tree: An Incrementally Learned Hierarchical Generative Framework for Multi-Modal Data Distributions.
Geometric Disentanglement for Generative Latent Shape Models.
Reciprocal Multi-Layer Subspace Learning for Multi-View Clustering.
Unsupervised Multi-Task Feature Learning on Point Clouds.
Deep Comprehensive Correlation Mining for Image Clustering.
Composite Shape Modeling via Latent Space Factorization.
AVT: Unsupervised Learning of Transformation Equivariant Representations by Autoencoding Variational Transformations.
EMPNet: Neural Localisation and Mapping Using Embedded Memory Points.
Learning Across Tasks and Domains.
Cross-View Policy Learning for Street Navigation.
Recover and Identify: A Generative Dual Model for Cross-Resolution Person Re-Identification.
A Novel Unsupervised Camera-Aware Domain Adaptation Framework for Person Re-Identification.
FDA: Feature Disruptive Attack.
Boosting Few-Shot Visual Learning With Self-Supervision.
Semi-Supervised Domain Adaptation via Minimax Entropy.
Discriminative Feature Learning With Consistent Attention Regularization for Person Re-Identification.
Bilinear Attention Networks for Person Retrieval.
AWSD: Adaptive Weighted Spatiotemporal Distillation for Video Representation.
Self-Supervised Representation Learning via Neighborhood-Relational Encoding.
MIC: Mining Interclass Characteristics for Improved Metric Learning.
Attract or Distract: Exploit the Margin of Open Set.
Addressing Model Vulnerability to Distributional Shifts Over Image Transformation Sets.
On the Global Optima of Kernelized Adversarial Representation Learning.
Sampling Wisely: Deep Image Embedding by Top-K Precision Optimization.
Joint Group Feature Selection and Discriminative Filter Learning for Robust Visual Object Tracking.
ELF: Embedded Localisation of Features in Pre-Trained CNN.
A Learned Representation for Scalable Vector Graphics.
Cross-Dataset Person Re-Identification via Unsupervised Pose Disentanglement and Adaptation.
Asynchronous Single-Photon 3D Imaging.
Agile Depth Sensing Using Triangulation Light Curtains.
Convolutional Approximations to the General Non-Line-of-Sight Imaging Operator.
Stochastic Exposure Coding for Handling Multi-ToF-Camera Interference.
Unconstrained Motion Deblurring for Dual-Lens Cameras.
Towards Photorealistic Reconstruction of Highly Multiplexed Lensless Images.
Learning Perspective Undistortion of Portraits.
Restoration of Non-Rigidly Distorted Underwater Images Using a Combination of Compressive Sensing and Local Polynomial Image Representations.
Surface Normals and Shape From Water.
GLoSH: Global-Local Spherical Harmonics for Intrinsic Image Decomposition.
View-Consistent 4D Light Field Superpixel Segmentation.
Cascaded Context Pyramid for Full-Resolution 3D Semantic Scene Completion.
View Independent Generative Adversarial Network for Novel View Synthesis.
Extreme View Synthesis.
Not All Parts Are Created Equal: 3D Pose Estimation by Modeling Bi-Directional Dependencies of Body Parts.
DenseRaC: Joint 3D Pose and Shape Estimation by Dense Render-and-Compare.
A Neural Network for Detailed Human Depth Estimation From a Single Image.
DeepHuman: 3D Human Reconstruction From a Single Image.
xR-EgoPose: Egocentric 3D Human Pose From an HMD Camera.
Learnable Triangulation of Human Pose.
Soft Rasterizer: A Differentiable Renderer for Image-Based 3D Reasoning.
Learning to Reconstruct 3D Manhattan Wireframes From a Single Image.
C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion.
CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation.
Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation.
RIO: 3D Object Instance Re-Localization in Changing Indoor Environments.
Transformable Bottleneck Networks.
Domain-Adaptive Single-View 3D Reconstruction.
Learning Single Camera Depth Estimation Using Dual-Pixels.
Visualizing the Invisible: Occluded Vehicle Segmentation and Recovery.
Improved Conditional VRNNs for Video Prediction.
Unpaired Image-to-Speech Synthesis With Multimodal Information Bottleneck.
HoloGAN: Unsupervised Learning of 3D Representations From Natural Images.
View-LSTM: Novel-View Video Synthesis Through View Decomposition.
Dual Adversarial Inference for Text-to-Image Synthesis.
Attributing Fake Images to GANs: Learning and Analyzing GAN Fingerprints.
Dynamic Points Agglomeration for Hierarchical Point Sets Learning.
Semantic Part Detection via Matching: Learning to Generalize to Novel Viewpoints From Limited Training Data.
Expert Sample Consensus Applied to Camera Re-Localization.
View N-Gram Network for 3D Object Retrieval.
Learning Relationships for Multi-View 3D Object Recognition.
Spatial Correspondence With Generative Adversarial Network: Learning Depth From Monocular Videos.
Semantic Stereo Matching With Pyramid Cost Volumes.
Language Features Matter: Effective Language Representations for Vision-Language Tasks.
VideoBERT: A Joint Model for Video and Language Representation Learning.
See-Through-Text Grouping for Referring Image Segmentation.
U-CAM: Visual Explanation Using Uncertainty Based Class Activation Maps.
Seq-SG2SL: Inferring Semantic Layout From Scene Graph Through Sequence to Sequence Learning.
ViCo: Word Embeddings From Visual Co-Occurrences.
Towards Unsupervised Image Captioning With Shared Multimodal Embeddings.
Transferable Representation Learning in Vision-and-Language Navigation.
SkyScapes - Fine-Grained Semantic Understanding of Aerial Scenes.
SceneGraphNet: Neural Message Passing for 3D Indoor Scene Augmentation.
Guided Curriculum Model Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation.
DADA: Depth-Aware Domain Adaptation in Semantic Segmentation.
AdaptIS: Adaptive Instance Selection Network.
What Synthesis Is Missing: Depth Adaptation Integrated With Weak Supervision for Indoor Scene Parsing.
SegSort: Segmentation by Discriminative Sorting of Segments.
Learning to See Moving Objects in the Dark.
GridDehazeNet: Attention-Based Multi-Scale Network for Image Dehazing.
RainFlow: Optical Flow Under Rain Streaks and Rain Veiling Effect.
Joint Learning of Semantic Alignment and Object Landmark Detection.
Semi-Supervised Video Salient Object Detection Using Pseudo-Labels.
Motion Guided Attention for Video Salient Object Detection.
Stacked Cross Refinement Network for Edge-Aware Salient Object Detection.
Depth-Induced Multi-Scale Recurrent Attention Network for Saliency Detection.
Event-Based Motion Segmentation by Motion Compensation.
Towards High-Resolution Salient Object Detection.
Joint Learning of Saliency Detection and Weakly Supervised Semantic Segmentation.
Neural 3D Morphable Models: Spiral Convolutional Networks for 3D Shape Representation Learning and Generation.
PU-GAN: A Point Cloud Upsampling Adversarial Network.
Deep Single-Image Portrait Relighting.
FSGAN: Subject Agnostic Face Swapping and Reenactment.
Deep Parametric Indoor Lighting Estimation.
CompenNet++: End-to-End Full Projector Compensation.
Learning Shape Templates With Structured Implicit Functions.
Structured Prediction Helps 3D Human Motion Modelling.
Human Motion Prediction via Spatio-Temporal Inpainting.
Imitation Learning for Human Pose Prediction.
Predicting 3D Human Dynamics From Video.
Fast Object Detection in Compressed Video.
Graph Convolutional Networks for Temporal Action Localization.
TSM: Temporal Shift Module for Efficient Video Understanding.
Learning Temporal Action Proposals With Fewer Labels.
Self-Supervised Learning With Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera.
Self-Supervised Moving Vehicle Tracking With Stereo Sound.
Non-Local ConvLSTM for Video Compression Artifact Reduction.
Video Compression With Rate-Distortion Autoencoders.
Relation Distillation Networks for Video Object Detection.
Identity From Here, Pose From There: Self-Supervised Disentanglement and Generation of Objects Using Unlabeled Videos.
Spatiotemporal Feature Residual Propagation for Action Prediction.
Face Alignment With Kernel Density Deep Neural Network.
Single-Network Whole-Body Pose Estimation.
Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression.
SO-HandNet: Self-Organizing Network for 3D Hand Pose Estimation With Semi-Supervised Learning.
Single-Stage Multi-Person Pose Machines.
Dynamic Kernel Distillation for Efficient Pose Estimation in Videos.
Photo-Realistic Monocular Gaze Redirection Using Generative Adversarial Networks.
Unsupervised Person Re-Identification by Camera-Aware Similarity Consistency Learning.
Gaze360: Physically Unconstrained Gaze Estimation in the Wild.
Probabilistic Face Embeddings.
DeCaFA: Deep Convolutional Cascade for Face Alignment in the Wild.
Bayesian Graph Convolution LSTM for Skeleton Based Action Recognition.
Unsupervised High-Resolution Depth Learning From Videos With Dual Networks.
MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation.
Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving.
Explaining the Ambiguity of Object Detection and 6D Pose From Visual Data.
Self-Ensembling With GAN-Based Data Augmentation for Domain Adaptation in Semantic Segmentation.
Boundary-Aware Feature Propagation for Scene Segmentation.
Frame-to-Frame Aggregation of Active Regions in Web Videos for Weakly Supervised Semantic Segmentation.
ACFNet: Attentional Class Feature Network for Semantic Segmentation.
Relational Attention Network for Crowd Counting.
Significance-Aware Information Bottleneck for Domain Adaptive Semantic Segmentation.
SparseMask: Differentiable Connectivity Learning for Dense Image Prediction.
Constructing Self-Motivated Pyramid Curriculums for Cross-Domain Semantic Segmentation: A Non-Adversarial Approach.
Adaptive Context Network for Scene Parsing.
MVP Matching: A Maximum-Value Perfect Matching for Mining Hard Samples, With Application to Person Re-Identification.
Dual Student: Breaking the Limits of the Teacher in Semi-Supervised Learning.
ThunderNet: Towards Real-Time Generic Object Detection on Mobile Devices.
CIIDefence: Defeating Adversarial Attacks by Fusing Class-Specific Image Inpainting and Image Denoising.
Attribute Attention for Semantic Disambiguation in Zero-Shot Learning.
An Empirical Study of Spatial Attention Mechanisms in Deep Networks.
Object Guided External Memory Network for Video Object Detection.
Multi-Adversarial Faster-RCNN for Unrestricted Object Detection.
PARN: Position-Aware Relation Networks for Few-Shot Learning.
Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification.
View Confusion Feature Learning for Person Re-Identification.
Bilateral Adversarial Training: Towards Fast Training of More Robust Models Against Adversarial Attacks.
Incremental Learning Using Conditional Adversarial Networks.
Dynamic Anchor Feature Selection for Single-Shot Object Detection.
Selective Sparse Sampling for Fine-Grained Image Recognition.
DANet: Divergent Activation for Weakly Supervised Object Localization.
Online Hyper-Parameter Learning for Auto-Augmentation Strategy.
CenterNet: Keypoint Triplets for Object Detection.
Simultaneous Multi-View Instance Detection With Learned Geometric Soft-Constraints.
Adversarial Learning With Margin-Based Triplet Embedding Regularization.
Metric Learning With HORDE: High-Order Regularizer for Deep Embeddings.
Deep Elastic Networks With Model Selection for Multi-Task Learning.
Symmetric Graph Convolutional Autoencoder for Unsupervised Graph Representation Learning.
Fast and Practical Neural Architecture Search.
Normalized Wasserstein for Mixture Distributions With Applications in Adversarial Learning and Domain Adaptation.
Deep Metric Learning With Tuplet Margin Loss.
AttPool: Towards Hierarchical Feature Representation in Graph Convolutional Networks via Attention Mechanism.
Gaussian Affinity for Max-Margin Class Imbalanced Learning.
A Weakly Supervised Fine Label Classifier Enhanced by Coarse Supervision.
SoftTriple Loss: Deep Metric Learning Without Triplet Sampling.
Deep Clustering by Gaussian Mixture Variational Autoencoders With Graph Embedding.
Task2Vec: Task Embedding for Meta-Learning.
Neural Inter-Frame Compression for Video Coding.
KPConv: Flexible and Deformable Convolution for Point Clouds.
Learning an Effective Equivariant 3D Descriptor Without Supervision.
Scaling and Benchmarking Self-Supervised Visual Representation Learning.
Spectral Regularization for Combating Mode Collapse in GANs.
Learning Compositional Representations for Few-Shot Recognition.
Unsupervised Learning of Landmarks by Descriptor Vector Exchange.
ViSiL: Fine-Grained Spatio-Temporal Video Similarity Learning.
Unsupervised Procedure Learning via Joint Dynamic Summarization.
Action Assessment by Joint Relation Graphs.
Temporal Attentive Alignment for Large-Scale Video Domain Adaptation.
Non-Local Recurrent Neural Memory for Supervised Sequence Modeling.
Uncertainty-Aware Audiovisual Activity Recognition Using Deep Bayesian Variational Inference.
Dual Attention Matching for Audio-Visual Event Localization.
Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection.
STGAT: Modeling Spatial-Temporal Interactions for Human Trajectory Prediction.
PIE: A Large-Scale Dataset and Models for Pedestrian Intention Estimation and Trajectory Prediction.
What Would You Expect? Anticipating Egocentric Actions With Rolling-Unrolling LSTMs and Modality Attention.
Weakly Supervised Energy-Based Learning for Action Segmentation.
SCSampler: Sampling Salient Clips From Video for Efficient Action Recognition.
Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition.
Generative Multi-View Human Action Recognition.
SlowFast Networks for Video Recognition.
DynamoNet: Dynamic Action and Motion Network.
Learning Discriminative Model Prediction for Tracking.
FAMNet: Joint Learning of Feature, Affinity and Multi-Dimensional Assignment for Online Multiple Object Tracking.
GradNet: Gradient-Guided Network for Visual Object Tracking.
Learning Spatial Awareness to Improve Crowd Counting.
Bayesian Loss for Crowd Count Estimation With Point Supervision.
A Dual-Path Model With Adaptive Attention for Vehicle Re-Identification.
Deep Reinforcement Active Learning for Human-in-the-Loop Person Re-Identification.
Self-Similarity Grouping: A Simple Unsupervised Cross Domain Adaptation Approach for Person Re-Identification.
Memory-Based Neighbourhood Embedding for Visual Recognition.
Self-Training and Adversarial Background Regularization for Unsupervised Domain Adaptive One-Stage Object Detection.
Transductive Learning for Zero-Shot Object Detection.
Generative Modeling for Small-Data Object Detection.
Object-Aware Instance Labeling for Weakly Supervised Object Detection.
Scale-Aware Trident Networks for Object Detection.
Scaling Object Detection by Transferring Classification Weights.
Towards Interpretable Object Detection by Unfolding Latent Structures.
CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features.
PR Product: A Substitute for Inner Product in Neural Networks.
Local Aggregation for Unsupervised Learning of Visual Embeddings.
Anchor Loss: Modulating Loss Scale Based on Prediction Difficulty.
Confidence Regularized Self-Training.
Variational Adversarial Active Learning.
Progressive Reconstruction of Visual Structure for Image Inpainting.
A Closed-Form Solution to Universal Style Transfer.
Multimodal Style Transfer via Graph Cuts.
Everybody Dance Now.
Attribute-Driven Spontaneous Motion in Unpaired Image Translation.
RelGAN: Multi-Domain Image-to-Image Translation via Relative Attributes.
Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis.
Algebraic Characterization of Essential Matrices and Their Averaging in Multiview Settings.
Efficient and Robust Registration on the 3D Special Euclidean Group.
ClusterSLAM: A SLAM Backend for Simultaneous Rigid Body Clustering and Motion Estimation.
EM-Fusion: Dynamic Object-Level SLAM With Probabilistic Data Association.
Learning Meshes for Dense Visual SLAM.
Learning Two-View Correspondences and Geometry Using Order-Aware Network.
Key.Net: Keypoint Detection by Handcrafted and Learned CNN Filters.
Multi-Modality Latent Interaction Network for Visual Question Answering.
Adversarial Representation Learning for Text-to-Image Matching.
Language-Agnostic Visual-Semantic Embeddings.
Generating Easy-to-Understand Referring Expressions for Target Identifications.
Creativity Inspired Zero-Shot Learning.
ACMM: Aligned Cross-Modal Memory for Few-Shot Image and Sentence Matching.
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval.
Saliency-Guided Attention Network for Image-Sentence Matching.
GANalyze: Toward Visual Definitions of Cognitive Image Properties.
Controllable Attention for Structured Layered Video Decomposition.
Understanding Human Gaze Communication by Spatio-Temporal Graph Reasoning.
Attentional Neural Fields for Crowd Counting.
Learning Compositional Neural Information Fusion for Human Parsing.
Deep Contextual Attention for Human-Object Interaction Detection.
Enforcing Geometric Constraints of Virtual Normal for Depth Prediction.
Floorplan-Jigsaw: Jointly Estimating Scene Layout and Aligning Partial Scans.
3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera.
Perceptual Deep Depth Super-Resolution.
ERL-Net: Entangled Representation Learning for Single Image De-Raining.
End-to-End Learning of Representations for Asynchronous Event-Based Data.
Learning Filter Basis for Convolutional Neural Network Compression.
Scoot: A Perceptual Metric for Facial Sketches.
GAN-Based Projector for Faster Recovery With Convergence Guarantees in Linear Inverse Problems.
Solving Vision Problems via Filtering.
Fast Video Object Segmentation via Dynamic Targeting Network.
Human-Aware Motion Deblurring.
Predicting the Future: A Jointly Learnt Model for Action Anticipation.
Video Classification With Channel-Separated Convolutional Networks.
StartNet: Online Detection of Action Start in Untrimmed Videos.
Temporal Recurrent Networks for Online Action Detection.
Temporal Structure Mining for Weakly Supervised Action Detection.
Grouped Spatial-Temporal Aggregation for Efficient Action Recognition.
Weakly-Supervised Action Localization With Background Modeling.
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition.
Action Recognition With Spatial-Temporal Discriminative Filter Banks.
Attentional Feature-Pair Relation Networks for Accurate Face Recognition.
FAB: A Robust Facial Landmark Detection Framework for Motion-Blurred Videos.
Person-in-WiFi: Fine-Grained Person Perception Using WiFi.
AMASS: Archive of Motion Capture As Surface Shapes.
Skeleton-Aware 3D Human Shape Reconstruction From Point Clouds.
Multi-Garment Net: Learning to Dress 3D People From Images.
PointAE: Point Auto-Encoder for 3D Statistical Shape and Texture Modelling.
Fingerspelling Recognition in the Wild With Iterative Visual Attention.
Joint Monocular 3D Vehicle Detection and Tracking.
Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics.
Object-Driven Multi-Layer Scene Decomposition From a Single Image.
Three-D Safari: Learning to Estimate Zebra Pose, Shape, and Texture From Images "In the Wild".
Human Mesh Recovery From Monocular Images via a Skeleton-Disentangled Representation.
Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild.
FACSIMILE: Fast and Accurate Scans From an Image in Less Than a Second.
Teacher Guided Architecture Search.
Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations.
Probabilistic Deep Ordinal Regression Based on Gaussian Processes.
Block Annotation: Better Image Annotation With Sub-Image Decomposition.
SVD: A Large-Scale Short Video Dataset for Near-Duplicate Video Retrieval.
Accelerate Learning of Deep Hashing With Gradient Attention.
Universal Semi-Supervised Semantic Segmentation.
AMP: Adaptive Masked Proxies for Few-Shot Segmentation.
DensePoint: Learning Densely Contextual Representation for Efficient Point Cloud Processing.
Gated-SCNN: Gated Shape CNNs for Semantic Segmentation.
SPGNet: Semantic Prediction Guidance for Scene Parsing.
Self-Supervised Difference Detection for Weakly-Supervised Semantic Segmentation.
Attention Bridging Network for Knowledge Transfer.
Video Instance Segmentation.
IMP: Instance Mask Projection for High Accuracy Semantic Segmentation of Things.
Explicit Shape Encoding for Real-Time Instance Segmentation.
Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once.
DSConv: Efficient Convolution Operator.
Deep Self-Learning From Noisy Labels.
Weakly Aligned Cross-Modal Learning for Multispectral Pedestrian Detection.
Learning to Find Common Objects Across Few Image Collections.
Learning With Average Precision: Training Image Retrieval With a Listwise Loss.
Minimum Delay Object Detection From Video.
Deep Graphical Feature Learning for the Feature Matching Problem.
A Deep Step Pattern Representation for Multimodal Retinal Image Registration.
SILCO: Show a Few Images, Localize the Common Object.
Semi-Supervised Pedestrian Instance Synthesis and Detection With Mutual Reinforcement.
Fashion++: Minimal Edits for Outfit Improvement.
Targeted Mismatch Adversarial Attack: Query With a Flower to Retrieve the Tower.
Video Face Clustering With Unknown Number of Clusters.
Dynamic Curriculum Learning for Imbalanced Data Classification.
Correlation Congruence for Knowledge Distillation.
Improving Pedestrian Attribute Recognition With Weakly-Supervised Multi-Scale Attribute-Specific Localization.
Permutation-Invariant Feature Restructuring for Correlation-Aware Image Set-Based Recognition.
Spectral Feature Transformation for Person Re-Identification.
Mask-Guided Attention Network for Occluded Pedestrian Detection.
Guessing Smart: Biased Sampling for Efficient Black-Box Adversarial Attacks.
XRAI: Better Attributions Through Regions.
Adaptive Activation Thresholding: Dynamic Routing Type Behavior for Interpretability in Convolutional Neural Networks.
Defending Against Universal Perturbations With Shared Adversarial Training.
Rethinking ImageNet Pre-Training.
Bayesian Optimized 1-Bit CNNs.
Universal Perturbation Attack Against Image Retrieval.
A Geometry-Inspired Decision-Based Attack.
Improving Adversarial Robustness via Guided Complement Entropy.
Proximal Mean-Field for Neural Network Quantization.
The LogBarrier Adversarial Attack: Making Effective Use of Decision Boundary Information.
Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks.
Scalable Verified Training for Provably Robust Image Classification.
Wasserstein GAN With Quadratic Transport Cost.
Physical Adversarial Textures That Fool Visual Object Tracking.
Better and Faster: Exponential Loss for Image Patch Matching.
Sym-Parameterized Dynamic Inference for Mixed-Domain Image Translation.
On the Efficacy of Knowledge Distillation.
Hilbert-Based Generative Defense for Adversarial Examples.
Semantic Adversarial Attacks: Parametric Transformations That Fool Deep Classifiers.
Small Steps and Giant Leaps: Minimal Newton Solvers for Deep Learning.
A Tour of Convolutional Networks Guided by Linear Interpreters.
Implicit Surface Representations As Layers in Neural Networks.
Enhancing Adversarial Example Transferability With an Intermediate Level Attack.
Sparse and Imperceivable Adversarial Attacks.
What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis.
Towards Unconstrained End-to-End Text Spotting.
Zero-Shot Grounding of Objects From Natural Language Queries.
A Fast and Accurate One-Stage Approach to Visual Grounding.
Learning to Assemble Neural Module Tree Networks for Visual Grounding.
Phrase Localization Without Paired Training Examples.
Visual Semantic Reasoning for Image-Text Matching.
Dynamic Graph Attention for Referring Expression Comprehension.
Attention on Attention for Image Captioning.
Robust Change Captioning.
Counterfactual Critic Multi-Agent Training for Scene Graph Generation.
From Strings to Things: Knowledge-Enabled VQA Model That Can Read and Reason.
A Graph-Based Framework to Bridge Movies and Synopses.
VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research.
SinGAN: Learning a Generative Model From a Single Natural Image.
Specifying Object Attributes and Relations in Interactive Scene Generation.
Meta-Sim: Learning to Generate Synthetic Datasets.
PointFlow: 3D Point Cloud Generation With Continuous Normalizing Flows.
Texture Fields: Learning Texture Representations in Function Space.
Neural Turtle Graphics for Modeling City Road Layouts.
COCO-GAN: Generation by Parts via Conditional Coordinating.
Seeing What a GAN Cannot Generate.
InGAN: Capturing and Retargeting the "DNA" of a Natural Image.
FiNet: Compatible and Diverse Fashion Image Inpainting.
Free-Form Image Inpainting With Gated Convolution.
Learning Implicit Generative Models by Matching Perceptual Features.
Understanding Generalized Whitening and Coloring Transform for Universal Style Transfer.
Controllable Artistic Text Style Transfer via Shape-Matching GAN.
Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space?
Content and Style Disentanglement for Artistic Style Transfer.
Copy-and-Paste Networks for Deep Video Inpainting.
Onion-Peel Networks for Deep Video Completion.
Convolutional Sequence Generation for Skeleton-Based Action Synthesis.
DeepPruner: Learning Efficient Stereo Matching via Differentiable PatchMatch.
Is This the Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization.
Monocular Piecewise Depth Estimation in Dynamic Scenes by Exploiting Superpixel Relations.
Shape-Aware Human Pose and Shape Reconstruction Using Multi-View Images.
Cross View Fusion for 3D Human Pose Estimation.
Efficient Learning on Point Clouds With Basis Point Sets.
Neural-Guided RANSAC: Learning Where to Sample Model Hypotheses.
MVSCRF: Learning Multi-View Stereo With Conditional Random Fields.
Unsupervised Collaborative Learning of Keyframe Detection and Visual Odometry Towards Monocular Deep SLAM.
Scene Text Visual Question Answering.
G3raphGround: Graph-Based Language Grounding.
Why Does a Visual Question Have Different Answers?
Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning.
Learning to Collocate Neural Modules for Image Captioning.
Generating Diverse and Descriptive Image Captions Using Visual Paraphrases.
Towards Bridging Semantic Gap to Improve Semantic Segmentation.
Diverse Image Synthesis From Semantic Layouts via Conditional IMLE.
SynDeMo: Synergistic Deep Feature Alignment for Joint Learning of Depth and Ego-Motion.
Counting With Focus for Free.
Fast Image Restoration With Multi-Bin Trainable Linear Units.
Embedded Block Residual Network: A Recursive Restoration Model for Single-Image Super-Resolution.
Coherent Semantic Attention for Image Inpainting.
Fully Convolutional Pixel Adaptive Image Denoiser.
Deep Blind Hyperspectral Image Fusion.
CFSNet: Toward a Controllable Feature Space for Image Restoration.
Context-Aware Image Matting for Simultaneous Foreground and Alpha Estimation.
Deep Restoration of Vintage Photographs From Scanned Halftone Prints.
Enhancing Low Light Videos by Exploring High Sensitivity Camera Noise.
Multi-View Image Fusion.
Monocular Neural Image Based Rendering With Continuous View Control.
A Dataset of Multi-Illumination Images in the Wild.
Deep Depth From Aberration Map.
λ-Net: Reconstruct Hyperspectral Images From a Snapshot Measurement.
Micro-Baseline Structured Light.
Calibration of Axial Fisheye Cameras Through Generic Virtual Central Models.
Program-Guided Image Manipulators.
Fast-deepKCF Without Boundary Effect.
Learning the Model Update for Siamese Trackers.
Bridging the Gap Between Detection and Tracking: A Unified Approach.
Spatial-Temporal Relation Networks for Multi-Object Tracking.
RANet: Ranking Attention Network for Fast Video Object Segmentation.
AdvIT: Adversarial Frames Identifier Based on Temporal Consistency in Videos.
Global-Local Temporal Representations for Video Person Re-Identification.
AGSS-VOS: Attention Guided Single-Shot Video Object Segmentation.
Asymmetric Cross-Guided Attention Network for Actor and Action Video Segmentation From Natural Language Query.
DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation.
Reasoning About Human-Object Interactions Through Dual Attention Networks.
Progressive Sparse Local Attention for Video Object Detection.
Weakly Supervised Temporal Action Localization Through Contrast Based Evaluation Networks.
BMN: Boundary-Matching Network for Temporal Action Proposal Generation.
Co-Separating Sounds of Visual Objects.
Visualization of Convolutional Neural Networks for Monocular Depth Estimation.
3D Point Cloud Generative Adversarial Network Based on Tree Structured Graph Convolutions.
Unsupervised 3D Reconstruction Networks.
Learning Object-Specific Distance From a Monocular Image.
Digging Into Self-Supervised Monocular Depth Estimation.
Few-Shot Generalization for Single-Image 3D Reconstruction via Priors.
Online Unsupervised Learning of the 3D Kinematic Structure of Arbitrary Rigid Bodies.
Selectivity or Invariance: Boundary-Aware Salient Object Detection.
Structured Modeling of Joint Deep Feature and Prediction Refinement for Salient Object Detection.
Progressive-X: Efficient, Anytime, Multi-Model Fitting Algorithm.
Fast Computation of Content-Sensitive Superpixels and Supervoxels Using Q-Distances.
Second-Order Non-Local Attention Networks for Person Re-Identification.
Auto-ReID: Searching for a Part-Aware ConvNet for Person Re-Identification.
Adversarial Fine-Grained Composition Learning for Unseen Attribute-Object Recognition.
Enhancing 2D Representation via Adjacent Views for 3D Shape Retrieval.
Diversity With Cooperation: Ensemble Methods for Few-Shot Classification.
Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation.
Omni-Scale Feature Learning for Person Re-Identification.
Batch DropBlock Network for Person Re-Identification and Beyond.
One-Shot Neural Architecture Search via Self-Evaluated Template Network.
Active Learning for Deep Detection Neural Networks.
Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval.
Person Search by Text Attribute Query As Zero-Shot Learning.
Beyond Human Parts: Dual Part-Aligned Representations for Person Re-Identification.
EvalNorm: Estimating Batch Normalization Statistics for Evaluation.
RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment.
Deep Multiple-Attribute-Perceived Network for Real-World Texture Recognition.
Transductive Episodic-Wise Adaptive Metric for Few-Shot Learning.
Task-Driven Modular Networks for Zero-Shot Compositional Learning.
Rethinking Zero-Shot Learning: A Conditional Visual Classification Perspective.
Online Model Distillation for Efficient Video Inference.
Dynamic Multi-Scale Filters for Semantic Segmentation.
HarDNet: A Low Memory Traffic Network.
Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks.
Orientation-Aware Semantic Segmentation on Icosahedron Spheres.
Deep Closest Point: Learning Representations for Point Cloud Registration.
Data-Free Learning of Student Networks.
Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation.
Approximated Bilinear Modules for Temporal Modeling.
Deep Residual Learning in the JPEG Transform Domain.
DiscoNet: Shapes Learning on Disconnected Manifolds for 3D Editing.
Local Relation Networks for Image Recognition.
Learned Video Compression.
Domain Intersection and Domain Difference.
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks With Octave Convolution.
AttentionRNN: A Structured Spatial Attention Mechanism.
Patchwork: A Patch-Wise Attention Network for Efficient Object Detection and Segmentation in Video Streams.
Information Entropy Based Feature Pooling for Convolutional Neural Networks.
Hyperpixel Flow: Semantic Correspondence With Multi-Layer Neural Features.
Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks.
Conditional Coupled Generative Adversarial Networks for Zero-Shot Domain Adaptation.
Global Feature Guided Local Pooling.
LIP: Local Importance-Based Pooling.
Label-PEnet: Sequential Label Propagation and Enhancement Networks for Weakly Supervised Instance Segmentation.
Continual Learning by Asymmetric Loss Approximation With Single-Side Overestimation.
O2U-Net: A Simple Noisy Label Detection Approach for Deep Neural Networks.
HBONet: Harmonious Bottleneck on Two Orthogonal Dimensions.
Accelerate CNN via Recursive Bayesian Pruning.
MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning.
Attention Augmented Convolutional Networks.
LAP-Net: Level-Aware Progressive Network for Image Dehazing.
Indices Matter: Learning to Index for Deep Image Matting.
Controlling Neural Networks via Energy Dissipation.
Self-Supervised Representation Learning From Multi-Domain Data.
Co-Evolutionary Compression for Unpaired Image Translation.
AutoGAN: Neural Architecture Search for Generative Adversarial Networks.
Dynamic-Net: Tuning the Objective Without Re-Training for Synthesis Tasks.
Adversarial Feedback Loop.
SENSE: A Shared Encoder Network for Scene-Flow Estimation.
Seeing Motion in the Dark.
Bottleneck Potentials in Markov Random Fields.
Noise Flow: Noise Modeling With Conditional Normalizing Flows.
Real Image Denoising With Feature Attention.
Variable Rate Deep Image Compression With a Conditional Autoencoder.
DSIC: Deep Stereo Image Compression.
Dynamic PET Image Reconstruction Using Nonnegative Matrix Factorization Incorporated With Deep Image Prior.
Deep SR-ITM: Joint Learning of Super-Resolution and Inverse Tone-Mapping for 4K UHD HDR Applications.
Progressive Fusion Video Super-Resolution Network via Exploiting Non-Local Spatio-Temporal Correlations.
RankSRGAN: Generative Adversarial Networks With Ranker for Image Super-Resolution.
Toward Real-World Single Image Super-Resolution: A New Benchmark and a New Model.
Wavelet Domain Style Transfer for an Effective Perception-Distortion Tradeoff in Single Image Super-Resolution.
Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid.
Learning Combinatorial Embedding Networks for Deep Graph Matching.
Siamese Networks: The Tale of Two Manifolds.
Unsupervised Neural Quantization for Compressed-Domain Similarity Search.
Deep Joint-Semantics Reconstructing Hashing for Large-Scale Unsupervised Cross-Modal Retrieval.
AFD-Net: Aggregated Feature Difference Learning for Cross-Spectral Image Patch Matching.
CARAFE: Content-Aware ReAssembly of FEatures.
AdaTransform: Adaptive Data Transformation.
Linearized Multi-Sampling for Differentiable Image Transformation.
Bayes-Factor-VAE: Hierarchical Bayesian Deep Auto-Encoder Models for Factor Disentanglement.
Learning Local Descriptors With a CDF-Based Dynamic Soft Margin.
Unsupervised Pre-Training of Image Features on Non-Curated Data.
Understanding Deep Networks via Extremal Perturbations and Smooth Masks.
Universal Adversarial Perturbation via Prior Driven Uncertainty Approximation.
Sampling-Free Epistemic Uncertainty Estimation Using Approximated Variance Propagation.
3D-LaneNet: End-to-End 3D Multiple Lane Detection.
DAGMapper: Learning to Map by Discovering Lane Topology.
6-DOF GraspNet: Variational Grasp Generation for Object Manipulation.
Learning Aberrance Repressed Correlation Filters for Real-Time UAV Tracking.
Situational Fusion of Visual Representation for Visual Navigation.
CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization.
TextPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts.
Sequential Adversarial Learning for Self-Supervised Deep Visual Odometry.
Local Supports Global: Deep Camera Relocalization With Sequence Enhancement.
LPD-Net: 3D Point Cloud Learning for Large-Scale Place Recognition and Environment Analysis.
PRECOG: PREdiction Conditioned on Goals in Visual Multi-Agent Settings.
Depth Completion From Sparse LiDAR Data With Depth-Normal Constraints.
Drive&Act: A Multi-Modal Dataset for Fine-Grained Driver Behavior Recognition in Autonomous Vehicles.
Prior Guided Dropout for Robust Visual Localization in Dynamic Environments.
Mono-SF: Multi-View Geometry Meets Single-View Depth for Monocular Scene Flow Estimation of Dynamic Traffic Scenes.
Bayesian Relational Memory for Semantic Visual Navigation.
Lifelong GAN: Continual Learning for Conditional Image Generation.
Image Generation From Small Datasets via Batch Statistics Adaptation.
Adversarial Defense via Learning to Generate Diverse Attacks.
Deep CG2Real: Synthetic-to-Real Translation via Image Disentanglement.
An Internal Learning Approach to Video Inpainting.
SROBB: Targeted Perceptual Loss for Single Image Super-Resolution.
Unsupervised Robust Disentangling of Latent Characteristics for Image Synthesis.
Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images.
Closed-Form Optimal Two-View Triangulation Based on Angular Errors.
Polarimetric Relative Pose Estimation.
Floor-SP: Inverse CAD for Floorplans by Sequential Room-Wise Shortest Path.
Multi-View Stereo by Temporal Nonparametric Fusion.
Controllable Video Captioning With POS Sequence Guidance Based on Gated Fusion Network.
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips.
Hierarchy Parsing for Image Captioning.
Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding.
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment.
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded.
Scene Graph Prediction With Limited Labels.
Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization.
Making History Matter: History-Advantage Sequence Training for Visual Dialog.
End-to-End CAD Model Retrieval and 9DoF Alignment in 3D Scans.
Rescan: Inductive Instance Segmentation for Indoor RGBD Scans.
VideoMem: Constructing, Analyzing, Predicting Short-Term and Long-Term Video Memorability.
Non-Local Intrinsic Decomposition With Near-Infrared Priors.
Self-Guided Network for Fast Image Denoising.
JPEG Artifacts Reduction via Deep Convolutional Sparse Coding.
Learning Deep Priors for Image Dehazing.
Spatio-Temporal Filter Adaptive Network for Video Deblurring.
Mask-ShadowGAN: Learning to Remove Shadows From Unpaired Data.
Deep Learning for Seeing Through Window With Raindrops.
Deep Multi-Model Fusion for Single-Image Dehazing.
Learning to Jointly Generate and Separate Reflections.
Kernel Modeling Super-Resolution on Real Low-Resolution Images.
Mop Moiré Patterns Using MopNet.
Pro-Cam SSfM: Projector-Camera System for Structure and Spectral Reflectance From Motion.
Attacking Optical Flow.
TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection.
'Skimming-Perusal' Tracking: A Framework for Real-Time and Robust Long-Term Tracking.
The Trajectron: Probabilistic Multi-Agent Trajectory Modeling With Dynamic Spatiotemporal Graphs.
Robust Multi-Modality Multi-Object Tracking.
End-to-End Hand Mesh Recovery From a Monocular RGB Image.
HEMlets Pose: Learning Part-Centric Heatmap Triplets for Accurate 3D Human Pose Estimation.
Aligning Latent Spaces for 3D Hand Pose Estimation.
Monocular 3D Human Pose Estimation by Generation and Ordinal Ranking.
DF2Net: A Dense-Fine-Finer Network for Detailed 3D Face Reconstruction.
PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization.
Tex2Shape: Detailed Full Human Body Geometry From a Single Image.
Resolving 3D Human Pose Ambiguities With 3D Scene Constraints.
Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks.
Optimizing Network Structure for 3D Human Pose Estimation.
Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop.
3DPeople: Modeling the Geometry of Dressed Humans.
Moulding Humans: Non-Parametric 3D Human Shape Estimation From Single Images.
GP2C: Geometric Projection Parameter Consensus for Joint 3D Pose and Focal Length Estimation in the Wild.
3D-RelNet: Joint Object and Relational Network for 3D Prediction.
Canonical Surface Mapping via Geometric Cycle Consistency.
On Boosting Single-Frame 3D Human Pose Estimation via Monocular Videos.
How Do Neural Networks See Depth in Single Images?
3D Scene Reconstruction With Multi-Layer Depth and Epipolar Transformers.
Self-Supervised Monocular Depth Hints.
Detecting the Unexpected via Image Resynthesis.
Recurrent U-Net for Resource-Constrained Segmentation.
Efficient Segmentation: Learning Downsampling Near Semantic Boundaries.
ACE: Adapting to Changing Environments for Semantic Segmentation.
Semi-Supervised Skin Detection by Network With Mutual Guidance.
Domain Randomization and Pyramid Consistency: Simulation-to-Real Generalization Without Accessing Target Domain Data.
Domain Adaptation for Semantic Segmentation With Maximum Squares Loss.
Accelerated Gravitational Point Set Alignment With Altered Physical Laws.
Integral Object Mining via Online Attention Accumulation.
TensorMask: A Foundation for Dense Object Segmentation.
SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition.
Embodied Amodal Recognition: Learning to Move to Perceive Objects.
Unconstrained Foreground Object Search.
Fooling Network Interpretation in Image Classification.
Dynamic Context Correspondence Network for Semantic Alignment.
STM: SpatioTemporal and Motion Encoding for Action Recognition.
Disentangling Monocular 3D Object Detection.
Detecting Unseen Visual Relations Using Analogies.
Learning Rich Features at High-Speed for Single-Shot Object Detection.
DUP-Net: Denoiser and Upsampler Network for 3D Adversarial Point Clouds Defense.
STD: Sparse-to-Dense 3D Object Detector for Point Cloud.
DPOD: 6D Pose Object Detector and Refiner.
Transferable Semi-Supervised 3D Object Detection From RGB-D Data.
A Comprehensive Overhaul of Feature Distillation.
ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks.
Resource Constrained Neural Network Architecture Search: Will a Submodularity Assumption Help?
Improved Techniques for Training Adaptive Deep Networks.
On Network Design Spaces for Visual Recognition.
Adaptative Inference Cost With Convolutional Neural Mixture Models.
Switchable Whitening for Deep Representation Learning.
SRM: A Style-Based Recalibration Module for Convolutional Neural Networks.
Batch Weight for Domain Adaptation With Mass Shift.
Differentiable Kernel Evolution.
Deep Meta Functionals for Shape Representation.
AutoDispNet: Improving Disparity Estimation With AutoML.
Universally Slimmable Networks and Improved Training Techniques.
Evolving Space-Time Neural Architectures for Videos.
Bidirectional One-Shot Unsupervised Domain Mapping.
Crowd Counting With Deep Structured Scale Integration Network.
Order-Aware Generative Modeling Using the 3D-Craft Dataset.
Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style.
SC-FEGAN: Face Editing Generative Adversarial Network With User's Sketch and Color.
The Sound of Motions.
Exploiting Temporal Consistency for Real-Time Video Depth Estimation.
Topological Map Extraction From Overhead Images.
Memorizing Normality to Detect Anomaly: Memory-Augmented Deep Autoencoder for Unsupervised Anomaly Detection.
Generative Adversarial Minority Oversampling.
Variational Few-Shot Learning.
PLMP - Point-Line Minimal Problems in Complete Multi-View Visibility.
A Quaternion-Based Certifiably Optimal Solution to the Wahba Problem With Outliers.
An Efficient Solution to the Homography-Based Relative Pose Problem With a Common Reference Direction.
Quasi-Globally Optimal and Efficient Vanishing Point Estimation in Manhattan World.
Consensus Maximization Tree Search Revisited.
Linearly Converging Quasi Branch and Bound Algorithms for Global Rigid Registration.
Unsupervised Deep Learning for Structured Shape Matching.
ShellNet: Efficient Point Cloud Convolutional Neural Networks Using Concentric Shells Statistics.
PointCloud Saliency Maps.
Revisiting Point Cloud Classification: A New Benchmark Dataset and Classification Model on Real-World Data.
Interpolated Convolutional Networks for 3D Point Cloud Understanding.
Equivariant Multi-View Networks.
Deep Non-Rigid Structure From Motion.
Discrete Laplace Operator Estimation for Dynamic 3D Reconstruction.
Point-Based Multi-View Stereo Network.
Learning an Event Sequence Embedding for Dense Event-Based Deep Stereo.
X-Section: Cross-Section Prediction for Enhanced RGB-D Fusion.
Gated2Depth: Real-Time Dense Lidar From Gated Images.
Calibration Wizard: A Guidance System for Camera Calibration Based on Modelling Geometric and Corner Uncertainty.
Privacy Preserving Image Queries for Camera Localization.
S4L: Self-Supervised Semi-Supervised Learning.
Semi-Supervised Learning by Augmented Distribution Alignment.
Domain Adaptation for Structured Output via Discriminative Patch Representations.
Episodic Training for Domain Generalization.
UM-Adapt: Unsupervised Multi-Task Adaptation Using Adversarial Cross-Task Distillation.
Larger Norm More Transferable: An Adaptive Feature Norm Approach for Unsupervised Domain Adaptation.
Unsupervised Domain Adaptation via Regularized Conditional Alignment.
Moment Matching for Multi-Source Domain Adaptation.
Transferability and Hardness of Supervised Classification Tasks.
Stochastic Filter Groups for Multi-Task CNNs: Learning Specialist and Generalist Convolution Kernels.
Many Task Learning With Task Routing.
Similarity-Preserving Knowledge Distillation.
Distillation-Based Training for Multi-Exit Architectures.
Knowledge Distillation via Route Constrained Optimization.
A Camera That CNNs: Towards Embedded Neural Networks on Pixel Processor Arrays.
Data-Free Quantization Through Weight Equalization and Bias Correction.
Searching for MobileNetV3.
Multinomial Distribution Learning for Effective Neural Architecture Search.
Progressive Differentiable Architecture Search: Bridging the Depth Gap Between Search and Evaluation.
Exploring Randomly Wired Neural Networks for Image Recognition.
Anomaly Detection in Video Sequence With Appearance-Motion Correspondence.
Layout-Induced Video Representation for Recognizing Agent-in-Place Actions.
Cost-Aware Fine-Grained Recognition for IoTs Based on Sequential Fixations.
Self-Supervised Deep Depth Denoising.
Employing Deep Part-Object Relationships for Salient Object Detection.
Pushing the Frontiers of Unconstrained Crowd Counting: New Dataset and Benchmark Method.
Bit-Flip Attack: Crushing Neural Network With Progressive Bit Search.
Delving Into Robust Object Detection From Unmanned Aerial Vehicles: A Deep Nuisance Disentanglement Approach.
Image Aesthetic Assessment Based on Pairwise Comparison: A Unified Approach to Score Regression, Binary Classification, and Personalization.
Attention-Based Autism Spectrum Disorder Screening With Privileged Modality.
Interactive Sketch & Fill: Multiclass Sketch-to-Image Translation.
FW-GAN: Flow-Navigated Warping GAN for Video Virtual Try-On.
Zero-Shot Emotion Recognition via Affective Structural Embedding.
Attention-Aware Polarity Sensitive Embedding for Affective Image Retrieval.
Adaptive Density Map Generation for Crowd Counting.
Elaborate Monocular Point and Line SLAM With Robust Initialization.
GSLAM: A General SLAM Framework and Benchmark.
Hiding Video in Audio via Reversible Generative Models.
Homography From Two Orientation- and Scale-Covariant Features.
QUARCH: A New Quasi-Affine Reconstruction Stratum From Vague Relative Camera Orientation Knowledge.
Estimating the Fundamental Matrix Without Point Correspondences With Application to Transmission Imaging.
Revisiting Radial Distortion Absolute Pose.
A Differential Volumetric Approach to Multi-View Photometric Stereo.
Pixel2Mesh++: Multi-View 3D Mesh Generation via Deformation.
Cascaded Parallel Filtering for Memory-Efficient Image-Based Localization.
SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation.
Learning Lightweight Lane Detection CNNs by Self Attention Distillation.
Multi-Level Bottom-Top and Top-Bottom Feature Fusion for Crowd Counting.
SpaceNet MVOI: A Multi-View Overhead Imagery Dataset.
SSF-DAN: Separated Semantic Feature Based Domain Adaptation Network for Semantic Segmentation.
Incremental Class Discovery for Semantic Segmentation With RGBD Sensing.
End-to-End Wireframe Parsing.
Perspective-Guided Convolution Networks for Crowd Counting.
Tracking Without Bells and Whistles.
Anchor Diffusion for Unsupervised Video Object Segmentation.
Looking to Relations for Future Trajectory Forecast.
Deep Meta Learning for Real-Time Target-Aware Visual Tracking.
Deformable Surface Tracking by Graph Matching.
Unsupervised Video Interpolation Using Cycle Consistency.
Recursive Visual Sound Separation Using Minus-Plus Net.
Making the Invisible Visible: Action Recognition Through Walls and Occlusions.
Zero-Shot Anticipation for Instructional Activities.
DistInit: Learning Video Representations Without a Single Labeled Video.
Relation Parsing Neural Network for Human-Object Interaction Detection.
Toyota Smarthome: Real-World Activities of Daily Living.
Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles.
FreiHAND: A Dataset for Markerless Capture of Hand Pose and Shape From Single RGB Images.
TexturePose: Supervising Human Mesh Estimation With Texture Consistency.
A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation From a Single Depth Image.
Teacher Supervises Students How to Learn From Partially Labeled Images for Facial Landmark Detection.
Occlusion Robust Face Recognition Based on Mask Learning With Pairwise Differential Siamese Network.
Talking With Hands 16.2M: A Large-Scale Dataset of Synchronized Body-Finger Motion and Audio for Conversational Motion Analysis and Synthesis.
MONET: Multiview Semi-Supervised Keypoint Detection via Epipolar Divergence.
Distill Knowledge From NRSfM for Weakly Supervised 3D Pose Learning.
Context-Aware Feature and Label Fusion for Facial Action Unit Intensity Estimation With Partially Labeled Data.
Occlusion-Aware Networks for 3D Human Pose Estimation in Video.
Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading.
Uncertainty Modeling of Contextual-Connections Between Tracklets for Unconstrained Video-Based Face Recognition.
Racial Faces in the Wild: Reducing Racial Bias by Information Maximization Adaptation Network.
InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting.
Robust Motion Segmentation From Pairwise Matches.
MultiSeg: Semantically Meaningful, Scale-Diverse Segmentations From Minimal User Input.
Learning Propagation for Arbitrarily-Structured Data.
SSAP: Single-Shot Instance Segmentation With Affinity Pyramid.
Surface Networks via General Covers.
Feature Weighting and Boosting for Few-Shot Segmentation.
Convex Shape Prior for Multi-Object Segmentation Using a Single Level Set Function.
CCNet: Criss-Cross Attention for Semantic Segmentation.
Asymmetric Non-Local Neural Networks for Semantic Segmentation.
IL2M: Class Incremental Learning With Dual Memory.
A Delay Metric for Video Object Detection: What Average Precision Fails to Tell.
Co-Segmentation Inspired Attention Networks for Video-Based Person Re-Identification.
Robust Person Re-Identification by Modelling Feature Uncertainty.
Pose-Guided Feature Alignment for Occluded Person Re-Identification.
DeceptionNet: Network-Driven Domain Randomization.
Learning Semantic-Specific Graph Representation for Multi-Label Image Recognition.
Sharpen Focus: Learning With Attention Separability and Consistency.
Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving.
Graph-Based Object Classification for Neuromorphic Vision Sensing.
A Robust Learning Approach to Domain Adaptive Object Detection.
Bridging the Domain Gap for Ground-to-Aerial Image Matching.
Vehicle Re-Identification in Aerial Imagery: Dataset and Approach.
Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings.
Few-Shot Image Recognition With Knowledge Transfer.
Automatic and Robust Skull Registration Based on Discrete Uniformization.
Towards Adversarially Robust Object Detection.
GeoStyle: Discovering Fashion Trends and Events.
Towards Latent Attribute Discovery From Triplet Similarities.
Compact Trilinear Interaction for Visual Question Answering.
Budget-Aware Adapters for Multi-Domain Learning.
Mixed High-Order Attention Network for Person Re-Identification.
USIP: Unsupervised Stable Interest Point Detection From 3D Point Clouds.
Recognizing Part Attributes With Insufficient Data.
Dual Directed Capsule Network for Very Low Resolution Image Recognition.
Few-Shot Learning With Embedded Class Models and Shot-Free Meta Training.
Symmetric Cross Entropy for Robust Learning With Noisy Labels.
Overcoming Catastrophic Forgetting With Unlabeled Data in the Wild.
Evaluating Robustness of Deep Image Super-Resolution Against Adversarial Attacks.
HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision.
Vision-Infused Deep Audio Inpainting.
Instance-Level Future Motion Estimation in a Single Image Based on Ordinal Regression.
Distilling Knowledge From a Deep Pose Regressor Network.
Beyond Cartesian Representations for Local Descriptors.
What Else Can Fool Deep Learning? Addressing Color Constancy Errors on Deep Neural Network Performance.
Instance-Guided Context Rendering for Cross-Domain Person Re-Identification.
Generative Adversarial Networks for Extreme Learned Image Compression.
PAMTRI: Pose-Aware Multi-Task Learning for Vehicle Re-Identification Using Highly Randomized Synthetic Data.
Generative Adversarial Training for Weakly Supervised Cloud Matting.
Learning Fixed Points in Generative Adversarial Networks: From Image-to-Image Translation to Disease Detection and Localization.
StructureFlow: Image Inpainting via Structure-Aware Appearance Flow.
Visual Deprojection: Probabilistic Recovery of Collapsed Dimensions.
Face-to-Parameter Translation for Game Character Auto-Creation.
Remote Heart Rate Measurement From Highly Compressed Facial Videos: An End-to-End Deep Learning Solution With Video Enhancement.
Learning Robust Facial Landmark Detection via Hierarchical Structured Ensemble.
DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks.
On the Design of Black-Box Adversarial Examples by Leveraging Gradient-Free Optimization and Operator Splitting Method.
Adversarial Robustness vs. Model Compression, or Both?
NLNL: Negative Learning for Noisy Labels.
Drop to Adapt: Learning Discriminative Features for Unsupervised Domain Adaptation.
Jointly Aligning Millions of Images With Deep Penalised Reconstruction Congealing.
Goal-Driven Sequential Data Abstraction.
Hierarchical Self-Attention Network for Action Localization in Videos.
Total Denoising: Unsupervised Learning of 3D Point Cloud Cleaning.
SANet: Scene Agnostic Network for Camera Localization.
Fine-Grained Segmentation Networks: Self-Supervised Segmentation for Improved Long-Term Visual Localization.
Shape Reconstruction Using Differentiable Projections and Deep Priors.
DeepVCP: An End-to-End Deep Neural Network for Point Cloud Registration.
FaceForensics++: Learning to Detect Manipulated Facial Images.