ICCV 2017 Paper List
IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017.
Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos.
StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks.
Representation Learning by Learning to Count.
One Network to Solve Them All - Solving Linear Inverse Problems Using Deep Projection Models.
Deep Adaptive Image Clustering.
Scale Recovery for Monocular Visual Odometry Using Depth Estimated with Deep Convolutional Neural Fields.
Semi-Global Weighted Least Squares in Image Filtering.
GPLAC: Generalizing Vision-Based Robotic Skills Using Weakly Labeled Images.
The "Something Something" Video Database for Learning and Evaluating Visual Common Sense.
Learning Action Recognition Model from Depth and Skeleton Videos.
Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos.
TORNADO: A Spatio-Temporal Convolutional Regression Network for Video Action Proposal.
Localizing Moments in Video with Natural Language.
R-C3D: Region Convolutional 3D Network for Temporal Activity Detection.
Joint Prediction of Activity Labels and Starting Times in Untrimmed Videos.
Adversarial Examples Detection in Deep Networks with Convolutional Filter Statistics.
Learning Bag-of-Features Pooling for Deep Convolutional Neural Networks.
Deep Scene Image Classification with the MFAFVNet.
Deep Clustering via Joint Convolutional Autoencoder Embedding and Relative Entropy Minimization.
Interpretable Transformations with Encoder-Decoder Networks.
Temporal Context Network for Activity Localization in Videos.
Unified Deep Supervised Domain Adaptation and Generalization.
Semantic Image Synthesis via Adversarial Learning.
Efficient Low Rank Tensor Ring Completion.
Semi Supervised Semantic Segmentation Using Generative Adversarial Network.
3DCNN-DQN-RNN: A Deep Reinforcement Learning Framework for Semantic Parsing of Large-Scale 3D Point Clouds.
Training Deep Networks to be Spatially Sensitive.
Deep Functional Maps: Structured Prediction for Dense Shape Correspondence.
Image2song: Song Retrieval via Bridging Image Content and Lyric Words.
Scene Categorization with Spectral Features.
Flip-Invariant Motion Representation.
Scaling the Scattering Transform: Deep Hybrid Networks.
HashNet: Deep Learning to Hash by Continuation.
Human Pose Estimation Using Global and Local Normalization.
Understanding and Mapping Natural Beauty.
Video Scene Parsing with Predictive Feature Learning.
Semantic Jitter: Dense Supervision for Visual Comparisons via Synthetic Images.
Soft-NMS - Improving Object Detection with One Line of Code.
Deep Spatial-Semantic Attention for Fine-Grained Sketch-Based Image Retrieval.
Deeper, Broader and Artier Domain Generalization.
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks.
Long Short-Term Memory Kalman Filters: Recurrent Neural Estimators for Pose Regularization.
Offline Handwritten Signature Modeling and Verification Based on Archetypal Analysis.
A Discriminative View of MRF Pre-processing Algorithms.
Non-rigid Object Tracking via Deformable Patches Using Shape-Preserved KCF and Level Sets.
Parallel Tracking and Verifying: A Framework for Real-Time and High Accuracy Visual Tracking.
Monocular Video-Based Trailer Coupler Detection Using Multiplexer Convolutional Neural Network.
Saliency Pattern Detection by Ranking Structured Trees.
Recurrent Color Constancy.
Pixel Recursive Super Resolution.
Realistic Dynamic Facial Textures from a Single Image Using GANs.
Face Sketch Matching via Coupled Deep Transform Learning.
Range Loss for Deep Face Recognition with Long-Tailed Training Data.
Multi-scale Deep Learning Architectures for Person Re-identification.
Intrinsic 3D Dynamic Surface Tracking based on Dynamic Ricci Flow and Teichmüller Map.
RGB-Infrared Cross-Modality Person Re-identification.
Estimating Defocus Blur via Rank of Local Patches.
Reflectance Capture Using Univariate Sampling of BRDFs.
A Lightweight Single-Camera Polarization Compass with Covariance Estimation.
Deltille Grids for Geometric Camera Calibration.
Camera Calibration by Global Constraints on the Motion of Silhouettes.
Submodular Trajectory Optimization for Aerial 3D Scanning.
Refractive Structure-from-Motion Through a Flat Refractive Interface.
Editable Parametric Dense Foliage from 3D Capture.
Convolutional Dictionary Learning via Local Processing.
Active Decision Boundary Annotation with Deep Generative Models.
End-to-End Face Detection and Cast Grouping in Movies Using Erdös-Rényi Clustering.
TALL: Temporal Activity Localization via Language Query.
Learning from Video and Text via Large-Scale Discriminative Clustering.
DeepSetNet: Predicting Sets with Deep Neural Networks.
Towards End-to-End Text Spotting with Convolutional Recurrent Neural Networks.
Quantitative Evaluation of Confidence Measures in a Machine Learning World.
Learning 3D Object Categories by Looking Around Them.
Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition.
3D Graph Neural Networks for RGBD Semantic Segmentation.
BIER - Boosting Independent Embeddings Robustly.
Weakly-Supervised Learning of Visual Relations.
What is Around the Camera?
Personalized Cinemagraphs Using Semantic Understanding and Collaborative Learning.
Spatiotemporal Modeling for Crowd Counting in Videos.
Dynamic Label Graph Matching for Unsupervised Video Re-identification.
A Multilayer-Based Framework for Online Background Subtraction with Freely Moving Cameras.
Moving Object Detection in Time-Lapse or Motion Trigger Image Sequences Using Low-Rank and Invariant Sparse Decomposition.
A Multimodal Deep Regression Bayesian Network for Affective Video Content Analyses.
Dense and Low-Rank Gaussian CRFs Using Deep Embeddings.
Nonparametric Variational Auto-Encoders for Hierarchical Representation Learning.
Unsupervised Object Segmentation in Video by Efficient Selection of Highly Probable Positive Features.
Focusing Attention: Towards Accurate Text Recognition in Natural Images.
AutoDIAL: Automatic Domain Alignment Layers.
ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression.
Rotation Equivariant Vector Field Networks.
Segmentation-Aware Convolutional Networks Using Local Attention Masks.
Multimodal Gaussian Process Latent Variable Models with Harmonization.
Tensor RPCA by Bayesian CP Factorization with Complex Noise.
Sparse Exact PGA on Riemannian Manifolds.
Self-Organized Text Detection with Minimal Post-processing via Border Learning.
The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes.
RDFNet: RGB-D Multi-level Residual Feature Fusion for Indoor Semantic Segmentation.
Exploiting Spatial Structure for Localizing Manipulated Image Regions.
Generalized Orderless Pooling Performs Implicit Salient Matching.
Illuminating Pedestrians via Simultaneous Detection and Segmentation.
WordSup: Exploiting Word Annotations for Character Based Text Detection.
Extreme Clicking for Efficient Object Annotation.
Object-Level Proposals.
Locally-Transferred Fisher Vectors for Texture Classification.
Learning to Estimate 3D Hand Pose from Single RGB Images.
Boosting Image Captioning with Attributes.
AnnArbor: Approximate Nearest Neighbors Using Arborescence Coding.
SSH: Single Stage Headless Face Detector.
RoomNet: End-to-End Room Layout Estimation.
Referring Expression Generation and Comprehension via Attributes.
Mutual Enhancement for Detection of Multiple Logos in Sports Videos.
Online Multi-object Tracking Using CNN-Based Single Object Tracker with Spatial-Temporal Attention Mechanism.
Deep Generative Adversarial Compression Artifact Removal.
Blob Reconstruction Using Unilateral Second Order Gaussian Kernels with Application to High-ISO Long-Exposure Image Denoising.
Convergence Analysis of MAP Based Blur Kernel Estimation.
Image Super-Resolution Using Dense Skip Connections.
Understanding Low- and High-Level Contributions to Fixation Prediction.
Simultaneous Detection and Removal of High Altitude Clouds from an Image.
AOD-Net: All-in-One Dehazing Network.
Non-linear Convolution Filters for CNN-Based Learning.
Blur-Invariant Deep Learning for Blind-Deblurring.
Automatic Content-Aware Projection for 360° Videos.
Jointly Attentive Spatial-Temporal Pooling Networks for Video-Based Person Re-identification.
Learning Dense Facial Correspondences in Unconstrained Images.
DeepFuse: A Deep Unsupervised Approach for Exposure Fusion with Extreme Exposure Image Pairs.
From RGB to Spectrum for Natural Scenes via Manifold-Based Mapping.
Efficient Algorithms for Moral Lineage Tracing.
FLaME: Fast Lightweight Mesh Estimation Using Variational Smoothing on Delaunay Graphs.
Taking the Scenic Route to 3D: Optimising Reconstruction from Moving Cameras.
Dynamics Enhanced Multi-camera Motion Segmentation from Unsynchronized Videos.
Optimal Transformation Estimation with Semantic Cues.
Monocular Dense 3D Reconstruction of a Complex Dynamic Scene from Two Perspective Frames.
Depth Estimation Using Structured Light Flow - Analysis of Projected Pattern Flow on an Object's Surface.
Ray Space Features for Plenoptic Structure-from-Motion.
2D-Driven 3D Object Detection in RGB-D Images.
Joint Estimation of Camera Pose, Depth, Deblurring, and Super-Resolution from a Blurred Image Sequence.
Visual Odometry for Pixel Processor Arrays.
Learning Spread-Out Local Feature Descriptors.
Learning to Push the Limits of Efficient FFT-Based Image Deconvolution.
Unrolled Memory Inner-Products: An Abstract GPU Operator for Efficient Vision-Related Computations.
Practical and Efficient Multi-view Matching.
Weakly- and Self-Supervised Learning for Content-Aware Deep Image Retargeting.
Structure-Measure: A New Way to Evaluate Foreground Maps.
MemNet: A Persistent Memory Network for Image Restoration.
DCTM: Discrete-Continuous Transformation Matching for Semantic Flow.
Learning High Dynamic Range from Outdoor Panoramas.
Shadow Detection with Conditional Generative Adversarial Networks.
Makeup-Go: Blind Reversion of Portrait Edit.
EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis.
Learning Video Object Segmentation with Visual Memory.
Detail-Revealing Deep Video Super-Resolution.
Video Frame Synthesis Using Deep Voxel Flow.
Semantic Video CNNs Through Representation Warping.
Spatial-Aware Object Embeddings for Zero-Shot Localization and Classification of Actions.
Neural Ctrl-F: Segmentation-Free Query-by-String Word Spotting in Handwritten Manuscript Collections.
Constrained Convolutional Sparse Coding for Parametric Based Reconstruction of Line Drawings.
AMTnet: Action-Micro-Tube Regression by End-to-end Trainable Deep Architecture.
Action Tubelet Detector for Spatio-Temporal Action Localization.
Unsupervised Video Understanding by Reconciliation of Posture Similarities.
Learning-Based Cloth Material Recovery from Video.
Interleaved Group Convolutions.
Active Learning for Human Pose Estimation.
Adversarial Inverse Graphics Networks: Learning 2D-to-3D Lifting and Image-to-Image Translation from Unpaired Supervision.
Supplementary Meta-Learning: Towards a Dynamic Model for Deep Neural Networks.
Unsupervised Learning from Video to Detect Foreground Objects in Single Images.
Summarization and Classification of Wearable Camera Streams by Learning the Distributions over Deep Features of Out-of-Sample Image Sequences.
Side Information in Robust Principal Component Analysis: Algorithms and Applications.
Approximate Grassmannian Intersections: Subspace-Valued Subspace Learning.
Self-Supervised Learning of Pose Embeddings from Spatiotemporal Relations in Videos.
Domain-Adaptive Deep Network Compression.
Consensus Convolutional Sparse Coding.
Learning Discriminative αβ-Divergences for Positive Definite Matrices.
Region-Based Correspondence Between 3D Shapes via Spatially Smooth Biclustering.
Deep Free-Form Deformation Network for Object-Mask Registration.
Higher-Order Minimum Cost Lifted Multicuts for Motion Segmentation.
PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN.
Learning Discriminative Latent Attributes for Zero-Shot Classification.
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks.
Learning the Latent "Look": Unsupervised Discovery of a Style-Coherent Embedding from Fashion Images.
Attention-Based Multimodal Fusion for Video Description.
Learning Visual N-Grams from Web Data.
Situation Recognition with Graph Neural Networks.
BlitzNet: A Real-Time Deep Network for Scene Understanding.
Drone-Based Object Counting by Spatially Regularized Regional Proposal Network.
Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training.
CoupleNet: Coupling Global Structure with Local Parts for Object Detection.
Rotational Subgroup Voting and Pose Clustering for Robust 3D Object Recognition.
Learning a Recurrent Residual Fusion Network for Multimodal Matching.
Deep Binaries: Encoding Semantic-Rich Cues for Efficient Textual-Visual Cross Retrieval.
Spatial Memory for Context Reasoning in Object Detection.
Cross-Modal Deep Variational Hashing.
Characterizing and Improving Stability in Neural Style Transfer.
Fast Multi-image Matching via Density-Based Clustering.
Supervision by Fusion: Towards Unsupervised Learning of Deep Salient Object Detector.
Online Video Deblurring via Dynamic Temporal Blending Network.
From Square Pieces to Brick Walls: The Next Challenge in Solving Jigsaw Puzzles.
A Stagewise Refinement Model for Detecting Salient Objects in Images.
Going Unconstrained with Rolling Shutter Deblurring.
A Joint Intrinsic-Extrinsic Prior Model for Retinex.
Towards Large-Pose Face Frontalization in the Wild.
Faster than Real-Time Facial Alignment: A 3D Spatial Transformer Network Approach in Unconstrained Poses.
Recognition of Action Units in the Wild with Deep Nets and a New Global-Local Loss.
Pose-Driven Deep Convolutional Model for Person Re-identification.
Deep Facial Action Unit Recognition from Partially Labeled Data.
Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation.
Attention-Aware Deep Reinforcement Learning for Video Face Recognition.
Benchmarking Single-Image Reflection Removal Algorithms.
Space-Time Localization and Mapping.
Stereo DSO: Large-Scale Direct Sparse Visual Odometry with Stereo Cameras.
From Point Clouds to Mesh Using Regression.
Dense Non-rigid Structure-from-Motion and Shading with Unknown Albedos.
Low Compute and Fully Parallel Computer Vision with HashMatch.
Efficient Global Illumination for Morphable Models.
Pose Guided RGBD Feature Learning for 3D Object Pose Estimation.
Parameter-Free Lens Distortion Calibration of Central Cameras.
Modeling Urban Scenes from Pointclouds.
BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth.
Semantically Informed Multiview Surface Refinement.
Towards More Accurate Iris Recognition Using Deeply Learned Spatially Corresponding Features.
SVDNet for Pedestrian Retrieval.
Synergy between Face Alignment and Tracking via Discriminative Global Consensus Optimization.
Learning Discriminative Aggregation Network for Video-Based Face Recognition.
Recursive Spatial Transformer (ReST) for Alignment-Free Face Recognition.
Egocentric Gesture Recognition Using Recurrent 3D Convolutional Neural Networks with Spatiotemporal Transformer Modules.
Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in Vitro.
Attribute-Enhanced Face Recognition with Neural Tensor Fusion Networks.
Temporal Non-volume Preserving Approach to Facial Age-Progression and Age-Invariant Face Recognition.
RPAN: An End-to-End Recurrent Pose-Attention Network for Action Recognition in Videos.
MoFA: Model-Based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction.
Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources.
First-Person Activity Forecasting with Online Inverse Reinforcement Learning.
Towards a Visual Privacy Advisor: Understanding and Predicting Privacy Risks in Images.
Fast Face-Swap Using Convolutional Neural Networks.
FCN-rLSTM: Deep Spatio-Temporal Neural Networks for Vehicle Counting in City Cameras.
Weakly Supervised Summarization of Web Videos.
Leveraging Weak Semantic Relevance for Complex Video Event Classification.
Online Real-Time Multiple Spatiotemporal Action Localisation and Prediction.
TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals.
Joint Detection and Recounting of Abnormal Events by Learning Deep Generic Knowledge.
Temporal Superpixels Based on Proximity-Weighted Patch Matching.
CDTS: Collaborative Detection, Tracking, and Segmentation for Online Multiple Object Segmentation in Videos.
Learning in an Uncertain World: Representing Ambiguity Through Multiple Hypotheses.
PUnDA: Probabilistic Unsupervised Domain Adaptation for Knowledge Transfer Across Visual Categories.
Learning Robust Visual-Semantic Embeddings.
Guided Perturbations: Self-Corrective Behavior in Convolutional Neural Networks.
Predictor Combination at Test Time.
Curriculum Dropout.
Two-Phase Learning for Weakly Supervised Object Localization.
Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-Supervised Object and Action Localization.
Aesthetic Critiques Generation for Photos.
Adaptive Feeding: Achieving Fast and Accurate Detections by Adaptively Combining Object Detectors.
SGN: Sequential Grouping Networks for Instance Segmentation.
Multi-label Learning of Part Detectors for Heavily Occluded Pedestrian Detection.
Predicting Visual Exemplars of Unseen Classes for Zero-Shot Learning.
Deep Globally Constrained MRFs for Human Pose Estimation.
Large-Scale Image Retrieval with Attentive Deep Local Features.
Monocular 3D Human Pose Estimation by Predicting Depth on Joints.
DeepRoadMapper: Extracting Road Topology from Aerial Images.
Interpretable Explanations of Black Boxes by Meaningful Perturbation.
Learning to Disambiguate by Asking Discriminative Questions.
Generative Adversarial Networks Conditioned by Brain Signals.
Incremental Learning of Object Detectors without Catastrophic Forgetting.
Single Image Action Recognition Using Semantic Body Part Actions.
Weakly Supervised Object Localization Using Things and Stuff Transfer.
A Two-Streamed Network for Estimating Fine-Scaled Depth Maps from Single RGB Images.
Recurrent Topic-Transition GAN for Visual Paragraph Generation.
Robust Kronecker-Decomposable Component Analysis for Low-Rank Modeling.
What will Happen Next? Forecasting Player Moves in Sports Videos.
The Pose Knows: Video Forecasting by Generating Pose Futures.
Beyond Standard Benchmarks: Parameterizing Performance Evaluation in Visual Object Tracking.
DeepCD: Learning Deep Complementary Descriptors for Patch Representations.
Low-Rank Tensor Completion: A Pseudo-Bayesian Learning Approach.
Misalignment-Robust Joint Filter for Cross-Modal Image Pairs.
Non-uniform Blind Deblurring by Reblurring.
DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks.
Learning Visual Attention to Identify People with Autism Spectrum Disorder.
High-Quality Correspondence and Segmentation Estimation for Dual-Lens Smart-Phone Portraits.
Revisiting Cross-Channel Information Transfer for Chromatic Aberration Correction.
A Generic Deep Architecture for Single Image Reflection Removal and Image Smoothing.
Semantic Line Detection and Its Applications.
Deeply-Learned Part-Aligned Representations for Person Re-identification.
Unsupervised Learning of Object Landmarks by Factorized Spatial Embeddings.
Pose-Invariant Face Alignment with a Single CNN.
DeepCoder: Semi-Parametric Variational Autoencoders for Automatic Facial Action Coding.
A Novel Space-Time Representation on the Positive Semidefinite Cone for Facial Expression Recognition.
Detecting Faces Using Inside Cascaded Contextual CNN.
A Microfacet-Based Reflectance Model for Photometric Stereo with Highly Specular Surfaces.
Filter Selection for Hyperspectral Estimation.
Monocular Free-Head 3D Gaze Tracking with Deep Learning and Geometry Constraints.
Detailed Surface Geometry and Albedo Recovery from RGB-D Video under Natural Illumination.
Robust Hand Pose Estimation during the Interaction with an Unknown Object.
Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting.
Learning Hand Articulations by Hallucinating Heat Distribution.
Multi-view Dynamic Shape Refinement Using Local Temporal Integration.
A 3D Morphable Model of Craniofacial Shape and Texture Variation.
Probabilistic Structure from Motion with Objects (PSfMO).
A Spatiotemporal Oriented Energy Network for Dynamic Texture Recognition.
SubUNets: End-to-End Hand Shape and Continuous Sign Language Recognition.
Single Shot Text Detector with Regional Attention.
Detect to Track and Track to Detect.
A Coarse-Fine Network for Keypoint Localization.
Low-Shot Visual Recognition by Shrinking and Hallucinating Features.
TorontoCity: Seeing the World with a Million Eyes.
Visual Forecasting by Imitating Dynamics in Natural Sequences.
Inferring and Executing Programs for Visual Reasoning.
Focal Loss for Dense Object Detection.
Towards Diverse and Natural Image Descriptions via a Conditional GAN.
Mask R-CNN.
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning.
Interpretable Learning for Self-Driving Cars by Visualizing Causal Attention.
Transferring Objects: Joint Inference of Container and Human Pose.
Jointly Recognizing Object Fluents and Tasks in Egocentric Videos.
Temporal Action Detection with Structured Segment Networks.
Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection.
Unmasking the Abnormal Events in Video.
Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video.
SBGAR: Semantics Based Group Activity Recognition.
MarioQA: Answering Questions by Watching Gameplay Videos.
Learning View-Invariant Features for Person Identification in Temporally Synchronized Videos Taken by Wearable Cameras.
DualGAN: Unsupervised Dual Learning for Image-to-Image Translation.
Sampling Matters in Deep Embedding Learning.
Temporal Generative Adversarial Nets with Singular Value Clipping.
Smart Mining for Deep Metric Learning.
Deep Growing Learning.
Centered Weight Normalization in Accelerating Training of Deep Neural Networks.
Least Squares Generative Adversarial Networks.
Towards a Unified Compositional Model for Visual Pattern Modeling.
Introspective Neural Networks for Generative Modeling.
Associative Domain Adaptation.
Universal Adversarial Perturbations Against Semantic Image Segmentation.
CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training.
Learning Efficient Convolutional Networks through Network Slimming.
Regional Interactive Image Segmentation Networks.
Deep Dual Learning for Semantic Image Segmentation.
AMAT: Medial Axis Transform for Natural Images.
Directionally Convolutional Networks for 3D Shape Segmentation.
A Unified Model for Near and Remote Sensing.
SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation?
Point Set Registration with Global-Local Correspondence and Transformation Estimation.
Sketching with Style: Visual Search with Sketches and Aesthetic Context.
Dual-Glance Model for Deciphering Social Relationships.
A Simple Yet Effective Baseline for 3d Human Pose Estimation.
Scene Parsing with Global Context Embedding.
Revisiting IM2GPS in the Deep Learning Era.
MUTAN: Multimodal Tucker Fusion for Visual Question Answering.
Compositional Human Pose Regression.
Deep Metric Learning with Angular Loss.
Performance Guaranteed Network Acceleration via High-Order Residual Quantization.
Bounding Boxes, Segmentations and Object Coordinates: How Important is Recognition for 3D Scene Flow Estimation in Autonomous Driving Scenarios?
Volumetric Flow Estimation for Incompressible Fluids Using the Stationary Stokes Equations.
CREST: Convolutional Residual Learning for Visual Tracking.
Non-Markovian Globally Consistent Multi-object Tracking.
Low-Dimensionality Calibration through Local Anisotropic Scaling for Robust Hand Model Personalization.
Joint Bi-layer Optimization for Single-Image Rain Streak Removal.
Should We Encode Rain Streaks in Video as Deterministic or Stochastic?
Robust Video Super-Resolution with Learned Temporal Dynamics.
Fast Image Processing with Fully-Convolutional Networks.
Paying Attention to Descriptions Generated by Image Captioning Models.
Blind Image Deblurring with Outlier Handling.
Decoder Network over Lightweight Reconstructed Feature for Fast Semantic Style Transfer.
Visual Transformation Aided Contrastive Learning for Video-Based Kinship Verification.
Group Re-identification via Unsupervised Transfer of Sparse Features Encoding.
Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis.
Stepwise Metric Promotion for Unsupervised Video Person Re-identification.
Efficient Online Local Metric Adaptation via Negative Samples for Person Re-identification.
Video Reflection Removal Through Spatio-Temporal Optimization.
Depth and Image Restoration from Light Field in a Scattering Medium.
Multi-stage Multi-recursive-input Fully Convolutional Networks for Neuronal Boundary Detection.
Multi-view Non-rigid Refinement and Normal Selection for High Quality 3D Reconstruction.
Efficient Global 2D-3D Matching for Camera Localization in a Large-Scale 3D Map.
Progressive Large Scale-Invariant Image Matching in Scale Space.
PolyFit: Polygonal Surface Reconstruction from Point Clouds.
Online Video Object Detection Using Association LSTM.
RMPE: Regional Multi-person Pose Estimation.
3D Surface Detail Enhancement from a Single Normal Map.
Making Minimal Solvers for Absolute Pose Estimation Compact and Robust.
SurfaceNet: An End-to-End 3D Neural Network for Multiview Stereopsis.
Shape Inpainting Using 3D Generative Adversarial Network and Recurrent Convolutional Networks.
Polynomial Solvers for Saturated Ideals.
Linear Differential Constraints for Photo-Polarimetric Height Estimation.
Turning Corners into Cameras: Principles and Methods.
Material Editing Using a Physically Based Rendering Network.
Neural EPI-Volume Networks for Shape from Light Field.
Learning to Synthesize a 4D RGBD Light Field from a Single Image.
GANs for Biological Image Synthesis.
Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks.
Playing for Benchmarks.
Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework.
Raster-to-Vector: Revisiting Floorplan Transformation.
Deep Cropping via Attention Box Prediction and Aesthetics Assessment.
Am I a Baller? Basketball Performance Assessment from First-Person Videos.
Pixel-Level Matching for Video Object Segmentation Using Convolutional Neural Networks.
Common Action Discovery and Localization in Unconstrained Videos.
Lattice Long Short-Term Memory for Human Action Recognition.
What Actions are Needed for Understanding Human Actions in Videos?
Joint Discovery of Object States and Manipulation Actions.
View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data.
Bringing Background into the Foreground: Making All Classes Equal in Weakly-Supervised Video Semantic Segmentation.
Truncating Wide Networks Using Binary Tree Architectures.
Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs.
Factorized Bilinear Models for Image Recognition.
Is Second-Order Information Helpful for Large-Scale Visual Recognition?
A Self-Balanced Min-Cut Algorithm for Image Clustering.
Multi-task Self-Supervised Visual Learning.
Privacy-Preserving Visual Learning Using Doubly Permuted Homomorphic Encryption.
Scale-Adaptive Convolutions for Scene Parsing.
Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes.
Learned Watershed: End-to-End Learning of Seeded Segmentation.
Open Vocabulary Scene Parsing.
No More Discrimination: Cross City Adaptation of Road Scene Segmenters.
Joint Learning of Object and Action Detectors.
A Two Stream Siamese Convolutional Neural Network for Person Re-identification.
An Analysis of Visual Question Answering Algorithms.
Unsupervised Learning of Important Objects from First-Person Videos.
VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition.
Chained Cascade Network for Object Detection.
Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues.
DSOD: Learning Deeply Supervised Object Detectors from Scratch.
Learning from Noisy Labels with Distillation.
Learning Deep Neural Networks for Vehicle Re-ID with Visual-spatio-Temporal Path Proposals.
Identity-Aware Textual-Visual Matching with Latent Co-attention.
Hierarchical Multimodal LSTM for Dense Visual-Semantic Embedding.
See the Glass Half Full: Reasoning About Liquid Containers, Their Volume and Content.
Generating High-Quality Crowd Density Maps Using Contextual Pyramid CNNs.
Class Rectification Hard Mining for Imbalanced Deep Learning.
Soft Proposal Networks for Weakly Supervised Object Localization.
SCNet: Learning Semantic Correspondence.
Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering.
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation.
Temporal Dynamic Graph LSTM for Action-Driven Video Object Detection.
ScaleNet: Guiding Object Proposal Generation in Supermarkets and Beyond.
Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems.
High Order Tensor Formulation for Convolutional Sparse Coding.
Learning Dynamic Siamese Network for Visual Object Tracking.
Online Robust Image Alignment via Subspace Learning from Gradient Orientations.
Dual Motion GAN for Future-Flow Embedded Video Prediction.
PanNet: A Deep Network Architecture for Pan-Sharpening.
Weakly Supervised Manifold Learning for Dense Semantic Object Correspondence.
Transformed Low-Rank Model for Line Pattern Noise Removal.
Modelling the Scene Dependent Imaging in Cameras with a Deep Neural Network.
Joint Convolutional Analysis and Synthesis Sparse Representation for Single Image Layer Separation.
Learning Gaze Transitions from Depth to Improve Video Saliency Estimation.
Wavelet-SRNet: A Wavelet-Based CNN for Multi-scale Face Super Resolution.
Be Your Own Prada: Fashion Synthesis with Structural Coherence.
Super-Trajectory for Video Segmentation.
Self-Paced Kernel Estimation for Robust Blind Image Deblurring.
Infant Footprint Recognition.
Anchored Regression Networks Applied to Age Estimation and Super Resolution.
Recurrent 3D-2D Dual Learning for Large-Pose Facial Landmark Detection.
Reconstruction-Based Disentanglement for Pose-Invariant Face Recognition.
Composite Focus Measure for High Quality Depth Maps.
Unsupervised Adaptation for Deep Stereo.
Click Here: Human-Localized Keypoints as Guidance for Viewpoint Estimation.
Learned Multi-patch Similarity.
Unrestricted Facial Geometry Reconstruction Using Image-to-Image Translation.
Unsupervised Learning of Stereo Matching.
Surface Normals in the Wild.
Toward Perceptually-Consistent Stereo: A Scanline Study.
Learning for Active 3D Mapping.
Unsupervised Creation of Parameterized Avatars.
SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again.
Photographic Image Synthesis with Cascaded Refinement Networks.
Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization.
WeText: Scene Text Detection under Weak Supervision.
Adversarial Image Perturbation for Privacy Protection - A Game Theory Perspective.
ChromaTag: A Colored Marker and Fast Detection Algorithm.
Automatic Spatially-Aware Fashion Concept Discovery.
Spatio-Temporal Person Retrieval via Natural Language Queries.
Adaptive RNN Tree for Large-Scale Human Action Recognition.
Following Gaze in Video.
Attentive Semantic Video Generation Using Captions.
Primary Video Object Segmentation via Complementary CNNs and Neighborhood Reversible Flow.
Video Fill In the Blank Using LR/RL LSTMs with Spatial-Temporal Attentions.
Infinite Latent Feature Selection: A Probabilistic Latent Graph-Based Ranking Approach.
Channel Pruning for Accelerating Very Deep Neural Networks.
Genetic CNN.
Adversarial Examples for Semantic Segmentation and Object Detection.
SORT: Second-Order Response Transform for Visual Recognition.
Fine-Grained Recognition in the Wild: A Multi-task Domain Adaptation Approach.
Weakly Supervised Learning of Deep Metrics for Stereo Reconstruction.
Transitive Invariance for Self-Supervised Visual Representation Learning.
Encoder Based Lifelong Learning.
Cascaded Feature Network for Semantic Segmentation of RGB-D Images.
Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection.
Structured Attentions for Visual Question Answering.
Learning Feature Pyramids for Human Pose Estimation.
Recurrent Multimodal Interaction for Referring Image Segmentation.
Scene Graph Generation from Objects, Phrases and Region Captions.
Generative Modeling of Audible Shapes for Object Perception.
Areas of Attention for Image Captioning.
Attributes2Classname: A Discriminative Model for Attribute-Based Unsupervised Zero-Shot Learning.
An Empirical Study of Language CNN for Image Captioning.
Adversarial PoseNet: A Structure-Aware Convolutional Network for Human Pose Estimation.
BAM! The Behance Artistic Media Dataset for Recognition Beyond Photography.
DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding.
Sublabel-Accurate Discretization of Nonconvex Free-Discontinuity Problems.
ProbFlow: Joint Optical Flow and Uncertainty Estimation.
Predicting Human Activities Using Stochastic Grammar.
Real-Time Hand Tracking under Occlusion from an Egocentric RGB-D Sensor.
Robust Object Tracking Based on Temporal and Spatial Deep Networks.
Learning Background-Aware Correlation Filters for Visual Tracking.
Need for Speed: A Benchmark for Higher Frame Rate Object Tracking.
SHaPE: A Novel Graph Theoretic Algorithm for Making Consensus-Based Decisions in Person Re-identification Systems.
Coherent Online Video Style Transfer.
Multi-channel Weighted Nuclear Norm Minimization for Real Color Image Denoising.
On-demand Learning for Deep Image Restoration.
Video Deblurring via Semantic Segmentation and Pixel-Wise Non-linear Kernel.
Learning Discriminative Data Fitting Functions for Blind Image Deblurring.
Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation.
Delving into Salient Object Subitizing and Detection.
Look, Perceive and Segment: Finding the Salient Objects in Images via Two-stream Fixation-Semantic CNNs.
RankIQA: Learning from Rankings for No-Reference Image Quality Assessment.
Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression.
How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks).
Ensemble Deep Learning for Skeleton-Based Action Recognition Using Temporal Sliding LSTM Networks.
Real Time Eye Gaze Tracking with 3D Deformable Eye-Face Model.
Cross-View Asymmetric Metric Learning for Unsupervised Person Re-Identification.
Catadioptric HyperSpectral Light Field Imaging.
Reconfiguring the Imaging Pipeline for Computer Vision.
Focal Track: Depth and Accommodation with Oscillating Lens Deformation.
Corner-Based Geometric Calibration of Multi-focus Plenoptic Cameras.
Rolling-Shutter-Aware Differential SfM and Image Rectification.
Surface Registration via Foliation.
"Maximizing Rigidity" Revisited: A Convex Programming Approach for Generic 3D Shape Reconstruction from Multiple Perspective Views.
Quasiconvex Plane Sweep for Triangulation with Outliers.
BodyFusion: Real-Time Capture of Human Motion and Surface Geometry Using a Single Depth Camera.
3D-PRNN: Generating Shape Primitives with Recurrent Neural Networks.
Local-to-Global Point Cloud Registration Using a Dictionary of Viewpoint Descriptors.
Rolling Shutter Correction in Manhattan World.
Improved Image Captioning via Policy Gradient Optimization of SPIDEr.
Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models.
A Generative Model of People in Clothing.
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era.
SuBiC: A Supervised, Structured Binary Code for Image Search.
Query-Guided Regression Network with Context Policy for Phrase Grounding.
Hard-Aware Deeply Cascaded Embedding.
Learning to Reason: End-to-End Module Networks for Visual Question Answering.
Beyond Planar Symmetry: Modeling Human Perception of Reflection and Rotation Symmetries in the Wild.
FoveaNet: Perspective-Aware Urban Scene Parsing.
Ensemble Diffusion for Retrieval.
Deformable Convolutional Networks.
Open Set Domain Adaptation.
Deep Direct Regression for Multi-oriented Scene Text Detection.
Complex Event Detection by Identifying Reliable Shots from Untrimmed Videos.
Compressive Quantization for Fast Object Instance Search in Videos.
Learning Long-Term Dependencies for Action Recognition with a Biologically-Inspired Deep Network.
Dense-Captioning Events in Videos.
Unsupervised Action Discovery and Localization in Videos.
SegFlow: Joint Learning for Video Object Segmentation and Optical Flow.
A Read-Write Memory Network for Movie Story Understanding.
Unsupervised Representation Learning by Sorting Sequences.
Coordinating Filters for Faster Deep Neural Networks.
Predicting Deeper into the Future of Semantic Segmentation.
Personalized Image Aesthetics.
Image-Based Localization Using LSTMs for Structured Feature Correlation.
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization.
Look, Listen and Learn.
When Unsupervised Domain Adaptation Meets Tensor Representations.
Towards Context-Aware Interaction Recognition for Visual Relationship Detection.
Embedding 3D Geometric Features for Rigid Object Part Segmentation.
Recurrent Scale Approximation for Object Detection in CNN.
Exploiting Multi-grain Ranking Constraints for Precisely Searching Visually-similar Vehicles.
Increasing CNN Robustness to Occlusions by Reducing Filter Support.
VegFru: A Domain-Specific Dataset for Fine-Grained Visual Categorization.
Attribute Recognition by Joint Recurrent Learning of Context and Correlation.
Show, Adapt and Tell: Adversarial Training of Cross-Domain Image Captioner.
Higher-Order Integration of Hierarchical Convolutional Activations for Fine-Grained Visual Categorization.
DualNet: Learn Complementary Features for Image Recognition.
Neural Person Search Machines.
Visual Semantic Planning Using Deep Successor Representations.
Deep Determinantal Point Process for Large-Scale Multi-label Classification.
Multi-label Image Recognition by Recurrently Discovering Attentional Regions.
Recurrent Models for Situation Recognition.
SafetyNet: Detecting and Rejecting Adversarial Examples Robustly.
MIHash: Online Hashing with Mutual Information.
DeNet: Scalable Real-Time Object Detection with Directed Sparse Sampling.
Reasoning About Fine-Grained Attribute Phrases Using Reference Games.
Flow-Guided Feature Aggregation for Video Object Detection.
Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach.
Fashion Forward: Forecasting Visual Style in Fashion.
Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-identification.
Benchmarking and Error Diagnosis in Multi-instance Pose Estimation.
No Fuss Distance Metric Learning Using Proxies.
HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis.
A Revisit of Sparse Coding Based Anomaly Detection in Stacked RNN Framework.
Non-convex Rank/Sparsity Regularization and Local Minima.
Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning.
MirrorFlow: Exploiting Symmetries in Joint Optical Flow and Occlusion Estimation.
Tracking the Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies.
PathTrack: Fast Trajectory Annotation with Path Supervision.
Encouraging LSTMs to Anticipate Actions Very Early.
Deep Occlusion Reasoning for Multi-camera Multi-target Detection.
Video Frame Interpolation via Adaptive Separable Convolution.
Learning to Super-Resolve Blurry Face and Text Images.
Joint Adaptive Sparsity and Low-Rankness on the Fly: An Online Tensor Reconstruction Scheme for Video Denoising.
Learning Blind Motion Deblurring.
Zero-Order Reverse Filtering.
Learning Uncertain Convolutional Features for Accurate Saliency Detection.
Amulet: Aggregating Multi-level Convolutional Features for Salient Object Detection.
S³FD: Single Shot Scale-Invariant Face Detector.
An Optimal Transportation Based Univariate Neuroimaging Index.
A Geometric Framework for Statistical Analysis of Trajectories with Distinct Temporal Spans.
Joint Layout Estimation and Global Multi-view Registration for Indoor Reconstruction.
Learning Compact Geometric Features.
Colored Point Cloud Registration Revisited.
CAD Priors for Accurate and Flexible Instance Reconstruction.
Real-Time Monocular Pose Estimation of 3D Objects Using Temporally Consistent Local Color Histograms.
Temporal Shape Super-Resolution by Intra-frame Motion Encoding Using High-fps Structured Light.
Learning Policies for Adaptive Tracking with Deep Feature Cascades.
Temporal Tessellation: A Unified Approach for Video Analysis.
High-Resolution Shape Completion Using Deep Neural Networks for Global Structure and Local Geometry Inference.
Using Sparse Elimination for Solving Minimal Problems in Computer Vision.
End-to-End Learning of Geometry and Context for Deep Stereo Regression.
Rethinking Reprojection: Closing the Loop for Pose-Aware Shape Reconstruction from a Single Image.
Anticipating Daily Intention Using On-wrist Motion Triggered Sensing.
Practical Projective Structure from Motion (P²SfM).
Distributed Very Large Scale Bundle Adjustment by Global Camera Consensus.
A Lightweight Approach for On-the-Fly Reflectance Estimation.
Robust Pseudo Random Fields for Light-Field Stereo Matching.
Globally-Optimal Inlier Set Maximisation for Simultaneous Camera Pose and Feature Correspondence.