Research
My research is mainly in computer vision and machine learning. In the following, I present illustrations of some of my research activities.
Similarity metric learning

Person re-identification

We proposed several approaches for similarity metric learning on pedestrian images for person re-identification across different camera views. In particular, we introduced specific Siamese Neural Network (SNN) architectures and appropriate learning strategies that incorporate semantic prior knowledge through a combination of supervised and weakly supervised learning algorithms. For example, we used semantic pedestrian attributes (clothing type and colour, hair style, carried bags etc.), the human body orientation and the surrounding context (i.e. accompanying persons) to improve the re-identification performance. We also proposed a new SNN-based similarity metric learning algorithm that directly optimises the ranking performance through an objective function defined on lists of examples and a novel triplet-based rank learning strategy; a minimal sketch of this type of training is given below.
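To make the general pattern concrete, here is a minimal PyTorch sketch of triplet-based Siamese embedding training. The toy embedding network, input dimensions, margin and optimiser settings are illustrative assumptions, not the architectures or hyper-parameters from our papers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Toy embedding branch shared by all inputs of the Siamese network."""
    def __init__(self, in_dim=128, out_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, out_dim))

    def forward(self, x):
        # L2-normalised embeddings so that distances are comparable
        return F.normalize(self.net(x), dim=1)

def triplet_rank_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss pushing the positive closer to the anchor than
    the negative by at least `margin` (standard triplet ranking loss)."""
    d_pos = (anchor - positive).pow(2).sum(1)
    d_neg = (anchor - negative).pow(2).sum(1)
    return F.relu(d_pos - d_neg + margin).mean()

# Illustrative training step on random data standing in for
# pedestrian image features from two camera views.
net = EmbeddingNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
a, p, n = (torch.randn(16, 128) for _ in range(3))
loss = triplet_rank_loss(net(a), net(p), net(n))
opt.zero_grad(); loss.backward(); opt.step()
```

The single-triplet hinge loss above is the simplest member of this family; our list-based objective generalises it to rankings over lists of examples.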
Papers: IVC2013, ICIP2018, ACIVS2018a, ACIVS2018b, VISAPP2018, AVSS2017, ORASIS2017

Face verification

We developed a similarity metric learning algorithm called Triangular Similarity Metric Learning (TSML) that uses an SNN model and pairwise training with a new objective function. It improves the convergence behaviour during learning and advanced the state of the art in face verification on public benchmarks such as "Labelled Faces in the Wild" (LFW). A sketch of a triangle-inequality-based pairwise cost of this kind follows below.
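As a hedged illustration, the following PyTorch sketch implements a triangle-inequality-based pairwise cost in the spirit of TSML; the exact objective, scaling and network used in our papers may differ.

```python
import torch

def triangular_pair_loss(a, b, s):
    """Triangle-inequality-based pairwise cost in the spirit of TSML
    (a sketch, not necessarily the paper's exact formulation).
    a, b: batches of embeddings; s: +1 for similar pairs, -1 for dissimilar.
    Minimising it pulls similar pairs towards the same direction and
    dissimilar pairs towards opposite directions, at roughly unit norm."""
    c = a + s.unsqueeze(1) * b
    return (0.5 * (a.pow(2).sum(1) + b.pow(2).sum(1))
            - c.norm(dim=1) + 1.0).mean()

# Toy usage on random embeddings with pair labels.
a = torch.randn(8, 32, requires_grad=True)
b = torch.randn(8, 32, requires_grad=True)
s = torch.tensor([1., 1., -1., 1., -1., -1., 1., -1.])
triangular_pair_loss(a, b, s).backward()
```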
Papers: TCYB3, FG2018a, FG2018b, ICASSP2018a, MTA2018a

Gesture and action recognition

Our work on SNNs for gesture and action recognition with inertial sensor data proposes several new objective functions that improve the convergence of similarity learning, as well as a subsequent classification stage. Notably, we introduced a class-balanced Siamese training algorithm that, at each training iteration, incorporates tuples of negative pairs, and an objective function based on the polar sine function (illustrated below). We also developed a method that improves the correct rejection of unknown classes in a classification setting.
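The polar sine itself can be computed from the Gram matrix of the embedding vectors. The NumPy sketch below shows this basic quantity only; how it enters the class-balanced training objective is described in the papers.

```python
import numpy as np

def polar_sine(vectors):
    """Polar sine of n vectors in R^m (n <= m): the volume of the
    parallelotope they span divided by the product of their norms.
    It is maximal when the vectors are mutually orthogonal, which makes
    it a natural objective for spreading class embeddings apart."""
    V = np.asarray(vectors, dtype=float)            # shape (n, m)
    gram = V @ V.T                                  # n x n Gram matrix
    volume = np.sqrt(max(np.linalg.det(gram), 0.0))
    return volume / np.prod(np.linalg.norm(V, axis=1))

# Orthogonal vectors give polar sine 1, collinear ones give 0.
print(polar_sine(np.eye(3)))                          # -> 1.0
print(polar_sine([[1, 0, 0], [2, 0, 0], [0, 1, 0]]))  # -> 0.0
```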
Papers: NEUROCOMP2017, ICANN2016, FG2018

Object and face tracking

PixelTrack

A very fast adaptive algorithm for tracking deformable generic objects online. Find more information on the PixelTrack web page.

Exploiting contextual information

We proposed several approaches for including context information in online visual object tracking. For example, we introduced a method that learns a discriminative classifier online by stochastically sampling negative image examples from moving background regions that are likely to distract the tracker from the given object. Further, we presented various methods to combine several independent tracking algorithms depending on the scene context. This is achieved by a trained scene context classifier and an appropriate tracker selection and combination strategy.

Papers: TCSVT2016, ECCV2015 (WS), BMVC2015, VISAPP2015

Visual Focus Of Attention estimation

We developed a method to learn the Visual Focus Of Attention (VFOA) of people in video-conferencing or meeting room scenarios online, unsupervised, and in real time. The method uses a specific type of online clustering (similar to sequential k-means) to group face patch appearances coming from a face tracking algorithm. By learning these appearance clusters from low-level features, we can directly estimate the VFOA without the intermediate step of head pose estimation. Further, the method does not need any prior knowledge about the room configuration or the persons in the room. Our experimental results on 110 minutes of annotated data from different sources show that this approach can outperform a classical supervised method based on head orientation estimation and Gaussian Mixture Models.

Evaluation data: TA2 database (full HD resolution reduced to 960x540), IHPD, PETS 2003
Annotation: VFOA annotation (Please cite our paper (AVSS2013) if you use our data or annotation (bibtex).)

Online multi-modal speaker detection and tracking

In the TA2 project, we developed a real-time system for multi-modal speaker detection and tracking in a living room environment where the camera and microphone array are not colocated. For a varying number of persons, the system is able to detect the positions of faces, where each person is looking (i.e. the visual focus of attention), when someone is speaking and who is speaking, and it maps the speaker IDs to the respective visually tracked faces. Moreover, a keyword spotting algorithm has been integrated. The purpose of this processing is to automatically cut to interesting portions inside the video picture (i.e. to automatically edit the video) in real time, based on these audio-visual cues.

Demo: Controlled environment (9.4MB)
Papers: ICME2011, MMM2012, AM2013
Evaluation data: TA2 database

Online multiple face tracking

Also in the context of the TA2 project, we proposed an efficient particle filter-based online multi-face tracker with MCMC sampling. It is able to deal reliably with false face detections and missing detections over longer periods of time. The proposed method is based on a probabilistic framework that takes longer-term observations into account instead of reasoning only on a frame-by-frame basis, as is commonly done. A minimal sketch of the MCMC sampling idea is given below.
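The following NumPy sketch illustrates the basic MCMC sampling idea over a joint multi-face state. The state parametrisation, proposal and toy likelihood are illustrative assumptions; the actual tracker additionally handles a varying number of faces and the longer-term observations mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

def mcmc_tracking_step(states, log_likelihood, n_moves=200, sigma=3.0):
    """One MCMC filtering step over a joint multi-face state (a sketch of
    the general MCMC particle filter idea, not our exact model).
    `states` is an (n_faces, 4) array of [x, y, w, h] boxes and
    `log_likelihood` scores the full joint state against the current frame."""
    current = states.copy()
    cur_ll = log_likelihood(current)
    samples = []
    for _ in range(n_moves):
        proposal = current.copy()
        i = rng.integers(len(proposal))          # perturb one face at a time
        proposal[i, :2] += rng.normal(0, sigma, 2)
        prop_ll = log_likelihood(proposal)
        if np.log(rng.random()) < prop_ll - cur_ll:  # Metropolis acceptance
            current, cur_ll = proposal, prop_ll
        samples.append(current.copy())
    return np.mean(samples, axis=0)              # posterior mean estimate

# Toy likelihood: faces should stay near known target positions.
targets = np.array([[50., 60., 40., 40.], [200., 80., 40., 40.]])
ll = lambda s: -0.01 * np.sum((s[:, :2] - targets[:, :2]) ** 2)
print(mcmc_tracking_step(targets + rng.normal(0, 10, targets.shape), ll))
```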
Demos: TA2 video (12MB)
Evaluation data: TA2 database

Effective multi-cue object tracking

Fusing multiple cues (like colour and texture) in a tracking framework is not a trivial task. We developed an efficient sampling technique, called Dynamic Partitioned Sampling (DPS), that is able to fuse several cues in a principled way, using a dynamic estimation of the reliability of each cue at each point in time. We evaluated this method in a face tracking application on several videos. In the following example, we use colour (blue rectangle), texture (red rectangle), and shape (green ellipse) cues to track a head. During the video, the estimated reliability of the different cues is dynamically adapted, e.g. when the girl turns around or when her head gets occluded. A simplified sketch of reliability-weighted fusion is shown below.
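A simplified sketch of reliability-weighted cue fusion; the function name and the normalisation of the reliabilities are illustrative, and the actual DPS method additionally organises the particle sampling itself into reliability-ordered stages.

```python
import numpy as np

def fused_likelihood(cue_likelihoods, reliabilities):
    """Reliability-weighted fusion of per-cue likelihoods for one particle
    (a simplified sketch of the idea behind DPS). Each likelihood is raised
    to the power of its current reliability, so unreliable cues (e.g. colour
    during an occlusion) influence the result less."""
    cue_likelihoods = np.asarray(cue_likelihoods, dtype=float)
    w = np.asarray(reliabilities, dtype=float)
    w = w / w.sum()                    # normalise reliabilities
    return float(np.prod(cue_likelihoods ** w))

# Colour, texture and shape cues; colour is currently unreliable.
print(fused_likelihood([0.2, 0.8, 0.7], [0.1, 0.6, 0.3]))
```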
Papers: BMVC2009

Face image processing

We presented several novel approaches to facial image processing based on convolutional neural networks (CNN). After applying the Convolutional Face Finder (CFF) (Garcia and Delakis, PAMI 2004) to grey-scale images, we obtain face images that we further process with different models.

Face alignment

Our Convolutional Face Aligner (CFA) iteratively estimates geometric transformations (translation, rotation and zoom) and applies them to precisely find the correct face bounding box, as sketched below.
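A minimal sketch of the iterative refinement loop (rotation omitted for brevity). In the real CFA, the transformation predictor is a trained convolutional network; here it is a dummy stand-in, and `extract_patch` is a crude hypothetical helper.

```python
import numpy as np

def extract_patch(image, x, y, s):
    """Crude axis-aligned crop (hypothetical stand-in for a proper,
    rotation-aware patch extractor)."""
    h, w = image.shape[:2]
    x0, y0 = int(max(x - s, 0)), int(max(y - s, 0))
    x1, y1 = int(min(x + s, w)), int(min(y + s, h))
    return image[y0:y1, x0:x1]

def iterative_alignment(image, box, predict_transform, n_iter=5):
    """Iterative bounding-box refinement in the spirit of CFA (a sketch;
    in the actual CFA, `predict_transform` is a trained convolutional
    network estimating translation, rotation and zoom corrections)."""
    x, y, s = box                                 # centre and scale
    for _ in range(n_iter):
        patch = extract_patch(image, x, y, s)
        dx, dy, dscale = predict_transform(patch)  # small corrections
        x, y, s = x + dx, y + dy, s * dscale
    return x, y, s

# Toy usage: a dummy predictor that nudges the box slightly each step.
img = np.zeros((240, 320))
dummy = lambda patch: (1.0, -0.5, 0.98)
print(iterative_alignment(img, (100.0, 100.0, 60.0), dummy))
```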
Some results of face alignment with CFA on an Internet dataset. For each image pair, on the left: the face bounding box detected by CFF together with the desired face bounding box; on the right: the face bounding box found by CFA (in white).
Papers: VISAPP2008

Facial feature localisation

In this work, we estimate the locations of four facial features: the eyes, the nose, and the mouth, in low-quality grey-scale images. We first extract normalised face patches found by the CFF; then our convolutional neural network highlights the position of each of the four features in so-called feature maps; and finally, we project these positions back onto the original image (see the sketch below).
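A small NumPy sketch of the final back-projection step: the peak of each feature map is taken as the feature position and mapped back through the face bounding box. The map sizes, dictionary layout and function name are illustrative assumptions.

```python
import numpy as np

def locate_features(feature_maps, face_box):
    """Convert per-feature heatmaps (as produced by a CNN on a normalised
    face patch) into image coordinates: take the peak of each map and
    project it back onto the original image through the face bounding box."""
    x0, y0, w, h = face_box
    points = {}
    for name, fmap in feature_maps.items():
        j, i = np.unravel_index(np.argmax(fmap), fmap.shape)  # row, col of peak
        mh, mw = fmap.shape
        points[name] = (x0 + (i + 0.5) / mw * w, y0 + (j + 0.5) / mh * h)
    return points

# Toy usage with random heatmaps for the four facial features.
maps = {k: np.random.rand(46, 46) for k in ("left_eye", "right_eye", "nose", "mouth")}
print(locate_features(maps, face_box=(120, 80, 64, 64)))
```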
Here are some results on difficult examples showing the robustness of this method to extreme illumination, poor quality and contrast, head pose variation, and partial occlusion:
Papers: ISPA2005, CORESA2005 (best student paper), MLSP2007

Other work on facial image analysis in low-quality grey-scale images, e.g. face recognition and gender recognition, can be found here: IWAPR2007, PHD-THESIS2007

Object detection in images

We applied convolutional neural networks (CNN) to the task of object detection in still images. The models are light-weight but very robust to common types of noise and object/image variations; a simplified sliding-window sketch is given below.
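As a rough illustration of how such a classifier can be scanned over an image, here is a single-scale sliding-window sketch. The window size, stride, threshold and the brightness-based stand-in score are assumptions; the full pipeline would also scan over multiple scales and merge overlapping detections.

```python
import numpy as np

def sliding_window_detect(image, score_fn, win=32, stride=8, thresh=0.9):
    """Single-scale sliding-window detection: `score_fn` (standing in for a
    small CNN classifier) maps each window to an object probability, and
    windows above the threshold are kept as detections."""
    h, w = image.shape[:2]
    detections = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            p = score_fn(image[y:y + win, x:x + win])
            if p >= thresh:
                detections.append((x, y, win, win, p))
    return detections

# Toy usage with a brightness-based stand-in for the CNN score.
img = np.zeros((64, 64)); img[16:48, 16:48] = 1.0
print(sliding_window_detect(img, lambda w_: float(w_.mean())))
```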
Here are some results on cars and motorbikes:

Papers: VOCC2005

Some results on the difficult task of transparent logo detection (France 2 TV logo):
Papers: ICANN2006

Object segmentation in videos

We proposed a spatio-temporal segmentation of moving objects using active contours with non-parametric region descriptors based on dense motion vectors. Here are some example segmentations of single video frames; a simplified sketch of the idea follows below.
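As a simplified stand-in for our method, the sketch below runs a Chan-Vese-style morphological active contour (from scikit-image) on the magnitude of a synthetic dense motion field; the actual approach uses non-parametric region descriptors rather than Chan-Vese's region means.

```python
import numpy as np
from skimage.segmentation import morphological_chan_vese

# Synthetic dense motion field standing in for the optical flow between
# two frames: a moving square on a static background.
flow = np.zeros((128, 128, 2))
flow[40:80, 40:80] = (3.0, 1.0)        # the object moves, the rest does not
motion_magnitude = np.linalg.norm(flow, axis=2)

# Region-based active contour on the motion magnitude: the contour evolves
# to separate the moving region from the static background.
mask = morphological_chan_vese(motion_magnitude, 50)
print(mask.shape, mask.sum())          # binary object/background mask
```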
Papers: GRETSI2005, JMIV2006, MASTER-THESIS2004