Skand Vishwanath Peri

I am a Research Assistant at the Video Analytics Lab at the Indian Institute of Science. I graduated with a Bachelor of Technology (B.Tech) degree in Computer Science and Engineering from the Indian Institute of Technology, Ropar, India.

I completed my Bachelor's Thesis in the field of Heterogeneous Face Recognition and was guided by Dr. C.K. Narayanan. During my undergrad, I also had the wonderful opportunity of working with Dr. Abhinav Dhall and Dr. Deepti Bathula.
Recently, in Summer 2018, I was a Google Summer of Code Intern with DeepChem and had the pleasure of working with Bharath Ramsundar and Karl Leswing on importing imaging tools to the DeepChem library. Previously, I also interned at the Video Analytics Lab, IISc Bangalore under Dr. R. Venkatesh Babu during Summer 2017 and worked on HDR Deghosting.

CV  /  Github  /  Twitter  /  Linkedin  /  Google Scholar


I'm broadly interested in Computer Vision and Machine Learning. My specific interests are unsupervised learning of representations, computational photography, cross-modal generative models, and heterogeneous learning.



[NEW] MRI to FDG-PET: Cross-Modal Synthesis Using 3D U-Net For Multi-Modal Alzheimer's Classification
Apoorva Sikka*, Skand Vishwanath Peri*, Deepti.R.Bathula
In MICCAI Workshop on Simulation and Synthesis in Medical Imaging, 2018.

abstract | bibtex | arXiv

Recent studies suggest that combined analysis of Magnetic resonance imaging (MRI) that measures brain atrophy and positron emission tomography (PET) that quantifies hypo-metabolism provides improved accuracy in diagnosing Alzheimer's disease. However, such techniques are limited by the availability of corresponding scans of each modality. Current work focuses on a cross-modal approach to estimate FDG-PET scans for the given MR scans using a 3D U-Net architecture. The use of the complete MR image instead of a local patch based approach helps in capturing non-local and non-linear correlations between MRI and PET modalities. The quality of the estimated PET scans is measured using quantitative metrics such as MAE, PSNR and SSIM. The efficacy of the proposed method is evaluated in the context of Alzheimer's disease classification. The accuracy using only MRI is 70.18% while joint classification using synthesized PET and MRI is 74.43% with a p-value of 0.06. The significant improvement in diagnosis demonstrates the utility of the synthesized PET scans for multi-modal analysis.

                    Author = {Sikka, Apoorva and Peri, Skand Vishwanath and Bathula, Deepti.R},
                    Title = {MRI to FDG-PET: Cross-Modal Synthesis Using 3D U-Net For Multi-Modal Alzheimer's Classification},
                    Booktitle = {MICCAI Workshop on Simulation and Synthesis in Medical Imaging},
                    Year = {2018}
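
The MAE and PSNR metrics mentioned in the abstract above can be illustrated with a short sketch. This is not the evaluation code from the paper: the function names, the flat-list image representation, and the `max_value` peak intensity are assumptions made for illustration, and SSIM is left out because it requires windowed local statistics.

```python
import math


def mae(reference, estimate):
    """Mean absolute error between two equally sized intensity sequences."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same length")
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB.

    max_value is the assumed peak intensity (hypothetical default of 1.0,
    i.e. intensities normalised to [0, 1]).
    """
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

For a 3D scan, the voxels would first be flattened into one sequence per volume; lower MAE and higher PSNR indicate a closer match to the reference scan.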

[NEW] Deep Cross modal learning for Caricature Verification and Identification (CaVINet)
Jatin Garg*, Skand Vishwanath Peri*, Himanshu Tolani*, Narayanan.C.Krishnan
ACM Multimedia, 2018.

abstract | bibtex | arXiv | project page

Learning from different modalities is a challenging task that involves determining a shared space that bridges the two modalities. In this paper, we look at the challenging problem of cross modal face verification and recognition between caricature and visual image modalities. Caricature is a modality with images having exaggerations of facial features of a person. Due to the significant variations in the caricatures, building vision models for recognizing and verifying data from this modality is an extremely challenging task. Visual images with significantly lesser amount of distortions can act as a bridge for the analysis of caricature modality. To advance the research in this field, we have created a publicly available large Caricature-VIsual dataset [CaVI] with images from both the modalities. The dataset captures the rich variations in the caricature of an identity. This paper presents the first cross modal architecture that is able to handle extreme distortions present in caricatures using a deep learning network that learns similar representations across the modalities. We use two convolutional networks along with transformations that are subjected to orthogonality constraints to capture the shared and modality specific representations. In contrast to prior research, our approach neither depends on manually extracted facial landmarks for learning the representations, nor on the identities of the person for performing verification. The learned shared representation achieves 91% accuracy for verifying unseen images and 75% accuracy on unseen identities. Further, recognizing the identity in the image by knowledge transfer using a combination of shared and modality specific representations, resulted in an unprecedented performance of 85% rank-1 accuracy for caricatures and 95% rank-1 accuracy for visual images.

                    Author = {Garg, Jatin and Peri, Skand Vishwanath and Tolani, Himanshu and Krishnan, Narayanan C.},
                    Title = {Deep Cross modal learning for Caricature Verification and Identification (CaVINet)},
                    Booktitle = {ACM Multimedia},
                    Year = {2018}

[NEW] DisguiseNet : A Contrastive Approach for Disguised Face Verification in the Wild
Skand Vishwanath Peri, Abhinav Dhall
IEEE Conference on Computer Vision and Pattern Recognition, Workshop on Disguised Faces in the Wild (CVPRW), 2018

abstract | bibtex | arXiv

This paper describes our approach for the Disguised Faces in the Wild (DFW) 2018 challenge. The task here is to verify the identity of a person among disguised and impostors images. Given the importance of the task of face verification it is essential to compare methods across a common platform. Our approach is based on VGG-face architecture paired with Contrastive loss based on cosine distance metric. For augmenting the data set, we source more data from the internet. The experiments show the effectiveness of the approach on the DFW data. We show that adding extra data to the DFW dataset with noisy labels also helps in increasing the generalization performance of the network. The proposed network achieves 27.13% absolute increase in accuracy over the DFW baseline.

                    Author = {Peri, Skand V. and
                    Dhall, Abhinav},
                    Title = {DisguiseNet : A Contrastive Approach for Disguised Face Verification in the Wild},
                    Booktitle = {IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
                    Year = {2018}
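
The idea of a contrastive loss built on a cosine distance, as named in the DisguiseNet abstract above, can be sketched in a few lines. This is the generic textbook form of the loss, not necessarily the exact formulation used in the paper; the function names and the `margin` value are assumptions for illustration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def contrastive_loss(a, b, same_identity, margin=0.5):
    """Generic contrastive loss on a pair of embeddings.

    Genuine pairs are pulled together (loss grows with distance);
    impostor pairs are pushed apart until they are at least `margin` away.
    The margin of 0.5 is a hypothetical default.
    """
    d = cosine_distance(a, b)
    if same_identity:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

In a verification setting, `a` and `b` would be the network's embeddings of the two face images, and the loss would be averaged over a batch of genuine and impostor pairs.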

    * - Denotes Equal Contribution


Google Summer of Code Intern - DeepChem (Open Chemistry) (Summer 2018)
Mentors : Bharath Ramsundar (CTO, Computable Labs) and Karl Leswing (Tech Lead, Schrodinger)

I worked on importing imaging tools into the DeepChem library. Specifically, I implemented the U-Net architecture, ported the ResNet-50 architecture, and implemented an Imaging Data Transforms API that enables image transformations for data augmentation.

HDR Deghosting (Summer 2017)
Mentor : Dr R.Venkatesh Babu, Dept. of Computational and Data Sciences, IISc Bangalore, India

My work at IISc focused mainly on generating deghosted High Dynamic Range images. I spent the first half of my internship on segmenting moving objects in images with varying illumination, and the second half on registering images under high illumination variance.

Mathematical Visual Simulators (Summer 2016)
Mentor : Dr C.K.Narayanan, Dept. of CS, IIT Ropar

Developed a GUI version of Singular Value Decomposition, Gradient Descent and Lagrange Multipliers depicting their geometrical interpretation. Chart.js, Plotly.js, Numeric.js and Algebra.js libraries were used to develop the tool.

Course Projects

Bio Medical Image Processing

Alzheimer's Classification using MRI and PET images (November, 2017) [Course Final Project]
Mentor : Dr. Deepti R. Bathula, Dept. of CS, IIT Ropar, India

This project proposed a localised deep neural network architecture with 3D convolutions to predict whether a person has Alzheimer's disease from PET/MRI scans of the person's brain. [ Code will be made public once the corresponding paper is published. ]

Nonlocal Means-Based Speckle Filtering for Ultrasound Images (September, 2017)
Mentor : Dr. Deepti R. Bathula, Dept. of CS, IIT Ropar, India

In this project I implemented non-local-means-based noise filtering for ultrasound images. The algorithm is specific to ultrasound speckle noise. The paper it is based on proposed a new similarity metric: the Pearson distance. Details of the implementation are in the report.

CT Reconstruction Algorithms - ART, SART, Back Projection and Filtered Back Projection (October, 2017)
Mentor : Dr. Deepti R. Bathula, Dept. of CS, IIT Ropar, India

In this work, I implemented different Computed Tomography (CT) reconstruction algorithms. CT reconstruction methods fall mainly into two families: algebraic reconstruction algorithms and back-projection algorithms. I implemented two variants of the first (ART and SART) and three variants of the second (simple BP, filtered BP, and noise-filtered BP).
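
As an illustration of the algebraic family, ART can be written as a Kaczmarz-style row-action iteration: each ray defines one linear equation, and the estimate is projected onto one equation at a time. The sketch below is a generic textbook version and not the code from this project; the function name, the dense list-of-rows system matrix, and the `n_sweeps` and `relaxation` parameters are assumptions.

```python
def art(rows, measurements, n_sweeps=50, relaxation=1.0):
    """Kaczmarz-style ART for the linear system rows @ x = measurements.

    rows:         list of ray-weight rows (one row per ray, one entry per pixel)
    measurements: list of measured ray sums, one per row
    n_sweeps:     hypothetical number of passes over all rays
    relaxation:   hypothetical step-size factor (1.0 = full projection)
    """
    x = [0.0] * len(rows[0])  # start from an empty image
    for _ in range(n_sweeps):
        for a, b in zip(rows, measurements):
            norm_sq = sum(v * v for v in a)
            if norm_sq == 0:
                continue  # ray touches no pixel; nothing to update
            residual = b - sum(ai * xi for ai, xi in zip(a, x))
            step = relaxation * residual / norm_sq
            # Move the estimate toward the hyperplane defined by this ray
            x = [xi + step * ai for xi, ai in zip(x, a)]
    return x
```

For example, `art([[1.0, 0.0], [1.0, 1.0]], [2.0, 5.0])` converges toward the two-pixel image `[2, 3]`. SART differs in that it accumulates the corrections from many rays before updating the image, rather than updating after every ray.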

Computer Vision

Personality Assessment from Videos (November, 2017) [Course Final Project]
Mentor : Dr. Abhinav Dhall, Dept. of CS, IIT Ropar, India

The main aim of this project was to assess the Big 5 personality traits from videos. We came up with a novel approach that regresses the 5 traits using the background as well as the facial features. More information about the methodology can be found here. [Code will be published once the corresponding paper gets accepted.]

Creating Collage using Hybrid Images (August, 2017)
Mentor : Dr. Abhinav Dhall, Dept. of CS, IIT Ropar, India

In this project I used the concept of hybrid images, as presented by Oliva et al. (SIGGRAPH 2006), to create collages. After applying the hybrid image technique, the images were blended to show a smooth transition from one to the other. Detailed information can be found in this report.

Visual Bag of Words & Homography Estimation (August, 2017)
Mentor : Dr. Abhinav Dhall, Dept. of CS, IIT Ropar, India

Visual bag of words on the Fashion MNIST data set was implemented using k-means clustering. A mosaic was also created using homography estimation (projective transformation), followed by warping and blending of the images. The technical report can be found here.

Digital Image Processing

Image Morphing (Aug-Nov, 2016)
Mentor : Dr. Puneet Goyal, Dept. of CS, IIT Ropar, India

We morphed 2 images using the Delaunay triangulation technique. We take the tie points as input from the user, compute the affine transformation from one image to the other, and then blend the two images to get a smooth transition between them. This process can be performed with multiple images (we have performed it with 2 and 3 images).
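
The per-triangle affine mapping described above can be sketched with barycentric coordinates: a point keeps the same weights with respect to the three vertices when it is carried from a source triangle to its matching destination triangle, which is equivalent to applying the affine transform defined by the three vertex pairs. This is a minimal illustration, not the project's code; all function names are hypothetical, and computing the Delaunay triangulation and blending pixel colours are left out.

```python
def barycentric(p, tri):
    """Barycentric weights of point p with respect to triangle tri."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    det = (y1 - y2) * (x0 - x2) + (x2 - x1) * (y0 - y2)
    if det == 0:
        raise ValueError("degenerate triangle")
    w0 = ((y1 - y2) * (p[0] - x2) + (x2 - x1) * (p[1] - y2)) / det
    w1 = ((y2 - y0) * (p[0] - x2) + (x0 - x2) * (p[1] - y2)) / det
    return w0, w1, 1.0 - w0 - w1


def map_point(p, src_tri, dst_tri):
    """Carry p from src_tri to dst_tri under the triangle-to-triangle affine map."""
    weights = barycentric(p, src_tri)
    x = sum(w * v[0] for w, v in zip(weights, dst_tri))
    y = sum(w * v[1] for w, v in zip(weights, dst_tri))
    return x, y


def intermediate_triangle(src_tri, dst_tri, t):
    """Vertex positions at morph fraction t (0 = source, 1 = destination)."""
    return [((1 - t) * sx + t * dx, (1 - t) * sy + t * dy)
            for (sx, sy), (dx, dy) in zip(src_tri, dst_tri)]
```

For each frame of the morph, every triangle of the intermediate mesh would be filled by mapping its pixels back into both input images and cross-dissolving the two sampled colours with weight `t`.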

Software Engineering

Addiction Removal Application (Aug-Nov, 2017) [Course Final Project]
Mentor : Dr. Balwinder Sodhi, Dept. of CS, IIT Ropar, India

Addiction Removal is a platform for addicts to get rid of their addiction with the help of other people who have already overcome theirs. Go ahead, find your mentor and read the motivating blogs. The development of the software is documented on this Google Site.
