Confirmed Speakers

More details about the talks can be found down the page.

  • Ashish Gupta, Google
  • Anurag Batra, Google
  • Goutham Tholpadi, Microsoft [Talk]
  • Mohit Kumar, Flipkart
  • Jayanth Mysore, Homelane
  • Purvi Shah, Pratham Books
  • Thejesh GN, DataMeet
  • Anirban Majumder, Amazon [Talk]
  • Uma Sawant, LinkedIn [Talk]
  • Samik Datta, Flipkart [Talk]

Talk Details

  • Dissemination Biases of Social Media Channels: On The Topical Coverage of Socially Shared News (By Niloy Ganguly, IIT Kharagpur)

    In a marked departure from traditional offline media, where all subscribers of a particular news media source (e.g., New York Times) used to get the same news stories through printed newspapers, online news media presents multiple options for the readers to consume news. For example, the subscribers of a media source can get news directly from the news website, or from what their peers share over social media sites like Facebook and Twitter. It is, however, unclear whether there are any differences in the news disseminated on these different online channels. In this work, we analyze data from a popular online news media site, and show that each of these different channels tends to highlight some types of stories more than other stories. We believe that consumers of online news as well as media organizations need to be aware of such differences in various online news dissemination channels.

    Bio: Niloy Ganguly is a professor in the department of computer science and engineering, Indian Institute of Technology Kharagpur. He received his PhD from Bengal Engineering and Science University, Calcutta, India and his Bachelors in Computer Science and Engineering from IIT Kharagpur. He has been a postdoctoral fellow at the Technical University of Dresden, Germany. He focuses on dynamic and self-organizing networks, especially online social networks (OSN), mobile networks etc. In online social networks, he has worked on several problems like designing recommendation, community detection, expert identification, opinion dynamics analysis etc. on various web-social networks like Twitter, Reddit, Delicious etc. He has also simultaneously worked on various theoretical issues, like percolation and evolution, related to large dynamical networks, often termed complex networks. He has been collaborating with various national and international universities and research labs including Duke University, TU Dresden, Germany, MPI PKS and MPI SWS, Germany, Microsoft Lab, India etc. He currently publishes in various top ranking international journals and conferences including ICWSM, CIKM, SIGKDD, CSCW, ACL, WWW, INFOCOM, SIGIR, Europhysics Letters, Physical Review E, ACM and IEEE Transactions, etc. For further information visit his webpage.

  • Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning (By Balaraman Ravindran, IIT Madras)

    Recently there has been a lot of interest in learning common representations for multiple views of data. These views could belong to different modalities or languages. Typically, such common representations are learned using a parallel corpus between the two views (say, 1M images and their English captions). In this work, we address a real-world scenario where no direct parallel data is available between two views of interest (say, V1 and V2) but parallel data is available between each of these views and a pivot view (V3). We propose a model for learning a common representation for V1, V2 and V3 using only the parallel data available between V1 and V3 and between V2 and V3. The proposed model is generic and even works when there are n views of interest and only one pivot view which acts as a bridge between them. There are two specific downstream applications that we focus on: (i) transfer learning between languages L1,L2,...,Ln using a pivot language L and (ii) cross-modal access between images and a language L1 using a pivot language L2. We evaluate our model using two datasets: (i) the publicly available multilingual TED corpus and (ii) a new multilingual multimodal dataset created and released as a part of this work. On both these datasets, our model outperforms state-of-the-art approaches.

    Bio: Prof. Ravindran is currently an associate professor in Computer Science at IIT Madras. He has nearly two decades of research experience in machine learning and specifically reinforcement learning. Currently his research interests are centered on learning from and through interactions and span the areas of data mining, social network analysis, and reinforcement learning.

  • Privacy and Security in Online Social Media (PSOSM) (By Ponnurangam Kumaraguru, IIIT Delhi)

    With the increase in usage of the Internet, there has been an exponential increase in the use of online social media. Websites like Facebook, Google+, YouTube, Orkut, Twitter and Flickr have changed the way the Internet is being used. There is a dire need to investigate, study and characterize privacy and security on online social media from various perspectives (computational, cultural, psychological). Real-world scalable systems need to be built to detect and defend against security and privacy issues on online social media. I will briefly describe some cool ongoing projects that we have: Twit-Digest, MultiOSN, Finding Nemo, OCEAN, Privacy in India, and Call Me MayBe. Much of our research work is made available for public use through tools or online services. Our work derives techniques from Data Mining, Text Mining, Statistics, Network Science, Public Policy, Complex networks, Human Computer Interaction, and Psychology. In particular, in this talk, I will focus on the following: (1) Twit-Digest is a tool to extract intelligence from Twitter which can be useful to security analysts. Twit-Digest is backed by award-winning research publications in international and national venues. (2) MultiOSN is a platform to analyze multiple OSM services to gain intelligence on a given topic / event of interest. (3) OCEAN: Open source Collation of eGovernment data and Networks. Here, we show how publicly available information on Government services can be used to profile citizens in India. This work obtained the Best Poster Award at Security and Privacy Symposium at IIT Kanpur, 2013 and it has gained a lot of traction in Indian media. (4) In Finding Nemo, given an identity in one online social media service, we are interested in finding the digital footprint of the user in other social media services; this is also called the digital identity stitching problem. This work is also backed by an award-winning research publication.

    Bio: Ponnurangam Kumaraguru ("PK"), Associate Professor, is currently the Hemant Bharat Ram Faculty Research Fellow at the Indraprastha Institute of Information Technology (IIIT), Delhi, India. PK is the Founding Head of Cybersecurity Education and Research Centre (CERC). PK is one of ACM India Eminent Speakers. He received his Ph.D. from the School of Computer Science at Carnegie Mellon University (CMU). His research interests include Privacy, e-Crime, Online Social Media, and Usable Security. In particular, these days he has been dabbling with complex networked systems (e.g. social web systems like Twitter, Facebook, and telephone logs). He is also very passionate about issues related to human computer interaction. As Principal Investigator, PK is currently managing research projects of about 2 Crores INR. PK is a Co-Principal Investigator in a project approved at the European Union FP7 which is about 5.3 million Euros. PK has received research funds from Government of India, National Science Foundation (NSF), USA, industry bodies in India, and International Development Research Centre. He is serving as a PC member in prestigious conferences like WWW and AsiaCCS, and he is also serving as a reviewer for International Journal of Information Security and ACM's Transactions on Internet Technology (TOIT). PK's Ph.D. thesis work on anti-phishing research at Carnegie Mellon University has contributed to creating an award-winning start-up, Wombat Security Technologies. PK founded and manages PreCog, a research group at IIIT-Delhi. PK can be reached at pk[at]iiitd[dot]ac[dot]in.

  • Discovering Response-Eliciting Factors in Social Question-Answering (By Partha Pratim Talukdar, IISc Bangalore)

    Questions form an integral part of our everyday communication, both offline and online. Getting responses to our questions from others is fundamental to satisfying our information need and in extending our knowledge boundaries. A question may be represented using various factors such as social, syntactic, semantic, etc. We may hypothesize that these factors contribute with varying degrees towards getting responses from others for a given question. In this talk, I shall present a thorough empirical study to measure effects of these factors using a novel question and answer dataset from the website. In this analysis, we employed a sparse non-negative matrix factorization technique to automatically induce interpretable semantic factors from the question dataset. Such interpretable factor-based analysis overcomes limitations faced by prior related research. I shall also present interesting patterns we discovered in this study. Joint work with Danish and Yogesh Dahiya.

    Bio: Partha Talukdar is an Assistant Professor in the Department of Computational and Data Sciences (CDS) and Department of Computer Science and Automation (CSA) at the Indian Institute of Science (IISc), Bangalore. Before that, he was a Postdoctoral Fellow in the Machine Learning Department at Carnegie Mellon University, working with Tom Mitchell on the NELL project. Partha received his PhD (2010) in CIS from the University of Pennsylvania, working under the supervision of Fernando Pereira, Zack Ives, and Mark Liberman. Partha is broadly interested in Machine Learning, Natural Language Processing, and Cognitive Neuroscience, with particular interest in large-scale learning and inference. He is a co-author of the book on Graph-based Semi-Supervised Learning published by Morgan Claypool Publishers.

  • Estimating Popularity from User Generated Content (By Saketha Nath J, IIT Bombay)

    Many online service portals today provide a feedback facility where users can comment on the quality of the service. In the product domain, there are portals/forums dedicated to product reviews. Given such a huge resource of online content that reflects user sentiment, an interesting and challenging exercise is to estimate the "average opinion" of the users on a product/service. We begin by noting an inherent limitation in the traditional machine learning techniques for solving such estimation problems and discuss how the state of the art can be improved. In particular, we formally state the problem of supervised class-ratio estimation and present novel learning bounds and algorithms for solving it. Simulations on benchmark datasets show significant improvement.

    Bio: Saketh is an associate professor in the Department of Computer Science and Engineering at IIT Bombay. He is broadly interested in the area of machine learning, with focus on kernel methods and optimization. Currently he is on a sabbatical at Microsoft, where he is working with the Bing team.

  • Noise to Intelligence: Using Social Signals and NLP in Analysing Social Media Content (By Vasudeva Varma, IIIT Hyderabad)

    Social media has become a good proxy for the real world. In many domains, the opinions and sentiments expressed in social media capture the pulse of people in the society. Thus, we are increasingly able to rely on social media to form opinions and take actions. However, there are inherent challenges in mining social media data. In this talk, I will first focus specifically on social media noise at the surface level and at the contextual level. After presenting a few interesting problems related to user generated content applications, I will discuss our current work on leveraging social signals and NLP for extracting information in the healthcare domain from social media.

    Bio: Vasudeva Varma is a Professor and the Dean (Research) at IIIT Hyderabad. He is interested in social media analysis, text understanding, information extraction, cloud computing and start-ups. He is the CEO of IIIT Hyderabad Foundation, which runs one of the largest technology incubators in India. The Foundation manages IIIT-H's IP and technology transfers. He co-founded Veooz Labs, a startup in the space of news aggregation and content discovery. He has published a book on Software Architecture (Pearson Education) and more than two hundred technical papers in journals and conferences. He was a visiting professor/researcher at UPV, Valencia (Spain), UBO, Bretagne (France) and Language Technologies Institute, CMU, Pittsburgh (USA). Earlier, he worked in New York and Silicon Valley, mostly with start-ups.

  • Challenges in Auto-Moderation of Google Maps UGC (By Ashish Gupta, Google)

    Google Maps is editable - people can submit changes and additions to Google Maps. While a large fraction of UGC contributions to Google Maps are good, we do get some bad and incorrect data as well. This presentation outlines the challenges in auto-moderation, whose goal is to identify and publish good UGC while keeping the bad and incorrect data away from Google Maps.

    Bio: Technical Lead and Manager for Google Maps Auto Moderation team.

  • Engaging a Community - What it takes to motivate people to contribute their time and skills (By Anurag Batra, Google)

    Every community of contributors has its own unique dynamics. Therefore, there can be no one formula for generating quality data from the crowd. The success of any crowdsourcing solution depends on how well we engage with the community. This talk takes a few examples of successes and failures to illustrate what it takes to engage with the community and make it vibrant and well-behaved, so we may truly harness the wisdom of the crowds.

    Bio: Anurag is Product Manager for Google Village, the platform that powers Google's crowdsourcing initiatives such as Translate Community and YouTube Fan Captions. Though an engineer at heart, he gets much delight from engaging with end users, focusing on delightful user experience, and building products that change people's lives in a meaningful way. He holds a Bachelors in Technology from IIT-Delhi. He loves trail biking and has been known to cook restaurant-quality Dal Makhani.

  • Relating Romanized Comments to News Articles by Inferring Multi-glyphic Topical Correspondence (By Goutham Tholpadi, Microsoft)

    Commenting is an attractive facility provided by news sites that, by actively engaging users, can increase the popularity of websites. Analyzing such user generated content can have practical applications and has recently attracted research interest. However, in multilingual societies such as India, analyzing such user-generated content is hard for several reasons. (1) There are more than 20 official languages and linguistic tools are available mainly for Hindi. But it is observed that people heavily use romanized text as it is easy and quick to type using an English keyboard, resulting in multi-glyphic comments, where the texts are in the same language but in different scripts. Such romanized texts are almost unexplored in machine learning so far. (2) In many cases comments are made on a specific part of the article rather than the entire topic. Off-the-shelf methods such as correspondence LDA are insufficient to model such relationships between article and comments. We extend the notion of correspondence in this paper to model multi-lingual, multi-script, and inter-lingual topics in a unified probabilistic model called the Multi-glyphic Correspondence Topic Model (MCTM). Using several metrics, we verify our approach and show that it improves over the state of the art. We are releasing an annotated dataset built for this purpose, to enable further research on this problem.

    Bio: Goutham Tholpadi is a PhD candidate at the Indian Institute of Science working with Prof. Chiranjib Bhattacharyya on machine learning applications to language processing, focusing on multilingual problems, especially in Indian languages. He works at Microsoft in the Bing Ads team focusing on relevance and related problems. Previously, he spent around 6 years in the industry at Infosys and SAP working on Business Intelligence.

  • Reviews and Ratings spam: UGC's bane in e-commerce (By Mohit Kumar, Flipkart)

    Reviews and ratings are the most critical user generated content in the e-commerce domain. Consumers rely very heavily on them in making their purchase decisions. Thus reviews and ratings become natural targets for spammers/fraudsters in order to manipulate perception of products/services. In this talk, we will discuss the ratings and reviews characteristics for e-commerce and then present a technique for fake rating identification using Bayesian inference, recently accepted for publication at SDM'2016.

    Bio: Mohit is a Principal Data Scientist at Flipkart. He currently focuses on Search and Personalisation at Flipkart. His research background has been in the areas of applied machine learning, active learning, speech and summarization. Prior to joining Flipkart, he worked with Accenture Technology Labs, Chicago for five and a half years in the Analytics research group and with IBM India Research Lab, Delhi for 2 years. He obtained his M.S. and Ph.D. in Computer Science from Carnegie Mellon University and B.Tech in Computer Science and Engineering from IIT Kharagpur.

  • Betting on goodness: How Google successfully lets users edit one of the most loved data sets in the planet! (By Jayanth Mysore, Homelane)

    Over a billion users use Google Maps every month! What most of these users (maybe even you?!) are not aware of is that Google puts the power of updating that canvas in the hands of every one of them. In this talk, I will cover the overall set of products that come together to make this happen. I'll explain how these products, along with an ML-driven moderation system, orchestrate the maker-checker system needed to keep users engaged while ensuring risky changes are handled effectively (oh well..). I will then share my observations about users who engage in UGC in this context - the behavioral characteristics that form the underpinning of a product/technology strategy to engage them. Finally, I'll provide my opinions on burning questions that people typically have - won't people spam the system? what if someone messes up a National Highway? what if...? what if...?

    Bio: Jayanth works as the Chief Product and Technology Officer of a start-up called HomeLane. Prior to this, he was a Product Manager at Google for about 9 years, where he worked on Google Checkout, Google Analytics, Google AdWords and Google Maps. He globally led the development of all products that enabled users to edit core map and local listings data and moderation systems to apply these edits to Maps. Earlier in his career, Jayanth worked as a Research Engineer at Motorola's corporate research labs working on IP multicasting, QoS and adaptive media systems. Jayanth holds a BE in Electrical Engineering from Guindy Engineering College, Chennai, an MS from the University of Illinois at Urbana-Champaign and an MBA from the Kellogg School of Management, Northwestern University.

  • Scaling the creation of multi-lingual content by empowering users to create hyper-local content (By Purvi Shah, Pratham Books)

    StoryWeaver is a collaborative web platform that was developed with the objective of helping create access to joyful reading material in Indian languages. All the reading materials are openly licensed for free consumption and the content creation tools on the platform can be used to version and translate the content that is available. Of the 1400 stories, 550 are contributed by users on the platform - 250 of those are language versions. The talk will explore some of the nuances of creating a participatory culture and how we are addressing issues of quality around user generated content.

    Bio: Purvi Shah leads all the Digital Projects at Pratham Books. In her 9-year-long association with the organization she has handled various functions including branding, strategy and new initiatives. She has led Pratham Books' foray into digital products. Her current work focuses on managing two platforms - StoryWeaver, an open source digital repository of multilingual stories for children, and Donate-a-Book, a unique crowd-funding platform that bridges the gap between those who need books and those who want to help provide books for children.

  • Crowdsourcing in Governance and Civic Society (By Thejesh GN, DataMeet)

    Crowdsourcing for data, ideas, complaints and suggestions is becoming common for both Governments and Civic Action groups. It's one of the ways citizens can participate in governance today. It's seen as a way to be in touch with ground reality and get data from the real world. In this talk we will look at some of the projects that deal with corruption, water distribution, mapping etc., and how they are affecting our civic discourse and engagement. Can UGC generated by citizens change the way we are governed?

    Bio: Thejesh GN "Thej" is an Independent Technologist from Bangalore, India. He is the co-founder and chairman of DataMeet Trust. DataMeet is a community of Data Science and Open Data enthusiasts in India. He loves hacking Open Source software. He is passionate about using technology and open data in the social sector. In 2010 he was awarded the Infosys Community Empathy Fellowship, which allowed him to spend a year working for a not-for-profit.

  • Algorithms for Position Bias Correction in Ranked Lists (By Anirban Majumder, Amazon)

    Ranking and measuring the performance (e.g., click-through rates) of items in ranked lists play an important role in applications like search, search advertising, etc. Estimating the click-through rate requires correcting for presentation bias. In this talk, I will present new algorithms for position bias correction. Our experimental results indicate that the proposed technique outperforms the baseline methods.

    Bio: Anirban works as a Senior ML Scientist at Amazon where he focuses on applying Machine Learning to various problems related to Ranking and Personalization. Prior to joining Amazon, Anirban worked as an MTS at Bell Labs India for 7 years. Anirban holds an MTech from IIT Kanpur in the area of data streaming algorithms and systems.

  • Spam filtering on socio-professional networks (By Uma Sawant, Megha Pandey & Sambuddha Roy, LinkedIn)

    It is commonly observed that any significant increase in the size of a socio-professional network is always accompanied by an increase in spam on the site. This is a natural corollary of the growth of a network: there are more unsuspecting members to target on such a site. Typically, spam comes with an ulterior motive - mostly monetary, at times with malicious intent separate from financial considerations. Tackling the problem of spam (and the related problem of low quality) content is of paramount importance in order to maintain a high quality user experience on the site; this directly drives engagement and other associated metrics for a website. In this talk we will discuss the many-pronged approaches that we have adopted in order to tackle the problem of spam on a popular socio-professional network. Our talk will consist largely of the methodologies utilized to this effect: we present our main theme of man plus machine underlying our work. We also discuss the content vs context angle wherein we leverage the network structure along with individual properties of entities in order to detect spam.

    Bio: Uma Sawant is a Senior Data Scientist at LinkedIn, Bangalore. She is also pursuing her PhD at IIT Bombay, Mumbai. Prior to starting her PhD, she worked at Yahoo Labs, Bangalore as a research engineer. Her research interests include Information Retrieval, Machine Learning and Data Mining.
    Megha is a Senior Data Scientist at LinkedIn. Her current work largely focuses on detecting spam and low quality content in images shared on socio-professional platforms. She has a background in computer vision. Previously she has worked with A*Star Singapore, and Kitware Inc., NY, USA. She holds an MS in Computer Science from UNC, Chapel Hill, and B.Tech+M.Tech in Electrical Engineering from IIT Bombay.
    Sambuddha Roy is a Staff Data Scientist & Manager at LinkedIn. He is currently working on spam filtering at LinkedIn; earlier stints include Amazon and IBM Research. He holds a Ph.D. in computer science from Rutgers University, New Jersey.

  • 'TweetGrep' & 'CrystalBall': Gauge and Shape Customer Opinions in Social Media (By Samik Datta, Flipkart)

    From the perspective of an Organisation employing Social Media Analytics to invigorate customer relations, two important problems arise: gauging opinions around a gamut of aspects related to the organisation, and shaping opinions by participating in ongoing conversations in a timely fashion. 'TweetGrep', a weakly-supervised aspect and sentiment classifier, speeds up the process of classifier deployment by alleviating the need for costly labeling exercises. 'CrystalBall' prioritises the organisation's participation in ongoing social media conversations by mining the structure as well as the content of the conversations.

    Bio: Samik is currently a Data Scientist with Flipkart, the Indian e-commerce poster boy!

