## Computing Research Week - August 2020

### Speakers

Keynote Speakers
Panel 1 - “Security in AI/Trustworthy AI” - Guests
Panel 2 - “What is Good Research” - Guests
Newly Joined Faculty

### Program

Monday, 3/8/2020

09:10 – 09:30 Opening Remarks

09:30 – 10:20 Keynote 1: TBD (Three Backoff Dilemmas) / How to Cooperate in a Decentralized World - Seth Lewis GILBERT

10:20 – 10:45 Systems Design in the Post-Moore’s Law Era - Jialin Li (Newly Joined Faculty)

10:45 – 11:10 Multi-Task Learning with User Preferences: Gradient Descent with Controlled Ascent in Pareto Optimization - Debabrata Mahapatra

11:10 – 11:20 Break

11:20 – 11:45 Smart Contracts and Their Identity Crisis - Alvaro Gonzalez Rivas

11:45 – 12:10 A Transactional Perspective on Execute-order-validate Blockchains - Pingcheng Ruan

12:10 – 13:10 Lunch Break

13:10 – 14:00 Keynote 2: Learning with Less Data: Data-Efficient Probabilistic ML - Bryan LOW

14:00 – 14:25 Effective Modeling of Encoder-Decoder Architecture for Joint Entity and Relation Extraction - Tapas Nayak

14:25 – 14:50 Sublinear Algorithms in T-interval Dynamic Networks - Irvan Jahja

14:50 – 15:00 Coffee break

15:00 – 15:25 On Partitioning Attacks against Bitcoin Peer-to-Peer Network - Muoi Tran

15:25 – 15:50 pH Watch - Leveraging Pulse Oximeters in Wearables for pH Sensing from Sweat - Ananta Balaji

15:50 – 16:15 The Effects of Digital Information Cues on Food Product Choices across Snack vs. Organic Foods - Ji Eun Shin

16:15 – 16:40 Alleviating Privacy Attacks via Causal Learning - Shruti Tople (SoC Alumni in Academia)

16:40 – 16:50 Break

16:50 – 18:00 Panel 1 - Security in AI/Trustworthy AI

Panel Topics:
Trends in Security Research in AI
Factoring in Privacy Considerations
Explainability in AI
Role of Research versus Regulation

Moderator: Kiran Gopinathan
Panelists:

Tuesday, 4/8/2020

09:30 – 10:20 Keynote 3: Next-generation Wearables - PEH Li Shiuan

10:20 – 10:45 Decidable Verification of Uninterpreted Programs - Umang Mathur (Newly Joined Faculty)

10:45 – 11:10 R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games - Dai Zhongxiang

11:10 – 11:20 Break

11:20 – 11:45 Dancing in the Land of Academic Job Search - Shweta Shinde (SoC Alumni in Academia)

11:45 – 12:10 Epione: Lightweight Contact Tracing with Strong Privacy - Kareem Shehata

12:10 – 13:10 Lunch Break

13:10 – 14:00 Keynote 4: AI with EQ - Analyzing Human Emotions - Stefan WINKLER

14:00 – 14:25 COGAM: Measuring and Moderating Cognitive Load in Machine Learning Model Explanations - Ashraf Abdul

14:25 – 14:50 Strategies for Quantum Races - Maharshi Ray

14:50 – 15:00 Coffee break

15:00 – 15:25 Mining Pathway Associations from Networks of Mutual Exclusivity Interactions - Herty Liany

15:50 – 16:15 Scrutinizing Implementations of Smart Home Integrations - Kulani Mahadewa

16:15 – 16:40 HTTP-based Low-Latency Live Video Streaming: Challenges and Solutions - Abdelhak Bentaleb (SoC Alumni in Academia)

16:40 – 16:50 Break

16:50 – 18:00 Panel 2 - What is Good Research

Panel Topics:
Selection of Research Topics
Challenges
Theory vs. Practice
Research Contribution
Research Ethics

Moderator: Oteng Ntsweng
Panelists:

Wednesday, 5/8/2020

09:30 – 10:20 Keynote 5: Network Programmability in the Data-Plane - CHAN Mun Choon

10:20 – 10:45 Large Batch Optimization for Deep Learning: Training BERT in 76 minutes - Yang You (Newly Joined Faculty)

10:45 – 11:10 On Adversarial Bias and the Robustness of Fair Machine Learning - Hongyan Chang

11:10 – 11:20 Break

11:45 – 12:10 Attacks Which Do Not Kill Training Make Adversarial Learning Stronger - Zhang Jingfeng

12:10 – 13:10 Lunch Break

13:10 – 14:00 Keynote 6: The Rise of Model Counting: A Child of SAT Revolution - Kuldeep Singh MEEL

14:00 – 14:25 Estimating Total Body Surface Area Using Subject-specific Features - Kim-Ngan Nguyen

14:25 – 14:50 Hardness of Approximation of (Multi-)LCS over Small Alphabet - Rajendra Kumar

14:50 – 15:00 Coffee break

15:00 – 15:25 Detecting Transpositions from DNA Sequencing Data - Ramesh Rajaby

15:25 – 15:50 Mechanized Verification of Graph-manipulating Programs - Shengyi Wang

15:50 – 16:15 Smart Contract Repair - Xiao Liang Yu

16:15 – 16:40 Semantic Program Repair - Sergey Mechtaev (SoC Alumni in Academia)

Thursday, 6/8/2020

09:30 – 10:20 Keynote 7: Healthcare Analytics: Learning from Multiple Heterogeneous Data Sources - Vaibhav RAJAN

10:20 – 10:45 Finding Reachability in Faulty Graphs - Diptarka Chakraborty (Newly Joined Faculty)

10:45 – 11:10 Manthan: A Data-Driven Approach for Boolean Functional Synthesis - Priyanka Golia

11:10 – 11:20 Break

11:20 – 11:45 Digitalization in Two Retail Food and Beverage Companies: A Resource Orchestration View - Mariya Tsyganova

11:45 – 12:10 On the Privacy Risks of Model Explanations - Martin R. Strobel

12:10 – 13:10 Lunch Break

14:00 – 14:25 CnGAN: Generative Adversarial Networks for Cross-network User Preference Generation for Non-overlapped Users - Dilruk Perera

14:25 – 14:50 Quantitative Verification of Neural Networks And Its Security Application - Teodora Baluta

14:50 – 15:00 Coffee break

15:00 – 15:25 What Your Gait Reveals About You - Sanka Rasnayaka

15:25 – 15:50 Certifying Certainty and Uncertainty in Approximate Membership Query Structures - Kiran Gopinathan

15:50 – 16:15 Risk Prediction in Intensive Care Units - Ghanvatkar Suparna Deepak

16:15 – 16:40 A Value-Based View of Crowdfunding Rewards and Crowdfunding Performance - Lusi Yang (SoC Alumni in Academia)

16:40 – 17:00 Closing Remarks

### Voting

Thanks to a kind donation from Prof. Lee Wee Sun, this year we have cash awards for the top three student speakers, selected by an open poll among the SoC community:

#### Top 3: $200

Come and vote for your favorite talks: https://forms.office.com/Pages/ResponsePage.aspx?id=Xu-lWwkxd06Fvc_rDTR-gqMweBg45CBNn4bl7fLCgQxUQVYxNkQ2NjUwRTI4OU8zNzdHVjM4SEJUNS4u

The QR code for the link is also attached. You'll need a valid SoC account to vote. The weight ratio of staff votes to student votes is 2:1.

### Organizers

Program Committee Chairs
Program Committee Members
Special Thanks To:

### Program Details

Monday, 3/8/2020, 09:30 – 10:20
TBD (Three Backoff Dilemmas) / How to Cooperate in a Decentralized World – Seth Lewis GILBERT

Distributed algorithms are all about cooperation: how do we coordinate a collection of devices to accomplish some task? One of the central problems in distributed algorithms is contention resolution, where devices want to share some resource (like a wireless channel). In this talk we revisit the classical problem of randomized exponential backoff on a multiple-access channel. We examine randomized backoff with respect to throughput, robustness to jamming or failures, and the number of attempts to access the channel, discussing recent progress and open questions.

Monday, 3/8/2020, 10:20 – 10:45
Systems Design in the Post-Moore’s Law Era – Jialin Li (Newly Joined Faculty)

With the end of Dennard scaling and Moore's Law, performance improvement of general-purpose processors has been stagnant for the past decade. This is in contrast to the continuous growth in network speed in data centers and telecommunication networks, and the increasing demands of modern applications. Not surprisingly, CPU processing, particularly network packet processing, has become the performance bottleneck of many large-scale systems deployed in data centers. In this talk, I will first introduce a new approach to designing distributed systems in data centers that tackles the aforementioned challenge -- co-designing distributed systems with the data center network.
Specifically, my work has taken advantage of new-generation programmable switches in data centers to build novel network-level primitives with near-zero processing overhead. We then leverage these primitives to enable more efficient protocol and systems designs. I will describe two systems I built that demonstrate the benefits of this approach. The first, Network-Ordered Paxos, virtually eliminates the coordination overhead in a state machine replication system by relying on network sequencing primitives to consistently order user requests. The second, Pegasus, substantially improves the load balancing of a distributed storage system -- up to a 9x throughput improvement over existing solutions -- by implementing an in-network coherence directory in the switch data plane. I will end the talk with some future directions in this space. In particular, I will propose a portable hardware acceleration framework for Network Function Virtualization (NFV).

Monday, 3/8/2020, 10:45 – 11:10
Multi-Task Learning with User Preferences: Gradient Descent with Controlled Ascent in Pareto Optimization – Debabrata Mahapatra

Multi-Task Learning (MTL) is a well-established paradigm for jointly learning models for multiple correlated tasks. Often the tasks conflict, requiring trade-offs between them during optimization. In such cases, multi-objective optimization based MTL methods can be used to find one or more Pareto optimal solutions. A common requirement in MTL applications, which cannot be addressed by these methods, is to find a solution satisfying user-specified preferences with respect to task-specific losses. We advance the state of the art by developing the first gradient-based multi-objective MTL algorithm to solve this problem. Our unique approach combines multiple gradient descent with carefully controlled ascent to traverse the Pareto front in a principled manner, which also makes it robust to initialization.
The scalability of our algorithm enables its use in large-scale deep networks for MTL. Assuming only differentiability of the task-specific loss functions, we provide theoretical guarantees for convergence. Our experiments show that our algorithm outperforms the best competing methods on benchmark datasets.

Monday, 3/8/2020, 11:20 – 11:45
Smart Contracts and Their Identity Crisis – Alvaro Gonzalez Rivas

Many expect Smart Contracts (SCs) to disrupt the way contracts are done, implying that SCs have the potential to affect all commercial relationships. SCs are automation tools; therefore, proponents claim that SCs can reduce transaction costs through disintermediation and risk reduction. This is an over-simplification of the role of relationships, contract law, and risk. We believe there is a gap in the understanding of the capabilities of SCs. With that in mind, we seek to define an amorphous term and clarify the capabilities of SCs, intending to facilitate future SC research. We have examined the legal, technical, and IS views from both academic and practitioner perspectives. We conclude that SCs have taken many forms, becoming a suitcase word for any sort of code stored on a blockchain, including the embodiment of contractual terms; and that the immutable nature of SCs is a barrier to their adoption in uncertain and multi-contextual environments.

Monday, 3/8/2020, 11:45 – 12:10
A Transactional Perspective on Execute-order-validate Blockchains – Pingcheng Ruan

Smart contracts have enabled blockchain systems to evolve from simple cryptocurrency platforms, such as Bitcoin, to general transactional systems, such as Ethereum. Catering to emerging business requirements, a new architecture called execute-order-validate has been proposed in Hyperledger Fabric to support parallel transactions and improve the blockchain's throughput. However, this new architecture may render many transactions invalid when serializing them.
This problem is further exacerbated because the block formation rate is inherently limited by factors besides data processing, such as cryptography and consensus. In this work, we propose a novel method to enhance the execute-order-validate architecture by reducing invalid transactions, thereby improving the throughput of blockchains. Our method is inspired by state-of-the-art optimistic concurrency control techniques in modern database systems. In contrast to existing blockchains that adopt preventive approaches from databases, which may abort serializable transactions, our method is theoretically more fine-grained. Specifically, unserializable transactions are aborted before ordering, and the remaining transactions are guaranteed to be serializable. For evaluation, we implement our method in two blockchains: FabricSharp, on top of Hyperledger Fabric, and FastFabricSharp, on top of FastFabric. We compare the performance of FabricSharp against vanilla Fabric and three related systems, two of which are implemented with a standard and a state-of-the-art concurrency control technique from databases, respectively. The results demonstrate that FabricSharp achieves 25% higher throughput than the other systems in nearly all experimental scenarios. Moreover, FastFabricSharp's improvement over FastFabric is up to 66%.

Monday, 3/8/2020, 13:10 – 14:00
Learning with Less Data: Data-Efficient Probabilistic ML – Bryan LOW

To be updated ...

Monday, 3/8/2020, 14:00 – 14:25
Effective Modeling of Encoder-Decoder Architecture for Joint Entity and Relation Extraction – Tapas Nayak

A relation tuple consists of two entities and the relation between them, and such tuples are often found in unstructured text. There may be multiple relation tuples present in a text, and they may share one or both entities. Extracting such relation tuples from a sentence is a difficult task, and the sharing of entities or overlapping entities among the tuples makes it more challenging.
Most prior work adopted a pipeline approach in which entities were identified first, followed by finding the relations among them, thus missing the interaction among the relation tuples in a sentence. In this paper, we propose two approaches that use an encoder-decoder architecture for jointly extracting entities and relations. In the first approach, we propose a representation scheme for relation tuples which enables the decoder to generate one word at a time, like machine translation models, and still find all the tuples present in a sentence, with full entity names of different lengths and with overlapping entities. Next, we propose a pointer network-based decoding approach where an entire tuple is generated at every time step. Experiments on the publicly available New York Times corpus show that our proposed approaches outperform previous work and achieve significantly higher F1 scores.

Monday, 3/8/2020, 14:25 – 14:50
Sublinear Algorithms in T-interval Dynamic Networks – Irvan Jahja

To be updated ...

Monday, 3/8/2020, 15:00 – 15:25
On Partitioning Attacks against Bitcoin Peer-to-Peer Network – Muoi Tran

We present the Erebus attack, which allows large malicious Internet Service Providers (ISPs) to isolate any targeted public Bitcoin nodes from the Bitcoin peer-to-peer network. The Erebus attack does not require routing manipulation (e.g., BGP hijacks), and hence it is virtually undetectable to any control-plane and even typical data-plane detectors. The Erebus attack also works against other cryptocurrencies with a similar network codebase, such as Litecoin, Bitcoin Cash, and ZCash.

Monday, 3/8/2020, 15:25 – 15:50
pH Watch - Leveraging Pulse Oximeters in Wearables for pH Sensing from Sweat – Ananta Balaji

Sweat is a readily accessible bodily fluid for detecting biomarkers such as pH and glucose, enabling continuous and non-invasive assessment of the well-being of individuals.
Our proposed work aims at leveraging pulse oximeter chips in current-day fitness trackers for real-time continuous monitoring of pH in sweat. We achieve this by fabricating a highly responsive and long-term reusable pH sweat sensor on a flexible material to achieve skin conformity, designing the sensor to work at the reflected infrared (880 nm) and red (660 nm) photoplethysmograph (PPG) signal intensities recorded by pulse oximeters. The sensor can be readily mounted atop any wearable with a pulse oximeter. We have successfully demonstrated a low-cost, low-power, highly responsive and long-term reusable wrist-worn wearable prototype, pH Watch, for real-time continuous monitoring of the pH value of sweat. We conducted on-body trials with 10 participants, and pH Watch achieves an accuracy of approximately 91%. We also showed that the integration of our sweat sensor does not hinder the pulse oximeter from measuring heart rate and SpO2, and users can continue with their daily activities, with motion artifacts removed efficiently from PPG signals using the TROIKA framework, resulting in heart rate and SpO2 measurements with accuracies of approximately 95% and 96%, respectively, when validated against commercial finger pulse oximeter measurements. To the best of our knowledge, pH Watch is the first demonstration of a reusable sweat sensor that can be readily integrated into today's smart watches with pulse oximeters, paving the way for ubiquitous sensing of biomarkers.

Monday, 3/8/2020, 15:50 – 16:15
The Effects of Digital Information Cues on Food Product Choices across Snack vs. Organic Foods – Ji Eun Shin

Improving nutrition knowledge is a key enabler for people to make informed food product choices. Building on the foundation of information processing theory, we examine how two information sources, authority (expert grade) and social proof (user ratings) cues, affect users’ food product choices.
Furthermore, based on the categorization of SEC attributes of consumer products (i.e., organic foods primarily with credence attributes vs. snacks with dominant experience attributes), we examine the moderating effect of SEC attributes on the relationship between information cues and food product choices. First, employing a regression discontinuity design (RDD) as an identification strategy, the results confirm the significant positive effect of the authority cue on food choices and the interaction effect of products with credence attributes on the relationship between the authority cue and the choices. Second, using a panel regression model, we find that organic food products are more likely to be chosen when a favorable authority cue is given, while snack products are more likely to be chosen given a higher volume of a social proof cue.

Monday, 3/8/2020, 16:15 – 16:40
Alleviating Privacy Attacks via Causal Learning – Shruti Tople (SoC Alumni in Academia)

Machine learning models, especially deep neural networks, have been shown to be susceptible to privacy attacks such as membership inference, in which an adversary can detect whether a data point was used for training a black-box model. Such privacy risks are exacerbated when a model's predictions are used on an unseen data distribution. To alleviate privacy attacks, we demonstrate the benefit of predictive models that are based on the causal relationships between input features and the outcome. We first show that models learnt using causal structure generalize better to unseen data, especially on data from distributions different from the training distribution. Based on this generalization property, we establish a theoretical link between causality and privacy: compared to associational models, causal models provide stronger differential privacy guarantees and are more robust to membership inference attacks.
Experiments on simulated Bayesian networks and the colored-MNIST dataset show that associational models exhibit up to 80% attack accuracy under different test distributions and sample sizes, whereas causal models exhibit attack accuracy close to a random guess.

Tuesday, 4/8/2020, 09:30 – 10:20
Next-generation Wearables – PEH Li Shiuan

Wearables will be as pervasive as phones. Yet current wearables such as smartwatches and fitness trackers have fairly limited functionality. What lies ahead for wearables? In this talk, I will first take a look at the characteristics of wearable applications of the future and discuss the implications for the architecture of next-generation wearable chips. I will then walk through recent research into wearable chip architectures and prototypes. Finally, the talk will round off with a glimpse into next-generation application scenarios for wearables.

Tuesday, 4/8/2020, 10:20 – 10:45
Decidable Verification of Uninterpreted Programs – Umang Mathur (Newly Joined Faculty)

To be updated ...

Tuesday, 4/8/2020, 10:45 – 11:10
R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games – Dai Zhongxiang

This paper presents a recursive reasoning formalism of Bayesian optimization (BO) to model the reasoning process in the interactions between boundedly rational, self-interested agents with unknown, complex, and costly-to-evaluate payoff functions in repeated games, which we call Recursive Reasoning-Based BO (R2-B2). Our R2-B2 algorithm is general in that it does not constrain the relationship among the payoff functions of different agents and can thus be applied to various types of games such as constant-sum, general-sum, and common-payoff games. We prove that by reasoning at level 2 or higher, and at one level higher than the other agents, our R2-B2 agent can achieve faster asymptotic convergence to no regret than agents that do not utilize recursive reasoning.
We also propose a computationally cheaper variant of R2-B2, called R2-B2-Lite, at the expense of a weaker convergence guarantee. The performance and generality of our R2-B2 algorithm are empirically demonstrated using synthetic games, adversarial machine learning, and multi-agent reinforcement learning.

Tuesday, 4/8/2020, 11:20 – 11:45
Dancing in the Land of Academic Job Search – Shweta Shinde (SoC Alumni in Academia)

With the start of a fresh academic year, many of you will find yourselves rushing for the next milestone in your PhD, be it course work, QE, research paper submissions, thesis proposal, or defense. As you plan for near-term goals, it pays to invest some time in thinking about the endgame. In this talk, I will share my experience of how I reached my destination through a winding path: from a clueless research intern, to a PhD student with impostor syndrome, to a post-doc in a pandemic-ridden job market, and now an incoming tenure-track faculty member at ETH Zurich. Reflecting on this journey, I will highlight the critical factors that helped me get here, most of which were non-obvious at the time. I will share these key insights along with anecdotal experiences from my job search in the academic market.

Tuesday, 4/8/2020, 11:45 – 12:10
Epione: Lightweight Contact Tracing with Strong Privacy – Kareem Shehata

Contact tracing is an essential tool in containing infectious diseases such as COVID-19. Many countries and research groups have launched or announced mobile apps to facilitate contact tracing by recording contacts between users, with some privacy considerations. Most of the focus has been on using random tokens, which are exchanged during encounters and stored locally on users' phones. Prior systems allow users to search over released tokens in order to learn if they have recently been in the proximity of a user who has since been diagnosed with the disease. However, prior approaches do not provide end-to-end privacy in the collection and querying of tokens.
In particular, these approaches are vulnerable to linkage attacks by users using token metadata, linkage attacks by the server, or false reporting by users. In this work, we introduce Epione, a lightweight system for contact tracing with strong privacy protections. Epione alerts users directly if any of their contacts have been diagnosed with the disease, while protecting the privacy of users' contacts from both central services and other users, and provides protection against false reporting. As a key building block, we present a new cryptographic tool for secure two-party private set intersection cardinality (PSI-CA), which allows two parties, each holding a set of items, to learn the size of the intersection of their private sets without revealing the items in it. We specifically tailor it to the case of large-scale contact tracing, where clients have small input sets and the server's database of tokens is much larger.

Tuesday, 4/8/2020, 13:10 – 14:00
AI with EQ - Analyzing Human Emotions – Stefan WINKLER

While most AI researchers focus mainly on the “IQ” aspect of intelligence, emotional intelligence, or “EQ”, is just as important for machines to be able to interact with humans effectively and naturally. In this seminar, I will discuss work in which we explore the emotional aspects of image understanding. The first is photowork: ubiquitous and affordable digital cameras have led to an explosion of the amount of image material both amateurs and professionals have to work with. Assessing, selecting, editing, organizing, annotating, and browsing this large amount of visual data is tedious and time-consuming. Our aim in this project is to automate some of these processes. Our approaches are content-based and focus on family photo collections, where people and their emotions play a major role. The second is on profiling people, with a focus on their affective states. Facial expressions are an essential component for conveying and understanding human emotions.
Contrary to most existing approaches, we avoid the classification of emotions into a few predefined categories, and instead follow a dimensional paradigm as represented by the circumplex model. Based on the tracking of facial landmark points and relevant geometrical features, we directly estimate arousal, valence, and intensity of emotion. We discuss the benefits of our method and also present some of its applications.

Tuesday, 4/8/2020, 14:00 – 14:25
COGAM: Measuring and Moderating Cognitive Load in Machine Learning Model Explanations – Ashraf Abdul

Interpretable machine learning models trade off accuracy for simplicity to make explanations more readable and easier to comprehend. Drawing from cognitive psychology theories in graph comprehension, we formalize readability as visual cognitive chunks to measure and moderate the cognitive load in explanation visualizations. We present Cognitive-GAM (COGAM) to generate explanations with desired cognitive load and accuracy by combining expressive nonlinear generalized additive models (GAMs) with simpler sparse linear models. We calibrated visual cognitive chunks with reading time in a user study, characterized the trade-off between cognitive load and accuracy for four datasets in simulation studies, and evaluated COGAM against baselines with users. We found that COGAM can decrease cognitive load without decreasing accuracy, and/or increase accuracy without increasing cognitive load. Our framework and empirical measurement instruments for cognitive load will enable more rigorous assessment of the human interpretability of explainable AI.

Tuesday, 4/8/2020, 14:25 – 14:50
Strategies for Quantum Races – Maharshi Ray

We initiate the study of quantum races, games in which two or more quantum computers compete to solve a computational problem. While the problem of dueling algorithms has been studied for classical deterministic algorithms, the quantum case presents additional sources of uncertainty for the players.
The foremost among these is that players do not know whether they have solved the problem until they measure their quantum state. This question of "when to measure?" presents a very interesting strategic problem. We develop a game-theoretic model of a multiplayer quantum race and find an approximate Nash equilibrium where all players play the same strategy. In the two-party case, we further show that this strategy is nearly optimal in terms of payoff among all symmetric Nash equilibria. A key role in our analysis of quantum races is played by a more tractable version of the game where there is no payout on a tie; for such races we completely characterize the Nash equilibria in the two-party case. One application of our results is to the stability of the Bitcoin protocol when mining is done by quantum computers. Bitcoin mining is a race to solve a computational search problem, with the winner gaining the right to create a new block. Our results inform the strategies that eventual quantum miners should use, and also indicate that the collision probability -- the probability that two miners find a new block at the same time -- would not be too high in the case of quantum miners. Such collisions are undesirable as they lead to forking of the Bitcoin blockchain.

Tuesday, 4/8/2020, 15:00 – 15:25
Mining Pathway Associations from Networks of Mutual Exclusivity Interactions – Herty Liany

To be updated ...

Tuesday, 4/8/2020, 15:50 – 16:15
Scrutinizing Implementations of Smart Home Integrations – Kulani Mahadewa

A key feature of the booming smart home is the integration of a wide assortment of technologies, including various standards, proprietary communication protocols, and heterogeneous platforms. Due to customization, unsatisfied assumptions, and incompatibility in the integration, critical security vulnerabilities are likely to be introduced by the integration.
Hence, this work addresses the security problems in smart home systems from an integration perspective, as a complement to numerous studies that focus on the analysis of individual techniques. We propose HOMESCAN, an approach that examines the security of the implementations of smart home systems. It extracts the abstract specification of application-layer protocols and the internal behaviors of entities, so that it is able to conduct an end-to-end security analysis against various attack models. Applying HOMESCAN to three extensively used smart home systems, we have found twelve non-trivial security issues, which may lead to unauthorized remote control and credential leakage.

Tuesday, 4/8/2020, 16:15 – 16:40
HTTP-based Low-Latency Live Video Streaming: Challenges and Solutions – Abdelhak Bentaleb (SoC Alumni in Academia)

Low-Latency Live (LLL) streaming remains a challenge in the adaptive streaming space due to the stringent requirements for not just quality and rebuffering, but also latency. The goal of LLL streaming is to achieve near one-second latency when delivering live content. Making latency, i.e., the time difference between video capture and playback, short is not easy because in that period the video has to be captured, encoded, packaged, and transferred from a server to a client, which may be far away from each other and subject to unpredictable network conditions. What makes this even more difficult is that latency is oftentimes not the only consideration in users’ desired quality of experience (QoE) — other conflicting factors such as high video quality, low rebuffering rate, and fewer quality switches also need to be considered. Many solutions have been proposed to tackle streaming in general, but only a few have looked into better catering to the more challenging LLL streaming scenarios.
In this talk, I will present the main challenges in LLL streaming and then present our latest solution, which tackles these challenges to enhance LLL streaming performance. The proposed solution revisits and extends several important components (collectively called Low-on-Latency, LoL) in adaptive streaming systems. LoL includes three essential modules: bitrate adaptation (both heuristic and learning-based), playback control, and throughput measurement.

Wednesday, 5/8/2020, 09:30 – 10:20
Network Programmability in the Data-Plane – CHAN Mun Choon

The arrival of programmable switching ASICs has made it possible to implement distributed algorithms and applications in the data plane of the network switch. In this talk, I will first present the motivation and recent evolution of network programming. Next, I will present some recent work we have done using data-plane programmability, including: (1) BurstRadar, a system that monitors microbursts; (2) SQR, a scheme for in-network packet loss recovery; and (3) DPTP, a time synchronization protocol with the core logic running in the data plane. All systems have been implemented on the Barefoot Tofino switch using the P4 language. Finally, I will highlight some ongoing work and future directions.

Wednesday, 5/8/2020, 10:20 – 10:45
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes – Yang You (Newly Joined Faculty)

In the last three years, supercomputers have become increasingly popular in leading AI companies. Amazon built a High-Performance Computing (HPC) cloud. Google released its first 100-petaflop supercomputer (TPU Pod). Facebook made a submission to the Top500 supercomputer list. Why do they like supercomputers? Because the computation of deep learning is very expensive. For example, even with 16 TPUs, BERT training takes more than 3 days. On the other hand, supercomputers can process 10^17 floating-point operations per second.
So why don’t we just use supercomputers and finish the training of deep neural networks in a very short time? The reason is that deep learning does not have enough parallelism to make full use of the thousands or even millions of processors in a typical modern supercomputer. There are two directions for parallelizing deep learning: model parallelism and data parallelism. Model parallelism is very limited. For data parallelism, current optimizers cannot scale to thousands of processors because large-batch training is a sharp minimum problem. In this talk, I will introduce the LARS (Layer-wise Adaptive Rate Scaling) and LAMB (Layer-wise Adaptive Moments for Batch training) optimizers, which can find more parallelism for deep learning. They can not only make deep learning systems scale well, but also help real-world applications achieve higher accuracy. Since 2017, all the ImageNet training speed world records have been achieved using LARS. LARS was added to MLPerf, the industry benchmark for fast deep learning. Google used LAMB to reduce BERT training time from 3 days to 76 minutes and achieve new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks. The approaches introduced in this talk have been used in state-of-the-art distributed systems at Google, Intel, NVIDIA, Sony, Tencent, and so on.

Wednesday, 5/8/2020, 10:45 – 11:10 On Adversarial Bias and the Robustness of Fair Machine Learning – Hongyan Chang

Optimizing prediction accuracy can come at the expense of fairness. Towards minimizing discrimination against a group, fair machine learning algorithms strive to equalize the behavior of a model across different groups by imposing a fairness constraint on models. However, we show that giving the same importance to groups of different sizes and distributions, to counteract the effect of bias in training data, can be in conflict with robustness.
We analyze data poisoning attacks against group-based fair machine learning, with a focus on equalized odds. An adversary who can control sampling or labeling for a fraction of the training data can reduce the test accuracy significantly beyond what is achievable on unconstrained models. Adversarial sampling and adversarial labeling attacks can also worsen the model’s fairness gap on test data, even though the model satisfies the fairness constraint on training data. We analyze the robustness of fair machine learning through an empirical evaluation of attacks on multiple algorithms and benchmark datasets.

Wednesday, 5/8/2020, 11:45 – 12:10 Attacks Which Do Not Kill Training Make Adversarial Learning Stronger – Zhang Jingfeng

Adversarial training based on the minimax formulation is necessary for obtaining adversarial robustness of trained models. However, it is conservative or even pessimistic, so that it sometimes hurts natural generalization. In this paper, we raise a fundamental question: do we have to trade off natural generalization for adversarial robustness? We argue that adversarial training is to employ confident adversarial data for updating the current model. We propose a novel approach of friendly adversarial training (FAT): rather than employing the most adversarial data maximizing the loss, we search for the least adversarial (i.e., friendly adversarial) data minimizing the loss, among the adversarial data that are confidently misclassified. Our novel formulation is easy to implement by just stopping the most-adversarial-data searching algorithms, such as PGD (projected gradient descent), early, which we call early-stopped PGD. Theoretically, FAT is justified by an upper bound of the adversarial risk. Empirically, early-stopped PGD allows us to answer the earlier question negatively: adversarial robustness can indeed be achieved without compromising natural generalization.
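
The early-stopping idea described in the FAT abstract can be sketched in a few lines. The toy below runs PGD against a fixed linear classifier in NumPy (the model, step size and budget are illustrative, not from the paper) and returns the first misclassified, i.e. friendly adversarial, example rather than the worst-case one:

```python
import numpy as np

def early_stopped_pgd(x, y, w, b, eps=0.5, alpha=0.1, steps=10):
    """Early-stopped (friendly) PGD against a linear classifier
    f(x) = sign(w.x + b), with label y in {-1, +1}. Standard PGD would
    run all `steps` ascent iterations; the friendly variant returns as
    soon as the perturbed point is misclassified."""
    x_adv = x.astype(float).copy()
    for _ in range(steps):
        if np.sign(w @ x_adv + b) != y:           # misclassified: stop early
            break
        grad = -y * w                              # loss gradient w.r.t. x (hinge-style)
        x_adv = x_adv + alpha * np.sign(grad)      # gradient-ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project back into the L_inf ball
    return x_adv

w, b = np.array([1.0, 1.0]), 0.0
x, y = np.array([0.3, 0.2]), 1                    # correctly classified as +1
x_adv = early_stopped_pgd(x, y, w, b)
print(np.sign(w @ x_adv + b))                     # → -1.0 (just across the boundary)
```

Because the loop exits at the first misclassification, the returned point sits barely on the wrong side of the decision boundary instead of deep inside the adversarial region.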
Wednesday, 5/8/2020, 13:10 – 14:00 The Rise of Model Counting: A Child of SAT Revolution – Kuldeep Singh MEEL

The paradigmatic NP-complete problem of Boolean satisfiability (SAT) is a central problem in Computer Science. The past 20 years have witnessed a SAT revolution with the development of conflict-driven clause-learning (CDCL) SAT solvers. Such solvers combine a classical backtracking search with a rich set of effective heuristics. While 20 years ago SAT solvers were able to solve instances with at most a few hundred variables, modern SAT solvers solve instances with up to millions of variables in a reasonable time. The SAT revolution opens up opportunities to design practical algorithms with rigorous guarantees for problems in complexity classes beyond NP, by replacing an NP oracle with a SAT solver. In this talk, we will discuss how we use the SAT revolution to design practical algorithms for one of the fundamental problems in formal methods and artificial intelligence: model counting. Model counting is a fundamental computational problem with applications in diverse areas spanning neural network verification, reliability estimation, explainable AI, probabilistic inference, security vulnerability analysis, and the like. While counting has been studied extensively by theoreticians for the past three decades, it has been extremely difficult to reduce this theory to practice. We examine how the process of revisiting and refining the theory to leverage the SAT revolution has led to the development of the first scalable framework for model counting: ApproxMC. ApproxMC can handle industrial-scale problem instances involving hundreds of thousands of variables, while providing provably strong approximation guarantees.
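
For readers unfamiliar with the problem: model counting asks for the number of satisfying assignments of a formula. A brute-force counter makes the definition concrete (the clause encoding and the example formula below are illustrative; ApproxMC replaces this exponential enumeration with hashing-based calls to a SAT solver):

```python
from itertools import product

def count_models(n_vars, clauses):
    """Exact model counting by enumeration. A clause is a list of signed
    integers, e.g. [1, -2] means (x1 OR NOT x2); the formula is the
    conjunction of all clauses. Runs in O(2^n_vars), which is exactly
    what scalable counters like ApproxMC avoid."""
    count = 0
    for bits in product([False, True], repeat=n_vars):
        assign = {i + 1: bits[i] for i in range(n_vars)}
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            count += 1
    return count

# (x1 OR x2) AND (NOT x1 OR x3) over 3 variables has 4 models.
print(count_models(3, [[1, 2], [-1, 3]]))  # → 4
```

An approximate counter returns a value within a (1 + epsilon) factor of this exact count, with high probability, without ever enumerating the assignment space.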
Wednesday, 5/8/2020, 14:00 – 14:25 Estimating Total Body Surface Area Using Subject-specific Features – Kim-Ngan Nguyen

Estimating total body surface area (TBSA) is an important task, as it helps doctors evaluate the severity of a wound for each patient. Ideally, a 3D model of each subject’s body is needed to accurately measure the TBSA. However, 3D scanners are expensive and not portable. Traditional methods, such as the Rule of Nine [1], Rule of Palm [2], and Lund-Browder chart [3], derive TBSA from generic human figures. As a result, they are inaccurate because they do not account for variation among human bodies. To address this limitation, other methods estimate TBSA from the height and weight of a specific subject [4, 5, 6]. So far, none of the previous works consider other visual information about a subject, such as facial and hand features. In this research work, we apply a linear regression model to identify the relationship between the TBSA of the 3D model and subject-specific features, namely facial features, hand features, height, weight, and gender.

Wednesday, 5/8/2020, 14:25 – 14:50 Hardness of Approximation of (Multi-)LCS over Small Alphabet – Rajendra Kumar

The problem of finding the longest common subsequence (LCS) is one of the fundamental problems in computer science, with applications in fields such as computational biology, text processing, information retrieval, data compression, etc. It is well known that the (decision version of the) problem of finding the length of an LCS of an arbitrary number of input sequences (which we refer to as the Multi-LCS problem) is NP-complete. Jiang and Li [SICOMP'95] showed that if Max-Clique is hard to approximate within a factor of $s$, then Multi-LCS is also hard to approximate within a factor of $\Theta(s)$.
By the NP-hardness of approximating Max-Clique, shown by Zuckerman [ToC'07], for any constant $\delta>0$, the length of an LCS of an arbitrary number of input sequences of length $n$ each cannot be approximated within an $n^{1-\delta}$-factor in polynomial time unless $\mathtt{P}=\mathtt{NP}$. However, the reduction of Jiang and Li assumes the alphabet size to be $\Omega(n)$. So far, no hardness result is known for the problem of approximating Multi-LCS over a sub-linear-sized alphabet. On the other hand, it is easy to get a $1/|\Sigma|$-factor approximation for strings over an alphabet $\Sigma$. In this work, we make significant progress towards proving hardness of approximation over small alphabets by showing a polynomial-time reduction from the well-studied \emph{densest $k$-subgraph} problem with \emph{perfect completeness} to approximating Multi-LCS over an alphabet of size $poly(n/k)$. As a consequence, from the known hardness result for the densest $k$-subgraph problem (e.g. [Manurangsi, STOC'17]) we get that no polynomial-time algorithm can give an $n^{-o(1)}$-factor approximation of Multi-LCS over an alphabet of size $n^{o(1)}$, unless the Exponential Time Hypothesis is false.
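
The "easy $1/|\Sigma|$-factor approximation" mentioned in the abstract can be made concrete by restricting attention to common subsequences that repeat a single symbol: some symbol must account for at least a $1/|\Sigma|$ share of an optimal LCS. A short sketch (illustrative, not from the paper):

```python
def multi_lcs_baseline(strings):
    """1/|Sigma|-factor approximation for Multi-LCS: for each symbol c,
    the string c^k with k = min over inputs of count(c) is a common
    subsequence of all inputs. The best single symbol achieves length
    at least OPT/|Sigma|, because in any optimal LCS some symbol
    occupies at least that fraction of the positions."""
    alphabet = set().union(*strings)
    best = max(alphabet, key=lambda c: min(s.count(c) for s in strings))
    return best * min(s.count(best) for s in strings)

print(multi_lcs_baseline(["aab", "aba", "baa"]))  # → aa
```

Here $|\Sigma| = 2$ and the true LCS length is 2 ("aa" or "ab"-style subsequences), so the baseline happens to be optimal; in general it is only guaranteed to be within a $1/|\Sigma|$ factor.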

Wednesday, 5/8/2020, 15:00 – 15:25 Detecting transpositions from DNA sequencing data – Ramesh Rajaby

To be updated ...

Wednesday, 5/8/2020, 15:25 – 15:50 Mechanized Verification of Graph-manipulating Programs – Shengyi Wang

We develop powerful and general techniques to mechanically verify realistic programs that manipulate heap-represented graphs. These graphs can exhibit well-known organization principles, such as being a directed acyclic graph or a disjoint-forest; alternatively, these graphs can be totally unstructured. The common thread for such structures is that they exhibit deep intrinsic sharing and can be expressed using the language of graph theory. We construct a modular and general setup for reasoning about abstract mathematical graphs and use separation logic to define how such abstract graphs are represented concretely in the heap. We develop a Localize rule that enables modular reasoning about such programs, and show how this rule can support existential quantifiers in postconditions and smoothly handle modified program variables. We demonstrate the generality and power of our techniques by integrating them into the Verified Software Toolchain and certifying the correctness of seven graph-manipulating programs written in CompCert C, including a 400-line generational garbage collector for the CertiCoq project. While doing so, we identify two places where the semantics of C is too weak to define generational garbage collectors of the sort used in the OCaml runtime. Our proofs are entirely machine-checked in Coq.

Wednesday, 5/8/2020, 15:50 – 16:15 Smart Contract Repair – Xiao Liang Yu

Smart contracts are automated or self-enforcing contracts that can be used to exchange assets without having to place trust in third parties. Many commercial transactions use smart contracts due to their potential benefits in terms of secure peer-to-peer transactions independent of external parties. Experience shows that many commonly used smart contracts are vulnerable to serious malicious attacks, which may enable attackers to steal valuable assets from the parties involved. There is therefore a need to apply automated analysis and automated repair techniques to detect and repair bugs in smart contracts, to ensure that vulnerabilities are addressed promptly. In this talk, I will present the first general-purpose automated smart contract repair approach that is also gas-aware. Our method considers the gas usage of candidate patches by leveraging our novel notion of a gas dominance relationship. The relevant background will also be presented.

Wednesday, 5/8/2020, 16:15 – 16:40 Semantic Program Repair – Sergey Mechtaev SoC Alumni in Academia

The goal of automated program repair is to modify a given incorrect program to satisfy a given specification. One of the key challenges of program repair is its limited scalability when applied to large programs. This talk presents an approach that uses symbolic execution to analyse the semantic impact of synthesized patches in order to alleviate the scalability problem. The presented approach scales to complex real-world software and is able to repair non-trivial defects, such as the famous Heartbleed vulnerability. In addition, this approach can provide correctness guarantees for the generated patches when a formal specification or a reference implementation is available.

Thursday, 6/8/2020, 09:30 – 10:20 Healthcare Analytics: Learning from Multiple Heterogeneous Data Sources – Vaibhav RAJAN

Thursday, 6/8/2020, 10:20 – 10:45 Finding Reachability in Faulty Graphs – Diptarka Chakraborty Newly Joined Faculty

In the real world, networks are prone to failures. Most of the time, such failures are unavoidable and unpredictable in physical systems. For this reason, the edge (or vertex) failure model has drawn considerable attention from researchers in the recent past. In this talk, we will survey recent advances on various problems related to connectivity in a graph under failures.
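
As a point of reference for this failure model, the naive way to answer a reachability query after edge failures is to re-run a graph search that skips the failed edges; the data structures surveyed in this line of work aim to answer such queries much faster than this per-query recomputation. A minimal sketch (undirected graph, adjacency-list input; all names illustrative):

```python
from collections import deque

def reachable_after_failures(adj, source, failed):
    """Baseline for reachability under edge failures: a BFS from `source`
    that ignores every edge listed in `failed`. For an undirected graph
    given as {node: [neighbours]}, returns the set of nodes still
    reachable. Fault-tolerant oracles precompute structure so each
    query avoids this full O(V + E) traversal."""
    failed = {frozenset(e) for e in failed}       # treat edges as unordered pairs
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if frozenset((u, v)) not in failed and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

# Path 1-2-3-4 with edge (2,3) failed: nodes 3 and 4 become unreachable.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(reachable_after_failures(adj, 1, failed=[(2, 3)]))  # → {1, 2}
```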

Thursday, 6/8/2020, 10:45 – 11:10 Manthan: A Data-Driven Approach for Boolean Functional Synthesis – Priyanka Golia

Given a relational specification between Boolean inputs and outputs, the problem of Boolean functional synthesis is to construct each output as a function of the inputs such that the specification is met. Synthesizing Boolean functions is one of the challenging problems in Computer Science. It has seen multiple proposals, including incremental determination, decomposition techniques from knowledge compilation, and counterexample-guided refinement techniques via self-substitutions. In this talk, we will discuss Manthan, a novel data-driven approach for Boolean functional synthesis. Manthan views functional synthesis as a classification problem, relying on advances in constrained sampling for data generation, and on advances in automated reasoning for a novel proof-guided refinement and provable verification. In an extensive evaluation over 609 benchmarks, Manthan significantly improves on the current state of the art, solving 356 benchmarks compared to 280, the most solved by any prior technique.

Thursday, 6/8/2020, 11:20 – 11:45 Digitalization in Two Retail Food and Beverage Companies: A Resource Orchestration View – Mariya Tsyganova

Anchoring on the theoretical lens of the resource orchestration view (ROV), this research investigates qualitatively how two Singapore-based, family-owned food and beverage (F&B) companies digitally transformed their businesses. Our analysis, based on interview data, reveals that core to digitalization is the capacity to appropriately deploy digital technologies to accelerate resource liquefaction. This enables companies to take advantage of market opportunities. Other than providing insights on how resource liquefaction can be achieved, this research extends ROV by suggesting that, with digital technologies, companies could liquefy, i.e., expand and contract, their resources on demand.

Thursday, 6/8/2020, 11:45 – 12:10 On the Privacy Risks of Model Explanations – Martin R. Strobel

Privacy and transparency are two key elements of trustworthy machine learning. Model explanations can provide more insight into a model’s decisions on input data. This, however, can pose a significant privacy risk to the model’s training set. We analyze whether an adversary can exploit model explanations to infer sensitive information about the model’s training set. We investigate this research problem primarily using membership inference attacks: inferring whether a data point belongs to the training set of a model, given its explanations. We study this problem for three popular types of model explanations: backpropagation-based, perturbation-based and example-based attribution methods. We devise membership inference attacks based on these model explanations and extensively test them on a variety of datasets. We show that both backpropagation- and example-based explanations can leak a significant amount of information about individual data points in the training set. Finally, we discuss the resistance of perturbation-based attribution methods to existing attack models; interestingly, their resistance to such attacks is related to a crucial shortcoming of such model explanations uncovered by recent works.
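
A minimal illustration of the attack setting (entirely synthetic numbers, not the paper's attacks): a simple membership-inference rule thresholds some statistic of the explanation, for example the norm of a gradient-based attribution, exploiting the tendency of models to have smaller gradients around points they were trained on.

```python
def threshold_attack(expl_norms, threshold):
    """Toy membership-inference rule: predict 'member' when the norm of
    a gradient-based explanation is below the threshold, since trained
    models tend to be flatter around their training points."""
    return [norm < threshold for norm in expl_norms]

# Synthetic explanation norms: members tend to have flatter neighbourhoods.
member_norms = [0.02, 0.05, 0.01]      # explanations on training points
nonmember_norms = [0.40, 0.30, 0.25]   # explanations on unseen points
preds = threshold_attack(member_norms + nonmember_norms, threshold=0.1)
truth = [True] * 3 + [False] * 3
accuracy = sum(p == t for p, t in zip(preds, truth)) / len(truth)
print(accuracy)  # → 1.0 on these cleanly separable synthetic numbers
```

Real attacks must pick the threshold without labels and face far noisier separation; the point here is only the shape of the inference.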

Thursday, 6/8/2020, 13:10 – 14:00 Opacity and Unintended Outcomes in Blockchains: Algorithmic Governance through the Proof-of-Stake Consensus Mechanism – HAHN Jungpil

Algorithms are increasingly instrumental in making or mediating decisions, a phenomenon referred to as algorithmic governance. The literature suggests algorithmic governance can be fraught with opacity and unintended outcomes. If opacity is a problem, a natural question that arises is how algorithmic governance can be designed to address emergent unintended consequences. In this talk, I will focus on blockchains as a prominent example of algorithmic governance. Blockchains are distributed databases of transactions, best known as the underlying technology of cryptocurrencies and several other major Fintech innovations. Blockchains are managed by a network of agents without a defined central operator, and use consensus mechanisms to incentivize agreement about the blockchain’s content. Decentralization of decision-making power in consensus mechanisms is critical, giving rise to tamper-resistant data storage, the main value proposition of blockchains. However, recent attacks illustrate that decentralization cannot be taken for granted. We use an agent-based model and simulation of proof-of-stake, a salient consensus mechanism, to analyze how the interplay of agent behavior and technological artifact jointly produces an emergent outcome, that is, the distribution of decision-making power. We show how manipulations of key blockchain parameters affect the degree of decentralization. Our findings contribute to understanding the mechanisms that lead to decentralization in blockchain governance and can be applied in designing proof-of-stake blockchains that are conducive to decentralization and therefore more secure. We also suggest agent-based simulations as a method to develop design theories for algorithmic governance and show how agent-based simulations provide a tool for understanding the mechanisms that lead to emergent and sometimes unintended outcomes in algorithmic governance.
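
A toy version of such an agent-based simulation can be sketched as follows (the parameters and reward rule are illustrative, not the paper's model): each round one block producer is drawn with probability proportional to its stake and receives the block reward, so stakes compound and the distribution of decision-making power evolves endogenously.

```python
import random

def simulate_pos(stakes, reward, rounds, seed=0):
    """Minimal agent-based sketch of proof-of-stake block production.
    Each round, one agent is selected with probability proportional to
    its current stake and its stake grows by `reward`, so selection
    odds compound over time (a rich-get-richer dynamic)."""
    rng = random.Random(seed)
    stakes = list(stakes)
    for _ in range(rounds):
        winner = rng.choices(range(len(stakes)), weights=stakes)[0]
        stakes[winner] += reward
    return stakes

final = simulate_pos([100, 10, 10], reward=5, rounds=1000)
largest_share = max(final) / sum(final)
print(largest_share)  # the initially largest holder's share tends to grow
```

Varying parameters such as the reward size or the initial stake distribution in a sketch like this is the simulation analogue of the "manipulations of key blockchain parameters" studied in the talk.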

Thursday, 6/8/2020, 14:00 – 14:25 CnGAN: Generative Adversarial Networks for Cross-network user preference generation for non-overlapped users – Dilruk Perera

A major drawback of cross-network recommender solutions is that they can only be applied to users that are overlapped across networks. Thus, the non-overlapped users, who form the majority of users, are ignored. As a solution, we propose CnGAN, a novel multi-task learning based, encoder-GAN-recommender architecture. The proposed model synthetically generates source network user preferences for non-overlapped users by learning the mapping from target to source network preference manifolds. The resultant user preferences are used in a Siamese network based neural recommender architecture. Furthermore, we propose a novel user-based pairwise loss function for recommendations using implicit interactions, to better guide the generation process in the multi-task learning environment. We illustrate our solution by generating user preferences on the Twitter source network for recommendations on the YouTube target network. Extensive experiments show that the generated preferences can be used to improve recommendations for non-overlapped users. The resultant recommendations achieve superior performance compared to state-of-the-art cross-network recommender solutions in terms of accuracy, novelty and diversity.

Thursday, 6/8/2020, 14:25 – 14:50 Quantitative Verification of Neural Networks And Its Security Applications – Teodora Baluta

Neural networks are increasingly employed in safety-critical domains. This has prompted interest in verifying or certifying logically encoded properties of neural networks. Prior work has largely focused on checking existential properties, wherein the goal is to check whether there exists any input that violates a given property of interest. However, neural network training is a stochastic process, and many questions arising in their analysis require probabilistic and quantitative reasoning, i.e., estimating how many inputs satisfy a given property. To this end, our paper proposes a novel and principled framework for the quantitative verification of logical properties specified over neural networks. Our framework is the first to provide PAC-style soundness guarantees, in that its quantitative estimates are within a controllable and bounded error from the true count. We instantiate our algorithmic framework by building a prototype tool called NPAQ that enables checking rich properties over binarized neural networks. We show how emerging security analyses can utilize our framework in three concrete applications: quantifying robustness to adversarial inputs, efficacy of trojan attacks, and fairness/bias of given neural networks.

Thursday, 6/8/2020, 15:00 – 15:25 What Your Gait Reveals About You – Sanka Rasnayaka

Modern personal devices measure and store vast amounts of sensory data, such as Inertial Measurement Unit (IMU) data. These on-body sensor data can be used as a biometric by observing human movement (gait). People are less cautious about the privacy vulnerabilities of such sensory data. We highlight what ancillary information can be derived from on-body sensor data and the effect of sensor location on these privacy invasions. By analyzing sensor locations with respect to privacy and utility, we discover sensor locations that preserve utility, such as biometric authentication, while reducing privacy vulnerability. We have collected (1) a multi-stream on-body IMU dataset using 3 IMU sensors, covering 6 sensor locations and 6 actions, along with various physical, personality and socio-economic characteristics from 53 participants, and (2) an opinion survey on the relative importance of each attribute from 566 participants. Using these datasets, we show that gait data reveals a lot of personal information, which may be a privacy concern. The opinion survey yields a ranking of the physical characteristics based on their perceived importance. Using a privacy vulnerability index, we show that sensors located in the front pocket or on the wrist are more privacy-invasive, whereas back-pocket and bag placements are less privacy-invasive without a significant loss of utility as a biometric.

Thursday, 6/8/2020, 15:25 – 15:50 Certifying Certainty and Uncertainty in Approximate Membership Query Structures – Kiran Gopinathan

Approximate Membership Query structures (AMQs) rely on randomisation for time- and space-efficiency, while introducing a possibility of false positive and false negative answers. Correctness proofs of such structures involve subtle reasoning about bounds on probabilities of getting certain outcomes. Because of these subtleties, a number of unsound arguments in such proofs have been made over the years. In this work, we address the challenge of building rigorous and reusable computer-assisted proofs about probabilistic specifications of AMQs. We describe a framework for the systematic decomposition of AMQs and their properties into a series of interfaces and reusable components. We implement our framework as a library in the Coq proof assistant and showcase it by encoding in it a number of non-trivial AMQs, such as Bloom filters, counting filters, quotient filters and blocked constructions, and mechanising the proofs of their probabilistic specifications. We demonstrate how AMQs encoded in our framework guarantee the absence of false negatives by construction. We also show how the proofs about probabilities of false positives for complex AMQs can be obtained by means of verified reduction to the implementations of their simpler counterparts. Finally, we provide a library of domain-specific theorems and tactics that allow a high degree of automation in probabilistic proofs.
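
To make the "no false negatives by construction" property concrete, here is a minimal, unverified Python Bloom filter; the mechanised proofs in the talk establish this property, and bounds on false positives, for Coq encodings of such structures rather than for code like this.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a bit array with k derived hash functions.
    Inserting x sets the k bits that a later query on x will check, so
    a member can never be reported absent: false negatives are
    impossible by construction. False positives remain possible when
    unrelated insertions happen to set all k checked bits."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item):
        # Derive k bit positions by salting SHA-256 with the index i.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def query(self, item):
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
bf.add("coq")
print(bf.query("coq"))   # → True (members are always found)
print(bf.query("agda"))  # almost certainly False; True only on a false positive
```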

Thursday, 6/8/2020, 15:50 – 16:15 Risk Prediction in Intensive Care Units – Ghanvatkar Suparna Deepak

Risk prediction models in Intensive Care Units (ICUs) are crucial for clinical decision support tasks such as identifying high-risk patients and prioritizing their care. These models can utilize different sources of patient information in ICUs: Electronic Health Records capture demographics, laboratory results, vitals, and clinical notes, while Bedside Monitors record very high-frequency data, such as ECG. Most of the data is temporal, but at very different time scales. For example, vitals may be measured once every few hours, while an ECG consists of 125 measurements per second. Previous mortality models have not used these temporal data sources at very different time scales together for mortality prediction. We take the first step towards building such a model and develop MTS-RNN, a deep recurrent neural network architecture for modeling multiple time series at different resolutions. Preliminary experiments show that our model, by integrating these heterogeneous data sources, outperforms state-of-the-art in-hospital mortality prediction models.

Thursday, 6/8/2020, 16:15 – 16:40 A Value-Based View of Crowdfunding Rewards and Crowdfunding Performance – Lusi Yang SoC Alumni in Academia

In this study, we integrate the insights of consumption value theory and the crowdfunding literature to develop a value-based view of crowdfunding rewards, to systematically theorize and synthesize the underlying mechanisms through which the various rewards offered by crowdfunding projects can incentivize crowdfunders’ backing decisions. Identifying three basic value dimensions carried by different crowdfunding rewards — utilitarian value, socioemotional value, and participatory value — our value-based view posits that these three values satisfy crowdfunders’ different needs and thus motivate them to back a project on distinct grounds and considerations. Through this view, the performance of a crowdfunding project can thus be shaped by the joint effects of the three values delivered by all of its offered rewards. Given their distinct incentivizing mechanisms, we further posit that socioemotional value and participatory value will potentially substitute for the effects of utilitarian value, thus leading to an inverted U-shaped effect of the combination of the three values on crowdfunding performance. Evidence from a novel multimethod design that integrates field data, topic modeling, and a survey with crowdfunding professionals largely supports our theoretical predictions. Our theoretical framework and empirical findings have important implications for both theory and practice.
Paper full text: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3591606