Tuesday, 4/1/2022, 09:00 – 10:30
Challenges in Reliable Machine Learning - Kamalika Chaudhuri
As machine learning is increasingly deployed, there is a need for reliable and robust methods that go beyond simple test accuracy. In this talk, we will discuss two challenges that arise in reliable machine learning. The first is robustness to adversarial examples: small, imperceptible perturbations to legitimate test inputs that cause machine learning classifiers to misclassify them. While recent work has proposed many attacks and defenses, exactly why adversarial examples arise remains a mystery. In this talk, we'll take a closer look at this question.
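To make the phenomenon concrete, the sketch below constructs an adversarial perturbation for a toy linear classifier using the gradient-sign idea common in this literature. It is an illustration only, not an attack from the talk; the model, dimensions, and step size are all arbitrary choices, and for deep networks far smaller perturbations suffice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "classifier" on 20-dimensional inputs.
w = rng.normal(size=20)
b = 0.0

def predict(x):
    return 1 if x @ w + b > 0 else 0

# A legitimate input; flip its sign if needed so the clean prediction is 1.
x = rng.normal(size=20)
if predict(x) == 0:
    x = -x

# Gradient-sign perturbation: for a linear model, the gradient of the
# decision margin with respect to x is just w, so stepping against
# sign(w) shrinks the margin fastest in the L-infinity sense.
margin = x @ w + b
eps = 1.1 * margin / np.sum(np.abs(w))  # just enough to cross the boundary
x_adv = x - eps * np.sign(w)

print(predict(x), predict(x_adv))  # 1 0: the label flips
```

The perturbation has L-infinity norm exactly `eps`, yet it is enough to change the predicted label; the puzzle the talk addresses is why such perturbations exist so pervasively for learned classifiers.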
The second problem is overfitting, which many generative models are known to be prone to. Motivated by privacy concerns, we formalize a form of overfitting that we call data-copying, in which the generative model memorizes and outputs training samples or small variations thereof. We provide a three-sample test for detecting data-copying, and study the performance of our test on several canonical models and datasets.
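The intuition behind such a test can be illustrated with distances to the training set: a data-copying generator produces samples suspiciously closer to the training data than genuinely fresh samples are. The sketch below is a much-simplified illustration of that idea, not the actual statistic from the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

def nn_dists(queries, train):
    """Distance from each query point to its nearest training sample."""
    d = np.linalg.norm(queries[:, None, :] - train[None, :, :], axis=-1)
    return d.min(axis=1)

train = rng.normal(size=(500, 2))
held_out = rng.normal(size=(200, 2))

# A "data-copying" generator: training points plus tiny noise.
copied = train[rng.integers(0, 500, size=200)] + 0.01 * rng.normal(size=(200, 2))
# An honest generator: fresh samples from the true distribution.
fresh = rng.normal(size=(200, 2))

# Generated samples that sit much closer to the training set than
# held-out data does are evidence of data-copying.
print(nn_dists(copied, train).mean())    # tiny
print(nn_dists(fresh, train).mean())     # comparable to held-out
print(nn_dists(held_out, train).mean())
```

The three samples involved are the training set, a held-out set, and the generated set; the held-out set calibrates how close "honest" new data should lie to the training data.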
Kamalika Chaudhuri is currently an Associate Professor at the University of California, San Diego. She received a Bachelor of Technology degree in Computer Science and Engineering in 2002 from the Indian Institute of Technology Kanpur, and a PhD in Computer Science from the University of California, Berkeley in 2007. After a postdoctoral stint at UCSD, she joined the CSE department at UC San Diego as an assistant professor in 2010. She received an NSF CAREER Award in 2013 and a Hellman Faculty Fellowship in 2012. She has served as the program co-chair for AISTATS 2019 and ICML 2019.
Kamalika's research interests lie in the foundations of trustworthy machine learning -- or machine learning beyond accuracy. This includes problems such as learning from sensitive data while preserving privacy, learning under sampling bias, and learning in the presence of an adversary.
Wednesday, 5/1/2022, 09:00 – 10:30
Filters - Michael Bender
A Bloom filter maintains a compact, probabilistic representation of a set S of keys from a universe U. The price of being small is that there is a (bounded) probability of false positives.
This talk reviews alternative filter designs, focusing on quotient and cuckoo filters. These newer filters are faster, more space efficient, and support a broader range of operations. We focus on both the theoretical and engineering issues that arise. We then show how to design a filter that can adapt based on the results of past queries.
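As a baseline for the filters the talk compares against, a minimal Bloom filter fits in a few lines. The parameters and hash derivation below are arbitrary illustrative choices, not from the talk.

```python
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1024, k_hashes=4):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8 + 1)

    def _positions(self, key):
        # Derive k bit positions from one digest (a common implementation trick).
        h = hashlib.sha256(key.encode()).digest()
        for i in range(self.k):
            yield int.from_bytes(h[4 * i:4 * i + 4], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        # No false negatives; false positives with bounded probability.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

bf = BloomFilter()
for word in ["quotient", "cuckoo", "bloom"]:
    bf.add(word)
print("bloom" in bf)
```

Note that this structure supports only insertion and lookup; the quotient and cuckoo filters discussed in the talk additionally support operations such as deletion and resizing, which is part of their appeal.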
Michael A. Bender is the David R. Smith Leading Scholar of Computer Science at Stony Brook University. His research interests span the areas of data structures and algorithms, parallel computing, cache- and I/O-efficient computing, and scheduling. He has won several awards, including a Test-of-Time award, two Best Paper Awards, and five awards for graduate and undergraduate teaching. He was a member of the Sandia team that won an R&D 100 Award for the CPA scheduler for parallel computers.
Bender was Founder and Chief Scientist at Tokutek, Inc, an enterprise database company, which was acquired by Percona in 2015. He has held Visiting Scientist positions at both MIT and King's College London.
Bender received his A.B. in Applied Mathematics from Harvard University in 1992 and obtained a D.E.A. and Magistère in Computer Science from the École Normale Supérieure de Lyon, France in 1993. He completed a Ph.D. on scheduling algorithms at Harvard University in 1998.
Wednesday, 5/1/2022, 11:00 – 12:30
Attacking non-private machine learning – Nicholas Carlini
A machine learning model is private if it does not remember (too much) about individual training examples in its training dataset. Models must be private especially when trained on sensitive or personal data, such as medical data, emails, or text messages. This talk surveys different ways to attack a machine learning model's privacy, allowing an adversary who can query the model to violate training data privacy.
First, we discuss membership inference attacks that can predict whether or not any particular example was contained in a model's training dataset. We then show how to extend this into an attack that can actually extract individual examples from the training dataset. For example, in the case of the GPT-2 language model, we can query the model to extract individual personally identifiable information (names, phone numbers, physical addresses), URLs, IRC chat logs, and UUIDs. Finally, we show how to attack defenses that aim to prevent these forms of extraction attacks. Training private machine learning models remains a challenging problem.
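The simplest membership-inference attacks exploit the fact that models tend to assign lower loss to examples they were trained on. The sketch below is a deliberately extreme toy (a 1-nearest-neighbor "model" that memorizes its training data perfectly), not the attacks from the talk, but it shows the loss-thresholding idea.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "model": 1-nearest-neighbor memorizes its training set exactly, so
# its loss on a training point is zero -- an extreme case of the
# train/non-train loss gap that membership inference exploits.
train_x = rng.normal(size=(100, 5))
train_y = rng.integers(0, 2, size=100)

def model_loss(x, y):
    i = np.argmin(np.linalg.norm(train_x - x, axis=1))
    return 0.0 if train_y[i] == y else 1.0

def is_member(x, y, threshold=0.5):
    # Guess "member" when the model's loss on (x, y) is suspiciously low.
    return model_loss(x, y) < threshold

hits = sum(is_member(x, y) for x, y in zip(train_x, train_y))

fresh_x = rng.normal(size=(100, 5))
fresh_y = rng.integers(0, 2, size=100)
false_hits = sum(is_member(x, y) for x, y in zip(fresh_x, fresh_y))

print(hits, false_hits)  # members always flagged; non-members only sometimes
```

The gap between `hits` and `false_hits` is exactly the signal a real attack measures; practical attacks against neural networks must work with a far smaller loss gap and calibrate the threshold carefully.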
Nicholas Carlini is a research scientist at Google Brain. He studies the security and privacy of machine learning, for which he has received best paper awards at ICML, USENIX Security, and IEEE S&P. He obtained his PhD from the University of California, Berkeley in 2018.
Wednesday, 5/1/2022, 16:00 – 17:30
The Complexity of Compression – Rahul Santhanam
Data compression is central to computer science. How much can a given dataset be compressed? How easy is it to decide whether a dataset is compressible or not? These are natural questions, but it is by no means obvious how to make them rigorous.
The theory of Kolmogorov complexity provides a rigorous framework for this and other fundamental questions in computer science. I will provide a gentle introduction to Kolmogorov complexity and its resource-bounded variants, and discuss applications to complexity theory, learning theory and cryptography.
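Kolmogorov complexity, the length of the shortest program that prints a given string, is uncomputable, but any off-the-shelf compressor gives an upper bound, which makes the notion tangible. The sketch below (an illustration, not from the talk) contrasts a highly regular string with random bytes.

```python
import os
import zlib

def compressed_len(data: bytes) -> int:
    # An upper bound (up to an additive constant) on Kolmogorov complexity:
    # the compressed string plus a fixed decompressor reproduces the data.
    return len(zlib.compress(data, level=9))

structured = b"ab" * 5000          # highly regular: a very short description exists
random_ish = os.urandom(10000)     # incompressible with overwhelming probability

print(compressed_len(structured))  # far below 10000
print(compressed_len(random_ish))  # close to 10000
```

This asymmetry is the starting point for the lecture's questions: while an upper bound is easy, certifying that no shorter description exists is where the deep connections to complexity theory and cryptography appear.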
Rahul Santhanam is Professor of Computer Science at the University of Oxford, and Tutorial Fellow in Computer Science at Magdalen College. He was educated at the Indian Institute of Technology (Madras) and the University of Chicago, and taught at the University of Edinburgh before moving to Oxford in 2016. He works mainly in complexity theory, but also has interests in algorithms, learning theory, cryptography and game theory.
Thursday, 6/1/2022, 09:00 – 10:30
Correct-by-Construction Cryptography Without Performance Compromises - Adam Chlipala
Many of the most widely used software packages depend on cryptography, for instance to secure Web connections using the TLS protocol. At the heart of many cryptographic protocols are big-integer-arithmetic routines that operate modulo large prime numbers. It is perhaps not widely appreciated that the state of practice has been, when choosing a new prime modulus, to rewrite the arithmetic code from scratch, as hundreds of lines of C or assembly. The same has traditionally been done to adapt an algorithm for a new category of hardware targets. Why? This level of hand optimization was believed to be important to achieve performance and correctness/security simultaneously.
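As a concrete example of the arithmetic in question, here is field arithmetic modulo the Curve25519 prime 2^255 - 19, written naively with Python's arbitrary-precision integers. This is only a sketch of the mathematical specification; the hand-optimized C and assembly discussed in the talk instead represent field elements in several machine-word "limbs" with carefully scheduled carries, which is exactly the code that is error-prone to write by hand.

```python
# The prime modulus used by Curve25519, one of the moduli for which
# specialized arithmetic code is traditionally handwritten.
P = 2**255 - 19

def fadd(a, b):
    return (a + b) % P

def fmul(a, b):
    return (a * b) % P

def finv(a):
    # Fermat's little theorem: a^(P-2) = a^(-1) mod P for prime P.
    return pow(a, P - 2, P)

a = 1234567891011121314151617181920
print(fmul(a, finv(a)))  # 1
```

Each such operation must be implemented in constant time and without overflow bugs for every new modulus and hardware target, which is the rewrite burden the talk's approach eliminates.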
Our team at MIT developed a better way, the Fiat Cryptography tool that generates this style of arithmetic code automatically. Even better, the tooling is proven correct rigorously, subject to machine checking of the proof, using the Coq theorem prover. Our pipeline generates roughly the fastest-known C code for all elliptic curves, at the same time as we improve confidence in correctness over more-established code bases. Since the first wide deployment of our code in Chrome in early 2018, Fiat Cryptography has become part of the development process for several high-profile open-source projects, and a majority of browsers' secure Web connections now run our code.
I will present some key background on the two central topics: machine-checked mathematical proofs and optimization techniques for cryptographic arithmetic. Then I will explain the Fiat Cryptography approach, which draws on several traditions within the field of compilers. After explaining how we evaluate effectiveness of our tooling, I will suggest some worthwhile future directions within this field of high-assurance cryptography.
Adam Chlipala has been on the faculty in computer science at MIT since 2011. He did his undergrad at Carnegie Mellon and his PhD at Berkeley, and his research focuses on clean-slate redesign of computer-systems infrastructure, typically taking advantage of machine-checked proofs of functional correctness. Much of his work uses the Coq proof assistant, about which he has written a popular book, "Certified Programming with Dependent Types." He most enjoys finding opportunities for drastic simplification over incumbent abstractions in computer systems, and some favorite tools toward that end are object-capability systems, transactions, proof-carrying code, and high-level languages with whole-program optimizing compilers. Some projects particularly far along the real-world-adoption curve are Fiat Cryptography, for proof-producing generation of low-level cryptographic code, today run by Chrome for most HTTPS connections; and Ur/Web, a production-quality domain-specific language for Web applications.
Thursday, 6/1/2022, 16:00 – 17:30
Learning-By-Doing: Using the FMP Python Notebooks for Audio and Music Processing - Meinard Müller
In this talk, I introduce a novel collection of educational material for teaching and learning fundamentals of music processing (FMP), focusing on the audio domain. This collection, referred to as the FMP notebooks, covers well-established topics in Music Information Retrieval (MIR) as motivating application scenarios, including beat tracking, chord recognition, music synchronization, audio fingerprinting, music segmentation, and source separation. The FMP notebooks provide detailed textbook-like explanations of central techniques and algorithms combined with Python code examples that illustrate how to implement the theory.
All components, including the introductions of MIR scenarios, illustrations, sound examples, technical concepts, mathematical details, and code examples, are integrated into a consistent and comprehensive framework based on Jupyter notebooks. I show how the FMP notebooks can be used to study both theory and practice, generate educational material for lectures, and provide baseline implementations for many MIR tasks.
At the same time, I highlight challenges and new research directions within the field of signal processing using music as a motivating and tangible domain.
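In the spirit of the notebooks' learning-by-doing approach, basic MIR building blocks can be coded from scratch with nothing but numpy. The sketch below, which is not taken from the FMP code itself, computes a spectral-flux novelty curve (a standard first step toward onset detection and beat tracking) on a synthetic signal with three clicks; the sampling rate, window, and hop sizes are arbitrary choices.

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr                      # one second of audio
x = 0.1 * np.sin(2 * np.pi * 440 * t)       # quiet sustained tone
for click in (0.25, 0.5, 0.75):             # three loud clicks
    i = int(click * sr)
    x[i:i + 64] += np.hanning(64)

# Short-time Fourier transform magnitudes.
win, hop = 256, 128
frames = np.stack([x[i:i + win] * np.hanning(win)
                   for i in range(0, len(x) - win, hop)])
mag = np.abs(np.fft.rfft(frames, axis=1))

# Spectral flux: sum of positive magnitude increases between frames.
novelty = np.maximum(mag[1:] - mag[:-1], 0).sum(axis=1)

# The largest novelty jumps line up with the click positions.
print(np.argmax(novelty) * hop / sr)
```

Peaks in `novelty` mark the clicks while the steady tone contributes almost nothing, which is precisely the kind of theory-to-code connection the notebooks make explicit.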
Meinard Müller received the Diploma degree (1997) in mathematics and the Ph.D. degree (2001) in computer science from the University of Bonn, Germany. Since 2012, he has held a professorship for Semantic Audio Signal Processing at the International Audio Laboratories Erlangen, a joint institute of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and the Fraunhofer Institute for Integrated Circuits IIS. His research interests include music processing, music information retrieval, audio signal processing, and motion processing. He wrote a monograph titled "Information Retrieval for Music and Motion" (Springer-Verlag, 2007) and a textbook titled "Fundamentals of Music Processing" (Springer-Verlag, 2015). In 2020, he was elevated to IEEE Fellow for contributions to music signal processing.
Friday, 7/1/2022, 09:00 – 10:30
A New Learning Paradigm - Green Learning - C.-C. Jay Kuo
There has been a rapid development of artificial intelligence and machine learning applications in the last decade. At their core lie large amounts of annotated training data and deep learning networks. There is an emerging concern that deep learning solutions are not friendly to the environment due to their large carbon footprint. As sustainability becomes increasingly important, it is desirable to investigate a new learning paradigm that is competitive with deep learning in performance yet has a significantly lower carbon footprint. I have been devoted to green learning since 2014, and the technology has since matured considerably. Green learning contains quite a few innovative ideas, including unsupervised feature learning and easy-to-hard progressive learning. It has been successfully applied to image classification, point cloud classification, segmentation, registration, texture synthesis, face verification and gender classification, and anomaly localization. Performance comparisons between deep learning and green learning on several applications will be presented to demonstrate the potential of green learning.
Dr. C.-C. Jay Kuo received his Ph.D. degree from the Massachusetts Institute of Technology in 1987. He is now with the University of Southern California (USC) as the William M. Hogue Professor, Distinguished Professor of Electrical and Computer Engineering and Computer Science, and Director of the Media Communications Laboratory. His research interests are in visual computing and communication. He is a Fellow of AAAS, NAI, IEEE and SPIE. Dr. Kuo has received numerous awards for his outstanding research contributions, including the 2010 Electronic Imaging Scientist of the Year Award, the 2010-11 Fulbright-Nokia Distinguished Chair in Information and Communications Technologies, the 2019 IEEE Computer Society Edward J. McCluskey Technical Achievement Award, the 2019 IEEE Signal Processing Society Claude Shannon-Harry Nyquist Technical Achievement Award, the 2020 IEEE TCMC Impact Award, the 72nd Annual Technology and Engineering Emmy Award (2020), and the 2021 IEEE Circuits and Systems Society Charles A. Desoer Technical Achievement Award. Dr. Kuo was Editor-in-Chief for the IEEE Transactions on Information Forensics and Security (2012-2014) and the Journal of Visual Communication and Image Representation (1997-2011). He is currently the Editor-in-Chief for the APSIPA Trans. on Signal and Information Processing (2022-2023). He has guided 161 students to their PhD degrees and supervised 31 postdoctoral research fellows.
Friday, 7/1/2022, 16:00 – 17:30
Testing Software and Hardware against Speculation Contracts - Boris Köpf
Attacks such as Spectre and Meltdown use a combination of speculative execution and shared microarchitectural state to leak information across security domains. Defeating them without massive performance overheads requires careful co-design of software and hardware. In this talk I will present a principled approach to this problem, based on hardware-software contracts for secure speculation, and on techniques that enable testing of software and hardware against them.
Boris Köpf is a Principal Researcher in the Confidential Computing group at Microsoft Research Cambridge, where he works on techniques for tracking information flow in microarchitecture and machine learning systems. Prior to joining MSR, he was a tenured faculty member at the IMDEA Software Institute, a postdoc at the Max Planck Institute for Software Systems, and a Ph.D. student at ETH Zurich. Boris served as the PC co-chair of the IEEE Computer Security Foundations Symposium and has received best paper awards from the USENIX Security Symposium and the IEEE Symposium on Security and Privacy.