Julius O. Smith III: History of Virtual Musical Instruments and Effects Based on Physical Modeling

This presentation visits historical developments leading to today’s virtual musical instruments and effects based on physical modeling principles. It is hard not to begin with Daniel Bernoulli and d’Alembert who launched the modal representation (leading to both “additive” and “subtractive” synthesis) and the traveling-wave solution of the wave equation for vibrating strings, respectively, in the 18th century. Newtonian mechanics generally suffices mathematically for characterizing physical musical instruments and effects, although quantum mechanics is necessary for fully deriving the speed of sound in air. In addition to the basic ballistics of Newton’s Law f = ma, and spring laws relating force to displacement, friction models are needed for modeling the aggregate behavior of vast numbers of colliding particles. The resulting mathematical models generally consist of ordinary and partial differential equations expressing Newton’s Law, friction models, and perhaps other physical relationships such as temperature dependence. Analog circuits are similarly described. These differential-equation models are then solved in real time on a discrete time-space grid to implement musical instruments and effects. The external forces applied by the performer (or control voltages, etc.) are routed to virtual masses, springs, and/or friction models, and they may impose moving boundary conditions for the discretized differential-equation solver. To achieve maximum quality per unit of computation, techniques from digital signal processing are typically used to implement the differential-equation solvers in ways that are numerically robust, energy aware, and computationally efficient. In addition to reviewing selected historical developments, this presentation will try to summarize some of the known best practices for computational physical modeling in existing real-time virtual musical instruments and effects.
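
As a rough illustration of the kind of discretization described above (not taken from the talk), the following Python sketch integrates a single mass on a spring with linear friction, m x'' = -k x - r x', on a discrete time grid using semi-implicit Euler. The function name, parameter values, and choice of integrator are assumptions made for this example only.

```python
def simulate_mass_spring(mass=0.001, stiffness=4000.0, damping=0.0,
                         x0=0.001, sample_rate=44100, num_samples=2000):
    """Semi-implicit Euler integration of m*x'' = -k*x - r*x'.

    All names and default values are illustrative assumptions.
    Returns the list of displacement samples.
    """
    dt = 1.0 / sample_rate
    x, v = x0, 0.0          # initial displacement, zero initial velocity
    out = []
    for _ in range(num_samples):
        force = -stiffness * x - damping * v   # spring law plus linear friction
        v += dt * force / mass                 # Newton's law: a = f / m
        x += dt * v                            # position uses the updated velocity
        out.append(x)
    return out


undamped = simulate_mass_spring(damping=0.0)
damped = simulate_mass_spring(damping=0.02)
```

Updating the velocity before the position keeps the undamped oscillation bounded at this step size, which is one simple instance of the "energy aware" solvers the abstract mentions; a plain forward-Euler update would slowly gain energy instead.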

Julius O. Smith teaches a music signal-processing course sequence and supervises related research at the Center for Computer Research in Music and Acoustics (CCRMA). He is formally a professor of music and (by courtesy) electrical engineering at Stanford University. In 1975, he received his BS/EE degree from Rice University, where he got a solid grounding in the field of digital signal processing and modeling for control. In 1983, he received the PhD/EE degree from Stanford University, specializing in techniques for digital filter design and system identification, with application to violin modeling. His work history includes the Signal Processing Department at Electromagnetic Systems Laboratories, Inc., working on systems for digital communications, the Adaptive Systems Department at Systems Control Technology, Inc., working on research problems in adaptive filtering and spectral estimation, and NeXT Computer, Inc., where he was responsible for sound, music, and signal processing software for the NeXT computer workstation. Prof. Smith is a Fellow of the Audio Engineering Society and the Acoustical Society of America. He is the author of four online books and numerous research publications in his field.

Avery Wang: Robust Indexing and Search in a Massive Corpus of Audio Recordings

In this talk I will give an overview of the Shazam audio recognition technology. The Shazam service takes a query consisting of a short sample of ambient audio (as little as 2 seconds) from a microphone and searches a massive database of recordings comprising more than 40 million soundtracks. The query may be degraded with significant additive noise (< 0 dB SNR), environmental acoustics, as well as nonlinear distortions. The computational scaling is such that a query may cost as little as a millisecond of processing time. Previous algorithms could index hundreds of items, required seconds of processing time, and were less tolerant to noise and distortion by 20-30 dB SNR. In aggregate, the Shazam algorithm represents a leap of more than a factor of 1E+10 in efficiency over prior art. I will discuss the various innovations leading to this result.

Avery Wang is co-founder and Chief Scientist at Shazam Entertainment, and principal inventor of the Shazam search algorithm. He holds BS and MS degrees in Mathematics and MS and PhD degrees in Electrical Engineering, all from Stanford University. As a graduate student he received an NSF Graduate Fellowship to study computational neuroscience. He also received a Fulbright Scholarship to study at the Institut für Neuroinformatik at the Ruhr-Universität Bochum under Christoph von der Malsburg, focusing on auditory perception and the cocktail party effect. Upon returning to Stanford, he studied under Julius O. Smith III at CCRMA, with a thesis titled “Instantaneous and Frequency-Warped Signal Processing Techniques for Auditory Source Separation”. He was about to do a post-doc at UCSF in auditory neuroscience when he was recruited by Chromatic Research to work on high-performance multimedia DSP algorithms and hardware. He has over 40 issued patents.

Miller Puckette: Time-domain Manipulation via STFTs

Perhaps the most important shortcoming of frequency-domain signal processing results from the Heisenberg limit that often forces tradeoffs between time and frequency resolution. In this paper we propose manipulating sounds by altering their STFTs in ways that affect time spans smaller than the analysis window length. An example of a situation in which this could be useful is the algorithm of Griffin and Lim, which generates a time-domain signal that optimally matches a (possibly overspecified) short-time amplitude spectrum. We propose an adaptation of Griffin-Lim to simultaneously optimize a signal to match such amplitude spectra on two or more different time scales, in order to simultaneously manage both transients and tuning.

Miller Puckette obtained a B.S. in Mathematics from MIT (1980) and a PhD in Mathematics from Harvard (1986) where he was a Putnam Fellow. He was a member of MIT's Media Lab from its inception until 1987, and then a researcher at IRCAM, founded by composer and conductor Pierre Boulez. At IRCAM he wrote Max, a widely used computer music software environment, released commercially by Opcode Systems in 1990 and still commercially available. Puckette joined the music department of the University of California, San Diego in 1994, where he is now a professor. From 2000 to 2011 he was Associate Director of UCSD's Center for Research in Computing and the Arts (CRCA). He is currently developing Pure Data ("Pd"), an open-source real-time multimedia arts programming environment. Puckette has collaborated with many artists and musicians, including Philippe Manoury (whose Sonus ex Machina cycle was the first major work to use Max), Rand Steiger, Vibeke Sorensen, and Juliana Snapper. Since 2004 he has performed with the Convolution Brothers. In 2008 Puckette received the SEAMUS Lifetime Achievement Award.


Jean-Marc Jot: Efficient Reverberation Rendering for Complex Interactive Audio Scenes

Artificial reverberation algorithms originated several decades ago with Schroeder’s pioneering work in the late 60s, and have been widely employed commercially since the 80s in music and soundtrack production. In the late 90s, artificial reverberation was introduced in game 3D audio engines, which today are evolving into interactive binaural audio rendering systems for virtual reality. By exploiting the perceptual and statistical properties of diffuse reverberation decays in closed rooms, computationally efficient reverberators based on feedback delay networks can be designed to automatically match with verisimilitude the “reverberation fingerprint” of any room. They can be efficiently implemented on standard mobile processors to simulate complex natural sound scenes, shared among a multiplicity of virtual sound sources having different positions and directivity, and combined to simulate complex acoustical spaces. In this tutorial presentation, we review the fundamental assumptions and design principles of artificial reverberation, apply them to design parametric reverberators, and extend them to realize computationally efficient interactive audio engines suitable for untethered virtual and augmented reality applications.
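
To make the feedback-delay-network idea above more concrete, here is a minimal Python sketch, not the speaker's design: four delay lines, a Householder reflection as the lossless feedback matrix, and a per-line attenuation chosen so that every path decays by 60 dB over the target reverberation time. The delay lengths, sample rate, and function name are assumptions picked for illustration.

```python
def fdn_impulse_response(delays=(149, 211, 263, 293), t60=0.5,
                         sample_rate=8000, num_samples=8000):
    """Impulse response of a small feedback delay network (illustrative sketch)."""
    n = len(delays)
    # Attenuation per pass through each delay line: 60 dB of decay per t60 seconds.
    gains = [10.0 ** (-3.0 * d / (sample_rate * t60)) for d in delays]
    buffers = [[0.0] * d for d in delays]   # circular buffers, one per delay line
    idx = [0] * n
    out = []
    for t in range(num_samples):
        x = 1.0 if t == 0 else 0.0          # unit impulse input
        # Read the oldest sample of each line and apply its decay gain.
        taps = [gains[i] * buffers[i][idx[i]] for i in range(n)]
        total = sum(taps)
        out.append(total)
        # Householder feedback matrix I - (2/n) * ones: orthogonal, hence lossless.
        fed = [taps[i] - 2.0 * total / n for i in range(n)]
        for i in range(n):
            buffers[i][idx[i]] = x + fed[i]
            idx[i] = (idx[i] + 1) % delays[i]
    return out


ir = fdn_impulse_response()
```

Because the feedback matrix preserves energy, the decay rate is set entirely by the per-line gains, which is what lets a single parameter such as the reverberation time be matched to a target room in this simplified setting.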

Jean-Marc Jot is a Distinguished Fellow at Magic Leap. Previously, at Creative Labs, he led the design and development of SoundBlaster audio processing algorithms and architectures, including OpenAL/EAX technologies for game 3D audio authoring and rendering. Before relocating to California in the late 90s, he conducted research at IRCAM in Paris, where he designed the Spat software suite for immersive audio creation and performance. He is a Fellow of the AES and has authored numerous patents and papers on spatial audio signal processing and coding. His current research interests include immersive audio for virtual and augmented reality in wearable devices and domestic or automotive environments.

Brian Hamilton: Room Acoustic Simulation: Overview and Recent Developments

Simulation of room acoustics has applications in architectural acoustics, audio engineering, and video games, and it is gaining importance in virtual reality applications, where realistic 3D sound rendering plays an integral part in creating a sense of immersion within a virtual space. This tutorial will give an overview of room acoustic simulation methods, ranging from traditional approaches based on principles of geometrical and statistical acoustics, to numerical methods that solve the wave equation in three spatial dimensions, including recent developments of finite difference time domain (FDTD) methods resulting from the recently completed five-year NESS project. Computational costs and practical considerations will be discussed, along with the benefits and limitations of these frameworks. Simulation techniques will be illustrated through animations and sound examples.

Brian Hamilton is a Postdoctoral Research Fellow in the Acoustics and Audio group at the University of Edinburgh. His research focusses on numerical methods for large-scale 3-D room acoustics simulations and spatial audio. He received B.Eng. (Hons) and M.Eng. degrees in Electrical Engineering from McGill University in Montréal, QC, Canada, in 2009 and 2012, respectively, and his Ph.D. from the University of Edinburgh in 2016.

David Berners: Modeling Circuits with Nonlinearities in Discrete Time

Modeling techniques for circuits with nonlinear components will be discussed. Nodal analysis and K-method models will be developed and compared in the context of delay-free loop resolution. Standard discretization techniques will be reviewed, including forward- and backward-difference and bilinear transforms. Interaction between nonlinearities and choice of discretization method will be discussed. Piecewise functional models for memoryless nonlinearities will be developed, as well as iterative methods for solving equations with no analytic solution. Emphasis will be placed on Newton's method and related techniques. Convergence and seeding for iterative methods will be reviewed. Relative computational expense will be discussed for piecewise vs. iterative approaches. Time permitting, modeling of time-varying systems will be discussed.
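
As a hedged illustration of the iterative approach mentioned above (a sketch, not material from the talk), the Python code below discretizes a textbook diode clipper, C dv/dt = (u - v)/R - 2 Is sinh(v/Vt), with the backward-difference rule and solves the resulting implicit equation at each sample with Newton's method, seeded with the previous output sample. The circuit, the component values, and the function name are assumptions chosen for the example.

```python
import math


def diode_clipper(inputs, sample_rate=48000.0, r=2200.0, c=10e-9,
                  i_s=2.52e-9, v_t=0.0453, tol=1e-9, max_iter=100):
    """Backward-Euler diode clipper solved per sample with Newton's method.

    Component values are illustrative assumptions, not measured data.
    """
    k = 1.0 / (sample_rate * c)     # time step divided by capacitance, T / C
    v = 0.0                         # capacitor (output) voltage
    out = []
    for u in inputs:
        v_prev = v                  # seed Newton with the previous output
        for _ in range(max_iter):
            # Residual of the implicit backward-Euler equation and its derivative.
            g = v - v_prev - k * ((u - v) / r - 2.0 * i_s * math.sinh(v / v_t))
            dg = 1.0 + k * (1.0 / r + 2.0 * i_s / v_t * math.cosh(v / v_t))
            step = g / dg
            v -= step
            if abs(step) < tol:
                break
        out.append(v)
    return out


sine = [2.0 * math.sin(2.0 * math.pi * 220.0 * n / 48000.0) for n in range(4800)]
clipped = diode_clipper(sine)
```

The sketch omits step limiting and other safeguards, so very large input jumps could slow convergence or overflow the hyperbolic functions; this is the kind of convergence and seeding concern the abstract refers to.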

David Berners is Chief Scientist of Universal Audio Inc., a hardware and software manufacturer for the professional audio market. At UA, Dr. Berners leads research and development efforts in audio effects processing, including dynamic range compression, equalization, distortion and delay effects, with a specialization in modeling of vintage analog equipment. He is also an adjunct professor at CCRMA at Stanford University, where he teaches a graduate class in audio effects processing. Dr. Berners has held positions at the Lawrence Berkeley National Laboratory, NASA Jet Propulsion Laboratory, and Allied Signal. He received his Ph.D. from Stanford University, M.S. from Caltech, and his S.B. from MIT, all in electrical engineering.

Julian D. Parker: From Algorithm to Instrument

The discipline of designing algorithms for creative processing of musical audio is now fairly mature in academia, as evidenced by the continuing popularity of the DAFx conference. However, this large corpus of work is motivated primarily by the traditional concerns of the academic signal-processing community, namely technical novelty or improvement in quantifiable metrics related to signal quality or computational performance. Whilst these factors are extremely important, they are only a small part of the process of designing an inspiring and engaging tool for the creative generation or processing of sound. Algorithms for this use must be designed with as much thought given to subjective qualities like aesthetics and usability as to technical considerations. In this tutorial I present my own experiences of trying to bridge this gap, and the design principles I've arrived at in the process. These principles will be illustrated both with abstract examples and with case studies from the work I've done at Native Instruments.

Julian Parker is a researcher and designer working in the area of musical signal processing. He started his academic career studying Natural Sciences at the University of Cambridge, before moving on to study for the MSc in Acoustics & Music Technology at the University of Edinburgh. In 2013, he completed his doctoral degree at Aalto University, Finland, concentrating on methods for modelling the audio-range behaviour of mechanical springs used for early artificial reverberation. Since graduating he has been employed at Native Instruments GmbH, where he now heads up DSP development and research. He has published on a variety of topics including reverberation, physical modelling of both mechanical and electrical systems, and digital filter design.

Panel Guests

Stefania Serafin: DAFX welcomes female researchers, so how come we are so few?

In this talk I will give an overview of my experience from the past 20 years as a female researcher in sound and music computing. I am lucky enough to have several positive stories to share, and I believe this is mostly due to the role models, mentors, colleagues and students I have met so far. I will also provide my viewpoint on why so few female researchers attend conferences such as DAFX, in the hope of stimulating an interesting discussion.

Stefania Serafin is currently Professor with special responsibilities in sound for multimodal environments at Aalborg University Copenhagen. She received a PhD degree in computer-based music theory and acoustics from Stanford University in 2004, and a Master in Acoustics, computer science and signal processing applied to music from Ircam (Paris), in 1997. She has been a visiting professor at the University of Virginia (2003), and a visiting scholar at Stanford University (1999), Cambridge University (2002), and KTH Stockholm (2003). She is the president of the Sound and Music Computing association, and has co-chaired the NIME conference in May 2017. She has been principal investigator for several EU and national projects. Her main research interests include sound models for interactive systems, multimodal interfaces and virtual reality, and sonic interaction design.

Jude Brereton: Gender balance in audio: how to fix it?

Gender equality in STEM has recently been pulled into the public spotlight via popular culture and headline-grabbing events at tech-based companies. Within the audio engineering community we are now starting to build a fuller picture of the extent of gender imbalance in our own discipline, by gathering, analysing and publishing data on gender representation in the subject. There are many complex reasons behind the continued under-representation of women in STEM subjects; many of these are hotly debated on social media. But, if it’s a problem for audio, how do we fix it? In this talk I would like to draw on some of our experiences at the University of York from our long engagement with the Athena SWAN charter, which recognises the advancement of gender equality in academic institutions. I will also suggest some actions we can all take to help build an inclusive audio community where all can thrive.

Dr Jude Brereton is Senior Lecturer in Audio and Music Technology, in the Department of Electronic Engineering, University of York, UK. Until September 2018 she was Programme Leader of the MSc in Audio and Music Technology; she teaches postgraduate and undergraduate students in the areas of virtual acoustics and auralization, music performance analysis, and voice analysis/synthesis. Jude’s research centres on the use of virtual reality technology to provide interactive acoustic environments for music performance and analysis. Until recently, she was Chair of the Departmental Equality and Diversity committee and was instrumental in achieving the ECU Athena SWAN Bronze award, which recognises the department’s commitment to gender equality. She is dedicated to progressing gender equality in audio engineering, through innovative, creative approaches to teaching grounded in interdisciplinary research. Before beginning her academic career, she worked in arts and music administration, and is still active in promoting research-inspired music and theatre performance events combining art and science for public engagement and outreach.

Musical Guest

Trevor Wishart

Trevor Wishart (b. 1946) is a composer/performer from the North of England specialising in sound metamorphosis, and constructing the software to make it possible (Sound Loom / CDP). He has lived and worked as composer-in-residence in Australia, Canada, Germany, Holland, Sweden, and the USA. He creates music with his own voice, for professional groups, or in imaginary worlds conjured up in the studio. His aesthetic and technical ideas are described in the books On Sonic Art, Audible Design and Sound Composition (2012), and he is a principal author of the Composers Desktop Project sound-processing software. His best-known works include The VOX Cycle, Red Bird, Tongues Of Fire, Two Women, Imago and Globalalia, and pieces have been commissioned by the Paris Biennale, Massachusetts Council for the Arts and Humanities, the DAAD in Berlin, the French Ministry of Culture and the BBC Proms. In 2008 he was awarded the Giga-Herz Grand prize for his life’s work. Between 2006 and 2010 he was composer-in-residence in the North East of England (based at Durham University) creating the sound-surround Digital Opera Encounters in the Republic of Heaven, and during 2011, as Artist in Residence at the University of Oxford, began work on the project The Secret Resonance of Things, transforming astronomical and mathematical data into musical material. He has also been involved in community, environmental and educational projects, and his Sounds Fun books of musical games were republished in Japanese. For further information consult