Rafael Valle

I'm a research scientist at NVIDIA focusing on audio applications.

During my PhD at UC Berkeley I was advised mainly by Prof. Sanjit Seshia and Prof. Edmund Campion, and my research focused on machine listening and improvisation. I was also part of the TerraSwarm Research Center, where I worked on problems related to adversarial attacks and verified artificial intelligence.

During Fall 2016 I was a Research Intern at Gracenote in Emeryville, where I worked on audio classification using Deep Learning. Previously I was a Scientist Intern at Pandora in Oakland, where I investigated segments and scores that describe novelty-seeking behavior in listeners.

Before coming to Berkeley, I completed a master's in Computer Music from HMDK Stuttgart in Germany and a bachelor's in Orchestral Conducting from UFRJ in Brazil.

CV | LinkedIn | github

  • Paper about high-quality and fast speech synthesis from mel-spectrograms is on arXiv.
  • Paper about easily identifying GAN samples is on arXiv.
  • Paper about attacking speaker recognition with deep generative models is on arXiv.
  • Paper about sequence generation (text, speech, music) with GANs is in progress.

[NEW] WaveGlow: a Flow-based Generative Network for Speech Synthesis
Ryan Prenger, Rafael Valle, Bryan Catanzaro
arXiv, 2018

pdf | samples | abstract

We propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms. WaveGlow combines insights from Glow and WaveNet in order to provide fast, efficient and high-quality audio synthesis, without the need for auto-regression. WaveGlow is implemented using only a single network, trained using only a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable.
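
As a rough illustration of the single training objective described above, the sketch below computes an exact log-likelihood through one affine coupling layer, the kind of invertible transform used in Glow-style flows. It is a toy under stated assumptions, not the WaveGlow implementation: `scale_and_shift` is a hypothetical stand-in for the learned network, there is no mel-spectrogram conditioning, and the input is a short list of floats of even length.

```python
import math


def scale_and_shift(x_a):
    """Hypothetical stand-in for a learned network: returns (log_s, t)."""
    log_s = [0.5 * math.tanh(v) for v in x_a]
    t = [0.1 * v for v in x_a]
    return log_s, t


def coupling_forward(x):
    """Map data x to latent z; also return log|det Jacobian|."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s, t = scale_and_shift(x_a)
    z_b = [math.exp(s) * xb + ti for s, xb, ti in zip(log_s, x_b, t)]
    # The first half passes through unchanged, so the Jacobian is
    # triangular and its log-determinant is the sum of the log scales.
    return x_a + z_b, sum(log_s)


def coupling_inverse(z):
    """Exact inverse of coupling_forward (used for sampling)."""
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    log_s, t = scale_and_shift(z_a)
    x_b = [(zb - ti) * math.exp(-s) for s, zb, ti in zip(log_s, z_b, t)]
    return z_a + x_b


def log_likelihood(x):
    """log p(x) = log N(z; 0, I) + log|det dz/dx| (change of variables)."""
    z, log_det = coupling_forward(x)
    log_prior = sum(-0.5 * (v * v + math.log(2 * math.pi)) for v in z)
    return log_prior + log_det


x = [0.3, -1.2, 0.7, 2.0]
z, _ = coupling_forward(x)
x_reconstructed = coupling_inverse(z)
```

Training a flow of this kind amounts to maximizing `log_likelihood` over the data, and generation runs `coupling_inverse` on samples from the Gaussian prior, which is why no auto-regression is needed.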


[NEW] TequilaGAN: How to easily identify GAN samples
Rafael Valle, Wilson Cai and Anish Doshi
arXiv, 2018

pdf | abstract

In this paper we show strategies to easily identify fake samples generated with the Generative Adversarial Network framework. One strategy is based on the statistical analysis and comparison of raw pixel values and features extracted from them. The other strategy learns formal specifications from the real data and shows that fake samples violate the specifications of the real data. We show that fake samples produced with GANs have a universal signature that can be used to identify them. We provide results on MNIST, CIFAR10, music and speech data.

[NEW] Attacking Speaker Recognition with Deep Generative Models
Anish Doshi, Wilson Cai and Rafael Valle
arXiv, 2017

pdf | abstract | code

In this paper we investigate the ability of generative adversarial networks (GANs) to synthesize spoofing attacks on modern speaker recognition systems. We first show that samples generated with SampleRNN and WaveNet are unable to fool a CNN-based speaker recognition system. We propose a modification of the Wasserstein GAN objective function to make use of data that is real but not from the class being learned. Our semi-supervised learning method is able to perform both targeted and untargeted attacks, raising questions related to security in speaker authentication systems.

[NEW] Sequence Generation with GANs
Rafael Valle

github | abstract | audio

In this paper we investigate the generation of sequences using generative adversarial networks (GANs). We open the paper by providing a brief introduction to sequence generation and challenges in GANs. We briefly describe encoding strategies for text and MIDI data in light of their use with convolutional architectures. In our experiments we consider the unconditional generation of polyphonic and monophonic piano rolls as well as short sequences. For each data type, we provide sonic or text examples of generated data, interpolation in the latent space and vector arithmetic.


Audio-Based Room Occupancy Analysis using Gaussian Mixtures and Hidden Markov Models
Rafael Valle
Future Technologies Conference (FTC), 2016
Detection and Classification of Acoustic Scenes and Events, 2016

pdf | abstract | bibtex | arXiv | code

This paper outlines preliminary steps towards the development of an audio-based room-occupancy analysis model. Our approach borrows from speech recognition tradition and is based on Gaussian Mixtures and Hidden Markov Models. We analyze possible challenges encountered in the development of such a model, and offer several solutions including feature design and prediction strategies. We provide results obtained from experiments with audio data from a retail store in Palo Alto, California. Model assessment is done via leave-two-out Bootstrap and model convergence achieves good accuracy, thus representing a contribution to multimodal people counting algorithms.

        title={ABROA: Audio-Based Room-Occupancy Analysis using Gaussian Mixtures and Hidden Markov Models},
        author={Valle, Rafael},
        journal={arXiv preprint arXiv:1607.07801},

Missing Data Imputation for Supervised Classification
Jason Poulos and Rafael Valle
Applied Artificial Intelligence, 2018

pdf | abstract | bibtex | arXiv | code

This paper compares methods for imputing missing categorical data for supervised learning tasks. The ability of researchers to accurately fit a model and yield unbiased estimates may be compromised by missing data, which are prevalent in survey-based social science research. We experiment on two machine learning benchmark datasets with missing categorical data, comparing classifiers trained on non-imputed (i.e., one-hot encoded) or imputed data with different degrees of missing data perturbation. The results show imputation methods can increase predictive accuracy in the presence of missing-data perturbation. Additionally, we find that for imputed models, missing data perturbation can improve prediction accuracy by regularizing the classifier.
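
To make the non-imputed versus imputed comparison concrete, here is a minimal sketch of the two kinds of input a classifier might receive. It is an illustration under assumptions, not the paper's pipeline: `None` is a hypothetical marker for a missing value, mode imputation is just one simple imputation method, and encoding a missing value as an all-zero row is an assumed convention for the non-imputed case.

```python
from collections import Counter

MISSING = None  # hypothetical marker for a missing categorical value


def mode_impute(column):
    """Replace missing entries with the most frequent observed category."""
    observed = [v for v in column if v is not MISSING]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is MISSING else v for v in column]


def one_hot(column, categories):
    """One-hot encode a column; a missing value becomes an all-zero row."""
    return [[1 if v == c else 0 for c in categories] for v in column]


column = ["red", None, "blue", "red", None]
categories = sorted({v for v in column if v is not MISSING})

non_imputed = one_hot(column, categories)            # missing rows stay all-zero
imputed = one_hot(mode_impute(column), categories)   # missing rows get the mode
```

Both encodings have the same shape, so the same classifier can be trained on either one and the resulting accuracies compared at different rates of missing data.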

        title={Missing Data Imputation for Supervised Learning},
        author={Poulos, Jason and Valle, Rafael},
        journal={arXiv preprint arXiv:1610.09075},

Learning and Visualizing Music Specifications using Pattern Graphs
Rafael Valle, Daniel Fremont, Ilge Akkaya, Alexandre Donze, Adrian Freed and Sanjit Seshia
ISMIR, 2016

pdf | abstract | bibtex | code

We describe a system to learn and visualize specifications from song(s) in symbolic and audio formats. The core of our approach is based on a software engineering procedure called specification mining. Our procedure extracts patterns from feature vectors and uses them to build pattern graphs. The feature vectors are created by segmenting song(s) and extracting time and frequency domain features from them, such as chromagrams, chord degree and interval classification. The pattern graphs built on these feature vectors provide the likelihood of a pattern between nodes, as well as start and ending nodes. The pattern graphs learned from song(s) describe formal specifications that can be used for human-interpretable quantitative and qualitative song comparison or to perform supervisory control in machine improvisation. We offer results in song summarization, song and style validation and machine improvisation with formal specifications.
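
The idea of a graph that records transition likelihoods plus start and ending nodes can be sketched in a simplified form. The code below is an assumption-laden toy, not the paper's specification-mining procedure: it treats each segment as a single symbolic label (here, hypothetical chord degrees), counts first-order transitions only, and checks a new sequence against what was observed.

```python
from collections import Counter, defaultdict


def build_transition_graph(sequences):
    """Learn transition likelihoods, start nodes and end nodes from label sequences."""
    counts = defaultdict(Counter)
    starts, ends = set(), set()
    for seq in sequences:
        if not seq:
            continue
        starts.add(seq[0])
        ends.add(seq[-1])
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    graph = {
        a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
        for a, nxt in counts.items()
    }
    return graph, starts, ends


def satisfies(graph, starts, ends, seq):
    """True if seq starts and ends on learned nodes and uses only seen transitions."""
    if not seq or seq[0] not in starts or seq[-1] not in ends:
        return False
    return all(b in graph.get(a, {}) for a, b in zip(seq, seq[1:]))


# Hypothetical chord-degree sequences extracted from two songs.
songs = [["I", "IV", "V", "I"], ["I", "V", "I"]]
graph, starts, ends = build_transition_graph(songs)
```

A check such as `satisfies` is one simple way a learned graph could act as a specification, for example to reject a candidate continuation during machine improvisation.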

        title={Learning and Visualizing Music Specifications using Pattern Graphs},
        author={Valle, Rafael and Fremont, Daniel J and Akkaya, Ilge and Donze, Alexandre and Freed, Adrian and Seshia, Sanjit S},
        booktitleaddon= {Proceedings of the Seventeenth ISMIR Conference}