Rafael Valle

I'm a research scientist at NVIDIA focusing on audio applications.

During my PhD at UC Berkeley, I was advised mainly by Prof. Sanjit Seshia and Prof. Edmund Campion. My research there focused on machine listening and improvisation. At UC Berkeley, I was also part of the TerraSwarm Research Center, where I worked on problems related to adversarial attacks and verified artificial intelligence.

During Fall 2016, I was a Research Intern at Gracenote in Emeryville, where I worked on audio classification using deep learning. Previously, I was a Scientist Intern at Pandora in Oakland, where I investigated segments and scores that describe novelty-seeking behavior in listeners.

Before coming to Berkeley, I completed a master's in Computer Music from HMDK Stuttgart in Germany and a bachelor's in Orchestral Conducting from UFRJ in Brazil.

CV | LinkedIn | github

  • Paper about attacking speaker recognition with deep generative models is on arXiv.
  • Paper about interesting properties of samples generated with GANs is on arXiv.
  • Paper about sequence generation (text, speech, music) with GANs is in progress.

[NEW] Attacking Speaker Recognition with Deep Generative Models
Anish Doshi, Wilson Cai and Rafael Valle
arXiv, 2017

pdf | abstract

In this paper, we investigate the ability of generative adversarial networks (GANs) to synthesize spoofing attacks on modern speaker recognition systems. We first show that samples generated with SampleRNN and WaveNet, two modern audio generation architectures, are unable to fool CNN-based speaker recognition systems. We then propose a modification of the Wasserstein GAN objective function that makes use of data that is real but does not belong to the class being learned. Our method performs both targeted and untargeted attacks against state-of-the-art systems, which calls attention to the security of speaker recognition.
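As a rough illustration of the objective change described above, the sketch below treats real recordings from other speakers as a second set of negatives for the WGAN critic. The exact form of the extra term, the weight `alpha`, and the function names are assumptions made for illustration, not the paper's published objective. Critic scores are plain floats, and the Lipschitz constraint (weight clipping or gradient penalty) is omitted.

```python
from statistics import fmean


def critic_loss(d_target, d_fake, d_other, alpha=1.0):
    """Hypothetical WGAN critic loss (the critic minimizes this value).

    d_target: critic scores on real audio from the target speaker
    d_fake:   critic scores on generated audio
    d_other:  critic scores on real audio from *other* speakers
    alpha:    assumed weight on the off-class term
    """
    # Standard WGAN term: score generated samples low, target samples high.
    wgan_term = fmean(d_fake) - fmean(d_target)
    # Assumed extra term: real-but-off-class samples are also scored low.
    off_class_term = fmean(d_other) - fmean(d_target)
    return wgan_term + alpha * off_class_term


def generator_loss(d_fake):
    """The generator tries to raise the critic's score on its samples."""
    return -fmean(d_fake)
```

With `alpha=0` the extra term vanishes and the function reduces to the standard WGAN critic loss, which makes the modification easy to ablate.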


[NEW] Interesting Properties of GAN Samples
Rafael Valle, Wilson Cai and Anish Doshi

pdf | abstract

In this paper, we investigate numerical properties of samples produced with adversarial methods, especially Generative Adversarial Networks (GANs). We analyze pixel value statistics of real and fake data and compute distances based on the marginal distributions of perceptually significant features. We provide results on MNIST, music, and speech data and show that GAN-generated samples have distinctive signatures that can be used to identify the source of the data and to detect adversarial attacks.
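A minimal sketch of the two kinds of measurements mentioned above: summary statistics of pixel values, and a distance between marginal distributions. The 1-D Wasserstein (earth mover's) distance is an assumed example metric, and `frac_extreme` is a hypothetical signature statistic; the paper's exact statistics and distances may differ.

```python
from statistics import fmean, pstdev


def pixel_stats(pixels):
    """Summary statistics of a flat list of pixel intensities in [0, 1]."""
    return {
        "mean": fmean(pixels),
        "std": pstdev(pixels),
        # Hypothetical signature: share of values exactly at 0 or 1.
        # Quantized real images often hit these extremes; continuous
        # generator outputs may rarely do so.
        "frac_extreme": sum(p in (0.0, 1.0) for p in pixels) / len(pixels),
    }


def wasserstein_1d(a, b):
    """Earth mover's distance between two equal-size 1-D empirical samples.

    For equal sample sizes, W1 is the mean absolute difference
    between the sorted samples.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return fmean([abs(x - y) for x, y in zip(sorted(a), sorted(b))])
```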

[NEW] Sequence Generation with GANs
Rafael Valle

github | abstract | audio

In this paper, we investigate the generation of sequences using generative adversarial networks (GANs). We open with a brief introduction to sequence generation and to the challenges of training GANs. We then describe encoding strategies for text and MIDI data suited to convolutional architectures. In our experiments, we consider unconditional generation of polyphonic and monophonic piano rolls as well as of short sequences. For each data type, we provide audio or text examples of generated data, interpolations in the latent space, and latent vector arithmetic.
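One common way to encode MIDI for convolutional models is a binary piano roll (pitch by time). The sketch below builds one from note events, checks whether it is monophonic, and linearly interpolates between two latent vectors. The 88-key range, the `(pitch, start, end)` note format, and all function names are assumptions for illustration, not the paper's code.

```python
def piano_roll(notes, n_steps, low_pitch=21, high_pitch=108):
    """Binary piano roll: rows are pitches, columns are time steps.

    notes: iterable of (midi_pitch, start_step, end_step), end exclusive.
    The default range (21..108, the 88 piano keys) is an assumption;
    pitches outside it are dropped.
    """
    n_pitches = high_pitch - low_pitch + 1
    roll = [[0] * n_steps for _ in range(n_pitches)]
    for pitch, start, end in notes:
        if low_pitch <= pitch <= high_pitch:
            for t in range(max(start, 0), min(end, n_steps)):
                roll[pitch - low_pitch][t] = 1
    return roll


def is_monophonic(roll):
    """True if no time step has more than one active pitch."""
    return all(sum(column) <= 1 for column in zip(*roll))


def interpolate(z0, z1, steps):
    """Evenly spaced linear interpolation from z0 to z1 (inclusive)."""
    if steps < 2:
        raise ValueError("need at least two steps")
    out = []
    for i in range(steps):
        t = i / (steps - 1)
        out.append([(1 - t) * a + t * b for a, b in zip(z0, z1)])
    return out
```

Feeding each interpolated vector to a trained generator is one way to produce the kind of latent-space interpolations described above.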


Audio-Based Room Occupancy Analysis using Gaussian Mixtures and Hidden Markov Models
Rafael Valle
Future Technologies Conference (FTC), 2016
Detection and Classification of Acoustic Scenes and Events, 2016

pdf | abstract | bibtex | arXiv | code

This paper outlines preliminary steps towards the development of an audio-based room-occupancy analysis model. Our approach borrows from the speech recognition tradition and is based on Gaussian Mixtures and Hidden Markov Models. We analyze possible challenges encountered in the development of such a model and offer several solutions, including feature design and prediction strategies. We provide results obtained from experiments with audio data from a retail store in Palo Alto, California. Models are assessed via leave-two-out bootstrap, and the converged model achieves good accuracy, which represents a contribution to multimodal people-counting algorithms.

        title={ABROA: Audio-Based Room-Occupancy Analysis using Gaussian Mixtures and Hidden Markov Models},
        author={Valle, Rafael},
        journal={arXiv preprint arXiv:1607.07801},
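To make the GMM-HMM formulation concrete, here is a minimal decoding sketch: each hidden state is an occupancy level with a 1-D Gaussian mixture over a scalar per-frame feature, and Viterbi decoding recovers the most likely occupancy sequence. The states, the scalar feature, and all parameter values are hypothetical; in practice the parameters would be learned (e.g., with EM), which is omitted here.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_gmm(x, components):
    """Log-likelihood of scalar x under a 1-D GMM of (weight, mean, var)."""
    terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in components]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def viterbi(obs, log_start, log_trans, gmms):
    """Most likely state path for a sequence of scalar features.

    log_start[s]:    log P(first state = s)
    log_trans[r][s]: log P(next state = s | current state = r)
    gmms[s]:         GMM emission model for state s (hypothetical levels)
    """
    n = len(gmms)
    score = [log_start[s] + log_gmm(obs[0], gmms[s]) for s in range(n)]
    back = []  # back[t][s]: best previous state for state s at step t + 1
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][s] + log_gmm(x, gmms[s]))
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

For example, with a low-energy "empty" state `[(1.0, 0.0, 1.0)]`, a two-component "occupied" state `[(0.5, 5.0, 1.0), (0.5, 7.0, 1.0)]`, uniform start probabilities, and a 0.9 self-transition probability, the feature sequence `[0.1, -0.2, 6.0, 6.5, 0.0]` decodes to `[0, 0, 1, 1, 0]`.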

Missing Data Imputation for Supervised Learning
Jason Poulos and Rafael Valle
Applied Artificial Intelligence, 2018

pdf | abstract | bibtex | arXiv | code

This paper compares methods for imputing missing categorical data for supervised learning tasks. Missing data, which are prevalent in survey-based social science research, may compromise researchers' ability to accurately fit a model and obtain unbiased estimates. We experiment on two machine learning benchmark datasets with missing categorical data, comparing classifiers trained on non-imputed (i.e., one-hot encoded) or imputed data with different degrees of missing-data perturbation. The results show that imputation methods can increase predictive accuracy in the presence of missing-data perturbation. Additionally, we find that for imputed models, missing-data perturbation can improve prediction accuracy by regularizing the classifier.

        title={Missing Data Imputation for Supervised Learning},
        author={Poulos, Jason and Valle, Rafael},
        journal={arXiv preprint arXiv:1610.09075},
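A minimal sketch of the setup contrasted above: one-hot encoding that keeps missingness as its own indicator slot (an assumed reading of "non-imputed"), a simple imputation, and a missing-data perturbation step. Mode imputation is a stand-in chosen for brevity; the paper compares several imputation methods that this does not reproduce, and `MISSING`, `one_hot`, `mode_impute`, and `perturb` are hypothetical names.

```python
import random
from collections import Counter

MISSING = None  # hypothetical marker for a missing categorical value


def one_hot(column, categories):
    """One-hot encode a column; missing values get their own indicator slot."""
    slots = list(categories) + [MISSING]
    return [[int(v == c) for c in slots] for v in column]


def mode_impute(column):
    """Replace missing entries with the most frequent observed category."""
    observed = [v for v in column if v is not MISSING]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is MISSING else v for v in column]


def perturb(column, rate, rng):
    """Blank out each entry with probability `rate` (missing-data perturbation)."""
    return [MISSING if rng.random() < rate else v for v in column]
```

Applying `perturb` to training data before imputation is one way to reproduce the regularizing effect the abstract describes; `rng` is passed in explicitly so experiments stay reproducible.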

Learning and Visualizing Music Specifications using Pattern Graphs
Rafael Valle, Daniel Fremont, Ilge Akkaya, Alexandre Donze, Adrian Freed and Sanjit Seshia
ISMIR, 2016

pdf | abstract | bibtex | code

We describe a system to learn and visualize specifications from one or more songs in symbolic and audio formats. The core of our approach is based on a software engineering procedure called specification mining. Our procedure extracts patterns from feature vectors and uses them to build pattern graphs. The feature vectors are created by segmenting the song(s) and extracting time- and frequency-domain features from them, such as chromagrams, chord degree, and interval classification. The pattern graphs built on these feature vectors provide the likelihood of a pattern between nodes, as well as start and end nodes. The pattern graphs learned from one or more songs describe formal specifications that can be used for human-interpretable quantitative and qualitative song comparison or to perform supervisory control in machine improvisation. We offer results in song summarization, song and style validation, and machine improvisation with formal specifications.

        title={Learning and Visualizing Music Specifications using Pattern Graphs},
        author={Valle, Rafael and Fremont, Daniel J and Akkaya, Ilge and Donze, Alexandre and Freed, Adrian and Seshia, Sanjit S},
        booktitleaddon= {Proceedings of the Seventeenth ISMIR Conference}
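A much-simplified sketch of a pattern graph, assuming the feature vectors have already been reduced to discrete symbols such as chord degrees. It estimates transition likelihoods between nodes, adds explicit start and end nodes, and checks whether a new sequence is consistent with the learned graph, loosely analogous to validating a song against a learned specification. It does not implement specification mining itself, and all names are hypothetical.

```python
from collections import Counter, defaultdict

START, END = "<start>", "<end>"  # hypothetical start and end node labels


def pattern_graph(sequences):
    """First-order transition graph over symbolic features.

    Each sequence (e.g. the chord degrees of one song segment) is wrapped
    in START/END nodes; edge weights are maximum-likelihood estimates of
    P(next symbol | current symbol).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        path = [START, *seq, END]
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    return {
        a: {b: n / sum(nexts.values()) for b, n in nexts.items()}
        for a, nexts in counts.items()
    }


def accepts(graph, seq):
    """True if every transition of seq (with START/END) exists in the graph."""
    path = [START, *seq, END]
    return all(b in graph.get(a, {}) for a, b in zip(path, path[1:]))
```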