time: 32m30s

Voice Control

{kind: title}


Having voice control might be one interesting way of breaking out of this command-response sort of dynamic. Which isn't in itself bad, I mean there are all kinds of musical forms that rely on call and response, so I mean that's maybe kind of an interesting aspect to play with. But in terms of having more flow, less discreteness in the way you interface with the machine, using some sort of regression network or something like that might be an interesting way of creating a more dynamic interaction with the voice, one that could involve pieces of speech but could also go down to the phoneme, could be looking at different timbral qualities. So the voice becomes in a sense a very expressive instrument. Well, besides already being a very expressive instrument, it becomes a very expressive way of communicating, like coding basically, in some sense of the word. And, yeah, I agree that it's much more interesting to do something that's a little bit more mixed, rather than this kind of command-response sort of way. 'Cause I also think it challenges a little bit the epistemology of programming already. Because programming, even in something as fluid as SuperCollider, is still a command-response kind of model.


I mean, either you write code that is correct or you don't, and that is already one big layer of filtering out. That is why I am thinking this voice interface could be interesting, like, you know, in an improvisation, if you could just start exploring different sounds. And the system would still kind of react to that in some way that is not necessarily "Ok, I am waiting here for something that is a valid command or something"...


Right, which could also be maybe thought of in a different way. The speech could, through some algorithm, be turned into text, like a speech to text detector. And then there's a coding language that will accept pretty much anything. Which is something you could do in SuperCollider using the pre-processor. That might be an interesting approach. You do not focus so much on the speech recognition being really dynamic and robust, but you actually use the limitations of it and use that as part of the performance also. Like, you know, make the glitches, make the misinterpretations and the misunderstandings also clear. And then write a language that's accommodating enough to work with that. That could also be really interesting.
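
A minimal sketch of the "language that will accept pretty much anything" idea, written in Python rather than with the SuperCollider pre-processor mentioned above. Everything here is an assumption made for illustration: the `KNOWN` vocabulary, the `interpret` function, and the event dictionaries are hypothetical and were not discussed in the meeting. The point is only that misrecognised words from a speech-to-text stage still produce musical events instead of syntax errors.

```python
import zlib

# Hypothetical vocabulary: the few words this toy language treats as commands.
KNOWN = {
    "louder": ("amp", 0.1),
    "softer": ("amp", -0.1),
    "higher": ("octave", 1),
    "lower": ("octave", -1),
}


def interpret(text):
    """Turn ANY string into a list of events; there is no invalid input."""
    events = []
    for word in text.lower().split():
        # Keep only letters and digits so punctuation never blocks a word.
        token = "".join(ch for ch in word if ch.isalnum())
        if not token:
            continue
        if token in KNOWN:
            param, delta = KNOWN[token]
            events.append({"type": "adjust", "param": param, "delta": delta})
        else:
            # Unknown or misheard words are not rejected: a stable checksum
            # maps them to a pitch, and their length to a duration.
            h = zlib.crc32(token.encode("utf-8"))
            events.append({
                "type": "note",
                "midinote": 36 + h % 48,
                "dur": 0.125 * (1 + len(token) % 8),
                "source": token,
            })
    return events
```

Calling `interpret("louder")` yields a single `adjust` event, while a misrecognition such as `interpret("lowder")` yields a `note` event, so the glitch stays audible in the performance rather than being filtered out.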


That is also probably much more rewarding, because otherwise getting it very precise and correct is a lot of effort for perhaps very little reward. That way you could probably come to something that works comparatively fast, let's say. If you say "I can still improve the recognition, but let's start with the assumption that everything that comes out as text characters can be used" then you can get a fast prototype in a way. And from there new questions will probably open up anyway; otherwise you would work a lot on trying to get something and then from there you would say "Oh, what's the next step now?"


Yeah, and I think maybe that allows the focus to be a bit more on the language, and the possibilities of what the language can represent in terms of dynamic processes and things like this, I'm really thinking about that. In a much longer term I would be also interested in building a live coding language that I can just use also, thinking about what are the aspects that you want or that I would want in a language that would allow me to compose music. So maybe that's a good approach. To accept that maybe the speech recognition stuff is used at a very basic or superficial level.

Experimentalstudio meeting, 09_10_2018

Jonathan Reus, Hanns Holger Rutz, Daniele Pozzi

file: JR/audio/181009/ZOOM0021.wav

meta: true
persons: [HHR, DP, JR, POZ]
kind: conversation
origin: spoken
place: experimental studio
date: 181009

keywords: [voice recognition, speech to text, programming]