Jonathan Sterne – We Otter Not



Part of: Talking to Each Other: A Collective Sounding Project

Additional voices:
  • Carrie Rentschler
  • 6 anonymous speakers
  • Various processors

This piece dramatizes the process of automatic speaker identification. 

In 2020–21, more than a million Canadians moved their university education online, using Zoom for meetings. Access advocates quickly noted that Zoom has a captioning function and urged people to use it. Most users chose automatic captioning, which is provided by a company called Otter. Otter’s user agreement says that its users do not own any data that result from the processing of their voices. One such kind of data is the voiceprint. Like a fingerprint, the idea is that each person’s voice can be uniquely identified. The science behind voiceprints is not very good or accurate. But that has not stopped the machine learning industry from trying. 

For “We Otter Not,” I sent a request to the students in my “Disability, Technology, Communication” seminar to use the sound of their voices—but not what they said—as the basis for a piece about voice identification (I sent the request after grades were posted). Six of the fourteen students consented. If the same basic rules of consent were applied to Otter, it would not have millions of voices to process.

[Ponderous man]
A spectrum appears on screen

representing the sum total of sounds that you hear.

From left to right, there are the colours violet, indigo, blue, green, yellow, orange, red.

And the streams of colour fall from the top of the screen towards the bottom

like a colourful waterfall.

Constantly shifting lattice patterns of more and less intense colours

become visible, and fade, within each band.

Total darkness indicates the absence of a frequency.

The colours will continue falling on the screen for the duration of this piece.

[1970s computer voice]
Layer One

[Ponderous man, optimistically]
Every voice has a spectrum, every voice has a cepstrum.

[User agreement woman]
9.3 Limited License Grant to

Customer retains all ownership rights to the User Content processed using the service.

You grant a worldwide, non-exclusive, royalty-free, fully paid right and license

with the right to sublicense

[difficult to hear]
to host, store, transfer, display, perform, reproduce, modify,

export, process, transform, and distribute your User Content, in whole or in part

[continues speaking but overtaken by choir]
and through any media channels now known or hereafter developed

and through any media channels now known or hereafter developed

and through any media channels

and through channels

through channels, through.

[1970s computer voice]
Layer Two

[Ponderous man, with anticipation]
Every mouth makes transients, or, let’s crunch some data.

[User agreement woman]
9.9 Machine Learning. shall have the right to collect and analyze data and other information

relating to the provision, use and performance of various aspects of the Service.

and related systems and technologies (“Usage Data”).

The Service may be implemented using machine learning systems

with features and implementations designed to generate statistics,

calibrate data models, and improve algorithms in the course of processing

User Content and Usage Data (“Machine Learning”).

Machine Learning/Usage Data

[1970s computer voice]
Layer Three.

[Ponderous man]
Every speaker has a cadence.

[User agreement woman, stridently]
Nothing in these Terms

gives you any rights in or to any part of the Service or the Machine Learning

generated by Company or the Machine Learning generated

in the course of providing the service.

[Ponderous man]
The sound will soon stop.

Then the waterfall of colour will run out

dropping into a blackness that eventually takes over the screen.

There is nothing left to hear.

There is nothing left to analyze.

There is no one left to identify.

[A drone starts, it seems like one voice, and then multiple voices. It’s a fuzzy choir stuck in time]

[The fuzzy choir gets louder and louder]

[The stuck choir gets unstuck and suddenly leaves]

[The sound of nomming. It’s a crunchy, greasy, hot sandwich. More chewing. Sounds of deliciousness continue]

[Indistinguishable chatter surrounds you, like you’re in the middle of a conversation]

[People are talking but it’s impossible to make out what they are saying. The voices sound high pitched and percussive]

[Maybe the word “normalize” can be heard at one point.]

[The crunchy voices and sandwich continue]

[Wow, that must have been a really good sandwich. Lip smacking, swallowing, and chop licking]

[Indistinguishable singsongy chatter. Voices come and go]

[Each one sounds different, but it’s impossible to make out what they are saying]

[They surround you. Ongoing]

[Chatter sounds: a bad mobile phone connection, very slow drunk robots]

[a fish-person talking on and on underwater]

[Indistinguishable chatter sounds morph and continue]

[chatter fades out]

[with user agreement woman and a distorted voice adding emphasis]

[medium-high pitch voice]
This video was made in the context of a two-week workshop for Talking To Each Other, a multimedia project on the topics of access, disability, and collective sound making. The workshop was facilitated by Piper Curtis and Razan AlSalah. The Talking To Each Other project was directed by Simone Lucas with the Access in the Making Lab and The Feminist Media Studio.