1. ABOUT THE DATASET
--------------------

Title:	VOCAL+ (VOice for laryngeal Cancer Analysis in Leeds)

Creator(s): Mary Paterson [1], James Moor [2], Luisa Cutillo [1]

Organisation(s): 1. Faculty of Engineering and Physical Sciences, University of Leeds
                 2. Ear, Nose and Throat Department, Leeds Teaching Hospitals NHS Trust

Rights-holder(s): Copyright 2025 University of Leeds

Publication Year: 2025

Description: The VOCAL+ dataset contains voice recordings of patients referred on the NHS 2 week wait (2WW) pathway for suspected head and neck cancer to the ENT Department at Leeds Teaching Hospitals NHS Trust. Alongside the voice recordings, we provide demographic, lifestyle, and symptom data for each patient, as well as their diagnosis. Each patient completed nine voice tasks modified from the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) framework: two sustained vowels (/a/ and /i/); six sentences ("The blue spot is on the key again.", "How hard did he hit him?", "We were away a year ago.", "We eat eggs every Easter.", "My mama makes lemon muffins.", and "Peter will keep at the peak."); and a picture description of the Cookie Theft image. Each patient was recorded simultaneously with two recording devices, an Olympus LS-P1 and a Nokia 110 phone. Recordings from both devices are available.

Cite as: Mary Paterson, James Moor, Luisa Cutillo (2025): VOCAL+ (VOice for laryngeal Cancer Analysis in Leeds). University of Leeds. [Dataset] https://doi.org/10.5518/1661


2. TERMS OF USE
---------------

Unless otherwise stated, this dataset is licensed under a Creative Commons Attribution 4.0 International Licence: https://creativecommons.org/licenses/by/4.0/.


3. PROJECT AND FUNDING INFORMATION
----------------------------------

Title: AI Analysis of Voice to Aid Laryngeal Cancer Diagnosis 

Dates: 01/09/2020 - 01/09/2025

Funding organisation: UKRI Engineering and Physical Sciences Research Council (EPSRC)

Grant no.: EP/S024336/1

4. CONTENTS
-----------
File listing

All audio files are stored as .wav files in the 'Audio' folder. Each file is named using the following convention: id-task-device.wav

id - the patient's unique identifier, which is also used in the csv file that contains their demographic, lifestyle, and symptom data

task - which of the nine voice tasks is included in this file

device - which of the two recording devices was used to record this audio
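
As an illustration of the naming convention above, the following Python sketch splits a file name into its three parts. It assumes that the task and device parts contain no hyphens; the file name used in the example call is hypothetical, because this README does not list the exact task or device labels.

```python
from pathlib import Path
from typing import NamedTuple


class RecordingName(NamedTuple):
    patient_id: str
    task: str
    device: str


def parse_recording_name(path):
    """Split 'id-task-device.wav' into its three parts.

    Assumption: the task and device parts contain no hyphens, so the
    name is split from the right and any extra hyphen stays in the id.
    """
    p = Path(path)
    if p.suffix.lower() != ".wav":
        raise ValueError(f"not a .wav file: {path}")
    parts = p.stem.rsplit("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"name does not match id-task-device.wav: {path}")
    return RecordingName(*parts)


# Hypothetical file name, used only to show the call.
print(parse_recording_name("Audio/P001-taskA-deviceA.wav"))
```

The returned patient_id can then be matched against the ID column of the demographics file.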

The patients' demographic, lifestyle, and symptom data are stored in RecordingDemographics.csv. This file contains 16 columns.

The columns are: ID, Age, Sex, Tobacco use, Alcohol consumption, Caffeine consumption, Hoarse voice, Neck lump, Sore throat, Lump in throat, Difficulty swallowing, Something in throat, Pain on swallowing, Unexplained ear ache, Difficulty breathing through or blood coming from one nostril, Diagnosis.

ID - the patient's unique identifier which is also used in the audio file names

Age - the age of the patient at the time of the recording

Sex - the sex of the patient

Tobacco use - self-reported tobacco use in the last three months (one of: "Never used", "Ex user", or "Current user")

Alcohol consumption - self-reported weekly alcohol consumption (one of: "None", "Little", or "Moderate or Excessive")

Caffeine consumption - self-reported daily caffeine consumption (one of: "None", "1-2 drinks", "3-6 drinks", or "more than 6 drinks")

Diagnosis - the patient's diagnosis as recorded by their clinician. Diagnoses have been grouped to avoid re-identification of patients

All other columns are categorical self-reported symptoms (one of: "Yes", "Sometimes", or "No")

Where data is missing, the patient did not fill in the corresponding information while at their appointment.
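
The column descriptions above can be turned into a simple consistency check. The Python sketch below is one possible approach, not part of the dataset. It assumes that the column headers in the file match the names listed in this README exactly and that missing answers appear as empty cells. The standard-library csv module is used so that the literal answer "None" is kept as text; some data-frame readers may interpret that string as a missing value by default.

```python
import csv

# Allowed answers, copied from the column descriptions in this README.
ALLOWED = {
    "Tobacco use": {"Never used", "Ex user", "Current user"},
    "Alcohol consumption": {"None", "Little", "Moderate or Excessive"},
    "Caffeine consumption": {"None", "1-2 drinks", "3-6 drinks", "more than 6 drinks"},
}
SYMPTOM_COLUMNS = [
    "Hoarse voice", "Neck lump", "Sore throat", "Lump in throat",
    "Difficulty swallowing", "Something in throat", "Pain on swallowing",
    "Unexplained ear ache",
    "Difficulty breathing through or blood coming from one nostril",
]
for column in SYMPTOM_COLUMNS:
    ALLOWED[column] = {"Yes", "Sometimes", "No"}


def load_rows(csv_path):
    """Read the demographics file into a list of dictionaries."""
    # utf-8-sig tolerates a byte-order mark if one is present (assumption).
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def check_rows(rows):
    """Return (missing, unexpected) lists for the categorical columns."""
    missing, unexpected = [], []
    for row in rows:
        for column, allowed in ALLOWED.items():
            value = (row.get(column) or "").strip()
            if not value:
                missing.append((row.get("ID"), column))
            elif value not in allowed:
                unexpected.append((row.get("ID"), column, value))
    return missing, unexpected
```

A typical call would be check_rows(load_rows("RecordingDemographics.csv")). Entries in the first list correspond to questions a patient left unanswered, and entries in the second list point to values that differ from the categories documented here.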


5. METHODS
----------

Patients referred on the 2WW pathway for suspected head and neck cancer to Leeds Teaching Hospitals ENT Department were offered the opportunity to be recruited into the study. To be included in the study, patients had to:
- have been referred on the 2WW pathway for suspected head and neck cancer
- be over the age of 18
- be able to provide informed consent
- be able to read standard English text and be fluent in speaking English

Patients were excluded if they:
- were under 18 years of age
- were unable to complete the consent procedure, as determined by their direct care team or a member of the clinical research team
- did not provide informed consent

All patients attended a structured interview in which they were asked to complete a total of nine voice tasks based on the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) framework. The tasks are as follows:

1. Sustained vowels – The participants sustained two vowel sounds, /a/ and /i/ (a lax and a tense vowel, respectively).
2. Sentences – The participants were asked to say the following six short sentences:
    1. "The blue spot is on the key again."
    2. "How hard did he hit him?"
    3. "We were away a year ago."
    4. "We eat eggs every Easter."
    5. "My mama makes lemon muffins."
    6. "Peter will keep at the peak."
3. Running speech – The participants were asked to describe the "Cookie Theft" picture.

Each patient was recorded using two devices simultaneously, an Olympus LS-P1 and a Nokia 110 phone. The interview recordings were then split into the different tasks. The recording devices were placed on a desk in front of the patient, as close to each other as possible. While most patients were recorded in the same room, some patients had to be recorded in different rooms due to space constraints.
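
Because each task was captured by both devices at the same time, analyses that compare devices need the two files for the same patient and task side by side. The Python sketch below shows one way this could be done using only the id-task-device.wav naming convention. The file names in the example are hypothetical, and the sketch assumes that the task and device parts of a name contain no hyphens.

```python
from collections import defaultdict
from pathlib import Path


def pair_by_device(filenames):
    """Group file names by (id, task), mapping each device to its file."""
    grouped = defaultdict(dict)
    for name in filenames:
        # Split from the right so that only the last two hyphens are used.
        patient_id, task, device = Path(name).stem.rsplit("-", 2)
        grouped[(patient_id, task)][device] = name
    return dict(grouped)


# Hypothetical file names, used only to show the call.
pairs = pair_by_device([
    "P001-taskA-deviceA.wav",
    "P001-taskA-deviceB.wav",
    "P002-taskA-deviceA.wav",
])
for key, files in pairs.items():
    print(key, sorted(files))
```

Any (id, task) key with fewer than two devices indicates a recording that is present for only one device in the list supplied.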

Data was collected between March and August of 2024.