Biometric Recognition Group - ATVS

INSTRUCTIONS FOR DOWNLOADING Gaudi_25

  1. Download the license agreement and email one signed, scanned copy to joaquin.gonzalez@uam.es (with javier.hernandezo@uam.es in cc), following the instructions given in point 2.
     

  2. Send an email to joaquin.gonzalez@uam.es (with javier.hernandezo@uam.es in cc), as follows:
    Subject: [DATABASE download: Gaudi_25]

    Body: your name, email address, telephone number, organization, postal address, the purpose for which you will use the database, and the time and date at which you sent the email with the signed license agreement.
     

  3. Once ATVS has received the emailed copy of the license agreement, you will receive an email with a username, a password and a time slot for downloading the database.
     

  4. Download the database using the authentication information given in step 3. When the download is finished, please notify joaquin.gonzalez@uam.es by email that you have successfully completed the transaction.
     

  5. For more information, please contact: joaquin.gonzalez@uam.es
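If you prefer to prepare the request of step 2 programmatically, the following Python sketch composes it with the standard `email.message.EmailMessage` class. The subject line and addresses come from the instructions above; the function name `build_request` and its parameters are hypothetical. The sketch only builds the message: sending it (for example with `smtplib`) depends on your own mail setup and is left out.

```python
from email.message import EmailMessage

# Recipients and subject line as given in the download instructions.
TO_ADDR = "joaquin.gonzalez@uam.es"
CC_ADDR = "javier.hernandezo@uam.es"
SUBJECT = "[DATABASE download: Gaudi_25]"


def build_request(name, email, phone, organization, postal_address,
                  purpose, license_sent_at):
    """Compose (but do not send) the step-2 request email.

    Hypothetical helper: the parameters mirror the fields requested
    in the body of the email.
    """
    msg = EmailMessage()
    msg["From"] = email
    msg["To"] = TO_ADDR
    msg["Cc"] = CC_ADDR
    msg["Subject"] = SUBJECT
    # One line per requested field.
    msg.set_content(
        f"Name: {name}\n"
        f"E-mail: {email}\n"
        f"Telephone: {phone}\n"
        f"Organization: {organization}\n"
        f"Postal address: {postal_address}\n"
        f"Purpose: {purpose}\n"
        f"Signed license agreement sent at: {license_sent_at}\n"
    )
    return msg
```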



DESCRIPTION OF Gaudi_25

The Gaudi_25 database is a subset of the Gaudi database containing its first 25 speakers. The speech files have the following characteristics:

  • The sampling rate is 16,000 Hz for files "M1" to "M9" (microphone sessions).

  • The sampling rate is 8,000 Hz for files "T1" to "T5" (telephone sessions).

  • All files contain 16-bit binary samples.

  • Each file begins with a 20-byte header that is specific to the segmentation software used and independent of the speech samples.
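A minimal Python sketch for loading one file, assuming the layout described above: a 20-byte header followed by 16-bit samples. Two details are not documented here and are assumptions: that the samples are signed, and that they are little-endian (exposed as the hypothetical `little_endian` parameter). Both should be checked against real files, for example by plotting or listening to a decoded signal. Function names are illustrative.

```python
import struct

HEADER_BYTES = 20      # segmentation-software header, not audio
BYTES_PER_SAMPLE = 2   # 16-bit samples


def sample_rate(session):
    """Return 16,000 Hz for microphone sessions ("M...") and 8,000 Hz for telephone ones ("T...")."""
    if session.startswith("M"):
        return 16000
    if session.startswith("T"):
        return 8000
    raise ValueError(f"unknown session code: {session!r}")


def read_samples(raw, little_endian=True):
    """Skip the 20-byte header and decode the rest as 16-bit samples.

    ASSUMPTION: samples are signed; byte order defaults to little-endian.
    """
    body = raw[HEADER_BYTES:]
    n = len(body) // BYTES_PER_SAMPLE           # ignore a trailing odd byte, if any
    fmt = ("<" if little_endian else ">") + f"{n}h"
    return list(struct.unpack(fmt, body[:n * BYTES_PER_SAMPLE]))


def duration_seconds(n_samples, session):
    """Duration of a file in seconds, given its sample count and session code."""
    return n_samples / sample_rate(session)
```

For example, `read_samples(f.read())` on a file opened in binary mode decodes its audio, and `duration_seconds(len(samples), "M1")` gives its length for a microphone session.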

Speakers 9, 10 and 25 do not exist, and there is no M7 session for any speaker.

Files 003T1D00.WAV, 003T1E00.WAV, 003T1F00.WAV, 022T4C09.WAV, 022T4C10.WAV, 022T4D00.WAV, 030T2D00.WAV, 030T2E00.WAV and 030T2F00.WAV are damaged. Other files are missing from the database (see missing_gaudi.txt).
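When building file lists for experiments, the damaged files can be excluded with a simple set lookup. This Python sketch hard-codes the nine names listed above; the contents of missing_gaudi.txt are not reproduced here, and missing files need no filtering since they are simply absent. The names `DAMAGED`, `is_usable` and `usable_files` are hypothetical.

```python
import os

# The nine files reported as damaged in the description of Gaudi_25.
DAMAGED = frozenset({
    "003T1D00.WAV", "003T1E00.WAV", "003T1F00.WAV",
    "022T4C09.WAV", "022T4C10.WAV", "022T4D00.WAV",
    "030T2D00.WAV", "030T2E00.WAV", "030T2F00.WAV",
})


def is_usable(path):
    """Return False if `path` names one of the known damaged files.

    Only the base name is compared, case-insensitively, so full paths
    and bare file names both work.
    """
    return os.path.basename(path).upper() not in DAMAGED


def usable_files(paths):
    """Keep only the paths that are not known to be damaged."""
    return [p for p in paths if is_usable(p)]
```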



FILE NOMENCLATURE

File names follow this format:

  • The first three characters correspond to the speaker number.

  • The fourth and fifth characters indicate the recording session:

    • "T1".- Telephone conversation, 1st session, UPM internal network.

    • "T2".- Telephone conversation, 2nd session, switched telephone network.

    • "M1".- Microphone left channel, 1st session, SONY ECM-66B lapel microphone.

    • "M2".- Microphone left channel, 2nd session, SONY ECM-66B lapel microphone.

    • "M3".- Microphone left channel, 3rd session, SONY ECM-66B lapel microphone.

    • "M4".- Microphone right channel, 1st session, AKG D80S desktop microphone.

    • "M5".- Microphone right channel, 2nd session, AKG TriPower desktop microphone.

    • "M6".- Microphone right channel, 3rd session, AKG C410 desktop microphone.

    • "T3".- Telephone conversation, 3rd session, switched telephone network.

    • "T4".- Telephone conversation, 4th session, switched telephone network.

    • "T5".- Telephone conversation, 5th session, switched telephone network.

    • "M8".- Microphone left channel, 4th session, SONY ECM-66B lapel microphone.

    • "M9".- Microphone left channel, 4th session, Target CPT3GX desktop microphone.

  • The sixth character corresponds to the task:

    • "A".- Isolated numbers, common to all speakers.

    • "B".- Number strings.

    • "C".- Sentences, common to all speakers.

    • "D".- Text, common to all speakers.

    • "E".- Specific text, common to all speakers.

    • "F".- Spontaneous speech.

  • The last two characters (seventh and eighth) determine the sub-task. Shown together with the task letter, the codes are:

    • "A00".- Isolated numbers, common to all speakers. Simple task.

    • "B01 ... B10".- First to last number string.

    • "C01 ... C10".- First to last sentence.

    • "D00".- Text, common to all speakers, telephone.

    • "D01".- Text, common to all speakers, microphone. Normal rate.

    • "D02".- Text, common to all speakers, microphone. Slow rate.

    • "D03".- Text, common to all speakers, microphone. Fast rate.

    • "E00".- Specific text, common to all speakers. Simple task.

    • "F00".- Spontaneous speech. Simple task.
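The naming scheme lends itself to a small parser. The Python sketch below is one possible reading of the convention: a three-digit speaker number, a two-character session code ("M" or "T" plus a digit), a task letter and a two-digit sub-task, followed by the ".WAV" extension seen in the damaged-file names quoted earlier. The regular expression, the `GaudiName` tuple and `parse_name` are illustrative assumptions, not part of the distribution.

```python
import os
import re
from typing import NamedTuple

# Task descriptions copied from the nomenclature list.
TASKS = {
    "A": "Isolated numbers",
    "B": "Number strings",
    "C": "Sentences",
    "D": "Text",
    "E": "Specific text",
    "F": "Spontaneous speech",
}

# speaker (3 digits) + session (M/T + digit) + task (A-F) + sub-task (2 digits)
NAME_RE = re.compile(r"^(\d{3})([MT]\d)([A-F])(\d{2})\.WAV$", re.IGNORECASE)


class GaudiName(NamedTuple):
    speaker: int
    session: str   # e.g. "T1", "M4"
    channel: str   # "microphone" or "telephone"
    task: str      # e.g. "D"
    subtask: str   # task letter + two digits, e.g. "D00"


def parse_name(path):
    """Split a Gaudi_25 file name into its fields; raise ValueError otherwise."""
    m = NAME_RE.match(os.path.basename(path))
    if m is None:
        raise ValueError(f"not a Gaudi_25 file name: {path!r}")
    speaker, session, task, sub = m.groups()
    session, task = session.upper(), task.upper()
    channel = "microphone" if session[0] == "M" else "telephone"
    return GaudiName(int(speaker), session, channel, task, task + sub)
```

For instance, `parse_name("022T4C09.WAV")` yields speaker 22, telephone session "T4", task "C" (sentences) and sub-task "C09".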



REFERENCES

For further information on the database, we refer the reader to the following article:

  • [SC2000] Ortega García, J., González Rodríguez, J., Marrero-Aguiar, V., "AHUMADA: A large speech corpus in Spanish for speaker characterization and identification", Speech Communication, vol. 31, pp. 255-264, June 2000.

Please remember to cite article [SC2000] in any publicly released work, in whatever form, that is based directly or indirectly on any part of the Gaudi_25 database.