💡 A large speech detection dataset has been published: more than 150 thousand hours of audio in 6,000+ languages
The dataset contains about 150 thousand hours of audio in more than 6,000 languages. The number of unique ISO codes in the dataset does not match the actual number of languages, since closely related languages can share the same code.
The data is labeled for the voice detection task at a time resolution of approximately 30 milliseconds (512 samples at a sampling rate of 16 kHz, which is 32 ms per frame).
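To make the frame arithmetic concrete, here is a minimal Python sketch. Only the 512-sample frame and the 16 kHz sampling rate come from the post. The label format (one 0/1 value per frame, 1 meaning speech) and all function names are assumptions made for illustration, not the dataset's actual schema.

```python
SAMPLE_RATE_HZ = 16_000  # sampling rate stated in the post
FRAME_SAMPLES = 512      # samples per labeled frame, as stated in the post


def frame_duration_ms(frame_samples=FRAME_SAMPLES, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of one labeled frame in milliseconds."""
    return 1000 * frame_samples / sample_rate_hz


def num_full_frames(num_samples, frame_samples=FRAME_SAMPLES):
    """Number of complete frames that fit in a clip of num_samples samples."""
    return num_samples // frame_samples


def labels_to_segments(labels, frame_samples=FRAME_SAMPLES, sample_rate_hz=SAMPLE_RATE_HZ):
    """Merge runs of speech frames into (start_s, end_s) segments.

    ASSUMPTION: `labels` holds one 0/1 value per frame, 1 meaning speech.
    This format is hypothetical; the real dataset may store labels differently.
    """
    segments = []
    start = None  # index of the first frame of the current speech run
    for i, label in enumerate(labels):
        if label and start is None:
            start = i
        elif not label and start is not None:
            segments.append((start * frame_samples / sample_rate_hz,
                             i * frame_samples / sample_rate_hz))
            start = None
    if start is not None:
        # The clip ends while speech is still active: close the last run.
        segments.append((start * frame_samples / sample_rate_hz,
                         len(labels) * frame_samples / sample_rate_hz))
    return segments


if __name__ == "__main__":
    print(frame_duration_ms())                    # frame length in ms
    print(num_full_frames(SAMPLE_RATE_HZ))        # full frames in one second
    print(labels_to_segments([0, 1, 1, 0, 1]))    # speech segments in seconds
```

At these settings one frame lasts 32 ms, so one second of audio holds 31 complete frames and the remaining 128 samples do not fill a frame.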