SpeechCluster: A speech database builder’s multitool
My 2005 project SpeechCluster has had a long, quiet but active life (e.g., a book on African linguistics published in 2020 [1] references it), but it seems to have disappeared from the interwebs. I have uploaded it to GitHub and minimally updated the Python code to Python 3.12:
https://github.com/llaisdy/speechcluster
The point of SpeechCluster is to semi-automate some of the data-handling and curation tasks involved in building a speech database, i.e., a corpus of speech audio and matching transcriptions.
References
[1] Danie J. Prinsloo & Nompumelelo Zondi (2020). "From postcolonial African language lexicography to globally competitive e-lexicography in Africa". In Russell H. Kaschula & H. Ekkehard Wolff (eds.), The Transformative Power of Language: From Postcolonial to Knowledge Societies in Africa. Cambridge University Press.