Llaisdy
24 Mar 2024

SpeechCluster: A speech database builder’s multitool

My 2005 project SpeechCluster has had a quiet but long and active life (e.g., a 2020 book on African linguistics [1] references it), but it seems to have disappeared from the interwebs. I have uploaded it to GitHub and minimally updated the Python to 3.12:

https://github.com/llaisdy/speechcluster

The point of SpeechCluster is to semi-automate some of the data handling and curation tasks related to building a speech database – i.e., a corpus of speech audio and matching transcriptions.
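
To give a flavour of the kind of chore involved, here is a minimal sketch of one such task: checking that every audio file in a directory has a matching transcription file. It uses only the Python standard library and is not SpeechCluster's own API. The function name `pair_audio_with_labels` and the `.wav` / `.lab` extensions are assumptions made for illustration.

```python
from pathlib import Path


def pair_audio_with_labels(corpus_dir, audio_ext=".wav", label_ext=".lab"):
    """Match audio and transcription files by filename stem.

    Hypothetical helper for illustration only (not part of SpeechCluster).
    Returns (pairs, audio_only, labels_only).
    """
    root = Path(corpus_dir)
    # Index each kind of file by its stem, e.g. "utt001" for "utt001.wav".
    audio = {p.stem: p for p in root.glob(f"*{audio_ext}")}
    labels = {p.stem: p for p in root.glob(f"*{label_ext}")}

    # Stems present on both sides form usable (audio, transcription) pairs.
    pairs = [(audio[s], labels[s]) for s in sorted(audio.keys() & labels.keys())]
    # Anything left over needs attention before the corpus is usable.
    audio_only = sorted(audio[s] for s in audio.keys() - labels.keys())
    labels_only = sorted(labels[s] for s in labels.keys() - audio.keys())
    return pairs, audio_only, labels_only
```

Calling `pair_audio_with_labels("my_corpus")` on a directory of recordings would return the matched pairs along with two lists of orphaned files to chase up.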

References

[1] Danie J. Prinsloo & Nompumelelo Zondi. (2020). "From postcolonial African language lexicography to globally competitive e-lexicography in Africa" in Russell H. Kaschula & H. Ekkehard Wolff (eds). The Transformative Power of Language: From Postcolonial to Knowledge Societies in Africa. Cambridge University Press.

Tags: NLP python speech