Llaisdy
16 May 2024

SpeechCluster: now on PyPi!

Available for download here: https://pypi.org/project/speechcluster/

SpeechCluster is a library (a set of modules) and a CLI (a set of terminal commands) that help automate some of the data handling and curation tasks involved in building a speech database: a corpus of speech audio and matching transcriptions.

The new version 3.0 has two sets of updates: I have updated the main codepaths to work with Python 3.12, and I have added the CLI. Now, instead of doing fake segmentation by running a script like this:

$ segFake.py -d wav -c context.json -o TextGrid 

… we can use the CLI like this:

$ sc fake -d wav -c context.json -o TextGrid 
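For readers unfamiliar with the idea, the following is a minimal, hypothetical sketch of what "fake" segmentation could amount to. It assumes the term means spreading a transcription's labels evenly across an utterance and writing the result as a single-tier Praat TextGrid. It is not SpeechCluster's implementation: the names fake_segment and to_textgrid are illustrative only, and a real tool would read durations from the wav files and labels from the transcriptions (and context.json) rather than taking them as arguments.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    xmin: float  # start time in seconds
    xmax: float  # end time in seconds
    text: str    # segment label


def fake_segment(labels, duration):
    """Hypothetical helper: spread labels evenly over [0, duration] seconds."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    if not labels:
        # No labels: one empty interval covering the whole file.
        return [Interval(0.0, duration, "")]
    step = duration / len(labels)
    intervals = []
    for i, label in enumerate(labels):
        start = i * step
        # Pin the final boundary to the exact duration to avoid float drift.
        end = duration if i == len(labels) - 1 else (i + 1) * step
        intervals.append(Interval(start, end, label))
    return intervals


def to_textgrid(intervals, tier_name="phones"):
    """Hypothetical helper: render one interval tier in Praat's long TextGrid text format."""
    xmax = intervals[-1].xmax
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        "        xmin = 0",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(intervals)}",
    ]
    for n, iv in enumerate(intervals, start=1):
        text = iv.text.replace('"', '""')  # Praat escapes a quote by doubling it
        lines += [
            f"        intervals [{n}]:",
            f"            xmin = {iv.xmin}",
            f"            xmax = {iv.xmax}",
            f'            text = "{text}"',
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Four made-up phone labels spread over a one-second utterance.
    segments = fake_segment(["h", "e", "l", "ou"], 1.0)
    print(to_textgrid(segments))
```

Evenly spaced boundaries are obviously crude, but a rough segmentation like this can serve as a starting point for hand correction, or as placeholder data while the rest of a pipeline is being built.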

Next steps

Further development will include updating the codebase generally and expanding the tests into a comprehensive safety net.

The next major feature will be a force subcommand, which will use PyTorch to perform forced alignment on input data (e.g., pairs of audio and transcription files).

This will open the way to developing SpeechCluster as a CLI frontend to PyTorch, JAX, and similar toolkits.

Tags: NLP python speech