I assume you mean a toolkit, if you are interested in training or developing models.

It is hard to answer such a general question without more context, but it is not an easy implementation. Feature extraction is only a small and fairly simple part of ASR. Since some toolkits are open source, you may start from their source code. Developing a new toolkit from scratch may take some time.
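To give a sense of how small the feature extraction part is, here is a minimal sketch of log-mel filterbank features using only NumPy. Log-mel features are my choice for illustration and just one common option. The function names and the parameter values (16 kHz audio, 25 ms frames, 10 ms hop, 40 mel bands) are assumptions, not taken from any particular toolkit.

```python
import numpy as np

def hz_to_mel(hz):
    # Common HTK-style mel scale formula.
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    # Triangular filters whose centers are evenly spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising slope
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling slope
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_features(signal, sample_rate=16000, frame_len=400, hop=160,
                     n_fft=512, n_mels=40):
    # Slice the waveform into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Pool the spectrum into mel bands, then take the log.
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-10)
```

Calling `log_mel_features` on a one-second NumPy array sampled at 16 kHz returns one 40-dimensional vector per 10 ms frame. The acoustic model, language model, and decoder that consume these features are where most of the effort goes.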

If you just want to transcribe audio, there are other libraries available. You can Google them easily. The competitive landscape can change, so I usually don't comment on it because whatever I say can become obsolete quickly.
