Discussion
Tacite: Is it English only?
ilaksh: Thanks for open sourcing this. Is there any way to do a custom voice as a DIY? Or do we need to go through you? If so, would you consider making a pricing page for purchasing a license/alternative voice? All but one of the voices are unusable in a business context.
ks2048: You should put examples comparing the 4 models you released - same text spoken by each.
rohan_joshi: thanks a lot for the feedback. yes, we're working on a diy way to add custom voices and will also be releasing a model with more professional voices in the next 2-3 weeks. as of now, we're providing commercial support for custom voices, languages and deployment through the support form on our github. can you share more about your business use-case? if possible, i'd like to ensure the next release can serve that.
ks2048: There are a number of recent, good quality, small TTS models. If the author doesn't describe some detail about the data, training, or a novel architecture, etc., I can only assume they just took another one, did a little finetuning, and repackaged it as a new product.
the_duke: Any recommendations?
fwsgonzo: How much work would it be to use the C++ ONNX runtime with this instead of Python? Is it a Claudeable amount of work? The iOS version is Swift-based.
rohan_joshi: shouldn't be hard. what backend/hardware are you interested in running this with? i'll add an example for using the onnx model from C++. btw check out the roadmap, our inference engine will be out in 1-2 weeks and it is expected to be faster than onnx.
devinprater: A lot of these models struggle with small text strings, like "next button" that screen readers are going to speak a lot.
DavidTompkins: This would be great as a js package - 25mb is small enough that I think it'd be worth it (in-browser tts is still pretty bad and varies by browser)
magicalhippo: A lot of good small TTS models in recent times. Most seem to struggle hard on prosody though. Kokoro TTS for example has a very good Norwegian voice, but the rhythm and emphasis are often so out of whack that the generated speech is almost incomprehensible. Haven't had time to check this model out yet, how does it fare here? What's needed to improve the models in this area now that the voice part is more or less solved?
soco: That, and also using English words in the middle of another language phrase confuses them a lot.