Recent research on open-source speech recognition libraries

What is the issue?

The app I am developing requires:

  • Customized hotword (keyword) detection
  • Korean text to Korean speech (TTS)
  • Korean speech to Korean text (STT)

Hotword detection (Continuous Speech Recognition)

Speech to Text & Text to Speech (Korean)

  • Kaldi: a speech recognition toolkit written in C++

  • Zeroth Project (Kaldi based)
    • MoreCoin: a mobile app to collect voice data from various users link
    • Explanation of how korean speech recognition works: link
      • Multi-layer structure to analyze voices
      • Explains the differences between English and Korean
    • Uses an AWS server and a web crawler to collect a 13 GB pronunciation dictionary and language models
  • Kakao API (Newton API): 20,000 free requests
    • Provides both STT and TTS
    • Research on the open-source API (in Korean): link
    • Good speech quality
  • Naver Clova Speech Recognition API (STT): link, 4 Won per 15 seconds
  • Naver Clova Speech Synthesis API (TTS): link, 4 Won per 1,000 characters
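
The two Naver Clova rates above can be turned into a rough cost estimate. The Python sketch below is only an approximation: the rates come from these notes, but the rule that every started 15-second or 1,000-character unit is billed in full is an assumption, and the function names `stt_cost_won` and `tts_cost_won` are hypothetical, not part of any Naver SDK.

```python
import math

# Rates taken from the notes above. The round-up billing rule is an
# assumption, not something confirmed by Naver's documentation.
STT_WON_PER_UNIT = 4      # Clova STT: 4 Won per 15 seconds of audio
STT_UNIT_SECONDS = 15
TTS_WON_PER_UNIT = 4      # Clova TTS: 4 Won per 1,000 characters
TTS_UNIT_CHARS = 1000


def stt_cost_won(audio_seconds: float) -> int:
    """Estimate STT cost, assuming each started 15-second unit is billed."""
    units = math.ceil(audio_seconds / STT_UNIT_SECONDS)
    return units * STT_WON_PER_UNIT


def tts_cost_won(char_count: int) -> int:
    """Estimate TTS cost, assuming each started 1,000-character unit is billed."""
    units = math.ceil(char_count / TTS_UNIT_CHARS)
    return units * TTS_WON_PER_UNIT


if __name__ == "__main__":
    # Hypothetical workload: 1,000 voice commands of 5 seconds each,
    # answered with 1,000 spoken replies of 40 characters each.
    total = 1000 * stt_cost_won(5) + 1000 * tts_cost_won(40)
    print(f"Estimated cost: {total} Won")
```

Under these assumptions, short requests are dominated by the minimum billing unit: a 5-second command costs the same as a 15-second one, so batching text for TTS or trimming silence from audio only helps when it crosses a unit boundary.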

  • CMUSphinx
    • How to use it with other languages: link
    • Needs a Korean acoustic model and language model
  • Amazon Lex (STT) & Polly (TTS)
    • Maybe we can use a third-party language translation solution
    • IoT solution: link
  • IBM Watson (SK NUGU uses this API: http://www.newspim.com/news/view/20170206000145)