What is the issue?
The app I am developing requires:
- customized hotword (keyword) detection
- Korean text to Korean speech
- Korean speech to Korean text
Hotword detection (Continuous Speech Recognition)
- Snowboy (KITT.AI)
- customizable hotword detection engine for you to create your own hotword like “OK Google” or “Alexa”
- DNN (deep neural networks)
- can train a custom model (personal, private) at https://snowboy.kitt.ai/ (login required; Korean keywords are okay)
- always listening (the recording thread can run in a foreground service)
- https://stackoverflow.com/questions/50956228/thread-within-service-android-app
- no internet required
- Lightweight (CPU usage may increase when there is noise around)
- Supports various platforms and programming languages
- Requires a commercial license (contact snowboy@kitt.ai)
- Android Demo https://github.com/Kitt-AI/snowboy/tree/master/examples/Android
- Porcupine (Picovoice)
- Alireza Kenarsari (founder of Picovoice) says Porcupine has a lower miss rate than Snowboy
- https://github.com/Picovoice/wakeword-benchmark
- https://medium.com/@alirezakenarsarianhari/yet-another-wake-word-detection-engine-a2486d36d8d4
- but when I actually tried it, Snowboy worked better in my development environment (I may need to try more models)
- AAR-style library
- Provides Standard and Tiny versions
- PocketSphinx
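
The always-listening setup noted above (a recording thread that keeps feeding audio to a hotword engine) can be sketched in a platform-neutral way. This is a minimal Python sketch and not the actual API of Snowboy, Porcupine, or PocketSphinx: `HotwordListener`, `feed`, and the `detect` callback are hypothetical names, and the detector is a stand-in for whichever engine ends up being used. On Android, the same loop would presumably live in a thread owned by a foreground service and read its frames from the microphone.

```python
import queue
import threading


class HotwordListener:
    """Runs a detector over audio frames on a background thread.

    `detect` is a hypothetical callable (frame -> bool) standing in for a
    real hotword engine; `on_hotword` receives the index of the matching frame.
    """

    def __init__(self, detect, on_hotword):
        self._detect = detect
        self._on_hotword = on_hotword
        self._frames = queue.Queue()  # frames waiting to be analyzed
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        """Start the background listening thread."""
        self._thread.start()

    def feed(self, frame):
        """Hand one recorded audio frame (bytes) to the listener."""
        self._frames.put(frame)

    def stop(self):
        """Ask the thread to finish the queued frames, then wait for it."""
        self._frames.put(None)  # sentinel: no more audio
        self._thread.join()

    def _run(self):
        index = 0
        while True:
            frame = self._frames.get()  # blocks until a frame arrives
            if frame is None:
                break
            if self._detect(frame):
                self._on_hotword(index)
            index += 1


if __name__ == "__main__":
    # Fake detector: treats the literal frame b"hot" as the hotword.
    hits = []
    listener = HotwordListener(lambda f: f == b"hot", hits.append)
    listener.start()
    for frame in [b"noise", b"hot", b"noise"]:
        listener.feed(frame)
    listener.stop()
    print(hits)
```

Keeping the detector behind a plain callback makes it easier to swap engines when comparing them, since only the `detect` function changes.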
Speech to Text & Text to Speech (Korean)
- Kaldi is a toolkit for speech recognition written in C++
- Zeroth Project (Kaldi based)
- MoreCoin: a mobile app to collect voice data from various users link
- Explanation of how Korean speech recognition works: link
- Multi-layer structure to analyze voices
- explains the differences between English and Korean
- uses an AWS server and a web crawler to collect a 13 GB pronunciation dictionary and language models
- Kakao API (Newtone API): 20,000 requests free
- provides both STT & TTS
- research on open-source APIs (Korean): link
- good quality of speech
- Naver Clova Speech Recognition API (STT): link, 4 Won/15sec
- Naver Clova Speech Synthesis API (TTS): link, 4 Won/1000chars
- CMUSphinx
- How to use other languages? link
- Needs a Korean acoustic model and language model
- Amazon Lex (STT) & Polly (TTS)
- Maybe we can use a third-party language translation solution
- IoT solution: link
- IBM Watson (SK NUGU uses this API: http://www.newspim.com/news/view/20170206000145)
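
The Naver Clova prices listed above (4 Won per 15 seconds for STT, 4 Won per 1000 characters for TTS) can be turned into a rough cost estimate. This is a minimal sketch that assumes usage is billed in whole units, rounded up; that rounding rule is an assumption and should be checked against Naver's actual billing terms. The function names are made up for illustration.

```python
import math

# Prices taken from the notes above.
CLOVA_STT_WON_PER_UNIT = 4      # per 15 seconds of audio
CLOVA_STT_UNIT_SECONDS = 15
CLOVA_TTS_WON_PER_UNIT = 4      # per 1000 characters of text
CLOVA_TTS_UNIT_CHARS = 1000


def clova_stt_cost_won(audio_seconds):
    """Estimated STT cost, assuming partial 15-second units round up."""
    units = math.ceil(audio_seconds / CLOVA_STT_UNIT_SECONDS)
    return units * CLOVA_STT_WON_PER_UNIT


def clova_tts_cost_won(char_count):
    """Estimated TTS cost, assuming partial 1000-character units round up."""
    units = math.ceil(char_count / CLOVA_TTS_UNIT_CHARS)
    return units * CLOVA_TTS_WON_PER_UNIT


if __name__ == "__main__":
    print(clova_stt_cost_won(60))    # one minute of audio
    print(clova_tts_cost_won(2500))  # 2500 characters of text
```

Under this assumption, one minute of recognized speech is four 15-second units, and 2500 characters of synthesized text is three 1000-character units.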