
SODA finally landed, client working

A project log for Android offline speech recognition natively on PC

Porting the Android on-device speech recognition found in GBoard to TensorFlow Lite or LWTNN

biemster 12/15/2020 at 18:03, 23 Comments

UPD: wine is not necessary anymore.

So SODA finally landed, sort of, and apparently a couple of weeks ago already. I've been on the lookout for the Linux library, since that is my preferred environment and I was under the impression that development was taking place on that platform. But I was wrong: the Windows and macOS libraries have been available since late November.

Since I'm much more capable on a Linux machine, I searched for (and found!) a way to use either of the available libraries. In my last post I reported on quite a successful project with the Google TTS library, which resulted in a very lightweight client for it. Fortunately the same can be said for the SODA client: it is a very small code base with only the library as a dependency. This enabled me to work with wine, and have it pipe the data straight from whatever Linux application I wanted to use to the Windows DLL.

Just issue the following command:

$ ecasound -f:16,1,16000 -i alsa -o:stdout | wine gasr.exe

and watch your conversations roll over the screen: 

W1215 22:58:43.683654      44 soda_async_impl.cc:390] Soda session starting (require_hotword:0, hotword_timeout_in_millis:0)
>>> hello
>>> hello from
>>> hello from
>>> hello from sod
>>> hello from soda
>>> hello from soda
>>> final: hello from soda
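
For anyone who wants to consume this output from a script instead of watching it scroll by, here is a minimal Python sketch. It assumes nothing beyond the line format shown above (partial results prefixed with `>>> `, the end of an utterance with `>>> final: `). The function name `split_transcript` is made up for illustration and is not part of gasr.

```python
def split_transcript(lines):
    """Separate partial and final hypotheses in gasr-style output lines.

    Assumes the format shown above: '>>> text' for partial results and
    '>>> final: text' for the end of an utterance. Any other line (such
    as the SODA log lines) is ignored.
    """
    partials, finals = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith(">>> "):
            continue  # log output, not a recognition result
        text = line[len(">>> "):]
        if text.startswith("final: "):
            finals.append(text[len("final: "):])
        else:
            partials.append(text)
    return partials, finals


sample = [
    "W1215 22:58:43.683654      44 soda_async_impl.cc:390] Soda session starting",
    ">>> hello",
    ">>> hello from",
    ">>> hello from soda",
    ">>> final: hello from soda",
]
partials, finals = split_transcript(sample)
print(finals)
```

Since the function accepts any iterable of lines, it could presumably be fed from `sys.stdin` at the end of the pipe above.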

The SODA client I wrote is developed in a separate repository (gasr), as it will be mostly just a tool to do the full reverse engineering of the RNN and transducer. But having an actual working implementation will greatly improve my ability to figure out the inner workings of the models.

Using wine as an intermediary is still far from ideal, but I guess that the Linux library will also pop up soon, considering ChromeOS would depend on it.

UPDATE:

As @a1is pointed out, the Linux library is also out there already, so no need to go the wine way anymore. And as an added bonus, the GBoard models are working with these libraries as well! That opens up a whole world of experimentation, since there are already quite a few of those spotted in the wild!

UPDATE2:

Now with a python client in the repo, for easier integration with home automation and such.

Discussions

Prateek Xaxa wrote 02/23/2022 at 15:35 point

Hi @biemster , I'm using the chrome Version 98.0.4758.102 (Official Build) (64-bit)
and the "SODA.dll" got the APIs changed.

Is there a way I can get the older version of google chrome or the DLLs ? Thanks


tamburini.fabio wrote 11/29/2021 at 15:39 point

This project is exactly what I am looking for, white compliments!

I downloaded and installed google-chrome-stable_90.0.4430.72-1 for Linux, but I did not find any "libsoda.so" anywhere. I am pretty sure that, once getting the library, I will be able to do the job, but... Is there an older 90 version than that available online?
Please, give me some pointer in finding the library... :P
Thanks!
                                  


Khaled wrote 04/13/2021 at 03:12 point

I have worked on patching the DLL, but I am getting the following:

W0413 03:04:16.243065    6984 soda_async_impl.cc:260] Creating soda_impl
E0413 03:04:16.321433    6984 soda_impl.cc:258] SODA needs a positive sample rate in mics audio format.
E0413 03:04:16.327124    6984 descriptor.cc:4079] Invalid proto descriptor for file "":
E0413 03:04:16.327283    6984 descriptor.cc:4082]   : Missing field: FileDescriptorProto.name.
F0413 03:04:16.328347    6984 generated_message_reflection.cc:3158] Check failed: file != nullptr
*** Check failure stack trace: ***
    @   00007FFD5B41672F  (unknown)
    @   00007FFD5B415FE0  (unknown)
    @   00007FFD5B416CAB  (unknown)
    @   00007FFD5B23C440  (unknown)
    @   00007FFD5B23E487  (unknown)
    @   00007FFD5B23C1D7  (unknown)
    @   00007FFD5B13FA92  (unknown)
    @   00007FFD58C90B30  (unknown)
    @   00007FFD58C83198  (unknown)
    @   00007FFD58C82B85  (unknown)
    @   00007FFD58C834F4  (unknown)
    @   00007FFD58C83328  (unknown)
    @   00007FFD58C8565F  (unknown)
    @   00007FFDA8434461  (unknown)
    @   00007FFDA843418D  (unknown)
    @   00007FFDA8434042  (unknown)
    @   00007FFDA5862CC5  (unknown)
    @   00007FFDA5862A27  (unknown)
    @   00007FFDA586267C  (unknown)
    @   00007FFD5C6F0735  (unknown)
    @   00007FFD5C6EE0D4  (unknown)
    @   00007FFD5C6E95A3  (unknown)
    @   00007FFD5C6AB815  (unknown)
    @   00007FFD5C69B49B  (unknown)
    @   00007FFD5C69B3F9  (unknown)
    @   00007FFD5C69B11A  (unknown)
    @   00007FFD5C69B09A  (unknown)
    @   00007FFD5C72E4AF  (unknown)
    @   00007FFD5C72E11C  (unknown)
    @   00007FFD5C72EBF3  (unknown)
    @   00007FFD5C730860  (unknown)
    @   00007FFD5C730E7E  (unknown)

I am using the python wrapper. Is this because I didn't bypass the call stack verification, or is there something else I am missing?


biemster wrote 04/14/2021 at 14:42 point

I've never seen this error before, there seems to be something wrong with the SodaConfig you are feeding it. This likely has to do with the way you patched it, but that's a wild guess actually.


haroldfinch wrote 02/26/2021 at 07:39 point

First of all, you are a genius man! Great work ♥

I followed your instructions on GitHub (I'm using windows)

Downloaded chrome canary build, got the soda.dll file and soda models folder in the project directory

Used snowman with IDA disassembler on the DLL file to get the API key (I have no idea if this is how we get it or how to locate the key, I just picked the first key-like string I could find xD)

Ran the python file, but getting the following error: 

W0226 07:26:04.505873   16308 soda_async_impl.cc:261] Creating soda_impl
W0226 07:26:04.692540   16308 terse_processor.cc:278] TISID disabled.
W0226 07:26:04.704322   16308 soda_async_impl.cc:420] Soda session starting (require_hotword:0, hotword_timeout_in_millis:0)
Traceback (most recent call last):
2 got <gasr.LP_c_byte object at 0x000002A5D0DF7CC0>
  File "app.py", line 8, in <module>
    client.start()
  File "C:\Users\Username\Desktop\My Projects\google_stt\gasr\gasr.py", line 39, in start
    self.sodalib.ExtendedSodaStart(self.handle)
OSError: exception: access violation reading 0x00000000D0EC6D98

Am I in the right direction with the IDA thingy? Any help is appreciated. 


biemster wrote 02/27/2021 at 21:17 point

You're definitely on the right track here, nicely done! However, that key like string is not the actual API key. And even if you did manage to deduce the correct key somehow (I couldn't), there is still a call stack verification in the library that will prevent you from running it (the library wants to be called only by a chrome process). So you should find these checks using your disassembler (there are three in a row), and patch the binary so those if statements are either skipped, or rendered non-functional (use something like a NOP sled or similar). And don't forget to init the result of those checks to true, since it is initialized to false in the original library.

That said, I never tested the python wrapper on Windows, so you might run into additional issues. If possible, it might be best to start testing with the C code using MinGW.

Good luck!


woopdio wrote 12/31/2020 at 00:11 point

Do you happen to know if the api key is the same between platforms and does ChromeOS already include SODA?

I'm on Mac OSX/Hackintosh and after a lot of trial and error I've got the Soda and en_us model components downloaded and the Verify calls patched out of libsoda so I think I'm only missing the api key at this point hopefully. 

I tried getting it from the .so alone, but my reverse engineering knowledge has turned out to be too limited for that so far. I thought getting it from the actual call via a debugger would be easier, but it doesn't look like the Mac Chrome Canary build has it yet. So I'm wondering whether it would be worth setting up a ChromeOS VM and tracing in it to get the key from there, or whether I should just wait for the Mac release.

Also thank you for this project and sticking with it!


woopdio wrote 01/19/2021 at 21:18 point

Canary 90 out now still without MacOS Soda it seems :(

time to wait some more I guess


biemster wrote 01/21/2021 at 14:03 point

Soda on MacOS has been out there for a while as far as I know, just behind some experimental flags. The api key check should be skipped the same way as the other verification calls; did you maybe forget to init the result of the 3 checks to 'true'?

I think it should be in ChromeOS as well already, but that will not help you on osx since the binaries are not compatible. (plus I actually did not find the chromeos lib either yet)


Jude Ashly wrote 12/25/2020 at 17:38 point

That's an incredible amount of persistence and hard work put in !!!  I need some help, I'm stuck with lagging issues here 

ecasound -f:16,1,16000 -i alsa -o:stdout | ./gasr 
********************************************************************************
*        ecasound v2.9.3 (C) 1997-2020 Kai Vehmanen and others    
********************************************************************************
(eca-chainsetup) Chainsetup "untitled-chainsetup"
(eca-chainsetup) NOTE: Real-time configuration, but insufficient privileges to utilize real-time scheduling (SCHED_FIFO). With small buffersizes, this may cause audible glitches during processing.
(eca-chainsetup) "rt" buffering mode selected.
(eca-chainsetup) Opened input "alsa", mode "read". Format: s16_le, channels 1, srate 16000, interleaved.
(audioio-raw) Outputting to standard output [rw].
(eca-chainsetup) Opened output "stdout", mode "read/write (update)". Format: s16_le, channels 1, srate 16000, interleaved.
[* Connected chainsetup: "untitled-chainsetup" *]
[* Controller/Starting batch processing *]
[* Engine - Driver start *]
WARNING: Logging before InitGoogle() is written to STDERR
W1225 10:35:31.357862   49008 soda_async_impl.cc:231] Creating soda_impl
I1225 10:35:31.357999   49008 soda_impl.cc:275] Maximum audio history (ms): 30000
I1225 10:35:31.358021   49008 soda_impl.cc:304] Adding Resampler from 16000 to 16000
I1225 10:35:31.358100   49008 soda_impl.cc:482] Enabling power evaluator.
I1225 10:35:31.358103   49008 soda_impl.cc:492] Adding preamble processor.
I1225 10:35:31.358106   49008 soda_impl.cc:512] Enabling On Device ASR
W1225 10:35:31.358137   49008 language_pack_utils.cc:103] Error reading from ./SODAModels/configs: error 'No such file or directory' while opening directory './SODAModels/configs': No such file or directory
I1225 10:35:31.358155   49008 terse_processor.cc:634] Config file: ./SODAModels/dictation.config
I1225 10:35:31.358282   49008 terse_processor.cc:163] Loaded PipelineDef.
I1225 10:35:31.358297   49008 dir_path.cc:52] Checking FileExists: ./ep
I1225 10:35:31.358303   49008 dir_path.cc:57] Not Found FileExists: ./ep
I1225 10:35:31.358306   49008 dir_path.cc:52] Checking FileExists: ./SODAModels/ep
I1225 10:35:31.358313   49008 dir_path.cc:54] Found FileExists: ./SODAModels/ep
I1225 10:35:31.358318   49008 neural_network_resource.cc:71] Initializing for TENSORFLOW_LITE
I1225 10:35:31.358575   49008 dir_path.cc:52] Checking FileExists: ./ep_mean_stddev
I1225 10:35:31.358595   49008 dir_path.cc:57] Not Found FileExists: ./ep_mean_stddev
I1225 10:35:31.358599   49008 dir_path.cc:52] Checking FileExists: ./SODAModels/ep_mean_stddev
I1225 10:35:31.358605   49008 dir_path.cc:54] Found FileExists: ./SODAModels/ep_mean_stddev
I1225 10:35:31.358643   49008 dir_path.cc:52] Checking FileExists: ./syms
I1225 10:35:31.358651   49008 dir_path.cc:57] Not Found FileExists: ./syms
I1225 10:35:31.358655   49008 dir_path.cc:52] Checking FileExists: ./SODAModels/syms
I1225 10:35:31.358662   49008 dir_path.cc:54] Found FileExists: ./SODAModels/syms
I1225 10:35:31.358807   49008 dir_path.cc:52] Checking FileExists: ./embedded_fix_ampm.mfar
I1225 10:35:31.358815   49008 dir_path.cc:57] Not Found FileExists: ./embedded_fix_ampm.mfar
I1225 10:35:31.358820   49008 dir_path.cc:52] Checking FileExists: ./SODAModels/embedded_fix_ampm.mfar
I1225 10:35:31.358826   49008 dir_path.cc:54] Found FileExists: ./SODAModels/embedded_fix_ampm.mfar
I1225 10:35:31.358892   49008 dir_path.cc:52] Checking FileExists: ./embedded_class_denorm.mfar
I1225 10:35:31.358901   49008 dir_path.cc:57] Not Found FileExists: ./embedded_class_denorm.mfar
I1225 10:35:31.358905   49008 dir_path.cc:52] Checking FileExists: ./SODAModels/embedded_class_denorm.mfar
I1225 10:35:31.358911   49008 dir_path.cc:54] Found FileExists: ./SODAModels/embedded_class_denorm.mfar
I1225 10:35:31.358959   49008 dir_path.cc:52] Checking FileExists: ./embedded_normalizer.mfar
I1225 10:35:31.358968   49008 dir_path.cc:57] Not Found FileExists: ./embedded_normalizer.mfar
I1225 10:35:31.358972   49008 dir_path.cc:52] Checking FileExists: ./SODAModels/embedded_normalizer.mfar
I1225 10:35:31.358978   49008 dir_path.cc:54] Found FileExists: ./SODAModels/embedded_normalizer.mfar
I1225 10:35:31.359079   49008 dir_path.cc:52] Checking FileExists: ./embedded_replace_annotated_punct_words_dash.mfar
I1225 10:35:31.359088   49008 dir_path.cc:57] Not Found FileExists: ./embedded_replace_annotated_punct_words_dash.mfar
I1225 10:35:31.359093   49008 dir_path.cc:52] Checking FileExists: ./SODAModels/embedded_replace_annotated_punct_words_dash.mfar
I1225 10:35:31.359101   49008 dir_path.cc:54] Found FileExists: ./SODAModels/embedded_replace_annotated_punct_words_dash.mfar
I1225 10:35:31.359147   49008 dir_path.cc:52] Checking FileExists: ./offensive_word_normalizer.mfar
I1225 10:35:31.359155   49008 dir_path.cc:57] Not Found FileExists: ./offensive_word_normalizer.mfar
I1225 10:35:31.359158   49008 dir_path.cc:52] Checking FileExists: ./SODAModels/offensive_word_normalizer.mfar
I1225 10:35:31.359164   49008 dir_path.cc:54] Found FileExists: ./SODAModels/offensive_word_normalizer.mfar
I1225 10:35:31.359206   49008 dir_path.cc:52] Checking FileExists: ./enc0
I1225 10:35:31.359213   49008 dir_path.cc:57] Not Found FileExists: ./enc0
I1225 10:35:31.359216   49008 dir_path.cc:52] Checking FileExists: ./SODAModels/enc0
I1225 10:35:31.359221   49008 dir_path.cc:54] Found FileExists: ./SODAModels/enc0
I1225 10:35:31.359225   49008 neural_network_resource.cc:71] Initializing for TENSORFLOW_LITE
I1225 10:35:31.381130   49008 dir_path.cc:52] Checking FileExists: ./enc1
I1225 10:35:31.381161   49008 dir_path.cc:57] Not Found FileExists: ./enc1
I1225 10:35:31.381164   49008 dir_path.cc:52] Checking FileExists: ./SODAModels/enc1
I1225 10:35:31.381168   49008 dir_path.cc:54] Found FileExists: ./SODAModels/enc1
I1225 10:35:31.381170   49008 neural_network_resource.cc:71] Initializing for TENSORFLOW_LITE
I1225 10:35:31.458703   49008 dir_path.cc:52] Checking FileExists: ./dec
I1225 10:35:31.458773   49008 dir_path.cc:57] Not Found FileExists: ./dec
I1225 10:35:31.458781   49008 dir_path.cc:52] Checking FileExists: ./SODAModels/dec
I1225 10:35:31.458795   49008 dir_path.cc:54] Found FileExists: ./SODAModels/dec
I1225 10:35:31.458802   49008 neural_network_resource.cc:71] Initializing for TENSORFLOW_LITE
I1225 10:35:31.488069   49008 dir_path.cc:52] Checking FileExists: ./joint
I1225 10:35:31.488111   49008 dir_path.cc:57] Not Found FileExists: ./joint
I1225 10:35:31.488116   49008 dir_path.cc:52] Checking FileExists: ./SODAModels/joint
I1225 10:35:31.488123   49008 dir_path.cc:54] Found FileExists: ./SODAModels/joint
I1225 10:35:31.488127   49008 neural_network_resource.cc:71] Initializing for TENSORFLOW_LITE
I1225 10:35:31.489495   49008 dir_path.cc:52] Checking FileExists: ./input_mean_stddev
I1225 10:35:31.489523   49008 dir_path.cc:57] Not Found FileExists: ./input_mean_stddev
I1225 10:35:31.489528   49008 dir_path.cc:52] Checking FileExists: ./SODAModels/input_mean_stddev
I1225 10:35:31.489534   49008 dir_path.cc:54] Found FileExists: ./SODAModels/input_mean_stddev
I1225 10:35:31.489595   49008 terse_processor.cc:173] Initialized ResourceManager.
I1225 10:35:31.489660   49008 terse_processor.cc:184] Initialized GoogleRecognizer.
W1225 10:35:31.489709   49008 terse_processor.cc:242] TISID disabled.
I1225 10:35:31.489715   49008 terse_processor.cc:718] Domain: CAPTION
E1225 10:35:31.490875   49008 pie_rnn_fst_decoder_graph.cc:25] Using deprecated decoder_graph_type RNN_FST. Use decoder_graph_type DUAL and DualFstDecoderParams instead.
I1225 10:35:31.492369   49008 terse_processor.cc:1293] Resetting Terse Processor
I1225 10:35:31.492396   49008 terse_processor.cc:838] Cancelling session.
W1225 10:35:31.492675   49008 decoder_endpointer_stream.cc:35] Acoustic ep reader thread cancelled.
I1225 10:35:31.492849   49008 terse_processor.cc:755] Setup completed
I1225 10:35:31.492860   49008 soda_impl.cc:558] Server ASR Disabled
I1225 10:35:31.492867   49008 soda_impl.cc:606] Initializing audio logger
W1225 10:35:31.492877   49008 soda_async_impl.cc:390] Soda session starting (require_hotword:0, hotword_timeout_in_millis:0)
I1225 10:35:31.492881   49008 soda_async_impl.cc:577] Session parameters updated. Reconfiguring SODA.
I1225 10:35:31.696233   49013 terse_processor.cc:1199] No terse session, starting a new one on input audio.
E1225 10:35:31.701219   49013 pie_rnn_fst_decoder_graph.cc:25] Using deprecated decoder_graph_type RNN_FST. Use decoder_graph_type DUAL and DualFstDecoderParams instead.
W1225 10:35:32.685130   49013 lag_detector.cc:30] Pipeline lagging by 827 ms. Continue processing samples.
W1225 10:35:33.138113   49013 lag_detector.cc:30] Pipeline lagging by 1060 ms. Continue processing samples.
W1225 10:35:33.505378   49013 lag_detector.cc:30] Pipeline lagging by 1227 ms. Continue processing samples.
W1225 10:35:33.871842   49013 lag_detector.cc:30] Pipeline lagging by 1393 ms. Continue processing samples.
I1225 10:35:33.963726   49013 soda_async_impl.cc:1090] Forcing a SODA sync to get the final event and reset ASR.
I1225 10:35:33.963874   49013 terse_processor.cc:1046] Flushing pending events ..
I1225 10:35:33.964031   49013 terse_processor.cc:1063] Start longform loop on remaining audio: 1.16s
I1225 10:35:33.965874   49031 pipeline.cc:49] [Threadname 'audio_level_eve'] Finished run.
I1225 10:35:33.966170   49028 pipeline.cc:49] [Threadname 'endpointer_even'] Finished run.
I1225 10:35:33.989725   49027 pipeline.cc:49] [Threadname 'rnnt_encoder0'] Finished run.
I1225 10:35:34.008353   49026 pipeline.cc:49] [Threadname 'rnnt_encoder1'] Finished run.
I1225 10:35:34.123794   49030 pipeline.cc:49] [Threadname 'end_of_utteranc'] Finished run.
I1225 10:35:34.123921   49033 terse_processor.cc:469] Final recognition has been created.
I1225 10:35:34.123928   49013 terse_processor.cc:1558] Longform resets session because secs in this session are: 0.84
I1225 10:35:34.123950   49013 terse_processor.cc:838] Cancelling session.
I1225 10:35:34.123941   49033 pipeline.cc:49] [Threadname 'recognition_eve'] Finished run.
W1225 10:35:34.124055   49013 log_creator-internal.cc:423] Failed to merge results for logging:
UNKNOWN: Result times overlap [type.googleapis.com/util.ErrorSpacePayload='SpeechErrorSpace::SpeechError(-73560)']
E1225 10:35:34.124817   49013 pie_rnn_fst_decoder_graph.cc:25] Using deprecated decoder_graph_type RNN_FST. Use decoder_graph_type DUAL and DualFstDecoderParams instead.
I1225 10:35:34.129116   49071 pipeline.cc:49] [Threadname 'audio_level_eve'] Finished run.
I1225 10:35:34.129125   49068 pipeline.cc:49] [Threadname 'endpointer_even'] Finished run.
I1225 10:35:34.159723   49067 pipeline.cc:49] [Threadname 'rnnt_encoder0'] Finished run.
I1225 10:35:34.196678   49066 pipeline.cc:49] [Threadname 'rnnt_encoder1'] Finished run.
I1225 10:35:34.269877   49069 pipeline.cc:49] [Threadname 'end_of_utteranc'] Finished run.
I1225 10:35:34.269959   49072 terse_processor.cc:469] Final recognition has been created.
I1225 10:35:34.269974   49072 pipeline.cc:49] [Threadname 'recognition_eve'] Finished run.
I1225 10:35:34.269974   49013 terse_processor.cc:1088] Stop looping and end session. Audio left: 50ms
I1225 10:35:34.270332   49013 soda_impl.cc:1035] Got pipeline signal out
I1225 10:35:34.270389   49013 terse_processor.cc:1199] No terse session, starting a new one on input audio.
E1225 10:35:34.270869   49013 pie_rnn_fst_decoder_graph.cc:25] Using deprecated decoder_graph_type RNN_FST. Use decoder_graph_type DUAL and DualFstDecoderParams instead.
W1225 10:35:34.272062   49013 lag_detector.cc:30] Pipeline lagging by 1614 ms. Continue processing samples.
W1225 10:35:34.601753   49013 lag_detector.cc:30] Pipeline lagging by 1743 ms. Continue processing samples.
^C[* Controller/Batch processing finished (0) *]
[* Engine exiting *]
(eca-control-objects) Disconnecting chainsetup:  "untitled-chainsetup".


biemster wrote 12/26/2020 at 14:49 point

Thanks! It's always nice to be rewarded with a working end product after such a long project :)

From your log I see that you're using a model with a 'dictation.config', so I guess you're trying a gboard model? (I did not know that libsoda would automatically pick up those though, so no need for symlinking anymore :)).

When I get a lagging pipeline it is usually because I'm calling AddAudio with the wrong parameters, maybe your len parameter is off by a factor 2? Maybe this specific model requires 32 bit input, or 8 bit?

It would help if you'd specify which RNNT model you're trying to run.
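
To make the factor-2 remark concrete: with the `-f:16,1,16000` format from the ecasound command, every sample takes 2 bytes, so a length expressed in bytes is twice the length expressed in samples. The sketch below is only arithmetic on that format. Whether AddAudio expects bytes or samples is not stated here, and the helper name `chunk_duration_ms` is hypothetical.

```python
BITS_PER_SAMPLE = 16   # from ecasound -f:16,1,16000
CHANNELS = 1
SAMPLE_RATE = 16000    # Hz

BYTES_PER_SAMPLE = BITS_PER_SAMPLE // 8


def chunk_duration_ms(num_bytes, bytes_per_sample=BYTES_PER_SAMPLE):
    """Milliseconds of audio held in a raw PCM chunk of num_bytes."""
    samples = num_bytes // (bytes_per_sample * CHANNELS)
    return 1000 * samples / SAMPLE_RATE


# A 2048-byte chunk of 16-bit mono audio holds 1024 samples.
print(chunk_duration_ms(2048))
# Reading the same bytes as 8-bit samples doubles the apparent duration,
# reading them as 32-bit samples halves it.
print(chunk_duration_ms(2048, bytes_per_sample=1))
print(chunk_duration_ms(2048, bytes_per_sample=4))
```

If the length handed to the library disagrees with the real sample width by such a factor, the audio clock and the wall clock drift apart, which would be consistent with a steadily growing lag.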


Jude Ashly wrote 12/26/2020 at 17:00 point

I've changed chunk_size to 1024. Now my log looks like this:

  77857 pie_rnn_fst_decoder_graph.cc:25] Using deprecated decoder_graph_type RNN_FST. Use decoder_graph_type DUAL and DualFstDecoderParams instead.
I1226 09:55:02.418223   77857 terse_processor.cc:1293] Resetting Terse Processor
I1226 09:55:02.418240   77857 terse_processor.cc:838] Cancelling session.
W1226 09:55:02.418537   77857 decoder_endpointer_stream.cc:35] Acoustic ep reader thread cancelled.
I1226 09:55:02.418698   77857 terse_processor.cc:755] Setup completed
I1226 09:55:02.418710   77857 soda_impl.cc:558] Server ASR Disabled
I1226 09:55:02.418720   77857 soda_impl.cc:606] Initializing audio logger
W1226 09:55:02.418736   77857 soda_async_impl.cc:390] Soda session starting (require_hotword:0, hotword_timeout_in_millis:0)
I1226 09:55:02.418743   77857 soda_async_impl.cc:577] Session parameters updated. Reconfiguring SODA.
I1226 09:55:02.461880   77862 terse_processor.cc:1199] No terse session, starting a new one on input audio.
E1226 09:55:02.462976   77862 pie_rnn_fst_decoder_graph.cc:25] Using deprecated decoder_graph_type RNN_FST. Use decoder_graph_type DUAL and DualFstDecoderParams instead.
I1226 09:55:03.161415   77862 soda_async_impl.cc:1090] Forcing a SODA sync to get the final event and reset ASR.
I1226 09:55:03.161525   77862 terse_processor.cc:1046] Flushing pending events ..
I1226 09:55:03.161595   77862 terse_processor.cc:1063] Start longform loop on remaining audio: 1.16s
I1226 09:55:03.162907   77875 pipeline.cc:49] [Threadname 'audio_level_eve'] Finished run.
I1226 09:55:03.163285   77882 pipeline.cc:49] [Threadname 'endpointer_even'] Finished run.
I1226 09:55:03.196991   77878 pipeline.cc:49] [Threadname 'rnnt_encoder0'] Finished run.
I1226 09:55:03.232416   77880 pipeline.cc:49] [Threadname 'rnnt_encoder1'] Finished run.
I1226 09:55:03.341455   77877 pipeline.cc:49] [Threadname 'end_of_utteranc'] Finished run.
I1226 09:55:03.341546   77876 terse_processor.cc:469] Final recognition has been created.
I1226 09:55:03.341556   77862 terse_processor.cc:1558] Longform resets session because secs in this session are: 0.84
I1226 09:55:03.341573   77876 pipeline.cc:49] [Threadname 'recognition_eve'] Finished run.
I1226 09:55:03.341574   77862 terse_processor.cc:838] Cancelling session.
W1226 09:55:03.341698   77862 log_creator-internal.cc:423] Failed to merge results for logging:
UNKNOWN: Result times overlap [type.googleapis.com/util.ErrorSpacePayload='SpeechErrorSpace::SpeechError(-73560)']
E1226 09:55:03.342467   77862 pie_rnn_fst_decoder_graph.cc:25] Using deprecated decoder_graph_type RNN_FST. Use decoder_graph_type DUAL and DualFstDecoderParams instead.
I1226 09:55:03.346847   77884 pipeline.cc:49] [Threadname 'endpointer_even'] Finished run.
I1226 09:55:03.346826   77891 pipeline.cc:49] [Threadname 'audio_level_eve'] Finished run.
I1226 09:55:03.380860   77886 pipeline.cc:49] [Threadname 'rnnt_encoder0'] Finished run.
I1226 09:55:03.404026   77887 pipeline.cc:49] [Threadname 'rnnt_encoder1'] Finished run.
I1226 09:55:03.479328   77888 pipeline.cc:49] [Threadname 'end_of_utteranc'] Finished run.
I1226 09:55:03.479369   77885 terse_processor.cc:469] Final recognition has been created.
I1226 09:55:03.479385   77885 pipeline.cc:49] [Threadname 'recognition_eve'] Finished run.
I1226 09:55:03.479379   77862 terse_processor.cc:1088] Stop looping and end session. Audio left: 50ms
I1226 09:55:03.479728   77862 soda_impl.cc:1035] Got pipeline signal out
I1226 09:55:03.479788   77862 terse_processor.cc:1199] No terse session, starting a new one on input audio.
E1226 09:55:03.480237   77862 pie_rnn_fst_decoder_graph.cc:25] Using deprecated decoder_graph_type RNN_FST. Use decoder_graph_type DUAL and DualFstDecoderParams instead.
I1226 09:55:03.799547   77862 soda_async_impl.cc:1090] Forcing a SODA sync to get the final event and reset ASR.
I1226 09:55:03.799596   77862 terse_processor.cc:1046] Flushing pending events ..
I1226 09:55:03.799660   77862 terse_processor.cc:1063] Start longform loop on remaining audio: 1.33s
I1226 09:55:03.800333   77862 terse_processor.cc:1088] Stop looping and end session. Audio left: 1.33s
I1226 09:55:03.800760   77895 pipeline.cc:49] [Threadname 'audio_level_eve'] Finished run.
I1226 09:55:03.801030   77897 pipeline.cc:49] [Threadname 'endpointer_even'] Finished run.
I1226 09:55:03.817305   77900 pipeline.cc:49] [Threadname 'rnnt_encoder0'] Finished run.
I1226 09:55:03.850583   77893 pipeline.cc:49] [Threadname 'rnnt_encoder1'] Finished run.
I1226 09:55:04.015813   77894 pipeline.cc:49] [Threadname 'end_of_utteranc'] Finished run.
I1226 09:55:04.015937   77899 terse_processor.cc:469] Final recognition has been created.
I1226 09:55:04.015959   77899 pipeline.cc:49] [Threadname 'recognition_eve'] Finished run.
W1226 09:55:04.016060   77862 log_creator-internal.cc:423] Failed to merge results for logging:
UNKNOWN: Result times overlap [type.googleapis.com/util.ErrorSpacePayload='SpeechErrorSpace::SpeechError(-73560)']
I1226 09:55:04.016357   77862 soda_impl.cc:1035] Got pipeline signal out
I1226 09:55:04.016403   77862 terse_processor.cc:1199] No terse session, starting a new one on input audio.
E1226 09:55:04.016863   77862 pie_rnn_fst_decoder_graph.cc:25] Using deprecated decoder_graph_type RNN_FST. Use decoder_graph_type DUAL and DualFstDecoderParams instead.
^C[* Controller/Batch processing finished (0) *]

I'm using the lp_rnnt-20181012 model.


biemster wrote 12/26/2020 at 17:04 point

I see a couple deprecation errors, do you need to use this specific (old) model? There are some much better new ones out there.


0xEb0 wrote 12/17/2020 at 19:35 point

Thanks a lot for sharing!

On gasr repo, you say that there's also a fr_fr model, but i can't get my hands on it.

The latest GBoard/Recorder APKs (from APKMirror) seem to reference the en_us package only.
Any hints, perhaps? :)

P.S. Am I on the right track xxx_cb8c332f_8612b6f5_392665af_xxx ?


biemster wrote 12/18/2020 at 13:40 point

You're definitely on the right track! The block around that ctx needs a little nudge. The link to the fr_fr model was added by accident to a gboard superpack json, did you figure out how to search for those?


0xEb0 wrote 12/19/2020 at 14:33 point

I thought I figured out how to get them, but the fr_fr one seems well hidden. It may be a specific/unique GBoard version. I tried (randomly) several from APKMirror

What I do: GBoard APKs > apktool > grep superpacks-manifests > get some JSON files that lead me to URLs xxx/en_us/ondevice_recognizer/lp_rnnt-<date>.zip.

But these are always en_us / 2019 versions.


biemster wrote 12/19/2020 at 20:37 point

@0xEb0 try building a script that sweeps all the <dates>, you'll find some nice surprises! One of those will be significantly better than what's included in soda, or the 2019 gboard model.
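
The date part of such a sweep could look roughly like the Python sketch below. It only builds candidate file names and does no network access. The base URL is left out on purpose, and the YYYYMMDD layout is an assumption based on the name `lp_rnnt-20181012` that appears elsewhere in this thread. `candidate_names` is a made-up helper.

```python
from datetime import date, timedelta


def candidate_names(start, end):
    """Yield 'lp_rnnt-YYYYMMDD.zip' for every day from start to end inclusive.

    The YYYYMMDD layout is an assumption based on the model name
    'lp_rnnt-20181012' seen in this thread.
    """
    day = start
    while day <= end:
        yield f"lp_rnnt-{day:%Y%m%d}.zip"
        day += timedelta(days=1)


names = list(candidate_names(date(2018, 10, 10), date(2018, 10, 13)))
print(names)
```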


Abraham Devos wrote 12/17/2020 at 15:50 point

This is amazing progress! 

Followed your footsteps on GitHub, gtts works great (confirmed) ; gasr does not (see below).

Managed to download libsoda & us-en model.

Biemster, could it be that my versions are off compared to your setup (mine: SODA 0.08, SODAModel us-en 0.04)?

I noticed on gtts one minor version difference meant working/not working.

W1217 14:34:36.006657    6252 soda_async_impl.cc:231] Creating soda_impl
W1217 14:34:36.084269    6252 terse_processor.cc:242] TISID disabled.
W1217 14:34:36.108929    6252 soda_async_impl.cc:390] Soda session starting (require_hotword:0, hotword_timeout_in_millis:0)
W1217 14:34:36.135319   11444 soda_async_impl.cc:765] Soda session stopped due to: STOP_CALLED
E1217 14:34:36.136378    6252 mapped-file.cc:44] Failed to unmap region: 0
E1217 14:34:36.170122    6252 mapped-file.cc:44] Failed to unmap region: 0
E1217 14:34:36.170666    6252 mapped-file.cc:44] Failed to unmap region: 0
E1217 14:34:36.185747    6252 mapped-file.cc:44] Failed to unmap region: 0
E1217 14:34:36.187252    6252 mapped-file.cc:44] Failed to unmap region: 0
W1217 14:34:36.187736    6252 soda_async_impl.cc:793] Deleting soda_impl


biemster wrote 12/17/2020 at 17:42 point

Thanks! Your soda versions are the same as mine, but most likely the api key and call stack verification is blocking you now. Time to whip out your disassembler!


a1is wrote 12/17/2020 at 00:47 point

The Linux libsoda.so is already available, but I'm stuck on the quest for the api key.
Can you give more info?


biemster wrote 12/17/2020 at 10:04 point

Seriously you found the Linux library? I still don't see it when I use the extension tools, are you sure it's not the placeholder file you found? As for the API key, this is where the project enters a gray area, which I outlined in the code repo:

https://github.com/biemster/gasr/issues/1

The google speech team is not fond of "unauthorized repurposing" of their work, which is understandable.


goddade wrote 12/16/2020 at 10:50 point

Significant progress!

Can you tell me how to get the libsoda file?

I searched the chrome directory, but did not find libsoda.


biemster wrote 12/16/2020 at 17:35 point

I've just now addressed this in an issue in the gasr repository on github, but can't go into too much detail unfortunately as the speech team at Google does not want this to be repurposed. So I can't put it here in public.
