FR: Make SpeechRecognition etc. large AI libs just "extra" dependencies. #451

kxrob · 2022-12-23T10:32:00Z

SpeechRecognition is a massive dependency, as is pocketsphinx, and possibly others too. Making those dependencies "extra" would remove a lot of distribution load, burden and install errors.

In any case, one wouldn't expect a tool like "textract" to casually run complex AI recognition tools, which are unstable and non-deterministic. One wouldn't use those in serious projects. Usually such file types need to be filtered out before letting textract try.

So these massive AI libraries should all become "extra" dependencies at least.
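
For illustration, here is a rough sketch of how the split could look in a `setup.py`. The package lists are not textract's real dependency lists, and the extra name `audio` is made up for this example. Only the `SpeechRecognition~=3.8.1` pin comes from this thread; the pocketsphinx entry is left unpinned because no version is given here.

```python
# Hypothetical excerpt of a setup.py. The lists below are illustrative,
# not textract's actual requirements.

# Always installed: lightweight document parsers (example names only).
CORE_REQUIRES = [
    "docx2txt",
    "pdfminer.six",
]

# Installed only on request. The extra name "audio" is an assumption.
EXTRAS_REQUIRE = {
    "audio": [
        "SpeechRecognition~=3.8.1",
        "pocketsphinx",  # version deliberately unpinned in this sketch
    ],
}

# These would then be passed to setuptools, roughly as:
#   setup(..., install_requires=CORE_REQUIRES, extras_require=EXTRAS_REQUIRE)
```

With something like this, a plain `pip install textract` would skip the speech libraries, and `pip install "textract[audio]"` would pull them in.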

pencil · 2023-02-15T02:37:18Z

I came here to suggest the same thing: It would be great if textract was more lightweight by default. I only need something to extract text from common document formats such as .pdf, .rtf, .docx. The dependency on SpeechRecognition is problematic because its massive size greatly slows down build time of our project and increases the size of the resulting Docker image substantially.

As @kxrob suggested, the dependency could be moved to "extra" and the tool could provide clear instructions if the package is unavailable when trying to extract text from an audio file, e.g. "Extracting text from audio files is an optional feature. Please run pip install SpeechRecognition~=3.8.1".
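
A minimal sketch of what that could look like, assuming the audio parser imports the library lazily. The names `require_optional`, `OptionalDependencyError` and `extract_audio_text` are hypothetical and not part of textract; the `speech_recognition` calls are the library's usual entry points, but how textract wires them up internally is not shown in this thread.

```python
import importlib


class OptionalDependencyError(ImportError):
    """Raised when an optional feature's package is not installed."""


def require_optional(module_name, feature, pip_spec):
    """Import module_name lazily, or fail with install instructions."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise OptionalDependencyError(
            f"{feature} is an optional feature. "
            f"Please run: pip install {pip_spec}"
        ) from exc


def extract_audio_text(path):
    """Hypothetical audio parser that needs the optional dependency."""
    # The import happens only when an audio file is actually processed.
    sr = require_optional(
        "speech_recognition",
        "Extracting text from audio files",
        "SpeechRecognition~=3.8.1",
    )
    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_sphinx(audio)
```

Users who only handle .pdf, .rtf or .docx files would never trigger the import, so they would not need the package installed at all.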
