FR: Make SpeechRecognition etc. large AI libs just "extra" dependencies. #451

Open
kxrob opened this issue Dec 23, 2022 · 1 comment

kxrob commented Dec 23, 2022

SpeechRecognition is a massive dependency, as is pocketsphinx, and possibly others too. Making those dependencies "extra" would remove a lot of distribution load, burden, and install errors.

Besides, one wouldn't expect a tool called "textract" to casually run complex AI recognition tools, which are unstable and non-deterministic. One wouldn't use those in serious projects. Usually such file types need to be filtered out before letting textract try.

So these massive AI libraries should all become "extra" dependencies at least.
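
To illustrate the request, here is a rough sketch of how the dependency lists in a `setup.py` could be split. The extra name `audio`, the helper `setup_kwargs`, and the core package list are assumptions made for illustration, not textract's actual configuration.

```python
# Hypothetical excerpt of a setup.py. The core list below is illustrative
# only and is NOT textract's real requirement list.
CORE_REQUIRES = [
    "chardet",
    "docx2txt",
]

# Heavy speech-recognition libraries move into an optional group.
# The group name "audio" is an assumption for this sketch.
EXTRAS_REQUIRE = {
    "audio": ["SpeechRecognition", "pocketsphinx"],
}
# Convenience group that pulls in every optional dependency.
EXTRAS_REQUIRE["all"] = sorted(
    {pkg for group in EXTRAS_REQUIRE.values() for pkg in group}
)


def setup_kwargs():
    """Return the keyword arguments that would be passed to setuptools.setup()."""
    return {
        "name": "textract",
        "install_requires": CORE_REQUIRES,
        "extras_require": EXTRAS_REQUIRE,
    }


# In a real setup.py this would end with:
#     from setuptools import setup
#     setup(**setup_kwargs())
```

With a layout like this, a plain `pip install textract` would skip the speech libraries, and `pip install "textract[audio]"` would opt in to them.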

pencil commented Feb 15, 2023

I came here to suggest the same thing: It would be great if textract was more lightweight by default. I only need something to extract text from common document formats such as .pdf, .rtf, .docx. The dependency on SpeechRecognition is problematic because its massive size greatly slows down build time of our project and increases the size of the resulting Docker image substantially.

As @kxrob suggested, the dependency could be moved to "extra" and the tool could provide clear instructions if the package is unavailable when trying to extract text from an audio file, e.g. "Extracting text from audio files is an optional feature. Please run pip install SpeechRecognition~=3.8.1".
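
A minimal sketch of what that could look like, assuming the audio parser imports the library lazily. The names `require_optional`, `MissingOptionalDependency`, and `extract_audio` are hypothetical and not part of textract; `speech_recognition` is the import name of the SpeechRecognition package.

```python
import importlib


class MissingOptionalDependency(ImportError):
    """Raised when a feature needs a package that is not installed."""


def require_optional(module_name, pip_spec, feature):
    """Import module_name, or fail with an actionable install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise MissingOptionalDependency(
            f"{feature} is an optional feature. "
            f"Please run: pip install {pip_spec}"
        ) from exc


def extract_audio(path):
    """Hypothetical audio parser that only imports the heavy library on use."""
    sr = require_optional(
        "speech_recognition",
        "SpeechRecognition~=3.8.1",
        "Extracting text from audio files",
    )
    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)
    # recognize_sphinx additionally needs pocketsphinx to be installed.
    return recognizer.recognize_sphinx(audio)
```

Because the import happens inside the function, users who only extract text from documents would never touch the speech libraries, and anyone who passes an audio file without them gets the install hint instead of a bare `ModuleNotFoundError`.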
