mp3 text extraction Exception - 5MB~ file #460

RiccardoRomagnoli · 2023-04-08T16:55:18Z

Describe the bug
I get an HTTP error from SpeechRecognition when trying to extract text from a ~5 MB mp3 file.
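The report gives only the file size, not the bitrate or duration. As a rough sketch, assuming a constant 128 kbps mp3 (an assumption, not stated in the issue), a 5 MB file holds about five minutes of audio. That matters because the traceback shows textract passing the whole recording to `recognize_google` in one call:

```python
def estimate_mp3_seconds(size_bytes: int, bitrate_kbps: float) -> float:
    """Rough playback length of a constant-bitrate mp3, ignoring headers and ID3 tags."""
    bits = size_bytes * 8
    return bits / (bitrate_kbps * 1000)


if __name__ == "__main__":
    # 5 MB taken as 5,000,000 bytes; 128 kbps is an assumed bitrate.
    seconds = estimate_mp3_seconds(5_000_000, 128)
    print(f"{seconds:.1f} s (~{seconds / 60:.1f} min)")  # 312.5 s (~5.2 min)
```

A variable-bitrate file or a different bitrate would give a different length; `estimate_mp3_seconds` is a hypothetical helper for illustration only.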

Desktop:

OS: Ubuntu
Textract version: 1.6.5
Python version: 3.8

Additional context

  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/site-packages/speech_recognition/__init__.py", line 840, in recognize_google
    response = urlopen(request, timeout=self.operation_timeout)
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/site-packages/textract/parsers/__init__.py", line 79, in process
    return parser.process(filename, input_encoding, output_encoding, **kwargs)
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/site-packages/textract/parsers/utils.py", line 46, in process
    byte_string = self.extract(filename, **kwargs)
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/site-packages/textract/parsers/audio.py", line 28, in extract
    speech = self.extract(temp_filename, method, **kwargs)
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/site-packages/textract/parsers/audio.py", line 39, in extract
    speech = r.recognize_google(audio)
  File "/home/riccardo/.conda/envs/stochastic/lib/python3.8/site-packages/speech_recognition/__init__.py", line 842, in recognize_google
    raise RequestError("recognition request failed: {}".format(e.reason))
speech_recognition.RequestError: recognition request failed: Bad Request
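The traceback shows textract's audio parser sending the entire recording to `recognize_google` in a single request, which the server rejects with HTTP 400. One possible cause (an assumption; the issue does not confirm it) is that the request is too long for the free Google endpoint. A workaround sketch, untested against this file, is to bypass textract for audio and recognize the file in fixed-length chunks using SpeechRecognition's `AudioFile` and `Recognizer.record`. `AudioFile` reads WAV/AIFF/FLAC, so the mp3 would need converting first (for example with ffmpeg). The helper names and the 50 s chunk length are hypothetical choices, not a documented limit:

```python
from typing import List, Tuple


def chunk_windows(total_seconds: float, chunk_seconds: float = 50.0) -> List[Tuple[float, float]]:
    """Split [0, total_seconds) into (offset, duration) windows of at most chunk_seconds."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    windows = []
    i = 0
    # Offsets are computed from the index so float error cannot accumulate.
    while i * chunk_seconds < total_seconds:
        offset = i * chunk_seconds
        windows.append((offset, min(chunk_seconds, total_seconds - offset)))
        i += 1
    return windows


def transcribe_in_chunks(wav_path: str, chunk_seconds: float = 50.0) -> str:
    """Hypothetical workaround: one recognize_google request per chunk instead of one per file."""
    # Imported lazily so chunk_windows() works without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    pieces = []
    with sr.AudioFile(wav_path) as source:
        total = source.DURATION  # file length in seconds, set when the file is opened
        for _offset, duration in chunk_windows(total, chunk_seconds):
            # record() reads sequentially, so each call continues where the last one stopped.
            audio = recognizer.record(source, duration=duration)
            try:
                pieces.append(recognizer.recognize_google(audio))
            except sr.UnknownValueError:
                # No intelligible speech in this chunk; skip it.
                continue
    return " ".join(pieces)
```

Splitting on fixed boundaries can cut words in half; if that matters, splitting on silence is an alternative. Whether shorter chunks actually avoid the 400 here is unverified.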

@jpweytjens
