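I'm using Python's SpeechRecognition to generate captions for a live stream. I've noticed that when it listens to microphone input, the recognizer needs a couple of seconds of silence before it stops capturing audio. Is there a way to reduce the required silence to about 0.5 seconds? I'm open to using other methods/libraries, as long as they aren't too low-level.
Here is my code so far: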
import speech_recognition as sr
from googletrans import Translator
import threading
# Config
OUTPUT_FILE_NAME = "transcription.txt"

def listen(recognizer, microphone):
    with microphone as source:
        audio = recognizer.listen(source)
    return audio

def transcribe(audio, recognizer, translator):
    try:
        uk_text = recognizer.recognize_google(audio, language="uk-UA")
        translated_text = translator.translate(uk_text, src="uk", dest="en")
        write_to_file(OUTPUT_FILE_NAME, translated_text.text)
    except sr.UnknownValueError:
        print("Could not understand audio.")
        write_to_file(OUTPUT_FILE_NAME, "")
    except sr.RequestError as e:
        print(f"Error occurred during recognition: {e}")

def write_to_file(file_path, text):
    with open(file_path, "w", encoding="utf-8") as file:
        file.write(text)

def get_mic():
    for index, source in enumerate(sr.Microphone.list_microphone_names()):
        print(f"{index}: {source}")
    while True:
        index = input("Select an index from the list above: ")
        try:
            return int(index)
        except ValueError:
            print("Invalid index")

if __name__ == "__main__":
    mic_index = get_mic()
    translator = Translator()
    recognizer = sr.Recognizer()
    microphone = sr.Microphone(device_index=mic_index)
    print("Adjusting for ambient noise, please don't say anything...")
    with microphone as source:
        recognizer.adjust_for_ambient_noise(source)
    print("Listening...")
    try:
        while True:
            audio = listen(recognizer, microphone)
            # Transcribe in a background thread so listening can resume immediately.
            transcription_thread = threading.Thread(
                target=transcribe,
                kwargs={"audio": audio,
                        "recognizer": recognizer,
                        "translator": translator},
                daemon=True,  # replaces the deprecated setDaemon(True)
            )
            transcription_thread.start()
    except KeyboardInterrupt:
        print("\nShutting down recognition service...")
        write_to_file(OUTPUT_FILE_NAME, "Recognition service inactive. This is sample text.")
I tried using phrase_time_limit on the .listen() function, but that's not what I want, because it sometimes cuts me off in the middle of a word.
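For reference, one setting that seems relevant here is the `pause_threshold` attribute on `sr.Recognizer`, which as far as I know controls how many seconds of non-speaking audio end a phrase (the default is 0.8, to my knowledge). Below is a minimal sketch of lowering it. The helper name `shorten_pause` and its `pause_seconds` parameter are hypothetical and not part of the library; the sketch also assumes that `non_speaking_duration` must not exceed `pause_threshold`, so it lowers that value as well when needed.

```python
def shorten_pause(recognizer, pause_seconds=0.5):
    """Hypothetical helper: end a phrase after `pause_seconds` of silence.

    Works on any object exposing the SpeechRecognition `Recognizer`
    attributes `pause_threshold` and `non_speaking_duration`.
    """
    if pause_seconds <= 0:
        raise ValueError("pause_seconds must be positive")

    # Seconds of non-speaking audio before a phrase is considered complete.
    recognizer.pause_threshold = pause_seconds

    # Silence kept on each side of the recording. Assumption: it must stay
    # at or below pause_threshold, so clamp it rather than leave it larger.
    current = getattr(recognizer, "non_speaking_duration", pause_seconds)
    recognizer.non_speaking_duration = min(current, pause_seconds)
    return recognizer
```

In the script above this would be called once, right after `recognizer = sr.Recognizer()`, e.g. `shorten_pause(recognizer, 0.5)`. Unlike `phrase_time_limit`, this does not cap the phrase length, so it should not cut a word in half, although very short values may split a sentence at natural pauses.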