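I'm using Python's SpeechRecognition library to generate subtitles for a live stream. I've noticed that when listening to microphone input, the recognizer needs a few seconds of silence before it stops capturing audio. Is there a way to reduce the required silence to about 0.5 seconds? I'm open to other approaches or libraries, as long as they aren't too low-level.

Here is my code so far: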

import speech_recognition as sr
from googletrans import Translator
import threading

# Config
OUTPUT_FILE_NAME = "transcription.txt"
        
def listen(recognizer, microphone):
    with microphone as source:
        audio = recognizer.listen(source)
        return audio
        
def transcribe(audio, recognizer, translator):
    try:
        uk_text = recognizer.recognize_google(audio, language="uk-UA")
        translated_text = translator.translate(uk_text, src="uk", dest="en")
        write_to_file(OUTPUT_FILE_NAME, translated_text.text)
    except sr.UnknownValueError:
        print("Could not understand audio.")
        write_to_file(OUTPUT_FILE_NAME, "")
    except sr.RequestError as e:
        print(f"Error occurred during recognition: {e}")

def write_to_file(file_path, text):
    with open(file_path, "w", encoding="utf-8") as file:
        file.write(text)

def get_mic():
    for index, source in enumerate(sr.Microphone.list_microphone_names()):
        print(f"{index}: {source}")
    while True:
        index = input("Select an index from the list above: ")
        try:
            return int(index)
        except ValueError:
            print("Invalid index")

if __name__ == "__main__":

    mic_index = get_mic()
    translator = Translator()
    recognizer = sr.Recognizer()
    microphone = sr.Microphone(device_index=mic_index)

    print("Adjusting for ambient noise, please don't say anything...")
    with microphone as source:
        recognizer.adjust_for_ambient_noise(source)

    print("Listening...")

    try:
        while True:
            audio = listen(recognizer, microphone)
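            # Transcribe in a background thread so listening can resume immediately.
            # daemon=True replaces the deprecated Thread.setDaemon(True).
            transcription_thread = threading.Thread(
                target=transcribe,
                kwargs={
                    "audio": audio,
                    "recognizer": recognizer,
                    "translator": translator,
                },
                daemon=True,
            )
            transcription_thread.start()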
    except KeyboardInterrupt:
        print("\nShutting down recognition service...")
        write_to_file(OUTPUT_FILE_NAME, "Recognition service inactive. This is sample text.")

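I tried using phrase_time_limit on the .listen() function, but that isn't what I want, because it sometimes cuts me off in the middle of a word.

Recommended answer

According to the source code of the SpeechRecognition library, the setting you need is pause_threshold, which is an attribute of the Recognizer object.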

self.pause_threshold = 0.8  
# seconds of non-speaking audio before a phrase is considered complete

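The Recognizer constructor takes no arguments, so in the code above you set it as an attribute after creating the object:

recognizer = sr.Recognizer()
recognizer.pause_threshold = 0.5  # or another value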

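Try experimenting with the pause_threshold value. Note that listen() requires pause_threshold to be at least as large as non_speaking_duration (0.5 by default), so to go below 0.5 seconds you also need to lower recognizer.non_speaking_duration.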
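To get a feel for why a smaller pause_threshold ends a phrase sooner, here is a minimal sketch of the idea, not the library's actual code. It assumes audio arrives in fixed-length buffers, each with a single energy value, and that a phrase ends once the run of quiet buffers exceeds the number needed to cover pause_threshold. The function name buffers_until_phrase_end and all the sample numbers are hypothetical; the real listen() also handles dynamic energy adjustment, phrase_threshold and non_speaking_duration.

```python
import math


def buffers_until_phrase_end(energies, energy_threshold, seconds_per_buffer, pause_threshold):
    """Return how many buffers are consumed before the phrase counts as complete.

    Simplified, hypothetical model of silence-based endpointing:
    a buffer is "speech" if its energy exceeds energy_threshold, and the
    phrase ends once more consecutive quiet buffers have been seen than
    are needed to cover pause_threshold seconds.
    """
    pause_buffer_count = math.ceil(pause_threshold / seconds_per_buffer)
    pause_count = 0
    speaking_started = False

    for consumed, energy in enumerate(energies, start=1):
        if energy > energy_threshold:
            # Speech detected: (re)start the phrase and reset the silence counter.
            speaking_started = True
            pause_count = 0
        elif speaking_started:
            # Quiet buffer after speech: count it towards the pause.
            pause_count += 1
            if pause_count > pause_buffer_count:
                return consumed

    # Input ran out before enough silence was seen.
    return len(energies)


# Made-up input: 0.125 s buffers, 5 loud buffers followed by 20 quiet ones.
energies = [500] * 5 + [10] * 20
slow = buffers_until_phrase_end(energies, 300, 0.125, 0.8)  # default pause_threshold
fast = buffers_until_phrase_end(energies, 300, 0.125, 0.5)  # reduced pause_threshold
print(f"pause_threshold=0.8 stops after {slow} buffers, 0.5 stops after {fast}")
```

In this model the lower threshold returns after fewer buffers, which is the same trade-off as in the real recognizer: captions appear sooner, but short natural pauses are more likely to split a sentence into two phrases.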
