[Raspberry Pi] How to Capture and Record USB Microphone Audio with Python

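Hello! This is Kei (Twitter).

In this post, I'll walk through how to capture audio from a USB microphone connected to a Raspberry Pi using Python.

Last time I recorded and played back microphone audio from the command line, so here is that article as well.

[Raspberry Pi] How to Connect a USB Microphone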

What You Need

Raspberry Pi

[Official Japan distributor product] Raspberry Pi 4 Model B 4GB, Japan radio-certified (RS / OKdo edition)

USB Microphone

Use a microphone that works with the Raspberry Pi.

Sanwa Supply USB microphone, unidirectional, plug-in type, MM-MCU02BK

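Installing PyAudio

To capture the audio signal from the USB microphone in Python, we install PyAudio.
PyAudio provides Python bindings for the PortAudio library and is a handy way to control audio devices from Python.

Let's install it.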

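Run the following commands in a terminal.

$ sudo apt install libportaudio0 libportaudio2 libportaudiocpp0 portaudio19-dev
$ sudo pip3 install pyaudio

The first line installs the PortAudio libraries and headers that PyAudio depends on. They go first because pip may need to build PyAudio from source, and that build requires portaudio19-dev.

The second line installs PyAudio itself with pip.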

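You Can Also Install with a Single Command

I installed PyAudio with pip this time, but installing it with apt appears to pull in the required libraries automatically, so apt may be the better choice.

For reference, the apt command is as follows.

* If you have already run the two commands above, you do not need to run this one.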

$ sudo apt install python3-pyaudio
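
Whichever route you take, it helps to confirm that Python can actually import the library before moving on. The following is a minimal sketch; the function name describe_pyaudio is my own choice for illustration and not part of PyAudio.

```python
def describe_pyaudio():
    """Return a one-line summary of the PyAudio install, or a hint if it is missing."""
    try:
        import pyaudio  # imported here so a missing install is reported, not raised
    except ImportError as exc:
        return f"PyAudio is not importable: {exc}"
    # __version__ is the PyAudio version; the second part is the PortAudio build it wraps
    return f"PyAudio {pyaudio.__version__} / {pyaudio.get_portaudio_version_text()}"


if __name__ == "__main__":
    print(describe_pyaudio())
```

If the message says PyAudio is not importable, the PortAudio packages or the pip/apt install most likely did not complete.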

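Checking the Microphone's Device Index

Now plug the USB microphone into the Raspberry Pi.

We need to check which device number the Raspberry Pi has assigned to the USB microphone.
This number is required when recording from the USB microphone.

Save the following Python program under any name and run it.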

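import pyaudio

p = pyaudio.PyAudio()
# Print the info dictionary of every audio device PortAudio can see
for i in range(p.get_device_count()):
    print(p.get_device_info_by_index(i))
p.terminate()  # release PortAudio resources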

A long list like the following is printed.

{'index': 0, 'structVersion': 2, 'name': 'bcm2835 HDMI 1: - (hw:0,0)', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 8, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0016099773242630386, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}
{'index': 1, 'structVersion': 2, 'name': 'bcm2835 Headphones: - (hw:1,0)', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 8, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0016099773242630386, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}
{'index': 2, 'structVersion': 2, 'name': 'USB Microphone: Audio (hw:2,0)', 'hostApi': 0, 'maxInputChannels': 1, 'maxOutputChannels': 0, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': -1.0, 'defaultHighInputLatency': 0.034829931972789115, 'defaultHighOutputLatency': -1.0, 'defaultSampleRate': 44100.0}
{'index': 3, 'structVersion': 2, 'name': 'sysdefault', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 128, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0016099773242630386, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}
{'index': 4, 'structVersion': 2, 'name': 'lavrate', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 128, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0016099773242630386, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}
{'index': 5, 'structVersion': 2, 'name': 'samplerate', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 128, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0016099773242630386, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}
{'index': 6, 'structVersion': 2, 'name': 'speexrate', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 128, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0016099773242630386, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}
{'index': 7, 'structVersion': 2, 'name': 'pulse', 'hostApi': 0, 'maxInputChannels': 32, 'maxOutputChannels': 32, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': 0.00873015873015873, 'defaultHighInputLatency': 0.034807256235827665, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}
{'index': 8, 'structVersion': 2, 'name': 'upmix', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 8, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0015419501133786847, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}
{'index': 9, 'structVersion': 2, 'name': 'vdownmix', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 6, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0015419501133786847, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}
{'index': 10, 'structVersion': 2, 'name': 'dmix', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 2, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.021333333333333333, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.021333333333333333, 'defaultSampleRate': 48000.0}
{'index': 11, 'structVersion': 2, 'name': 'default', 'hostApi': 0, 'maxInputChannels': 32, 'maxOutputChannels': 32, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': 0.00873015873015873, 'defaultHighInputLatency': 0.034807256235827665, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}

You can see that the index values run from 0 to 11.
The index is the device number.

Looking at index 2, there is a device named 'USB Microphone: Audio (hw:2,0)'.

This is the USB microphone, so its device number is 2.
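
Instead of reading the listing by eye, you can filter it in code. The sketch below keeps only devices whose maxInputChannels is greater than 0 and optionally matches part of the name. The helper find_input_devices and the trimmed sample_devices list are my own illustrations; the dictionary keys are the ones shown in the output above.

```python
def find_input_devices(devices, name_contains=""):
    """Return (index, name) pairs for devices that can record audio."""
    matches = []
    for info in devices:
        can_record = info["maxInputChannels"] > 0
        name_ok = name_contains.lower() in info["name"].lower()
        if can_record and name_ok:
            matches.append((info["index"], info["name"]))
    return matches


# Trimmed copies of three entries from the listing above
sample_devices = [
    {"index": 1, "name": "bcm2835 Headphones: - (hw:1,0)", "maxInputChannels": 0},
    {"index": 2, "name": "USB Microphone: Audio (hw:2,0)", "maxInputChannels": 1},
    {"index": 7, "name": "pulse", "maxInputChannels": 32},
]

print(find_input_devices(sample_devices))         # [(2, 'USB Microphone: Audio (hw:2,0)'), (7, 'pulse')]
print(find_input_devices(sample_devices, "usb"))  # [(2, 'USB Microphone: Audio (hw:2,0)')]
```

On the Raspberry Pi itself, the list can be built with [p.get_device_info_by_index(i) for i in range(p.get_device_count())] and passed to the same function.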

Before Connecting the USB Microphone

For comparison, here is the output of the same Python program when run before the USB microphone was connected.

{'index': 0, 'structVersion': 2, 'name': 'bcm2835 HDMI 1: - (hw:0,0)', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 8, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0016099773242630386, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}
{'index': 1, 'structVersion': 2, 'name': 'bcm2835 Headphones: - (hw:1,0)', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 8, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0016099773242630386, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}
{'index': 2, 'structVersion': 2, 'name': 'sysdefault', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 128, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0016099773242630386, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}
{'index': 3, 'structVersion': 2, 'name': 'lavrate', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 128, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0016099773242630386, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}
{'index': 4, 'structVersion': 2, 'name': 'samplerate', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 128, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0016099773242630386, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}
{'index': 5, 'structVersion': 2, 'name': 'speexrate', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 128, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0016099773242630386, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}
{'index': 6, 'structVersion': 2, 'name': 'pulse', 'hostApi': 0, 'maxInputChannels': 32, 'maxOutputChannels': 32, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': 0.00873015873015873, 'defaultHighInputLatency': 0.034807256235827665, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}
{'index': 7, 'structVersion': 2, 'name': 'upmix', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 8, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0015419501133786847, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}
{'index': 8, 'structVersion': 2, 'name': 'vdownmix', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 6, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.0015419501133786847, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}
{'index': 9, 'structVersion': 2, 'name': 'dmix', 'hostApi': 0, 'maxInputChannels': 0, 'maxOutputChannels': 2, 'defaultLowInputLatency': -1.0, 'defaultLowOutputLatency': 0.021333333333333333, 'defaultHighInputLatency': -1.0, 'defaultHighOutputLatency': 0.021333333333333333, 'defaultSampleRate': 48000.0}
{'index': 10, 'structVersion': 2, 'name': 'default', 'hostApi': 0, 'maxInputChannels': 32, 'maxOutputChannels': 32, 'defaultLowInputLatency': 0.008684807256235827, 'defaultLowOutputLatency': 0.00873015873015873, 'defaultHighInputLatency': 0.034807256235827665, 'defaultHighOutputLatency': 0.034829931972789115, 'defaultSampleRate': 44100.0}

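You can see that the index values run from 0 to 10.

In other words, connecting the microphone adds one device, and every device listed after it (sysdefault and onward) moves up by one index.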
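
The same comparison can be done in code. This is a small sketch, assuming you have collected the 'name' values from both runs; the helper new_device_names is hypothetical, and the two lists are copied from the listings above.

```python
def new_device_names(before, after):
    """Return names present in `after` but not in `before`, in listing order."""
    known = set(before)
    return [name for name in after if name not in known]


names_before = [
    "bcm2835 HDMI 1: - (hw:0,0)", "bcm2835 Headphones: - (hw:1,0)",
    "sysdefault", "lavrate", "samplerate", "speexrate",
    "pulse", "upmix", "vdownmix", "dmix", "default",
]
names_after = [
    "bcm2835 HDMI 1: - (hw:0,0)", "bcm2835 Headphones: - (hw:1,0)",
    "USB Microphone: Audio (hw:2,0)",
    "sysdefault", "lavrate", "samplerate", "speexrate",
    "pulse", "upmix", "vdownmix", "dmix", "default",
]

print(new_device_names(names_before, names_after))  # ['USB Microphone: Audio (hw:2,0)']
# The position in each list equals the device index, so sysdefault moved from 2 to 3
print(names_before.index("sysdefault"), names_after.index("sysdefault"))  # 2 3
```

Because the indexes shift when hardware is added or removed, the device number can change on a different setup, so it is worth rechecking it whenever the connected devices change.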

Recording Program Using PyAudio

Now let's look at the program that records audio with Python.

import pyaudio
import wave

form_1 = pyaudio.paInt16 # 16-bit resolution
chans = 1 # 1 channel
samp_rate = 44100 # 44.1 kHz sampling rate
chunk = 4096 # 2^12 frames read per call
record_secs = 3 # number of seconds to record
dev_index = 2 # device index of the USB microphone
wav_output_filename = 'test.wav' # output file name

audio = pyaudio.PyAudio() # create the PyAudio instance

# create pyaudio stream
stream = audio.open(format=form_1, rate=samp_rate, channels=chans,
                    input_device_index=dev_index, input=True,
                    frames_per_buffer=chunk)
print("recording")
frames = []

# loop through stream and append audio chunks to frame array
for i in range(0,int((samp_rate/chunk)*record_secs)):
    data = stream.read(chunk)
    frames.append(data)

print("finished recording")

# stop the stream, close it, and terminate the pyaudio instantiation
stream.stop_stream()
stream.close()
audio.terminate()

# save the audio frames as .wav file
wavefile = wave.open(wav_output_filename,'wb')
wavefile.setnchannels(chans)
wavefile.setsampwidth(audio.get_sample_size(form_1))
wavefile.setframerate(samp_rate)
wavefile.writeframes(b''.join(frames))
wavefile.close()
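
One detail of the loop above is easy to miss: the number of reads is rounded down, so the file ends up slightly shorter than record_secs. The sketch below works through that arithmetic and then reads a WAV header back with the standard wave module. The helpers planned_chunks and wav_summary are my own names, and demo.wav is a silent stand-in written to a temporary directory so the example runs without a microphone.

```python
import os
import tempfile
import wave


def planned_chunks(samp_rate, chunk, record_secs):
    """Number of stream.read(chunk) calls the recording loop makes."""
    return int((samp_rate / chunk) * record_secs)


def wav_summary(path):
    """Read a WAV header and return (channels, sample width in bytes, rate, seconds)."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return wf.getnchannels(), wf.getsampwidth(), rate, frames / rate


n_chunks = planned_chunks(44100, 4096, 3)
print(n_chunks)         # 32
print(n_chunks * 4096)  # 131072 frames, slightly under 3 s at 44.1 kHz

# Stand-in for test.wav: silence with the same format and frame count
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)  # 2 bytes per sample, matching paInt16
    wf.setframerate(44100)
    wf.writeframes(b"\x00\x00" * (n_chunks * 4096))

channels, width, rate, seconds = wav_summary(demo_path)
print(channels, width, rate, round(seconds, 3))  # 1 2 44100 2.972
```

To inspect an actual recording, call wav_summary('test.wav') after running the recording program.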