[Data Analysis in Practice]

Date: 2023-11-29

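Data Background

As the "extinction capital of the world", Hawaii has lost 68% of its bird species, and the consequences could damage entire food chains. Researchers use population monitoring to understand how native birds respond to environmental change and to conservation efforts. Many of the islands' birds, however, are isolated in high-elevation habitats that are hard to reach. Because physical monitoring is difficult, scientists have turned to sound recordings. This approach, known as bioacoustic monitoring, offers a passive, low-cost, and economical strategy for studying endangered bird populations. Current methods for processing large bioacoustic datasets rely on manually annotating every recording, which requires specialized training and a great deal of time. Using machine learning to identify bird species by sound can therefore save considerable cost. Concretely, the goal is to develop a model that processes continuous audio and identifies the species from the sound. The best entries will be able to train reliable classifiers with limited training data.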

Data Introduction

Dataset source: https://www.kaggle.com/competitions/birdclef-2022/data

Download with the Kaggle API (https://github.com/Kaggle/kaggle-api): kaggle competitions download -c birdclef-2022

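train_metadata.csv: provides extensive metadata for the training data.

primary_label: the code for the bird species. Append the code to https://ebird.org/species/ to see details about a species; for example, the American Crow page is https://ebird.org/species/amecro

secondary_labels: background species annotated by the recordist. An empty list does not mean that no background birds are audible.

author: the eBird user who provided the recording.

filename: the associated audio file.

rating: a float between 0.0 and 5.0 that indicates the Xeno-canto quality rating and the number of background species, where 5.0 is the highest and 1.0 is the lowest. 0.0 means the recording has no user rating yet.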
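The primary_label description above amounts to simple string concatenation. A minimal sketch follows; the helper name `species_url` is hypothetical and only the base URL comes from the field description.

```python
# Base URL taken from the primary_label description above.
EBIRD_SPECIES_URL = "https://ebird.org/species/"


def species_url(primary_label: str) -> str:
    """Return the eBird species page for a primary_label code (hypothetical helper)."""
    return EBIRD_SPECIES_URL + primary_label.strip()


print(species_url("amecro"))
```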

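train_audio: the bulk of the training data consists of short recordings of individual bird calls, generously uploaded by users of xenocanto.org. These files have been downsampled to 32 kHz where applicable to match the test-set audio, and converted to ogg format.

test_soundscapes: when you submit a notebook, the test_soundscapes directory is populated with approximately 5,500 recordings used for scoring. Each is an ogg file about 1 minute long (to within a few milliseconds), and only one soundscape is available for download.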

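test.csv: the test data.

row_id: a unique identifier for the row.

file_id: a unique identifier for the audio file.

bird: the eBird code for the row. Each audio file has one row per scored species for every 5-second window.

end_time: the last second of the 5-second time window (5, 10, 15, and so on).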
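To make the test.csv layout concrete, the sketch below enumerates the rows one soundscape would produce under the description above. The file ID and species codes are placeholders, and the function names are hypothetical; the exact row_id format is not reproduced here because the source does not state it.

```python
def window_end_times(duration_s, window_s=5):
    """End second of every full window that fits in the recording."""
    n_windows = int(duration_s // window_s)
    return [window_s * (i + 1) for i in range(n_windows)]


def expected_rows(file_id, birds, duration_s):
    """One (file_id, bird, end_time) triple per scored species per window."""
    return [(file_id, bird, end)
            for end in window_end_times(duration_s)
            for bird in birds]


# Placeholder identifiers, for illustration only.
rows = expected_rows("soundscape_demo", ["birdA", "birdB"], 60.0)
print(len(rows))
print(rows[:3])
```

A one-minute file with two scored species therefore yields 12 windows and 24 rows.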

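Audio Feature Extraction

Feature extraction is the process of highlighting the most discriminative and influential characteristics of a signal. This article walks through some important feature extraction techniques in audio processing, which you can extend to many other kinds of features suited to your own problem domain. The rest of the article is simply a biotechnology student's attempt to explain whatever signal processing they managed to understand over the past few days.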

The three main types of audio feature extraction we will discuss:

1. Time Domain
2. Frequency Domain
3. Spectrum-Based

import os
import warnings
import numpy as np
import pandas as pd
import torch
import torchaudio
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import librosa
import librosa.display
import IPython.display as ipd
import sklearn.preprocessing  # imported explicitly because minmax_scale is used later
%matplotlib inline
# (the line above is a Jupyter magic; remove it when running as a plain script)
warnings.filterwarnings('ignore')
# Load the training metadata
train_csv = pd.read_csv('../input/birdclef-2021/train_metadata.csv')
train_csv.head()

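Loading Audio Files

librosa.load reads an audio file as a floating-point time series and returns it together with the sampling rate. By default it resamples to 22,050 Hz; pass sr=None to keep the file's native sampling rate.

The sampling frequency (or sampling rate) is the number of samples (data points) per second of audio.

You can check the length of the audio by dividing the total number of data points by the sampling frequency.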

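# `subfly` is assumed to hold the path to one recording; the source does not show where it is defined
y, sr = librosa.load(subfly)
print('y:', y, '\n')
print('y shape:', np.shape(y), '\n')
print('Sample Rate (Hz):', sr, '\n')
print('Check Len of Audio:', np.shape(y)[0] / sr)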
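The length check described above is a single division. A minimal sketch with made-up sample counts follows; the two rates shown are 32 kHz, which matches the train_audio description, and 22,050 Hz, which is librosa's default. The function name is hypothetical.

```python
def duration_seconds(n_samples: int, sample_rate: int) -> float:
    """Audio length in seconds = number of samples / samples per second."""
    return n_samples / sample_rate


# Illustrative sample counts, not taken from the dataset.
print(duration_seconds(1_920_000, 32_000))  # 60.0
print(duration_seconds(1_323_000, 22_050))  # 60.0
```

The same one-minute clip has a different number of samples at each rate, which is why the sampling rate must always travel with the array.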

# Trim leading and trailing silence
audio_file, _ = librosa.effects.trim(y)
print('Audio File:', audio_file, '\n')
print('Audio File shape:', np.shape(audio_file))

# Recordings used in the examples below
# (astfly, casvir, subfly, wilfly, verdin, solsan are assumed to be file paths defined elsewhere)
y_astfly, sr_astfly = librosa.load(astfly)
audio_astfly, _ = librosa.effects.trim(y_astfly)
y_casvir, sr_casvir = librosa.load(casvir)
audio_casvir, _ = librosa.effects.trim(y_casvir)
y_subfly, sr_subfly = librosa.load(subfly)
audio_subfly, _ = librosa.effects.trim(y_subfly)
y_wilfly, sr_wilfly = librosa.load(wilfly)
audio_wilfly, _ = librosa.effects.trim(y_wilfly)
y_verdin, sr_verdin = librosa.load(verdin)
audio_verdin, _ = librosa.effects.trim(y_verdin)
y_solsan, sr_solsan = librosa.load(solsan)
audio_solsan, _ = librosa.effects.trim(y_solsan)


1. Time-Domain Features

Waveform Visualization

fig, ax = plt.subplots(6, figsize=(16, 12))
fig.suptitle('Sound Waves', fontsize=16)
# librosa.display.waveplot was removed in librosa 0.10; waveshow is its replacement
librosa.display.waveshow(y=audio_astfly, sr=sr_astfly, color="#A300F9", ax=ax[0])
librosa.display.waveshow(y=audio_casvir, sr=sr_casvir, color="#4300FF", ax=ax[1])
librosa.display.waveshow(y=audio_subfly, sr=sr_subfly, color="#009DFF", ax=ax[2])
librosa.display.waveshow(y=audio_wilfly, sr=sr_wilfly, color="#00FFB0", ax=ax[3])
librosa.display.waveshow(y=audio_verdin, sr=sr_verdin, color="#D9FF00", ax=ax[4])
librosa.display.waveshow(y=audio_solsan, sr=sr_solsan, color="r", ax=ax[5])
# `birds` is assumed to be the list of the six species names, in the order plotted above
for i, name in zip(range(6), birds):
    ax[i].set_ylabel(name, fontsize=13)

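Spectrogram

A spectrogram is a visual representation of how a signal's frequency spectrum changes over time; in other words, it is a time-frequency plot. It shows how the energy level (dB) at each frequency varies over time, which makes it an intuitive way to see the signal strength, or "loudness", of the various frequencies present in a waveform. Spectrograms are usually drawn as heat maps, that is, images in which intensity is shown by varying color or brightness.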

n_fft = 2048
hop_length = 512
# Short-time Fourier transform (STFT)
D_astfly = np.abs(librosa.stft(audio_astfly, n_fft=n_fft, hop_length=hop_length))
# Convert an amplitude spectrogram to Decibels-scaled spectrogram.
DB_astfly = librosa.amplitude_to_db(D_astfly, ref=np.max)
# === PLOT ===
fig, ax = plt.subplots(1, 1, figsize=(12, 6))
fig.suptitle('Log Frequency Spectrogram', fontsize=16)
img = librosa.display.specshow(DB_astfly, sr=sr_astfly, hop_length=hop_length,
                               x_axis='time', y_axis='log', cmap='cool', ax=ax)
ax.set_title('ASTFLY', fontsize=13)
plt.colorbar(img, ax=ax)
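It can help to work out what n_fft and hop_length imply before plotting. The sketch below uses only the standard library and assumes a centered STFT, which is librosa's default, for the frame count. The helper names are hypothetical, and `to_db` is a simplified stand-in for the decibel conversion that omits librosa's dynamic-range clipping.

```python
import math


def stft_shape(n_samples, n_fft=2048, hop_length=512):
    """(frequency bins, frames) of a centered STFT."""
    n_bins = 1 + n_fft // 2
    n_frames = 1 + n_samples // hop_length
    return n_bins, n_frames


def to_db(amplitude, ref=1.0, amin=1e-5):
    """Amplitude to decibels: 20 * log10(amplitude / ref)."""
    return 20.0 * math.log10(max(amplitude, amin) / ref)


sr = 22050                      # librosa's default sampling rate
print(stft_shape(sr * 5))       # shape for a 5-second clip
print(sr / 2048)                # width of one frequency bin in Hz
print(to_db(0.1))               # a tenth of the reference amplitude
```

With these settings each column of the spectrogram advances 512 samples (about 23 ms) and each row spans roughly 10.8 Hz.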

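RMSE

The energy of a signal corresponds to its total magnitude; for audio signals this roughly characterizes how loud the signal is. RMSE (root-mean-square energy) is one way to characterize signal energy: it is the square root of the mean of the squared amplitudes within each audio frame.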

S, phase = librosa.magphase(librosa.stft(audio_astfly))
S_db = librosa.amplitude_to_db(S, ref=np.max)
rms = librosa.feature.rms(S=S)
fig, ax = plt.subplots(nrows=2, sharex=True, figsize=(16, 6))
times = librosa.times_like(rms)
ax[0].semilogy(times, rms[0], label='RMS Energy')
ax[0].set(xticks=[])
ax[0].legend()
ax[0].label_outer()
librosa.display.specshow(S_db, y_axis='log', x_axis='time', ax=ax[1])
ax[1].set(title='log Power spectrogram')
plt.show()
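The definition above can be checked by hand on a short signal. This is a minimal sketch that frames the raw samples directly, with no padding or centering, so it illustrates the formula rather than reproducing the library's framing; the frame sizes and the signal are made up.

```python
import math


def frame_rms(signal, frame_length=4, hop_length=2):
    """Root-mean-square energy of each full frame of the signal."""
    out = []
    for start in range(0, len(signal) - frame_length + 1, hop_length):
        frame = signal[start:start + frame_length]
        out.append(math.sqrt(sum(x * x for x in frame) / frame_length))
    return out


# A quiet passage followed by a loud one.
quiet_then_loud = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]
print(frame_rms(quiet_then_loud))
```

The three frames give increasing values, which is the loudness contour the RMS curve traces over a real recording.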

Mel Spectrogram

A mel spectrogram is a spectrogram whose frequency axis has been converted to the mel scale.

# Create the Mel Spectrogram
S_astfly = librosa.feature.melspectrogram(y=audio_astfly, sr=sr_astfly)
# melspectrogram returns a power spectrogram, so convert with power_to_db
S_DB_astfly = librosa.power_to_db(S_astfly, ref=np.max)
# === PLOT ===
fig, ax = plt.subplots(1, 1, figsize=(12, 6))
fig.suptitle('Mel Spectrogram', fontsize=16)
# y_axis='mel' labels the rows as mel bands rather than linear STFT bins
img = librosa.display.specshow(S_DB_astfly, sr=sr_astfly, hop_length=hop_length,
                               x_axis='time', y_axis='mel', cmap='cool', ax=ax)
ax.set_title('ASTFLY', fontsize=13)
plt.colorbar(img, ax=ax)
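To see what "converted to the mel scale" means numerically, here is a small sketch of one common definition, the HTK formula. This is an assumption for illustration: librosa defaults to a different (Slaney-style) variant, so the numbers will not match librosa.hz_to_mel unless htk=True is passed.

```python
import math


def hz_to_mel_htk(freq_hz):
    """HTK-style mel scale: roughly linear below 1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


for f in (0, 1000, 2000, 3000):
    print(f, round(hz_to_mel_htk(f), 1))
```

Equal 1,000 Hz steps shrink on the mel axis as frequency rises, which mirrors how pitch differences become harder to hear at high frequencies.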

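Zero-Crossing Rate (ZCR)

The ZCR of an audio signal is the rate at which the signal changes sign. It is a simple and effective way to tell whether a speech frame is voiced, unvoiced, or silent. Unvoiced segments are expected to produce a higher ZCR than voiced segments, and ideally the ZCR of a silent segment is 0.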

# Total zero_crossings in our 1 song
zero_astfly = librosa.zero_crossings(audio_astfly, pad=False)
zero_casvir = librosa.zero_crossings(audio_casvir, pad=False)
zero_wilfly = librosa.zero_crossings(audio_wilfly, pad=False)
zero_subfly = librosa.zero_crossings(audio_subfly, pad=False)
zero_verdin = librosa.zero_crossings(audio_verdin, pad=False)
zero_solsan = librosa.zero_crossings(audio_solsan, pad=False)
zero_birds_list = [zero_astfly, zero_casvir, zero_wilfly, zero_subfly, zero_verdin, zero_solsan]
# Each value printed is a total count of sign changes for the whole recording
for bird, name in zip(zero_birds_list, birds):
    print("{} change rate is {:,}".format(name, sum(bird)))
# Output reported in the source:
# astfly change rate is 92,121
# casvir change rate is 1,651,380
# subfly change rate is 30,477
# wilfly change rate is 740,062
# verdin change rate is 1,246,690
# solsan change rate is 923,452
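The counting itself is easy to reproduce on toy signals. The sketch below treats zero as non-negative, which is a convention chosen here for simplicity, and divides by the number of sample-to-sample transitions to turn the count into a rate. The function names and signals are illustrative.

```python
def count_zero_crossings(signal):
    """Number of sign changes between consecutive samples."""
    crossings = 0
    for prev, cur in zip(signal, signal[1:]):
        if (prev >= 0) != (cur >= 0):
            crossings += 1
    return crossings


def zero_crossing_rate(signal):
    """Fraction of consecutive sample pairs that change sign."""
    return count_zero_crossings(signal) / max(len(signal) - 1, 1)


slow = [1, 1, 1, -1, -1, -1, 1, 1]      # low-frequency content
fast = [1, -1, 1, -1, 1, -1, 1, -1]     # noise-like, high-frequency content
silent = [0.0] * 8

for name, sig in (("slow", slow), ("fast", fast), ("silent", silent)):
    print(name, count_zero_crossings(sig), zero_crossing_rate(sig))
```

Dividing by length matters when comparing recordings: raw counts such as those printed above grow with the duration of the clip as well as with its frequency content.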

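Separating Harmonic and Percussive Signals

Sounds can be broadly divided into two categories:

- Harmonic sounds are what we perceive as pitch; they are what lets us hear melodies and chords.
- Percussive sounds are noise-like and typically come from instrument onsets, such as a drum hit or the consonants in speech.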

y_harm_casvir, y_perc_casvir = librosa.effects.hpss(audio_casvir)
D_casvir = np.abs(librosa.stft(audio_casvir, n_fft=n_fft, hop_length=hop_length))
DB_casvir = librosa.amplitude_to_db(D_casvir, ref=np.max)
plt.figure(figsize=(16, 6))
plt.plot(y_perc_casvir, color='#FFB100')
plt.plot(y_harm_casvir, color='#A300F9')
plt.legend(("Percussive", "Harmonics"))
plt.title("Harmonics + Percussive : Casvir Bird", fontsize=16);

H, P = librosa.decompose.hpss(librosa.stft(audio_casvir))
plt.figure(figsize=(16, 6))
plt.subplot(3, 1, 1)
librosa.display.specshow(DB_casvir, y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Full power spectrogram: Harmonic + Percussive')
# harmonic spectrogram will show more horizontal/pitch-dependent changes
plt.subplot(3, 1, 2)
librosa.display.specshow(librosa.amplitude_to_db(np.abs(H), ref=np.max), y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Harmonic power spectrogram')
# percussive spectrogram will show more vertical/time-dependent changes
plt.subplot(3, 1, 3)
librosa.display.specshow(librosa.amplitude_to_db(np.abs(P), ref=np.max), y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Percussive power spectrogram')
plt.tight_layout()
plt.show()
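The horizontal-versus-vertical intuition can be demonstrated with median filtering, which is the idea behind librosa's HPSS. The following is a simplified sketch on a toy spectrogram, using hard binary masks and a tiny kernel; the real implementation uses soft masks and much larger kernels, and the function names here are hypothetical.

```python
from statistics import median


def median_filter_1d(values, kernel=3):
    """Sliding median with edge values repeated at the borders."""
    half = kernel // 2
    n = len(values)
    return [median(values[min(max(i + k, 0), n - 1)]
                   for k in range(-half, half + 1))
            for i in range(n)]


def hpss_masks(spec, kernel=3):
    """Binary harmonic and percussive masks for spec[freq][time]."""
    n_freq, n_time = len(spec), len(spec[0])
    # Harmonic energy persists along time: filter each frequency row.
    harm = [median_filter_1d(row, kernel) for row in spec]
    # Percussive energy spreads along frequency: filter each time column.
    cols = [median_filter_1d([spec[f][t] for f in range(n_freq)], kernel)
            for t in range(n_time)]
    perc = [[cols[t][f] for t in range(n_time)] for f in range(n_freq)]
    harm_mask = [[harm[f][t] > perc[f][t] for t in range(n_time)]
                 for f in range(n_freq)]
    perc_mask = [[perc[f][t] > harm[f][t] for t in range(n_time)]
                 for f in range(n_freq)]
    return harm_mask, perc_mask


# Toy spectrogram: a steady tone in row 2 and a click in column 2.
spec = [[1.0 if (f == 2 or t == 2) else 0.0 for t in range(5)]
        for f in range(5)]
harm_mask, perc_mask = hpss_masks(spec)
print(harm_mask[2])
print([perc_mask[f][2] for f in range(5)])
```

The steady tone ends up in the harmonic mask and the click in the percussive mask; the single cell where they intersect is a tie and is assigned to neither.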


tempo, beat_frames = librosa.beat.beat_track(y=y_harm_casvir, sr=sr_casvir)
print('Detected Tempo: ' + str(tempo) + ' beats/min')
# Convert beat frames to seconds using the sampling rate of the same recording
beat_times = librosa.frames_to_time(beat_frames, sr=sr_casvir)
beat_time_diff = np.ediff1d(beat_times)
beat_nums = np.arange(1, np.size(beat_times))
fig, ax = plt.subplots()
fig.set_size_inches(20, 5)
ax.set_ylabel("Time difference (s)")
ax.set_xlabel("Beats")
# Recent seaborn versions require x and y to be passed by keyword
g = sns.barplot(x=beat_nums, y=beat_time_diff, palette="rocket", ax=ax)
g = g.set(xticklabels=[])
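The beat-to-beat time differences plotted above relate to tempo through a simple conversion. The sketch below is a rough consistency check under the assumption of evenly spaced beats; it is not how beat_track estimates tempo, and the beat times are made up.

```python
def bpm_from_beat_times(beat_times):
    """Tempo in beats per minute from the mean gap between beats."""
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    mean_interval = sum(intervals) / len(intervals)
    return 60.0 / mean_interval


# Illustrative beat positions in seconds, half a second apart.
beats = [0.5, 1.0, 1.5, 2.0, 2.5]
print(bpm_from_beat_times(beats))
```

Birdsong rarely has a steady pulse, so a wide spread in the time differences suggests the reported BPM should be read as a coarse descriptor rather than a musical tempo.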


# Create Tempo BPM variable
tempo_astfly, _ = librosa.beat.beat_track(y=y_astfly, sr=sr_astfly)
tempo_casvir, _ = librosa.beat.beat_track(y=y_casvir, sr=sr_casvir)
tempo_wilfly, _ = librosa.beat.beat_track(y=y_wilfly, sr=sr_wilfly)
tempo_subfly, _ = librosa.beat.beat_track(y=y_subfly, sr=sr_subfly)
tempo_verdin, _ = librosa.beat.beat_track(y=y_verdin, sr=sr_verdin)
tempo_solsan, _ = librosa.beat.beat_track(y=y_solsan, sr=sr_solsan)
data = pd.DataFrame({"Type": birds,
                     "BPM": [tempo_astfly, tempo_casvir, tempo_wilfly,
                             tempo_subfly, tempo_verdin, tempo_solsan]})
# Plot
plt.figure(figsize=(16, 6))
ax = sns.barplot(y=data["BPM"], x=data["Type"], palette="rocket")
plt.ylabel("BPM", fontsize=14)
plt.yticks(fontsize=13)
plt.xticks(fontsize=13)
plt.xlabel("")
plt.title("BPM for 6 Different Bird Species", fontsize=16);


chroma = librosa.feature.chroma_stft(y=audio_casvir, sr=sr_casvir)
fig, ax = plt.subplots(1, figsize=(10, 5))
img = librosa.display.specshow(chroma, y_axis='chroma', x_axis='time', ax=ax)
fig.colorbar(img, ax=ax)
ax.set(title='Chromagram')
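A chromagram folds spectral energy into the 12 pitch classes, ignoring octave. The sketch below shows that folding with a hard assignment of each frequency to its nearest equal-tempered note, assuming A4 = 440 Hz. The library uses a smoother filter bank, so this only illustrates the idea; the helper names and input values are made up.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered pitch class for a frequency."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return PITCH_CLASSES[midi % 12]


def fold_to_chroma(freqs_hz, magnitudes):
    """Sum magnitudes per pitch class, discarding octave information."""
    chroma = dict.fromkeys(PITCH_CLASSES, 0.0)
    for f, m in zip(freqs_hz, magnitudes):
        chroma[pitch_class(f)] += m
    return chroma


# Three octaves of A plus middle C, with illustrative magnitudes.
chroma = fold_to_chroma([220.0, 440.0, 880.0, 261.63], [1.0, 2.0, 3.0, 4.0])
print(chroma["A"], chroma["C"])
```

All three A notes land in the same bin, which is why each row of a chromagram represents a note name rather than a frequency.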


# Using an energy (magnitude) spectrum
S = np.abs(librosa.stft(audio_casvir))
chroma = librosa.feature.chroma_stft(S=S, sr=sr_casvir)  # applying the logarithmic fourier transform
fig, ax = plt.subplots(1, figsize=(10, 5))
img = librosa.display.specshow(chroma, y_axis='chroma', x_axis='time', ax=ax)
fig.colorbar(img, ax=ax)
ax.set(title='Chromagram')


chroma_stft = librosa.feature.chroma_stft(y=audio_casvir, sr=sr_casvir)
chroma_cq = librosa.feature.chroma_cqt(y=audio_casvir, sr=sr_casvir)
fig, ax = plt.subplots(nrows=2, sharex=True, sharey=True, figsize=(10, 9))
librosa.display.specshow(chroma_stft, y_axis='chroma', x_axis='time', ax=ax[0])
ax[0].set(title='chroma_stft')
ax[0].label_outer()
img = librosa.display.specshow(chroma_cq, y_axis='chroma', x_axis='time', ax=ax[1])
ax[1].set(title='chroma_cqt')
# The next three lines would add a third chroma_cens panel; they need nrows=3
# and chroma_cens to be defined, so they are kept commented out here.
# ax[1].label_outer()
# img = librosa.display.specshow(chroma_cens, y_axis='chroma', x_axis='time', ax=ax[2])
# ax[2].set(title='chroma_cens')
fig.colorbar(img, ax=ax)


chroma_stft = librosa.feature.chroma_stft(y=audio_casvir, sr=sr_casvir)
chroma_cens = librosa.feature.chroma_cens(y=audio_casvir, sr=sr_casvir)
fig, ax = plt.subplots(nrows=2, sharex=True, sharey=True, figsize=(10, 9))
librosa.display.specshow(chroma_stft, y_axis='chroma', x_axis='time', ax=ax[0])
ax[0].set(title='chroma_stft')
ax[0].label_outer()
img = librosa.display.specshow(chroma_cens, y_axis='chroma', x_axis='time', ax=ax[1])
ax[1].set(title='chroma_cens')
fig.colorbar(img, ax=ax)


# Calculate the Spectral Centroids
spectral_centroids = librosa.feature.spectral_centroid(y=audio_casvir, sr=sr_casvir)[0]
# Shape is a vector
print('Centroids:', spectral_centroids, '\n')
print('Shape of Spectral Centroids:', spectral_centroids.shape, '\n')
# Computing the time variable for visualization
frames = range(len(spectral_centroids))
# Converts frame counts to time (seconds)
t = librosa.frames_to_time(frames)
print('frames:', frames, '\n')
print('t:', t)
# Function that normalizes the Sound Data
import sklearn.preprocessing  # makes sure the submodule is loaded
def normalize(x, axis=0):
    return sklearn.preprocessing.minmax_scale(x, axis=axis)
# Plotting the Spectral Centroid along the waveform
# (waveshow replaces waveplot, which was removed in librosa 0.10)
plt.figure(figsize=(16, 6))
librosa.display.waveshow(audio_casvir, sr=sr_casvir, alpha=0.4, color='#A300F9', lw=3, label='Wave')
plt.plot(t, normalize(spectral_centroids), color='#FFB100', lw=2, label='Spectral Centroid')
plt.legend()
plt.title("Spectral Centroid: Casvir Bird", fontsize=16);
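For a single frame, the spectral centroid is the magnitude-weighted mean of the frequencies, sometimes described as the spectrum's center of mass. A minimal sketch with made-up bin frequencies and magnitudes:

```python
def spectral_centroid(freqs_hz, magnitudes):
    """Magnitude-weighted mean frequency of one spectrum frame."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


freqs = [1000.0, 2000.0, 3000.0, 4000.0]
print(spectral_centroid(freqs, [1.0, 1.0, 1.0, 1.0]))  # 2500.0
print(spectral_centroid(freqs, [0.0, 0.0, 1.0, 3.0]))  # 3750.0
```

Shifting energy toward the upper bins pulls the centroid up, which is why the feature is often read as a measure of how "bright" a sound is.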


contrast = librosa.feature.spectral_contrast(y=y_harm_casvir, sr=sr_casvir)
plt.figure(figsize=(15, 5))
librosa.display.specshow(contrast, x_axis='time')
plt.colorbar()
plt.ylabel('Frequency bands')
plt.title('Spectral contrast')


# Spectral RollOff Vector
spectral_rolloff = librosa.feature.spectral_rolloff(y=audio_astfly, sr=sr_astfly)[0]
# Computing the time variable for visualization
frames = range(len(spectral_rolloff))
# Converts frame counts to time (seconds)
t = librosa.frames_to_time(frames)
# The plot (normalize is the min-max scaling helper defined for the spectral centroid;
# waveshow replaces waveplot, which was removed in librosa 0.10)
plt.figure(figsize=(16, 6))
librosa.display.waveshow(audio_astfly, sr=sr_astfly, alpha=0.4, color='#A300F9', lw=3, label='Wave')
plt.plot(t, normalize(spectral_rolloff), color='#FFB100', lw=3, label='Spectral Rolloff')
plt.legend()
plt.title("Spectral Rolloff: Astfly Bird", fontsize=16);
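Spectral rolloff is the frequency below which a chosen fraction of the spectral magnitude lies; 0.85 is librosa's default for that fraction. The sketch below computes it for one frame from made-up bin frequencies and magnitudes.

```python
def spectral_rolloff(freqs_hz, magnitudes, roll_percent=0.85):
    """Lowest bin frequency at which the cumulative magnitude reaches the threshold."""
    threshold = roll_percent * sum(magnitudes)
    cumulative = 0.0
    for f, m in zip(freqs_hz, magnitudes):
        cumulative += m
        if cumulative >= threshold:
            return f
    return freqs_hz[-1]


freqs = [1000.0, 2000.0, 3000.0, 4000.0]
print(spectral_rolloff(freqs, [4.0, 3.0, 2.0, 1.0]))   # energy mostly low
print(spectral_rolloff(freqs, [1.0, 1.0, 1.0, 1.0]))   # flat spectrum
```

A spectrum weighted toward low frequencies reaches the threshold early, so its rolloff is lower than that of a flat, noise-like spectrum.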


mfcc = librosa.feature.mfcc(y=audio_astfly, sr=sr_astfly)
fig, ax = plt.subplots(1, figsize=(12, 6))
img = librosa.display.specshow(mfcc, x_axis='time', ax=ax)
print(mfcc.shape)
fig.colorbar(img, ax=ax)
ax.set(title='MFCC')
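MFCCs are obtained by applying a discrete cosine transform (DCT) to the log-scaled mel spectrum of each frame. The sketch below implements only that last step, an orthonormal DCT-II, on made-up log-mel values; it assumes the mel filtering and log scaling have already happened.

```python
import math


def dct_ii_ortho(values):
    """Orthonormal DCT-II of a sequence (the transform used to form cepstral coefficients)."""
    n = len(values)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, x in enumerate(values))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


# Illustrative log-mel energies for one frame.
flat = dct_ii_ortho([2.0, 2.0, 2.0, 2.0])      # no spectral shape
tilted = dct_ii_ortho([1.0, 2.0, 3.0, 4.0])    # energy rising with frequency
print([round(c, 3) for c in flat])
print([round(c, 3) for c in tilted])
```

A flat spectrum puts everything into the first coefficient, while a spectral tilt shows up in the second; this is why the first few coefficients summarize the overall spectral envelope.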


Original article: https://juejin.cn/post/7097230551785930766
