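August changes
- The original code did not set the number of training epochs, so max_epochs was added at train.py line 109. Running the code requires a recent version of libstdcxx-ng, which should be added to requirements.txt. New parameter max_epochs, default 300. New parameter wandb_project sets the wandb project name. Added a seed parameter so that results can be compared across experiments.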
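- 9.14: Tried adding SiT as the model architecture in VPIDM; this requires installing the timm package with pip.
[9.17 note] SiT requires a fixed square input of size input_size*input_size. To make it work with frequency-domain speech data of size frame_length*n_fft, it was switched to dynamic positional encoding. This made the parameter count explode: GPU memory runs out even with batch_size set to 1.
[9.18 note] By modifying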
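The three new training options from the August notes (max_epochs, wandb_project, seed) could be exposed roughly as below. This is a minimal sketch, not the actual train.py: only the max_epochs default of 300 comes from the notes, while the other defaults and the helper names build_parser and seed_everything are hypothetical.

```python
import argparse
import random


def build_parser():
    """Command-line options mirroring the parameters named in the notes."""
    parser = argparse.ArgumentParser(description="Training options (sketch)")
    # Default of 300 is taken from the notes.
    parser.add_argument("--max_epochs", type=int, default=300,
                        help="number of training epochs")
    # Hypothetical default; the notes only say this names the wandb project.
    parser.add_argument("--wandb_project", type=str, default="vpidm",
                        help="wandb project name for this run")
    # Hypothetical default; a fixed seed makes runs comparable.
    parser.add_argument("--seed", type=int, default=42,
                        help="random seed for reproducible comparisons")
    return parser


def seed_everything(seed):
    """Seed every RNG that is available in the current environment."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


if __name__ == "__main__":
    args = build_parser().parse_args()
    seed_everything(args.seed)
    print(args.max_epochs, args.wandb_project, args.seed)
```

Running two experiments with the same --seed then removes RNG differences as a source of variation when comparing results.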
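September changes
- [9.14] Tried adding SiT as the model architecture in VPIDM; this requires installing the timm package with pip.
- [9.17] SiT requires a fixed square input of size input_size*input_size, but frequency-domain speech data has size frame_length*n_fft, so the following changes were made.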
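1. Used to_2tuple to turn the single input_size parameter into a two-element tuple.
2. Rewrote the PatchEmbed class, extending some of its parameters to tuples so that it handles non-square data.
3. Rewrote the get_2d_sincos_pos_embed function so that it builds a pos_embed of the correct size from the dynamically computed H_patches and W_patches, which supports non-square input.
4. The unpatchify function now handles reassembly with different patch counts along height and width: it takes (H_patches, W_patches) as input so that the model can correctly restore the patches to the full non-square image.
5. Added conversion of the input data to real and back to complex in the forward function.
- [9.23] Changing the parameter initialization method lets the model converge, but the results are poor.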
- cm-75 uses depth=12, hidden_size=768, patch_size=2, num_heads=12, batch_size=6, sit_copy6
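The non-square changes in steps 1 to 4 of the 9.17 entry can be illustrated with a dependency-free sketch. It is not the modified SiT code: to_2tuple here is a local stand-in for the timm helper, grid_size, sincos_2d_pos_embed, patchify and unpatchify are hypothetical names, plain lists replace tensors, and the ordering of the sin/cos channels is an assumption that may differ from get_2d_sincos_pos_embed.

```python
import math


def to_2tuple(x):
    """Local stand-in for timm's helper: 2 -> (2, 2), (6, 4) -> (6, 4)."""
    return tuple(x) if isinstance(x, (tuple, list)) else (x, x)


def grid_size(input_size, patch_size):
    """Number of patches along height and width (H_patches, W_patches)."""
    (h, w), (ph, pw) = to_2tuple(input_size), to_2tuple(patch_size)
    if h % ph or w % pw:
        raise ValueError("input size must be divisible by patch size")
    return h // ph, w // pw


def sincos_1d(dim, pos):
    """Sine/cosine encoding of one scalar position into `dim` values."""
    half = dim // 2
    omegas = [1.0 / (10000 ** (k / half)) for k in range(half)]
    return [math.sin(pos * o) for o in omegas] + [math.cos(pos * o) for o in omegas]


def sincos_2d_pos_embed(embed_dim, grid_h, grid_w):
    """One embedding per patch: half the channels encode the row, half the column."""
    if embed_dim % 4:
        raise ValueError("embed_dim must be divisible by 4")
    half = embed_dim // 2
    return [sincos_1d(half, i) + sincos_1d(half, j)
            for i in range(grid_h) for j in range(grid_w)]


def patchify(img, ph, pw):
    """Split a 2D list into row-major patches, each flattened to ph*pw values."""
    gh, gw = grid_size((len(img), len(img[0])), (ph, pw))
    return [[img[i * ph + r][j * pw + c] for r in range(ph) for c in range(pw)]
            for i in range(gh) for j in range(gw)]


def unpatchify(patches, grid, ph, pw):
    """Inverse of patchify; needs the explicit (H_patches, W_patches) grid."""
    gh, gw = grid
    img = [[None] * (gw * pw) for _ in range(gh * ph)]
    for idx, patch in enumerate(patches):
        i, j = divmod(idx, gw)          # patch row and column in the grid
        for k, value in enumerate(patch):
            r, c = divmod(k, pw)        # pixel row and column inside the patch
            img[i * ph + r][j * pw + c] = value
    return img


if __name__ == "__main__":
    gh, gw = grid_size((6, 4), 2)                       # non-square 6x4 input
    pos_embed = sincos_2d_pos_embed(8, gh, gw)          # gh*gw rows, 8 channels
    image = [[r * 4 + c for c in range(4)] for r in range(6)]
    restored = unpatchify(patchify(image, 2, 2), (gh, gw), 2, 2)
    print(len(pos_embed), restored == image)
```

The grid must be passed to unpatchify explicitly because, unlike the square case, the number of patches alone no longer determines H_patches and W_patches.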
October changes
Comparison table
Both cells of the table contain the same log. Per-step statistics (t tensors on device 'cuda:0'):

| t | drift_real mean | drift_real std | drift_real min | drift_real max | drift_imag mean | drift_imag std | drift_imag min | drift_imag max | diffusion (mean = min = max; std: nan) |
|---|---|---|---|---|---|---|---|---|---|
| 0.9600 | 0.001028 | 1.387510 | -6.255919 | 5.645583 | -0.003041 | 1.389403 | -5.551543 | 5.848334 | 1.946401 |
| 0.9200 | 0.000242 | 1.350083 | -5.983840 | 5.690376 | -0.003913 | 1.347464 | -5.779256 | 5.787510 | 1.903549 |
| 0.8800 | 0.001425 | 1.304999 | -6.003798 | 6.229289 | -0.003935 | 1.309390 | -5.515249 | 5.315574 | 1.858913 |
| 0.8400 | -0.000629 | 1.259740 | -5.315304 | 6.192216 | 0.000887 | 1.264473 | -5.324466 | 4.936073 | 1.812459 |
| 0.8000 | 0.000784 | 1.216656 | -5.250953 | 6.184686 | -0.001661 | 1.217426 | -5.093382 | 5.483801 | 1.764160 |
| 0.7600 | 0.002392 | 1.167682 | -5.635061 | 5.103218 | 0.002525 | 1.169464 | -4.827188 | 4.963645 | 1.713991 |
| 0.7200 | 0.001672 | 1.116133 | -4.601490 | 4.841025 | -0.000539 | 1.120077 | -4.573432 | 5.087404 | 1.661936 |
| 0.6800 | 0.002910 | 1.062951 | -5.032801 | 4.544854 | -0.000708 | 1.065635 | -4.604265 | 4.528798 | 1.607982 |
| 0.6400 | 0.000675 | 1.009798 | -4.441285 | 4.203805 | 0.000950 | 1.012471 | -4.139987 | 4.678671 | 1.552119 |
| 0.6000 | 0.001295 | 0.956149 | -4.137105 | 3.852574 | 0.001996 | 0.956204 | -4.118358 | 4.533556 | 1.494342 |
| 0.5600 | 0.002181 | 0.897933 | -4.365409 | 3.855125 | 0.000585 | 0.899173 | -3.638911 | 4.058249 | 1.434645 |
| 0.5200 | 0.001265 | 0.838747 | -3.582977 | 3.580992 | 0.001077 | 0.839164 | -3.670391 | 3.394095 | 1.373023 |
| 0.4800 | 0.001136 | 0.780881 | -3.638036 | 3.298007 | 0.002820 | 0.782339 | -3.280854 | 3.344223 | 1.309467 |
| 0.4400 | 0.002041 | 0.722704 | -2.978718 | 2.985824 | 0.002030 | 0.723352 | -3.132399 | 2.797324 | 1.243960 |
| 0.4000 | 0.000621 | 0.664624 | -3.110819 | 2.997027 | 0.001773 | 0.664240 | -2.646359 | 2.830245 | 1.176469 |
| 0.3600 | 0.001700 | 0.606232 | -2.742868 | 2.802910 | -0.001700 | 0.605417 | -2.734457 | 2.802108 | 1.106941 |
| 0.3200 | 0.002795 | 0.546745 | -2.702860 | 2.266898 | -0.002274 | 0.548131 | -2.467861 | 2.577819 | 1.035286 |
| 0.2800 | 0.001690 | 0.490185 | -2.142412 | 2.194646 | -0.002413 | 0.491714 | -2.091719 | 2.043411 | 0.961359 |
| 0.2400 | 0.001244 | 0.433069 | -1.773319 | 1.891073 | -0.001816 | 0.433641 | -1.945599 | 1.986796 | 0.884932 |
| 0.2000 | 0.001100 | 0.375882 | -1.637365 | 1.696351 | -0.001723 | 0.374826 | -1.775090 | 1.532630 | 0.805636 |
| 0.1600 | 0.002833 | 0.319433 | -1.461934 | 1.443037 | -0.002973 | 0.317190 | -1.577771 | 1.452456 | 0.722879 |
| 0.1200 | 0.002107 | 0.262938 | -1.290313 | 1.291217 | -0.002558 | 0.260862 | -1.150687 | 1.123912 | 0.635657 |
| 0.0800 | 0.001805 | 0.207426 | -1.087441 | 1.107205 | -0.002439 | 0.204845 | -0.857314 | 0.864121 | 0.542166 |
| 0.0400 | 0.001622 | 0.151578 | -1.096648 | 1.074418 | -0.002582 | 0.149009 | -0.656835 | 0.631456 | 0.438765 |

File: p232_161.wav, N=25, SI-SDR: 24.52938173290932, ESTOI: 0.9642957507514722
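The `std: nan` reported for diffusion at every step is what one would expect if the diffusion coefficient is a single scalar per step and the logger uses a Bessel-corrected standard deviation (the default behaviour of torch.std), since that divides by n - 1 = 0. The sketch below reproduces the effect in plain Python under that assumption; sample_std and summarize are hypothetical helpers, not the logging code used for these runs.

```python
import math


def sample_std(values):
    """Bessel-corrected standard deviation; undefined (nan) for fewer than 2 values."""
    n = len(values)
    if n < 2:
        return float("nan")
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def summarize(name, values):
    """Format one log entry in the same style as the log above."""
    mean = sum(values) / len(values)
    return (f"{name} mean: {mean:.6f}, std: {sample_std(values):.6f} "
            f"{name} min: {min(values):.6f}, max: {max(values):.6f}")


if __name__ == "__main__":
    # A scalar diffusion coefficient: min, max and mean coincide, std is nan.
    print(summarize("diffusion", [1.946401]))
    # With several values the standard deviation is well defined.
    print(summarize("drift", [-1.0, 0.0, 1.0]))
```

Under this reading the nan is a logging artifact rather than a numerical problem in the sampler; logging the scalar value alone would avoid it.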