Table 5: Ablation studies of the gradual sparsity increase schedule. The numbers of training epochs are 3, 5, and 5 for MNLI, QQP, and FEVER, respectively. The subnetworks are at 90% sparsity. The numbers in the subscripts are standard deviations.
[Table body: a fixed-sparsity, hard-masking baseline compared against gradual sparsity increase schedules (0.2∼0.9, 0.5∼0.9, and 0.7∼0.9) with gradual soft masking, evaluated on MNLI, HANS, QQP, PAWS_qqp, FEVER, Symm1, and Symm2.]

Table 6: Results of ▇▇▇▇▇▇▇-base and ▇▇▇▇-large on the NLI task. We conduct mask training with ▇▇▇ loss on the standard fine-tuned PLMs. “0.5∼0.7” denotes gradual sparsity increase. The numbers in parentheses are standard deviations.

▇▇▇▇▇▇▇-base            MNLI           ▇▇▇▇
full model  std         87.14 (0.21)   68.33 (0.88)
full model  ▇▇▇         86.56 (0.18)   76.15 (1.35)
mask train  0.5         85.40 (0.14)   75.17 (0.55)
mask train  0.7         83.48 (0.29)   68.63 (1.33)
mask train  0.5∼0.7     84.41 (0.15)   71.95 (1.23)

▇▇▇▇-large              MNLI           HANS
full model  std         86.84 (0.13)   69.44 (2.39)
full model  ▇▇▇         86.25 (0.17)   76.27 (1.55)
mask train  0.5         85.47 (0.28)   75.40 (0.64)
mask train  0.7         77.54 (6.10)   60.19 (7.56)
mask train  0.5∼0.7     84.83 (0.26)   70.18 (2.24)
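For readers unfamiliar with the notation, a schedule such as "0.5∼0.9" ramps the target sparsity of the trained mask from an initial value up to the final value over the course of training, rather than pruning to the final sparsity from the first step. The sketch below is a minimal illustration of this idea, assuming a simple linear ramp over the first half of training and top-k binarization of learned mask scores; the function names, the ramp shape, and the warm-up fraction are illustrative assumptions, not the exact procedure used for the tables above.

```python
# Illustrative sketch of a gradual sparsity increase schedule (assumed linear ramp).
import torch


def current_sparsity(step, total_steps, init_sparsity, final_sparsity, warmup_frac=0.5):
    """Target sparsity at a given step: linear ramp during warm-up, then constant."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step >= warmup_steps:
        return final_sparsity
    progress = step / warmup_steps
    return init_sparsity + (final_sparsity - init_sparsity) * progress


def binarize_mask(scores, sparsity):
    """Keep the top (1 - sparsity) fraction of mask scores, zero out the rest."""
    k = max(1, int(scores.numel() * (1.0 - sparsity)))
    threshold = torch.topk(scores.flatten(), k).values.min()
    return (scores >= threshold).float()


# Example: a "0.5∼0.9" schedule over 10,000 training steps.
scores = torch.randn(768, 768)  # learnable mask scores for one weight matrix
for step in (0, 2500, 5000, 10000):
    s = current_sparsity(step, 10000, init_sparsity=0.5, final_sparsity=0.9)
    mask = binarize_mask(scores, s)
    print(f"step {step}: target sparsity {s:.2f}, actual {1 - mask.mean().item():.2f}")
```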