Introduction to the ResNeXt Architecture

Chinese Deep Learning

written by LiaoWC on 2021-06-19

Architecture

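The ResNeXt [1] architecture is similar to ResNet [2]. The difference is that a ResNeXt block sends its input through many parallel convolutional branches of equal width and then merges their outputs. In the figure below, the left side is a standard ResNet bottleneck block and the right side is a ResNeXt block. The number of branches is called the cardinality.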
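
The split-then-merge idea can be written compactly. As a sketch following the formulation in [1], let x be the block input, C the cardinality, and T_i the transformation applied by branch i (these symbols are notation introduced here for illustration):

```latex
y = x + \sum_{i=1}^{C} \mathcal{T}_i(x)
```

With C = 1 this reduces to the usual residual form y = x + T(x).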

A ResNeXt block can be written in three equivalent forms, as shown in the figure below:
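
To see why such forms can be equivalent, here is a minimal sketch in plain Python. It assumes a single spatial position, where a 1x1 convolution is just a matrix-vector product, and checks that summing per-branch projections gives the same result as concatenating the branch outputs and applying one wide projection. The function names and the sizes C, d, and out_dim are hypothetical and chosen only for illustration.

```python
import random

def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]

def aggregated_sum(branch_outputs, branch_weights):
    """Each branch applies its own 1x1 projection; the results are summed."""
    projected = [matvec(w, h) for w, h in zip(branch_weights, branch_outputs)]
    return [sum(vals) for vals in zip(*projected)]

def concat_then_project(branch_outputs, branch_weights):
    """Branch outputs are concatenated, then one wide 1x1 projection is applied."""
    concatenated = [x for h in branch_outputs for x in h]
    n_rows = len(branch_weights[0])
    # Build the wide matrix by placing the branch matrices side by side.
    wide = [sum((w[r] for w in branch_weights), []) for r in range(n_rows)]
    return matvec(wide, concatenated)

random.seed(0)
C, d, out_dim = 4, 3, 5  # illustrative sizes, not taken from the paper
hs = [[random.randint(-3, 3) for _ in range(d)] for _ in range(C)]
ws = [[[random.randint(-3, 3) for _ in range(d)] for _ in range(out_dim)]
      for _ in range(C)]

# Integer arithmetic keeps the comparison exact.
assert aggregated_sum(hs, ws) == concat_then_project(hs, ws)
```

The same reasoning applied to the middle 3x3 layer is what allows the branches to be expressed as a single grouped convolution.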

Comparison with ResNet

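The table below compares ResNet-50 and ResNeXt-50. The two have roughly the same number of parameters and FLOPs. (FLOPs: floating point operations, which can be thought of as the amount of computation [3].) In the ResNeXt-50 column, the C inside the square brackets denotes the cardinality.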
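
The near-equal parameter count can be checked by hand for a single block. The sketch below assumes the 256-channel-input block described in [1] (a ResNet bottleneck of width 64 versus a ResNeXt block with cardinality 32 and branch width 4) and counts only convolution weights, ignoring biases and batch normalization. The function names are hypothetical.

```python
def resnet_bottleneck_params(in_ch=256, width=64):
    """Weights of the 1x1 reduce, 3x3, and 1x1 expand convolutions."""
    return in_ch * width + 3 * 3 * width * width + width * in_ch

def resnext_block_params(in_ch=256, cardinality=32, d=4):
    """Same three layers per branch of width d, repeated for every branch."""
    return cardinality * (in_ch * d + 3 * 3 * d * d + d * in_ch)

print(resnet_bottleneck_params())  # 69632
print(resnext_block_params())      # 70144
```

Both blocks come to roughly 70k weights, which is why the two networks end up with similar overall complexity.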

The results show that ResNeXt outperforms ResNet on the ImageNet-1K dataset.

Comparison: Cardinality vs. Deeper vs. Wider

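The table below shows that, when increasing model complexity, raising the cardinality works better than making the network deeper or wider. ResNet-101 (1x64d) has a top-1 error of 22.0%. Going deeper to ResNet-200 lowers it to 21.7%, and widening to 1x100d lowers it to 21.3%. ResNeXt-101 (32x4d) already reaches a better top-1 error of 21.2% at the original complexity, and increasing its complexity to 2x64d or 64x4d improves it further.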

Comparison: With vs. Without Residual Connections

With or without residual connections, ResNeXt-50 performs better than ResNet-50.

References and Sources

[1] S. Xie, R. Girshick, P. Dollár, Z. Tu and K. He, "Aggregated Residual Transformations for Deep Neural Networks," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 5987-5995, doi: 10.1109/CVPR.2017.634. (https://ieeexplore.ieee.org/document/8100117)

[2] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.

[3] https://www.zhihu.com/question/65305385/answer/451060549