Layer | Kernel size (stride, padding) | Feature size |
---|---|---|
Input | – | s × 224 × 224 × 3 |
Convolution + ReLU | 11 × 11 (4, 2) | s × 55 × 55 × 64 |
Max pooling | 3 × 3 (2, 0) | s × 27 × 27 × 64 |
Convolution + ReLU | 5 × 5 (1, 2) | s × 27 × 27 × 192 |
Max pooling | 3 × 3 (2, 0) | s × 13 × 13 × 192 |
Convolution + ReLU | 3 × 3 (1, 1) | s × 13 × 13 × 384 |
Convolution + ReLU | 3 × 3 (1, 1) | s × 13 × 13 × 256 |
Convolution + ReLU | 3 × 3 (1, 1) | s × 13 × 13 × 256 |
Max pooling | 3 × 3 (2, 0) | s × 6 × 6 × 256 |
Adaptive average pooling, max value extraction | 7 × 7 | s × 1 × 1 × 256, 1 × 1 × 1 × 256 |
Dense | – | 1 × 1 × 1 × 128 |
Dense | – | 1 × 1 × 1 × 64 |
Output + sigmoid | – | 1 |
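
The spatial sizes in the convolution and max-pooling rows can be checked with the usual output-size formula, floor((n + 2p − k) / stride) + 1, assuming no dilation and floor rounding (the default in most deep learning frameworks). The sketch below is only a sanity check of the table; the names `out_size`, `LAYERS`, and `trace_sizes` are hypothetical and not taken from the source.

```python
def out_size(n, kernel, stride, padding):
    """Spatial output size of a conv/pool layer (no dilation, floor rounding)."""
    return (n + 2 * padding - kernel) // stride + 1


# (label, kernel, stride, padding) for the conv and max-pooling rows of the table.
# The labels are illustrative; the table itself does not name the layers.
LAYERS = [
    ("conv1", 11, 4, 2),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]


def trace_sizes(n=224):
    """Return the spatial size after each layer, starting from an n x n input."""
    sizes = []
    for _label, kernel, stride, padding in LAYERS:
        n = out_size(n, kernel, stride, padding)
        sizes.append(n)
    return sizes


if __name__ == "__main__":
    for (label, *_), size in zip(LAYERS, trace_sizes()):
        print(f"{label}: {size} x {size}")
```

Starting from a 224 × 224 input, the trace gives 55, 27, 27, 13, 13, 13, 13, 6, which agrees with the feature sizes listed above for those rows. The adaptive pooling and dense rows are not covered by this check.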