RegNet
NAS-like methods explore a search space to find optimal models for specific tasks. In contrast, RegNets take a systematic approach by analyzing and refining a pre-defined architectural design space. Their primary goal is to identify general design guidelines for model parameters that lead to optimal performance.
Both methods start from a general, pre-defined architectural design space. As suggested by the term ‘search space,’ NAS-like methods search this space, using educated guesses or trained policies, to find optimal models. In contrast, RegNets focus on identifying design guidelines that exhibit strong performance and generalization across various contexts, including different hardware platforms and tasks. These guidelines are refined through an incremental manual process that narrows the design space into an optimal one, which in turn informs how to choose model parameters. Sampling this space to locate the single best model is more of a bonus than the primary objective.
RegNets introduce a design space called AnyNetX, as depicted below. Each network within this space features a standard architecture consisting of a stem, a body, and a head for generating final predictions, as illustrated in the left-most part of the diagram. By sampling models with specific variations of model parameters and evaluating their performance, RegNets uncover patterns and trends. Specifically, they select guidelines that not only lead to better performance but also suggest design simplifications when no performance loss is observed.
The body contains multiple processing stages, and each stage i consists of dᵢ blocks; the total number of blocks across all stages is denoted by d. Each block applies a 1×1 convolution to mix features across channels, followed by a 3×3 group convolution, and finally another 1×1 convolution. In the first block of each stage, the group convolution uses a stride of 2 to reduce the spatial resolution.
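To make the block structure concrete, here is a minimal sketch of such a block, assuming PyTorch. The class name `XBlock` and its default arguments are illustrative choices, not the paper's reference implementation.

```python
import torch
from torch import nn

class XBlock(nn.Module):
    """Residual block: 1x1 conv -> 3x3 group conv -> 1x1 conv, with a skip connection."""

    def __init__(self, w_in, w_out, stride=1, bottleneck_ratio=1.0, group_width=8):
        super().__init__()
        w_b = int(round(w_out / bottleneck_ratio))   # bottleneck width used inside the block
        groups = w_b // group_width                   # assumes group_width divides w_b
        self.conv1 = nn.Sequential(
            nn.Conv2d(w_in, w_b, 1, bias=False),
            nn.BatchNorm2d(w_b), nn.ReLU(inplace=True))
        # In the first block of a stage, stride=2 here halves the spatial resolution.
        self.conv2 = nn.Sequential(
            nn.Conv2d(w_b, w_b, 3, stride=stride, padding=1, groups=groups, bias=False),
            nn.BatchNorm2d(w_b), nn.ReLU(inplace=True))
        self.conv3 = nn.Sequential(
            nn.Conv2d(w_b, w_out, 1, bias=False),
            nn.BatchNorm2d(w_out))
        # Projection shortcut when the shape changes, identity otherwise.
        if w_in != w_out or stride != 1:
            self.shortcut = nn.Sequential(
                nn.Conv2d(w_in, w_out, 1, stride=stride, bias=False),
                nn.BatchNorm2d(w_out))
        else:
            self.shortcut = nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.conv3(self.conv2(self.conv1(x)))
        return self.relu(out + self.shortcut(x))

# Example: a stride-2 block at the start of a stage.
block = XBlock(w_in=32, w_out=64, stride=2, bottleneck_ratio=1.0, group_width=8)
y = block(torch.randn(1, 32, 56, 56))   # -> shape (1, 64, 28, 28)
```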
AnyNetX offers control over four essential parameters (a small configuration sketch follows the list):
- Model Depth (d): Set by choosing the number of blocks dᵢ for each stage, which sum to the total depth d.
- Channel Width (w₁, w₂, w₃, and w₄): Determines the width of channels for each stage.
- Bottleneck Ratio (bᵢ): This modifies the channel width in the initial 1×1 convolution of every block.
- Group Width (gᵢ): Specifies the group convolution width in each block.
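Taken together, these choices give sixteen degrees of freedom (four parameters for each of four stages), which can be captured in a small configuration object. The sketch below is hypothetical; the class and field names are mine, not the paper's.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class AnyNetXConfig:
    depths: Tuple[int, ...]               # d_i: number of blocks in each stage
    widths: Tuple[int, ...]               # w_i: output channel width of each stage
    bottleneck_ratios: Tuple[float, ...]  # b_i: bottleneck ratio inside each block
    group_widths: Tuple[int, ...]         # g_i: channels per group in the group convolution

# One sampled candidate from the design space (values are purely illustrative).
cfg = AnyNetXConfig(
    depths=(1, 2, 7, 12),
    widths=(32, 64, 160, 384),
    bottleneck_ratios=(1.0, 1.0, 1.0, 1.0),
    group_widths=(8, 8, 8, 8),
)
```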
By analyzing the performance of sampled models across various configurations, the study proposes general guidelines that narrow the design space further (a sketch of these constraints follows the list):
- Use a single bottleneck ratio (bᵢ = b) and a single group width (gᵢ = g) across all stages.
- Increase the channel widths w₁, w₂, w₃, and w₄ progressively from one stage to the next.
- Gradually increase the number of blocks (dᵢ) from one stage to the next, with the final stage exempt from this rule.
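In code, these guidelines amount to simple checks on a sampled configuration. The helper below is an illustrative sketch that reuses the hypothetical `AnyNetXConfig` from the earlier snippet; it is not the paper's actual sampling procedure.

```python
def follows_guidelines(cfg: AnyNetXConfig) -> bool:
    """Check a sampled configuration against the narrowed design-space guidelines."""
    shared_b = len(set(cfg.bottleneck_ratios)) == 1     # one bottleneck ratio b for all stages
    shared_g = len(set(cfg.group_widths)) == 1           # one group width g for all stages
    widths_grow = all(a <= b for a, b in zip(cfg.widths, cfg.widths[1:]))
    core_depths = cfg.depths[:-1]                        # the final stage is exempt
    depths_grow = all(a <= b for a, b in zip(core_depths, core_depths[1:]))
    return shared_b and shared_g and widths_grow and depths_grow

# The illustrative config above satisfies all four checks.
assert follows_guidelines(cfg)
```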
Another significant finding reveals a linear relationship between the per-block channel width uⱼ and the block index j (for 0 ≤ j < d): uⱼ = w₀ + wₐ · j, where the initial width w₀ and the slope wₐ are hyperparameters.
But the computed uⱼ is a real number, and we need an integer channel width. Hence, we apply a two-step process to quantize it. First, we find sⱼ from the progressive-growth relation uⱼ = w₀ · wₘ^(sⱼ), where wₘ is an additional hyperparameter controlling how quickly the width grows.
Next, we apply the quantization step: round sⱼ to the nearest integer and compute the channel width for block j as wⱼ = w₀ · wₘ^(round(sⱼ)). Consecutive blocks that end up with the same width wⱼ are grouped into the same stage, which yields the per-stage widths and depths.
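Putting the two steps together, a minimal sketch of the width computation might look like this. The function name and the example hyperparameter values are my own choices for illustration, and the rounding of widths to convenient multiples that implementations typically add is omitted for clarity.

```python
import numpy as np

def regnet_widths(d, w_0, w_a, w_m):
    """Quantize the linear per-block widths u_j = w_0 + w_a * j into integer widths."""
    j = np.arange(d)
    u = w_0 + w_a * j                      # real-valued per-block widths u_j
    s = np.log(u / w_0) / np.log(w_m)      # exponents s_j solving u_j = w_0 * w_m ** s_j
    w = w_0 * np.power(w_m, np.round(s))   # quantized widths w_j
    return np.round(w).astype(int)

# Illustrative hyperparameters: 16 blocks, initial width 48, slope 36, width multiplier 2.5.
widths = regnet_widths(d=16, w_0=48, w_a=36, w_m=2.5)
# Consecutive blocks that share a width form one stage.
stage_widths, stage_depths = np.unique(widths, return_counts=True)
print(widths.tolist())
print(stage_widths.tolist(), stage_depths.tolist())
```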
These guidelines reduce the design space into one referred to as RegNetX, a name derived from the paper’s objective to create a design space filled with simple, regular models.
Further refinement of the RegNetX design space reveals interesting findings that diverge from common network design practices. Remarkably, it challenges the widespread belief that greater model depth leads to better performance: the best models in RegNetX typically consist of around 20 blocks. These models also achieve optimal performance by eliminating the bottleneck entirely, using a bottleneck ratio of 1.0, a choice that has since been embraced by other studies. Finally, whereas traditional designs double the channel width from one stage to the next, the RegNetX study favors a width multiplier closer to 2.5 for peak performance.
The final analysis focuses on RegNetX models that have been enhanced with the Squeeze-and-Excitation (SE) operation. This enhancement results in the creation of a new design space known as RegNetY. The integration of the SE operation, which adaptively recalibrates channel-wise feature responses by explicitly modeling interdependencies between channels, significantly boosts the representational power of the network. Consequently, the RegNetY design space demonstrates notable performance gains compared to the original RegNetX models.
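As a reference point, a compact Squeeze-and-Excitation module of the kind described above might look like the following sketch, again assuming PyTorch. The module name and the reduction factor of 4 are illustrative choices, not necessarily those used in the paper.

```python
from torch import nn

class SqueezeExcite(nn.Module):
    """Squeeze-and-Excitation: reweight each channel using globally pooled statistics."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        reduced = max(1, channels // reduction)
        self.pool = nn.AdaptiveAvgPool2d(1)   # squeeze: global average per channel
        self.gate = nn.Sequential(            # excitation: learn per-channel gates in [0, 1]
            nn.Conv2d(channels, reduced, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(reduced, channels, 1),
            nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(self.pool(x))    # recalibrate channel-wise responses
```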
Announcement
Generative AI has been an exciting area of development in recent years, sparking my interest in writing a book on the topic. RegNets are fascinating models that I’ve invested considerable time in studying. However, they didn’t quite fit the book’s main focus. Still, I believe they hold valuable insights that are worth exploring, so instead of discarding them, I’ve decided to share these insights through a series of articles on Medium.
While the book isn’t finished yet, you can follow its progress and be among the first to know when it’s complete by connecting with me on Medium or LinkedIn. Stay tuned for the official launch announcement!