Configuration
All architectural choices in BertBlocks are controlled through a single
BertBlocksConfig dataclass. This page explains the key
parameters and how to use preset configurations.
Creating a Configuration
import bertblocks as bb
config = bb.BertBlocksConfig(
    vocab_size=30522,
    hidden_size=768,
    num_blocks=12,
    num_attention_heads=12,
)
All parameters have sensible defaults. See the BertBlocksConfig
API reference for the full list.
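Because the configuration is a dataclass, the standard-library `dataclasses` helpers should apply to it. The sketch below uses a stand-in `ToyConfig` class (hypothetical, with assumed defaults, not the real BertBlocksConfig) to show how fields can be listed and how a variant can be derived from an existing configuration.

```python
from dataclasses import asdict, dataclass, fields, replace


@dataclass
class ToyConfig:
    """Stand-in for illustration only.

    Field names mirror the example above; the default values are assumptions.
    """

    vocab_size: int = 30522
    hidden_size: int = 768
    num_blocks: int = 12
    num_attention_heads: int = 12


base = ToyConfig()

# List the available parameter names in declaration order.
field_names = [f.name for f in fields(base)]

# Derive a larger variant; replace() builds a new object and leaves `base` untouched.
large = replace(base, hidden_size=1024, num_blocks=24, num_attention_heads=16)

# Convert to a plain dict, for example for logging or serialization.
large_dict = asdict(large)
```

Since `replace()` constructs a new instance through `__init__`, any validation that runs at construction time would also run for the derived configuration.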
Key Parameters
Model dimensions
| Parameter | Description | Default |
|---|---|---|
| vocab_size | Vocabulary size | – |
| hidden_size | Hidden dimension | |
| | FFN intermediate dimension | |
| num_blocks | Number of transformer layers | |
| num_attention_heads | Number of attention heads | |
| | Maximum sequence length | |
Architectural choices
| Parameter | Description | Options |
|---|---|---|
| | Normalization function | |
| | Where to apply normalization | |
| | Activation function | |
| | Feed-forward type | |
| | Embedding positional encoding | |
| | Block-level positional encoding | |
| | Attention implementation | |
Advanced options
| Parameter | Description |
|---|---|
| | Number of key-value heads for GQA (defaults to |
| | Enable query-key normalization |
| | Enable local (sliding window) attention |
| | Window size for local attention |
| | Gating mechanism for attention output |
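To make two of these options concrete, here is a minimal sketch of what grouped-query attention (GQA) and local attention mean in terms of indices. The function names and the `num_kv_heads` parameter name are hypothetical, and the symmetric window convention (each token sees `window_size // 2` neighbours on either side) is an assumption; BertBlocks may define the window differently.

```python
def kv_head_for_query_head(
    query_head: int, num_attention_heads: int, num_kv_heads: int
) -> int:
    """Return the key-value head that a given query head shares under GQA."""
    if num_attention_heads % num_kv_heads != 0:
        raise ValueError("num_attention_heads must be divisible by num_kv_heads")
    # Consecutive query heads are grouped; each group shares one key-value head.
    group_size = num_attention_heads // num_kv_heads
    return query_head // group_size


def local_attention_mask(seq_len: int, window_size: int) -> list[list[bool]]:
    """Build a mask where entry [i][j] is True if token i may attend to token j.

    Assumes a symmetric window: token i sees tokens within window_size // 2
    positions on either side of itself.
    """
    half = window_size // 2
    return [[abs(i - j) <= half for j in range(seq_len)] for i in range(seq_len)]
```

With 12 query heads and 4 key-value heads, every three consecutive query heads share one key-value head. Setting the number of key-value heads equal to the number of attention heads recovers standard multi-head attention, and setting it to 1 gives multi-query attention.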
Preset Configurations
BertBlocks includes preset configurations that reproduce known architectures:
BertConfig
Reproduces the original BERT architecture:
from bertblocks.config import BertConfig
config = BertConfig(vocab_size=30522)
You can also create a BertConfig from a HuggingFace model:
config = BertConfig.from_huggingface("bert-base-uncased")
ModernBertConfig
Reproduces the ModernBERT architecture:
from bertblocks.config import ModernBertConfig
config = ModernBertConfig(vocab_size=50368)
# Alternatively, create the configuration from a HuggingFace model:
config = ModernBertConfig.from_huggingface("answerdotai/ModernBERT-base")
Validation
BertBlocksConfig validates parameters on construction. For
example, hidden_size must be divisible by num_attention_heads, and enum-typed parameters
are checked against their allowed values. Invalid configurations raise descriptive errors.
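The kind of construction-time check described here can be sketched with a dataclass `__post_init__` hook. This is an illustration under assumptions, not the BertBlocks implementation: `ValidatedConfig`, `NormPosition`, the `norm_position` field, its option values, and the use of `ValueError` are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NormPosition(str, Enum):
    """Hypothetical enum; the real option names may differ."""

    PRE = "pre"
    POST = "post"


@dataclass
class ValidatedConfig:
    """Toy config that validates itself on construction (illustration only)."""

    hidden_size: int = 768
    num_attention_heads: int = 12
    norm_position: str = "pre"

    def __post_init__(self) -> None:
        # Each attention head needs an equal, integer-sized slice of the hidden dimension.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError(
                f"hidden_size ({self.hidden_size}) must be divisible by "
                f"num_attention_heads ({self.num_attention_heads})"
            )
        # Enum lookup raises ValueError for values outside the allowed set.
        self.norm_position = NormPosition(self.norm_position)


def try_build(**kwargs) -> Optional[str]:
    """Return None if the config is valid, otherwise the error message."""
    try:
        ValidatedConfig(**kwargs)
        return None
    except ValueError as exc:
        return str(exc)
```

Calling `try_build(hidden_size=770)` returns an error message because 770 is not a multiple of 12, while `try_build()` returns `None` for the defaults.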
YAML Configuration
For training via the CLI, configurations are specified in YAML files. See the configs/
directory for examples:
uv run -m bertblocks fit --config configs/pretraining.yaml