# Getting Started

## Installation
Install BertBlocks with uv:

```bash
uv pip install -e .
```
For development dependencies (linting, testing, docs):

```bash
uv pip install -e ".[dev]"
```
Optional extras:

- **Flash Attention**: `uv pip install -e ".[flash-attn]"` for FlashAttention-2 support
- **Optimizers**: `uv pip install -e ".[optimizers]"` for BitsAndBytes quantized optimizers
- **W&B logging**: `uv pip install -e ".[wandb]"` for Weights & Biases integration
## Quick Start

### Create a model from a configuration
```python
import bertblocks as bb

config = bb.BertBlocksConfig(
    vocab_size=30522,
    hidden_size=768,
    num_blocks=12,
    num_attention_heads=12,
    norm_fn="rms",
    block_pos_enc_kind="alibi",
    mlp_type="glu",
    actv_fn="silu",
)
model = bb.BertBlocksForMaskedLM(config)
```
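The resulting model is a regular PyTorch module, so a quick smoke test is to push dummy token IDs through it. The sketch below assumes the forward pass accepts `input_ids` and `attention_mask` tensors like an HF-style encoder; adapt it if the actual signature differs.

```python
import torch

# Dummy batch: 2 sequences of 16 random token IDs (assumed forward signature).
input_ids = torch.randint(0, config.vocab_size, (2, 16))
attention_mask = torch.ones_like(input_ids)

model.eval()
with torch.no_grad():
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)

# For a masked-LM head, per-token logits over the vocabulary are expected,
# i.e. shape (batch, seq_len, vocab_size); check the actual return type.
```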
### Load a HuggingFace model
BertBlocks can reproduce select HuggingFace encoder architectures and optionally load their weights:
```python
import bertblocks as bb

model = bb.from_huggingface("answerdotai/ModernBERT-base", load_weights=True)
```
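To sanity-check imported weights, you can pair the model with the matching tokenizer from `transformers` and fill in a masked token. This is a minimal sketch assuming the returned module exposes an HF-style masked-LM output with a `.logits` attribute; the actual output type may differ.

```python
import torch
from transformers import AutoTokenizer

import bertblocks as bb

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = bb.from_huggingface("answerdotai/ModernBERT-base", load_weights=True)
model.eval()

text = f"The capital of France is {tokenizer.mask_token}."
batch = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**batch).logits  # assumes an HF-style output exposing .logits

# Pick the highest-scoring token at the [MASK] position.
mask_pos = (batch["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
predicted_id = logits[0, mask_pos].argmax().item()
print(tokenizer.decode([predicted_id]))
```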
See the HuggingFace Integration guide for details.
### Train a model

Train with the default configuration using the CLI:

```bash
uv run -m bertblocks fit --config configs/pretraining.yaml
```
BertBlocks uses PyTorch Lightning for training. The `BertBlocksPretrainingModule` wraps the model with optimizer and scheduler configuration, while `BertBlocksPretrainingDataModule` handles data loading with streaming and dynamic masking.
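The same components can be driven programmatically with a vanilla Lightning `Trainer`. The sketch below is illustrative only: the constructor arguments passed to the BertBlocks modules are hypothetical placeholders, not the documented signatures, so consult the Configuration Guide and API Reference for the real parameters.

```python
import lightning as L

import bertblocks as bb

config = bb.BertBlocksConfig(vocab_size=30522, hidden_size=768,
                             num_blocks=12, num_attention_heads=12)
model = bb.BertBlocksForMaskedLM(config)

# NOTE: the constructor arguments below are illustrative placeholders, not the
# documented signatures; whether these classes are exported on the top-level
# `bb` namespace is also an assumption.
module = bb.BertBlocksPretrainingModule(model)
datamodule = bb.BertBlocksPretrainingDataModule("configs/pretraining.yaml")

trainer = L.Trainer(max_steps=10, accelerator="auto", devices=1)
trainer.fit(module, datamodule=datamodule)
```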
:::{note}
See the Configuration Guide for detailed information about all available training options.
:::
:::{tip}
For distributed training across multiple GPUs, set the appropriate `devices` and `num_nodes` in your training config; see the sketch below.
:::
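For illustration, the trainer section of such a config might look like the following sketch. The key names assume a Lightning-CLI-style layout; the actual schema is defined by `configs/pretraining.yaml`, so treat them as assumptions.

```yaml
# Sketch of a multi-GPU, multi-node trainer section; key names assume a
# Lightning-CLI-style config and may differ from the project's actual schema.
trainer:
  accelerator: gpu
  devices: 4        # GPUs per node
  num_nodes: 2
  strategy: ddp
```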
:::{warning}
FlashAttention-2 requires a CUDA-capable GPU. For CPU training or unsupported GPUs, install without the `flash-attn` extra.
:::
## What’s Next

- Architecture Overview – understand how components compose
- Configuration Guide – explore all configuration options
- API Reference – full class and function documentation