Integration¶
The bertblocks.integration package converts HuggingFace models into BertBlocks equivalents.
Loading Models¶
- bertblocks.integration.from_huggingface(pretrained_model_name_or_path: str, load_weights: bool = True, add_pooling_layer: bool = False) → BertBlocksModel[source]¶
Instantiate an equivalent BertBlocksModel from HuggingFace pretrained models.
Automatically detects the model type and routes to the appropriate conversion function. Supports BERT-like encoder models available on HuggingFace Hub.
- Parameters:
pretrained_model_name_or_path (str) – HuggingFace model identifier (e.g., “bert-base-uncased”, “modernbert-base”) or local path to a pretrained model directory.
load_weights (bool, optional) – Whether to transfer weights from the pretrained HuggingFace model. If True, copies all layer parameters. If False, only loads the configuration and initializes a fresh model with random weights. Defaults to True.
add_pooling_layer (bool, optional) – Whether to add a pooling layer that processes the [CLS] token. Useful for sequence-level classification tasks. Defaults to False.
- Returns:
- A BertBlocks model with architecture matched to the source HuggingFace model, optionally loaded with pretrained weights.
- Return type:
BertBlocksModel
- Raises:
ValueError – If the model type is not supported or cannot be detected.
OSError – If the model path does not exist or is not accessible.
ImportError – If the required HuggingFace transformers package is not installed.
- Supported Models:
BERT and variants (bert-base-uncased, bert-large-uncased, etc.)
ModernBERT (modernbert-base, modernbert-large, etc.)
Other BERT-like encoder models compatible with HuggingFace
Example
>>> from bertblocks.integration import from_huggingface
>>> # Load BERT model
>>> bert_model = from_huggingface("bert-base-uncased")
>>> # Load ModernBERT model
>>> modernbert_model = from_huggingface("modernbert-base")
>>> # Load without transferring weights
>>> fresh_model = from_huggingface("bert-base-uncased", load_weights=False)
References
HuggingFace Model Hub: https://huggingface.co/models
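The automatic detection described above can be pictured as a dispatch on the model type recorded in the HuggingFace config. The sketch below is hypothetical; the function and route names are illustrative, not BertBlocks' actual internals.

```python
# Hypothetical sketch: route a HuggingFace config to a conversion function
# based on its "model_type" field, raising ValueError for unsupported types
# (mirroring the Raises section above). Names are illustrative only.

def route_model_type(config: dict) -> str:
    """Map a HuggingFace config dict to a BertBlocks conversion route."""
    model_type = config.get("model_type", "")
    if model_type == "bert":
        return "from_bert_model"
    if model_type == "modernbert":
        return "from_modernbert_model"
    raise ValueError(f"Unsupported or undetected model type: {model_type!r}")
```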
Model-Specific Loaders¶
- bertblocks.integration.load_modernbert.from_modernbert_model(
- pretrained_model_name_or_path: str,
- load_weights: bool = True,
- add_pooling_layer: bool = False,
- attn_implementation: Literal['flash_attention_2', 'sdpa', 'eager'] = 'flash_attention_2',
- ) → BertBlocksModel[source]¶
Instantiate an equivalent BertBlocks model from pretrained HuggingFace ModernBERT weights and config.
Converts a HuggingFace ModernBERT model to BertBlocks architecture with optional weight transfer. ModernBERT uses rotary positional embeddings (RoPE), GLU feed-forward layers, and local attention, all of which are supported by BertBlocks.
- Parameters:
pretrained_model_name_or_path (str) – HuggingFace model identifier or local path to a pretrained ModernBERT model (e.g., “modernbert-base”, “./path/to/model”).
load_weights (bool, optional) – Whether to transfer weights from the pretrained ModernBERT model. If True, copies all embeddings, attention, feed-forward, and normalization layer weights. If False, only loads the configuration and initializes a fresh model. Defaults to True.
add_pooling_layer (bool, optional) – Whether to add a pooling layer that processes the [CLS] token. Useful for sequence-level classification tasks. Defaults to False.
attn_implementation (Literal["flash_attention_2", "sdpa", "eager"], optional) – Attention implementation backend to use:
- "flash_attention_2": FlashAttention-2 for faster inference (requires the flash-attn package)
- "sdpa": PyTorch's scaled dot-product attention (recommended for broad compatibility)
- "eager": manual attention implementation (slower, for compatibility)
Defaults to "flash_attention_2".
- Returns:
- A BertBlocks model with architecture matched to ModernBERT, optionally loaded with pretrained weights.
- Return type:
BertBlocksModel
- Raises:
ValueError – If the model config cannot be loaded or if the model type is not ModernBERT.
OSError – If the model path does not exist or is not accessible.
Note
The weight transfer is exact and lossless; no approximation is used.
All layer parameters (embeddings, QKV projections, GLU layers, norms) are copied directly.
The pooling layer (if added) is initialized with new random weights.
Final normalization layer weights are transferred if included in the model.
Example
>>> from bertblocks.integration import from_modernbert_model
>>> # Load and convert a pretrained ModernBERT model with FlashAttention
>>> model = from_modernbert_model("modernbert-base", load_weights=True)
>>> # Load with SDPA backend for broader compatibility
>>> model = from_modernbert_model("modernbert-base", attn_implementation="sdpa")
References
“Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference” (https://arxiv.org/abs/2412.13663)
“ModernBERT” (https://github.com/AnswerDotAI/ModernBERT)
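Because the default "flash_attention_2" backend requires the optional flash-attn package, callers may want to fall back to "sdpa" when it is unavailable. A minimal sketch (the helper name is ours, not part of bertblocks):

```python
import importlib.util

def pick_attn_implementation(preferred: str = "flash_attention_2") -> str:
    """Fall back from FlashAttention-2 to SDPA when flash-attn is not installed."""
    if preferred == "flash_attention_2" and importlib.util.find_spec("flash_attn") is None:
        return "sdpa"  # PyTorch's built-in scaled dot-product attention
    return preferred
```

A caller could then pass `attn_implementation=pick_attn_implementation()` to from_modernbert_model so the same code runs on machines with or without flash-attn.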
- bertblocks.integration.load_bert.from_bert_model(pretrained_model_name_or_path: str, load_weights: bool = True, add_pooling_layer: bool = False) → BertBlocksModel[source]¶
Instantiate an equivalent BertBlocks model from pretrained HuggingFace BERT weights and config.
Converts a HuggingFace BERT model to BertBlocks architecture with optional weight transfer. The BertBlocks model uses post-normalization and standard MLP architecture to match BERT.
- Parameters:
pretrained_model_name_or_path (str) – HuggingFace model identifier or local path to a pretrained BERT model (e.g., “bert-base-uncased”, “./path/to/model”).
load_weights (bool, optional) – Whether to transfer weights from the pretrained BERT model. If True, copies all embeddings, attention, feed-forward, and normalization layer weights. If False, only loads the configuration and initializes a fresh model. Defaults to True.
add_pooling_layer (bool, optional) – Whether to add a pooling layer that processes the [CLS] token. Useful for sequence-level classification tasks. Defaults to False.
- Returns:
- A BertBlocks model with architecture matched to BERT, optionally loaded with pretrained weights.
- Return type:
BertBlocksModel
- Raises:
ValueError – If the model config cannot be loaded or if the model type is not BERT.
OSError – If the model path does not exist or is not accessible.
Note
The weight transfer is exact and lossless; no approximation is used.
All layer parameters (embeddings, QKV projections, feed-forward, norms) are copied directly.
The pooling layer (if added) is initialized with new random weights.
Example
>>> from bertblocks.integration import from_bert_model
>>> # Load and convert a pretrained BERT model
>>> model = from_bert_model("bert-base-uncased", load_weights=True)
>>> # Or load just the config without weights for a fresh model
>>> model = from_bert_model("bert-base-uncased", load_weights=False)
References
“BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” (https://arxiv.org/abs/1810.04805)
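The "exact and lossless" transfer described in the Note amounts to copying each source tensor to its renamed target unchanged. A toy sketch with plain dicts; the parameter names shown are hypothetical, not the real BERT-to-BertBlocks mapping:

```python
def transfer_weights(src_state: dict, name_map: dict) -> dict:
    """Copy every mapped parameter verbatim; no values are altered."""
    dst_state = {}
    for src_name, dst_name in name_map.items():
        dst_state[dst_name] = src_state[src_name]  # direct copy, no approximation
    return dst_state

# Illustrative mapping for one attention projection (names are hypothetical):
name_map = {
    "bert.encoder.layer.0.attention.self.query.weight": "blocks.0.attn.q_proj.weight",
}
```

Unmapped parameters (such as a freshly added pooling layer) would simply be absent from the mapping and keep their random initialization, matching the Note above.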