Integration

The bertblocks.integration package converts HuggingFace models into BertBlocks equivalents.

Loading Models

bertblocks.integration.from_huggingface(
pretrained_model_name_or_path: str,
load_weights: bool = True,
add_pooling_layer: bool = False,
) -> BertBlocksModel

Instantiate an equivalent BertBlocksModel from a pretrained HuggingFace model.

Automatically detects the model type and routes to the appropriate conversion function. Supports BERT-like encoder models available on the HuggingFace Hub.

Parameters:
  • pretrained_model_name_or_path (str) – HuggingFace model identifier (e.g., “bert-base-uncased”, “modernbert-base”) or local path to a pretrained model directory.

  • load_weights (bool, optional) – Whether to transfer weights from the pretrained HuggingFace model. If True, copies all layer parameters. If False, only loads the configuration and initializes a fresh model with random weights. Defaults to True.

  • add_pooling_layer (bool, optional) – Whether to add a pooling layer that processes the [CLS] token. Useful for sequence-level classification tasks. Defaults to False.

Returns:

A BertBlocks model with architecture matched to the source HuggingFace model, optionally loaded with pretrained weights.

Return type:

BertBlocksModel

Raises:
  • ValueError – If the model type is not supported or cannot be detected.

  • OSError – If the model path does not exist or is not accessible.

  • ImportError – If the required HuggingFace transformers package is not installed.

Supported Models:
  • BERT and variants (bert-base-uncased, bert-large-uncased, etc.)

  • ModernBERT (modernbert-base, modernbert-large, etc.)

  • Other BERT-like encoder models compatible with HuggingFace
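The routing by detected model type can be sketched as a dispatch on the HuggingFace config's model_type field. The registry and loader names below are illustrative stand-ins, not the actual bertblocks internals:

```python
# Illustrative sketch of model-type dispatch; the real loaders live in
# bertblocks.integration.load_bert / load_modernbert.
def _load_bert(name, **kwargs):        # stand-in for from_bert_model
    return f"BertBlocksModel(bert, {name})"

def _load_modernbert(name, **kwargs):  # stand-in for from_modernbert_model
    return f"BertBlocksModel(modernbert, {name})"

# Hypothetical registry keyed by the HuggingFace config's `model_type`.
_LOADERS = {"bert": _load_bert, "modernbert": _load_modernbert}

def from_huggingface_sketch(name, model_type, **kwargs):
    """Route to the matching converter, or raise ValueError as documented."""
    try:
        loader = _LOADERS[model_type]
    except KeyError:
        raise ValueError(f"Unsupported model type: {model_type!r}")
    return loader(name, **kwargs)
```

A registry like this keeps the top-level entry point stable while new model families are added as new loader modules.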

Example

>>> from bertblocks.integration import from_huggingface
>>> # Load BERT model
>>> bert_model = from_huggingface("bert-base-uncased")
>>> # Load ModernBERT model
>>> modernbert_model = from_huggingface("modernbert-base")
>>> # Load without transferring weights
>>> fresh_model = from_huggingface("bert-base-uncased", load_weights=False)
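For intuition, add_pooling_layer=True adds a BERT-style pooler: a dense projection plus tanh applied to the [CLS] (first-token) hidden state. A minimal pure-Python sketch of that computation; the weights here are made up, and the real pooler is a learned module:

```python
import math

def cls_pool(hidden_states, weight, bias):
    """BERT-style pooler: tanh(W @ h_cls + b) on the first token's vector."""
    h_cls = hidden_states[0]  # [CLS] is the first token in the sequence
    out = []
    for row, b in zip(weight, bias):
        z = sum(w * h for w, h in zip(row, h_cls)) + b
        out.append(math.tanh(z))
    return out

# Toy 2-d example: two tokens, hidden size 2, identity weights, zero bias.
hidden = [[0.5, -0.25], [1.0, 1.0]]
pooled = cls_pool(hidden, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

The pooled vector is what a sequence-level classification head would consume.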


Model-Specific Loaders

bertblocks.integration.load_modernbert.from_modernbert_model(
pretrained_model_name_or_path: str,
load_weights: bool = True,
add_pooling_layer: bool = False,
attn_implementation: Literal['flash_attention_2', 'sdpa', 'eager'] = 'flash_attention_2',
) -> BertBlocksModel

Instantiate an equivalent BertBlocks model from pretrained HuggingFace ModernBERT weights and config.

Converts a HuggingFace ModernBERT model to BertBlocks architecture with optional weight transfer. ModernBERT uses rotary positional embeddings (RoPE), GLU feed-forward layers, and alternating local/global attention, all of which are supported by BertBlocks.
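Local attention restricts each token to a sliding window of neighbors rather than the full sequence. An illustrative window-mask builder, pure Python and not the bertblocks kernel:

```python
def local_attention_mask(seq_len, window):
    """True where token i may attend to token j, i.e. |i - j| <= window // 2."""
    half = window // 2
    return [[abs(i - j) <= half for j in range(seq_len)]
            for i in range(seq_len)]

# Each token sees itself plus one neighbor on each side (window=3).
mask = local_attention_mask(5, window=3)
```

Global layers simply use an all-True mask; alternating the two keeps long-range mixing while bounding per-layer cost.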

Parameters:
  • pretrained_model_name_or_path (str) – HuggingFace model identifier or local path to a pretrained ModernBERT model (e.g., “modernbert-base”, “./path/to/model”).

  • load_weights (bool, optional) – Whether to transfer weights from the pretrained ModernBERT model. If True, copies all embeddings, attention, feed-forward, and normalization layer weights. If False, only loads the configuration and initializes a fresh model. Defaults to True.

  • add_pooling_layer (bool, optional) – Whether to add a pooling layer that processes the [CLS] token. Useful for sequence-level classification tasks. Defaults to False.

  • attn_implementation (Literal["flash_attention_2", "sdpa", "eager"], optional) – Attention implementation backend to use: "flash_attention_2" uses FlashAttention-2 for faster inference (requires the flash-attn package); "sdpa" uses PyTorch's scaled dot-product attention (recommended for broad compatibility); "eager" uses a manual attention implementation (slower; useful for debugging). Defaults to "flash_attention_2".

Returns:

A BertBlocks model with architecture matched to ModernBERT, optionally loaded with pretrained weights.

Return type:

BertBlocksModel

Raises:
  • ValueError – If the model config cannot be loaded or if the model type is not ModernBERT.

  • OSError – If the model path does not exist or is not accessible.

Note

  • The weight transfer is exact and lossless; no approximation is used.

  • All layer parameters (embeddings, QKV projections, GLU layers, norms) are copied directly.

  • The pooling layer (if added) is initialized with new random weights.

  • Final normalization layer weights are transferred if included in the model.
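Because the transfer is a direct copy, it can be verified parameter-by-parameter. A schematic check, assuming both models expose a flat name-to-parameter mapping; the parameter names in the renaming map are hypothetical, and plain lists stand in for tensors:

```python
def transfer_weights(src_state, name_map):
    """Copy parameters exactly, renaming keys per a source -> target map."""
    return {tgt: list(src_state[src]) for src, tgt in name_map.items()}

# Simulated HF state dict (list values stand in for weight tensors).
src = {"embeddings.tok_embeddings.weight": [0.1, 0.2],
       "layers.0.attn.Wqkv.weight": [0.3, 0.4]}

# Hypothetical renaming from HF ModernBERT names to BertBlocks names.
name_map = {"embeddings.tok_embeddings.weight": "embed.weight",
            "layers.0.attn.Wqkv.weight": "blocks.0.qkv.weight"}

tgt = transfer_weights(src, name_map)
# Lossless: every copied parameter is identical to its source.
assert all(tgt[t] == src[s] for s, t in name_map.items())
```

The same equality check, run over a real state dict after conversion, confirms that no approximation was applied.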

Example

>>> from bertblocks.integration import from_modernbert_model
>>> # Load and convert a pretrained ModernBERT model with FlashAttention
>>> model = from_modernbert_model("modernbert-base", load_weights=True)
>>> # Load with SDPA backend for broader compatibility
>>> model = from_modernbert_model("modernbert-base", attn_implementation="sdpa")
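When flash-attn is not installed, falling back to "sdpa" is a common pattern. A hedged sketch of such selection logic, using only an import check; this is not necessarily how bertblocks itself behaves:

```python
import importlib.util

def pick_attn_implementation(preferred="flash_attention_2"):
    """Fall back to 'sdpa' if the flash-attn package is not importable."""
    if preferred == "flash_attention_2":
        if importlib.util.find_spec("flash_attn") is None:
            return "sdpa"  # safe default available in stock PyTorch
    return preferred
```

Passing the result as attn_implementation avoids an ImportError on machines without FlashAttention-2.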


bertblocks.integration.load_bert.from_bert_model(
pretrained_model_name_or_path: str,
load_weights: bool = True,
add_pooling_layer: bool = False,
) -> BertBlocksModel

Instantiate an equivalent BertBlocks model from pretrained HuggingFace BERT weights and config.

Converts a HuggingFace BERT model to BertBlocks architecture with optional weight transfer. The BertBlocks model uses post-normalization and standard MLP architecture to match BERT.
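BERT's post-normalization applies LayerNorm after the residual addition, LN(x + sublayer(x)), whereas pre-norm variants compute x + sublayer(LN(x)). A scalar-level sketch of the two orderings, illustrative only (no learned scale/shift):

```python
def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]

def post_norm_block(x, sublayer):
    # BERT ordering: normalize after the residual add.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

def pre_norm_block(x, sublayer):
    # Pre-norm ordering, used by many newer encoders.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]
```

Matching BERT's post-norm ordering (and its standard MLP) is what makes the converted BertBlocks model numerically equivalent to the source.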

Parameters:
  • pretrained_model_name_or_path (str) – HuggingFace model identifier or local path to a pretrained BERT model (e.g., “bert-base-uncased”, “./path/to/model”).

  • load_weights (bool, optional) – Whether to transfer weights from the pretrained BERT model. If True, copies all embeddings, attention, feed-forward, and normalization layer weights. If False, only loads the configuration and initializes a fresh model. Defaults to True.

  • add_pooling_layer (bool, optional) – Whether to add a pooling layer that processes the [CLS] token. Useful for sequence-level classification tasks. Defaults to False.

Returns:

A BertBlocks model with architecture matched to BERT, optionally loaded with pretrained weights.

Return type:

BertBlocksModel

Raises:
  • ValueError – If the model config cannot be loaded or if the model type is not BERT.

  • OSError – If the model path does not exist or is not accessible.

Note

  • The weight transfer is exact and lossless; no approximation is used.

  • All layer parameters (embeddings, QKV projections, feed-forward, norms) are copied directly.

  • The pooling layer (if added) is initialized with new random weights.

Example

>>> from bertblocks.integration import from_bert_model
>>> # Load and convert a pretrained BERT model
>>> model = from_bert_model("bert-base-uncased", load_weights=True)
>>> # Or load just the config without weights for a fresh model
>>> model = from_bert_model("bert-base-uncased", load_weights=False)
