Embedding Layers¶
- Embedding: A simple embedding layer.
- InfiniteVocabEmbedding: An extendable embedding layer + tokenizer.
- RotaryEmbedding: Rotary embedding layer.
- class Embedding(num_embeddings, embedding_dim, init_scale=0.02, **kwargs)[source]¶
A simple extension of torch.nn.Embedding that allows more control over the weight initializer. The learnable weights of the module, of shape (num_embeddings, embedding_dim), are initialized from \(\mathcal{N}(0, \text{init\_scale})\).
- Parameters:
num_embeddings (int) – size of the dictionary of embeddings
embedding_dim (int) – the size of each embedding vector
init_scale (float, optional) – standard deviation of the normal distribution used for the initialization. Defaults to 0.02, which is the default value used in most transformer models
**kwargs – Additional arguments. Refer to the documentation of torch.nn.Embedding for details
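A minimal usage sketch (the vocabulary size and token indices here are illustrative; once constructed, the layer is used like torch.nn.Embedding):
>>> import torch
>>> from torch_brain.nn import Embedding
>>> embedding = Embedding(num_embeddings=100, embedding_dim=64, init_scale=0.02)
>>> tokens = torch.tensor([1, 5, 42])   # integer token indices
>>> embedding(tokens).shape
torch.Size([3, 64])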
- class InfiniteVocabEmbedding(embedding_dim, init_scale=0.02)[source]¶
Embedding layer with a vocabulary that can be extended. The vocabulary is saved along with the model, and is reloaded when the state_dict is loaded. This is useful when the vocabulary is dynamically generated, e.g. from a dataset. For this reason, this class also plays the role of the tokenizer.
This layer is initially lazy, i.e. it does not have a weight matrix. The weight matrix is initialized when:
- the vocabulary is initialized via initialize_vocab(), or
- the model is loaded from a checkpoint that contains the vocabulary.
If the vocabulary is initialized before load_state_dict() is called, an error will be raised if the vocabulary in the checkpoint does not match the vocabulary in the model. The order of the words in the vocabulary does not matter, as long as the words are the same.
If you would like to create a new variant of an existing InfiniteVocabEmbedding (that you loaded from a checkpoint), you can use:
- extend_vocab() to add new words to the vocabulary. The embeddings for the new words will be initialized randomly.
- subset_vocab() to select a subset of the vocabulary. The embeddings for the selected words will be copied from the original embeddings, the ids for the selected words will change, and tokenizer() will be updated accordingly.
This module also plays the role of the tokenizer, which is accessible via tokenizer() and is a Callable.
Warning
If you are only interested in loading a subset of words from a checkpoint, do not call initialize_vocab(); first load the checkpoint, then use subset_vocab(), as shown in the sketch below
.- Parameters:
embedding_dim (int) – the size of each embedding vector
init_scale (float, optional) – standard deviation of the normal distribution used for the initialization. Defaults to 0.02, which is the default value used in most transformer models
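A sketch of the checkpoint workflow from the warning above (the file name and the selected words are illustrative, and it is assumed the checkpoint stores this module's state_dict, which carries the vocabulary along with the weights):
>>> import torch
>>> from torch_brain.nn import InfiniteVocabEmbedding
>>> embedding = InfiniteVocabEmbedding(64)              # left lazy on purpose: initialize_vocab() is not called
>>> state_dict = torch.load("embedding_checkpoint.pt")  # illustrative checkpoint path
>>> embedding.load_state_dict(state_dict)               # restores the weights and the vocabulary together
>>> embedding.subset_vocab(["apple", "banana"])         # keep only the words of interest; ids are reassigned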
- initialize_vocab(vocab)[source]¶
Initialize the vocabulary with a list of words. This method should be called only once, and before the model is trained. If you would like to add new words to the vocabulary, use extend_vocab() instead.
Note
A special word “NA” will always be in the vocabulary and is assigned the index 0, which is used for padding.
- Parameters:
vocab (List[str]) – A list of words to initialize the vocabulary.
Example
>>> from torch_brain.nn import InfiniteVocabEmbedding
>>> embedding = InfiniteVocabEmbedding(64)
>>> vocab = ["apple", "banana", "cherry"]
>>> embedding.initialize_vocab(vocab)
>>> embedding.vocab
OrderedDict([('NA', 0), ('apple', 1), ('banana', 2), ('cherry', 3)])
>>> embedding.weight.shape
torch.Size([4, 64])
- extend_vocab(vocab, exist_ok=False)[source]¶
Extend the vocabulary with a list of words. If a word already exists in the vocabulary, an error will be raised. The embeddings for the new words will be initialized randomly, and new ids will be assigned to the new words.
- Parameters:
vocab (List[str]) – A list of words to add to the vocabulary.
exist_ok (bool, optional) – whether extending with words that already exist in the vocabulary is allowed. Defaults to False
Example
>>> from torch_brain.nn import InfiniteVocabEmbedding
>>> embedding = InfiniteVocabEmbedding(64)
>>> vocab = ["apple", "banana", "cherry"]
>>> embedding.initialize_vocab(vocab)
>>> embedding
InfiniteVocabEmbedding(embedding_dim=64, num_embeddings=4)
>>> new_words = ["date", "elderberry", "fig"]
>>> embedding.extend_vocab(new_words)
InfiniteVocabEmbedding(embedding_dim=64, num_embeddings=7)
>>> embedding.vocab
OrderedDict([('NA', 0), ('apple', 1), ('banana', 2), ('cherry', 3), ('date', 4), ('elderberry', 5), ('fig', 6)])
>>> embedding.weight.shape
torch.Size([7, 64])
- subset_vocab(vocab, inplace=True)[source]¶
Select a subset of the vocabulary. The embeddings for the selected words will be copied from the original embeddings, and the ids for the selected words will be updated accordingly.
An error will be raised if one of the words does not exist in the vocabulary.
- Parameters:
vocab (List[str]) – A list of words to keep in the vocabulary.
inplace (bool, optional) – whether to modify this module in place. Defaults to True
Example
>>> from torch_brain.nn import InfiniteVocabEmbedding
>>> embedding = InfiniteVocabEmbedding(64)
>>> vocab = ["apple", "banana", "cherry"]
>>> embedding.initialize_vocab(vocab)
>>> embedding
InfiniteVocabEmbedding(embedding_dim=64, num_embeddings=4)
>>> selected_words = ["banana", "cherry"]
>>> embedding.subset_vocab(selected_words)
InfiniteVocabEmbedding(embedding_dim=64, num_embeddings=3)
>>> embedding.vocab
OrderedDict([('NA', 0), ('banana', 1), ('cherry', 2)])
>>> embedding.weight.shape
torch.Size([3, 64])
- tokenizer(words)[source]¶
Convert a word or a list of words to their token indices.
- Parameters:
words (str or List[str]) – A word or a list of words to convert to token indices.
- Returns:
A token index or a list of token indices.
- Return type:
int or List[int]
Example
>>> from torch_brain.nn import InfiniteVocabEmbedding
>>> embedding = InfiniteVocabEmbedding(64)
>>> vocab = ["apple", "banana", "cherry"]
>>> embedding.initialize_vocab(vocab)
>>> embedding.tokenizer("banana")
2
>>> embedding.tokenizer(["apple", "cherry", "apple"])
[1, 3, 1]
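Continuing that example, a short sketch of going from words to embedding vectors (the tensor wrapping is illustrative; once its vocabulary is initialized, the module performs a standard embedding lookup on token indices):
>>> import torch
>>> ids = torch.tensor(embedding.tokenizer(["apple", "cherry", "apple"]))
>>> embedding(ids).shape
torch.Size([3, 64])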
- detokenizer(index)[source]¶
Convert a token index to a word.
Example
>>> from torch_brain.nn import InfiniteVocabEmbedding
>>> embedding = InfiniteVocabEmbedding(64)
>>> vocab = ["apple", "banana", "cherry"]
>>> embedding.initialize_vocab(vocab)
>>> embedding.detokenizer(2)
'banana'
- is_lazy()[source]¶
Returns True if the module is not initialized.
Example
>>> from torch_brain.nn import InfiniteVocabEmbedding
>>> embedding = InfiniteVocabEmbedding(64)
>>> embedding.is_lazy()
True
>>> vocab = ["apple", "banana", "cherry"]
>>> embedding.initialize_vocab(vocab)
>>> embedding.is_lazy()
False
- class RotaryEmbedding(dim, t_min=0.0001, t_max=4.0)[source]¶
Custom rotary positional embedding layer. This module generates sinusoids of different frequencies, which are then used to modulate the input data. Half of the dimensions are not rotated.
The frequencies are computed as follows:
\[f(i) = t_{\min} \cdot \left(\frac{t_{\max}}{t_{\min}}\right)^{2i/\text{dim}}\]
To rotate the input data, use apply_rotary_pos_emb().
- Parameters:
dim (int) – dimension of the rotary embedding
t_min (float, optional) – lower bound of the frequency range. Defaults to 0.0001
t_max (float, optional) – upper bound of the frequency range. Defaults to 4.0
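As an illustration of this frequency schedule only (not necessarily how the module computes it internally), the values of \(f(i)\) can be reproduced as follows:
>>> import torch
>>> dim, t_min, t_max = 64, 1e-4, 4.0
>>> i = torch.arange(dim // 2, dtype=torch.float32)   # the exponent 2*i/dim sweeps [0, 1)
>>> freqs = t_min * (t_max / t_min) ** (2 * i / dim)  # f(i) = t_min * (t_max / t_min)^(2i/dim)
>>> freqs.shape
torch.Size([32])
>>> round(freqs[0].item(), 6)                         # f(0) equals t_min
0.0001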