When characters are added to the tokenizer with the init_unk: true setting, the first 2 characters are not initialized with the <unk> embeddings #1060

Description

@mmartin9684-sil

When new characters are added to the NLLB tokenizer and the init_unk configuration setting is enabled, the first 2 new characters are not initialized with the embeddings of the <unk> token.
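To make the expected behavior concrete, here is a minimal sketch of what init_unk is understood to do and how the symptom could be detected. It is not the project's code: the helper names (`extend_with_unk`, `rows_not_matching_unk`), the plain-list embedding table, and the `unk_id` value are all hypothetical, and the "buggy" table only simulates the reported symptom, not its cause.

```python
import random


def extend_with_unk(embeddings, unk_id, num_new):
    """Return a new table with num_new rows appended, each a copy of the <unk> row."""
    unk_row = embeddings[unk_id]
    return [list(row) for row in embeddings] + [list(unk_row) for _ in range(num_new)]


def rows_not_matching_unk(embeddings, unk_id, first_new_id):
    """Return the indices of new rows (>= first_new_id) that differ from the <unk> row."""
    unk_row = embeddings[unk_id]
    return [i for i in range(first_new_id, len(embeddings)) if embeddings[i] != unk_row]


rng = random.Random(0)
dim = 4
old_vocab_size = 6
unk_id = 3  # hypothetical; the real <unk> id comes from the tokenizer

# Toy stand-in for the model's embedding table (one row per token id).
table = [[rng.random() for _ in range(dim)] for _ in range(old_vocab_size)]

# Expected behavior: every newly added row equals the <unk> row.
extended = extend_with_unk(table, unk_id, num_new=5)
missed = rows_not_matching_unk(extended, unk_id, first_new_id=old_vocab_size)

# Simulated symptom from this issue: the first 2 new rows hold other values.
buggy = [list(row) for row in extended]
for i in (old_vocab_size, old_vocab_size + 1):
    buggy[i] = [rng.random() for _ in range(dim)]
buggy_missed = rows_not_matching_unk(buggy, unk_id, first_new_id=old_vocab_size)

print("correct table, rows not matching <unk>:", missed)
print("simulated bug, rows not matching <unk>:", buggy_missed)
```

A check like `rows_not_matching_unk`, run against the real embedding matrix right after the tokenizer is extended, would show which new token ids were skipped.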

Metadata

Labels

bug (Something isn't working)
pipeline 4: train (Issue related to training a model)

Projects

Status
📋 Backlog
