Transformers.js documentation
utils/data-structures
utils/data-structures
Custom data structures.
These are only used internally, meaning an end-user shouldnβt need to access anything here.
- utils/data-structures
- static
- .PriorityQueue
new PriorityQueue(comparator)
.size
.isEmpty()
βboolean
.peek()
βany
.push(...values)
βnumber
.extend(values)
βnumber
.pop()
βany
.replace(value)
β*
._siftUpFrom(node)
- .CharTrie
- .TokenLattice
new TokenLattice(sentence, bosTokenId, eosTokenId)
.insert(pos, length, score, tokenId)
.viterbi()
βArray.<TokenLatticeNode>
.piece(node)
βstring
.tokens()
βArray.<string>
.tokenIds()
βArray.<number>
- .DictionarySplitter
new DictionarySplitter(dictionary)
.split(text)
βArray.<string>
- .LRUCache
- .PriorityQueue
- inner
- ~CharTrieNode
new CharTrieNode(isLeaf, children)
.default()
βCharTrieNode
- ~TokenLatticeNode
new TokenLatticeNode(tokenId, nodeId, pos, length, score)
.clone()
βTokenLatticeNode
- ~CharTrieNode
- static
utils/data-structures.PriorityQueue
Efficient Heap-based Implementation of a Priority Queue.
It uses an array-based binary heap, where the root is at index 0
, and the
children of node i
are located at indices 2i + 1
and 2i + 2
, respectively.
Adapted from the following sources:
- https://stackoverflow.com/a/42919752/13989043 (original)
- https://github.com/belladoreai/llama-tokenizer-js (minor improvements)
Kind: static class of utils/data-structures
- .PriorityQueue
new PriorityQueue(comparator)
.size
.isEmpty()
βboolean
.peek()
βany
.push(...values)
βnumber
.extend(values)
βnumber
.pop()
βany
.replace(value)
β*
._siftUpFrom(node)
new PriorityQueue(comparator)
Create a new PriorityQueue.
Param | Type | Description |
---|---|---|
comparator | function | Comparator function to determine priority. Defaults to a MaxHeap. |
priorityQueue.size
The size of the queue
Kind: instance property of PriorityQueue
priorityQueue.isEmpty() β <code> boolean </code>
Check if the queue is empty.
Kind: instance method of PriorityQueue
Returns: boolean
- true
if the queue is empty, false
otherwise.
priorityQueue.peek() β <code> any </code>
Return the element with the highest priority in the queue.
Kind: instance method of PriorityQueue
Returns: any
- The highest priority element in the queue.
priorityQueue.push(...values) β <code> number </code>
Add one or more elements to the queue.
Kind: instance method of PriorityQueue
Returns: number
- The new size of the queue.
Param | Type | Description |
---|---|---|
...values | any | The values to push into the queue. |
priorityQueue.extend(values) β <code> number </code>
Add multiple elements to the queue.
Kind: instance method of PriorityQueue
Returns: number
- The new size of the queue.
Param | Type | Description |
---|---|---|
values | Array.<any> | The values to push into the queue. |
priorityQueue.pop() β <code> any </code>
Remove and return the element with the highest priority in the queue.
Kind: instance method of PriorityQueue
Returns: any
- The element with the highest priority in the queue.
priorityQueue.replace(value) β <code> * </code>
Replace the element with the highest priority in the queue with a new value.
Kind: instance method of PriorityQueue
Returns: *
- The replaced value.
Param | Type | Description |
---|---|---|
value | * | The new value. |
priorityQueue._siftUpFrom(node)
Helper function to sift up from a given node.
Kind: instance method of PriorityQueue
Param | Type | Description |
---|---|---|
node | number | The index of the node to start sifting up from. |
utils/data-structures.CharTrie
A trie structure to efficiently store and search for strings.
Kind: static class of utils/data-structures
charTrie.extend(texts)
Adds one or more texts
to the trie.
Kind: instance method of CharTrie
Param | Type | Description |
---|---|---|
texts | Array.<string> | The strings to add to the trie. |
charTrie.push(text)
Adds text to the trie.
Kind: instance method of CharTrie
Param | Type | Description |
---|---|---|
text | string | The string to add to the trie. |
charTrie.commonPrefixSearch(text)
Searches the trie for all strings with a common prefix of text
.
Kind: instance method of CharTrie
Param | Type | Description |
---|---|---|
text | string | The common prefix to search for. |
utils/data-structures.TokenLattice
A lattice data structure to be used for tokenization.
Kind: static class of utils/data-structures
- .TokenLattice
new TokenLattice(sentence, bosTokenId, eosTokenId)
.insert(pos, length, score, tokenId)
.viterbi()
βArray.<TokenLatticeNode>
.piece(node)
βstring
.tokens()
βArray.<string>
.tokenIds()
βArray.<number>
new TokenLattice(sentence, bosTokenId, eosTokenId)
Creates a new TokenLattice instance.
Param | Type | Description |
---|---|---|
sentence | string | The input sentence to be tokenized. |
bosTokenId | number | The beginning-of-sequence token ID. |
eosTokenId | number | The end-of-sequence token ID. |
tokenLattice.insert(pos, length, score, tokenId)
Inserts a new token node into the token lattice.
Kind: instance method of TokenLattice
Param | Type | Description |
---|---|---|
pos | number | The starting position of the token. |
length | number | The length of the token. |
score | number | The score of the token. |
tokenId | number | The token ID of the token. |
tokenLattice.viterbi() β <code> Array. < TokenLatticeNode > </code>
Implements the Viterbi algorithm to compute the most likely sequence of tokens.
Kind: instance method of TokenLattice
Returns: Array.<TokenLatticeNode>
- The most likely sequence of tokens.
tokenLattice.piece(node) β <code> string </code>
Kind: instance method of TokenLattice
Returns: string
- The array of nodes representing the most likely sequence of tokens.
Param | Type |
---|---|
node | TokenLatticeNode |
tokenLattice.tokens() β <code> Array. < string > </code>
Kind: instance method of TokenLattice
Returns: Array.<string>
- The most likely sequence of tokens.
tokenLattice.tokenIds() β <code> Array. < number > </code>
Kind: instance method of TokenLattice
Returns: Array.<number>
- The most likely sequence of token ids.
utils/data-structures.DictionarySplitter
A data structure which uses a trie to split a string into tokens based on a dictionary. It can also use a regular expression to preprocess the input text before splitting.
NOTE: To ensure multi-byte characters are handled correctly, we operate at byte-level instead of character-level.
Kind: static class of utils/data-structures
- .DictionarySplitter
new DictionarySplitter(dictionary)
.split(text)
βArray.<string>
new DictionarySplitter(dictionary)
Param | Type | Description |
---|---|---|
dictionary | Array.<string> | The dictionary of words to use for splitting. |
dictionarySplitter.split(text) β <code> Array. < string > </code>
Splits the input text into tokens based on the dictionary.
Kind: instance method of DictionarySplitter
Returns: Array.<string>
- An array of tokens.
Param | Type | Description |
---|---|---|
text | string | The input text to split. |
utils/data-structures.LRUCache
A simple Least Recently Used (LRU) cache implementation in JavaScript. This cache stores key-value pairs and evicts the least recently used item when the capacity is exceeded.
Kind: static class of utils/data-structures
new LRUCache(capacity)
Creates an LRUCache instance.
Param | Type | Description |
---|---|---|
capacity | number | The maximum number of items the cache can hold. |
lruCache.get(key) β <code> any </code>
Retrieves the value associated with the given key and marks the key as recently used.
Kind: instance method of LRUCache
Returns: any
- The value associated with the key, or undefined if the key does not exist.
Param | Type | Description |
---|---|---|
key | any | The key to retrieve. |
lruCache.put(key, value)
Inserts or updates the key-value pair in the cache. If the key already exists, it is updated and marked as recently used. If the cache exceeds its capacity, the least recently used item is evicted.
Kind: instance method of LRUCache
Param | Type | Description |
---|---|---|
key | any | The key to add or update. |
value | any | The value to associate with the key. |
lruCache.clear()
Clears the cache.
Kind: instance method of LRUCache
utils/data-structures~CharTrieNode
Represents a node in a character trie.
Kind: inner class of utils/data-structures
- ~CharTrieNode
new CharTrieNode(isLeaf, children)
.default()
βCharTrieNode
new CharTrieNode(isLeaf, children)
Create a new CharTrieNode.
Param | Type | Description |
---|---|---|
isLeaf | boolean | Whether the node is a leaf node or not. |
children | Map.<string, CharTrieNode> | A map containing the node's children, where the key is a character and the value is a |
CharTrieNode.default() β <code> CharTrieNode </code>
Returns a new CharTrieNode
instance with default values.
Kind: static method of CharTrieNode
Returns: CharTrieNode
- A new CharTrieNode
instance with isLeaf
set to false
and an empty children
map.
utils/data-structures~TokenLatticeNode
Kind: inner class of utils/data-structures
- ~TokenLatticeNode
new TokenLatticeNode(tokenId, nodeId, pos, length, score)
.clone()
βTokenLatticeNode
new TokenLatticeNode(tokenId, nodeId, pos, length, score)
Represents a node in a token lattice for a given sentence.
Param | Type | Description |
---|---|---|
tokenId | number | The ID of the token associated with this node. |
nodeId | number | The ID of this node. |
pos | number | The starting position of the token in the sentence. |
length | number | The length of the token. |
score | number | The score associated with the token. |
tokenLatticeNode.clone() β <code> TokenLatticeNode </code>
Returns a clone of this node.
Kind: instance method of TokenLatticeNode
Returns: TokenLatticeNode
- A clone of this node.
< > Update on GitHub