Fix exports & decoding #2

schnerd · 2020-09-08T04:01:30Z

Module exports were removed in 5766918.

Also, while that commit seemed to fix emoji encoding, I believe emoji decoding still doesn't work. This commit uses TextDecoder to decode (since encoding now uses TextEncoder).
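
For reference, here is a minimal sketch of the symmetric round trip this relies on. It is not the code from this PR; the helper names `toUtf8Bytes` and `fromUtf8Bytes` are made up for illustration, and it assumes a runtime where `TextEncoder` and `TextDecoder` are available as globals (recent Node.js versions and browsers).

```javascript
// Hypothetical helpers (not part of this library): round-trip a string
// through its UTF-8 bytes using the standard TextEncoder/TextDecoder pair.
const textEncoder = new TextEncoder();
const textDecoder = new TextDecoder('utf-8');

// String -> plain array of UTF-8 byte values.
function toUtf8Bytes(str) {
  return Array.from(textEncoder.encode(str));
}

// Array of UTF-8 byte values -> string.
function fromUtf8Bytes(bytes) {
  return textDecoder.decode(new Uint8Array(bytes));
}

const sample = 'hello 👋';
const bytes = toUtf8Bytes(sample);         // the emoji takes 4 bytes in UTF-8
const roundTripped = fromUtf8Bytes(bytes);

console.log(bytes.length, roundTripped === sample); // prints: 10 true
```

Decoding the whole byte sequence at once matters here: a multi-byte character such as an emoji only comes back intact if all of its bytes are decoded together.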

Also added some quick tests and CI that runs for free on GitHub for all commits/PRs – hope that's okay!

schnerd · 2020-09-08T04:10:16Z

.github/workflows/node.js.yml

+    - run: npm ci
+    - run: npm run build --if-present
+    - run: npm test
+


This will run tests on commits/PRs to ensure there are no regressions.

nickwalton

Awesome, looks great!

I don't actually want to encode into tokens for my use case; I just want to quickly count them to check that my request won't exceed the limit. This should be faster since we don't initialize the memory for the output array.

```
const crypto = require('crypto');

// Generate a random hex string from `length` random bytes
// (the resulting string is 2 * length characters long)
function generateRandomString(length) {
  return crypto.randomBytes(length).toString('hex');
}

const {encode, decode, countTokens} = require('gpt-3-encoder')

let str = 'This is an example sentence to try encoding out on!'
// let now = Date.now();
let encoded = encode(str)
console.log('Encoded this string looks like: ', encoded)

console.log('We can look at each token and what it represents')
let tokencount = 0;
for(let token of encoded){
  tokencount ++;
  console.log({token, string: decode([token])})
}
console.log("there are n tokens: ", tokencount);

let decoded = decode(encoded)
console.log('We can decode it back into:\n', decoded)

let now = Date.now();
// todo: write a benchmark for the above method vs int countTokens(str)

str = generateRandomString(10000);
console.time('fencode');
encoded = encode(str);
console.log(`First encode to cache string n stuff in mem`);
console.timeEnd('fencode');
console.log(`Original string length: ${str.length}`);

// Benchmark the encode function
console.time('encode');
encoded = encode(str);
console.log(`Encoded string length: ${encoded.length}`);
console.timeEnd('encode');

// Benchmark the countTokens function
console.time('countTokens');
let tokenCount = countTokens(str);
console.log(`Number of tokens: ${tokenCount}`);
console.timeEnd('countTokens');

console.log(`Original string length: ${str.length}`);
console.log(`Encoded string length: ${encoded.length}`);
console.log(`Number of tokens: ${tokenCount}`);
```

```
We can decode it back into:
 This is an example sentence to try encoding out on!
First encode to cache string n stuff in mem
fencode: 163.57ms
Original string length: 20000
Encoded string length: 11993
encode: 124.265ms
Number of tokens: 11993
countTokens: 29.2ms
Original string length: 20000
Encoded string length: 11993
Number of tokens: 11993
```

Co-authored-by: Kier <syonfox@users.noreply.github.com>
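
To illustrate the idea of counting without building the output array, here is a rough sketch using a toy whitespace tokenizer. The names `toyTokens`, `toyEncode`, and `toyCountTokens` are hypothetical; this is not the library's BPE logic or its `countTokens` implementation, only the general pattern of sharing one token walk between a collecting and a counting function.

```javascript
// Toy stand-in tokenizer (assumption for illustration only): yields
// whitespace-separated chunks lazily, one at a time.
function* toyTokens(text) {
  for (const match of text.matchAll(/\S+/g)) {
    yield match[0];
  }
}

// Collecting variant: materializes every token into an output array.
function toyEncode(text) {
  return Array.from(toyTokens(text));
}

// Counting variant: walks the same tokens but only keeps a running total,
// so no output array is allocated.
function toyCountTokens(text) {
  let count = 0;
  for (const _token of toyTokens(text)) {
    count++;
  }
  return count;
}

const text = 'This is an example sentence to try encoding out on!';
console.log(toyEncode(text).length, toyCountTokens(text)); // prints: 10 10
```

Both functions always agree on the count; whether the counting variant is measurably faster depends on the tokenizer and the input size, so it is worth benchmarking as in the commit message above.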

schnerd changed the title ~~Fix exports~~ Fix exports [wip] Sep 8, 2020

fix exports and decoding

c325df9

schnerd force-pushed the fix-exports branch from e10929b to c325df9 Compare September 8, 2020 04:09

schnerd commented Sep 8, 2020

schnerd changed the title ~~Fix exports [wip]~~ Fix exports & decoding Sep 8, 2020

nickwalton approved these changes Sep 8, 2020

nickwalton merged commit 1e87339 into latitudegames:master Sep 8, 2020
