
Is there an alternative that doesn't use node.js's file system module? #40

zhiyi-zhang-duke opened this issue May 20, 2023 · 6 comments

Comments

@zhiyi-zhang-duke

zhiyi-zhang-duke commented May 20, 2023

Hi everyone, this library is currently the de facto way to encode and decode strings for GPT models in JavaScript. But I cannot import this Node.js library in a Chrome extension because it relies on the fs module:

Uncaught TypeError: fs.readFileSync is not a function
    at ./node_modules/gpt-3-encoder/Encoder.js (Encoder.js:5:1)
    at __webpack_require__ (bootstrap:19:1)
    at ./node_modules/gpt-3-encoder/index.js (index.js:1:28)
    at __webpack_require__ (bootstrap:19:1)
    at make namespace object:7:1
    at popup.js:22:2
    at popup.js:22:2

This is problematic since a great use case for this library is to count token usage BEFORE sending text to ChatGPT. Is there any way this library can be made to not rely on fs, so that it can be imported in a Chrome-extension-like setting?
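For context, the usual fix for this class of error is to ship the encoder data as a JavaScript/JSON module instead of reading it from disk at runtime, so a bundler can inline it. A minimal sketch of the pattern (the vocabulary here is a toy stand-in, not the real GPT vocabulary):

```javascript
// Sketch of the fs-free pattern (toy data, not the real BPE vocabulary):
// instead of fs.readFileSync('encoder.json'), the vocabulary is embedded
// as a plain object in a module, which bundlers like webpack can include
// in a browser or extension build with no Node.js APIs involved.
const encoderData = {
  'Hel': 0,
  'lo': 1,
};

// Browser-safe lookup: no file system access anywhere.
function tokenId(piece) {
  return encoderData[piece];
}

console.log(tokenId('Hel')); // 0
```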

@niieani

niieani commented May 24, 2023

See my package that started off as a fork of this one: gpt-tokenizer.

@iskandarreza

If you just want to count tokens, you could further trim down this package I made, which was also forked from this one: gpt-tok

@niieani

niieani commented May 26, 2023

Just FYI @iskandarreza: your package is based on this one, which means it doesn't support counting tokens for gpt-3.5-turbo or gpt-4, only the older models.
I've made a full rewrite for version 2.0 of gpt-tokenizer to include support for these newer models. It includes a function to count tokens efficiently.
You can see the playground at gpt-tokenizer.dev
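As a rough illustration of what token counting involves (a conceptual sketch with a toy vocabulary, not gpt-tokenizer's actual implementation): BPE-style tokenizers split text into vocabulary pieces, and the token count is simply the number of pieces.

```javascript
// Conceptual sketch: greedy longest-match over a toy vocabulary.
// Real tokenizers use ~100k-entry vocabularies and BPE merge ranks,
// but the counting idea is the same.
const toyVocab = new Set(['Hel', 'lo', ' wor', 'ld']);

function countTokens(text) {
  let count = 0;
  let i = 0;
  while (i < text.length) {
    // Find the longest vocabulary piece starting at position i;
    // unknown single characters fall through as one piece each.
    let len = Math.min(text.length - i, 8);
    while (len > 1 && !toyVocab.has(text.slice(i, i + len))) len--;
    i += len;
    count++;
  }
  return count;
}

console.log(countTokens('Hello world')); // 4: 'Hel', 'lo', ' wor', 'ld'
```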

@iskandarreza

Thank you for the heads up @niieani
I will prepare to make updates to support gpt-3.5-turbo in the near future. This was a quick and light implementation that works for my current simple use case, but I foresee needing to add at least gpt-3.5-turbo support sooner rather than later.

@zhiyi-zhang-duke
Author

@niieani Thank you for the answer! I tried your fork and it works well. It doesn't exactly match what OpenAI shows here:
https://platform.openai.com/tokenizer

But it's pretty close, and that's good enough for me. (Maybe I'm not using the right one? That's definitely a possibility.) I'm using this property in the entry of webpack.config.js:
'gpt-tokenizer': './node_modules/gpt-tokenizer/dist/cl100k_base.js',

@niieani
Copy link

niieani commented May 28, 2023

Yeah, the tokenizer on OpenAI's website uses the older encoding, p50k, not cl100k. You also shouldn't need to alias to the dist directory.
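For reference, a sketch of the alias-free setup (the subpath import is an assumption based on gpt-tokenizer's documented entry points; verify the exact path against the package README):

```javascript
// No webpack alias needed: the package's main entry resolves on its own.
// The default encoding in gpt-tokenizer v2 is cl100k_base
// (the gpt-3.5-turbo / gpt-4 encoding).
import { encode } from 'gpt-tokenizer'

// To match the p50k_base encoding that OpenAI's web tokenizer page uses,
// an encoding-specific entry point can be imported instead (path is an
// assumption -- check the gpt-tokenizer README):
// import { encode } from 'gpt-tokenizer/encoding/p50k_base'

const tokenCount = encode('Hello world').length
```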
