
Is there an alternative that doesn't use node.js's file system module? #40

zhiyi-zhang-duke opened this issue May 20, 2023 · 6 comments

Comments

@zhiyi-zhang-duke

zhiyi-zhang-duke commented May 20, 2023

Hi everyone, this library is currently the de facto way to encode and decode strings for GPT models in JavaScript. But I cannot import this Node.js library in a Chrome extension because it relies on the fs module:

Uncaught TypeError: fs.readFileSync is not a function
    at ./node_modules/gpt-3-encoder/Encoder.js (Encoder.js:5:1)
    at __webpack_require__ (bootstrap:19:1)
    at ./node_modules/gpt-3-encoder/index.js (index.js:1:28)
    at __webpack_require__ (bootstrap:19:1)
    at make namespace object:7:1
    at popup.js:22:2
    at popup.js:22:2

This is problematic since a great use case for this library is to count token usage BEFORE sending text to ChatGPT. Is there any way this library can be made to not rely on fs, so that it can be imported in a Chrome-extension-like setting?
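For context, the usual fix for this class of error is to ship the encoder data as a JavaScript/JSON module instead of reading it from disk at runtime, so a bundler can inline it. A minimal sketch of the pattern (the vocabulary here is a toy stand-in, not the real GPT vocabulary):

```javascript
// Sketch of the fs-free pattern (toy data, not the real BPE vocabulary):
// instead of fs.readFileSync('encoder.json'), the vocabulary is embedded
// as a plain object in a module, which bundlers like webpack can include
// in a browser or extension build with no Node.js APIs involved.
const encoderData = {
  'Hel': 0,
  'lo': 1,
};

// Browser-safe lookup: no file system access anywhere.
function tokenId(piece) {
  return encoderData[piece];
}

console.log(tokenId('Hel')); // 0
```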

@niieani

niieani commented May 24, 2023

See my package that started off as a fork of this one: gpt-tokenizer.

@iskandarreza

If you just want to count tokens, you could further trim down this package I made, which was also forked from this one: gpt-tok

@niieani

niieani commented May 26, 2023

Just FYI @iskandarreza: your package is based on this one, which means it doesn't support counting tokens for gpt-3.5-turbo or gpt-4, only the older models.
I've made a full rewrite for version 2.0 of gpt-tokenizer to include support for these newer models. It includes a function to count tokens efficiently.
You can see the playground at gpt-tokenizer.dev
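As a rough illustration of what token counting involves (a conceptual sketch with a toy vocabulary, not gpt-tokenizer's actual implementation): BPE-style tokenizers split text into vocabulary pieces, and the token count is simply the number of pieces.

```javascript
// Conceptual sketch: greedy longest-match over a toy vocabulary.
// Real tokenizers use ~100k-entry vocabularies and BPE merge ranks,
// but the counting idea is the same.
const toyVocab = new Set(['Hel', 'lo', ' wor', 'ld']);

function countTokens(text) {
  let count = 0;
  let i = 0;
  while (i < text.length) {
    // Find the longest vocabulary piece starting at position i;
    // unknown single characters fall through as one piece each.
    let len = Math.min(text.length - i, 8);
    while (len > 1 && !toyVocab.has(text.slice(i, i + len))) len--;
    i += len;
    count++;
  }
  return count;
}

console.log(countTokens('Hello world')); // 4: 'Hel', 'lo', ' wor', 'ld'
```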

@iskandarreza

Thank you for the heads up @niieani
I will prepare to make updates to support gpt-3.5-turbo in the near future. This was a quick and light implementation that works for my current simple use case, but I foresee needing to add at least gpt-3.5-turbo support sooner rather than later.

@zhiyi-zhang-duke
Author

@niieani Thank you for the answer! I tried your fork and it works well. It doesn't exactly match what OpenAI shows here:
https://platform.openai.com/tokenizer

But it's pretty close, and that's good enough for me. (Maybe I'm not using the right one? That's definitely a possibility.) I'm using this property in the entry of webpack.config.js:
'gpt-tokenizer': './node_modules/gpt-tokenizer/dist/cl100k_base.js',

@niieani
Copy link

niieani commented May 28, 2023

Yeah, the tokenizer on OpenAI's website uses the older encoding, p50k, not cl100k. You also shouldn't need to alias to the dist directory.
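For reference, a sketch of the alias-free setup (the subpath import is an assumption based on gpt-tokenizer's documented entry points; verify the exact path against the package README):

```javascript
// No webpack alias needed: the package's main entry resolves on its own.
// The default encoding in gpt-tokenizer v2 is cl100k_base
// (the gpt-3.5-turbo / gpt-4 encoding).
import { encode } from 'gpt-tokenizer'

// To match the p50k_base encoding that OpenAI's web tokenizer page uses,
// an encoding-specific entry point can be imported instead (path is an
// assumption -- check the gpt-tokenizer README):
// import { encode } from 'gpt-tokenizer/encoding/p50k_base'

const tokenCount = encode('Hello world').length
```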
