(Python package) Tokenizer based on the BPE algorithm for LLMs (supports regex patterns and special tokens)
Updated Jun 6, 2024 - Jupyter Notebook
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Vision Search Engine is a sophisticated and versatile search engine designed to provide highly accurate and efficient search capabilities. Leveraging a suite of advanced algorithms and techniques, this project is equipped to handle a wide array of search functionalities, ensuring precise and relevant results.
Tools and resources for the computational processing of Nheengatu (Modern Tupi)
Easy token price estimates for LLMs
A package to download and preprocess a Wikipedia dump, in any language.
Create, manage, and earn from gated content, and track subscriptions to that content.
💫 Industrial-strength Natural Language Processing (NLP) in Python
Basis Theory Developer Documentation
Code for Zero-Shot Tokenizer Transfer
This repo contains my work and the code base for the TensorFlow Developer specialization offered by DeepLearning.AI
TokenScript schema, specs and paper
A Python library for interacting with TI-(e)z80 (82/83/84 series) calculator files
This repository explores the process of automatic text summarization using traditional methods and modern NLP models. It includes steps for text cleaning, word frequency analysis, and summarization, along with a comparison of summaries generated by different transformer models.
Build and tokenize your own smart contract factory using Fundi, OpenZeppelin, and Chainlink contracts with the Foundry framework on Ethereum/Base Sepolia
ERC-3643 - Raptor Version is a simple, educational look at the T-REX standard. Using Solidity and Web3, this project demystifies tokenized securities. Remember, Raptor is for learning, not production. Dive in for an accessible peek into blockchain finance!
Sudachi in Rust 🦀 and new generation of SudachiPy
Serverless, pseudonymizing proxy between Worklytics and your workplace SaaS data sources' APIs. Data Loss Prevention (DLP) and compliance layer deployable to AWS Lambda or GCP Cloud Functions.
Fast and memory-efficient library for WordPiece tokenization as it is used by BERT.