Built and signed on GitHub Actions

This package works with Bun, Cloudflare Workers, Node.js, Deno, Browsers
JSR Score
70%
Published
2 months ago (0.3.2)

Unicode library for JavaScript/TypeScript

This library provides Unicode category matching. It offers efficient lookup of character sets and builds its tables lazily, as they are needed. Its primary purpose is to support parsers that need to understand Unicode. The example below creates a set of Unicode categories that could be used for, say, "identifiers" or "start identifiers".

Nearly all Unicode tables are autogenerated from the unicode.txt file found here. Lu, Lt, and Ll are copied from pyright because unicode.txt only contains a generic L category.

In the unicode.txt file, all letters (upper and lowercase) are grouped under a single category, L&. This package converts that name to just L, so the tables generated from unicode.txt make no distinction between upper and lower case.
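The "efficient lookup" and "lazily build" behaviour described above could be pictured roughly as follows. This is a minimal sketch under assumptions, not the package's actual implementation: the LazyRangeTable class, its build callback, and the buildCount counter are hypothetical names, and the two ASCII ranges stand in for a real generated table.

```typescript
// Hypothetical sketch; NOT the internals of @kgwinnup/unicode.
// A range is an inclusive [start, end] pair of code values.
type Range = readonly [number, number];

class LazyRangeTable {
  private ranges: Range[] | null = null;
  private readonly build: () => Range[];
  // Counts how often the table was built (for illustration only).
  buildCount = 0;

  // `build` must return sorted, non-overlapping ranges.
  constructor(build: () => Range[]) {
    this.build = build;
  }

  lookup(code: number): boolean {
    // Build the table on first use, then reuse it.
    let ranges = this.ranges;
    if (ranges === null) {
      ranges = this.build();
      this.ranges = ranges;
      this.buildCount++;
    }
    // Binary search over the sorted ranges.
    let lo = 0;
    let hi = ranges.length - 1;
    while (lo <= hi) {
      const mid = (lo + hi) >>> 1;
      const [start, end] = ranges[mid];
      if (code < start) hi = mid - 1;
      else if (code > end) lo = mid + 1;
      else return true;
    }
    return false;
  }
}

// Stand-in table: ASCII upper and lower case letters only.
const asciiLetters = new LazyRangeTable(() => [
  [0x41, 0x5a],
  [0x61, 0x7a],
]);

console.log(asciiLetters.lookup("A".charCodeAt(0))); // true
console.log(asciiLetters.lookup("1".charCodeAt(0))); // false
```

Storing ranges rather than individual code values keeps a table small, and deferring the build means categories that are never queried cost nothing.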

Usage

import { Char, Unicode, UnicodeCategory } from "@kgwinnup/unicode";

const uc = new Unicode([UnicodeCategory.Lu, UnicodeCategory.Ll]);

// lookup takes a UTF-16 code unit
if (uc.lookup('A'.charCodeAt(0))) {
    console.log("part of set");
}

// the Char constants can be used in place of charCodeAt
if (uc.lookup(Char.A)) {
    console.log("part of set");
}
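Since the stated purpose is writing parsers, here is a rough sketch of how a "start identifier" set and an "identifier" set might drive a scanner. The scanIdentifier function and the CodeUnitPredicate type are hypothetical and not part of the package. The ASCII predicates are stand-ins so the example runs on its own; with the package, each predicate could presumably wrap a call such as (c) => uc.lookup(c).

```typescript
// Hypothetical helper type: decides whether a UTF-16 code unit belongs to a set.
type CodeUnitPredicate = (code: number) => boolean;

// Returns the index just past the identifier starting at `pos`,
// or `pos` itself when no identifier starts there.
function scanIdentifier(
  src: string,
  pos: number,
  isStart: CodeUnitPredicate,
  isPart: CodeUnitPredicate,
): number {
  if (pos >= src.length || !isStart(src.charCodeAt(pos))) {
    return pos;
  }
  let end = pos + 1;
  while (end < src.length && isPart(src.charCodeAt(end))) {
    end++;
  }
  return end;
}

// Stand-in predicates (ASCII only), used here instead of the library.
const isAsciiLetter: CodeUnitPredicate = (c: number) =>
  (c >= 0x41 && c <= 0x5a) || (c >= 0x61 && c <= 0x7a);
const isAsciiLetterOrDigit: CodeUnitPredicate = (c: number) =>
  isAsciiLetter(c) || (c >= 0x30 && c <= 0x39);

const source = "foo42 = bar";
const identEnd = scanIdentifier(source, 0, isAsciiLetter, isAsciiLetterOrDigit);
console.log(source.slice(0, identEnd)); // prints "foo42"
```

Keeping the start set and the continuation set as separate predicates mirrors the "identifiers" versus "start identifiers" split mentioned above.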

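The lookup function also supports surrogate pairs. In the example below, the surrogate tables are passed as the second constructor argument, and the high and low surrogate code units are passed to lookup as two arguments.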

import { Unicode, UnicodeCategory } from "@kgwinnup/unicode";

const uc = new Unicode([UnicodeCategory.Lu, UnicodeCategory.Ll]);

const str = "𐐀";
// UnicodeCategory.L is all letters
const uc2 = new Unicode([UnicodeCategory.L], [UnicodeCategory.surrogateL]);
if (uc2.lookup(str.charCodeAt(0), str.charCodeAt(1))) {
    console.log("part of the set");
}

// https://www.compart.com/en/unicode/U+10EAD
// this char should not be a letter
const str2 = "𐺭";
if (!uc2.lookup(str2.charCodeAt(0), str2.charCodeAt(1))) {
    console.log("should not be part of the set")
}

// you can also check if the first char is a surrogate char
if (uc.isSurrogate(str2.charCodeAt(0))) {
    console.log("should be a surrogate char")
}
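For readers unfamiliar with surrogate pairs, the following sketch shows where the two charCodeAt values in the example come from. It uses only standard string methods; toCodeUnits, isHighSurrogate, and codeUnitGroups are hypothetical helpers written for this illustration and are not part of the package (the package's own isSurrogate may behave differently).

```typescript
// Splits a code point into UTF-16 code units:
// one unit inside the BMP, a [high, low] surrogate pair above it.
function toCodeUnits(codePoint: number): number[] {
  if (codePoint < 0x10000) {
    return [codePoint];
  }
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10);
  const low = 0xdc00 + (offset & 0x3ff);
  return [high, low];
}

// True for the first (high) half of a surrogate pair.
function isHighSurrogate(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdbff;
}

// Walks a string and groups its code units: [unit] or [high, low].
function codeUnitGroups(text: string): number[][] {
  const groups: number[][] = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (isHighSurrogate(unit) && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        groups.push([unit, next]);
        i++; // skip the low surrogate that was just consumed
        continue;
      }
    }
    groups.push([unit]);
  }
  return groups;
}

// U+10400 (the "𐐀" used above) is stored as the pair 0xD801, 0xDC00.
console.log(toCodeUnits(0x10400).map((u) => u.toString(16))); // [ 'd801', 'dc00' ]
```

Each group produced by codeUnitGroups has one or two entries, which matches the one-argument and two-argument forms of lookup shown above.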

Add Package

deno add @kgwinnup/unicode
npx jsr add @kgwinnup/unicode
yarn dlx jsr add @kgwinnup/unicode
pnpm dlx jsr add @kgwinnup/unicode
bunx jsr add @kgwinnup/unicode

Import symbol

import * as mod from "@kgwinnup/unicode";