
Built and signed on GitHub Actions

Works with
This package works with Cloudflare Workers, Node.js, Deno, Bun, Browsers
JSR Score: 100%
Downloads: 68/wk
Published: a year ago (0.1.2)

The cheerio port of readability.js

class Readability

Readability is the main class of the library.

Its main entry point is the parse method, which takes an HTML string and returns an object containing the extracted content.

Constructors

new Readability(options?: Partial<Options>)

Properties

private $: CheerioAPI
private articleByline: string | null
private articleDir: string | null
private articleLang: string | null
private articleTitle: string | null
private attempts: any[]
private flags: number

Methods

private addContentScore(element: Candidate)
private clean(tags: string[]): void
private cleanClasses($el: Cheerio<Element>): void

Removes the class="" attribute from every element in the given subtree, except those that match the classesToPreserve array from the options object.
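A minimal sketch of the class-preserving rule, written against a plain class-attribute string rather than a Cheerio selection so it runs on its own. `filterClasses` is a hypothetical helper, not part of the package, and the sketch assumes `classesToPreserve` holds exact class names.

```typescript
// Hypothetical helper illustrating the cleanClasses rule for one element:
// keep only class names listed in classesToPreserve and drop the rest.
// Returns null when nothing survives, meaning the class attribute
// should be removed entirely.
function filterClasses(
  classAttr: string | undefined,
  classesToPreserve: readonly string[],
): string | null {
  if (!classAttr) return null;
  const kept = classAttr
    .split(/\s+/) // class lists are whitespace-separated
    .filter((name) => name !== "" && classesToPreserve.includes(name));
  return kept.length > 0 ? kept.join(" ") : null;
}

console.log(filterClasses("page caption sidebar", ["page"])); // prints: page
console.log(filterClasses("ad banner", ["page"])); // prints: null
```

In a Cheerio-based implementation, the result could be applied to each element under `$el` that has a class attribute with `.attr("class", value)`, or with `.removeAttr("class")` when the helper returns `null`.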

private cleanConditionally(tag: string): void

Clean an element of all tags of type "tag" if they look fishy. "Fishy" is an algorithm based on content length, classnames, link density, number of images & embeds, etc.
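The "fishy" test can be illustrated with a simplified scoring function. This is a sketch under stated assumptions: `NodeStats`, `linkDensity`, and `looksFishy` are hypothetical names, the statistics are passed in precomputed instead of being read from the DOM, and the thresholds are a simplified subset of those in Mozilla's Readability.js, not necessarily the values this port uses.

```typescript
// Hypothetical per-element statistics; a real implementation would
// compute these from the element's subtree.
interface NodeStats {
  textLength: number; // length of all text in the element
  linkTextLength: number; // length of text inside <a> descendants
  paragraphs: number; // number of <p> descendants
  images: number; // number of <img> descendants
  embeds: number; // number of <object>/<embed>/<iframe> descendants
  classWeight: number; // score derived from class/id names
}

// Fraction of the element's text that sits inside links.
function linkDensity(s: NodeStats): number {
  return s.textLength === 0 ? 0 : s.linkTextLength / s.textLength;
}

// Illustrative "fishy" check. Thresholds are assumptions loosely modeled
// on Readability.js, not this port's exact rules.
function looksFishy(s: NodeStats): boolean {
  const density = linkDensity(s);
  if (s.images > 1 && s.paragraphs / s.images < 0.5) return true; // image-heavy
  if (s.textLength < 25 && (s.images === 0 || s.images > 2)) return true; // too little text
  if (s.classWeight < 25 && density > 0.2) return true; // link-heavy, weak class signal
  if (s.classWeight >= 25 && density > 0.5) return true; // mostly links despite a good class
  if ((s.embeds === 1 && s.textLength < 75) || s.embeds > 1) return true; // embed-heavy
  return false;
}
```

With Cheerio, a value such as `linkTextLength` could be gathered with something like `$(el).find("a").text().length`; the fixed thresholds keep the decision cheap to evaluate for every candidate element.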

private cleanEmbeds($el: Cheerio<Element>): void
private cleanHeaders($el: Cheerio<Element>): void

Clean out spurious headers from an Element.
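One plausible reading of "spurious", following Mozilla's Readability.js, is an `<h1>` or `<h2>` whose class and id look like boilerplate. The sketch below assumes that rule; the regular expressions are a reduced, illustrative subset, and `classWeight` and `isSpuriousHeader` are hypothetical helpers working on plain strings.

```typescript
// Reduced, illustrative pattern lists; the real lists are longer and this
// port's may differ.
const NEGATIVE = /comment|footer|sidebar|sponsor|share|widget|promo/i;
const POSITIVE = /article|body|content|entry|main|post|story/i;

// Score an element by its class and id: -25 for each value that looks like
// boilerplate, +25 for each value that looks like content.
function classWeight(className: string, id: string): number {
  let weight = 0;
  for (const value of [className, id]) {
    if (value === "") continue;
    if (NEGATIVE.test(value)) weight -= 25;
    if (POSITIVE.test(value)) weight += 25;
  }
  return weight;
}

// Assumed rule: an <h1>/<h2> is spurious when its class/id weight is negative.
function isSpuriousHeader(tagName: string, className: string, id: string): boolean {
  const tag = tagName.toLowerCase();
  return (tag === "h1" || tag === "h2") && classWeight(className, id) < 0;
}
```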

private fixLazyImages($el: Cheerio<Element>): void
private flagIsActive(flag: number)

Return an object indicating how many rows and columns this table has.

private getTextDensity(tags: string[])
private grabArticle()

Check if this node is an H1 or H2 element whose content is mostly the same as the article title.
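That title-duplication check can be sketched as a token-overlap similarity, modeled on the measure in Mozilla's Readability.js. Whether this port uses the same measure and the 0.75 threshold is an assumption; `tokenize`, `textSimilarity`, and `headerDuplicatesTitle` here are illustrative functions operating on plain strings.

```typescript
// Split text into lowercase word tokens, dropping empty strings.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
}

// Similarity in [0, 1]: 1 minus the share of text B (by joined length)
// made up of tokens that do not appear in text A.
function textSimilarity(a: string, b: string): number {
  const tokensA = new Set(tokenize(a));
  const tokensB = tokenize(b);
  if (tokensA.size === 0 || tokensB.length === 0) return 0;
  const uniqueB = tokensB.filter((t) => !tokensA.has(t));
  const distance = uniqueB.join(" ").length / tokensB.join(" ").length;
  return 1 - distance;
}

// Assumed threshold of 0.75, as in Readability.js.
function headerDuplicatesTitle(tagName: string, headingText: string, title: string): boolean {
  const tag = tagName.toLowerCase();
  if (tag !== "h1" && tag !== "h2") return false;
  return textSimilarity(title, headingText) > 0.75;
}
```

Measuring by joined character length rather than token count means that short extra words (for example a site name suffix) lower the score less than long ones.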

log(...args: any[]): void
private markDataTables($el: Cheerio<Element>): void

Look for 'data' (as opposed to 'layout') tables, using checks similar to those in https://searchfox.org/mozilla-central/rev/f82d5c549f046cb64ce5602bfd894b7ae807c8f8/accessible/generic/TableAccessible.cpp#19
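The data-versus-layout decision can be sketched as an ordered series of checks, modeled on Mozilla's Readability.js, together with the row and column count it relies on (rowspan and colspan each default to 1). The `TableInfo`, `Row`, and `Cell` shapes are hypothetical stand-ins for attributes and descendants that a Cheerio implementation would read from each `<table>`; this port's exact rules may differ.

```typescript
// Hypothetical, DOM-free description of a table.
interface Cell { colspan?: number }
interface Row { rowspan?: number; cells: Cell[] }
interface TableInfo {
  role?: string; // value of the role attribute
  datatable?: string; // value of the datatable attribute
  summary?: string; // value of the summary attribute
  hasCaption: boolean; // non-empty <caption>
  hasDataDescendants: boolean; // any <col>, <colgroup>, <tfoot>, <thead> or <th>
  hasNestedTable: boolean; // another <table> inside this one
  rows: Row[];
}

// Count rows and columns; a missing or zero span counts as 1.
function rowAndColumnCount(rows: Row[]): { rows: number; columns: number } {
  let rowCount = 0;
  let columns = 0;
  for (const row of rows) {
    rowCount += row.rowspan || 1;
    let inThisRow = 0;
    for (const cell of row.cells) inThisRow += cell.colspan || 1;
    columns = Math.max(columns, inThisRow);
  }
  return { rows: rowCount, columns };
}

// Ordered checks: explicit markup wins, then structure, then size.
function isDataTable(t: TableInfo): boolean {
  if (t.role === "presentation") return false;
  if (t.datatable === "0") return false;
  if (t.summary) return true;
  if (t.hasCaption) return true;
  if (t.hasDataDescendants) return true;
  if (t.hasNestedTable) return false; // nested tables usually mean layout
  const size = rowAndColumnCount(t.rows);
  if (size.rows >= 10 || size.columns > 4) return true;
  return size.rows * size.columns > 10;
}
```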

Runs readability.

Workflow:

  1. Prep the document by removing script tags, css, etc.
  2. Build readability's DOM tree.
  3. Grab the article content from the current DOM tree.
  4. Replace the current DOM tree with the new one.
  5. Read peacefully.

private postProcessContent($articleContent: Cheerio<Element> | null): void
private prepArticle($articleContent: Cheerio<Element>): void
private removeFlag(flag: number): void
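The `flags` and `attempts` properties, together with `flagIsActive` and `removeFlag`, suggest the retry scheme used by Mozilla's Readability.js: heuristics are bit flags, and when an extraction pass yields too little text, one flag is cleared and the pass is retried, finally falling back to the longest attempt. The sketch below assumes that scheme; the flag names and values, `FlagState`, `extractWithRetries`, and the 500-character default threshold are illustrative, not this package's API.

```typescript
// Bit flags in the style of Readability.js (assumed names and values).
const FLAG_STRIP_UNLIKELYS = 0x1;
const FLAG_WEIGHT_CLASSES = 0x2;
const FLAG_CLEAN_CONDITIONALLY = 0x4;

interface Attempt { content: string; textLength: number }

class FlagState {
  flags = FLAG_STRIP_UNLIKELYS | FLAG_WEIGHT_CLASSES | FLAG_CLEAN_CONDITIONALLY;

  flagIsActive(flag: number): boolean {
    return (this.flags & flag) > 0;
  }

  removeFlag(flag: number): void {
    this.flags &= ~flag; // clear only this bit
  }
}

// `extract` stands in for one grabArticle pass (hypothetical). Heuristics are
// relaxed one at a time until the result is long enough; if every pass is too
// short, the longest attempt wins, or null if all were empty.
function extractWithRetries(
  extract: (state: FlagState) => string,
  charThreshold = 500,
): string | null {
  const state = new FlagState();
  const attempts: Attempt[] = [];
  while (true) {
    const content = extract(state);
    if (content.length >= charThreshold) return content;
    attempts.push({ content, textLength: content.length });
    if (state.flagIsActive(FLAG_STRIP_UNLIKELYS)) state.removeFlag(FLAG_STRIP_UNLIKELYS);
    else if (state.flagIsActive(FLAG_WEIGHT_CLASSES)) state.removeFlag(FLAG_WEIGHT_CLASSES);
    else if (state.flagIsActive(FLAG_CLEAN_CONDITIONALLY)) state.removeFlag(FLAG_CLEAN_CONDITIONALLY);
    else {
      attempts.sort((a, b) => b.textLength - a.textLength);
      return attempts[0].textLength > 0 ? attempts[0].content : null;
    }
  }
}
```

Clearing a bit with `flags &= ~flag` leaves the other heuristics untouched, which is what lets the loop relax one rule at a time.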


Add Package

Deno:
deno add jsr:@paoramen/cheer-reader

pnpm:
pnpm i jsr:@paoramen/cheer-reader
or (using pnpm 10.8 or older)
pnpm dlx jsr add @paoramen/cheer-reader

Yarn:
yarn add jsr:@paoramen/cheer-reader
or (using Yarn 4.8 or older)
yarn dlx jsr add @paoramen/cheer-reader

vlt:
vlt install jsr:@paoramen/cheer-reader

npm:
npx jsr add @paoramen/cheer-reader

Bun:
bunx jsr add @paoramen/cheer-reader

Import symbol

import { Readability } from "@paoramen/cheer-reader";

or, with Deno, import directly with a jsr specifier:

import { Readability } from "jsr:@paoramen/cheer-reader";