Readability is the main class of the library.
It exposes a single method, parse, which takes an HTML string and returns
an object containing the extracted content.
Readability($: CheerioAPI, options?: Partial<Options>)
$: CheerioAPI
articleByline: string | null
articleDir: string | null
articleLang: string | null
articleTitle: string | null
attempts: any[]
addContentScore(element: Candidate)
cleanClasses($el: Cheerio<Element>): void
Removes the class="" attribute from every element in the given subtree, except those that match the classesToPreserve array from the options object.
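The behaviour described above can be sketched without Cheerio. The snippet below is only an illustration under stated assumptions: SketchNode and cleanClassesSketch are hypothetical names, the tree is a plain object rather than a Cheerio<Element>, and it assumes that individual class names listed in classesToPreserve are kept while all others are stripped.

```typescript
// Minimal stand-in for a DOM element; the real method works on Cheerio<Element>.
interface SketchNode {
  attribs: Record<string, string>;
  children: SketchNode[];
}

// Hypothetical helper: keeps only class names found in `classesToPreserve`
// and drops the class attribute entirely when nothing is left.
function cleanClassesSketch(node: SketchNode, classesToPreserve: string[]): void {
  const current = node.attribs["class"];
  if (current !== undefined) {
    const kept = current
      .split(/\s+/)
      .filter((name) => name !== "" && classesToPreserve.indexOf(name) !== -1);
    if (kept.length > 0) {
      node.attribs["class"] = kept.join(" ");
    } else {
      delete node.attribs["class"];
    }
  }
  // Recurse so that every element in the subtree is visited.
  for (const child of node.children) {
    cleanClassesSketch(child, classesToPreserve);
  }
}

const child: SketchNode = { attribs: { class: "sidebar", id: "nav" }, children: [] };
const tree: SketchNode = { attribs: { class: "post page caption" }, children: [child] };

cleanClassesSketch(tree, ["page", "caption"]);
```

After the call, the root keeps only the preserved names and the child loses its class attribute while its other attributes stay untouched.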
cleanConditionally(): void
Removes all tags of type "tag" from an element if they look fishy. Whether a tag is "fishy" is decided by a heuristic based on content length, class names, link density, number of images and embeds, etc.
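As a rough illustration of how such a heuristic might combine its signals, here is a minimal sketch. FishySignals, looksFishy, and every threshold in it are assumptions made for this example; none of them are taken from the library.

```typescript
// Hypothetical summary of the signals named above for one candidate element.
interface FishySignals {
  textLength: number;     // characters of text inside the element
  linkDensity: number;    // share of the text that sits inside links, 0..1
  classWeight: number;    // negative when class names look like ads or navigation
  imageCount: number;
  embedCount: number;
  paragraphCount: number;
}

// Illustrative thresholds only; these numbers are assumptions, not the library's values.
function looksFishy(s: FishySignals): boolean {
  if (s.classWeight < 0) return true;                        // suspicious class names
  if (s.imageCount > s.paragraphCount + 1) return true;      // more images than prose
  if (s.textLength < 25 && s.linkDensity > 0) return true;   // tiny, link-bearing block
  if (s.linkDensity > 0.5) return true;                      // mostly links
  if (s.embedCount > 1) return true;                         // several embeds
  if (s.embedCount === 1 && s.textLength < 75) return true;  // one embed, little text
  return false;
}

const articleBody: FishySignals = {
  textLength: 800, linkDensity: 0.05, classWeight: 25,
  imageCount: 1, embedCount: 0, paragraphCount: 6,
};
const navigationBlock: FishySignals = {
  textLength: 120, linkDensity: 0.9, classWeight: 0,
  imageCount: 0, embedCount: 0, paragraphCount: 0,
};

const keepArticle = !looksFishy(articleBody);
const dropNavigation = looksFishy(navigationBlock);
```

The point of the sketch is that no single signal decides the outcome on its own; a long, link-poor block survives even with an image, while a short, link-heavy block does not.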
cleanEmbeds($el: Cheerio<Element>): void
cleanHeaders($el: Cheerio<Element>): void
Clean out spurious headers from an Element.
fixLazyImages($el: Cheerio<Element>): void
fixRelativeUris($articleContent: Cheerio<Element>): void
flagIsActive(flag: number)
getArticleMetadata(jsonld: Metadata)
getArticleTitle()
getClassWeight(el: Element)
getRowAndColumnCount($table: Cheerio<Element>)
Return an object indicating how many rows and columns this table has.
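One plausible way to compute such counts is sketched below. It assumes that rowspan and colspan attributes are honoured and that the column count is the widest row; SketchRow, SketchCell, and countRowsAndColumns are hypothetical names, and the table is modelled as plain data rather than a Cheerio<Element>.

```typescript
// Plain-data stand-ins for <tr> and <td>/<th>; spans default to 1 when absent.
interface SketchCell {
  colspan?: number;
}
interface SketchRow {
  rowspan?: number;
  cells: SketchCell[];
}

function countRowsAndColumns(table: SketchRow[]): { rows: number; columns: number } {
  let rows = 0;
  let columns = 0;
  for (const row of table) {
    // A missing or invalid span counts as a single row or column.
    rows += row.rowspan && row.rowspan > 0 ? row.rowspan : 1;
    let columnsInThisRow = 0;
    for (const cell of row.cells) {
      columnsInThisRow += cell.colspan && cell.colspan > 0 ? cell.colspan : 1;
    }
    // The table is as wide as its widest row.
    columns = Math.max(columns, columnsInThisRow);
  }
  return { rows, columns };
}

const counts = countRowsAndColumns([
  { cells: [{}, {}, {}] },
  { cells: [{ colspan: 2 }, {}] },
  { cells: [{}] },
]);
```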
getTextDensity()
grabArticle()
headerDuplicatesTitle($el: Cheerio<Element>)
Check if this node is an H1 or H2 element whose content is mostly the same as the article title.
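A check of this kind can be sketched with a simple token-overlap measure. The similarity function and the 0.75 threshold below are assumptions chosen for illustration, and textSimilaritySketch and headerDuplicatesTitleSketch are hypothetical names that take plain strings instead of a Cheerio<Element>.

```typescript
// Token-based similarity in the range 0..1 (an assumed measure):
// 1 means every token of `b` also appears in `a`.
function textSimilaritySketch(a: string, b: string): number {
  const tokenize = (s: string): string[] =>
    s.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
  const tokensA = tokenize(a);
  const tokensB = tokenize(b);
  if (tokensA.length === 0 || tokensB.length === 0) return 0;
  const onlyInB = tokensB.filter((t) => tokensA.indexOf(t) === -1);
  const distance = onlyInB.join(" ").length / tokensB.join(" ").length;
  return 1 - distance;
}

// Only H1 and H2 headers are considered; the threshold is an assumption.
function headerDuplicatesTitleSketch(
  tagName: string,
  headerText: string,
  articleTitle: string,
): boolean {
  const tag = tagName.toLowerCase();
  if (tag !== "h1" && tag !== "h2") return false;
  return textSimilaritySketch(articleTitle, headerText) > 0.75;
}

const title = "How Cheerio parses HTML";
const duplicate = headerDuplicatesTitleSketch("h1", "How Cheerio Parses HTML", title);
const unrelated = headerDuplicatesTitleSketch("h2", "Related articles", title);
const wrongLevel = headerDuplicatesTitleSketch("h3", "How Cheerio parses HTML", title);
```

Comparing tokens rather than raw strings lets differences in case and punctuation pass as duplicates, which is usually what "mostly the same" is meant to capture.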
log(...args: any[]): void
markDataTables($el: Cheerio<Element>): void
Look for 'data' (as opposed to 'layout') tables, for which we use similar checks as https://searchfox.org/mozilla-central/rev/f82d5c549f046cb64ce5602bfd894b7ae807c8f8/accessible/generic/TableAccessible.cpp#19
parse
Runs readability.
Workflow:
- Prep the document by removing script tags, CSS, etc.
- Build readability's DOM tree.
- Grab the article content from the current DOM tree.
- Replace the current DOM tree with the new one.
- Read peacefully.
postProcessContent($articleContent: Cheerio<Element> | null): void
prepArticle($articleContent: Cheerio<Element>): void
removeFlag(flag: number): void
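Since flagIsActive and removeFlag both take a numeric flag, one plausible reading is that flags are stored as a bitmask. The sketch below shows that pattern under that assumption; FLAG_A, FLAG_B, FLAG_C, and FlagSetSketch are hypothetical, and the real flag constants are not shown in this section.

```typescript
// Hypothetical flag values, one bit each.
const FLAG_A = 0x1;
const FLAG_B = 0x2;
const FLAG_C = 0x4;

class FlagSetSketch {
  // Assumed starting state: all flags enabled.
  private flags: number = FLAG_A | FLAG_B | FLAG_C;

  // A flag is active when its bit is set in the mask.
  flagIsActive(flag: number): boolean {
    return (this.flags & flag) > 0;
  }

  // Clearing a bit leaves all other flags untouched.
  removeFlag(flag: number): void {
    this.flags = this.flags & ~flag;
  }
}

const flagSet = new FlagSetSketch();
const activeBefore = flagSet.flagIsActive(FLAG_B);
flagSet.removeFlag(FLAG_B);
const activeAfter = flagSet.flagIsActive(FLAG_B);
```

A design like this makes it cheap to retry extraction with progressively fewer filters, because each retry only needs to clear one bit.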