	# tldts - Blazing Fast URL Parsing

	`tldts` is a JavaScript library to extract hostnames, domains, public suffixes, top-level domains and subdomains from URLs.

	Features:

	1. Tuned for performance (order of 0.1 to 1 μs per input)
	2. Handles both URLs and hostnames
	3. Full Unicode/IDNA support
	4. Supports parsing email addresses
	5. Detects IPv4 and IPv6 addresses
	6. Continuously updated version of the public suffix list
	7. TypeScript, ships with `umd`, `esm`, `cjs` bundles and _type definitions_
	8. Small bundles and small memory footprint
	9. Battle tested: full test coverage and production use

	# Install

	```bash
	npm install --save tldts
	```

	# Usage

	Using the command-line interface:

	```bash
	$ npx tldts 'http://www.writethedocs.org/conf/eu/2017/'
	{
	"domain": "writethedocs.org",
	"domainWithoutSuffix": "writethedocs",
	"hostname": "www.writethedocs.org",
	"isIcann": true,
	"isIp": false,
	"isPrivate": false,
	"publicSuffix": "org",
	"subdomain": "www"
	}
	```

	Programmatically:

	```js
	const { parse } = require('tldts');

	// Retrieve hostname-related information from a given URL
	parse('http://www.writethedocs.org/conf/eu/2017/');
	// { domain: 'writethedocs.org',
	// domainWithoutSuffix: 'writethedocs',
	// hostname: 'www.writethedocs.org',
	// isIcann: true,
	// isIp: false,
	// isPrivate: false,
	// publicSuffix: 'org',
	// subdomain: 'www' }
	```

	Modern _ES6 module imports_ are also supported:

	```js
	import { parse } from 'tldts';
	```

	Alternatively, you can try it _directly in your browser_ here: https://npm.runkit.com/tldts

	# API

	- `tldts.parse(url | hostname, options)`
	- `tldts.getHostname(url | hostname, options)`
	- `tldts.getDomain(url | hostname, options)`
	- `tldts.getPublicSuffix(url | hostname, options)`
	- `tldts.getSubdomain(url | hostname, options)`
	- `tldts.getDomainWithoutSuffix(url | hostname, options)`

	The behavior of `tldts` can be customized using an `options` argument for all
	the functions exposed as part of the public API. This is useful to both change
	the behavior of the library as well as fine-tune the performance depending on
	your inputs.

	```js
	{
	// Use suffixes from ICANN section (default: true)
	allowIcannDomains: boolean;
	// Use suffixes from Private section (default: false)
	allowPrivateDomains: boolean;
	// Extract and validate hostname (default: true)
	// When set to `false`, inputs will be considered valid hostnames.
	extractHostname: boolean;
	// Validate hostnames after parsing (default: true)
	// If a hostname is not valid, no further processing is performed. When set
	// to `false`, inputs to the library will be considered valid and parsing will
	// proceed regardless.
	validateHostname: boolean;
	// Perform IP address detection (default: true).
	detectIp: boolean;
	// Assume that both URLs and hostnames can be given as input (default: true)
	// If set to `false` we assume only URLs will be given as input, which
	// speeds up processing.
	mixedInputs: boolean;
	// Specifies extra valid suffixes (default: null)
	validHosts: string[] | null;
	}
	```
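
To make the documented defaults concrete, here is a minimal sketch that merges caller overrides on top of them. `DEFAULT_OPTIONS` and `resolveOptions` are hypothetical names used for illustration; how `tldts` merges options internally is not shown in this README, so treat this only as a way to see which settings are effectively in force.

```javascript
// Defaults as documented in the options listing above.
// `DEFAULT_OPTIONS` and `resolveOptions` are illustrative names, not tldts exports.
const DEFAULT_OPTIONS = Object.freeze({
  allowIcannDomains: true,
  allowPrivateDomains: false,
  extractHostname: true,
  validateHostname: true,
  detectIp: true,
  mixedInputs: true,
  validHosts: null,
});

// Return the effective settings: documented defaults, overridden by the caller.
function resolveOptions(overrides = {}) {
  return { ...DEFAULT_OPTIONS, ...overrides };
}

// Only the overridden key changes; every other key keeps its default.
const effective = resolveOptions({ allowPrivateDomains: true });
console.log(effective.allowPrivateDomains, effective.detectIp);
```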

	The `parse` method returns handy properties about a URL or a hostname.

	```js
	const tldts = require('tldts');

	tldts.parse('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv');
	// { domain: 'amazonaws.com',
	// domainWithoutSuffix: 'amazonaws',
	// hostname: 'spark-public.s3.amazonaws.com',
	// isIcann: true,
	// isIp: false,
	// isPrivate: false,
	// publicSuffix: 'com',
	// subdomain: 'spark-public.s3' }

	tldts.parse(
	'https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv',
	{ allowPrivateDomains: true },
	);
	// { domain: 'spark-public.s3.amazonaws.com',
	// domainWithoutSuffix: 'spark-public',
	// hostname: 'spark-public.s3.amazonaws.com',
	// isIcann: false,
	// isIp: false,
	// isPrivate: true,
	// publicSuffix: 's3.amazonaws.com',
	// subdomain: '' }

	tldts.parse('gopher://domain.unknown/');
	// { domain: 'domain.unknown',
	// domainWithoutSuffix: 'domain',
	// hostname: 'domain.unknown',
	// isIcann: false,
	// isIp: false,
	// isPrivate: true,
	// publicSuffix: 'unknown',
	// subdomain: '' }

	tldts.parse('https://192.168.0.0'); // IPv4
	// { domain: null,
	// domainWithoutSuffix: null,
	// hostname: '192.168.0.0',
	// isIcann: null,
	// isIp: true,
	// isPrivate: null,
	// publicSuffix: null,
	// subdomain: null }

	tldts.parse('https://[::1]'); // IPv6
	// { domain: null,
	// domainWithoutSuffix: null,
	// hostname: '::1',
	// isIcann: null,
	// isIp: true,
	// isPrivate: null,
	// publicSuffix: null,
	// subdomain: null }

	tldts.parse('user@emailprovider.co.uk'); // email
	// { domain: 'emailprovider.co.uk',
	// domainWithoutSuffix: 'emailprovider',
	// hostname: 'emailprovider.co.uk',
	// isIcann: true,
	// isIp: false,
	// isPrivate: false,
	// publicSuffix: 'co.uk',
	// subdomain: '' }
	```

	| Property Name         | Type   | Description                                     |
	| :-------------------- | :----- | :---------------------------------------------- |
	| `hostname`            | `str`  | `hostname` of the input extracted automatically |
	| `domain`              | `str`  | Domain (tld + sld)                              |
	| `domainWithoutSuffix` | `str`  | Domain without public suffix                    |
	| `subdomain`           | `str`  | Subdomain (what comes before `domain`)          |
	| `publicSuffix`        | `str`  | Public Suffix (tld) of `hostname`               |
	| `isIcann`             | `bool` | Does TLD come from ICANN part of the list       |
	| `isPrivate`           | `bool` | Does TLD come from Private part of the list     |
	| `isIp`                | `bool` | Is `hostname` an IP address?                    |
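
The properties above fit together in a simple way: in the example outputs shown earlier, `domain` is `domainWithoutSuffix` plus `publicSuffix`, and `hostname` is `subdomain` plus `domain`. The sketch below rebuilds a hostname from a result object to illustrate that relationship. `rebuildHostname` is a hypothetical helper, not part of `tldts`, and it only relies on the result shape documented in this README.

```javascript
// Hypothetical helper: reassemble a hostname from the parts of a parse result.
function rebuildHostname(result) {
  // IP addresses (and inputs without a domain) have no parts to join.
  if (result.isIp || result.domain === null) return null;

  const domain = `${result.domainWithoutSuffix}.${result.publicSuffix}`;
  // An empty subdomain means the hostname is the domain itself.
  return result.subdomain ? `${result.subdomain}.${domain}` : domain;
}

// Result object copied from the `parse` example earlier in this README.
const parsed = {
  domain: 'writethedocs.org',
  domainWithoutSuffix: 'writethedocs',
  hostname: 'www.writethedocs.org',
  isIcann: true,
  isIp: false,
  isPrivate: false,
  publicSuffix: 'org',
  subdomain: 'www',
};

console.log(rebuildHostname(parsed) === parsed.hostname); // prints true
```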

	## Single purpose methods

	These shorthand methods each retrieve a single value and perform better than
	`parse`, because they skip the work needed to compute the other properties.

	### getHostname(url | hostname, options?)

	Returns the hostname from a given string.

	```javascript
	const { getHostname } = require('tldts');

	getHostname('google.com'); // returns `google.com`
	getHostname('fr.google.com'); // returns `fr.google.com`
	getHostname('fr.google.google'); // returns `fr.google.google`
	getHostname('foo.google.co.uk'); // returns `foo.google.co.uk`
	getHostname('t.co'); // returns `t.co`
	getHostname('fr.t.co'); // returns `fr.t.co`
	getHostname(
	'https://user:password@example.co.uk:8080/some/path?and&query#hash',
	); // returns `example.co.uk`
	```

	### getDomain(url | hostname, options?)

	Returns the fully qualified domain from a given string.

	```javascript
	const { getDomain } = require('tldts');

	getDomain('google.com'); // returns `google.com`
	getDomain('fr.google.com'); // returns `google.com`
	getDomain('fr.google.google'); // returns `google.google`
	getDomain('foo.google.co.uk'); // returns `google.co.uk`
	getDomain('t.co'); // returns `t.co`
	getDomain('fr.t.co'); // returns `t.co`
	getDomain('https://user:password@example.co.uk:8080/some/path?and&query#hash'); // returns `example.co.uk`
	```
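
A common use of `getDomain` is checking whether two URLs belong to the same registrable domain. The sketch below shows one way to do that. `sameRegistrableDomain` is a hypothetical helper, and it takes the `getDomain` function as its first argument so the snippet has no hard dependency; in practice you would pass `require('tldts').getDomain`.

```javascript
// Hypothetical helper: true when both inputs resolve to the same non-null domain.
// `getDomain` is expected to behave like tldts' `getDomain` (string or null).
function sameRegistrableDomain(getDomain, inputA, inputB) {
  const domainA = getDomain(inputA);
  const domainB = getDomain(inputB);
  // Two inputs without a recognizable domain are never considered a match.
  return domainA !== null && domainA === domainB;
}

// Usage with tldts (assumes the package is installed):
// const { getDomain } = require('tldts');
// sameRegistrableDomain(getDomain, 'fr.google.com', 'google.com');
// Both inputs resolve to `google.com` in the examples above.
```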

	### getDomainWithoutSuffix(url | hostname, options?)

	Returns the domain (as returned by `getDomain(...)`) without the public suffix part.

	```javascript
	const { getDomainWithoutSuffix } = require('tldts');

	getDomainWithoutSuffix('google.com'); // returns `google`
	getDomainWithoutSuffix('fr.google.com'); // returns `google`
	getDomainWithoutSuffix('fr.google.google'); // returns `google`
	getDomainWithoutSuffix('foo.google.co.uk'); // returns `google`
	getDomainWithoutSuffix('t.co'); // returns `t`
	getDomainWithoutSuffix('fr.t.co'); // returns `t`
	getDomainWithoutSuffix(
	'https://user:password@example.co.uk:8080/some/path?and&query#hash',
	); // returns `example`
	```

	### getSubdomain(url | hostname, options?)

	Returns the complete subdomain for a given string.

	```javascript
	const { getSubdomain } = require('tldts');

	getSubdomain('google.com'); // returns ``
	getSubdomain('fr.google.com'); // returns `fr`
	getSubdomain('google.co.uk'); // returns ``
	getSubdomain('foo.google.co.uk'); // returns `foo`
	getSubdomain('moar.foo.google.co.uk'); // returns `moar.foo`
	getSubdomain('t.co'); // returns ``
	getSubdomain('fr.t.co'); // returns `fr`
	getSubdomain(
	'https://user:password@secure.example.co.uk:443/some/path?and&query#hash',
	); // returns `secure`
	```

	### getPublicSuffix(url | hostname, options?)

	Returns the [public suffix][] for a given string.

	```javascript
	const { getPublicSuffix } = require('tldts');

	getPublicSuffix('google.com'); // returns `com`
	getPublicSuffix('fr.google.com'); // returns `com`
	getPublicSuffix('google.co.uk'); // returns `co.uk`
	getPublicSuffix('s3.amazonaws.com'); // returns `com`
	getPublicSuffix('s3.amazonaws.com', { allowPrivateDomains: true }); // returns `s3.amazonaws.com`
	getPublicSuffix('tld.is.unknown'); // returns `unknown`
	```
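
To build intuition for the results above, here is a toy sketch of longest-suffix matching against a tiny hand-picked rule set. It is not how `tldts` is implemented and it ignores the wildcard and exception rules of the real public suffix list. `toyPublicSuffix`, `toyDomain`, and both rule sets are made up for illustration.

```javascript
// Tiny illustrative excerpts; the real list has thousands of rules.
const ICANN_RULES = new Set(['com', 'org', 'co', 'uk', 'co.uk']);
const PRIVATE_RULES = new Set(['s3.amazonaws.com']);

// Return the longest known suffix, or the last label when nothing matches.
function toyPublicSuffix(hostname, { allowPrivateDomains = false } = {}) {
  const labels = hostname.toLowerCase().split('.');
  // Candidates go from longest (whole hostname) to shortest (last label).
  for (let i = 0; i < labels.length; i += 1) {
    const candidate = labels.slice(i).join('.');
    const known =
      ICANN_RULES.has(candidate) ||
      (allowPrivateDomains && PRIVATE_RULES.has(candidate));
    if (known) return candidate;
  }
  return labels[labels.length - 1];
}

// The domain is the suffix plus one more label to its left.
// This toy returns null when the hostname is nothing but a suffix.
function toyDomain(hostname, options) {
  const labels = hostname.toLowerCase().split('.');
  const suffixLength = toyPublicSuffix(hostname, options).split('.').length;
  if (labels.length <= suffixLength) return null;
  return labels.slice(-(suffixLength + 1)).join('.');
}

console.log(toyPublicSuffix('s3.amazonaws.com')); // prints com
console.log(toyPublicSuffix('s3.amazonaws.com', { allowPrivateDomains: true })); // prints s3.amazonaws.com
console.log(toyDomain('foo.google.co.uk')); // prints google.co.uk
```

Trying the longest candidate first is what makes `co.uk` win over `uk`, and what lets a private rule such as `s3.amazonaws.com` take precedence over `com` once private suffixes are allowed.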

	# Troubleshooting

	## Retrieving subdomain of `localhost` and custom hostnames

	`tldts` methods `getDomain` and `getSubdomain` are designed to work only with _known and valid_ TLDs.
	This way, you can trust what a domain is.

	`localhost` is a valid hostname but not a TLD. To have it treated as a valid suffix, pass the `validHosts` option, which every method exposed by `tldts` accepts:

	```js
	const tldts = require('tldts');

	tldts.getDomain('localhost'); // returns null
	tldts.getSubdomain('vhost.localhost'); // returns null

	tldts.getDomain('localhost', { validHosts: ['localhost'] }); // returns 'localhost'
	tldts.getSubdomain('vhost.localhost', { validHosts: ['localhost'] }); // returns 'vhost'
	```
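
If you need the same `validHosts` list in many places, one option is to bind it once. The sketch below wraps any `tldts`-style function so that every call receives the list. `withValidHosts` is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper: returns a function that always passes `validHosts`.
// `fn` is expected to take (input, options), like tldts' public functions.
function withValidHosts(fn, validHosts) {
  return (input, options = {}) => fn(input, { ...options, validHosts });
}

// Usage with tldts (assumes the package is installed):
// const tldts = require('tldts');
// const devGetDomain = withValidHosts(tldts.getDomain, ['localhost']);
// devGetDomain('localhost'); // same call as getDomain('localhost', { validHosts: ['localhost'] })
```

Other options passed by the caller are preserved; only `validHosts` is forced to the bound list.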

	## Updating the TLDs List

	`tldts` made the opinionated choice of shipping with a list of suffixes directly
	in its bundle. There is currently no mechanism to update the lists yourself, but
	we make sure that the version shipped is always up-to-date.

	If you keep `tldts` updated, the lists should be up-to-date as well!

	# Performance

	`tldts` is the _fastest JavaScript library_ available for parsing hostnames. It can parse _millions of inputs per second_ (typically 2-3M, depending on your hardware and inputs). It also offers granular options to fine-tune the behavior and performance of the library for the kind of inputs you are dealing with. For example, if you know you only manipulate valid hostnames, you can disable the hostname extraction step with `{ extractHostname: false }`.

	Please see [this detailed comparison](./comparison/comparison.md) with other available libraries.
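
Throughput depends heavily on hardware and inputs, so it can be worth measuring on your own data. Below is a rough sketch of a timing helper, assuming Node.js (it uses `process.hrtime.bigint()`). `measureThroughput` is a hypothetical name, and the numbers it produces are only indicative, not a rigorous benchmark.

```javascript
// Hypothetical helper: estimate how many calls per second `fn` sustains.
function measureThroughput(fn, inputs, rounds = 10) {
  // Warm-up pass so the JIT can optimize `fn` before timing starts.
  for (const input of inputs) fn(input);

  const start = process.hrtime.bigint();
  for (let round = 0; round < rounds; round += 1) {
    for (const input of inputs) fn(input);
  }
  const elapsedNs = Number(process.hrtime.bigint() - start);

  const calls = rounds * inputs.length;
  return { calls, opsPerSecond: calls / (elapsedNs / 1e9) };
}

// Usage with tldts (assumes the package is installed):
// const { getDomain } = require('tldts');
// const hostnames = ['www.writethedocs.org', 'foo.google.co.uk'];
// measureThroughput((h) => getDomain(h), hostnames, 100000);
// measureThroughput((h) => getDomain(h, { extractHostname: false }), hostnames, 100000);
```

Running both variants on the same inputs gives a direct view of what a given option buys you.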

	## Contributors

	`tldts` is based upon the excellent `tld.js` library and would not exist without
	the many contributors who worked on the project:
	<a href="graphs/contributors"><img src="https://opencollective.com/tldjs/contributors.svg?width=890" /></a>

	This project would not be possible without Mozilla's amazing
	[public suffix list][]. Thank you for your hard work!

	# License

	[MIT License](LICENSE).

	[badge-ci]: https://secure.travis-ci.org/remusao/tldts.svg?branch=master
	[badge-downloads]: https://img.shields.io/npm/dm/tldts.svg
	[public suffix list]: https://publicsuffix.org/list/
	[list the recent changes]: https://github.com/publicsuffix/list/commits/master
	[changes Atom Feed]: https://github.com/publicsuffix/list/commits/master.atom
	[public suffix]: https://publicsuffix.org/learn/