Dashycode
=========

Dashycode is an encoding of arbitrary strings into a restricted character set: lowercase alphanumerics and dashes.

For instance:

    > Dashycode.encode("What IS Dashycode, really? 🤔")
    'what-is-dashycode-really--3x2awuinvx5eznar3'

    > Dashycode.decode('what-is-dashycode-really--3x2awuinvx5eznar3')
    'What IS Dashycode, really? 🤔'

Its intended use is to store arbitrary strings in URLs or domain names reversibly and as human-readably as possible.

Dashycode is similar to other ways of encoding strings into restricted character sets, like urlencoding, Punycode, or Base64. It's more human-readable than urlencoding or Base64, and can handle strings Punycode can't handle.


# Features

Dashycode's output is guaranteed to be a valid domain name (ignoring length considerations). In addition to containing only lowercase alphanumeric characters and dashes, it is guaranteed to be non-empty and to never start or end with a dash.

    > Dashycode.encode("")
    '0--0'

    > Dashycode.encode(" ")
    '0--05'

    > Dashycode.encode("日本語")
    '0--0htdqm79vxb74'
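
The output guarantees above can be checked mechanically. The following is a minimal sketch, not part of Dashycode itself: `looksLikeDashycodeOutput` is a hypothetical helper name, and the regular expression tests only the three stated properties (allowed characters, non-empty, no leading or trailing dash), ignoring length limits just as the guarantee does.

```javascript
// Hypothetical helper (not part of the Dashycode API): checks the three
// output properties described above.
//   - only lowercase letters, digits, and dashes
//   - non-empty
//   - does not start or end with a dash
function looksLikeDashycodeOutput(s) {
  return /^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/.test(s);
}

// Outputs quoted from this README pass the check:
console.log(looksLikeDashycodeOutput('0--0'));                         // true
console.log(looksLikeDashycodeOutput('this-is-a-lowercase-sentence')); // true

// Strings that break one of the properties fail it:
console.log(looksLikeDashycodeOutput(''));           // false (empty)
console.log(looksLikeDashycodeOutput('-leading'));   // false (starts with a dash)
console.log(looksLikeDashycodeOutput('Mixed Case')); // false (uppercase and space)
```

Passing this check does not prove a string was produced by Dashycode; it only confirms the string has the shape the guarantee describes.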

As an encoding, Dashycode is reversible: any string will always encode to a unique output which decodes to that exact original string. Everything is preserved: capitalization, whitespace, etc.

    > Dashycode.decode("0--0")
    ''

    > Dashycode.decode("0--05")
    ' '

    > Dashycode.decode("0--0htdqm79vxb74")
    '日本語'
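
Reversibility is easy to spot-check with a round-trip test. The sketch below assumes only the two calls shown in this README, `encode` and `decode`. `roundTrips` is a hypothetical helper, and the stand-in codec built on `encodeURIComponent` is there only so the snippet runs without Dashycode installed.

```javascript
// Hypothetical helper: true if decode(encode(input)) gives back the input exactly.
// `codec` is any object with encode/decode methods, such as Dashycode.
function roundTrips(codec, input) {
  return codec.decode(codec.encode(input)) === input;
}

// Stand-in codec so this snippet is self-contained; swap in Dashycode to test it.
const standInCodec = {
  encode: (s) => encodeURIComponent(s),
  decode: (s) => decodeURIComponent(s),
};

const samples = ['', ' ', '日本語', 'CamelCase', 'two  spaces'];
console.log(samples.every((s) => roundTrips(standInCodec, s))); // true
```

The samples deliberately include the empty string, whitespace, capitals, and non-ASCII text, since those are the cases the paragraph above says are preserved.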

Dashycode is designed for human-readable text, but any data you can stuff into a JavaScript string can be encoded. However, if you primarily want to encode binary data, you should probably be using [Base32]. (Dashycode is ~20% less efficient than Base32 for max-entropy binary data.)

  [Base32]: https://en.wikipedia.org/wiki/Base32


# Readability

Dashycode tries to be maximally readable. Strings containing only lowercase alphanumeric characters are returned unmodified:

    > Dashycode.encode("lettersandnumb3rsonly")
    'lettersandnumb3rsonly'

Strings containing spaces are returned with dashes:

    > Dashycode.encode("this is a lowercase sentence")
    'this-is-a-lowercase-sentence'

Only strings with other characters (or with multiple spaces in a row) will have an additional code tacked onto the end, in a way that maximizes readability:

    > Dashycode.encode("This is a regular sentence.")
    'this-is-a-regular-sentence--32e5'

Also for readability, the code part will not contain `0`, `o`, `l`, or `1`.
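
Removing those four characters has a convenient side effect: the 36 lowercase alphanumerics minus `0`, `o`, `l`, and `1` leave exactly 32 symbols, which is enough for one character to represent 5 bits. The sketch below only does that arithmetic. The name `codeAlphabet` and the character ordering are assumptions for illustration, not taken from Dashycode's source.

```javascript
// Build the set of characters left after removing the look-alikes named above.
// The ordering here is an assumption for illustration only.
const lowercaseAlphanumerics = '0123456789abcdefghijklmnopqrstuvwxyz';
const excluded = new Set(['0', 'o', 'l', '1']);
const codeAlphabet = [...lowercaseAlphanumerics]
  .filter((ch) => !excluded.has(ch))
  .join('');

console.log(codeAlphabet);                   // 23456789abcdefghijkmnpqrstuvwxyz
console.log(codeAlphabet.length);            // 32
console.log(Math.log2(codeAlphabet.length)); // 5
```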


# Compared to other encodings

Dashycode encodes/decodes text, like urlencoding or Punycode.

Of these, Dashycode is most similar to Punycode, in terms of readability as well as being a valid domain name. The main difference is that Punycode is not designed to encode all text, and cannot create a valid domain name if the input contains ASCII symbols.

    > punycode.encode("This is *&@^$&")
    'This is *&@^$&-'

    > Dashycode.encode("This is *&@^$&")
    'this-is--3mbqscmxi7'

Compared to urlencoding, Dashycode is much more readable.

    > encodeURIComponent("100% of sentences should be readable")
    '100%25%20of%20sentences%20should%20be%20readable'

    > Dashycode.encode("100% of sentences should be readable")
    '100-of-sentences-should-be-readable--ke'
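
To put a number on the difference, the sketch below counts percent-escapes in the urlencoded form and compares lengths. The Dashycode output is copied from the example above as a string literal rather than computed, so the snippet runs without the library. `countPercentEscapes` is a hypothetical helper.

```javascript
// Hypothetical helper: counts %XX escapes in a urlencoded string.
function countPercentEscapes(s) {
  return (s.match(/%[0-9A-Fa-f]{2}/g) || []).length;
}

const input = '100% of sentences should be readable';
const urlencoded = encodeURIComponent(input);
// Copied from the README example, not computed here.
const dashycoded = '100-of-sentences-should-be-readable--ke';

console.log(countPercentEscapes(urlencoded)); // 6
console.log(urlencoded.length);               // 48
console.log(dashycoded.length);               // 39
```

Each escaped character costs three characters in the urlencoded form and interrupts the words around it, while the Dashycode form keeps the words intact and moves the extra information to the end.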

Dashycode is only ~20% less efficient than Punycode on pure non-ASCII text:

    > punycode.encode("日本語はいい言語と思います。")
    'r6j3gaa9hwd0b0h4388bzcm1md968luxbea'

    > Dashycode.encode("日本語はいい言語と思います。")
    '0--0htdqm79vxb7yh5eg4389j2m52cwxb7ya5eyg2e9j2mvhitm7sw42e'

The reason for the slightly lower efficiency on non-ASCII text is to make common ASCII text very efficient:

    > Dashycode.encode("Add dash dash three to capitalize")
    'add-dash-dash-three-to-capitalize--3'

    > Dashycode.encode("CamelCase")
    'camelcase--fa'


# License

MIT license