# clarinet

`clarinet` is a sax-like streaming parser for JSON. it works in the browser and node.js. `clarinet` is inspired by (and forked from) [sax-js][saxjs]. just as you shouldn't use `sax` when you need `dom`, you shouldn't use `clarinet` when you need `JSON.parse`. for a more detailed introduction and a performance study, please refer to this [article][blog].

# design goals

`clarinet` is very much like [yajl] but written in javascript:

* written in javascript
* portable
* robust (~110 tests pass before even announcing the project)
* data representation independent
* fast
* generates verbose, useful error messages including context of where
   the error occurs in the input text.
* can parse json data off a stream, incrementally
* simple to use
* tiny

# motivation

the reason behind this work was to create better full text support in node. creating indexes out of large (or many) json files doesn't require a full understanding of the json file, but it does require something like `clarinet`.

# installation

## node.js

1. install [npm]
2. `npm install clarinet`
3. `var clarinet = require('clarinet');`

## browser

1. minify clarinet.js
2. load it into your webpage

# usage

## basics

``` js
var clarinet = require("clarinet")
  , parser = clarinet.parser()
  ;

parser.onerror = function (e) {
  // an error happened. e is the error.
};
parser.onvalue = function (v) {
  // got some value.  v is the value. can be string, double, bool, or null.
};
parser.onopenobject = function (key) {
  // opened an object. key is the first key.
};
parser.onkey = function (key) {
  // got a key in an object.
};
parser.oncloseobject = function () {
  // closed an object.
};
parser.onopenarray = function () {
  // opened an array.
};
parser.onclosearray = function () {
  // closed an array.
};
parser.onend = function () {
  // parser stream is done, and ready to have more stuff written to it.
};

parser.write('{"foo": "bar"}').close();
```
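
to see how the events fit together, here is a minimal sketch that rebuilds a plain javascript value from them. `attachTreeBuilder` is a hypothetical helper, not part of `clarinet`; it only assumes the handler names and arguments listed above (in particular, that `onopenobject` receives the first key and `onkey` receives the later ones).

``` js
// hypothetical helper (not part of clarinet): rebuilds a plain js value
// from the events above. `parser` is any object that accepts
// on<eventname> handlers, e.g. the result of clarinet.parser().
function attachTreeBuilder(parser, done) {
  var stack = []   // containers currently open, innermost last
    , keys = []    // pending key for each open container (undefined for arrays)
    , root
    ;

  // store a value in the innermost open container, or as the root
  function add(v) {
    if (stack.length === 0) { root = v; return; }
    var top = stack[stack.length - 1];
    if (Array.isArray(top)) top.push(v);
    else top[keys[keys.length - 1]] = v;
  }

  function open(container, firstKey) {
    add(container);
    stack.push(container);
    keys.push(firstKey);
  }

  function close() {
    stack.pop();
    keys.pop();
  }

  parser.onopenobject = function (key) { open({}, key); };
  parser.onkey = function (key) { keys[keys.length - 1] = key; };
  parser.onopenarray = function () { open([], undefined); };
  parser.oncloseobject = close;
  parser.onclosearray = close;
  parser.onvalue = add;
  parser.onend = function () { done(root); };
  return parser;
}

// with clarinet installed, usage might look like:
//   var parser = attachTreeBuilder(require("clarinet").parser(), console.log);
//   parser.write('{"a": [1, {"b": null}], "c": true}').close();
```

if you actually need the whole value in memory, `JSON.parse` is the better tool; the sketch is only meant to show the order in which handlers fire.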

``` js
// stream usage
// takes the same options as the parser
var fs = require("fs")
  , options = {} // e.g. { trim: true, normalize: true }
  , stream = require("clarinet").createStream(options)
  ;
stream.on("error", function (e) {
  // unhandled errors will throw, since this is a proper node
  // event emitter.
  console.error("error!", e)
  // clear the error
  this._parser.error = null
  this._parser.resume()
})
stream.on("openobject", function (key) {
  // same argument as above: the first key of the object
})
// pipe is supported, and it's readable/writable
// same chunks coming in also go out.
fs.createReadStream("file.json")
  .pipe(stream)
  .pipe(fs.createWriteStream("file-altered.json"))
```

## arguments

pass the following arguments to the parser function.  all are optional.

`opt` - object bag of settings regarding string formatting.  all default to `false`.

settings supported:

* `trim` - boolean. whether or not to trim whitespace from text values.
* `normalize` - boolean. if true, then turn any whitespace into a single
  space.

## methods

`write` - write bytes onto the stream. you don't have to do this all at
once. you can keep writing as much as you want.

`close` - close the stream. once closed, no more data may be written until
it is done processing the buffer, which is signaled by the `end` event.

`resume` - to gracefully handle errors, assign a listener to the `error`
event. then, when the error is taken care of, you can call `resume` to
continue parsing. otherwise, the parser will not continue while in an error
state.
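
a small sketch of incremental writing: `feedChunks` is a hypothetical helper (not part of `clarinet`) that calls `write` once per chunk and then `close`. it assumes only the two methods described above.

``` js
// hypothetical helper: writes each chunk in turn, then closes the parser.
// `parser` is anything with write() and close(), e.g. clarinet.parser().
function feedChunks(parser, chunks) {
  chunks.forEach(function (chunk) {
    parser.write(chunk);
  });
  parser.close();
  return parser;
}

// with clarinet installed, usage might look like this
// (assumption: a chunk boundary may fall in the middle of a token):
//   var parser = require("clarinet").parser();
//   parser.onvalue = function (v) { console.log("value", v); };
//   feedChunks(parser, ['{"fo', 'o": "b', 'ar"}']);
```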

## members

at all times, the parser object will have the following members:

`line`, `column`, `position` - indications of the position in the json
document where the parser currently is looking.

`closed` - boolean indicating whether or not the parser can be written to.
if it's `true`, then wait for the `ready` event to write again.

`opt` - any options passed into the constructor.

and a bunch of other stuff that you probably shouldn't touch.
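
as an illustration, the position members can be combined with `resume` to report an error and keep going. `recoverFromErrors` is a hypothetical helper, and whether skipping past an error is sensible depends on your input; treat it as a sketch, not a recommendation.

``` js
// hypothetical helper: logs where the error happened, clears it, and resumes.
// handlers run in the this-context of the parser, so `this` is the parser.
function recoverFromErrors(parser, log) {
  parser.onerror = function (e) {
    log(e.message + " (line " + this.line + ", column " + this.column
      + ", position " + this.position + ")");
    this.error = null; // the error must be cleared before parsing continues
    this.resume();
  };
  return parser;
}

// with clarinet installed, usage might look like:
//   var parser = recoverFromErrors(require("clarinet").parser(), console.error);
```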

## events

all events emit with a single argument. to listen to an event, assign a
function to `on<eventname>`. functions get executed in the this-context of
the parser object. the list of supported events is also in the exported
`EVENTS` array.

when using the stream interface, assign handlers using the `EventEmitter`
`on` function in the normal fashion.

`error` - indication that something bad happened. the error will be hanging
out on `parser.error`, and must be deleted before parsing can continue. by
listening to this event, you can keep an eye on that kind of stuff. note:
this happens *much* more in strict mode. argument: instance of `Error`.

`value` - a json value. argument: value, can be a bool, null, string, or number

`openobject` - object was opened. argument: key, a string with the first key of the object (if any)

`key` - an object key. argument: key, a string with the current key

`closeobject` - indication that an object was closed

`openarray` - indication that an array was opened

`closearray` - indication that an array was closed

`end` - indication that the closed stream has ended.

`ready` - indication that the stream has reset, and is ready to be written
to.
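
because handlers are just `on<eventname>` properties, a list of event names is enough to trace everything the parser emits. a minimal sketch, where `recordEvents` is a hypothetical helper and the names are passed in by the caller (for example the exported `EVENTS` array):

``` js
// hypothetical helper: assigns one recording handler per event name
// and returns the array that the handlers append to.
function recordEvents(parser, eventNames) {
  var seen = [];
  eventNames.forEach(function (name) {
    parser["on" + name] = function (arg) {
      seen.push({ event: name, arg: arg });
    };
  });
  return seen;
}

// with clarinet installed, usage might look like:
//   var clarinet = require("clarinet")
//     , parser = clarinet.parser()
//     , seen = recordEvents(parser, clarinet.EVENTS)
//     ;
//   parser.write('{"foo": "bar"}').close();
//   console.log(seen);
```

note that this replaces any handlers already assigned on the parser, including `onerror`.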

## samples

some [samples] are available to help you get started. one creates a list of top npm contributors, and another gets a bunch of data from twitter and generates valid json.

# roadmap

check [issues]

# contribute

everyone is welcome to contribute: patches, bug fixes, new features.

1. create an [issue][issues] so the community can comment on your idea
2. fork `clarinet`
3. create a new branch `git checkout -b my_branch`
4. create tests for the changes you made
5. make sure you pass both existing and newly inserted tests
6. commit your changes
7. push to your branch `git push origin my_branch`
8. create a pull request

helpful tips:

check `index.html`. there are two env vars you can set, `CRECORD` and `CDEBUG`.

* `CRECORD` allows you to `record` the event sequence from a new json test so you don't have to write everything. 
* `CDEBUG` can be set to `info` or `debug`. `info` will `console.log` all emits, `debug` will `console.log` what happens to each char. 

in `test/clarinet.js` there are two lines you might want to change. `#8` is where you define `seps`; if you are isolating a test you probably just want to run one sep, so change this array to `[undefined]`. `#718`, which says `for (var key in docs) {`, is where you can change the docs you want to run. e.g. to run `foobar` i would do something like `for (var key in {foobar:''}) {`.

# meta

* code: `git clone git://github.com/dscape/clarinet.git`
* home: <http://github.com/dscape/clarinet>
* bugs: <http://github.com/dscape/clarinet/issues>
* build: [![build status](https://secure.travis-ci.org/dscape/clarinet.png)](http://travis-ci.org/dscape/clarinet)

`(oO)--',-` in [caos]

[npm]: http://npmjs.org
[issues]: http://github.com/dscape/clarinet/issues
[caos]: http://caos.di.uminho.pt/
[saxjs]: http://github.com/isaacs/sax-js
[yajl]: https://github.com/lloyd/yajl
[samples]: https://github.com/dscape/clarinet/tree/master/samples
[blog]: http://writings.nunojob.com/2011/12/clarinet-sax-based-evented-streaming-json-parser-in-javascript-for-the-browser-and-nodejs.html