File size: 7,125 Bytes
5fae594 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 |
# clarinet
`clarinet` is a sax-like streaming parser for JSON. works in the browser and node.js. `clarinet` is inspired (and forked) from [sax-js][saxjs]. just like you shouldn't use `sax` when you need `dom` you shouldn't use `clarinet` when you need `JSON.parse`. for a more detailed introduction and a performance study please refer to this [article][blog].
# design goals
`clarinet` is very much like [yajl] but written in javascript:
* written in javascript
* portable
* robust (~110 tests pass before even announcing the project)
* data representation independent
* fast
* generates verbose, useful error messages including context of where
the error occurs in the input text.
* can parse json data off a stream, incrementally
* simple to use
* tiny
# motivation
the reason behind this work was to create better full text support in node. creating indexes out of large (or many) json files doesn't require a full understanding of the json file, but it does require something like `clarinet`.
# installation
## node.js
1. install [npm]
2. `npm install clarinet`
3. `var clarinet = require('clarinet');`
## browser
1. minimize clarinet.js
2. load it into your webpage
# usage
## basics
``` js
var clarinet = require("clarinet")
, parser = clarinet.parser()
;
parser.onerror = function (e) {
// an error happened. e is the error.
};
parser.onvalue = function (v) {
// got some value. v is the value. can be string, double, bool, or null.
};
parser.onopenobject = function (key) {
// opened an object. key is the first key.
};
parser.onkey = function (key) {
// got a key in an object.
};
parser.oncloseobject = function () {
// closed an object.
};
parser.onopenarray = function () {
// opened an array.
};
parser.onclosearray = function () {
// closed an array.
};
parser.onend = function () {
// parser stream is done, and ready to have more stuff written to it.
};
parser.write('{"foo": "bar"}').close();
```
``` js
// stream usage
// takes the same options as the parser
var stream = require("clarinet").createStream(options);
stream.on("error", function (e) {
// unhandled errors will throw, since this is a proper node
// event emitter.
console.error("error!", e)
// clear the error
this._parser.error = null
this._parser.resume()
})
stream.on("openobject", function (node) {
// same object as above
})
// pipe is supported, and it's readable/writable
// same chunks coming in also go out.
fs.createReadStream("file.json")
.pipe(stream)
.pipe(fs.createReadStream("file-altered.json"))
```
## arguments
pass the following arguments to the parser function. all are optional.
`opt` - object bag of settings regarding string formatting. all default to `false`.
settings supported:
* `trim` - boolean. whether or not to trim text and comment nodes.
* `normalize` - boolean. if true, then turn any whitespace into a single
space.
## methods
`write` - write bytes onto the stream. you don't have to do this all at
once. you can keep writing as much as you want.
`close` - close the stream. once closed, no more data may be written until
it is done processing the buffer, which is signaled by the `end` event.
`resume` - to gracefully handle errors, assign a listener to the `error`
event. then, when the error is taken care of, you can call `resume` to
continue parsing. otherwise, the parser will not continue while in an error
state.
## members
at all times, the parser object will have the following members:
`line`, `column`, `position` - indications of the position in the json
document where the parser currently is looking.
`closed` - boolean indicating whether or not the parser can be written to.
if it's `true`, then wait for the `ready` event to write again.
`opt` - any options passed into the constructor.
and a bunch of other stuff that you probably shouldn't touch.
## events
all events emit with a single argument. to listen to an event, assign a
function to `on<eventname>`. functions get executed in the this-context of
the parser object. the list of supported events are also in the exported
`EVENTS` array.
when using the stream interface, assign handlers using the `EventEmitter`
`on` function in the normal fashion.
`error` - indication that something bad happened. the error will be hanging
out on `parser.error`, and must be deleted before parsing can continue. by
listening to this event, you can keep an eye on that kind of stuff. note:
this happens *much* more in strict mode. argument: instance of `Error`.
`value` - a json value. argument: value, can be a bool, null, string on number
`openobject` - object was opened. argument: key, a string with the first key of the object (if any)
`key` - an object key: argument: key, a string with the current key
`closeobject` - indication that an object was closed
`openarray` - indication that an array was opened
`closearray` - indication that an array was closed
`end` - indication that the closed stream has ended.
`ready` - indication that the stream has reset, and is ready to be written
to.
## samples
some [samples] are available to help you get started. one that creates a list of top npm contributors, and another that gets a bunch of data from twitter and generates valid json.
# roadmap
check [issues]
# contribute
everyone is welcome to contribute. patches, bug-fixes, new features
1. create an [issue][issues] so the community can comment on your idea
2. fork `clarinet`
3. create a new branch `git checkout -b my_branch`
4. create tests for the changes you made
5. make sure you pass both existing and newly inserted tests
6. commit your changes
7. push to your branch `git push origin my_branch`
8. create an pull request
helpful tips:
check `index.html`. there's two env vars you can set, `CRECORD` and `CDEBUG`.
* `CRECORD` allows you to `record` the event sequence from a new json test so you don't have to write everything.
* `CDEBUG` can be set to `info` or `debug`. `info` will `console.log` all emits, `debug` will `console.log` what happens to each char.
in `test/clarinet.js` there's two lines you might want to change. `#8` where you define `seps`, if you are isolating a test you probably just want to run one sep, so change this array to `[undefined]`. `#718` which says `for (var key in docs) {` is where you can change the docs you want to run. e.g. to run `foobar` i would do something like `for (var key in {foobar:''}) {`.
# meta
* code: `git clone git://github.com/dscape/clarinet.git`
* home: <http://github.com/dscape/clarinet>
* bugs: <http://github.com/dscape/clarinet/issues>
* build: [](http://travis-ci.org/dscape/clarinet)
`(oO)--',-` in [caos]
[npm]: http://npmjs.org
[issues]: http://github.com/dscape/clarinet/issues
[caos]: http://caos.di.uminho.pt/
[saxjs]: http://github.com/isaacs/sax-js
[yajl]: https://github.com/lloyd/yajl
[samples]: https://github.com/dscape/clarinet/tree/master/samples
[blog]: http://writings.nunojob.com/2011/12/clarinet-sax-based-evented-streaming-json-parser-in-javascript-for-the-browser-and-nodejs.html |