Why use Papa Parse?

There are a thousand CSV libraries for JavaScript. Papa is different. It's written with correctness and performance in mind. Papa is the first (and so far only) multi-threaded CSV parser that runs on web pages. It can parse files gigabytes in size without crashing the browser. It correctly handles malformed or edge-case CSV text. It can parse files on the local file system or download them over the Internet. Papa is boss.

Privacy advocates also use Papa Parse to avoid transmitting sensitive files over the Internet: all the processing can be done locally on the client's computer. This matters especially for organizations whose policies restrict where sensitive data may be sent.

As of version 4, Papa Parse is the fastest CSV parser for the browser, whereas it used to be the slowest.

Can I use Papa Parse server-side with Node.js?

There's a fork of Papa called Baby Parse which is published on npm. Some features are unavailable (like worker threads and file opening/downloading), but the core parser is functional.

Does Papa Parse have any dependencies?

No. Papa Parse has no dependencies. If jQuery is present, however, it plugs in to make it easier to select files from the DOM.

Can I put other libraries in the same file as Papa Parse?

Yes, but then don't use the Web Worker feature unless your other dependencies are battle-hardened for worker threads. A worker thread loads an entire file, not just a function, so all those dependencies would be executed in an environment without a DOM or other window features. If any of them crash (an error like Cannot read property "defaultView" of undefined is common), the whole worker thread will crash and parsing will fail.

Which browsers is it compatible with?

All modern, competent browsers should support all of the features. However, as usual, use IE at your own risk. It looks like IE 10+ and Safari 6+ should support all the features. Firefox and Chrome should work with all features back to versions 3 and 4. Opera 11 and up should be fine. If you really need to use Papa in old IE or Opera, then keep the fancy features off and you may be in luck.

Can Papa Parse be loaded asynchronously (after the page loads)?

Yes. But if you want to use Web Workers, you'll need to tell Papa Parse where its own script file is. To do this, set Papa.SCRIPT_PATH to the relative path of the Papa Parse file. When Papa Parse is loaded synchronously, this path is detected automatically.

Is it open source? (Can I contribute something?)

Yes, please! I don't want to do this all by myself. Head over to the GitHub project page and hack away. If you're making a significant change, open an issue first so we can talk about it.

What's the deal with fast mode?

Fast mode makes Papa Parse screaming fast because it makes one major assumption: there are no quoted fields. So don't use it if your input contains quoted fields. If you don't set fastMode either way, fast mode is turned on automatically when there are no quote characters in the input. With fast mode on, 1 GB files can be parsed in about 20 seconds.
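To make the fast-mode trade-off concrete, here is a minimal sketch. The functions shouldUseFastMode and fastSplit are hypothetical and not part of Papa Parse; the sketch only mirrors the auto-detection rule described above (assuming the double quote as the quote character) and shows how a split-only strategy mangles a quoted field.

```typescript
// Hypothetical helpers for illustration only; these are not Papa Parse's API.

// Mirrors the rule described above: an explicit fastMode setting wins;
// otherwise fast mode is enabled only when the input contains no quote characters.
function shouldUseFastMode(input: string, fastMode?: boolean, quoteChar = '"'): boolean {
  if (fastMode !== undefined) return fastMode;
  return !input.includes(quoteChar);
}

// The "fast" strategy: split on newlines, then on the delimiter.
// It never looks at quotes, which is exactly why quoted fields break it.
function fastSplit(input: string, delimiter = ","): string[][] {
  return input.split("\n").map((line) => line.split(delimiter));
}

const plain = "name,age\nAda,36";
const quoted = 'name,city\nAda,"London, UK"';

console.log(shouldUseFastMode(plain)); // true
console.log(shouldUseFastMode(quoted)); // false
console.log(fastSplit(quoted)[1]); // ['Ada', '"London', ' UK"']: one field split in two
```

The split-only path does no per-character bookkeeping, which is where the speed comes from, and also why it cannot tell a delimiter inside quotes from a real one.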

Why do non-ASCII characters look weird?

It's probably an encoding issue. The FileReader API allows you to specify an encoding, which you can do using the encoding configuration property. This property only works with local files and does not apply to strings or remote files. Also see issues #64 and #169 if you're having trouble parsing CSV files generated from Excel.


Can Papa load and parse huge files?

Yes. Parsing huge text files is facilitated by streaming, where the file is loaded a little bit at a time, parsed, and the results are sent to your step callback function, row-by-row. You can also get results chunk-by-chunk (which is usually faster) by using the chunk callback function in the same way.
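The mechanics of streaming can be sketched in a few lines. streamRows is a hypothetical function, not Papa Parse's code: it takes the input as a sequence of text chunks, carries any incomplete line over to the next chunk, and hands each complete row to a step callback. To keep it short, it assumes no quoted fields and "\n" line endings.

```typescript
// Hypothetical sketch, not Papa Parse's implementation. It assumes no quoted
// fields, so a row is simply a line split on commas.
type Row = string[];

function streamRows(chunks: Iterable<string>, step: (row: Row) => void): void {
  let pending = ""; // an incomplete line carried over to the next chunk
  for (const chunk of chunks) {
    const lines = (pending + chunk).split("\n");
    pending = lines.pop() ?? ""; // the last piece may be cut off mid-row
    for (const line of lines) {
      if (line !== "") step(line.split(","));
    }
  }
  if (pending !== "") step(pending.split(",")); // flush the final row
}

// The input arrives in pieces; note that the row "c,d" straddles two chunks.
streamRows(["a,b\nc,", "d\ne,f"], (row) => console.log(row));
```

Only the current chunk and one partial line are ever held in memory, which is what keeps streaming cheap for large inputs.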

How do I stream my input?

Just specify a step callback function. Results will not be available after parsing is finished, however. You have to inspect the results one row at a time.

What if I want more than 1 row at a time?

Use the chunk callback instead. It works just like step, but you get an entire chunk of the file at a time, rather than a single row. Don't try to use step and chunk together (the behavior is undefined).
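The difference between step and chunk is just how rows are batched before your callback runs. In this hypothetical sketch (streamChunks is not a Papa Parse function, and it again assumes no quoted fields and "\n" line endings), the callback fires once per input chunk with every complete row that chunk produced.

```typescript
// Hypothetical sketch, not Papa Parse's implementation.
type Row = string[];

function streamChunks(chunks: Iterable<string>, onChunk: (rows: Row[]) => void): void {
  let pending = ""; // partial line carried into the next chunk
  for (const chunk of chunks) {
    const lines = (pending + chunk).split("\n");
    pending = lines.pop() ?? "";
    const rows = lines.filter((line) => line !== "").map((line) => line.split(","));
    if (rows.length > 0) onChunk(rows); // one callback per chunk, not per row
  }
  if (pending !== "") onChunk([pending.split(",")]); // flush the final row
}

streamChunks(["a,b\nc,d\ne,", "f\ng,h"], (rows) => console.log("rows in this chunk:", rows.length));
// rows in this chunk: 2
// rows in this chunk: 1
// rows in this chunk: 1
```

Fewer callback invocations is one reason the chunk callback is usually faster than step.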

What is a stream and when should I stream files?

A stream hands you data a piece at a time, so you only need to hold a small part of the input in memory at once; given enough time, it can get through an input of any size. So if you're short on memory (as browsers often are), use a stream.

Wait, does that mean streaming takes more time?

Yes and no. Typically, when we gain speed, we pay with space. The opposite is true, too. Streaming uses significantly less memory with large inputs, but since the reading happens in chunks and results are processed at each row instead of at the very end, yes, it can be slower.

But consider the alternative: upload the file to a remote server, open and process it there, then compress the output and have the client download the results. How long does it take you to upload a 500 MB or 1 GB file? Then consider that the server still has to open the file and read its contents, which is what the client would have done minutes ago. The server might parse it faster with natively-compiled binaries, but only if it has resources dedicated to the task and isn't already parsing files for many other users.
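To put rough numbers on the upload step, here is a back-of-the-envelope calculation. The 10 Mbit/s upload bandwidth is an assumed example value, not a measurement, and sizes use decimal megabytes (10^6 bytes). Compare the result with the roughly 20 seconds per gigabyte quoted for fast mode.

```typescript
// Back-of-the-envelope estimate; the 10 Mbit/s figure is an assumption.
function uploadSeconds(sizeBytes: number, uploadMbitPerSec: number): number {
  const bits = sizeBytes * 8;
  return bits / (uploadMbitPerSec * 1_000_000);
}

for (const size of [500e6, 1e9]) {
  const secs = uploadSeconds(size, 10);
  console.log(`${size / 1e6} MB at 10 Mbit/s: ${secs} s (~${Math.round(secs / 60)} min)`);
}
// 500 MB at 10 Mbit/s: 400 s (~7 min)
// 1000 MB at 10 Mbit/s: 800 s (~13 min)
```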

So unless your clients have a fiber line and you have a scalable cloud application, local parsing by streaming is nearly guaranteed to be faster.

How do I get all the results together after streaming?

You don't, unless you assemble the results manually. And really, don't do that: it defeats the purpose of using a stream. Just take the parts you need as they come through.
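If you need a summary of the whole file, compute it incrementally as rows arrive. In this hypothetical sketch, makeColumnTotal is not a Papa Parse function; its step method would be fed each row's fields from your step callback, and it keeps only three numbers in memory no matter how large the file is.

```typescript
// Hypothetical example: keep a running total instead of every row.
type Row = string[];

function makeColumnTotal(columnIndex: number) {
  let rows = 0;
  let sum = 0;
  let skipped = 0;
  return {
    step(row: Row): void {
      rows++;
      const cell: string | undefined = row[columnIndex];
      // Treat missing or blank cells as non-numeric (Number("") would be 0).
      const value = cell === undefined || cell.trim() === "" ? NaN : Number(cell);
      if (Number.isFinite(value)) sum += value;
      else skipped++;
    },
    result() {
      return { rows, sum, skipped };
    },
  };
}

const totals = makeColumnTotal(1);
for (const row of [["apples", "3"], ["pears", "2.5"], ["plums", "n/a"]]) {
  totals.step(row); // in practice, called from your step callback
}
console.log(totals.result()); // { rows: 3, sum: 5.5, skipped: 1 }
```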

Does Papa use a true stream?

Papa uses HTML5's FileReader API, which uses a stream under the hood. FileReader doesn't technically allow us to hook into that underlying stream, but it does let us load the file in pieces. Fortunately, you don't have to worry about any of that; it's all taken care of for you. Just take the results one row at a time.
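Loading a file "in pieces" amounts to walking through it in fixed-size slices: Blob.slice(start, end) returns part of a file, which FileReader can then read on its own. A minimal sketch of computing those slices follows; sliceRanges and the 10-byte chunk size are illustrative assumptions, not Papa Parse's internals or defaults. Because slice boundaries fall at arbitrary bytes, a parser has to carry partial rows from one slice to the next.

```typescript
// Hypothetical helper: [start, end) byte offsets for reading a file in
// fixed-size pieces, as you would pass to Blob.slice(start, end).
function sliceRanges(fileSize: number, chunkSize: number): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < fileSize; start += chunkSize) {
    // The last slice is shorter if the size is not a multiple of chunkSize.
    ranges.push([start, Math.min(start + chunkSize, fileSize)]);
  }
  return ranges;
}

console.log(sliceRanges(25, 10)); // [ [ 0, 10 ], [ 10, 20 ], [ 20, 25 ] ]
```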

Can I stream files over a network or the Internet?

Yes, Papa Parse supports this. It will download a file in pieces using HTTP's standard Range header, then pass the parsed results to your step function just like a local file. However, these requests may not work cross-origin (on a different domain/hostname), depending on the server's CORS configuration.

Streaming remote files also requires the Content-Range header in the server's response. Most production-ready servers support this header, but Python's SimpleHTTPServer does not. If you need a quick and easy server, Caddy will do the trick:

$ caddy
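Roughly, a chunked download sends a Range request header and reads the Content-Range header of each 206 Partial Content response to learn the file's total size. The sketch below only builds and parses those headers; rangeHeader and parseContentRange are hypothetical names, and it does not show how Papa Parse uses the headers internally. A server that does not support range requests may ignore the Range header and return the whole file with a 200 status and no Content-Range header.

```typescript
// Hypothetical helpers; header formats follow HTTP range requests (RFC 9110).

// Range offsets are inclusive: the first 10 bytes are "bytes=0-9".
function rangeHeader(start: number, chunkSize: number): string {
  return `bytes=${start}-${start + chunkSize - 1}`;
}

// A 206 response states which bytes it holds and the full size, e.g. "bytes 0-9/25".
// The size may be "*" when the server does not know it.
function parseContentRange(value: string): { start: number; end: number; total: number | null } {
  const match = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value.trim());
  if (!match) throw new Error(`Unrecognized Content-Range: ${value}`);
  return {
    start: Number(match[1]),
    end: Number(match[2]),
    total: match[3] === "*" ? null : Number(match[3]),
  };
}

console.log(rangeHeader(10, 10)); // bytes=10-19
console.log(parseContentRange("bytes 10-19/25")); // { start: 10, end: 19, total: 25 }
```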

Can I pause and resume parsing?

Yes, as long as you are streaming and not using a worker. Your step callback (and likewise the chunk callback) is passed a parser that has pause, resume, and abort functions. This is exceptionally useful when performing asynchronous actions during parsing, for example AJAX requests. You can always abort parsing in your callback, even when using workers, but pause and resume are only available without a worker.
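The pause/resume pattern can be sketched without Papa Parse itself. PausableRowFeeder is a hypothetical stand-in for the parser object your callback receives; it only models the three functions named above (pause, resume, abort) over an in-memory list of rows. The usage shows the AJAX-style case: pause, do asynchronous work, then resume.

```typescript
// Hypothetical stand-in for the parser handle passed to a step callback.
// Not Papa Parse's implementation; it only models pause/resume/abort.
type Row = string[];

interface RowParser {
  pause(): void;
  resume(): void;
  abort(): void;
}

type StepFn = (row: Row, parser: RowParser) => void;

class PausableRowFeeder implements RowParser {
  private readonly rows: Row[];
  private readonly step: StepFn;
  private index = 0;
  private paused = false;
  private aborted = false;

  constructor(rows: Row[], step: StepFn) {
    this.rows = rows;
    this.step = step;
  }

  start(): void {
    this.run();
  }

  pause(): void {
    this.paused = true;
  }

  resume(): void {
    if (!this.paused || this.aborted) return; // aborting is final
    this.paused = false;
    this.run();
  }

  abort(): void {
    this.aborted = true;
  }

  // Deliver rows until the callback pauses or aborts, or the rows run out.
  private run(): void {
    while (!this.paused && !this.aborted && this.index < this.rows.length) {
      const row = this.rows[this.index++];
      this.step(row, this);
    }
  }
}

// Usage: pause on each row, do some asynchronous work, then resume.
const demo = new PausableRowFeeder([["1"], ["2"], ["3"]], (row, parser) => {
  parser.pause();
  // Stand-in for an AJAX request or other asynchronous action.
  Promise.resolve(row[0]).then((value) => {
    console.log("saved row", value);
    parser.resume();
  });
});
demo.start();
```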

Multi-Threading (Workers)

What is a web worker? Why use one?

HTML5 Web Workers facilitate basic multi-threading in the browser. This means that a web page can spawn a new thread in the operating system that runs JavaScript code. This is highly beneficial for long-running scripts that would otherwise lock up the web page.

How do I use a worker?

Just specify worker: true in your config. You'll also need to make a complete callback (unless you're streaming) so that you can get the results, because using a worker makes the parse function asynchronous.

Can I use a worker if I combine/concatenate my JavaScript files?

Probably not. It's safest to concatenate the rest of your dependencies and include Papa Parse in a separate file. Any library that expects to have access to the window or DOM will crash when executed in a worker thread. Only put other libraries in the same file if they are ready to be used in worker threads.

When should I use a worker?

That's up to you. The most typical reason to use a web worker is that your web page becomes unresponsive during parsing: it freezes, you can't click things, or scrolling becomes choppy. When that happens, some browsers (like Firefox) will warn the user that a script has become unresponsive or is taking a long time (even if it's working properly). If this happens to you or some of your users, consider using a web worker, at least for the large inputs.

However, read the next answer for more info. Using workers has performance implications (both good and bad).

What are the performance implications of using a worker thread?

Using a worker will be a little slower. In JavaScript, threads don't share memory. That's really annoying because sharing memory is the primary reason for multi-threading. As such, all parse results in a worker thread need to be copied to the main thread. And if you're parsing a string in a worker thread, that string also needs to be copied into the worker in the first place. (Files will be opened or downloaded by the worker itself, so the input doesn't need to be copied from the main thread in those cases.)
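The copying can be seen directly. postMessage between a page and a worker uses the structured clone algorithm, which gives the receiving side an independent deep copy. This sketch uses structuredClone (available in modern browsers and Node.js 17+), which exposes the same algorithm, to show the effect without creating a worker; the row data is made up, and this is not Papa Parse code.

```typescript
// Illustration of copy semantics between threads, using structuredClone
// in place of an actual postMessage to a worker.
const parsedRows: string[][] = [["name", "age"], ["Ada", "36"]];

// What the receiving thread gets is a deep copy, not a shared reference.
const received = structuredClone(parsedRows);
received[1][0] = "Grace";

console.log(parsedRows[1][0]); // Ada (the sender's data is untouched)
console.log(received === parsedRows); // false (two separate objects in memory)
```

Every row has to be walked and duplicated this way, which is where the extra time and the brief stall on the main page come from.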

The process of sending data between the page and the worker thread can stall the main page for just a moment. Each thread must also wait for the data to finish sending before unblocking.

Basically: if every bit of parsing speed counts, don't use a worker. If you can afford a little extra time, use a worker. It will keep your page from appearing unresponsive and give users an overall better experience.

Can I stream and use a worker at the same time?

Yup. If the input is too large to fit in memory (or large enough to crash the browser), streaming is always the answer, even in a worker thread. Workers keep the page responsive. Streaming lets the input fit in memory. Use both if you need to.

Can I pause/resume workers?

No. This would drastically slow down parsing, as it would require the worker to wait after every chunk for a "continue" signal from the main thread. But you can abort workers by calling .abort() on the parser that gets passed to your callback function.