HTML5's FileReader API lets you read files from disk. The user must choose the file(s), but then you have permission to load them in text or binary format. If you've used it before, you know that it's asynchronous, kind of like XMLHttpRequest: you set up a FileReader with callback functions before you perform a read.
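For reference, the callback setup looks roughly like this. It's a minimal sketch, not code from Papa Parse; the function name `readFileAsync` and the `onText`/`onError` callbacks are made up for illustration.

```javascript
// Read a user-selected File as text with the asynchronous FileReader.
// `onText` and `onError` are hypothetical callbacks supplied by the caller.
function readFileAsync(file, onText, onError) {
  const reader = new FileReader();

  // Handlers are attached before the read is started.
  reader.onload = function () {
    onText(reader.result);
  };
  reader.onerror = function () {
    onError(reader.error);
  };

  // Returns immediately; the text shows up later in onload.
  reader.readAsText(file);
}
```

On a page you might call it as `readFileAsync(input.files[0], console.log, console.error)`, where `input` is assumed to be an `<input type="file">` element.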
But did you know that there is also a synchronous version, FileReaderSync? No callbacks necessary.
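Something like the following, as a minimal sketch (the function name `readTextSync` is just for illustration):

```javascript
// Read a File as text with FileReaderSync.
function readTextSync(file) {
  const reader = new FileReaderSync();
  // No handlers to wire up: the call blocks until the whole file has been read.
  return reader.readAsText(file);
}
```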
Nice, huh! There's one catch, though: it's only available in web workers. This is because reading a file synchronously is a blocking operation and would lock up the web page. Therefore, you can only use FileReaderSync in a separate thread.
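In practice that means the page hands the file to a worker and the worker does the blocking read. Here's a rough sketch of the worker side; the file name `worker.js`, the function `handleFileMessage`, and the message format (the page posts a `File`, the worker posts back its text) are all assumptions for this example.

```javascript
// worker.js (hypothetical file name)

// Reads the File carried by a message and hands the text to `reply`.
function handleFileMessage(event, reply) {
  const reader = new FileReaderSync();
  reply(reader.readAsText(event.data));
}

// Only register the handler when actually running inside a worker.
if (typeof WorkerGlobalScope !== "undefined" && self instanceof WorkerGlobalScope) {
  self.onmessage = function (event) {
    handleFileMessage(event, function (text) {
      self.postMessage(text);
    });
  };
}
```

On the page side, creating the worker with `new Worker("worker.js")` and calling `worker.postMessage(input.files[0])` would kick it off, with `input` again assumed to be an `<input type="file">` element.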
When I rewrote Papa Parse for version 3.0, which would support running in a web worker, I was excited to use this new FileReaderSync thing. One cool feature about Papa is that it can "stream" large files by loading them in pieces so that they fit into memory. Both FileReader APIs support this "chunking" mechanism, so I took advantage of it.
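The chunking itself comes from `slice()` on the file (a `File` is a `Blob`), and either reader can read the resulting piece. Here is a minimal sketch of the synchronous flavor. It is not Papa Parse's actual implementation; `readInChunksSync`, `chunkSize`, and `onChunk` are names invented for this example.

```javascript
// Read `file` piece by piece so that only one chunk is in memory at a time.
// `onChunk` is a hypothetical callback that receives each piece as text.
function readInChunksSync(file, chunkSize, onChunk) {
  const reader = new FileReaderSync();
  let offset = 0;

  while (offset < file.size) {
    const end = Math.min(offset + chunkSize, file.size);
    // slice() takes byte offsets, so a multi-byte character can be split
    // across two chunks; a real parser has to account for that.
    const text = reader.readAsText(file.slice(offset, end));
    onChunk(text);
    offset = end;
  }
}
```

A real parser would also need to carry over any partial row at the end of one chunk into the next; that part is left out here.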
Finally, it was ready to test. I executed it and roughly measured its performance. My first test was in the main thread using FileReader (asynchronous):
Then I tried it in the worker thread using FileReaderSync. Like its asynchronous sister code, this routine was periodically sending updates to the console so I could keep an eye on it. Rather abruptly, updates stopped coming. I had to resort to system-level monitors to ensure my thread hadn't locked up or died. It was still there, but its CPU usage dropped significantly and updates slowed to a crawl:
It took nearly 20x longer to process the file in a dedicated worker thread using a synchronous file reader! What!?
Isn't it odd that its speed sharply, steadily declines at exactly 30 seconds?
Here are the two graphs on the same plane:
I asked the folks on Stack Overflow and Google+, and guess where I got the answer? Google+, believe it or not. Tyler Ault suggested trying the regular, asynchronous FileReader in the worker thread.
I did and it worked.
Performance in the worker was then comparable to that of the main thread. No slowdowns. Why?
I don't know for sure. A couple of theories were discussed:
- The fact that it slowed at exactly 30 seconds is interesting. It's possible the browser was doing some throttling of threads that ran for that long without a pause. (Though that seems to defeat the purpose of using worker threads.)
If you have more light to share on the topic, please feel free to comment (and +mention me to be sure I get it) or tell me on Twitter @mholt6 or something.