Transform Data

In notebook:
FrontEndMasters Networking and Streams
Created at:
2017-09-23
Updated:
2017-11-08
Tags:
backend Node JS JavaScript Fundamentals

fs

We can read a file and stream the file contents to stdout:

var fs = require('fs')

fs.createReadStream('greetz.txt')
  .pipe(process.stdout)

$ echo beep boop > greetz.txt
$ node greetz.js
beep boop

Now let's transform the data before we print it out!


fs

You can chain .pipe() calls together just like the | operator in bash:

var fs = require('fs')

fs.createReadStream('greetz.txt')
  .pipe(...)
  .pipe(process.stdout)

Transform strings to uppercase

Let's transform strings to uppercase.

var fs = require('fs')
var through = require('through2') // ☛ uses this module

fs.createReadStream('greetz.txt')
  .pipe(through(toUpper)) // ☛ just pass through `through`
  .pipe(process.stdout)

function toUpper (buf, enc, next) {
  // ☛ buf is a Node.js Buffer: a binary representation of one chunk of data
  // ☛ enc is the encoding; you can usually just ignore it
  // ☛ next is the callback you call when you are done with this chunk:
  //   the first argument is the error (null if there is none),
  //   the second is the data we want to send out to our stream
  next(null, buf.toString().toUpperCase())
  // in _this_ format the output has to be a Buffer or a string
}

Question: Why do you need to add through2 and not pipe directly?

  • .pipe() expects a stream, not a plain function. through2 wraps our function and turns it into a transform stream. My note: reminds me of functors
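
To make the answer concrete: a bare function like toUpper has to be wrapped in a stream object before it can be piped. Below is a minimal sketch of such a wrapper built on the core stream.Transform class. The name makeThrough is hypothetical, and this is only a simplified stand-in for what through2 offers, not its actual implementation.

```javascript
const { Transform } = require('stream')

// Hypothetical helper: turns a (buf, enc, next) function into a transform stream.
function makeThrough (fn) {
  return new Transform({ transform: fn })
}

// Same transform function as in the notes above.
function toUpper (buf, enc, next) {
  next(null, buf.toString().toUpperCase())
}

// Usage sketch (assumes greetz.txt exists in the current directory):
// require('fs').createReadStream('greetz.txt')
//   .pipe(makeThrough(toUpper))
//   .pipe(process.stdout)
```

The wrapper only supplies the stream plumbing; the transform logic still lives in the plain function.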

All of the above works on chunks of data, which can be of any size. We cannot make assumptions about the chunk size; it is up to the operating system. There are ways to chunk data manually (we'll see later).
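
Because chunk boundaries are arbitrary, a line of text may be split across two chunks. As an illustration of chunking data manually, here is a rough sketch (not necessarily the approach the course shows later) that re-chunks incoming data into whole lines using the core stream.Transform class. createLineSplitter is a hypothetical name, and the sketch assumes that no multi-byte character is split across chunks.

```javascript
const { Transform } = require('stream')

// Hypothetical helper: emits one complete line per item,
// no matter how the incoming data was chunked.
function createLineSplitter () {
  let leftover = '' // partial line carried over from the previous chunk

  return new Transform({
    readableObjectMode: true, // each pushed line stays a separate item
    transform (buf, enc, next) {
      const lines = (leftover + buf.toString()).split('\n')
      leftover = lines.pop() // last piece may be incomplete, keep it for later
      for (const line of lines) this.push(line)
      next()
    },
    flush (done) {
      // emit whatever is left when the input ends without a trailing newline
      if (leftover.length > 0) this.push(leftover)
      done()
    }
  })
}
```

Each item read from this stream is one complete line, so downstream code no longer depends on where the operating system happened to cut the data.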

You can use a library other than through2, or even the core modules, but he prefers this library.

Question: Is it better to use userland packages or core NodeJS modules?

  • He suggests working with userland packages: they are nice to use and need less boilerplate.

He also prefers the readable-stream package over the core stream module, because core can change with a Node.js update and break things (e.g. your users may have different versions of Node.js installed). With a package you get explicit version control. In general, it is better to use packages with semantic versioning.


Reading from stdin instead of a file

stdin

Instead of reading from a file, we could read from stdin. In this case, your input can be the output of another program and not a file!

var through = require('through2');

process.stdin // ☛ just replace this
  .pipe(through(toUpper))
  .pipe(process.stdout)

function toUpper (buf, enc, next) {
  next(null, buf.toString().toUpperCase())
}

Then he just starts his program with $ node loud.js, and whatever he types in the console is transformed to uppercase. Or, with $ <loud.js node loud.js, he redirects a file (here the script itself) into the program's stdin.