Introducing Streams

In notebook:
FrontEndMasters Networking and Streams
Created at:
2017-09-23
Updated:
2017-11-08
Tags:
backend Node JS JavaScript Fundamentals

#streams in NodeJS

streams

Node.js has a handy interface for shuffling data around called streams.

Use them whenever you need some I/O glue.

Examples include compression and other kinds of transformations chained together to build a data pipeline.


stream origins

Streams are an old idea, going back to Unix pipes.

"We should have some ways of connecting programs like garden hose--screw in another segment when it becomes necessary to massage data in another way. This is the way of IO also."

Doug McIlroy, October 11, 1964


why streams?

  • we can compose streaming abstractions
  • we can operate on data chunk by chunk

So instead of keeping large objects in memory, you work with smaller bits. For example, when streaming a video file you don't have to read the entire file into memory; you read it in chunks and serve it in chunks, as in the sketch below.
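A minimal sketch of that idea, using only core modules (the file name 'video.mp4' and port 8000 are just placeholder assumptions):

var http = require('http')
var fs = require('fs')

http.createServer(function (req, res) {
  // each chunk is read from disk and written to the response as it
  // becomes available, so memory use stays small even for huge files
  fs.createReadStream('video.mp4').pipe(res)
}).listen(8000)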


composition

Just like how in Unix we can pipe commands together:

$ <mobydick.txt.gz gunzip | sed -r 's/\s+/\n/g' | grep -i whale | wc -l
1691

This tells us how many times "whale" appears in the book. In Node.js you solve problems in a similar way.


We can pipe abstractions together with streams using .pipe():

Some pseudocode to do the same:

fs.createReadStream('mobydick.txt.gz')
  .pipe(zlib.createGunzip())      // take compressed input, output uncompressed data
  .pipe(replace(/\s+/g, '\n'))    // replace runs of whitespace with newlines
  .pipe(filter(/whale/i))         // keep only the lines that mention whale
  .pipe(linecount(console.log))   // count those lines and log the total

Question: so these would be sequential?

Sometimes a step has to buffer up some data internally (compression, encryption), but basically yes: each chunk feeds through the pipeline above, passing through each step one by one.
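The replace/filter/linecount helpers above are pseudocode; here's one rough, runnable sketch of the same pipeline using only core modules and a hand-rolled Transform (assuming mobydick.txt.gz sits in the working directory):

var fs = require('fs')
var zlib = require('zlib')
var Transform = require('stream').Transform

var count = 0
var leftover = ''

fs.createReadStream('mobydick.txt.gz')
  .pipe(zlib.createGunzip())
  .pipe(new Transform({
    transform: function (chunk, enc, next) {
      // split on whitespace; the last token may continue in the next chunk
      var words = (leftover + chunk.toString()).split(/\s+/)
      leftover = words.pop()
      words.forEach(function (word) {
        if (/whale/i.test(word)) count++
      })
      next()
    },
    flush: function (done) {
      // the stream has ended, so check the final token and print the total
      if (/whale/i.test(leftover)) count++
      console.log(count)
      done()
    }
  }))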


chunk by chunk

With streams, we can operate on data chunk by chunk, without buffering everything into memory.

This means we can write programs that operate on very large files or lazily evaluate network data as it arrives!

It also means we can have hundreds or thousands of concurrent streams without using much memory.
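As a tiny illustration of working chunk by chunk (a sketch, not from the original notes): a transform that upper-cases whatever flows through it, handling each chunk as it arrives instead of buffering the whole input.

var Transform = require('stream').Transform

// each chunk is transformed and passed along immediately;
// nothing accumulates, so memory stays flat however much data flows through
var upper = new Transform({
  transform: function (chunk, enc, next) {
    next(null, chunk.toString().toUpperCase())
  }
})

process.stdin.pipe(upper).pipe(process.stdout)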


fs

We can read a file and stream the file contents to stdout:

var fs = require('fs')

fs.createReadStream('greetz.txt') // or process.argv[2]
  .pipe(process.stdout)

This is like the UNIX cat command.
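Following the comment above, a variant (just a sketch) that takes the file name from the command line, so it behaves even more like cat:

// cat.js: stream whatever file is named on the command line to stdout
var fs = require('fs')

fs.createReadStream(process.argv[2])
  .pipe(process.stdout)

Run it with something like: node cat.js greetz.txt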