
Show HN: s3-lambda – Lambda functions over S3 objects; each, map, reduce, filter

s3-lambda

s3-lambda enables you to run lambda functions over a context of S3 objects. It has a stateless architecture with concurrency control, allowing you to process a large number of files very quickly. This is useful for quickly prototyping complex data jobs without an infrastructure like Hadoop or Spark.

At Littlstar, we use s3-lambda for all sorts of data pipelining and analytics.

Install

npm install s3-lambda --save

Quick Example

const S3Lambda = require('s3-lambda');

// example options
const lambda = new S3Lambda({
  access_key_id: 'aws-access-key',
  secret_access_key: 'aws-secret-key',
  show_progress: true,
  verbose: true,
  max_retries: 10,
  timeout: 1000
});

const bucket = 'my-bucket';
const prefix = 'path/to/files/';

lambda
  .context(bucket, prefix)
  .forEach(object => {
    // do something with object
  })
  .then(_ => console.log('done!'))
  .catch(console.error);

Setting Context

Before initiating a lambda expression, you must tell s3-lambda which files to operate over. You do this by calling context, which returns a promise, so you can chain it with the request. The context function takes five arguments: bucket, prefix, marker, limit, and reverse.

lambda.context(
  bucket, // the s3 bucket to use
  prefix, // the prefix of the files to use - s3-lambda will operate over every file with this prefix
  marker, // (optional, default null) start at this file/prefix
  limit,  // (optional, default Infinity) limit the # of files operated over
  reverse // (optional, default false) if true, operate over all files in reverse
) // .forEach() ... you can chain functions here
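
To make the optional arguments concrete, here is a minimal sketch of how prefix, marker, limit, and reverse could narrow a list of keys. selectKeys is a hypothetical helper, not part of s3-lambda. It assumes the marker behaves like the S3 list marker (only keys sorting after the marker are included) and that limit is applied after reversing; the library may handle either differently.

```javascript
// Hypothetical helper for illustration only; not part of s3-lambda.
function selectKeys(keys, { prefix, marker = null, limit = Infinity, reverse = false }) {
  // Keep only keys under the prefix, in sorted order (filter returns a copy).
  let selected = keys.filter(key => key.startsWith(prefix)).sort();

  // Assumption: like the S3 list marker, start after the marker key.
  if (marker !== null) {
    selected = selected.filter(key => key > marker);
  }

  // Assumption: reverse first, then apply the limit.
  if (reverse) {
    selected.reverse();
  }
  return selected.slice(0, limit);
}

const keys = [
  'path/to/files/a.json',
  'path/to/files/b.json',
  'path/to/files/c.json',
  'path/to/other/d.json'
];

console.log(selectKeys(keys, {
  prefix: 'path/to/files/',
  marker: 'path/to/files/a.json',
  limit: 1
}));
```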

You can also provide an array of contexts like this:

const ctx1 = {
  bucket: 'my-bucket',
  prefix: 'path/to/files/'
  // marker: 'path/to/files/somefile'
};

const ctx2 = {
  bucket: 'my-other-bucket',
  prefix: 'path/to/files/'
  // marker: 'path/to/files/somefile'
};

lambda.context([ctx1, ctx2]) // .map() ...

Lambda Functions

Perform synchronous or asynchronous functions over each file in a directory.

  • each
  • forEach
  • map
  • reduce
  • filter

each

each(fn[, isasync])

Performs fn on each S3 object in parallel. You can set the concurrency level (defaults to Infinity). If isasync is true, fn should return a Promise.

lambda
  .context(bucket, prefix)
  .concurrency(5) // operates on 5 objects at a time
  .each(object => console.log(object))
  .then(_ => console.log('done!'))
  .catch(console.error);
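
The effect of a concurrency level can be pictured with a small promise pool. This is a sketch of the general technique, not s3-lambda's actual implementation; eachLimit and slowTask are hypothetical names used only for illustration.

```javascript
// Hypothetical sketch of concurrency-limited iteration; not s3-lambda's code.
function eachLimit(items, limit, fn) {
  let next = 0; // index of the next item to start

  // Each worker pulls the next unstarted item until none remain.
  async function worker() {
    while (next < items.length) {
      const item = items[next++];
      await fn(item);
    }
  }

  // Start at most `limit` workers; they share the `next` counter.
  const workerCount = Math.min(limit, items.length);
  const workers = Array.from({ length: workerCount }, () => worker());
  return Promise.all(workers).then(() => undefined);
}

// Usage: track how many calls are in flight at once.
let inFlight = 0;
let maxInFlight = 0;
const slowTask = item =>
  new Promise(resolve => {
    inFlight += 1;
    maxInFlight = Math.max(maxInFlight, inFlight);
    setTimeout(() => {
      inFlight -= 1;
      resolve(item);
    }, 10);
  });

const finished = eachLimit([1, 2, 3, 4, 5, 6, 7, 8], 3, slowTask)
  .then(() => console.log('max in flight:', maxInFlight));
```

With a limit of 3, a new task starts only when one of the three running tasks finishes, which is the behavior the concurrency setting describes.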

forEach

forEach(fn[, isasync])

Iterates over each file in an s3 directory and performs fn. If isasync is true, fn should return a Promise.

lambda
  .context(bucket, prefix)
  .forEach(object => { /* do something with object */ })
  .then(_ => console.log('done!'))
  .catch(console.error);
map

map(fn[, isasync])

Destructive. Maps fn over each file in an s3 directory, replacing each file with what is returned from the mapper function. If isasync is true, fn should return a Promise.

const addSmiley = object => object + ':)';

lambda
  .context(bucket, prefix)
  .map(addSmiley)
  .then(_ => console.log('done!'))
  .catch(console.error);

You can make this non-destructive by specifying an output directory.

const outputBucket = 'my-bucket';
const outputPrefix = 'path/to/output/';

lambda
  .context(bucket, prefix)
  .output(outputBucket, outputPrefix)
  .map(addSmiley)
  .then(_ => console.log('done!'))
  .catch(console.error);
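
The difference between the destructive and non-destructive forms can be sketched with an in-memory Map standing in for a bucket. mapStore is a hypothetical helper, not part of s3-lambda, and the way output keys are derived here (swapping the source prefix for the output prefix) is an assumption about naming, not documented behavior.

```javascript
// In-memory stand-in for S3: Map of key -> file contents. Illustration only.
const store = new Map([
  ['path/to/files/a.txt', 'hello'],
  ['path/to/files/b.txt', 'world']
]);

const addSmiley = object => object + ':)';

// Hypothetical helper: applies mapper to every key under prefix.
// Without outputPrefix, results overwrite the source keys (destructive).
// With outputPrefix, results go to new keys and the sources are untouched.
function mapStore(store, prefix, mapper, outputPrefix = null) {
  // Snapshot the entries so keys written during the loop are not revisited.
  for (const [key, body] of Array.from(store)) {
    if (!key.startsWith(prefix)) continue;
    // Assumption: the output key swaps the source prefix for the output prefix.
    const target = outputPrefix === null
      ? key
      : outputPrefix + key.slice(prefix.length);
    store.set(target, mapper(body));
  }
}

mapStore(store, 'path/to/files/', addSmiley, 'path/to/output/');
console.log(store.get('path/to/files/a.txt'));  // hello
console.log(store.get('path/to/output/a.txt')); // hello:)
```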

reduce

reduce(func[, isasync])

Reduces the objects in the working context to a single value.

// concatenates all the files
const reducer = (previousValue, currentValue, key) => {
  return previousValue + currentValue;
};

lambda
  .context(bucket, prefix)
  .reduce(reducer)
  .then(result => { /* do something with result */ })
  .catch(console.error);
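
A reducer does not have to concatenate. The sketch below accumulates a number instead and exercises the reducer with Array.prototype.reduce over in-memory entries rather than S3. totalLength and the objects array are hypothetical. This section does not say how s3-lambda seeds the first previousValue, so the reducer treats a missing value as 0; that is an assumption.

```javascript
// Reducer with the (previousValue, currentValue, key) shape described above.
// Assumption: previousValue may be undefined on the first call, so default to 0.
const totalLength = (previousValue, currentValue, key) => {
  const runningTotal = previousValue === undefined ? 0 : previousValue;
  return runningTotal + currentValue.length;
};

// In-memory stand-in for the objects in a context (illustration only).
const objects = [
  { key: 'path/to/files/a.txt', body: 'hello' },
  { key: 'path/to/files/b.txt', body: 'world!' }
];

// Feed each object through the reducer, starting with no previous value.
const total = objects.reduce(
  (acc, { key, body }) => totalLength(acc, body, key),
  undefined
);

console.log(total); // 11
```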

filter

filter(func[, isasync])

Destructive. Filters (deletes) files in s3. func should return true to keep the object, and false to delete it. If isasync is true, func should return a Promise.

// filters empty files
const fn = object => object.length > 0;

lambda
  .context(bucket, prefix)
  .filter(fn)
  .then(_ => console.log('done!'))
  .catch(console.error);

Just like in map, you can make this non-destructive by specifying an output directory.

lambda
  .context(bucket, prefix)
  .output(outputBucket, outputPrefix)
  .filter(fn)
  .then(_ => console.log('done!'))
  .catch(console.error);
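
As another example of a predicate, the sketch below keeps only objects whose contents parse as JSON. It runs against an in-memory Map rather than S3, so the deletion loop is a stand-in for the destructive behavior described above, not s3-lambda's code. isValidJson is a hypothetical name, and it assumes each object arrives as a string.

```javascript
// Predicate: keep objects whose body parses as JSON, drop everything else.
// Assumption: the object is passed to the predicate as a string.
const isValidJson = object => {
  try {
    JSON.parse(object);
    return true;
  } catch (err) {
    return false;
  }
};

// In-memory stand-in for a bucket (illustration only, not s3-lambda).
const store = new Map([
  ['logs/a.json', '{"ok": true}'],
  ['logs/b.json', 'not json'],
  ['logs/c.json', '']
]);

// Destructive filter: delete every entry for which the predicate returns false.
for (const [key, body] of Array.from(store)) {
  if (!isValidJson(body)) store.delete(key);
}

console.log(Array.from(store.keys())); // [ 'logs/a.json' ]
```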

S3 Functions

Promise-based wrapper around common S3 methods.

  • list
  • keys
  • get
  • put
  • copy
  • delete

list

list(bucket, prefix[, marker])

List all keys in s3://bucket/prefix. If you use a marker, s3-lambda will start listing alphabetically from there.

lambda
  .list(bucket, prefix)
  .then(list => console.log(list))
  .catch(console.error);

keys

keys(bucket, prefix[, marker])

Returns an array of keys for the given bucket and prefix.

lambda
  .keys(bucket, prefix)
  .then(keys => console.log(keys))
  .catch(console.error);

get

get(bucket, key[, encoding[, transformer]])

Gets an object in s3, calling toString(encoding) on objects.

lambda
  .get(bucket, key)
  .then(object => { /* do something with object */ })
  .catch(console.error);

Optionally you can supply your own transformer function to use when retrieving objects.

const zlib = require('zlib');

const transformer = object => {
  return zlib.gunzipSync(object).toString('utf8');
};

lambda
  .get(bucket, key, null, transformer)
  .then(object => { /* do something with object */ })
  .catch(console.error);
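
A transformer like this can be checked locally before pointing it at real objects. The sketch below gzips a small payload with Node's zlib and passes it through the same transformer. It assumes the transformer receives the raw object body as a Buffer, which this section does not state explicitly. gzippedBody is a made-up sample.

```javascript
const zlib = require('zlib');

// Same shape as the transformer above: compressed body in, decoded string out.
// Assumption: the body handed to the transformer is a Buffer.
const transformer = object => zlib.gunzipSync(object).toString('utf8');

// Simulate a gzipped object body as it might be stored in S3.
const gzippedBody = zlib.gzipSync(Buffer.from('{"hello":"world"}', 'utf8'));

const decoded = transformer(gzippedBody);
console.log(JSON.parse(decoded).hello); // world
```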

put

put(bucket, key, object[, encoding])

Puts an object in s3. Default encoding is utf8.

lambda
  .put(bucket, key, 'hello world!')
  .then(_ => console.log('done!'))
  .catch(console.error);

copy

copy(sourceBucket, sourceKey, targetBucket, targetKey)

Copies an object in s3 from s3://sourceBucket/sourceKey to s3://targetBucket/targetKey.

lambda
  .copy(sourceBucket, sourceKey, targetBucket, targetKey)
  .then(_ => console.log('done!'))
  .catch(console.error);

delete

delete(bucket, key)

Deletes an object in s3 (s3://bucket/key).

lambda
  .delete(bucket, key)
  .then(_ => console.log('done!'))
  .catch(console.error);
