# s3-lambda

**s3-lambda** enables you to run lambda functions over a context of S3 objects. It has a stateless architecture with concurrency control, allowing you to process a large number of files very quickly. This is useful for prototyping complex data jobs without an infrastructure like Hadoop or Spark.
At Littlstar, we use **s3-lambda** for all sorts of data pipelining and analytics.
## Install

```bash
npm install s3-lambda --save
```
## Quick Example
```js
const S3Lambda = require('s3-lambda');

// example options
const lambda = new S3Lambda({
  access_key_id: 'aws-access-key',
  secret_access_key: 'aws-secret-key',
  show_progress: true,
  verbose: true,
  max_retries: 10,
  timeout: 1000
});

const bucket = 'my-bucket';
const prefix = 'path/to/files/';

lambda
  .context(bucket, prefix)
  .forEach(object => {
    // do something with object
  })
  .then(_ => console.log('done!'))
  .catch(console.error);
```
## Setting Context
Before initiating a lambda expression, you must tell **s3-lambda** what files to operate over. You do this by calling `context`, which returns a promise, so you can chain it with the request. The context function takes five arguments: `bucket`, `prefix`, `marker`, `limit`, and `reverse`.
```js
lambda.context(
  bucket,  // the S3 bucket to use
  prefix,  // the prefix of the files to use - s3-lambda will operate over every file with this prefix
  marker,  // (optional, default null) start at this file/prefix
  limit,   // (optional, default Infinity) limit the # of files operated over
  reverse  // (optional, default false) if true, operate over all files in reverse
) // .forEach() ... you can chain functions here
```
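For example, a sketch using the optional arguments (the marker value and limit here are made up for illustration):

```js
// start after 'path/to/files/0100', operate over at most 500 files, in reverse
lambda
  .context('my-bucket', 'path/to/files/', 'path/to/files/0100', 500, true)
  .forEach(object => { /* do something with object */ })
  .then(_ => console.log('done!'))
  .catch(console.error);
```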
You can also provide an array of contexts like this:
```js
const ctx1 = {
  bucket: 'my-bucket',
  prefix: 'path/to/files/'
  // marker: 'path/to/files/somefile'
};

const ctx2 = {
  bucket: 'my-other-bucket',
  prefix: 'path/to/files/'
  // marker: 'path/to/files/somefile'
};

lambda.context([ctx1, ctx2]) // .map() ...
```
## Lambda Functions

Perform synchronous or asynchronous functions over each file in a directory.

- `each`
- `forEach`
- `map`
- `reduce`
- `filter`
### each

`each(fn[, isasync])`

Performs `fn` on each S3 object in parallel. You can set the concurrency level (defaults to `Infinity`). If `isasync` is true, `fn` should return a Promise.
```js
lambda
  .context(bucket, prefix)
  .concurrency(5) // operates on 5 objects at a time
  .each(object => console.log(object))
  .then(_ => console.log('done!'))
  .catch(console.error);
```
### forEach

`forEach(fn[, isasync])`

Iterates over each file in an S3 directory and performs `fn`. If `isasync` is true, `fn` should return a Promise.
```js
lambda
  .context(bucket, prefix)
  .forEach(object => { /* do something with object */ })
  .then(_ => console.log('done!'))
  .catch(console.error);
```
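If the work is asynchronous, return a Promise and pass `true` for `isasync`; a minimal sketch with a simulated delay:

```js
// a sketch: an async iterator that resolves after a simulated delay
const fn = object => new Promise(resolve => {
  setTimeout(() => {
    console.log(object.length);
    resolve();
  }, 100);
});

lambda
  .context(bucket, prefix)
  .forEach(fn, true) // isasync = true
  .then(_ => console.log('done!'))
  .catch(console.error);
```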
### map

`map(fn[, isasync])`

**Destructive.** Maps `fn` over each file in an S3 directory, replacing each file with what is returned from the mapper function. If `isasync` is true, `fn` should return a Promise.
```js
const addSmiley = object => object + ':)';

lambda
  .context(bucket, prefix)
  .map(addSmiley)
  .then(_ => console.log('done!'))
  .catch(console.error);
```
You can make this non-destructive by specifying an `output` directory.
```js
const outputBucket = 'my-bucket';
const outputPrefix = 'path/to/output/';

lambda
  .context(bucket, prefix)
  .output(outputBucket, outputPrefix)
  .map(addSmiley)
  .then(_ => console.log('done!'))
  .catch(console.error);
```
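The mapper can also be asynchronous; a sketch that upper-cases each file into the output directory (`isasync` = true, so the mapper returns a Promise):

```js
// a sketch: an async mapper that upper-cases each file's contents
const toUpper = object => Promise.resolve(object.toString().toUpperCase());

lambda
  .context(bucket, prefix)
  .output(outputBucket, outputPrefix)
  .map(toUpper, true)
  .then(_ => console.log('done!'))
  .catch(console.error);
```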
### reduce

`reduce(fn[, isasync])`

Reduces the objects in the working context to a single value.
```js
// concatenates all the files
const reducer = (previousValue, currentValue, key) => {
  return previousValue + currentValue;
};

lambda
  .context(bucket, prefix)
  .reduce(reducer)
  .then(result => { /* do something with result */ })
  .catch(console.error);
```
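For instance, a sketch that sums the size of every file in the context (it assumes the accumulator starts out undefined on the first call, hence the `|| 0` guard):

```js
// a sketch: compute the total byte length of all files under the prefix
const totalSize = (previousValue, currentValue, key) => {
  return (previousValue || 0) + currentValue.length;
};

lambda
  .context(bucket, prefix)
  .reduce(totalSize)
  .then(total => console.log('total bytes:', total))
  .catch(console.error);
```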
### filter

`filter(fn[, isasync])`

**Destructive.** Filters (deletes) files in S3. `fn` should return `true` to keep the object, and `false` to delete it. If `isasync` is true, `fn` should return a Promise.
```js
// filters out empty files
const fn = object => object.length > 0;

lambda
  .context(bucket, prefix)
  .filter(fn)
  .then(_ => console.log('done!'))
  .catch(console.error);
```
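An asynchronous predicate works the same way; a sketch that keeps only files containing the string 'ERROR':

```js
// a sketch: an async predicate (isasync = true) resolving to true or false
const keepErrors = object => Promise.resolve(object.indexOf('ERROR') !== -1);

lambda
  .context(bucket, prefix)
  .filter(keepErrors, true)
  .then(_ => console.log('done!'))
  .catch(console.error);
```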
Just like in `map`, you can make this non-destructive by specifying an `output` directory.
```js
lambda
  .context(bucket, prefix)
  .output(outputBucket, outputPrefix)
  .filter(fn)
  .then(_ => console.log('done!'))
  .catch(console.error);
```
## S3 Functions

Promise-based wrappers around common S3 methods.

- `list`
- `keys`
- `get`
- `put`
- `copy`
- `delete`
### list

`list(bucket, prefix[, marker])`

Lists all keys in `s3://bucket/prefix`. If you supply a marker, **s3-lambda** will start listing alphabetically from there.
```js
lambda
  .list(bucket, prefix)
  .then(list => console.log(list))
  .catch(console.error);
```
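A sketch using the optional marker (the key is made up for illustration):

```js
// start listing alphabetically after this key
lambda
  .list(bucket, prefix, 'path/to/files/0100')
  .then(list => console.log(list))
  .catch(console.error);
```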
### keys

`keys(bucket, prefix[, marker])`

Returns an array of keys for the given `bucket` and `prefix`.
```js
lambda
  .keys(bucket, prefix)
  .then(keys => console.log(keys))
  .catch(console.error);
```
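A sketch that chains `keys` with `get` to fetch the first object under the prefix:

```js
lambda
  .keys(bucket, prefix)
  .then(keys => lambda.get(bucket, keys[0]))
  .then(object => { /* do something with object */ })
  .catch(console.error);
```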
### get

`get(bucket, key[, encoding[, transformer]])`

Gets an object in S3, calling `toString(encoding)` on objects.
```js
lambda
  .get(bucket, key)
  .then(object => { /* do something with object */ })
  .catch(console.error);
```
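The optional `encoding` is passed through to `toString`; a sketch that reads an object as a base64 string:

```js
lambda
  .get(bucket, key, 'base64')
  .then(object => { /* object is a base64-encoded string */ })
  .catch(console.error);
```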
Optionally, you can supply your own transformer function to use when retrieving objects.
```js
const zlib = require('zlib');

const transformer = object => {
  return zlib.gunzipSync(object).toString('utf8');
};

lambda
  .get(bucket, key, null, transformer)
  .then(object => { /* do something with object */ })
  .catch(console.error);
```
### put

`put(bucket, key, object[, encoding])`

Puts an object in S3. Default encoding is `utf8`.
```js
lambda
  .put(bucket, key, 'hello world!')
  .then(_ => console.log('done!'))
  .catch(console.error);
```
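Anything you can serialize to a string works; a sketch that stores a JSON payload with the default `utf8` encoding (the key is made up for illustration):

```js
// a sketch: store a JSON payload as utf8 text
const payload = JSON.stringify({ hello: 'world' });

lambda
  .put(bucket, 'path/to/output/payload.json', payload)
  .then(_ => console.log('done!'))
  .catch(console.error);
```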
### copy

`copy(sourceBucket, sourceKey, targetBucket, targetKey)`

Copies an object in S3 from `s3://sourceBucket/sourceKey` to `s3://targetBucket/targetKey`.
```js
lambda
  .copy(sourceBucket, sourceKey, targetBucket, targetKey)
  .then(_ => console.log('done!'))
  .catch(console.error);
```
### delete

`delete(bucket, key)`

Deletes an object in S3 (`s3://bucket/key`).
```js
lambda
  .delete(bucket, key)
  .then(_ => console.log('done!'))
  .catch(console.error);
```
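Since these methods all return promises, they compose; for example, a sketch of a 'move' built from `copy` and `delete`:

```js
// a sketch: move an object by copying it, then deleting the original
lambda
  .copy(sourceBucket, sourceKey, targetBucket, targetKey)
  .then(_ => lambda.delete(sourceBucket, sourceKey))
  .then(_ => console.log('moved!'))
  .catch(console.error);
```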