s3

DEPRECATED

This component is deprecated and will be removed in the next major version release. Please consider moving to alternative components.

Downloads objects within an Amazon S3 bucket, optionally filtered by a prefix. If an SQS queue has been configured then only object keys read from the queue will be downloaded.

# Common config fields, showing default values
input:
  label: ""
  s3:
    bucket: ""
    prefix: ""
    sqs_url: ""
    sqs_body_path: Records.*.s3.object.key
    sqs_bucket_path: ""
    sqs_envelope_path: ""
    region: eu-west-1

Alternatives

This input is being replaced by the new aws_s3 input, which has improved features. Consider trying it out instead.

If an SQS queue is not specified, the entire list of objects found when this input starts will be consumed. Note that the prefix configuration is only used when downloading objects without SQS configured.
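
As a minimal sketch of the non-SQS mode described above, the following config would list and download every object under a prefix when the input starts. The bucket name and prefix are hypothetical placeholders.

```yaml
input:
  s3:
    bucket: example-bucket   # hypothetical bucket name
    prefix: logs/2021/       # hypothetical prefix; only used because sqs_url is unset
    region: eu-west-1
```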

If your bucket is configured to send events directly to an SQS queue then you need to set the sqs_body_path field to a dot path where the object key is found in the payload. However, it is also common practice to send bucket events to an SNS topic which sends enveloped events to SQS, in which case you must also set the sqs_envelope_path field to where the payload can be found.
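
The two event routes above might be configured roughly as follows. This is a sketch under assumptions: the queue URL and bucket name are hypothetical, and the envelope path `Message` assumes the standard SNS notification format (without raw message delivery), where the original S3 event is carried inside the `Message` field.

```yaml
input:
  s3:
    bucket: example-bucket  # hypothetical bucket name
    sqs_url: https://sqs.eu-west-1.amazonaws.com/123456789012/example-queue  # hypothetical queue
    # Where the object key sits within the S3 event payload (this is the default).
    sqs_body_path: Records.*.s3.object.key
    # Only needed for S3 -> SNS -> SQS; leave as "" when S3 notifies SQS directly.
    # Assumes the SNS envelope carries the S3 event in its "Message" field.
    sqs_envelope_path: Message
    region: eu-west-1
```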

When using SQS events it's also possible to extract target bucket names from the events by specifying a path in the field sqs_bucket_path. For each SQS event, if that path exists and contains a string it will be used as the bucket of the download instead of the bucket field.
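
For example, a single queue receiving events from several buckets could be handled with something like the sketch below. The path `Records.*.s3.bucket.name` is an assumption based on the standard S3 event notification structure, and the bucket and queue names are hypothetical.

```yaml
input:
  s3:
    bucket: fallback-bucket  # hypothetical; used when the path below is missing from an event
    sqs_url: https://sqs.eu-west-1.amazonaws.com/123456789012/example-queue  # hypothetical queue
    sqs_body_path: Records.*.s3.object.key
    # Assumed location of the bucket name within a standard S3 event.
    sqs_bucket_path: Records.*.s3.bucket.name
    region: eu-west-1
```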

Here is a guide for setting up an SQS queue that receives events for new S3 bucket objects:

https://docs.aws.amazon.com/AmazonS3/latest/dev/ways-to-add-notification-config-to-bucket.html

WARNING: When using SQS please make sure you have sensible values for sqs_max_messages and also the visibility timeout of the queue itself.

When Benthos consumes an S3 item as a result of receiving an SQS message the message is not deleted until the S3 item has been sent onwards. This ensures at-least-once crash resiliency, but also means that if the S3 item takes longer to process than the visibility timeout of your queue then the same items might be processed multiple times.
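
One possible way to reduce the chance of duplicates is to fetch fewer SQS messages per request, so that each object spends less time waiting behind others before it is sent onwards. The value below is illustrative only and the names are hypothetical; the right setting depends on object size and downstream throughput, and the visibility timeout itself is configured on the queue in AWS rather than in this input.

```yaml
input:
  s3:
    bucket: example-bucket  # hypothetical bucket name
    sqs_url: https://sqs.eu-west-1.amazonaws.com/123456789012/example-queue  # hypothetical queue
    # Illustrative value: fetch one message at a time (the default is 10).
    sqs_max_messages: 1
    region: eu-west-1
```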

Credentials

By default Benthos will use a shared credentials file when connecting to AWS services. It's also possible to set credentials explicitly at the component level, allowing you to transfer data across accounts. You can find out more in this document.
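
As a sketch of component-level credentials, the config below assumes a role in another account using the credentials fields documented further down. The role ARN, external ID, and bucket name are hypothetical.

```yaml
input:
  s3:
    bucket: example-bucket  # hypothetical bucket in the target account
    region: eu-west-1
    credentials:
      role: arn:aws:iam::123456789012:role/example-reader  # hypothetical role ARN
      role_external_id: example-external-id                # hypothetical external ID
```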

Metadata

This input adds the following metadata fields to each message:

- s3_key
- s3_bucket
- s3_last_modified_unix*
- s3_last_modified (RFC3339)*
- s3_content_type*
- s3_content_encoding*
- All user defined metadata*

* Only added when NOT using the download manager

You can access these metadata fields using function interpolation.
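
For instance, the metadata could be referenced from a log processor as sketched below. This assumes a Benthos version whose interpolation functions use the `${! meta("...") }` form; older releases may use a different syntax, so check the interpolation documentation for your version.

```yaml
pipeline:
  processors:
    - log:
        level: INFO
        # s3_key and s3_bucket are set regardless of the download manager setting.
        message: 'Consumed ${! meta("s3_key") } from bucket ${! meta("s3_bucket") }'
```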

Fields

bucket

The bucket to consume from. If sqs_bucket_path is set this field is still required as a fallback.

Type: string
Default: ""

prefix

An optional path prefix. If set, only objects with the prefix are consumed. This field is ignored when SQS is used.

Type: string
Default: ""

sqs_url

An optional SQS URL to connect to. When specified this queue will control which objects are downloaded from the target bucket.

Type: string
Default: ""

sqs_body_path

A dot path where object keys are found in SQS messages. This field is only required when an sqs_url is specified.

Type: string
Default: "Records.*.s3.object.key"

sqs_bucket_path

An optional dot path where the bucket of an object can be found in consumed SQS messages.

Type: string
Default: ""

sqs_envelope_path

An optional dot path of enveloped payloads to extract from SQS messages. This is required when pushing events from S3 to SNS to SQS.

Type: string
Default: ""

sqs_max_messages

The maximum number of SQS messages to consume from each request.

Type: int
Default: 10

sqs_endpoint

A custom endpoint to use when connecting to SQS.

Type: string
Default: ""

region

The AWS region to target.

Type: string
Default: "eu-west-1"

endpoint

Allows you to specify a custom endpoint for the AWS API.

Type: string
Default: ""

credentials

Optional manual configuration of AWS credentials to use. More information can be found in this document.

Type: object

credentials.profile

A profile from ~/.aws/credentials to use.

Type: string
Default: ""

credentials.id

The ID of credentials to use.

Type: string
Default: ""

credentials.secret

The secret for the credentials being used.

Type: string
Default: ""

credentials.token

The token for the credentials being used. Required when using short-term credentials.

Type: string
Default: ""

credentials.role

A role ARN to assume.

Type: string
Default: ""

credentials.role_external_id

An external ID to provide when assuming a role.

Type: string
Default: ""

retries

The maximum number of times to attempt an object download.

Type: int
Default: 3

force_path_style_urls

Forces the client API to use path style URLs, which helps when connecting to custom endpoints.

Type: bool
Default: false
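
As an illustration, pointing the input at a locally hosted S3-compatible service might look like the sketch below. The endpoint URL and bucket name are hypothetical, and whether path style URLs are needed depends on the service in question.

```yaml
input:
  s3:
    bucket: example-bucket           # hypothetical bucket name
    endpoint: http://localhost:9000  # hypothetical S3-compatible endpoint
    force_path_style_urls: true      # address the bucket in the URL path rather than the hostname
    region: eu-west-1
```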

delete_objects

Whether to delete downloaded objects from the bucket.

Type: bool
Default: false

download_manager

Controls if and how to use the download manager API. This can help speed up file downloads, but results in file metadata not being copied.

Type: object

download_manager.enabled

Whether to use the download manager API.

Type: bool
Default: true
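
Since the download manager is enabled by default, the metadata fields marked with an asterisk in the Metadata section are not added unless it is switched off. A minimal sketch, with a hypothetical bucket name:

```yaml
input:
  s3:
    bucket: example-bucket  # hypothetical bucket name
    region: eu-west-1
    download_manager:
      enabled: false  # trades download speed for object metadata such as s3_content_type
```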

timeout

The period of time to wait before abandoning a request and trying again.

Type: string
Default: "5s"