AWS Compute Blog

Building an image searching solution with the AWS CDK

This post is written by Mohsen Damshenas, Partner Solutions Architect, Global System Integrators.

This post explains a fully serverless solution for searching images based on their content. Users upload images to Amazon S3, and Amazon Rekognition analyzes them. The solution stores the results through the Amazon Aurora Serverless Data API and manages events with Amazon EventBridge. It is deployed with the AWS Cloud Development Kit (AWS CDK).

Overview

The example application is decoupled and stateless, following serverless architecture best practices:

Reference architecture

In this architecture:

  1. Amazon S3 events: when an image is uploaded, S3 invokes an AWS Lambda function asynchronously, so the upload does not wait for processing to finish (step 6).
  2. Amazon Simple Queue Service (SQS): SQS offers standard and first-in, first-out (FIFO) queues. Your application sends messages to a queue through the SQS API, and a consumer such as a Lambda function receives and processes them (step 8).
  3. Amazon EventBridge: a key component of event-driven architectures that decouples components using a publish-subscribe pattern. Its integration with AWS services makes it well suited to event-driven applications built around an event bus (steps 10 and 11).
  4. Amazon Aurora Serverless: a serverless relational database whose Data API lets applications run queries and mutations through API calls (step 12).

The end user authenticates with Amazon Cognito and is redirected to a landing page served by Amazon API Gateway. This loads an HTML page from a Lambda function. The user can then upload images using S3’s presigned URL feature.

Upload image UI
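With the implicit flow that this app client uses, the Cognito hosted UI redirects the user back to the callback URL with the tokens in the URL fragment (the part after `#`). The following is a minimal sketch, not the sample's landing-page code, of extracting the ID token and building the `Authorization` header that the API Gateway authorizer reads. The redirect URL and token values are placeholders, and in the real landing page this logic would run in the browser.

```python
from urllib.parse import parse_qs, urlsplit


def tokens_from_redirect(url: str) -> dict:
    """Extract tokens from a Cognito implicit-flow redirect URL.

    The hosted UI appends tokens to the fragment, for example:
    https://example.com/landing#id_token=...&access_token=...&token_type=Bearer
    """
    fragment = urlsplit(url).fragment
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in parse_qs(fragment).items()}


def auth_headers(tokens: dict) -> dict:
    # A Cognito user pool authorizer reads the header named in its identity
    # source ("Authorization" here). Without OAuth scopes on the method,
    # it expects the ID token.
    return {"Authorization": tokens["id_token"]}


# Placeholder values, not real tokens.
redirect = ("https://example.com/landing"
            "#id_token=eyJ.ID.TOKEN&access_token=eyJ.ACCESS&expires_in=3600&token_type=Bearer")
tokens = tokens_from_redirect(redirect)
print(auth_headers(tokens))
```

Every later call to the protected endpoints would send this header.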

Each new object uploaded to the S3 bucket triggers a Lambda function, which processes the image and sends its metadata to an SQS queue as a message. The new message triggers another Lambda function, which analyzes the image contents with Amazon Rekognition and publishes the detected labels to EventBridge. Every consumer subscribed to this type of event receives a copy of the event and its content.
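The first function in this chain receives an S3 event notification. The following is a minimal sketch of turning each record into an SQS message body. It assumes a hypothetical message format with `bucket`, `key`, and `size` fields, and the sample repository may use a different one. S3 URL-encodes object keys in notifications, so the key is decoded with `unquote_plus`. In a deployed function, each body would then be sent with the SQS `send_message` API.

```python
import json
from urllib.parse import unquote_plus


def messages_from_s3_event(event: dict) -> list:
    """Build one JSON message body per uploaded object (hypothetical format)."""
    bodies = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bodies.append(json.dumps({
            "bucket": s3["bucket"]["name"],
            # Keys arrive URL-encoded; for example, spaces become "+".
            "key": unquote_plus(s3["object"]["key"]),
            "size": s3["object"].get("size", 0),
        }))
    return bodies


# Trimmed S3 notification with placeholder bucket and key names.
sample_event = {
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "ics-images-example"},
            "object": {"key": "new/beach+photo%281%29.jpg", "size": 52311},
        },
    }]
}
bodies = messages_from_s3_event(sample_event)
print(bodies)
```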

The third Lambda function is subscribed to this event type and receives the event with image metadata and labels. Finally, the function stores all the information in Amazon Aurora Serverless via the Data API feature.
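EventBridge delivers the event to the subscribed function as a JSON envelope whose `detail` field holds the payload that the publisher sent. The following is a minimal sketch of flattening that payload into one row per label before writing to the database. It assumes a hypothetical `detail` shape with an image `key` and a list of Rekognition-style `labels` (`Name`, `Confidence`). The `source` and `detail-type` values are also placeholders.

```python
def rows_from_event(event: dict, min_confidence: float = 0.0) -> list:
    """Flatten a (hypothetical) image-labels event into (key, label, confidence) rows."""
    detail = event["detail"]  # EventBridge passes Detail as a parsed JSON object
    return [
        (detail["key"], label["Name"], round(label["Confidence"], 2))
        for label in detail.get("labels", [])
        if label["Confidence"] >= min_confidence
    ]


sample_event = {
    "version": "0",
    "detail-type": "ImageLabelsDetected",   # hypothetical
    "source": "ics.image-analyzer",         # hypothetical
    "resources": ["arn:aws:lambda:us-east-1:111122223333:function:example-analyzer"],
    "detail": {
        "key": "new/beach.jpg",
        "labels": [
            {"Name": "Beach", "Confidence": 98.1234},
            {"Name": "Person", "Confidence": 71.5},
        ],
    },
}
print(rows_from_event(sample_event, min_confidence=80))
```

A confidence threshold like `min_confidence` is one way to keep low-quality labels out of the search index. Whether the sample filters labels this way is not stated.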

When a user searches for images, Amazon Cognito authenticates the request and Amazon API Gateway forwards it to a Lambda function. The function queries the database through the Amazon Aurora Serverless Data API and returns the results to the user.

Authentication reference architecture
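The Data API returns query results as `records`, which is a list of rows. Each column value in a row is a dict keyed by its type, such as `stringValue`, `longValue`, `doubleValue`, `booleanValue`, or `isNull`. The following is a minimal sketch of how the search function could turn such a response into plain JSON for the user. The column names are passed in by the caller and are assumptions, not the sample's actual schema.

```python
import json

# Scalar field types that can appear in a Data API ExecuteStatement response.
_SCALAR_FIELDS = ("stringValue", "longValue", "doubleValue", "booleanValue")


def field_value(field: dict):
    """Convert one Data API field, e.g. {"stringValue": "cat"}, to a Python value."""
    if field.get("isNull"):
        return None
    for name in _SCALAR_FIELDS:
        if name in field:
            return field[name]
    raise ValueError(f"unsupported field: {field}")


def records_to_dicts(response: dict, columns: list) -> list:
    return [
        {column: field_value(field) for column, field in zip(columns, row)}
        for row in response.get("records", [])
    ]


# Response shape for a hypothetical "SELECT image_key, label, confidence ..." query.
sample_response = {
    "numberOfRecordsUpdated": 0,
    "records": [
        [{"stringValue": "new/beach.jpg"}, {"stringValue": "Beach"}, {"doubleValue": 98.12}],
        [{"stringValue": "new/dog.jpg"}, {"stringValue": "Beach"}, {"isNull": True}],
    ],
}
results = records_to_dicts(sample_response, ["image_key", "label", "confidence"])
print(json.dumps(results))
```

If you pass `includeResultMetadata=True` to `execute_statement`, the response also includes `columnMetadata`. That metadata could supply the column names instead of hard-coding them.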

Deploying the solution with the AWS CDK

The AWS CDK is an open-source framework for defining and provisioning cloud application resources in familiar programming languages such as JavaScript, C#, and Python. The AWS CDK command line interface (CLI) lets you interact with CDK applications. It synthesizes AWS CloudFormation templates, asks you to confirm security-related changes, and deploys applications.

This section shows how to prepare the environment for running the CDK and the sample code. The steps below use npm, Git, Python with pip, and AWS credentials configured for your account.

To install the example application:

  1. Install the AWS CDK CLI and confirm that it works:
    npm install -g aws-cdk && cdk --version
  2. Download the code from the GitHub repo:
    git clone https://github.com/aws-samples/aws-cdk-examples.git
    cd aws-cdk-examples/python/image-content-search/
  3. Install the Python dependencies with pip:
    pip install -r requirements.txt
  4. Deploy the example code with the CDK CLI. The stack includes Lambda code assets, so if you have not used the CDK in this account and Region before, bootstrap the environment first:
    cdk bootstrap
    cdk deploy

After the deployment completes, the terminal shows the Amazon Cognito hosted UI login address in the Outputs section. Use this address to sign up and start working with the UI.

Deployment output

Understanding the AWS CDK stack

This section reviews the CDK stack by service.

Services construct library

To define the services in the stack, first import their CDK construct libraries. This example uses Amazon Cognito, API Gateway, S3, SQS, Lambda, Amazon RDS, AWS Secrets Manager, EventBridge, and AWS Identity and Access Management (AWS IAM):

# AWS CDK v1 construct libraries (core is the v1 base module)
from aws_cdk import (
    aws_s3_notifications as _s3notification,
    aws_lambda_event_sources as _lambda_event_source,
    aws_s3 as _s3,
    aws_cognito as _cognito,
    aws_sqs as _sqs,
    aws_iam as _iam,
    aws_events as _events,
    aws_events_targets as _event_targets,
    aws_rds as _rds,
    aws_secretsmanager as _secrets_manager,
    custom_resources as _custom_resources,
    core
)

from aws_cdk.aws_apigateway import (
    RestApi, 
    LambdaIntegration,
    CfnAuthorizer,
    AuthorizationType,
    MockIntegration,
    PassthroughBehavior
)

from aws_cdk.aws_lambda import (
    Code, 
    Function,
    Runtime
)

Amazon Simple Queue Service

The following code adds an SQS queue that buffers messages and triggers a Lambda function. If the function fails to process a message three times, SQS moves the message to a dead-letter queue:

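# Messages that repeatedly fail processing are kept here for inspection
image_deadletter_queue = _sqs.Queue(self, "ICS_IMAGES_DEADLETTER_QUEUE")

# After 3 receives without a successful delete, SQS redrives the message
# to the dead-letter queue
image_queue = _sqs.Queue(self, "ICS_IMAGES_QUEUE",
    dead_letter_queue=_sqs.DeadLetterQueue(
        max_receive_count=3,
        queue=image_deadletter_queue
    ))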

SQS makes the architecture more resilient. Failed messages are retried automatically, and after repeated failures they are kept in the dead-letter queue instead of being lost.
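To illustrate the `max_receive_count` behavior, here is a small local simulation. It is not SQS itself, only a model of its redrive rule. Each failed attempt leaves the message on the queue so that it is received again. Once another receive would push the count past 3, the message is moved to the dead-letter queue. The always-failing `analyze` function and the object keys are hypothetical stand-ins.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    receive_count: int = 0


def drain(queue: deque, process, max_receive_count: int = 3):
    """Locally mimic an SQS redrive policy (a simplification, not the real service)."""
    processed, dead_letter, attempts = [], [], {}
    while queue:
        message = queue.popleft()
        if message.receive_count >= max_receive_count:
            # Receiving it again would exceed maxReceiveCount, so it moves to the DLQ.
            dead_letter.append(message.body)
            continue
        message.receive_count += 1
        attempts[message.body] = message.receive_count
        try:
            process(message.body)
            processed.append(message.body)  # success: the consumer deletes the message
        except Exception:
            queue.append(message)  # failure: the message becomes visible again later
    return processed, dead_letter, attempts


def analyze(body: str) -> None:
    # Hypothetical consumer that always fails on one object.
    if body == "new/corrupt.jpg":
        raise ValueError("cannot analyze image")


queue = deque([Message("new/cat.jpg"), Message("new/corrupt.jpg")])
processed, dead_letter, attempts = drain(queue, analyze)
print(processed, dead_letter, attempts)
```

The failing message is processed three times before it lands in the dead-letter queue, and the healthy message is unaffected.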

Amazon Cognito

Amazon Cognito provides access control and stores user data in user pools. The application uses the hosted UI for a built-in sign-in page. The application defines the following additional components besides the user pool:

  1. A user pool app client, which lets an application access the user pool. It defines the allowed authentication flows and OAuth scopes, and it sets the callback URL to which the user is redirected after successful authentication.
  2. A user pool domain, which hosts the built-in sign-in and sign-up pages.

# Email is a required attribute at sign-up
required_attribute = _cognito.StandardAttribute(required=True)

# User pool with self-service sign-up; Cognito verifies email addresses automatically
users_pool = _cognito.UserPool(self, "ICS_USERS_POOL",
    auto_verify=_cognito.AutoVerifiedAttrs(email=True),
    standard_attributes=_cognito.StandardAttributes(email=required_attribute),
    self_sign_up_enabled=True)

# App client for the hosted UI, using the OAuth implicit flow.
# api_gateway_landing_page_resource is defined elsewhere in the full stack.
user_pool_app_client = _cognito.CfnUserPoolClient(self, "ICS_USERS_POOL_APP_CLIENT",
    supported_identity_providers=["COGNITO"],
    allowed_o_auth_flows=["implicit"],
    allowed_o_auth_scopes=["phone", "email", "openid", "profile"],
    user_pool_id=users_pool.user_pool_id,
    callback_ur_ls=[api_gateway_landing_page_resource.url],  # Python name of CallbackURLs
    allowed_o_auth_flows_user_pool_client=True,
    explicit_auth_flows=["ALLOW_REFRESH_TOKEN_AUTH"])

# Hosted UI domain; the prefix must be unique within the Region across all AWS accounts
user_pool_domain = _cognito.UserPoolDomain(self, "ICS_USERS_POOL_DOMAIN",
    user_pool=users_pool,
    cognito_domain=_cognito.CognitoDomainOptions(domain_prefix="image-content-search"))
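The hosted UI login address is built from the domain prefix, the Region, the app client ID, and the callback URL. The following is a minimal sketch of assembling such a login URL for the implicit flow (`response_type=token`). The client ID and callback URL are placeholders, and the real values come from the deployed stack.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def hosted_ui_login_url(domain_prefix: str, region: str, client_id: str,
                        redirect_uri: str, scopes=("openid", "email")) -> str:
    """Build a hosted UI /login URL for the implicit grant (response_type=token)."""
    query = urlencode({
        "response_type": "token",       # implicit flow, as in allowed_o_auth_flows
        "client_id": client_id,
        "redirect_uri": redirect_uri,   # must match a callback URL of the app client
        "scope": " ".join(scopes),      # subset of allowed_o_auth_scopes
    })
    return f"https://{domain_prefix}.auth.{region}.amazoncognito.com/login?{query}"


url = hosted_ui_login_url(
    "image-content-search", "us-east-1",
    client_id="example-client-id",  # placeholder
    redirect_uri="https://example.execute-api.us-east-1.amazonaws.com/prod/landing",  # placeholder
)
print(url)
print(parse_qs(urlsplit(url).query)["redirect_uri"])
```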

Amazon Aurora Serverless

Amazon Aurora Serverless is an on-demand, autoscaling configuration of Amazon Aurora, a managed relational database service. AWS manages the underlying infrastructure, operating system, and dependencies, which simplifies operations. Aurora Serverless also offers the Data API, which lets clients run queries and mutations through API calls instead of managing database connections and connection pools.

Before instantiating the RDS resource, you must define the credentials for accessing the database. To keep the credentials secure, this uses AWS Secrets Manager to create a secret and (optionally) rotate it automatically. This secret holds the defined user name and a generated password:

# Secret holding a fixed user name and a generated password
database_secret = _secrets_manager.Secret(self, "ICS_DATABASE_SECRET",
    secret_name="rds-db-credentials/image-content-search-rds-secret",
    generate_secret_string=_secrets_manager.SecretStringGenerator(
        generate_string_key='password',                 # key that receives the generated value
        secret_string_template='{"username": "dba"}',   # fixed part of the secret
        exclude_punctuation=True,
        exclude_characters='/@\" \\\'',                # avoid characters problematic in RDS passwords
        require_each_included_type=True
    )
)
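Secrets Manager generates the password at deployment time and merges it into the template under the `password` key. To show the resulting secret shape, here is a local approximation using Python's `secrets` module. It mimics the `exclude_punctuation`, `exclude_characters`, and `require_each_included_type` options, but it is not the Secrets Manager algorithm. The 32-character length matches the service's default password length.

```python
import json
import secrets
import string


def generate_secret_string(template: str, key: str, length: int = 32,
                           exclude_characters: str = "") -> str:
    """Local approximation of SecretStringGenerator (illustration only)."""
    # exclude_punctuation=True leaves letters and digits as the candidate groups.
    groups = [string.ascii_lowercase, string.ascii_uppercase, string.digits]
    groups = ["".join(c for c in g if c not in exclude_characters) for g in groups]
    groups = [g for g in groups if g]  # drop groups emptied by the exclusions
    alphabet = "".join(groups)
    while True:
        password = "".join(secrets.choice(alphabet) for _ in range(length))
        # require_each_included_type=True: every group appears at least once.
        if all(any(c in group for c in password) for group in groups):
            break
    secret = json.loads(template)
    secret[key] = password
    return json.dumps(secret)


secret_string = generate_secret_string('{"username": "dba"}', "password",
                                       exclude_characters='/@" \\\'')
print(json.loads(secret_string)["username"])  # dba
```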

Next, define the database cluster. The engine is set to “Aurora MySQL” and the engine mode to “Serverless”. The parameter “enable_http_endpoint” enables the Data API. The scaling configuration sets the minimum and maximum capacity (2 and 8 Aurora capacity units) and pauses the database after 30 minutes (1,800 seconds) without activity:

database = _rds.CfnDBCluster(self, "ICS_DATABASE",
    # engine_type resolves to the engine name "aurora-mysql"
    engine=_rds.DatabaseClusterEngine.aurora_mysql(
        version=_rds.AuroraMysqlEngineVersion.VER_5_7_12).engine_type,
    engine_mode="serverless",
    database_name="images_labels",
    enable_http_endpoint=True,      # turns on the Data API
    deletion_protection=False,      # lets cdk destroy remove the cluster
    # Credentials are resolved from the secret at deployment time
    master_username=database_secret.secret_value_from_json("username").to_string(),
    master_user_password=database_secret.secret_value_from_json("password").to_string(),
    scaling_configuration=_rds.CfnDBCluster.ScalingConfigurationProperty(
        auto_pause=True,
        min_capacity=2,             # Aurora capacity units (ACUs)
        max_capacity=8,
        seconds_until_auto_pause=1800  # pause after 30 minutes of inactivity
    ),
)
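Aurora Serverless (v1) with the Aurora MySQL engine accepts only specific capacity values, and the auto-pause delay has a bounded range. The following is a small pre-deployment check, a sketch rather than part of the sample. It uses the limits documented for Aurora Serverless v1 with Aurora MySQL: capacities of 1 to 256 ACUs in powers of two, and an auto-pause delay of 300 to 86,400 seconds. It can catch invalid `scaling_configuration` values before CloudFormation rejects them.

```python
# Documented limits for Aurora Serverless v1 with Aurora MySQL.
AURORA_MYSQL_V1_CAPACITIES = {1, 2, 4, 8, 16, 32, 64, 128, 256}
MIN_PAUSE_SECONDS, MAX_PAUSE_SECONDS = 300, 86_400


def validate_scaling(min_capacity: int, max_capacity: int,
                     seconds_until_auto_pause: int) -> list:
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    for name, value in (("min_capacity", min_capacity), ("max_capacity", max_capacity)):
        if value not in AURORA_MYSQL_V1_CAPACITIES:
            problems.append(f"{name}={value} is not a valid Aurora MySQL capacity")
    if min_capacity > max_capacity:
        problems.append("min_capacity must not exceed max_capacity")
    if not MIN_PAUSE_SECONDS <= seconds_until_auto_pause <= MAX_PAUSE_SECONDS:
        problems.append("seconds_until_auto_pause must be between 300 and 86400")
    return problems


print(validate_scaling(2, 8, 1800))  # []
```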

The following code builds the cluster ARN, which Data API calls use to identify the cluster, and attaches the secret to the RDS cluster:

# database.ref resolves to the cluster identifier
database_cluster_arn = "arn:aws:rds:{}:{}:cluster:{}".format(
    core.Aws.REGION, core.Aws.ACCOUNT_ID, database.ref)

# Adds the cluster's connection details (such as host, port, and engine) to the secret
secret_target = _secrets_manager.CfnSecretTargetAttachment(self, "ICS_DATABASE_SECRET_TARGET",
    target_type="AWS::RDS::DBCluster",
    target_id=database.ref,
    secret_id=database_secret.secret_arn
)

secret_target.node.add_dependency(database)
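With the cluster ARN and the attached secret in place, a Lambda function can query the database through the Data API. It passes both ARNs, the database name, the SQL text, and named, typed parameters. The following is a minimal sketch of building those request arguments. The table and column names are hypothetical, and the ARNs are placeholders. In a deployed function, the resulting dictionary would be passed as `boto3.client("rds-data").execute_statement(**request)`.

```python
def to_field(value) -> dict:
    """Map a Python value to a Data API typed Field."""
    if value is None:
        return {"isNull": True}
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return {"booleanValue": value}
    if isinstance(value, int):
        return {"longValue": value}
    if isinstance(value, float):
        return {"doubleValue": value}
    return {"stringValue": str(value)}


def build_statement(cluster_arn: str, secret_arn: str, sql: str, **params) -> dict:
    return {
        "resourceArn": cluster_arn,
        "secretArn": secret_arn,
        "database": "images_labels",  # database_name from the CfnDBCluster above
        "sql": sql,
        "parameters": [{"name": k, "value": to_field(v)} for k, v in params.items()],
    }


request = build_statement(
    "arn:aws:rds:us-east-1:111122223333:cluster:example-cluster",    # placeholder
    "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",  # placeholder
    # Hypothetical table; :name placeholders bind to the parameters below.
    "INSERT INTO image_labels (image_key, label, confidence) "
    "VALUES (:key, :label, :confidence)",
    key="new/beach.jpg", label="Beach", confidence=98.12,
)
print(request["parameters"])
```

Named parameters keep user-supplied values, such as search terms, out of the SQL text itself.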

Amazon EventBridge

EventBridge is a serverless event bus that connects application components through events, whether they come from your own applications, AWS services, or external sources. To add EventBridge to the stack, create an event bus and a rule, and then add a target to the rule. Adding the Lambda function as a target makes it a subscriber. Finally, grant the publishing function permission to put events on the bus:

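# Custom event bus for image analysis events
event_bus = _events.EventBus(self, "ICS_IMAGE_CONTENT_BUS")

# Match events whose resources field contains the analyzer function's ARN.
# image_analyzer_function and image_data_function are defined elsewhere in the stack.
event_rule = _events.Rule(self, "ICS_IMAGE_CONTENT_RULE",
    rule_name="ICS_IMAGE_CONTENT_RULE",
    description="The event from image analyzer to store the data",
    event_bus=event_bus,
    event_pattern=_events.EventPattern(resources=[image_analyzer_function.function_arn]),
)

# Subscribe the data function to matching events
event_rule.add_target(_event_targets.LambdaFunction(image_data_function))

# Let the analyzer publish to the bus and pass it the bus name
event_bus.grant_put_events(image_analyzer_function)
image_analyzer_function.add_environment("EVENT_BUS", event_bus.event_bus_name)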
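The rule's event pattern matches on `resources`. An event therefore reaches the data function only if the analyzer lists its own function ARN in the `Resources` field when it publishes. The following is a minimal sketch of such a `PutEvents` entry, plus a greatly simplified local version of EventBridge matching that handles only top-level fields with exact values. The `Source` and `DetailType` values are hypothetical. In a deployed function, the entry would be sent with `boto3.client("events").put_events(Entries=[entry])`, and the function could read its own ARN from `context.invoked_function_arn`.

```python
import json


def build_entry(function_arn: str, bus_name: str, image_key: str, labels: list) -> dict:
    """Build one PutEvents entry (Source and DetailType values are hypothetical)."""
    return {
        "EventBusName": bus_name,             # e.g. from the EVENT_BUS environment variable
        "Source": "ics.image-analyzer",
        "DetailType": "ImageLabelsDetected",
        "Resources": [function_arn],          # the field the rule's pattern matches on
        "Detail": json.dumps({"key": image_key, "labels": labels}),
    }


def matches(pattern: dict, event: dict) -> bool:
    """Greatly simplified EventBridge matching: top-level fields, exact values only."""
    for field, allowed in pattern.items():
        value = event.get(field)
        values = value if isinstance(value, list) else [value]
        # For array fields, one matching element is enough.
        if not any(v in allowed for v in values):
            return False
    return True


analyzer_arn = "arn:aws:lambda:us-east-1:111122223333:function:example-analyzer"  # placeholder
entry = build_entry(analyzer_arn, "example-bus", "new/beach.jpg",
                    [{"Name": "Beach", "Confidence": 98.12}])

# Delivered events use lowercase envelope keys such as "source" and "resources".
delivered = {"source": entry["Source"], "resources": entry["Resources"]}
pattern = {"resources": [analyzer_arn]}
print(matches(pattern, delivered))  # True
```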

A major advantage of EventBridge is that it decouples services, so you can update a service and its logic without affecting the rest of the application. In this application, EventBridge decouples database actions from the other services. For example, if you migrate from MySQL to PostgreSQL, you update only the code of the function that writes to the database. The change is contained within that one function.

Amazon API Gateway

API Gateway manages the application’s API endpoints. It integrates with services such as Amazon CloudWatch for monitoring and Amazon Cognito for access control, and it can forward requests to a Lambda function for processing:

api_gateway = RestApi(self, 'ICS_API_GATEWAY', rest_api_name='ImageContentSearchApiGateway')
api_gateway_resource = api_gateway.root.add_resource("ImageContentSearch")
api_gateway_get_signedurl_resource = api_gateway_resource.add_resource('signedUrl')

AWS Lambda

Lambda is a serverless compute service that runs code in response to events. In this application, an event can be an API request from API Gateway, an S3 notification containing an object's metadata, an SQS message, or an EventBridge event.

This example shows the Lambda function that handles requests for an S3 presigned URL. Presigned URLs provide a secure way to access objects in an S3 bucket. Each URL is signed for a specific operation on a specific object key, and it expires after a set duration:

images_S3_bucket = _s3.Bucket(self, "ICS_IMAGES")

get_signedurl_function = Function(self, "ICS_GET_SIGNED_URL",
    function_name="ICS_GET_SIGNED_URL",
    environment={
        "ICS_IMAGES_BUCKET": images_S3_bucket.bucket_name,
        # Environment variable values must be strings
        "DEFAULT_SIGNEDURL_EXPIRY_SECONDS": "3600"
    },
    runtime=Runtime.PYTHON_3_7,
    handler="main.handler",
    code=Code.from_asset("./src/getSignedUrl"))

# URLs are signed with the function's permissions, so allow uploads only under new/
images_S3_bucket.grant_put(get_signedurl_function, objects_key_pattern="new/*")
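Inside the function, environment variables arrive as strings, and the IAM grant allows writes only under `new/`. The following is a minimal sketch of the handler-side logic. It assumes a hypothetical key scheme of `new/<uuid>.<extension>`. The `presign` callable stands in for boto3, which a real handler would call as `s3.generate_presigned_url("put_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=expiry)`.

```python
import os
import uuid


def upload_key(filename: str) -> str:
    """Hypothetical key scheme: a random name under the new/ prefix granted above."""
    extension = os.path.splitext(filename)[1].lower()  # e.g. ".jpg"
    return f"new/{uuid.uuid4()}{extension}"


def signed_upload_url(filename: str, presign, env=os.environ) -> dict:
    bucket = env["ICS_IMAGES_BUCKET"]
    # Lambda environment values are strings, so convert the expiry explicitly.
    expiry = int(env.get("DEFAULT_SIGNEDURL_EXPIRY_SECONDS", "3600"))
    key = upload_key(filename)
    return {"key": key, "url": presign(bucket, key, expiry), "expiresIn": expiry}


def fake_presign(bucket: str, key: str, expiry: int) -> str:
    # Stand-in for boto3 so the sketch runs without AWS; not a valid signature.
    return f"https://{bucket}.s3.amazonaws.com/{key}?X-Amz-Expires={expiry}"


result = signed_upload_url("Beach.JPG", fake_presign,
                           env={"ICS_IMAGES_BUCKET": "ics-images-example",
                                "DEFAULT_SIGNEDURL_EXPIRY_SECONDS": "3600"})
print(result["key"])
```

Generating the key on the server, rather than accepting it from the client, keeps uploads inside the prefix that the IAM grant allows.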

Next, create an API Gateway Lambda integration so that requests to the endpoint invoke this function:

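# Proxy integration: API Gateway passes the whole request to the function and
# returns the function's response as-is, so the function must also return the
# CORS header itself
get_signedurl_integration = LambdaIntegration(
    get_signedurl_function,
    proxy=True,
    integration_responses=[{
        'statusCode': '200',
        'responseParameters': {
            'method.response.header.Access-Control-Allow-Origin': "'*'",
        }
    }])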
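With `proxy=True`, API Gateway passes the function's return value straight through to the client. As a result, integration response mappings such as the CORS header above do not apply, and the function must include the header in its own response. The following is a minimal sketch of a helper that builds the Lambda proxy response format. Allowing `*` mirrors the original configuration, and you may want to allow a specific origin instead.

```python
import json


def proxy_response(status_code: int, body, origin: str = "*") -> dict:
    """Build the response format that a Lambda proxy integration expects."""
    return {
        "statusCode": status_code,
        "headers": {
            "Content-Type": "application/json",
            # Needed here because proxy integrations bypass integration responses.
            "Access-Control-Allow-Origin": origin,
        },
        "body": json.dumps(body),  # the body must be a string
    }


response = proxy_response(200, {"url": "https://example.com/upload"})
print(response["headers"]["Access-Control-Allow-Origin"])  # *
```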

Amazon Cognito integrates with API Gateway to authenticate the request:

# Authorizer that validates tokens issued by the user pool, read from the
# Authorization header
api_gateway_get_signedurl_authorizer = CfnAuthorizer(self, "ICS_API_GATEWAY_GET_SIGNED_URL_AUTHORIZER",
    rest_api_id=api_gateway_get_signedurl_resource.rest_api.rest_api_id,
    name="ICS_API_GATEWAY_GET_SIGNED_URL_AUTHORIZER",
    type="COGNITO_USER_POOLS",
    identity_source="method.request.header.Authorization",
    provider_arns=[users_pool.user_pool_arn])

get_signedurl_method = api_gateway_get_signedurl_resource.add_method('GET', get_signedurl_integration,
    authorization_type=AuthorizationType.COGNITO,
    method_responses=[{
        'statusCode': '200',
        'responseParameters': {
            'method.response.header.Access-Control-Allow-Origin': True,
        }
    }])

# Escape hatch: attach the low-level authorizer to the underlying CfnMethod
get_signedurl_method.node.find_child('Resource').add_property_override(
    'AuthorizerId', api_gateway_get_signedurl_authorizer.ref)
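After the Cognito authorizer has validated the token, API Gateway forwards the token's claims to the proxy-integrated function under `requestContext.authorizer.claims`. The following is a minimal sketch of reading the caller's identity from that event. The sample event is trimmed to the relevant fields, and its values are placeholders.

```python
def caller_identity(event: dict) -> dict:
    """Return the Cognito user's ID and email from a REST API proxy event."""
    claims = event.get("requestContext", {}).get("authorizer", {}).get("claims", {})
    if "sub" not in claims:
        raise PermissionError("request was not authorized by the Cognito authorizer")
    return {"user_id": claims["sub"], "email": claims.get("email")}


sample_event = {
    "httpMethod": "GET",
    "resource": "/ImageContentSearch/signedUrl",
    "requestContext": {
        "authorizer": {
            "claims": {"sub": "1111-2222-example", "email": "user@example.com"}
        }
    },
}
print(caller_identity(sample_event))
```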

Cleaning up

Once you have completed the deployment and tested the application, clean up the environment to avoid incurring extra cost. This command removes all resources in this stack provisioned by the CDK:

cdk destroy

Delete any IAM users and roles that you created manually. S3 buckets that still contain objects are not removed with the stack, so empty and delete them manually.

Conclusion

This post discusses a fully serverless architecture for searching images based on their contents. It shows how the architecture stays decoupled and stateless by using S3 events, SQS messages, an EventBridge bus, and Amazon Aurora Serverless. I walk through deploying the solution with the AWS CDK and explain the AWS CDK stack code.

Next, use the example code to build your own solution with CDK and explore the services in this example.

For more serverless learning resources, visit Serverless Land.