Data-Engineer-Associate Exam Dumps - Amazon Web Services AWS Certified Data Engineer Questions and Answers

Question # 74

A company uses AWS Glue Apache Spark jobs to handle extract, transform, and load (ETL) workloads. The company has enabled logging and monitoring for all AWS Glue jobs. One of the AWS Glue jobs begins to fail. A data engineer investigates the error and wants to examine metrics for all individual stages within the job. How can the data engineer access the stage metrics?

Options:

A. Examine the AWS Glue job and stage details in the Spark UI.

B. Examine the AWS Glue job and stage metrics in Amazon CloudWatch.

C. Examine the AWS Glue job and stage logs in AWS CloudTrail logs.

D. Examine the AWS Glue job and stage details by using the run insights feature on the job.

Question # 75

A company is creating a new data pipeline to populate a data lake. A data analyst needs to prepare and standardize the data before a data engineering team can perform advanced data transformations. The data analyst needs a solution to process the data that does not require writing new code.

Which solution will meet these requirements with the LEAST operational effort?

Options:

A. Use Python and Pandas in an AWS Glue Studio notebook. Ensure that the data engineers add additional transformations to complete the pipeline.

B. Use Amazon SageMaker Canvas and SageMaker Data Wrangler to write to a new dataset. Ensure that the data engineers add additional transformations to complete the pipeline by using AWS Glue.

C. Use AWS Glue Studio with data preparation recipe transformations. Ensure that the data engineers add additional transformations to complete the pipeline.

D. Create a document that includes the data preparation rules. Ensure that the data engineers implement the rules in AWS Glue.

Question # 76

A media company wants to improve a system that recommends media content to customers based on user behavior and preferences. To improve the recommendation system, the company needs to incorporate insights from third-party datasets into the company's existing analytics platform.

The company wants to minimize the effort and time required to incorporate third-party datasets.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A. Use API calls to access and integrate third-party datasets from AWS Data Exchange.

B. Use API calls to access and integrate third-party datasets from AWS

C. Use Amazon Kinesis Data Streams to access and integrate third-party datasets from AWS CodeCommit repositories.

D. Use Amazon Kinesis Data Streams to access and integrate third-party datasets from Amazon Elastic Container Registry (Amazon ECR).

Question # 77

A data engineer configures a large number of AWS Glue jobs that all start up around the same time. All the jobs run for less than 1 hour in the same subnet of the same VPC. All the AWS Glue jobs run on a G.1X worker type.

Some of the jobs occasionally fail with the following error: “The specified subnet does not have enough free addresses to satisfy the request.”

What is the likely root cause of the error?

Options:

A. There are not enough IP addresses in the subnet.

B. The G.1X worker type cannot access the subnet.

C. AWS Glue does not have the correct IAM permissions to add additional IP addresses to the subnet.

D. There are not enough IP addresses in the VPC.
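
As background for the error message quoted in this question, the number of addresses a subnet can hand out follows from its CIDR block. The sketch below is an illustration only: the CIDR blocks, job count, and worker count are hypothetical, and it assumes that each AWS Glue worker consumes one IP address in the subnet. AWS reserves five addresses in every subnet.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr: str) -> int:
    """Return how many addresses a subnet with this CIDR block can assign."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def addresses_needed(concurrent_jobs: int, workers_per_job: int) -> int:
    """Assumption: every worker of every concurrently running job takes one address."""
    return concurrent_jobs * workers_per_job


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the comparison.
    available = usable_addresses("10.0.1.0/24")
    needed = addresses_needed(concurrent_jobs=30, workers_per_job=10)
    print(f"available={available} needed={needed} enough={available >= needed}")
```

Other resources in the same subnet also hold addresses, so the real number of free addresses is usually lower than this upper bound.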

Question # 78

A company uses an Amazon Redshift cluster that runs on RA3 nodes. The company wants to scale read and write capacity to meet demand. A data engineer needs to identify a solution that will turn on concurrency scaling.

Which solution will meet this requirement?

Options:

A. Turn on concurrency scaling in workload management (WLM) for Redshift Serverless workgroups.

B. Turn on concurrency scaling at the workload management (WLM) queue level in the Redshift cluster.

C. Turn on concurrency scaling in the settings during the creation of a new Redshift cluster.

D. Turn on concurrency scaling for the daily usage quota for the Redshift cluster.

Question # 79

A ride-sharing company stores records for all rides in an Amazon DynamoDB table. The table includes the following columns and types of values:

RideID | RiderID | DriverID | RideStatus | TripStartTime | TripEndTime

XA1231 | AXEF1 | BN123 | Active | 2025-02-11 | NULL

XA1232 | AXEF2 | BN124 | Completed | 2025-02-11 | 2025-02-11

The table currently contains billions of items. The table is partitioned by RideID and uses TripStartTime as the sort key. The company wants to use the data to build a personal interface to give drivers the ability to view the rides that each driver has completed, based on RideStatus. The solution must access the necessary data without scanning the entire table.

Which solution will meet these requirements?

Options:

A. Create a local secondary index (LSI) on DriverID.

B. Create a global secondary index (GSI) that uses RiderID as the partition key and RideStatus as the sort key.

C. Create a global secondary index (GSI) that uses DriverID as the partition key and RideStatus as the sort key.

D. Create a filter expression that uses RiderID and RideStatus.
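
The difference between reading by key and scanning with a filter can be illustrated with a small in-memory analogy. This is not DynamoDB code: `build_index` and `scan_with_filter` are hypothetical helpers, and the dictionary only mimics how an index keyed on two attributes lets a lookup go straight to the matching items, whereas a filter is applied only after every item has been read. The two items are the sample rows from the question.

```python
from collections import defaultdict

# The two sample items shown in the question (NULL is represented as None).
rides = [
    {"RideID": "XA1231", "RiderID": "AXEF1", "DriverID": "BN123",
     "RideStatus": "Active", "TripStartTime": "2025-02-11", "TripEndTime": None},
    {"RideID": "XA1232", "RiderID": "AXEF2", "DriverID": "BN124",
     "RideStatus": "Completed", "TripStartTime": "2025-02-11", "TripEndTime": "2025-02-11"},
]


def build_index(items, partition_attr, sort_attr):
    """Group items by a (partition value, sort value) pair, like a two-part index key."""
    index = defaultdict(list)
    for item in items:
        index[(item[partition_attr], item[sort_attr])].append(item)
    return index


def scan_with_filter(items, **conditions):
    """Read every item, then keep the matches. Returns (matches, items_read)."""
    matches = [i for i in items if all(i[k] == v for k, v in conditions.items())]
    return matches, len(items)


if __name__ == "__main__":
    index = build_index(rides, "DriverID", "RideStatus")
    print([r["RideID"] for r in index[("BN124", "Completed")]])
    matches, items_read = scan_with_filter(rides, DriverID="BN124", RideStatus="Completed")
    print([r["RideID"] for r in matches], "items read:", items_read)
```

Both calls return the same ride, but the filtered scan reads every item to find it, which is the cost the question asks to avoid on a table with billions of items.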

Question # 80

A company stores data from an application in an Amazon DynamoDB table that operates in provisioned capacity mode. The workloads of the application have predictable throughput load on a regular schedule. Every Monday, there is an immediate increase in activity early in the morning. The application has very low usage during weekends.

The company must ensure that the application performs consistently during peak usage times.

Which solution will meet these requirements in the MOST cost-effective way?

Options:

A. Increase the provisioned capacity to the maximum capacity that is currently present during peak load times.

B. Divide the table into two tables. Provision each table with half of the provisioned capacity of the original table. Spread queries evenly across both tables.

C. Use AWS Application Auto Scaling to schedule higher provisioned capacity for peak usage times. Schedule lower capacity during off-peak times.

D. Change the capacity mode from provisioned to on-demand. Configure the table to scale up and scale down based on the load on the table.
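
For readers unfamiliar with scheduled scaling as described in option C, the sketch below builds the request parameters that the Application Auto Scaling `PutScheduledAction` operation accepts for a DynamoDB table. Everything specific here is an assumption for illustration: the table name `AppTable`, the action names, the cron times (evaluated in UTC by default), and the capacity values. The helper `scheduled_capacity_action` is hypothetical and only returns a dictionary; it does not call AWS.

```python
def scheduled_capacity_action(table_name, action_name, cron_expression,
                              min_capacity, max_capacity,
                              dimension="dynamodb:table:WriteCapacityUnits"):
    """Build keyword arguments for an Application Auto Scaling scheduled action."""
    return {
        "ServiceNamespace": "dynamodb",
        "ScheduledActionName": action_name,
        "ResourceId": f"table/{table_name}",
        "ScalableDimension": dimension,
        # Six-field cron: minutes hours day-of-month month day-of-week year.
        "Schedule": cron_expression,
        "ScalableTargetAction": {
            "MinCapacity": min_capacity,
            "MaxCapacity": max_capacity,
        },
    }


if __name__ == "__main__":
    # Hypothetical schedule: raise capacity before Monday morning, lower it for the weekend.
    peak = scheduled_capacity_action(
        "AppTable", "monday-peak", "cron(0 5 ? * MON *)", 500, 1000)
    weekend = scheduled_capacity_action(
        "AppTable", "weekend-low", "cron(0 0 ? * SAT *)", 5, 50)
    for action in (peak, weekend):
        print(action)
```

With boto3, a dictionary like this could be passed as `client.put_scheduled_action(**peak)` on an `application-autoscaling` client, after the table has been registered as a scalable target.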

Question # 81

A company needs a solution to store and query product data that has variable attributes. The solution must support unpredictable and high-volume queries with single-digit millisecond latency, even during sudden traffic spikes. The solution must retrieve items by a primary identifier named Product ID. The solution must allow flexible queries by secondary attributes named Category and Brand.

Which solution will meet these requirements?

Options:

A. Use an Amazon DynamoDB table with on-demand capacity to store product data. Store products by primary key. Use global secondary indexes (GSIs) to store secondary attributes.

B. Use Amazon Aurora with a Multi-AZ deployment to store product data. Use read replicas. Create indexes for primary and secondary attributes.

C. Use an Amazon OpenSearch Serverless cluster with dynamic scaling to store product data. Index product data by primary and secondary attributes.

D. Use Amazon ElastiCache (Redis OSS) and Amazon S3 to store product data. Use Amazon Athena to run flexible secondary attribute queries.

Question # 82

A company uses Amazon RDS to store transactional data. The company runs an RDS DB instance in a private subnet. A developer wrote an AWS Lambda function with default settings to insert, update, or delete data in the DB instance.

The developer needs to give the Lambda function the ability to connect to the DB instance privately without using the public internet.

Which combination of steps will meet this requirement with the LEAST operational overhead? (Choose two.)

Options:

A. Turn on the public access setting for the DB instance.

B. Update the security group of the DB instance to allow only Lambda function invocations on the database port.

C. Configure the Lambda function to run in the same subnet that the DB instance uses.

D. Attach the same security group to the Lambda function and the DB instance. Include a self-referencing rule that allows access through the database port.

E. Update the network ACL of the private subnet to include a self-referencing rule that allows access through the database port.

Question # 83

Files from multiple data sources arrive in an Amazon S3 bucket on a regular basis. A data engineer wants to ingest new files into Amazon Redshift in near real time when the new files arrive in the S3 bucket.

Which solution will meet these requirements?

Options:

A. Use the query editor v2 to schedule a COPY command to load new files into Amazon Redshift.

B. Use the zero-ETL integration between Amazon Aurora and Amazon Redshift to load new files into Amazon Redshift.

C. Use AWS Glue job bookmarks to extract, transform, and load (ETL) new files into Amazon Redshift.

D. Use S3 Event Notifications to invoke an AWS Lambda function that loads new files into Amazon Redshift.
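
To make the event-driven approach in option D concrete, the sketch below shows how a Lambda handler might turn an S3 event notification into Redshift `COPY` statements. It is a minimal illustration under stated assumptions: the target table, the IAM role ARN, and the CSV file format are hypothetical, and the step that actually runs the statements against Redshift is left out.

```python
from urllib.parse import unquote_plus

# Hypothetical values for illustration only.
TARGET_TABLE = "staging.incoming_files"
IAM_ROLE_ARN = "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole"


def copy_statements(event):
    """Build one COPY statement per object listed in an S3 event notification."""
    statements = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        statements.append(
            f"COPY {TARGET_TABLE} FROM 's3://{bucket}/{key}' "
            f"IAM_ROLE '{IAM_ROLE_ARN}' FORMAT AS CSV;"  # CSV is an assumption
        )
    return statements


def lambda_handler(event, context):
    statements = copy_statements(event)
    # Submitting the statements to Redshift is intentionally omitted here.
    return {"statements": statements}


if __name__ == "__main__":
    sample_event = {"Records": [{"s3": {"bucket": {"name": "example-bucket"},
                                        "object": {"key": "landing/new+file.csv"}}}]}
    print(lambda_handler(sample_event, None))
```

One way to submit the generated statements without managing database connections is the Redshift Data API, though that choice is not part of the question.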

Exam Name: AWS Certified Data Engineer - Associate (DEA-C01)
Last Update: Mar 18, 2026
Questions: 289