Month End Sale 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: Board70

Data-Engineer-Associate Exam Dumps - Amazon Web Services AWS Certified Data Engineer Questions and Answers

Question # 14

A data engineer needs to securely transfer 5 TB of data from an on-premises data center to an Amazon S3 bucket. Approximately 5% of the data changes every day. Updates to the data need to be regularly proliferated to the S3 bucket. The data includes files that are in multiple formats. The data engineer needs to automate the transfer process and must schedule the process to run periodically.

Which AWS service should the data engineer use to transfer the data in the MOST operationally efficient way?

Options:

A.

AWS DataSync

B.

AWS Glue

C.

AWS Direct Connect

D.

Amazon S3 Transfer Acceleration

Buy Now
Question # 15

A gaming company uses Amazon Kinesis Data Streams to collect clickstream data. The company uses Amazon Kinesis Data Firehose delivery streams to store the data in JSON format in Amazon S3. Data scientists at the company use Amazon Athena to query the most recent data to obtain business insights.

The company wants to reduce Athena costs but does not want to recreate the data pipeline.

Which solution will meet these requirements with the LEAST management effort?

Options:

A.

Change the Firehose output format to Apache Parquet. Provide a custom S3 object YYYYMMDD prefix expression and specify a large buffer size. For the existing data, create an AWS Glue extract, transform, and load (ETL) job. Configure the ETL job to combine small JSON files, convert the JSON files to large Parquet files, and add the YYYYMMDD prefix. Use the ALTER TABLE ADD PARTITION statement to reflect the partition on the existing Athena tab

B.

Create an Apache Spark job that combines JSON files and converts the JSON files to Apache Parquet files. Launch an Amazon EMR ephemeral cluster every day to run the Spark job to create new Parquet files in a different S3 location. Use the ALTER TABLE SET LOCATION statement to reflect the new S3 location on the existing Athena table.

C.

Create a Kinesis data stream as a delivery destination for Firehose. Use Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) to run Apache Flink on the Kinesis data stream. Use Flink to aggregate the data and save the data to Amazon S3 in Apache Parquet format with a custom S3 object YYYYMMDD prefix. Use the ALTER TABLE ADD PARTITION statement to reflect the partition on the existing Athena table.

D.

Integrate an AWS Lambda function with Firehose to convert source records to Apache Parquet and write them to Amazon S3. In parallel, run an AWS Glue extract, transform, and load (ETL) job to combine the JSON files and convert the JSON files to large Parquet files. Create a custom S3 object YYYYMMDD prefix. Use the ALTER TABLE ADD PARTITION statement to reflect the partition on the existing Athena table.

Buy Now
Question # 16

A data engineer runs Amazon Athena queries on data that is in an Amazon S3 bucket. The Athena queries use AWS Glue Data Catalog as a metadata table.

The data engineer notices that the Athena query plans are experiencing a performance bottleneck. The data engineer determines that the cause of the performance bottleneck is the large number of partitions that are in the S3 bucket. The data engineer must resolve the performance bottleneck and reduce Athena query planning time.

Which solutions will meet these requirements? (Choose two.)

Options:

A.

Create an AWS Glue partition index. Enable partition filtering.

B.

Bucket the data based on a column that the data have in common in a WHERE clause of the user query

C.

Use Athena partition projection based on the S3 bucket prefix.

D.

Transform the data that is in the S3 bucket to Apache Parquet format.

E.

Use the Amazon EMR S3DistCP utility to combine smaller objects in the S3 bucket into larger objects.

Buy Now
Question # 17

A company generates reports from 30 tables in an Amazon Redshift data warehouse. The data source is an operational Amazon Aurora MySQL database that contains 100 tables. Currently, the company refreshes all data from Aurora to Redshift every hour, which causes delays in report generation.

Which combination of steps will meet these requirements with the LEAST operational overhead? (Select TWO.)

Options:

A.

Use AWS Database Migration Service (AWS DMS) to create a replication task. Select only the required tables.

B.

Create a database in Amazon Redshift that uses the integration.

C.

Create a zero-ETL integration in Amazon Aurora. Select only the required tables.

D.

Use query editor v2 in Amazon Redshift to access the data in Aurora.

E.

Create an AWS Glue job to transfer each required table. Run an AWS Glue workflow to initiate the jobs every 5 minutes.

Buy Now
Question # 18

A retail company has a customer data hub in an Amazon S3 bucket. Employees from many countries use the data hub to support company-wide analytics. A governance team must ensure that the company's data analysts can access data only for customers who are within the same country as the analysts.

Which solution will meet these requirements with the LEAST operational effort?

Options:

A.

Create a separate table for each country's customer data. Provide access to each analyst based on the country that the analyst serves.

B.

Register the S3 bucket as a data lake location in AWS Lake Formation. Use the Lake Formation row-level security features to enforce the company's access policies.

C.

Move the data to AWS Regions that are close to the countries where the customers are. Provide access to each analyst based on the country that the analyst serves.

D.

Load the data into Amazon Redshift. Create a view for each country. Create separate 1AM roles for each country to provide access to data from each country. Assign the appropriate roles to the analysts.

Buy Now
Question # 19

A company has five offices in different AWS Regions. Each office has its own human resources (HR) department that uses a unique IAM role. The company stores employee records in a data lake that is based on Amazon S3 storage.

A data engineering team needs to limit access to the records. Each HR department should be able to access records for only employees who are within the HR department's Region.

Which combination of steps should the data engineering team take to meet this requirement with the LEAST operational overhead? (Choose two.)

Options:

A.

Use data filters for each Region to register the S3 paths as data locations.

B.

Register the S3 path as an AWS Lake Formation location.

C.

Modify the IAM roles of the HR departments to add a data filter for each department's Region.

D.

Enable fine-grained access control in AWS Lake Formation. Add a data filter for each Region.

E.

Create a separate S3 bucket for each Region. Configure an IAM policy to allow S3 access. Restrict access based on Region.

Buy Now
Question # 20

A company is using Amazon S3 to build a data lake. The company needs to replicate records from multiple source databases into Apache Parquet format.

Most of the source databases are hosted on Amazon RDS. However, one source database is an on-premises Microsoft SQL Server Enterprise instance. The company needs to implement a solution to replicate existing data from all source databases and all future changes to the target S3 data lake.

Which solution will meet these requirements MOST cost-effectively?

Options:

A.

Use one AWS Glue job to replicate existing data. Use a second AWS Glue job to replicate future changes.

B.

Use AWS Database Migration Service (AWS DMS) to replicate existing data. Use AWS Glue jobs to replicate future changes.

C.

Use AWS Database Migration Service (AWS DMS) to replicate existing data and future changes.

D.

Use AWS Glue jobs to replicate existing data. Use Amazon Kinesis Data Streams to replicate future changes.

Buy Now
Question # 21

A company needs to implement a data mesh architecture for trading, risk, and compliance teams. Each team has its own data but needs to share views. They have 1,000+ tables in 50 Glue databases. All teams use Athena and Redshift, and compliance requires full auditing and PII access control.

Options:

A.

Create views in Athena for on-demand analysis. Use the Athena views in Amazon Redshift to perform cross-domain analytics. Use AWS CloudTrail to audit data access. Use AWS Lake Formation to establish fine-grained access control.

B.

Use AWS Glue Data Catalog views. Use CloudTrail logs and Lake Formation to manage permissions.

C.

Use Lake Formation to set up cross-domain access to tables. Set up fine-grained access controls.

D.

Create materialized views and enable Amazon Redshift datashares for each domain.

Buy Now
Question # 22

A company uses Amazon S3 as a data lake. The company sets up a data warehouse by using a multi-node Amazon Redshift cluster. The company organizes the data files in the data lake based on the data source of each data file.

The company loads all the data files into one table in the Redshift cluster by using a separate COPY command for each data file location. This approach takes a long time to load all the data files into the table. The company must increase the speed of the data ingestion. The company does not want to increase the cost of the process.

Which solution will meet these requirements?

Options:

A.

Use a provisioned Amazon EMR cluster to copy all the data files into one folder. Use a COPY command to load the data into Amazon Redshift.

B.

Load all the data files in parallel into Amazon Aurora. Run an AWS Glue job to load the data into Amazon Redshift.

C.

Use an AWS Glue job to copy all the data files into one folder. Use a COPY command to load the data into Amazon Redshift.

D.

Create a manifest file that contains the data file locations. Use a COPY command to load the data into Amazon Redshift.

Buy Now
Question # 23

A data engineer is using Amazon Athena to analyze sales data that is in Amazon S3. The data engineer writes a query to retrieve sales amounts for 2023 for several products from a table named sales_data. However, the query does not return results for all of the products that are in the sales_data table. The data engineer needs to troubleshoot the query to resolve the issue.

The data engineer's original query is as follows:

SELECT product_name, sum(sales_amount)

FROM sales_data

WHERE year = 2023

GROUP BY product_name

How should the data engineer modify the Athena query to meet these requirements?

Options:

A.

Replace sum(sales amount) with count(*J for the aggregation.

B.

Change WHERE year = 2023 to WHERE extractlyear FROM sales data) = 2023.

C.

Add HAVING sumfsales amount) > 0 after the GROUP BY clause.

D.

Remove the GROUP BY clause

Buy Now
Exam Name: AWS Certified Data Engineer - Associate (DEA-C01)
Last Update: Feb 1, 2026
Questions: 231
Data-Engineer-Associate pdf

Data-Engineer-Associate PDF

$25.5  $84.99
Data-Engineer-Associate Engine

Data-Engineer-Associate Testing Engine

$28.5  $94.99
Data-Engineer-Associate PDF + Engine

Data-Engineer-Associate PDF + Testing Engine

$40.5  $134.99