
Databricks-Certified-Professional-Data-Engineer Exam Dumps - Databricks Certification Questions and Answers

Question # 14

What is the first line of a Databricks Python notebook when viewed in a text editor?

Options:

A.

%python

B.

# Databricks notebook source

C.

-- Databricks notebook source

D.

//Databricks notebook source
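For reference, when a Databricks Python notebook is exported or viewed as plain text it is an ordinary .py file whose first line is a special comment marker, with cells separated by COMMAND markers. A minimal sketch (cell contents are illustrative only):

# Databricks notebook source
# First cell (illustrative contents)
print("hello from cell 1")

# COMMAND ----------

# Second cell (illustrative contents)
print("hello from cell 2")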

Question # 15

A junior data engineer on your team has implemented the following code block.

The view new_events contains a batch of records with the same schema as the events Delta table. The event_id field serves as a unique key for this table.

When this query is executed, what will happen to new records that have the same event_id as an existing record?

Options:

A.

They are merged.

B.

They are ignored.

C.

They are updated.

D.

They are inserted.

E.

They are deleted.
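The code block referenced in this question is not reproduced in this document. As a point of reference only, and not the question's actual code, one common way to apply a keyed batch such as new_events to the events Delta table is an insert-only MERGE on event_id, sketched below:

# Hypothetical sketch only -- NOT the code block from the question.
# Inserts rows whose event_id does not already exist in events and
# leaves existing rows untouched.
spark.sql("""
    MERGE INTO events
    USING new_events
    ON events.event_id = new_events.event_id
    WHEN NOT MATCHED THEN INSERT *
""")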

Question # 16

What is a method of installing a Python package scoped at the notebook level to all nodes in the currently active cluster?

Options:

A.

Use %pip install in a notebook cell

B.

Run source env/bin/activate in a notebook setup script

C.

Install libraries from PyPI using the cluster UI

D.

Use %sh install in a notebook cell
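For reference, notebook-scoped libraries are installed with the %pip magic command, which applies the install across the nodes of the attached cluster for the current notebook session. A minimal sketch (the package name and version are illustrative only):

# %pip installs the package for this notebook's session on all cluster nodes.
%pip install requests==2.31.0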

Question # 17

A data engineer is performing a join operation to combine values from a static userLookup table with a streaming DataFrame streamingDF.

Which code block attempts to perform an invalid stream-static join?

Options:

A.

userLookup.join(streamingDF, ["userid"], how="inner")

B.

streamingDF.join(userLookup, ["user_id"], how="outer")

C.

streamingDF.join(userLookup, ["user_id"], how="left")

D.

streamingDF.join(userLookup, ["userid"], how="inner")

E.

userLookup.join(streamingDF, ["user_id"], how="right")
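For context, a stream-static join keeps the streaming DataFrame on one side and the static table on the other; inner joins are supported, while not every outer-join combination is. A minimal sketch of a commonly supported inner stream-static join (table and column names are assumptions):

# Minimal sketch, assuming user_lookup is a static table and events is a streaming source.
userLookup = spark.read.table("user_lookup")      # static side
streamingDF = spark.readStream.table("events")    # streaming side

# Inner stream-static join on the shared key column.
joined = streamingDF.join(userLookup, ["user_id"], how="inner")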

Question # 18

A data engineer is designing a pipeline in Databricks that processes records from a Kafka stream where late-arriving data is common.

Which approach should the data engineer use?

Options:

A.

Implement a custom solution using Databricks Jobs to periodically reprocess all historical data.

B.

Use batch processing and overwrite the entire output table each time to ensure late data is incorporated correctly.

C.

Use an Auto CDC pipeline with batch tables to simplify late data handling.

D.

Use a watermark to specify the allowed lateness to accommodate records that arrive after their expected window, ensuring correct aggregation and state management.
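For reference, allowed lateness in Structured Streaming is declared with withWatermark before a windowed aggregation. A minimal sketch (the broker, topic, column names, and thresholds are assumptions):

from pyspark.sql import functions as F

# Read from Kafka (connection options are placeholders).
events = (
    spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "host:9092")   # placeholder broker
        .option("subscribe", "transactions")               # placeholder topic
        .load()
)

# Allow records up to 10 minutes late, then aggregate per 5-minute window.
counts = (
    events
        .withColumn("event_time", F.col("timestamp"))
        .withWatermark("event_time", "10 minutes")
        .groupBy(F.window("event_time", "5 minutes"))
        .count()
)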

Question # 19

A data engineer is attempting to execute the following PySpark code:

df = spark.read.table("sales")

result = df.groupBy("region").agg(sum("revenue"))

However, upon inspecting the execution plan and profiling the Spark job, they observe excessive data shuffling during the aggregation phase.

Which technique should be applied to reduce shuffling during the groupBy aggregation operation?

Options:

A.

Caching the DataFrame df.

B.

Repartition by region before aggregation.

C.

Use coalesce() after the aggregation.

D.

Use broadcast join.
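For context, the repartition-before-aggregation pattern named in option B looks like the sketch below; whether it actually reduces shuffle cost depends on data sizes and skew, so treat it as a sketch rather than a universal fix:

from pyspark.sql import functions as F

df = spark.read.table("sales")

# Repartition by the grouping key so rows for the same region are co-located
# before the groupBy aggregation.
result = (
    df.repartition("region")
      .groupBy("region")
      .agg(F.sum("revenue").alias("total_revenue"))
)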

Question # 20

A table is registered with the following code:

Both users and orders are Delta Lake tables. Which statement describes the results of querying recent_orders?

Options:

A.

All logic will execute at query time and return the result of joining the valid versions of the source tables at the time the query finishes.

B.

All logic will execute when the table is defined and store the result of joining tables to the DBFS; this stored data will be returned when the table is queried.

C.

Results will be computed and cached when the table is defined; these cached results will incrementally update as new records are inserted into source tables.

D.

All logic will execute at query time and return the result of joining the valid versions of the source tables at the time the query began.

E.

The versions of each source table will be stored in the table transaction log; query results will be saved to DBFS with each query.

Question # 21

A data engineer manages a production Lakeflow Declarative Pipeline that processes customer transaction data. The pipeline includes several data quality expectations such as transaction_amount > 0 and customer_id IS NOT NULL. These expectations are defined using the EXPECT clause in SQL.

The engineer aims to monitor the pipeline’s data quality by analyzing the number of records that passed or failed each expectation during the latest pipeline update. The Lakeflow Declarative Pipelines event logs are stored in a Delta table named event_log_table.

For the most recent pipeline update, the engineer needs a programmatic way to extract, for each expectation, its name, the associated dataset, the count of records that passed, and the count of records that failed.

Which method retrieves the desired data quality metrics from the Lakeflow Declarative Pipelines event log?

Options:

A.

Access the event_log_table, filter for events where event_type = 'flow_progress', and parse the details.flow_progress.data_quality.expectations field to extract the required metrics.

B.

Use the Lakeflow Declarative Pipelines UI to navigate to the specific pipeline, select the dataset, and view the Data Quality tab to manually retrieve the expectation metrics.

C.

Query the event_log_table for events with event_type = 'data_quality' and directly select the passed_records and failed_records fields.

D.

Access the event_log_table, filter for events where event_type = 'expectation_result', and extract the expectation metrics from the details field.
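For reference, the flow_progress parsing described in option A can be sketched in PySpark roughly as follows. The field names and schema follow the option text and common event-log layouts, and filtering to only the latest update (for example by update id or timestamp) is omitted; treat this as an assumption-laden sketch, not a verified query:

from pyspark.sql import functions as F

expectation_schema = (
    "array<struct<name string, dataset string, "
    "passed_records bigint, failed_records bigint>>"
)

metrics = (
    spark.read.table("event_log_table")
        .filter(F.col("event_type") == "flow_progress")
        # details is a JSON string; pull out the expectations array and parse it.
        .withColumn(
            "expectations",
            F.from_json(
                F.get_json_object("details", "$.flow_progress.data_quality.expectations"),
                expectation_schema,
            ),
        )
        .withColumn("expectation", F.explode("expectations"))
        .select(
            "expectation.dataset",
            "expectation.name",
            "expectation.passed_records",
            "expectation.failed_records",
        )
)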

Question # 22

A data engineer wants to ingest a large collection of image files (JPEG and PNG) from cloud object storage into a Unity Catalog–managed table for analysis and visualization.

Which two configurations and practices are recommended to incrementally ingest these images into the table? (Choose 2 answers)

Options:

A.

Move files to a volume and read with SQL editor.

B.

Use Auto Loader and set cloudFiles.format to " BINARYFILE " .

C.

Use Auto Loader and set cloudFiles.format to " TEXT " .

D.

Use Auto Loader and set cloudFiles.format to " IMAGE " .

E.

Use the pathGlobFilter option to select only image files (e.g., "*.jpg,*.png").
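For reference, an incremental Auto Loader ingest of image files into a managed table can be sketched as follows. The source path, checkpoint location, target table, and glob pattern are assumptions, not values from the question:

# Incrementally ingest image files as binary content with Auto Loader.
images = (
    spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "binaryFile")      # read each file as binary content
        .option("pathGlobFilter", "*.{jpg,jpeg,png}")   # keep only image files (pattern is illustrative)
        .load("s3://example-bucket/raw/images/")        # placeholder source path
)

(
    images.writeStream
        .option("checkpointLocation", "/Volumes/main/default/checkpoints/images")  # placeholder
        .trigger(availableNow=True)
        .toTable("main.default.images_bronze")          # placeholder Unity Catalog table
)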

Question # 23

Which statement describes Delta Lake Auto Compaction?

Options:

A.

An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an optimize job is executed toward a default of 1 GB.

B.

Before a Jobs cluster terminates, optimize is executed on all tables modified during the most recent job.

C.

Optimized writes use logical partitions instead of directory partitions; because partition boundaries are only represented in metadata, fewer small files are written.

D.

Data is queued in a messaging bus instead of committing data directly to memory; all data is committed from the messaging bus in one batch once the job is complete.

E.

An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an optimize job is executed toward a default of 128 MB.
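For reference, auto compaction (along with optimized writes) is commonly enabled per table through Delta table properties; a minimal sketch, with the table name as an assumption:

# Enable auto compaction and optimized writes on an existing Delta table.
spark.sql("""
    ALTER TABLE main.default.events
    SET TBLPROPERTIES (
        'delta.autoOptimize.autoCompact' = 'true',
        'delta.autoOptimize.optimizeWrite' = 'true'
    )
""")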

Exam Name: Databricks Certified Data Engineer Professional Exam
Last Update: Apr 30, 2026
Questions: 195