Big Halloween Sale 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: Board70

Databricks-Certified-Professional-Data-Engineer Exam Dumps - Databricks Certification Questions and Answers

Question # 4

A Delta Lake table representing metadata about content posts from users has the following schema:

    user_id LONG

    post_text STRING

    post_id STRING

    longitude FLOAT

    latitude FLOAT

    post_time TIMESTAMP

    date DATE

Based on the above schema, which column is a good candidate for partitioning the Delta Table?

Options:

A.

date

B.

user_id

C.

post_id

D.

post_time

Buy Now
Question # 5

The data engineering team maintains a table of aggregate statistics through batch nightly updates. This includes total sales for the previous day alongside totals and averages for a variety of time periods including the 7 previous days, year-to-date, and quarter-to-date. This table is named store_saies_summary and the schema is as follows:

The table daily_store_sales contains all the information needed to update store_sales_summary. The schema for this table is:

store_id INT, sales_date DATE, total_sales FLOAT

If daily_store_sales is implemented as a Type 1 table and the total_sales column might be adjusted after manual data auditing, which approach is the safest to generate accurate reports in the store_sales_summary table?

Options:

A.

Implement the appropriate aggregate logic as a batch read against the daily_store_sales table and overwrite the store_sales_summary table with each Update.

B.

Implement the appropriate aggregate logic as a batch read against the daily_store_sales table and append new rows nightly to the store_sales_summary table.

C.

Implement the appropriate aggregate logic as a batch read against the daily_store_sales table and use upsert logic to update results in the store_sales_summary table.

D.

Implement the appropriate aggregate logic as a Structured Streaming read against the daily_store_sales table and use upsert logic to update results in the store_sales_summary table.

E.

Use Structured Streaming to subscribe to the change data feed for daily_store_sales and apply changes to the aggregates in the store_sales_summary table with each update.

Buy Now
Question # 6

A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Incremental state information should be maintained for 10 minutes for late-arriving data.

Streaming DataFrame df has the following schema:

"device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT"

Code block:

Choose the response that correctly fills in the blank within the code block to complete this task.

Options:

A.

withWatermark("event_time", "10 minutes")

B.

awaitArrival("event_time", "10 minutes")

C.

await("event_time + ‘10 minutes'")

D.

slidingWindow("event_time", "10 minutes")

E.

delayWrite("event_time", "10 minutes")

Buy Now
Question # 7

A new data engineer notices that a critical field was omitted from an application that writes its Kafka source to Delta Lake. This happened even though the critical field was in the Kafka source. That field was further missing from data written to dependent, long-term storage. The retention threshold on the Kafka service is seven days. The pipeline has been in production for three months.

Which describes how Delta Lake can help to avoid data loss of this nature in the future?

Options:

A.

The Delta log and Structured Streaming checkpoints record the full history of the Kafka producer.

B.

Delta Lake schema evolution can retroactively calculate the correct value for newly added fields, as long as the data was in the original source.

C.

Delta Lake automatically checks that all fields present in the source data are included in the ingestion layer.

D.

Data can never be permanently dropped or deleted from Delta Lake, so data loss is not possible under any circumstance.

E.

Ingestine all raw data and metadata from Kafka to a bronze Delta table creates a permanent, replayable history of the data state.

Buy Now
Question # 8

The view updates represents an incremental batch of all newly ingested data to be inserted or updated in the customers table.

The following logic is used to process these records.

MERGE INTO customers

USING (

SELECT updates.customer_id as merge_ey, updates .*

FROM updates

UNION ALL

SELECT NULL as merge_key, updates .*

FROM updates JOIN customers

ON updates.customer_id = customers.customer_id

WHERE customers.current = true AND updates.address <> customers.address

) staged_updates

ON customers.customer_id = mergekey

WHEN MATCHED AND customers. current = true AND customers.address <> staged_updates.address THEN

UPDATE SET current = false, end_date = staged_updates.effective_date

WHEN NOT MATCHED THEN

INSERT (customer_id, address, current, effective_date, end_date)

VALUES (staged_updates.customer_id, staged_updates.address, true, staged_updates.effective_date, null)

Which statement describes this implementation?

    The customers table is implemented as a Type 2 table; old values are overwritten and new customers are appended.

Options:

A.

The customers table is implemented as a Type 1 table; old values are overwritten by new values and no history is maintained.

B.

The customers table is implemented as a Type 2 table; old values are maintained but marked as no longer current and new values are inserted.

C.

The customers table is implemented as a Type 0 table; all writes are append only with no changes to existing values.

Buy Now
Question # 9

A junior data engineer has configured a workload that posts the following JSON to the Databricks REST API endpoint 2.0/jobs/create.

Assuming that all configurations and referenced resources are available, which statement describes the result of executing this workload three times?

Options:

A.

Three new jobs named "Ingest new data" will be defined in the workspace, and they will each run once daily.

B.

The logic defined in the referenced notebook will be executed three times on new clusters with the configurations of the provided cluster ID.

C.

Three new jobs named "Ingest new data" will be defined in the workspace, but no jobs will be executed.

D.

One new job named "Ingest new data" will be defined in the workspace, but it will not be executed.

E.

The logic defined in the referenced notebook will be executed three times on the referenced existing all purpose cluster.

Buy Now
Question # 10

A Spark job is taking longer than expected. Using the Spark UI, a data engineer notes that the Min, Median, and Max Durations for tasks in a particular stage show the minimum and median time to complete a task as roughly the same, but the max duration for a task to be roughly 100 times as long as the minimum.

Which situation is causing increased duration of the overall job?

Options:

A.

Task queueing resulting from improper thread pool assignment.

B.

Spill resulting from attached volume storage being too small.

C.

Network latency due to some cluster nodes being in different regions from the source data

D.

Skew caused by more data being assigned to a subset of spark-partitions.

E.

Credential validation errors while pulling data from an external system.

Buy Now
Question # 11

Which REST API call can be used to review the notebooks configured to run as tasks in a multi-task job?

Options:

A.

/jobs/runs/list

B.

/jobs/runs/get-output

C.

/jobs/runs/get

D.

/jobs/get

E.

/jobs/list

Buy Now
Question # 12

A Delta Lake table representing metadata about content from user has the following schema:

user_id LONG, post_text STRING, post_id STRING, longitude FLOAT, latitude FLOAT, post_time TIMESTAMP, date DATE

Based on the above schema, which column is a good candidate for partitioning the Delta Table?

Options:

A.

Date

B.

Post_id

C.

User_id

D.

Post_time

Buy Now
Question # 13

Assuming that the Databricks CLI has been installed and configured correctly, which Databricks CLI command can be used to upload a custom Python Wheel to object storage mounted with the DBFS for use with a production job?

Options:

A.

configure

B.

fs

C.

jobs

D.

libraries

E.

workspace

Buy Now
Exam Name: Databricks Certified Data Engineer Professional Exam
Last Update: Oct 30, 2025
Questions: 195
Databricks-Certified-Professional-Data-Engineer pdf

Databricks-Certified-Professional-Data-Engineer PDF

$25.5  $84.99
Databricks-Certified-Professional-Data-Engineer Engine

Databricks-Certified-Professional-Data-Engineer Testing Engine

$28.5  $94.99
Databricks-Certified-Professional-Data-Engineer PDF + Engine

Databricks-Certified-Professional-Data-Engineer PDF + Testing Engine

$40.5  $134.99