
Databricks-Certified-Professional-Data-Engineer Exam Dumps - Databricks Certification Questions and Answers

Question # 44

A data engineer has configured their Databricks Asset Bundle with multiple targets in databricks.yml and deployed it to the production workspace. Now, to validate the deployment, they need to invoke a job named my_project_job specifically within the prod target context. Assuming the job is already deployed, they need to trigger its execution while ensuring the target-specific configuration is respected.

Which command will trigger the job execution?

Options:

A.

databricks execute my_project_job -e prod

B.

databricks job run my_project_job --env prod

C.

databricks run my_project_job -t prod

D.

databricks bundle run my_project_job -t prod

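For reference, a minimal sketch of the bundle CLI workflow around this command (the validate and deploy steps are assumed context; only the run step is what the question asks about):

# Validate the bundle configuration for the prod target
databricks bundle validate -t prod

# Deploy the bundle artifacts to the prod target's workspace
databricks bundle deploy -t prod

# Trigger the deployed job within the prod target context
databricks bundle run my_project_job -t prod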
Question # 45

A junior developer complains that the code in their notebook isn't producing the correct results in the development environment. A shared screenshot reveals that while they're using a notebook versioned with Databricks Repos, they're using a personal branch that contains old logic. The desired branch, named dev-2.3.9, is not available from the branch selection dropdown.

Which approach will allow this developer to review the current logic for this notebook?

Options:

A.

Use Repos to make a pull request, then use the Databricks REST API to update the current branch to dev-2.3.9

B.

Use Repos to pull changes from the remote Git repository and select the dev-2.3.9 branch.

C.

Use Repos to checkout the dev-2.3.9 branch and auto-resolve conflicts with the current branch

D.

Merge all changes back to the main branch in the remote Git repository and clone the repo again

E.

Use Repos to merge the current branch and the dev-2.3.9 branch, then make a pull request to sync with the remote repository

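For intuition, pulling from the remote repository maps onto ordinary Git operations: a branch created on the remote after the repo was cloned is invisible locally until it is fetched. A hedged sketch of the equivalent Git commands:

# Fetch newly created remote branches so they become visible locally
git fetch origin

# Switch to the desired branch once it is known locally
git checkout dev-2.3.9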
Question # 46

The data engineering team maintains a table of aggregate statistics through nightly batch updates. This includes total sales for the previous day alongside totals and averages for a variety of time periods, including the 7 previous days, year-to-date, and quarter-to-date. This table is named store_sales_summary and the schema is as follows:

The table daily_store_sales contains all the information needed to update store_sales_summary. The schema for this table is:

store_id INT, sales_date DATE, total_sales FLOAT

If daily_store_sales is implemented as a Type 1 table and the total_sales column might be adjusted after manual data auditing, which approach is the safest to generate accurate reports in the store_sales_summary table?

Options:

A.

Implement the appropriate aggregate logic as a batch read against the daily_store_sales table and overwrite the store_sales_summary table with each update.

B.

Implement the appropriate aggregate logic as a batch read against the daily_store_sales table and append new rows nightly to the store_sales_summary table.

C.

Implement the appropriate aggregate logic as a batch read against the daily_store_sales table and use upsert logic to update results in the store_sales_summary table.

D.

Implement the appropriate aggregate logic as a Structured Streaming read against the daily_store_sales table and use upsert logic to update results in the store_sales_summary table.

E.

Use Structured Streaming to subscribe to the change data feed for daily_store_sales and apply changes to the aggregates in the store_sales_summary table with each update.

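For context, a hedged PySpark sketch of the batch-read-and-overwrite pattern (the aggregate columns are illustrative, since the store_sales_summary schema is not reproduced above):

from pyspark.sql import functions as F

# Recompute the summary from the current state of daily_store_sales.
# Because total_sales can be corrected in place (Type 1), rebuilding and
# overwriting guarantees the summary never retains stale aggregates.
summary = (
    spark.read.table("daily_store_sales")
    .groupBy("store_id")
    .agg(
        F.sum("total_sales").alias("total_sales"),
        F.avg("total_sales").alias("avg_daily_sales"),
    )
)

summary.write.mode("overwrite").saveAsTable("store_sales_summary")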
Question # 47

A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Events are recorded once per minute per device.

df has the following schema: device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT

Code block:

df.withWatermark("event_time", "10 minutes")
  .groupBy(
    ________,
    "device_id"
  )
  .agg(
    avg("temp").alias("avg_temp"),
    avg("humidity").alias("avg_humidity")
  )
  .writeStream
  .format("delta")
  .saveAsTable("sensor_avg")

Which line of code correctly fills in the blank within the code block to complete this task?

Options:

A.

window("event_time", "5 minutes").alias("time")

B.

to_interval("event_time", "5 minutes").alias("time")

C.

" event_time "

D.

lag("event_time", "5 minutes").alias("time")

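For reference, a hedged sketch of the standard tumbling-window construction (the checkpoint path is an assumed placeholder, and toTable is used because DataStreamWriter exposes toTable rather than saveAsTable):

from pyspark.sql.functions import window, avg

# window("event_time", "5 minutes") buckets events into non-overlapping
# (tumbling) five-minute intervals; the 10-minute watermark bounds how
# late an event may arrive before a window's state is finalized.
query = (
    df.withWatermark("event_time", "10 minutes")
    .groupBy(window("event_time", "5 minutes").alias("time"), "device_id")
    .agg(avg("temp").alias("avg_temp"),
         avg("humidity").alias("avg_humidity"))
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/sensor_avg")  # assumed path
    .toTable("sensor_avg")
)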
Question # 48

A data architect has heard about Delta Lake's built-in versioning and time travel capabilities. For auditing purposes, they have a requirement to maintain a full history of all valid street addresses as they appear in the customers table.

The architect is interested in implementing a Type 1 table, overwriting existing records with new values and relying on Delta Lake time travel to support long-term auditing. A data engineer on the project feels that a Type 2 table will provide better performance and scalability.

Which piece of information is critical to this decision?

Options:

A.

Delta Lake time travel does not scale well in cost or latency to provide a long-term versioning solution.

B.

Delta Lake time travel cannot be used to query previous versions of these tables because Type 1 changes modify data files in place.

C.

Shallow clones can be combined with Type 1 tables to accelerate historic queries for long-term versioning.

D.

Data corruption can occur if a query fails in a partially completed state because Type 2 tables require setting multiple fields in a single update.

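For contrast, a hedged sketch of the Type 2 alternative the engineer is proposing (table and column names are assumed for illustration; each statement is one atomic Delta transaction, but the two statements together are not):

# Step 1: close out the current row for customers whose address changed.
# Note the multiple fields set in a single UPDATE, which is the concern
# raised in option D.
spark.sql("""
    MERGE INTO customers_history t
    USING address_updates s
      ON t.customer_id = s.customer_id AND t.is_current = true
    WHEN MATCHED AND t.street_address <> s.street_address THEN
      UPDATE SET is_current = false, end_date = current_date()
""")

# Step 2: insert the new address versions as current rows
spark.sql("""
    INSERT INTO customers_history
    SELECT s.customer_id, s.street_address,
           current_date() AS start_date, NULL AS end_date, true AS is_current
    FROM address_updates s
    WHERE NOT EXISTS (
      SELECT 1 FROM customers_history t
      WHERE t.customer_id = s.customer_id
        AND t.street_address = s.street_address
        AND t.is_current = true
    )
""")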
Question # 49

A nightly job ingests data into a Delta Lake table using the following code:

The next step in the pipeline requires a function that returns an object that can be used to process new records that have not yet been propagated to the next table in the pipeline.

Which code snippet completes this function definition?

def new_records():

Options:

A.

return spark.readStream.table("bronze")

B.

return spark.readStream.load("bronze")

C.

D.

return spark.read.option("readChangeFeed", "true").table("bronze")

E.

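For reference, a minimal sketch of the incremental pattern being probed (table name from the question):

def new_records():
    # A streaming read against a Delta table tracks already-processed data
    # through the downstream query's checkpoint, so each micro-batch exposes
    # only records not yet written to the next table in the pipeline.
    return spark.readStream.table("bronze")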
Question # 50

A data engineer is configuring Delta Sharing for a Databricks-to-Databricks scenario to optimize read performance. The recipient needs to perform time travel queries and streaming reads on shared sales data.

Which configuration will provide the optimal performance while enabling these capabilities?

Options:

A.

Share tables WITH HISTORY, ensure tables don’t have partitioning enabled, and enable CDF before sharing.

B.

Share tables WITHOUT HISTORY and enable partitioning for better query performance.

C.

Share the entire schema WITHOUT HISTORY and rely on recipient-side caching for performance.

D.

Use the open sharing protocol instead of Databricks-to-Databricks sharing for better performance.

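For reference, a hedged sketch of history-enabled sharing (share, catalog, and table names are assumed placeholders):

# Enable the change data feed on the source table before sharing
spark.sql("""
    ALTER TABLE main.sales.transactions
    SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true')
""")

# Share the table together with its history so the recipient can run
# time travel queries and streaming reads against the shared data
spark.sql("ALTER SHARE sales_share ADD TABLE main.sales.transactions WITH HISTORY")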
Question # 51

A Databricks job has been configured with 3 tasks, each of which is a Databricks notebook. Task A does not depend on other tasks. Tasks B and C run in parallel, with each having a serial dependency on Task A.

If task A fails during a scheduled run, which statement describes the results of this run?

Options:

A.

Because all tasks are managed as a dependency graph, no changes will be committed to the Lakehouse until all tasks have successfully been completed.

B.

Tasks B and C will attempt to run as configured; any changes made in task A will be rolled back due to task failure.

C.

Unless all tasks complete successfully, no changes will be committed to the Lakehouse; because task A failed, all commits will be rolled back automatically.

D.

Tasks B and C will be skipped; some logic expressed in task A may have been committed before task failure.

E.

Tasks B and C will be skipped; task A will not commit any changes because of stage failure.

Question # 52

A data engineer needs to implement column masking for a sensitive column in a Unity Catalog-managed table. The masking logic must dynamically check if users belong to specific groups defined in a separate table (group_access) that maps groups to allowed departments.

Which approach should the engineer use to efficiently enforce this requirement?

Options:

A.

Create a UDF that hardcodes allowed groups and apply it as a column mask.

B.

Create a view without selecting the sensitive column.

C.

Apply a column mask that references the group_access mapping table in its UDF.

D.

Use a row filter to restrict access based on the user’s group.

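For reference, a hedged sketch of a mask function that consults a mapping table (function, table, and column names beyond group_access are assumed; support for subqueries inside mask functions depends on the compute you run on):

# Mask function that reveals the value only when the current user belongs
# to a group that group_access maps to the row's department
spark.sql("""
    CREATE OR REPLACE FUNCTION dept_mask(ssn STRING, dept STRING)
    RETURNS STRING
    RETURN CASE
      WHEN EXISTS (
        SELECT 1 FROM group_access g
        WHERE g.department = dept
          AND is_account_group_member(g.group_name)
      ) THEN ssn
      ELSE '***-**-****'
    END
""")

# Attach the mask to the sensitive column, passing department as extra input
spark.sql("""
    ALTER TABLE hr.employees
    ALTER COLUMN ssn SET MASK dept_mask USING COLUMNS (department)
""")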
Question # 53

A data engineer is tasked with ensuring that a Delta table in Databricks continuously retains deleted files for 15 days (instead of the default 7 days), in order to permanently comply with the organization’s data retention policy.

Which code snippet correctly sets this retention period for deleted files?

Options:

A.

spark.sql( " ALTER TABLE my_table SET TBLPROPERTIES ( ' delta.deletedFileRetentionDuration ' = ' interval 15 days ' ) " )

B.

from delta.tables import *

deltaTable = DeltaTable.forPath(spark, "/mnt/data/my_table")
deltaTable.deletedFileRetentionDuration = "interval 15 days"

C.

spark.sql( " VACUUM my_table RETAIN 15 HOURS " )

D.

spark.conf.set( " spark.databricks.delta.deletedFileRetentionDuration " , " 15 days " )

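For context, a hedged sketch pairing the persistent table property with an explicit VACUUM (the VACUUM step is an assumed follow-up; 360 hours equals 15 days):

# A table property persists with the table and applies to every writer,
# unlike a session-scoped spark.conf setting.
spark.sql("""
    ALTER TABLE my_table SET TBLPROPERTIES (
      'delta.deletedFileRetentionDuration' = 'interval 15 days'
    )
""")

# VACUUM honors the table property; RETAIN 360 HOURS is the explicit equivalent
spark.sql("VACUUM my_table RETAIN 360 HOURS")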
Exam Name: Databricks Certified Data Engineer Professional Exam
Last Update: Apr 29, 2026
Questions: 195