Databricks-Certified-Professional-Data-Engineer Exam Dumps - Databricks Certification Questions and Answers

Question # 54

A table in the Lakehouse named customer_churn_params is used in churn prediction by the machine learning team. The table contains information about customers derived from a number of upstream sources. Currently, the data engineering team populates this table nightly by overwriting the table with the current valid values derived from upstream data sources.

The churn prediction model used by the ML team is fairly stable in production. The team is only interested in making predictions on records that have changed in the past 24 hours.

Which approach would simplify the identification of these changed records?

Options:

Apply the churn model to all rows in the customer_churn_params table, but implement logic to perform an upsert into the predictions table that ignores rows where predictions have not changed.

Convert the batch job to a Structured Streaming job using the complete output mode; configure a Structured Streaming job to read from the customer_churn_params table and incrementally predict against the churn model.

Calculate the difference between the previous model predictions and the current customer_churn_params on a key identifying unique customers before making new predictions; only make predictions on those customers not in the previous predictions.

Modify the overwrite logic to include a field populated by calling spark.sql.functions.current_timestamp() as data are being written; use this field to identify records written on a particular date.

Replace the current overwrite logic with a merge statement to modify only those records that have changed; write logic to make predictions on the changed records identified by the change data feed.

Buy Now

Answer:

Explanation:

The approach that would simplify the identification of the changed records is to replace the current overwrite logic with a merge statement to modify only those records that have changed, and write logic to make predictions on the changed records identified by the change data feed. This approach leverages the Delta Lake features of merge and change data feed, which are designed to handle upserts and track row-level changes in a Delta table 1 2 . By using merge, the data engineering team can avoid overwriting the entire table every night, and only update or insert the records that have changed in the source data. By using change data feed, the ML team can easily access the change events that have occurred in the customer_churn_params table, and filter them by operation type (update or insert) and timestamp. This way, they can only make predictions on the records that have changed in the past 24 hours, and avoid re-processing the unchanged records.

The other options are not as simple or efficient as the proposed approach, because:

Option A would require applying the churn model to all rows in the customer_churn_params table, which would be wasteful and redundant. It would also require implementing logic to perform an upsert into the predictions table, which would be more complex than using the merge statement.

Option B would require converting the batch job to a Structured Streaming job, which would involve changing the data ingestion and processing logic. It would also require using the complete output mode, which would output the entire result table every time there is a change in the source data, which would be inefficient and costly.

Option C would require calculating the difference between the previous model predictions and the current customer_churn_params on a key identifying unique customers, which would be computationally expensive and prone to errors. It would also require storing and accessing the previous predictions, which would add extra storage and I/O costs.

Option D would require modifying the overwrite logic to include a field populated by calling spark.sql.functions.current_timestamp() as data are being written, which would add extra complexity and overhead to the data engineering job. It would also require using this field to identify records written on a particular date, which would be less accurate and reliable than using the change data feed.

[References: Merge, Change data feed, ]

Question # 55

A data engineer deploys a multi-task Databricks job that orchestrates three notebooks. One task intermittently fails with Exit Code 1 but succeeds on retry. The engineer needs to collect detailed logs for the failing attempts, including stdout/stderr and cluster lifecycle context, and share them with the platform team.

What steps the data engineer needs to follow using built-in tools?

Options:

Use the notebook interactive debugger to re-run the entire multi-task job, and capture step-through traces for the failing task.

Download worker logs directly from the Spark UI and ignore driver logs, as worker logs contain stdout/stderr for all tasks and cluster events.

Export the notebook run results to HTML; this bundle includes complete stdout, stderr, and cluster event history across all tasks.

From the job run details page, export the job ' s logs or configure log delivery; then retrieve the compute driver logs and event logs from the compute details page to correlate stdout/stderr with cluster events.

Buy Now

Question # 56

A data organization has adopted Delta Sharing to securely distribute curated datasets from a Unity Catalog-enabled workspace . The data engineering team shares large Delta tables internally via Databricks-to-Databricks and externally via Open Sharing for aggregated reports. While testing, they encounter challenges related to access control, data update visibility, and shareable object types.

What is a limitation of the Delta Sharing protocol or implementation when used with Databricks-to-Databricks or Open Sharing?

Options:

With Open Sharing, recipients cannot access Volumes, Models, or notebooks — only static Delta tables are supported.

Delta Sharing does not support Unity Catalog–enabled tables; only legacy Hive Metastore tables are shareable.

With Databricks-to-Databricks sharing, Unity Catalog recipients must re-ingest data manually using COPY INTO or REST APIs.

Delta Sharing (both Databricks-to-Databricks and Open Sharing) allows recipients to modify the source data if they have select privileges.

Buy Now

Question # 57

A Delta Lake table was created with the below query:

Realizing that the original query had a typographical error, the below code was executed:

ALTER TABLE prod.sales_by_stor RENAME TO prod.sales_by_store

Which result will occur after running the second command?

Options:

The table reference in the metastore is updated and no data is changed.

The table name change is recorded in the Delta transaction log.

All related files and metadata are dropped and recreated in a single ACID transaction.

The table reference in the metastore is updated and all data files are moved.

A new Delta transaction log Is created for the renamed table.

Buy Now

Question # 58

Which configuration parameter directly affects the size of a spark-partition upon ingestion of data into Spark?

Options:

spark.sql.files.maxPartitionBytes

spark.sql.autoBroadcastJoinThreshold

spark.sql.files.openCostInBytes

spark.sql.adaptive.coalescePartitions.minPartitionNum

spark.sql.adaptive.advisoryPartitionSizeInBytes

Buy Now

Question # 59

To reduce storage and compute costs, the data engineering team has been tasked with curating a series of aggregate tables leveraged by business intelligence dashboards, customer-facing applications, production machine learning models, and ad hoc analytical queries.

The data engineering team has been made aware of new requirements from a customer-facing application, which is the only downstream workload they manage entirely. As a result, an aggregate table used by numerous teams across the organization will need to have a number of fields renamed, and additional fields will also be added.

Which of the solutions addresses the situation while minimally interrupting other teams in the organization without increasing the number of tables that need to be managed?

Options:

Send all users notice that the schema for the table will be changing; include in the communication the logic necessary to revert the new table schema to match historic queries.

Configure a new table with all the requisite fields and new names and use this as the source for the customer-facing application; create a view that maintains the original data schema and table name by aliasing select fields from the new table.

Create a new table with the required schema and new fields and use Delta Lake ' s deep clone functionality to sync up changes committed to one table to the corresponding table.

Replace the current table definition with a logical view defined with the query logic currently writing the aggregate table; create a new table to power the customer-facing application.

Add a table comment warning all users that the table schema and field names will be changing on a given date; overwrite the table in place to the specifications of the customer-facing application.

Buy Now

Question # 60

Which distribution does Databricks support for installing custom Python code packages?

Options:

sbt

CRAN

CRAM

nom

Wheels

jars

Buy Now

Question # 61

A data engineering team is configuring access controls in Databricks Unity Catalog . They grant the SELECT privilege on the sales catalog to the analyst_group, expecting that members of this group will automatically have SELECT access to all current and future schemas, tables, and views within the catalog.

What describes the privilege inheritance behavior in Unity Catalog?

Options:

Granting SELECT at the catalog level applies to existing schemas and tables but not to those created in the future.

Privileges in Unity Catalog do not cascade; SELECT must be explicitly granted on each schema and table, even if granted at the catalog level.

Privileges granted at the schema level override any catalog-level privileges and prevent access unless explicitly revoked.

Granting SELECT on a catalog automatically applies SELECT to all current and future schemas, tables, and views within that catalog.

Buy Now

Question # 62

Which statement describes Delta Lake optimized writes?

Options:

A shuffle occurs prior to writing to try to group data together resulting in fewer files instead of each executor writing multiple files based on directory partitions.

Optimized writes logical partitions instead of directory partitions partition boundaries are only represented in metadata fewer small files are written.

An asynchronous job runs after the write completes to detect if files could be further compacted; yes, an OPTIMIZE job is executed toward a default of 1 GB.

Before a job cluster terminates, OPTIMIZE is executed on all tables modified during the most recent job.

Buy Now

Question # 63

A data engineering team is migrating off its legacy Hadoop platform. As part of the process, they are evaluating storage formats for performance comparison. The legacy platform uses ORC and RCFile formats. After converting a subset of data to Delta Lake , they noticed significantly better query performance. Upon investigation, they discovered that queries reading from Delta tables leveraged a Shuffle Hash Join , whereas queries on legacy formats used Sort Merge Joins . The queries reading Delta Lake data also scanned less data.

Which reason could be attributed to the difference in query performance?

Options:

Delta Lake enables data skipping and file pruning using a vectorized Parquet reader.

The queries against the Delta Lake tables were able to leverage the dynamic file pruning optimization.

Shuffle Hash Joins are always more efficient than Sort Merge Joins.

The queries against the ORC tables leveraged the dynamic data skipping optimization but not the dynamic file pruning optimization.

Buy Now

Exam Code: Databricks-Certified-Professional-Data-Engineer

Exam Name: Databricks Certified Data Engineer Professional Exam

Last Update: Jun 16, 2026

Questions: 202

Databricks-Certified-Professional-Data-Engineer PDF

$25.5 ~~$84.99~~

Add to Cart

Databricks-Certified-Professional-Data-Engineer Engine

Databricks-Certified-Professional-Data-Engineer Testing Engine

$28.5 ~~$94.99~~

Add to Cart

Databricks-Certified-Professional-Data-Engineer PDF + Engine

Databricks-Certified-Professional-Data-Engineer PDF + Testing Engine

$40.5 ~~$134.99~~

Add to Cart

Summer Sale 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: Board70

certsboard certification exams

Navigation:

Databricks-Certified-Professional-Data-Engineer Exam Dumps - Databricks Certification Questions and Answers

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Databricks-Certified-Professional-Data-Engineer PDF

Databricks-Certified-Professional-Data-Engineer Testing Engine

Databricks-Certified-Professional-Data-Engineer PDF + Testing Engine

Quick Links

Recently New Released Certification Exams

Site Secure