March Sale Special Limited Time 65% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: bigdisc65

Databricks Certification Databricks-Certified-Professional-Data-Engineer Release Date

Page: 3 / 8
Question 12

Which is a key benefit of an end-to-end test?

Options:

A.

It closely simulates real world usage of your application.

B.

It pinpoint errors in the building blocks of your application.

C.

It provides testing coverage for all code paths and branches.

D.

It makes it easier to automate your test suite

Question 13

Which configuration parameter directly affects the size of a spark-partition upon ingestion of data into Spark?

Options:

A.

spark.sql.files.maxPartitionBytes

B.

spark.sql.autoBroadcastJoinThreshold

C.

spark.sql.files.openCostInBytes

D.

spark.sql.adaptive.coalescePartitions.minPartitionNum

E.

spark.sql.adaptive.advisoryPartitionSizeInBytes

Question 14

A small company based in the United States has recently contracted a consulting firm in India to implement several new data engineering pipelines to power artificial intelligence applications. All the company's data is stored in regional cloud storage in the United States.

The workspace administrator at the company is uncertain about where the Databricks workspace used by the contractors should be deployed.

Assuming that all data governance considerations are accounted for, which statement accurately informs this decision?

Options:

A.

Databricks runs HDFS on cloud volume storage; as such, cloud virtual machines must be deployed in the region where the data is stored.

B.

Databricks workspaces do not rely on any regional infrastructure; as such, the decision should be made based upon what is most convenient for the workspace administrator.

C.

Cross-region reads and writes can incur significant costs and latency; whenever possible, compute should be deployed in the same region the data is stored.

D.

Databricks leverages user workstations as the driver during interactive development; as such, users should always use a workspace deployed in a region they are physically near.

E.

Databricks notebooks send all executable code from the user's browser to virtual machines over the open internet; whenever possible, choosing a workspace region near the end users is the most secure.

Question 15

A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Incremental state information should be maintained for 10 minutes for late-arriving data.

Streaming DataFrame df has the following schema:

"device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT"

Code block:

Choose the response that correctly fills in the blank within the code block to complete this task.

Options:

A.

withWatermark("event_time", "10 minutes")

B.

awaitArrival("event_time", "10 minutes")

C.

await("event_time + ‘10 minutes'")

D.

slidingWindow("event_time", "10 minutes")

E.

delayWrite("event_time", "10 minutes")

Page: 3 / 8
Exam Name: Databricks Certified Data Engineer Professional Exam
Last Update: Mar 28, 2024
Questions: 120
Databricks-Certified-Professional-Data-Engineer pdf

Databricks-Certified-Professional-Data-Engineer PDF

$28  $80
Databricks-Certified-Professional-Data-Engineer Engine

Databricks-Certified-Professional-Data-Engineer Testing Engine

$33.25  $95
Databricks-Certified-Professional-Data-Engineer PDF + Engine

Databricks-Certified-Professional-Data-Engineer PDF + Testing Engine

$45.5  $130