
Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Exam Dumps - Databricks Certification Questions and Answers

Question # 14

A data engineer wants to write a Spark job that creates a new managed table. If the table already exists, the job should fail and not modify anything.

Which save mode and method should be used?

Options:

A. saveAsTable with mode ErrorIfExists

B. saveAsTable with mode Overwrite

C. save with mode Ignore

D. save with mode ErrorIfExists
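For reference, a minimal PySpark sketch of writing a managed table with a save mode that fails if the table already exists; the DataFrame and table name below are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("create-managed-table").getOrCreate()
df = spark.range(10)  # placeholder data for illustration

# "errorifexists" (the default save mode) raises an AnalysisException if the
# table already exists and leaves the existing table untouched.
df.write.mode("errorifexists").saveAsTable("demo_table")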

Question # 15

A Spark developer is building an app to monitor task performance. They need to track the maximum task processing time per worker node and consolidate it on the driver for analysis.

Which technique should be used?

Options:

A. Use an RDD action like reduce() to compute the maximum time

B. Use an accumulator to record the maximum time on the driver

C. Broadcast a variable to share the maximum time among workers

D. Configure the Spark UI to automatically collect maximum times
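As an illustration of the RDD-based technique named in the options, a minimal sketch that computes a maximum across executors and returns it to the driver; the timing values are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("max-task-time").getOrCreate()
sc = spark.sparkContext

# Hypothetical per-task processing times (ms) spread across worker nodes.
task_times = sc.parallelize([120, 450, 87, 910, 300], numSlices=5)

# reduce() is an action: partial maxima are computed on the executors and
# the final maximum is returned to the driver for analysis.
max_time = task_times.reduce(lambda a, b: max(a, b))
print(max_time)  # 910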

Question # 16

A data engineer is running a Spark job to process a dataset of 1 TB stored in distributed storage. The cluster has 10 nodes, each with 16 CPUs. Spark UI shows:

Low number of Active Tasks

Many tasks complete in milliseconds

Fewer tasks than available CPUs

Which approach should be used to adjust the partitioning for optimal resource allocation?

Options:

A. Set the number of partitions equal to the total number of CPUs in the cluster

B. Set the number of partitions to a fixed value, such as 200

C. Set the number of partitions equal to the number of nodes in the cluster

D. Set the number of partitions by dividing the dataset size (1 TB) by a reasonable partition size, such as 128 MB
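A back-of-the-envelope sketch of the size-based calculation described in option D; the 128 MB figure is a commonly used target, not a hard rule:

# 1 TB expressed in MB, divided by a ~128 MB target partition size.
dataset_size_mb = 1 * 1024 * 1024
target_partition_size_mb = 128
num_partitions = dataset_size_mb // target_partition_size_mb
print(num_partitions)  # 8192

# Hypothetical usage: repartition before the heavy transformations, e.g.
# df = df.repartition(num_partitions)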

Question # 17

An engineer notices a significant increase in the job execution time during the execution of a Spark job. After some investigation, the engineer decides to check the logs produced by the Executors.

How should the engineer retrieve the Executor logs to diagnose performance issues in the Spark application?

Options:

A. Locate the executor logs on the Spark master node, typically under the /tmp directory.

B. Use the command spark-submit with the --verbose flag to print the logs to the console.

C. Use the Spark UI to select the stage and view the executor logs directly from the stages tab.

D. Fetch the logs by running a Spark job with the spark-sql CLI tool.

Question # 18

A developer notices that all the post-shuffle partitions in a dataset are smaller than the value set for spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold.

Which type of join will Adaptive Query Execution (AQE) choose in this case?

Options:

A. A Cartesian join

B. A shuffled hash join

C. A broadcast nested loop join

D. A sort-merge join
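For context, a minimal sketch of the AQE configuration this question refers to; the threshold value shown is an arbitrary example, not a recommendation:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("aqe-join-selection")
    # AQE must be enabled for runtime join-strategy re-planning.
    .config("spark.sql.adaptive.enabled", "true")
    # When every post-shuffle partition is smaller than this threshold,
    # AQE can choose a shuffled hash join instead of a sort-merge join.
    .config("spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold", "64MB")
    .getOrCreate()
)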

Question # 19

A data scientist is working with a Spark DataFrame called customerDF that contains customer information. The DataFrame has a column named email with customer email addresses. The data scientist needs to split this column into username and domain parts.

Which code snippet splits the email column into username and domain columns?

Options:

A.
customerDF.select(
    col("email").substr(0, 5).alias("username"),
    col("email").substr(-5).alias("domain")
)

B.
customerDF.withColumn("username", split(col("email"), "@").getItem(0)) \
    .withColumn("domain", split(col("email"), "@").getItem(1))

C.
customerDF.withColumn("username", substring_index(col("email"), "@", 1)) \
    .withColumn("domain", substring_index(col("email"), "@", -1))

D.
customerDF.select(
    regexp_replace(col("email"), "@", "").alias("username"),
    regexp_replace(col("email"), "@", "").alias("domain")
)
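A self-contained sketch of the split-based approach shown above, using a hypothetical two-row DataFrame in place of customerDF:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, split

spark = SparkSession.builder.appName("split-email").getOrCreate()

customerDF = spark.createDataFrame(
    [("alice@example.com",), ("bob@test.org",)], ["email"]
)

result = (
    customerDF
    .withColumn("username", split(col("email"), "@").getItem(0))
    .withColumn("domain", split(col("email"), "@").getItem(1))
)
result.show()
# username holds the part before "@", domain the part after it.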

Question # 20

A Spark application developer wants to identify which operations cause shuffling, leading to a new stage in the Spark execution plan.

Which operation results in a shuffle and a new stage?

Options:

A. DataFrame.groupBy().agg()

B. DataFrame.filter()

C. DataFrame.withColumn()

D. DataFrame.select()
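A minimal sketch for checking which operations introduce an exchange (shuffle) by inspecting the physical plan; the data and column names are hypothetical:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sum as sum_

spark = SparkSession.builder.appName("shuffle-check").getOrCreate()
df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["key", "value"])

# Narrow transformation: no Exchange node in the plan, same stage.
df.filter(col("value") > 1).explain()

# groupBy().agg() repartitions the data by key, so the plan shows an Exchange
# node, which marks a shuffle boundary and a new stage.
df.groupBy("key").agg(sum_("value").alias("total")).explain()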

Question # 21

A Spark engineer must select an appropriate deployment mode for the Spark jobs.

What is the benefit of using cluster mode in Apache Spark™?

Options:

A. In cluster mode, resources are allocated from a resource manager on the cluster, enabling better performance and scalability for large jobs.

B. In cluster mode, the driver is responsible for executing all tasks locally without distributing them across the worker nodes.

C. In cluster mode, the driver runs on the client machine, which can limit the application's ability to handle large datasets efficiently.

D. In cluster mode, the driver program runs on one of the worker nodes, allowing the application to fully utilize the distributed resources of the cluster.

Question # 22


A data engineer is working on a Streaming DataFrame (streaming_df) with the following streaming data:

id  name        count  timestamp
1   Delhi       20     2024-09-19T10:11
1   Delhi       50     2024-09-19T10:12
2   London      50     2024-09-19T10:15
3   Paris       30     2024-09-19T10:18
3   Paris       20     2024-09-19T10:20
4   Washington  10     2024-09-19T10:22

Which operation is supported with streaming_df?

Options:

A. streaming_df.count()

B. streaming_df.filter("count < 30")

C. streaming_df.select(countDistinct("name"))

D. streaming_df.show()
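A minimal sketch contrasting operations on a streaming DataFrame; the rate source is a hypothetical stand-in for the streaming data above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming-ops").getOrCreate()

# The built-in rate source emits (timestamp, value) rows continuously.
streaming_df = spark.readStream.format("rate").option("rowsPerSecond", 1).load()

# Transformations such as filter() are applied lazily to each micro-batch.
filtered = streaming_df.filter("value < 30")

# Eager actions like show() or count() raise AnalysisException on a streaming
# DataFrame; results must instead be emitted through writeStream.
query = filtered.writeStream.format("console").outputMode("append").start()
query.stop()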

Question # 23


What is the difference between df.cache() and df.persist() for a Spark DataFrame?

Options:

A. Both functions perform the same operation. The persist() function provides improved performance, as its default storage level is DISK_ONLY.

B. persist() persists the DataFrame with the default storage level (MEMORY_AND_DISK_DESER), while cache() can be used to set different storage levels.

C. Both cache() and persist() can be used to set the default storage level (MEMORY_AND_DISK_DESER).

D. cache() persists the DataFrame with the default storage level (MEMORY_AND_DISK_DESER), while persist() can be used to set different storage levels for the contents of the DataFrame.
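A brief sketch showing both calls side by side; the DISK_ONLY level passed to persist() is just one possible choice:

from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-vs-persist").getOrCreate()

df1 = spark.range(1_000_000)
df2 = spark.range(1_000_000)

# cache() always uses the default storage level for DataFrames.
df1.cache()

# persist() accepts an explicit StorageLevel argument.
df2.persist(StorageLevel.DISK_ONLY)

# Both are lazy; the data is materialized on the first action.
df1.count()
df2.count()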

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.5 – Python
Last Update: Oct 19, 2025
Questions: 136