Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Exam Dumps - Databricks Certification Questions and Answers

Question # 14

A data engineer wants to write a Spark job that creates a new managed table. If the table already exists, the job should fail and not modify anything.

Which save mode and method should be used?

Options:

saveAsTable with mode ErrorIfExists

saveAsTable with mode Overwrite

save with mode Ignore

save with mode ErrorIfExists


Question # 15

A Spark developer is building an app to monitor task performance. They need to track the maximum task processing time per worker node and consolidate it on the driver for analysis.

Which technique should be used?

Options:

Use an RDD action like reduce() to compute the maximum time

Use an accumulator to record the maximum time on the driver

Broadcast a variable to share the maximum time among workers

Configure the Spark UI to automatically collect maximum times


Question # 16

A data engineer is running a Spark job to process a dataset of 1 TB stored in distributed storage. The cluster has 10 nodes, each with 16 CPUs. Spark UI shows:

Low number of Active Tasks

Many tasks complete in milliseconds

Fewer tasks than available CPUs

Which approach should be used to adjust the partitioning for optimal resource allocation?

Options:

Set the number of partitions equal to the total number of CPUs in the cluster

Set the number of partitions to a fixed value, such as 200

Set the number of partitions equal to the number of nodes in the cluster

Set the number of partitions by dividing the dataset size (1 TB) by a reasonable partition size, such as 128 MB
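
The arithmetic behind the size-based option can be worked through in a few lines. This is only a sketch: it assumes binary units (1 TB as 1024**4 bytes, 128 MB as 128 * 1024**2 bytes), and the helper name partition_count is hypothetical.

```python
# Assumption: binary units for both the dataset size and the target partition size.
DATASET_BYTES = 1024 ** 4               # 1 TB
TARGET_PARTITION_BYTES = 128 * 1024 ** 2  # 128 MB
NODES = 10
CPUS_PER_NODE = 16


def partition_count(dataset_bytes, target_partition_bytes):
    """Ceiling division: enough partitions that none exceeds the target size."""
    return -(-dataset_bytes // target_partition_bytes)


num_partitions = partition_count(DATASET_BYTES, TARGET_PARTITION_BYTES)
total_cores = NODES * CPUS_PER_NODE
tasks_per_core = num_partitions / total_cores

print(num_partitions)  # 8192
print(total_cores)     # 160
print(tasks_per_core)  # 51.2
```

With these assumptions the job would run 8192 tasks across 160 cores, roughly 51 tasks per core, so every core has work and each task processes a meaningful amount of data rather than finishing in milliseconds.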


Question # 17

An engineer notices a significant increase in the execution time of a Spark job. After some investigation, the engineer decides to check the logs produced by the executors.

How should the engineer retrieve the Executor logs to diagnose performance issues in the Spark application?

Options:

Locate the executor logs on the Spark master node, typically under the /tmp directory.

Use the command spark-submit with the --verbose flag to print the logs to the console.

Use the Spark UI to select the stage and view the executor logs directly from the stages tab.

Fetch the logs by running a Spark job with the spark-sql CLI tool.


Question # 18

A developer notices that all the post-shuffle partitions in a dataset are smaller than the value set for spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold.

Which type of join will Adaptive Query Execution (AQE) choose in this case?

Options:

A Cartesian join

A shuffled hash join

A broadcast nested loop join

A sort-merge join


Question # 19

A data scientist is working with a Spark DataFrame called customerDF that contains customer information. The DataFrame has a column named email with customer email addresses. The data scientist needs to split this column into username and domain parts.

Which code snippet splits the email column into username and domain columns?

Options:

customerDF.select(
    col("email").substr(0, 5).alias("username"),
    col("email").substr(-5).alias("domain")
)

customerDF.withColumn("username", split(col("email"), "@").getItem(0)) \
    .withColumn("domain", split(col("email"), "@").getItem(1))

customerDF.withColumn("username", substring_index(col("email"), "@", 1)) \
    .withColumn("domain", substring_index(col("email"), "@", -1))

customerDF.select(
    regexp_replace(col("email"), "@", "").alias("username"),
    regexp_replace(col("email"), "@", "").alias("domain")
)


Question # 20

A Spark application developer wants to identify which operations cause shuffling, leading to a new stage in the Spark execution plan.

Which operation results in a shuffle and a new stage?

Options:

DataFrame.groupBy().agg()

DataFrame.filter()

DataFrame.withColumn()

DataFrame.select()


Question # 21

A Spark engineer must select an appropriate deployment mode for the Spark jobs.

What is the benefit of using cluster mode in Apache Spark™?

Options:

In cluster mode, resources are allocated from a resource manager on the cluster, enabling better performance and scalability for large jobs

In cluster mode, the driver is responsible for executing all tasks locally without distributing them across the worker nodes.

In cluster mode, the driver runs on the client machine, which can limit the application's ability to handle large datasets efficiently.

In cluster mode, the driver program runs on one of the worker nodes, allowing the application to fully utilize the distributed resources of the cluster.


Question # 22


A data engineer is working on a Streaming DataFrame (streaming_df) with the following streaming data:

name         count   timestamp
Delhi                2024-09-19T10:11
Delhi                2024-09-19T10:12
London               2024-09-19T10:15
Paris                2024-09-19T10:18
Paris                2024-09-19T10:20
Washington           2024-09-19T10:22

Which operation is supported with streaming_df?

Options:

streaming_df.count()

streaming_df.filter("count < 30")

streaming_df.select(countDistinct("name"))

streaming_df.show()


Question # 23


What is the difference between df.cache() and df.persist() in Spark DataFrame?

Options:

Both functions perform the same operation. The persist() function provides improved performance as its default storage level is DISK_ONLY.

persist() — Persists the DataFrame with the default storage level (MEMORY_AND_DISK_DESER), and cache() — Can be used to set different storage levels.

Both cache() and persist() can be used to set the default storage level (MEMORY_AND_DISK_DESER).

cache() — Persists the DataFrame with the default storage level (MEMORY_AND_DISK_DESER), and persist() — Can be used to set different storage levels to persist the contents of the DataFrame.


Exam Code: Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.5 – Python

Last Update: Dec 4, 2025

Questions: 136
