
Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Exam Dumps - Databricks Certification Questions and Answers

Question # 4

What is the risk associated with converting a large Pandas API on Spark DataFrame back to a pandas DataFrame?

Options:

A.

The conversion will automatically distribute the data across worker nodes

B.

The operation will fail if the Pandas DataFrame exceeds 1000 rows

C.

Data will be lost during conversion

D.

The operation will load all data into the driver's memory, potentially causing memory overflow
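The correct answer is D. A minimal sketch of the conversion in question, assuming a pandas-on-Spark DataFrame named psdf (the name is illustrative):

import pyspark.pandas as ps

# A Pandas API on Spark DataFrame is distributed across the cluster
psdf = ps.range(1_000_000)

# to_pandas() collects every partition onto the driver; on a large
# DataFrame this can exhaust the driver's memory (option D)
pdf = psdf.to_pandas()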

Question # 5


What is the benefit of Adaptive Query Execution (AQE)?

Options:

A.

It allows Spark to optimize the query plan before execution but does not adapt during runtime.

B.

It automatically distributes tasks across nodes in the cluster and does not perform runtime adjustments to the query plan.

C.

It optimizes query execution by parallelizing tasks and does not adjust strategies based on runtime metrics like data skew.

D.

It enables the adjustment of the query plan during runtime, handling skewed data, optimizing join strategies, and improving overall query performance.
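The correct answer is D: AQE re-optimizes the physical plan at runtime using statistics gathered from completed shuffle stages. The relevant settings (enabled by default since Spark 3.2) can be set or verified explicitly:

# Enable AQE and its runtime optimizations
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")            # split skewed partitions at runtime
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")  # merge small shuffle partitions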

Question # 6

A DataFrame df has columns name, age, and salary. The developer needs to sort the DataFrame by age in ascending order and salary in descending order.

Which code snippet meets the requirement of the developer?

Options:

A.

df.orderBy(col("age").asc(), col("salary").asc()).show()

B.

df.sort("age", "salary", ascending=[True, True]).show()

C.

df.sort("age", "salary", ascending=[False, True]).show()

D.

df.orderBy("age", "salary", ascending=[True, False]).show()

Question # 7

You have:

DataFrame A: 128 GB of transactions

DataFrame B: 1 GB user lookup table

Which broadcast strategy is correct?

Options:

A.

DataFrame B should be broadcasted because it is smaller and will eliminate the need for shuffling itself

B.

DataFrame B should be broadcasted because it is smaller and will eliminate the need for shuffling DataFrame A

C.

DataFrame A should be broadcasted because it is larger and will eliminate the need for shuffling DataFrame B

D.

DataFrame A should be broadcasted because it is smaller and will eliminate the need for shuffling itself
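The correct answer is B: broadcasting the small lookup table ships a full copy of it to every executor, so the large transactions DataFrame is joined in place and never shuffled. A minimal sketch, with dfA, dfB, and the join key user_id as illustrative names:

from pyspark.sql.functions import broadcast

# Copy the 1 GB lookup table to each executor; the 128 GB side stays put
result = dfA.join(broadcast(dfB), "user_id")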

Question # 8


A data engineer has written the following code to join two DataFrames df1 and df2:

df1 = spark.read.csv("sales_data.csv", header=True)    # header=True keeps the column names;
df2 = spark.read.csv("product_data.csv", header=True)  # a headerless read yields _c0, _c1, ...
df_joined = df1.join(df2, df1.product_id == df2.product_id)

The DataFrame df1 contains ~10 GB of sales data, and df2 contains ~8 MB of product data.

Which join strategy will Spark use?

Options:

A.

Shuffle join, as the size difference between df1 and df2 is too large for a broadcast join to work efficiently.

B.

Shuffle join, because AQE is not enabled, and Spark uses a static query plan.

C.

Shuffle join because no broadcast hints were provided.

D.

Broadcast join, as df2 is smaller than the default broadcast threshold.
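The correct answer is D: at ~8 MB, df2 falls below the default autoBroadcastJoinThreshold of 10 MB, so Spark plans a broadcast hash join without any hint and regardless of AQE. This can be confirmed directly:

# Default is 10 MB (10485760 bytes)
print(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))

# The physical plan should show BroadcastHashJoin
df_joined.explain()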

Question # 9

A data scientist at a financial services company is working with a Spark DataFrame containing transaction records. The DataFrame has millions of rows and includes columns for transaction_id, account_number, transaction_amount, and timestamp. Due to an issue with the source system, some transactions were accidentally recorded multiple times with identical information across all fields. The data scientist needs to remove rows with duplicates across all fields to ensure accurate financial reporting.

Which approach should the data scientist use to deduplicate the transactions using PySpark?

Options:

A.

df = df.dropDuplicates()

B.

df = df.groupBy("transaction_id").agg(F.first("account_number"), F.first("transaction_amount"), F.first("timestamp"))

C.

df = df.filter(F.col("transaction_id").isNotNull())

D.

df = df.dropDuplicates(["transaction_amount"])
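The correct answer is A. Called with no column list, dropDuplicates() compares entire rows, which removes exactly the accidental full-row copies while keeping legitimate transactions that merely share an amount or timestamp:

# Deduplicate across all columns
df = df.dropDuplicates()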

Question # 10

Which command overwrites an existing JSON file when writing a DataFrame?

Options:

A.

df.write.mode("overwrite").json("path/to/file")

B.

df.write.overwrite.json("path/to/file")

C.

df.write.json("path/to/file", overwrite=True)

D.

df.write.format("json").save("path/to/file", mode="overwrite")
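Option A is the canonical form. (Option D is also syntactically valid PySpark, since DataFrameWriter.save() accepts a mode argument, but A is the idiomatic spelling.) A minimal sketch, assuming df exists:

# "overwrite" replaces any existing data at the target path
df.write.mode("overwrite").json("path/to/file")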

Question # 11


A data scientist is working with a Spark DataFrame called customerDF that contains customer information.

The DataFrame has a column named email with customer email addresses.

The data scientist needs to split this column into username and domain parts.

Which code snippet splits the email column into username and domain columns?

Options:

A.

customerDF = customerDF \
    .withColumn("username", split(col("email"), "@").getItem(0)) \
    .withColumn("domain", split(col("email"), "@").getItem(1))

B.

customerDF = customerDF.withColumn("username", regexp_replace(col("email"), "@", ""))

C.

customerDF = customerDF.select("email").alias("username", "domain")

D.

customerDF = customerDF.withColumn("domain", col("email").split("@")[1])

Question # 12


A data engineer is building a Structured Streaming pipeline and wants it to recover from failures or intentional shutdowns by continuing where it left off.

How can this be achieved?

Options:

A.

By configuring the option recoveryLocation during SparkSession initialization.

B.

By configuring the option checkpointLocation during readStream.

C.

By configuring the option checkpointLocation during writeStream.

D.

By configuring the option recoveryLocation during writeStream.
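The correct answer is C: the checkpoint location is set on the writeStream side, where Spark persists offsets and state so a restarted query resumes where it left off (recoveryLocation is not a Spark option). A minimal sketch, with illustrative paths and a streaming DataFrame df assumed:

query = (df.writeStream
    .format("parquet")
    .option("checkpointLocation", "/tmp/checkpoints/sales")  # offsets and state live here
    .option("path", "/tmp/output/sales")
    .start())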

Question # 13


A data engineer is reviewing a Spark application that applies several transformations to a DataFrame but notices that the job does not start executing immediately.

Which two characteristics of Apache Spark's execution model explain this behavior? (Choose 2 answers)

Options:

A.

Transformations are executed immediately to build the lineage graph.

B.

The Spark engine optimizes the execution plan during the transformations, causing delays.

C.

Transformations are evaluated lazily.

D.

The Spark engine requires manual intervention to start executing transformations.

E.

Only actions trigger the execution of the transformation pipeline.
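The correct answers are C and E: transformations are evaluated lazily and only build up the lineage, and nothing runs until an action is invoked. A minimal sketch, assuming df has name and age columns (illustrative):

# Transformations: only the logical plan / lineage is built, no job starts
adults = df.filter(df.age > 21).select("name")

# Action: triggers actual execution of the whole pipeline
adults.count()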

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.5 – Python
Last Update: Oct 19, 2025
Questions: 136