
Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Exam Dumps - Databricks Certification Questions and Answers

Question # 34


A data engineer is working on the DataFrame df1 and wants the Name with the highest count to appear first (descending order by count), followed by the next highest, and so on.

The DataFrame has the columns id, Name, count, and timestamp. Sample rows (timestamp values not shown):

id | Name    | count
---|---------|------
 1 | USA     | 10
 2 | India   | 20
 3 | England | 50
 4 | India   | 50
 5 | France  | 20
 6 | India   | 10
 7 | USA     | 30
 8 | USA     | 40

Which code fragment should the engineer use to sort the data in the Name and count columns?

Options:

A.

df1.orderBy(col("count").desc(), col("Name").asc())

B.

df1.sort("Name", "count")

C.

df1.orderBy("Name", "count")

D.

df1.orderBy(col("Name").desc(), col("count").asc())

Question # 35


A data engineer needs to join multiple DataFrames and has written the following code:

from pyspark.sql.functions import broadcast

data1 = [(1, "A"), (2, "B")]
data2 = [(1, "X"), (2, "Y")]
data3 = [(1, "M"), (2, "N")]

df1 = spark.createDataFrame(data1, ["id", "val1"])
df2 = spark.createDataFrame(data2, ["id", "val2"])
df3 = spark.createDataFrame(data3, ["id", "val3"])

df_joined = df1.join(broadcast(df2), "id", "inner") \
    .join(broadcast(df3), "id", "inner")

What will be the output of this code?

Options:

A.

The code will work correctly and perform two broadcast joins simultaneously to join df1 with df2, and then the result with df3.

B.

The code will fail because only one broadcast join can be performed at a time.

C.

The code will fail because the second join condition (df2.id == df3.id) is incorrect.

D.

The code will result in an error because broadcast() must be called before the joins, not inline.
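
For context, broadcast() is only a join hint, and Spark accepts it inline on any number of chained joins; a self-contained sketch (assuming a local SparkSession can be created):

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()

df1 = spark.createDataFrame([(1, "A"), (2, "B")], ["id", "val1"])
df2 = spark.createDataFrame([(1, "X"), (2, "Y")], ["id", "val2"])
df3 = spark.createDataFrame([(1, "M"), (2, "N")], ["id", "val3"])

# Each hint marks the right-hand side for replication to every executor,
# so both joins can run as broadcast hash joins without a shuffle.
df_joined = (df1.join(broadcast(df2), "id", "inner")
                .join(broadcast(df3), "id", "inner"))
df_joined.explain()  # physical plan should show BroadcastHashJoin twice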

Question # 36

The following code fragment results in an error. Which code fragment should be used instead?

Options:

[The failing code fragment and answer options A) through D) were provided as images and are not available in this text version.]

Question # 37


Which code should be used to display the schema of the Parquet file stored in the location events.parquet?

Options:

A.

spark.sql("SELECT * FROM events.parquet").show()

B.

spark.read.format("parquet").load("events.parquet").show()

C.

spark.read.parquet("events.parquet").printSchema()

D.

spark.sql("SELECT schema FROM events.parquet").show()

Question # 38

A data scientist at an e-commerce company is working with user data obtained from the company's subscriber database and has stored it in a DataFrame df_user. Before processing the data further, the data scientist wants to create another DataFrame, df_user_non_pii, containing only the non-PII columns. The PII columns in df_user are first_name, last_name, email, and birthdate.

Which code snippet can be used to meet this requirement?

Options:

A.

df_user_non_pii = df_user.drop("first_name", "last_name", "email", "birthdate")

B.

df_user_non_pii = df_user.drop("first_name", "last_name", "email", "birthdate")

C.

df_user_non_pii = df_user.dropfields("first_name", "last_name", "email", "birthdate")

D.

df_user_non_pii = df_user.dropfields("first_name, last_name, email, birthdate")

Question # 39

An MLOps engineer is building a Pandas UDF that applies a language model to translate English strings into Spanish. The initial code loads the model on every call to the UDF, which hurts the performance of the data pipeline.

The initial code is:

import pandas as pd
import pyspark.sql.functions as sf
from pyspark.sql.types import StringType

def in_spanish_inner(df: pd.Series) -> pd.Series:
    # get_translation_model is assumed to be defined elsewhere;
    # the model is reloaded for every batch the UDF processes.
    model = get_translation_model(target_lang='es')
    return df.apply(model)

in_spanish = sf.pandas_udf(in_spanish_inner, StringType())

How can the MLOps engineer change this code to reduce how many times the language model is loaded?

Options:

A.

Convert the Pandas UDF to a PySpark UDF

B.

Convert the Pandas UDF from a Series → Series UDF to a Series → Scalar UDF

C.

Run the in_spanish_inner() function in a mapInPandas() function call

D.

Convert the Pandas UDF from a Series → Series UDF to an Iterator[Series] → Iterator[Series] UDF
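
For context, the Iterator[pd.Series] → Iterator[pd.Series] variant of a Pandas UDF runs its setup code once per executor process instead of once per batch; a sketch of that pattern (get_translation_model is the function from the question and is assumed to be defined):

from typing import Iterator
import pandas as pd
import pyspark.sql.functions as sf
from pyspark.sql.types import StringType

@sf.pandas_udf(StringType())
def in_spanish(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Load the model once, then reuse it across every batch of rows.
    model = get_translation_model(target_lang='es')
    for batch in batches:
        yield batch.apply(model)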

Question # 40

A Spark developer wants to improve the performance of an existing PySpark UDF that runs a hash function that is not available in the standard Spark functions library. The existing UDF code is:

import hashlib
import pyspark.sql.functions as sf
from pyspark.sql.types import StringType

def shake_256(raw):
    return hashlib.shake_256(raw.encode()).hexdigest(20)

shake_256_udf = sf.udf(shake_256, StringType())

The developer wants to replace this existing UDF with a Pandas UDF to improve performance. The developer changes the definition of shake_256_udf to this:

shake_256_udf = sf.pandas_udf(shake_256, StringType())

However, the developer receives the error:

[error message provided as an image; not reproduced]

What should the signature of the shake_256() function be changed to in order to fix this error?

Options:

A.

def shake_256(df: pd.Series) -> str:

B.

def shake_256(df: Iterator[pd.Series]) -> Iterator[pd.Series]:

C.

def shake_256(raw: str) -> str:

D.

def shake_256(df: pd.Series) -> pd.Series:
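
For context, a scalar Pandas UDF is declared with pandas type hints and receives whole pd.Series batches, so the hash must be applied element-wise; a runnable sketch of the Series-to-Series form (hash logic taken from the question):

import hashlib
import pandas as pd
import pyspark.sql.functions as sf
from pyspark.sql.types import StringType

def shake_256(raw: pd.Series) -> pd.Series:
    # Hash each string in the batch; returns a Series of hex digests.
    return raw.apply(lambda s: hashlib.shake_256(s.encode()).hexdigest(20))

shake_256_udf = sf.pandas_udf(shake_256, StringType())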

Question # 41

A data engineer is running a batch processing job on a Spark cluster with the following configuration:

- 10 worker nodes
- 16 CPU cores per worker node
- 64 GB RAM per node

The data engineer wants to allocate four executors per node, each executor using four cores.

What is the total number of CPU cores used by the application?

Options:

A.

160

B.

64

C.

80

D.

40
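
For reference, the total is the product of nodes, executors per node, and cores per executor; a one-line check:

# 10 worker nodes x 4 executors per node x 4 cores per executor
total_cores = 10 * 4 * 4
print(total_cores)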

Question # 42

An engineer has two DataFrames: df1 (small) and df2 (large). A broadcast join is used:

from pyspark.sql.functions import broadcast

result = df2.join(broadcast(df1), on='id', how='inner')

What is the purpose of using broadcast() in this scenario?

Options:

A.

It filters the id values before performing the join.

B.

It increases the partition size for df1 and df2.

C.

It reduces the number of shuffle operations by replicating the smaller DataFrame to all nodes.

D.

It ensures that the join happens only when the id values are identical.
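
Related background: below a size threshold Spark broadcasts small tables automatically, while the broadcast() hint forces that plan regardless of size estimates; a short sketch of inspecting and changing the threshold (spark is an active SparkSession, and the value set here is illustrative):

# Default threshold is 10 MB; tables below it are auto-broadcast.
print(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))

# -1 disables automatic broadcasting; explicit broadcast() hints still apply.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")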

Question # 43

Which Spark configuration controls the number of tasks that can run in parallel on the executor?

Options:

A.

spark.executor.cores

B.

spark.task.maxFailures

C.

spark.driver.cores

D.

spark.executor.memory
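
For context, an executor can run up to spark.executor.cores / spark.task.cpus tasks at once (spark.task.cpus defaults to 1); a sketch of setting it when building a session (values are illustrative, and in practice these are usually passed to spark-submit instead):

from pyspark.sql import SparkSession

# With 4 cores per executor and the default 1 CPU per task,
# each executor runs up to 4 tasks in parallel.
spark = (SparkSession.builder
         .config("spark.executor.cores", "4")
         .getOrCreate())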

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.5 – Python
Last Update: Oct 19, 2025
Questions: 136