
Professional-Data-Engineer Exam Dumps - Google Cloud Certified Questions and Answers

Question # 34

You want to migrate an Apache Spark 3 batch job from on-premises to Google Cloud. You need to minimally change the job so that the job reads from Cloud Storage and writes the result to BigQuery. Your job is optimized for Spark, where each executor has 8 vCPU and 16 GB memory, and you want to be able to choose similar settings. You want to minimize installation and management effort to run your job. What should you do?

Options:

A.

Execute the job in a new Dataproc cluster.

B.

Execute as a Dataproc Serverless job.

C.

Execute the job as part of a deployment in a new Google Kubernetes Engine cluster.

D.

Execute the job from a new Compute Engine VM.

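If the Dataproc Serverless route (option B) is taken, the executor shape carries over through standard Spark runtime properties, with no cluster to install or manage. Below is a minimal sketch using the google-cloud-dataproc Python client; the project, region, bucket, and class names are placeholders:

from google.cloud import dataproc_v1

# Dataproc Serverless batches are submitted through a regional endpoint.
client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
)

batch = dataproc_v1.Batch(
    spark_batch=dataproc_v1.SparkBatch(
        main_class="com.example.InventoryJob",           # placeholder
        jar_file_uris=["gs://my-bucket/spark-job.jar"],  # placeholder
    ),
    runtime_config=dataproc_v1.RuntimeConfig(
        properties={
            # Mirror the on-premises executor shape: 8 vCPU, 16 GB memory.
            "spark.executor.cores": "8",
            "spark.executor.memory": "16g",
        }
    ),
)

operation = client.create_batch(
    parent="projects/my-project/locations/us-central1",
    batch=batch,
    batch_id="spark-migration-batch",
)
operation.result()  # block until the batch finishes
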
Question # 35

Your company operates in three domains: airlines, hotels, and ride-hailing services. Each domain has two teams: analytics and data science, which create data assets in BigQuery with the help of a central data platform team. However, as each domain is evolving rapidly, the central data platform team is becoming a bottleneck. This is causing delays in deriving insights from data, and resulting in stale data when pipelines are not kept up to date. You need to design a data mesh architecture by using Dataplex to eliminate the bottleneck. What should you do?

Options:

A.

1. Create one lake for each team. Inside each lake, create one zone for each domain.

2. Attach each of the BigQuery datasets created by the individual teams as assets to the respective zone.

3. Have the central data platform team manage all zones' data assets.

B.

1. Create one lake for each team. Inside each lake, create one zone for each domain.

2. Attach each of the BigQuery datasets created by the individual teams as assets to the respective zone.

3. Direct each domain to manage their own zone's data assets.

C.

1. Create one lake for each domain. Inside each lake, create one zone for each team.

2. Attach each of the BigQuery datasets created by the individual teams as assets to the respective zone.

3. Direct each domain to manage their own lake's data assets.

D.

1. Create one lake for each domain. Inside each lake, create one zone for each team.

2. Attach each of the BigQuery datasets created by the individual teams as assets to the respective zone.

3. Have the central data platform team manage all lakes' data assets.

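Under option C's layout, each domain owns a lake and each team a zone within it, with the teams' BigQuery datasets attached as assets. A rough sketch with the google-cloud-dataplex Python client, using invented project, lake, zone, and dataset names:

from google.cloud import dataplex_v1

client = dataplex_v1.DataplexServiceClient()
parent = "projects/my-project/locations/us-central1"

# One lake per domain (airlines, hotels, ride-hailing).
lake = client.create_lake(
    parent=parent,
    lake_id="airlines",
    lake=dataplex_v1.Lake(display_name="Airlines domain"),
).result()

# One zone per team inside the domain's lake.
zone = client.create_zone(
    parent=lake.name,
    zone_id="analytics",
    zone=dataplex_v1.Zone(
        type_=dataplex_v1.Zone.Type.CURATED,
        resource_spec=dataplex_v1.Zone.ResourceSpec(
            location_type=dataplex_v1.Zone.ResourceSpec.LocationType.SINGLE_REGION
        ),
    ),
).result()

# Attach the team's BigQuery dataset as an asset of that zone.
client.create_asset(
    parent=zone.name,
    asset_id="analytics-datasets",
    asset=dataplex_v1.Asset(
        resource_spec=dataplex_v1.Asset.ResourceSpec(
            type_=dataplex_v1.Asset.ResourceSpec.Type.BIGQUERY_DATASET,
            name="projects/my-project/datasets/airlines_analytics",
        )
    ),
).result()
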
Question # 36

You are building a new application from which you need to collect data in a scalable way. Data arrives continuously from the application throughout the day, and you expect to generate approximately 150 GB of JSON data per day by the end of the year. Your requirements are:

    Decoupling producer from consumer

    Space and cost-efficient storage of the raw ingested data, which is to be stored indefinitely

    Near real-time SQL query

    Maintain at least 2 years of historical data, which will be queried with SQL

Which pipeline should you use to meet these requirements?

Options:

A.

Create an application that provides an API. Write a tool to poll the API and write data to Cloud Storage as gzipped JSON files.

B.

Create an application that writes to a Cloud SQL database to store the data. Set up periodic exports of the database to write to Cloud Storage and load into BigQuery.

C.

Create an application that publishes events to Cloud Pub/Sub, and create Spark jobs on Cloud Dataproc to convert the JSON data to Avro format, stored on HDFS on Persistent Disk.

D.

Create an application that publishes events to Cloud Pub/Sub, and create a Cloud Dataflow pipeline that transforms the JSON event payloads to Avro, writing the data to Cloud Storage and BigQuery.

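Option D's shape, Pub/Sub feeding a Dataflow pipeline that lands data in BigQuery (with raw Avro in Cloud Storage), can be outlined with the Apache Beam Python SDK. This is a simplified sketch with invented topic and table names; the destination table is assumed to already exist, and a production job would add a second branch writing windowed Avro files to Cloud Storage:

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/app-events")
        | "ParseJson" >> beam.Map(json.loads)
        # Near real-time SQL: stream parsed rows into BigQuery.
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:analytics.events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )
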
Question # 37

Flowlogistic wants to use Google BigQuery as their primary analysis system, but they still have Apache Hadoop and Spark workloads that they cannot move to BigQuery. Flowlogistic does not know how to store the data that is common to both workloads. What should they do?

Options:

A.

Store the common data in BigQuery as partitioned tables.

B.

Store the common data in BigQuery and expose authorized views.

C.

Store the common data encoded as Avro in Google Cloud Storage.

D.

Store the common data in the HDFS storage of a Google Cloud Dataproc cluster.

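The point of option C is that Avro files in Cloud Storage can serve both engines: BigQuery queries them in place through an external table, while Spark on Dataproc reads the same files directly. A hedged sketch with the google-cloud-bigquery client, using placeholder bucket and dataset names:

from google.cloud import bigquery

client = bigquery.Client()

# Expose the shared Avro files to BigQuery as an external table.
config = bigquery.ExternalConfig("AVRO")
config.source_uris = ["gs://shared-bucket/common/*.avro"]

table = bigquery.Table("my-project.analytics.common_data")
table.external_data_configuration = config
client.create_table(table)

# Spark reads the very same files, e.g.:
#   spark.read.format("avro").load("gs://shared-bucket/common/")
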
Question # 38

Flowlogistic’s management has determined that the current Apache Kafka servers cannot handle the data volume for their real-time inventory tracking system. You need to build a new system on Google Cloud Platform (GCP) that will feed the proprietary tracking software. The system must be able to ingest data from a variety of global sources, process and query in real-time, and store the data reliably. Which combination of GCP products should you choose?

Options:

A.

Cloud Pub/Sub, Cloud Dataflow, and Cloud Storage

B.

Cloud Pub/Sub, Cloud Dataflow, and Local SSD

C.

Cloud Pub/Sub, Cloud SQL, and Cloud Storage

D.

Cloud Load Balancing, Cloud Dataflow, and Cloud Storage

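In option A, Pub/Sub takes over Kafka's ingestion role, Dataflow handles the real-time processing, and Cloud Storage provides the reliable store. As a small sketch, the Kafka-equivalent topic and a subscription for the Dataflow job can be provisioned with the google-cloud-pubsub client (project and names are invented):

from google.cloud import pubsub_v1

project = "my-project"

# Topic replacing the Kafka ingestion layer.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project, "inventory-tracking")
publisher.create_topic(request={"name": topic_path})

# Subscription the Dataflow pipeline will read from.
subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path(project, "inventory-dataflow")
subscriber.create_subscription(request={"name": sub_path, "topic": topic_path})
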
Question # 39

You have Cloud Functions written in Node.js that pull messages from Cloud Pub/Sub and send the data to BigQuery. You observe that the message processing rate on the Pub/Sub topic is orders of magnitude higher than anticipated, but there is no error logged in Stackdriver Log Viewer. What are the two most likely causes of this problem? Choose 2 answers.

Options:

A.

Publisher throughput quota is too small.

B.

Total outstanding messages exceed the 10-MB maximum.

C.

Error handling in the subscriber code is not handling run-time errors properly.

D.

The subscriber code cannot keep up with the messages.

E.

The subscriber code does not acknowledge the messages that it pulls.

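Both subscriber-side explanations (options C and E) come down to redelivery: Pub/Sub re-sends any message that is not acknowledged before its deadline, which inflates the apparent processing rate without producing error logs. An illustrative Python subscriber showing where the acknowledgment belongs (project and subscription names are placeholders):

import json

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path("my-project", "events-sub")

def callback(message):
    try:
        row = json.loads(message.data)
        # ... write `row` to BigQuery here ...
        message.ack()   # never acking (option E) causes endless redelivery
    except Exception:
        message.nack()  # swallowed run-time errors (option C) do the same

future = subscriber.subscribe(sub_path, callback=callback)
future.result()  # block and process messages
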
Question # 40

Which TensorFlow function can you use to configure a categorical column if you don't know all of the possible values for that column?

Options:

A.

categorical_column_with_vocabulary_list

B.

categorical_column_with_hash_bucket

C.

categorical_column_with_unknown_values

D.

sparse_column_with_keys

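The contrast between options A and B: a vocabulary-list column needs every possible value up front, while a hash-bucket column hashes arbitrary strings into a fixed number of buckets, so unseen values still map somewhere. A short illustration with the legacy tf.feature_column API (feature names are invented):

import tensorflow as tf

# Known, closed vocabulary: enumerate every value explicitly.
color = tf.feature_column.categorical_column_with_vocabulary_list(
    "color", vocabulary_list=["red", "green", "blue"])

# Open-ended vocabulary: hash each value into one of 1000 buckets.
device = tf.feature_column.categorical_column_with_hash_bucket(
    "device_model", hash_bucket_size=1000)
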
Question # 41

Which role must be assigned to a service account used by the virtual machines in a Dataproc cluster so they can execute jobs?

Options:

A.

Dataproc Worker

B.

Dataproc Viewer

C.

Dataproc Runner

D.

Dataproc Editor

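Granting roles/dataproc.worker to the cluster's service account is an ordinary IAM policy update. A rough sketch with the google-cloud-resource-manager client; the project and service-account names are placeholders:

from google.cloud import resourcemanager_v3
from google.iam.v1.policy_pb2 import Binding

client = resourcemanager_v3.ProjectsClient()
resource = "projects/my-project"

# Read-modify-write the project's IAM policy.
policy = client.get_iam_policy(request={"resource": resource})
policy.bindings.append(
    Binding(
        role="roles/dataproc.worker",
        members=["serviceAccount:dp-vms@my-project.iam.gserviceaccount.com"],
    )
)
client.set_iam_policy(request={"resource": resource, "policy": policy})
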
Question # 42

Flowlogistic’s CEO wants to gain rapid insight into their customer base so his sales team can be better informed in the field. This team is not very technical, so they’ve purchased a visualization tool to simplify the creation of BigQuery reports. However, they’ve been overwhelmed by all the data in the table, and are spending a lot of money on queries trying to find the data they need. You want to solve their problem in the most cost-effective way. What should you do?

Options:

A.

Export the data into a Google Sheet for visualization.

B.

Create an additional table with only the necessary columns.

C.

Create a view on the table to present to the visualization tool.

D.

Create identity and access management (IAM) roles on the appropriate columns, so only they appear in a query.

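A view (option C) exposes only the columns the sales team needs without duplicating storage, which keeps the visualization tool's queries small and cheap. A minimal sketch with the google-cloud-bigquery client; the project, dataset, and column names are invented:

from google.cloud import bigquery

client = bigquery.Client()

view = bigquery.Table("my-project.sales.customer_summary")
view.view_query = """
    SELECT customer_name, region, total_orders
    FROM `my-project.sales.customers`
"""
client.create_table(view)
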
Question # 43

Flowlogistic is rolling out their real-time inventory tracking system. The tracking devices will all send package-tracking messages, which will now go to a single Google Cloud Pub/Sub topic instead of the Apache Kafka cluster. A subscriber application will then process the messages for real-time reporting and store them in Google BigQuery for historical analysis. You want to ensure the package data can be analyzed over time.

Which approach should you take?

Options:

A.

Attach a timestamp to each message in the Cloud Pub/Sub subscriber application as it is received.

B.

Attach the timestamp and package ID to the outbound message from each publisher device as it is sent to Cloud Pub/Sub.

C.

Use the NOW() function in BigQuery to record the event’s time.

D.

Use the automatically generated timestamp from Cloud Pub/Sub to order the data.

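Attaching the timestamp and package ID at the publisher (option B) records the true event time at the source, so downstream analysis does not depend on delivery order. Pub/Sub attributes are string key-value pairs supplied at publish time; a short sketch with invented topic and field names:

import json
import time

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "package-tracking")

payload = json.dumps({"lat": 52.52, "lng": 13.40}).encode("utf-8")

# Attributes must be strings; stamp event time and package ID at the device.
future = publisher.publish(
    topic_path,
    data=payload,
    event_timestamp=str(int(time.time() * 1000)),
    package_id="PKG-00042",
)
future.result()  # resolves to the server-assigned message ID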