Professional-Data-Engineer Exam Dumps - Google Cloud Certified Questions and Answers

Question # 24

You have several different unstructured data sources, within your on-premises data center as well as in the cloud. The data is in various formats, such as Apache Parquet and CSV. You want to centralize this data in Cloud Storage. You need to set up an object sink for your data that allows you to use your own encryption keys. You want to use a GUI-based solution. What should you do?

Options:

Use Cloud Data Fusion to move files into Cloud Storage.

Use Storage Transfer Service to move files into Cloud Storage.

Use Dataflow to move files into Cloud Storage.

Use BigQuery Data Transfer Service to move files into BigQuery.

Question # 25

You are deploying a new storage system for your mobile application, which is a media streaming service. You decide the best fit is Google Cloud Datastore. You have entities with multiple properties, some of which can take on multiple values. For example, in the entity ‘Movie’ the property ‘actors’ and the property ‘tags’ have multiple values but the property ‘date released’ does not. A typical query would ask for all movies with actor= ordered by date_released or all movies with tag=Comedy ordered by date_released. How should you avoid a combinatorial explosion in the number of indexes?

Options:

Option A

Option B

Option C

Option D

Question # 26

You have a query that filters a BigQuery table using a WHERE clause on timestamp and ID columns. By using bq query --dry_run you learn that the query triggers a full scan of the table, even though the filters on timestamp and ID select a tiny fraction of the overall data. You want to reduce the amount of data scanned by BigQuery with minimal changes to existing SQL queries. What should you do?

Options:

Create a separate table for each ID.

Use the LIMIT keyword to reduce the number of rows returned.

Recreate the table with a partitioning column and clustering column.

Use the bq query --maximum_bytes_billed flag to restrict the number of bytes billed.

Question # 27

You are designing a data mesh on Google Cloud with multiple distinct data engineering teams building data products. The typical data curation design pattern consists of landing files in Cloud Storage, transforming raw data in Cloud Storage and BigQuery datasets, and storing the final curated data product in BigQuery datasets. You need to configure Dataplex to ensure that each team can access only the assets needed to build their data products. You also need to ensure that teams can easily share the curated data product. What should you do?

Options:

1. Create a single Dataplex virtual lake and create a single zone to contain landing, raw, and curated data.
2. Provide each data engineering team access to the virtual lake.

1. Create a single Dataplex virtual lake and create a single zone to contain landing, raw, and curated data.
2. Build separate assets for each data product within the zone.
3. Assign permissions to the data engineering teams at the zone level.

1. Create a Dataplex virtual lake for each data product, and create a single zone to contain landing, raw, and curated data.
2. Provide the data engineering teams with full access to the virtual lake assigned to their data product.

1. Create a Dataplex virtual lake for each data product, and create multiple zones for landing, raw, and curated data.
2. Provide the data engineering teams with full access to the virtual lake assigned to their data product.

Question # 28

Your neural network model is taking days to train. You want to increase the training speed. What can you do?

Options:

Subsample your test dataset.

Subsample your training dataset.

Increase the number of input features to your model.

Increase the number of layers in your neural network.

Question # 29

You need ads data to serve AI models and historical data for analytics. Longtail and outlier data points need to be identified. You want to cleanse the data in near-real time before running it through AI models. What should you do?

Options:

Use BigQuery to ingest, prepare, and then analyze the data, and then run queries to create views.

Use Cloud Storage as a data warehouse, shell scripts for processing, and BigQuery to create views for desired datasets.

Use Dataflow to identify longtail and outlier data points programmatically, with BigQuery as a sink.

Use Cloud Composer to identify longtail and outlier data points, and then output a usable dataset to BigQuery.

Question # 30

You are building a streaming Dataflow pipeline that ingests noise level data from hundreds of sensors placed near construction sites across a city. The sensors measure noise level every ten seconds, and send that data to the pipeline when levels reach above 70 dBA. You need to detect the average noise level from a sensor when data is received for a duration of more than 30 minutes, but the window ends when no data has been received for 15 minutes. What should you do?

Options:

Use session windows with a 30-minute gap duration.

Use tumbling windows with a 15-minute window and a fifteen-minute .withAllowedLateness operator.

Use session windows with a 15-minute gap duration.

Use hopping windows with a 15-minute window, and a thirty-minute period.

Answer: Use session windows with a 15-minute gap duration.

Explanation:

The key requirements for the windowing strategy are:

A window groups data for a specific sensor.

A window should contain data spanning at least 30 minutes ("duration of more than 30 minutes" implies activity for this period).

A window for a sensor ends when no data has been received from that sensor for 15 minutes (this is a gap).

This scenario describes session windows.

Session Windows: Session windows group elements (per key, e.g., per sensor ID) that arrive within a certain "gap duration" of each other. A new session starts if data for a key arrives after the gap duration has passed since the last data point for that key.

In this case, if data stops arriving for a sensor for 15 minutes, the current session for that sensor closes. This matches "the window ends when no data has been received for 15 minutes."

The "duration of more than 30 minutes" requirement is a condition you would apply after the session window closes. You'd calculate the duration of the data within the closed session window and only compute the average if that session's duration (span of event times within it) exceeds 30 minutes. Session windows themselves don't have a fixed duration; their duration is determined by data activity and the gap.
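The gap-then-filter logic can be sketched in plain Python. This is an illustration only, not Dataflow or Apache Beam code: the function names, the sample readings, and the choice to treat a gap of exactly 15 minutes as the start of a new session are all assumptions made for this sketch.

```python
from datetime import datetime, timedelta

GAP = timedelta(minutes=15)           # inactivity that closes a session
MIN_DURATION = timedelta(minutes=30)  # only longer sessions are averaged


def sessionize(readings, gap=GAP):
    """Group (timestamp, dBA) readings from ONE sensor into sessions.

    A reading joins the current session if it arrives less than `gap`
    after the previous reading; otherwise it starts a new session.
    """
    sessions = []
    for ts, level in sorted(readings):
        if sessions and ts - sessions[-1][-1][0] < gap:
            sessions[-1].append((ts, level))
        else:
            sessions.append([(ts, level)])
    return sessions


def long_session_averages(readings, gap=GAP, min_duration=MIN_DURATION):
    """Average the noise level of each session spanning more than `min_duration`."""
    averages = []
    for session in sessionize(readings, gap):
        duration = session[-1][0] - session[0][0]  # span of event times
        if duration > min_duration:
            levels = [level for _, level in session]
            averages.append(sum(levels) / len(levels))
    return averages


# Hypothetical sample data: 40 minutes of activity, a 20-minute silence,
# then a short 10-minute burst.
start = datetime(2025, 1, 1, 8, 0)
readings = [
    (start + timedelta(minutes=0), 72.0),
    (start + timedelta(minutes=10), 74.0),
    (start + timedelta(minutes=20), 76.0),
    (start + timedelta(minutes=30), 78.0),
    (start + timedelta(minutes=40), 80.0),
    (start + timedelta(minutes=60), 90.0),
    (start + timedelta(minutes=70), 92.0),
]

print(long_session_averages(readings))  # [76.0]
```

The 20-minute silence splits the readings into two sessions. Only the first spans more than 30 minutes, so only its average is reported.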

Let's analyze why other options are less suitable:

Hopping windows with a 15-minute window and a thirty-minute period: Hopping windows have a fixed size and a fixed period, and they overlap only when the period is shorter than the window size. A 30-minute period with a 15-minute window produces windows like [0:00-0:15], [0:30-0:45], [1:00-1:15], so data arriving between windows is not covered at all. In any case, the windows are fixed in time and do not close based on a 15-minute gap of inactivity.

Tumbling windows with a 15-minute window and a fifteen-minute .withAllowedLateness operator: Tumbling windows are fixed-size, non-overlapping windows. .withAllowedLateness deals with late data arriving for a window that has already passed its end time, not with defining the window based on activity gaps.

Session windows with a 30-minute gap duration: This would mean a session ends only if there's a 30-minute gap of inactivity. The requirement is a 15-minute gap.
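To see why fixed-size windows cannot follow sensor activity, the sketch below assigns a timestamp (in whole minutes) to tumbling or hopping windows. It is plain Python, not a Beam API, and the helper name fixed_window_starts is hypothetical. Window boundaries depend only on the clock, never on gaps in the data.

```python
def fixed_window_starts(ts_min, size, period):
    """Return the start minute of every window [start, start + size) containing ts_min.

    Windows start at multiples of `period`. With size == period the
    windows tumble; with size > period they overlap; with size < period
    some timestamps fall into no window at all.
    """
    starts = []
    start = (ts_min // period) * period  # latest window start at or before ts_min
    while start > ts_min - size:
        starts.append(start)
        start -= period
    return sorted(starts)


# Tumbling: 15-minute windows, each timestamp lands in exactly one.
print(fixed_window_starts(20, size=15, period=15))  # [15]

# Hopping with a 15-minute window every 30 minutes: minute 20 is not covered.
print(fixed_window_starts(10, size=15, period=30))  # [0]
print(fixed_window_starts(20, size=15, period=30))  # []

# Hopping with a 30-minute window every 15 minutes: windows overlap.
print(fixed_window_starts(20, size=30, period=15))  # [0, 15]
```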

Therefore, session windows with a 15-minute gap duration correctly model the requirement for windows to close after 15 minutes of inactivity from a sensor. The subsequent filtering for sessions lasting more than 30 minutes is a downstream operation.

Reference: Apache Beam Programming Guide > Windowing > Windowing functions > Session windows. "Session windowing assigns elements to windows that represent sessions of activity. A session window starts when the first element arrives for a key. If another element arrives for that key within the specified gap duration, that element is included in the existing session window. If an element arrives after the gap duration, a new session window starts for that element... Session windows are useful for data that is irregularly distributed with respect to time, such as user activity data." This directly matches the sensor data behavior: data arrives when noise is high, and a period of no data for 15 minutes should close the analysis window for that sensor.

Question # 31

You have 100 GB of data stored in a BigQuery table. This data is outdated and will only be accessed one or two times a year for analytics with SQL. For backup purposes, you want to store this data to be immutable for 3 years. You want to minimize storage costs. What should you do?

Options:

1. Create a BigQuery table clone.
2. Query the clone when you need to perform analytics.

1. Create a BigQuery table snapshot.
2. Restore the snapshot when you need to perform analytics.

1. Perform a BigQuery export to a Cloud Storage bucket with archive storage class.
2. Enable versioning on the bucket.
3. Create a BigQuery external table on the exported files.

1. Perform a BigQuery export to a Cloud Storage bucket with archive storage class.
2. Set a locked retention policy on the bucket.
3. Create a BigQuery external table on the exported files.

Question # 32

You launched a new gaming app almost three years ago. You have been uploading log files from the previous day to a separate Google BigQuery table with the table name format LOGS_yyyymmdd. You have been using table wildcard functions to generate daily and monthly reports for all time ranges. Recently, you discovered that some queries that cover long date ranges are exceeding the limit of 1,000 tables and failing. How can you resolve this issue?

Options:

Convert all daily log tables into date-partitioned tables

Convert the sharded tables into a single partitioned table

Enable query caching so you can cache data from previous months

Create separate views to cover each month, and query from these views

Question # 33

You are building a real-time prediction engine that streams files, which may contain PII (personally identifiable information) data, into Cloud Storage and eventually into BigQuery. You want to ensure that the sensitive data is masked but still maintains referential integrity, because names and emails are often used as join keys. How should you use the Cloud Data Loss Prevention API (DLP API) to ensure that the PII data is not accessible by unauthorized individuals?

Options:

Create a pseudonym by replacing the PII data with cryptogenic tokens, and store the non-tokenized data in a locked-down bucket.

Redact all PII data, and store a version of the unredacted data in a locked-down bucket.

Scan every table in BigQuery, and mask the data it finds that has PII.

Create a pseudonym by replacing PII data with a cryptographic format-preserving token.

Exam Code: Professional-Data-Engineer

Exam Name: Google Professional Data Engineer Exam

Last Update: Jun 15, 2025

Questions: 376
