NCP-AII Exam Dumps - NVIDIA-Certified Professional Questions and Answers

Question # 14

A cluster administrator needs to validate transceiver firmware versions across 200 ports using UFM. Which GUI-based method provides a consolidated view?

Options:

A.

Navigate to "Devices" > select a switch > "Cables" tab to see ASIC firmware and transceiver versions.

B.

Use "Topology" view to visually inspect cable icons.

C.

Run mlxlink -d lid- -m on each port manually.

D.

Export all switch logs and grep for "FW Version".

Question # 15

During East-West fabric validation on a 64-GPU cluster, an engineer runs all_reduce_perf and observes an algorithm bandwidth of 350 GB/s and bus bandwidth of 656 GB/s. What does this indicate about the fabric performance?

Options:

A.

Inconclusive; rerun with point-to-point tests.

B.

Optimal performance; bus bandwidth near theoretical peak for NDR InfiniBand.

C.

Critical failure; bus bandwidth exceeds hardware capabilities.

D.

Suboptimal performance; algorithm bandwidth should match bus bandwidth.
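
The relationship between the two figures in this question can be checked with a little arithmetic. The sketch below assumes the bus-bandwidth convention documented for nccl-tests all-reduce, where bus bandwidth equals algorithm bandwidth multiplied by 2(n-1)/n for n ranks. The function name is hypothetical and not part of any NVIDIA tool.

```python
def allreduce_bus_bandwidth(alg_bw_gbs: float, ranks: int) -> float:
    """Convert all-reduce algorithm bandwidth (GB/s) to bus bandwidth (GB/s).

    Assumes the nccl-tests convention for all-reduce:
    busbw = algbw * 2 * (n - 1) / n, where n is the number of ranks.
    """
    if ranks < 2:
        raise ValueError("all-reduce needs at least two ranks")
    return alg_bw_gbs * 2 * (ranks - 1) / ranks


if __name__ == "__main__":
    # Print the expected bus bandwidth for a few rank counts at 350 GB/s.
    for n in (8, 16, 64):
        print(n, allreduce_bus_bandwidth(350.0, n))
```

Because the factor 2(n-1)/n approaches 2 as n grows, bus bandwidth is expected to be larger than algorithm bandwidth for all-reduce. Under this convention the two values are not expected to match. The question's ratio of 656 to 350 is about 1.87, which is in the range this factor produces.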

Question # 16

A system administrator noticed a failure on a DGX H100 server. After a reboot, only the BMC is available. What could be the reason for this behavior?

Options:

A.

The network card has no link / connection.

B.

A boot disk has failed.

C.

Multiple GPUs have failed.

D.

There are more than two failed power supplies.

Question # 17

A 24-hour HPL burn-in fails with "illegal value" errors during the first iteration. Which initial troubleshooting step resolves this without compromising burn-in validity?

Options:

A.

Switch from FP64 to FP32 precision.

B.

Disable GPU affinity.

C.

Reduce test duration to 12 hours.

D.

Verify the matrix size is divisible by block size.
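
The divisibility check named in option D is simple arithmetic on the problem size N and block size NB. The sketch below is a hypothetical helper, not part of HPL. It reports whether N is a multiple of NB and gives the largest multiple of NB that does not exceed N. The example values are illustrative only.

```python
def check_hpl_dims(n: int, nb: int) -> tuple[bool, int]:
    """Return (is_divisible, largest multiple of nb that is <= n)."""
    if n <= 0 or nb <= 0:
        raise ValueError("n and nb must be positive integers")
    remainder = n % nb
    return remainder == 0, n - remainder


if __name__ == "__main__":
    # Illustrative values only; real N and NB come from the HPL input file.
    ok, adjusted = check_hpl_dims(100000, 384)
    print(ok, adjusted)
```

Rounding N down, not up, keeps the adjusted problem within the memory budget the original N was sized for.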

Question # 18

What command is needed to measure BER (Bit Error Rate)?

Options:

A.

mlxconfig -d q

B.

ethtool -S

C.

mlxlink -d -c -e

D.

mstflint -d q full

Question # 19

After NCCL burn-in reports "transport retry count exceeded," which corrective action addresses the underlying fabric issue?

Options:

A.

Switch from Ring to Tree algorithms via NCCL_ALGO=TREE

B.

Reduce message size to decrease network utilization

C.

Increase NCCL_IB_TIMEOUT to tolerate longer latencies

D.

Inspect InfiniBand link quality metrics (BER, symbol errors) and replace faulty cables

Question # 20

For a 48-hour NCCL burn-in test, which parameters ensure sustained fabric stress while detecting silent data corruption?

Options:

A.

broadcast_perf -b 4G -e 16G -w 160

B.

all_reduce_perf -b 8G -e 32G -c 1000 -z 1 -G 1000

C.

all_reduce_perf -b 8G -e 32G -z 1 -G 1000

D.

reduce_scatter_perf -f 2 -g 8

Question # 21

An administrator needs to verify HA functionality after configuring BCM (Bright Cluster Manager). Which command confirms the active head node and failover readiness?

Options:

A.

cmsh status to check HA status and active/standby roles.

B.

nvsm show health to validate GPU status on both head nodes.

C.

systemctl restart cmdaemon to force a failover test.

D.

ping to test basic connectivity.

Question # 22

After upgrading to HPL-AI 2.0 on a DGX A100 cluster, a 2x performance gain is observed. Which optimization is primarily responsible for this improvement?

Options:

A.

Reduction of problem size (N) to accelerate computation.

B.

MPI-aware GPU communication that reduces CPU bottlenecks and GPU idle time.

C.

Doubling of GPU clock speeds through firmware updates and relevant configuration.

D.

Automatic NVLink bandwidth doubling via driver updates.

Question # 23

An administrator is configuring node categories in BCM for a DGX BasePOD cluster. They need to group all NVIDIA DGX H200 nodes under a dedicated category for GPU-accelerated workloads. Which approach aligns with NVIDIA's recommended BCM practices?

Options:

A.

Assign nodes to the "login" category to simplify Slurm integration.

B.

Create a new "dgx-h200" category, assign all DGX H200 nodes to it.

C.

Use the existing "dgxnodes" category without modification, as it is preconfigured for all DGX systems.

D.

Avoid categories and configure each DGX node individually via CLI.

Exam Code: NCP-AII
Exam Name: NVIDIA AI Infrastructure
Last Update: Mar 1, 2026
Questions: 71