Use Case Development in data science projects involves identifying business needs and mapping analytics to business decisions. The standard steps include:
Option A: Understanding the key initiatives or challenges.
Option B: Identifying the key stakeholders.
Option C: Capturing the decisions stakeholders must make.
Option E: Brainstorming the questions stakeholders need answered to support decisions.
However, Option D ("Brainstorm the outcomes stakeholders need to answer") is phrased incorrectly: it is questions and decisions that are brainstormed, not "outcomes."
Thus, the correct answer is Option D.
[Reference: DASCA Data Scientist Knowledge Framework (DSKF) – Business Use Case Development Process.]
Question # 25
Which of the following is NOT a correct situation in which to use Agile?
Options:
A. When the final product isn't clearly defined
B. When clients/stakeholders need to be able to change the scope
C. When changes need to be implemented during the entire process
D. None of the above
Agile methodology is widely adopted in data science projects because these projects often involve uncertain goals, exploratory analysis, and changing requirements. Agile thrives in environments where iteration, collaboration, and adaptability are necessary.
Option A: True for Agile. If the final product is unclear (common in data science), Agile works well because it allows incremental discovery and iterative prototyping.
Option B: True for Agile. Agile frameworks (Scrum, Kanban) emphasize flexibility, which means the scope can evolve as stakeholders learn more from data and models.
Option C: True for Agile. Agile welcomes continuous changes through iterative sprints and feedback loops. This adaptability is crucial in machine learning model development where data insights often reshape project direction.
Since all three situations are valid for Agile, the correct answer to “Which is NOT correct?” is None of the above (Option D).
[Reference: DASCA Data Scientist Knowledge Framework (DSKF) – Business Applications of Data Science & Agile Methodologies in Data Projects.]
Question # 26
The grid computing environment uses middleware to:
Grid computing is a distributed computing model where resources (CPU, memory, storage) are pooled across multiple systems to solve large-scale problems.
Option A (Divide): Middleware helps allocate or divide resources dynamically to different tasks.
Option B (Combine): Middleware integrates diverse resources into a unified system, making them accessible for parallel computing.
Option C: Correct. Middleware is the "glue" that enables both combining and dividing resources seamlessly across distributed nodes.
Option D: Incorrect.
Thus, grid computing middleware both combines and divides resources, making Option C correct.
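To make the divide/combine idea concrete, here is a toy Python sketch. This is not real grid middleware (no scheduler, no network layer); it only mimics the pattern on a single machine using the standard library's ProcessPoolExecutor, and the names chunk_sum and CHUNK are illustrative.

```python
from concurrent.futures import ProcessPoolExecutor

def chunk_sum(chunk):
    """Work unit that would run on one grid node (here: one local process)."""
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    CHUNK = 250_000
    # "Divide": split the large job into independent tasks.
    tasks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

    # "Combine": dispatch tasks to workers and merge the partial results.
    with ProcessPoolExecutor() as pool:
        partials = pool.map(chunk_sum, tasks)
    print(sum(partials))  # 499999500000
```

Real middleware (for example, the Globus Toolkit) adds what this sketch omits: resource discovery, scheduling, security, and data movement across heterogeneous nodes.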
[Reference: DASCA Data Scientist Knowledge Framework (DSKF) – Big Data Fundamentals: Grid and Distributed Computing.]
Question # 27
Which of the following is correct?
i. LaTeX is used to publish work in a scientific journal
ii. LaTeX is a markup language that can be compiled into formatted documents
iii. LaTeX is a standard for publishing scientific papers
LaTeX is a high-quality typesetting system widely used in academia, particularly in scientific publishing.
Statement i: Correct. LaTeX is widely used to prepare manuscripts for scientific journals, theses, and technical reports.
Statement ii: Correct. LaTeX is a markup language (similar to HTML in concept) that compiles into formatted PDFs/documents.
Statement iii: Correct. LaTeX is a standard for publishing scientific papers due to its ability to handle complex mathematical equations, references, and formatting.
Thus, all three statements are true → Option B (i, ii, iii).
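As a minimal illustration of statements i and ii, the following LaTeX source is plain-text markup that a compiler such as pdflatex turns into a formatted PDF, including typeset mathematics:

```latex
\documentclass{article}
\begin{document}
LaTeX typesets mathematics such as
\[ \int_0^{\infty} e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2} \]
directly from plain-text markup, which is why journals
accept it for manuscript submission.
\end{document}
```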
[Reference: DASCA Data Scientist Knowledge Framework (DSKF) – Programming Tools for Data Science: LaTeX for Scientific Documentation.]
Question # 28
Which of the following statements is correct?
Options:
A. Apache claimed that Spark is able to run parallel jobs 100 times faster in memory and 10 times faster on disk in comparison to the traditional Hadoop MapReduce
B. Apache claimed that Spark is able to run parallel jobs 10 times faster in memory and 100 times faster on disk in comparison to the traditional Hadoop MapReduce
C. Apache claimed that Spark is able to run parallel jobs 1000 times faster in memory and 100 times faster on disk in comparison to the traditional Hadoop MapReduce
D. Apache claimed that Spark is able to run parallel jobs 50 times faster in memory and 5 times faster on disk in comparison to the traditional Hadoop MapReduce
Apache Spark is a distributed computing framework designed as an improvement over Hadoop’s MapReduce. According to the official Apache Spark documentation:
Spark can run workloads up to 100x faster in memory.
Spark can run workloads up to 10x faster on disk.
This performance gain comes from Spark’s use of in-memory computation, DAG execution engine, and optimized query execution, compared to the slower, disk-heavy Hadoop MapReduce framework.
Thus, the correct statement is Option A.
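For context, here is a minimal PySpark sketch of the in-memory pattern behind these figures, assuming a local installation with the pyspark package available; the app name and dataset size are illustrative:

```python
from pyspark.sql import SparkSession

# Local session; on a cluster, master() would point at a resource manager.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("speed-demo")
         .getOrCreate())

rdd = spark.sparkContext.parallelize(range(1_000_000))
rdd.cache()  # keep the dataset in memory between actions

total = rdd.map(lambda x: 2 * x).sum()  # first action populates the cache
count = rdd.count()                     # second action reuses cached data
print(total, count)

spark.stop()
```

Repeated actions on a cached RDD avoid re-reading from disk, which is the source of the in-memory speedup Apache cites; MapReduce, by contrast, writes intermediate results to disk between stages.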
[Reference: DASCA Data Scientist Knowledge Framework (DSKF) – Big Data Ecosystem: Spark vs Hadoop Performance Comparisons.]