Real time Data Analsyst interview questions For 0-3 YEARS EXP

Infosys DATA ANALYST INTERVIEW EXPERIENCE (0-3) YOE

 

SǪL

Ǫ1: Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER() functions using example.

The RANK(), DENSE_RANK(), and ROW_NUMBER() functions are all used to assign ranks to rows in a result set based on a specified order. However, they differ in how they handle duplicate values (ties). Let’s explore each with examples:

1.  RANK()

  • Assigns a unique rank to rows within a partition of the result
  • Gaps are left in the ranking if there are

Example:

SELECT Name, Score,

RANK() OVER (ORDER BY Score DESC) AS Rank

FROM Students;

Input Data:

 

Name

Score

Alice

95

Bob

95

Charlie

90

David

85

Output:

 

Name

Score

Rank

Alice

95

1

Bob

95

1

Charlie

90

3

David

85

4

Explanation:

 

  • Alice and Bob have the same score, so they share rank
  • The next rank is 3, leaving a gap after the tied

 

2.  DENSE_RANK()

  • Assigns a unique rank to rows within a partition of the result
  • No gaps in the ranking, even if there are

Example:

SELECT Name, Score,

DENSE_RANK() OVER (ORDER BY Score DESC) AS DenseRank

FROM Students;

Output:

 

Name

Score

DenseRank

Alice

95

1

Bob

95

1

Charlie

90

2

David

85

3

Explanation:

  • Alice and Bob share rank
  • The next rank is 2, with no
 
  

3.  ROW_NUMBER()

  • Assigns a unique sequential number to each row within a
  • Does not handle ties—each row gets a distinct

Example:

SELECT Name, Score,

ROW_NUMBER() OVER (ORDER BY Score DESC) AS RowNumber

FROM Students;

Output:

 

Name

Score

RowNumber

Alice

95

1

Bob

95

2

Charlie

90

3

David

85

4

Explanation:

  • Each row gets a unique number regardless of
 
  

Summary of Differences:

 

Function

Handles Ties

Gaps in Ranking

RANK()

Same rank for ties

Yes

DENSE_RANK()

Same rank for ties

No

ROW_NUMBER()

No ties allowed

N/A (unique numbering)

These functions are particularly useful in scenarios like leaderboard generation, identifying top performers, or ranking products based on sales.

 

Ǫ2 – Ǫ4 use Table: employee (EmpID, ManagerID, JoinDate, Dept, Salary) Ǫ2: Find the nth highest salary from the Employee table.

To find the nth highest salary from the employee table, you can use the DENSE_RANK() or

RANK() window function. Here’s the query using DENSE_RANK():

Ǫuery to Find the nth Highest Salary:

WITH RankedSalaries AS ( SELECT Salary,

DENSE_RANK() OVER (ORDER BY Salary DESC) AS Rank

FROM employee

)

SELECT Salary

FROM RankedSalaries WHERE Rank = <n>;

Explanation:

 

  1. DENSE_RANK() OVER (ORDER BY Salary DESC):
    • Ranks salaries in descending
    • Ensures no gaps in rank even if there are

2.     WHERE Rank = <n>:

  • Filters for the specific rank you are interested in (replace <n> with the desired value, g., 2 for the second highest salary).
 
  

Example:

Input Data:

 

EmpID

ManagerID

JoinDate

Dept

Salary

1

101

2022-01-15

Sales

70000

2

102

2021-05-10

HR

80000

3

103

2023-03-20

IT

70000

4

101

2020-09-25

IT

90000

Find the 2nd Highest Salary:

WITH RankedSalaries AS ( SELECT Salary,

DENSE_RANK() OVER (ORDER BY Salary DESC) AS Rank

FROM employee

)

SELECT Salary

FROM RankedSalaries WHERE Rank = 2;

Output:

 

Salary

80000

Alternative Approach (Without Window Functions): You can also use a subquery to achieve the same: SELECT DISTINCT Salary

 

FROM employee e1 WHERE (

SELECT COUNT(DISTINCT Salary)

FROM employee e2

WHERE e2.Salary > e1.Salary

) = <n – 1>;

Explanation:

  • The inner query counts how many distinct salaries are greater than the current
  • If the count is n-1, the current salary is the nth highest.

Ǫ3: You have an employee table with employee ID and manager ID. Find all employees under a specific manager, including their subordinates at any level.

To find all employees under a specific manager, including subordinates at any level (hierarchical query), you can use a Recursive Common Table Expression (CTE) in SǪL.

 
  

Ǫuery:

WITH RecursiveHierarchy AS (

— Anchor member: Get the specified manager’s direct reports SELECT EmpID, ManagerID

FROM employee

WHERE ManagerID = <specific_manager_id>

 

UNION ALL

 

— Recursive member: Find subordinates of the employees in the current level SELECT e.EmpID, e.ManagerID

FROM employee e

INNER JOIN RecursiveHierarchy rh ON e.ManagerID = rh.EmpID

)

SELECT EmpID

FROM RecursiveHierarchy;

 

Explanation:

  1. Anchor Member:
    • Retrieves all direct reports of the specified manager (replace

<specific_manager_id> with the manager’s ID).

2.     Recursive Member:

  • Joins the employee table with the recursive CTE (RecursiveHierarchy) to find employees whose ManagerID matches any EmpID from the current

3.     Recursive Ǫuery Execution:

  • Repeats the process until no more subordinates are

4.     Final SELECT:

  • Extracts all EmpIDs from the
 
  

Example:

Input Data:

 

EmpID

ManagerID

1

101

2

102

3

101

4

3

5

4

101

NULL

102

101

Find all employees under ManagerID = 101:

WITH RecursiveHierarchy AS ( SELECT EmpID, ManagerID FROM employee

WHERE ManagerID = 101

 

UNION ALL

 

SELECT e.EmpID, e.ManagerID FROM employee e

INNER JOIN RecursiveHierarchy rh ON e.ManagerID = rh.EmpID

)

SELECT EmpID

FROM RecursiveHierarchy;

Output:

 

EmpID

1

2

3

4

5

Explanation:

  • EmpID = 1 and EmpID = 3 directly report to ManagerID =
  • EmpID = 4 reports to EmpID =
  • EmpID = 5 reports to EmpID =
 
  

For Databases Without Recursive CTE Support:

If recursive CTEs are not available (e.g., older MySǪL versions), you may need to write stored procedures or implement the hierarchy in application logic.

 

Ǫ4: Write a query to find the cumulative salary of employees department-wise, who have joined company in last 30 days.

To calculate the cumulative salary of employees department-wise who joined in the last 30 days, you can use the SUM() window function along with a filter condition to restrict the data to the last 30 days.

 
  

Ǫuery:

SELECT Dept,

 

EmpID, JoinDate, Salary,

SUM(Salary) OVER (PARTITION BY Dept ORDER BY JoinDate) AS CumulativeSalary FROM employee

WHERE JoinDate >= CURDATE() – INTERVAL 30 DAY

ORDER BY Dept, JoinDate;

 
  

Explanation:

  1. WHERE JoinDate >= CURDATE() – INTERVAL 30 DAY:
    • Filters employees who joined within the last 30

2.     SUM(Salary) OVER (PARTITION BY Dept ORDER BY JoinDate):

  • Calculates the cumulative sum of salaries within each department (PARTITION BY Dept), ordered by the joining date (ORDER BY JoinDate).

3.     ORDER BY Dept, JoinDate:

  • Ensures the output is sorted department-wise and by joining
 
  

Example: Input Data:

EmpID

Dept

JoinDate

Salary

1

Sales

2024-11-05

70000

2

HR

2024-11-15

80000

3

IT

2024-11-20

60000

4

Sales

2024-11-22

50000

5

IT

2024-11-25

55000

Ǫuery Result:

 

Dept

EmpID

JoinDate

Salary

CumulativeSalary

HR

2

2024-11-15

80000

80000

IT

3

2024-11-20

60000

60000

Dept

EmpID

JoinDate

Salary

CumulativeSalary

IT

5

2024-11-25

55000

115000

Sales

1

2024-11-05

70000

70000

Sales

4

2024-11-22

50000

120000

 
  

 

Notes:

  • For SǪL Server: Replace CURDATE() – INTERVAL 30 DAY with GETDATE() –
  • For Oracle: Use SYSDATE – 30 in the WHERE
  • Ensure the JoinDate column is in DATE or DATETIME

Ǫ5: Find the top 2 customers with the highest order amount for each product category, handling ties appropriately. Table: customer (CustomerID, ProductCategory, OrderAmount)

To find the top 2 customers with the highest order amount for each product category, including ties (handling ties appropriately), you can use the RANK() or DENSE_RANK() window function.

 
  

Ǫuery:

WITH RankedOrders AS ( SELECT CustomerID,

ProductCategory, OrderAmount,

RANK() OVER (PARTITION BY ProductCategory ORDER BY OrderAmount DESC) AS Rank FROM customer

)

SELECT CustomerID, ProductCategory, OrderAmount

FROM RankedOrders WHERE Rank <= 2

ORDER BY ProductCategory, Rank, CustomerID;

 
  

Explanation:

  1. RANK() OVER (PARTITION BY ProductCategory ORDER BY OrderAmount DESC):
    • Assigns a rank to each customer within the same product category, based on their order amount in descending
    • Handles ties by assigning the same rank to customers with equal order

2.     WHERE Rank <= 2:

  • Filters the result to include only the top 2 customers for each product

3.     ORDER BY ProductCategory, Rank, CustomerID:

  • Ensures the output is sorted by product category, rank, and customer ID for better
 
  

Example: Input Data:

CustomerID

ProductCategory

OrderAmount

1

Electronics

500

2

Electronics

300

3

Electronics

300

4

Clothing

200

5

Clothing

150

6

Clothing

150

7

Clothing

100

Ǫuery Result:

 

CustomerID

ProductCategory

OrderAmount

1

Electronics

500

2

Electronics

300

3

Electronics

300

4

Clothing

200

5

Clothing

150

CustomerID

ProductCategory

OrderAmount

6

Clothing

150

 
  

 

Key Points:

  • Ties Handling:
    • Customers 2 and 3 in “Electronics” both have the same order amount (300), so they are both
    • Similarly, Customers 5 and 6 in “Clothing” both have an order amount of 150, and both are ranked

·        If Only 2 Distinct Ranks Are Needed:

  • Replace RANK() with DENSE_RANK(), which ensures no gaps in

For instance, using DENSE_RANK() for the same input, the 2nd rank in “Electronics” would include only 300, even if there were additional ties.

 

POWER BI

Ǫ1: Explain step-by-step how will you create a sales dashboard from scratch.

 

 

Step-by-Step Process to Create a Sales Dashboard in Power BI: Step 1: Gather Requirements

  • Understand the purpose of the dashboard:
    • Who are the users (e.g., managers, sales reps)?
    • What KPIs or metrics are needed (e.g., total sales, sales by region, profit margin)?
    • Identify filters or slicers (e.g., date range, region, product category).
  • Obtain a clear understanding of the data sources and expected
 
  

Step 2: Prepare Data Sources

  • Identify data sources (e.g., Excel, SǪL Server, cloud services).
  • Ensure data contains required fields like Sales Amount, Region, Product, Date,
  • Clean the data beforehand to ensure accuracy (e.g., no duplicates or missing values).
 
  

Step 3: Load Data into Power BI

  1. Open Power BI Desktop.

2.     Import Data:

  • Click on Home > Get Data and choose the data source (e.g., Excel, SǪL, ).
  • Connect and load the dataset(s).

3.     Preview and Transform:

  • Use Power Ǫuery for data cleaning and transformations:
    • Remove nulls or
    • Format columns (e.g., dates, numbers).
    • Rename fields for clarity (e.g., Sales_Amount → Sales Amount).
 
  

Step 4: Create Relationships

  • Go to the Model View.
  • Establish relationships between tables using primary and foreign
    • Example: Connect Sales table with Products and Regions
  • Ensure relationships have correct cardinality (e.g., one-to-many).
 
  

Step 5: Design Data Model

  • Optimize the model:
    • Use measures for calculations like Total Sales, Profit Margin, or Year-to-Date Sales:

Total Sales = SUM(Sales[Sales Amount])

Profit Margin = DIVIDE(SUM(Sales[Profit]), SUM(Sales[Sales Amount]), 0)

  • Add calculated columns if
 
  

Step 6: Build the Sales Dashboard

  1. Add Visualizations:
    • Drag and drop fields onto the report
    • Use visualizations like:
      • Card Visuals: For KPIs (e.g., Total Sales, Total Profit).
      • Bar/Column Charts: For sales by region or product
  • Line Charts: For trends over
  • Pie/Donut Charts: For sales contribution by
  • Maps: For geographical analysis of

2.     Customize the Layout:

  • Arrange visuals logically (e.g., KPIs at the top, detailed charts below).
  • Use slicers for interactivity (e.g., date range, region, product filters).
 
  

Step 7: Add Interactivity

  • Enable cross-filtering between
  • Add drill-through pages for detailed analysis:
    • Example: Click on a region to drill into individual store
  • Use tooltips to display additional insights on
 
  

Step 8: Format the Dashboard

  • Apply a consistent theme (View > Themes).
  • Use data labels and legends for
  • Ensure visuals are responsive (resize automatically).
 
  

Step 9: Publish and Share

  1. Save the
  2. Publish to Power BI Service:
    • Home > Publish > Choose Workspace.
  3. Share the dashboard:
    • Set permissions for users (e.g., view-only or edit access).
    • Embed the dashboard in emails or applications if
 
  

Step 10: Maintain the Dashboard

  • Schedule data refreshes to ensure up-to-date
  • Gather feedback from users and make iterative
  • Monitor performance and optimize for large datasets if
 
  

Example Sales Dashboard:

Visuals:

  • Total Sales (Card)
  • Sales by Region (Bar Chart)
  • Monthly Trend (Line Chart)
  • Top Products by Sales (Table)
  • Profit Contribution by Region (Pie Chart)

With these steps, you’ll create a comprehensive and interactive Power BI sales dashboard!

 

Ǫ2: Explain how you can optimize a slow Power BI report. How to Optimize a Slow Power BI Report

Optimizing a slow Power BI report involves improving data model efficiency, query performance, and visualization rendering. Below are actionable steps:

 
  

1.  Optimize the Data Model

  • Reduce Dataset Size:
    • Remove unnecessary columns and rows from
    • Avoid loading entire datasets; filter data during import (e.g., specific date ranges).

·        Use Star Schema:

  • Ensure your data model follows a star schema with fact and dimension Avoid using a flat file structure.

·        Avoid Calculated Columns in Power BI:

  • Create calculated columns in the data source or Power Ǫuery instead of DAX, as these are pre-calculated during data
 
  

2.  Improve Power Ǫuery Performance

  • Optimize Transformations:
    • Perform data cleaning and transformations (e.g., filtering, joins) as early as possible in the query
    • Avoid using dynamic parameters in Power Ǫuery where

·        Disable Ǫuery Load for Unused Tables:

  • If you don’t need a table for visuals, set it to “Do Not Load” in Power Ǫuery.

3.  Optimize DAX Calculations

  • Use Measures Instead of Calculated Columns:
    • Measures are calculated on-demand and don’t increase model

·        Simplify DAX Formulas:

  • Avoid complex nested formulas; break them into smaller
  • Use DAX functions like SUMX() judiciously as they can be computationally

·        Pre-Aggregate Data:

  • Aggregate data in the source or use summarized tables for high-granularity
 
  

4.  Reduce Visual Overhead

  • Limit the Number of Visuals:
    • Too many visuals on a single page can slow down Use bookmarks or multiple pages to distribute visuals.

·        Avoid Heavy Visuals:

  • Avoid visuals that load large datasets, like detailed tables or matrix visuals with many columns/rows.
  • Replace custom visuals with built-in Power BI visuals, which are more

·        Disable Auto Date/Time:

  • In Options > Data Load, turn off “Auto Date/Time for New Files” to prevent Power

BI from creating unnecessary date tables.

 
  

5.  Use Aggregations and Summary Tables

  • Implement Aggregations:
    • Create summary tables for frequently used aggregated data (e.g., monthly or yearly totals).
    • Use aggregation-aware tables in Power BI to reduce processing

·        Group Data:

  • Consolidate granular data (e.g., daily sales → monthly sales) if granularity is not

needed.

 
  

6.  Optimize Data Refresh

  • Incremental Data Refresh:
    • For large datasets, use incremental refresh to process only new or updated

·        Partition Data:

  • Partition tables by date or category to improve refresh
 
  

7.      Optimize Relationships

  • Review Cardinality:
    • Use relationships with low cardinality where possible (e.g., many-to-one).

·        Set Appropriate Cross-filter Direction:

  • Avoid bidirectional relationships unless absolutely
 
  

8.  Use DirectǪuery or Hybrid Mode Judiciously

  • Switch to Import Mode:
    • For better performance, use Import Mode instead of DirectǪuery where

·        Optimize DirectǪuery:

  • For DirectǪuery, reduce the number of queries and ensure the source database is well-indexed.

·        Hybrid Mode:

  • Use a combination of Import and DirectǪuery to balance performance and up- to-date data
 
  

9.  Monitor Performance

  • Performance Analyzer:
    • Use the Performance Analyzer tool in Power BI Desktop (View > Performance Analyzer) to identify slow visuals and

·        Database Optimization:

  • Ensure the underlying database is indexed and optimized for query
 
  

10.  Use Compression and Data Types

  • Optimize Data Types:
    • Use appropriate data types (e.g., INTEGER instead of STRING for IDs).

·        Remove Unused Data:

  • Remove columns with unique values (e.g., GUIDs) that cannot be compressed
 
  

11.  Use Variables in DAX

  • Declare variables to store intermediate calculations in DAX formulas, reducing repeated computations:

Measure =

VAR TotalSales = SUM(Sales[Amount]) VAR TotalCost = SUM(Sales[Cost]) RETURN TotalSales – TotalCost

 
  

12.  Limit Data Export

  • Use aggregated data for exports or detailed views instead of large
 
  

Summary

Focus on:

  • Simplifying the data
  • Using efficient DAX
  • Reducing the number of visuals on each
  • Implementing aggregations and summary
  • Monitoring with the Performance Analyzer to address

By applying these optimizations, you can significantly enhance the performance of your Power BI reports.

 

Ǫ3: Explain Any 5 Chart Types and Their Uses in Representing Different Aspects of Data. 5 Common Chart Types in Power BI and Their Uses

  1. Bar/Column Chart
    • Description: Displays data as horizontal (bar) or vertical (column)

·        Best for:

  • Comparing categorical
  • Showing quantities across distinct categories (e.g., sales by region, profit by product).

·        Example:

  • A bar chart showing monthly sales revenue by product
 
  

2.  Line Chart

  • Description: Represents data points connected by a continuous line, emphasizing trends over

·        Best for:

  • Showing trends, patterns, or changes over a time
  • Analyzing metrics like revenue growth, stock prices, or website

·        Example:

  • A line chart displaying monthly website visits over a
 
  

3.  Pie/Donut Chart

  • Description: Visualizes proportions of a whole as slices of a circle (Pie) or a hollow circle (Donut).

·        Best for:

  • Representing percentage distribution or composition of a
  • Comparing parts to a whole (e.g., market share by company).

·        Example:

  • A pie chart showing the percentage contribution of sales by different
 
  

4.  Scatter Plot

  • Description: Plots individual data points on an X-Y axis to show relationships or correlations between two

·        Best for:

  • Identifying correlations, patterns, or
  • Analyzing relationships like sales marketing spend or age vs. income.

·        Example:

  • A scatter plot showing advertising spend (X-axis) sales revenue (Y-axis).
 
  

5.  Stacked Area Chart

  • Description: Similar to a line chart but with shaded areas beneath lines, often stacked to show cumulative

·        Best for:

  • Showing part-to-whole relationships over
  • Visualizing cumulative trends or comparing the contribution of

·        Example:

  • A stacked area chart displaying revenue contribution from different product categories over several
 
  

Choosing the Right Chart

  • Bar/Column Charts: Categorical
  • Line Charts: Time-series
  • Pie/Donut Charts: Percentage
  • Scatter Plots: Relationships and
  • Stacked Area Charts: Cumulative trends and part-to-whole

Each chart type serves a unique purpose, helping users understand and analyze data more effectively.