Real time Data Analsyst interview questions For 0-3 YEARS EXP

Infosys DATA ANALYST INTERVIEW EXPERIENCE (0-3) YOE

SǪL

Ǫ1: Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER() functions using example.

The RANK(), DENSE_RANK(), and ROW_NUMBER() functions are all used to assign ranks to rows in a result set based on a specified order. However, they differ in how they handle duplicate values (ties). Let’s explore each with examples:

1. RANK()

Assigns a unique rank to rows within a partition of the result
Gaps are left in the ranking if there are

Example:

SELECT Name, Score,

RANK() OVER (ORDER BY Score DESC) AS Rank

FROM Students;

Input Data:

Name	Score
Alice	95
Bob	95
Charlie	90
David	85

Output:

Name	Score	Rank
Alice	95	1
Bob	95	1
Charlie	90	3
David	85	4

Explanation:

Alice and Bob have the same score, so they share rank
The next rank is 3, leaving a gap after the tied

2. DENSE_RANK()

Assigns a unique rank to rows within a partition of the result
No gaps in the ranking, even if there are

Example:

SELECT Name, Score,

DENSE_RANK() OVER (ORDER BY Score DESC) AS DenseRank

FROM Students;

Output:

Name	Score	DenseRank
Alice	95	1
Bob	95	1
Charlie	90	2
David	85	3

Explanation:

Alice and Bob share rank
The next rank is 2, with no

3. ROW_NUMBER()

Assigns a unique sequential number to each row within a
Does not handle ties—each row gets a distinct

Example:

SELECT Name, Score,

ROW_NUMBER() OVER (ORDER BY Score DESC) AS RowNumber

FROM Students;

Output:

Name	Score	RowNumber
Alice	95	1
Bob	95	2
Charlie	90	3
David	85	4

Explanation:

Each row gets a unique number regardless of

Summary of Differences:

Function	Handles Ties	Gaps in Ranking
RANK()	Same rank for ties	Yes
DENSE_RANK()	Same rank for ties	No
ROW_NUMBER()	No ties allowed	N/A (unique numbering)

These functions are particularly useful in scenarios like leaderboard generation, identifying top performers, or ranking products based on sales.

Ǫ2 – Ǫ4 use Table: employee (EmpID, ManagerID, JoinDate, Dept, Salary) Ǫ2: Find the nth highest salary from the Employee table.

To find the nth highest salary from the employee table, you can use the DENSE_RANK() or

RANK() window function. Here’s the query using DENSE_RANK():

Ǫuery to Find the nth Highest Salary:

WITH RankedSalaries AS ( SELECT Salary,

DENSE_RANK() OVER (ORDER BY Salary DESC) AS Rank

FROM employee

)

SELECT Salary

FROM RankedSalaries WHERE Rank = <n>;

Explanation:

DENSE_RANK() OVER (ORDER BY Salary DESC):
- Ranks salaries in descending
- Ensures no gaps in rank even if there are

2. WHERE Rank = <n>:

Filters for the specific rank you are interested in (replace <n> with the desired value, g., 2 for the second highest salary).

Example:

Input Data:

EmpID	ManagerID	JoinDate	Dept	Salary
1	101	2022-01-15	Sales	70000
2	102	2021-05-10	HR	80000
3	103	2023-03-20	IT	70000
4	101	2020-09-25	IT	90000

Find the 2nd Highest Salary:

WITH RankedSalaries AS ( SELECT Salary,

DENSE_RANK() OVER (ORDER BY Salary DESC) AS Rank

FROM employee

)

SELECT Salary

FROM RankedSalaries WHERE Rank = 2;

Output:

Salary

80000

Alternative Approach (Without Window Functions): You can also use a subquery to achieve the same: SELECT DISTINCT Salary

FROM employee e1 WHERE (

SELECT COUNT(DISTINCT Salary)

FROM employee e2

WHERE e2.Salary > e1.Salary

) = <n – 1>;

Explanation:

The inner query counts how many distinct salaries are greater than the current
If the count is n-1, the current salary is the nth highest.

Ǫ3: You have an employee table with employee ID and manager ID. Find all employees under a specific manager, including their subordinates at any level.

To find all employees under a specific manager, including subordinates at any level (hierarchical query), you can use a Recursive Common Table Expression (CTE) in SǪL.

Ǫuery:

WITH RecursiveHierarchy AS (

— Anchor member: Get the specified manager’s direct reports SELECT EmpID, ManagerID

FROM employee

WHERE ManagerID = <specific_manager_id>

UNION ALL

— Recursive member: Find subordinates of the employees in the current level SELECT e.EmpID, e.ManagerID

FROM employee e

INNER JOIN RecursiveHierarchy rh ON e.ManagerID = rh.EmpID

)

SELECT EmpID

FROM RecursiveHierarchy;

Explanation:

Anchor Member:
- Retrieves all direct reports of the specified manager (replace

<specific_manager_id> with the manager’s ID).

2. Recursive Member:

Joins the employee table with the recursive CTE (RecursiveHierarchy) to find employees whose ManagerID matches any EmpID from the current

3. Recursive Ǫuery Execution:

Repeats the process until no more subordinates are

4. Final SELECT:

Extracts all EmpIDs from the

Example:

Input Data:

EmpID	ManagerID
1	101
2	102
3	101
4	3
5	4
101	NULL
102	101

Find all employees under ManagerID = 101:

WITH RecursiveHierarchy AS ( SELECT EmpID, ManagerID FROM employee

WHERE ManagerID = 101

UNION ALL

SELECT e.EmpID, e.ManagerID FROM employee e

INNER JOIN RecursiveHierarchy rh ON e.ManagerID = rh.EmpID

)

SELECT EmpID

FROM RecursiveHierarchy;

Output:

EmpID

Explanation:

EmpID = 1 and EmpID = 3 directly report to ManagerID =
EmpID = 4 reports to EmpID =
EmpID = 5 reports to EmpID =

For Databases Without Recursive CTE Support:

If recursive CTEs are not available (e.g., older MySǪL versions), you may need to write stored procedures or implement the hierarchy in application logic.

Ǫ4: Write a query to find the cumulative salary of employees department-wise, who have joined company in last 30 days.

To calculate the cumulative salary of employees department-wise who joined in the last 30 days, you can use the SUM() window function along with a filter condition to restrict the data to the last 30 days.

Ǫuery:

SELECT Dept,

EmpID, JoinDate, Salary,

SUM(Salary) OVER (PARTITION BY Dept ORDER BY JoinDate) AS CumulativeSalary FROM employee

WHERE JoinDate >= CURDATE() – INTERVAL 30 DAY

ORDER BY Dept, JoinDate;

Explanation:

WHERE JoinDate >= CURDATE() – INTERVAL 30 DAY:
- Filters employees who joined within the last 30

2. SUM(Salary) OVER (PARTITION BY Dept ORDER BY JoinDate):

Calculates the cumulative sum of salaries within each department (PARTITION BY Dept), ordered by the joining date (ORDER BY JoinDate).

3. ORDER BY Dept, JoinDate:

Ensures the output is sorted department-wise and by joining

Example: Input Data:

EmpID	Dept	JoinDate	Salary
1	Sales	2024-11-05	70000
2	HR	2024-11-15	80000
3	IT	2024-11-20	60000
4	Sales	2024-11-22	50000
5	IT	2024-11-25	55000

Ǫuery Result:

Dept	EmpID	JoinDate	Salary	CumulativeSalary
HR	2	2024-11-15	80000	80000
IT	3	2024-11-20	60000	60000

Dept	EmpID	JoinDate	Salary	CumulativeSalary
IT	5	2024-11-25	55000	115000
Sales	1	2024-11-05	70000	70000
Sales	4	2024-11-22	50000	120000

Notes:

For SǪL Server: Replace CURDATE() – INTERVAL 30 DAY with GETDATE() –
For Oracle: Use SYSDATE – 30 in the WHERE
Ensure the JoinDate column is in DATE or DATETIME

Ǫ5: Find the top 2 customers with the highest order amount for each product category, handling ties appropriately. Table: customer (CustomerID, ProductCategory, OrderAmount)

To find the top 2 customers with the highest order amount for each product category, including ties (handling ties appropriately), you can use the RANK() or DENSE_RANK() window function.

Ǫuery:

WITH RankedOrders AS ( SELECT CustomerID,

ProductCategory, OrderAmount,

RANK() OVER (PARTITION BY ProductCategory ORDER BY OrderAmount DESC) AS Rank FROM customer

)

SELECT CustomerID, ProductCategory, OrderAmount

FROM RankedOrders WHERE Rank <= 2

ORDER BY ProductCategory, Rank, CustomerID;

Explanation:

RANK() OVER (PARTITION BY ProductCategory ORDER BY OrderAmount DESC):
- Assigns a rank to each customer within the same product category, based on their order amount in descending
- Handles ties by assigning the same rank to customers with equal order

2. WHERE Rank <= 2:

Filters the result to include only the top 2 customers for each product

3. ORDER BY ProductCategory, Rank, CustomerID:

Ensures the output is sorted by product category, rank, and customer ID for better

Example: Input Data:

CustomerID	ProductCategory	OrderAmount
1	Electronics	500
2	Electronics	300
3	Electronics	300
4	Clothing	200
5	Clothing	150
6	Clothing	150
7	Clothing	100

Ǫuery Result:

CustomerID	ProductCategory	OrderAmount
1	Electronics	500
2	Electronics	300
3	Electronics	300
4	Clothing	200
5	Clothing	150

CustomerID	ProductCategory	OrderAmount
6	Clothing	150

Key Points:

Ties Handling:
- Customers 2 and 3 in “Electronics” both have the same order amount (300), so they are both
- Similarly, Customers 5 and 6 in “Clothing” both have an order amount of 150, and both are ranked

· If Only 2 Distinct Ranks Are Needed:

Replace RANK() with DENSE_RANK(), which ensures no gaps in

For instance, using DENSE_RANK() for the same input, the 2nd rank in “Electronics” would include only 300, even if there were additional ties.

POWER BI

Ǫ1: Explain step-by-step how will you create a sales dashboard from scratch.

Step-by-Step Process to Create a Sales Dashboard in Power BI: Step 1: Gather Requirements

Understand the purpose of the dashboard:
- Who are the users (e.g., managers, sales reps)?
- What KPIs or metrics are needed (e.g., total sales, sales by region, profit margin)?
- Identify filters or slicers (e.g., date range, region, product category).
Obtain a clear understanding of the data sources and expected

Step 2: Prepare Data Sources

Identify data sources (e.g., Excel, SǪL Server, cloud services).
Ensure data contains required fields like Sales Amount, Region, Product, Date,
Clean the data beforehand to ensure accuracy (e.g., no duplicates or missing values).

Step 3: Load Data into Power BI

Open Power BI Desktop.

2. Import Data:

Click on Home > Get Data and choose the data source (e.g., Excel, SǪL, ).
Connect and load the dataset(s).

3. Preview and Transform:

Use Power Ǫuery for data cleaning and transformations:
- Remove nulls or
- Format columns (e.g., dates, numbers).
- Rename fields for clarity (e.g., Sales_Amount → Sales Amount).

Step 4: Create Relationships

Go to the Model View.
Establish relationships between tables using primary and foreign
- Example: Connect Sales table with Products and Regions
Ensure relationships have correct cardinality (e.g., one-to-many).

Step 5: Design Data Model

Optimize the model:
- Use measures for calculations like Total Sales, Profit Margin, or Year-to-Date Sales:

Total Sales = SUM(Sales[Sales Amount])

Profit Margin = DIVIDE(SUM(Sales[Profit]), SUM(Sales[Sales Amount]), 0)

Add calculated columns if

Step 6: Build the Sales Dashboard

Add Visualizations:
- Drag and drop fields onto the report
- Use visualizations like:
  - Card Visuals: For KPIs (e.g., Total Sales, Total Profit).
  - Bar/Column Charts: For sales by region or product

Line Charts: For trends over
Pie/Donut Charts: For sales contribution by
Maps: For geographical analysis of

2. Customize the Layout:

Arrange visuals logically (e.g., KPIs at the top, detailed charts below).
Use slicers for interactivity (e.g., date range, region, product filters).

Step 7: Add Interactivity

Enable cross-filtering between
Add drill-through pages for detailed analysis:
- Example: Click on a region to drill into individual store
Use tooltips to display additional insights on

Step 8: Format the Dashboard

Apply a consistent theme (View > Themes).
Use data labels and legends for
Ensure visuals are responsive (resize automatically).

Step 9: Publish and Share

Save the
Publish to Power BI Service:
- Home > Publish > Choose Workspace.
Share the dashboard:
- Set permissions for users (e.g., view-only or edit access).
- Embed the dashboard in emails or applications if

Step 10: Maintain the Dashboard

Schedule data refreshes to ensure up-to-date
Gather feedback from users and make iterative
Monitor performance and optimize for large datasets if

Example Sales Dashboard:

Visuals:

Total Sales (Card)
Sales by Region (Bar Chart)
Monthly Trend (Line Chart)
Top Products by Sales (Table)
Profit Contribution by Region (Pie Chart)

With these steps, you’ll create a comprehensive and interactive Power BI sales dashboard!

Ǫ2: Explain how you can optimize a slow Power BI report. How to Optimize a Slow Power BI Report

Optimizing a slow Power BI report involves improving data model efficiency, query performance, and visualization rendering. Below are actionable steps:

1. Optimize the Data Model

Reduce Dataset Size:
- Remove unnecessary columns and rows from
- Avoid loading entire datasets; filter data during import (e.g., specific date ranges).

· Use Star Schema:

Ensure your data model follows a star schema with fact and dimension Avoid using a flat file structure.

· Avoid Calculated Columns in Power BI:

Create calculated columns in the data source or Power Ǫuery instead of DAX, as these are pre-calculated during data

2. Improve Power Ǫuery Performance

Optimize Transformations:
- Perform data cleaning and transformations (e.g., filtering, joins) as early as possible in the query
- Avoid using dynamic parameters in Power Ǫuery where

· Disable Ǫuery Load for Unused Tables:

If you don’t need a table for visuals, set it to “Do Not Load” in Power Ǫuery.

3. Optimize DAX Calculations

Use Measures Instead of Calculated Columns:
- Measures are calculated on-demand and don’t increase model

· Simplify DAX Formulas:

Avoid complex nested formulas; break them into smaller
Use DAX functions like SUMX() judiciously as they can be computationally

· Pre-Aggregate Data:

Aggregate data in the source or use summarized tables for high-granularity

4. Reduce Visual Overhead

Limit the Number of Visuals:
- Too many visuals on a single page can slow down Use bookmarks or multiple pages to distribute visuals.

· Avoid Heavy Visuals:

Avoid visuals that load large datasets, like detailed tables or matrix visuals with many columns/rows.
Replace custom visuals with built-in Power BI visuals, which are more

· Disable Auto Date/Time:

In Options > Data Load, turn off “Auto Date/Time for New Files” to prevent Power

BI from creating unnecessary date tables.

5. Use Aggregations and Summary Tables

Implement Aggregations:
- Create summary tables for frequently used aggregated data (e.g., monthly or yearly totals).
- Use aggregation-aware tables in Power BI to reduce processing

· Group Data:

Consolidate granular data (e.g., daily sales → monthly sales) if granularity is not

needed.

6. Optimize Data Refresh

Incremental Data Refresh:
- For large datasets, use incremental refresh to process only new or updated

· Partition Data:

Partition tables by date or category to improve refresh

7. Optimize Relationships

Review Cardinality:
- Use relationships with low cardinality where possible (e.g., many-to-one).

· Set Appropriate Cross-filter Direction:

Avoid bidirectional relationships unless absolutely

8. Use DirectǪuery or Hybrid Mode Judiciously

Switch to Import Mode:
- For better performance, use Import Mode instead of DirectǪuery where

· Optimize DirectǪuery:

For DirectǪuery, reduce the number of queries and ensure the source database is well-indexed.

· Hybrid Mode:

Use a combination of Import and DirectǪuery to balance performance and up- to-date data

9. Monitor Performance

Performance Analyzer:
- Use the Performance Analyzer tool in Power BI Desktop (View > Performance Analyzer) to identify slow visuals and

· Database Optimization:

Ensure the underlying database is indexed and optimized for query

10. Use Compression and Data Types

Optimize Data Types:
- Use appropriate data types (e.g., INTEGER instead of STRING for IDs).

· Remove Unused Data:

Remove columns with unique values (e.g., GUIDs) that cannot be compressed

11. Use Variables in DAX

Declare variables to store intermediate calculations in DAX formulas, reducing repeated computations:

Measure =

VAR TotalSales = SUM(Sales[Amount]) VAR TotalCost = SUM(Sales[Cost]) RETURN TotalSales – TotalCost

12. Limit Data Export

Use aggregated data for exports or detailed views instead of large

Summary

Focus on:

Simplifying the data
Using efficient DAX
Reducing the number of visuals on each
Implementing aggregations and summary
Monitoring with the Performance Analyzer to address

By applying these optimizations, you can significantly enhance the performance of your Power BI reports.

Ǫ3: Explain Any 5 Chart Types and Their Uses in Representing Different Aspects of Data. 5 Common Chart Types in Power BI and Their Uses

Bar/Column Chart
- Description: Displays data as horizontal (bar) or vertical (column)

· Best for:

Comparing categorical

Showing quantities across distinct categories (e.g., sales by region, profit by product).

· Example:

A bar chart showing monthly sales revenue by product

2. Line Chart

Description: Represents data points connected by a continuous line, emphasizing trends over

· Best for:

Showing trends, patterns, or changes over a time
Analyzing metrics like revenue growth, stock prices, or website

· Example:

A line chart displaying monthly website visits over a

3. Pie/Donut Chart

Description: Visualizes proportions of a whole as slices of a circle (Pie) or a hollow circle (Donut).

· Best for:

Representing percentage distribution or composition of a
Comparing parts to a whole (e.g., market share by company).

· Example:

A pie chart showing the percentage contribution of sales by different

4. Scatter Plot

Description: Plots individual data points on an X-Y axis to show relationships or correlations between two

· Best for:

Identifying correlations, patterns, or
Analyzing relationships like sales marketing spend or age vs. income.

· Example:

A scatter plot showing advertising spend (X-axis) sales revenue (Y-axis).

5. Stacked Area Chart

Description: Similar to a line chart but with shaded areas beneath lines, often stacked to show cumulative

· Best for:

Showing part-to-whole relationships over
Visualizing cumulative trends or comparing the contribution of

· Example:

A stacked area chart displaying revenue contribution from different product categories over several

Choosing the Right Chart

Bar/Column Charts: Categorical
Line Charts: Time-series
Pie/Donut Charts: Percentage
Scatter Plots: Relationships and
Stacked Area Charts: Cumulative trends and part-to-whole

Each chart type serves a unique purpose, helping users understand and analyze data more effectively.