Real time Data Analsyst interview questions For 0-3 YEARS EXP
Infosys DATA ANALYST INTERVIEW EXPERIENCE (0-3) YOE
SǪL
Ǫ1: Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER() functions using example.
The RANK(), DENSE_RANK(), and ROW_NUMBER() functions are all used to assign ranks to rows in a result set based on a specified order. However, they differ in how they handle duplicate values (ties). Let’s explore each with examples:
1. RANK()
- Assigns a unique rank to rows within a partition of the result
- Gaps are left in the ranking if there are
Example:
SELECT Name, Score,
RANK() OVER (ORDER BY Score DESC) AS Rank
FROM Students;
Input Data:
Name | Score |
Alice | 95 |
Bob | 95 |
Charlie | 90 |
David | 85 |
Output:
Name | Score | Rank |
Alice | 95 | 1 |
Bob | 95 | 1 |
Charlie | 90 | 3 |
David | 85 | 4 |
Explanation:
- Alice and Bob have the same score, so they share rank
- The next rank is 3, leaving a gap after the tied
2. DENSE_RANK()
- Assigns a unique rank to rows within a partition of the result
- No gaps in the ranking, even if there are
Example:
SELECT Name, Score,
DENSE_RANK() OVER (ORDER BY Score DESC) AS DenseRank
FROM Students;
Output:
Name | Score | DenseRank |
Alice | 95 | 1 |
Bob | 95 | 1 |
Charlie | 90 | 2 |
David | 85 | 3 |
Explanation:
- Alice and Bob share rank
- The next rank is 2, with no
3. ROW_NUMBER()
- Assigns a unique sequential number to each row within a
- Does not handle ties—each row gets a distinct
Example:
SELECT Name, Score,
ROW_NUMBER() OVER (ORDER BY Score DESC) AS RowNumber
FROM Students;
Output:
Name | Score | RowNumber |
Alice | 95 | 1 |
Bob | 95 | 2 |
Charlie | 90 | 3 |
David | 85 | 4 |
Explanation:
- Each row gets a unique number regardless of
Summary of Differences:
Function | Handles Ties | Gaps in Ranking |
RANK() | Same rank for ties | Yes |
DENSE_RANK() | Same rank for ties | No |
ROW_NUMBER() | No ties allowed | N/A (unique numbering) |
These functions are particularly useful in scenarios like leaderboard generation, identifying top performers, or ranking products based on sales.
Ǫ2 – Ǫ4 use Table: employee (EmpID, ManagerID, JoinDate, Dept, Salary) Ǫ2: Find the nth highest salary from the Employee table.
To find the nth highest salary from the employee table, you can use the DENSE_RANK() or
RANK() window function. Here’s the query using DENSE_RANK():
Ǫuery to Find the nth Highest Salary:
WITH RankedSalaries AS ( SELECT Salary,
DENSE_RANK() OVER (ORDER BY Salary DESC) AS Rank
FROM employee
)
SELECT Salary
FROM RankedSalaries WHERE Rank = <n>;
Explanation:
- DENSE_RANK() OVER (ORDER BY Salary DESC):
- Ranks salaries in descending
- Ensures no gaps in rank even if there are
2. WHERE Rank = <n>:
- Filters for the specific rank you are interested in (replace <n> with the desired value, g., 2 for the second highest salary).
Example:
Input Data:
EmpID | ManagerID | JoinDate | Dept | Salary |
1 | 101 | 2022-01-15 | Sales | 70000 |
2 | 102 | 2021-05-10 | HR | 80000 |
3 | 103 | 2023-03-20 | IT | 70000 |
4 | 101 | 2020-09-25 | IT | 90000 |
Find the 2nd Highest Salary:
WITH RankedSalaries AS ( SELECT Salary,
DENSE_RANK() OVER (ORDER BY Salary DESC) AS Rank
FROM employee
)
SELECT Salary
FROM RankedSalaries WHERE Rank = 2;
Output:
Salary |
80000 |
Alternative Approach (Without Window Functions): You can also use a subquery to achieve the same: SELECT DISTINCT Salary
FROM employee e1 WHERE (
SELECT COUNT(DISTINCT Salary)
FROM employee e2
WHERE e2.Salary > e1.Salary
) = <n – 1>;
Explanation:
- The inner query counts how many distinct salaries are greater than the current
- If the count is n-1, the current salary is the nth highest.
Ǫ3: You have an employee table with employee ID and manager ID. Find all employees under a specific manager, including their subordinates at any level.
To find all employees under a specific manager, including subordinates at any level (hierarchical query), you can use a Recursive Common Table Expression (CTE) in SǪL.
Ǫuery:
WITH RecursiveHierarchy AS (
— Anchor member: Get the specified manager’s direct reports SELECT EmpID, ManagerID
FROM employee
WHERE ManagerID = <specific_manager_id>
UNION ALL
— Recursive member: Find subordinates of the employees in the current level SELECT e.EmpID, e.ManagerID
FROM employee e
INNER JOIN RecursiveHierarchy rh ON e.ManagerID = rh.EmpID
)
SELECT EmpID
FROM RecursiveHierarchy;
Explanation:
- Anchor Member:
- Retrieves all direct reports of the specified manager (replace
<specific_manager_id> with the manager’s ID).
2. Recursive Member:
- Joins the employee table with the recursive CTE (RecursiveHierarchy) to find employees whose ManagerID matches any EmpID from the current
3. Recursive Ǫuery Execution:
- Repeats the process until no more subordinates are
4. Final SELECT:
- Extracts all EmpIDs from the
Example:
Input Data:
EmpID | ManagerID |
1 | 101 |
2 | 102 |
3 | 101 |
4 | 3 |
5 | 4 |
101 | NULL |
102 | 101 |
Find all employees under ManagerID = 101:
WITH RecursiveHierarchy AS ( SELECT EmpID, ManagerID FROM employee
WHERE ManagerID = 101
UNION ALL
SELECT e.EmpID, e.ManagerID FROM employee e
INNER JOIN RecursiveHierarchy rh ON e.ManagerID = rh.EmpID
)
SELECT EmpID
FROM RecursiveHierarchy;
Output:
EmpID |
1 |
2 |
3 |
4 |
5 |
Explanation:
- EmpID = 1 and EmpID = 3 directly report to ManagerID =
- EmpID = 4 reports to EmpID =
- EmpID = 5 reports to EmpID =
For Databases Without Recursive CTE Support:
If recursive CTEs are not available (e.g., older MySǪL versions), you may need to write stored procedures or implement the hierarchy in application logic.
Ǫ4: Write a query to find the cumulative salary of employees department-wise, who have joined company in last 30 days.
To calculate the cumulative salary of employees department-wise who joined in the last 30 days, you can use the SUM() window function along with a filter condition to restrict the data to the last 30 days.
Ǫuery:
SELECT Dept,
EmpID, JoinDate, Salary,
SUM(Salary) OVER (PARTITION BY Dept ORDER BY JoinDate) AS CumulativeSalary FROM employee
WHERE JoinDate >= CURDATE() – INTERVAL 30 DAY
ORDER BY Dept, JoinDate;
Explanation:
- WHERE JoinDate >= CURDATE() – INTERVAL 30 DAY:
- Filters employees who joined within the last 30
2. SUM(Salary) OVER (PARTITION BY Dept ORDER BY JoinDate):
- Calculates the cumulative sum of salaries within each department (PARTITION BY Dept), ordered by the joining date (ORDER BY JoinDate).
3. ORDER BY Dept, JoinDate:
- Ensures the output is sorted department-wise and by joining
Example: Input Data:
EmpID | Dept | JoinDate | Salary |
1 | Sales | 2024-11-05 | 70000 |
2 | HR | 2024-11-15 | 80000 |
3 | IT | 2024-11-20 | 60000 |
4 | Sales | 2024-11-22 | 50000 |
5 | IT | 2024-11-25 | 55000 |
Ǫuery Result:
Dept | EmpID | JoinDate | Salary | CumulativeSalary |
HR | 2 | 2024-11-15 | 80000 | 80000 |
IT | 3 | 2024-11-20 | 60000 | 60000 |
Dept | EmpID | JoinDate | Salary | CumulativeSalary |
IT | 5 | 2024-11-25 | 55000 | 115000 |
Sales | 1 | 2024-11-05 | 70000 | 70000 |
Sales | 4 | 2024-11-22 | 50000 | 120000 |
Notes:
- For SǪL Server: Replace CURDATE() – INTERVAL 30 DAY with GETDATE() –
- For Oracle: Use SYSDATE – 30 in the WHERE
- Ensure the JoinDate column is in DATE or DATETIME
Ǫ5: Find the top 2 customers with the highest order amount for each product category, handling ties appropriately. Table: customer (CustomerID, ProductCategory, OrderAmount)
To find the top 2 customers with the highest order amount for each product category, including ties (handling ties appropriately), you can use the RANK() or DENSE_RANK() window function.
Ǫuery:
WITH RankedOrders AS ( SELECT CustomerID,
ProductCategory, OrderAmount,
RANK() OVER (PARTITION BY ProductCategory ORDER BY OrderAmount DESC) AS Rank FROM customer
)
SELECT CustomerID, ProductCategory, OrderAmount
FROM RankedOrders WHERE Rank <= 2
ORDER BY ProductCategory, Rank, CustomerID;
Explanation:
- RANK() OVER (PARTITION BY ProductCategory ORDER BY OrderAmount DESC):
- Assigns a rank to each customer within the same product category, based on their order amount in descending
- Handles ties by assigning the same rank to customers with equal order
2. WHERE Rank <= 2:
- Filters the result to include only the top 2 customers for each product
3. ORDER BY ProductCategory, Rank, CustomerID:
- Ensures the output is sorted by product category, rank, and customer ID for better
Example: Input Data:
CustomerID | ProductCategory | OrderAmount |
1 | Electronics | 500 |
2 | Electronics | 300 |
3 | Electronics | 300 |
4 | Clothing | 200 |
5 | Clothing | 150 |
6 | Clothing | 150 |
7 | Clothing | 100 |
Ǫuery Result:
CustomerID | ProductCategory | OrderAmount |
1 | Electronics | 500 |
2 | Electronics | 300 |
3 | Electronics | 300 |
4 | Clothing | 200 |
5 | Clothing | 150 |
CustomerID | ProductCategory | OrderAmount |
6 | Clothing | 150 |
Key Points:
- Ties Handling:
- Customers 2 and 3 in “Electronics” both have the same order amount (300), so they are both
- Similarly, Customers 5 and 6 in “Clothing” both have an order amount of 150, and both are ranked
· If Only 2 Distinct Ranks Are Needed:
- Replace RANK() with DENSE_RANK(), which ensures no gaps in
For instance, using DENSE_RANK() for the same input, the 2nd rank in “Electronics” would include only 300, even if there were additional ties.
POWER BI
Ǫ1: Explain step-by-step how will you create a sales dashboard from scratch.
Step-by-Step Process to Create a Sales Dashboard in Power BI: Step 1: Gather Requirements
- Understand the purpose of the dashboard:
- Who are the users (e.g., managers, sales reps)?
- What KPIs or metrics are needed (e.g., total sales, sales by region, profit margin)?
- Identify filters or slicers (e.g., date range, region, product category).
- Obtain a clear understanding of the data sources and expected
Step 2: Prepare Data Sources
- Identify data sources (e.g., Excel, SǪL Server, cloud services).
- Ensure data contains required fields like Sales Amount, Region, Product, Date,
- Clean the data beforehand to ensure accuracy (e.g., no duplicates or missing values).
Step 3: Load Data into Power BI
- Open Power BI Desktop.
2. Import Data:
- Click on Home > Get Data and choose the data source (e.g., Excel, SǪL, ).
- Connect and load the dataset(s).
3. Preview and Transform:
- Use Power Ǫuery for data cleaning and transformations:
- Remove nulls or
- Format columns (e.g., dates, numbers).
- Rename fields for clarity (e.g., Sales_Amount → Sales Amount).
Step 4: Create Relationships
- Go to the Model View.
- Establish relationships between tables using primary and foreign
- Example: Connect Sales table with Products and Regions
- Ensure relationships have correct cardinality (e.g., one-to-many).
Step 5: Design Data Model
- Optimize the model:
- Use measures for calculations like Total Sales, Profit Margin, or Year-to-Date Sales:
Total Sales = SUM(Sales[Sales Amount])
Profit Margin = DIVIDE(SUM(Sales[Profit]), SUM(Sales[Sales Amount]), 0)
- Add calculated columns if
Step 6: Build the Sales Dashboard
- Add Visualizations:
- Drag and drop fields onto the report
- Use visualizations like:
- Card Visuals: For KPIs (e.g., Total Sales, Total Profit).
- Bar/Column Charts: For sales by region or product
- Line Charts: For trends over
- Pie/Donut Charts: For sales contribution by
- Maps: For geographical analysis of
2. Customize the Layout:
- Arrange visuals logically (e.g., KPIs at the top, detailed charts below).
- Use slicers for interactivity (e.g., date range, region, product filters).
Step 7: Add Interactivity
- Enable cross-filtering between
- Add drill-through pages for detailed analysis:
- Example: Click on a region to drill into individual store
- Use tooltips to display additional insights on
Step 8: Format the Dashboard
- Apply a consistent theme (View > Themes).
- Use data labels and legends for
- Ensure visuals are responsive (resize automatically).
Step 9: Publish and Share
- Save the
- Publish to Power BI Service:
- Home > Publish > Choose Workspace.
- Share the dashboard:
- Set permissions for users (e.g., view-only or edit access).
- Embed the dashboard in emails or applications if
Step 10: Maintain the Dashboard
- Schedule data refreshes to ensure up-to-date
- Gather feedback from users and make iterative
- Monitor performance and optimize for large datasets if
Example Sales Dashboard:
Visuals:
- Total Sales (Card)
- Sales by Region (Bar Chart)
- Monthly Trend (Line Chart)
- Top Products by Sales (Table)
- Profit Contribution by Region (Pie Chart)
With these steps, you’ll create a comprehensive and interactive Power BI sales dashboard!
Ǫ2: Explain how you can optimize a slow Power BI report. How to Optimize a Slow Power BI Report
Optimizing a slow Power BI report involves improving data model efficiency, query performance, and visualization rendering. Below are actionable steps:
1. Optimize the Data Model
- Reduce Dataset Size:
- Remove unnecessary columns and rows from
- Avoid loading entire datasets; filter data during import (e.g., specific date ranges).
· Use Star Schema:
- Ensure your data model follows a star schema with fact and dimension Avoid using a flat file structure.
· Avoid Calculated Columns in Power BI:
- Create calculated columns in the data source or Power Ǫuery instead of DAX, as these are pre-calculated during data
2. Improve Power Ǫuery Performance
- Optimize Transformations:
- Perform data cleaning and transformations (e.g., filtering, joins) as early as possible in the query
- Avoid using dynamic parameters in Power Ǫuery where
· Disable Ǫuery Load for Unused Tables:
- If you don’t need a table for visuals, set it to “Do Not Load” in Power Ǫuery.
3. Optimize DAX Calculations
- Use Measures Instead of Calculated Columns:
- Measures are calculated on-demand and don’t increase model
· Simplify DAX Formulas:
- Avoid complex nested formulas; break them into smaller
- Use DAX functions like SUMX() judiciously as they can be computationally
· Pre-Aggregate Data:
- Aggregate data in the source or use summarized tables for high-granularity
4. Reduce Visual Overhead
- Limit the Number of Visuals:
- Too many visuals on a single page can slow down Use bookmarks or multiple pages to distribute visuals.
· Avoid Heavy Visuals:
- Avoid visuals that load large datasets, like detailed tables or matrix visuals with many columns/rows.
- Replace custom visuals with built-in Power BI visuals, which are more
· Disable Auto Date/Time:
- In Options > Data Load, turn off “Auto Date/Time for New Files” to prevent Power
BI from creating unnecessary date tables.
5. Use Aggregations and Summary Tables
- Implement Aggregations:
- Create summary tables for frequently used aggregated data (e.g., monthly or yearly totals).
- Use aggregation-aware tables in Power BI to reduce processing
· Group Data:
- Consolidate granular data (e.g., daily sales → monthly sales) if granularity is not
needed.
6. Optimize Data Refresh
- Incremental Data Refresh:
- For large datasets, use incremental refresh to process only new or updated
· Partition Data:
- Partition tables by date or category to improve refresh
7. Optimize Relationships
- Review Cardinality:
- Use relationships with low cardinality where possible (e.g., many-to-one).
· Set Appropriate Cross-filter Direction:
- Avoid bidirectional relationships unless absolutely
8. Use DirectǪuery or Hybrid Mode Judiciously
- Switch to Import Mode:
- For better performance, use Import Mode instead of DirectǪuery where
· Optimize DirectǪuery:
- For DirectǪuery, reduce the number of queries and ensure the source database is well-indexed.
· Hybrid Mode:
- Use a combination of Import and DirectǪuery to balance performance and up- to-date data
9. Monitor Performance
- Performance Analyzer:
- Use the Performance Analyzer tool in Power BI Desktop (View > Performance Analyzer) to identify slow visuals and
· Database Optimization:
- Ensure the underlying database is indexed and optimized for query
10. Use Compression and Data Types
- Optimize Data Types:
- Use appropriate data types (e.g., INTEGER instead of STRING for IDs).
· Remove Unused Data:
- Remove columns with unique values (e.g., GUIDs) that cannot be compressed
11. Use Variables in DAX
- Declare variables to store intermediate calculations in DAX formulas, reducing repeated computations:
Measure =
VAR TotalSales = SUM(Sales[Amount]) VAR TotalCost = SUM(Sales[Cost]) RETURN TotalSales – TotalCost
12. Limit Data Export
- Use aggregated data for exports or detailed views instead of large
Summary
Focus on:
- Simplifying the data
- Using efficient DAX
- Reducing the number of visuals on each
- Implementing aggregations and summary
- Monitoring with the Performance Analyzer to address
By applying these optimizations, you can significantly enhance the performance of your Power BI reports.
Ǫ3: Explain Any 5 Chart Types and Their Uses in Representing Different Aspects of Data. 5 Common Chart Types in Power BI and Their Uses
- Bar/Column Chart
- Description: Displays data as horizontal (bar) or vertical (column)
· Best for:
- Comparing categorical
- Showing quantities across distinct categories (e.g., sales by region, profit by product).
· Example:
- A bar chart showing monthly sales revenue by product
2. Line Chart
- Description: Represents data points connected by a continuous line, emphasizing trends over
· Best for:
- Showing trends, patterns, or changes over a time
- Analyzing metrics like revenue growth, stock prices, or website
· Example:
- A line chart displaying monthly website visits over a
3. Pie/Donut Chart
- Description: Visualizes proportions of a whole as slices of a circle (Pie) or a hollow circle (Donut).
· Best for:
- Representing percentage distribution or composition of a
- Comparing parts to a whole (e.g., market share by company).
· Example:
- A pie chart showing the percentage contribution of sales by different
4. Scatter Plot
- Description: Plots individual data points on an X-Y axis to show relationships or correlations between two
· Best for:
- Identifying correlations, patterns, or
- Analyzing relationships like sales marketing spend or age vs. income.
· Example:
- A scatter plot showing advertising spend (X-axis) sales revenue (Y-axis).
5. Stacked Area Chart
- Description: Similar to a line chart but with shaded areas beneath lines, often stacked to show cumulative
· Best for:
- Showing part-to-whole relationships over
- Visualizing cumulative trends or comparing the contribution of
· Example:
- A stacked area chart displaying revenue contribution from different product categories over several
Choosing the Right Chart
- Bar/Column Charts: Categorical
- Line Charts: Time-series
- Pie/Donut Charts: Percentage
- Scatter Plots: Relationships and
- Stacked Area Charts: Cumulative trends and part-to-whole
Each chart type serves a unique purpose, helping users understand and analyze data more effectively.
Related Posts
- FICO Company Recruitment for Software Engineering-Engineer I Freshers Don’t Miss opportunity
- Strada Company hiring for Workday Graduate Engineer Trainee – Freshers Apply Fast!
- Siemens Healthcare Private Limited hiring for Trainee Technical – Freshers Apply Fast!
- Adobe Company hiring for AEM Consultant SheSparks Intern- Students and Freshers don’t Miss
- CYIENT Company hiring for Data Analyst (Power BI) – Freshers Apply Fast!