Power BI: IPL Analytics- Data Modeling and Transformation-Part 2

In my previous blog we have seen few techniques of data modeling. Now let us  see few more techniques to do the data modeling and data transformation

Add New column:

We have Match table which has all match related details such as winner team, who is the man of the match, is the match won by wickets or runs etc. If we would like to show the data “Win by wickets” and “Win By Runs” based on the winning type, how do we do that?  Just open the table. See there are two columns (Win Type and  Win Margin) which can provide such data, we only need to show this data from columns to the rows.

Let us add new column called Win_By_Wickets and add the formula to show the win_margin in case the match is won by wickets.

Let us follow the steps as shown in below snippets

Win by wickets-New COlumn

Add the column name before the “=” symbol and let us add if condition to check the win_type and display the Win_Margin, Writing this DAX formula is almost similar to how we add formula in excel. Win by wickets-New Column- If Condition

The complete formula would be

“Win_By_Wickets = IF(Match[Win_Type]=”wickets”,Match[Win_Margin],”0″)”

Similarly for Win by Runs, it would be

Win_By_Runs = IF(Match[Win_Type]=”runs”,Match[Win_Margin],”0″)

Now we have these two new columns added in the table for further visualization purpose.

Win by wickets1

Aggregate the data with group by

Ball By Ball table contains the huge data per ball in all the matches. Now if we need to find the statistics of Economy Rate – how many runs the Bowler has given per over. We can use SUMMARIZE function which provides grouping of the data based on particular columns. Output of the summarize function is a table. Its very similar to the select query having group by clause and aggregate function in the SQL server.

NewTable.5

Please refer below expression which selects data from Ball_By_Ball table, groups the data by Bowler and adds additional columns to the same table with various aggregate functions such as SUM, COUNTROWS etc.

EconomyTable = SUMMARIZE(Ball_By_Ball,Ball_By_Ball[Bowler],”Runsperbowler”,SUM(Ball_By_Ball[Runs_Scored]),”Oversbybowler”,COUNTROWS(Ball_By_Ball)/6,”Economy”,(SUM(Ball_By_Ball[Runs_Scored])/(COUNTROWS(Ball_By_Ball)/6)))

NewTable1Here we are calculating the economy rate, hence need the number of overs played by the Bowler and the runs scored,

My Economy column contains the formula as RunsperBowler divided by oversbyBowler. Exact formula is as given below (SUM(Ball_By_Ball[Runs_Scored])/(COUNTROWS(Ball_By_Ball)/6)))

Now we can use this table in report for further visualizations. I would recommend everyone to go to the Microsoft Docs (Link is provided below) to refer the details of the different aggregate functions available in Power BI.

Aggregates in Power BI visualizations

We have seen few more techniques of data modeling in Power BI using Power BI desktop in this blog. Let us see how to create different visualizations of this data and create IPL Analytics reports in my next blog. I just loved creating these reports.

Thank you for reading my blog, if you have any questions please mention in the comments and hit the like button if you like my write up.

Keep Learning !! Happy analyzing and be Powerful 🙂

Continue reading “Power BI: IPL Analytics- Data Modeling and Transformation-Part 2”

Power BI: IPL Analytics- Data Modeling and Transformation-Part 1

In my previous blog we have seen how to connect to excel files to get the data. Now let us go ahead, do some data modeling and data transformation for this data before we start developing reports. I was looking for good example to explain Power BI capabilities and recently saw one excellent Power BI case study on IPL which is Indian Premier League analytics by ashishrj which can be referred below:

https://community.powerbi.com/t5/Data-Stories-Gallery/IPL-Data-Analysis-2008-2016/td-p/141219 

Thanks to Ashish for this creative work and inspiring me to write this post.

You can get free data sources for analysis, POC purposes on internet as mentioned below.

https://catalog.data.gov/dataset?vocab_category_all=Asian

https://www.gapminder.org/data/

https://www.springboard.com/blog/free-public-data-sets-data-science-project/

https://www.yelp.com/dataset/download

 

I am using the data source for IPL from 2008 to 2017 which is freely downloadable from the internet. Let us start how we can develop efficient and beautiful reports in Power BI with IPL Data.

Those who do not know what is IPL, it is a professional Twenty20 cricket league in India contested during April and May of every year by teams representing Indian cities and some states. Hence this data consists of ball by ball details of the matches played until last year along with details about the teams, players etc.

Here I got five CSV files consisting of various information of the IPL matches.

The names are really intuitive to understand the content of the data.

IPL Data Excel sheets

We will create the reports as shown in below snippets.

IPL Data Power BI Report Page1IPL Data Power BI Report Page2

Let us start with Power BI desktop. As mentioned in earlier posts, we will upload these five excels in Power BI desktop and start developing the report.

Once we get data from these excels in tables, we can start with data modeling. Referring to below screenshot, let us click on the Modeling.

datamodeling geographic location

  • Column Category:

Let us say we want to show the number of matches played per location. Such information is available in Match table, however we have do to few changes in the data category so that it is identified as geographic locations. To achieve this, you can click on either Modeling menu or click on data icon on the left side bar as shown in above screenshot.

Kindly refer the below screenshot for next step. Select the CityName column and look at the data category. It is showing  uncategorized, however we want Power BI to identify this as a location i.e. city. Hence select City from the drop down. You can see there are various standard options to categorize your data columns for ease of reporting such as Address, City, Continent , Region, Longitude and Latitude etc

datamodeling geographic location-City.PNG

Once City is selected, you would see one icon near column and this column can be used to show the geographical visuals. Similarly we can use other location fields and change their category to use those in geographical visuals.

Relationship in data tables:

We have different tables and would need to some connect between the tables for better visualizations. Power BI desktop automatically identifies the relationship in the tables based on the identical columns. if you click on the Manage relationships  or click on the icon on side bar, you can see the relationships as shown in below snippet

DataMOdeling Manage relationships1

By default, Power BI identifies below relations in the tables, based on our requirement and understanding, we can create more relations as we are used to create in SQL server management studio

DataMOdeling Manage relationships1Defult relationships

You can edit existing relationships or create new ones based the data and type as many to one, one to one etc

DataMOdeling Manage add relationships

DataMOdeling Manage add relationships1DataMOdeling Manage Edit relationships

This way you would be able to achieve relationships in tables, tables can be from different data sources. Once you practice, more features can be explored.

We have seen few techniques of data modeling in Power BI using Power BI desktop in this blog. Let us see some more techniques of data modeling and data transformation in my next blog.

Thank you for reading my blog, if you have any questions please mention in the comments and hit the like button if you like my write up.

Keep Learning !! Happy analyzing and be Powerful 🙂

 

Continue reading “Power BI: IPL Analytics- Data Modeling and Transformation-Part 1”