Making Data Dashboard with Apache Superset
Hello and welcome back everyone, in this blog, we will explore how we can create awesome data dashboards using Apache superset with little to no code at all. But there are few things one should do before making first dashboard, we need to have installed Superset and have some data too.
Installing Apache Superset
This blog will be using Apache Superset in WSL (Windows Subsystem for Linux) because the library
apache-superset has OS level dependency.
- Install following packages as
sudo apt-get install build-essential libssl-dev libffi-dev python3-dev python3-pip libsasl2-dev libldap2-dev default-libmysqlclient-dev.
- If some packages are missing,
sudo apt-get updatemight help out.
- Install virtual environment as
pip install virtualenv.
- Create an environment as
python3 -m venv env_name. My env_name will be
- Activate environment as
- Install Apache Superset as
pip install apache-superset.
Setting up Superset
- Create an admin user in your metadata database (use
adminas username to be able to load the examples)
export FLASK_APP=superset superset fab create-admin
- Initialize the db as
superset db upgrade.
- Load some data to play with
- Create default roles and permissions
superset init. If some errors like table not found is shown then thats because of database issue initialization. Check the database or re-install superset. Also make sure to export FLASK_APP first.
- To start a development web server run
superset run. It will open on default port 5000.
Opening First Dashboard
If everything worked fine, then by default, superset should be accessible in http://127.0.0.1:5000/. It should look like below:
Upon entering the password and username that we have set earlier, we could see the empty dashboard as below:
Choosing a Database
We can choose our database by going into Data/Databases.
I am choosing a MySQL database. But mine MySQL connection will be little bit different than others because I will be using Superset running in WSL while my MySQL server will be running in Windows hence I should pass Network IP. By default, MySQL runs in 3306. For using MySQL running in Windows from WSL, please follow this blog of mine.
- First install mysql database server along with MySQL Workbench.
- Then run a query in it
create database COVID_DASHBOARD;to create a new database where we will put our data.
- Create a connecction as:
- Make sure to allow data upload in this database. The settings can be found in Advanced>Security Section.
- To make our data dashboard, we should have data. Apache Superset allows us to use data in following format.
For this project, I am choosing COVID-19 data (CSV format) available in GitHub Repo.
After downloading a CSV file, we will upload it into our database.
Uploading might take little bit more time because there are lots of columns in the data and the size of data itself is huge (196451 rows 67 columns). But we can look if the data upload is on right track or not by querying a table
SELECT * FROM covid_dashboard.covid_raw_data;. A result must be shown.
Once done uploading, something like below should be shown.
Making a Chart
The hard part is completed. Now with little bit of SQL knowledge, we can create charts.
- Go to charts.
- Then add new chart.
- Choose a dataset.
- Choose one chart type. I've selected timeseries.
Next, rename the chart from untitled to cases trend.
Initially the date column might not be in date time type so we need to change its data type from Workbench. And then we need to sync this changes in column in Superset. Which can be done via state. Click on dots on the right side of the dataset name present in left section. Then edit dataset. Then columns and finally sync columns from source and save this.
In the second section, we can tweak the settings for this chart. In its data section, we should select a Time column as Date. Then Time Grain. Then in the Query Section, we need to select a metric, in our case, it will be sum of new_cases. Then in Group By section, we select location. Then run the query to see trend chart like below.
From here, we can do lot of things, we can export the result in CSV format too.
It seems that our result needs little bit of filtering to show trends of countries only. So lets filter those which have NULL in the Continent column.
While hovering over, I want to view highest value name in top so we should add sort in it.
- Next create a bar chart to show top countries with death tolls.
- Please take a careful look into the second column.
- Select Metric as MAX of column total_deaths. Because we want to see the latest value of it and this field is cumulative.
- In Filters, select continent is not equals to null because in location, continent names and some other names are also present and we do not want that.
- In Series, select the column name by which we want to Group Data by. Lets select location.
- In Row Limit select 10, as we want to show only top 10 bars.
- In Sort By, select max of column total_deaths.
- And then run the query to see the chart like below.
Next is, create a map chart to show total deaths across the world.
Now that we have 3 charts, lets create a dashboard by going into Dashboards>New Dashboard.
Next insert charts by drag-and-drop.
We can even download dasboard as image too.
Thats all for this part of exploring Apache Superset and I find this tool very useful because we can create our own charts in more customized way if we are familiar with SQL. There are lots of features still to be explored in Apache Superset and I will try to make next example if time persists.