Data Visualization in Python
Data visualization is an essential part of exploring and presenting data. Seeing analytics presented visually makes difficult concepts easier to understand and helps reveal new patterns.
Programming languages like MATLAB and R have amazing tools for visualizing data. Visualization tools like Tableau make it easy to produce interactive data visualization products for businesses. So, what does Python have to offer?
In this article, we will briefly explore four Python data visualization tools: matplotlib, seaborn, bokeh, and folium. I will not go into deep detail, but I will give you a good enough look at how these tools work.
Before we start coding, we need to go over the requirements. For all examples, I’ll be working in the Jupyter environment (formerly known as IPython Notebook). You need to install Anaconda, which conveniently installs Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science. Follow Anaconda’s instructions for downloading and installing the Python 2.7 version.
Next, you’ll need to install all the libraries that we will be using. Using pip or conda is the fast and easy way to get the packages. Check each library's documentation on its website for more info.
Here is the list of needed libraries: matplotlib, seaborn, bokeh, folium, numpy, and geopy.
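Once everything is installed, a quick check like the one below can confirm that the libraries are importable. This is only a sketch: the package list is my reading of the tools named in this article, and the helper name missing_packages is made up for illustration.

```python
import importlib

# Assumed list, based on the libraries mentioned in this article
REQUIRED = ["numpy", "matplotlib", "seaborn", "bokeh", "folium", "geopy"]


def missing_packages(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Prints an empty list when every library is available
print("Missing packages: {}".format(missing_packages(REQUIRED)))
```

Anything that shows up in the printed list still needs to be installed with pip or conda.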
Created by the late John Hunter, matplotlib is a 2D plotting library that can generate plots, histograms, power spectra, bar charts, error charts, scatterplots, and more, with just a few lines of Python code.
In the example below, I created random data and plotted it in a box-plot format using the matplotlib library.
In the code example above, we imported the needed libraries, used the %matplotlib inline magic command to display the graph within Jupyter, created dummy data with the numpy library, and displayed that data visually in a box-plot format.
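A minimal sketch of such an example might look like the following. The data shape, the seed, and the labels are my own assumptions, so treat it as an illustration rather than the exact code behind the figure.

```python
import numpy as np
import matplotlib.pyplot as plt

# In a Jupyter notebook, run the magic command `%matplotlib inline` first
# so the figure is displayed inside the notebook.

# Dummy data (assumed): four samples of 100 normally distributed values,
# each with a larger spread than the previous one
rng = np.random.RandomState(42)
data = [rng.normal(loc=0, scale=std, size=100) for std in range(1, 5)]

# Draw one box per sample
fig, ax = plt.subplots()
boxes = ax.boxplot(data)
ax.set_title("Box plot of random data")
ax.set_xlabel("Sample")
ax.set_ylabel("Value")
plt.show()
```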
Now let’s take a look at the seaborn library with the same data.
Seaborn is a Python data visualization library based on matplotlib, with an emphasis on statistical plots. The library provides a high-level interface with excellent support for common regression and distribution plots, but where it really excels is in its ability to visualize many different features at once.
In this example, we will use the same data, but use seaborn to visualize it.
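Here is a rough sketch of what that could look like. The dummy data is regenerated so the snippet runs on its own, and the "whitegrid" style is simply my choice for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Same kind of dummy data as before (assumed): four samples of 100 values
rng = np.random.RandomState(42)
data = [rng.normal(loc=0, scale=std, size=100) for std in range(1, 5)]

# Pick one of seaborn's built-in styles, then draw one box per sample
sns.set_style("whitegrid")
ax = sns.boxplot(data=data)
ax.set_title("Seaborn box plot of random data")
ax.set_xlabel("Sample")
ax.set_ylabel("Value")
plt.show()
```

Since seaborn draws on a regular matplotlib Axes, the title and axis labels are set exactly as they would be with plain matplotlib.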
And this is how it displays its box-plot.
Visually, seaborn produces awesome statistical graphics out of the box. With just a few tweaks, you have an enticing graphic.
Bokeh is an interactive Python visualization library that targets web browsers. It aims to provide elegant, concise construction of unique graphics in the style of D3.js, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.
This library is my favorite because it can provide output in various mediums such as HTML, the Jupyter notebook, and servers. Besides providing a simple way to plot complex statistical graphics, it can also embed visualizations in Flask and Django apps, and transform visualizations written in other libraries like matplotlib, seaborn, and ggplot.
Here is a simple example of Bokeh.
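The snippet below is a minimal sketch of a line plot rendered to standalone HTML. The data points and labels are invented for illustration, and I build the page with file_html so that nothing opens automatically when the code runs.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Made-up data for illustration
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# Build the figure: a line with a marker at each data point
p = figure(title="Simple Bokeh line example",
           x_axis_label="x", y_axis_label="y")
p.line(x, y, line_width=2)
p.scatter(x, y, size=8)

# Render the plot as a standalone HTML page (BokehJS is loaded from the CDN)
html = file_html(p, CDN, "Simple Bokeh line example")
```

To view the result, write html to a .html file and open it in a browser, or call show(p) from bokeh.plotting. Inside Jupyter, calling output_notebook() first makes show(p) render the plot in the notebook.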
Folium builds on the data-wrangling strengths of Python and the mapping strengths of the Leaflet.js library: you manipulate your data in Python, then visualize it on an interactive Leaflet.js map via Folium. This is my go-to library if I want to quickly visualize geographic data.
In this code example, we will be making a generic Google Maps-style location search with the help of the geopy library.
When you run the code, it will prompt you to enter a location. Once you enter it, geopy will find the location, and folium will display it on the map.
Geopy is very powerful: you can input a country, city, famous landmark, or address, and it will find the location and return its latitude/longitude coordinates.
I entered the location of our office in Irvine, CA. And here is the map result:
On the map, you can zoom in and out. The blue marker shows the location on the map and gives you the exact address of the location when you click on it.
Data visualization tools for Python are catching up with other sought-after tools. I only scratched the surface of the capabilities of these tools, and I encourage you to dive into them more. There are other data visualization tools I didn’t mention that you may want to explore. I hope this overview helped you see Python’s ability to visualize data.
Jaime Gabriel Jingco
Software Engineer / Applied Labs Assistant Instructor