Using Highcharts Core for Python with Pandas


Pandas is probably the single most popular library for data analysis in the Python ecosystem. Together with NumPy, it is ubiquitous. The Highcharts for Python Toolkit is designed to integrate with it natively.

General Approach

The Highcharts for Python Toolkit provides a number of standard methods that are used to interact with pandas.DataFrame instances. These methods generally take the form:

  • .from_pandas(df): always a class method, which produces one or more instances with data pulled from the df argument.

  • .from_pandas_in_rows(df): always a class method, which produces one instance for every row in the DataFrame (df).

  • .load_from_pandas(df): an instance method, which updates an instance with data read from the df argument.

Tip

All three of these standard methods come with batteries included. For simple use cases, you can pass a pandas.DataFrame to the method, and the method will attempt to determine the optimum way to deserialize the DataFrame into the appropriate Highcharts for Python objects.

However, if you find that you need more fine-grained control, the methods provide powerful tools to give you the control you need when you need it.

These standard methods, with near-identical syntax, are available on the chart and series classes covered in the sections below.


Preparing Your DataFrame

Tip

While it is theoretically possible for Highcharts for Python to work with a nested DataFrame, such structures are generally considered an anti-pattern. We recommend keeping your DataFrame contents 2-dimensional, organized into a single “flat” table of rows and columns.

So let’s try a real-world example. Let’s say you’ve got some annual population counts stored in a CSV file named 'census-time-series.csv'. Using Pandas, you can construct a DataFrame from that CSV file very simply:

import pandas

df = pandas.read_csv('census-time-series.csv')

This produces a simple 2-dimensional DataFrame. In our case, the resulting table looks like this:

Rendering of the DataFrame produced by pandas.read_csv('census-time-series.csv')

The first column contains the names of geographic regions, while each of the subsequent columns contains the population counts for a given year. However, you’ll notice that the DataFrame index is not set. Unless told otherwise, Highcharts for Python will look for x-axis values in the index.

Secondly, if you were to look under the hood, you’d see that the DataFrame imported all of the numbers in our CSV as strings (because of the commas used as thousands separators), which is obviously a bit of a problem. So let’s fix both of these issues:

df = pandas.read_csv('census-time-series.csv', index_col = 0, thousands = ',')

This produces:

Rendering of the DataFrame produced by pandas.read_csv('census-time-series.csv', index_col = 0, thousands = ',')
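To see the effect of the two fixes without the original file, here is a minimal sketch that reads a tiny in-memory CSV. The region names and numbers are made up for illustration and are not taken from 'census-time-series.csv'.

```python
import io

import pandas

# Made-up sample rows; the real 'census-time-series.csv' is not reproduced here.
csv_text = (
    'Geographic Area,2010,2011\n'
    'Region A,"1,000,000","1,050,000"\n'
    'Region B,"250,000","260,500"\n'
)

# Default parsing: the quoted values keep their commas and stay as text.
raw = pandas.read_csv(io.StringIO(csv_text))

# index_col=0 moves the first column into the index, and thousands=','
# strips the separators so the counts parse as integers.
fixed = pandas.read_csv(io.StringIO(csv_text), index_col=0, thousands=',')

print(raw.dtypes)
print(fixed.dtypes)
print(fixed.index.name)
```

Comparing the two dtypes listings is a quick way to confirm that the year columns are numeric before handing the DataFrame to a chart.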

Great! Now, let’s say we wanted to visualize this data in various ways.


Creating the Chart: Chart.from_pandas()

Relying on the Defaults

The simplest way to create a chart from a DataFrame is to call Chart.from_pandas() like so:

from highcharts_core.chart import Chart

my_chart = Chart.from_pandas(df)
my_chart.display()

Rendering of the chart produced by Chart.from_pandas(df)

As you can see, we haven’t provided any more instructions besides telling it to generate a chart from df. The result is a line chart, with one series for each year, and one point for each region. But because of the structure of our data file, this isn’t a great chart: all the series are stacked on each other! So let’s fix that.

Tip

Unless instructed otherwise, Highcharts for Python will default to using a line chart.

Setting the Series Type

Why don’t we switch it to a bar chart?

my_chart = Chart.from_pandas(df, series_type = 'bar')
my_chart.display()

Rendering of the chart produced by Chart.from_pandas(df, series_type = 'bar')

Now the result is a little more readable, but still not great: After all, there are more than fifty geographic regions represented for each year, which makes the chart super crowded. Besides, maybe we’re only interested in a specific year: 2019.

Let’s try focusing our chart.

Basic Property Mapping

my_chart = Chart.from_pandas(df,
                             series_type = 'bar',
                             property_map = {
                                 'x': 'Geographic Area',
                                 'y': '2019'
                             })

Rendering of the chart produced by Chart.from_pandas(df, series_type = 'bar', property_map = {'x': 'Geographic Area', 'y': '2019'})

Much better! We’ve now added a property_map argument to the .from_pandas() method call. This argument tells Highcharts for Python how to map columns in your DataFrame to properties in the resulting chart. In this case, the keys 'x' and 'y' tell Highcharts for Python to map the 'Geographic Area' column to the .x property of each data point in the resulting series, and the '2019' column to the .y property.

The net result is that my_chart contains one BarSeries whose .data property contains a BarDataCollection instance populated with the data from the 'Geographic Area' and '2019' columns in df. And even though 'Geographic Area' is not technically a column, but is instead used as the index, Highcharts for Python still uses it correctly.
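If you want to preview which values a property_map like the one above selects, you can pull the same pairs out of the DataFrame with plain pandas. This is only a sketch with made-up numbers; it uses reset_index() to address the index by name and is not a description of how Highcharts for Python does the mapping internally.

```python
import pandas

# Made-up values standing in for the census DataFrame.
df = pandas.DataFrame(
    {'2018': [105, 41], '2019': [110, 42]},
    index=pandas.Index(['Region A', 'Region B'], name='Geographic Area'),
)

property_map = {'x': 'Geographic Area', 'y': '2019'}

# 'Geographic Area' is the index name rather than a column, so
# reset_index() is used here to make it addressable like any column.
flat = df.reset_index()
points = list(zip(flat[property_map['x']], flat[property_map['y']].tolist()))
print(points)
```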

But maybe we actually want to compare a couple different years? Let’s try that.

Property Mapping with Multiple Series

my_chart = Chart.from_pandas(df,
                             series_type = 'column',
                             property_map = {
                                 'x': 'Geographic Area',
                                 'y': ['2017', '2018', '2019']
                             })
my_chart.display()

Rendering of the chart produced by Chart.from_pandas(df, series_type = 'column', property_map = {'x': 'Geographic Area', 'y': ['2017', '2018', '2019']})

Now we’re getting somewhere! First, we changed our series type to a ColumnSeries to make it (a little) easier to read. Then we added a list of column names to the 'y' key in the property_map argument. Each of those columns has now produced a separate ColumnSeries instance - but they’re all still sharing the 'Geographic Area' column as their .x value.

Note

You can supply multiple values to any property in the property_map. The example provided above is equivalent to:

my_chart = Chart.from_pandas(df,
                             series_type = 'column',
                             property_map = {
                                 'x': ['Geographic Area', 'Geographic Area', 'Geographic Area'],
                                 'y': ['2017', '2018', '2019']
                             })

The only catch is that the ultimate number of values for each key must match. If there’s only one value, then it will get repeated for all of the others. But if there’s a mismatch, then Highcharts for Python will throw a HighchartsPandasDeserializationError.
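The matching rule can be sketched in a few lines of plain Python. The helper expand_property_map below is hypothetical and written only to illustrate the rule; it is not part of Highcharts for Python, and it raises a ValueError as a stand-in for HighchartsPandasDeserializationError.

```python
def expand_property_map(property_map):
    """Repeat single values so every key has the same number of entries.

    Hypothetical helper that mirrors the rule described above; it is not
    part of Highcharts for Python.
    """
    # Treat a bare string as a one-item list.
    as_lists = {
        key: [value] if isinstance(value, str) else list(value)
        for key, value in property_map.items()
    }
    target = max(len(values) for values in as_lists.values())

    expanded = {}
    for key, values in as_lists.items():
        if len(values) == 1:
            # A single value is reused for every series.
            expanded[key] = values * target
        elif len(values) == target:
            expanded[key] = values
        else:
            # The library raises HighchartsPandasDeserializationError here;
            # ValueError is only a stand-in.
            raise ValueError(f'{key!r} has {len(values)} values, expected {target}')
    return expanded


expanded = expand_property_map({
    'x': 'Geographic Area',
    'y': ['2017', '2018', '2019'],
})
print(expanded['x'])
```

Passing two 'x' values alongside three 'y' values to this sketch raises the error, which corresponds to the mismatch case described above.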

But so far, we’ve only been using the 'x' and 'y' keys in our property_map. What if we wanted to configure additional properties? Easy!

Configuring Additional Properties

my_chart = Chart.from_pandas(df,
                             series_type = 'column',
                             property_map = {
                                 'x': 'Geographic Area',
                                 'y': ['2017', '2018', '2019'],
                                 'id': 'some other column'
                             })

Now, our DataFrame is pretty simple and does not contain a column named 'some other column'. But if it did, Highcharts for Python would use that column to set the .id property of each data point.

Note

You can supply any property you want to the property_map. If the property is not supported by the series type you’ve selected, then it will be ignored.

But our chart is still looking a little basic - why don’t we tweak some series configuration options?

Configuring Series Options

my_chart = Chart.from_pandas(df,
                             series_type = 'column',
                             property_map = {
                                 'x': 'Geographic Area',
                                 'y': ['2017', '2018', '2019'],
                             },
                             series_kwargs = {
                                 'point_padding': 0.5
                             })
my_chart.display()

Rendering of the chart produced by Chart.from_pandas(df, series_type = 'column', property_map = {'x': 'Geographic Area', 'y': ['2017', '2018', '2019']}, series_kwargs = {'point_padding': 0.5})

As you can see, we supplied a new series_kwargs argument to the .from_pandas() method call. This argument receives a dict with keys that correspond to properties on the series. In this case, by supplying 'point_padding' we have set the resulting ColumnSeries.point_padding property to a value of 0.5, leading to a bit more spacing between the columns.

But our chart is still a little basic - why don’t we give it a reasonable title?

Configuring Options

my_chart = Chart.from_pandas(df,
                             series_type = 'column',
                             property_map = {
                                 'x': 'Geographic Area',
                                 'y': ['2017', '2018', '2019'],
                             },
                             series_kwargs = {
                                 'point_padding': 0.5
                             },
                             options_kwargs = {
                                 'title': {
                                     'text': 'This Is My Chart Title'
                                 }
                             })
my_chart.display()

Rendering of the chart produced by Chart.from_pandas(df, series_type = 'column', property_map = {'x': 'Geographic Area', 'y': ['2017', '2018', '2019']}, series_kwargs = {'point_padding': 0.5}, options_kwargs = {'title': {'text': 'This Is My Chart Title'}})

As you can see, we’ve now given our chart a title. We did this by adding a new options_kwargs argument, which likewise takes a dict with keys that correspond to properties on the chart’s HighchartsOptions configuration.

Now let’s say we wanted our chart to render in an HTML <div> with an id of 'my_target_div' - we can configure that in the same method call.

Configuring Chart Settings

my_chart = Chart.from_pandas(df,
                             series_type = 'bar',
                             property_map = {
                                 'x': 'Geographic Area',
                                 'y': ['2017', '2018', '2019'],
                                 'id': 'Geographic Area'
                             },
                             series_kwargs = {
                                 'point_padding': 0.25
                             },
                             options_kwargs = {
                                 'title': {
                                     'text': 'This Is My Chart Title'
                                 }
                             },
                             chart_kwargs = {
                                 'container': 'my_target_div'
                             })

While you can’t really see the difference here, by adding the chart_kwargs argument to the method call, we now set the .container property on my_chart.

But maybe we want to do something a little different - like compare the change in population over time. Well, we can do that easily by visualizing each row of df rather than each column.

Visualizing Data in Rows

my_chart = Chart.from_pandas(df,
                             series_type = 'line',
                             series_in_rows = True)
my_chart.display()

Rendering of the chart produced by Chart.from_pandas(df, series_type = 'line', series_in_rows = True)

Okay, so to simplify the example, we removed some of the other arguments we’d been using. You’ll see we’ve now added the series_in_rows argument, and set it to True. This tells Highcharts for Python that we expect to produce one series for every row in df. Because we have not specified a property_map, the series .name values are populated from the 'Geographic Area' column, while the data point .x values come from each additional column (e.g. '2010', '2011', '2012', etc.).
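The row-wise reading described above can be pictured with plain pandas. The sketch below uses made-up numbers and builds simple dicts rather than Highcharts for Python series objects. It assumes that each cell value becomes the matching y value, which the text above implies but does not state.

```python
import pandas

# Made-up values standing in for the census DataFrame.
df = pandas.DataFrame(
    {'2010': [100, 40], '2011': [110, 42], '2012': [125, 41]},
    index=pandas.Index(['Region A', 'Region B'], name='Geographic Area'),
)

# One plain dict per row: the index label becomes the name, and each
# (column label, cell value) pair becomes one (x, y) point.
rows_as_series = [
    {'name': name, 'data': list(zip(row.index, row.tolist()))}
    for name, row in df.iterrows()
]

print(rows_as_series[0])
```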

Tip

To simplify the code further, any class that supports the .from_pandas() method also supports the .from_pandas_in_rows() method. The latter method is equivalent to passing series_in_rows = True to .from_pandas().

But maybe we don’t want all geographic areas shown on the chart - maybe we only want to compare a few.

Filtering Rows

my_chart = Chart.from_pandas(df,
                             series_type = 'line',
                             series_in_rows = True,
                             series_index = slice(7, 10))

Rendering of the chart produced by Chart.from_pandas(df, series_type = 'line', series_in_rows = True, series_index = slice(7, 10))

Here we added a series_index argument, which tells Highcharts for Python to only include the series found at that index in the resulting chart. In this case, we supplied a slice object, which operates just like list_of_series[7:10]. The result contains only the series at positions 7, 8, and 9, because the stop value of a slice is exclusive.
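The slice behaviour is standard Python and can be checked on an ordinary list. In this short sketch, list_of_series holds placeholder strings rather than real series objects.

```python
# Stand-in labels for the series generated from each row.
list_of_series = [f'series {i}' for i in range(12)]

series_index = slice(7, 10)
selected = list_of_series[series_index]

# The stop value is exclusive, so this keeps positions 7, 8 and 9.
print(selected)
```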


Creating Series: .from_pandas() and .from_pandas_in_rows()

All Highcharts for Python series descend from the SeriesBase class. And they all therefore support the .from_pandas() class method.

When called on a series class, it produces one or more series from the DataFrame supplied. The method supports all of the same options as Chart.from_pandas() except for options_kwargs and chart_kwargs. This is because the .from_pandas() method on a series class is only responsible for creating series instances - not the charts.

Creating Series from Columns

So let’s say we wanted to create one series for each of the years in df. We could do that like so:

my_series = BarSeries.from_pandas(df)

Unlike when calling Chart.from_pandas(), we did not have to specify a series_type - that’s because the .from_pandas() class method on a series class already knows the series type!

In this case, my_series now contains ten separate BarSeries instances, each corresponding to one of the year columns in df.

But maybe we wanted to create our series from rows instead?

Creating Series from Rows

my_series = LineSeries.from_pandas_in_rows(df)

This will produce one LineSeries instance for each row in df, ultimately producing a list of 57 LineSeries instances.

Now what if we don’t need all 57, but instead only want the first five?

Filtering Series Created from Rows

my_series = LineSeries.from_pandas_in_rows(df, series_index = slice(0, 5))

This will return the first five series in the list of 57.

Updating an Existing Series: .load_from_pandas()

So far, we’ve only been creating new series and charts. But what if we want to update the data within an existing series? That’s easy to do using the .load_from_pandas() method.

Let’s say we take the first series returned in my_series up above, and we want to replace its data with the data from the 10th series. We can do that by:

my_series[0].load_from_pandas(df, series_in_rows = True, series_index = 9)

The series_in_rows argument tells the method to generate series per row, and then the series_index argument tells it to only use the 10th series generated.

Caution

While the .load_from_pandas() method supports the same arguments as .from_pandas(), it expects that the arguments supplied lead to an unambiguous single series. If they are ambiguous - meaning they lead to multiple series generated from the DataFrame - then the method will throw a HighchartsPandasDeserializationError.