Using Highcharts Core for Python with Pandas
The Highcharts for Python Toolkit is a data visualization library. That means that it is designed to let you visualize the data that you or your users are analyzing, rather than to do the analysis itself. But while there are better tools to actually crunch the numbers, Highcharts for Python still has to work closely with your data in order to visualize it.
When working with Highcharts for Python, it can be useful to understand:
How Highcharts for Python represents your data
How to load your data into a Highcharts for Python object
How to adjust your data in Highcharts for Python
How Highcharts for Python serializes your data for Highcharts (JS).
Highcharts for Python Data Model
In broad brushstrokes, you can think of your Highcharts for Python chart as a tree.
At the root of the tree is a Chart, and that chart contains options (HighchartsOptions).
Those options in turn contain a collection of series, each of which can be thought of as one “line” of data in your visualization.
Each series instance (descended from SeriesBase) contains a .data property, which contains a set of data points.
Depending on your data and your configuration, this set of data points may be represented as:
a DataPointCollection instance (or a descendant of it), which in turn contains your data values and related configuration options
an iterable of DataBase-descended instances, each of which contains the data value and configuration of an individual data point
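One way to picture this tree is as the nested options object that Highcharts (JS) ultimately receives. The following is a minimal sketch that uses plain Python dictionaries rather than the toolkit's own classes; the chart title, series names, and sample values are made up for illustration.

```python
import json

# Plain-dict stand-in for the tree: chart -> options -> series -> data.
# These are NOT Highcharts for Python classes, just an illustration.
chart = {
    "options": {
        "title": {"text": "Example chart"},  # hypothetical title
        "series": [
            {"type": "line", "name": "Series A", "data": [[0, 12], [1, 34], [2, 56]]},
            {"type": "line", "name": "Series B", "data": [[0, 21], [1, 43], [2, 65]]},
        ],
    }
}

# Walk from the root of the tree down to the individual data points.
for series in chart["options"]["series"]:
    for x, y in series["data"]:
        print(f"{series['name']}: x={x}, y={y}")

# The options branch is what would eventually be handed to Highcharts (JS).
options_json = json.dumps(chart["options"])
```

Each level of nesting corresponds to one level of the tree described above: the chart holds options, the options hold series, and each series holds its data points.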
This model is relatively straightforward, but there is one important complexity: the relationship between DataPointCollection instances and DataBase instances.
DataPointCollection vs DataBase
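The DataPointCollection class stores your individual data points in a combination of three different list-like structures:
.ndarray, for data that can be represented as a primitive array and is held as a NumPy ndarray
.array, for data that can be represented as a primitive array and is held as a plain Python list
.data_points, for data point properties that need to be represented as full data point objects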
Why split it up like this? The purpose is to maximize performance within both Highcharts for Python and Highcharts (JS), while still minimizing outside dependencies.
Highcharts (JS) supports data organized in primitive arrays. So it can easily visualize something like the following:
[ [0, 12], [1, 34], [2, 56], [3, 78], [4, 90] ]
This way of representing your data gives you the fastest performance in Highcharts (JS), leading to lightning-fast rendering of your chart. And since it’s just a simple list of numbers, Highcharts for Python doesn’t have to apply any fancy logic to serialize it to JS literal notation - leading to fast performance in Python as well.
This is why the DataPointCollection separates the data that can be represented as a primitive array (stored in either .ndarray or .array) from data point properties that need to be represented as a full Highcharts (JS) data point object (stored in .data_points).
And if you’re familiar with NumPy, a primitive array like this looks just like an ndarray - and for good reason! If you have NumPy <https://www.numpy.org> installed, Highcharts for Python will leave your ndarray objects as-is to benefit from NumPy’s vectorization and performance.
Internally, DataPointCollection instances will intelligently combine the information stored in these three different properties to serialize your data points. This is done as appropriate, generating a list of renderable data points represented either as a primitive array, or as full objects, depending on the properties that have been configured.
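To make the idea concrete, here is a toy sketch of that kind of merge. ToyCollection is a hypothetical stand-in written for this example, not the library's DataPointCollection, and the all-or-nothing switch to objects is a simplification chosen for the sketch rather than a statement of how the toolkit decides.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCollection:
    """Toy stand-in for a data point collection (not the library's class)."""

    # Rows that fit a primitive array, e.g. [x, y].
    array: list
    # Optional per-point extras: a dict of properties, or None for "no extras".
    data_points: list = field(default_factory=list)

    def to_renderable(self):
        """Return primitive rows when possible, full objects otherwise."""
        extras = self.data_points or [None] * len(self.array)
        if not any(extras):
            # Nothing beyond x/y is configured: keep the fast primitive form.
            return [list(row) for row in self.array]
        # At least one point carries extra configuration: emit full objects.
        return [
            {"x": row[0], "y": row[1], **(extra or {})}
            for row, extra in zip(self.array, extras)
        ]


plain = ToyCollection(array=[[0, 12], [1, 34]])
styled = ToyCollection(array=[[0, 12], [1, 34]], data_points=[None, {"color": "red"}])

print(plain.to_renderable())
print(styled.to_renderable())
```

The first collection stays a list of primitive rows, while the second falls back to one object per point because one of its points carries an extra property.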
So do you have to worry about this complexity? Not really! All of this happens under the hood in the Highcharts for Python code. You can simply load your data using the convenience methods available on your series instances, on DataPointCollection or its descendants, or simply pass your data to the series .data property.
Let’s see how this works in practice.
Loading Data into Highcharts for Python
Preparing Your Data
So let’s try a real-world example. Let’s say you’ve got some annual population counts stored in a CSV file named 'census-time-series.csv'. There are four different ways you can represent this data:
As-is in the CSV file. Meaning you don’t do anything, just leave it in the file as-is.
Loaded into a Python iterable (i.e. a list of lists, where each inner list represents a row from the CSV). This might look something like this:

    raw_data = [
        ['United States', 309321666, 311556874, 313830990, 315993715, 318301008, 320635163, 322941311, 324985539, 326687501, 328239523],
        ['Northeast', 55380134, 55604223, 55775216, 55901806, 56006011, 56034684, 56042330, 56059240, 56046620, 55982803],
        ['Midwest', 66974416, 67157800, 67336743, 67560379, 67745167, 67860583, 67987540, 68126781, 68236628, 68329004],
        ...
    ]

As a numpy.ndarray.
As a pandas.DataFrame.
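As a rough illustration of the second option, the sketch below reads CSV text into a list of lists using only the standard library. The inline text stands in for the file's contents, reproduces only the first few values of each row shown above, and assumes (for this sketch) that there is no header row.

```python
import csv
import io

# Stand-in for the contents of 'census-time-series.csv'. With a real file,
# you would use open('census-time-series.csv', newline='') instead.
csv_text = """United States,309321666,311556874,313830990
Northeast,55380134,55604223,55775216
Midwest,66974416,67157800,67336743
"""

raw_rows = []
for row in csv.reader(io.StringIO(csv_text)):
    label, *counts = row
    # csv.reader yields strings, so convert the counts to integers.
    raw_rows.append([label] + [int(value) for value in counts])

print(raw_rows[0])
```

Each inner list keeps the region label first, followed by its counts as integers, which matches the shape of the raw_data example.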
Now that we’ve got our data prepared, let’s add it to a series or chart.
Creating a Series/Chart with Data
Note
In this tutorial, we’ll focus on assembling one or more series of data, rather than a complete chart. This is because charts have many more configuration options, but fundamentally the data that they contain is stored within one or more series instances, which themselves contain data points in a DataPointCollection or an iterable of DataBase instances.
So now that we have raw_data prepared, we can load it into a series. There are four ways to do this:
By passing it to the .data property of our series when instantiating the series:

    from highcharts_gantt.options.series.area import LineSeries

    my_series = LineSeries(data = raw_data)

By calling one of the “helper” methods:

    from highcharts_gantt.options.series.area import LineSeries

    # If my data is either a numpy.ndarray or Python iterable
    my_series = LineSeries.from_array(raw_data)

    # If my data is in a Pandas DataFrame
    my_series = LineSeries.from_pandas(raw_data)

    # If my data is in a CSV file
    my_series = LineSeries.from_csv('census-time-series.csv')

See also
Depending on the arguments you supply to the helper methods, they may produce multiple series for inclusion on your chart.

By instantiating your set of data directly, and passing it to the .data property of our series:

    from highcharts_gantt.options.series.area import LineSeries
    from highcharts_gantt.options.series.data.cartesian import CartesianData

    my_data = CartesianData.from_array(raw_data)
    my_series = LineSeries(data = my_data)

By instantiating individual data points directly, and passing them to the .data property of our series:

    from highcharts_gantt.options.series.area import LineSeries
    from highcharts_gantt.options.series.data.cartesian import CartesianData

    # One data point object per record in raw_data
    my_data = [CartesianData(x = record[0], y = record[1]) for record in raw_data]
    my_series = LineSeries(data = my_data)
In all cases, the result is the same: a LineSeries instance (or a list of LineSeries instances) that contains your data.
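Because each row of raw_data holds one region followed by its annual counts, the data maps naturally to one series per row. The sketch below shows that reshaping in plain Python, without the toolkit. The starting year of 2010 is an assumption made for illustration, since the CSV's column headers are not shown in this tutorial.

```python
# First few values of two rows from the raw_data example.
raw_data = [
    ["United States", 309321666, 311556874, 313830990],
    ["Northeast", 55380134, 55604223, 55775216],
]

FIRST_YEAR = 2010  # assumption: the tutorial does not state the column years

series_points = {}
for label, *counts in raw_data:
    # Pair each count with its (assumed) year to get [x, y] points.
    series_points[label] = [
        [FIRST_YEAR + offset, count] for offset, count in enumerate(counts)
    ]

for label, points in series_points.items():
    print(label, points)
```

Each dictionary entry is a list of [x, y] pairs, which is the primitive-array shape that a single series can consume.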
Now that your data has been loaded into your series, you can configure it as needed.
Configuring Your Data
In most cases, you shouldn’t have to worry about the internals of how Highcharts for Python stores your data. Depending on whether you supplied a primitive array, a numpy.ndarray, or data from a Pandas DataFrame, your series’ data will either be represented as a DataPointCollection or as a list of data point objects (descended from DataBase).
In all cases, you can easily set properties on your data via your series object itself. For example, let’s say we wanted to configure the .target values on data points in a BulletSeries instance. We can do that easily by working at the series level:
    # EXAMPLE 1.
    # Supplying one value per data point.
    my_series.target = [1, 2, 3, 4, 5, 6]

    # EXAMPLE 2.
    # Supplying one value, which will be applied to ALL data points.
    my_series.target = 2
This propagation of data point properties extends to all data point properties. If a property of the same name exists on the series, it will be set on the series. But if it only exists on the data point, it will be propagated to the relevant data points.
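The rule can be sketched with a pair of toy classes. ToySeries and ToyPoint are hypothetical stand-ins written for this example, and intercepting __setattr__ is simply one way to model the behaviour; it is not a description of how the toolkit implements it.

```python
class ToyPoint:
    """Toy data point (not a library class)."""

    def __init__(self, y):
        self.y = y
        self.target = None


class ToySeries:
    """Toy series that forwards unknown attributes to its data points."""

    _own_fields = {"name", "data"}

    def __init__(self, name, data):
        self.name = name
        self.data = data

    def __setattr__(self, key, value):
        if key in self._own_fields:
            # The series has a property of this name: set it on the series.
            super().__setattr__(key, value)
            return
        # Otherwise propagate: one value per point, or one value for all.
        if isinstance(value, (list, tuple)):
            if len(value) != len(self.data):
                raise ValueError("expected one value per data point")
            values = value
        else:
            values = [value] * len(self.data)
        for point, item in zip(self.data, values):
            setattr(point, key, item)


series = ToySeries("demo", [ToyPoint(1), ToyPoint(2), ToyPoint(3)])

series.target = [10, 20, 30]  # one value per data point
per_point = [point.target for point in series.data]

series.target = 5  # one value applied to all data points
shared = [point.target for point in series.data]

series.name = "renamed"  # exists on the series, so it stays on the series
```

In this toy model, target never becomes an attribute of the series itself; it only ever lands on the data points, while name is stored on the series.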
In some circumstances, you may want to set data point properties that have identically-named properties on the series. For example, data points and series both support the .id property.
But you can set this property at the data point level in two ways:
If your data points are represented as a DataPointCollection, you can simply set it as a sub-property of the series.data property:

    # EXAMPLE 1.
    # Supplying one value per data point.
    my_series.data.id = ['id1', 'id2', 'id3', 'id4', 'id5', 'id6']

    # EXAMPLE 2.
    # Supplying one value, which will be applied to ALL data points.
    my_series.data.id = 'id2'

The DataPointCollection will worry about propagating the relevant property / value to the individual data points as needed.
If your data points are represented as a list of DataBase-descended objects, then you can adjust them the same way you would adjust any member of a list:

    id_list = ['id1', 'id2', 'id3', 'id4', 'id5', 'id6']
    for index in range(len(my_series.data)):
        my_series.data[index].id = id_list[index]

In this case, you are adjusting the data points directly, so you do need to make sure you are adjusting the exact properties you need to adjust in the exact right location.
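When adjusting points directly, a mismatch between the number of values and the number of points is easy to miss. The sketch below pairs the two with zip after an explicit length check. The SimpleNamespace objects are stand-ins for real data point instances and are used only so the example runs without the toolkit.

```python
from types import SimpleNamespace

# Stand-ins for data point objects; real points would come from series.data.
points = [SimpleNamespace(y=value, id=None) for value in (12, 34, 56)]
id_list = ["id1", "id2", "id3"]

# Guard against silently skipping points when the lengths differ.
if len(id_list) != len(points):
    raise ValueError("need exactly one id per data point")

for point, point_id in zip(points, id_list):
    point.id = point_id

print([point.id for point in points])
```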
Updating Your Data
You can also update your data after it has been loaded into your series. You can do this by assigning to the .data property or by calling one of the .load_from_* series helper methods, which update your series’ data just like when creating the series:
    # EXAMPLE 1.
    # Updating the .data property
    my_series.data = updated_data

    # EXAMPLE 2.
    # If my data is either a numpy.ndarray or Python iterable
    my_series.load_from_array(updated_data)

    # EXAMPLE 3.
    # If my data is in a Pandas DataFrame
    my_series.load_from_pandas(updated_data)

    # EXAMPLE 4.
    # If my data is in a CSV file
    my_series.load_from_csv('updated-data.csv')
Serializing Your Data for Rendering
While you shouldn’t have to serialize your data directly using Highcharts for Python, it may be useful to understand how this process works.
First, it’s important to understand that Highcharts (JS) supports data represented in two different forms:
as JavaScript literal objects, and
as primitive arrays, which are basically collections of strings and numbers
JS literal objects are the most flexible, because they allow you to take advantage of all of the different data point configuration options supported by Highcharts. However, primitive arrays perform much faster: Highcharts for Python generates them faster, there’s less data to transfer on the wire, and Highcharts (JS) can render them faster.
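The difference in payload size is easy to see with a small experiment. The sketch below uses the standard library's json.dumps as a stand-in for the toolkit's own serializer (an assumption made so the example is self-contained) and compares the two forms for the same five points.

```python
import json

points = [[0, 12], [1, 34], [2, 56], [3, 78], [4, 90]]

# Primitive-array form: just nested lists of numbers.
as_array = json.dumps(points, separators=(",", ":"))

# Object form: one dictionary per point, with explicit keys.
as_objects = json.dumps(
    [{"x": x, "y": y} for x, y in points], separators=(",", ":")
)

print(len(as_array), len(as_objects))
```

The object form repeats the key names for every point, so its size grows faster than the primitive-array form as the number of points increases.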
For this reason, Highcharts for Python will always try to serialize your data points to a primitive array first. If the series type supports a primitive array, and there is no information configured on the data points that prevents it from being serialized as a primitive array, Highcharts for Python will default to that form of serialization.
However, if there are special properties (not supported by primitive arrays) set on the data points, or if the series type is one that does not support primitive arrays, then Highcharts for Python will generate a JavaScript literal object instead.
This logic all happens automatically whenever you call .to_js_literal() on your series.