More plotting tricks using Altair

import altair as alt
import pandas as pd

cereal = pd.read_csv('data/cereal.csv')

chart0 = alt.Chart(cereal, width=500, height=300).mark_circle().encode(
    x='mfr', 
    y='calories'
).properties(title="Scatter plot of manufacturer calorie content")

chart0
chart1 = alt.Chart(cereal_modified, width=500, height=300).mark_circle().encode(
                   x='mfr', 
                   y='calories'
         ).properties(title="Scatter plot of manufacturer calorie content")

chart1
chart2 = alt.Chart(cereal_modified, width=500, height=300).mark_circle().encode(
                   x='mfr:N', 
                   y='calories:Q'
                  ).properties(title="Scatter plot of manufacturer calorie content")

chart2
Data Type      Shorthand Code   Description                     Examples
Ordinal        O                a discrete ordered quantity     “dislike”, “neutral”, “like”
Nominal        N                a discrete unordered quantity   eye color, postal code, university
Quantitative   Q                a continuous quantity           5, 5.0, 5.011
Temporal       T                a time or date value            date (August 13 2020), time (12:00 pm)
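The shorthand codes in the table are appended to a column name after a colon, as in 'mfr:N' or 'calories:Q'. The following is a minimal sketch of that naming pattern only. The helper `with_type` and the `TYPE_CODES` dictionary are hypothetical names made up for illustration and are not part of Altair.

```python
# Hypothetical lookup (not part of Altair): full type name -> shorthand code.
TYPE_CODES = {
    "ordinal": "O",
    "nominal": "N",
    "quantitative": "Q",
    "temporal": "T",
}


def with_type(column, data_type):
    """Return an encoding string such as 'calories:Q'."""
    return f"{column}:{TYPE_CODES[data_type]}"


# Build the strings that could be passed to encode(x=..., y=...).
x_encoding = with_type("mfr", "nominal")
y_encoding = with_type("calories", "quantitative")
print(x_encoding, y_encoding)
```

The resulting strings have the same form as the ones used for the x and y encodings in chart2.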
chart3 = alt.Chart(cereal, width=500, height=300).mark_circle().encode(
                   x='sugars:Q',  # set the sugars column as quantitative
                   y='rating:Q'   # set the rating column as quantitative
         ).properties(title="Scatter plot of cereal rating vs sugar content")

chart3

Variable types

chart4 = alt.Chart(cereal, width=500, height=300).mark_circle().encode(
                   x=alt.X('sugars:Q'), # use alt.X() to map the x-axis
                   y=alt.Y('rating:Q')  # use alt.Y() to map the y-axis
         ).properties(title="Scatter plot of cereal rating vs sugar content")

chart4

Histograms

chart5 = alt.Chart(cereal, width=500, height=300).mark_bar().encode(
                   x=alt.X('calories:Q', bin=True), # set x-axis as calories 
                   y=alt.Y('count():Q')             # set the y-axis as the number of cereals in each calorie bin
         ).properties(title="Histogram plot of cereal calorie content")
chart5

Bins

chart6 = alt.Chart(cereal, width=500, height=300).mark_bar().encode(
                   x=alt.X('calories:Q', bin=alt.Bin(maxbins=20)), # set max number of bins to 20
                   y=alt.Y('count():Q')
         ).properties(title="Histogram of cereal calorie content with bins = 20")
chart6
chart7 = alt.Chart(cereal, width=500, height=300).mark_bar().encode(
                   x=alt.X('calories:Q', bin=alt.Bin(maxbins=20), title="Calorie Content"), # use alt.X() to label the x-axis
                   y=alt.Y('count():Q', title="Number of Cereals")                          # use alt.Y() to label the y-axis
        ).properties(title="Histogram plot of cereal calorie content")
chart7
mfr_mean = cereal.groupby(by='mfr').mean(numeric_only=True)
mfr_mean
protein fat sodium fiber ... shelf weight cups rating
mfr
A 4.000000 1.000000 0.000000 0.000000 ... 2.000000 1.000000 1.000000 54.850917
G 2.318182 1.363636 200.454545 1.272727 ... 2.136364 1.049091 0.875000 34.485852
K 2.652174 0.608696 174.782609 2.739130 ... 2.347826 1.077826 0.796087 44.038462
... ... ... ... ... ... ... ... ... ...
P 2.444444 0.888889 146.111111 2.777778 ... 2.444444 1.064444 0.714444 41.705744
Q 2.625000 1.750000 92.500000 1.337500 ... 2.375000 0.875000 0.823750 42.915990
R 2.500000 1.250000 198.125000 1.875000 ... 2.000000 1.000000 0.871250 41.542997

7 rows × 12 columns

mfr_mean = mfr_mean.reset_index()
mfr_mean
mfr protein fat sodium ... shelf weight cups rating
0 A 4.000000 1.000000 0.000000 ... 2.000000 1.000000 1.000000 54.850917
1 G 2.318182 1.363636 200.454545 ... 2.136364 1.049091 0.875000 34.485852
2 K 2.652174 0.608696 174.782609 ... 2.347826 1.077826 0.796087 44.038462
... ... ... ... ... ... ... ... ... ...
4 P 2.444444 0.888889 146.111111 ... 2.444444 1.064444 0.714444 41.705744
5 Q 2.625000 1.750000 92.500000 ... 2.375000 0.875000 0.823750 42.915990
6 R 2.500000 1.250000 198.125000 ... 2.000000 1.000000 0.871250 41.542997

7 rows × 13 columns

chart8 = alt.Chart(mfr_mean, width=500, height=300).mark_bar().encode(
                   x=alt.X('mfr:N', title="Manufacturer"),
                   y=alt.Y('sugars:Q', title="Mean sugar content")
         ).properties(title="Bar plot of manufacturers' mean sugar content")
chart8



  1. Create a groupby object and calculate the mean
  2. Reset index
  3. Plot using Altair
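
The first two steps above can be sketched on a tiny, made-up table. The values in `sample` are invented for illustration and are not taken from the cereal dataset; `sample` and `sample_mean` are hypothetical names.

```python
import pandas as pd

# Made-up sample data for illustration only; not the real cereal dataset.
sample = pd.DataFrame({
    "mfr": ["K", "K", "G", "G"],
    "sugars": [6, 10, 9, 13],
    "calories": [100, 120, 110, 130],
})

# Step 1: group by manufacturer and take the mean of the numeric columns.
# After this step, 'mfr' is the index rather than a regular column.
sample_mean = sample.groupby(by="mfr").mean(numeric_only=True)

# Step 2: move 'mfr' from the index back into a regular column,
# so it can be referenced by name in an encoding such as 'mfr:N'.
sample_mean = sample_mean.reset_index()

# Step 3 would pass sample_mean to alt.Chart(...), as in chart8.
print(sample_mean)
```

Resetting the index matters because the chart encodings in this section refer to `mfr` by column name.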

Sorting

chart9 = alt.Chart(mfr_mean, width=500, height=300).mark_bar().encode(
                   x=alt.X('mfr:N', sort="y", title="Manufacturer"),  # use sort="y" to sort in ascending order
                   y=alt.Y('sugars:Q', title="Mean sugar content")
        ).properties(title="Bar plot of manufacturers' mean sugar content in ascending order")
chart9
chart10 = alt.Chart(mfr_mean, width=500, height=300).mark_bar().encode(
    x=alt.X('mfr:N', sort="-y", title="Manufacturer"),  # use sort="-y" to sort in descending order
    y=alt.Y('sugars:Q', title="Mean sugar content")
).properties(title="Bar plot of manufacturers' mean sugar content sorted in descending order")
chart10



If you enjoyed this part of the module and want to learn more advanced visualizations using Altair, take a look at our Data Visualization course.

Let’s apply what we learned!