using CSV, DataFrames, StatsPlots
candy_filepath = "data/archive/candy.csv"
"data/archive/candy.csv"
candy_data = CSV.read(candy_filepath, DataFrame);
first(candy_data,5)
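|   | id | competitorname | chocolate | fruity | caramel | peanutyalmondy | nougat | crispedricewafer | ... |
|---|----|----------------|-----------|--------|---------|----------------|--------|------------------|-----|
| 1 | 0 | "100 Grand" | "Yes" | "No" | "Yes" | "No" | "No" | "Yes" | ... |
| 2 | 1 | "3 Musketeers" | "Yes" | "No" | "No" | "No" | "Yes" | "No" | ... |
| 3 | 2 | "Air Heads" | "No" | "Yes" | "No" | "No" | "No" | "No" | ... |
| 4 | 3 | "Almond Joy" | "Yes" | "No" | "No" | "Yes" | "No" | "No" | ... |
| 5 | 4 | "Baby Ruth" | "Yes" | "No" | "Yes" | "Yes" | "Yes" | "No" | ... |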
# Fill in the line below: Which candy was more popular with survey respondents:
# "3 Musketeers" or "Almond Joy"? (Julia strings must be enclosed in double quotes.)
more_popular = "3 Musketeers"
# Fill in the line below: Which candy has higher sugar content: "Air Heads"
# or "Baby Ruth"? (Julia strings must be enclosed in double quotes.)
more_sugar = "Air Heads"
"Air Heads"
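The two answers above can also be checked in code rather than by reading the table. The following is a minimal sketch: the helper `higher_in` and the toy data frame are hypothetical names and values introduced for illustration only, not part of the original notebook or the survey data.

```julia
using DataFrames

# Hypothetical helper: return the name of whichever of two candies
# has the larger value in column `col`.
function higher_in(df, col, a, b)
    # `only` errors unless exactly one row matches the name
    value(name) = only(df[df.competitorname .== name, col])
    return value(a) >= value(b) ? a : b
end

# Toy frame with made-up values, used only to show the call
toy = DataFrame(competitorname = ["Candy A", "Candy B"],
                winpercent = [60.0, 45.0])
winner = higher_in(toy, :winpercent, "Candy A", "Candy B")
```

With the data loaded as above, the same helper could be called as `higher_in(candy_data, :winpercent, "3 Musketeers", "Almond Joy")` or `higher_in(candy_data, :sugarpercent, "Air Heads", "Baby Ruth")`.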
# Scatter plot showing the relationship between 'sugarpercent' and 'winpercent'
scatter(candy_data[!,:sugarpercent], candy_data[!,:winpercent],
    legend=false, grid=false, framestyle=:box)
# Same scatter plot with a fitted regression line (smooth=true)
@df candy_data plot(:sugarpercent, :winpercent, seriestype=:scatter, legend=false, framestyle=:box, grid=false, smooth=true)

The regression line has a slightly positive slope, which indicates a slight positive correlation between 'winpercent' and 'sugarpercent'. This suggests that people have a slight preference for candies containing relatively more sugar.
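The link between the slope of the regression line and the sign of the correlation can be made concrete with a short calculation. The sketch below assumes an ordinary least-squares fit; `slope_and_cor` is a hypothetical helper, and the vectors are toy values for illustration, not taken from the candy data.

```julia
using Statistics

# Least-squares slope of y on x, plus the Pearson correlation.
# The slope equals cov(x, y) / var(x), so it always has the same
# sign as cor(x, y).
function slope_and_cor(x, y)
    slope = cov(x, y) / var(x)
    return (slope = slope, correlation = cor(x, y))
end

# Toy values for illustration only
x = [0.1, 0.3, 0.5, 0.7, 0.9]
y = [40.0, 55.0, 45.0, 60.0, 50.0]
result = slope_and_cor(x, y)
```

On the candy data the call would be `slope_and_cor(candy_data.sugarpercent, candy_data.winpercent)`; a small positive correlation would match the gentle upward slope seen in the plot.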

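# Scatter plot of 'pricepercent' vs. 'winpercent', color-coded by 'chocolate'
@df candy_data plot(:pricepercent, :winpercent, group=:chocolate, seriestype=:scatter, framestyle=:box, grid=false, color_palette=:auto, legend_title="chocolate", xtick=0:0.2:1.0)
# Same plot with a separate regression line for each 'chocolate' group
@df candy_data plot(:pricepercent, :winpercent, group=:chocolate, seriestype=:scatter, framestyle=:box, grid=false, color_palette=:auto, legend_title="chocolate", xtick=0:0.2:1.0, smooth=true)
# Categorical scatter plot of 'winpercent' by 'chocolate' (used here in place of a swarm plot)
@df candy_data plot(:chocolate, :winpercent, seriestype=:scatter)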

Built with Julia 1.9.1 and

CSV 0.10.9
DataFrames 1.5.0
StatsPlots 0.15.4

To run this tutorial locally, download [this file](/tutorials/scatter03x04.jl) and open it with Pluto.jl.