My approach to this assignment was to clean the data, narrow it down to the Top 20 stores, and use that subset to inform my recommendations to the store owner, not only on locations but also on other factors that could increase his business.
I went in with the assumption that the top-performing zip codes would have room to expand, as this was implied by the data set and the problem.
My problem statement is: “By focusing on the Top 20 best performing stores – as defined by Total Sales – we can optimize the performance of already established business centers and increase our profit margins.”
I believe this problem statement was proven true.
The following chart helps us understand the performance of the top 20 stores in Iowa. It shows a significant difference in performance between the top 2 stores and their competitors. As a result, a good potential recommendation would be to apply the practices of these two stores to the bottom 10 stores on this list, which could greatly improve market performance in areas we already know will do well.
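As a rough illustration of how a "Top N by Total Sales" ranking like this can be produced, here is a minimal sketch using only the Python standard library. The function name `top_stores_by_sales`, the store IDs, and the dollar amounts are all hypothetical stand-ins, not values from the Iowa data set.

```python
from collections import defaultdict

def top_stores_by_sales(rows, n=20):
    """Return the n stores with the highest summed sales, best first."""
    totals = defaultdict(float)
    for store, sale in rows:
        totals[store] += sale
    # Sort by total sales descending; ties are broken by store id
    # so the ranking is deterministic.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

# Hypothetical toy transactions: (store id, sale in dollars).
transactions = [
    ("A", 120.0), ("B", 40.0), ("A", 80.0),
    ("C", 60.0), ("B", 30.0), ("D", 10.0),
]

top2 = top_stores_by_sales(transactions, n=2)
print(top2)
```

With the real data, the same aggregation would run over every transaction row, with n left at 20.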
In addition, we look at the Total Volume Sold (in liters) at these locations and notice that volume sold trends in tandem with performance. However, the difference is very small and will not inform our recommendations a great deal.
Following that logic, we can explore the data for the top 20 stores and begin to look for trends.
Interestingly, we find that the top 2 performing locations have a mean price that falls in the middle of the “Top 20” range. By contrast, the majority of the “Top 20” have a mean price that is either below or above it. This is helpful because it suggests that, for the stores in the top 20, we can likely impact performance by adjusting prices to more closely match those of the top 2 performing stores.
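One way to make this price comparison concrete is to measure how far each store's mean price sits from the average mean price of the two leaders. The sketch below assumes the per-store mean prices have already been computed; the helper `price_gap_to_leaders` and all of the numbers are hypothetical.

```python
from statistics import mean

def price_gap_to_leaders(mean_price_by_store, leaders):
    """Gap between each non-leader store's mean price and the leaders' average."""
    target = mean(mean_price_by_store[s] for s in leaders)
    return {
        store: round(price - target, 2)
        for store, price in mean_price_by_store.items()
        if store not in leaders
    }

# Hypothetical mean price per store, in dollars.
mean_prices = {"A": 15.0, "B": 17.0, "C": 12.0, "D": 21.0}

# A negative gap means the store prices below the leaders; positive means above.
gaps = price_gap_to_leaders(mean_prices, leaders=["A", "B"])
print(gaps)
```

A store with a large gap in either direction would be a candidate for a price adjustment toward the leaders' level.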
Taking a deep dive into some predictive analytics centered on Total Sales, Price Per Liter, and Total Volume Sold (in liters), we also find some encouraging data.
Fitting predictive models on the above variables, we find a model with an R-squared of 0.92 and both our Mean Absolute Error and Mean Squared Error hovering around 1 for predictive models built on Total Sales and Total Price.
We see that modeled here:
This shows us that we can be confident in building out predictive tools on these values.
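For readers who want to see how the fit metrics mentioned above (R-squared, Mean Absolute Error, Mean Squared Error) are calculated, here is a minimal sketch. The report does not state which model was used, so a one-variable ordinary least squares line is an assumption made purely for illustration, and the volume and sales figures are invented toy values.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def regression_metrics(y_true, y_pred):
    """Return R-squared, mean absolute error, and mean squared error."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "mse": ss_res / n,
    }

# Hypothetical toy data: volume sold (liters) vs. total sales (dollars).
volume = [1.0, 2.0, 3.0, 4.0]
sales = [2.0, 4.0, 5.0, 8.0]

slope, intercept = fit_line(volume, sales)
predictions = [slope * v + intercept for v in volume]
metrics = regression_metrics(sales, predictions)
print(metrics)
```

Note that these metrics are computed on the same data used for fitting; scoring on a held-out set would give a more honest picture of predictive performance.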
Using the data discussed above, I would recommend that the store owner focus on the bottom 10 zip codes identified in our “Top 20” data. These regions show an ability to perform at a high level, but the top 2 stores show that this performance can be drastically improved.
I would also recommend using our predictive models to pace the growth in those regions and applying the pricing pattern we found in our top 2 stores to adjust prices there.