Background
The brand is long established in Taiwan's aromatherapy retail market. Over the past year it implemented new e-commerce and CRM systems and launched a branded e-commerce app for customers, with the aim of starting its digital transformation. The company partnered with us and provided three years of sales data, as a first attempt at using data analysis to support business decisions.
Team Challenges
The brand's products are high-ticket items, so sales are sparse even across three years of data:
- At daily granularity, the series contain many zero values, which is unfavorable for training forecasting models and limits us to business problems on weekly, monthly, or yearly cycles
- Aggregating by month or year, however, leaves too few data points
We therefore focused our problem exploration on the client's "A-grade stores" and worked at weekly granularity.
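The trade-off between granularities can be illustrated with a minimal sketch. It assumes daily unit sales are held in a dictionary keyed by date (with zero-sale days present) and that weeks start on Monday; the function names and the sample figures are hypothetical and are not taken from the client's data.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_totals(daily_sales):
    """Sum daily unit sales into weeks keyed by the Monday that starts each week."""
    totals = defaultdict(int)
    for day, units in daily_sales.items():
        week_start = day - timedelta(days=day.weekday())
        totals[week_start] += units
    return dict(sorted(totals.items()))


def zero_share(values):
    """Fraction of observations that are exactly zero."""
    values = list(values)
    return sum(1 for v in values if v == 0) / len(values)


# Hypothetical 14 days of sales for one high-ticket product.
start = date(2023, 1, 2)  # a Monday
units = [0, 0, 1, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 1]
daily = {start + timedelta(days=i): u for i, u in enumerate(units)}

weekly = weekly_totals(daily)
print(zero_share(units))            # share of zero-sale days
print(zero_share(weekly.values()))  # share of zero-sale weeks
```

In this toy series most days record no sales while every week records some, which is the pattern that pushed us toward weekly aggregation.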
Business Problem
After visiting physical stores and interviewing management, we discovered:
The brand's retail store counters place orders with the central warehouse every Sunday, but the weekly manual estimates of order quantities are prone to error. The issue has two sides:
- First, when a counter orders too much, it is left with excess inventory and has no room to stock the products customers actually need
- Second, when a counter orders too little, it may lose sales because of stock shortages
Therefore, we aim to optimize physical store ordering decisions by forecasting the sales volumes of best-selling products through time series analysis.
Analysis Executive Summary
We forecast weekly sales volumes for best-selling products in the client's A-grade stores and evaluate the performance of each forecast. We provide several strategies for applying the models, so that store managers can refer to model predictions when placing weekly orders and reduce errors of human judgment. The prediction targets are each store's top ten products by sales volume. The prediction models are Naive, Seasonal Naive, Auto Arima, ETS, Linear Regression, Neural Network, and Croston, with Root Mean Square Error (RMSE) as the quantitative measure of forecast performance.
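As a minimal sketch of the evaluation metric, the block below computes RMSE for the two simplest baselines in the list, Naive and Seasonal Naive. The function names and the sample series are hypothetical, and a season length of 4 is used only to keep the example short; weekly data with yearly seasonality would use 52.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def naive_forecast(history, horizon):
    """Repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, season_length):
    """Repeat the value observed one season before each future step."""
    base = len(history) - season_length
    return [history[base + (h % season_length)] for h in range(horizon)]


# Hypothetical weekly unit sales with a repeating 4-week pattern.
history = [3, 5, 2, 4, 3, 5, 2, 4]
actual = [4, 5, 1, 4]

naive_rmse = rmse(actual, naive_forecast(history, 4))
seasonal_rmse = rmse(actual, seasonal_naive_forecast(history, 4, 4))
print(naive_rmse, seasonal_rmse)
```

A lower RMSE means a closer fit, and because errors are squared before averaging, a few large misses weigh more heavily than many small ones.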
Through analysis, our recommendations are:
- Using a "Simple Best Model" per store can significantly improve the economic efficiency of the forecasting work
- Regular periods and anniversary sale periods should be forecast separately, using the "Regular Period Best Model" during regular periods, while "Ensemble Methods" can be explored in the future for anniversary sale periods
- Store managers can currently use the regular-period forecast of next week's sales volume as a reference for weekly ordering, but should note that the current forecasts tend to underestimate slightly
Store ID | Simple best model | Model parameters (Product A as example) | Regular-period best model |
---|---|---|---|
001 | Auto Arima | ARIMA(3,1,1)(0,0,1) | Auto Arima |
002 | ETS | ANN(alpha = 0.1062) | ETS |
003 | Neural Network | 9 inputs, 1 layer, 6 nodes | Neural Network |
004 | Auto Arima | ARIMA(0,1,2) | Auto Arima |
005 | Auto Arima | ARIMA(0,1,2)(0,0,1)[52] | Auto Arima |
006 | Auto Arima | ARIMA(0,1,2) | Linear Regression |
007 | Linear Regression | Trend, anniversary period | Linear Regression |
008 | Auto Arima | ARIMA(1,1,1) | Auto Arima |
Using a “Simple Best Model” Per Store Greatly Improves the Economic Efficiency of Forecasting
Our analysis found that each store has its own “simple best model” that performs well when forecasting sales volumes for all of that store's best-selling products. However, the same best-selling product cannot be forecast accurately across different stores with a single, shared model. We therefore recommend choosing forecasting models on a per-store basis. By identifying a single suitable model for each store, we dramatically reduced the scale of the forecasting task, from finding the best model for 80 series (number of stores × number of products) down to just 8 (number of stores).
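The per-store selection can be sketched as follows. The selection rule shown here, lowest average RMSE across a store's products, is an assumption for illustration; the function name and the RMSE figures are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def simple_best_model(rmse_by_series):
    """Pick one model per store: the one with the lowest mean RMSE over its products.

    rmse_by_series maps (store_id, product_id) -> {model_name: rmse}.
    """
    per_store = defaultdict(lambda: defaultdict(list))
    for (store, _product), scores in rmse_by_series.items():
        for model, score in scores.items():
            per_store[store][model].append(score)
    return {
        store: min(by_model, key=lambda m: mean(by_model[m]))
        for store, by_model in per_store.items()
    }


# Hypothetical RMSE values for two stores and two products each.
rmse_by_series = {
    ("001", "A"): {"Auto Arima": 2.0, "ETS": 2.5, "Naive": 3.0},
    ("001", "B"): {"Auto Arima": 1.5, "ETS": 1.4, "Naive": 2.2},
    ("002", "A"): {"Auto Arima": 3.1, "ETS": 2.6, "Naive": 3.0},
    ("002", "B"): {"Auto Arima": 2.9, "ETS": 2.7, "Naive": 2.8},
}
best = simple_best_model(rmse_by_series)
print(best)
```

Note that store 001 keeps Auto Arima even though ETS is marginally better for product B: the strategy accepts a small loss on individual series in exchange for maintaining one model per store.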
Regular Periods and Anniversary Sale Periods Should Be Forecast Separately, Using the “Regular Period Best Model” for Regular Periods
We further discovered that even when the models account for each store's anniversary sale periods, none of the forecasting models we used could capture the sales spikes during these events. As a result, overall forecasting performance depends mainly on the accuracy of predictions during regular periods. By analyzing model performance separately for regular and anniversary periods, we found that each store also has its own “regular period best model”.
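A minimal sketch of evaluating the two periods separately is shown below. It assumes each forecast week carries a boolean flag marking whether it falls in an anniversary sale; the function name, the flag, and the sample values are hypothetical.

```python
import math


def rmse_by_period(actual, predicted, is_anniversary):
    """Compute RMSE separately for regular and anniversary-sale weeks."""
    squared = {"regular": [], "anniversary": []}
    for a, p, flag in zip(actual, predicted, is_anniversary):
        squared["anniversary" if flag else "regular"].append((a - p) ** 2)
    return {
        period: math.sqrt(sum(errs) / len(errs)) if errs else None
        for period, errs in squared.items()
    }


# Hypothetical weekly sales with a two-week anniversary spike.
actual = [3, 4, 3, 12, 15, 4]
predicted = [3, 3, 4, 5, 6, 4]
is_anniversary = [False, False, False, True, True, False]

scores = rmse_by_period(actual, predicted, is_anniversary)
print(scores)
```

Splitting the metric this way keeps a handful of badly missed spike weeks from dominating the comparison of models on ordinary weeks.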
For Anniversary Sale Periods, Future Use of “Ensemble Methods” May Yield Better Results
In contrast, no single model can adequately forecast sales during anniversary periods. Using an “ensemble method”, that is, a weighted average of the predictions from multiple models, may improve forecasting performance during these times. However, which models to include and how to weight them still require considerable exploration and tuning.
By adopting the “simple best model” strategy, we reduced the workload and made future implementation much easier for the client. For regular periods, we proposed the simple best model and demonstrated that it forecasts well. For anniversary periods, we found that ensemble methods could improve forecasting, though further exploration is needed.
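The averaging step itself can be sketched in a few lines. The function name, the model forecasts, and the weights below are hypothetical; choosing the member models and their weights is exactly the part that remains open.

```python
def ensemble_forecast(forecasts, weights=None):
    """Weighted average of several models' forecasts, step by step.

    forecasts maps model_name -> list of predictions over the same horizon.
    weights maps model_name -> non-negative weight (equal weights if omitted).
    """
    names = list(forecasts)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    horizon = len(forecasts[names[0]])
    return [
        sum(weights[name] * forecasts[name][h] for name in names) / total
        for h in range(horizon)
    ]


# Hypothetical two-week forecasts from three models for one product.
forecasts = {
    "Auto Arima": [10, 12],
    "ETS": [8, 10],
    "Linear Regression": [15, 17],
}
equal = ensemble_forecast(forecasts)
tilted = ensemble_forecast(
    forecasts, {"Auto Arima": 1, "ETS": 1, "Linear Regression": 2}
)
print(equal, tilted)
```

Weights are normalized inside the function, so they can be given on any convenient scale, for example inversely proportional to each model's past RMSE.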
Limitations of This Analysis
This analysis has two main limitations. First, because the forecasts are meant to inform weekly orders, the forecasting cycle must be weekly, yet weekly sales volumes for popular products remain unstable even in A-grade stores, which may hinder model training and reduce forecasting accuracy. Second, only three years of anniversary sale data are available for training, which is likely a key reason the models fail to capture the changes during these periods.