The Italian company Yellow Ball SpA manage an e-commerce of articles for tennis players. All products sold are shipped via the same parcel courier that almost daily passes to the company warehouse to collect envelopes and boxes, unless the company staff not phone or mail the currier that there are nothing to pick up. A distinctive element of the Yellow Ball company are the boxes that they use so deliver tennis rackets to they costumers, they are appropriately branded boxes and made in a particular corrugated cardboard that make them resistant than traditional boxes. Since they are quite expensive boxes, and taking into account that boxes can be easily ruined if they pass too much time in the warehouse (for example from dust, humidity, some clumsy forklift driver, …), the logistics management decided to agree with the supplier for an annual supply of these boxes that will then be delivered to the warehouse on a monthly basis, until the quantity purchased is reached. Given that a deviation in the quantities indicated in the contract could be very expensive for the Yellow Ball company, we need to plan carefully how many boxes will be used over the next 12 months.

Please note that the figures, the dataset, the Tableau reports and the Python notebook discussed in this article can be found (for free) on GitHub.

The dataset

First we need historical data related to the number of these boxes that have been shipped in recent years. These are boxes, that can contains 1 to 4 tennis rackets are used only when rackets are shipped; in the case of clothing or accessories purchases, another type of packaging (envelopes) are used.

The company has provided us a CSV file with the following detail of the period from 01/10/2011 to 31/12/2018. Please consider that in the e-commerce business it is not so usual to have data with more than 5-6 years as lot of small-medium companies were born in the very last years.

Date: the day the parcel picked up the boxes at the warehouse;

  • PACK_QTY: the number of products contained in the package;
  • PACK_LABEL_ID: a code (internal) identification of the shipment;
  • PACK_SHPR: ID (internal) of the parcel service

First, we evaluate the detail that the output of our forecast should have, as you can see from the figures it is essential to decide the time detail which we perform the analysis. A daily analysis is definitely too detailed and not very indicative as there are some particularities that make the forecast more complex without giving it added value for our purpose. For example, it is clear that on Monday or the first working day after an holiday there may be more orders to be shipped than other days. We also discard the weekly level for the same reason, so as to untie as much as possible from the “mobile” holidays during the year. The level of detail that appears to best fit our purpose is monthly, also taking into account the fact that the box supplier will delivery boxes to our commitment on a monthly basis.

YellowBall deliveries of boxes over last years
YellowBall deliveries of boxes over last years

From Tableau desktop, which was used to plot the data, we generate a file with monthly details and then we load it into a Pandas DataFrame (Python).

From the visualization it is easy to understand that the months in which there are no values ​​are excluded, but as a precaution we use fillna and bfill when loading the data.

RacketsDF = pd.read_csv(CSV_PATH,sep=";")
RacketsDF = RacketsDF.fillna(RacketsDF.bfill())

At first glance it is clear that there is an annual growth trend in the number rackets boxes shipments and that there are also some great seasonal characteristics related to the number of rackets sold, during the spring-summer, probably due to the fact that tennis is played mainly during this period, and during the period immediately before Christmas, for gifts off course. Such suppositions can be easily demonstrated by the decomposition of the series generated by Python.

Stationarity test

When we examine a historical series a proper verification is its stationarity, this is an important characteristics to take into account when we decide the model to use. Since there is a growth trend over the years and a seasonality between months, it is easy to expect that the series is not stationary, but it is good practice (in particular on a data science blog) to demonstrate with the data the assumptions made “with the stomach”, so we use the Dickey-Fuller Test to confirms that the TS is not stationary.

Test Statistic          -1.622820 
p-value 0.471251
Lags Used 12.000000
Observations Used 74.000000
Critical Value (1%) -3.521980
Critical Value (5%) -2.901470
Critical Value (10%) -2.588072
dtype: float64
Dickey-Fuller Test
Dickey-Fuller Test

Model selection and evaluation

From all the above considerations, we decide to continue modeling the series with the SARIMAX model. The first step consists in correctly evaluating the 7 parameters of the model and for this reason we decide to test all the combinations of 6 of them as s=12. Looping every combination and evaluating which test data is best estimated on the basis of the training data we can choose the best combination of q, d, p, Q, D, P. The parameter chosen to evaluate the goodness of the forecast is the Akaike information criterion.

ARIMA(0, 0, 0)x(0, 0, 0, 12)12 -> AIC:995.3357044938019 
ARIMA(0, 0, 0)x(0, 0, 1, 12)12 -> AIC:798.2371211915685
ARIMA(0, 0, 0)x(0, 1, 0, 12)12 -> AIC:569.1619943484142
ARIMA(0, 0, 0)x(0, 1, 1, 12)12 -> AIC:428.6041512221132
ARIMA(0, 0, 0)x(1, 0, 0, 12)12 -> AIC:509.1914035709319

The lowest value AIC is 329.63, made with the ARIMA (0, 1, 1) x (0, 1, 1, 12) 12

mod = sm.tsa.statespace.SARIMAX(RacketsDF,
                                order=(0, 1, 1),
                                seasonal_order=(0, 1, 1, 12),

results =
Summary Yellow Ball SARIMAX
  • The standard residuals chart does not show particular pattern of residual distribution;
  • The Histogram plus estimated density graph shows a good distribution around the normal distribution with zero mean;
  • The QQ chart is quite good, apart from the lower quartiles of the ordered residual distribution;
  • The correlogram, apart from the first value that is always 1, does not show significant correlations between the points.

After having successfully passed this first test we can divide the series into the training set and test set, to evaluate how our model estimates the test data using the training data as a basis. Since the purpose of our analysis is to predict the number of packages shipped within a year (2019), we use the same time duration as the test-set (the entire 2018).

Test set real vs forecasting

Apart from the month of November 2018, which was definitely overestimated, the prediction made by our model seems quite good and the values ​​always fall within the confidence interval. As can be seen from the graph of the figure, with the progress of the years, the value of the month of November is increasingly marked, downwards, compared to that of adjacent months, this behaviour is evidently not completely understood by the model, which has provided an overestimation of that month shippings amount.

Analytically, a good indicator to evaluate the forecast quality is the mean squared error (MSE), which in our case is equal to 13.1. By the way an MSE = 13.1 is not so good, considering that the average value of the series, in the year 2018, is equal to 102, but from the following tables it is possible to see how the value of MSR is in significant due to the incorrect estimate of the value of the month of November, we can therefore consider our model acceptable.

RacketsDF_forecasted = pred.predicted_mean
RacketsDF_true = RacketsDF['2018-01-01':]['Val']

mse = ((RacketsDF_forecasted - RacketsDF_true) ** 2).mean()
print('The Mean Squared Error of our forecasts is {}'.format(round(mse, 2)))
The Mean Squared Error of our forecasts is 13.1
pd.concat([RacketsDF_true.rename('RacketsDF_true'), RacketsDF_forecasted.rename('RacketsDF_forecasted')], axis=1)
RacketsDF_true RacketsDF_forecasted
2018-01-01 90 91.0
2018-02-01 88 88.0
2018-03-01 99 103.0
2018-04-01 97 94.0
2018-05-01 111 108.0
2018-06-01 106 104.0
2018-07-01 118 114.0
2018-08-01 120 120.0
2018-09-01 113 113.0
2018-10-01 102 102.0
2018-11-01 84 94.0
2018-12-01 96 97.0

In statistics, the mean squared error (MSE) of an estimator measures the average of the squares of the errors that is, the average squared difference between the estimated values ​​and what is estimated. The MSE is a measure of the quality of an estimator that is always non-negative, and the smaller the MSE.

2019 prediction

At this point we can proceed to estimate the data of 2019 and, for completeness, also of 2020 (this value will not be shared with our customer as it was not required and the more we go forward and more uncertainty is in the forecast, so we only estimate it for our information).

Yellow Ball forecasting for 2019

Finally, we can generate the month basis values forecasted with the model and calculate the total amount of boxes that we predict to ship in the next year.

2019-01-01     98.0 
2019-02-01 96.0
2019-03-01 108.0
2019-04-01 105.0
2019-05-01 118.0
2019-06-01 113.0
2019-07-01 124.0
2019-08-01 126.0
2019-09-01 119.0
2019-10-01 108.0
2019-11-01 94.0
2019-12-01 104.0
Freq: MS, dtype: float64