Sorting out the rubbish – Background

This figure shows the frequency distribution of the net weight in tonnes of garbage dumped by trucks between January and June, 1997.

Available data

Weighbridge data from January to June 1997 were available for analysis. During this time, trucks from 40 different companies were weighed. Some companies dumped rubbish on very few occasions over the six-month period; others dumped many hundreds of loads.

Dr. Johns’ conclusion

“The pattern of tonnages is unlikely to have occurred by chance.”

In particular:

There are unusual counts of net garbage weights for a number of companies; these counts would not be expected by chance.
Company 2 shows an extremely unusual set of observations; all 29 net weights are 0.40 tonnes.
There are a small number of data entry errors, as some recorded weights are not multiples of 20 kg. (Note that weights were measured to the nearest 20 kg.)
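A check for weights that are not multiples of 20 kg could be sketched as follows. This is only an illustration: the function names and the sample values are hypothetical, and it assumes weights are recorded in tonnes to at most three decimal places.

```python
def is_multiple_of_20kg(weight_tonnes):
    """True if the weight is a whole multiple of 20 kg.

    Assumes weights are recorded in tonnes to at most three decimal places,
    so rounding to whole kilograms removes floating-point noise.
    """
    kilograms = round(weight_tonnes * 1000)
    return kilograms % 20 == 0


def find_suspect_weights(weights_tonnes):
    """Return the weights that are not whole multiples of 20 kg."""
    return [w for w in weights_tonnes if not is_multiple_of_20kg(w)]


# Made-up illustrative values, not taken from the weighbridge data.
sample = [7.00, 7.03, 7.98, 7.125]
print(find_suspect_weights(sample))
```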

Dr. Johns’ analysis

Part 1: Organising the data.

Split the data set into subsets according to company. Each company should be analysed separately because companies have different-sized trucks and carry different kinds of rubbish.
Divide the net weight data into one tonne ranges (e.g. 0 to 1 tonne, 1 to 2 tonnes, etc.).
Within any one tonne range, there are 50 possible weight values, as weights are measured to the nearest 20kg.
Tabulate the distribution of weights in a one tonne range.
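The organising steps above might be sketched in Python as follows. The record layout (a company identifier paired with a net weight in tonnes), the function names, and the sample records are assumptions made for illustration; the actual format of the weighbridge file is not described here.

```python
from collections import Counter, defaultdict


def possible_weights_kg(tonne):
    """The 50 possible 20 kg readings in the range tonne to tonne + 1 (in kg)."""
    return list(range(tonne * 1000, (tonne + 1) * 1000, 20))


def tabulate_by_company_and_range(records):
    """Group weights by company, then by one tonne range, then count each weight.

    records: iterable of (company_id, net_weight_tonnes) pairs (assumed layout).
    Returns {company_id: {tonne_range_start: Counter({weight_kg: count})}}.
    """
    tables = defaultdict(lambda: defaultdict(Counter))
    for company, weight_tonnes in records:
        kg = round(weight_tonnes * 1000)   # work in whole kilograms
        tonne = kg // 1000                 # 7 means the range 7 to 8 tonnes
        tables[company][tonne][kg] += 1
    return tables


# Made-up records for illustration only.
records = [(11, 7.00), (11, 7.00), (11, 7.02), (11, 8.10), (2, 0.40)]
tables = tabulate_by_company_and_range(records)
for kg in possible_weights_kg(7)[:3]:
    print(kg / 1000, tables[11][7][kg])
```

Working in whole kilograms avoids floating-point comparisons when counting identical weights.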

Part 2: Analysing the weights in any given one tonne range, for a particular company.

Let the number of weights observed = n
Assume that the distribution of weights should be uniform.
Model the count, X, at any specific weight in the range, as a binomial random variable: X ~ Bi(n, 1/50).
For each possible count k, calculate the expected number of the 50 weights that would show that count under the binomial model, i.e. 50 × Pr(X = k).
Make a judgment about how unusual the observed count is compared with that predicted by the binomial model.
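As a rough sketch of the binomial step, the expected frequency of a given count can be computed directly from the binomial probability formula. The function names below are hypothetical, and the code simply restates the model X ~ Bi(n, 1/50) described above; it does not make the final judgment about how unusual a count is.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Pr(X = k) for X ~ Bi(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_frequency(k, n, categories=50):
    """Expected number of weight values (out of `categories`) with a count of k,
    assuming weights are uniform over the categories."""
    return categories * binomial_pmf(k, n, 1 / categories)


# Expected frequencies for counts 0 to 11 when n = 251 weights were observed.
for k in range(12):
    print(k, round(expected_frequency(k, 251), 2))
```

Comparing these expected frequencies with the observed frequency of each count is then a matter of judgment, as in the last step above.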

Example of Dr. Johns’ analysis

Company 11 had 253 net weights in the range 7 to 8 tonnes. Two of these weights are not multiples of 20 kg. As the reasons for these errors are unknown, these weights were removed from the analysis, leaving n = 251.

Part 1: Distribution of net weights between 7.00 and 7.98 tonnes for Company 11 (n = 251)

Weight	Count	Weight	Count	Weight	Count	Weight	Count	Weight	Count
7.00	5	7.20	5	7.40	4	7.60	3	7.80	5
7.02	7	7.22	4	7.42	2	7.62	4	7.82	6
7.04	4	7.24	4	7.44	0	7.64	4	7.84	2
7.06	3	7.26	7	7.46	4	7.66	7	7.86	8
7.08	3	7.28	5	7.48	6	7.68	5	7.88	8
7.10	6	7.30	6	7.50	6	7.70	10	7.90	5
7.12	2	7.32	5	7.52	2	7.72	4	7.92	4
7.14	3	7.34	4	7.54	5	7.74	8	7.94	11
7.16	3	7.36	5	7.56	5	7.76	6	7.96	7
7.18	8	7.38	8	7.58	3	7.78	5	7.98	5

Part 2:

Assuming the distribution of weights is uniform within this one tonne range, the expected count (for any weight) is 251 × 1/50 = 5.02.
Note that, for example, a count of 5 occurs 12 times, as is shown in bold in the table above. The distribution of net weights between 7.00 and 7.98 tonnes for Company 11 (n = 251) is also shown in the figure below. Counts of 5 are highlighted in green.
Assuming the counts at any single weight, X, are distributed as Bi(251, 0.02), Pr(X = 5) = 0.1772.
The binomial model predicts 50 × 0.1772 = 8.86, or about 9 values of 5 across 50 weight categories.
Twelve counts of 5 were observed, compared to 8.86 expected – Dr. Johns concluded that this was unusually high.
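The figures in this example can be reproduced with a short script. The counts are transcribed from the Company 11 table, read down each column; the variable and function names are illustrative only.

```python
from collections import Counter
from math import comb

# Counts from the Company 11 table, read down each column.
counts = [
    5, 7, 4, 3, 3, 6, 2, 3, 3, 8,     # 7.00 to 7.18 tonnes
    5, 4, 4, 7, 5, 6, 5, 4, 5, 8,     # 7.20 to 7.38 tonnes
    4, 2, 0, 4, 6, 6, 2, 5, 5, 3,     # 7.40 to 7.58 tonnes
    3, 4, 4, 7, 5, 10, 4, 8, 6, 5,    # 7.60 to 7.78 tonnes
    5, 6, 2, 8, 8, 5, 4, 11, 7, 5,    # 7.80 to 7.98 tonnes
]

n = sum(counts)            # total number of weights
categories = len(counts)   # number of possible weight values
p = 1 / categories         # probability of any one weight under uniformity


def pmf(k):
    """Pr(X = k) for X ~ Bi(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


observed = Counter(counts)  # how many weight values show each count

print("n =", n, "expected count per weight =", n * p)
print("Pr(X = 5) =", round(pmf(5), 4))
print("count  observed  expected")
for k in sorted(observed):
    print(k, observed[k], round(categories * pmf(k), 2))
```

For a count of 5 this gives 12 observed against about 8.86 expected, matching the figures quoted above. The script only makes the comparison; whether the difference is unusually high is a separate judgment.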