5. Variance addition to counts data

Status

Current

Context

For counts data, the uncertainty on counts is typically defined by Poisson counting statistics, i.e. the standard deviation on N counts is sqrt(N).

This is problematic when zero counts have been collected: the standard deviation is then zero, which leads to “infinite” point weightings in downstream code such as fitting routines.
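As a minimal sketch of the problem, assume a downstream routine weights each point by 1/sigma^2 (inverse-variance weighting, a common convention that this record does not itself specify). The function name `inverse_variance_weight` is hypothetical.

```python
import math


def inverse_variance_weight(counts: int) -> float:
    """Weight 1/sigma^2 for a point with Poisson uncertainty sigma = sqrt(counts)."""
    sigma = math.sqrt(counts)
    if sigma == 0.0:
        # Plain Python would raise ZeroDivisionError on 1.0 / 0.0; array
        # libraries typically produce inf instead. Returning inf here is an
        # illustrative choice: either way the weight is not usable.
        return math.inf
    return 1.0 / sigma**2


# Zero counts yields an infinite weight; all other points get finite weights.
weights = [inverse_variance_weight(n) for n in (0, 1, 4, 100)]
```

A single zero-count point therefore outweighs every other point in the data set, however many counts those points contain.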

A number of possible approaches were considered:

Option A

Reject data with zero counts, i.e. explicitly throw an exception if any data with zero counts is seen as part of a scan.

Option B

Use a standard deviation of NaN for points with zero counts.

Option C

Define the standard deviation of N counts as 1 if counts are zero, otherwise sqrt(N). This is one of the approaches available in Mantid, for example.

Option D

Define the standard deviation of N counts as sqrt(N+0.5) unconditionally, on the basis that “half a count” is smaller than the smallest non-zero measurement that can be taken (one count).

Option E

No special handling: calculate the standard deviation as sqrt(N).

For clarity, the following table shows the standard deviation that each option assigns to a range of counts values:

Counts	Std. Dev. (A)	Std. Dev. (B)	Std. Dev. (C)	Std. Dev. (D)	Std. Dev. (E)
0	raise exception	NaN	1	0.707107	0
1	1	1	1	1.224745	1
2	1.414214	1.414214	1.414214	1.581139	1.414214
3	1.732051	1.732051	1.732051	1.870829	1.732051
4	2	2	2	2.12132	2
5	2.236068	2.236068	2.236068	2.345208	2.236068
10	3.162278	3.162278	3.162278	3.24037	3.162278
50	7.071068	7.071068	7.071068	7.106335	7.071068
100	10	10	10	10.02497	10
500	22.36068	22.36068	22.36068	22.37186	22.36068
1000	31.62278	31.62278	31.62278	31.63068	31.62278
5000	70.71068	70.71068	70.71068	70.71421	70.71068
10000	100	100	100	100.0025	100
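The table above can be reproduced with a short sketch. The function name `std_dev`, the use of single-letter strings to select an option, and the choice of `ValueError` for Option A are all assumptions made for illustration; they are not taken from any existing implementation.

```python
import math


def std_dev(counts: int, option: str) -> float:
    """Standard deviation on `counts` under options A-E described above."""
    if option not in ("A", "B", "C", "D", "E"):
        raise ValueError(f"unknown option: {option!r}")
    if counts < 0:
        raise ValueError("counts must be non-negative")
    if option == "D":
        # Unconditional addition of half a count to the variance.
        return math.sqrt(counts + 0.5)
    if counts == 0:
        if option == "A":
            # Exception type is an illustrative assumption.
            raise ValueError("zero counts seen in scan")
        if option == "B":
            return math.nan
        if option == "C":
            return 1.0
    # Options A-C with non-zero counts, and Option E in all cases.
    return math.sqrt(counts)


# One table row (options B to E) for a given number of counts.
row_for_zero = [std_dev(0, opt) for opt in "BCDE"]
```

Only Option D avoids a branch on `counts == 0`; every other option either special-cases zero or returns an uncertainty of zero.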

Present

These approaches were discussed in a regular project update meeting including:

TW & FA (Experiment controls)
CK (Reflectometry)
JL (Muons)
RD (SANS)

Decision

The consensus was to go with Option D.
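As a rough illustration of what Option D means for downstream weighting, and again assuming inverse-variance weights of 1/sigma^2 (an assumption, not part of this decision), the weight of any point is bounded above by 2. The function name `option_d_weight` is hypothetical.

```python
def option_d_weight(counts: int) -> float:
    """Inverse-variance weight 1/sigma^2 with sigma^2 = counts + 0.5 (Option D)."""
    return 1.0 / (counts + 0.5)


# The largest possible weight occurs at zero counts: 1 / 0.5 = 2.
zero_count_weight = option_d_weight(0)
# Weights decrease monotonically as counts increase.
weights = [option_d_weight(n) for n in (0, 1, 2, 10)]
```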

Justification

Option A will cause real-life scans to crash in low-count regions.
Option B involves NaNs, which have many surprising floating-point characteristics and are highly likely to be a source of future bugs.
Option D was preferred to Option C by the scientists present, because it is a single continuous function of N with no special case at zero counts.
Option E causes surprising results and/or crashes downstream. For example, fitting may consider points with zero uncertainty to have “infinite” weight, effectively disregarding all other data.
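A few of the NaN behaviours that motivated rejecting Option B can be shown with the Python standard library alone. This is an illustrative sketch, not code from the project; the variable names are hypothetical.

```python
import math

nan = math.nan

# NaN is not equal to itself, so equality checks cannot detect it.
nan_equals_itself = nan == nan

# Every ordered comparison with NaN is False, in both directions.
nan_less_than_one = nan < 1.0
nan_greater_than_one = nan > 1.0

# A single NaN silently contaminates any arithmetic reduction.
total = sum([1.0, nan, 2.0])

# The built-in max() gives different answers depending on argument order,
# because a later value only replaces the current maximum if it compares greater.
max_nan_first = max(nan, 1.0)
max_nan_last = max(1.0, nan)
```

Detecting NaN reliably requires an explicit `math.isnan` check at every point where uncertainties are consumed, which is easy to forget.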