Presented at the Sixteenth International Pyrotechnics
Seminar, Jönköping, Sweden, June 1991
This web page changes the notation and fixes minor errors in the
paper. It also includes extra graphs.
Barry T. Neyer
EG&G Mound Applied Technologies
Miamisburg, OH
Contact Address
Barry T. Neyer
PerkinElmer Optoelectronics
1100 Vanguard Blvd
Miamisburg, OH 45342
(937) 865-5586
(937) 865-5170 (Fax)
Barry.Neyer@PerkinElmer.com
New methods of sensitivity testing and analysis are proposed. The new test method uses Maximum Likelihood Estimates to pick the next test level in order to maximize knowledge of both the mean, m, and the standard deviation, s, of the population. Simulation results demonstrate that this new test provides better estimators (less bias and smaller variance) of both m and s than the other commonly used tests (Probit, Bruceton, Robbins-Monro, Langlie). A new method of analyzing sensitivity tests is also proposed. It uses the Likelihood Ratio Test to compute regions of arbitrary confidence. It can calculate confidence regions for m, s, and arbitrary percentiles. Unlike presently used methods, such as the program ASENT, which is based on the Cramér-Rao theorem, it can analyze the results of all sensitivity tests, and it does not significantly underestimate the size of the confidence regions. The new test and analysis methods will be explained and compared to the presently used methods.
Sensitivity tests are often used to estimate the parameters associated with latent continuous variables which cannot be measured. For example, in testing the sensitivity of pyrotechnics to ignition, each specimen is assumed to have a critical stress level or threshold. Ignition pulses larger than this level will always ignite the specimen, while smaller pulses will not lead to ignition. Repeated testing of any one sample is not possible, since the pulse that is not sufficient to cause ignition nevertheless will generally damage the specimen. To measure the parameters of the underlying distribution (e.g., mean threshold, m, and standard deviation, s, of a normal distribution), samples are tested at various stress levels and the response or lack of response is noted. The experimenter then analyzes the data to provide estimates of the parameters of the population.
We assume that the distribution of thresholds, or a known transformation of these values, is normal. Thus, once m and s are known, other points of the distribution can be calculated. The critical stress levels are often normally distributed with respect to the logarithm of the pulse in pyrotechnics experiments, especially near the mean (Dixon and Mood 1948). However, other distributions are often used in this and other fields. For the rest of this paper, the stimulus level will refer to the transformed stimulus level that results in a normal distribution of thresholds. Thus, the reader can assume either that the population under study is normally distributed, or that the stress levels are all specified as transformed real-world values (in this example, the logarithms of the pulses).
Since most of the sensitivity test methods concentrate the test levels between the 10^{th} and 90^{th} percentiles, where a wide variety of probability functions are similar, the assumption of normality is not a strong one when calculating the mean and standard deviation. However, the assumption of normality is critical when calculating such extreme parameters as the 99^{th} or larger percentiles.
Since testing is often expensive, care must be used in choosing the stimulus levels to ensure the tests provide as much information as possible. A number of sensitivity test designs have been proposed over the past 100 years to aid the experimenter in the proper choice of test levels. Early test designs such as the Probit (Bliss 1935) and Bruceton (up-and-down) (Dixon and Mood 1948) tests were developed to make testing the samples or analyzing the data as simple as possible. These tests give reasonable estimates of both m and s as long as the experimenter has a reasonable knowledge of m and knows s to within a factor of 2. The delayed Robbins-Monro test (Robbins and Monro 1951, Cochran and Davis 1965) was designed to efficiently estimate one point on the distribution curve (such as the mean threshold level m) at the expense of a slight complication in the test procedure. However, since this test concentrates most of the test levels at the percentile of interest, it is not able to give efficient estimates of other points. The Langlie [1965] ("One-Shot") test was designed to allow reasonably efficient estimates of both m and s. This test procedure requires accurate bookkeeping of the test levels, and a maximum likelihood analysis to compute the estimates.
The first part of this paper describes a new test that has many advantages over these standard sensitivity tests, especially if either m, s, or both are not well known in advance. The second half of this paper describes a new method of analyzing the data to estimate confidence intervals for the parameters. Unlike many presently used methods, this new method can analyze the results of all tests, and does not produce confidence intervals that are biased small.
The next Section presents the theory of parameter estimation for sensitivity tests, and the following Section describes the new test method. Simulation studies demonstrating the efficiency and accuracy of these methods are then presented. The final Section describes the new analysis method.
The analysis of sensitivity tests is more complicated than the analysis of many standard statistical tests since the experimenter does not have any threshold information about the individual elements in the sample (i.e., the experimenter cannot compute a simple average to estimate the mean threshold, m, because there is nothing to average). The only information is the stress applied to each specimen and the response or lack of response.
A very general method of analyzing these tests is to use the method of maximum likelihood. Let L_{i} be the i^{th} stimulus level tested, P(L_{i}) be the probability of a randomly selected specimen responding at the level L_{i}, N_{i} be the number of specimens tested at L_{i} that responded (successes), and M_{i} be the number that failed to respond (failures). The likelihood function, L(L_{i}, N_{i}, M_{i} | m, s), is the probability of obtaining the given test results with the specified m and s. It is given by
$$ L(L_{i}, N_{i}, M_{i} \mid m, s) = \prod_{i} \frac{T_{i}!}{N_{i}!\,M_{i}!}\, P(L_{i})^{N_{i}}\, [1 - P(L_{i})]^{M_{i}}, \tag{1} $$
where T_{i} = N_{i} + M_{i} is the total number of tests conducted at level i and, under the normality assumption, P(L_{i}) = Φ((L_{i} - m)/s), with Φ the standard normal distribution function. The values, m_{e} and s_{e}, that maximize the likelihood function are the Maximum Likelihood Estimates (MLEs). Unique MLEs will be obtained if the successes and failures overlap, i.e., the smallest success is smaller than the largest failure (Silvapulle 1981).
Setting the derivatives of Equation 1 with respect to m and s equal to zero and solving for m and s yields the maximum likelihood estimates, m_{e} and s_{e}. Unfortunately, no analytic solution exists to these equations. However, iterative solutions, such as Newton-Raphson, converge rapidly.
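The iterative maximization can be sketched in Python. This is a minimal sketch under stated assumptions, not the program used for this paper: it drops the constant binomial coefficients from Equation 1, takes one (level, responded) pair per specimen, and uses Fisher scoring with step halving (a close relative of Newton-Raphson that replaces the Hessian by its expectation) in the reparameterization P = Φ(a + bx), so that m = -a/b and s = 1/b. The names `log_likelihood`, `probit_mle`, and `TABLE1` (the 20 results of Table 1) are illustrative.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function; erfc keeps the tails accurate."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def log_likelihood(data, m, s):
    """Log of Equation 1 without the constant binomial coefficients.

    data: list of (level, responded) pairs, one per specimen.
    """
    total = 0.0
    for x, responded in data:
        z = (x - m) / s
        p = norm_cdf(z) if responded else norm_cdf(-z)
        total += math.log(max(p, 1e-300))  # floor guards against log(0)
    return total


def probit_mle(data, tol=1e-10, max_iter=200):
    """Maximize the likelihood by Fisher scoring with step halving.

    Assumes the successes and failures overlap; otherwise no finite MLE exists.
    """
    xs = [x for x, _ in data]
    mean = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs)) or 1.0
    a, b = -mean / sd, 1.0 / sd  # crude start from the spread of the levels

    def ll(a, b):
        return log_likelihood(data, -a / b, 1.0 / b)

    current = ll(a, b)
    for _ in range(max_iter):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, responded in data:
            eta = a + b * x
            p = min(max(norm_cdf(eta), 1e-12), 1.0 - 1e-12)
            w = norm_pdf(eta) / (p * (1.0 - p))
            r = ((1.0 if responded else 0.0) - p) * w  # d loglik / d eta
            info = norm_pdf(eta) * w                   # phi^2 / (p (1 - p))
            u0 += r
            u1 += r * x
            i00 += info
            i01 += info * x
            i11 += info * x * x
        det = i00 * i11 - i01 * i01
        da = (i11 * u0 - i01 * u1) / det   # solve the 2x2 scoring system
        db = (i00 * u1 - i01 * u0) / det
        if max(abs(da), abs(db)) < tol:
            break
        step = 1.0
        while step > 1e-12:                # halve until the likelihood rises
            na, nb = a + step * da, b + step * db
            if nb > 0 and ll(na, nb) >= current:
                a, b, current = na, nb, ll(na, nb)
                break
            step *= 0.5
        else:
            break  # no improving step left: already at the maximum
    return -a / b, 1.0 / b


TABLE1 = [
    (1.00, False), (1.20, False), (1.40, False), (1.80, False), (2.60, False),
    (4.20, True), (3.40, False), (3.80, False), (4.00, False), (4.10, False),
    (4.28, False), (4.52, False), (5.55, True), (5.24, False), (6.37, True),
    (6.08, False), (7.38, True), (7.09, True), (6.89, True), (6.74, True),
]

m_e, s_e = probit_mle(TABLE1)
print(m_e, s_e)
```

Because the probit log-likelihood is concave in (a, b), the step-halving safeguard is enough to keep the iteration climbing. When the data do not overlap, the iteration drifts toward s = 0 and the returned values are meaningless, which is the degenerate case noted above.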
Most modern analysis methods use the maximum likelihood estimates as estimates of the parameters. However, before electronic computers became widely accessible, many researchers employed other analysis techniques because of the difficulty of the calculations.
The Probit test (Bliss 1935, Finney 1947) was designed to allow easy analysis by plotting the probability of success as a function of stress on normal probability paper. If the data lie on a straight line, the mean and standard deviation can be read simply from the midpoint and slope of the line, respectively.
The Bruceton test (Dixon and Mood 1948) was designed to allow the experimenter to estimate m and s from the data by computing simple sums: over the test levels, the number of tests at each level multiplied by the level and by its square. It also produces estimates of the variances of these parameters, based on the Cramér-Rao method discussed in a later Section.
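For reference, the simple sums usually quoted for the Dixon and Mood analysis can be written out as a short Python sketch. It assumes the standard textbook form: only the less frequent outcome is counted, levels are indexed i = 0, 1, 2, ... upward from the lowest level y0 at which that outcome occurred, d is the step size, A = Σ i n_{i}, B = Σ i² n_{i}, and N = Σ n_{i}. The mean is y0 + d(A/N ± 1/2) (plus when counting failures, minus when counting successes), and s ≈ 1.620 d[(NB - A²)/N² + 0.029], an approximation usually stated to hold only when (NB - A²)/N² exceeds about 0.3. The function name `bruceton_estimates` is hypothetical.

```python
def bruceton_estimates(y0, d, counts, use_failures=True):
    """Dixon-Mood estimates of m and s from an up-and-down test.

    y0: lowest level at which the counted (less frequent) outcome occurred.
    d: step size between levels.
    counts[i]: number of counted outcomes at level y0 + i * d.
    use_failures: True if the counted outcome is non-response (failure).
    Returns (mean, sd, ratio); the sd formula is usually trusted only
    when ratio = (N*B - A**2) / N**2 exceeds about 0.3.
    """
    n = sum(counts)
    a = sum(i * c for i, c in enumerate(counts))
    b = sum(i * i * c for i, c in enumerate(counts))
    half_step = 0.5 if use_failures else -0.5
    mean = y0 + d * (a / n + half_step)
    ratio = (n * b - a * a) / n ** 2
    sd = 1.620 * d * (ratio + 0.029)
    return mean, sd, ratio


# Failures counted at three adjacent levels: 1 at 1.0, 2 at 1.1, 1 at 1.2.
print(bruceton_estimates(1.0, 0.1, [1, 2, 1]))
```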
The Fisher information matrix (Kendall and Stuart 1967) measures the information about the parameters of the distribution contained in the test data. It is obtained by computing the negative expectation values of the second derivatives of the log of the likelihood function.
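For sensitivity tests the information matrix has the form
$$ \mathbf{I} = \frac{1}{s^{2}} \begin{pmatrix} I_{00} & I_{01} \\ I_{01} & I_{11} \end{pmatrix}, \tag{2} $$
where
$$ I_{jk} = \sum_{i} T_{i}\, J_{j+k}(z_{i}), \qquad J_{n}(z) = \frac{z^{n}\, \phi(z)^{2}}{\Phi(z)\,[1 - \Phi(z)]}, \qquad z_{i} = \frac{L_{i} - m}{s}, \tag{3} $$
and φ and Φ are the standard normal density and distribution functions. The Cramér-Rao inequality (Kendall and Stuart 1967) provides a lower bound on the variances of any unbiased estimates of the parameters:
$$ \mathrm{Var}(m_{e}) \ge \frac{s^{2}\, I_{11}}{I_{00} I_{11} - I_{01}^{2}} \tag{4} $$
and
$$ \mathrm{Var}(s_{e}) \ge \frac{s^{2}\, I_{00}}{I_{00} I_{11} - I_{01}^{2}}. \tag{5} $$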
Figure 1 shows the three functions J_{j}(z) as a function of the normalized stimulus level, z. The coefficients, I_{jk}, are determined by adding the J_{j+k} functions evaluated for each specimen test level. The functions J_{0}(z) and J_{2}(z) are even functions of z, while J_{1}(z) is an odd function. The Cauchy-Schwarz inequality shows that the denominator in Equations 4 and 5 is never negative (and is positive unless all specimens are tested at the same level). Thus, the lower bounds on the variances will be as small as possible when |I_{01}| << I_{00}, I_{11}. This condition is achieved if the stimulus levels are chosen approximately symmetrically around the mean, since then the terms in I_{01} with z < 0 will approximately cancel the terms with z > 0, leaving I_{01} ≈ 0. Under this condition, minimizing the lower bound of the variance of m_{e} (s_{e}) is equivalent to maximizing I_{00} (I_{11}).
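Equations 2 to 5 translate directly into code. The sketch below, with the hypothetical names `j_func` and `cramer_rao_bounds`, takes one entry per specimen tested, evaluates J_{n}(z) = z^{n} φ(z)²/(Φ(z)[1 - Φ(z)]) at each normalized level, sums the results into I_{00}, I_{01}, and I_{11}, and returns the two Cramér-Rao lower bounds s²I_{11}/(I_{00}I_{11} - I_{01}²) and s²I_{00}/(I_{00}I_{11} - I_{01}²). Treating J_{n} as zero beyond 30 standard deviations is an assumption made only to avoid floating-point underflow.

```python
import math


def j_func(n, z):
    """J_n(z) = z^n phi(z)^2 / (Phi(z) [1 - Phi(z)]); zero far in the tails."""
    if abs(z) > 30.0:
        return 0.0  # information is numerically zero this far from the mean
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))  # Phi(z)
    q = 0.5 * math.erfc(z / math.sqrt(2.0))   # 1 - Phi(z), accurate in the tail
    return z ** n * phi * phi / (p * q)


def cramer_rao_bounds(levels, m, s):
    """Lower bounds on Var(m_e) and Var(s_e).

    levels: one entry per specimen tested.
    m, s: population parameters (in practice the MLEs are substituted).
    Raises ZeroDivisionError if every specimen is tested at one level.
    """
    zs = [(x - m) / s for x in levels]
    i00 = sum(j_func(0, z) for z in zs)
    i01 = sum(j_func(1, z) for z in zs)
    i11 = sum(j_func(2, z) for z in zs)
    det = i00 * i11 - i01 * i01  # >= 0 by the Cauchy-Schwarz inequality
    return s * s * i11 / det, s * s * i00 / det


# Ten specimens at m - s and ten at m + s, with m = 10 and s = 1.
var_m, var_s = cramer_rao_bounds([9.0] * 10 + [11.0] * 10, 10.0, 1.0)
print(var_m, var_s)
```

In this symmetric example I_{01} vanishes and both bounds reduce to s²/(20 J_{0}(1)), about s²/(0.4386 × 20), the figure quoted in the text for tests near m ± s.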
Figure 1: Sensitivity test information matrix functions for a single sample. The solid curve J_{0}(z) gives the information for the variance of m, the dash dot curve J_{2}(z) the variance of s, and the dash curve J_{1}(z) the covariance.
The previously mentioned sensitivity tests (Probit, Bruceton, Robbins-Monro, and Langlie) can provide reasonable estimates of the mean, and some provide reasonable estimates of the standard deviation, under ideal conditions. However, if not much is known about these population parameters, then all the above tests "waste" too many samples. Figure 1 shows that essentially no information is obtained by testing more than three standard deviations from the mean. Thus, if the mean level is uncertain, the first part of the test should converge to the mean as quickly as possible.
Figure 1 shows that the lower bound of the variance for m_{e} will be minimized if the tests are concentrated near the mean and that the lower bound of the variance of s_{e} will be minimized by concentrating the testing at stimulus levels at m ±1.6s. If it is desirable to maximize information about both m and s, the tests should be conducted near m ± s. In this case the Cramér-Rao lower bound on the variances for both is given by approximately s^{2}/(0.4386 N). (See Figure 1.)
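The optimal levels quoted here, and the D-optimal points m ± 1.138s used later for the asymptotic efficiency lines, can be checked numerically. This sketch scans z on a fine grid and finds where J_{0}(z) (information about m), J_{2}(z) (information about s), and z J_{0}(z) peak. The last quantity matters because, when half the specimens are tested at m + zs and half at m - zs, I_{01} = 0 and the determinant of the information matrix is proportional to J_{0}(z) J_{2}(z) = (z J_{0}(z))². The grid resolution and the names are illustrative.

```python
import math


def j_func(n, z):
    """J_n(z) = z^n phi(z)^2 / (Phi(z) [1 - Phi(z)]); zero far in the tails."""
    if abs(z) > 30.0:
        return 0.0
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))
    q = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z ** n * phi * phi / (p * q)


# J_0 and J_2 are even, so scanning z from 0 to 3 in steps of 0.001 suffices.
grid = [k / 1000.0 for k in range(3001)]
best_mean = max(grid, key=lambda z: j_func(0, z))  # most information about m
best_sd = max(grid, key=lambda z: j_func(2, z))    # most information about s
best_d = max(grid, key=lambda z: z * j_func(0, z)) # maximizes det for +/- z pairs

print(best_mean, best_sd, best_d)
print(j_func(0, 1.0))                        # information per test at m +/- s
print(j_func(0, best_d), j_func(2, best_d))  # per-test coefficients at m +/- 1.138s
```

The scan places the peak of J_{2} near 1.57 standard deviations (the text rounds this to 1.6), the D-optimal point near 1.138, and gives J_{0}(1) ≈ 0.4386, J_{0}(1.138) ≈ 0.392, and J_{2}(1.138) ≈ 0.507.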
Figure 2: Flow chart showing algorithm used by the new sensitivity test
Figure 2 shows a flow chart of the procedure used to pick the next stress level. The experimenter uses his knowledge of the specimens to guess a lower and upper bound for the mean (m_{min} and m_{max}) and a guess of the standard deviation (s_{guess}).
The first part of the test uses a modified binary search to get close to the mean. The first specimen is tested at level x_{1}, located midway between m_{min} and m_{max}. If the first specimen responds, then x_{2} is the average of x_{1} and m_{min}; however, x_{2} is set equal to x_{1} - 2s_{guess} if that lies lower. A non-response on the first test is treated analogously. If the first two or more results are the same, the next level is chosen such that the range of stresses tested doubles with each test (i.e., x_{n+1} - x_{n} = x_{n} - x_{1}). Once at least one success and one failure have been obtained, a binary search is performed until the difference between the lowest success and the highest failure is less than s_{guess}. This first part of the test was designed to yield both successes and failures quickly when the initial guesses are accurate, and to expand rapidly when the estimates are in error. The efficiency of this part of the test cannot be stated in general, because it depends on the accuracy of the initial guesses. If the initial guesses are accurate and the range is smaller than 8s_{guess}, however, the simulation reported later in this work demonstrated that this part of the test usually requires two samples.
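The first-phase rules can be sketched as a Python function that returns the next level, or None once the first phase is complete. The name `phase1_next_level` and the (level, responded) history format are hypothetical. One detail is an interpretation: the text says the binary search continues until the gap between the lowest success and the highest failure is less than s_{guess}, but Table 1 moves on to the second phase when that gap equals 0.10 A = s_{guess}, so this sketch stops when the gap is at most s_{guess} (with a small tolerance for floating-point rounding).

```python
def phase1_next_level(history, m_min, m_max, s_guess):
    """Next stimulus level in the first (search) phase, or None when done.

    history: list of (level, responded) pairs in the order tested.
    """
    if not history:
        return 0.5 * (m_min + m_max)          # start midway between the bounds
    x1, r1 = history[0]
    successes = [x for x, r in history if r]
    failures = [x for x, r in history if not r]
    if not successes or not failures:         # all results so far are the same
        if len(history) == 1:
            if r1:                            # success: move down
                return min(0.5 * (x1 + m_min), x1 - 2.0 * s_guess)
            return max(0.5 * (x1 + m_max), x1 + 2.0 * s_guess)
        xn = history[-1][0]
        return xn + (xn - x1)                 # double the range tested
    gap = min(successes) - max(failures)      # negative once the data overlap
    if gap <= s_guess * (1.0 + 1e-9):
        return None                           # hand over to the second phase
    return 0.5 * (min(successes) + max(failures))  # binary search
```

Fed the first ten results of Table 1 (m_{min} = 0.6, m_{max} = 1.4, s_{guess} = 0.1), it reproduces the levels 1.00, 1.20, 1.40, 1.80, 2.60, 4.20, 3.40, 3.80, 4.00, and 4.10, and then reports that the first phase is finished.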
The second part of the test is designed to provide unique MLEs quickly. Unique estimates are achieved when the successes and failures overlap. The average of the lowest success and highest failure is used as an estimate of the mean, and s_{guess} is used as an estimate of s. (These inaccurate estimates are used only until the successes and failures overlap.) The next test level is chosen as the level that maximizes the determinant of the information matrix given these estimates. Thus, this part of the test is similar to the initial part of the c-optimal design of McLeish and Tosh [1990]. In the design proposed here, however, m_{guess} is updated and s_{guess} is multiplied by 0.8 for each specimen tested. Decreasing s_{guess} results in faster overlap of the data. It also prevents the procedure from testing all specimens far from the mean when s << s_{guess}. The factor of 0.8 was chosen from the results of a number of simulations. Smaller values improved the efficiency slightly, and only when s << s_{guess}. Larger values (up to 0.85) improved the efficiency when s ≥ s_{guess}, but significantly decreased the efficiency when s << s_{guess}. Since this factor is used only to determine unique MLEs quickly, it has no effect on the marginal efficiency of the test.
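Choosing the level that maximizes the determinant of the information matrix can be sketched as a grid search. The function name `next_level_d_optimal`, the search window of ±4 standard deviations, and the grid size are assumptions of this sketch, not values from the paper. In the second phase, m would be the midpoint of the lowest success and highest failure and s the shrinking s_{guess}; in the third phase, the (limited) MLEs.

```python
import math


def j_func(n, z):
    """J_n(z) = z^n phi(z)^2 / (Phi(z) [1 - Phi(z)]); zero far in the tails."""
    if abs(z) > 30.0:
        return 0.0
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))
    q = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z ** n * phi * phi / (p * q)


def next_level_d_optimal(levels, m, s, half_width=4.0, n_grid=801):
    """Candidate level that maximizes det(I) once it has been tested.

    levels: stimulus levels already tested (one entry per specimen).
    m, s: current working estimates of the mean and standard deviation.
    The common 1/s^2 factor is dropped; it does not move the maximum.
    """
    i00 = sum(j_func(0, (x - m) / s) for x in levels)
    i01 = sum(j_func(1, (x - m) / s) for x in levels)
    i11 = sum(j_func(2, (x - m) / s) for x in levels)
    best_x, best_det = None, -math.inf
    for k in range(n_grid):
        z = -half_width + 2.0 * half_width * k / (n_grid - 1)
        a = i00 + j_func(0, z)
        b = i01 + j_func(1, z)
        c = i11 + j_func(2, z)
        det = a * c - b * b
        if det > best_det:
            best_x, best_det = m + s * z, det
    return best_x


print(next_level_d_optimal([10.0], 10.0, 1.0))
```

For example, with a single previous test exactly at the estimated mean, the determinant after adding a level z reduces to J_{0}(0) J_{2}(z), so the sketch picks a level about 1.6 standard deviations from the mean, where J_{2} peaks.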
The final part of the test is similar to the second, except that the MLEs are used as estimates of the parameters. Unfortunately, the MLEs will sometimes be "wild" estimates when computed from a limited number of tests. Thus, the algorithm limits m_{e} to lie within the range of the stimulus levels tested previously and limits s_{e} to be less than the difference between the highest and lowest levels tested. In such a case, the next test is usually outside the test range; thus the limits are expanded (usually more than doubled) so that they will not constrain the true parameters. Since "wild" estimates generally occur in much less than 1% of cases, no attempt was made to find the optimal restriction. Limiting the estimates of the parameters has an effect similar to that of the truncated version of Wu's test (Wu 1985).
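The limiting rule amounts to one clamp per parameter. This sketch (hypothetical name `clip_estimates`) keeps m_{e} inside the range of levels already tested and caps s_{e} at the width of that range; capping at the width itself rather than strictly below it is a simplification of this sketch.

```python
def clip_estimates(m_e, s_e, levels):
    """Limit 'wild' MLEs using the range of levels tested so far.

    m_e is clamped to [lowest, highest]; s_e is capped at highest - lowest
    (the text asks for strictly less than the range; this caps at it).
    """
    lowest, highest = min(levels), max(levels)
    m_clipped = min(max(m_e, lowest), highest)
    s_clipped = min(s_e, highest - lowest)
    return m_clipped, s_clipped


print(clip_estimates(9.0, 0.5, [1.0, 2.6, 4.28]))   # mean pulled back into range
print(clip_estimates(2.0, 10.0, [1.0, 2.6, 4.28]))  # spread capped at the range
```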
The algorithm was designed to be "fail-safe": even if the mean is far outside the specified range, the first part of the algorithm will expand to contain, and then converge to, the region of interest. It will produce unique estimates of s even if the true s is a factor of ten or more smaller than s_{guess}, assuming the sample size is sufficient.
The following example should clarify the algorithm used in the new test. Suppose a company manufactures a pyrotechnic and tests it with a hot-wire ignitor. Assume that the ignition thresholds of a regular batch of pyrotechnic are normally distributed with a mean of 1 A and a standard deviation of 0.1 A. When testing a regular batch, the experimenters perform a 20-sample test with parameters m_{min} = 0.6 A, m_{max} = 1.4 A, and s_{guess} = 0.1 A. Now suppose that a batch of pyrotechnic was improperly mixed, raising its mean threshold to 5 A and its standard deviation to 1 A.
If the experimenters did not know about the improper mixture and conducted the test as they usually would, the experiment would yield results similar to those shown in Table 1.
Table 1: An example of the New Test When Sample Is Different Than Expected
Test No. | Current (Amps) | Result | Comment |
1 | 1.00 | Failure | Start with binary search. |
2 | 1.20 | Failure | |
3 | 1.40 | Failure | Regular explosive would have exploded! |
4 | 1.80 | Failure | Rapidly increase upper limit to get success quickly. |
5 | 2.60 | Failure | |
6 | 4.20 | Success | Both successes and failures! Begin binary search. |
7 | 3.40 | Failure | |
8 | 3.80 | Failure | |
9 | 4.00 | Failure | |
10 | 4.10 | Failure | |
11 | 4.28 | Failure | (No overlap. Use m_{e} = 4.15, s_{e} = s_{guess} = 0.10.) |
12 | 4.52 | Failure | m_{e} = 4.28, s_{e} = 0.19. (Overlap. Clip MLE values.) |
13 | 5.55 | Success | m_{e} = 4.52, s_{e} = 0.79. |
14 | 5.24 | Failure | m_{e} = 4.66, s_{e} = 0.50. (Use true MLE values.) |
15 | 6.37 | Success | m_{e} = 5.22, s_{e} = 0.96. |
16 | 6.08 | Failure | m_{e} = 5.10, s_{e} = 0.83. |
17 | 7.38 | Success | m_{e} = 5.70, s_{e} = 1.39. |
18 | 7.09 | Success | m_{e} = 5.58, s_{e} = 1.25. |
19 | 6.89 | Success | m_{e} = 5.50, s_{e} = 1.16. |
20 | 6.74 | Success | m_{e} = 5.44, s_{e} = 1.09. |
Analysis of these data yields MLEs m_{e} = 5.39 A and s_{e} = 1.04 A. Thus, even though the defective batch of pyrotechnic was very different from a regular batch, the new sensitivity test quickly led to good estimates of the true values (5 A and 1 A). Many of the standard sensitivity tests in common use would have wasted their specimens by testing far below the mean.
A number of authors have compared the various sensitivity tests to determine the most efficient method for estimating m and s. Langlie [1965] compared his one-shot test to the Bruceton (up-and-down) test under a variety of conditions. He found that the Langlie test was almost as efficient as the Bruceton test under ideal conditions (i.e., the mean and standard deviation were close to the initial guesses) and that the Langlie test was much more efficient than the Bruceton test when not much was known about the mean or the standard deviation. Davis [1971] compared several versions of the Robbins-Monro and Bruceton methods and found that the delayed Robbins-Monro method was the most efficient in determining the mean. However, MLEs were not used to estimate the parameters, so no comparison of the efficacy of determining the standard deviation was possible. Edelman and Prairie [1966] compared the Probit, Bruceton, and Langlie tests under ideal conditions. They recommended that the Langlie test be used for all tests of small to medium sample sizes (fewer than 50).
A Monte Carlo approach similar to that used in several of the previous works was used to compare the new test described here to the Bruceton, Langlie, and Probit tests.
For the simulations reported below, a mean, m_{guess}, and standard deviation, s_{guess}, were given as initial guesses for all of the sensitivity tests. The tests were optimized for this choice of parameters; if the true parameters agreed with the initial guesses, then the tests should be most efficient. The tests were performed with several different values of the true mean, m, and standard deviation, s. Monte Carlo simulations of 10000 repetitions were performed for sample sizes from 6 to 100. Simulations were performed with m ≈ m_{guess} and s = (0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0) * s_{guess}. Additional simulations were performed with m ≈ m_{guess} + (1, 2, 5, 10) * s_{guess} and s = (0.2, 0.5, 1.0) * s_{guess}.
The efficiency of the Probit, Bruceton, and Langlie test designs is strongly dependent on the position of the population mean relative to the first test level. Since the experimenter rarely knows the exact value of the mean before the test, the value of m used for each test was offset by a random amount between -0.5s and +0.5s.
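The simulation setup can be illustrated with a much smaller Monte Carlo experiment. The sketch below is not the simulation code behind the figures: it runs only the Bruceton (up-and-down) design with step size s_{guess}, estimates the mean with the Dixon and Mood formula y0 + d(A/N ± 1/2) rather than the MLE, assumes the ±0.5s offset is uniformly distributed, and reports the efficiency s²/MSE plotted in the figures. Function names, the seed, and the repetition count are illustrative.

```python
import random


def dixon_mood_mean(y0, d, counts, use_failures):
    """Dixon-Mood mean from counts of the less frequent outcome at y0 + i*d."""
    n = sum(counts)
    a = sum(i * c for i, c in enumerate(counts))
    return y0 + d * (a / n + (0.5 if use_failures else -0.5))


def bruceton_mean_efficiency(n, m_guess, s_guess, m_true, s_true,
                             reps=2000, seed=1):
    """Monte Carlo estimate of s^2 / MSE for the Bruceton mean."""
    rng = random.Random(seed)
    sq_err, used = 0.0, 0
    for _ in range(reps):
        # Experimenter uncertainty: shift the population mean relative to
        # the first test level (uniform offset is this sketch's assumption).
        m = m_true + rng.uniform(-0.5, 0.5) * s_true
        step, outcomes = 0, []
        for _ in range(n):
            x = m_guess + step * s_guess              # step size d = s_guess
            responded = rng.gauss(m, s_true) <= x     # threshold at or below x
            outcomes.append((step, responded))
            step += -1 if responded else 1            # up-and-down rule
        fails = [k for k, r in outcomes if not r]
        succs = [k for k, r in outcomes if r]
        if not fails or not succs:
            continue  # degenerate run; this sketch simply skips it
        use_failures = len(fails) <= len(succs)       # count the rarer outcome
        chosen = fails if use_failures else succs
        low = min(chosen)
        counts = [0] * (max(chosen) - low + 1)
        for k in chosen:
            counts[k - low] += 1
        est = dixon_mood_mean(m_guess + low * s_guess, s_guess, counts,
                              use_failures)
        sq_err += (est - m) ** 2
        used += 1
    return s_true ** 2 / (sq_err / used)


print(bruceton_mean_efficiency(40, 0.0, 1.0, 0.0, 1.0))
```

Rerunning with s_true set to a multiple of s_guess, or m_true offset by several s_guess, mimics the mismatched conditions studied in the figures, although only for this one design and estimator.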
Figure 3: Efficiency in Determining the Mean (left) and Standard Deviation (right) for various test designs When m ≈ m_{guess} and s = s_{guess}.
Figure 4: Efficiency in Determining the Mean (left) and Standard Deviation (right) for various test designs When m ≈ m_{guess} and s = 0.5s_{guess}.
Figure 5: Efficiency in Determining the Mean (left) and Standard Deviation (right) for various test designs When m ≈ m_{guess} and s = 2s_{guess}.
Figure 6: Efficiency in Determining the Mean (left) and Standard Deviation (right) for various test designs When m ≈ m_{guess} and s = 0.2s_{guess}.
Figure 7: Efficiency in Determining the Mean (left) and Standard Deviation (right) for various test designs When m ≈ m_{guess} and s = 5s_{guess}.
Figure 8: Efficiency in Determining the Mean (left) and Standard Deviation (right) for various test designs When m ≈ m_{guess} and s = 0.1s_{guess}.
Figure 9: Efficiency in Determining the Mean (left) and Standard Deviation (right) for various test designs When m ≈ m_{guess} and s = 10s_{guess}.
Figure 10: Efficiency in Determining the Mean (left) and Standard Deviation (right) for various test designs When m ≈ m_{guess} + 5s_{guess} and s = s_{guess}.
Figure 11: Efficiency in Determining the Mean (left) and Standard Deviation (right) for various test designs When m ≈ m_{guess} + 5s_{guess} and s = 5s_{guess}.
Figures 3 to 11 show s^{2}/MSE as a function of the sample size for both m and s for all of the tests studied. Due to space limitations, only 9 of the 19 combinations of m and s are shown. The other combinations showed similar results. Also shown is s^{2} divided by the asymptotic variance, assuming that the tests were conducted at the D-optimal test points m ± 1.138s. (This function is a straight line approximated by 0.392N for m and 0.507N for s. The coefficients, derived in Banerjee [1980], can be read from Figure 1.) The curves for the Bruceton, Langlie, and new tests all use the random offset of the mean to simulate experimenter uncertainty about its exact value. Figure 3 also shows the efficiency of the Probit test with no random offset. The Probit test is not shown on the other graphs because its value of s^{2}/MSE is close to zero. (Since there are several "wild" estimates when s >> s_{guess}, the efficiency curves for the Bruceton and Langlie tests in Figure 7 are not smooth.)
Several conclusions can be drawn from these graphs. The new test provides much better estimates of the standard deviation than all of the other tests, under all conditions tested. The efficiency in determining the mean is worse for the new test than for the Bruceton and Langlie tests when the parameters are close to the initial guesses, but is comparable to or better than that of the other tests when the initial guesses of the parameters are far from the true values.
The new test is also much less dependent on the accuracy of the initial guesses of the parameters. For all the tests except the new test, the efficiency in estimating s is extremely dependent on the experimenter's knowledge of s. Comparison of the graphs for m and s shows that the new test comes close to achieving the design goal of Var(m_{e}) = Var(s_{e}), regardless of the accuracy of the initial guesses.
The bias in estimating the parameters was also determined from the simulation. There was no large bias in the MLEs of the mean, except for the delayed Robbins-Monro test when the true standard deviation was much larger than the initial guess, and the Langlie test when the mean was outside the test range. However, all of the test designs produced biased estimates of s. Figure 12 shows the relative bias of the standard deviation for the tests as a function of the sample size when the guessed parameters equal the true parameters. The magnitude of the bias decreased for all tests as the sample size increased, and was generally small compared to the square root of the variance of the standard deviation. Under non-ideal conditions, the magnitude of the bias was much larger for the Bruceton and Probit tests, slightly larger for the Langlie test, and essentially the same for the new test. (The relative bias for the Probit test is a very strong function of the guess of the standard deviation. It can be positive or negative depending on whether the true standard deviation is larger or smaller than the initial guess.)
Figure 12: Relative Bias in the Maximum Likelihood Estimator for the Various Test Designs When m ≈ m_{guess} and s = s_{guess}.
The potential cost savings of using this new test can be illustrated with the help of the simulation reported in the preceding figures. For example, assume that the values of both m and s were required to a certain precision. If the experimenter knew s to within a factor of 5 (e.g., s between 1 and 5) and m to within ±5s initially (e.g., m between 75 and 125), then he could design the test to ensure that the required precision would be achieved for all combinations of m and s. Assuming the experimenter required that the square roots of both variances be less than 40% of s, over 50 samples would be needed for a Bruceton test, approximately 40 for a delayed Robbins-Monro test, 41 for a Langlie test, and 23 for the new test.
Unlike the estimation of the parameters discussed previously, there are several very different methods of estimating the confidence intervals for the parameters. Each of these analysis methods gives very different estimates of the confidence intervals for the parameters of the distribution.
The variance function method makes the assumption that the variances of m_{e} and s_{e} can be estimated by simple functions of the sample size and the standard deviation. This function generally depends on the initial conditions, the sample size, and the test design (the type of test: Langlie, Bruceton, etc.).
The simulation method uses simulation to determine the variance of the parameters after the test has been completed. This method can provide reliable estimates of the variances as long as the simulation is carried out with parameterization relevant to the population.
The Cramér-Rao method is used by programs such as ASENT (Mills 1980) and in the calculations of the variance in the Bruceton method (Dixon and Mood 1948). Since this method is widely used, it is discussed further in the next Section.
The simulation discussed in the previous Section shows that the variance of both m_{e} and s_{e} scales approximately with s^{2}. Since s^{2} is not independently known, all of these techniques base their estimates on the maximum likelihood estimate of s, s_{e}. If the successes and failures do not overlap, s_{e} = 0 and these methods fail to produce estimates for confidence regions for both m and s.
The Likelihood Ratio Method, introduced in a later Section, can produce reliable confidence interval estimates in all cases, including this degenerate case.
The Cramér-Rao method of estimating the variances of the parameters uses Equations 3 to 5 with m_{e} and s_{e} substituted for m and s, respectively. The confidence intervals are constructed from the variances assuming that the estimates are normally distributed. If the data do not overlap, then no confidence regions can be established.
Three approximations are used in providing these estimates: the Cramér-Rao theorem is applied where it does not strictly apply (it bounds the variances of unbiased estimates, while the MLEs are biased for small samples), lower bounds are used as estimates, and maximum likelihood estimates are used in the Cramér-Rao theorem instead of the true values. Thus, these estimates of the variances are usually much smaller than the true variances.
The simulation reported previously has shown that, for the sample sizes typically used, the variances are often more than a factor of two larger than suggested by the Cramér-Rao estimates. The simulation also shows that the actual confidence is much less than the nominal confidence given by the analysis method. For example, the true parameters lie outside of a 99% confidence region approximately 26% of the time for a 20-sample Langlie test performed under ideal conditions. Figure 13 shows that the Cramér-Rao method is biased, even for sample sizes as large as 100.
Figure 13: Comparison of CR and Likelihood Ratio Test confidence regions. The CR curves are shown with squares and the Likelihood ratio curves are shown with circles. All tests were conducted under efficient conditions with a sample size of 100.
In spite of these difficulties, the Cramér-Rao methods of analysis have gained wide acceptance. The main advantage of this method is that the calculations are relatively simple to perform. Another advantage of using the Cramér-Rao lower bounds is that the estimates of the variance change with the test design in the same way as the simulated variances do. Thus, one procedure can be used to calculate estimates for all test designs. Various versions of the ASENT program (Mills 1980, Ashcroft 1981, Ashcroft 1987, Neyer 1990a, Spahn 1989), which use the Cramér-Rao method of analysis, are used in the explosive test community.
The Bruceton analysis for the confidence intervals was developed to allow the calculation of confidence intervals by computing simple sums. It is based on the Cramér-Rao theorem, with one further approximation: it calculates lower bounds based on the assumption that the distribution of successes and failures at the various test levels follows the asymptotic distribution. According to Dixon and Mood (1948), this analysis is valid only if the sample size is larger than approximately 50 and the step size is between 0.5 and 2.0 times s. As long as these conditions are met, the Bruceton analysis will yield estimates similar to those of the ASENT programs.
The Sensitivity Likelihood Ratio Test can be used to estimate confidence intervals for sensitivity tests. It can analyze all tests, even those for which there is no overlap. This test is an application of the standard Likelihood Ratio Test (Kendall and Stuart 1967) to analyze sensitivity tests.
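The calculation of likelihood ratio confidence regions is conceptually simple, although the computations themselves are quite complex. Let x stand for all the test levels and their results. A confidence region with confidence level a is composed of all points (m, s) such that
$$ \frac{L(x \mid m, s)}{L(x \mid m_{e}, s_{e})} \ge \lambda_{a}. \tag{6} $$
Since there are two degrees of freedom, -2 ln λ is asymptotically distributed as χ² with two degrees of freedom, whose distribution function is 1 - exp(-x/2); hence λ_{a} = 1 - a. Thus, the 95% confidence region for m and s is composed of all points inside the contour at 5% of the height of the peak.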
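A grid search gives a simple, if slow, way to apply the likelihood ratio criterion. In this sketch the maximum over an (m, s) grid stands in for the MLEs (so it also copes with non-overlapping data, where the maximum is approached as s shrinks toward the smallest grid value), and a point is in the confidence region of level a when its log-likelihood lies within ln(1 - a) of that maximum. The data are the 20 results of Table 1; the grid limits and the names `grid_max` and `in_region` are assumptions of this sketch.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def log_likelihood(data, m, s):
    """Log-likelihood of (level, responded) pairs, constants dropped."""
    total = 0.0
    for x, responded in data:
        z = (x - m) / s
        p = norm_cdf(z) if responded else norm_cdf(-z)
        total += math.log(max(p, 1e-300))  # floor guards against log(0)
    return total


def grid_max(data, m_grid, s_grid):
    """(max log-likelihood, m, s) over the grid: a stand-in for the MLEs."""
    return max((log_likelihood(data, m, s), m, s)
               for m in m_grid for s in s_grid)


def in_region(data, m, s, best_loglik, a):
    """Likelihood ratio criterion with lambda_a = 1 - a (two degrees of freedom)."""
    return log_likelihood(data, m, s) - best_loglik >= math.log(1.0 - a)


TABLE1 = [
    (1.00, False), (1.20, False), (1.40, False), (1.80, False), (2.60, False),
    (4.20, True), (3.40, False), (3.80, False), (4.00, False), (4.10, False),
    (4.28, False), (4.52, False), (5.55, True), (5.24, False), (6.37, True),
    (6.08, False), (7.38, True), (7.09, True), (6.89, True), (6.74, True),
]
M_GRID = [3.0 + 0.05 * i for i in range(101)]  # m from 3 to 8
S_GRID = [0.2 + 0.05 * j for j in range(77)]   # s from 0.2 to 4

best, m_hat, s_hat = grid_max(TABLE1, M_GRID, S_GRID)
print(m_hat, s_hat)
print(in_region(TABLE1, 5.0, 1.0, best, 0.95))  # is (5 A, 1 A) inside the 95% region?
```

Scanning every grid point with `in_region` traces out the whole region; its boundary is the contour at 5% of the peak height for a = 0.95.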
During the simulation discussed previously, one histogram recorded the ratio of the likelihood function evaluated at the true parameters to the likelihood function evaluated at the MLEs. (When there was no overlap of the data, this ratio was divided by four so that it would be similar to the ratio obtained when the two closest levels just touched.) Figures 14 to 16 show the results of the simulation: histograms of the ratio of the likelihood function evaluated at the true parameters to the maximum of the likelihood function, for the seven cases s = (0.1, 0.2, 0.5, 1.0, 2.0, 5.0, and 10.0) s_{guess}.
Figure 14: Cumulative Likelihood Ratio Distribution for the Bruceton test. Looking from left to right and top to bottom, the sample sizes are 20, 30, 50, 100. The legend shows the ratio of the population s to the guess used in the test design, s_{guess}.
Figure 15: Cumulative Likelihood Ratio Distribution for the Langlie test. Looking from left to right and top to bottom, the sample sizes are 20, 30, 50, 100. The legend shows the ratio of the population s to the guess used in the test design, s_{guess}.
Figure 16: Cumulative Likelihood Ratio Distribution for the new test. Looking from left to right and top to bottom, the sample sizes are 20, 30, 50, 100. The legend shows the ratio of the population s to the guess used in the test design, s_{guess}.
As is seen from the figures, even for sample sizes as small as 20, the ratios are close to the asymptotic value shown as the solid line in the figures, as long as the test is efficient (i.e., s ≈ s_{guess}). The ratios approach the asymptotic value as the sample size increases. The only exception is when the test design is very inefficient for large samples (e.g., Bruceton tests when s << s_{guess}). Moderately inefficient tests (e.g., Bruceton and Langlie tests when s >> s_{guess}) approach the asymptotic value, but more slowly than test designs matched to the population. Since the new test is asymptotically efficient regardless of the initial estimates of the parameters, it rapidly approaches the asymptotic values for reasonable sample sizes. Several programs (Neyer 1989m, 1989m1) employ the likelihood ratio test to compute reliable confidence regions.
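The Likelihood Ratio Test can also be applied to test whether two samples were drawn from the same population or from different populations. In this case the ratio of the likelihood functions

$$\lambda = \frac{L(\hat{m}, \hat{s};\, x_1)\, L(\hat{m}, \hat{s};\, x_2)}{L(\hat{m}_1, \hat{s}_1;\, x_1)\, L(\hat{m}_2, \hat{s}_2;\, x_2)} \qquad (7)$$

indicates whether the two sets of parameters are the same or different. The symbols are defined as follows:

- $L(m, s; x_i)$ is the likelihood function of sample $i$.
- $(\hat{m}_i, \hat{s}_i)$ are the MLEs computed from sample $i$ alone.
- $(\hat{m}, \hat{s})$ are the joint MLEs computed from both samples together.

A ratio of $\lambda$ means that one can be $1 - \lambda$ confident that the two samples were drawn from different populations. Several programs (Neyer 1989c, 1989c1) employ this method to test for differences.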
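Equation (7) can be sketched in code under the same assumptions as the single-sample case: normal thresholds, a basic Nelder-Mead search over (m, ln s), and overlapping data in each sample. All names are hypothetical.

The reading of $\lambda$ as a confidence level follows from a parameter count. The separate fits use four parameters and the joint fit uses two. If the populations are identical, the standard asymptotic result therefore puts $-2\ln\lambda$ near a chi-square distribution with two degrees of freedom. Its upper tail is $e^{-t/2}$, so the p-value is approximately $\lambda$ itself.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via erfc (accurate in both tails)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def log_likelihood(m, s, levels, responses):
    """Log-likelihood of go/no-go data under thresholds ~ N(m, s^2)."""
    total = 0.0
    for x, y in zip(levels, responses):
        z = (x - m) / s
        total += math.log(max(norm_cdf(z) if y else norm_cdf(-z), 1e-300))
    return total

def nelder_mead(f, x0, steps, tol=1e-12, max_iter=5000):
    """Minimise f with a basic Nelder-Mead simplex search."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += steps[i]
        simplex.append(vertex)
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(t):  # centroid + t * (worst - centroid)
            return [c + t * (w - c) for c, w in zip(centroid, worst)]

        reflected = along(-1.0)
        f_r = f(reflected)
        if f_r < values[0]:
            expanded = along(-2.0)
            f_e = f(expanded)
            simplex[-1], values[-1] = (expanded, f_e) if f_e < f_r else (reflected, f_r)
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = along(0.5)
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:  # shrink every vertex toward the best one
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (vj - b) for b, vj in zip(best, v)] for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    k = min(range(n + 1), key=values.__getitem__)
    return simplex[k], values[k]

def fit_mle(levels, responses):
    """Return (m_hat, s_hat, maximised log-likelihood); requires overlapping data."""
    fires = [x for x, y in zip(levels, responses) if y]
    fails = [x for x, y in zip(levels, responses) if not y]
    if not fires or not fails or max(fails) <= min(fires):
        raise ValueError("no overlap: finite MLEs do not exist")
    spread = max(levels) - min(levels)
    best, neg_ll = nelder_mead(
        lambda p: -log_likelihood(p[0], math.exp(p[1]), levels, responses),
        [sum(levels) / len(levels), math.log(spread / 4.0)], steps=[spread / 4.0, 0.5])
    return best[0], math.exp(best[1]), -neg_ll

def compare_samples(levels1, responses1, levels2, responses2):
    """Return (lambda, 1 - lambda) for equation (7): separate fits versus one joint fit."""
    _, _, ll_1 = fit_mle(levels1, responses1)
    _, _, ll_2 = fit_mle(levels2, responses2)
    _, _, ll_joint = fit_mle(list(levels1) + list(levels2), list(responses1) + list(responses2))
    # The joint fit cannot exceed the separate fits; clamp tiny optimizer noise.
    lam = math.exp(min(0.0, ll_joint - (ll_1 + ll_2)))
    return lam, 1.0 - lam
```

Here is a usage example with a hypothetical sample and a copy of it shifted upward by 0.5. Comparing the sample with itself should give a ratio of 1 (no evidence of a difference). The shifted copy should give a small ratio, that is, high confidence that the populations differ.

```python
sample = [0.80, 0.90, 1.00, 1.10, 1.20, 0.95, 1.05, 1.15, 0.85, 1.00]
resp = [0, 0, 1, 1, 1, 0, 0, 1, 0, 1]
print(compare_samples(sample, resp, sample, resp))
print(compare_samples(sample, resp, [x + 0.5 for x in sample], resp))
```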
The simulation also computed histograms of the ratio of the product of the two likelihood functions evaluated at their joint MLEs to the product of the likelihood functions evaluated at their individual MLEs. The results, shown in Figures 17 to 19, are similar to those in Figures 14 to 16.
Figure 17: Dual Cumulative Likelihood Ratio Distribution for the Bruceton test. Reading from left to right and top to bottom, the sample sizes are 20, 30, 50, and 100. The legend shows the ratio of the population s to the guess used in the test design, s_{guess}.
Figure 18: Dual Cumulative Likelihood Ratio Distribution for the Langlie test. Reading from left to right and top to bottom, the sample sizes are 20, 30, 50, and 100. The legend shows the ratio of the population s to the guess used in the test design, s_{guess}.
Figure 19: Dual Cumulative Likelihood Ratio Distribution for the new test. Reading from left to right and top to bottom, the sample sizes are 20, 30, 50, and 100. The legend shows the ratio of the population s to the guess used in the test design, s_{guess}.
A new class of sensitivity tests has been proposed. Theoretical analysis suggests, and simulation has shown, that the new test determines the parameters of the distribution more efficiently than any of the previously known tests, under all circumstances tested. With these tests, the experimenter could achieve the same precision and accuracy as with the standard sensitivity tests using a fraction of the sample size. This would significantly reduce both specimen cost and experimenter time.
A new method, based on the likelihood ratio test, is proposed for analyzing sensitivity tests. Simulation shows three properties of the method:

- It can analyze the results of all sensitivity tests, including degenerate results.
- It produces relatively unbiased analysis.
- Its results are independent of the test design.

All three characteristics are advantages over the currently used ASENT programs based on the Cramér-Rao theorem. The method can analyze single tests to produce confidence regions of any desired confidence level. It can also determine whether two samples were drawn from the same or different populations. Its only apparent disadvantage is the significant amount of computation needed to compute confidence regions. However, these computations can be performed on any computer in a short amount of time.
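The following sketch gives a sense of the computation involved in a confidence region. It uses a brute-force grid scan, assuming the normal threshold model and the two-degree-of-freedom cut-off $L(m, s)/L_{max} \ge 1 - c$. It is not the algorithm used in the programs cited above, and the grid ranges and all function names are hypothetical choices. Three caveats apply:

- $L_{max}$ is approximated by the best grid point, so the region is slightly generous.
- Bounds that reach the edge of the grid are truncated by it.
- Ranges for m, s, or a percentile taken from the joint region are conservative projections.

```python
import math
from statistics import NormalDist

def norm_cdf(z):
    """Standard normal CDF via erfc (accurate in both tails)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def log_likelihood(m, s, levels, responses):
    """Log-likelihood of go/no-go data under thresholds ~ N(m, s^2)."""
    total = 0.0
    for x, y in zip(levels, responses):
        z = (x - m) / s
        total += math.log(max(norm_cdf(z) if y else norm_cdf(-z), 1e-300))
    return total

def confidence_region(levels, responses, confidence=0.95, n_grid=121):
    """Grid points (m, s) whose likelihood ratio to the best grid point is >= 1 - confidence.

    m spans the tested levels padded by one spread on each side; s is log-spaced
    from spread/100 to 10*spread (hypothetical grid choices).
    """
    lo, hi = min(levels), max(levels)
    spread = hi - lo
    m_grid = [lo - spread + 3.0 * spread * i / (n_grid - 1) for i in range(n_grid)]
    s_grid = [spread * 10.0 ** (-2.0 + 3.0 * j / (n_grid - 1)) for j in range(n_grid)]
    scored = [(m, s, log_likelihood(m, s, levels, responses)) for m in m_grid for s in s_grid]
    ll_max = max(ll for _, _, ll in scored)
    cutoff = ll_max + math.log(1.0 - confidence)  # ratio >= 1 - confidence
    return [(m, s) for m, s, ll in scored if ll >= cutoff]

def region_bounds(points, q=0.5):
    """Ranges of m, s and of the q-th percentile m + z_q * s over the region."""
    z_q = NormalDist().inv_cdf(q)
    ms = [m for m, _ in points]
    ss = [s for _, s in points]
    xs = [m + z_q * s for m, s in points]
    return (min(ms), max(ms)), (min(ss), max(ss)), (min(xs), max(xs))

levels = [0.80, 0.90, 1.00, 1.10, 1.20, 0.95, 1.05, 1.15, 0.85, 1.00]
responses = [0, 0, 1, 1, 1, 0, 0, 1, 0, 1]
print(region_bounds(confidence_region(levels, responses, 0.95), q=0.999))
```

A grid scan costs many likelihood evaluations. The example uses 121 × 121 grid points over ten specimens, which still takes well under a second on current hardware. Contour-following or root-finding along each parameter would be cheaper alternatives.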
I wish to thank Kathleen Diegert of Sandia National Laboratories, Albuquerque, NM for her help and many suggestions.
EG&G Mound Applied Technologies is operated for the U.S. Department of Energy under Contract No. DE-AC04-88DP43495.
Robert W. Ashcroft (1981), "A desktop computer version of ASENT," Technical Report MHSMP-81-46, Mason and Hanger, Silas Mason Company, Amarillo, Texas, November 1981.
Robert W. Ashcroft (1987), "An IBM PC version of ASENT," Technical Report MHSMP-87-51, Mason and Hanger, Silas Mason Company, Amarillo, Texas, December 1987.
Kali S. Banerjee (1980), "On the Efficiency of Sensitivity Experiments Analyzed by the Maximum Likelihood Estimation Procedure Under the Cumulative Normal Response," Technical Report ARBRL-TR-02269, U.S. Army Armament Research and Development Command, Aberdeen Proving Ground, MD.
C. I. Bliss (1935), "The Calculation of the Dosage-Mortality Curve," Annals of Applied Biology, 22, pp. 134-167.
William G. Cochran and Miles Davis (1965), "The Robbins-Monro Method for Estimating the Median Lethal Dose," Journal of the Royal Statistical Society B, 27, pp. 28-44.
Miles Davis (1971), "Comparison of Sequential Bioassays in Small Samples," Journal of the Royal Statistical Society B, 33, pp. 78-87.
W. J. Dixon and A. M. Mood (1948), "A Method for Obtaining and Analyzing Sensitivity Data," Journal of the American Statistical Association, 43, pp. 109-126.
D. A. Edelman and R. R. Prairie (1966), "A Monte Carlo Evaluation of the Bruceton, Probit, and One-Shot Methods of Sensitivity Testing," Technical Report SC-RR-66-59, Sandia Corporation, Albuquerque, NM.
D. J. Finney (1947), Probit Analysis, A Statistical Treatment of the Sigmoid Response Curve, Cambridge at the University Press, Cambridge, England.
Maurice G. Kendall and Alan Stuart (1967), The Advanced Theory of Statistics, Volume 2, Second Edition, New York: Hafner Publishing Company.
H. J. Langlie (1965), "A Reliability Test Method for 'One-Shot' Items," Technical Report U-1792, Third Edition, Aeronutronic Division of Ford Motor Company, Newport Beach, CA.
D. L. McLeish and D. Tosh (1990), "Sequential Designs in Bioassay," Biometrics, 46, pp. 103-116.
B. E. Mills (1980), "Sensitivity Experiments: A One-Shot Experimental Design and the ASENT Computer Program," SAND80-8216, Sandia Laboratories, Albuquerque, New Mexico.
Barry T. Neyer (1989c1), COMSEN, Version 1.0, National Energy Software Center, Argonne, Illinois.
Barry T. Neyer (1989m1), MUSIG, Version 1.0, National Energy Software Center, Argonne, Illinois.
Neyer Software (1990a), ASENT Program, Version 2.1, Neyer Software, Cincinnati, Ohio.
Neyer Software (1989c), ComSen Program, Version 2.1, Neyer Software, Cincinnati, Ohio.
Neyer Software (1989m), MuSig Program, Version 2.1, Neyer Software, Cincinnati, Ohio.
Herbert Robbins and Sutton Monro (1951), "A Stochastic Approximation Method," Annals of Mathematical Statistics, 22, pp. 400-407.
Mervyn J. Silvapulle (1981), "On the Existence of Maximum Likelihood Estimators for the Binomial Response Models," Journal of the Royal Statistical Society B, 43, pp. 310-313.
Patrick Spahn (1989), LOGIT, Naval Surface Warfare Center, White Oak, Maryland.
C. F. Jeff Wu (1985), "Efficient Sequential Designs with Binary Data," Journal of the American Statistical Association, 80, pp. 974-984.