
We have developed an algorithm for nonparametric fitting and extraction of statistically significant peaks in the presence of statistical and systematic uncertainties. Applications of this algorithm to the analysis of high-energy collision data are discussed. In particular, we illustrate how to use this algorithm in general searches for new physics in invariant-mass spectra using

Searching for peaks in particle spectra is becoming an increasingly common task at the Large Hadron Collider (LHC), whose physics program focuses on new physics at the TeV scale and beyond. Bump searches can be performed either in single-particle (such as

The task of finding bumps is closely related to the task of determining a correct background shape using theoretical or known cross sections. However, theoretical predictions can be rather uncertain in the regions of interest, difficult to use for background simulation, or entirely nonexistent. Even for a simple jet-jet invariant mass, finding an analytical function that describes the QCD-driven background, which spans many orders of magnitude, and that can be used to extract a possible excess of events due to new physics requires careful examination. Attempts to fit two-jet and three-jet invariant masses have been discussed in CMS [

One technically attractive approach is to find a nonparametric way to extract statistically significant peaks without

The closest peak-search approach for high-energy-physics applications has been developed for studies of

For example, the ROOT analysis framework [

In high-energy collisions, a typical standard-model background distribution has a falling shape spanning many orders of magnitude in event counts. A typical example is the jet-jet invariant mass used for new particle searches [

The above discussion motivates a nonparametric method of background estimation, combined with a peak-extraction mechanism suited to high-energy collision distributions such as invariant masses. The algorithm should take into account the discrete nature of the input distributions and their uncertainties. The proposed algorithm is less ambiguous than smoothing methods (such as that used in ROOT [

For the reasons discussed above, the program called Nonparametric Peak Finder (NPFinder) was developed using a numerical, iterative approach to detect statistically significant peaks in event-counting distributions. In short, NPFinder iterates through the bins of input histograms and, using only one sensitivity parameter, determines the location and statistical significance of possible peaks. Unlike known smoothing algorithms, this method does not focus on how to smooth the data before extracting peaks; instead, it extracts peaks by comparing neighboring points and calls what is left over the “background.” Below we discuss the major elements of this algorithm, then illustrate its performance and discuss its limitations and possible improvements.

For each point

NPFinder continues to walk over data points until

A graphical illustration of the NPFinder algorithm. Each data point is characterized by a coordinate
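The walk over neighboring points described above can be illustrated with a short Python sketch (Python being the language NPFinder itself is written in). The rule used here to open and close a peak candidate is an assumption made for illustration, not NPFinder's actual criterion: a candidate opens when the next point rises by more than `sensitivity` combined standard deviations, climbs to a local maximum, follows the trailing edge while the data decrease, and is kept only if the drop from the maximum is also significant. The names `find_peak_candidates` and `_combined_sigma` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _combined_sigma(err: Sequence[float], a: int, b: int) -> float:
    """Uncertainty of the difference between bins a and b (added in quadrature)."""
    return math.hypot(err[a], err[b])


def find_peak_candidates(y: Sequence[float], err: Sequence[float],
                         sensitivity: float) -> List[Tuple[int, int]]:
    """Return (first, last) bin indices of peak candidates.

    Hypothetical rule (not taken from NPFinder itself):
      * a candidate starts at bin i if y[i+1] exceeds y[i] by more than
        `sensitivity` combined standard deviations;
      * the walk then climbs while the data do not decrease, and descends
        while they strictly decrease;
      * the candidate is kept if the drop from its maximum to its last bin
        also exceeds `sensitivity` combined standard deviations.
    """
    peaks: List[Tuple[int, int]] = []
    n = len(y)
    i = 0
    while i < n - 1:
        if y[i + 1] - y[i] <= sensitivity * _combined_sigma(err, i, i + 1):
            i += 1  # no significant rise: move on to the next point
            continue
        start, j = i, i + 1
        while j < n - 1 and y[j + 1] >= y[j]:  # climb to the local maximum
            j += 1
        top = j
        while j < n - 1 and y[j + 1] < y[j]:  # follow the trailing edge
            j += 1
        if y[top] - y[j] > sensitivity * _combined_sigma(err, top, j):
            peaks.append((start, j))
        i = j  # resume the walk from the end of this candidate
    return peaks
```

For example, with `y = [10, 10, 10, 30, 50, 30, 10, 11, 10, 10]`, Poisson errors `[math.sqrt(v) for v in y]` and `sensitivity = 2.0`, the sketch returns `[(2, 6)]`. The single `sensitivity` argument mirrors the one sensitivity parameter mentioned in the text, although its precise role in NPFinder is not reproduced here.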

After detecting all peak candidates, NPFinder iterates through the list of possible peaks in order to form a background for each peak. This is achieved by performing a linear regression of points between the first and last points in the peak, that is, applying the function
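A minimal sketch of the background under a single candidate, assuming that the background is the straight line joining the first and last points of the peak. This is one reading of the description above; the actual function applied by NPFinder is not reproduced, and `linear_background` is a hypothetical name.

```python
from typing import List, Sequence


def linear_background(x: Sequence[float], y: Sequence[float],
                      first: int, last: int) -> List[float]:
    """Background under a peak: the straight line through (x[first], y[first])
    and (x[last], y[last]), evaluated at every bin from first to last.

    Assumption: an endpoint line is used here for illustration only.
    """
    if x[last] == x[first]:
        raise ValueError("first and last points must have different x")
    slope = (y[last] - y[first]) / (x[last] - x[first])
    return [y[first] + slope * (x[k] - x[first]) for k in range(first, last + 1)]
```

By construction the line passes through both end points, so only the interior bins of the peak can contribute an excess over this background.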

It should be mentioned that the peak-finding technique considered above is somewhat similar to that discussed for

Finally, NPFinder uses the background points to calculate the statistical significance of each peak in a given histogram. This is done by summing up the differences
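A hedged sketch of a significance estimate of this kind, assuming that the summed differences between the data and the background are divided by the per-bin uncertainties added in quadrature. This normalization and the function name `peak_significance` are illustrative assumptions; NPFinder's exact prescription may differ, for example in how systematic uncertainties enter.

```python
import math
from typing import Sequence


def peak_significance(y: Sequence[float], background: Sequence[float],
                      err: Sequence[float]) -> float:
    """Summed excess of data over background divided by its uncertainty.

    Assumption: the per-bin uncertainties `err` are independent and are
    added in quadrature.
    """
    excess = sum(yk - bk for yk, bk in zip(y, background))
    sigma = math.sqrt(sum(e * e for e in err))
    if sigma <= 0:
        raise ValueError("total uncertainty must be positive")
    return excess / sigma
```

For the bins `[10, 30, 50, 30, 10]` over a flat background of 10 with Poisson errors, the excess is 80 and the combined uncertainty is sqrt(130), giving a significance of about 7.0 under these assumptions.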

Below we illustrate the above approach by generating fully inclusive ^{−1}. Jets are reconstructed with the anti-

Next, several artificial peaks were generated using Gaussian distributions with different peak positions and widths and were added to the original background histogram. Figure

Invariant mass of two jets generated with the PYTHIA Monte Carlo model. The peaks seen in this figure were added using Gaussian distributions with different widths and peak values (see the text). The peaks are found using the NPFinder algorithm, which also estimates their statistical significances as discussed in the text.
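The injection of artificial peaks described above can be sketched as follows: events are drawn from a Gaussian with a chosen mean and width and filled into a copy of the background histogram. The exponentially falling toy background, the bin edges, and the function name `add_gaussian_peaks` are hypothetical stand-ins for the PYTHIA dijet histogram used in the text.

```python
import bisect
import math
import random
from typing import List, Sequence, Tuple


def add_gaussian_peaks(background: Sequence[float], edges: Sequence[float],
                       peaks: Sequence[Tuple[int, float, float]],
                       seed: int = 12345) -> List[float]:
    """Return a copy of `background` with Gaussian-distributed events added.

    `edges` holds len(background) + 1 bin edges; each entry of `peaks` is
    (n_events, mean, width). Events outside the histogram range are dropped.
    """
    if len(edges) != len(background) + 1:
        raise ValueError("need one more edge than bins")
    rng = random.Random(seed)  # fixed seed for reproducibility
    counts = list(background)
    for n_events, mean, width in peaks:
        for _ in range(n_events):
            value = rng.gauss(mean, width)
            k = bisect.bisect_right(edges, value) - 1  # bin containing value
            if 0 <= k < len(counts):
                counts[k] += 1
    return counts


# Hypothetical toy: exponentially falling background in 20 unit-width bins
edges = [float(k) for k in range(21)]
background = [1.0e4 * math.exp(-0.4 * k) for k in range(20)]
spectrum = add_gaussian_peaks(background, edges, [(500, 10.0, 1.0)])
excess = [s - b for s, b in zip(spectrum, background)]
```

Here `excess` contains only the injected events, which makes it straightforward to check whether a peak finder recovers them at the expected position.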

For comparison, the same distribution was used to test the TS

It should be noted that the statistical significance of a peak obtained with the proposed nonparametric method might be smaller than that calculated using more conventional approaches, such as those based on a

It should be noted that there is a correlation between the peak width and the input parameter

In conclusion, a peak-detection algorithm has been developed that can be used to extract statistically significant peaks in event-counting distributions, taking into account statistical (and potentially systematic) uncertainties. The method can be used for new physics searches in high-energy particle experiments, where a correct treatment of such uncertainties is one of the most important issues. The nonparametric peak finder has only one free parameter, which is largely independent of the input background distribution. The algorithm was tested on a simulated dijet invariant-mass distribution with added Gaussian peaks and was found to perform well. The code is implemented in the Python programming language, with graphical output using either ROOT (C++) [

The authors would like to thank J. Proudfoot for discussion and comments. The submitted paper has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory (Argonne). Argonne, a U.S. Department of Energy Office of Science Laboratory, is operated under Contract no. DE-AC02-06CH11357.
