by Jack Weston for the Pandora Team, Cavendish Laboratory, University of Cambridge
The Pandora multi-algorithm approach to automated pattern recognition affords a powerful framework for event reconstruction in fine-granularity detectors, such as LAr TPCs.
With its origins at the proposed International Linear Collider (ILC) experiment, the Pandora project began in 2007 as a particle flow calorimetry algorithm; that is, an algorithm that reconstructs the paths of individual particles, taking advantage of high-granularity tracking detectors and calorimeters. Since then, Pandora has grown into a flexible software framework for solving generic pattern recognition problems that involve points in time and space, particularly suited to the “photographic quality” images produced by LAr TPCs.
Behind Pandora’s approach to pattern recognition is the multi-algorithm paradigm: the idea that each particular topology can be addressed by one or more relatively straightforward, self-contained algorithms. Under this paradigm, solving a complex pattern recognition problem, such as reconstructing particles in a LAr TPC (Figure 1), requires a sizable chain of such algorithms, whose order of execution is able to exhibit data-dependent nonlinearity and recursion. Some algorithms are very sophisticated, other rather simple; together, they gradually build up a picture of the events. This approach marks a significant departure from the traditional practice of single algorithms for shower finding and track fitting, for example, and promises to provide a robust way of tackling intricate pattern recognition problems. It also affords a rich development environment since many algorithms, each addressing a given topology in a different way, can be safely bound together and work in harmony. This approach is able to provide a more sophisticated and accurate reconstruction than would be reasonably achievable by writing one algorithm alone.
Figure 1: A 3D Pandora reconstruction, projected into the w-view. The input hits are from a Monte Carlo simulation of a 0.8 GeV charged-current νμ event with resonant π0production. The event shows a proton (grey), a muon (red), and two photons (green, blue). The event is visualised using the Pandora event display, where the x axis represents a distance derived from drift time and the w axis a distance derived from the number of the wire recording the ionisation signal.
In a LAr TPC, the readout provides three 2D images of the event within the active detector volume. Each image shares a common coordinate, derived from the drift time. The second coordinate is derived from the number of the wire recording the ionisation signal in a given plane. The reconstruction begins with cautious, track-oriented 2D clustering before using a series of topological-association algorithms that conservatively merge and split the 2D proto-clusters. Considering pairs of 2D clusters, an extensive list of possible 3D vertex positions is produced. A score is calculated for each candidate vertex and the best one is chosen. The 3D vertex position can then be projected back into the readout planes and used, for example, to split the 2D clusters at the projected vertex position if required. To reconstruct 3D tracks, Pandora uses track-matching algorithms to exploit the time-coordinate-overlap between all possible groupings of 2D candidate clusters in different planes to predict the position of the cluster in the third plane. The time-overlap span, the proportion of matching sampling points and a chi-squared value are all neatly stored in a rank-three tensor, where the three indices are the clusters in each of the three views. The tensor is interrogated by a series of algorithm tools which identify any matching ambiguities and make careful changes to the 2D clusters until the tensor is diagonal and the cluster combinations are unambiguous. These matches are stored in ‘particles’, which provide a convenient means for collecting together objects reconstructed in the three readout planes.
Showers can then be reconstructed (in 2D) by attempting to add branches to long clusters that represent shower spines—the showers are grown outwards from a spine by identifying branches, then branches-of-branches, and so on. To synthesize the 2D shower information into 3D showers, the tensor ideas can be re-used: the fitted shower envelopes for each pair of views are used to predict an envelope in the third view, and the enclosed hit fraction in this view is stored in a tensor, along with other cluster-matching properties. An analogous process of tensor-diagonalisation then yields the best 3D matches. Finally, 3D hits are created and the track, shower and vertex information is brought together to produce a full particle hierarchy, where each daughter particle comprises metadata, a list of 2D clusters, a 3D cluster, a 3D interaction vertex, and a list of any further daughter particles (Figure 2).
Such a multi-algorithm approach poses a significant software challenge. The Pandora Software Development Kit provides a software environment designed specifically for this purpose: it offers the basic building-blocks for abstracting high-level structure from a collection of points, such as clusters, as well as the means for users to adapt these building-blocks to suit their problem. It also provides the framework for running the complex, nonlinear chain of algorithms required and automates the handling of the building-blocks, allowing developers’ algorithms to focus on physics rather than technicalities like memory management. This facilitates rapid algorithm development. Instances of the objects in the Event Data Model (EDM) are owned by Pandora Managers, which are responsible for object and named list lifetimes. These Managers automate a complete set of low-level operations that facilitate the high-level operations required by pattern recognition algorithms. The algorithms contain the step-by-step instructions for finding patterns in the input data and use APIs to modify, create or destroy objects and lists. Much of the technical difficulty in writing highly object-oriented algorithms like these is alleviated by Pandora’s provision of APIs able to access and manipulate objects in the Pandora EDM, such as getting a named list of hits or creating new clusters. These APIs can be used in algorithms to interact with Managers and provide common low-level logic in a concise way. Examples of the API calls that would be encountered in algorithms for creating and merging clusters are shown in Figure 3. Such procedures are common to all pattern recognition problems, differing only in the logic that determines the ‘best’ clusters, which is specific to the problem at hand. Pandora also contains a ROOT-based event display that allows the navigation of particle hierarchies, 2D and 3D hits, clusters and vertices, as well as providing invaluable visual debugging possibilities within algorithms.
Pandora works in conjunction with LArSoft by providing the ideal framework for the pattern recognition step. An art producer module creates Pandora instances, and configures and registers algorithms. For each event, hit information is passed from LArSoft into Pandora and, at the end, the Pandora reconstruction output, in the form of PFParticles, is passed back to LArSoft. Whilst LArSoft remains the core framework for the simulation and reconstruction process, this multi-algorithm reconstruction cannot be easily realised using LArSoft alone: Pandora provides the tools necessary for the job, as well as an environment conducive to developing algorithms under this paradigm. All interaction between LArSoft and Pandora is handled by the LArSoft module LArPandora, which serves as their mutual interface as well as translating Pandora’s inputs and outputs into the required formats (Figure 4). This module depends only on the library LArPandoraContent which, in turn, depends only on the Pandora SDK and visualisation libraries.
At MicroBooNE, two different algorithm chains are used: PandoraCosmic, optimised for the reconstruction of cosmic rays and their daughter delta rays, and PandoraNu, optimised for the reconstruction of neutrino events. By running the PandoraCosmic chain first, cosmic rays can be identified and removed, providing the input for the PandoraNu reconstruction. Starting by manipulating the three sets of hits in 2D before synthesising these into 3D structures, Pandora gradually builds up a picture of the event through clustering, vertexing, and track and shower reconstruction. The full reconstruction currently consists of nearly 80+ different algorithms that, together, abstract from the input 2D hits an information-rich 3D particle hierarchy, including information like particle types, vertices and directions.
At the upcoming DUNE experiment, the detector will comprise multiple LAr TPC detector volumes, each of which will provide two or three 2D images. Events may cross the boundaries between volumes, posing a new challenge for the reconstruction. To solve this problem, a Pandora instance is created for each volume, which provides a MicroBooNE-like reconstruction of that detector region. Bringing together these independent reconstructions, logic is then required to decide which particles to stitch together across gaps—and to determine which particle is the ‘parent’. While algorithm re-optimisation to suit the different beam spectrum is required, the reusable Pandora interfaces ensure that MicroBooNE developments are directly applicable to DUNE.
Pandora provides a flexible development platform for developing, visualising and testing pattern recognition algorithms. The LArPandoraContent library offers an advanced and continually improving reconstruction of cosmic ray and neutrino-induced events in LAr TPCs and is used by the MicroBooNE and DUNE experiments. Pandora’s GitHub page can be found at https://github.com/PandoraPFA, with an accompanying paper describing the SDK on arXiv: http://arxiv.org/pdf/1506.05348v2.pdf (EPJC 75:439).