Offline Leads – Feb. 22, 2023
Attendees: Giuseppe Cerati, Erica Snider, Katherine Lato
LArSoft report
- SciSoft / LArSoft effort
- Expected gains in effort from the Division, as suggested earlier by CSAID management, will probably not materialize, so the Division is unlikely to have additional effort to apply to SciSoft support
- Both Saba Sehrish and Marc Paterno are completing project work that has occupied their attention for the past few years, and will have time to devote to LArSoft
- Saba will pick up the multi-threading work that she was pursuing earlier. She will talk to Tracy and Giuseppe about the ICARUS production workflows that were the focus of that work. In the longer term, she is interested in expanding GPU processing capabilities within LArSoft
- Marc is interested in GPU programming, so he will be looking for algorithms that lend themselves to GPU solutions and writing code for them.
- Saba’s GPU interests may complement this, since she might be interested in enabling GPU applications in a production setting
- Both are needed to advance GPU usage
- LArSoft 2023 work plan update
- Multi-threading
- Mike is continuing to make progress on the DUNE SN pipeline. He is currently working to fix issues in TrajCluster, which uses globals extensively, and has fixed a number of other bugs along the way.
- Geometry changes to accommodate pixel detector readouts
- Tom Junk, Tingjun Yang, and Kyle Knoepfel have been meeting regularly to discuss the Geometry changes needed to accommodate pixels, which mostly involve separating readout geometry elements from those pertaining to physical volumes and planes.
- There has been considerable progress on this work. Kyle’s presentation at the last Coordination Meeting describes the state of his most recent work, which involves refactoring the ChannelMap and GeometryCore classes to break circular runtime dependencies. This work is nearing completion.
- Remaining steps include removing WireGeo objects from PlaneGeo (since the former represents readout geometry while the latter represents a physical structure); introducing readout geometry classes, where a new wire geometry class would complete the refactoring of wire readout descriptions from the physical geometry; and providing a pixel readout geometry class, where we already have a draft class from Tom Junk.
- Discussion
- How visible will these changes be to the experiments? Do not want to be in a position where they need to branch from the mainline develop branch too soon in order to avoid disruption to production schedules.
- A: the changes to iteration patterns will be visible, but have already been incorporated into the code. The biggest change will be that all readout geometry questions will need to be redirected from GeometryCore to a new class that depends on the type of readout you are using. So existing code that uses wires will need to adapt to that change.
- SciSoft will make appropriate changes to experiment repositories. We will not be able to automate some of the changes, but we can provide scripts to flag what we can’t change in private code
- SciSoft will work closely with the experiments as this rollout takes place.
SBN Data/Infrastructure – Giuseppe Cerati
- Machine learning
- Suggested that the priority of integrating machine learning into LArSoft should be raised significantly. Seems a missing piece, especially in context of GPUs.
- Noted NuSonic provides GPU capability for inference problems within LArSoft
- Lisa Goodenough reached out and offered help with using NuSonic. Some people have expressed concern about a lack of provenance information with NuSonic, e.g., which version of TensorFlow was run. The inference is shipped to something external, so it is not clear which version was run. In Giuseppe’s opinion, this is less of a problem than not having a solution at all.
- Perhaps we can define conventions on information to return with the job. All of the inferencing configuration should be specified as part of the art configuration in sufficient detail to allow replication. Then we only need to focus on the information that needs to be retrieved.
- Memory management
- Geant4 simulation is the largest memory consumer for ICARUS – typically over 8 GB, which requires 5 slots. Occasionally even 6 slots are required.
- SciSoft has informed them that using the new LArG4 interface will reduce memory consumption, so completing that migration is becoming a higher priority. The lack of staffing is an issue.
- SciSoft can offer mostly consulting help, but it is good to keep us in the loop in case there are places we can more directly help.
- Another approach is to drop art objects that are not being used. Gianluca recently pointed out that dropping data products on input works to reduce memory, but unused transient data products are retained until the end of the event. He posed the question: can we drop transient products in the middle of the job to help with in-event memory management?
- This is outlined in an email from Feb 7 to Kyle and Erica.
- Will respond.
Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.