Category Archives: leads

May 2023 Meeting Notes

Offline Leads Meeting, May 30

Attendees: Erica Snider, Tom Junk, Tracy Usher, Miquel Nebot, Tingjun Yang, Steven Gardiner, Katherine Lato

LArSoft Report: Erica

  • Work plan update
    • Saba Sehrish has been working to pick up the work to make services thread safe. 
      • She had completed a number of changes, but was pulled away before she was able to commit them to the head of develop. She has nearly finished updating all the relevant feature branches so that these can be merged.
      • Most important priority is to address any outstanding services that access database, and to make sure they use the concurrent caching currently available in LArSoft. 
    • Spack migration timelines are still uncertain. 
      • The Spack team is confident that major changes to Spack aren’t needed, but is not yet sure whether the current implementation can support what is needed, hence the uncertainty in the timeline. 
      • One point they emphasized is that we will be able to support builds under Spack long before we have a complete development environment. 
    • Pixel detectors within LArSoft. 
      • There has been steady progress refactoring the geometry service as needed, and this work is nearly completed. 
      • Once that is done, the current geometry will be implemented via two separate components, a volume geometry hierarchy, and a readout geometry description. The latter knows about the former, but not vice versa. 
      • The next step will be to implement a pixel readout geometry description. A draft of this already exists. Full implementation will require working with people from DUNE ND LAr.
    • No progress on neutrino event refactoring. 
  • AL9 support
    • Current CSAID plan is to not spend the effort to bring UPS forward to AL9. If this is an issue, talk to your Spokespeople and have them bring it to the attention of CSAID management. 
  • Compilers
    • Recently included the possibility to build under clang 14 (c14) and gcc 12.1 (e26) starting with LArSoft v09_74_01 released on May 5. Default builds remain unchanged.
    • c14 and e26 will become the default qualifiers when we migrate to art 3.13 (which will be soon).
  • Experiments:  please remember that keeping experiment code in pace with integration releases is necessary in order to have the benefit of the CI system for code updates and release validation. 

 

DUNE – Tom Junk

  • Release information, we’re at 9.75 at DUNE. 
  • One of our Spack customers is sending emails. Misner? He wants to run stuff on __ LArCP3 and link it with. We may have to upgrade DUNE’s software Spack. There were some weird warnings. We contacted original developers, who reassured us that they were valid. 

SBND – Miquel

  • Worked through the update from the last release. Resolved an issue with external dependencies. They were holding us back, but they are aware.

ArgoNeuT – Tingjun

  • The person who has been making updates has a new position, so won’t be continuing. Management says to drop ArgoNeuT support. If anyone wants to do work, they will have to do the updating. We believe there is no active work being done.
  • So, ArgoNeuT is in the same position as LArIAT. There haven’t been any LArIAT releases for a long time.
  • Thanks for the support for so many years.
  • Probably makes sense to remove them from the CI tests. 

SBN: Steven

  • Miquel covered some. Do you want a message to take to MicroBooNE?
  • In response to Steven’s request for more information on Neutrino refactoring, Erica explained that LArSoft is built against a particular version of GENIE. The same is true of every generator. The only way to break that connection is to produce a text file from one generator. The idea was to break that dependency so that you could change the version of GENIE at run-time and not have LArSoft built against a particular one. That would be a model for all other generators that are built within LArSoft. Experiments pretty closely track each other for what version they use, but there’s no reason why we should have to coordinate that through LArSoft. The text file is difficult to deal with. Working with Robert Hatcher to do that. Developed a plan at the beginning of last year. It looked good. It’s been sitting ever since. Tried to get him to pick it up recently. He may have taken it back to GENIE management. 
  • Steven may follow up with Robert directly.

ICARUS – Tracy:

  • We are making progress for converting to the new LArG4. Converting to the geometry. Giuseppe picked it up recently and he’s been making progress. General schedule has a large simulation/reconstruction in fall with SBND. Targeting to have the work done at the end of summer. Giuseppe needs to work with Hans.
  • We are impacted by space charge stuff. Erica: I’ll try to get a meeting this week.
  • We have a soft production release. The main production is going to be done in fall. What we’re doing now is what we think we’ll do for the summer.
  • ICARUS workshop in 3 weeks, if we can find a room at Fermilab. 
  • We’re making good progress. ICARUS has its own challenges with the horizontal wires and other things, but we’re starting to make good progress.
  • Unified event display – continued lament. Titus would be good, but it doesn’t display what ICARUS needs. They don’t have the cycles to fix it themselves. All along it’s been ‘made to work’ which makes it difficult to understand. ICARUS breaks everything. 
    • Erica:  heard. Currently have no one available to work on this. The project continues to lobby for the effort needed.

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

April 2023 Meeting Notes

The April 2023 LArSoft Offline Leads status update was handled via email and a Google document. 

LArSoft – Erica Snider

  • Geometry refactoring to accommodate pixels
    • Have made good progress in removing the detector-specific readout geometry description from the generic detector volume description. Final elements of the design are completed, and the work to refactor the wire readout geometry description is at an advanced stage (Kyle Knoepfel). We expect to start testing this new geometry within the next two weeks. Once completed, a pixel readout geometry will be introduced based on an existing prototype written by Tom Junk. 
  • Thread safety work status
    • Saba Sehrish has picked up work she performed over a year ago on making LArSoft services thread safe. This work included introducing concurrent caching needed to make services that access databases thread safe. She began her current effort by working to bring forward the feature branches of that previous body of work to the head of develop. She will then turn to the service changes needed by ICARUS to make their production workflow thread safe.
  • art 3.12 update
    • We expect to be ready to migrate to art 3.12 on the time scale of a week or two. There will be some changes to user code needed. Kyle Knoepfel will talk about these at the April 18 LCM.
  • AL9 support
    • LArSoft will be supported under AL9 as SL7 enters end of life. In preparation for this, the SciSoft team has acquired the necessary infrastructure (build and CI nodes) to begin working on this support. An important point to take note of is that CSAID does not now support UPS under AL9, and has no plans to do so in the future. Consequently, the migration to the Spack-based build and development environments must be completed prior to providing full support for AL9. Since the timeline for the Spack migration is not yet known, however, we do not know when full support for AL9 will be available. We will keep the community informed as the situation changes.
  • Multi-threading workshop
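
As a rough illustration of the concurrent caching mentioned under the thread safety work status above, the sketch below shows one way a cache can let many threads request the same database-backed value while only one of them performs the lookup. It is written in Python for brevity and is not the art or LArSoft implementation; `ConcurrentCache`, `fake_database_lookup`, and the per-run "gain" payload are hypothetical names invented for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ConcurrentCache:
    """Thread-safe cache in which each key is loaded at most once."""

    def __init__(self, loader):
        self._loader = loader          # hypothetical stand-in for a database query
        self._lock = threading.Lock()  # guards _values and _key_locks
        self._values = {}
        self._key_locks = {}

    def get(self, key):
        # Fast path: return a cached value, or fetch the lock for this key.
        with self._lock:
            if key in self._values:
                return self._values[key]
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        # Only one thread per key runs the loader; the others wait here.
        with key_lock:
            with self._lock:
                if key in self._values:
                    return self._values[key]
            value = self._loader(key)
            with self._lock:
                self._values[key] = value
            return value


load_calls = []


def fake_database_lookup(run):
    """Pretend database query; records each call so loads can be counted."""
    load_calls.append(run)
    return {"run": run, "gain": 1.0 + 0.01 * run}


cache = ConcurrentCache(fake_database_lookup)
runs = [1, 2, 1, 2, 1, 3] * 50
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(cache.get, runs))
```

With 300 requests spread over eight threads, the loader runs once per distinct run number. The per-key lock is a design choice in this sketch: it keeps a slow lookup for one key from blocking requests for other keys.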

DUNE – Heidi Schellman, Tingjun Yang, Michael Kirby, Tom Junk

Just to make people aware – Shekhar Mishra has been attempting to build LArSoft and dunesw on AL9.1, as part of an AI/ML project for ICEBERG.  He says he has been able to build LArSoft on AL9.1 and is working on dunesw.  Not all dependent products of dunesw are built with mrb or have known build shims, such as duneanaobj and srproxy.  Some products have hand-built UPS products – dunedaqdataformats, dunedetdataformats, highfive and nlohmann_json.  The hand-built UPS products do not involve compiled code – they are header-only libraries and should just copy over to Shekhar’s setup.  We have been answering Shekhar’s questions but not actually doing work to support this path.  We have suggested using SL7 in a container on his AL9 machine as an interim solution.

ICARUS – Daniele Gibin, Tracy Usher

No Report

LArIAT – Jonathan Asaadi

No Report

MicroBooNE – Herbert Greenlee

SBND – Andrzej Szelc

No Report

SBN Data/Infrastructure – Steven J. Gardiner, Giuseppe B. Cerati

No Report

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

February 2023 Meeting Notes

Offline Leads – Feb. 22, 2023

Attendees: Giuseppe Cerati, Erica Snider, Katherine Lato

LArSoft report

  • SciSoft / LArSoft effort
    1. Expected gains in effort from the Division, as suggested earlier from CSAID management, will probably not materialize, so the Division is unlikely to have additional effort to apply to SciSoft support
    2. Both Saba Sehrish and Marc Paterno are completing project work that has occupied their attention for the past few years, and will have time to devote to LArSoft
    3. Saba will pick up multi-threading work that she was pursuing earlier. Will talk to Tracy and Giuseppe about the ICARUS production workflows that were the focus of that. In the longer term, has an interest in working on expanding GPU processing capabilities within LArSoft
    4. Marc has interest in GPU programming, so will be looking for algorithms that lend themselves to GPU solutions and writing code for that.
      • Saba’s GPU interests may complement this in that she might have interest in enabling GPU applications in a production setting
      • Both are needed to advance GPU usage
  • LArSoft 2023 work plan update
    1. Multi-threading
      • Mike is continuing to make progress on DUNE SN pipeline. Currently working to fix issues in TrajCluster, which uses globals extensively. Has fixed a number of other bugs along the way. 
    2. Geometry changes to accommodate pixel detector readouts
      • Tom Junk, Tingjun Yang, and Kyle Knoepfel have been meeting regularly to discuss Geometry changes needed to accommodate pixels, which mostly involves refactoring readout geometry elements from those pertaining to physical volumes and planes.
      • There has been considerable progress on this work. Kyle’s presentation at the last Coordination Meeting describes the state of his most current work, which involves re-factoring of the ChannelMap and GeometryCore classes to break circular runtime dependencies. This work is nearing completion.
      • Remaining steps include removing WireGeo objects from PlaneGeo (since the former represents readout geometry while the latter represents a physical structure); introducing readout geometry classes, where a new wire geometry class would complete the refactoring of wire readout descriptions from the physical geometry; and providing a pixel readout geometry class, where we already have a draft class from Tom Junk.
      • Discussion
        • How visible will these changes be to the experiments?  Do not want to be in a position where they need to branch from the mainline develop branch too soon in order to avoid disruption to production schedules.
          • A:  the changes to iteration patterns will be visible, but have already been incorporated into the code. The biggest change will be that all readout geometry questions will need to be redirected from GeometryCore to a new class that depends on the type of readout you are using. So existing code that uses wires will need to adapt to that change.
          • SciSoft will make appropriate changes to experiment repositories. We will not be able to automate some of the changes, but we can provide scripts to flag what we can’t change in private code
          • SciSoft will work closely with the experiments as this rollout takes place.
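
To make the planned split concrete, here is a minimal sketch of the dependency direction discussed above: a volume description that knows nothing about readout, and readout descriptions that hold a reference to it. It is written in Python for brevity and does not reproduce the actual GeometryCore, ChannelMap, or draft pixel classes; `VolumeGeometry`, `WireReadout`, `PixelReadout`, and all of their fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeGeometry:
    """Physical volumes only; has no knowledge of how they are read out."""
    tpc_widths_cm: tuple  # hypothetical: one drift width per TPC

    def n_tpcs(self):
        return len(self.tpc_widths_cm)


class WireReadout:
    """Wire-based readout description; depends on the volume geometry."""

    def __init__(self, volumes, planes_per_tpc, wires_per_plane):
        self.volumes = volumes
        self.planes_per_tpc = planes_per_tpc
        self.wires_per_plane = wires_per_plane

    def n_channels(self):
        return self.volumes.n_tpcs() * self.planes_per_tpc * self.wires_per_plane


class PixelReadout:
    """Pixel-based readout description; same dependency direction."""

    def __init__(self, volumes, pixels_per_tpc):
        self.volumes = volumes
        self.pixels_per_tpc = pixels_per_tpc

    def n_channels(self):
        return self.volumes.n_tpcs() * self.pixels_per_tpc


volumes = VolumeGeometry(tpc_widths_cm=(360.0, 360.0))
wire_readout = WireReadout(volumes, planes_per_tpc=3, wires_per_plane=1000)
pixel_readout = PixelReadout(volumes, pixels_per_tpc=50000)
```

Code that asks readout questions would go to `wire_readout` or `pixel_readout` rather than to `volumes`, which mirrors the redirection away from GeometryCore described in the discussion. The same `volumes` object serves both readouts because it carries no readout state.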

SBN Data/Infrastructure – Giuseppe Cerati

  • Machine learning
    • Suggested that the priority of integrating machine learning into LArSoft should be raised significantly. Seems a missing piece, especially in context of GPUs.
      • Noted NuSonic provides GPU capability for inference problems within LArSoft
    • Lisa Goodenough reached out and offered help to use NuSonic. Some people have expressed concern about a lack of provenance information with NuSonic, e.g., what version of TensorFlow was being run: the inference is shipped to something external, so one isn’t sure what version was run. In Giuseppe’s opinion, this is less of a problem than not having a solution at all.
      • Perhaps we can define conventions on information to return with the job. All of the inferencing configuration should be specified as part of the art configuration in sufficient detail to allow replication. So then just need to focus on information that needs to be retrieved.
  • Memory management
    1. Geant4 simulation is largest memory consumer for ICARUS – typically over 8 GB, which requires 5 slots. Occasionally even 6 slots are required.
    2. SciSoft has informed them that using the new LArG4 interface will reduce memory consumption, so completing that migration is becoming a higher priority. The lack of staffing is an issue.
      • SciSoft can offer mostly consulting help, but it is good to keep us in the loop in case there are places we can more directly help.
    3. Another approach is to drop art objects that are not being used. Gianluca recently pointed out that dropping data products on input works to reduce memory, but unused transient data products are retained until the end of the event. Posed the question, can we drop transient products in the middle of the job to help with in-event memory management?
      • This is outlined in an email from Feb 7 to Kyle and Erica.
      • Will respond.

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

October 2022 Meeting Notes

Offline Leads Meeting Notes 10/12/22

Attendees: Tom Junk, Wes Ketchum, Giuseppe Cerati, Tingjun Yang, Erica Snider, Katherine Lato

As planned, the majority of the meeting was to discuss the multithreading LArSoft Workshop. While it is summarized here for convenience, the latest information can be found in the Multithreading LArSoft Workshop Information document.

Multithreading LArSoft Workshop Information 

Timeframe: February 2023. Two possible dates proposed:

  1. Feb. 6th and 7th (Monday pm, Tuesday all day)  or 
  2. Feb. 16 and 17th (Thursday all day,  Friday morning)
  • Need to determine availability of relevant domain experts (MT + experiment)

Tentative agenda: 

  • Presentations – half day (see “Known topics to discuss” below)
  • Working groups (in parallel with expert help as needed) – half day (+ 2 hrs )
  • Lessons learned/Future plans – 2 hrs

Specific, candidate pieces of code or workflows for which we could use the workshop to implement and/or design multi-threaded or thread-safe solutions that experiments think would help to have in core LArSoft:

  • Services that are needed to run workflows in MT mode. (Known issues with some services that access databases.) 
  • Detector response simulation code that is used in the new LArG4 framework
  • Photon visibility library calculation that uses GPU for single-photon tracking
  • Interfaces to the data source (eg, for code in a thread accessing an event segment)
  • 1D deconvolution (which some experiments still use for the time being) (DUNE not interested in new 1D workflows and would like to move away from existing ones)

It might also be helpful in planning for a workshop to have the experiments present on what they have done to effect thread-safety/multi-threading. We can arrange a corresponding presentation with a brief overview of multi-threading in art, and the components of LArSoft that are already thread-safe / thread-efficient, and which are works in progress. Would this be useful in advance of a workshop?

Known topics to discuss

  • Multi-threading in WireCell and LArSoft integration (Wes is talking to WC)
  • Multi-threading / thread safety in Pandora (Wes is reaching out to Pandora)
  • Photon visibility in ICARUS / SBN (?)  (Wes is talking to Diego)
  • ICARUS workflows
  • Status of thread-safety / multi-threading in LArSoft
    • Services
    • Hit finding
    • Geant4 / LArG4
  • DUNE event serialization / parallelization (See “Comments from David Adams regarding DUNE’s use cases” in the Multithreading LArSoft Workshop Information document)
  • Any other round-table experiment presentations

Please let us know if you have any comments or corrections.

Erica & Katherine

August 2022 Meeting Notes

Offline Leads Meeting Notes August 31st

Attendees: Herbert Greenlee, Tingjun Yang, Tracy Usher, Erica Snider, Katherine Lato

LArSoft Update:

  1. Clang-format for repositories

We want to move LArSoft repositories to a standard clang-format to have a standard across repositories. Kyle produced some examples. https://github.com/knoepfel/larreco/tree/clang-format-example Please look at the example and provide feedback.

Herb: I assume there are a ton of badly formatted files. Would these be cleaned up?

Erica: Yes. We envision there would be a number of changes in each repository that don’t change the code, just the format. Then just need to be disciplined so that real changes can be seen.

Herb: Is this run automatically?


Erica: There’s a codechecks phase in the auto-CI workflow triggered by GitHub, and running clang-format has been proposed to be part of that. Details haven’t been discussed. Codechecks fixing it for you is one possible mode we could use, but originally envisioned just as a check. If people wanted to do it manually, that would be great. Would you rather have it be automatic?

Herb: I don’t have a strong opinion either way. It’s probably a good idea to have a standard format. 

Herb: New topic:  Any thoughts about enforcing ‘consumes’ and ‘produces’ in art modules?

Erica: ‘produces’ has been enforced for some time. [after the meeting]  ‘consumes’ has been recommended for a long time, but is not enforced by default. One can enable it with a command line option:  

lar -c <my config> --errorOnMissingConsumes=true …

So MicroBooNE (or any experiment) could choose to enforce this independently of anyone else. 

There is also an option to find out what ‘consumes’ statements are missing:

lar -c <my configs> -M mt_diagnostics.txt …

  2. LArSoft Multi-threading workshop

  • https://docs.google.com/document/d/1QCo5GUQ5Js9iU8iodZzXvkK2uNjdQC8viDGHPg5xQgM/edit 
  • Seems to be some consensus that this could be a good thing, but still do not have a clear picture of the goals or program. If we are to proceed with this, we need some follow-up from experiments regarding what they want from the workshop, and what problems we should try to address during the working time. Please try to provide feedback on that.

  3. On-going multi-threading work

  • Mike Wang continues to find differences in the output of the DUNE workflow he is working on depending upon whether it is executed in single-threaded or multi-threaded mode. He is working to understand the cause. Previously found an art bug that caused event number mappings to change.

Herb: What kind of multi-threading is he doing?

Erica: A couple of types. Events are being run in parallel using the multi-threading features of art, and hits are being reconstructed within each event using multi-threading in the GausHitFinder code. Both art and GausHitFinder use TBB.
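
The two levels of parallelism described here, and the single-threaded versus multi-threaded comparison Mike Wang is making, can be sketched as follows. This is an illustration only: it uses Python thread pools rather than art or TBB, and `find_hits`, `reconstruct_event`, the threshold, and the toy waveforms are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def find_hits(waveform, threshold=5):
    """Toy hit finder: indices of samples strictly above threshold."""
    return [i for i, adc in enumerate(waveform) if adc > threshold]


def reconstruct_event(event, pool=None):
    """Find hits on every waveform of one event, optionally in parallel."""
    if pool is None:
        return [find_hits(waveform) for waveform in event]
    return list(pool.map(find_hits, event))  # map preserves input order


# Six toy events, each with four waveforms of twenty samples.
events = [
    [[(e * 7 + w * 3 + s) % 11 for s in range(20)] for w in range(4)]
    for e in range(6)
]

# Reference result: everything in a single thread.
serial = [reconstruct_event(event) for event in events]

# Events in parallel (outer pool), hits within each event in parallel (inner pool).
with ThreadPoolExecutor(max_workers=3) as event_pool:
    with ThreadPoolExecutor(max_workers=4) as hit_pool:
        threaded = list(
            event_pool.map(lambda event: reconstruct_event(event, hit_pool), events)
        )
```

Comparing `threaded` with `serial` is the kind of check that exposes order-dependent or shared-state bugs. Separate pools are used for the two levels so that event tasks waiting on hit tasks cannot starve each other of workers.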

  4. Google searches of LArSoft wiki on GitHub
    • Have had success in getting at least some pages indexed by google, and getting google hits within larsoft.github.io. 
    • People should try googling for things in the wiki next time they want to find something. Please let us know if it fails.
  5. GENIE v3.02.00
  • Are preparing to integrate GENIE v3.02.00 into LArSoft, which will involve a test release. Expect this work to be completed in the next three weeks or so.
  • Note that there is already a v3.02.00 test release – v09_55_01_01 released on July 25. 
    • Have people looked at this?
  6. Spack update
  • From last time, discussed these steps / milestones for phase 2 of migration:
    1. The experiments must  convert all of their code to use Cetmodules and modern CMake best practices (a la LArSoft phase 1).
    2. The experiments must also  produce and/or verify Spack recipes for their own packages, and for all external dependencies not directly supported by SciSoft.
    3. The current LArSoft stack and its dependencies must be verified to be buildable by Spack. There have been many changed/added dependencies since the last time this was done, so this is not a trivial task.
    4. We must have a system usable by LArSoft and experimental release managers capable of building and releasing a fixed and reproducible distribution of their code and all dependencies via Spack for all supported platforms and compilers. These distributions must be installable on supported systems with maximum (re-)use of pre-built and cached binaries, and minimum rebuilding of packages unchanged from one release to the next.
    5. We must have a multi-package development system capable of using and producing Spack-built binary packages for distribution via BuildCache.
    6. Validate everything on the release current at this point, obtain sign-off from all experiments, then execute the migration.
  • Chris Green et al. are currently working on some combination of (3) through (5), which do not depend on experiment code, and can proceed in parallel with those items that do.
  7. Geometry adaptations for pixel detectors
  • Have started a series of technical meetings to work through design and implementation details. Do not yet have a preliminary design, though conceptually, we know that we need to abstract and separate anode descriptions from the TPC volume descriptions. Anodes are not currently a concept in the geometry. Wire planes are, but they will not be an attribute of all TPCs. How to do that is the topic of the meetings.

  8. LArSoft 2023 Work plan discussions will be starting.

Erica and Katherine will discuss priorities with each experiment in a series of meetings in October. The experiments should be prepared to detail their plans for the next year, the implied requirements for LArSoft, and how the LArSoft Project Team could help, as well as what the experiments might be able to contribute to LArSoft code. Based on those discussions, LArSoft proposes a plan of work for 2023 along with relative priorities of the various items. This plan is presented to the Steering Group for discussion and input.

Discussion on the project report:

Herb: Is there a problem if Lynn retires for real?

Erica: Yes. Patrick Gartung can do releases, but does not currently do everything that Lynn does due to commitments to other projects. I have worked on solving this problem, but it does not appear to be a sufficiently critical issue for the division to warrant a solution at this time.

Herb: Back to multi-threading. Has anyone tried to run an art job where different modules have different threads?

Erica: I don’t think so. With different trigger paths, you could maybe run them concurrently. But trigger paths define a sequence of operations. Would need strict adherence to ‘consumes’ and ‘produces’ at the very least to make this possible.

Round Robin: 

  1. ICARUS: Tracy Usher
    1. Standard statement on multi-threading. In principle, what is holding us up is our noise-removal code. It’s been on the list to fix. We know what we need to change, just haven’t gotten a person to do it. We recognize that it would be beneficial to us to do it. 
      • One other roadblock (excuse for us):  there are services in LArSoft where multi-threaded versions are on a branch, but haven’t been integrated. They are services that access databases 
      • Erica: This is the last thing that Saba was working on, using the art concurrent caching to ensure that the database was thread-safe.
      • Erica: Can you send me the specific services in question that you care about? That would be helpful.
      • Tracy:  Yes
    2. Tracy: Nominal start of physics-quality beam for Run 2 is October 15th. Our intention is to do a new production branch the week of Sept 19-23. That means we need a branch. Not a lot of time to shake out. 
    3. Production runs in three stages
      • Stage 0:  signal processing
        • Ideally this will be frozen until next summer. This is where the huge amount of data is. Need this for keep-up processing. Want it ready the week before data taking begins
      • Stage 1:  the reconstruction side, starting from hits going through Pandora
        • Would not necessarily be frozen on that timescale. Sometime in December, we would try to freeze that. 
      • Stage 2:  Common Analysis  (CAF) output. The bulk of analysis would be there unless people wanted to look at event displays.
    4. Have to coordinate SBN-wide for data sets, which will be based on this format. Freeze dates will be tied to a joint SBN and ICARUS date.
    5. Erica:  Is there something you need that isn’t there for stage 0?
    6. Tracy:  No. The WireCell stuff has come in. Will use WireCell 2D simulation. Still debating about 2D deconvolution. Related to complications of running WireCell things in the ICARUS environment. Intend to make this switch, but it is not clear whether that will happen this year. It is still the ultimate plan. 
    7. If we remove waveforms from stage 0, then the heaviest object left is reco-wire object. Those don’t compress nicely because they are float objects. We may substitute that with a ‘short int’ object. Simple to implement using code Gianluca developed. We won’t be able to use LArSoft Event Display with this, however.
    8. Don’t need to go back to floats since the information is stored in the hits. Have to use lighter-weight objects, smaller, and can use for hand scanning.
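
The idea of replacing float waveform samples with ‘short int’ values can be illustrated with a small sketch. This is not the code Gianluca developed, whose details are not given in these notes; it simply assumes a fixed scale factor (`ADC_SCALE`, hypothetical) and clips to the 16-bit signed range. The function names are also hypothetical.

```python
import array

ADC_SCALE = 10.0  # hypothetical: stored counts per unit of signal


def pack_to_short(values, scale=ADC_SCALE):
    """Quantize floats to signed shorts, clipping to the short range."""
    packed = array.array("h")  # "h" is a C signed short
    for value in values:
        quantized = int(round(value * scale))
        packed.append(max(-32768, min(32767, quantized)))
    return packed


def unpack_from_short(packed, scale=ADC_SCALE):
    """Recover approximate floats from the packed shorts."""
    return [quantized / scale for quantized in packed]


signal = [0.0, 0.04, 1.26, -3.5, 250.75, 5000.0]
packed = pack_to_short(signal)
restored = unpack_from_short(packed)
float_item_size = array.array("f").itemsize
```

Each stored sample is smaller than a float, and values within range are recovered to within half a quantization step. The last sample shows the cost of this choice: anything beyond the short range is clipped, so the scale has to be picked with the expected signal range in mind.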

 

2. ArgoNeuT: Tingjun Yang

    1. Previous release  manager, Patrick Green, has graduated. Have a postdoc who took over, Wanwei Wu. 
      • The contact list has already been updated:

https://larsoft.github.io/LArSoftWiki/LArSoftInternals/Informal_list_of_experiment_contacts

3. ProtoDUNE: Tingjun Yang

    1. Tom knows more about the general activities. Tom has been working on simulating events in HDF5 format. 
    2. Working on preparing ProtoDUNE horizontal drift, updating the geometry for Run 2. Pursuing similar activities for vertical drift (single-phase)

4. Near Detector DUNE: Tingjun Yang

    1. Able to run GENIEGen module with near detector geometry file. Gave a report last week at DUNE reco meeting saying it’s possible to do it using LArSoft.
    2. Tammy Walton is working on simulating near detector with edep-sim. Expressed interest / preference for doing this in LArSoft.
    3. Heidi Schellman expressed strong support for art-based simulation.
    4. People are aware that this is possible, and there seems to be some traction for it.
    5. Geometry: There is a module 0. Took data at Bern. Is part of 2×2 prototype. Data is in HDF5 format. Trying to convert it to something LArSoft can understand. Need help in saving the location of each pixel.
      • Erica:  please set up a meeting to talk about it.
    6. Tracy: Have you been coordinating with the active machine learning group? They might be able to facilitate what you’re doing.
    7. Tingjun: I talked with many people in the machine learning group. Method is based on python, not based on LArSoft.
    8. Tracy: True, not based on LArSoft, but still impressive what they’re doing.
    9. Tingjun: Current focus is on the framework, not algorithms. Will need discussion in the future on the plan to put everything together.
    10. Tracy: There is an active machine learning group in ICARUS as well. Trying to do everything from 3D space point view. It’s not inside LArSoft, but they are trying to develop an interface between what they’re doing and LArSoft.

5. MicroBooNE: Herb Greenlee

    1. Trying to finish MCC9 analyses, so don’t need anything from LArSoft at the moment.
    2. Replying to Tracy:  SparseRawDigit is the MicroBooNE class in ubobj that has integer representation for waveforms. So not part of LArSoft, but could be moved into LArSoft.

Action Items

  1. ALL – provide feedback on Clang format and workshop writeup.
  2. ALL –  please let us know if searches of LArSoft information on github wiki fail. https://larsoft.github.io/LArSoftWiki/ 
  3. Tracy: List the services in LArSoft where multi-threaded versions are needed. (Implementations may exist on a branch somewhere, but are not yet integrated into LArSoft.)
  4. Tingjun:  schedule follow-up meeting for HDF5 discussion
  5. LArSoft: Update release manager for ArgoNeuT:  done.
  6. LArSoft: Check whether ‘consumes’ is enforced:  As noted above,  ‘produces’ has been enforced for some time. ‘consumes’ has been recommended for a long time, but is not enforced by default. One can enable it with a command line option:  

lar -c <my config> --errorOnMissingConsumes=true …

So MicroBooNE (or any experiment) could choose to enforce this independently of anyone else. 

There is also an option to find out what ‘consumes’ statements are missing:

lar -c <my configs> -M mt_diagnostics.txt …

July 2022 Meeting Notes

Offline leads meeting – 7/13/22

Attendees: Tom Junk, Chris Backhouse, Andrzej Szelc, Kyle Knoepfel, Tingjun Yang, Tracy Usher, Erica Snider, Katherine Lato

LArSoft Status:

  • Multi-threading work
    • Mike Wang is continuing work on a DUNE dataprep workflow used for SN processing. Had been investigating a difference in results from hit finding when run in single threaded vs multi-threaded mode. The recob::Hit PR discussed at the July 12 LCM was one outcome of this work, and fixed (at least one of) the differences. From this point, he will continue to add workflow elements until it is entirely thread safe / multi-threaded.
  • Spack migration:  phase 2
    • Have started working on Phase 2 of Spack migration, which will involve additional adaptations to Spack to support the full set of functionality needed to manage coherent releases. Will also need to understand and possibly remedy dependency structure of code in order to make Spack happy. 
    • Chris Green kindly provided the following high-level list of tasks that make up Phase 2. (With a sixth step added by Tom Junk.)
      1. The experiments must  convert all of their code to use Cetmodules and modern CMake best practices (a la LArSoft phase 1).
      2. The experiments must also  produce and/or verify Spack recipes for their own packages, and for all external dependencies not directly supported by SciSoft.
      3. The current LArSoft stack and its dependencies must be verified to be buildable by Spack. There have been many changed/added dependencies since the last time this was done, so this is not a trivial task.
      4. We must have a system usable by LArSoft and experimental release managers capable of building and releasing a fixed and reproducible distribution of their code and all dependencies via Spack for all supported platforms and compilers. These distributions must be installable on supported systems with maximum (re-)use of pre-built and cached binaries, and minimum rebuilding of packages unchanged from one release to the next.
      5. We must have a multi-package development system capable of using and producing Spack-built binary packages for distribution via BuildCache.
      6. Validate everything on the release current at this point, obtain sign-off from all experiments, then execute the migration.

Note that items (1) and (2) involve changes to experiment code and repositories. The largest uncertainties in the scope and scale of work lie in items (4) and (5). Until these are understood, we cannot provide detailed task lists or timelines. In the meantime, experiments should work on (1) and (2), and open tickets or communicate with SciSoft team members when they encounter problems or have questions.
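
Item (4) asks for maximum re-use of pre-built binaries and minimum rebuilding of unchanged packages. A minimal sketch of the general idea, in Python: identify each package build by a hash of its own build inputs plus the hashes of its dependencies, and rebuild only when that hash is not already cached. This illustrates the concept only. It is not Spack's implementation or API, and the names `spec_hash` and `plan_release` and the package layout are hypothetical.

```python
import hashlib
import json


def spec_hash(name, version, compiler, dep_hashes):
    """Hash a package's build inputs together with its dependencies' hashes."""
    payload = json.dumps(
        {
            "name": name,
            "version": version,
            "compiler": compiler,
            "deps": sorted(dep_hashes),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def plan_release(packages, cache):
    """Decide which packages can be reused and which must be rebuilt.

    packages: list of dicts with keys name, version, compiler, deps,
              ordered so that dependencies come before their dependents.
    cache:    set of hashes for which a pre-built binary already exists
              (hypothetical stand-in for a binary cache); updated in place.
    Returns (reused, rebuilt) as two lists of package names.
    """
    hashes = {}
    reused, rebuilt = [], []
    for pkg in packages:
        dep_hashes = [hashes[dep] for dep in pkg["deps"]]
        h = spec_hash(pkg["name"], pkg["version"], pkg["compiler"], dep_hashes)
        hashes[pkg["name"]] = h
        if h in cache:
            reused.append(pkg["name"])
        else:
            rebuilt.append(pkg["name"])
            cache.add(h)  # the freshly built binary is now available
    return reused, rebuilt
```

Because a dependent's hash includes its dependencies' hashes, changing one low-level package forces a rebuild of everything above it, while unrelated packages are taken from the cache unchanged.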

  • Tom: Add a step to verify that the Spack-built code runs and produces results comparable to those of the UPS version.
    • Erica:  Yes! (added above)
  • Kyle: Does Chris talk about wrapping UPS products in Spack?
    • Erica: He did in conversations about the migration, but it was not clear (to me) exactly how that fits into the plan – whether it pertains to some or all legacy things for instance. 
    • Kyle: Chris is presenting the big work required with migration? If we have bridge technologies, that’s not covered yet?
    • Erica: Correct.  I asked for the big picture at this point so that we have a framework for discussing status and more detailed planning.
  • Workshop planning discussion
    • Points where we are seeking input
      • Feedback on the proposal circulated
      • Thoughts on specific problems / pieces of code that need to be made thread-safe or multi-threaded
        • Once code is identified, then the experiments should start identifying the teams that will come to the workshop to work on things.
      • What if any tutorials might be helpful at the beginning of the workshop?
      • We’re looking at 3 or 4 days for this. When might be a good time? Or maybe better, when are bad times?
    • Discussion
      • Andrzej:  Is this more a thing for experts, or people to learn? Saw comments about tutorials. And in person?
      • Erica:  In my mind, a dual purpose. Acquaint more people with multi-threading techniques and solve particular problems of immediate relevance to the experiments. 
        • Target will be for experienced C++ coders. So not beginning grad students if we are to solve a real problem.
        • There are advantages to working in person: it is easier to engage with experts. But expect this will not be practical.
      • Also in the proposal: work in small teams, each working together on a single piece of code. Hack-a-thon style.
        • Work on code that matters.
        • Have seen this model work with the right technology. So have to put some effort into identifying “google docs for coding.”
        • Andrzej: thinks the hack-a-thon idea makes it more enticing. Having some kind of introduction at the beginning would be good. We haven’t identified where the problems are.
      • Erica:  Also in the proposal, first have the experiments talk about what problems they’re trying to solve with multi-threading. Particular solutions will depend on the code. “These are the problems. These are the approaches to fix it.” Like for the database, need concurrent caching. Art provides this. Could provide a tutorial for how to use concurrent caching. So target tutorials to the solutions needed. Or might encounter an unanticipated problem along the way and decide a tutorial would help, so stop and learn about a solution.
      • Ensuing discussion concluded that workshop / hack-a-thon would be best if focused on cases where we know there is a problem, but do not yet know where, and do not yet know the solution. For things where we do know a solution, we might not need a workshop / hack-a-thon session. 
        • Seemed to be general agreement on this point (?)
      • Andrzej:  Should each experiment identify the problem and talk to the LArSoft team for advice, so that everyone comes in with a defined problem?
        • Yes. It’s important that everyone comes in with a well-defined problem. 
        • Do not want to front-load too much work, but this seems a reasonable approach. If we can’t find such problems, then we don’t need to waste people’s time with a workshop, and can instead focus on facilitating fixing the specific pieces of code that need fixing.
      • Kyle:  suggested reviewing slides / talks from the previous workshops on multi-threading (though the team would be amenable to repeating some of them)

Links to relevant slides and videos of talks:  

      1. 2017 Presentation – Introduction to multi-threading
      2. 2019 Presentation – Multi-threaded art 
      3. 2019 Presentation – Making code thread-safe
      4. 2019 Presentation (powerpoint download) – Experience learning to make code thread-safe
      5. 2019 Presentation – Introduction to multi-threading and vectorization
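
The concurrent-caching approach raised in the discussion (for services that read from a database) can be illustrated with a generic sketch. This is not the art concurrent-caching API. `ConcurrentCache` and the loader function are hypothetical names, and Python is used only for brevity. The idea shown: many threads may ask for the same key, but the expensive lookup runs once per key, and the other threads wait for and share the result.

```python
import threading


class ConcurrentCache:
    """Thread-safe cache that loads each key at most once.

    loader is any callable taking a key and returning the value, for
    example a (hypothetical) database lookup keyed by run number.
    """

    def __init__(self, loader):
        self._loader = loader
        self._lock = threading.Lock()  # guards the two dictionaries below
        self._entries = {}             # key -> cached value
        self._key_locks = {}           # key -> lock held while loading that key

    def get(self, key):
        # Fast path: return the cached value if it is already present.
        with self._lock:
            if key in self._entries:
                return self._entries[key]
            key_lock = self._key_locks.setdefault(key, threading.Lock())

        # Slow path: only one thread per key performs the load. Threads
        # asking for other keys are not blocked by this load.
        with key_lock:
            with self._lock:
                if key in self._entries:
                    return self._entries[key]
            value = self._loader(key)
            with self._lock:
                self._entries[key] = value
            return value
```

A service would construct one cache per kind of database content and call `get(key)` from any thread. The per-key lock is a design choice that keeps a slow lookup for one key from stalling requests for others.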

Round-table:

DUNE: Tom Junk

  • He ran the cetmodules migration script Chris had in Feb., and made all the “required” changes, but not all of the “recommended” changes. There are a bunch of find_ups_products. Do those need to go away? [Yes, believe so.]  Not using cetmodules yet, did it in a practice run, but can flip the switch at any point.
  • Not done with a similar thing for GArSoft. Haven’t tracked down the alternative [libraries??]. Required latest version of Tensorflow and products from LArSoft that use Tensorflow. That all works. Currently, GArSoft is stuck on Pandora.
  • Thinking about how to handle large scale of raw / processed digits. Talking with many people. Tied in with multi-threading, although multi-threading may be icing on the cake, since plan to manage by constructing workflows that operate at APA level [from file i/o through data prep and deconvolution]. Issues with file I/O we still have to deal with. Have been consulting with Kyle on this.
    • Kyle: the only applicable framework support is stuff (like removing the cache) which Tom is aware of. Or altering the way data products are stored. That’s a big change. They’re reading one APA at a time. Things could be improved a bit. The framework does support the concept of an abstract delayed reader. That doesn’t get away from the basic problem they’re having.
    • Tracy: Before ICARUS can run multi-threaded, there are some services that need to be changed. Two of them, maybe Detector ones.
    • Kyle: DetectorPropertiesService and DetectorClocksService are already thread safe. ChannelMappingService and the services that access things in databases are still issues. Saba & Kyle made a lot of progress, but didn’t get it finished. There is a dedicated branch for this.
    • This particular work was one of the casualties of the bleeding of effort from the project team. So have not made progress on it since Saba left.
    • Tracy: We would like to make use of this. We’re running single threaded jobs on three grid slots, effectively throwing away two cores.
    • Erica: The loss of effort has hurt us. The important thing now is to know exactly what services are the impediments in your case.
    • Tracy: I’ll try to follow up.

DUNE: Tingjun Yang

  • Working on simulating neutrino interactions in the Near Detector. He summarized this at the last LArSoft meeting. We figured out a way to save energy deposits in both detectors. Identified a few places where we need to make the framework (LArSoft) more flexible to accommodate different detector types (eg, the geometry system). Hans provided a workaround for one of the problems, and Gianluca made some improvements to the Geometry service.
  • Next want to work on the drift and detector response simulation. Need to think about how to get the location of the pixels, determine direction inside volumes, etc., which will require changes to the geometry.
    • Erica:  started work on this with Hans and Kyle (and Tingjun). Believe everyone agrees on the conceptual design, but need more discussion and more planning to make a detailed design that we can start implementing. Have been busy the past month, but will try to continue this work before the end of the month.

SBND: Andrzej Szelc

  • Had an SBND collaboration meeting at the end of June at Fermilab. People want to use different generators, some BSM generators. No one seems to know about the LArSoft work to make this easier, or the GENIE work. And would like GENIE 3.2 as soon as it comes out.
    • Erica:  SciSoft team is getting weekly reminders about the need for this. Believe the holdup until now has been spack-related work, but now that Phase 1 is completed, should be able to prioritize getting GENIE updated.
  • Reconstruction for the photon detection system is progressing.

SBN Data/Infrastructure: Chris Backhouse

  • Nothing to report.

ICARUS: Tracy Usher

  • Nothing to report.

May 2022 Meeting Notes

Offline Leads – May 25, 2022

Attendees: Chris Backhouse, Wesley Ketchum, Herb Greenlee, Joseph Zennamo, Tom Junk, Tingjun Yang, Erica Snider, Katherine Lato

LArSoft – Erica Snider

  • The migration of the Redmine LArSoft Wiki to GitHub Pages has been completed and is now available at https://larsoft.github.io/LArSoftWiki/. Among other things, this move should in principle allow search engines to index the LArSoft documentation, as was possible before the Fermilab web servers were put behind the Fermilab SSO. To date, however, Google searches do not find the LArSoft GitHub wiki.
    • Note: bing.com and duckduckgo.com do find LArSoft wiki pages on GitHub.
    • Should you have an edit or other content suggestion, you may let us know via issue tickets (which are still in Redmine), pull-requests on the LArSoft/larsoft.github.io repository in GitHub, or email to scisoft-team@fnal.gov.
  • Thread safety and multi-threading work:  Mike Wang has been working on a simplified DUNE SNB processing workflow that uses the 1D deconvolution in CalData instead of WireCell. The steps are:
    • CalData
    • GausHit
    • SPSolve
    • HitFD
    • TrajCluster
    • PMTrackTC
  • The CalData stage has been modified to use a thread-safe implementation of LArFFT written by Mike. The GausHit hit finding stage is based on Mike’s implementation of a Levenberg-Marquardt fitter that Giuseppe Cerati and Sophie Berkman ported to LArSoft. Mike is currently looking at the SPSolve stage, which is the 3D space point solver that also performs some disambiguation of hits. Aside from making this stage thread safe, there is an opportunity to incorporate multi-threading within the module, as is done in the GausHit stage.
    • Mike is currently validating that multi-threaded and single-threaded execution yield the same results. As of May 16, there were differences, so he was working to identify the source.
  • Work requested by SBND to implement legacy LArG4 behavior has been completed. Though missing the SBND deadline, the work is now on the head of develop, so will be available for future releases. A proposal for a long-term solution has been advanced. A follow-up discussion is needed to determine how to proceed from this point.
  • Spack migration
    • The Phase 1 migration of all LArSoft repositories to cetmodules is nearly completed. The process was relatively straightforward in almost all cases. larrecodnn required some extra attention, which is not unusual given that it touches TensorFlow.
    • Expect work toward Phase 2 will begin shortly. Unlike Phase 1:
      • The tools and procedures for building will change
      • Everything changes at once – we wake up one day, and everything will be different. There are no staged changes
    • Do experiments plan to follow this migration? (It might be required, but not yet sure.)
    • Q:  How will builds of external packages maintained by experiments work once we migrate?
      • Currently packaging a number of experiment code dependencies  in UPS
      • A:  not entirely sure. Will discuss this within the SciSoft team. SciSoft will in general provide assistance with migrating experiment code and external dependencies.
    • The details of the migration are not yet known, but will feature
      • A migration path that allows for testing of all relevant code prior to the switch
      • An education campaign for users
      • Assistance with migrating any experiment code needed 
      • Timelines developed in close consultation and agreement with experiments
    • Q:  What about legacy branches, eg, the MicroBooNE MCC9 series?
      • Presumably, things that live in the legacy world will continue to work within the legacy environment. It is not obvious that this will even complicate back-porting code, since that usually does not involve elements of the build systems
    • A discussion with Chris about what NOvA is doing ensued, since they are art users. Not likely to be directly relevant to LArSoft, though could affect the timeline if an associated migration is needed.
    • Wes:  for SBN, there is the question of how to manage the Spack transition for the online environment
      • The DAQ is built on artdaq. Everything relies on relocatable UPS packages
      • Erica:  so this suggests that transition needs to happen during a beam downtime?
      • Not necessarily. Noted that they can operate in a legacy mode for some time, but best not to be in that position long-term. So just needs to be coordinated with SBN DAQ
      • Next SBN downtime:  Early July through mid-Sept, full beam back early Oct. (we think)
      • Wes will be working with DAQ people to discuss this.
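
The validation mentioned in the thread-safety item above (checking that multi-threaded and single-threaded execution yield the same results) can be sketched as an order-independent comparison with a floating-point tolerance. This is a generic illustration in Python, not the procedure Mike used. The hit layout `(channel, peak_time, charge)` and the function name `compare_hits` are assumptions made for the example, as is the premise that output ordering may legitimately differ between the two modes.

```python
import math


def compare_hits(reference, candidate, rel_tol=1e-9):
    """Compare two hit collections without depending on their order.

    Each hit is a tuple (channel, peak_time, charge); this layout is a
    hypothetical simplification. Returns a list of human-readable
    mismatch descriptions, which is empty when the collections agree.
    """
    diffs = []
    if len(reference) != len(candidate):
        diffs.append(f"count differs: {len(reference)} vs {len(candidate)}")

    # Sort both collections so that thread scheduling order does not matter.
    for ref, cand in zip(sorted(reference), sorted(candidate)):
        if ref[0] != cand[0]:
            diffs.append(f"channel differs: {ref[0]} vs {cand[0]}")
            continue
        for label, a, b in (
            ("peak_time", ref[1], cand[1]),
            ("charge", ref[2], cand[2]),
        ):
            if not math.isclose(a, b, rel_tol=rel_tol):
                diffs.append(f"channel {ref[0]}: {label} differs: {a} vs {b}")
    return diffs
```

Running the single-threaded output as `reference` and the multi-threaded output as `candidate` gives a list of differences to investigate. An empty list means the two modes agree within the chosen tolerance.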

Experiments:

  • DUNE
    • Tom: 
      • ProtoDUNE 2 takes data next year. Will use the DUNE DAQ. Wes knows more of the details.
      • DUNE can probably find people to do the Spack migration on the timescale we want, provided that the legacy system is available to everyone else. DUNE may require help with Phase 2 Spack migration if they get stuck.  
        • Erica:  This is expected. SciSoft will provide support.
    • Tingjun:
      • Discussing with LArSoft team about supporting simulation and detector.
        • Erica:  excited to have effort from the experiment directed into this long-standing interest for LArSoft. Will be working with the experiment to develop a plan for the necessary changes. That will involve changes to the geometry, so need to be clever so as to minimize the disruption from that.
        • Tom:  commented on possibility of dueling software stacks. Do not want to disrupt ability to continue code development
        • Erica:  prefer integration, but there may be places where dueling stacks occur. Want to minimize that unless there are clear gains.
        • Note:  There is already a PR https://github.com/LArSoft/larsim/pull/94 and a resolved redmine ticket to address some of the issues related to this work: https://cdcvs.fnal.gov/redmine/issues/26961
  • ArgoNeuT:  Tingjun
    • ArgoNeuT imported the CVN product, which provides DL tools to select neutrino events.
      • Originally added to DUNE, and copied from there into ArgoNeuT, but there was an issue trying to build it in ArgoNeuT. One difference from the DUNE usage was that DUNE followed the update to use the Triton package with CVN, but ArgoNeuT did not.
      • One idea to resolve this is to move the common code to LArSoft and only have the experiment specific part in each experiment repository. 
      • Working on it now, may need support from LArSoft.

 

  • SBN:  
    • Joseph
      • Chris Backhouse is taking over the SBND side of SBN for Joseph. 
      • SBN just launched the first large scale production of beam exposure. That’s been going well. 
      • Moving on to the next stage, at-scale production similar to one year’s exposure. 
        • Chris will be taking the lead on this. 
        • Maybe 100 million events. 
        • This will probably strain our systems, already see the need for performance and workflow improvements. 
          • Have been focusing on running all the basics that are needed at scale, but now performance upgrades are needed. 
          • Hope to adopt 2D deconvolution, overlay workflows, etc. 
          • Long-term, need to come back to understand how to lower [delta-ray] production thresholds in Geant4. Those improve fidelity, but come with a steep performance impact. 
    • Wes
      • We need to go through and plan the next production, software updates. Might be a number of requests that come in related to that. 
        • E.g., different lifetimes in different cryostats.

 

  • MicroBooNE: Herb
    • Making an effort to integrate MCC9 updates into develop. 
      • May require updating to refactored LArG4 framework. Would require help or advice for this migration
        • Otherwise, worried that they will be left behind. For instance, the new light simulation is probably the most important new development that would be useful to MicroBooNE. This is in the new LArG4.
        • Erica:   very glad to hear that this is in the plan for MicroBooNE. The project will provide whatever assistance is needed. 
    • Worried about whether redmine will go away. 
      • MicroBooNE has not migrated to GitHub, and has not been pushing that.
      • One advantage of Redmine is that it has one landing page with links for wiki, repository, and issues. Would need to work to replicate this on GitHub. Otherwise people need to look in multiple places for everything.
        • Erica:  so far no indication there is an EOL date for Redmine. Will bring the question to Jim Amundson at next meeting with him.

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.