All posts by klato

August 2024

Offline Leads Meeting Notes: 8/14/24

Attendees: Michael Kirby, Thomas Junk, Erica Snider, Katherine Lato

LArSoft status:

The main focus of the meeting was on the status of Spack. 

  • At the time of the meeting, we did not have much new information beyond the August 6th LArSoft Coordination Meeting, aside from the following update from Marc Paterno:
  • The team is working to finalize a plan, with an associated document, for how we are to provide pre-built software through a Spack environment, what we call “standard builds”. This takes the place of UPS packages grouped into “umbrella products” or “distributions”. The document also describes new procedures and processes for managing releases, and the division of responsibilities between experiments and the SciSoft team should experiments wish to exploit the added flexibility in building releases that Spack provides.
  • The team recently started testing a method for building standard releases that avoids issues of Spack “re-concretizing” already-built products. This re-concretizing is the last significant issue with Spack that is blocking migration.
    • [Discussed this point with Marc after the meeting. The candidate solution involves creating a sub-environment of low-level libraries, then telling Spack that they are not re-buildable. This approach worked in a small scale test, and is now being tested on a large software suite (not LArSoft).]
  • The team has developed a beta version of a download-and-setup script that, with only two commands, installs Spack and sets up a personal environment ready for development.
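
To illustrate the idea in the bracketed note above (telling Spack that a set of low-level libraries is not re-buildable), here is a minimal sketch that generates a Spack `packages` configuration section using the `externals` and `buildable` settings. The package names, versions, and install prefixes are hypothetical placeholders, and this is not the SciSoft team's actual procedure, which the notes do not describe in detail. The output is JSON, which is also valid YAML.

```python
import json


def pin_as_prebuilt(installed):
    """Return a Spack-style ``packages`` section that marks each entry
    as an external, non-buildable package.

    ``installed`` maps a package name to a (spec, prefix) pair.
    All names and paths used here are hypothetical placeholders.
    """
    packages = {}
    for name, (spec, prefix) in sorted(installed.items()):
        packages[name] = {
            # Point Spack at the already-built copy ...
            "externals": [{"spec": spec, "prefix": prefix}],
            # ... and forbid it from building any other version.
            "buildable": False,
        }
    return {"packages": packages}


# Hypothetical low-level libraries from an already-built sub-environment.
low_level = {
    "zlib": ("zlib@1.3", "/opt/prebuilt/zlib-1.3"),
    "boost": ("boost@1.82.0", "/opt/prebuilt/boost-1.82.0"),
}

config = pin_as_prebuilt(low_level)
print(json.dumps(config, indent=2))
```

With such a section in place, the concretizer has to reuse the listed installations rather than choose new versions for those packages, which is the effect the candidate solution is after.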

 

  • Q:  Is there “added flexibility” because it allows experiments to build against any compatible version of any dependent product?
    • A:  Essentially yes. UPS locks you into specific versions. Spack does not. This is something DUNE has been requesting for a long time. It has many benefits, but also some notable costs. 
    • Along the lines of added flexibility, there has long been a work plan item to decouple the version of GENIE from LArSoft. We will complete this work after the migration. We can then compare two versions of GENIE in exactly the same LArSoft release.
  • Q:  Is progress on the migration moving faster since moving the project into DSSL?
    • A:  [Update since the meeting:  as noted above, there is potentially significant progress on solving the main problem that has plagued Spack for a long time. Also, there has been progress in recent weeks in several areas that are relevant to LArSoft. So maybe.]

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

April 2024

Offline Leads Meeting Notes 4/24/24

Attendees: Tom Junk, Giuseppe Cerati, Herb Greenlee, Steven Gardiner, Erica Snider, Katherine Lato

  1. New geometry validation status

    • Is anyone working on this?
    • Tom: started a build with RC0. Easier to merge than expected. Someone else was working on RC1. It builds and runs, but no one has checked the output.
    • The validation is fairly simple. Do not expect any changes in results, so just see that nothing changes. 
      • Given this, we will adopt a new acceptance protocol:  one completed validation, then proceed with the migration. 
  2. Erica has completed the LArSoft portion of SpaceCharge changes requested by SBN. 

    • Joseph Zennamo is responsible for the experiment side. Mike Mooney will be making the necessary changes to SpaceCharge service, and will do so for all experiments. Joseph has been busy with SBND commissioning, but said he will get to this soon.
    • Once SpaceCharge changes proceed, LArSoft will make changes necessary for TPC-dependent electric fields (nominal) and electron lifetimes.
      • Tom commented on possible spatial dependence of lifetime. A resolution issue.
      • Erica:  Would like to see the data that says it’s important enough to matter.
      • Tom:  believes they have ProtoDUNE data that says it does not.
  3. Changes to recob::Hit coming

    • The GausHitFinder is also changing. Will add code to draw boundaries between overlapping hits, and use the smaller sums to fill the new data members
    • At present, the new algorithm is built into the hit finder code. If a change is needed to diversify the solution, then it will be made into a tool. That was the promise from the algorithm authors [at DUNE].  
    • Giuseppe commented to be sure that new code does not affect the multi-threading capabilities.
      • Are reviewing the PRs now, so will check.
  4. Note about services that are either going away or being transferred to other places (which could include the experiments)

    • Has been mentioned to experiments, hopefully at FIFE meetings
    • The operations budget has been stressed, so many systems are being affected.
    • Things like POMS, SciSoft web service, CI system, Spack may be impacted.
    • Bringing this up because it (a) impacts all experiments, and (b) may impact LArSoft support in some way. 
    • DSSL does not want to see loss of support for any of these critical resources. Giuseppe echoed that CSAID has worked hard for a long time to get everyone consolidated into using these common solutions, so it would be a bad thing to turn around and tell people that they needed experiment specific solutions from now on.
      • Exactly…
      • May need to organize meetings with current support people, understand how and if support can be moved to DSSL
  5. Spack status

    • Kyle has been working on the development environment part of this. Will report at the Coordination meeting after this next one. Waiting to hear details about a potential solution from Sandia NL.
    • More than just packaging is required. Spack environment needed. 
  6. Round-table

    • SBN Data/Infrastructure (Giuseppe)
      •  SBN has recently cut a production release, but realized that some other changes in larsim are needed
        • Decided to update production to newer LArSoft base release – now v09_89_01.
        • Tracy wrote an email about this last week. Everyone is on board as far as I can tell.
      •  SBND also needs a new GENIE release to fix a timing problem for “dirt” events.
        •  The new GENIE release has been tagged, but not yet distributed in UPS. Eventually this will need to be picked up by the SBN production release
        • Erica:  So SBN will be on a separate version of GENIE?
        • No, hope not.
        • Mentioned GENIE version decoupling work that is on-going. Have not completed this yet, so will either need to isolate any special GENIE version to a production branch, or bring all of LArSoft along.
      • Hosted a first Spack tutorial at SBN Analysis Infrastructure meeting
        • Idea within SBN was to have separate follow-ups for different communities
        • Building and distributing code under Spack for release managers and similar
        • When Spack development solution ready, then another tutorial on that for a larger audience.
        • Need to know the timeline for things
        • Erica: Sounds like a good plan.
    • MicroBooNE (Herb)
      • LArSoft will stop building under SL7, right?
        • Erica:  We cannot run SL7 at all, though we should be able to provide builds using containers, for use within containers.  You can assume we will continue to build for the platforms experiments require, at least for the time being, so we will provide SL7 builds if MicroBooNE needs those. 
        • Is MicroBooNE committed to SL7 for the foreseeable future? Yes for MCC9.
      • Will MicroBooNE remain at MCC9 forever? Seems that some parts of MicroBooNE will
        • Discussed MCC10. Adoption has been very slow. Even people working on DL code have been back-porting into older releases. Herb has encouraged them to use integration releases (i.e., MCC10), but the developers seem reluctant.
        • Giuseppe mentioned that MCC10 might require huge validation investment, which is too much for individual developers. 
        • Herb mentioned that MCC10 workflows run, but that no one has looked at the results.
        • So seems MCC9 will remain a thing for a long time
      • In general, MicroBooNE needs more hand holding for Spack. Herb has looked at the tutorials and they don’t answer the questions he has.
    • DUNE
      • Exactly when are we going to start building routinely with Spack?
      • Erica will check with SciSoft team
      • Should be happening on the CI system, right? Yes, in principle. 
      • Have to keep a version of all the pieces in the UPS products we have. I did that this week. DUNE DAQ people have a development environment that I don’t understand. Could ask them how they do Spack? Problem is multiple repositories. 

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

January 30, 2024

The January 2024 LArSoft Offline Leads status update was handled via email and a google document. 

LArSoft – Erica Snider

  • Work is proceeding to accelerate DUNE physics processing using GPUs for select LArSoft algorithms, starting with the PDFastSimPAR module. The plan is to first optimize serial execution, then parallelize on CPUs, and if needed to achieve performance goals, parallelize on GPUs. There will be update reports at both the LArSoft Coordination Meeting and the DUNE FD reco group this month.
  • A new machine learning algorithm, NuGraph2, is available within LArSoft. The algorithm performs hit classification and clustering using a GNN, and is being developed for use in MicroBooNE, and will be used in ICARUS later. Details were discussed at the Dec 12 LCM. Documentation on the algorithm and how to use it will be posted to LArSoft.org in coming weeks. 
  • A Spack build of LArSoft became available last month. SciSoft is planning to provide a tutorial on how to build experiment code under Spack using this new release. Experiments wishing to participate will need to use machines under AL9 to run the builds.
  • Geometry re-factoring:  working on completing documentation, the last step needed prior to releasing it as a LArSoft v10 release candidate for experiment validation.
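
The staged plan in the first bullet (optimize serial execution first, then parallelize on CPUs) can be sketched as follows. This is a toy illustration only: `photon_yield` is a hypothetical stand-in for per-channel work and is not the PDFastSimPAR algorithm. The point is that the parallel version is checked against the serial reference before any further acceleration is attempted.

```python
from concurrent.futures import ThreadPoolExecutor


def photon_yield(channel):
    """Hypothetical per-channel computation (placeholder arithmetic)."""
    return sum((channel + step) % 7 for step in range(1000))


def run_serial(channels):
    """Step 1: a plain serial loop, used as the reference result."""
    return [photon_yield(ch) for ch in channels]


def run_parallel(channels, workers=4):
    """Step 2: the same work spread over CPU threads.

    Executor.map returns results in input order, so the output can be
    compared element by element with the serial reference.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(photon_yield, channels))


channels = list(range(32))
reference = run_serial(channels)
parallel = run_parallel(channels)
assert parallel == reference
```

In CPython, threads do not speed up pure-Python arithmetic, so this sketch shows the validation pattern rather than a performance gain.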

DUNE – Heidi Schellman, Tingjun Yang, Tom Junk

DUNE will start work this period on the Spack migration now that the LArSoft Spack build was made available.  We have already been in contact with Steve White, Marc Mengel, Patrick Gartung and Kyle Knoepfel on the subject.  We will need to keep SL7 build nodes and interactive nodes as long as possible – there have been proposals to shut them down in March but we would like them longer.  We also are attending the container task force meetings and have a container that works for builds and will test the current worker node container again.

LArSoft comment:  The project pushed back on the original March 20 suggestion, and SCF Dept Head agreed to allow at least some build nodes to remain up beyond that date. A suggested compromise shutdown date is mid-May. LArSoft (tentatively) responded that this would be acceptable. EOL is end of June.

ICARUS – Daniele Gibin, Tracy Usher

No Report

LArIAT – Jonathan Asaadi

No Report

MicroBooNE – Herbert Greenlee

No Report

SBND – Andrzej Szelc

No Report

SBN Data/Infrastructure – Steven J. Gardiner, Giuseppe B. Cerati

No Report

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

November 15, 2023

LArSoft Offline Leads 11/15/23 Meeting

Attendees: Herb Greenlee, Tom Junk, Will Foreman, Giuseppe Cerati, Erica Snider, Katherine Lato

At the meeting, we went over the draft of the 2024 work plan for LArSoft. 

Number one priority continues to be multi-threading and High Performance Computing (HPC) with several people working on this in 2024. Notably, SciSoft has effort available to make algorithms run on GPUs. Requirement is that they be relatively slow, in LArSoft, and amenable to acceleration with a GPU. There was a follow-on question about GPU as a service. This is described at: https://larsoft.org/using-gpu-as-a-service-in-larsoft/. SciSoft believes this is ready to run at scale, so just need to work out the logistics of spinning up the GPU server somewhere.

Spack migration has a deadline of Q2 2024 in order to provide time for experiments to complete their migrations in advance of the June 2024 SL7 EOL.  AL9 requires Spack, since there will be no UPS support. This work seems close, though we still do not have a detailed timeline to completion. Comment:  Experiments have changes to make, so do not want to be pinched for time. Response:  Given that all experiment code has been migrated to cetmodules, the procedure for getting from there to a Spack build should be straight-forward. Expect that any issues will be related to getting the experiment-specific product stacks under Spack. Q:  Containers should provide some cover? Yes, should provide a buffer to the June 2024 EOL. Generally, the process will be that AL9 and SL7 will co-exist for some time to allow experiments to migrate. Spack has to come first. 

Support for multi-experiment event display has been on the wish list for a while so that upper management understands that multiple experiments have requested this.

A new item in this year’s work plan is a set of updates to the LArSoft infrastructure. These include, but are not limited to:

  • Sampling frequencies vary across TPCs in ProtoDUNE, while LArSoft supports only a single value.
  • Support for non-planar cathode geometries to facilitate tracking across non-planar cathodes. 
  • Support for TPC-dependent drift velocities and electron lifetimes. 
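
One way to picture the change behind these items is a lookup that returns a per-TPC value when one is set and falls back to a detector-wide default otherwise. The class below is a hypothetical sketch, not LArSoft code; the names and the numerical values are illustrative assumptions.

```python
class PerTPCParameter:
    """A detector parameter with an optional per-TPC override.

    Hypothetical illustration: a single default value (the current
    single-value model) plus overrides keyed by TPC number.
    """

    def __init__(self, default, overrides=None):
        self.default = default
        self.overrides = dict(overrides or {})

    def value(self, tpc):
        """Return the override for this TPC, or the default."""
        return self.overrides.get(tpc, self.default)


# Illustrative numbers only (sampling frequency in MHz).
sampling_mhz = PerTPCParameter(default=2.0, overrides={1: 1.953125})

print(sampling_mhz.value(0))  # uses the default
print(sampling_mhz.value(1))  # uses the per-TPC override
```

The same shape would apply to drift velocities or electron lifetimes: existing single-value configurations keep working through the default, and only detectors that need it supply overrides.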

Appendix B is a short summary of our major observations from one-on-one meetings with each experiment in September and October of 2023. Common items include: 

  1. Event display that is useful in the current environment.
  2. HPC.
  3. Faster processing.
  4. Event generators

Round Robin:

  • SBND:  Will Foreman
    • working to integrate blip reco into SBND code. Once completed, will migrate to LArSoft. But first want to make sure it is working in SBND
  • MicroBooNE:  all questions answered during work plan discussion
  • SBN: Giuseppe 
    • SBND has allocation to run on Polaris at Argonne. Will be running simulation. Running in a container. First asked about Spack, but was not available, so opted for container. That has been demonstrated to work.
  • DUNE:  will send comments later.

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

August 16, 2023

Offline Leads meeting – Aug 16, 2023

Attendees: Herb Greenlee, Thomas Junk, Tingjun Yang, Erica Snider, Katherine Lato

LArSoft Report:

  • Experiment meetings for 2024 plan development coming up starting in Sept. This is when we meet one-on-one to develop the work plan for the next year.
    • Please let us know if there is someone other than an offline lead that we should talk with for your experiment.
  • Progress on this year’s plan has been steady in some areas, but not fast
    • Thread safety:  all larevt services thread safe. Ones that access DB use concurrent caching. Now working on expt code. ArgoNeuT completed, SBND in progress (work started in spring)
    • Discrepancy between single-threaded and MT results in SN pipeline demonstrator understood, but work still underway to fix (on-going since winter)
    • Geometry refactoring to accommodate pixels:  factored into wire readout geometry and volume geometry. Adapting experiment code. After validation, will introduce pixel readout geom (on-going since winter)
    • GPUaaS:  working to extend to PyTorch. Allows PyTorch GNNs
      • Herb: Have re-started a Reco Group. One of the conveners is at Tufts, and has been tasked with integrating the DL reco workflows. Several people have tried to run LArSoft on non-standard architectures. We’re thinking of appointing someone in MicroBooNE to be the HPC tzar.
        • Is GPUaaS targeting HPC, or stand-alone GPUs?
        • Erica: This GPUaaS is specifically for inferencing from grid nodes. But we can also develop algorithms to run directly on HPC or on a GPU. However, it only runs with TensorFlow currently. It’s being extended to PyTorch, which should be of interest to MicroBooNE and DUNE.
      • Tom: Network layer–don’t want to share if you have a lot of GPUs. If you just want to use it nearby and remote, do you need two versions of everything?
        • Erica: No. Our GPUaaS uses a Triton server to access GPUs. That server can be run remotely or on the same node as the job running the client. So can serve jobs locally.
      • Tom: Are we going to do this (run on local GPUs or through GPUaaS) for future algorithms?
        • Erica: grid-like resources will continue to be important, which is the problem GPUaaS solves. The future problem is to make algorithms run natively on GPUs/HPC. Data needs to be structured properly to be efficient on GPUs. Will need to restructure some code and make use of portability packages. Marc Paterno is available to help with assessing changes needed, and modifying / writing the algorithms. [Some algorithms will be best optimized by completely rewriting. Others can be adapted as noted here.]
  • Budget for next year looks like it will be very tight. Do not know the impact on LArSoft. Paradoxically might help, but all remains to be seen.
  • Spack
    • Effort is again starting to go into making LArSoft’s external dependencies buildable. While there is a large set of products to deal with, headway has been made. We believe building recipes for most products will be straightforward, with a couple of exceptions, notably TensorFlow.
    • The tutorial has been given to Mu2e.  We are working with Mu2e to produce a clean usable build. Hope they will eventually give the tutorial to LArSoft community
    • CI’s first nightly build of the art suite completed! It was built on both SLF 7 and AL 9!  The build scripts are invoked independently from Jenkins.  We need to move to using Jenkins on the backend to reduce boilerplate.
      • If we end up on AL 9, plan is to retire UPS.
    • Experiments continue to struggle with keeping develop branch of repositories current, even as rate of integration releases is less than the usual once per week.
      • Should we discuss this release model? Will make this a topic for discussions with each experiment.
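
As a rough illustration of the concurrent caching mentioned in the thread-safety item above, the sketch below guards a cache with a lock so that several threads asking for the same key trigger only one database read. It is a generic pattern with hypothetical names, not the implementation used in the larevt services.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ConcurrentCache:
    """Thread-safe cache in front of a slow lookup (e.g. a DB query)."""

    def __init__(self, loader):
        self._loader = loader          # function called on a cache miss
        self._lock = threading.Lock()  # protects _data and load_count
        self._data = {}
        self.load_count = 0

    def get(self, key):
        # Holding the lock during the load keeps the sketch simple and
        # guarantees one load per key; a production cache would
        # typically use finer-grained locking.
        with self._lock:
            if key not in self._data:
                self._data[key] = self._loader(key)
                self.load_count += 1
            return self._data[key]


def fake_db_lookup(run):
    """Hypothetical stand-in for a conditions-database read."""
    return {"run": run, "gain": 1.0 + 0.001 * run}


cache = ConcurrentCache(fake_db_lookup)
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(cache.get, [100] * 50 + [101] * 50))

assert cache.load_count == 2  # one load per distinct run
```

Every thread that requests run 100 receives the same cached object, so the result does not depend on how many threads are used.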

 

MicroBooNE – Herb Greenlee

  • Main concern is cetbuildtools and the Spack transition, and in particular, how to keep MCC9.1 (which is old) alive through the transition. Current effort to facilitate the Spack transition is aimed at addressing CMakeLists.txt files. Have 400-500 in ub*, and are about 80% done with that. 
    • Was going to investigate if the same thing can be done with MCC9?
  • It’s taken a couple of weeks working on it part-time. It will go faster next time since know what to do now and have helper scripts ready. Not sure if underlying packages may require a rebuild of art. Currently on version 3.1.2. Herb is talking with Kyle. Curious whether other experiments have done cetmodules as Chris recommends.
    • Tom thinks that DUNE has updated–about a year ago. Used the migrate script from Chris. Still have a lot of UPS in CMakeLists.txt files, and don’t know how to make it go away. Not everything works if you replace UPS with find_package. 
    • Needs to get rid of art_make’s, since that is deprecated in cetmodules
    • Tom knows that a lot of the root targets need to be specified one by one
    • Herb commented that this should be easy, since cetmodules builds in a lot of transitivity. All build targets are transitive, so list of dependencies gets shorter relative to cetbuildtools
    • Tom thought it might be fragile, if someone removes something that breaks transitivity, for instance. Example of a GArSoft package that turned out to be one of the LAr pieces.
  • Otherwise, just keep MicroBooNE informed of AL9 and what is coming. 
    • Would not be opposed to keeping UPS alive in AL9
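
The trade-off discussed above, shorter dependency lists thanks to transitivity versus fragility when an intermediate link is removed, can be illustrated with a small dependency-graph sketch. The target names are hypothetical and this is not how cetmodules or CMake is implemented; it only shows the reasoning.

```python
def transitive_deps(graph, target):
    """Return every target reachable from ``target`` via its
    declared (direct) dependencies."""
    seen = set()
    stack = list(graph.get(target, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, ()))
    return seen


# Hypothetical targets: my_module lists only reco_lib directly and
# picks up geometry_lib and root_core through transitivity.
graph = {
    "my_module": ["reco_lib"],
    "reco_lib": ["geometry_lib"],
    "geometry_lib": ["root_core"],
}
before = transitive_deps(graph, "my_module")

# If reco_lib later drops its dependency on geometry_lib, my_module
# silently loses access to geometry_lib and root_core as well.
graph["reco_lib"] = []
after = transitive_deps(graph, "my_module")

print(sorted(before))
print(sorted(after))
```

A target that uses something directly can guard against this by also declaring it directly, at the cost of a longer dependency list.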

DUNE – Tom Junk

  • Same issues already discussed
  • Want a sustainable Spack solution that we can hand off to junior release managers.
    • Noted that there is a unilateral effort by one individual to do a Spack build of the entire suite. Believes that person is about ⅔ of the way through the entire product stack. Know no details of what they are doing other than they’re hacking through this on their own.
  • Requested an AL 9 build node, but on hold for now, since they can’t use it yet (except on Tom’s personal desktop, which is running AL9). So no AL9 resources for the collaboration.
    • Can change an AL 9 build node to another type, so it’s not a waste.
    • Tom: can run SL7 in a container on AL9. The grid container doesn’t have some packages needed, so added a bunch of development RPMs to the SL7 container on his desktop node. 
    • Has questions about privilege in a container, but that is not a topic for this meeting.

ArgoNeuT – Tingjun Yang

  • Have decided to stop updating code. Don’t need any new art releases for ArgoNeuT code in the future. So are now an island, like lariatcode.

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.