
April 2024

Offline Leads Meeting Notes 4/24/24

Attendees: Tom Junk, Giuseppe Cerati, Herb Greenlee, Steven Gardiner, Erica Snider, Katherine Lato

  1. New geometry validation status

    • Is anyone working on this?
    • Tom: started a build with RC0. Easier to merge than expected. Someone else was working on RC1. It builds and runs, but no one has checked the output.
    • The validation is fairly simple. Do not expect any changes in results, so just see that nothing changes. 
      • Given this, we will adopt a new acceptance protocol:  one completed validation, then proceed with the migration. 
  2. Erica has completed the LArSoft portion of SpaceCharge changes requested by SBN. 

    • Joseph Zennamo is responsible for the experiment side. Mike Mooney will be making the necessary changes to SpaceCharge service, and will do so for all experiments. Joseph has been busy with SBND commissioning, but said he will get to this soon.
    • Once SpaceCharge changes proceed, LArSoft will make changes necessary for TPC-dependent electric fields (nominal) and electron lifetimes.
      • Tom commented on a possible spatial dependence of lifetime. A resolution issue.
      • Erica:  Would like to see the data that says it’s important enough to matter.
      • Tom:  believes they have ProtoDUNE data that says it does not.
  3. Changes to recob::Hit coming

    • The GausHitFinder is also changing. Will add code to draw boundaries between overlapping hits, and use the smaller sums to fill the new data members
    • At present, the new algorithm is built into the hit finder code. If a change is needed to diversify the solution, then it will be made into a tool. That was the promise from the algorithm authors [at DUNE].  
    • Giuseppe commented to be sure that new code does not affect the multi-threading capabilities.
      • Are reviewing the PRs now, so will check.
  4. Note about services that are either going away or being transferred to other places (which could include the experiments)

    • Has been mentioned to experiments, hopefully at FIFE meetings
    • The operations budget has been stressed, so many systems are being affected.
    • Things like POMS, SciSoft web service, CI system, Spack may be impacted.
    • Bringing this up because it (a) impacts all experiments, and (b) may impact LArSoft support in some way. 
    • DSSL does not want to see loss of support for any of these critical resources. Giuseppe echoed that CSAID has worked hard for a long time to get everyone consolidated into using these common solutions, so it would be a bad thing to turn around and tell people that they needed experiment specific solutions from now on.
      • Exactly…
      • May need to organize meetings with current support people, understand how and if support can be moved to DSSL
  5. Spack status

    • Kyle has been working on the development environment part of this. Will report at the Coordination meeting after this next one. Waiting to hear details about a potential solution from Sandia NL.
    • More than just packaging is required. Spack environment needed. 
  6. Round-table

    • SBN Data/Infrastructure (Giuseppe)
      •  SBN has recently cut a production release, but realized that some other changes in larsim are needed
        • Decided to update production to newer LArSoft base release – now v09_89_01.
        • Tracy wrote an email about this last week. Everyone is on board as far as I can tell.
      •  SBND also needs a new GENIE release to fix a timing problem for “dirt” events.
        •  The new GENIE release has been tagged, but not yet distributed in UPS. Eventually this will need to be picked up by the SBN production release
        • Erica:  So SBN will be on a separate version of GENIE?
        • No, hope not.
        • Mentioned GENIE version decoupling work that is on-going. Have not completed this yet, so will either need to isolate any special GENIE version to a production branch, or bring all of LArSoft along.
      • Hosted a first spack tutorial at SBN Analysis Infrastructure meeting
        • Idea within SBN was to have separate follow-ups for different communities
        • Building and distributing code under Spack for release managers and similar
        • When Spack development solution ready, then another tutorial on that for a larger audience.
        • Need to know the timeline for things
        • Erica: Sounds like a good plan.
    • MicroBooNE (Herb)
      • LArSoft will stop building under SL7, right?
        • Erica:  We cannot run SL7 at all, though we should be able to provide builds using containers, for use within containers. You can assume we will continue to build for the platforms experiments require, at least for the time being, so we will provide SL7 builds if MicroBooNE needs those.
        • Is MicroBooNE committed to SL7 for the foreseeable future? Yes for MCC9.
      • Will MicroBooNE remain at MCC9 forever? Seems that some parts of MicroBooNE will
        • Discussed MCC10. Adoption has been very slow. Even people working on DL code have been back-porting into older releases. Herb has encouraged them to use integration releases (ie, MCC10), but the developers seem reluctant.
        • Giuseppe mentioned that MCC10 might require huge validation investment, which is too much for individual developers. 
        • Herb mentioned that MCC10 workflows run, but that no one has looked at the results.
        • So seems MCC9 will remain a thing for a long time
      • In general, MicroBooNE needs more hand holding for Spack. Herb has looked at the tutorials and they don’t answer the questions he has.
    • DUNE
      • Exactly when are we going to start building routinely with Spack?
      • Erica will check with SciSoft team
      • Should be happening on the CI system, right? Yes, in principle. 
      • Have to keep a version of all the pieces in the UPS products we have. I did that this week. DUNE DAQ people have a development environment that I don’t understand. Could ask them how they do Spack? Problem is multiple repositories.
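
Item 3 above mentions drawing boundaries between overlapping hits and filling the new recob::Hit data members from the smaller sums. The notes do not say how the boundary is chosen, so the following is only a minimal sketch under one assumed rule: the boundary sits at the lowest sample between two adjacent peaks. The function name, the input layout, and the rule itself are all hypothetical and are not taken from the GausHitFinder code.

```python
from typing import List, Tuple


def split_overlapping_hits(
    waveform: List[float], peaks: List[int]
) -> List[Tuple[int, int, float]]:
    """Return (start, end, summed_adc) for each peak, with `end` exclusive.

    Assumption: the boundary between two adjacent peaks is placed at the
    lowest sample in the range spanned by those two peaks. The real
    algorithm may use a different rule.
    """
    if not peaks:
        return []
    peaks = sorted(peaks)
    boundaries = [0]
    for left, right in zip(peaks, peaks[1:]):
        segment = waveform[left:right + 1]
        # The boundary sample itself is assigned to the right-hand hit.
        boundaries.append(left + segment.index(min(segment)))
    boundaries.append(len(waveform))
    return [
        (start, end, sum(waveform[start:end]))
        for start, end in zip(boundaries, boundaries[1:])
    ]
```

With this rule every sample is counted in exactly one hit, so the per-hit sums add up to the sum over the whole region.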

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

January 30, 2024

The January 2024 LArSoft Offline Leads status update was handled via email and a google document. 

LArSoft – Erica Snider

  • Work is proceeding to accelerate DUNE physics processing using GPUs for select LArSoft algorithms, starting with the PDFastSimPAR module. The plan is to first  optimize serial execution, then parallelize on CPUs, and if needed to achieve performance goals, parallelize on GPUs. There will be update reports at both LArSoft Coordination Meeting and within DUNE FD reco group this month.
  • A new machine learning algorithm, NuGraph2, is available within LArSoft. The algorithm performs hit classification and clustering using a GNN, and is being developed for use in MicroBooNE, and will be used in ICARUS later. Details were discussed at the Dec 12 LCM. Documentation on the algorithm and how to use it will be posted to LArSoft.org in coming weeks. 
  • A Spack build of LArSoft became available last month. SciSoft is planning to provide a tutorial on how to build experiment code under Spack using this new release. Experiments wishing to participate will need to use machines under AL9 to run the builds.
  • Geometry re-factoring:  working on completing documentation, the last step needed prior to releasing it as a LArSoft v10 release candidate for experiment validation.

DUNE – Heidi Schellman, Tingjun Yang, Tom Junk

DUNE will start work this period on the Spack migration now that the LArSoft Spack build was made available.  We have already been in contact with Steve White, Marc Mengel, Patrick Gartung and Kyle Knoepfel on the subject.  We will need to keep SL7 build nodes and interactive nodes as long as possible – there have been proposals to shut them down in March but we would like them longer.  We also are attending the container task force meetings and have a container that works for builds and will test the current worker node container again.

LArSoft comment:  The project pushed back on the original March 20 suggestion, and SCF Dept Head agreed to allow at least some build nodes to remain up beyond that date. A suggested compromise shutdown date is mid-May. LArSoft (tentatively) responded that this would be acceptable. EOL is end of June.

ICARUS – Daniele Gibin, Tracy Usher

No Report

LArIAT – Jonathan Asaadi

No Report

MicroBooNE – Herbert Greenlee

No Report

SBND – Andrzej Szelc

No Report

SBN Data/Infrastructure – Steven J. Gardiner, Giuseppe B. Cerati

No Report

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

November 15, 2023

LArSoft Offline Leads 11/15/23 Meeting

Attendees: Herb Greenlee, Tom Junk, Will Foreman, Giuseppe Cerati, Erica Snider, Katherine Lato

At the meeting, we went over the draft of the 2024 work plan for LArSoft. 

Number one priority continues to be multi-threading and High Performance Computing (HPC) with several people working on this in 2024. Notably, SciSoft has effort available to make algorithms run on GPUs. Requirement is that they be relatively slow, in LArSoft, and amenable to acceleration with a GPU. There was a follow-on question about GPU as a service. This is described at: https://larsoft.org/using-gpu-as-a-service-in-larsoft/. SciSoft believes this is ready to run at scale, so just need to work out the logistics of spinning up the GPU server somewhere.

Spack migration has a deadline of Q2 2024 in order to provide time for experiments to complete their migrations in advance of the June 2024 SL7 EOL.  AL9 requires Spack, since there will be no UPS support. This work seems close, though we still do not have a detailed timeline to completion. Comment:  Experiments have changes to make, so do not want to be pinched for time. Response:  Given that all experiment code has been migrated to cetmodules, the procedure for getting from there to a Spack build should be straight-forward. Expect that any issues will be related to getting the experiment-specific product stacks under Spack. Q:  Containers should provide some cover? Yes, should provide a buffer to the June 2024 EOL. Generally, the process will be that AL9 and SL7 will co-exist for some time to allow experiments to migrate. Spack has to come first. 

Support for multi-experiment event display has been on the wish list for a while so that upper management understands that multiple experiments have requested this.

A new item in this year’s work plan is a set of updates to the LArSoft infrastructure. These address, but are not limited to:

  • Sampling frequencies vary across TPCs in protoDUNE, while LArSoft supports only a single value.
  • Support for non-planar cathode geometries to facilitate tracking across non-planar cathodes. 
  • Support for TPC-dependent drift velocities and electron lifetimes. 
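
One way to picture the per-TPC items above is a detector-wide default with optional per-TPC overrides. This is a sketch only: the class names, field names, and units are hypothetical and do not correspond to any existing LArSoft service interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class TPCParams:
    # None means "not set here, fall back to the detector-wide default".
    sampling_rate_mhz: Optional[float] = None
    drift_velocity_cm_per_us: Optional[float] = None
    electron_lifetime_us: Optional[float] = None


@dataclass
class DetectorParams:
    defaults: TPCParams
    overrides: Dict[int, TPCParams] = field(default_factory=dict)

    def get(self, tpc_id: int, name: str) -> Optional[float]:
        """Return the per-TPC value if one is set, else the default."""
        override = getattr(self.overrides.get(tpc_id, TPCParams()), name)
        return override if override is not None else getattr(self.defaults, name)
```

Code that today asks for a single global value would instead pass the TPC ID. TPCs without an override keep the current behavior.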

Appendix B is a short summary of our major observations from one-on-one meetings with each experiment in September and October of 2023. Common items include: 

  1. Event display that is useful in the current environment.
  2. HPC.
  3. Faster processing.
  4. Event generators

Round Robin:

  • SBND:  Will Foreman
    • working to integrate blip reco into SBND code. Once completed, will migrate to LArSoft. But first want to make sure it is working in SBND
  • MicroBooNE:  all questions answered during work plan discussion
  • SBN: Giuseppe 
    • SBND has allocation to run on Polaris at Argonne. Will be running simulation. Running in a container. First asked about Spack, but was not available, so opted for container. That has been demonstrated to work.
  • DUNE:  will send comments later.

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

August 16, 2023

Offline Leads meeting – Aug 16, 2023

Attendees: Herb Greenlee, Thomas Junk, Tingjun Yang, Erica Snider, Katherine Lato

LArSoft Report:

  • Experiment meetings for 2024 plan development coming up starting in Sept. This is when we meet one-on-one to develop the work plan for the next year.
    • Please let us know if there is someone other than an offline lead that we should talk with for your experiment.
  • Progress on this year’s plan has been steady in some areas, but not fast
    • Thread safety:  all larevt services thread safe. Ones that access DB use concurrent caching. Now working on expt code. ArgoNeuT completed, SBND in progress (work started in spring)
    • Discrepancy between single-threaded and MT results in SN pipeline demonstrator understood, but work still underway to fix (on-going since winter)
    • Geometry refactoring to accommodate pixels:  factored into wire readout geometry and volume geometry. Adapting experiment code. After validation, will introduce pixel readout geom (on-going since winter)
    • GPUaaS:  working to extend to PyTorch. Allows PyTorch GNNs
      • Herb: Have re-started a Reco Group. One of the conveners is at Tufts, and has been tasked with integrating the DL reco workflows. Several people have tried to run LArSoft on non-standard architectures. We’re thinking of appointing someone in MicroBooNE to be the HPC tzar.
        • Is GPUaaS targeting HPC, or stand-alone GPUs?
        • Erica: This GPUaaS is specifically for inferencing from a grid node. But we can also develop algorithms to run directly on HPC or on a GPU. However, it currently runs only with TensorFlow. It’s being extended to PyTorch, which should be of interest to MicroBooNE and DUNE.
      • Tom: Network layer–don’t want to share if you have a lot of GPUs. If you just want to use it nearby and remote, do you need two versions of everything?
        • Erica: No. Our GPUaaS uses a triton server to access GPUs. That server can be run remotely or on the same node as the job running the client. So can serve jobs locally.
      • Tom: Are we going to do this (run on local GPUs or through GPUaaS) for future algorithms?
        • Erica: grid-like resources will continue to be important, which is the problem GPUaaS solves. The future problem is to make algorithms run natively on GPUs/HPC. Data needs to be structured properly to be efficient on GPUs. Will need to restructure some code and make use of portability packages. Marc Paterno is available to help with assessing changes needed, and modifying / writing the algorithms. [Some algorithms will be best optimized by completely rewriting. Others can be adapted as noted here.]
  • Budget for next year looks like it will be very tight. Do not know the impact on LArSoft. Paradoxically might help, but all remains to be seen.
  • Spack
    • Effort is again being put into making LArSoft’s external dependencies buildable. While there is a large set of products to deal with, headway has been made. We believe building recipes for most products will be straightforward, with a couple of exceptions, notably TensorFlow.
    • The tutorial has been given to Mu2e.  We are working with Mu2e to produce a clean usable build. Hope they will eventually give the tutorial to LArSoft community
    • CI’s first nightly build of the art suite completed! It was built on both SLF 7 and AL 9!  The build scripts are invoked independently from Jenkins.  We need to move to using Jenkins on the backend to reduce boilerplate.
      • If we end up on AL 9, plan is to retire UPS.
    • Experiments continue to struggle with keeping develop branch of repositories current, even as rate of integration releases is less than the usual once per week.
      • Should we discuss this release model? Will make this a topic for discussions with each experiment.
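
The thread-safety item above says that services accessing a database use concurrent caching. As a rough illustration of the idea only (this is not the LArSoft caching code, and the class and its interface are hypothetical), a cache can guard its entries with a lock so that many threads asking for the same key trigger a single database read:

```python
import threading
from typing import Callable, Dict, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class ConcurrentCache(Generic[K, V]):
    """Toy thread-safe cache: the loader runs at most once per key."""

    def __init__(self, loader: Callable[[K], V]) -> None:
        self._loader = loader          # stands in for the database query
        self._lock = threading.Lock()
        self._entries: Dict[K, V] = {}

    def get(self, key: K) -> V:
        # Holding the lock during the load keeps the sketch simple but
        # serializes all loads; a production cache would be finer-grained.
        with self._lock:
            if key not in self._entries:
                self._entries[key] = self._loader(key)
            return self._entries[key]
```

A service would build one such cache keyed by, for example, run number, and every event-processing thread would call `get` instead of querying the database directly.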

 

MicroBooNE – Herb Greenlee

  • Main concern is cetbuildtools and the Spack transition, and in particular, how to keep MCC9.1 (which is old) alive through the transition. Current effort to facilitate Spack transition is aimed at addressing CMakeLists.txt files. Have 400-500 in ub*, and are about 80% done with that. 
    • Was going to investigate if the same thing can be done with MCC9?
  • It’s taken a couple of weeks working on it part-time. It will go faster next time since know what to do now and have helper scripts ready. Not sure if underlying packages may require a rebuild of art. Currently on version 3.1.2. Herb is talking with Kyle. Curious whether other experiments have done cetmodules as Chris recommends.
    • Tom thinks that DUNE updated about a year ago, using the migrate script from Chris. Still have a lot of UPS in CMakeLists.txt files, and don’t know how to make it go away. Not everything works if you replace UPS with find_package. 
    • Needs to get rid of art_make’s, since that is deprecated in cetmodules
    • Tom knows that a lot of the root targets need to be specified one by one
    • Herb commented that this should be easy, since cetmodules builds in a lot of transitivity. All build targets are transitive, so list of dependencies gets shorter relative to cetbuildtools
    • Tom thought it might be fragile, if someone removes something that breaks transitivity, for instance. Example of a GArSoft package that turned out to be one of the LAr pieces.
  • Otherwise, just keep MicroBooNE informed of AL9 and what is coming. 
    • Would  not be opposed to keeping UPS alive in AL9
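
The transitivity discussion above (Herb’s shorter dependency lists and Tom’s fragility concern) can be illustrated with a toy dependency graph. This sketch is not cetmodules or CMake code, and the target names are made up for illustration.

```python
from typing import Dict, List, Set

# Hypothetical targets: each one lists only its direct public dependencies.
EXAMPLE_DEPS: Dict[str, List[str]] = {
    "my_module": ["reco_lib"],
    "reco_lib": ["geometry_lib", "root_core"],
    "geometry_lib": ["root_core"],
}


def transitive_closure(target: str, public_deps: Dict[str, List[str]]) -> Set[str]:
    """Everything `target` ends up linked against via public dependencies."""
    seen: Set[str] = set()
    stack = list(public_deps.get(target, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(public_deps.get(dep, []))
    return seen
```

Here `my_module` declares one dependency but links against three. If someone later drops `geometry_lib` from `reco_lib`, `my_module` silently loses it as well, which is the fragility Tom describes.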

DUNE – Tom Junk

  • Same issues already discussed
  • Want a sustainable Spack solution that we can hand off to junior release managers.
    • Noted that there is a unilateral effort by one individual to do a Spack build of the entire suite. Believes that person is about ⅔ of the way through the entire product stack. Know no details of what they are doing other than that they’re hacking through this on their own.
  • Requested an AL 9 build node, but on hold for now, since they can’t use it yet (except on Tom’s personal desktop, which is running AL9). So no AL9 resources for the collaboration.
    • Can change an AL 9 build node to another type, so it’s not a waste.
    • Tom: can run SL7 in a container on AL9. The grid container doesn’t have some packages needed, so added a bunch of development RPMs to the SL7 container on his desktop node. 
    • Has questions about privilege in a container, but that is not a topic for this meeting.

ArgoNeuT – Tingjun Yang

  • Have decided to stop updating code. Don’t need any new art releases for ArgoNeuT code in the future. So are now an island, like lariatcode.

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

May 2023 Meeting Notes

Offline Leads Meeting, May 30

Attendees: Erica Snider, Tom Junk, Tracy Usher, Miquel Nebot, Tingjun Yang, Steven Gardiner, Katherine Lato

LArSoft Report: Erica

  • Work plan update
    • Saba Sehrish has been working to pick up the work to make services thread safe. 
      • She had completed a number of changes, but was pulled away before she was able to commit them to the head of develop. Has nearly finished with updating all the relevant features branches to be able to merge these
      • Most important priority is to address any outstanding services that access database, and to make sure they use the concurrent caching currently available in LArSoft. 
    • SPACK migration timelines are still uncertain. 
      • The Spack team is confident that major changes aren’t needed to SPACK, but still not sure yet if the current implementation of SPACK can support what is needed, hence the uncertainty of the timeline. 
      • One note they emphasized is that we will be able to support builds under SPACK long before we have a complete development environment. 
    • Pixel detectors within LArSoft. 
      • There has been steady progress refactoring the geometry service as needed, and this work is nearly completed. 
      • Once that is done, the current geometry will be implemented via two separate components, a volume geometry hierarchy, and a readout geometry description. The latter knows about the former, but not vice versa. 
      • The next step will be to implement a pixel readout geometry description. A draft of this already exists. Full implementation will require working with people from  DUNE ND LAr
    • No progress on neutrino event refactoring. 
  • AL9 support
    • Current CSAID plan is to not spend the effort to bring UPS forward to AL9. If this is an issue, talk to your Spokespeople and have them bring it to the attention of CSAID management. 
  • Compilers
    • Recently included the possibility to build under clang 14 (c14) and gcc 12.1 (e26) starting with LArSoft v09_74_01 released on May 5. Default builds remain unchanged.
    • c14 and e26 will become the default qualifiers when we migrate to art 3.13 (which will be soon)
  • Experiments:  please remember that keeping experiment code in pace with integration releases is necessary in order to have the benefit of the CI system for code updates and release validation. 
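
The geometry split described above, where the readout geometry knows about the volume geometry but not vice versa, can be sketched as a one-way reference between two classes. All names and the toy wire layout below are hypothetical and are not the refactored LArSoft interfaces.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class TPCVolume:
    tpc_id: int
    center_cm: Point
    half_widths_cm: Point

    def contains(self, point_cm: Point) -> bool:
        return all(
            abs(p - c) <= h
            for p, c, h in zip(point_cm, self.center_cm, self.half_widths_cm)
        )


class VolumeGeometry:
    """Volume hierarchy only; it has no knowledge of any readout."""

    def __init__(self, tpcs: Sequence[TPCVolume]) -> None:
        self._tpcs = list(tpcs)

    def tpc_at(self, point_cm: Point) -> Optional[TPCVolume]:
        return next((t for t in self._tpcs if t.contains(point_cm)), None)


class WireReadoutGeometry:
    """Readout description; holds a reference to the volume geometry."""

    def __init__(self, volumes: VolumeGeometry, wire_pitch_cm: float) -> None:
        self._volumes = volumes
        self._pitch = wire_pitch_cm

    def nearest_wire(self, point_cm: Point) -> Optional[Tuple[int, int]]:
        """Return (tpc_id, wire_index), or None outside all TPCs.

        Toy layout: wires are evenly spaced along z from the TPC's low-z face.
        """
        tpc = self._volumes.tpc_at(point_cm)
        if tpc is None:
            return None
        z_min = tpc.center_cm[2] - tpc.half_widths_cm[2]
        return tpc.tpc_id, int((point_cm[2] - z_min) / self._pitch)
```

Because `VolumeGeometry` never refers to the readout class, a pixel readout description could be added as another class taking the same `VolumeGeometry`, without touching the volume code.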

 

DUNE – Tom Junk

  • Release information, we’re at 9.75 at DUNE. 
  • One of our customers of SPACK is sending emails. Misner? He wants to run stuff on __ LArCP3 and link it with. We may have to upgrade DUNE’s software SPACK. There were some weird warnings. We contacted original developers, who reassured us that they were valid. 

SBND – Miquel

  • Worked through the update from the last release. Resolved an issue with external dependencies. They were holding us back, but they are aware.

ArgoNeuT – Tingjun 

  • Person who has been making updates has a new position, so won’t be continuing. Management says to drop support for ArgoNeuT. If anyone wants to do work, they will have to do the updating. Believe there is no active work being done.
  • So, ArgoNeuT is in the same position as LArIAT. There haven’t been any LArIAT releases for a long time.
  • Thanks for the support for so many years.
  • Probably makes sense to remove them from the CI tests. 

SBN: Steven

  • Miquel covered some. Do you want a message to take to MicroBooNE?
  • In response to Steven’s request for more information on Neutrino refactoring, Erica explained that LArSoft is built against a particular version of GENIE. The same is true of every generator. The only way to break that connection is to produce a text file from one generator. The idea was to break that dependency so that you could change the version of GENIE at run-time and not have LArSoft built against a particular one. That would be a model for all other generators that are built within LArSoft. Experiments pretty closely track each other for what version they use, but there’s no reason why we should have to coordinate that through LArSoft. The text file is difficult to deal with. Working with Robert Hatcher to do that. Developed a plan at the beginning of last year. It looked good. It’s been sitting ever since. Tried to get him to pick it up recently. He may have taken it back to GENIE management. 
  • Steven may follow up with Robert directly.

ICARUS – Tracy:

  • We are making progress for converting to the new LArG4. Converting to the geometry. Giuseppe picked it up recently and he’s been making progress. General schedule has a large simulation/reconstruction in fall with SBND. Targeting to have the work done at the end of summer. Giuseppe needs to work with Hans.
  • We are impacted by space charge stuff. Erica: I’ll try to get a meeting this week.
  • We have a soft production release. The main production is going to be done in fall. What we’re doing now is what we think we’ll do for the summer.
  • ICARUS workshop in 3 weeks, if we can find a room at Fermilab. 
  • We’re making good progress. ICARUS has its own challenges with the horizontal wires and other things, but we’re starting to make good progress.
  • Unified event display – continued lament. Titus would be good, but it doesn’t display what ICARUS needs. They don’t have the cycles to fix it themselves. All along it’s been ‘made to work’ which makes it difficult to understand. ICARUS breaks everything. 
    • Erica:  heard. Currently have no one available to work on this. The project continues to lobby for the effort needed.

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

April 2023 Meeting Notes

The April 2023 LArSoft Offline Leads status update was handled via email and a google document. 

LArSoft – Erica Snider

  • Geometry refactoring to accommodate pixels
    • Have made good progress in removing the detector-specific readout geometry description from the generic detector volume description. Final elements of the design are completed, and the work to refactor the wire readout geometry description is at an advanced stage (Kyle Knoepfel). We expect to start testing this new geometry within the next two weeks. Once completed, a pixel readout geometry will be introduced based on an existing prototype written by Tom Junk. 
  • Thread safety work status
    • Saba Sehrish has picked up work she performed over a year ago on making LArSoft services thread safe. This work included introducing concurrent caching needed to make services that access databases thread safe. She began her current effort by working to bring forward the feature branches of that previous body of work to the head of develop. She will then turn to the service changes needed by ICARUS to make their production workflow thread safe.
  • art 3.12 update
    • We expect to be ready to migrate to art 3.12 on the time scale of a week or two. There will be some changes to user code needed. Kyle Knoepfel will talk about these at the April 18 LCM.
  • AL9 support
    • LArSoft will be supported under AL9 as SL7 enters end of life. In preparation for this, the SciSoft team has acquired the necessary infrastructure (build and CI nodes) to begin working on this support. An important point to take note of is that CSAID does not now support UPS under AL9, and has no plans to do so in the future. Consequently, the migration to the Spack-based build and development environments must be completed prior to providing full support for AL9. Since the timeline for the Spack migration is not yet known, however, we do not know when full support for AL9 will be available. We will keep the community informed as the situation changes.
  • Multi-threading workshop

DUNE – Heidi Schellman, Tingjun Yang, Michael Kirby, Tom Junk

Just to let people be aware – Shekhar Mishra has been attempting to build LArSoft and dunesw on AL9.1, as part of an AI/ML project for ICEBERG.  He says he has been able to build LArSoft on AL9.1 and is working on dunesw.  Not all dependent products of dunesw are built with mrb or have known build shims, such as duneanaobj and srproxy.  Some products have hand-built UPS products – dunedaqdataformats, dunedetdataformats, highfive and nlohmann_json.  The hand-built UPS products do not involve compiled code – they are header-only libraries and should just copy over to Shekhar’s setup.  We have been answering Shekhar’s questions but not actually doing work to support this path.  We have suggested using SL7 in a container on his AL9 machine as an interim solution.

ICARUS – Daniele Gibin, Tracy Usher

No Report

LArIAT – Jonathan Asaadi

No Report

MicroBooNE – Herbert Greenlee

SBND – Andrzej Szelc

No Report

SBN Data/Infrastructure – Steven J. Gardiner, Giuseppe B. Cerati

No Report

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.