All posts by klato

August 2021 Meeting Notes

This Offline Leads status update was handled via email and a Google document.

LArSoft – Erica Snider

  • Making progress on the art 3.09 migration, and have a third release candidate. We are aiming to transition LArSoft to art 3.09 during the week of Aug 9 or 16, depending upon what additional problems are found.
  • After art 3.09 is in place, we expect to be in a position technically to migrate LArSoft to a build system based on cetmodules with a spack back end that provides backwards-compatible support for UPS. (See the presentation by Chris Green at the Feb 23, 2021 LArSoft Coordination Meeting for some discussion of this migration.) Work on this migration will begin immediately after the art 3.09 migration.
    • Prior to rolling out the new system, we will provide experiments an opportunity to review documentation and our user support, along with a release candidate with the new system. We will seek explicit sign-off from the experiments prior to migration.
    • After this migration, we will begin work on phasing out UPS in preparation for a move to the final spack-only system. Additional user resources will be provided prior to that change.
  • Progress on thread-safety has slowed. The current focus is on converting services that access the database to use the art concurrent caching support infrastructure.
  • Kyle is working on preparing a profiling and optimization presentation, as requested in issue #25831. He proposed three separate 30-minute sessions:
    1. Basics of CPU and memory usage (stacks, caches, heap allocations) and guidelines for their use
    2. Tools for profiling your programs
    3. Stepping through profile results of a sample program

DUNE – Andrew John Norman, Heidi Schellman, Tingjun Yang, Michael Kirby

DUNE has scripts to split up dunetpc, but is waiting for Dom Brailsford to commit a rearrangement of the services fcl files, which affects the ability of unit tests to run independently. They plan on moving to GitHub for the new split repositories. Heidi and Andrew are evaluating ways DUNE collaborators should use GitHub now that username/password access is disabled and tokens or SSH keys are required.

ICARUS – Daniele Gibin, Tracy Usher

No Report

LArIAT – Jonathan Asaadi

No Report

MicroBooNE – Herbert Greenlee

At the Aug. 24, 2021 LArSoft coordination meeting, Herb presented a plan for reconciling the LArSoft version of the data product ParticleID (package lardataobj) with the MicroBooNE MCC9 version. The long-term goal is for MicroBooNE to merge its MCC9 production release updates into the develop branch. Some follow-up work is required to decide between two strategies: updating ParticleID on the develop branch to match MCC9, or adding an entirely new data product class. The sticking point all along has been backward compatibility with data files written using older versions.

SBND – Andrzej Szelc

No Report

SBN Data/Infrastructure – Joseph Zennamo, Wesley Ketchum

SBN is preparing for the August production in advance of the larger October production push. 

As part of this, SBND has migrated to the latest refactored LArG4, where they have observed two issues: MCParticle collections containing non-unique TrackIDs, and segmentation faults when trying to access trajectory information. They have followed up with experts.
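
For anyone who wants to check their own samples for the first symptom, a minimal sketch of a duplicate-TrackID check follows. It is not the diagnostic SBND used: the function name and the example IDs are hypothetical, and it assumes the track IDs of one event's MCParticle collection have already been extracted into a plain Python list.

```python
from collections import Counter


def find_duplicate_track_ids(track_ids):
    """Return a sorted list of the track IDs that occur more than once."""
    counts = Counter(track_ids)
    return sorted(tid for tid, n in counts.items() if n > 1)


# Hypothetical track IDs from one event (illustration only).
example_ids = [1, 2, 3, 2, 5, 3]
duplicates = find_duplicate_track_ids(example_ids)
print(duplicates)
```

An empty result means every TrackID in the event is unique; any other result lists the IDs worth inspecting before following parent or trajectory links.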

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

July 2021 Meeting Notes

Offline Leads meeting – July 15th, 2021

Attendees: Miquel Nebot-Guinot, Andrzej Szelc, Wesley Ketchum, Tom Junk, Erica Snider, Katherine Lato

LArSoft:

  • Working on making services that access the database use the caching system. (This is the system Kyle Knoepfel presented at the LArSoft Coordination Meeting in November 2020.)
  • Have been working through issues related to the art 3.09 migration.
    • Recently resolved two root issues. One is still being tested.
    • It takes time to iterate on issues in the product stack.
    • Expect to be completed soon.
  • The first phase of the Spack migration requires art 3.09, and we expect it to follow relatively quickly after the migration to art 3.09. This phase will be compatible with UPS and mrb, so it will not require major changes in how we do things.

Round Robin:

  1. SBN Data/Infrastructure  (Wes, Miquel) 
    1. Need to think about the online systems for SBN. We use UPS local products and run two environments: one on the DAQ side, so more real-time/online, and the other on data quality, so more like offline. We need to get experience with this for the Spack transition, but nothing appears to be outside current methods. The way we do things in the online system mirrors what is done in the offline.
    2. Looking to freeze the code and get things in order for the next few months. It is advantageous to us to have the new build as soon as possible. The ICARUS major physics run is in the fall, and we are hoping to freeze the pieces for the ICARUS data reconstruction. When we freeze the code, we’re going to want to optimize it, and may reach out for help on profiling. Hopefully in 2-3 weeks, we’ll have code that does what we want and will have dedicated time for optimizing. This will be our general pattern moving forward: freeze functional production code, then dedicate time to optimization.
  2. SBND: (Andrzej) Getting ready to move to the new Geant4 framework. Made a module to convert the CRT output from the new format (SimAuxDetHits) to the old one (SimAuxDetChannels). The module takes hits and packages them as channels; the two objects are effectively identical. Is this something LArSoft would be interested in?
    1. Erica:  Contributing that to LArSoft would be good. 
    2. Andrzej: Will let Ivan know to get in touch with LArSoft.
  3. DUNE: (Tom) 
    1. Been working on chopping DUNE TPC into pieces because the builds are slow. Chopped into smaller pieces by taking directories out and assigning them to UPS products. Not too different from how LArSoft arranges things. Wrote a script to do the chopping, since the code changes while working on the split. One issue: the FHiCL files don’t factor as easily as the code because they are included often and include many other files. The FHiCL files can depend on things not there in the code dependency tree, so if I put a file higher on the tree, it depends on things that aren’t there. Can get around this by setting up the whole tree, but then it’s just like dunetpc now. For LArSoft, can people set up subsets or must they run the whole thing?
      1. Erica: LArSoft depends on having experiment code (there is no native detector, for instance), so nothing can be run outside the context of an experiment. Doing an integration-type test therefore requires a lot of repositories. I would expect to set up everything to do integration tests. For unit tests, the repositories should be stand-alone if done correctly. ‘mrb test’ runs unit tests at build time on one repository at a time. Historically, there were integration tests (so full art workflows) put in the unit test part. As an aside, would encourage DUNE to strip all that out and put all integration tests into CI workflows. Can define many workflows there if you don’t want them all to be run automatically. Then make sure all unit tests are stand-alone, so testable one repository at a time with ‘mrb test’.
    2. David Adams gave a talk yesterday. Again advocated structuring around art tools.
      1. LArSoft MT work is de-servicing as much as we can, since at least some of the current services don’t need to be services (e.g., things where there is no need for global scope). They are really just there to take advantage of art state transitions.
      2. State transitions can be handled at module scope with tools in many cases. 
      3. Noted that ProtoDUNE pulls event data from a DB. Beam configuration is at the spill level (where a spill is ~15 sec long). Need to optimize DB access for these cases.
    3. Discussed FHiCL structures again within the context of re-factoring repositories. Long discussion about trade-offs of aggregating configuration versus layering, shortcomings of the current scheme, other ways to organize the layering, the utility of base configurations, and so on. Difficult to summarize, and no clear conclusions.
    4. Noted that since https access to Redmine repositories has been removed, many collaborators who want to develop code (international developers in particular) can no longer check out DUNE code.
      1. So want to deploy to GitHub. 
      2. Wes noted SBN was happy with the move to GitHub and the use of pull requests. Having the pull request mechanism in place is helping to improve the quality and stability of the code, and they are starting to do code reviews as the code comes in. Erica echoed a similar situation for LArSoft; it is particularly good that, with pull requests, LArSoft is able to test the code before merging. Tom was not sure DUNE has the effort available.
      3. Wes commented that there are instructions on Redmine for how to set up a mirror on GitHub.
    5. Also a discussion of factoring along functional versus detector axes. DUNE has a lot of detectors, so much of the organization is along detector lines. A lot of the code is detector-specific. Can’t use ProtoDUNE code for DUNE FD.
      1. Wes offered to share examples of using two detectors (SBND and ICARUS) with common SBN code underneath. The arrangement is driven by how people work. SBN has two collaborations working together, though, which may not fit the DUNE model as well.
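
The FHiCL factoring problem Tom described under the DUNE item (files in one piece including files that live in another) can be surveyed with a short script. The sketch below is an illustration only, not the script Tom wrote: the package names, file names, and the function cross_package_includes are hypothetical, and it assumes only that FHiCL files pull in other files with lines of the form #include "name.fcl".

```python
import re

# FHiCL files include other files with lines of the form: #include "name.fcl"
INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"')


def cross_package_includes(packages):
    """List includes that resolve outside the including file's own package.

    packages maps a package name to a dict of {file name: file text}.
    Returns (package, file, included file, owning package) tuples, with
    "UNRESOLVED" as the owner when no package provides the included file.
    """
    # Record which package provides each file name.
    owner = {}
    for package, files in packages.items():
        for name in files:
            owner[name] = package

    found = []
    for package, files in packages.items():
        for name, text in files.items():
            for line in text.splitlines():
                match = INCLUDE_RE.match(line)
                if match is None:
                    continue
                included = match.group(1)
                provider = owner.get(included, "UNRESOLVED")
                if provider != package:
                    found.append((package, name, included, provider))
    return found


# Hypothetical two-package layout, for illustration only.
example = {
    "pkg_services": {"services_base.fcl": "BEGIN_PROLOG\nEND_PROLOG\n"},
    "pkg_reco": {
        "reco.fcl": '#include "services_base.fcl"\n#include "missing.fcl"\n',
    },
}

for entry in cross_package_includes(example):
    print(entry)
```

Each reported tuple is a configuration dependency that the code dependency tree may not capture, which is the situation that forces setting up the whole tree.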

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

May 2021 Meeting Notes

This Offline Leads status update was handled via email and a Google document.

LArSoft – Erica Snider

  • The previously proposed rollback of hdf5 package will not be necessary. We have the required e20 builds, which required patches to an externally supported package. Thank you to those who followed up with testing the rollback.
  • The migration to art 3.09 is in progress, and is expected to be completed by mid-May. This new version comes with three associated changes:  
    • e20 as the default build qualifier 
    • A new version of root that addresses a problem reading certain files (issue #25615). This version of art is also compatible with cetmodules, and will enable the first phase of migration to the Spack-based build system.
    • TensorFlow v2.3
  • SBN previously requested assistance and possibly a tutorial on profiling tools and techniques. The SciSoft team is prepared to provide this assistance. SBN should make a specific request via the Offline Leads Meeting or a Redmine issue ticket.
  • Update on the status of the memory footprint increase reported by DUNE: Tom Junk reported some progress on the DUNE side. There has been no further progress to report from the LArSoft side. Kyle Knoepfel is tasked with following up.
  • The project has no progress to report on geometry extensions for pixel detectors.

DUNE – Andrew John Norman, Heidi Schellman, Tingjun Yang, Michael Kirby

dunepdsprce, dune-raw-data and dunetpc have been compiled and tested with e20. It took a little maintenance, as data read-in methods sometimes involved creating pointers to elements of packed structures, which may be unaligned; e20 emits a warning for these. All fixed, though if someday in the future 32-bit objects get padded unless we say packed, we could be in for more maintenance. Tom’s progress with the memory footprint issue consisted of identifying software components that take more memory in larsoft v09_16_00 as compared with v09_15_00, and a lot of it seems to be what ROOT loads with it.
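
The packed-structure point can be illustrated with a small sketch using Python's standard struct module. The record layout here (one 8-bit flag followed by one 32-bit word) is hypothetical and is not taken from the DUNE raw-data formats. It shows why a member of a packed structure can sit at a misaligned offset, which is the situation a pointer to that member would expose.

```python
import struct

# Hypothetical record: one 8-bit flag followed by one 32-bit word.
PACKED = "<BI"  # standard sizes, no padding (comparable to a packed struct)
NATIVE = "@BI"  # native sizes and alignment (compiler-style padding)

packed_size = struct.calcsize(PACKED)
native_size = struct.calcsize(NATIVE)

# In the packed layout the 32-bit word starts right after the 1-byte flag.
word_offset_packed = struct.calcsize("<B")
word_is_aligned = word_offset_packed % 4 == 0

# Copying the value out of the buffer avoids holding a pointer into it.
raw = bytes([0x01, 0x78, 0x56, 0x34, 0x12])
flag, word = struct.unpack(PACKED, raw)

print(packed_size, native_size, word_is_aligned, hex(word))
```

Copying the field into a properly aligned local value, rather than taking its address inside the packed buffer, is the usual way to sidestep this class of warning.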

ICARUS – Daniele Gibin, Tracy Usher

No Report

LArIAT – Jonathan Asaadi

No Report

MicroBooNE – Herbert Greenlee

No Report

SBND – Andrzej Szelc

No Report

SBN Data/Infrastructure – Joseph Zennamo, Wesley Ketchum

From email: We have opened a request for a profiling tutorial for SBN developers:

https://cdcvs.fnal.gov/redmine/issues/25831

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

April 2021 Offline Leads Meeting Notes

Offline Leads Meeting – April 22, 2021

Attendees: Joseph Zennamo, Tingjun Yang, Erica Snider, Katherine Lato

1) We had a request to migrate “best effort” Ubuntu support from LTS 18 to LTS 20. This requires building under gcc v9 (e20). This now works, so will begin “best effort” for LTS 20. Proposing to move to e20. DUNE and uB have products that will need to be rebuilt with e20.

Discussion: Joseph asked about the impact of shifting to e20.

There typically aren’t changes to interfaces, but compilers get better at enforcing the standard. Sometimes code that compiles with an earlier version of a compiler no longer compiles because the code wasn’t compliant with the standard.

Tingjun noted that they tried to move to e20 for ArgoNeuT code. It has some issues with warnings in TensorFlow. Lynn provided a solution, which they’re going to test. There may be similar issues with DUNE, so they should start testing it.

LArSoft will migrate once experiments give the all-clear.

2) There is a request to migrate to TensorFlow v2.3. The project is ready to do this, but we need people from the experiments to check that everything works as required under the new version. Only larrecodnn uses TensorFlow within core LArSoft. Both argoneutcode and dunetpc use TensorFlow.

Discussion: Have expanded the scope of this migration to include moving to the next version of the TensorRT Inference Server (now re-branded as Triton) at the same time.

Leigh Whitehead said in email several weeks back that DUNE is ready for TF v2.3. Tingjun noted that some things have changed, so they need to run some of the tests again.

3) Rollback of hdf5 v1_12 to hdf5 v1_10. (Noted that the older version builds with e20)

Discussion: SBN has no immediate use for this, but given their drive to use HPC resources, expect that HDF5 conversions will be a part of the workflow at some point. No opinion at this time.

Discussion at last LCM suggested DUNE is ok with a temporary rollback to hdf5 v1_10. Need to confirm.

4) Round table:

Tingjun: ArgoNeuT and DUNE issues for LArSoft

  1. A producer module crashes when reading older data. Submitted an art ticket for that. Kyle was consulting, but it’s been a while and it may impact us soon. Urgent. https://cdcvs.fnal.gov/redmine/issues/25615
  2. Since LArSoft moved to the new ROOT version, we see a 20% increase in memory usage for DUNE production jobs. Reported this via a LArSoft issue. Tom Junk is the contact. Not as urgent, but it has an impact on our production since we have to request more slots. https://cdcvs.fnal.gov/redmine/issues/25512
  3. Tom Junk additions after the meeting via email:
    1. dunetpc compiles (and links) with e20 but I have yet to run anything more than an event display with it.  There’s an e20 build of tensorflow v1_12_0d that is included in the dependency tree when I built dunetpc just now with e20.
    2. There were a couple of things in dune-raw-data and dunepdsprce that caused gcc v9_3_0 to emit new warnings but these have been straightened out.
    3. I have tested the rollback to hdf5 v1_10 with the raw data readin source we have in dune-raw-data and it works. There’s now a dune_raw_data v1_18_01 which builds with the older hdf5, ready to go when the rollback is deployed. DUNE also depends on hdf5 via hep_hpc, and there is a rolled-back version of that now (with e20 even.  Thanks, Lynn!) I am discussing with Kyle about how best to do delayed reading with HDF5.  This is important to keep memory consumption down for the DUNE far detector and even helps us with ProtoDUNE data, which I assume will be in HDF5 format moving forwards, if I read peoples’ slides right.  We had it working with ROOT, but it will take some design and coding to get it right with HDF5.
    4. Regarding the memory increase with larsoft v9_16_00 (ROOT v6_22), I ran valgrind and spotted a few things that were taking more memory.  I don’t have solutions, however.

SBN: 

  1. Just started a workflow group with SBN to digest how they’re going to do everything and think about it. There may be things that affect LArSoft in the future, but not right now.
  2. What kind of support is there for profiling SBN code? For the first pass, is it possible to invite a profiling expert to an SBN meeting to help developers and analyzers learn how to use these tools, at least the simpler ones? Need a discussion suitable for analyzers.

Erica: SciSoft team can assist with profiling. The lab provides a set of profiling tools, though it changes with time. There is expertise within SCD in using these tools. Will try to find someone to provide the requested tutorial.

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

February 2021 Offline Leads Meeting Notes

The February 2021 LArSoft Offline Leads status update was handled via email and a Google document.

LArSoft – Erica Snider

LArSoft has migrated to art 3.06 as of LArSoft v09_16_00 released on Feb 4. The new version of art includes an update to root, v6_22_06a. Note that this version of root requires an additional set of ups qualifiers, e19:p383b:prof, etc.

In anticipation of the migration to the spack packaging system, we have begun work to migrate the LArSoft build to use cetmodules, which uses spack and CMake instead of cetbuildtools. cetmodules is backwards compatible with UPS, so it can be used with either packaging system. There will be no impact on developers and end users when this change is introduced. More details will be provided once the full migration plan is completed.

There is a request to migrate to TensorFlow v2.3. The project is ready to do this, but we need people from the experiments to check that everything works as required under the new version. Only larrecodnn uses TensorFlow within core LArSoft. Both argoneutcode and dunetpc use TensorFlow.

We also have a request to migrate “best effort” Ubuntu support from LTS 18 to LTS 20. This will probably require that we also migrate to gcc v9 (e20). Please send any comments you have regarding either of these changes. 

DUNE – Andrew John Norman, Heidi Schellman, Tingjun Yang, Michael Kirby

We upgraded DUNE’s stack to the new art and root with very few issues. A development hdf5 reader module in dune-raw-data had used H5Cpp.h, which is now removed. We are looking for alternative methods in hep_hpc, but are having trouble finding a method to access file attributes. We are testing the TensorFlow v2.3 product.

SBN Data/Infrastructure – Joseph Zennamo, Wesley Ketchum

We did a minor refactoring of some of our experiment code to ease further development and deployment: this has meant setting up an ‘sbnana’ that has minimal dependencies on art/LArSoft, helping us push forward on the CAFAna analysis framework development. We’re working on being ready for e20 releases ASAP, to help the Ubuntu migration to LTS 20 go through faster. We’re also looking forward to the updates to the Event Reweighting presented by S. Gardiner, as that will be necessary for handling reweighting systematics in SBN sensitivity studies.

ICARUS – Daniele Gibin, Tracy Usher

No Report

LArIAT – Jonathan Asaadi

No Report

MicroBooNE – Herbert Greenlee

No Report

SBND – Andrzej Szelc

No Report

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

January 2021 Offline Leads Meeting Notes

Offline Leads Meeting – Jan. 14th

Tom Junk, Tingjun Yang, Erica Snider, Katherine Lato

LArSoft Update:

  • Spack Update: SciSoft, after getting art to work with Spack, is now focusing on LArSoft. Kyle is working on a draft of the migration plan. 
    • Tom has looked at a few examples, trying things. It did not seem to be “easier” than the current system. The hope is that it at least solves some of the technical issues with portability, etc., and is no worse an experience for users.
    • Two phases for the transition. In the first phase, allow Spack to operate with the underlying UPS. The second will do away with UPS.
  • Tickets for “numerous” items from work plan discussion were mainly ICARUS and SBND, who weren’t at the meeting. Documentation for running in containers was a DUNE request. No one present had tried them, so no feedback on this.
  • Documentation in general. Who do we target? New people, or experts? Is it complete? LArSoft asked for guidance on what to focus on and missing pieces. Also help with identifying things that are out of date.
  • Mu Wei presented Algorithms for calculating number of ions and photons at the January 12th LArSoft Coordination meeting, which led to a proposal to drop support for the Separate algorithm (where number of electrons and photons are uncorrelated). Discussed with those present, who agreed dropping Separate is reasonable.

Round Robin, DUNE:

  • Tom asked about GitHub going to two-factor authentication in August. Pull request was awkward. User-side documentation isn’t as obvious as the administrative documentation, but Tom did find it on the web. Might be helpful to add some pointers in the LArSoft wiki to the correct documentation in GitHub.
  • Tingjun mentioned a computing tutorial they’re having next Friday that will be going through a lot of LArSoft wiki pages. LArSoft asked to be told if things are out of date or need updating prior to that.
  • DUNE has a plan to split dunetpc, but had been waiting on the Spack migration first. They have since decided they don’t need to wait on that. Discussed some of the technicalities, such as needing to do the build themselves piece by piece or getting an FWBuild-like recipe. The biggest work is probably breaking it up into pieces, which they can do at any time. Will keep LArSoft advised.
  • Tingjun asked about the status of updating TensorFlow. Erica noted that as of mid-December, there was a TensorFlow v2.3 build available. Will inquire about plans for migrating LArSoft, and see about expediting them.

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

November 2020 Offline Leads Meeting Notes

November Offline Leads Meeting, November 10, 2020.

Attendees: Tracy Usher, Wes Ketchum, Tingjun Yang, Herb Greenlee, Tom Junk, Joseph Zennamo, Erica Snider, Katherine Lato

Discussed the status of the 2021 LArSoft work plan after the series of meetings with each experiment. Some of the items now in the short term priority list were in the 2020 work plan as long term items, so part of the new plan is a reshuffling of priorities. A few items mentioned in the meetings are more appropriately handled as tickets, so will open those, get them assigned, and track the work. There were also requests for consultation, which we may also track via tickets or as a work item in the plan.

Question from Wes about Spack: What about packaging the number of things experiments need, but that are not part of LArSoft? 

Erica: Note, a lot of the products that we care about have already been packaged in Spack. One of the benefits of migrating. We envision that all the changes to LArSoft and experiment code would be made in advance of the  migration to Spack, in a fashion similar to other migrations we have done. That doesn’t include things outside LArSoft that experiments depend on, say for analysis, and we’ve so far not considered this issue. We’ll discuss the transition process in more detail once we get the new build system. There will be time to address those types of concerns at that time.

Wes: I assume the priority is high for LArSoft?

Yes.

Wes: This is more for Tom & Tingjun. I’m aware of efforts on HDF5 (Hierarchical Data Format) and wanting to write to HDF5 files from the DAQ. Not super immediate, but it might come up on the timescale of protoDUNE 2, which isn’t that far away. This may have an impact on thinking about data formats and common analysis tools. NOvA did a fair bit of work to convert to HDF5. Is that something that needs to fit in the work plan somewhere? At a minimum doing data format conversion, or support for underlying things?

Tom: Kurt has supplied a basic data converter. Can’t read small parts of events, but that could be re-architected, so not a huge problem. There are generally more features in ROOT for reading files than for HDF5, including being able to make plots by clicking on a variable, etc. Also a lot of meta-data and headers we need to survive are missing in Kurt’s solution. It is a reinvention of the stuff we have in ROOT, but it’s a more modern solution to use HDF5.

Wes: We can talk about the end analysis part, but not sure if it’s needed.

Tingjun: Pandora needs planning meetings to incorporate machine learning. What’s the long-term plan for that?

Erica: That’s been on our list for some time. It came up in our meetings with the experiments. What’s been lacking is a clear project objective. There wasn’t much in general we could do except agree on an image format starting from wires or hits. There have been requests to include TensorFlow and PyTorch. Those can be handled as tickets. Beyond this, it is unclear what is needed in terms of LArSoft infrastructure to better support machine learning. [Input on this is welcome.]

Still need to talk to people to turn some of the items in the plan into a detailed task list, which is true every year. Expect us to contact you for input to this part of the process.

Round robin:

  • SBN, Wes
    • In the process of getting production. It’s running now, and preparing for a much larger production in the new year. Some of the things we talked about (LArG4) are targeted for that. Want to close the loop in the next month. 
  • ICARUS, Tracy: 
    • The list you have is pretty complete. We’re just busy trying to get ourselves going. Multiple meetings within a few weeks of each other are taxing our resources.
  • DUNE, Tom: 
    • About pixels and the near detector. The software is already disjoint, with the near detector working in their own environment. That may be good because they’re in a hurry. At some point, they’ll need to be integrated. What’s in the plan is a good list.
  • ArgoNeuT, Tingjun: 
    • Several analyses still going on, and we appreciate the continuing support from the LArSoft team.
  • MicroBooNE, Herb: 
    • Nothing much to add. I heard you mention the things that I mentioned before.

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.

September 2020 Offline Leads Meeting Notes

LArSoft – Erica Snider

Every year we query the experiments about their plans for the upcoming year in preparation for the LArSoft work plan. These meetings between each experiment and Erica and Katherine will begin in September and continue into October with a draft 2021 LArSoft work plan being available in November and presented at the December Steering Group meeting.  We expect that the major focus will continue to be modernizing the code to be multi-threaded, or to run in multi-threaded environments. We will also continue to facilitate work toward using LArSoft on HPC resources and platforms as they become available. Expect that interest in using deep learning for LAr reconstruction will continue to grow. LArSoft will want to facilitate using these methodologies within art / LArSoft jobs. 

Pixel LArTPC readouts. 

  • Considering geometry service overhaul to accommodate pixels, where a complete re-factoring is indicated. While there, better factorize channel mapping initialization to simplify stand-alone construction and minimize dependencies.

Decoupling generators from LArSoft:  the key to solving this problem is to address how to initialize the geometry information needed by the generator without creating a compile-time dependence on a particular version of LArSoft.

DUNE – Andrew John Norman, Heidi Schellman, Tingjun Yang, Michael Kirby (from Alex Himel) 

Work continues on transitioning from the old to new LArG4, with good progress in protoDUNE simulation now being transferred to the Far Detector. We continue to take advantage of close collaboration with SBND to adopt improvements in photon simulation. At the most recent DUNE collaboration meeting, we held our first joint meeting between ND and FD Sim/Reco groups to start building towards more cohesive software across both detectors. 

ICARUS – Daniele Gibin, Tracy Usher

No Report

LArIAT – Jonathan Asaadi

No Report

MicroBooNE – Herbert Greenlee, Tracy Usher

No Report

SBND – Andrzej Szelc

SBND is preparing for a small MC production expected in October, ahead of a larger-scale production in the fall. One of the things being worked on is an update to the semi-analytic model of light simulation (Patrick Green). In parallel, there is an ongoing effort to get SBND running on the factorized version of LArG4. Preliminary tests look promising. Work is also ongoing to implement a new way to use calorimetry with SCE corrections (G. Putnam, E. Tyley).

SBN Data/Infrastructure – Joseph Zennamo, Wesley Ketchum

No Report

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.