April 2021 Offline Leads Meeting Notes

Offline Leads Meeting – April 22, 2021

Attendees: Joseph Zennamo, Tingjun Yang, Erica Snider, Katherine Lato

1) We received a request to migrate “best effort” Ubuntu support from LTS 18 to LTS 20. This requires building under gcc v9 (e20). That build now works, so “best effort” support for LTS 20 will begin. The proposal is to move to e20. DUNE and uB have products that will need to be rebuilt with e20.

Discussion: Joseph asked about the impact of shifting to e20.

There typically aren’t changes to interfaces, but compilers get better at enforcing the standard. Code that compiles under an earlier version of a compiler sometimes fails to compile under a newer one because the code was never compliant with the standard.
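
As an illustration of the point about stricter compilers, here is a minimal sketch of one pattern that newer gcc versions flag. It assumes the build treats warnings as errors (for example with -Werror), which is an assumption and not something stated in the meeting. The names Counter, make_counter, copy_assigned_value and self_check are hypothetical and do not come from any LArSoft or experiment package. The sketch shows the corrected form of the code; the comments describe the older forms that would be diagnosed.

```cpp
#include <cassert>

// Hypothetical class, for illustration only.
struct Counter {
  int value = 0;

  Counter() = default;
  explicit Counter(int v) : value(v) {}

  // User-provided copy constructor.
  Counter(const Counter& other) : value(other.value) {}

  // If this declaration were missing, `a = b` would rely on an implicitly
  // generated copy assignment operator. That implicit generation is
  // deprecated when a copy constructor is user-provided, and gcc 9 added
  // -Wdeprecated-copy (enabled by -Wextra) to report it.
  Counter& operator=(const Counter& other) = default;
};

Counter make_counter(int v) {
  Counter c(v);
  // Writing `return std::move(c);` here would block copy elision; gcc 9
  // added -Wpessimizing-move (enabled by -Wall) to report that form.
  return c;
}

int copy_assigned_value(int v) {
  Counter a;
  Counter b = make_counter(v);
  a = b;  // uses the explicitly declared copy assignment operator
  return a.value;
}

// Small self-check that exercises both functions.
void self_check() {
  assert(make_counter(3).value == 3);
  assert(copy_assigned_value(7) == 7);
}
```

Code written in the older forms builds cleanly with an earlier compiler and the same flags, which is why this kind of breakage tends to show up only when the compiler version changes.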

Tingjun noted that they tried to move to e20 for the ArgoNeuT code and had some issues with warnings in TensorFlow. Lynn provided a solution, which they are going to test. DUNE may have similar issues and should start testing.

LArSoft will migrate once experiments give the all-clear.

2) There is a request to migrate to TensorFlow v2.3. The project is ready to do this, but we need people from the experiments to check that everything works as required under the new version. Only larrecodnn uses TensorFlow within core LArSoft. Both argoneutcode and dunetpc use TensorFlow.

Discussion: We have expanded the scope of this migration to include moving to the next version of the TensorRT Inference Server (now re-branded as Triton) at the same time.

Leigh Whitehead said in an email several weeks back that DUNE is ready for TF v2.3. Tingjun noted that some things have changed since then, so they need to run some of the tests again.

3) Rollback of hdf5 v1_12 to hdf5 v1_10. (It was noted that the older version builds with e20.)

Discussion: SBN has no immediate use for this, but given their drive to use HPC resources, they expect HDF5 conversions to be part of the workflow at some point. No opinion at this time.

Discussion at the last LCM suggested DUNE is OK with a temporary rollback to hdf5 v1_10. Need to confirm.

4) Round table:

Tingjun: ArgoNeuT and DUNE issues for LArSoft

 

  1. A producer module crashes when reading older data. Submitted an art ticket for that. Kyle was consulting, but it’s been a while and it may impact us soon. Urgent. https://cdcvs.fnal.gov/redmine/issues/25615
  2. Since LArSoft moved to the new ROOT version, we see a 20% increase in memory usage for DUNE production jobs. Reported this via a LArSoft issue. Tom Junk is the contact. Not as urgent, but it has an impact on our production since we have to request more slots. https://cdcvs.fnal.gov/redmine/issues/25512
  3. Tom Junk additions after the meeting via email:
    1. dunetpc compiles (and links) with e20 but I have yet to run anything more than an event display with it.  There’s an e20 build of tensorflow v1_12_0d that is included in the dependency tree when I built dunetpc just now with e20.
    2. There were a couple of things in dune-raw-data and dunepdsprce that caused gcc v9_3_0 to emit new warnings but these have been straightened out.
    3. I have tested the rollback to hdf5 v1_10 with the raw data read-in source we have in dune-raw-data and it works. There’s now a dune_raw_data v1_18_01 which builds with the older hdf5, ready to go when the rollback is deployed. DUNE also depends on hdf5 via hep_hpc, and there is a rolled-back version of that now (with e20 even. Thanks, Lynn!) I am discussing with Kyle how best to do delayed reading with HDF5. This is important to keep memory consumption down for the DUNE far detector and even helps us with ProtoDUNE data, which I assume will be in HDF5 format going forward, if I read people’s slides right. We had it working with ROOT, but it will take some design and coding to get it right with HDF5.
    4. Regarding the memory increase with larsoft v9_16_00 (ROOT v6_22), I ran valgrind and spotted a few things that were taking more memory.  I don’t have solutions, however.

SBN: 

  1. Just started a workflow group within SBN to work through how they are going to handle their full workflow. There may be things that affect LArSoft in the future, but not right now.
  2. What kind of support is there for profiling SBN code? For the first pass, is it possible to invite a profiling expert to an SBN meeting to help developers and analyzers learn how to use these tools, at least the simpler ones? Need a discussion suitable for analyzers.

Erica: The SciSoft team can assist with profiling. The lab provides a set of profiling tools, though the set changes over time. There is expertise within SCD in using these tools. Will try to find someone to provide the requested tutorial.

Please email Katherine Lato or Erica Snider for any corrections or additions to these notes.