Particle physics - exercise 1b solution

in #utopian-io7 years ago (edited)

Repository

https://github.com/BFuks/mad5-utopian-exercises

Introduction

This is a description of my solution provided for particle physics exercise by @lemouth

The exercise code is located in two files:

ex1b-mactro.h
ex1b-mactro.cpp

The goal of the exercise was to implement a simplified mechanism of identifying electrons, muons and jets useful for further study. This process consists of three parts:

  1. Identifying baseline objects
  2. Removal of overlapping objects
  3. Identifying signal objects

It is described in length in ATLAS research paper

[ Image credits CERN ]

Implementation

This exercise was definitely more fun then the previous one, and required more work. I implemented 7 helper functions to get it done:

  std::vector<RecLeptonFormat> GetBaselineElectrons(const std::vector<RecLeptonFormat>& candidate_electrons);
  std::vector<RecLeptonFormat> GetBaselineMuons(const std::vector<RecLeptonFormat>& candidate_muons);
  std::vector<RecJetFormat> GetBaselineJets(const std::vector<RecJetFormat>& candidate_jets);
  std::vector<RecLeptonFormat> GetSignalElectrons(const std::vector<RecLeptonFormat>& baseline_electrons);
  std::vector<RecLeptonFormat> GetSignalMuons(const std::vector<RecLeptonFormat>& baseline_muons);
  std::vector<RecJetFormat> GetSignalJets(const std::vector<RecJetFormat>& baseline_jets);
  
  void RemoveOverlap(std::vector<RecJetFormat>& jets, std::vector<RecLeptonFormat>& electrons,
                     std::vector<RecLeptonFormat>& muons);

All functions except RemoveOverlap() create a new vectors of Lepton or Jet format. I saw no real value in keeping vectors containing overlapping objects, so RemoveOverlap() works on the vectors provided as its arguments.

Bodies of all GetSignal... and GetBaseline.. functions look pretty similar:

std::vector<RecLeptonFormat> test_analysis::GetBaselineElectrons(const std::vector<RecLeptonFormat> &candidate_electrons) {
  std::vector<RecLeptonFormat> result;
  for (auto& candidate : candidate_electrons) {
    if(candidate.pt() <= 5.7)
      continue;
    if(candidate.abseta() >= 2.47)
      continue;
    result.push_back(candidate);
  }
  return result;
}

Inside the for loop that iterates over all provided object, we check their properties, and reject those that doesn't match. If all checks are passed, the candidate object is added to result vector. That process could be rewritten as:

    if(candidate.pt() > 5.7 && candidate.abseta() < 2.47)
      result.push_back(candidate);

Which is shorter, but in case there are many checks to be done (which may be the case if we need to implement full procedure in some of the next exercises), then IMO it is easier to read and debug the longer version.

Removing overlap is also pretty straightforward:

  // Second step from Table 3.
  for(auto& electron: electrons) {
    for(auto jet_it = jets.begin(); jet_it != jets.end(); jet_it++) {
      if(jet_it->dr(electron) < 0.2) {
        jets.erase(jet_it);
        // we have to move iterator back after removing the element
        jet_it--;
      }
    }
  }

Outer loop iterates over elements that have a precedence (need to stay in case of overlap), while inner loop checks the elements that may be potentially removed.

The next overlap removal step is implemented in a very similar way, and the rest of the code is printing of the results.

Results

=> progress: [===>                               ]
Initial electrons count: 0, initial muons count: 0, initial jets count: 6
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 6
After overlap removal:
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 6
signal electrons count: 0, signal muons count: 0, signal jets count: 5

        => progress: [======>                            ]
Initial electrons count: 0, initial muons count: 0, initial jets count: 6
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 6
After overlap removal:
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 6
signal electrons count: 0, signal muons count: 0, signal jets count: 6

        => progress: [==========>                        ]
Initial electrons count: 2, initial muons count: 1, initial jets count: 9
Baseline electrons count: 2, baseline muons count: 1, baseline jets count: 9
After overlap removal:
Baseline electrons count: 1, baseline muons count: 0, baseline jets count: 9
signal electrons count: 1, signal muons count: 0, signal jets count: 8

        => progress: [=============>                     ]
Initial electrons count: 0, initial muons count: 1, initial jets count: 10
Baseline electrons count: 0, baseline muons count: 1, baseline jets count: 10
After overlap removal:
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 10
signal electrons count: 0, signal muons count: 0, signal jets count: 8

        => progress: [=================>                 ]
Initial electrons count: 0, initial muons count: 0, initial jets count: 8
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 8
After overlap removal:
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 8
signal electrons count: 0, signal muons count: 0, signal jets count: 7

        => progress: [====================>              ]
Initial electrons count: 0, initial muons count: 1, initial jets count: 6
Baseline electrons count: 0, baseline muons count: 1, baseline jets count: 6
After overlap removal:
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 6
signal electrons count: 0, signal muons count: 0, signal jets count: 6

        => progress: [========================>          ]
Initial electrons count: 0, initial muons count: 0, initial jets count: 4
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 4
After overlap removal:
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 4
signal electrons count: 0, signal muons count: 0, signal jets count: 4

        => progress: [===========================>       ]
Initial electrons count: 1, initial muons count: 0, initial jets count: 6
Baseline electrons count: 1, baseline muons count: 0, baseline jets count: 6
After overlap removal:
Baseline electrons count: 1, baseline muons count: 0, baseline jets count: 5
signal electrons count: 0, signal muons count: 0, signal jets count: 5

        => progress: [===============================>   ]
Initial electrons count: 0, initial muons count: 0, initial jets count: 10
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 10
After overlap removal:
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 10
signal electrons count: 0, signal muons count: 0, signal jets count: 8

        => progress: [==================================>]
Initial electrons count: 1, initial muons count: 0, initial jets count: 7
Baseline electrons count: 1, baseline muons count: 0, baseline jets count: 7
After overlap removal:
Baseline electrons count: 1, baseline muons count: 0, baseline jets count: 6
signal electrons count: 1, signal muons count: 0, signal jets count: 6

        => progress: [===================================]

Possible improvements

  1. I don't know with what number of objects such analysis should deal. Is it tens? Thousands? Or maybe billions? If the number is big, more care should be taken to optimization of the code - the overlap removal procedure seems to be really costly right now.
  2. "Magic numbers" in the code should be replaced by named constants - I actually noticed that writing this article.
  3. Maybe, for better readability, results could be printed only once instead for each data set.
  4. Not an improvement, but question about requirements: @lemouth Are we limited to any specific C++ standard? I'm using C++11, but maybe should give it up?

Previous exercises

https://steemit.com/utopian-io/@mactro/particle-physics-utopian-project-first-exercise

EDIT

I did some fixes according to the comments by @irelandscape and @lemouth. Both fixes are related to overlap removal:

    for(auto jet_it = jets.begin(); jet_it != jets.end();) {
      if(jet_it->dr(electron) < 0.2 && jet_it->true_btag()) {
        jet_it = jets.erase(jet_it);
        // Iterator now points to element after the removed one, and it will be further
        // incremented by the for loop, so we need to move it back.
        jet_it--;
      }
    }

I added the jet_it->true_btag() check that was missing which caused wrong results. I also assign vector erase result to the iterator jet_it = jets.erase(jet_it);. As pointed out by @irelandscape, not doing so is undefined behavior.

Results after the correction:

=> progress: [===>                               ]
Initial electrons count: 0, initial muons count: 0, initial jets count: 6
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 6
After overlap removal:
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 6
signal electrons count: 0, signal muons count: 0, signal jets count: 5

        => progress: [======>                            ]
Initial electrons count: 0, initial muons count: 0, initial jets count: 6
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 6
After overlap removal:
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 6
signal electrons count: 0, signal muons count: 0, signal jets count: 6

        => progress: [==========>                        ]
Initial electrons count: 2, initial muons count: 1, initial jets count: 9
Baseline electrons count: 2, baseline muons count: 1, baseline jets count: 9
After overlap removal:
Baseline electrons count: 1, baseline muons count: 0, baseline jets count: 9
signal electrons count: 1, signal muons count: 0, signal jets count: 8

        => progress: [=============>                     ]
Initial electrons count: 0, initial muons count: 1, initial jets count: 10
Baseline electrons count: 0, baseline muons count: 1, baseline jets count: 10
After overlap removal:
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 10
signal electrons count: 0, signal muons count: 0, signal jets count: 8

        => progress: [=================>                 ]
Initial electrons count: 0, initial muons count: 0, initial jets count: 8
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 8
After overlap removal:
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 8
signal electrons count: 0, signal muons count: 0, signal jets count: 7

        => progress: [====================>              ]
Initial electrons count: 0, initial muons count: 1, initial jets count: 6
Baseline electrons count: 0, baseline muons count: 1, baseline jets count: 6
After overlap removal:
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 6
signal electrons count: 0, signal muons count: 0, signal jets count: 6

        => progress: [========================>          ]
Initial electrons count: 0, initial muons count: 0, initial jets count: 4
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 4
After overlap removal:
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 4
signal electrons count: 0, signal muons count: 0, signal jets count: 4

        => progress: [===========================>       ]
Initial electrons count: 1, initial muons count: 0, initial jets count: 6
Baseline electrons count: 1, baseline muons count: 0, baseline jets count: 6
After overlap removal:
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 6
signal electrons count: 0, signal muons count: 0, signal jets count: 6

        => progress: [===============================>   ]
Initial electrons count: 0, initial muons count: 0, initial jets count: 10
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 10
After overlap removal:
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 10
signal electrons count: 0, signal muons count: 0, signal jets count: 8

        => progress: [==================================>]
Initial electrons count: 1, initial muons count: 0, initial jets count: 7
Baseline electrons count: 1, baseline muons count: 0, baseline jets count: 7
After overlap removal:
Baseline electrons count: 0, baseline muons count: 0, baseline jets count: 7
signal electrons count: 0, signal muons count: 0, signal jets count: 7

        => progress: [===================================]
Sort:  

Here is my review. First of all, sorry about it. There is no bug and I can't just read a screen output correctly. It was easier after running your code on my computer than on the post snippet ;)

Just two minor comments.

1. HELPER FUNCTIONS

Those functions are great, but you may want to move the thresholds in the arguments too, so that you could use the same methods for various objects and various analyses. Of course thus won't change anything.

2. REMOVALS

I saw no real value in keeping vectors containing overlapping objects, so RemoveOverlap() works on the vectors provided as its arguments.

This is very correct.

I will write a review, but later tomorrow or on Friday. Sorry, I am a but pushed for time. There must be something weird somewhere (according to the final screenshot) and for that I need a few hours to investigate the code.

Can you just write where do I have bad results and what should they be? That way I can probably find a bug myself.

You can check @irelandscape code. It gives results in agreement with my code (see here). But I will have a look later today, I promise :)

OK, found a bug - I wasn't checking the B-tag in overlap removal procedure.

Sorry, there is no bug. This was no mandatory. I was too quick in reading the screen output you pasted :)

Your results look OK to me but maybe I've missed something. Don't have much time to review in-depth right now.

Hi @mactro, congratulations on finishing successfully the exercise!

I found that the hardest part was to read the relevant section of the paper and make sense of it. Fortunately @lemouth was of great help to answer any question.

My only comment about your code would be to be careful about the following:

        jets.erase(jet_it);
        // we have to move iterator back after removing the element
        jet_it--;

since erase normally invalidates the iterator. The result is unspecified and yield to unexpected behavior (or crash) with some compilers.

Apart from that, great work! :-)

Good catch, thanks!

I don't know with what number of objects such analysis should deal. Is it tens? Thousands? Or maybe billions? If the number is big, more care should be taken to optimization of the code - the overlap removal procedure seems to be really costly right now.

I was wondering about this myself. I'm planning on ignoring the issue until I know that optimization is necessary.

Loading...