Utopian Particle Physics In A WeekendsteemCreated with Sketch.

in #utopian-io6 years ago (edited)

Four months ago @lemouth opened up particle physics research to the steemit community. The objective: have steemians produce meaningful contributions to particle physics by extending the MadAnalysis5 framework and its associated public analysis database. In this article we'll walk through the first 5 exercises of @lemouth's particle physics introduction. If you're familiar with modern C++ and unix development, it will take a weekend to come up to speed. After that, LeMouth's current task awaits.

Repositories

First Exercise

Let's dive right into building MadAnalysis5 on the macOS platform. Instructions for getting up to speed are given here in the 'Getting Started' section. Beyond those instructions, here are a few notes of my own.

Building Root

Root's CERN website was down when I tried to download the latest version. Fortunately, there's an official github repository that contains root's source. If you use root's github repo, I'd suggest using homebrew to install cmake, then running git checkout with the v6-14-04 tag (latest production).

Build steps are given in root's github readme. The build goes smoothly with one modification: once you're in root's cmake build directory, the build command is cmake .. instead of cmake ../root.

The build takes roughly 30 minutes. Afterwards, if you type root and end up in the interpreter, exit by typing '.q'.

Be sure to source bin/thisroot.sh (inside the build directory). If you don't want to add it to your shell RC, source it in the shell you plan to do MadAnalysis5 work.

MadAnalysis 5

Once root's environment variables are properly set and you extract the MadAnalysis5 (MA5) tarball, installation of MA5 and associated modules proceeds without a hitch. Simply follow lemouth's instructions in the introductory post: ./bin/ma5, ma5> install delphes, ma5> install PAD.

If you're using MA5's Github Repo, you'll need to create the tools/SampleAnalyzer/Bin directory. If you do not, the ./bin/ma5 interpreter will fail and complain about a linker errer. For the fix, see this commit in my repo.

Note: There's no need to install homebrew GCC on macOS as the clang shim installed on macOS is sufficient. The command gcc exists on macOS, but /usr/bin/gcc on Apple platforms is actually clang in disguise. Clang's command line arguments are nearly identical to gcc's and /usr/bin/gcc is a tool shim (mentioned in the xcode-select man pages).

Second Exercise (ex1a)

The second exercise covers writing MadAnalysis5 code to count the number of objects in an event. As this is the first time diving into code, I'd recommend using Universal Ctags or some clang-based semantic analysis for code introspection. Note that some of the C++ files in MA5 have a .cc extension.

Per lemouth's article, I used ./bin/ma5 -E ex1a ex1a_iauns to generate my example ex1a directory.

Each recorded event is passed into the Execute function of our analysis code. The parameter event of type EventFormat contains all event data. Its definition is found in $(MA5_root)/tools/SampleAnalyzer/Commons/DataFormat/EventFormat.h. For this exercise we'll be interacting exclusively with RecEventFormat structure through the rec() accessor on EventFormat. Similarly, RecEventFormat is defined in $(MA5_root)/tools/SampleAnalyzer/Commons/DataFormat/RecEventFormat.h.

Instead of using size member variable of std::vector, I opted to use a range-based for loop to prep for upcoming exercises. The C++11 loop constructions are straight-forward:

bool test_analysis::Execute(SampleFormat& sample, const EventFormat& event)
{
  printf("\n");
  if (event.rec()!=0)
  {
    const std::vector<RecPhotonFormat>& photons   = event.rec()->photons();
    const std::vector<RecLeptonFormat>& electrons = event.rec()->electrons();
    const std::vector<RecLeptonFormat>& muons     = event.rec()->muons();

    size_t numPhotons   = 0; // photons.size();
    size_t numElectrons = 0; // electrons.size();
    size_t numMuons     = 0; // muons.size();
    for (const RecPhotonFormat& photon : photons)     { ++numPhotons; }
    for (const RecLeptonFormat& electron : electrons) { ++numElectrons; }
    for (const RecLeptonFormat& muon : muons)         { ++numMuons; }

    printf("  Num Photons: %zu\n", numPhotons);
    printf("  Num Electrons: %zu\n", numElectrons);
    printf("  Num Muons: %zu\n", numMuons);
  }
  return true;
}

For future reference, here are the shell steps to build the contribution while inside the Build directory in your test_project folder:

> source setup.sh   # Sets appropriate environment variables.
> make
> ./MadAnalysis5job ../Input/tth_aa.list # tth_aa.list is a plaintext file containing the absolute path to your data

Full ex1a source can be found here.

Third Exercise (ex1b)

The third exercise is where the physics gets more involved. While the prior two exercises can be done quickly, this one takes a bit more care. Section 5 of this research paper describes the object definition steps.

The most time-consuming part of this exercise was not implementing the code, but gaining a better understanding of the terminology and physics. I've included references below on the various quantities. In short, eta (η) represents pseudorapidity, and p_T represents transverse momentum. In the code, both pseudorapidity and momentum accessors are defined in ParticleBaseFormat base class.

After reading the article, it appears as though all of the thresholds are strictly less-than or strictly greater than (no equality). Using that assumption I've compared the following results to @irelandscape's and they are identical:

          | Initial      | BaseWithOver | Baseline    | Signal
Electrons |            0 |            0 |            0 |            0
Muons     |            0 |            0 |            0 |            0
Jets      |            6 |            6 |            6 |            5
        => progress: [======>                            ]
          | Initial      | BaseWithOver | Baseline    | Signal
Electrons |            0 |            0 |            0 |            0
Muons     |            0 |            0 |            0 |            0
Jets      |            6 |            6 |            6 |            6
        => progress: [==========>                        ]
          | Initial      | BaseWithOver | Baseline    | Signal
Electrons |            2 |            2 |            1 |            1
Muons     |            1 |            1 |            0 |            0
Jets      |            9 |            9 |            9 |            8
        => progress: [=============>                     ]
          | Initial      | BaseWithOver | Baseline    | Signal
Electrons |            0 |            0 |            0 |            0
Muons     |            1 |            1 |            0 |            0
Jets      |           10 |           10 |           10 |            8
        => progress: [=================>                 ]
          | Initial      | BaseWithOver | Baseline    | Signal
Electrons |            0 |            0 |            0 |            0
Muons     |            0 |            0 |            0 |            0
Jets      |            8 |            8 |            8 |            7
        => progress: [====================>              ]
          | Initial      | BaseWithOver | Baseline    | Signal
Electrons |            0 |            0 |            0 |            0
Muons     |            1 |            1 |            0 |            0
Jets      |            6 |            6 |            6 |            6
        => progress: [========================>          ]
          | Initial      | BaseWithOver | Baseline    | Signal
Electrons |            0 |            0 |            0 |            0
Muons     |            0 |            0 |            0 |            0
Jets      |            4 |            4 |            4 |            4
        => progress: [===========================>       ]
          | Initial      | BaseWithOver | Baseline    | Signal
Electrons |            1 |            1 |            1 |            0
Muons     |            0 |            0 |            0 |            0
Jets      |            6 |            6 |            5 |            5
        => progress: [===============================>   ]
          | Initial      | BaseWithOver | Baseline    | Signal
Electrons |            0 |            0 |            0 |            0
Muons     |            0 |            0 |            0 |            0
Jets      |           10 |           10 |           10 |            8
        => progress: [==================================>]
          | Initial      | BaseWithOver | Baseline    | Signal
Electrons |            1 |            1 |            1 |            1
Muons     |            0 |            0 |            0 |            0
Jets      |            7 |            7 |            6 |            6
        => progress: [===================================]
        => total number of events: 10 ( analyzed: 10 ; skipped: 0 )

I've used std::remove_if to make the code more concise. The only problem is that the return value of the lambda requires the reversal of inequalites from those used in the paper. In future exercises, I'll De Morgan the boolean expression to make the inequalities match those in the paper.

The relevant code can be found on github here.

A note on debugging: the makefile for MA5 expert mode is clean and simple. To debug with lldb or gdb, add -g and remove optimization -O3.

References

Fourth Exercise (ex1c)

The fourth exercise involves object isolation and histogramming. We'll be focusing on CMS search for dark matter. Generating signal jets for this exercise is surprisingly simple compared to the last exercise. Instead of jets, we'll spend a bit more time generating signal photons.

Others have already built python code to generate histograms from SAS files. So there's no need to dive into that task request in this exercise. See the 'External addons' section on the MadAnalysis5 website.

The Loose, Medium, and Tight barrel photon identification requirements are given in Table 3, page 29, of this research paper, though only the Medium are required. After looking through irelandscape's ex1c posts and lemouth's comments, I figured out how to correctly isolate photons. The trick was subtracting off the photon's transverse momentum from I_gamma.

Here are the results using @crokkon's saf2png:

histos.n_jet.png

histos.pt_jet.png

histos.pt_photon.png

histos.n_photon.png

These match irelandscape's ex1c histograms. The relevant code:

bool ex1c_iauns::Execute(SampleFormat& sample, const EventFormat& event)
{
  if (event.rec() == nullptr) { return true; }

  Manager()->InitializeForNewEvent(event.mc()->weight());

  // Signal jets.
  std::vector<RecJetFormat> signalJets = event.rec()->jets();
  signalJets.erase(std::remove_if(signalJets.begin(), signalJets.end(),
    [](const RecJetFormat& jet) {
      return !(jet.pt() > 30.0f) || !(jet.abseta() < 5.0f);
    }), signalJets.end());

  // Isolated photons.
  const std::vector<RecPhotonFormat>& photons() const {return photons_;}
  std::vector<RecPhotonFormat> isolatedPhotons = event.rec()->photons();
  isolatedPhotons.erase(std::remove_if(isolatedPhotons.begin(), isolatedPhotons.end(),
    [&event](const RecPhotonFormat& photon) {
      // Reject on pseudorapidity and H/E.
      if (photon.abseta() >= 1.44f)    { return true; }
      if (photon.HEoverEE() >= 0.05f)  { return true; }

      // Isolation.
      MAfloat32 pt = photon.pt();
      double Igam = PHYSICS->Isol->eflow->sumIsolation(photon,event.rec(),0.3,0.,IsolationEFlow::PHOTON_COMPONENT);
      double In = PHYSICS->Isol->eflow->sumIsolation(photon,event.rec(),0.3,0.,IsolationEFlow::NEUTRAL_COMPONENT);
      double Ipi = PHYSICS->Isol->eflow->sumIsolation(photon,event.rec(),0.3,0.,IsolationEFlow::TRACK_COMPONENT);

      if ((Igam - pt) >= (1.3f + 0.005f * pt))  { return true; }
      if (In >= (1.0f + 0.04f * pt))            { return true; }
      if (Ipi >= (1.5f))                        { return true; }

      return false; // Photon is isolated.
    }), isolatedPhotons.end());

  if (isolatedPhotons.size() > 0) {
    Manager()->FillHisto("pt_photon", isolatedPhotons[0].pt());
    Manager()->FillHisto("n_photon", isolatedPhotons[0].eta());
  }

  if (signalJets.size() > 0) {
    Manager()->FillHisto("pt_jet", signalJets[0].pt());
    Manager()->FillHisto("n_jet", signalJets[0].eta());
  }

  return true;
}

Full ex1c source can be found here.

References

Fifth Exercise (ex1d)

The fifth exercise utilizes our work from ex1c. A little less direction is given this time, so we're more reliant on the CMS paper. I'd recommend viewing the Public Analysis Database and reading through some of the implemented analyses. Since this project is about filling out this database, it's helpful to review prior implementations.

At first, I struggled with the introduction of cuts in this exercise. I didn't understand the rationale of introducing cuts versus filtering objects based on predefined criteria. However, after reading through Section 3 (Event Selection) of the CMS paper, it became clear that the application of cuts may reject all data from the event if not met, and the cut criteria does not necessarily include all objects of a given category. For example, one cut given in this exercise is: at least one photon must have (p_T > 175 GeV) within the fiducial region of the ECAL barrel. Even if an event passes this cut, we may still need to analyze photons that don't necessarily meet this cut criteria. Hopefully that understanding is correct.

Here's the adapted CMS code:

bool ex1d_iauns::Execute(SampleFormat& sample, const EventFormat& event)
{
  if (event.rec() == nullptr) { return true; }

  Manager()->InitializeForNewEvent(event.mc()->weight());

  // Signal jets.
  std::vector<RecJetFormat> signalJets = event.rec()->jets();
  signalJets.erase(std::remove_if(signalJets.begin(), signalJets.end(),
    [](const RecJetFormat& jet) {
      return !(jet.pt() > 30.0f) || !(jet.abseta() < 5.0f);
    }), signalJets.end());

  // Isolated photons.
  std::vector<RecPhotonFormat> isolatedPhotons = event.rec()->photons();
  isolatedPhotons.erase(std::remove_if(isolatedPhotons.begin(), isolatedPhotons.end(),
    [&event](const RecPhotonFormat& photon) {
      // Reject on pseudorapidity and H/E.
      if (!(photon.abseta() < 1.44f))    { return true; }
      if (!(photon.HEoverEE() < 0.05f))  { return true; }

      // Isolation.
      MAfloat32 pt = photon.pt();
      double Igam = PHYSICS->Isol->eflow->sumIsolation(photon,event.rec(),0.3,0.,IsolationEFlow::PHOTON_COMPONENT);
      double In = PHYSICS->Isol->eflow->sumIsolation(photon,event.rec(),0.3,0.,IsolationEFlow::NEUTRAL_COMPONENT);
      double Ipi = PHYSICS->Isol->eflow->sumIsolation(photon,event.rec(),0.3,0.,IsolationEFlow::TRACK_COMPONENT);

      if (!((Igam - pt) < (1.3f + 0.005f * pt)))  { return true; }
      if (!(In < (1.0f + 0.04f * pt)))            { return true; }
      if (!(Ipi < (1.5f)))                        { return true; }

      return false; // Photon is isolated.
    }), isolatedPhotons.end());

  MALorentzVector pTmiss = event.rec()->MET().momentum();
  double MET = pTmiss.Pt();

  if (!Manager()->ApplyCut((MET > 170.), "MET > 170 GeV")) { return true; }
  if (!Manager()->ApplyCut((isolatedPhotons.size() >= 1) && (isolatedPhotons[0].pt() > 175.), "1+ photon 175 GeV pt")) { return true; }

  // foldr: If any of the 4 highest transverse momenta jets' minimum opening
  // angle between ptmiss is less than 0.5, reject the event.
  if (!Manager()->ApplyCut(std::accumulate(signalJets.begin(), std::next(signalJets.begin(), 4), true,
                      [&pTmiss](bool priorResult, const RecJetFormat& jet) {
                        return priorResult && (jet.dphi_0_pi(pTmiss) >= 0.5);
                      }),
                    "min dphi(ptmiss, ptjet) >= 0.5"))
  {
    return true;
  }

  // Search for signal photons.
  std::vector<RecPhotonFormat> signalPhotons = isolatedPhotons;
  signalPhotons.erase(std::remove_if(signalPhotons.begin(), signalPhotons.end(),
    [&pTmiss](const RecPhotonFormat& photon) {
      return !(photon.dphi_0_pi(pTmiss) >= 2.0);
    }), signalPhotons.end());


  if (!Manager()->ApplyCut(signalPhotons.size() > 0, "1+ photon")) { return true; }

  RecPhotonFormat candidatePhoton = signalPhotons[0];

  // Lepton veto
  std::vector<RecLeptonFormat> electrons = event.rec()->electrons();
  std::vector<RecLeptonFormat> muons = event.rec()->muons();
  auto leptonVetoPredicate = [&candidatePhoton](const RecLeptonFormat& lepton) {
    return (lepton.pt() > 10.0f && lepton.dr(candidatePhoton) > 0.5);
  };
  bool leptonVeto =    std::any_of(electrons.begin(), electrons.end(), leptonVetoPredicate)
                    || std::any_of(muons.begin(), muons.end(), leptonVetoPredicate);
  if (!Manager()->ApplyCut(leptonVeto, "Lepton veto")) { return true; }

  // Update histogram
  if (isolatedPhotons.size() > 0) {
    Manager()->FillHisto("pt_photon", candidatePhoton.pt());
    Manager()->FillHisto("n_photon", candidatePhoton.eta());
  }

  if (signalJets.size() > 0) {
    Manager()->FillHisto("pt_jet", signalJets[0].pt());
    Manager()->FillHisto("n_jet", signalJets[0].eta());
  }

  return true;
}

You'll notice in the code that I use std::accumulate. Semantically it is similar to Haskell's foldl.

I'm much less certain about this exercise. Particularly my handling of signal photons and if I should be using JetCleaning. I will need data to test these calculations beyond the MET > 170 GeV cut.

Full ex1d source can be found here.

Conclusion

I ran through all 5 exercises in about 2 full days. So if you're looking to jump in and are familiar with C++, the time investment is pretty reasonable. The most time consuming process is familiarizing yourself with the physics behind the programming. Moving this quickly would not have been possible without all of the legwork and questions effofex, irelandscape and others asked. It's useful to read through their interaction with @lemouth when coming up to speed. I'd recommend not looking at anyone's code until you've written and tested your own.

The progression of difficulty is pretty linear: 1a, 1b, 1c, and 1d. This is due to subteties in 1c (subtraction of photon transverse momentum from Igamma), and lack of direction in the exercise for 1d which forces reliance on the CMS paper. Also, it wasn't clear that the incoming event objects were sorted in any fashion until I found a comment to that effect -- though I'm still not entirely clear on the sorting criteria, I believe they are sorted by decreasing transverse momenta.

To fully appreciate the code we're writing, a better understanding of the physics is required. To those entering, I'd suggest throwing yourself into the physics as much as possible.

For the future, I'd like to be able to fabricate datasets to test code. I'm hoping this will come up during the verification phase.

In the interest of moving quickly, I kept the details and code discussion to a minimum.

Thanks for reading.

Sort:  

While this is an interesting post, to those with the technical savvy to glean information from it, it is not quite what we're usually looking for in the Blog category. It deals with an open source project, but doesn't really tell readers much about it. You have the exercises, but you don't go into what they are in any great detail.

As a layperson reading a Utopian post, I'd be more interested in reading about the open source project itself. This, obviously, was not what this post is about. I get that. But it does make it harder to consider as a Utopian blog post.

In Utopian's Blog category, we're looking for editorial content. Especially when the post isn't by the project's creators. As it is, it's not quite a tutorial, and not quite a blogpost. There's a great deal of content, and it is well written, but it is entirely opaque to people who aren't already in the know.

More explanation and elucidation would have really helped this post, to my mind. I'll give an example:

  • "Viewing these analyses helped solidify the concept of cuts and how/where to add them." Is a sentence that is entirely opaque, and none of the text preceding it is any help in understanding it. You then refer to "Section 3 (Event Selection)," but you don't say section 3 of what. Earlier in the post, you mention a section of the MadAnalysis5 website, as well as a research paper.

I'm sure your intent wasn't gatekeeping, but writing in this way is gatekeeping. It keeps the potentially interested out. To me, that is the opposite of Utopian's purpose.

All of that aside, this is certainly a worthwhile post. I would certainly be interested in reading more posts from you, especially if you made them more accessible.

Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.

To view those questions and the relevant answers related to your post, click here.


Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]

Thank you for your review, @didic!

So far this week you've reviewed 24 contributions. Keep up the good work!

Thanks for your time and thoughtful feedback @didic. I went back and clarified the areas you pointed out. I'll keep an eye out for potential gatekeeping and opaque sentences in future articles.

I'll also add more rigorous analysis. I suspect my follow up articles will end up in the tutorial category since I plan to dive further into technical details.

I didn't know the existence of the accumulate function in C. Glad to have read your post. IO don't have much time this week to dig into the code, but you can probably inspire yourself from those have solved the exercises already to double check what is going on. I will stay available for the extra questions.

To answer the few questions:

  • JetCleaning is not related with photons, but with leptons (to remove electrons from the jet collection that can potentially be considered as jets when not specifically identified as electrons).
  • by default, particles are ordered decreasingly in terms of pt. but this can be changed thanks to the SORTER service.

For the future, I'd like to be able to fabricate datasets to test code. I'm hoping this will come up during the verification phase.

This is phase 2. Once I will have beta versions for the analysis I am proposing in the TRs, we will get there. Do you plan to try these out? :)
I am also happy to see one extra participant to this project.

Thanks for the clarifications!

Yes, I plan to give the latest task requests a go. I'll try my hand at one of the supersymmetry TRs.

I've done some rudimentary double checking of the code using other's solutions as you suggested. Everything looks good. I'll plan to proceed forward on the latest TRs soon.

Thanks for participating! :)

Hello! Your post has been resteemed and upvoted by @ilovecoding because we love coding! Keep up good work! Consider upvoting this comment to support the @ilovecoding and increase your future rewards! ^_^ Steem On!

Reply !stop to disable the comment. Thanks!

Congratulations @iauns! You received a personal award!

Happy Birthday! - You are on the Steem blockchain for 1 year!

You can view your badges on your Steem Board and compare to others on the Steem Ranking

Vote for @Steemitboard as a witness to get one more award and increased upvotes!



This post has been voted on by the steemstem curation team and voting trail.

There is more to SteemSTEM than just writing posts, check here for some more tips on being a community member. You can also join our discord here to get to know the rest of the community!

Congratulations @iauns! You have completed the following achievement on the Steem blockchain and have been rewarded with new badge(s) :

Award for the number of upvotes received

Click on the badge to view your Board of Honor.
If you no longer want to receive notifications, reply to this comment with the word STOP

Do not miss the last post from @steemitboard:

SteemitBoard - Witness Update
SteemFest³ - SteemitBoard support the Travel Reimbursement Fund.

Support SteemitBoard's project! Vote for its witness and get one more award!

Hi @iauns!

Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your post is eligible for our upvote, thanks to our collaboration with @utopian-io!
Feel free to join our @steem-ua Discord server

@iauns You have received a 100% upvote from @taginspector because this post did not use any bidbots and you have not used bidbots in the last 30 days!

Upvoting this comment will help keep this service running.

Congratulations @iauns! You have completed the following achievement on the Steem blockchain and have been rewarded with new badge(s) :

Award for the number of upvotes

Click on the badge to view your Board of Honor.
If you no longer want to receive notifications, reply to this comment with the word STOP

Do not miss the last post from @steemitboard:

SteemitBoard - Witness Update

Support SteemitBoard's project! Vote for its witness and get one more award!

Hey, @iauns!

Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!

Get higher incentives and support Utopian.io!
Simply set @utopian.pay as a 5% (or higher) payout beneficiary on your contribution post (via SteemPlus or Steeditor).

Want to chat? Join us on Discord https://discord.gg/h52nFrV.

Vote for Utopian Witness!