Particle physics data taking and analysing - a 'quick' overview

in #steemstem, 7 years ago (edited)

One of the most common questions I get as a particle physicist is how we collect data and then analyse it. So in this post I will briefly discuss the process from collision to plot. For each step there are many details that I will gloss over, sacrificing detail for the big picture. In the future I will revisit each of these topics in dedicated posts, as I am really skimming here.

LHC data taking

Data taking in action. This is my workstation when I am in charge of data taking at the CMS experiment. Source: F. Blekman

The goal of particle physics experiments at the Large Hadron Collider is to reproduce, in the most controlled way possible, what happens when two protons collide. An individual collision is not particularly special: most proton collisions are extremely well understood, and in general we are only interested in the special cases where something rare happens. So one of the challenges is that the rarer the process, the more collisions are needed to see consistently that something new and rare is happening. That is the technological challenge for the LHC accelerator physicists: make as many collisions as possible. For example, in last year's run the LHC was colliding clouds of protons (we call those bunches) 40 million times per second, and in each collision of bunches about 30 actual proton-proton collisions occurred.

The rest of the proton cloud continues around the ring and gets another chance to collide at one of the four LHC collision points. In each proton-proton collision, tens to thousands of new particles are created from the kinetic energy carried by the protons.
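To get a feeling for these numbers, here is a back-of-the-envelope sketch in Python. It only multiplies the two figures quoted above; the function and constant names are my own and purely illustrative.

```python
# Figures quoted in the text above.
BUNCH_CROSSINGS_PER_SECOND = 40_000_000  # bunches collide 40 million times per second
COLLISIONS_PER_CROSSING = 30             # roughly 30 proton-proton collisions each time


def proton_collisions_per_second(crossings_per_second: int,
                                 collisions_per_crossing: int) -> int:
    """Total proton-proton collisions per second at one collision point."""
    return crossings_per_second * collisions_per_crossing


rate = proton_collisions_per_second(BUNCH_CROSSINGS_PER_SECOND,
                                    COLLISIONS_PER_CROSSING)
print(f"{rate:.1e} proton-proton collisions per second")
```

That works out to roughly 1.2 billion proton-proton collisions every second.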

The number of collisions per second is quantified by the luminosity. Experimental particle physicists typically talk about integrated luminosity, that is, the number of collisions collected over a certain time, for example a day, a month or a year.
Total integrated luminosity (y-axis, i.e. the number of proton collisions) collected by the CMS experiment in 2016 as a function of days (x-axis). Source: CMS experiment luminosity public results
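Integrated luminosity is handy because the expected number of collisions of a given kind is simply the cross section of that process (a measure of how likely it is) multiplied by the integrated luminosity. A minimal sketch of that relation follows. The cross sections and the integrated luminosity are made-up illustrative values, not numbers read from the plot, and detector efficiency is ignored.

```python
# Unit conversion: 1 inverse femtobarn = 1000 inverse picobarn.
INV_PB_PER_INV_FB = 1000.0


def expected_events(cross_section_pb: float,
                    integrated_luminosity_inv_fb: float) -> float:
    """Expected number of events: N = cross section * integrated luminosity."""
    return cross_section_pb * integrated_luminosity_inv_fb * INV_PB_PER_INV_FB


# Hypothetical values, chosen only to illustrate the arithmetic.
luminosity_inv_fb = 40.0   # assumed size of the dataset
common_process_pb = 50.0   # assumed cross section of a fairly common process
rare_process_pb = 0.001    # assumed cross section of a very rare process

print(expected_events(common_process_pb, luminosity_inv_fb))
print(expected_events(rare_process_pb, luminosity_inv_fb))
```

With these assumed inputs the common process gives about two million collisions and the rare one only about forty, which is why rare processes need as much integrated luminosity as possible.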

Selecting interesting collisions

Considering the HUGE number of collisions, the next challenge is two-fold: how do we record these collisions, and how do we store them? For the recording, we use dedicated detectors with around 100 million readout channels that are sensitive to the particles produced in the collision. Each of those 100 million mini detectors can see some particles, and they are placed in a clever way so that, whatever direction the particles are produced in, they will cross some of the detector elements. Having so many detection elements is great, as it gives us very detailed knowledge of all the particle energies and trajectories. It also means that storing each collision takes about 2 MB, even after zero-suppression and similar data reduction. You can imagine that recording 40 million collisions per second becomes a problem very quickly: that is 80 terabytes per second. Storing all of that data is not feasible, as we simply cannot afford the disk space. So the trick we use in particle physics is a selection algorithm called a trigger, which picks out interesting collisions on the fly. All other collisions are simply discarded. In the end we typically save between 500 and 1000 collisions per second. This number is driven by how much storage we can afford from our budgets, which come from national funding agencies and governments all over the world.
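The data-rate arithmetic above, and the idea of a trigger, can be sketched in a few lines of Python. The rates and event size are the ones quoted in the text. The `Collision` class, the single energy threshold and the example values are hypothetical simplifications of my own; a real trigger combines many conditions and runs on dedicated hardware and software.

```python
from typing import Iterable, Iterator, NamedTuple

# Numbers quoted in the text.
EVENT_SIZE_BYTES = 2_000_000       # about 2 MB per recorded collision
INPUT_RATE_HZ = 40_000_000         # collisions per second seen by the detector
SAVED_RATE_HZ_RANGE = (500, 1000)  # collisions per second kept by the trigger

raw_bytes_per_second = EVENT_SIZE_BYTES * INPUT_RATE_HZ
print(f"Without a trigger: {raw_bytes_per_second / 1e12:.0f} TB/s")

for saved_rate_hz in SAVED_RATE_HZ_RANGE:
    stored_gb_per_second = saved_rate_hz * EVENT_SIZE_BYTES / 1e9
    keep_one_in = INPUT_RATE_HZ // saved_rate_hz
    print(f"Saving {saved_rate_hz}/s: {stored_gb_per_second:.0f} GB/s to storage, "
          f"1 collision kept out of every {keep_one_in}")


class Collision(NamedTuple):
    """Hypothetical, heavily simplified summary of one collision."""
    event_id: int
    highest_lepton_energy_gev: float


def toy_trigger(collisions: Iterable[Collision],
                threshold_gev: float) -> Iterator[Collision]:
    """Yield only the collisions that pass one made-up energy threshold."""
    for collision in collisions:
        if collision.highest_lepton_energy_gev >= threshold_gev:
            yield collision  # kept; everything else is dropped on the fly


# Four invented collisions, filtered as they stream past.
stream = [Collision(1, 3.2), Collision(2, 41.0), Collision(3, 0.7), Collision(4, 27.5)]
kept = list(toy_trigger(stream, threshold_gev=25.0))
print([c.event_id for c in kept])
```

In this toy example only collisions 2 and 4 survive. With the real numbers, the trigger has to throw away all but one collision in every 40,000 to 80,000, which brings the stored data rate down to about 1 to 2 GB per second.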

Cables that are part of the data acquisition system of the Compact Muon Solenoid (CMS) experiment at CERN. Photo by F. Blekman

Offline statistical analysis and an example of discovery

Once the data is on disk, the fun starts! These datasets are so large that one needs dedicated tools to access them. At the LHC we use a dedicated distributed computing system called the LHC Computing Grid, which gives any one of the thousands of experimental physicists at the LHC experiments access to hundreds of thousands of CPUs to analyse the *%@^#&% out of this data. And this is necessary because, even after all the data reduction, most of the things we are looking for are extremely, extremely rare.

For example, the gif below shows the subset of collisions in the full dataset of 2010 and 2011 where four high-energy electrons or muons were produced. It is one of the most famous plots from the discovery of the Higgs boson. Out of the billions and billions of collisions produced by the accelerator, and after running algorithms on the grid (yes, including machine learning) optimised to find electrons and muons, a few hundred collisions are left. Those collisions are mostly due to known and well-understood processes in proton-proton collisions, which are also simulated for comparison and are shown as the blue and green distributions. And then, if you take enough data, you see additional structure appear when you look at the right distribution. In this case the distribution shown is the total mass of the four leptons (including their kinetic energies).

Seeing the LHC data come in for the discovery of the Higgs boson, using the data collected in 2010 and 2011. Source: CMS collaboration (incl. F. Blekman)
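The "total mass of the four leptons" on the x-axis is usually called the invariant mass: add up the energies and the momenta of the four leptons, and take the square root of (total energy squared minus total momentum squared). Here is a minimal sketch, in units where c = 1 and with energies and momenta in GeV. The four leptons below are invented values, treated as massless and chosen so that the arithmetic is easy to check by hand; they are not real data.

```python
import math
from typing import Iterable, Tuple

# (E, px, py, pz) in GeV, using units where c = 1.
FourVector = Tuple[float, float, float, float]


def invariant_mass(particles: Iterable[FourVector]) -> float:
    """Invariant mass of a set of particles: sqrt(E_total^2 - |p_total|^2)."""
    total_e = total_px = total_py = total_pz = 0.0
    for e, px, py, pz in particles:
        total_e += e
        total_px += px
        total_py += py
        total_pz += pz
    mass_squared = total_e**2 - (total_px**2 + total_py**2 + total_pz**2)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(mass_squared, 0.0))


# Hypothetical leptons: two back-to-back pairs, so the momenta cancel.
leptons = [
    (30.0, 30.0, 0.0, 0.0),
    (30.0, -30.0, 0.0, 0.0),
    (32.5, 0.0, 32.5, 0.0),
    (32.5, 0.0, -32.5, 0.0),
]
four_lepton_mass = invariant_mass(leptons)
print(f"{four_lepton_mass:.1f} GeV")
```

Because the momenta cancel in this made-up example, the invariant mass is just the sum of the four energies, 125 GeV. In the real analysis this number is computed for every selected collision and filled into a histogram like the one in the plot.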

The additional structure was then shown via statistical analysis to be inconsistent with the background. Specifically, the peak in the data (black dots) is not consistent with the blue background prediction: the probability that the background alone would produce it is about 1 in 3000. On the other hand, the data is consistent with the blue background plus the red peak, which is the prediction of what the boson predicted by Brout, Englert, Higgs and others (a.k.a. the Higgs particle) would look like! Very exciting, but first we did about half a year of cross-checks. We also checked distributions of other kinds of collisions, for example those where two photons were produced, and there was a peak there at the same total mass! This is when things got very exciting. Of course we cross-checked our own results in all kinds of ways, including fully independent analyses that were optimised only on simulation before we decided to look at the data. It was an exciting and very nerve-racking time.
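Physicists often translate a probability like this into a number of standard deviations ("sigma") of a normal distribution. A minimal sketch of that conversion follows, assuming the quoted 1 in 3000 can be treated as a one-sided tail probability; that interpretation is my assumption for illustration, not a statement about how the actual analysis defined it.

```python
from statistics import NormalDist


def p_value_to_sigma(p_value: float) -> float:
    """Convert a one-sided tail probability into standard deviations."""
    return NormalDist().inv_cdf(1.0 - p_value)


# Probability quoted in the text for the background-only explanation.
sigma = p_value_to_sigma(1 / 3000)
print(f"about {sigma:.1f} sigma")
```

Under that assumption, 1 in 3000 corresponds to roughly 3.4 standard deviations.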

In the end I was one of the physicists who went to the biennial International Conference on High Energy Physics (ICHEP), where the new result was announced. An amazing memory that I will cherish my whole life. One of the other things I remember most is being extremely jet-lagged, as the conference was in Australia and I only arrived from Europe on the morning of the announcement ;-)

What next?

Although the Higgs boson discovery was an exciting first step, finding the Higgs boson was only one of the things the LHC was designed for. The LHC will allow us to measure the actual properties of those bosons; after all, that little peak contained only about 20 collisions consistent with Higgs bosons, from 2 years of data taking. For that we need lots and lots of data. As the data is so diverse, we can answer many other questions at the same time: for example, study the subtle differences between matter and antimatter, search for other new, as yet undiscovered particles, maybe see if we can produce the elusive dark matter at the LHC, and understand whether the particles we know now are really fundamental. Answering even a part of any of these puzzles would be a paradigm shift in fundamental physics of Nobel-prize-winning proportions. I'm also a fan of making sure that we stay open to questions we have not thought of yet; theoretical physicists such as @lemouth are always producing new predictions of what we could be seeing in the data in the future. The LHC will run for a very long time: the current plan is at least up to 2035. And I definitely expect to have enough data to stay busy until then!


This was easier to understand than I thought 😆! TIL

Happy to hear that! I think it is important to talk about the concepts and not to obfuscate with technicalities or jargon. I'm happy to read that you noticed :)

Very cool! In your opinion what do these discoveries mean for humanity? I know that's a broad question, but I guess I mean, is this about just having a better understanding of the universe and world around us or are there any applications for this knowledge that could disrupt things in any way?

The research we do is basic (fundamental) science. This means it produces knowledge, in my case an understanding of particles, the forces between particles, etc.

The goal is not to have applications beyond that, but of course it does happen by accident. A very good example is the World Wide Web, which was designed so particle physicists could share their results more efficiently. Of course it is much more difficult to say whether the specific discovery of the Higgs particle will have further consequences for humanity. But looking at history, the work that was done on electromagnetism, quantum mechanics and relativity also took many tens of years before applications were even thought of. In the long run, most if not all modern physics discoveries eventually paid off. And of course the knowledge and understanding is amazing too; humans are curious, so it is in our nature to try to answer questions as fundamental as the ones we work on at CERN :)

Really interesting, thanks for the thoughtful reply :)

Thanks @freyablekman for this very interesting article.
I am wondering: if the particles (protons) have the same energy and speed, why are there some special collisions?

That is the nature of quantum physics. Even if the initial conditions were exactly the same, every collision would have a number of possible outcomes (not just one, many!), each with a probability (chance) that can be calculated with the physics theory we call the Standard Model, which predicts how often each kind of collision happens. The interesting collisions are really, really, really rare. But even 'normal' proton-proton collisions are all slightly different.

Do you mean statistical predictions, or are there known factors by which the nature of the collisions differs?

Both (sorry!). There are known factors that make the collisions different, but there are also actual statistical probabilities for the different things that can happen when exactly the same initial condition occurs. And then there are even more random things like angles: the created particles can essentially fly in any direction after a proton-proton collision. Covering all those random options is one of the reasons we use very advanced simulations to predict what we expect. A substantial fraction of the grid computing power is used to produce those simulations.

I guess it is very intriguing to check the results of every experiment. Thanks for sharing this knowledge.

As noted above, most proton collisions are very well known, and in general we are only interested in the special cases where something rare happens. So one of the challenges is that the rarer the process, the more collisions it takes to see consistently that something new and rare is happening. This may be useful to others as well; I was interested to understand the lesson.


wow that is quite a mouthful! loved seeing the process photos tho, and especially the animated gifs - they really help tell the story and give us an insight as to what you physicists are really up to on the daily :)

i really need to start posting about my hyperloop work - but thinking a separate account is best. can you imagine going from art and poetry to hyperloop posts back and forth ?! haha:)

anyway maybe i need to make a song about higgs boson or quantum physics - stay tuned ;)

I'd love to read about the hyperloop work, definitely! And let me know if you end up writing a song about Higgs bosons, but just to be aware: you have Nick Cave and others to compete with :)

haha will do! and omg songs exist about higgs boson already?!!? immediately goes to google :D