
Practical machine learning for particle physicists


Practical Machine Learning for Physicists

Welcome to the graduate course on machine learning at the Albert Einstein Center for Fundamental Physics of the University of Bern!

What is this thing called machine learning?

Figure reference: https://xkcd.com/1838/

Machine learning is a subfield of artificial intelligence that focuses on using algorithms to parse data, learn from it, and then make predictions about something in the world. In the last decade, this framework has led to significant advances in computer vision, natural language processing, and reinforcement learning. More recently, machine learning has begun to attract interest in the physical sciences and is rapidly becoming an important part of the physicist's toolkit, especially in data-rich fields like high-energy particle physics and cosmology.

Basics

Timetable: Tuesdays from 14:15 to 16:00, held via Zoom due to the COVID-19 pandemic.

Class Slack group: ml-for-physics.slack.com

If you need to contact me, I strongly encourage you to do so via Slack, since I check it a few times per day.

Learning objectives of the course

This course provides students with a hands-on introduction to the methods of machine learning, with an emphasis on applying these methods to solve physics problems. By the end of this course, it is expected that students will:

  • Know how to approach physics problems from a machine learning perspective;
  • Understand the fundamental principles behind extracting useful knowledge from data;
  • Understand the core concepts and terminology of machine learning;
  • Gain hands-on experience with mining data for insights.

Throughout the course, students will also have the opportunity to learn several technical skills:

  • Python programming and experience with the core libraries for data analysis, visualisation, and modelling.
  • Working with data: collecting, cleaning, and transforming.
  • Creating and interpreting descriptive statistics.
  • Creating and interpreting data visualisations.
  • Practical experience with machine learning.

Structure of the course

Due to constraints imposed by the COVID-19 pandemic, the course will be delivered entirely via online lectures. Each lecture will involve a mix of theoretical and programming work, with an emphasis on the latter. A tentative outline for the course is shown in the table below.

CW   Date   Topic                               Links
15   7.4    Introduction to random forests      Binder, Open In Colab
17   21.4   Random forest deep dive             Binder, Open In Colab
18   28.4   Random forest from scratch          Binder, Open In Colab
19   5.5    Introduction to gradient boosting   Binder, Open In Colab
20   12.5   Gradient boosting deep dive
21   19.5   Topological machine learning I
22   26.5   Topological machine learning II

Cloud environment

We will be teaching the class entirely via Jupyter notebooks in Python. You can open and run them directly on Binder or Google Colab by clicking on the "Binder" or "Open In Colab" badge at the top of each lecture notebook. We highly encourage the use of Binder or Colab, since they require no local installation and run for free.

A few remarks about Binder:

  1. Binder is free to use.
  2. If you edit a notebook, make sure you download it, since Binder does not save your changes.
  3. Binder will automatically shut down user sessions that have more than 10 minutes of inactivity (if you leave your browser window open, this will be counted as “activity”).
  4. Binder aims to provide at least 12 hours of session time per user session. Beyond that, it is not guaranteed that the session will remain running.

Downloading and uploading files

Since Binder does not save your changes permanently, you should download the notebooks you worked on at the end of your session. If you want to continue your session later on, you can re-upload them to Binder. See the image below for instructions on how to upload and download files.

Figure: Download and upload buttons in JupyterLab as seen in the Binder environment.
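If you have edited several notebooks in one session, downloading them one by one gets tedious. As a minimal sketch (not part of the course materials), the snippet below bundles every notebook in a folder into a single zip file that you can then download in one go. The function name archive_notebooks and the default file name notebooks_backup.zip are hypothetical choices made for this illustration.

```python
import zipfile
from pathlib import Path


def archive_notebooks(source_dir=".", archive_name="notebooks_backup.zip"):
    """Zip every .ipynb file under source_dir and return the archived paths."""
    source = Path(source_dir)
    archive_path = source / archive_name

    # Collect notebooks, skipping Jupyter's autosave copies, which live in
    # folders named .ipynb_checkpoints.
    notebooks = sorted(
        path
        for path in source.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )

    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for notebook in notebooks:
            # Store paths relative to source_dir so the folder layout is kept.
            archive.write(str(notebook), str(notebook.relative_to(source)))

    return [str(notebook.relative_to(source)) for notebook in notebooks]
```

Calling archive_notebooks() in a notebook cell writes notebooks_backup.zip to the current folder; the archive can then be downloaded from the JupyterLab file browser like any other file.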

Local environment

You can also run the course material locally on your laptop. In general, when working with Python it is recommended to use virtual environments. This makes sure that the packages you install don't interfere with the packages you already installed in other projects.
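As a quick sanity check before installing anything, the following sketch reports whether the current Python interpreter is running inside a virtual environment (for example one created with python -m venv). It relies on the standard-library convention that sys.prefix differs from sys.base_prefix inside such an environment. The function name in_virtualenv is an illustrative choice, and conda environments are not detected by this check.

```python
import sys


def in_virtualenv():
    """Return True if the interpreter runs inside a venv or virtualenv."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; consider creating one first.")
```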

To install the library associated with the course run the following command:

pip install hepml

To install JupyterLab (a more advanced environment than Jupyter notebooks) run:

pip install jupyterlab

Then download the course material from the GitHub repository, either the whole repository or just the notebooks you are missing. In general you will need the full set of materials, since some resources such as images are not embedded in the notebooks.

Finally, to start JupyterLab run:

jupyter-lab

Updating the local environment

Since we are developing the materials throughout the course, you will need to update your local environment every time we move on to the next lesson. To do so, just run the following command before you start JupyterLab:

pip install --upgrade hepml
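To confirm that the upgrade took effect, you can ask Python which version of the package it sees. The sketch below assumes Python 3.8 or newer, where importlib.metadata is part of the standard library; the helper name installed_version is hypothetical and not part of hepml.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("hepml")
    if version is None:
        print("hepml is not installed in this environment.")
    else:
        print(f"hepml version {version} is installed.")
```

Running the check from the same environment that launches JupyterLab matters, since a different environment may hold a different version.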

Recommended references

Machine Learning

The structure of the lectures is adapted (with permission) from Jeremy Howard's excellent machine learning course for coders, while the theoretical content is based on the comprehensive review articles by Mehta et al. and Murugan and Robertson.

Python Programming

We will use the Python programming language to analyse and visualise a variety of datasets in this course. McKinney’s book is an excellent reference to have at hand and covers the nuts and bolts of the NumPy and pandas packages.

Kaggle Learn

Kaggle Learn is a great resource to brush up on concepts like Python basics, data visualisation or pandas in an online notebook environment (similar to Binder).

