Journey to Selection Day

Hi everyone, I am glad to share that I have been selected as a student developer in Google Summer of Code 2020 at CERN-HSF. I am really excited about contributing to and working on the project “TMVA Interfaces PyTorch”.

tl;dr: My project involves creating a PyTorch interface on top of the TMVA PyMVA framework so that models can be defined in the usual PyTorch style. In essence, the idea is that PyTorch provides model definition, training and prediction, while TMVA handles preprocessing, data loading, testing, etc.
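To give a feel for the PyTorch half of that split, here is a minimal sketch in plain PyTorch. It is not the TMVA interface itself, which does not exist yet: the layer sizes, the helper names `train_step` and `predict`, and the random tensors standing in for data that TMVA would supply are all my own assumptions for illustration.

```python
import torch
from torch import nn

# Hypothetical sizes: 4 input variables, 2 output classes (e.g. signal/background).
NUM_INPUTS = 4
NUM_CLASSES = 2

# Model definition: written in plain PyTorch.
model = nn.Sequential(
    nn.Linear(NUM_INPUTS, 16),
    nn.ReLU(),
    nn.Linear(16, NUM_CLASSES),
)


def train_step(model, optimizer, loss_fn, inputs, targets):
    """Run one optimisation step and return the loss as a Python float."""
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()


def predict(model, inputs):
    """Return class probabilities without tracking gradients."""
    model.eval()
    with torch.no_grad():
        return torch.softmax(model(inputs), dim=1)


# Random stand-in for a batch that TMVA would hand over after preprocessing.
torch.manual_seed(0)
inputs = torch.randn(8, NUM_INPUTS)
targets = torch.randint(0, NUM_CLASSES, (8,))

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss = train_step(model, optimizer, nn.CrossEntropyLoss(), inputs, targets)
probabilities = predict(model, inputs)
print(f"loss after one step: {loss:.4f}")
print(f"probabilities shape: {tuple(probabilities.shape)}")
```

In the project, the random tensors would be replaced by batches coming from TMVA, and the testing and evaluation would happen on the TMVA side.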

This was my first time applying to GSoC. It was such a delight to see the acceptance email in my inbox, and I hope I can do justice to the project and achieve its goals.

My project is co-mentored by Lorenzo Moneta, Akshay Vashishtha and Omar Andres Zapata Mesa.

Motivation

Google Summer of Code (GSoC) is a program that matches mentoring organizations with college and university student developers, who work on open-source development over the summer. Each year, Google works with many open-source, free software, and technology-related groups to identify and fund proposals for student open-source projects. GSoC pairs accepted student applicants with mentors from participating projects.

I :heart: open source. For the past year, I have been actively involved in various open-source projects (see GitHub). As a GSoC applicant, my aim was to contribute to OSS organizations and gain exposure to real-world software development.

Making an impact and contributing to the software used and developed by one of the world’s largest and most respected scientific research centres, CERN, was the key driving force along with my inclination and skills in Machine Learning, Research and Open Source.

About the Organization

CERN-HSF (High-Energy Physics Software Foundation) is the umbrella organization for high-energy physics-related projects in GSoC. The HEP Software Foundation facilitates the coordination of common international efforts in high-energy physics software and computing.

TMVA is a submodule of the ROOT project, one of CERN's best-known projects. ROOT is a modular scientific software framework. It provides the functionality needed for big data processing, statistical analysis, visualisation, and storage. It is mainly written in C++ but is integrated with other languages such as Python and R. The Toolkit for Multivariate Data Analysis with ROOT (TMVA) provides a ROOT-integrated machine learning environment for the processing and parallel evaluation of sophisticated multivariate classification techniques.

Experience Applying to GSoC

If I had to summarize in a few words, it was pretty fun!

I stumbled upon the project idea “TMVA Graph Neural Networks” and was immediately interested because of my recent work and background in graph representation learning. I was up to date with the literature in this field, having carried out a thorough literature survey over the past couple of years. I already knew a bit about ROOT and CERN-HSF and had skimmed the codebase on GitHub a while back.

I started by completing the tasks for this idea and then focused on my proposal.

After reviewing my submission, the mentors were very impressed with my work and expressed their interest in working with me to bring the project to fruition. However, they also encouraged me to apply to other TMVA projects based on my interests. One of the mentors emphasized that the “TMVA Interfaces PyTorch” idea was a high-impact project and that, by leading this new direction for the development of TMVA, I could ultimately better serve particle physicists working on Machine Learning at CERN.

I took my time to think it over and decided to read about this other idea in detail to be certain about my interests. Driven by the impact I could make at CERN through this project, I started working on my second proposal. After a few sleepless nights spent understanding the codebase for this part of the project, I finally submitted the proposal minutes before the deadline. I then completed the tasks for this project over the next couple of days.

I felt confident about both proposals and was slightly inclined towards the GNNs project, but I was happy in the end when the projects were announced and “I Got In” with my second proposal. I still plan on contributing to the GNNs project and will endeavor to collaborate on it with the other GSoC student, Stanislav Lukyanenko.

Start of Community Bonding Period

The first phase of the program timeline for selected students is the community bonding period, which started on May 4th. In this period, students familiarise themselves with the organisation’s community, understand the codebase, contact their mentors, and decide on their preferred modes of communication.

We decided to schedule an introductory meeting on 8th May. This first of many Zoom/Skype meetings will set the stage for the project; the agenda is to meet other TMVA developers and GSoC students, who will present and explain their proposals.


I’ll try to keep this space lively and easy to follow by sharing code snippets along my journey in the coming months, like the ones below:

Python code and output

# Say hi to open-source
print("Hello Open-Source World!")

Formatting text as shell commands:

echo "TMVA Developers are awesome!"
./gsoc_script.sh --option "start"
wget https://home.cern/community_bonding_period.txt

I hope that this summer will be interesting, challenging and productive.

To me, GSoC is all about communicating, learning and making an impact, for the love of open source. :heart:

Stay tuned for my next post! I’ll try to make these posts easy to follow and interactive while documenting and explaining my experiences and learnings in the coming days.


See you on the other side!

Signing Off.