BOSS EVENTS Phase 2: Training through a workshop.

Bioinformatics is recognized as part of the essential knowledge base for numerous career paths in biomedical research and healthcare. Rapid development of high-throughput technologies, data storage capacity, and sophisticated algorithms has brought immense changes to research practices in the biological sciences. Computational and quantitative skills, such as those taught in bioinformatics curricula, help scientists take advantage of the data now available. A study on core competencies in bioinformatics identified 16 competencies required across the three user profiles it describes: bioinformatics users, scientists, and engineers. We aimed to provide enough general knowledge for our participants to see where they would fall among the three profiles.

The BOSS workshop was held virtually from 1st to 5th November 2021. It introduced the trainees to bioinformatics analysis skill sets and incorporated open science practices for tackling bioinformatics projects. Guided by the core competencies and their successes in short courses, we came up with a brief curriculum for the 5-day event. The modules covered an introduction to sequencing technologies, data file formats, basic and advanced Unix, quality control and assessment, scientific writing, sequence alignment and assembly, an introduction to Git and GitHub, and an introduction to Galaxy. Learning materials, including the course content and practical assignments, were available on Canvas and GitHub.

The event was advertised on Twitter and attracted 67 applications, 69% from male and 31% from female applicants. Master’s and bachelor’s students made up the majority of applicants, ahead of undergraduates, PhD students, and postdoctoral researchers. Most of the applicants were from Kenya, Nigeria, Ghana, and Uganda. The rest came from other countries around the world, namely Algeria, Botswana, Cameroon, Ethiopia, India, Japan, Mauritius, Pakistan, Rwanda, Somalia, South Africa, Tanzania, and the USA.

From our applications, we noted significant interest in training among students and researchers who were not yet enrolled in master’s programs or were just beginning one, indicating the need for more introductory training opportunities in basic bioinformatics skills.

DAY 1: Introduction to sequencing technologies, data file formats, Introduction to Unix

Day one served as a refresher course for our participants. Martha Luka kicked off the training by introducing the participants to sequencing technologies. She explained DNA sequencing in detail before describing the different sequencing technologies, their applications, how they compare, and the template preparation for each. She concluded her presentation with an overview of the next-generation sequencing (NGS) data analysis process. Dr Shaun Aron gave a brief overview of the NGS file formats generated from QC through to variant calling. He described the files generated at each step of the analysis pipeline and the information they contain, and showed how to identify and extract specific information from them.
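
As a small illustration of what "extracting specific information" from one of these formats can look like, here is a minimal Python sketch that reads variant lines from VCF-style text. It relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT come first). The records, the variable name VCF_TEXT, and the function name parse_vcf_records are made up for this example and are not taken from the workshop materials.

```python
# Illustrative VCF text; the records below are invented for this sketch.
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t12345\t.\tA\tG\t50\tPASS\tDP=20
chr2\t67890\trs0001\tT\tC,G\t99\tPASS\tDP=35
"""


def parse_vcf_records(text):
    """Return one dict per variant line, skipping header lines.

    Hypothetical helper: it only reads the first five tab-separated
    columns and ignores QUAL, FILTER, INFO and any sample columns.
    """
    records = []
    for line in text.splitlines():
        # Meta-information lines start with "##", the column header with "#".
        if not line or line.startswith("#"):
            continue
        chrom, pos, var_id, ref, alt = line.split("\t")[:5]
        records.append({
            "chrom": chrom,
            "pos": int(pos),
            "id": var_id,
            "ref": ref,
            "alt": alt.split(","),  # ALT may list several alleles
        })
    return records


for rec in parse_vcf_records(VCF_TEXT):
    print(rec["chrom"], rec["pos"], rec["ref"], ">", ",".join(rec["alt"]))
```

For real data, a dedicated VCF library or command-line tool is usually a better choice than hand-written parsing; the sketch is only meant to show how the columns map to information about each variant.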

Dr Sumir Panji introduced Linux and its uses, sharing navigation commands that are helpful for manipulating files and more. In the afternoon, Bernice Ngina Waweru introduced the participants to high-performance computing (HPC). They were taught how to navigate the HPC, load different modules, and submit jobs to the cluster for analysis. Each participant was then assigned a temporary HPC account to use for the entire duration of the workshop. Talk about an intense day one!

DAY 2: Advanced Unix; AWK and SED

Dr Sumir Panji introduced the participants to the stream editor (sed), a Unix utility that parses and transforms text using a simple, compact programming language. The day also covered AWK, a scripting language with text-processing capabilities for data extraction, comparison, and transformation. These tools help manipulate biological data such as gene or genome FASTA files.
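
To give a feel for the kind of text manipulation involved, the sketch below mirrors two typical tasks in Python rather than in the Unix tools themselves: a substitution on every FASTA header line (the sort of edit a stream editor is suited to) and a per-record sequence length tally (the sort of summary a field-processing language is suited to). The FASTA records, the prefix, and the function names are hypothetical and are not the workshop's own exercises.

```python
import re

# Invented FASTA records for illustration only.
FASTA_TEXT = """\
>gene1 hypothetical example
ATGCGT
ACGT
>gene2 hypothetical example
GGGCCC
"""


def rename_headers(text, prefix):
    """Substitution step: insert a prefix right after '>' on header lines."""
    # re.MULTILINE makes '^' match at the start of every line.
    return re.sub(r"^>", lambda m: ">" + prefix, text, flags=re.MULTILINE)


def sequence_lengths(text):
    """Summary step: total the sequence length under each header."""
    lengths = {}
    name = None
    for line in text.splitlines():
        if line.startswith(">"):
            # Use the first word after '>' as the record name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line.strip())
    return lengths


print(rename_headers(FASTA_TEXT, "sample1_").splitlines()[0])
print(sequence_lengths(FASTA_TEXT))  # {'gene1': 10, 'gene2': 6}
```

Sequences that wrap over several lines, as in gene1 here, are summed across lines, which is a common pitfall when counting lengths line by line.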

Once our participants knew where the data came from, had sufficiently interacted with the HPC, and learned basic command-line arguments to work on data, we could comfortably link the science to the analysis.

DAY 3: Quality Control and Assessment

This was the most interactive day for our instructor and helpers, as the participants asked questions throughout the theory and practical sessions and got to practice the skills by themselves. Dr Shaun Aron took them through checking the quality of a data set using tools such as FastQC and MultiQC and interpreting the results.
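
For readers curious about what sits underneath a quality report, here is a minimal Python sketch of how per-base quality characters in a FASTQ file decode to Phred scores. It assumes the common Phred+33 encoding and simple four-line FASTQ records. The reads and the function names are invented for illustration, and this is not how FastQC or MultiQC are implemented; those tools report far more than a per-read mean.

```python
# Invented four-line FASTQ records for illustration only.
FASTQ_TEXT = """\
@read1
ACGT
+
IIII
@read2
ACGT
+
!!5I
"""


def phred_scores(quality_line, offset=33):
    """Decode a quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def mean_quality(quality_line, offset=33):
    """Average Phred score of one read; a deliberately simple summary."""
    scores = phred_scores(quality_line, offset)
    return sum(scores) / len(scores)


def parse_fastq(text):
    """Yield (name, sequence, quality) from strict four-line records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual


for name, seq, qual in parse_fastq(FASTQ_TEXT):
    print(name, mean_quality(qual))
```

With these made-up reads, read1 averages 40.0 and read2 averages 15.0, so a hypothetical cutoff of 20 would keep the first and flag the second.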

SIDENOTE: Check out the H3ABioNet website for great lessons prepared by excellent instructors and communities. Learn anything bioinformatics, from beginner to advanced level. Also, follow them on Twitter to stay in the loop about events and training they hold all year round.  

DAY 4: Sequence alignment and assembly

Dr Sonal Henson and Dr James Richard Otieno teamed up on day four to introduce sequence alignment, mapping, and assembly. The session was especially useful because many of our participants were interested in conducting genomics research involving whole-genome sequencing, alignment, assembly, and further downstream analysis.

DAY 5: Introduction to Git/GitHub and introduction to Galaxy

Finally, we had Dr Caleb Kibet merge bioinformatics and open science. He introduced our participants to the basic concepts of documenting a bioinformatics project on GitHub, a web-based service for version control and online collaboration. Most participants created GitHub accounts before diving into the subject and creating their first repositories. Check out the Intro to Git lesson.

Peter Van Heusden introduced Galaxy, a web-based bioinformatics environment with public servers and a community of users, trainers, and contributors. Galaxy makes bioinformatics accessible, makes analyses reusable, and follows open-source and FAIR principles. The participants created personal accounts on the Galaxy website and followed along as the instructor used data sets to demonstrate different analyses on the platform.

What we learned: Patience! Collaboration!

Five days of intense morning and afternoon sessions taught us a lot of patience. We were able to identify gaps to address in future training, including the need to allocate more time to the training modules and to include more material on the use of resources such as the HPC. Our participants also expressed interest in more advanced R and Python skills training. Participants greatly appreciated the training on platforms such as Galaxy, since they had not been aware of the platform’s capabilities.

The importance of collaboration came out strongly during the workshop. There was terrific teamwork among the instructors, helpers, and participants throughout, which helped a lot whenever quick thinking or problem-solving was needed. It also made the learning process much faster.

Check out the BOSS workshop’s YouTube Playlist and follow along. Send us questions or comments at  

With a successful ‘Train’ phase, we could comfortably launch the ‘Hack’ phase, the BOSS mini-projects.