Trapped in a jungle of data? Data analysis skills can help and are an essential part of the modern scientist's toolkit. We are working with increasingly large and complex datasets, and programming provides us with tools that can automate data analysis and make it easier for us all. Data analysis in Python leverages both well-established and state-of-the-art statistical and modelling methods, some of which are available as soon as a paper is published!
From mining data lakes by automated analysis through to sharing data and statistical results with colleagues, programming is an essential data analysis skill for the modern biologist. This intensive, hands-on, one-day course will teach you the basics of data exploration, analysis and visualisation using Python and is a natural follow-on for those who have done our Python for Biologists course (see prerequisites below).
In the first part of the course, we'll walk through our example datasets and introduce you to some key Python modules for quick and easy data exploration.
In groups, we will clean our datasets, introducing data engineering approaches for repeatable analyses. We will then explore common statistical hypothesis tests and identify the correct tests for our example datasets and hypotheses. We'll explore different Python modules for these tests and use them on our example datasets.
As part of this section we will demonstrate generalised linear models.
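To give a flavour of the cleaning and hypothesis-testing steps described above, here is a minimal sketch using only the Python standard library. It is not taken from the course materials: the measurements, the `clean` and `permutation_test` helpers and the choice of a permutation test are all illustrative assumptions, and the course itself may well use different modules and tests.

```python
import math
import random

def clean(values):
    """Return the numeric entries as floats, dropping missing or unparseable ones."""
    cleaned = []
    for value in values:
        try:
            number = float(value)
        except (TypeError, ValueError):
            continue  # e.g. None, "" or "NA"
        if not math.isnan(number):
            cleaned.append(number)
    return cleaned

def mean(values):
    return sum(values) / len(values)

def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns an approximate p-value: the share of random relabellings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the analysis is repeatable
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical raw measurements with typical missing-value markers.
control_raw = ["4.1", "3.9", "NA", "4.3", "4.0", ""]
treated_raw = ["5.2", "5.0", "5.4", None, "5.1"]

control = clean(control_raw)
treated = clean(treated_raw)
p_value = permutation_test(control, treated)
print(f"control n={len(control)}, treated n={len(treated)}, p={p_value:.3f}")
```

Keeping the cleaning step in a function and fixing the random seed are two small habits that make an analysis like this repeatable from one run to the next.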
In the second part, we'll focus on visualising data by building interactive visualisations for data exploration and presenting the results of our earlier hypothesis testing.
We'll then discuss different aspects of reproducibility and of sharing data, code and results.
Finally, we'll package up the day's activities in an easy-to-share format that colleagues without coding experience can run/use as interactive widgets to see your results.
This course is aimed at scientists at any career stage (including students) in biology and related areas of science and medicine who have some experience of coding in Python (see prerequisites) and wish to expand their skills in data analysis and visualisation. Some statistics knowledge is assumed (see notes on prerequisites).
If you have no experience of programming in any language, then you may find our Python for Biologists course more appropriate.
Attendees usually work on their own laptops and are expected to install some software before the course. Any laptop or operating system is suitable.
We expect attendees to have Python skills equivalent to having completed our Python for Biologists course and to have built upon those skills since.
This means that participants should be comfortable:
This course also requires some basic statistical know-how, including: