Glossary

This page lists definitions of commonly used (and commonly misunderstood) terms and acronyms relating to Python, data analysis, and data manipulation in general.

| Term | Definition |
|---|---|
| Exploratory Data Analysis (EDA) | The process of analysing datasets to summarise their main characteristics, often using visual methods, before formal modelling. |
| Functional programming | A programming paradigm that treats computation as the evaluation of mathematical functions and avoids changing state and mutable data. |
| Git | A distributed version control system used to track changes in source code during software development. |
| GitHub | A cloud-based platform that hosts repositories of files and folders (usually software code), using Git as its backend for version control. |
| Jupyter | A software package for creating notebooks that mix Markdown-formatted text with live code, most commonly Python. |
| Markdown (.md) | A simple plain-text markup scheme designed for rapid production of formatted text (with headings, links, etc.) within a plain-text file. Used by Quarto (as a Quarto-specific flavour with the .qmd extension). |
| matplotlib | The baseline Python library for creating static, animated, and interactive visualisations, offering extensive customisation options for plots and charts. It also serves as the foundation for higher-level visualisation packages such as seaborn. |
| Object-oriented programming (OOP) | A programming paradigm based on the concept of "objects", which can contain data and the code that manipulates that data. |
| pandas | A Python data analysis and manipulation library for working with dataframes (tabular data). |
| Python | A general-purpose programming language. |
| Regression | A method for modelling the relationship between one or more explanatory variables and an outcome. It is used to predict outcomes and to understand how changes in the predictors (explanatory variables) affect the response (outcome). |
| Repository (repo) | In Git and GitHub, a self-contained "project" of files and folders, together with its version history. |
| Reproducible Analytical Pipelines (RAP) | A set of processes and tools designed to ensure that data analysis can be consistently repeated and verified by others. |
| seaborn (sns) | A Python data visualisation library based on matplotlib, providing a high-level interface for drawing attractive and informative statistical graphics. Conventionally imported as `sns`. |
| TOML | Tom's Obvious Minimal Language: a simple, human-readable data serialisation format designed for configuration files, emphasising readability and ease of use. Used by uv (in the `pyproject.toml` file) to specify its projects. |
| uv | A Python package and project manager. It manages Python projects (folders), including installing and maintaining the Python environment and libraries within each project folder. |
| YAML | YAML Ain't Markup Language (originally Yet Another Markup Language): a human-readable data serialisation format often used for configuration files and data exchange between languages. |
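
The contrast between the functional and object-oriented paradigms defined above can be sketched in a few lines of standard-library Python. This is only an illustration: the names `add_reading`, `mean`, and `ReadingLog` are hypothetical and chosen for this example, and real projects often blend both styles.

```python
from dataclasses import dataclass
from functools import reduce


# Functional style: pure functions that never modify their inputs.
def add_reading(readings: tuple, value: float) -> tuple:
    """Return a NEW tuple with value appended; the input is left unchanged."""
    return readings + (value,)


def mean(readings: tuple) -> float:
    """Compute the mean by folding the values together with reduce."""
    return reduce(lambda total, x: total + x, readings, 0.0) / len(readings)


# Object-oriented style: an object bundles data with the methods that act on it.
@dataclass
class ReadingLog:  # hypothetical class, for illustration only
    readings: list

    def add(self, value: float) -> None:
        """Mutate the object's own state in place."""
        self.readings.append(value)

    def mean(self) -> float:
        """Compute the mean of the readings currently held by this object."""
        return sum(self.readings) / len(self.readings)


# Functional usage: 'original' is untouched, 'extended' is a new value.
original = (1.0, 2.0, 3.0)
extended = add_reading(original, 6.0)
functional_mean = mean(extended)

# Object-oriented usage: the same 'log' object changes over time.
log = ReadingLog([1.0, 2.0, 3.0])
log.add(6.0)
oop_mean = log.mean()
```

Both approaches give the same mean. The difference is where the state lives: the functional version passes immutable values between functions, while the object-oriented version keeps mutable state inside `log`.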

Created and maintained by the Specialist Analytics Team

This page was built with Quarto