Data types overview

A brief history of data types

  • All1 computers store data in binary (1s and 0s) – example shown on the right, represented as hexadecimal
  • Variables add a level of convenience and abstraction by letting us name specific buckets to put data in, and data types give structure to these buckets.
  • In the early days of computing data was stored as raw binary
  • The need for specific data types came from the emergence of structured programming from the 1950s onward
  • Languages like FORTRAN and COBOL introduced the segregation of numeric datatypes and character types
  • Object-oriented languages like C++ and Java further expanded on this with user-defined data types
  • Specifying the type of data allows the machine to allocate an appropriate amount of memory to it (was very important in the early days of computing, but still relevant)
  • Allows us to prevent errors; setting the expectation on the exact type of data that a specific variable will contain.
Raw data in hex format (ASCII representation on right).
Core rope memory. More on this on the next slide. (Konstantin Lanzet, Wikimedia Commons)

History lesson - core rope memory

The Apollo Guidance Computer for the Apollo programme which eventually landed the first person on the moon made use of core rope memory. The program code and fixed data (such as important physical and astronomical constant) were literally woven into a grid of magnetic round cores using a needle, with the sequence the wire took through the cores deciding the pattern of 0s and 1s. This highly technical work was done in bulk in factories by almost exclusively female workers.

A closeup of a few cores in a core rope memory module, showing the hundreds of times the sense wire goes through each core2.
A factory employee working on a core rope module3
One of the memory trays of the AGC - each rectangular module contains a self-contained core rope grid4

A quick note on type systems

Programming languages have different philosophies. They are often referred as being “strong” or “weak” and “static” or “dynamic”.

Strongly but dynamically-typed languages (e.g. Python)

  • Python features dynamic typing. There is no need to explicitly declare variables as being a specific data type, and it does allow limited implicit conversions, but not as extensively as e.g. JavaScript.

Statically-typed languages (C++, Rust, SQL)

  • The programmer has to specify the data type for a variable or object in the code itself and they are checked at compile time. Safer (catches errors early) and possibly more performant, but more tedious and less flexible

Weakly-typed languages (e.g. Javascript)

  • Allows extensive type coercion; mixing-and-matching of datatypes freely e.g. 5+“2”=“52”
https://remotescout24.com/en/blog/806-typed-vs-untyped-programming-languages
C++. This code generates a type error; we tried to assign a string value to an int
JavaScript. This is valid JS code and ends with z being a string with the content “52”

Data types in Python

Overview

A logical overview of the basic data types in python. From https://pynative.com/python-data-types/

Booleans

Like most programming languages, python has a bool datatype. In some ways, this is the simplest type available. A Boolean can only be True or False, and is returned when evaluating an expression. For example:

our_result = 10>9 print(our_result)

Returns True - we’re asking Python for the result of the comparison 10>9, and to store this result in a variable called our_result. The data type of a true-false comparison result like that is bool, so our variable will also be of this type.

Booleans will become highly relevant when we talk about conditionals and program flow.

George Boole (1815-1864) - the originator of Boolean logic

Numeric types

Numeric types are for variables that will only contain numbers. Other programming languages often have many different numeric types, but Python (mercifully) only has two:

int can contain any5 whole (no fraction or decimals) number; negative, positive or zero. E.g.

  • a = -4
  • b = 3
  • c = 9087358292578

float can contain any number with a decimal point, to arbitrary6 precision. E,g,

  • x = -2.2
  • y = 3.0
  • z = 2452.259259999999999

If you’re manually assigning a number to a variable, python will always choose an int or float depending on whether you’ve used a decimal point or not - so 2 and 2.0 are not equivalent in this context.

Data structures

With data structures, we can address an element or elements by using square bracket notation - more on this below.

Strings (str)

These are similar to a VARCHAR in SQL. They are ordered sequences (strings) of characters7. Enclosed by quotation marks8. E.g.

  • our_string = "Hello world"

Lists (list)

An ordered sequence of objects, where each object can be another data type (int, float, string, bool, etc). Enclosed by square brackets, and the items separated by commas. E.g.

  • our_list = [1, 2.3, "abc"]

Dictionaries (dict)

Dictionaries are key-value pairs, where each entry is a pair of entries. Enclosed by curly braces, the keys and values separated by a colon and each pair separated by a comma. E.g.

  • our_dict = {"org_code":"0DF","name":"SCW CSU","year": 2013}

Other data types

Built-in

  • We’ve skipped over complex numbers and tuples, the latter being like a dict but non-changeable.

Other packages

  • You may have heard of other data types such as arrays (which are kind like lists but multi-dimensional).
  • Arrays are not a built-in Python type but are offered by the numpy package.
  • pandas also offers additional data types such as timestamp (similar to SQL’s datetime).
  • dataframes (from pandas) are an example of a higher-order class that makes use of datatypes within it; remember from previous sessions that a dataframe can contain strings, integers, timestamps etc.

Final thoughts

Don’t worry about memorising any of this! If you take one thing away from this session, make it the fact that data types exist, that being aware of them will help you understand problems with your code, and that resources and documentation are readily available online.

Further reading

Footnotes

  1. experimental ternary computers and quantum computing are firmly out of scope of this presentation

  2. from https://www.righto.com/2019/07/software-woven-into-wire-core-rope-and.html

  3. from https://www.righto.com/2019/07/software-woven-into-wire-core-rope-and.html

  4. from https://www.righto.com/2019/07/software-woven-into-wire-core-rope-and.html

  5. there is no clearly-defined maximum number for an integer in python; certainly not one you’re likely to ever encounter

  6. again, limits exist but aren’t relevant here

  7. letters, numbers, symbols, etc. - any valid UTF-8 symbols

  8. in most instances either double quotes (") or single quotes (') are fine - but it’s a good idea to pick one style and be consistent.