Data Types

Lecture 3: Types of Data

Robert W. Walker

Outline

  • Readings: Data Types and Chapter 1

What is Data?

Cambridge defines data as:

data

  • facts or numbers.
  • in electronic form.
  • stored and used by a computer.

FBS: What is a spreadsheet?

  • Google Sheets: rows, columns, sheets

Sheet

Finding Data

  • Gemini
  • tidytuesday [in .csv]
  • Kaggle
  • UCI Machine Learning

Examples

Datasaurus

Bond Funds

Bob Ross Paintings

What is a variable?

Something that varies. Distinguished from constants.

Classifying Data

From the above definition, anything we can put in a spreadsheet [and that may be too limiting] is data. To classify it, we will need some categories.

  • The spreadsheet may impose character limits or the like; that is a programming choice and not part of the definition.

Classifying Data I: Quantitative vs. Qualitative

Literally numbers vs. non-numbers. But it is not quite that simple.

  • There are numbers that are not numerical, e.g. Social Security numbers. They encode some information, but not much, and there is little we can do with them arithmetically.

  • Your Willamette Student ID
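One way to see why an ID is a label rather than a quantity is to try treating it as a number. The following is a minimal Python sketch; the ID values are made up for illustration and do not follow any real ID format.

```python
# Hypothetical ID values, invented for illustration only.
student_ids = ["00731", "01488", "00052"]

# Converting an ID to an integer silently drops its leading zeros,
# so the original label cannot be recovered from the number.
as_numbers = [int(s) for s in student_ids]
round_trip = [str(n) for n in as_numbers]

# Arithmetic runs without error, but the result describes nothing real.
mean_id = sum(as_numbers) / len(as_numbers)

# The operations that do make sense on an identifier are
# equality checks and counting.
ids_are_unique = len(set(student_ids)) == len(student_ids)

print(round_trip)
print(mean_id)
print(ids_are_unique)
```

Storing such values as text, not as numbers, is usually the safer choice in a spreadsheet or a program.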

Quantitative Data: Levels of Measurement

Four categories:

  • Nominal. (No order and no arithmetic)
  • Ordinal. (Order, no arithmetic)
  • Interval scale. (Addition and subtraction, no multiplication and division)
  • Ratio scale. (All arithmetic)
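The four levels can be read as a list of which operations give a meaningful answer. The Python sketch below walks through one operation per level; all values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Nominal: only equality and counting are meaningful.
colors = ["red", "blue", "red"]
n_red = colors.count("red")

# Ordinal: order is meaningful, so sorting works,
# but the gap between gold and silver has no numeric size.
medal_rank = {"bronze": 1, "silver": 2, "gold": 3}
medals = ["gold", "bronze", "silver"]
ordered = sorted(medals, key=medal_rank.get)
middle = ordered[len(ordered) // 2]

# Interval: differences are meaningful, ratios are not,
# because the zero point is arbitrary.
year_gap = 2020 - 2010          # ten years apart: meaningful
# 2020 / 1010 == 2, yet 2020 is not "twice as late" as 1010.

# Ratio: a true zero exists, so ratios are meaningful too.
long_minutes, short_minutes = 90, 45
times_longer = long_minutes / short_minutes

print(n_red, ordered, middle, year_gap, times_longer)
```

Each level permits everything the levels above it permit, plus one more kind of operation.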

Nominal Data

  • Names
  • ID numbers

Ordered Data

  • Ranks
  • Medals
  • Satisfaction
  • Likert scales

Interval data

The year example in the Primer.

Ratio scale

The duration example in the Primer.

Distinction: Unit of Analysis

What are the units?

Sometimes, this is singular.

Sometimes, it is a combination.

Popular units:

  1. cross-sectional units [people, countries, products]
  2. time periods – time series
  3. panel data is a combination of units and time. [Multiple time series or multiple cross-sections over time]
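The three kinds of units above can be sketched with a small table held as a list of records. This is a Python illustration under stated assumptions: the country labels, years, and values are invented, and the field names are hypothetical.

```python
# Hypothetical panel: two countries observed in two years.
panel = [
    {"country": "A", "year": 2020, "value": 1.0},
    {"country": "A", "year": 2021, "value": 1.5},
    {"country": "B", "year": 2020, "value": 2.0},
    {"country": "B", "year": 2021, "value": 2.5},
]

# Cross-section: many units, one time period.
cross_section = [r for r in panel if r["year"] == 2020]

# Time series: one unit, many time periods.
time_series = [r for r in panel if r["country"] == "A"]

# Panel: the unit of analysis is the (country, year) combination,
# so each pair should identify exactly one row.
keys = [(r["country"], r["year"]) for r in panel]
keys_are_unique = len(set(keys)) == len(keys)

print(len(cross_section), len(time_series), keys_are_unique)
```

Checking that the unit key is unique is a quick way to confirm what one row of a dataset represents.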

Consider this example

A Data Taxonomy

  • Generally column-centric.
  • Variables in columns.
  • Units in rows.
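The column-centric layout can be sketched by reading a small .csv with Python's standard library. The data below is invented for illustration; the column names are assumptions, not taken from any dataset in the course.

```python
import csv
import io

# Hypothetical .csv content: one row per unit, one column per variable.
raw = """person,city,age
p1,Salem,34
p2,Portland,27
p3,Eugene,41
"""

# Each row becomes a dict keyed by the column (variable) names.
rows = list(csv.DictReader(io.StringIO(raw)))

# The header gives the variables; the row count gives the units.
variables = list(rows[0].keys())
n_units = len(rows)

# Pulling one variable out means taking one column across all rows.
# csv reads everything as text, so numeric columns need converting.
ages = [int(r["age"]) for r in rows]

print(variables, n_units, ages)
```

Note that the file itself does not record which columns are quantitative; that classification is a decision made when the data is read.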

A Class Assignment to Conclude

Some customer satisfaction data: