# AI Data

Up to 80% of an Artificial Intelligence project is about Collecting Data:

• What data is Required?
• What data is Available?
• How to Select the data?
• How to Collect the data?
• How to Clean the data?
• How to Prepare the data?
• How to Use the data?

## What is Data?

Data can be many things. With Artificial Intelligence it must be a collection of facts:

| Type | Examples |
|---|---|
| Numbers | Prices. Dates. |
| Measurements | Size. Height. Weight. |
| Words | Names and Places. |
| Observations | Counting Cars. |
| Descriptions | It is cold. |

## Intelligence Needs Data

Human intelligence needs data:

A real estate broker needs data about sold houses to estimate prices.

Artificial intelligence needs data:

A computer program also needs data to estimate prices.

## Storing Data

The most common data to collect are Numbers and Measurements.

Often data are stored in arrays representing the relationship between values.

This table contains house prices versus size:

| Price | Size |
|---|---|
| 7 | 50 |
| 8 | 60 |
| 8 | 70 |
| 9 | 80 |
| 9 | 90 |
| 9 | 100 |
| 10 | 110 |
| 11 | 120 |
| 14 | 130 |
| 14 | 140 |
| 15 | 150 |
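As a minimal sketch, the house data could be stored in two parallel Python lists, one per column. The names `prices`, `sizes`, `houses` and `price_for_size` are chosen here for illustration, and the source gives no units, so none are assumed.

```python
# House prices and sizes from the table, stored as parallel arrays.
# The same index in both lists refers to the same house.
prices = [7, 8, 8, 9, 9, 9, 10, 11, 14, 14, 15]
sizes = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150]

# Pair each size with its price so the relationship is explicit.
houses = list(zip(sizes, prices))


def price_for_size(size):
    """Return the recorded price for an exact size, or None if not recorded."""
    for recorded_size, recorded_price in houses:
        if recorded_size == size:
            return recorded_price
    return None


print(price_for_size(100))
```

Keeping the two lists the same length and in the same order is what preserves the relationship between the values.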

## Quantitative vs. Qualitative

Quantitative data are numerical:

• 55 cars
• 15 meters
• 35 children

Qualitative data are descriptive:

• It is cold
• It is long
• It was fun

## Census or Sampling

A Census is when we collect data for every member of a group.

A Sample is when we collect data for some members of a group.

If we wanted to know how many Americans smoke cigarettes, we could ask every person in the US (a census), or we could ask 10 000 people (a sample).

A census is accurate, but hard to do. A sample is less accurate (it only estimates the true value), but easier to do.
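The trade-off can be illustrated with a small simulation. This is only a sketch: the population is synthetic, and the 15% smoking rate and the population size of 100 000 are made-up values used for the demonstration, not real statistics.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic population: True means "smokes".
# The 15% rate is an invented value for this illustration.
population = [rng.random() < 0.15 for _ in range(100_000)]


def share(group):
    """Fraction of True values in a group."""
    return sum(group) / len(group)


# Census: ask every member of the population.
census_share = share(population)

# Sample: ask 10 000 randomly chosen members.
sample_share = share(rng.sample(population, 10_000))

print(f"census: {census_share:.4f}")
print(f"sample: {sample_share:.4f}")
```

The sample result is usually close to the census result, but not identical, while requiring only a tenth of the answers.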

## Sampling Terms

A Population is a group of individuals (objects) we want to collect information from.

A Census is information about every individual in a population.

A Sample is information about a part of the population (in order to represent the whole).

## Random Samples

In order for a sample to represent a population, it must be collected randomly.

A Random Sample is a sample where every member of the population has an equal chance to appear in the sample.
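One way to check the "equal chance" property is to draw many random samples and count how often each member appears. The sketch below assumes a tiny hypothetical population of ten member IDs and uses `random.sample` from the Python standard library, which draws without replacement.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the run is repeatable

population = list(range(10))  # hypothetical member IDs 0..9
sample_size = 3
trials = 20_000

# Count how many samples each member appears in.
counts = Counter()
for _ in range(trials):
    counts.update(rng.sample(population, sample_size))

# With equal chances, each member should appear in about
# sample_size / len(population) = 30% of the samples.
frequencies = {member: counts[member] / trials for member in population}

for member, frequency in frequencies.items():
    print(member, round(frequency, 3))
```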

## Sampling Bias

A Sampling Bias (Error) occurs when samples are collected in such a way that some individuals are less (or more) likely to be included in the sample.
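The effect of sampling bias can be sketched by comparing a random sample with a deliberately biased one. Everything here is hypothetical: the population is just the numbers 1 to 1000, and the biased sample makes larger values more likely to be included by using them as weights in `random.choices` (which draws with replacement).

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population of 1000 values.
population = list(range(1, 1001))


def mean(values):
    return sum(values) / len(values)


true_mean = mean(population)

# Random sample: every member has the same chance to be included.
random_sample = rng.sample(population, 500)

# Biased sample: larger values are more likely to be included.
biased_sample = rng.choices(population, weights=population, k=500)

random_mean = mean(random_sample)
biased_mean = mean(biased_sample)

print(f"population mean:    {true_mean:.1f}")
print(f"random sample mean: {random_mean:.1f}")
print(f"biased sample mean: {biased_mean:.1f}")
```

In this setup the random sample lands near the population mean, while the biased sample overestimates it, because the members it favors are not typical of the whole group.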