By Kevin Shane / November 7, 2017

Data Vocabulary 3: Some of Many!

Continuing our series on data vocabulary, here are a few terms I’ve come across recently that I’d love to share, in the hope of helping more people understand them.

Beginner – “Distribution”

When talking about data, a distribution is the list or representation of all the values in a data set and how often each value occurs. An easy example of a distribution would be a list of grades for a class, along with how many students earned each grade. An extremely common distribution that most people are familiar with is the “normal distribution” or “bell curve.” In a normal distribution, data is clustered around the average, which is why this distribution is sometimes used to set a curve on scores in higher education settings. Distributions can have a noticeable pattern, like the normal distribution, or they can appear completely random. One of the challenges of data analysis can be finding a pattern in what seems like a chaotic distribution.
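To make the class grades example concrete, here is a minimal sketch in Python. The grades are made up for illustration, and the text "bar chart" is just one simple way to look at the result.

```python
from collections import Counter

# Hypothetical letter grades for a class of ten students (made-up data).
grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "A"]

# Count how often each grade occurs; this tally is the distribution.
distribution = Counter(grades)

# Print one row per grade, with one "#" per student who earned it.
for grade in sorted(distribution):
    print(grade, "#" * distribution[grade])
```

The longest row shows where the data clusters. With real scores you would usually group numbers into ranges (for example 90 to 100, 80 to 89) before counting.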

Intermediate – “Benchmarking”

Benchmarking is commonly used in education and in business in similar ways. Businesses compare themselves to other businesses using different metrics and practices. How well does Bing’s search perform compared to Google’s? Bing can compare itself using things like response time, number of results, quality of results, etc. In education, benchmarking works in a similar way. Standards are set for schools, students, grade levels, etc., and performance on those standards can be used for comparison. Benchmarking can be used to place students in courses or just to see how they compare with other students.
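As a rough illustration of the education case, the sketch below compares a few students against a single standard. The names, scores, and the benchmark value of 70 are all hypothetical; real benchmarks are set by whoever defines the standard.

```python
# Hypothetical benchmark: the score a student is expected to reach.
BENCHMARK = 70

# Made-up scores for three students.
scores = {"Ana": 88, "Ben": 64, "Cam": 70}

# Label each student by comparing their score to the benchmark.
results = {
    name: "meets benchmark" if score >= BENCHMARK else "below benchmark"
    for name, score in scores.items()
}

for name, label in results.items():
    print(f"{name}: {scores[name]} ({label})")
```

The same idea applies to the business example: swap the student scores for a metric such as response time and the benchmark for a competitor's number.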

Advanced – “Relational Data”

Most people reading this probably already work with relational data without knowing that their data falls into this category. “Relational data” is data that contains an identifying “key” (usually something like a student ID) that is unique to each row. Usually this data is presented as a table or spreadsheet. The key can then be used to connect data from other tables or sheets to see how they “relate.” No two rows in a table should share the same key; otherwise data can be misidentified.
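Here is a small sketch of how a key connects two tables, using plain Python lists in place of spreadsheets. The table contents and column names (`student_id`, `name`, `test`, `score`) are assumptions made up for this example.

```python
# Hypothetical "students" table: one row per student, keyed by student_id.
students = [
    {"student_id": 101, "name": "Ana"},
    {"student_id": 102, "name": "Ben"},
    {"student_id": 103, "name": "Cam"},
]

# Hypothetical "scores" table: each row points back to a student by ID.
scores = [
    {"student_id": 101, "test": "Math", "score": 88},
    {"student_id": 102, "test": "Math", "score": 75},
    {"student_id": 101, "test": "Reading", "score": 92},
]

# Check that no two students share a key, to avoid misidentifying data.
ids = [row["student_id"] for row in students]
assert len(ids) == len(set(ids)), "duplicate student_id found"

# Build a lookup from key to name, then use it to relate the two tables.
name_by_id = {row["student_id"]: row["name"] for row in students}
report = [
    {"name": name_by_id[row["student_id"]], "test": row["test"], "score": row["score"]}
    for row in scores
]

for row in report:
    print(row["name"], row["test"], row["score"])
```

Notice that the key must be unique in the students table, but it can appear many times in the scores table, since one student can take several tests. Spreadsheet lookup functions and database joins do this same matching for you.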

As always, if you have more questions or have terms you want to discuss, let us know! All of us here love talking about data and enjoy helping others understand it better.

Come to terms with your terms 🙂