1st Normalization In Database With Example

Normalization is the process of organizing a database to reduce data redundancy and improve data integrity. It is usually described as a sequence of normal forms: (1) first normal form (1NF), (2) second normal form (2NF), (3) third normal form (3NF), (4) Boyce-Codd normal form (BCNF) and (5) fourth normal form (4NF).

The normalization process is used to remove data redundancy in a database. This makes the data easier to manage and maintain, can improve performance, and can make it easier to use the data in different ways.

In this tutorial, you will learn how to normalize a database using first normal form (1NF). After reading it, you should be able to state what 1NF requires and know how to apply that rule when normalizing a database.


First Form Normalization Databases

Normalization is the process of organizing the data in a database to avoid data redundancy and the insertion, update and deletion anomalies. Let’s discuss the anomalies first, and then the normal forms with examples.

Normalization In DBMs With Examples

Normalization is a process of reorganizing data to avoid redundancy and minimize data modification anomalies. It is a common approach to database design. In this article, we will learn about the first normal form.

First normal form applies the idea of atomic values to the fields of a database. A relation is in 1NF if and only if every field of every row holds a single value that describes the entity the row represents.

Anomalies in DBMS

There are three types of anomalies that occur when a database is not normalized: insertion, update and deletion anomalies. Let’s take an example to understand them.

Example: A manufacturing company stores employee details in a table Employee that has four attributes: Emp_Id for the employee’s id, Emp_Name for the employee’s name, Emp_Address for the employee’s address and Emp_Dept for the department in which the employee works. At some point in time the table looks like this:

Emp_Id  Emp_Name  Emp_Address  Emp_Dept
101     Rick      Delhi        D001
101     Rick      Delhi        D002
123     Maggie    Agra         D890
166     Glenn     Chennai      D900
166     Glenn     Chennai      D004

This table is not normalized. Let’s look at the problems we face when a table in a database is not normalized.

Update anomaly: In the above table we have two rows for employee Rick, as he belongs to two departments of the company. If we want to update Rick’s address, we have to update it in both rows or the data will become inconsistent. If the correct address is updated in one row but not in the other, then according to the database Rick has two different addresses, which is incorrect.

Insert anomaly: Suppose a new employee joins the company who is under training and not yet assigned to any department. We would not be able to insert this employee into the table if the Emp_Dept field doesn’t allow nulls.

Delete anomaly: Suppose that in the future the company closes department D890. Deleting the rows that have Emp_Dept as D890 would also delete the information of employee Maggie, since she is assigned only to this department.
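The three anomalies can be reproduced on a small scale. The following is a minimal sketch using Python's built-in sqlite3 module and an in-memory copy of the Employee table. The NOT NULL constraint on Emp_Dept, the new address "Mumbai" and the trainee row (id 200) are assumptions made for illustration; they do not come from the example above.

```python
import sqlite3

# In-memory copy of the unnormalized Employee table from the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee ("
    " Emp_Id INTEGER NOT NULL,"
    " Emp_Name TEXT NOT NULL,"
    " Emp_Address TEXT NOT NULL,"
    " Emp_Dept TEXT NOT NULL)"  # assumption: Emp_Dept does not allow NULL
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        (101, "Rick", "Delhi", "D001"),
        (101, "Rick", "Delhi", "D002"),
        (123, "Maggie", "Agra", "D890"),
        (166, "Glenn", "Chennai", "D900"),
        (166, "Glenn", "Chennai", "D004"),
    ],
)

# Update anomaly: only one of Rick's two rows is changed
# ("Mumbai" is a hypothetical new address).
conn.execute(
    "UPDATE Employee SET Emp_Address = 'Mumbai' "
    "WHERE Emp_Id = 101 AND Emp_Dept = 'D001'"
)
rick_addresses = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT Emp_Address FROM Employee WHERE Emp_Id = 101"
    )
)

# Insert anomaly: a hypothetical trainee without a department is rejected.
try:
    conn.execute("INSERT INTO Employee VALUES (200, 'Trainee', 'Pune', NULL)")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Delete anomaly: closing D890 also removes every row about Maggie.
conn.execute("DELETE FROM Employee WHERE Emp_Dept = 'D890'")
maggie_rows = conn.execute(
    "SELECT COUNT(*) FROM Employee WHERE Emp_Id = 123"
).fetchone()[0]

print("Addresses stored for Rick:", rick_addresses)
print("Trainee insert rejected:", insert_rejected)
print("Rows left for Maggie:", maggie_rows)
```

After the partial update, Rick has two different addresses on record; the trainee cannot be stored at all; and Maggie disappears together with her department.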

To overcome these anomalies we need to normalize the data. In the next section we will discuss normalization.

Normalization Example With Solution

Here are the most commonly used normal forms:

  • First normal form (1NF)
  • Second normal form (2NF)
  • Third normal form (3NF)
  • Boyce-Codd normal form (BCNF)

First Normal Form (1NF)

A relation is said to be in 1NF (first normal form) if it doesn’t contain any multi-valued attribute. In other words, a relation is in 1NF if each attribute contains only atomic (single) values.

As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values in a single row.

Example: Let’s say a company wants to store the names and contact details of its employees. It creates a table in the database that looks like this:

Emp_Id  Emp_Name  Emp_Address  Emp_Mobile
101     Herschel  New Delhi    8912312390
102     Jon       Kanpur       8812121212, 9900012222
103     Ron       Chennai      7778881212
104     Lester    Bangalore    9990000123, 8123450987

Two employees (Jon and Lester) have two mobile numbers each, so the Emp_Mobile field holds multiple values for these two employees.

This table is not in 1NF. The rule says “each attribute of a table must have atomic (single) values”, and the Emp_Mobile values for Jon and Lester violate that rule.

To make the table comply with 1NF, we create a separate row for each mobile number so that none of the attributes contains multiple values:

Emp_Id  Emp_Name  Emp_Address  Emp_Mobile
101     Herschel  New Delhi    8912312390
102     Jon       Kanpur       8812121212
102     Jon       Kanpur       9900012222
103     Ron       Chennai      7778881212
104     Lester    Bangalore    9990000123
104     Lester    Bangalore    8123450987
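The conversion to 1NF can be sketched in a few lines of Python. This assumes the multi-valued Emp_Mobile field is stored as a comma-separated string, which is one possible reading of the table above; the function name to_first_normal_form is only illustrative.

```python
# Rows of the non-1NF table: Emp_Mobile may hold a comma-separated list.
employees = [
    (101, "Herschel", "New Delhi", "8912312390"),
    (102, "Jon", "Kanpur", "8812121212, 9900012222"),
    (103, "Ron", "Chennai", "7778881212"),
    (104, "Lester", "Bangalore", "9990000123, 8123450987"),
]


def to_first_normal_form(rows):
    """Emit one row per mobile number so that every field is atomic."""
    result = []
    for emp_id, name, address, mobiles in rows:
        for mobile in mobiles.split(","):
            result.append((emp_id, name, address, mobile.strip()))
    return result


rows_1nf = to_first_normal_form(employees)
for row in rows_1nf:
    print(row)
```

Note that after the split Emp_Id alone no longer identifies a row; the pair (Emp_Id, Emp_Mobile) does.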


Second Normal Form (2NF)

A table is said to be in 2NF if both the following conditions hold:

  • Table is in 1NF (First normal form)
  • No non-prime attribute is dependent on a proper subset of any candidate key of the table.

An attribute that is not part of any candidate key is known as a non-prime attribute.

Example: Let’s say a school wants to store the data of teachers and the subjects they teach. Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher. They create a table Teacher that looks like this:

Teacher_Id  Subject    Teacher_Age
111         Maths      38
111         Physics    38
222         Biology    38
333         Physics    40
333         Chemistry  40

Candidate keys: {Teacher_Id, Subject}
Non-prime attribute: Teacher_Age

This table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the non-prime attribute Teacher_Age depends on Teacher_Id alone, which is a proper subset of the candidate key. This violates the rule for 2NF, which says “no non-prime attribute is dependent on a proper subset of any candidate key of the table”.

To make the table comply with 2NF, we can decompose it into two tables like this:

Teacher_Details table:

Teacher_Id  Teacher_Age
111         38
222         38
333         40

Teacher_Subject table:

Teacher_Id  Subject
111         Maths
111         Physics
222         Biology
333         Physics
333         Chemistry

Now the tables are in second normal form (2NF).
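The partial dependency and the decomposition can be checked on the sample rows. The sketch below is only an illustration: the helper determines is a hypothetical name, and a check on five sample rows can show that the data is consistent with a dependency, not that the dependency holds for every possible row.

```python
# The Teacher table from the example: (Teacher_Id, Subject, Teacher_Age).
teacher = [
    (111, "Maths", 38),
    (111, "Physics", 38),
    (222, "Biology", 38),
    (333, "Physics", 40),
    (333, "Chemistry", 40),
]


def determines(rows, lhs, rhs):
    """True if equal values in the lhs columns always imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        value = tuple(row[i] for i in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Column indexes: 0 = Teacher_Id, 1 = Subject, 2 = Teacher_Age
age_depends_on_id = determines(teacher, [0], [2])       # partial dependency
age_depends_on_subject = determines(teacher, [1], [2])  # not a dependency

# Decompose by projecting onto the two column sets and dropping duplicates.
teacher_details = sorted({(t_id, age) for t_id, _, age in teacher})
teacher_subject = [(t_id, subject) for t_id, subject, _ in teacher]

print(age_depends_on_id, age_depends_on_subject)
print(teacher_details)
print(teacher_subject)
```

Because Teacher_Age follows from Teacher_Id alone, each age is stored once in teacher_details instead of once per subject.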

Third Normal Form (3NF)

A table design is said to be in 3NF if both the following conditions hold:

  • Table must be in 2NF
  • No non-prime attribute is transitively dependent on any super key.

An attribute that is not part of any candidate key is known as a non-prime attribute.

In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and, for each functional dependency X -> Y, at least one of the following conditions holds:

  • X is a super key of the table
  • Y is a prime attribute of the table

An attribute that is part of one of the candidate keys is known as a prime attribute.

Example: Let’s say a company wants to store the complete address of each employee. It creates a table named Employee_Details that looks like this:

Emp_Id  Emp_Name  Emp_Zip  Emp_State  Emp_City  Emp_District
1001    John      282005   UP         Agra      Dayal Bagh
1002    Ajeet     222008   TN         Chennai   M-City
1006    Lora      282007   TN         Chennai   Urrapakkam
1101    Lilly     292008   UK         Pauri     Bhagwan
1201    Steve     222999   MP         Gwalior   Ratan

Super keys: {Emp_Id}, {Emp_Id, Emp_Name}, {Emp_Id, Emp_Name, Emp_Zip}, and so on
Candidate keys: {Emp_Id}
Non-prime attributes: all attributes except Emp_Id are non-prime, as they are not part of any candidate key.

Here, Emp_State, Emp_City and Emp_District depend on Emp_Zip. Further, Emp_Zip depends on Emp_Id, which makes the non-prime attributes (Emp_State, Emp_City and Emp_District) transitively dependent on the super key (Emp_Id). This violates the rule of 3NF.

To make this table comply with 3NF, we have to decompose it into two tables to remove the transitive dependency:
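The two 3NF conditions (X is a super key, or Y is a prime attribute) can be tested mechanically once the functional dependencies are written down. The sketch below assumes the dependencies described for Employee_Details and uses attribute closure to decide whether a left-hand side is a super key; the function names closure and find_3nf_violations are illustrative choices.

```python
ALL_ATTRS = (
    "Emp_Id", "Emp_Name", "Emp_Zip", "Emp_State", "Emp_City", "Emp_District",
)
PRIME_ATTRS = ("Emp_Id",)  # attributes that belong to a candidate key

# Functional dependencies as (left side, right side) pairs.
FDS = [
    (("Emp_Id",), ("Emp_Name", "Emp_Zip")),
    (("Emp_Zip",), ("Emp_State", "Emp_City", "Emp_District")),
]


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_3nf_violations(all_attrs, fds, prime_attrs):
    """List FDs X -> Y where X is not a super key and Y is not all prime."""
    violations = []
    for lhs, rhs in fds:
        is_super_key = closure(lhs, fds) == set(all_attrs)
        dependent = set(rhs) - set(lhs)  # ignore the trivial part of the FD
        if not is_super_key and not dependent <= set(prime_attrs):
            violations.append((lhs, rhs))
    return violations


violations = find_3nf_violations(ALL_ATTRS, FDS, PRIME_ATTRS)
for lhs, rhs in violations:
    print(", ".join(lhs), "->", ", ".join(rhs))
```

Only the dependency that starts at Emp_Zip is reported, since Emp_Zip is not a super key and the attributes it determines are all non-prime.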

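Employee table:

Emp_Id  Emp_Name  Emp_Zip
1001    John      282005
1002    Ajeet     222008
1006    Lora      282007
1101    Lilly     292008
1201    Steve     222999

Employee_Zip table:

Emp_Zip  Emp_State  Emp_City  Emp_District
282005   UP         Agra      Dayal Bagh
222008   TN         Chennai   M-City
282007   TN         Chennai   Urrapakkam
292008   UK         Pauri     Bhagwan
222999   MP         Gwalior   Ratan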
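A short Python sketch can confirm that this kind of decomposition loses no information. It projects the original Employee_Details rows onto two relations and joins them back on Emp_Zip. The variable names, including employee_zip for the second relation, are assumptions made for this sketch.

```python
# Employee_Details rows: (Emp_Id, Emp_Name, Emp_Zip, Emp_State, Emp_City, Emp_District)
employee_details = [
    (1001, "John", "282005", "UP", "Agra", "Dayal Bagh"),
    (1002, "Ajeet", "222008", "TN", "Chennai", "M-City"),
    (1006, "Lora", "282007", "TN", "Chennai", "Urrapakkam"),
    (1101, "Lilly", "292008", "UK", "Pauri", "Bhagwan"),
    (1201, "Steve", "222999", "MP", "Gwalior", "Ratan"),
]

# Projection 1: attributes that depend directly on Emp_Id.
employee = [(emp_id, name, zip_code) for emp_id, name, zip_code, *_ in employee_details]

# Projection 2: attributes that depend on Emp_Zip, with duplicates removed.
employee_zip = sorted(
    {
        (zip_code, state, city, district)
        for _, _, zip_code, state, city, district in employee_details
    }
)

# Join the two relations back on Emp_Zip to show nothing was lost.
zip_lookup = {row[0]: row[1:] for row in employee_zip}
rejoined = [
    (emp_id, name, zip_code) + zip_lookup[zip_code]
    for emp_id, name, zip_code in employee
]

print(rejoined == employee_details)
```

With this layout, a change to the state, city or district of a zip code is made in exactly one row.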

Step-By-Step Normalization Example


Normalization is the transformation of complex user views and data stores to a set of smaller, stable data structures. In addition to being simpler and more stable, normalized data structures are more easily maintained than other data structures.

The Three Steps of Normalization

Beginning with either a user view or a data store developed for a data dictionary (see Chapter 8), the analyst normalizes a data structure in three steps, as shown in the figure below. Each step involves an important procedure, one that simplifies the data structure.

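The first step removes all repeating groups and identifies the primary key. The second step removes partial dependencies, so that every nonkey attribute is fully dependent on the primary key. The third step removes any transitive dependencies. A transitive dependency is one in which nonkey attributes are dependent on other nonkey attributes.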

A Normalization Example

The figure below shows a user view for the Al S. Well Hydraulic Equipment Company. The report shows the (1) SALESPERSON-NUMBER, (2) SALESPERSON-NAME, and (3) SALES-AREA. The body of the report shows the (4) CUSTOMER-NUMBER and (5) CUSTOMER-NAME. Next is the (6) WAREHOUSE-NUMBER that will service the customer, followed by the (7) WAREHOUSE-LOCATION, which is the city in which the warehouse is located. The final information contained in the user view is the (8) SALES-AMOUNT. The rows (one for each customer) on the user view show that items 4 through 8 form a repeating group.

If the analyst were using a data flow/data dictionary approach, the same information in the user view would appear in a data structure. The figure below shows how the data structure would appear at the data dictionary stage of analysis. The repeating group is also indicated in the data structure by an asterisk (*) and indentation.

Before proceeding, note the data associations of the data elements shown in the figure below. This type of illustration is called a bubble diagram or data model diagram. Each entity is enclosed in an ellipse, and arrows are used to show the relationships. Although it is possible to draw these relationships with an E-R diagram, it is sometimes easier to use the simpler bubble diagram to model the data.

In this example, there is only one SALESPERSON-NUMBER assigned to each SALESPERSON-NAME, and that person will cover only one SALES-AREA, but each SALES-AREA may be assigned to many salespeople: hence, the double arrow notation from SALES-AREA to SALESPERSON-NUMBER. For each SALESPERSON-NUMBER, there may be many CUSTOMER-NUMBER(s).

Furthermore, there would be a one-to-one correspondence between CUSTOMER-NUMBER and CUSTOMER-NAME; the same is true for WAREHOUSE-NUMBER and WAREHOUSE-LOCATION. CUSTOMER-NUMBER will have only one WAREHOUSE-NUMBER and WAREHOUSE-LOCATION, but each WAREHOUSE-NUMBER or WAREHOUSE-LOCATION may service many CUSTOMER-NUMBER(s). Finally, to determine the SALES-AMOUNT for one salesperson’s calls to a particular company, it is necessary to know both the SALESPERSON-NUMBER and the CUSTOMER-NUMBER.

The main objective of the normalization process is to simplify all the complex data items that are often found in user views. For example, if the analyst were to take the user view discussed previously and attempt to make a relational table out of it, the table would look like the one shown below. Because this relation is based on our initial user view, we refer to it as SALES-REPORT.

SALES-REPORT is an unnormalized relation, because it has repeating groups. It is also important to observe that a single attribute such as SALESPERSON-NUMBER cannot serve as the key. The reason is clear when one examines the relationships between SALESPERSON-NUMBER and the other attributes in the figure below. Although there is a one-to-one correspondence between SALESPERSON-NUMBER and two attributes (SALESPERSON-NAME and SALES-AREA), there is a one-to-many relationship between SALESPERSON-NUMBER and the other five attributes (CUSTOMER-NUMBER, CUSTOMER-NAME, WAREHOUSE-NUMBER, WAREHOUSE-LOCATION, and SALES-AMOUNT).
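The claim that SALESPERSON-NUMBER alone cannot serve as the key can be checked by writing the data associations described above as functional dependencies and computing attribute closures. This is a sketch under that assumption: the encoding as pairs of sets and the helper names closure and determines_everything are illustrative.

```python
ALL_ATTRS = {
    "SALESPERSON-NUMBER", "SALESPERSON-NAME", "SALES-AREA",
    "CUSTOMER-NUMBER", "CUSTOMER-NAME",
    "WAREHOUSE-NUMBER", "WAREHOUSE-LOCATION", "SALES-AMOUNT",
}

# Dependencies read off the data associations in the text.
FDS = [
    ({"SALESPERSON-NUMBER"}, {"SALESPERSON-NAME", "SALES-AREA"}),
    ({"CUSTOMER-NUMBER"}, {"CUSTOMER-NAME", "WAREHOUSE-NUMBER"}),
    ({"WAREHOUSE-NUMBER"}, {"WAREHOUSE-LOCATION"}),
    ({"SALESPERSON-NUMBER", "CUSTOMER-NUMBER"}, {"SALES-AMOUNT"}),
]


def closure(attrs, fds):
    """Return every attribute that attrs determines under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def determines_everything(attrs):
    """True if attrs determines all attributes of SALES-REPORT."""
    return closure(attrs, FDS) == ALL_ATTRS


single = determines_everything({"SALESPERSON-NUMBER"})
combined = determines_everything({"SALESPERSON-NUMBER", "CUSTOMER-NUMBER"})
print(single, combined)
```

SALESPERSON-NUMBER on its own reaches only the salesperson's name and area, while the pair SALESPERSON-NUMBER and CUSTOMER-NUMBER reaches all eight attributes.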

SALES-REPORT can be expressed in the following shorthand notation:

SALES-REPORT  (SALESPERSON-NUMBER,
        SALESPERSON-NAME,
        SALES-AREA,
        (CUSTOMER-NUMBER,
        CUSTOMER-NAME,
        WAREHOUSE-NUMBER,
        WAREHOUSE-LOCATION,
        SALES-AMOUNT))

where the inner set of parentheses represents the repeating group.

FIRST NORMAL FORM (1NF)

The first step in normalizing a relation is to remove the repeating groups. In our example, the unnormalized relation SALES-REPORT will be broken into two separate relations. These new relations will be named SALESPERSON and SALESPERSON-CUSTOMER. The figure below shows how the original, unnormalized relation SALES-REPORT is normalized by separating it into two new relations. Notice that the relation SALESPERSON contains the primary key SALESPERSON-NUMBER and all the attributes that were not repeating (SALESPERSON-NAME and SALES-AREA).

The second relation, SALESPERSON-CUSTOMER, contains the primary key from the relation SALESPERSON (the primary key of SALESPERSON is SALESPERSON-NUMBER), as well as all the attributes that were part of the repeating group (CUSTOMER-NUMBER, CUSTOMER-NAME, WAREHOUSE-NUMBER, WAREHOUSE-LOCATION, and SALES-AMOUNT). Knowing the SALESPERSON-NUMBER, however, does not automatically mean that you will know the CUSTOMER-NAME, SALES-AMOUNT, WAREHOUSE-LOCATION, and so on. In this relation, one must use a concatenated key (both SALESPERSON-NUMBER and CUSTOMER-NUMBER) to access the rest of the information. It is possible to write the relations in shorthand notation as follows:

SALESPERSON  (SALESPERSON-NUMBER,
        SALESPERSON-NAME,
        SALES-AREA)

SALESPERSON-CUSTOMER  (SALESPERSON-NUMBER,
        CUSTOMER-NUMBER,
        CUSTOMER-NAME,
        WAREHOUSE-NUMBER,
        WAREHOUSE-LOCATION,
        SALES-AMOUNT)
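Removing the repeating group can be sketched in Python by treating a SALES-REPORT record as a nested structure. The report's actual figures are not reproduced in this text, so every value below is a made-up placeholder, and the "CUSTOMERS" field name and the function remove_repeating_group are hypothetical.

```python
# One unnormalized SALES-REPORT record; all values are placeholders.
sales_report = [
    {
        "SALESPERSON-NUMBER": 1,
        "SALESPERSON-NAME": "Salesperson A",
        "SALES-AREA": "Area 1",
        "CUSTOMERS": [  # the repeating group
            {
                "CUSTOMER-NUMBER": 10,
                "CUSTOMER-NAME": "Customer X",
                "WAREHOUSE-NUMBER": 1,
                "WAREHOUSE-LOCATION": "City P",
                "SALES-AMOUNT": 100,
            },
            {
                "CUSTOMER-NUMBER": 20,
                "CUSTOMER-NAME": "Customer Y",
                "WAREHOUSE-NUMBER": 2,
                "WAREHOUSE-LOCATION": "City Q",
                "SALES-AMOUNT": 250,
            },
        ],
    },
]


def remove_repeating_group(report):
    """Split nested records into SALESPERSON and SALESPERSON-CUSTOMER rows."""
    salesperson = []
    salesperson_customer = []
    for record in report:
        number = record["SALESPERSON-NUMBER"]
        salesperson.append(
            (number, record["SALESPERSON-NAME"], record["SALES-AREA"])
        )
        for customer in record["CUSTOMERS"]:
            # The salesperson's key is copied into every customer row.
            salesperson_customer.append(
                (
                    number,
                    customer["CUSTOMER-NUMBER"],
                    customer["CUSTOMER-NAME"],
                    customer["WAREHOUSE-NUMBER"],
                    customer["WAREHOUSE-LOCATION"],
                    customer["SALES-AMOUNT"],
                )
            )
    return salesperson, salesperson_customer


salesperson, salesperson_customer = remove_repeating_group(sales_report)
print(salesperson)
print(salesperson_customer)
```

Each SALESPERSON-CUSTOMER row is identified by its first two fields together, which is the concatenated key described above.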

The relation SALESPERSON-CUSTOMER is in first normal form, but it is not in its ideal form. Problems arise because some of the attributes are not functionally dependent on the whole primary key (that is, SALESPERSON-NUMBER, CUSTOMER-NUMBER). In other words, some of the nonkey attributes are dependent only on CUSTOMER-NUMBER and not on the concatenated key. The data model diagram in the figure below shows that SALES-AMOUNT is dependent on both SALESPERSON-NUMBER and CUSTOMER-NUMBER, but the other three attributes are dependent only on CUSTOMER-NUMBER.
