OCA Oracle Database 11g SQL Fundamentals I by Steve Ries Download PDF Free

SQL and Relational
Databases

We live in a data-driven world. Think for a moment about all the data that exists

about you, in computers around the world.

• Your name

• Birth date and information

• Your hobbies

• Purchases you've made

• The identity of your friends

• Your place of employment

The examples are endless. Next, multiply that amount of data by the number

of people in the world. The result is a truly staggering amount of information.

How is it possible that all this data can be organized and retrieved? In today's

data-centric world, it is databases that make this possible. These Relational

Database Management Systems (RDBMS) are primarily controlled by a

programming language called Structured Query Language (SQL).

In this chapter, we will cover the following topics:

• Discussing the purpose of relational database management systems

• Understanding the use of the relational paradigm

• Examining the use of Entity Relationship Diagrams (ERDs)

• Looking at the structure of tables

• Introducing Structured Query Language (SQL)

• Reviewing commonly-used query tools

• Introducing the SQL Developer tool

SQL and Relational Databases

Relational Database Management

Systems

Imagine, for a moment, that you have the telephone books for the 20 largest cities

in the U.S. I give you the following request: Please find all the phone numbers for

individuals named Rick Clark in the Greater Chicago area. In order to satisfy the request,

you simply do the following:

• Open the Chicago phone book

• Scan to the "C" section of names

• Find all individuals that match "Clark, Rick"

• Report back their phone numbers

Now imagine that I take each phone book, tear out all of the pages, and throw them

into the air. I then proceed to shuffle the thousands of pages on the ground into

a completely disorganized mess. Now I repeat the same request: Please find all the

phone numbers for individuals named Rick Clark in the Greater Chicago area. How do you

think you would do that? It would be nearly impossible. The data is all there, but

it's completely disorganized. Finding the "Rick Clarks" of Chicago would involve

individually examining each page to see if it satisfied the request—a very frustrating

undertaking, to say the least.

This example underscores the importance of a database, or more accurately, a

Relational Database Management System(RDBMS) Today's RDBMSs are what

enable the storage, modification, and retrieval of massive amounts of data.

Flat file databases

When the devices that we know as computers first came into existence, they were

primarily used for one thing—computation. Computers became useful entities

because they were able to do numeric computation on an unprecedented scale. For

example, one of the first computers, ENIAC, was designed (although not used) for

the US Army to calculate artillery trajectories, a task made simpler through the use

of complex sequences of mathematical calculations. As such, originally, computers

were primarily a tool for mathematical and scientific research. Eventually, the use of

computers began to penetrate the business market, where the company's data itself

became just as important as computational speed. As the importance of this data

grew, so the need for data storage and management grew as well, and the concept

of a database was born.

The earliest databases were simple to envision. Most were simply large files that

were similar in concept to a spreadsheet or comma-separated values (CSV) file.

Data was stored as fields. A portion of these databases might look something like

the following:

Susan, Bates, 123 State St, Somewhere, VA

Fred, Hartman, 234 Banner Rd, Anywhere, CA

Bill, Frankin, 345 Downtown Rd, Somewhere, MO

Emily, Thompson, 456 Uptown Rd, Somewhere, NY

In this example, the first field is determined by reading left to right until a delimiter,

in this case a comma, is reached. This first field refers to the first name of the

individual. Similarly, the next field is determined by reading from the first delimiter

to the next. That second field refers to the last name of the individual. It continues

in this manner until we have five fields—first name, last name, street address, city,

and state. Each individual line or record in the file refers to the information for a

distinct individual. Because this data is stored in a file, it is often referred to as a flat

file database. To retrieve a certain piece of information, programs could be written

that would scan through the records for the requested information. In this way, large

amounts of data could be stored and retrieved in an orderly, programmatic way.

Limitations of the flat file paradigm

The flat file database system served well for many years. However, as time passed

and the demands of businesses to retain more data increased, the flat file paradigm

began to show some flaws.

In our previous example, our flat file is quite limited. It contains only five fields,

representing five distinct pieces of information. If this flat file database contained the

data for a real company, five distinct pieces of information would not even begin to

suffice. A complete set of customer data might include addresses, phone numbers,

information about what was ordered, when the order was placed, when the order

was delivered, and so on. In short, as the need to retain more data increases, the

number of fields grows. As the number of fields grows, our flat file database gets

wider and wider. We should also consider the amount of data being stored. Our first

example had four distinct records; not a very realistic amount for storing customer

data. The number of records could actually number in thousands or even millions.

Eventually, it is completely plausible that we could have a single flat file that is

hundreds of fields wide and millions of records long. We could easily find that the

speed with which our original data retrieval programs can retrieve the required data

is decreasing at a rapid rate and is insufficient for our needs.

SQL and Relational Databases

As our data demands increase, we're presented with another problem. If we are

storing order information, for example, strictly under the flat file paradigm, we are

forced to store a new record each time an order is placed. Consider this example,

in which our customer purchases six different items. We store a six-digit invoice

number, customer name, and address for the customer's purchase, as follows:

487345, Susan, Bates, 123 State St, Somewhere, VA

584793, Susan, Bates, 123 State St, Somewhere, VA

998347, Susan, Bates, 123 State St, Somewhere, VA

126543, Susan, Bates, 123 State St, Somewhere, VA

487392, Susan, Bates, 123 State St, Somewhere, VA

Using this example, notice how much duplicate data we have stored. The fields

are invoice number, first name, last name, street address, city, and state, respectively.

The only different piece of information in each record is the invoice number, and

yet we have repeatedly stored the last five fields—information that is stored in

previous records. We refer to these anomalies as repeating values. Repeating values

present two problems from a processing standpoint. First, the duplicate data must

be re-read each time by our retrieval programs, creating a performance problem

for our retrieval operations. Second, those duplicate characters constitute bytes that

must be stored on disk, uselessly increasing our storage requirements. It is clear that

the flat file paradigm needs to be revised in order to meet the growing demands of

our database.

Normalization

The world of databases changed in the early 1970s due in large part to the work of

Dr. Edgar "Ted" Codd. In his paper, A Relational Model of Data for Large Shared Data

Banks, Dr. Codd presented a new paradigm—the relational paradigm. The relational

paradigm seeks to resolve the issues of repeating values and unconstrained size by

implementing a process called normalization. During normalization, we organize

our data in such a way that the data and its inter-relationships can be clearly

identified. When we design a database, we begin by asking two questions—what

data do I have? And, how do the pieces of data relate to each other? In the first step,

the data is identified and organized into entities. An entity is any person, place,

or thing. An entity also has attributes, or characteristics, that pertain to it. Some

example entities are listed in the following diagram:

Structured Query Language

SQL was developed by Donald Chamberlain and Raymond Boyce in the early 1970s

as a language to retrieve data from IBM's early relational database management

systems. It was accepted as a standard by the American National Standards

Institute (ANSI) in 1986. SQL is generally referred to as a fourth-generation

language (4GL), in contrast with third-generation languages (3GLs) such as C,

Java, and Python. As a 4GL, the syntax of SQL is designed to be even closer to

human language than 3GLs, making it relatively natural to learn. Some do not

refer to SQL as a programming language at all, but rather a data sub-language.

A language for relational databases

Before we look at what SQL (pronounced either 'S-Q-L' or 'sequel') is, it is important

to define what it is not. First, SQL is not a product of Oracle or any other software

company. While most relational database products use some implementation of

SQL, none of them own it. The structure and syntax of SQL are governed by the

American National Standards Institute and the International Organization for

Standardization (ISO). It is these organizations that decide, albeit with input from

other companies such as Oracle, what comprises the accepted standard for SQL.

The current revision is SQL:2008.

Second, while the ANSI standard forms the basis for the various implementations

of SQL used in different database management systems, this does not mean that

the SQL syntax and functionality in all database products is the same; in fact, it is

often quite different. For instance, the SQL language permits the concatenation of

two column values into one; for example, the values hello and there concatenated

would be hellothere. Oracle and Microsoft SQL Server both use symbols to denote

concatenation, but they are different symbols. Oracle uses the double-pipe symbol,

'||', and SQL Server uses a plus sign, '+'. MySQL, on the other hand, uses a keyword,

CONCAT. Additionally, RDBMS software manufacturers often add functionalities to

their own SQL implementations. In Oracle version 10g, a new type of syntax was

included to join data from two or more tables that differs significantly from the ANSI

standard. Oracle does, however, still support the ANSI standard as well.

Read Book PDF More :

Programming Python 4 Edition PDF - ZeindBook

Hacking Beginner to Expert Guide- ZeindBook

Web Animation using JavaScript - ZeindBook

Advertisement

OCA Oracle Database 11g SQL Fundamentals I by Steve Ries Download PDF Free

OCA Oracle Database 11g SQL Fundamentals I by Steve Ries Download PDF Free