Tuesday, May 3, 2011

GENERATIONS OF LANGUAGES

To understand the amazing variety of languages, programs, and products which computer scientists collectively
refer to as software, it helps to recall the history of this young discipline.
Each computer is wired to perform certain operations in response to instructions. An instruction is a pattern
of ones and zeros stored in a word of computer memory. By the way, a “word” of memory is the basic unit of
storage for a computer. A 16-bit computer has a word size of 16 bits, or two bytes. A 32-bit computer has
a word size of 32 bits, or four bytes. A 64-bit computer has a word size of 64 bits, or eight bytes. When a computer
accesses memory, it usually stores or retrieves a word of information at a time.
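To make the byte counts concrete, here is a small C sketch that prints the sizes of a few common types. The exact numbers depend on the compiler and platform, so take them as an illustration rather than a rule; on a typical 64-bit machine a pointer occupies one eight-byte word.

#include <stdio.h>

int main(void)
{
    /* Sizes vary by platform; on a common 64-bit system int is often
       4 bytes and a pointer fills one 8-byte word. */
    printf("short:   %zu bytes\n", sizeof(short));
    printf("int:     %zu bytes\n", sizeof(int));
    printf("long:    %zu bytes\n", sizeof(long));
    printf("pointer: %zu bytes\n", sizeof(void *));
    return 0;
}
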
If one looked at a particular memory location, one could not tell whether the pattern of ones and zeros in
that location was an instruction or a piece of data (number). When the computer reads a memory location
expecting to find an instruction there, it interprets whatever bit pattern it finds in that location as an instruction.
If the bit pattern is a correctly formed machine instruction, the computer performs the appropriate operation;
otherwise, the machine halts with an illegal instruction fault.
Each computer is wired to interpret a finite set of instructions. Most machines today have 75 to
150 instructions in the machine “instruction set.” Much of the “architecture” of a computer design is
reflected in the instruction set, and the instruction sets for different architectures are different. For example,
the instruction set for the Intel Pentium computer is different from the instruction set for the Sun SPARC.
Even if the different architectures have instructions that do the same thing, such as shift all the bits in a computer
word left one place, the pattern of ones and zeros in the instruction word will be different in different
architectures. Of course, different architectures will usually also have some instructions that are unique to that
computer design.
The earliest computers, and the first hobby computers, were programmed directly in the machine
instruction set. The programmer worked with ones and zeros to code each instruction. As an example,
here is code (and an explanation of each instruction), for a particular 16-bit computer. These three
instructions will add
the value stored in memory location 64 to that in location 65, and store the result
in location 66.


0110000001000000 (Load the A-register from 64)
0100000001000001 (Add the contents of 65)
0111000001000010 (Store the A-register in 66)

Once the programmer created all the machine instructions, probably by writing the bit patterns on paper,
the programmer would store the instructions into memory using switches on the front panel of the computer.
Then the programmer would set the P register (program counter register) contents to the location of the first
instruction in the program, and then press “Run.” The basic operational loop of the computer is to read
the instruction stored in the memory location pointed to by the P register, increment the P register, execute the
instruction found in memory, and repeat.
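To illustrate that loop, here is a minimal sketch in C of a hypothetical 16-bit machine modeled loosely on the example above. The op-code values (6 = load A, 4 = add to A, 7 = store A) come from the top four bits of the instruction words shown earlier; treating op-code 0 as "halt" is purely an assumption made for this sketch.

#include <stdint.h>
#include <stdio.h>

#define MEM_SIZE 4096

uint16_t memory[MEM_SIZE];  /* each 16-bit word holds an instruction or data */
uint16_t a_register = 0;    /* the accumulator (A-register) */
uint16_t p_register = 0;    /* the program counter (P register) */

void run(void)
{
    for (;;) {
        uint16_t instruction = memory[p_register];  /* read the word P points to */
        p_register++;                               /* increment the P register */
        uint16_t opcode  = instruction >> 12;       /* top 4 bits */
        uint16_t address = instruction & 0x0FFF;    /* low 12 bits */
        switch (opcode) {                           /* execute, then repeat */
        case 6:  a_register = memory[address];  break;  /* load A from address */
        case 4:  a_register += memory[address]; break;  /* add contents to A */
        case 7:  memory[address] = a_register;  break;  /* store A at address */
        case 0:  return;                                /* assumed halt */
        default: printf("illegal instruction fault\n"); return;
        }
    }
}

int main(void)
{
    memory[0] = (6 << 12) | 64;  /* LDA 64 */
    memory[1] = (4 << 12) | 65;  /* ADA 65 */
    memory[2] = (7 << 12) | 66;  /* STA 66 */
    memory[3] = 0;               /* assumed halt */
    memory[64] = 2;
    memory[65] = 5;
    p_register = 0;              /* set P to the first instruction, then "Run" */
    run();
    printf("memory[66] = %u\n", (unsigned)memory[66]);  /* prints 7 */
    return 0;
}
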
An early improvement in programming productivity was the assembler. An assembler can read mnemonics
(letters and numbers) for the machine instructions, and for each mnemonic generate the machine language
in ones and zeros.
Machine language is considered the first-generation language; assembly languages are called second-generation languages. With assembly language programming, the
programmer can work in the world of letters and words rather than ones and zeros. Programmers write their code
using the mnemonic codes that translate directly into machine instructions. These are typical of such mnemonics:
LDA m Load the A-register from memory location m.
ADA m Add the contents of memory location m to the contents of the A-register, and leave
the sum in the A-register.
ALS A Left Shift; shift the bits in the A-register left 1 bit, and make the least significant bit zero.
SSA Skip on Sign of A; if the most significant bit in the A-register is 1, skip the next
instruction, otherwise execute the next instruction.
JMP m Jump to address m for the next instruction.
The work of an assembler is direct; translate the mnemonic “op-codes” into the corresponding machine
instructions.
Here is assembly language code for the program above that adds two numbers and stores the result in
a third location:
LDA 100 //Load the A-register from 100 octal = 64
ADA 101 //Add to the A-reg the contents of 101 (65)
STA 102 //Store the A-register contents in 102 (66)
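
To make the assembler's job concrete, here is a minimal sketch in C of the core translation step: look up each mnemonic in a table and pack the resulting op-code together with the operand address into a 16-bit machine word. The three-entry table and the op-code values are assumptions taken from the example machine above, not the behavior of any real assembler.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Tiny op-code table for the example machine (assumed values). */
static const struct { const char *mnemonic; uint16_t opcode; } table[] = {
    { "LDA", 6 }, { "ADA", 4 }, { "STA", 7 },
};

/* Assemble one "MNEMONIC address" pair into a 16-bit word:
   op-code in the top 4 bits, operand address in the low 12 bits. */
int assemble(const char *mnemonic, uint16_t address, uint16_t *word)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(mnemonic, table[i].mnemonic) == 0) {
            *word = (uint16_t)((table[i].opcode << 12) | (address & 0x0FFF));
            return 1;
        }
    }
    return 0;  /* unknown mnemonic */
}

int main(void)
{
    uint16_t word;
    if (assemble("LDA", 64, &word))
        printf("LDA 64 assembles to 0x%04X\n", (unsigned)word);  /* 0110 0000 0100 0000 */
    return 0;
}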

Almost no one codes directly in the ones and zeros of machine language anymore. However, programmers
often use assembly language for programs that are very intimate with the details of the computer hardware, or
for programs that must be optimized for speed and small memory requirements. As an educational tool, assembly
language programming is very important, too. It is probably the best way to gain an intuitive feel for what computers
really do and how they do it.
Work on the first third-generation language began in 1954, when John Backus of IBM devised FORTRAN (short for FORmula TRANslation); the first FORTRAN compiler was delivered in 1957. The goal was to provide programmers with
a way to work at a higher level of abstraction. Instead of being confined to the instruction set of a particular
machine, the programmer worked with statements that looked something like English and mathematical statements.
The language also included constructs for conditional branching, looping, and I/O (input and output).
Here is the FORTRAN statement that will add two numbers and store the result in a third location. The
variable names X, Y, and Z become labels for memory locations, and this statement says to add the contents of
location Y to the contents of location Z, and store the sum in location X:
X = Y + Z
Compared to assembly language, that’s quite a gain in writeability and readability!
FORTRAN is a “procedural language”. Procedural languages seem quite natural to people with a background
in automation and engineering. The computer is a flexible tool, and the programmer’s job is to lay out the sequence of steps necessary to accomplish the task. The program is like a recipe that the computer will follow
mechanically.
Procedural languages make up one category of “imperative languages,” because the statements of the language
are imperatives to the computer—the steps of the program specify every action of the computer. The other
category of imperative languages is “object-oriented” languages, which we will discuss in more detail later.
Most programs today are written in imperative languages, but not all ...
In 1958, John McCarthy at MIT developed a very different type of language. This language was LISP (for
LISt Processing), and it was modeled on mathematical functions. It is a particularly good language for working
with lists of numbers, words, and objects, and it has been widely used in artificial intelligence (AI) work.
In mathematics, a function takes arguments and returns a value. LISP works the same way, and LISP is
called a “functional language” as a result. Here is the LISP code that will add two numbers and return the sum:

(+ 2 5)
This code says the function is addition, and the two numbers to add are 2 and 5. The LISP language processor
will return the number 7 as a result. Functional languages are also called “declarative languages” because the
functions are declared, and the execution of the program is simply the evaluation of the functions. We will return
to functional languages later.
In 1959 a consortium of six computer manufacturers and three US government agencies released Cobol as the computing language for business applications (COmmon Business-Oriented Language). Cobol, like
FORTRAN, is an imperative, procedural language. To make the code more self-documenting, Cobol was designed
to be a remarkably “wordy” language. The following line adds two numbers and stores the result in a third variable:
ADD Y, Z GIVING X.


Many computer science students today regard Cobol as old technology, yet there are still more lines of production code in daily use written in Cobol than in any other language (http://archive.adaic.com/docs/reports/lawlis/content.htm).
Both PL/1 and BASIC were introduced in 1964. These, too, are procedural, imperative languages. IBM
designed PL/1 with the plan of “unifying” scientific and commercial programming. PL/1 was part of the IBM
360 project, and PL/1 was intended to supplant both FORTRAN and Cobol, and become the one language
programmers would henceforth use for all projects (Pugh, E., Johnson, L., & Palmer, J. IBM’s 360 and Early
370 Systems. Cambridge, MA: MIT Press, 1991). Needless to say, IBM’s strategy failed to persuade all those
FORTRAN and Cobol programmers.
BASIC was designed at Dartmouth by professors Kemeny and Kurtz as a simple language for beginners.
BASIC stands for Beginner’s All-purpose Symbolic Instruction Code. Originally BASIC really was simple, too
simple, in fact, for production use; it had few data types and drastic restrictions on the length of variable names,
for example. Over time, however, an almost countless number of variations of BASIC have been created, and
some are very rich in programming power. Microsoft’s Visual Basic, for example, is a powerful language rich
in modern features.
Dennis Ritchie created the very influential third-generation language C in 1971. C was developed as a language
with which to write the operating system Unix, and the popularity of C and Unix rose together. C is also an
imperative programming language. An important part of C’s appeal is its ability to perform low-level manipulations,
such as manipulations of individual bits, from a high-level language. C code is also unusually amenable
to performance optimization. Even after 34 years, C is neck-and-neck with the much newer Java as the most
popular language for new work (http://www.tiobe.com/tpci.htm).
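As one small illustration of that low-level flavor, the following C sketch shifts, masks, and tests individual bits of a 16-bit word, echoing the ALS shift and the SSA sign test shown earlier; the particular values are arbitrary.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t word = 0x0041;               /* 0000 0000 0100 0001 */

    uint16_t shifted  = word << 1;        /* shift left one place, like ALS */
    int      sign_bit = (word >> 15) & 1; /* test the most significant bit, like SSA */
    uint16_t low_byte = word & 0x00FF;    /* mask off all but the low 8 bits */

    printf("shifted  = 0x%04X\n", (unsigned)shifted);
    printf("sign bit = %d\n", sign_bit);
    printf("low byte = 0x%02X\n", (unsigned)low_byte);
    return 0;
}
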
During the 1970s, the language Smalltalk popularized the ideas of object-oriented programming. Object-oriented languages are another subcategory of imperative languages. Both procedural and object-oriented
languages are imperative languages. The difference is that object-oriented languages support object-oriented
programming practices such as inheritance, encapsulation, and polymorphism. We will describe these ideas in
more detail later. The goal of such practices is to create more robust and reusable modules of code, and hence
improve programming productivity.
In the mid-1980s, Bjarne Stroustrup, working at Bell Labs, created an object-oriented language called C++. C++ is essentially a superset of C; nearly any C program is also a valid C++ program. C++ provides a full set of object-oriented features, and at one time was called "C with classes." Until Java emerged in the late 1990s, C++ was the most popular object-oriented development language.
The most popular object-oriented language today is Java, which was created by James Gosling and his
colleagues at Sun Microsystems. Java was released by Sun in 1995, and became an immediate hit due to its
appropriateness for web applications, its rich language library, and its hardware independence. Java’s growth in
use among programmers has been unprecedented for a new language. Today Java and C are the languages most
frequently chosen for new work (http://www.tiobe.com/tpci.htm).
The variety of third-generation languages today is very great. Some are more successful than others
because they offer unusual expressive power (C, Java), efficiency of execution (C, FORTRAN), a large installed
base of code (Cobol), familiarity (BASIC), portability between computers (Java), object orientation (Java,
C++), or the backing of important sponsors (such as the US Department of Defense sponsorship of Ada).
