The algorithms discussed so far all have an order of growth that can be described by some polynomial equation
in n. A “polynomial in n” means the sum of some number of terms, where each term consists of n raised to some
power and multiplied by a coefficient. For instance, the insertion sort order of growth is (n²/2 − n/2).
When an algorithm's order of growth exceeds what any polynomial in n can express, computer scientists refer
to the algorithm as intractable. If no better algorithm can be discovered to solve the problem, computer
scientists refer to the problem as an intractable problem.
As an example of an intractable problem, consider a bioinformatics problem. The Department of Genetics at
Yale School of Medicine maintains a database of genetic information obtained from different human populations.
ALFRED (ALlele FREquency Database) is a repository of genetic data on 494 anthropologically defined
human populations, for over 1600 polymorphisms (differences in DNA sequences between individuals).
However, researchers have collected data for only about 6 percent of the possible population–polymorphism
combinations, so most of the possible entries in the database are absent.
When population geneticists seek to find the largest possible subset of populations and polymorphisms for
which complete data exist (that is, measures exist for all polymorphisms for all populations), the researchers are
confronted by a computationally intractable problem. This problem requires that every subset of the elements
in the matrix be examined, and the number of subsets is very large!
The number of subsets among n elements is 2ⁿ, since each element can either be in a particular subset or
not. For our problem, the number of elements of our set is the number of possible entries in the database. That
is, the ALFRED database presents us with 2^(494 ∗ 1600) subsets to investigate! To exhaustively test for the largest
subset with complete data, we would have to enumerate all the subsets, and test each one to see if all entries in
the subset contained measurements!
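To see what this exhaustive approach looks like in practice, here is a minimal Python sketch. The matrix fragment and the names in it (has_data, cells, and so on) are invented purely for illustration; the essential point is that the loops must, in the worst case, examine all 2ⁿ subsets.

    from itertools import combinations

    # Hypothetical toy fragment of an ALFRED-style matrix: True means a
    # measurement exists for that (population, polymorphism) entry.
    has_data = {
        ("pop_A", "poly_1"): True,
        ("pop_A", "poly_2"): False,
        ("pop_B", "poly_1"): True,
        ("pop_B", "poly_2"): True,
    }

    cells = list(has_data)   # the n elements of our set
    largest = ()

    # Exhaustive search, biggest subsets first; in the worst case every one
    # of the 2**n nonempty subsets is generated and tested.
    for size in range(len(cells), 0, -1):
        for subset in combinations(cells, size):
            if all(has_data[cell] for cell in subset):
                largest = subset
                break
        if largest:
            break

    print(len(cells), "entries ->", 2 ** len(cells), "subsets to consider")
    print("largest subset with complete data:", largest)

Even this 2 × 2 fragment produces 16 subsets; at the size of ALFRED the same loop is hopeless.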
Clearly, the order of growth of such an algorithm is 2ⁿ; Θ(2ⁿ). This is an exponential function of n, not
a polynomial, and it makes a very important difference. An exponential algorithm becomes intractable quickly.
For instance, solving the problem for a matrix of 20 entries will require about a million units of time, but solving
the problem for a matrix of 50 entries will require about a million billion units of time. If a unit of time is a millionth
of a second, the problem of size 20 will require a second to compute, but the problem of size 50 will require
more than 25 years. The ALFRED database is of size 494 ∗ 1600 = 790,400. Students hoping to graduate need
a better algorithm or a different problem!
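The arithmetic behind these estimates is easy to reproduce. The short sketch below assumes, as in the text, that testing a single subset takes one microsecond:

    UNIT = 1e-6   # one "unit of time" from the text: a millionth of a second
    YEAR = 365 * 24 * 3600

    for n in (20, 50):
        seconds = (2 ** n) * UNIT
        print(f"matrix of {n} entries: {2 ** n:,} subsets, "
              f"about {seconds:,.2f} seconds ({seconds / YEAR:,.1f} years)")

    print("ALFRED matrix size:", 494 * 1600, "possible entries")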
Another example of an intractable problem is the famous traveling salesman problem. This problem is
so famous it has its own acronym, TSP. The salesman needs to visit each of several cities, and wants to do so
without visiting any city more than once. In the interest of efficiency, the salesman wants to minimize the length
of the trip.
The salesman must visit each city, but he can visit the cities in any order. Finding the shortest route requires
computing the total distance for each permutation of the cities the salesman must visit, and selecting the shortest
one. Actually, since a route in one direction is the same distance as the reverse route, only half of the permutations
of cities need to be calculated. Since the number of permutations of n objects is equal to n-factorial (n! or n ∗
(n−1) ∗ (n−2) ... ∗ 2 ∗ 1), the number of routes to test grows as the factorial of the number of cities, divided by 2.
So the order of growth for the TSP problem is n-factorial; Θ(n!).
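A brute-force solver of this kind is as easy to write as it is expensive to run. The sketch below uses a small, made-up distance matrix for four cities; the permutation loop is where the Θ(n!) growth comes from.

    from itertools import permutations

    # Made-up symmetric distance matrix for four cities (indices 0..3).
    dist = [
        [0, 10, 15, 20],
        [10, 0, 35, 25],
        [15, 35, 0, 30],
        [20, 25, 30, 0],
    ]

    def trip_length(order):
        """Total distance of visiting the cities in the given order."""
        return sum(dist[a][b] for a, b in zip(order, order[1:]))

    # Brute force: try every permutation of the cities and keep the shortest.
    # (Each route and its reverse have the same length, so half the work
    # could be skipped, but the growth is still factorial.)
    best = min(permutations(range(len(dist))), key=trip_length)
    print("shortest route:", best, "length:", trip_length(best))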
Table 2-1  Orders of growth

Θ        Classification

k        Constant: run time is fixed, and does not depend upon n. Most instructions are executed once,
         or only a few times, regardless of the amount of information being processed.
lg n     Logarithmic: when n increases, so does run time, but much more slowly than n does. When n
         doubles, lg n increases by a constant, but does not double until n increases to n². Common in
         programs which solve large problems by transforming them into smaller problems.
n        Linear: run time varies directly with n. Typically, a small amount of processing is done on
         each element.
n lg n   When n doubles, run time slightly more than doubles. Common in programs which break
         a problem down into smaller subproblems, solve them independently, and then combine
         solutions.
n²       Quadratic: when n doubles, run time increases fourfold. Practical only for small problems;
         typically the program processes all pairs of input (e.g., in a double nested loop).
2ⁿ       Exponential: when n doubles, run time squares. This is often the result of a natural, “brute force”
         solution. Such problems are not computable in a reasonable time when the problem becomes
         at all large.
A factorial order of growth is even more extreme than an exponential order of growth. For example, there
are about 3.6 million permutations of 10 cities, but more than 2 billion billion permutations of 20. If the computer
can compute the distance for a million permutations a second, the TSP problem will take 1.8 seconds for
10 cities, but tens of thousands of years for 20 cities.
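These figures, too, can be checked with a few lines of Python, assuming as before a million route evaluations per second and counting each route and its reverse only once:

    from math import factorial

    RATE = 1_000_000          # permutations evaluated per second
    YEAR = 365 * 24 * 3600

    for cities in (10, 20):
        routes = factorial(cities) // 2   # each route and its reverse counted once
        seconds = routes / RATE
        print(f"{cities} cities: {routes:,} routes, "
              f"about {seconds:,.1f} seconds ({seconds / YEAR:,.1f} years)")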
Figure 2-2 shows the rates of growth for lg n, n, n lg n, n², 2ⁿ, and n!
Table 2-1 summarizes some different orders of growth, and the characteristics of associated algorithms.
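As a rough, text-only stand-in for the curves plotted in Figure 2-2, a short loop can tabulate each of these functions for a few sample values of n (the sample values chosen here are arbitrary):

    from math import factorial, log2

    # Tabulate the growth functions shown in Figure 2-2 for a few sample sizes.
    print(f"{'n':>4} {'lg n':>6} {'n lg n':>8} {'n²':>8} {'2ⁿ':>14} {'n!':>10}")
    for n in (2, 4, 8, 16, 32):
        print(f"{n:>4} {log2(n):>6.1f} {n * log2(n):>8.1f} "
              f"{n ** 2:>8} {2 ** n:>14,} {factorial(n):>10.2e}")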