Attack of the Rampaging Data
By Bill Roberts
Advanced Data Access Architecture
Hard Facts About Data Growth
Enterprise MIS must handle ever-increasing amounts of data. Here are
some strategies and case studies for dealing with this issue.
To remain competitive, any business must understand customers. Once
you know the detailed buying patterns and demographics of customers, you
can launch sales and marketing efforts into narrowly defined markets. Today
those markets may be so narrow that one large bank refers to them as market
segments of one.
To segment customers into slices that small, companies need data--ever larger
amounts of data. In fact, corporations are acquiring data so fast that one
expert estimates that mission-critical data has doubled every year for the
past five years. The desires to know customers intimately and to target
markets accurately are good reasons why enterprise data is overrunning the
corporate landscape. Additional causes may be technological, such as distributed
systems and bigger applications, or social, such as industry deregulation
or litigation. Whatever the reason, it seems that executives and managers
can't get enough data and tools to analyze it.
There's good news, though. Storage has never been cheaper, strategies for
managing data are reasonably well tested and the tools for storing, maintaining
and delivering data to end users in useful ways are proliferating. Most
encouraging are reports from the front. As high as some of the hurdles are,
IS professionals who accept the facts of life are able to provide business
colleagues with information that contributes to revenue-generating efforts.
How fast is enterprise data growing? Hard figures aren't available, but
anecdotal evidence is compelling. A manager of decision support systems
(DSS) at a $1 billion laboratory instruments maker says an executive information
system (EIS) that he built a mere four years ago at 10MB has grown to more
than 25GB, and it's just one of four EISs at the company.
Richard Finkelstein, president of Performance Computing, a client/server
and database consulting firm in Chicago, has been building databases for
20 years. "My last big system on a mainframe 10 years ago was one gigabyte
and considered huge," he says. "Now you'd say a terabyte is large.
That's a thousand-fold increase."
George Ferguson, a product manager at Hewlett-Packard in Palo Alto, CA,
began working in HP's data warehousing business four years ago, when most
customers were asking for 10GB systems. "Today that's 100 to 150GB
at least, with about 25 percent of new data warehouse business requesting
more than 500GB," he says.
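The anecdotes above can be put on a common footing by asking what constant yearly growth factor each one implies. The following is only a rough sketch: it assumes decimal units (1GB = 1,000MB), treats "above 25GB" as exactly 25GB, uses the low end of Ferguson's 100-to-150GB range, and the function names are illustrative.

```python
import math

MB, GB, TB = 10**6, 10**9, 10**12  # decimal units, an assumption


def annual_growth_factor(start_bytes, end_bytes, years):
    """Constant yearly multiplier that takes start_bytes to end_bytes in `years`."""
    return (end_bytes / start_bytes) ** (1.0 / years)


def doubling_time_years(factor_per_year):
    """Years needed to double at a constant yearly growth factor."""
    return math.log(2) / math.log(factor_per_year)


# Doubling every year for five years compounds to a 32-fold increase.
five_year_multiple = 2 ** 5

# Figures quoted in the article.
eis_factor = annual_growth_factor(10 * MB, 25 * GB, 4)        # EIS, four years
mainframe_factor = annual_growth_factor(1 * GB, 1 * TB, 10)   # Finkelstein, ten years
hp_factor = annual_growth_factor(10 * GB, 100 * GB, 4)        # HP requests, four years

for label, factor in [
    ("EIS at instruments maker", eis_factor),
    ("Large database, 10 years", mainframe_factor),
    ("HP warehouse requests", hp_factor),
]:
    print(f"{label}: x{factor:.2f} per year, "
          f"doubling every {doubling_time_years(factor):.2f} years")
```

Under these assumptions, Finkelstein's thousand-fold increase over 10 years works out to almost exactly a doubling every year, in line with the expert estimate cited earlier, while the EIS grew roughly sevenfold per year.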
Not everyone requires an industrial-strength database of those dimensions.
"You'll hear people talking about terabytes," says Randy Betancourt,
program manager for data warehousing at SAS Institute in Cary, NC, "but,
with the exception of the Fortune 1000, the typical size of a data
warehouse is about 50GB."
Even the Fortune 1000 includes modest efforts. Bernard Seban, database
manager for computer systems vendor Data General Corp. in Westboro, MA,
has only 4GB in a data warehouse dedicated to online analytical processing
(OLAP). However, this is not a comprehensive repository. "We have only
four years of history in there," he says.
Getting to Know You
As noted, the desire to know customers is what's driving data growth. Today,
point-of-sale retail systems collect every piece of accessible data about
a customer. Computers store it. Marketing and sales staffs demand access.
The goal is micro-marketing: getting the right promotions to the
right people at the right time. "If you're spending $50 million on
a direct-mail campaign and you can identify the 10 regions that will give
you the biggest payback, you can target it that way," says Michael
Guidry, director of corporate applications development for Management Science
Associates, a market research and applied technology firm in Pittsburgh
that works with various vertical markets, including consumer packaged goods
companies. In that niche, Guidry says, data is flowing into the enterprise
from points of sale; store-level marketing studies that rival the U.S. Census
in detail and breadth; syndicated information; and other sources.
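Guidry's direct-mail example amounts to ranking regions by expected payback and keeping the top few. A minimal sketch follows; the region names, costs and revenue forecasts are invented for illustration, and "payback" is assumed here to mean expected revenue per dollar spent.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    mailing_cost: float      # hypothetical cost of mailing this region
    expected_revenue: float  # hypothetical revenue forecast for this region


def top_regions_by_payback(regions, n):
    """Return the n regions with the highest expected revenue per dollar spent."""
    ranked = sorted(
        regions,
        key=lambda r: r.expected_revenue / r.mailing_cost,
        reverse=True,
    )
    return ranked[:n]


# Invented sample data; a real campaign would draw these from the warehouse.
REGIONS = [
    Region("Region A", 100.0, 150.0),
    Region("Region B", 200.0, 500.0),
    Region("Region C", 50.0, 60.0),
    Region("Region D", 80.0, 240.0),
]

best = top_regions_by_payback(REGIONS, 2)
for region in best:
    print(region.name, region.expected_revenue / region.mailing_cost)
```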
It has gotten to the point that, in retailing, the sale itself may no longer
be the most important thing. "Sometimes the most valuable commodity
that comes out of the exchange is the demographics you capture from it,"
says Betancourt of SAS. "There's a keen desire to capture as much information
as possible about that transaction so you can go back and sell to [that
The demand for data seems endless. "You no longer just need to know
your customers, but the marketplace and the competitors. And the competitors
come from different places," says Terril Retter, a senior analyst at
consultant Price Waterhouse in Menlo Park, CA. "The kinds of questions
CEOs ask are expanding dramatically." This demand has trickled down,
too. Vice presidents, department managers and many others want access to
the corporate data stores.
Despite all this activity, even more data is out there. Jim Johnson, chairman
of the Standish Group International, a research firm in Dennis, MA, believes
business has barely made a dent in collecting data about the customer. He
recalls a client, a Las Vegas casino, which knew practically everything
about its "high rollers" but almost nothing about the next tier
of customers. Businesses can no longer afford this sort of ignorance.
The Big Picture
A host of other factors also contribute to data growth. There's the technology
itself. Relational databases have made information more useful; graphical
user interfaces have made it easier to use; and today's end-user tools make
analysis efficient and thorough. As the technology gets cheaper and easier
to use, even more data is collected and analyzed. Existing applications
in some cases are doubling in size, and new apps are needed quickly.
Consider the Federal Express customer service application. It used to be
you called up and got one of 50 customer service reps who would report the
progress of your package. Then the company extended the application to thousands
of drivers with hand-held devices. Next it went to top customers, and there
were 50,000 users. Now it's on the Internet, accessible by millions of customers.
Each iteration requires more data to make the application work.
What we might call meta-factors also cause data to proliferate. David Flaxman,
a partner in the Radnor, PA, office of consultant KPMG Peat Marwick, believes
that most industries run in five-year cycles. When the industry moves into
a new cycle, it ushers in a whole new set of data requirements. He points
to watershed changes in financial services (Wall Street in the 1980s), retailing
(thanks to Wal-Mart), telecommunications (the breakup of the Bell system)
and commercial banking (in the present day) as examples that demanded new
and better data throughout the industry.
Johnson of the Standish Group believes deregulation in certain industries
is a key factor. "The more things get deregulated, the more information
government requires about the entity. As soon as the airline industry was
deregulated, the government wanted more information on things like maintenance
and on-time records." A twin cause is litigation, he notes; more information
is being archived for much longer periods on a CYA basis.
Not least of all, corporate reengineering or downsizing, and the distributed
systems they call into play, are culprits. A downsized company needs to
empower the few to do the work once handled by many. IT and better and more
data are necessary to do that. Data General's Seban says a driving force
in his company's data warehouse strategy was the downsizing of the workforce
from more than 18,000 five years ago to 5,000 today. And distributed systems
themselves result in small individual or workgroup databases that eventually
must be replicated to a wider audience, producing duplicate databases in
various locations on the corporate network.
Controlling the Uncontrollable
IS professionals must bear the brunt of managing this data rampage: storing
data, moving it into relational databases or warehouses where it can be
accessed by end users, cleaning and maintaining data, acquiring or building
applications for the end users, and training support staff and end users.
"How do you control something that is out of control?" asks Katherine
Hammer, president and CEO of Evolutionary Technologies in Austin, TX, which
develops software tools for data migration and maintenance. "The IS
pro needs to accept that he's running a distributed, heterogeneous space
and that he's responsible for it." For example, IS has to determine
what is the database of record for each different application and control
what information goes where and to whom.
This, of course, is easier said than done. The headaches are enormous. Sterling
Makishima, HP's data warehouse manager for global customer support, says
his biggest problems are maintaining data integrity, assuring the timeliness
of the data being demanded by the end user and supporting a worldwide data
IS people are increasingly turning to some version of the data warehouse
to manage this flow. "To improve the look at traditional data, you
put it in a data warehouse," says Price Waterhouse's Retter. "You
pull it from OLTP, rationalize it and put it on an OLAP for EIS or DSS."
This also is not simple to do. "Most IT pros are struggling to integrate
data across stovepipe legacy apps," he says.
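The flow Retter describes (pull from OLTP, rationalize, load for analysis) can be sketched in a few lines with Python's built-in sqlite3 module. The table names, columns and sample rows are hypothetical, and "rationalize" is reduced here to tidying country codes and summarizing by month; a production warehouse load involves far more.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oltp_orders (       -- hypothetical transactional table
        order_id   INTEGER PRIMARY KEY,
        order_date TEXT,             -- ISO date such as '1996-03-14'
        country    TEXT,
        product    TEXT,
        amount     REAL
    );
    CREATE TABLE dw_monthly_sales (  -- hypothetical summary table for analysis
        month        TEXT,
        country      TEXT,
        product      TEXT,
        order_count  INTEGER,
        total_amount REAL
    );
""")

# Invented rows with the kind of inconsistent coding legacy systems produce.
conn.executemany(
    "INSERT INTO oltp_orders (order_date, country, product, amount) "
    "VALUES (?, ?, ?, ?)",
    [
        ("1996-03-02", "us", "Spectrometer", 1200.0),
        ("1996-03-15", "US ", "Spectrometer", 800.0),
        ("1996-03-20", "DE", "Chromatograph", 500.0),
        ("1996-04-01", "de", "Chromatograph", 700.0),
    ],
)

# Rationalize (trim and upper-case country codes), then summarize by month.
conn.execute("""
    INSERT INTO dw_monthly_sales
    SELECT substr(order_date, 1, 7),
           upper(trim(country)),
           product,
           COUNT(*),
           SUM(amount)
    FROM oltp_orders
    GROUP BY substr(order_date, 1, 7), upper(trim(country)), product
""")

summary = conn.execute(
    "SELECT month, country, product, order_count, total_amount "
    "FROM dw_monthly_sales ORDER BY month, country, product"
).fetchall()
for row in summary:
    print(row)
```

Keeping the summary in its own table means analysts query the cleaned monthly figures without touching the transactional system.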
Whatever the problems for IS, management looks for the advantages. "Data
warehouses have a good payoff based on productivity," says Paul Cubbage,
a senior analyst at Dataquest, a research firm in San Jose, CA. "You
can get metrics that show how much money you're going to save by going with
the data warehouse concept."
In a growing number of cases, the business unit--not IS--is proposing and
paying for data solutions such as warehouses. In Retter's view, reengineering
changed the perspective from using technology for business functions (vertical)
to using it across business processes (horizontal). But vertical data structures
don't match horizontal data needs. "For 30 years the heads of data
processing departments made technology sound mystical," he says. "Now
that technology is more important to the health of the enterprise, business
has to pay attention."
This restructuring makes it imperative that IS pros and business users talk
to each other. "Business units are driving the data warehouse phenomenon,"
says Betancourt of SAS. "IT has to spend a lot of energy transferring
the data logic. To do that properly, they have to go out to the business
users and ask, 'How do you want it transformed? What are the business rules?'"
To show how leaders in technology implementation are solving their data
growth and access problems, we present the following three case studies,
which are related yet distinct.
HP Worldwide Customer Service Organization
Asking the Unaskable
Ferguson, the product manager at HP, points out that decision support systems
have been around for years, but answering previously unasked questions about
customers was a painful process. Because it would take IS two to three months
to do a new report, "very few questions were asked," he says.
Today, thanks to several data warehouses and tools designed for end users,
Ferguson and others can get detailed answers quickly. "We can look
at our customers more broadly and see the business we're doing with them,"
he says. "We ask a lot more questions."
That's what Makishima, the data warehouse manager, intended when his team
began building the Worldwide Customer Service Organization warehouse two
years ago. Customer support was once merely reactive, responding when the
customer called. "Now when the customer's contract expires, we can
go out and inform them of new programs and see if they are interested,"
Makishima says. "It gives us a competitive edge."
Customer service used to be handled by a mainframe at corporate headquarters.
With users demanding more detailed information, the system took too long
to load data and too long to respond to queries, and it cost too much to
support. For those reasons and obvious technological preferences, HP decided
to migrate off mainframes.
After investigating several products, Makishima's team chose software from
Red Brick Systems of Los Gatos, CA, to run on HP 9000 servers. Makishima
says IS wanted to get out of the ad hoc reporting role and provide data
so users could select what they wanted. Red Brick's software loads millions
of rows of data speedily and supports new data access tools.
There was strong management sponsorship. Marketing and finance lent people
full-time to the project. "We had a good idea how users would use it,
but when we modeled the Red Brick system we were able to check with them,"
Makishima explains. When the team began to build the warehouse, the design
called for 20GB. It's now at 100GB with room to grow to 200GB. "Our
users have realized the importance of historical data," he says.
The biggest challenges appear to be assuring the integrity and timeliness
of data, and providing support to users worldwide. "At first we had
a handful of users. In a little over a year we're at more than 400 and growing
at 18 end users per month," Makishima says. "Originally the end
users were just in customer support. When the word got out we had all this
consolidated information, other people started using it." The demand
is more than Makishima's staff of six can support. HP is adding individual
business information managers around the globe, under the budgets of the
workgroups they support.
Another problem was that management wanted the job done in three months.
"We got it up in about four or five. That was pretty good, because
warehouses were taking more than a year," Makishima recalls. "That's
one reason we don't always use HP products. We don't have time to reinvent
the wheel, so we leverage partnerships with other companies."
Perkin-Elmer Corp.
Out with the Greenbars
In the beginning there were greenbars: those mainframe-generated reports
printed on reams of green-and-white-lined paper. The greenbars begat a DOS-based
EIS. EIS begat a Unix-based relational database, which begat four regional
data warehouses. Now Perkin-Elmer Corp. is looking at a World Wide Web solution
to give one-fourth of its 5,500 employees access to rich data stores.
In 1990, an executive with technical savvy was put in charge of the product
group that accounted for half of the $1 billion in revenue at the Wilton,
CT-based laboratory instruments manufacturer. He wanted to know what products
were ordered where. But all data resided on a mainframe; greenbars came
out monthly. Daily greenbars could be cranked out only painfully.
John Stoveken, a consultant, led a team that took a couple of years' worth
of data off the mainframe and built a DOS-based EIS. "There were probably
10,000 rows of information," recalls Stoveken, who today is the company's
DSS manager. "It was world order information, by country and by product.
It started off under 10MB. It was accessed by a small group and grew to
The users demanded more. The system grew to 800 product groups. Next, users
asked for information on 15,000 to 20,000 active parts. When they asked
for the entire parts list of 130,000, IS saw that it had reached the limits
of DOS. About three years ago, a decision was made to move to a data warehouse.
A first step was selecting the Oracle RDBMS on a Sun Sparc 1000. "We
started to grow the data systems and to look for tools to manage the data
we were bringing in," Stoveken says.
During the pilot test, Stoveken discovered CrossTarget, a client/server
multidimensional analysis and reporting system from Dimensional Insight
of Burlington, MA. The CrossTarget engine uses array indexing to transform
data from spreadsheets, SQL databases and legacy systems into multidimensional
models. The client, called Diver, lets users view data as they wish, perform
analysis and "drill down" into subsets. The models, of 250MB or
less, are built from data stored on Oracle and go out over the LAN to be
used by anyone who has the client software. This way, Stoveken's 10-person
staff keeps control of the data in the warehouses.
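To make the idea of "drilling down" concrete, here is a small sketch in plain Python. It is not CrossTarget's array-indexing engine or the Diver client; it only illustrates summarizing order data along one dimension and then narrowing to a subset. The rows, dimension names and the `summarize` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical order rows: (region, country, product_group, amount)
ORDERS = [
    ("Europe", "Germany", "Analyzers", 300.0),
    ("Europe", "Germany", "Detectors", 200.0),
    ("Europe", "UK", "Analyzers", 150.0),
    ("Americas", "US", "Analyzers", 400.0),
    ("Americas", "US", "Detectors", 250.0),
]
DIMENSIONS = ("region", "country", "product_group")
AMOUNT_INDEX = 3


def summarize(rows, by, where=None):
    """Total the amount grouped by dimension `by`, for rows matching `where`."""
    where = where or {}
    group_index = DIMENSIONS.index(by)
    totals = defaultdict(float)
    for row in rows:
        matches = all(
            row[DIMENSIONS.index(dim)] == value for dim, value in where.items()
        )
        if matches:
            totals[row[group_index]] += row[AMOUNT_INDEX]
    return dict(totals)


# Top-level view, then two successive drill-downs into subsets.
by_region = summarize(ORDERS, "region")
europe_by_country = summarize(ORDERS, "country", where={"region": "Europe"})
germany_by_product = summarize(
    ORDERS, "product_group", where={"region": "Europe", "country": "Germany"}
)

print(by_region)
print(europe_by_country)
print(germany_by_product)
```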
U.S. users were happy. "Then people around the world started asking
for more detail," Stoveken says. "We said it's time to do data
warehousing on a global basis." They chose four HP 9000s running Oracle
at manufacturing sites in California, Connecticut, Germany and the United
Kingdom. "When you have 40 different order-entry systems around the
world, there are coding inconsistencies," Stoveken explains. "So
we use the data warehouse as a central point where we can clean data, take
out the coding problems and make sure it's a well-defined set."
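One common way to take out coding problems of the kind Stoveken describes is a translation table that maps each source system's local codes onto a single standard set, with unrecognized codes set aside for review. The sketch below assumes that approach; the system names, product codes and `clean_orders` function are invented and do not describe Perkin-Elmer's actual process.

```python
# Hypothetical translation table: (source system, local code) -> standard code
CODE_MAP = {
    ("us_orders", "SPEC-01"): "SPECTROMETER",
    ("de_orders", "spek1"): "SPECTROMETER",
    ("uk_orders", "Spectro/1"): "SPECTROMETER",
    ("us_orders", "CHRO-07"): "CHROMATOGRAPH",
}


def clean_orders(raw_orders):
    """Split records into cleaned rows and rejects whose codes are unknown."""
    cleaned, rejected = [], []
    for record in raw_orders:
        key = (record["system"], record["product_code"].strip())
        standard_code = CODE_MAP.get(key)
        if standard_code is None:
            rejected.append(record)  # hold for manual review, do not guess
        else:
            cleaned.append({**record, "product_code": standard_code})
    return cleaned, rejected


# Invented sample records from three order-entry systems.
RAW_ORDERS = [
    {"system": "us_orders", "product_code": "SPEC-01 ", "qty": 2},
    {"system": "de_orders", "product_code": "spek1", "qty": 1},
    {"system": "uk_orders", "product_code": "???", "qty": 5},
]

cleaned, rejected = clean_orders(RAW_ORDERS)
print(cleaned)
print(rejected)
```

Rejecting unknown codes rather than guessing keeps the warehouse a well-defined set, at the cost of someone having to maintain the table.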
What's next? With a goal of giving as many as 1,500 employees access, Stoveken
is looking at CrossTarget's new Hypertext Markup Language (HTML) version
and considering using it over the company Web. "The company runs a
WAN that stretches across the U.S. to Germany to Japan. Through our intranet,
we can open up some of these larger models to the global users."
CoreStates Financial Corp.
Getting Intimate with Data
If a customer doesn't answer the letter that offered a special interest
rate on a new credit card, CoreStates Financial Corp. wants to know why.
That level of data analysis requires a staff that is more intimate with
the data than are typical IS staffers.
At CoreStates, a Philadelphia-based bank holding company with $30 billion
in assets, the retail credit risk technology unit not only analyzes the
data, it manages it, develops its own applications and is building a data
warehouse. "Our group is comprised of analytical people, used to dealing
with large quantities of data and turning it into competitive advantage,"
says Jeffrey Oulton, vice president of retail credit risk technology. "They're
not reading batch runs and turning them into spreadsheets. They're merging
and analyzing data for decision-makers."
Within the 20-person team are people dedicated to building a data warehouse.
"They like data and like to code, but they understand business,"
says Oulton. "They're our liaison to the technology side of the house;
their mission is data acquisition and making it available to others."
At the heart of credit risk analysis is scoring. Within the group, one team
works on the front end of the business--determining whether to extend credit.
Another concentrates on the back end, traditionally known as collections
but increasingly a source of new revenue, including offering bigger lines
of credit. These two groups were separate and worked off different sets
of data. As a result of a reengineering effort last year, they combined.
"The piece that is completely new is the data warehousing," says Oulton.
They're migrating data off a mainframe with the DB2 RDBMS to an IBM RS/6000
server running Oracle. The LAN is Novell NetWare. "The entire corporation
is also building its own warehouse. There will be links between them,"
says Oulton. "We're ahead of the rest of the corporation. The corporate
IT guys are not as comfortable with the Unix environment."
Oulton praises the IS side of the house. "They understand that some
end users understand the data better than they do. IT is very supportive
of the project. We play on their expertise. Data modeling, loading in the
DB2 environment, the decision whether to move to Oracle and how--that's
Using tools and consultants from SAS Institute, Oulton's team has built
application suites for three types of users. The EIS is a highly summarized
GUI with some drill-down capabilities for about 20 executives. The power-user
version satisfies the people in his unit. And a middle-of-the-road suite
serves about 200 others who need access to data. "We've been able with
one tool to address three levels of user," he says. "That's important
from cost justification and support standpoints."
Bill Roberts is a free-lance writer who covers business,
technology and management issues. He can be reached at firstname.lastname@example.org.