Data integration blueprint and modeling : techniques for a scalable and sustainable architecture / Anthony David Giordano.

By: Giordano, Anthony David
Material type: Text
Publication details: Upper Saddle River, NJ, USA : IBM Press/Pearson, c2011.
Description: xxi, 384 p. : ill. ; 24 cm
ISBN:
  • 9780137084937 (hardback : alk. paper)
Holdings
Item type: Standard Loan
Current library: Thurles Library, Main Collection
Call number: 005.7322 GIO
Status: Available
Barcode: 30026000067198

Enhanced descriptions from Syndetics:

Making Data Integration Work: How to Systematically Reduce Cost, Improve Quality, and Enhance Effectiveness

Today's enterprises are investing massive resources in data integration. Many possess thousands of point-to-point data integration applications that are costly, undocumented, and difficult to maintain. Data integration now accounts for a major part of the expense and risk of typical data warehousing and business intelligence projects--and, as businesses increasingly rely on analytics, the need for a blueprint for data integration is greater than ever.

This book presents the solution: a clear, consistent approach to defining, designing, and building data integration components to reduce cost, simplify management, enhance quality, and improve effectiveness. Leading IBM data management expert Tony Giordano brings together best practices for architecture, design, and methodology, and shows how to do the disciplined work of getting data integration right.

Mr. Giordano begins with an overview of the "patterns" of data integration, showing how to build blueprints that smoothly handle both operational and analytic data integration. Next, he walks through the entire project lifecycle, explaining each phase, activity, task, and deliverable through a complete case study. Finally, he shows how to align data integration with other information management disciplines, from data governance to metadata. The book's appendices bring together key principles, detailed models, and a complete data integration glossary.

Coverage includes

  • Implementing repeatable, efficient, and well-documented processes for integrating data
  • Lowering costs and improving quality by eliminating unnecessary or duplicative data integrations
  • Managing the high levels of complexity associated with integrating business and technical data
  • Using intuitive graphical design techniques for more effective process and data integration modeling
  • Building end-to-end data integration applications that bring together many complex data sources

Includes index.

Table of contents provided by Syndetics

  • Preface (p. xix)
  • Acknowledgments (p. xxii)
  • About the Author (p. xxiii)
  • Introduction: Why Is Data Integration Important? (p. 1)
  • Part 1 Overview of Data Integration (p. 5)
  • Chapter 1 Types of Data Integration (p. 7)
  • Data Integration Architectural Patterns (p. 7)
  • Enterprise Application Integration (EAI) (p. 8)
  • Service-Oriented Architecture (SOA) (p. 9)
  • Federation (p. 12)
  • Extract, Transform, Load (ETL) (p. 14)
  • Common Data Integration Functionality (p. 15)
  • Summary (p. 16)
  • End-of-Chapter Questions (p. 16)
  • Chapter 2 An Architecture for Data Integration (p. 19)
  • What Is Reference Architecture? (p. 19)
  • Reference Architecture for Data Integration (p. 20)
  • Objectives of the Data Integration Reference Architecture (p. 21)
  • The Data Subject Area-Based Component Design Approach (p. 22)
  • A Scalable Architecture (p. 24)
  • Purposes of the Data Integration Reference Architecture (p. 26)
  • The Layers of the Data Integration Architecture (p. 26)
  • Extract/Subscribe Processes (p. 27)
  • Data Integration Guiding Principle: "Read Once, Write Many" (p. 28)
  • Data Integration Guiding Principle: "Grab Everything" (p. 28)
  • Initial Staging Landing Zone (p. 29)
  • Data Quality Processes (p. 31)
  • What Is Data Quality? (p. 31)
  • Causes of Poor Data Quality (p. 31)
  • Data Quality Check Points (p. 32)
  • Where to Perform a Data Quality Check (p. 32)
  • Clean Staging Landing Zone (p. 34)
  • Transform Processes (p. 35)
  • Conforming Transform Types (p. 35)
  • Calculations and Splits Transform Types (p. 35)
  • Processing and Enrichment Transform Types (p. 36)
  • Target Filters Transform Types (p. 38)
  • Load-Ready Publish Landing Zone (p. 39)
  • Load/Publish Processes (p. 40)
  • Physical Load Architectures (p. 41)
  • An Overall Data Architecture (p. 41)
  • Summary (p. 42)
  • End-of-Chapter Questions (p. 43)
  • Chapter 3 A Design Technique: Data Integration Modeling (p. 45)
  • The Business Case for a New Design Process (p. 45)
  • Improving the Development Process (p. 47)
  • Leveraging Process Modeling for Data Integration (p. 48)
  • Overview of Data Integration Modeling (p. 48)
  • Modeling to the Data Integration Architecture (p. 48)
  • Data Integration Models within the SDLC (p. 49)
  • Structuring Models on the Reference Architecture (p. 50)
  • Conceptual Data Integration Models (p. 51)
  • Logical Data Integration Models (p. 51)
  • High-Level Logical Data Integration Model (p. 52)
  • Logical Extraction Data Integration Models (p. 52)
  • Logical Data Quality Data Integration Models (p. 53)
  • Logical Transform Data Integration Models (p. 54)
  • Logical Load Data Integration Models (p. 55)
  • Physical Data Integration Models (p. 56)
  • Converting Logical Data Integration Models to Physical Data Integration Models (p. 56)
  • Target-Based Data Integration Design Technique Overview (p. 56)
  • Physical Source System Data Integration Models (p. 57)
  • Physical Common Component Data Integration Models (p. 58)
  • Physical Subject Area Load Data Integration Models (p. 60)
  • Logical Versus Physical Data Integration Models (p. 61)
  • Tools for Developing Data Integration Models (p. 61)
  • Industry-Based Data Integration Models (p. 63)
  • Summary (p. 64)
  • End-of-Chapter Questions (p. 65)
  • Chapter 4 Case Study: Customer Loan Data Warehouse Project (p. 67)
  • Case Study Overview (p. 67)
  • Step 1: Build a Conceptual Data Integration Model (p. 69)
  • Step 2: Build a High-Level Logical Data Integration Model (p. 70)
  • Step 3: Build the Logical Extract DI Models (p. 72)
  • Confirm the Subject Area Focus from the Data Mapping Document (p. 73)
  • Review Whether the Existing Data Integration Environment Can Fulfill the Requirements (p. 74)
  • Determine the Business Extraction Rules (p. 74)
  • Control File Check Processing (p. 74)
  • Complete the Logical Extract Data Integration Models (p. 74)
  • Final Thoughts on Designing a Logical Extract DI Model (p. 76)
  • Step 4: Define a Logical Data Quality DI Model (p. 76)
  • Design a Logical Data Quality Data Integration Model (p. 77)
  • Identify Technical and Business Data Quality Criteria (p. 77)
  • Determine Absolute and Optional Data Quality Criteria (p. 80)
  • Step 5: Define the Logical Transform DI Model (p. 81)
  • Step 6: Define the Logical Load DI Model (p. 85)
  • Step 7: Determine the Physicalization Strategy (p. 87)
  • Step 8: Convert the Logical Extract Models into Physical Source System Extract DI Models (p. 88)
  • Step 9: Refine the Logical Load Models into Physical Source System Subject Area Load DI Models (p. 90)
  • Step 10: Package the Enterprise Business Rules into Common Component Models (p. 92)
  • Step 11: Sequence the Physical DI Models (p. 94)
  • Summary (p. 95)
  • Part 2 The Data Integration Systems Development Life Cycle (p. 97)
  • Chapter 5 Data Integration Analysis (p. 99)
  • Analyzing Data Integration Requirements (p. 100)
  • Building a Conceptual Data Integration Model (p. 101)
  • Key Conceptual Data Integration Modeling Task Steps (p. 102)
  • Why Is Source System Data Discovery So Difficult? (p. 103)
  • Performing Source System Data Profiling (p. 104)
  • Overview of Data Profiling (p. 104)
  • Key Source System Data Profiling Task Steps (p. 105)
  • Reviewing/Assessing Source Data Quality (p. 109)
  • Validation Checks to Assess the Data (p. 109)
  • Key Review/Assess Source Data Quality Task Steps (p. 111)
  • Performing Source/Target Data Mappings (p. 111)
  • Overview of Data Mapping (p. 112)
  • Types of Data Mapping (p. 113)
  • Key Source/Target Data Mapping Task Steps (p. 115)
  • Summary (p. 116)
  • End-of-Chapter Questions (p. 116)
  • Chapter 6 Data Integration Analysis Case Study (p. 117)
  • Case Study Overview (p. 117)
  • Envisioned Wheeler Data Warehouse Environment (p. 118)
  • Aggregations in a Data Warehouse Environment (p. 120)
  • Data Integration Analysis Phase (p. 123)
  • Step 1: Build a Conceptual Data Integration Model (p. 123)
  • Step 2: Perform Source System Data Profiling (p. 124)
  • Step 3: Review/Assess Source Data Quality (p. 130)
  • Step 4: Perform Source/Target Data Mappings (p. 135)
  • Summary (p. 145)
  • Chapter 7 Data Integration Logical Design (p. 147)
  • Determining High-Level Data Volumetrics (p. 147)
  • Extract Sizing (p. 148)
  • Disk Space Sizing (p. 148)
  • File Size Impacts Component Design (p. 150)
  • Key Data Integration Volumetrics Task Steps (p. 150)
  • Establishing a Data Integration Architecture (p. 151)
  • Identifying Data Quality Criteria (p. 154)
  • Examples of Data Quality Criteria from a Target (p. 155)
  • Key Data Quality Criteria Identification Task Steps (p. 155)
  • Creating Logical Data Integration Models (p. 156)
  • Key Logical Data Integration Model Task Steps (p. 157)
  • Defining One-Time Data Conversion Load Logical Design (p. 163)
  • Designing a History Conversion (p. 164)
  • One-Time History Data Conversion Task Steps (p. 166)
  • Summary (p. 166)
  • End-of-Chapter Questions (p. 167)
  • Chapter 8 Data Integration Logical Design Case Study (p. 169)
  • Step 1: Determine High-Level Data Volumetrics (p. 169)
  • Step 2: Establish the Data Integration Architecture (p. 174)
  • Step 3: Identify Data Quality Criteria (p. 177)
  • Step 4: Create Logical Data Integration Models (p. 180)
  • Define the High-Level Logical Data Integration Model (p. 181)
  • Define the Logical Extraction Data Integration Model (p. 183)
  • Define the Logical Data Quality Data Integration Model (p. 187)
  • Define Logical Transform Data Integration Model (p. 190)
  • Define Logical Load Data Integration Model (p. 191)
  • Define Logical Data Mart Data Integration Model (p. 192)
  • Develop the History Conversion Design (p. 195)
  • Summary (p. 198)
  • Chapter 9 Data Integration Physical Design (p. 199)
  • Creating Component-Based Physical Designs (p. 200)
  • Reviewing the Rationale for a Component-Based Design (p. 200)
  • Modularity Design Principles (p. 200)
  • Key Component-Based Physical Designs Creation Task Steps (p. 201)
  • Preparing the DI Development Environment (p. 201)
  • Key Data Integration Development Environment Preparation Task Steps (p. 202)
  • Creating Physical Data Integration Models (p. 203)
  • Point-to-Point Application Development--The Evolution of Data Integration Development (p. 203)
  • The High-Level Logical Data Integration Model in Physical Design (p. 205)
  • Design Physical Common Components Data Integration Models (p. 206)
  • Design Physical Source System Extract Data Integration Models (p. 208)
  • Design Physical Subject Area Load Data Integration Models (p. 209)
  • Designing Parallelism into the Data Integration Models (p. 210)
  • Types of Data Integration Parallel Processing (p. 211)
  • Other Parallel Processing Design Considerations (p. 214)
  • Parallel Processing Pitfalls (p. 215)
  • Key Parallelism Design Task Steps (p. 216)
  • Designing Change Data Capture (p. 216)
  • Append Change Data Capture Design Complexities (p. 217)
  • Key Change Data Capture Design Task Steps (p. 219)
  • Finalizing the History Conversion Design (p. 220)
  • From Hypothesis to Fact (p. 220)
  • Finalize History Data Conversion Design Task Steps (p. 220)
  • Defining Data Integration Operational Requirements (p. 221)
  • Determining a Job Schedule for the Data Integration Jobs (p. 221)
  • Determining a Production Support Team (p. 222)
  • Key Data Integration Operational Requirements Task Steps (p. 224)
  • Designing Data Integration Components for SOA (p. 225)
  • Leveraging Traditional Data Integration Processes as SOA Services (p. 225)
  • Appropriate Data Integration Job Types (p. 227)
  • Key Data Integration Design for SOA Task Steps (p. 227)
  • Summary (p. 228)
  • End-of-Chapter Questions (p. 228)
  • Chapter 10 Data Integration Physical Design Case Study (p. 229)
  • Step 1: Create Physical Data Integration Models (p. 229)
  • Instantiating the Logical Data Integration Models into a Data Integration Package (p. 229)
  • Step 2: Find Opportunities to Tune through Parallel Processing (p. 237)
  • Step 3: Complete Wheeler History Conversion Design (p. 238)
  • Step 4: Define Data Integration Operational Requirements (p. 239)
  • Developing a Job Schedule for Wheeler (p. 240)
  • The Wheeler Monthly Job Schedule (p. 240)
  • The Wheeler Monthly Job Flow (p. 240)
  • Process Step 1: Preparation for the EDW Load Processing (p. 241)
  • Process Step 2: Source System to Subject Area File Processing (p. 242)
  • Process Step 3: Subject Area Files to EDW Load Processing (p. 245)
  • Process Step 4: EDW-to-Product Line Profitability Data Mart Load Processing (p. 248)
  • Production Support Staffing (p. 248)
  • Summary (p. 249)
  • Chapter 11 Data Integration Development Cycle (p. 251)
  • Performing General Data Integration Development Activities (p. 253)
  • Data Integration Development Standards (p. 253)
  • Error-Handling Requirements (p. 255)
  • Naming Standards (p. 255)
  • Key General Development Task Steps (p. 256)
  • Prototyping a Set of Data Integration Functionality (p. 257)
  • The Rationale for Prototyping (p. 257)
  • Benefits of Prototyping (p. 257)
  • Prototyping Example (p. 258)
  • Key Data Integration Prototyping Task Steps (p. 261)
  • Completing/Extending Data Integration Job Code (p. 262)
  • Complete/Extend Common Component Data Integration Jobs (p. 263)
  • Complete/Extend the Source System Extract Data Integration Jobs (p. 264)
  • Complete/Extend the Subject Area Load Data Integration Jobs (p. 265)
  • Performing Data Integration Testing (p. 266)
  • Data Warehousing Testing Overview (p. 267)
  • Types of Data Warehousing Testing (p. 268)
  • Perform Data Warehouse Unit Testing (p. 269)
  • Perform Data Warehouse Integration Testing (p. 272)
  • Perform Data Warehouse System and Performance Testing (p. 273)
  • Perform Data Warehouse User Acceptance Testing (p. 274)
  • The Role of Configuration Management in Data Integration (p. 275)
  • What Is Configuration Management? (p. 276)
  • Data Integration Version Control (p. 277)
  • Data Integration Software Promotion Life Cycle (p. 277)
  • Summary (p. 277)
  • End-of-Chapter Questions (p. 278)
  • Chapter 12 Data Integration Development Cycle Case Study (p. 279)
  • Step 1: Prototype the Common Customer Key (p. 279)
  • Step 2: Develop User Test Cases (p. 283)
  • Domestic OM Source System Extract Job Unit Test Case (p. 284)
  • Summary (p. 287)
  • Part 3 Data Integration with Other Information Management Disciplines (p. 289)
  • Chapter 13 Data Integration and Data Governance (p. 291)
  • What Is Data Governance? (p. 292)
  • Why Is Data Governance Important? (p. 294)
  • Components of Data Governance (p. 295)
  • Foundational Data Governance Processes (p. 295)
  • Data Governance Organizational Structure (p. 298)
  • Data Stewardship Processes (p. 304)
  • Data Governance Functions in Data Warehousing (p. 305)
  • Compliance in Data Governance (p. 309)
  • Data Governance Change Management (p. 310)
  • Summary (p. 311)
  • End-of-Chapter Questions (p. 311)
  • Chapter 14 Metadata (p. 313)
  • What Is Metadata? (p. 313)
  • The Role of Metadata in Data Integration (p. 314)
  • Categories of Metadata (p. 314)
  • Business Metadata (p. 315)
  • Structural Metadata (p. 315)
  • Navigational Metadata (p. 317)
  • Analytic Metadata (p. 318)
  • Operational Metadata (p. 319)
  • Metadata as Part of a Reference Architecture (p. 319)
  • Metadata Users (p. 320)
  • Managing Metadata (p. 321)
  • The Importance of Metadata Management in Data Governance (p. 321)
  • Metadata Environment Current State (p. 322)
  • Metadata Management Plan (p. 322)
  • Metadata Management Life Cycle (p. 324)
  • Summary (p. 327)
  • End-of-Chapter Questions (p. 327)
  • Chapter 15 Data Quality (p. 329)
  • The Data Quality Framework (p. 330)
  • Key Data Quality Elements (p. 331)
  • The Technical Data Quality Dimension (p. 332)
  • The Business-Process Data Quality Dimension (p. 333)
  • Types of Data Quality Processes (p. 334)
  • The Data Quality Life Cycle (p. 334)
  • The Define Phase (p. 336)
  • Defining the Data Quality Scope (p. 336)
  • Identifying/Defining the Data Quality Elements (p. 336)
  • Developing Preventive Data Quality Processes (p. 337)
  • The Audit Phase (p. 345)
  • Developing a Data Quality Measurement Process (p. 346)
  • Developing Data Quality Reports (p. 348)
  • Auditing Data Quality by LOB or Subject Area (p. 350)
  • The Renovate Phase (p. 351)
  • Data Quality Assessment and Remediation Projects (p. 352)
  • Data Quality SWAT Renovation Projects (p. 352)
  • Data Quality Programs (p. 353)
  • Final Thoughts on Data Quality (p. 353)
  • Summary (p. 353)
  • End-of-Chapter Questions (p. 354)
  • Appendix A Exercise Answers (p. 355)
  • Appendix B Data Integration Guiding Principles (p. 369)
  • Write Once, Read Many (p. 369)
  • Grab Everything (p. 369)
  • Data Quality before Transforms (p. 369)
  • Transformation Componentization (p. 370)
  • Where to Perform Aggregations and Calculations (p. 370)
  • Data Integration Environment Volumetric Sizing (p. 370)
  • Subject Area Volumetric Sizing (p. 370)
  • Appendix C Glossary (p. 371)
  • Appendix D Case Study Models
  • Appendix D is an online-only appendix. Print-book readers can download it at www.ibmpressbooks.com/title/9780137084937; for eBook editions, the appendix is included in the book.
  • Index (p. 375)

Author notes provided by Syndetics

Anthony Giordano is a partner in IBM's Business Analytics and Optimization Consulting Practice and currently leads the Enterprise Information Management Service Line, which focuses on data modeling, data integration, master data management, and data governance. He has more than 20 years of experience in the information technology field, focusing on business intelligence, data warehousing, and information management. In his spare time, he has taught undergraduate and graduate classes in data warehousing and project management at several local colleges and universities.
