Usage-Driven Database Design - From Logical Data Modeling through Physical Schema Definition
von: George Tillmann
Apress, 2017
ISBN: 9781484227220
Sprache: Englisch
379 Seiten, Download: 7945 KB
Format: PDF, auch als Online-Lesen
Mehr zum Inhalt
Usage-Driven Database Design - From Logical Data Modeling through Physical Schema Definition
Contents at a Glance | 5 | ||
Contents | 7 | ||
About the Author | 18 | ||
Preface | 19 | ||
Part I: Introduction | 23 | ||
Chapter 1: Introduction to Usage-Driven Database Design | 24 | ||
Database Design Principle 1: Separation Principle | 26 | ||
Database Design Principle 2: Distinction Principle | 27 | ||
The Difference Between Separation and Distinction | 29 | ||
Database Design Principle 3: Convergence Principle | 29 | ||
The Separation, Distinction, and Convergence Principles | 30 | ||
Database Design Principle 4: Minimal Regression Principle | 30 | ||
Usage-Driven Database Design | 30 | ||
Logical Data Modeling | 31 | ||
Physical Schema Definition | 32 | ||
The Terminology Trap | 32 | ||
Notes | 33 | ||
Part II: Logical Data Modeling | 34 | ||
Chapter 2: The E-R Approach | 35 | ||
A Little Data Modeling History | 37 | ||
Some Important Definitions | 38 | ||
Logical Data Modeling Objects | 39 | ||
Entities | 39 | ||
Type-Instance Distinction | 40 | ||
Relationships | 40 | ||
Attributes | 41 | ||
Notes | 42 | ||
Chapter 3: More About the E-R Approach | 43 | ||
More About Relationships | 43 | ||
Membership Class | 43 | ||
Cardinality | 44 | ||
Modality | 45 | ||
Degree | 47 | ||
Binary Relationship | 47 | ||
N-ary Relationships | 48 | ||
Unary or Recursive Relationships | 48 | ||
Relationship Constraints | 49 | ||
Inclusion | 49 | ||
Exclusion | 50 | ||
Conjunction | 50 | ||
Simple Conjunction | 50 | ||
Conditional Conjunction | 51 | ||
Recursive Modality Constraints | 52 | ||
More About Entities | 55 | ||
Attributive Entity | 55 | ||
Associative Entities | 56 | ||
Supertype and Subtype Entities (Generalization and Specialization) | 56 | ||
More About Attributes | 58 | ||
Attribute Domain | 58 | ||
Attribute Source: Primitive and Derived | 59 | ||
Attribute Descriptor and Unique Identifier | 59 | ||
Compound or Concatenated Unique Identifiers | 60 | ||
Attribute Complexity: Simple and Group | 60 | ||
Attribute Valuation: Single Value and Multivalue | 61 | ||
Attribute Complexity and Valuation | 61 | ||
Chapter 4: Building the Logical Data Model | 64 | ||
The Interview Process | 65 | ||
Gather Information and Review | 66 | ||
1. Identify the Users Who Are Authorities or Experts on the Subject | 66 | ||
2. Meet and Interview the Experts and Identify the Subject (Application) Entities | 66 | ||
Preparation | 66 | ||
The First Interview | 67 | ||
3. Identify Relationships Between the Entities | 67 | ||
4. Identify the Properties or Attributes of the Entities and Relationships | 67 | ||
Analyze Information | 67 | ||
Construct Model | 68 | ||
Repeat as Necessary | 68 | ||
Making Sense of the Interview | 68 | ||
Modeling Rules | 70 | ||
Verifying What You Have Heard | 72 | ||
Immediate Interview Feedback | 72 | ||
Formal Walk-Throughs | 72 | ||
Increasing E-R Diagram Comprehension | 74 | ||
Subject Areas | 74 | ||
Entity Fragments | 75 | ||
Neighborhood Diagrams | 76 | ||
Relationship Bridges and Stubs | 77 | ||
Some Model Building Best Practices | 78 | ||
Getting Started | 78 | ||
Don’t Lose Control of the Project to Users | 79 | ||
Don’t Lose Control of the Project to Technical Staff | 79 | ||
Don’t Become Dependent on Tools or Techniques | 80 | ||
Don’t Get Bogged Down in Endless Analysis | 80 | ||
The Players…and the Rules of Engagement | 81 | ||
Deliverables | 82 | ||
Examples of Deliverables | 83 | ||
Sample Data Dictionary, Data Object Definitions | 84 | ||
Notes | 86 | ||
Chapter 5: LDM Best Practices | 87 | ||
Abbreviations | 88 | ||
Almost Unique Identifiers | 89 | ||
Clarity | 90 | ||
Compound Unique Identifiers | 90 | ||
Conceptual Integrity | 91 | ||
Conjunctive Relationships | 93 | ||
Duplicate Super-Subtypes “Type” Data | 93 | ||
Exclusive and Nonexclusive Generalization | 94 | ||
Required and Nonrequired Participation | 96 | ||
Exclusive Relationships | 96 | ||
Group Attributes | 97 | ||
Level of Abstraction | 97 | ||
Many-to-Many Relationships | 98 | ||
N-ary Relationships | 100 | ||
N-ary Relationships and Membership Class | 101 | ||
Naming Objects | 104 | ||
Multiple Names | 104 | ||
Naming Conventions | 105 | ||
Name Uniqueness | 105 | ||
Naming Convention Goals | 106 | ||
Null Attributes | 109 | ||
There Be Blanks in Them Thar Nulls | 109 | ||
Optional Relationships (Optional-Optional Relationships) | 110 | ||
Subject Areas | 110 | ||
Supertypes and Subtypes | 111 | ||
Unique Identifiers | 113 | ||
Note | 115 | ||
Chapter 6: LDM Pitfalls | 116 | ||
Circular Relationships | 116 | ||
Data Values | 117 | ||
Data Value–Differentiated Entities and Attributes | 118 | ||
Derived Data | 119 | ||
Three Poor Arguments Against Modeling Derived Data | 119 | ||
Derived Data as Process | 120 | ||
Derived Data and Physical Database Design | 121 | ||
Discrete Attributes | 122 | ||
Embedded Attributes | 123 | ||
Uniqueness | 123 | ||
Group Attributes | 123 | ||
The Problem with Embedded Attributes | 123 | ||
The Solution | 124 | ||
The Moral of the Story | 124 | ||
Entity Fragmentation | 124 | ||
Foreign Keys | 125 | ||
Junction Entities | 126 | ||
Normalization | 126 | ||
Presentation Data | 127 | ||
Primary Keys | 127 | ||
Process Data | 128 | ||
Repeating Groups | 129 | ||
Multivalue Attribute | 129 | ||
Group Attribute | 130 | ||
Single-Attribute Entities | 130 | ||
Code | 131 | ||
Multivalue Attribute (Repeating Group) | 131 | ||
Associative Entities | 131 | ||
Substitution Data | 132 | ||
Substitution Tables | 132 | ||
Transient Data | 132 | ||
Location-Dependent Data | 133 | ||
Chapter 7: LDM Perils to Watch For | 135 | ||
Associatives Related to Other Associatives | 135 | ||
Diagrammable Objects | 136 | ||
Disassociated Entity Clusters (“Islands”) | 137 | ||
Duplicate Unique Identifiers | 138 | ||
One Entity, Two or More Identifiers | 138 | ||
One Identifier, Two or More Entities | 138 | ||
Multiple Relationships | 139 | ||
One Relationship, Multiple Views | 139 | ||
Multiple Different but Similar Relationships | 139 | ||
One-of-a-Kind (OOAK) Entities | 140 | ||
One-to-One Relationships | 140 | ||
Rare Entity Relationships | 141 | ||
Recursive Modality Constraints | 142 | ||
Updating the Constraints | 143 | ||
Spiderwebs | 144 | ||
Too Many Blanks or Nulls | 145 | ||
Too Many Recursives | 146 | ||
Next U3D Phase: Physical Schema Definition | 147 | ||
Notes | 147 | ||
Part III: Physical Schema Definition | 148 | ||
Chapter 8: Introduction to Physical Database Design | 149 | ||
A Short Incondite History of Automated Information Management (or, a Sequential Look at Random Access) | 150 | ||
Information Management Era 1: Sequential Processing | 150 | ||
Information Management Era 2: The First Random Access DBMS | 151 | ||
A Small Digression: A Couple of Words About Database Access | 156 | ||
Hashing | 156 | ||
Inverted Indices | 158 | ||
Database Pages | 159 | ||
B-Trees | 160 | ||
Bitmaps | 162 | ||
Associative Arrays | 163 | ||
Information Management Era 3: Inverted File Systems | 164 | ||
Information Management Era 4: The Age of Relational | 164 | ||
Problems with Relational | 166 | ||
First—Performance Issues | 166 | ||
Second—Not So Simple Simplicity | 166 | ||
Data Types | 166 | ||
Procedural Code | 166 | ||
Groups | 167 | ||
Third—Communication and Language | 167 | ||
Fourth—Relational: Theory or DBMS? | 169 | ||
Fifth—Where Are You Relational Model? | 170 | ||
Just Because It Has Failings Doesn’t Mean It’s a Failure | 170 | ||
Information Management Era 5: Object Technology | 171 | ||
Object-Oriented Programming Led to Object-Oriented Analysis and Design, Which Eventually Led to the Object-Oriented Database Management Systems (OODBMSs) | 171 | ||
A Small Digression (Again): The ACID Test | 172 | ||
Information Management Era 6: NoSQL | 173 | ||
Key-Value | 173 | ||
Graph | 174 | ||
Document Management | 174 | ||
Multimodal | 174 | ||
Is That an ACID or a BASE? | 174 | ||
And the Winner Is… | 176 | ||
What’s to Come | 177 | ||
References | 177 | ||
Notes | 177 | ||
Chapter 9: Introduction to Physical Schema Definition | 179 | ||
Usage-Driven Database Design: Physical Schema Definition | 181 | ||
Step 1: Transformation | 182 | ||
Task 1.1: Translation | 183 | ||
Task 1.2: Expansion | 184 | ||
Step 2: Utilization | 185 | ||
Task 2.1: Usage Analysis | 185 | ||
Task 2.2: Path Rationalization | 187 | ||
Step 3: Formalization | 188 | ||
Task 3.1: Environment Designation | 189 | ||
Task 3.2: Constraint Compliance | 191 | ||
Step 4: Customization | 193 | ||
Task 4.1: Resource Analysis | 194 | ||
Task 4.2: Performance Enhancement | 196 | ||
Summary | 197 | ||
Chapter 10: Transformation: Creating the Physical Data Model | 199 | ||
Task 1.1: Translation | 200 | ||
Activity 1.1.1: Transform LDM Objects to PDM Objects | 200 | ||
Entities to Record Types | 201 | ||
Relationships to Linkages | 204 | ||
Linkage Membership Class | 204 | ||
Linkage Degree | 205 | ||
Linkage Constraints | 205 | ||
Attributes to Data Items | 206 | ||
Data Item Domain | 207 | ||
Data Item Source: Primitive and Derived | 207 | ||
Primitive Data Item: Unique Identifiers and Descriptors | 207 | ||
Data Item Complexity: Simple and Group | 207 | ||
Data Item Valuation: Single Value and Multivalue | 208 | ||
Other Data Item Information | 208 | ||
Activity 1.1.2: Diagram the Objects | 208 | ||
Task 1.2: Expansion | 209 | ||
Activity 1.2.1: Assign Keys | 209 | ||
Activity 1.2.2: Normalize the Model | 211 | ||
Adjustments Needed for Normalization: Keys—Foreign and Domestic? | 213 | ||
Zero Normal Form | 214 | ||
First Normal Form | 214 | ||
Before Getting to Second Normal Form, a Slight Digression | 215 | ||
Second Normal Form | 215 | ||
Third Normal Form | 216 | ||
Post-Normalization—Retreat of Sally Forth? | 217 | ||
Issues with Normalization | 217 | ||
Tranformation Notes | 219 | ||
Deliverables | 220 | ||
Examples of Deliverables | 220 | ||
Chapter 11: Utilization: Merging Data and Process | 224 | ||
Task 2.1: Usage Analysis | 225 | ||
Process Modeling | 225 | ||
Logical Process Modeling | 225 | ||
Natural-Language Logical Process Modeling Techniques | 226 | ||
Plain English | 226 | ||
Oy Vey, There Has Got to Be a Better English Translation | 226 | ||
Structured English | 226 | ||
Graphical Logical Process Modeling Techniques | 228 | ||
Physical Process Modeling | 230 | ||
Natural-Language Physical Process Modeling Techniques | 230 | ||
Plain English | 230 | ||
Structured English | 230 | ||
Pseudocode | 230 | ||
Graphic Physical Process Modeling Techniques | 231 | ||
Flow Charts | 231 | ||
Structure Charts | 232 | ||
Activity 2.1.1: Create Usage Scenarios | 233 | ||
Clearing the Decks for Action | 234 | ||
Putting a Usage Scenario Together | 237 | ||
An Example | 237 | ||
Activity 2.1.2: Map Usage Scenarios to the PDM | 239 | ||
Task 2.2: Path Rationalization | 240 | ||
Activity 2.2.1: Reduce to Simplest Paths | 240 | ||
Activity 2.2.2: Simplify Model | 241 | ||
Utilization Notes | 242 | ||
Deliverables | 242 | ||
Example of Deliverables | 243 | ||
Further Reading | 243 | ||
Structured English | 243 | ||
Data Flow Diagramming | 244 | ||
Flow Charts | 244 | ||
Pseudocode | 244 | ||
Structure Charts | 244 | ||
Chapter 12: Formalization: Creating a Schema | 245 | ||
Task 3.1: Environment Designation | 246 | ||
Hierarchical Systems | 249 | ||
Network Systems | 249 | ||
Relational Systems | 249 | ||
Object-Oriented | 250 | ||
NoSQL | 250 | ||
DBMS Product and Version Selection | 250 | ||
Task 3.2: Constraint Compliance | 251 | ||
Pseudocode…Again | 251 | ||
Activity 3.2.1: Map Rationalized Physical Data Model to the Data Architecture | 253 | ||
Record Types | 253 | ||
Proper | 253 | ||
Associative | 253 | ||
Attributive | 254 | ||
S-Type | 255 | ||
Links | 255 | ||
Membership Class: Cardinality | 255 | ||
One-to-One | 255 | ||
One-to-Many | 256 | ||
Many-to-Many | 256 | ||
Membership Class: Modality | 256 | ||
Mandatory | 257 | ||
Optional | 257 | ||
Degree | 257 | ||
Unary | 257 | ||
Binary | 258 | ||
N-ary | 258 | ||
Constraints | 259 | ||
Inclusion | 259 | ||
Exclusion | 259 | ||
Conjunction | 259 | ||
Data Items | 260 | ||
Domains | 260 | ||
Source: Primitive and Derived | 260 | ||
Primitive Data Items | 260 | ||
Derived Data Items | 261 | ||
Complexity: Simple and Group | 261 | ||
Valuation: Single Value and Multivalue | 262 | ||
Activity 3.2.2: Create a DBMS Product/Version-Specific Functional Physical Database Design | 263 | ||
Subschema Creation | 265 | ||
Formalization Notes | 267 | ||
Deliverables | 267 | ||
Example of Deliverables | 268 | ||
Chapter 13: Customization: Enhancing Performance | 269 | ||
Task 4.1: Resource Analysis | 270 | ||
The Trade-Off Triangle | 271 | ||
Task 4.2: Performance Enhancements | 274 | ||
Activity 4.2.1: Customize Hardware | 275 | ||
A Few Words About Secondary Storage | 275 | ||
Add Disk | 277 | ||
Faster Disk | 277 | ||
Main Memory | 277 | ||
Activity 4.2.2: Customize Software | 278 | ||
Indices (B-Tree, Hash, Bitmap) | 278 | ||
Clustering | 278 | ||
Example Using Indices and Clusters | 281 | ||
Question 1: Should Order Be Indexed? | 283 | ||
Question 2: Should Product Be Indexed? | 283 | ||
Question 3: Should Line Item Be Indexed? | 284 | ||
Question 4: Should Line Item Be Clustered? | 284 | ||
Question 5: Should Line Item Be Clustered Around Order or Product? | 284 | ||
Alternative 1: Line Item Clustered Around Order | 284 | ||
Alternative Two: Line Item Clustered Around Product | 285 | ||
Partitioning | 286 | ||
Derived and Duplicate Data | 287 | ||
Denormalization | 288 | ||
Get Rid of ACID | 288 | ||
Big Data, Big Problems, Big Solution | 289 | ||
To Plunge or Not to Plunge | 290 | ||
NoSQL | 291 | ||
Modeling Big Data U3D Style | 292 | ||
Customization Notes | 294 | ||
Deliverables | 294 | ||
Examples of Deliverables | 295 | ||
Chapter 14: The Data Warehouse | 298 | ||
The Data Warehouse | 299 | ||
Data Warehouse Architecture | 301 | ||
Using U3D to Develop a Data Warehouse | 302 | ||
Step 1: Transformation | 302 | ||
Step 2: Utilization | 303 | ||
The Time Dimension | 305 | ||
Step 3: Formalization | 306 | ||
Step 4: Customization | 306 | ||
Streamlining | 306 | ||
Duplication | 306 | ||
Denormalization | 306 | ||
Indices and Hashing | 307 | ||
Bitmaps | 307 | ||
Bulk Loading | 308 | ||
Clustering | 309 | ||
Partitioning | 309 | ||
Distributed Processing | 309 | ||
All Together Now… | 310 | ||
Oops… | 310 | ||
Customization Notes | 311 | ||
Note | 311 | ||
Chapter 15: The Big Data Decision Support System | 312 | ||
Structured, Unstructured, and Semistructured Data—Another Small Digression | 313 | ||
DSS and Big Data | 316 | ||
Using U3D to Develop a Big Data Decision Support System | 317 | ||
Step 1: Transformation | 318 | ||
Step 2: Utililization | 318 | ||
Step 3: Formalization | 319 | ||
Step 4: Customization | 321 | ||
A Little About Hadoop | 321 | ||
Putting It All Together | 324 | ||
Deliverables | 324 | ||
Part IV: Where from Here? | 326 | ||
Chapter 16: A Look Ahead | 327 | ||
We Need to Ask the Awkward Questions | 328 | ||
Tools Need to Take Usage into Account | 330 | ||
The One and Only DBMS | 330 | ||
Better Training | 334 | ||
Notes | 336 | ||
Part V: Appendixes | 337 | ||
Appendix A: Glossary | 338 | ||
Appendix B: Logical Data Modeling Definitions | 357 | ||
Entity | 357 | ||
Relationship | 358 | ||
Attribute | 358 | ||
Domain | 359 | ||
Appendix C: Physical Schema Definition Object Definitions | 360 | ||
Record | 360 | ||
Linkage | 361 | ||
Data Item | 362 | ||
Domain | 362 | ||
Cluster | 363 | ||
Partition | 363 | ||
Index | 364 | ||
Appendix D: Formulas Used in This Book | 365 | ||
Appendix E: List of U3D Deliverables | 367 | ||
Index | 370 |