There is a wealth of material on database design, and it is covered in university degree programs. However, as we have repeatedly emphasized, no teacher is better than experience. So I summarized the detours and experiences over the years, and found some professionals on the Internet who are quite good at database design to give you some tips and experience in designing databases. Selected 60 of the best tips, and these tips into this article, in order to facilitate the indexing of its content is divided into five parts:
Part 1 - Before designing a database: This part lists 12 basic skills, including naming conventions and clear business requirements. Part 2 - Designing Database Tables: A total of 24 guiding tips covering field design in tables and common problems to avoid. Part 3 - Choosing Keys: How do you choose keys? Here are 10 tips on the proper use of system-generated primary keys, when and how to index fields for optimal performance, and more. Part 4 - Ensuring Data Integrity: Discusses how to keep your database clear and robust, and how to minimize harmful data. Part 5 - Various Tips and Tricks: Other tips not included in the above 4 parts are varied and will hopefully make your database development job a little easier. Part 1 - Examining the Existing Environment Before Designing a Database
When designing a new database, you should not only scrutinize the business requirements but also examine the existing system. Most database projects are not built from scratch; usually, there are existing systems within the organization that are designed to meet specific needs (and may not be automated). Obviously, the existing system is not perfect, or you wouldn't have to build a new one. But examining old systems can allow you to discover subtle problems that you might otherwise miss. In general, it is definitely to your advantage to examine existing systems.
Define a standard object naming convention
Be sure to define a naming convention for database objects. For database tables, determine from the beginning of the project whether the table name will be in plural or singular form. In addition to define simple rules for table aliases (for example, if the table name is a word, the alias takes the first four letters of the word; if the table name is two words, take the first two letters of each word to form a four-letter long alias; if the table name consists of three words, you may wish to take one from each of the first two words and then two letters from the last word, the result is still four letters long aliases, and the rest of the aliases will be formed by four letters). (For a working table, the table name can be prefixed with WORK_ followed by the name of the application that uses the table. Columns [fields] within a table should have a set of design rules for the key. For example, if the key is numeric, you can suffix it with _N; if it is a character, you can suffix it with _C. Standard prefixes and suffixes should be used for column [field] names. For example, if you have many "money" fields in your table, you might add a _M suffix to each column/field. Also, date columns should ideally be prefixed with D_.
Check the naming conventions between table names, report names, and query names. You can quickly become confused by the names of these different database elements. If you insist on naming these different components of the database uniformly, at the very least you should differentiate between these object names by prefixing them with Table, Query, or Report at the beginning of the name.
With Microsoft Access, you can identify objects with symbols such as qry, rpt, tbl, and mod (for example, tbl_Employees). I've also used tbl to index tables when dealing with SQL Server, but I use sp_company (and now sp_feft_) to identify stored procedures because I tend to keep several copies at times if I find a better way of handling them. I use udf_ (or a similar token) to identify functions I write when implementing SQL Server 2000.
What goes around comes around
Adopting an ideal database design tool, such as SyBase's PowerDesign, which supports languages such as PB, VB, Delphe, and so on, and connecting to more than 30 popular databases on the market through ODBC, including dBase, FoxPro, VFP, SQL Server, and so on, I will have the opportunity to connect to more than 30 databases in the future. Server, etc. In the future, I will have the opportunity to focus on the use of PowerDesign.
Get the Data Schema Resource Manual
People looking for an example schema can read the book The Data Schema Resource Manual, written by Len Silverston, W. H. Inmon, and Kent Graziano, which is one of the best data modeling books to own. The book includes chapters covering a variety of data domains, such as people, organizations, and work effectiveness.
Others you can refer to: Introduction to Database Systems by Xuan Wang Shan, Sa Division
Think about the future, but don't forget the lessons of the past
I have found it very useful to ask users how they see their needs changing in the future. This accomplishes two things: first, you get a clear picture of where the application design should be more flexible and how to avoid performance bottlenecks; and second, you know that users will be just as surprised as you are when requirements changes occur that weren't identified in advance.
Make sure to remember the lessons learned from the past! We developers should also help each other by sharing our experiences and lessons learned. Even if users think they don't need any more support, we should educate them about it - we've all had moments where we've been faced with "I wish I'd done that...". .
Logical design before physical practice
Logical design before diving into physical design. With the plethora of CASE tools that continue to come out, your designs can be of a fairly high logical caliber, and you can usually get a better overall understanding of all the aspects required for database design.
Know your business
Don't add even one data table to your ER (Entity-Relationship) schema until you are 100% sure that the system meets the needs of your customers from their point of view (what, you don't have a schema yet?). Then see Tip 9). Understanding your organization's business can save a lot of time later in the development phase. Once you've clarified your business requirements, you can make many of the decisions yourself.
Once you think you have a clear picture of what the business is, it's a good idea to have a systematic conversation with the client. Use the client's terminology and explain to them what you think and what you hear. You should also express the systematic relationship base in terms of may, will and must. This way you can allow your client to correct your own understanding and then take the next step in ER design.
Creating a Data Dictionary and ER Diagram
Be sure to spend some time creating an ER diagram and data dictionary. This should contain at least the data type of each field and the primary and foreign keys within each table. Creating ER charts and data dictionaries can be a bit time consuming, but it is absolutely necessary for other developers to understand the entire design. The earlier they are created, the more they can help avoid possible confusion in the future, so that it is clear to anyone who knows anything about databases how to get data out of them.
The importance of having an up-to-date document such as an ER chart cannot be overstated, which is useful for showing the relationships between tables, and a data dictionary that explains what each field is used for and any aliases that may exist. This is absolutely essential for the documentation of SQL expressions.
Creating schemas
A diagram is worth a thousand words: developers not only have to read and implement it, but also use it to help themselves talk to users. Schemas help make collaboration more effective, so that it's almost impossible for big problems to arise in up-front database design. A schema doesn't have to be complicated; it can even be as simple as handwriting it on a piece of paper. It's just a matter of making sure that the logical relationships on it will yield benefits in the future.
Starting with Inputs and Outputs
When defining database table and field requirements (inputs), you should first examine the existing or already designed reports, queries, and views (outputs) to determine which tables and fields are necessary to support those outputs. For example, if a customer needs a report that sorts, segments, and sums by zip code, you want to make sure that it includes a separate zip code field and doesn't blend the zip code into the address field.
Reporting tips
Find out how users typically report data: batch or online? Is the time interval daily, weekly, monthly, quarterly, or yearly? Also consider creating summary tables if needed. System-generated primary keys are difficult to manage in reports. Users searching with secondary keys in tables with system-generated primary keys often return many duplicates. This is low performance and confusing.
Understanding Customer Requirements
It may seem like this should be obvious, but requirements come from the customer (think in terms of internal and external customers here). Don't rely on the user writing down the requirements; the real requirements are in the customer's head. You have to let the customer explain their needs and, as development continues, ask the customer frequently to assure them that their needs are still within the purpose of the development. It is a truism that "I don't know what I want until I see it" will inevitably lead to a lot of rework because the database is not up to the standard of the customer's never-written-down requirements. And what's worse is that your interpretation of their requirements is yours alone and could be completely wrong.
Part 2 - Designing Tables and Fields to Check for Various Changes
I design my databases with an eye to which data fields are likely to change in the future. Let's say this is the case with surnames (note that they are western surnames, such as women taking their husband's surname when they marry, etc.). So, when setting up a system to store customer information, I tend to store the last name field in a separate data table and also append fields such as start date and end date so that changes to this data entry can be tracked.
Using Meaningful Field Names
On one occasion I was involved in the development of a project that included programs inherited from another programmer, and that programmer liked to name fields in terms of on-screen display of data indications, which wasn't too bad, but unfortunately she also liked to use some strange nomenclature, with naming taking the form of a combination of Hungarian naming and control sequences, such as cbo1, txt2, txt2_b, and so on.
Unless you're using a system that's geared only toward your abbreviated field names, please describe the fields as clearly as possible. Of course, don't overdo it, for example, Customer_Shipping_Address_Street_Line_1 is very descriptive, but no one wants to type in such a long name, the exact scale is in your hands.
Naming with prefixes
If there are a lot of fields of the same type in multiple tables (such as FirstName), you might want to use a table-specific prefix (such as CusLastName) to help you identify the field.
Time-sensitive data should include a "last updated date/time" field. Timestamping is especially useful for finding the cause of data problems, reprocessing/reloading data by date, and purging old data.
Standardization and data-driven
Standardizing data is not only convenient for you, but also for others. For example, if your user interface accesses external data sources (files, XML documents, other databases, etc.), you might want to store the appropriate connection and path information in the user interface support table. Also, if the user interface performs tasks such as workflows (sending mail, printing letterhead, modifying the status of a record, etc.), the data that generates the workflow can also be stored in the database. Pre-arrangement always requires effort, but if these processes are data-driven rather than hard-coded, it is much easier to make policy changes and maintain them. In fact, if the processes are data-driven, you can put a considerable amount of responsibility on the users to maintain their own workflow processes.
Normalization can't be overdone
For those unfamiliar with the term normalization, standardization ensures that the fields in a table are the most basic elements, and this measure helps to eliminate data redundancy in the database. There are several forms of normalization, but Third Normal Form (3NF) is generally considered to have the best balance of performance, scalability, and data integrity. Simply put, 3NF states:
Every value in a table should be expressed only once. Each row within a table should be uniquely identified (have a unique key). No non-key information should be stored in the table that depends on other keys. Databases that comply with the 3NF standard are characterized by a set of tables that are dedicated to storing linked data that are connected by keys. For example, a 3NF database that holds customers and their associated orders might have two tables, Customer and Order. the Order table does not contain any information about the customer to which the order is associated, but it does hold a key value that points to the row in the Customer table that contains information about that customer.
There are higher levels of standardization, but is more standard necessarily better? The answer is not necessarily. In fact, for some projects, even 3NF may introduce too much complexity into the database.
There are many examples where not standardizing tables is sometimes necessary for efficiency reasons. I once had a job developing restaurant analytics software that used non-normalized tables to reduce query times from an average of 40 seconds to about two seconds. While I had to do this, I don't take non-normalization of data tables as a matter of course. And the specific operation is nothing more than a derivation. So if something goes wrong with the table re-generating the non-normalized table is entirely possible.
Indicator of inactivity or non-adoption
Adding a field to indicate whether the record in question is no longer active in the business is quite useful. Whether it's a customer, an employee, or something else, this helps filter the active or inactive status when running queries again. It also removes some of the problems that new users face when adopting data, such as the fact that certain records may no longer be available to them, and can be used as a precaution when deleting them.
Using Role Entities to Define Columns [Fields] That Belong to a Category
When you need to define something that belongs to a specific category or has a specific role, you can use role entities to create specific temporal associations that can be self-documenting.
The implication here is not to have PERSON entities with Title fields, but rather, why not use PERSON entities and PERSON_TYPE entities to describe people? Let's say that when John Smith, Engineer moves up to John Smith, Director and eventually to John Smith, CIO, all you have to do is change the key value of the relationship between the two tables, PERSON and PERSON_TYPE, and add a date/time field to know when the change occurred. This way, your PERSON_TYPE table contains all the possible types of PERSON, such as Associate, Engineer, Director, CIO, or CEO.
An alternative would be to change the PERSON record to reflect the new title change, but then it would not be possible to track the exact time of the individual's position in time.
Naming institutional data using commonly used entities
The easiest way to organize your data is to use commonly used names, such as PERSON, ORGANIZATION, ADDRESS, and PHONE, to name a few. When you combine these common general names or create specific corresponding subentities, you get special versions that you use yourself. The main reason for starting out with general terms is that all concrete users can be concrete about abstract things.
With these abstract representations, you can adopt your own special names for level 2 identifiers; for example, PERSON might be Employee, Spouse, Patient, Client, Customer, Vendor, or Teacher, and so on. Similarly, ORGANIZATION may be MyCompany, MyDepartment, Competitor, Hospital, Warehouse, Government, etc. Finally ADDRESS could be specifically Site, Location, Home, Work, Client, Vendor, Corporate, and FieldOffice etc.
Using general abstract terms to identify categories of "things" gives you tremendous flexibility in relating data to meet business requirements, and it also significantly reduces the amount of redundancy required for data storage.
Users come from all over the world
When designing databases for the Web or with other international features, it's important to keep in mind that most countries have different field formats, such as postal codes, and that some, such as New Zealand, don't have postal codes.
Duplication of data requires discrete data tables
If you find yourself duplicating data, create new tables and new relationships.
3 useful fields that should be added to every table
dRecordCreationDate, which defaults to Now() under VB and GETDATE() under SQL Server sRecordCreator, which defaults to NOT NULL under SQL Server DEFAULT USERnRecordVersion, the version tag for the record; helps to pinpoint the reason for null data or missing data in the record Use more than one field for addresses and phone numbers
Describing a street address in a single line is not enough. address_Line1, address_Line2, and address_Line3 can provide a much bigger picture. Address_Line1, Address_Line2, and Address_Line3 provide more flexibility. Also, phone numbers and e-mail addresses should ideally have their own data tables with their own types and tag categories.
Be careful about over-standardizing; doing so can lead to performance problems. While separating the address and phone tables is often optimal, if you need to access this type of information frequently, it may be better to store "preferred" information (e.g., Customer, etc.) in its parent table. The compromise between non-standardization and faster access makes some sense.
Using multiple name fields
I find it surprising that many people have a single field for name in their database. I thought only developers just starting out would do this, but it's actually a very common practice online. I would suggest that you should treat the last name and first name as two fields and then combine them when querying.
My most common use is to create a calculated column [field] in the same table, through which I can automatically concatenate the normalized fields so that it changes with the data as it changes. However, you have to be very clever when using modeling software to do this. In summary, the use of linked fields is an effective way to isolate the user application from the developer interface.
Beware of mixed-case object names and special characters
One of the things that used to annoy me most was mixed-case object names in databases, such as CustomerData, which has been a problem in databases from Access to Oracle. I didn't like the idea of having mixed-case object naming and having to manually change the names. Think about it, will this database/application survive until a more powerful database is adopted? Better readability (CUSTOMER_DATA) is achieved by using all capitalized names with underscores, and never leaving spaces between characters of an object name.
Beware of reserved words
Make sure that your field names don't conflict with reserved words, the database system, or common access methods. For example, I recently wrote an ODBC connection program with a table that used DESC as the description field name. DESC is a reserved word after the abbreviation DESCENDING. A SELECT * statement in the table worked, but I got a bunch of useless information.
Maintaining consistency in field names and types
Make sure to maintain consistency when naming fields and assigning data types to them. If the field is called "agreement_number" in one table, you don't want to change the name to "ref1" in another table. If the data type is integer in one table, don't make it character in another. Remember, you've done your job, and other people are going to use your database.
Choose your numeric types carefully
Be careful with the smallint and tinyint types in SQL. For example, if you want to look at monthly sales totals, and you have a smallint field type for the totals, you can't calculate the total if it's more than $32,767.
Deletion marks
Include a "deletion mark" field in the table so that rows can be marked as deleted. In a relational database, do not delete a row in isolation; it is better to use a purge program and carefully maintain index integrity.
Avoiding triggers
The function of triggers can often be accomplished in other ways. Triggers can be a distraction when debugging a program. If you do need to use triggers, you should concentrate on documenting them.
Including a versioning mechanism
It is recommended that you introduce a versioning mechanism in your database to determine the version of the database in use. You need to implement this requirement anyway. Over time, users' needs will always change. Eventually a change in the database structure may be required. While you can determine the version of the database structure by checking for new fields or indexes, I find it more convenient not to store the version information directly into the database?
Leaving enough room for text fields
Text fields of the ID type, such as customer ID or order number, etc., should be set larger than you might think, because it won't be long before you'll probably be embarrassed by having to add extra characters. For example, let's say your customer ID is 10 digits long. Then you should set the database table field to be 12 or 13 characters long. Is this a waste of space? A little, but not as much as you might think: a field 3 characters longer would take up an additional 3MB of space in a database with 1 million records and a little bit of indexing. But this extra space is not needed to reorganize the whole database in the future to realize the database size growth. The best and most painful example of this is the change in ID numbers from 15 to 18 digits.
Column [field] naming tips
We have found that writing SQL expressions is greatly simplified if you give each table a uniform prefix for the column [field] name. This does have its drawbacks, such as undermining the usefulness of automated table joining tools that link public *** column [field] names to certain databases, but even these tools sometimes connect incorrectly. For a simple example, suppose you have two tables:
Customer and Order.The Customer table is prefixed with cu_, so the subsections within the table have names like cu_name_id, cu_surname, cu_initials, cu_address, etc. The Order table is prefixed with or_, so the subsections are:
The Order table has subsection names like names are:
or_order_id, or_cust_name_id, or_quantity, and or_description, among others.
This way the SQL statement to select all the data from the database can be written as follows:
1Select*FromCustomer,OrderWherecu_surname = "MYNAME";2andcu_name_id = or_cust_name_idandor_quantity = 1 in the absence of these prefixes. quantity = 1In the absence of these prefixes it is written like this (with aliases to differentiate):
1Select*FromCustomer,OrderWhereCustomer.name = "MYNAME";