Talk:Fourth normal form
This article is rated Start-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | ||||||||
|
Possible Error
editThe text currently reads "A trivial multivalued dependency X->->Y is one in which Y consists of all columns belonging to X." I'm fairly sure this should read "A trivial multivalued dependency X->->Y is one in which Y contains all columns not belonging to X." Can anyone confirm?
Multi-valued dependencies
editI think your explanation of multi-valued dependencies was misleading. A functional dependency is one where a field or fields determine the value of another field in the same row; whereas a multivalued dependency is the existence of some rows of data necessitate further rows of data in the same table, but not in the same row. I've been staring hard at the multivalued dependency article to get my head round it and I'll try and come up with a user-friendly description of what its on about. --VinceBowdren 14:11, 18 August 2006 (UTC)
Ok, perhaps it was a bad idea to go into detail about multivalued dependencies in the 4NF article (especially as an informative multivalued dependencies article already exists). The pizza example gives an intuitive idea of what they're about anyway. Ronald Fagin's 1977 paper in which he introduces the idea of multivalued dependencies and 4NF can be found at http://www.almaden.ibm.com/cs/people/fagin/tods77.pdf. It's very clearly written and looks at multivalued dependencies from several different angles, some of which I found easier to think through than others. One succinct definition he gives is: "THEOREM 1. X->->Y holds for the relation R(X,Y,Z) if and only if R is the join of its projections R1( X,Y) and R2 (X,Z)." You can see how this holds for the pizza example. Fagin notes that a functional dependency is a special case of a multivalued dependency, but on further consideration, I don't want to push that: it will only confuse people needlessly. --Nabav 15:47, 18 August 2006 (UTC)
There seems to be some bizarre and confused thinking going on here. The definition given for multi-valued dependency in this article is actually the definition of a funtional dependency, and the description of a functional dependency is actually the definition of a functional dependency with single domain range. This is of course a useful special case, since the augmentation and decomposition rules together imply that the closure of any set of functional dependencies is the closure of an equivalent set containing only functional dependencies with single domain range, so that we can for example start from just those ones when following Bernstein's algorithm for schema synthesis, but not a useful definition since it would discard all the inference rules for functional dependencies and make it impossible to formulate any useful concept of closure. In particular Zaniolo's definition of EKNF would be meaningless, so we would no longer be able to describe the rsults of Bernstein's synthesis algorithm for 3NF as EKNF schemata which are a good deal stronger than 3NF but, unlike BCNF, EKNF is representationaly sound (ie it satisfies the representation principle).MichealT (talk) 11:04, 7 March 2010 (UTC)
I am not sure what 4NF is WITHOUT multivalued dependencies. They go to the heart of why 4NF exist at all; they're the raison d'etre of the 4th normal form. Whatever else is done, please don't delete this part of the explanation. I find this series of articles normal forms to be just excellent, really excellent. Precise and thoroughgoing without the pendantry or excessive formalism that often accompanies this topic. — Preceding unsigned comment added by Jaydee000 (talk • contribs) 22:53, 9 June 2012 (UTC)
Why the 4NF ??
editFor years , I have now been wondering why the IT world is still trying to explain and justify the 4 and 5NF ?
The pizza example was actually the perfect one. Every single example will show a "non 4NF" relation, which always involve all-key relations. Those very tables are the results of empty associations in an E/R model. The transformation rule then states to transform those associations into table by composing all primary keys from associated entities. And yeah, in most cases, one can find "multi-valued dependencies" in those tables.
BUT, I can see problems with seeking for a higher normal form than the BCNF:
- Reason 1: the table is not even BCNF: the only determinable normal form of those tables is the 1NF ! Why ? Just because there is no attributes to compare to the key, as all the table attributes are the primary key ! - Reason 1bis: If the table is not even BCNF, why seeking for higher ? - Reason 2: According to 1 and 1bis, where did the DB architect get it wrong then ?
We are now facing two possibilities (which will read strange before I explain myself further down):
- P1 being a severe lack of basis design knowledge from a fairly ignorant software engineer. - P2 being a normal design choice procedure, from a then very careful software engineer.
What is usually called as "classic" design methods, i.e., non-object, are divided into three abstract conceptual levels:
- Conceptual level - Logical level - Physical level
During the conceptual phase, the engineer will create the first data model, from the data dictionary provided by the analysis phase. This data model is meant to be the ideal model, 100% BCNF, even if some attributes are left unclassified. Under the conceptual model, no empty, all keys, associations are allowed, because they are not matching the BCNF requirements. At logical level, the (business logic, algorithms, procedures and functions ==> choose according to your tastes and fashion) are designed according to the business and functional requirements and then validated against the first, conceptual, data model. If that model satisfies the "needs" of the business logic (in terms of data accessibility, points of entries, …), then, obviously, no changes are made to the data model. However, very frequently, the business logic will not be satisfied by the conceptual data model, and the latter will have to be changed, hence probably not 100% BCNF anymore. This is often where empty associations appear, and more, are allowed to appear.
We can now converge towards the 4th/5NF myths: Empty associations should be created only:
- At logical level - When the software engineer has confronted the conceptual data model and the business logic … - And the business logic requires such associations
So, we agree that the business logic "knows" about those all keys, so qualified non 4/5NF compliant, tables. Hence, assuming P1, the software engineer will have taken into consideration the update risks of those tables, and integrated the according checks and safety into the business logic algorithms operating on them. In that very respect, the 4/5NF are absolutely useless, because we are not dealing with a design mistake, but a well identified, carefully taken choice.
On the other hand, the P2 situation arises when empty associations are created for the conceptual data model. As I wrote earlier, we are facing there a major lack of knowledge about conceptual design. Because conceptual models have to be 100% BCNF, empty associations should never appear on them. Besides, there is little chances that any business logic will take into account how to deal with empty associations created in the conceptual phase, as this very business logic would have not had been confronted yet to the data model. In this totally wrong situation, we could then well observe data corruption, hence, that's where Fagin and Rissannen believed to fix what the BCNF did not fix …
I will really welcome your reactions to this post !
Sebastien
- Hi Sebastien,
- A Wikipedia article talk page is probably not the best forum for discussing such matters. The purpose of this page is to allow us to talk specifically about the content of the article - how accurate or inaccurate it is, and how it may be improved. There are (I would imagine) other forums on the Internet that would be more suited to the posting of material like the above. You raise many questions, but entering into discussions of them would take us quite far away from the matter at hand, i.e. the specific content of this article. --Nabav (talk) 21:48, 14 August 2008 (UTC)
How on earth what I wrote cannot be relevant ? Your very statement "how it can be improved" goes against yourself.
I'd like to improve the article by using the status of Wikipedia to finish the myths of the fourth and fifth normal form. In that respect, the article should explain how irrelevant they are, because they are founded on an irrelevant idea.
Anyone can see something wrong with that ? Sébastien —Preceding unsigned comment added by Hotstaff (talk • contribs) 16:41, 22 August 2008 (UTC)
- The problem with that is that you are proposing original research, which is against one of wikipedia's core policies:
Wikipedia does not publish original thought: all material in Wikipedia must be attributable to a reliable, published source.
Articles may not contain any new analysis or synthesis of published material that serves to advance a position not clearly advanced by the sources.
- If other (independent reliable) sources have already advanced the arguments you describe, then an addition to the wikipedia articles is fair enough - but if this is your own personal opinions then wikipedia is not the place to publish them. --VinceBowdren (talk) 13:06, 25 August 2008 (UTC)
Actually Sebastien, BCNF already throws away the representation principle and has no decent algorithm for generating a schema from the functional dependencies so if you want to write an article on the work of the many computer scientists who have wanted to keep both the representation and separation principles and at the same time have useful (efficient) algorithms for schema design you should start from BCNF, not from 4NF. I think such an article would be a useful addition to wikipedia, and if it were available it shouyld be referenced from this article. As VinceBowdren points out, it would be appropriate for it to be a separate article, and perhaps inappropriate to include a full description of the issues in this normalisation article (if the article existed we could then discuss whether it should be merged with this article)MichealT (talk) 11:26, 7 March 2010 (UTC)
There is a contradiction in the article
editThe article says: 4NF is the next level of normalization after Boyce-Codd normal form. This seems to point that every 4NF is also BCNF. But later states "Ronald Fagin demonstrated[2] that it is always possible to achieve 4NF". And we know that to achieve BCNF is not always possible. Therefore there is a contradiction because if we can always achieve 4NF, we would achieve BCNF at the same time.
Which one of the previous assertions is wrong? --Juansempere (talk) 14:02, 9 October 2008 (UTC)
- A good point. A nonloss decomposition of a non-BCNF schema into BCNF (and 4NF) is always possible. What is not always possible is a nonloss, dependency preserving decomposition of a non-BCNF schema into BCNF (and by extension, if we cannot always find such a decomposition into BCNF, then we cannot always find such a decomposition into 4NF, which is an even more stringent normal form).
- The article as it stands is using an ambiguous term, "achievable". A slight amendment is needed - we need to specify that 4NF is always achievable in the sense of being able to find a nonloss decomposition, but not always achievable in the sense of being able to find a nonloss dependency-preserving decomposition. --Nabav (talk) 21:11, 9 October 2008 (UTC)
Section added in error (wrong talk page)
editontent removedMichealT (talk) 11:54, 7 March 2010 (UTC)
External links modified
editHello fellow Wikipedians,
I have just modified one external link on Fourth normal form. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
- Added archive https://web.archive.org/web/20050404010227/http://www.aw-bc.com/catalog/academic/product/0,1144,0321197844,00.html to http://www.aw-bc.com/catalog/academic/product/0,1144,0321197844,00.html
When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.
This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}}
(last update: 5 June 2024).
- If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
- If you found an error with any archives or the URLs themselves, you can fix them with this tool.
Cheers.—InternetArchiveBot (Report bug) 00:44, 5 October 2017 (UTC)
Unnecessarily full of confounding jargon
editFirstly, after a brief explanation that 4th normal form is part of database normalization, it needs to explain the problem with lower forms that this form solves. The reason normalization exists is to solve data redundancy and efficiency, etc. This must be clear. The audience of this article is presumably people who are new to relational database modeling and should have them in mind throughout. Each concept should be described in terms of the problem being solved.
To do this, jargon should not be used without clear, plain language definitions (or links to the definitions). Any abstract representations of data should be only used in conjunction with real-world illustrations.
Instead of using pizza types, vendors, I think a library catalog is probably the most apt model for understanding all forms from 1st to 5th. At 4th and 5th form, we start defining tables based on attributes that might regularly be null in a given table. As a database size increases, the risk of null columns in tables can become a liability and can result in some rigidity in entity definitions that may spell trouble later on. So in BCNF, which is what most database developers will ever need, one might break up a library catalog by title, author, publisher, edition, copies, locations etc.
But in a 4th or 5th normal form, any column that is not required should have its own table. That is, any column that can have a NULL value should be replaced with a table for only the entities that have a value.
For instance in a 3rd normal form data table for `book_editions` could have a value for both print (page_count) and audio-book (time_length). In a 4th and 5th normal forms, these attributes would be split into two separate tables: a `book_edition__page_count` and `book_edition__time_length`. Each table would have the book_edition_id and a the corresponding column for the data type. In querying the data, one would left join all the related tables into a single "book edition". Since a given edition will have one or the other values, the abstract result will resemble the 3rd normal form table with one value populated and the other null.
Another example would be for the author table. For every unique author_id in the Author table, instead of having columns for first_name, last_name, middle_name, pen_name, prefix, suffix, each field would have it's own table with the value and related author_id. Since the vast majority of authors will not have a prefix title (Sir, Dame, Dr., etc.) or a suffix (Sr. Jr. Esq., etc.) those rows simply would not exist in the database for the given author_id. So this could account for a book by Aristotle, and one by Rev. Dr. Martin Luther King, Jr. . This could be further abstracted into a single table called name with the columns: author_id, name, type where type relatees to a third table of name types: first, last, middle, prefix, suffix, and even more rare concepts like titles of nobility: His Grace Franklin Bafflement Wingbats-Almond Tree, Jr. 7th Duke of The Dumpster. Each concept would be represented with its type in separate rows in the table all relating to the single author_id for the given author. So if this fictitious person's author_id was 5 and there was a table of name_types the rows would look like this:
author_id | name_string | name_type_id |
---|---|---|
5 | His Grace | 4 |
5 | Franklin | 1 |
5 | Wingbats-Almond Tree | 2 |
5 | Bafflement | 3 |
5 | Jr. | 5 |
5 | 7th Duke of the Dumpster | 6 |
With a table of name_types
name_type_id | type_string | name_position |
---|---|---|
1 | First Name | 2 |
2 | Last Name | 4 |
3 | Middle Name | 3 |
4 | Prefix | 1 |
5 | Suffix | 5 |
6 | Noble Title | 6 |
Theoretically the person table that stands as the parent "entity" could just have one field: author_id. This would be extreme, but it is entirely possible. General Ludd (talk) 02:01, 12 February 2020 (UTC)
Clarification needed
editThe article leaves me guessing as to what should happen when there are additional non-key columns in the same table.
Example:
Restaurant | Pizza variety | Delivery area | Price |
---|---|---|---|
A1 Pizza | Thick Crust | Springfield | 10.99 |
A1 Pizza | Thick Crust | Shelbyville | 11.49 |
A1 Pizza | Thick Crust | Capital City | 11.99 |
A1 Pizza | Stuffed Crust | Springfield | 11.49 |
A1 Pizza | Stuffed Crust | Shelbyville | 11.99 |
A1 Pizza | Stuffed Crust | Capital City | 12.49 |
Elite Pizza | Thin Crust | Capital City | 11.29 |
Elite Pizza | Stuffed Crust | Capital City | 12.29 |
Vincenzo's Pizza | Thick Crust | Springfield | 11.99 |
Vincenzo's Pizza | Thick Crust | Shelbyville | 11.99 |
Vincenzo's Pizza | Thin Crust | Springfield | 10.69 |
Vincenzo's Pizza | Thin Crust | Shelbyville | 10.69 |
This modified example has one functional dependency: {Restaurant, Pizza variety, Delivery area} → {Price}. It still meets BCNF and below. Under the assumption from the main text ("pizza varieties offered by a restaurant are not affected by delivery area"), we also still have the same two multivalued dependencies: {Restaurant} {Pizza variety} and {Restaurant} {Delivery area}. Intuitively, the correct design here would be three tables, as follows (with the appropriate foreign key constraints):
Restaurant | Pizza variety |
---|---|
A1 Pizza | Thick Crust |
A1 Pizza | Stuffed Crust |
Elite Pizza | Thin Crust |
Elite Pizza | Stuffed Crust |
Vincenzo's Pizza | Thick Crust |
Vincenzo's Pizza | Thin Crust |
Restaurant | Delivery area |
---|---|
A1 Pizza | Springfield |
A1 Pizza | Shelbyville |
A1 Pizza | Capital City |
Elite Pizza | Capital City |
Vincenzo's Pizza | Springfield |
Vincenzo's Pizza | Shelbyville |
Restaurant | Pizza variety | Delivery area | Price |
---|---|---|---|
A1 Pizza | Thick Crust | Springfield | 10.99 |
A1 Pizza | Thick Crust | Shelbyville | 11.49 |
A1 Pizza | Thick Crust | Capital City | 11.99 |
A1 Pizza | Stuffed Crust | Springfield | 11.49 |
A1 Pizza | Stuffed Crust | Shelbyville | 11.99 |
A1 Pizza | Stuffed Crust | Capital City | 12.49 |
Elite Pizza | Thin Crust | Capital City | 11.29 |
Elite Pizza | Stuffed Crust | Capital City | 12.29 |
Vincenzo's Pizza | Thick Crust | Springfield | 11.99 |
Vincenzo's Pizza | Thick Crust | Shelbyville | 11.99 |
Vincenzo's Pizza | Thin Crust | Springfield | 10.69 |
Vincenzo's Pizza | Thin Crust | Shelbyville | 10.69 |
However, this intuitive design still does not follow the rules of Fourth Normal Form. The third table still the same functional dependency, and also still has the same non-trivial multivalued dependencies. Yet, there is no other way to represent the prices correctly in the relational database. I read Fagin's paper, but either I missed something, or this issue was not addressed.
It gets even worse when the assumption from the main text does not hold, i.e. pizza varieties offered by a restaurant are affected by delivery area. So for instance, A1 Pizza could decide to no longer sell Thick Crust to Shelbyville, but keep all their other offerings the same. In that case, the intuitive design would be to keep the original four-column table:
Restaurant | Pizza variety | Delivery area | Price |
---|---|---|---|
A1 Pizza | Thick Crust | Springfield | 10.99 |
A1 Pizza | Thick Crust | Capital City | 11.99 |
A1 Pizza | Stuffed Crust | Springfield | 11.49 |
A1 Pizza | Stuffed Crust | Shelbyville | 11.99 |
A1 Pizza | Stuffed Crust | Capital City | 12.49 |
Elite Pizza | Thin Crust | Capital City | 11.29 |
Elite Pizza | Stuffed Crust | Capital City | 12.29 |
Vincenzo's Pizza | Thick Crust | Springfield | 11.99 |
Vincenzo's Pizza | Thick Crust | Shelbyville | 11.99 |
Vincenzo's Pizza | Thin Crust | Springfield | 10.69 |
Vincenzo's Pizza | Thin Crust | Shelbyville | 10.69 |
Because of the changed assumption, the multivalued dependencies {Restaurant} {Pizza variety} and {Restaurant} {Delivery area} no longer apply. There are still six three-column multivalued dependencies, such as {Restaurant, Pizza variety} {Delivery area} and all its possible permutations. These were irrelevant in the original example, since they were trivial. But adding the Price attribute changed them into non-trivial ({Restaurant, Pizza variety} and {Delivery area} together no longer form the whole set of attributes of the relation).
My intuitive response is that a multivalued dependency should be considered trivial when either Y is a subset of X, or X and Y together form the whole set of attributes of a candidate key of the relation. But this is not backed by Fagin's paper, where the definition of a trivial multivalued dependency matches exactly what is said in the main text. So based on this, we have to conclude that the table above violates 4NF, and needs to be decomposed. But I don't see any possible decomposition that would remove the 4NF violation without violating the business rules. And once more, I fail to find an answer to this situation in Fagin's paper.
I would really appreciate if someone with a better understanding of 4NF than me could update the main text to address the two examples I show here, or else reply here to explain what I am missing.
Thanks!