This article is rated Stub-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | |||||||||||||||||||||
|
Incorrect definition of 5NF
editThis article currently contains an incorrect definition of 5NF:
- A join dependency *{A, B, … Z} on R is implied by the candidate key(s) of R if and only if each of A, B, …, Z is a superkey for R
This is well known to be false. E.g., *{A, B} does not hold if we have the candidate key A and candidate key B, but the above claim says it does. Another example is *{AB, BC, C} which is implied by candidate key B, but the above claim says it does not.
Also the following claim that is currently made is false:
- A table T is in fifth normal form (5NF) or Project-Join Normal Form (PJNF) if it cannot have a lossless decomposition into any number of smaller tables. The case where all the smaller tables after the decomposition have the same candidate key as the table T is excluded.
For example, R(A, B, C, D) with candidate keys B and C and join dependency *{AB, BC, CD} is in 5NF but also has a lossless decomposition that is not based on a single candidate key.
Anyone mind if I start fixing this?
Example not in 4NF?
editShouldn't the example be in the 4NF to be a good example? Currently it looks to me, like it's not. (Jonas)
I think you're right. I've been reading some of the citations and links available online, and I think I misunderstood what 5NF was about when we were discussing the problems with the previous example. Having now read enough to realise I don't understand 5NF as well as I thought, I'm wary of making further solo contributions until I've better informed myself. --VinceBowdren 11:55, 29 August 2006 (UTC)
Better Example
editUsing American healthcare isn't a good example. The rest of the world (myself included) doesn't understand the example table as we don't have "insurers" - we have "governments" who pay for our healthcare as you can't put a price on healthcare. I am confused as I don't understand what is shown. Do different "insurers" only provide for certain treatments? Do different insurers only work with specific doctors? Surely using a table with student-course-option would be better? How happy I am that I don't live in a country where you have to pay if you need medical treatment!! Xanucia 00:11, 18 May 2007 (UTC)
- Your anti-Americanism aside, the example is confusing to me, and I'm an American even. I'm not quite sure the example is even properly in 4NF. Gigs (talk) 05:24, 16 December 2007 (UTC)
- Ok, folks. I've changed the example to something a bit less esoteric.--Nabav (talk) 14:55, 26 April 2008 (UTC)
- Your anti-Americanism aside, the example is confusing to me, and I'm an American even. I'm not quite sure the example is even properly in 4NF. Gigs (talk) 05:24, 16 December 2007 (UTC)
I don't think the salesmen example is valid. Here's a salesmen profile that can be represented in the original table but not in the three-table version:
- Acme and Robusto produce both a vacuum cleaner and a breadbox (stet)
- Willy sells (only) the Robusto vacuum cleaner and Acme breadbox
- Louis sells (only) the Acme vacuum cleaner and Robusto breadbox
The three-table version can only say "X sells brand X and product Y", so you end up with the union of all brand/product combinations - Royan (talk) 12:39, 22 April 2009 (UTC)
- You've glossed over the bit of the article that says: "Suppose, however, that the following rule applies: A Travelling Salesman has certain Brands and certain Product Types in his repertoire. If Brand B is in his repertoire, and Product Type P is in his repertoire, then (assuming Brand B makes Product Type P), the Travelling Salesman must offer products of Product Type P made by Brand B. In that case, it is possible to split the table into three." So: if the rule described applies, then you can split the table into three. If the rule described does not apply, then you can't. In the situation you refer to, the rule does not apply, thus - as you say, and as the article says - you cannot split the table into three. You are saying the same thing that the article is saying. --Nabav (talk) 15:36, 22 April 2009 (UTC)
I don't understand this: Suppose that Jack Schneider starts selling Robusto's products. In the previous setup we would have to add five new entries since Jack Schneider is able to sell five Product Types covered by Robusto: Breadboxes, Vacuum Cleaners, Pruning Shears, Umbrella Stands and Telescopes. With the new setup we need only add a single entry (in Brands By Travelling Salesman). How would that enable him to sell their shears, stands and telescopes? These aren't products from his repertoire.--87.162.8.111 (talk) 13:32, 27 April 2010 (UTC)
- The "five Product Types" thing was a recently-introduced inaccuracy. I see someone's removed it now. --Nabav (talk) 11:12, 15 May 2010 (UTC)
Why the 5NF ??
editFor years , I have now been wondering why the IT world is still trying to explain and justify the 4 and 5NF ?
The pizza example was actually the perfect one. Every single example will show a "non 5NF" relation, which always involve all-key relations. Those very tables are the results of empty associations in an E/R model. The transformation rule then states to transform those associations into table by composing all primary keys from associated entities. And yeah, in most cases, one can find "multi-valued dependencies" in those tables.
BUT, I can see problems with seeking for a higher normal form than the BCNF:
- Reason 1: the table is not even BCNF: the only determinable normal form of those tables is the 1NF ! Why ? Just because there is no attributes to compare to the key, as all the table attributes are the primary key ! - Reason 1bis: If the table is not even BCNF, why seeking for higher ? - Reason 2: According to 1 and 1bis, where did the DB architect get it wrong then ?
We are now facing two possibilities (which will read strange before I explain myself further down):
- P1 being a severe lack of basis design knowledge from a fairly ignorant software engineer. - P2 being a normal design choice procedure, from a then very careful software engineer.
What is usually called as "classic" design methods, i.e., non-object, are divided into three abstract conceptual levels:
- Conceptual level - Logical level - Physical level
During the conceptual phase, the engineer will create the first data model, from the data dictionary provided by the analysis phase. This data model is meant to be the ideal model, 100% BCNF, even if some attributes are left unclassified. Under the conceptual model, no empty, all keys, associations are allowed, because they are not matching the BCNF requirements. At logical level, the (business logic, algorithms, procedures and functions ==> choose according to your tastes and fashion) are designed according to the business and functional requirements and then validated against the first, conceptual, data model. If that model satisfies the "needs" of the business logic (in terms of data accessibility, points of entries, …), then, obviously, no changes are made to the data model. However, very frequently, the business logic will not be satisfied by the conceptual data model, and the latter will have to be changed, hence probably not 100% BCNF anymore. This is often where empty associations appear, and more, are allowed to appear.
We can now converge towards the 4th/5NF myths: Empty associations should be created only:
- At logical level - When the software engineer has confronted the conceptual data model and the business logic … - And the business logic requires such associations
So, we agree that the business logic "knows" about those all keys, so qualified non 4/5NF compliant, tables. Hence, assuming P1, the software engineer will have taken into consideration the update risks of those tables, and integrated the according checks and safety into the business logic algorithms operating on them. In that very respect, the 4/5NF are absolutely useless, because we are not dealing with a conception mistake, but a well identified, carefully taken choice.
On the other hand, the P2 situation arises when empty associations are created for the conceptual data model. As I wrote earlier, we are facing there a major lack of knowledge about conceptual design. Because conceptual models have to be 100% BCNF, empty associations should never appear on them. Besides, there is little chances that any business logic will take into account how to deal with empty associations created in the conceptual phase, as this very business logic would have not had been confronted yet to the data model. In this totally wrong situation, we could then well observe data corruption, hence, that's where Fagin and Rissannen believed to fix what the BCNF did not fix …
I will really welcome your reactions to this post !
Sebastien
- See my reply to this on the 4NF discussion page, where the above material was duplicated. --Nabav (talk) 21:50, 14 August 2008 (UTC)