Talk:Boyce–Codd normal form
This article is rated C-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | |||||||||||||||||||||
|
3NF but not BCNF
editIsn't the person, shop type, nearest shop example a schema in the form {AB → C, AC → B} and not {AB → C, C → B} as stated? If nearest shop determined shop type (C → B), then surely the stated anomaly could not happen. —Preceding unsigned comment added by 213.248.204.74 (talk) 09:23, 17 February 2009 (UTC)
- Hi, I think you've subtly misunderstood what is meant by a functional dependency. Shop Type is functionally dependent on Nearest Shop because logically, in the real world, a single shop ought to have exactly one shop-type. The point about functional dependencies though is that depending on how you've designed your tables, you might have a situation where the table can accommodate data that fails to respect one or more functional dependencies. This possibility of this situation IS the anomaly. So in this example, we want to respect the functional dependency C → B, but even if we enforce all the candidate keys we cannot guarantee that C → B will be respected. --Nabav (talk) 14:33, 21 February 2009 (UTC)
- "because logically, in the real world, a single shop ought to have exactly one shop-type" Since When? I know plenty of shops that do more than one thing, and hence could have more than one type. (Also, a cynical pedant would point out that "logically, in the real world" is a contradiction in terms.) I do think this is a bad example ... —Preceding unsigned comment added by 94.193.94.243 (talk) 13:25, 17 June 2009 (UTC)
- All right: poor phrasing on my part in my previous comment. The upholding of certain functional dependencies is an aspect of the way a database represents the real world. For example, we might have an Employee table that upholds the functional dependency {Employee} → {Current Work Location}. Now, someone might complain that this representation distorts reality because actually, an Employee might have a desk at each of two different locations, dividing his time between the two. Fair enough: but by designing our database in such a way as to uphold {Employee} → {Current Work Location}, we are simply saying that an Employee can never have multiple Current Work Locations in our representational schema. Depending on how the database is to be used, this may well be a perfectly good design choice. In designing databases, we aren't striving to represent absolutely every nuance of reality; instead we're striving to represent a particular domain in a way that's useful for particular purposes. Thus even though it is true that tomorrow someone might open a shop called "Steve's Bakery & Opticians", this possibility should not necessarily prevent us from designing a database that upholds the functional dependency {Shop} → {Shop Type}. I will add to the article an explicit statement that we're treating a shop as always having exactly one type. --Nabav (talk) 07:29, 27 June 2009 (UTC)
- == A way out? ==
Although 3NF is not BCNF for {AB->C,C->B} is true. There is a way out in this example. Just create a more generic datamodel. We would have to record the distance to each shop. And if we need the closest shop we need to udpate our queries. create a table [person, shop, distance] and a talbe [shop, shoptype]. Wouldn't this be possible for every 3NF that is not BCNF???? —Preceding unsigned comment added by 85.144.94.32 (talk) 11:21, 7 August 2009 (UTC)
- There's always more than one way to model a situation, and yes: modelling the situation in a different way can often make these problems (problems of insufficient normalization) disappear. Your suggestion would have that effect. So would a model in which we recorded the latitude and longitude of each person's home, and the latitude and longitude of each shop, and the shop type of each shop. (Writing the necessary query could be awkward, but in principle we've got all the information we need.) What I would say however is that both of these suggestions require the database to capture extra information (about distances, about latitudes and longitudes) which the original example was not trying to capture. Therefore I don't think these sorts of suggestions throw much light, in a general way, on the issues discussed in the article. --Nabav (talk) 08:40, 15 August 2009 (UTC)
- == Naming things ==
As Zaniolo's new normal form is used in the article to illustrate how the representation failures of BCNF can be avoided at the same time as fixing the anomalies that BCNF was intended to fix, and Zaniolo's paper defining that normal form is referenced, why not give that normal form its name Elementary Key Normal Form (EKNF) since that's what Zaniolo called it in the paper that defined it, rather than leaving it nameless? — Preceding unsigned comment added by Michealt (talk • contribs) 18:58, 27 May 2011 (UTC)
Tennis Example
editIn the tennis court example, splitting the tables as shown does not allow you to prevent scheduling SAVER and STANDARD rates (both for Court 1) with the same Start Time.71.165.187.242 (talk) 19:41, 23 July 2008 (UTC)
- Right you are. Well, my track record of coming up with edifying BCNF examples is terrible. Here, what I was going for was a table meeting 3NF but not BCNF, a table which could then be decomposed into tables meeting BCNF. In fact, as your comment highlights, my decomposition doesn't do what I thought it did. Is this one of those cases (as in the "Achievability of BCNF" section) where BCNF is unachievable? Can anyone come up with an example that doesn't suffer from this problem? The Tutor/Student example in a previous version of this article (see discussion) did not suffer from the problem ... but was judged not to be very enlightening. Frankly, though, the Tutor/Student example, awkward as it was to explain and justify, now seems better than this example - which just does not work. I welcome other people's thoughts on example-construction. --Nabav (talk) 22:37, 23 July 2008 (UTC)
- Ok: panic over, hopefully. I've altered the BCNF solution so that it doesn't suffer from the problem described above. The new solution incorporates an entirely new attribute, Member Flag - having a Member Flag is quite natural in modelling this particular situation. Because of the Member Flag attribute, this is not a decomposition. But it is in BCNF, and it does work (unless someone spots another flaw ...) The candidate keys in the solution are such as to ensure all the functional dependencies are respected. New functional dependencies introduced (and respected), are, as I hope should be obvious given the candidate keys specified, {Rate Type} → {Court, Member Flag} and vice-versa, as well as Member Flag's dependency on both candidate keys in Today's Bookings. --Nabav (talk) 07:07, 24 July 2008 (UTC)
- A minor point which I believe might make the Tennis example a little clearer. Currently the table "Today's Court Bookings" adheres to 2NF and 3NF because all attributes belong to candidate keys. It might be a little clearer if they did not so that you could demonstrate non-trivial functional dependence on something that is not a candidate key in the non-BCNF table and contrast it to non-trivial functional dependence on only candidate keys in the BCNF tables after you have BCNF-normalised. I think this contrast would help to highlight the difference between 3NF and BCNF. —Preceding unsigned comment added by 84.203.32.235 (talk) 14:29, 21 January 2010 (UTC)
- I don't get it. To me it's simply : Something is rotten in the state of Denmark. I agree with my fellow "71.165.187.242". As I see it, you can still book a SAVER and a STANDARD at 9:00 and both will met on court 1. Maybe it's me... don't know. Here my view, maybe it doesn't show what you want to explain but... A "Booking" is an event that has a court and a time. The key is "Court" and "Start Time". "Rate Type" has nothing to do up there since it depends on "Court" and "Member Flag" not on the event "Booking". What you need to know is who's booking the court. This event owns a place, a time and someone (the facts); not a place, a time and .. a rate (to fix the rate you need to know first who will be there). I'm wrong ? - Elavoie 01:09, 5 Oct. 2011 — Preceding unsigned comment added by 70.30.170.84 (talk) 05:11, 5 October 2011 (UTC)
In the example of: Achievability of BCNF
editWouldnt it be easy to give the tables an id and then require the table shop to have a unique foreign key that references the table nearest shop? —Preceding unsigned comment added by 78.42.33.168 (talk) 01:36, 18 December 2008 (UTC)
- I think your suggestion is a correct solution which satisfy both the BCNF and the initial FDs.
- I was thinking on something similar:
- AB → C, C → B
- to become
- AB → D, D → C, C → B
- where D is a unique foreign key attribute in one of the new created relations (explained below).
- In my example there are 3 relations with the following schemas:
- 1. PersonToShopType relation with attributes(Person(A), ShopType(B), PersonToShopTypeID(D)) and PrimaryKey(Person(A), ShopType(B)) and a unique ForeignKey(PersonToShopTypeID(D)). Note: Here PersonToShopTypeID(D) could be also a candidate for a PrimaryKey.
- Functional dependencies in this relation are: AB → D. Or in case of PrimaryKey(PersonToShopTypeID(D)): D → A, D → B.
- 2. NearestShop relation with attributes(PersonToShopTypeID(D), ShopName(C)) and PrimaryKey(PersonToShopTypeID(D)).
- Functional dependencies in this relation are: D → C.
- 3. Shop relation with attributes(ShopName(C), ShopType(B)) and PrimaryKey(ShopName(C)).
- Functional dependencies in this relation are: C → B.
- All these 3 relations are in BCNF and also the initial idea of functional dependencies is maintained (only slightly extended with the help of the transitive rule: AB → D, D → C lead to AB → C).
- Please someone let me know if/where I am wrong in my thoughts. Bobarman (talk) 21:09, 23 November 2018 (UTC)
The first paragraph in this section states: "...a set of functional dependencies {AB → C, C → B} cannot be represented by a BCNF schema". It then goes onto to show the Person/Shop Type/Nearest Shop example having those dependencies and after splitting into entities Shop Near Person & Shop states "...although this design adheres to BCNF,...". This appears to contradict the earlier statement unless I am missing something. — Preceding unsigned comment added by BigFatKnowAll (talk • contribs) 13:24, 15 April 2013 (UTC)
- I don't think you are missing anything. It isn't that you can't achieve BCNF, it's that you can't achieve both BCNF and preserve the previously stated FDs. Since this contradicts the final sentence, I deleted it. — Preceding unsigned comment added by 66.109.59.83 (talk) 19:03, 17 August 2016 (UTC)
Has this NearestShop example appeared in literature anywhere? Johncartmell (talk) 10:55, 23 July 2019 (UTC)
The whole key
edithttp://www.datamodel.org/NormalizationRules.html#bcnf reports:
Basically, a humorous way to remember BCNF is that all functional dependencies are: "The key, the whole key, and nothing but the key, so help me Codd."
How can a functional dependency BE the whole key key? Keys are sets of attributes not dependencies. This is a confusing reference.
While http://en.wikipedia.org/wiki/Third_normal_form reports:
A memorable summary of Codd's definition of 3NF, paralleling the traditional pledge to give true evidence in a court of law, was given by Bill Kent: every non-key attribute "must provide a fact about the key, the whole key, and nothing but the key."...
...Here the requirement is concerned with every attribute in the table, not just non-key attributes.
The section on http://en.wikipedia.org/wiki/Third_normal_form is terribly worded; I cannot tell which sentences are referring to BCNF and which to 3NF. What is the "Here the requirement is concerned..." referring to? BCNF? 3NF?
I think one of the most useful things the BCNF and 3NF articles could do would be to give the easiest and most intuitive way of checking the normalization of a table. The whole "The key, the whole key..." thing seems to be on the right track. Perhaps an example would help. Something like the following:
<TABLE_1>
This table does not meet 2NF because the attribute <ATTRIB3> does not describe the key (it describes <ATTRIB2>, which is not the whole key. [or maybe that should say "...which is not even part of the key". I can't tell.]
<TABLE_2> This table now meets 2NF because all the non-key attributes describe the whole key, but it does not meet 3NF because NOT<COND1> [I have no idea what to put for <COND1>. What combination of all attributes "must provide a fact about the key, the whole key, and nothing but the key" should <COND1> be?]
<TABLE_3> This table now meets 3NF because <COND1> is met, but it does not meet BCNF because NOT<COND2>. [I have no idea what to put for <COND2>. What combination of "every attribute must provide a fact about the key, the whole key, and nothing but the key" should <COND2> be? Or should it be a combination of "every non-key attribute must provide a fact about the key..."? Or should it be a combination of "the attributes on the left-hand side of all functional dependencies must be the key, the whole key, and nothing but the key"? Or should be it be a combination of "the attributes on the right-hand side of all functional dependencies must provide a fact about the key, the whole key, and nothing but the key"?]
<TABLE_4> This table now meets BCNF because <COND2> is met.
Such an example would be extremely informative for either this article or the 3NF article. fogus (talk) 18:49, 18 April 2009 (UTC)
- I would say humorous descriptions of how to remember the rules don't belong in Wikipedia at all -- unless there is something eminently notable about the fact that someone devised such a method. --Boson (talk) 21:28, 18 April 2009 (UTC)
- "All non-key attributes must depend on [or "provide a fact about"] the key, the whole key, and nothing but the key" is a very widely cited informal characterization of 3NF. The database theorist Bill Kent coined it, and the database theorist Chris Date quotes it approvingly because it sums up normalization in a nutshell. Admittedly it leaves out some nuances and doesn't help us with tables that, say, violate 4NF without violating 3NF. Nevertheless, as a compelling simplified definition (not a "method", incidentally) that database theorists cite with approval, it definitely deserves a mention (and is mentioned, with citations, in the 3NF article; also note I have recently reworded the explanation of the informal definition in that article).
- "All non-key attributes must depend on ... the key, the whole key, and nothing but the key" ... The database theorist Bill Kent coined it"
I first used the phrase publicly in Part 3 of a series entitled "Relational Databases" in the UK computer magazine Computer Weekly 9th November 1978. I did not, however, also use the commonly appended suffix "so help me Codd" Actaeon~enwiki (talk) 18:19, 28 September 2015 (UTC)
- So I am all for continuing to make reference to "the key, the whole key, and nothing but the key", maybe even citing the BCNF variation of it in the BCNF article. On the other hand, I don't understand the call for examples. We already have examples in all the NF articles, and it seems to me that these examples are clear; so I don't see what new examples are being asked for.
- I also don't understand the questions above that ask things like "What combination of "every attribute must provide a fact about the key, the whole key, and nothing but the key" should <COND2> be?" I can't parse the question! Can you perhaps ask your questions in a different way? Once I get to grips with what you're uncertain about, I can clarify.
Better Amendment Exists For Tennis Court Example?
editThe amended (so that it meets BCNF) design of the tennis example is illogical. It may meet BCNF, but to me a more logical solution would be:
Member Type Court | ||
---|---|---|
Court (PK) | Member Flag (PK) | Rate Type |
1 | Yes | SAVER |
1 | No | STANDARD |
2 | Yes | PREMIUM-A |
2 | No | PREMIUM-B |
Today's Bookings | |||
---|---|---|---|
Court (PK) | Member Flag | Start Time (PK) | End Time (PK) |
1 | Yes | 9:30 | 10:30 |
1 | Yes | 11:30 | 12:30 |
1 | No | 14:00 | 15:30 |
2 | No | 9:30 | 10:30 |
2 | Yes | 11:30 | 12:30 |
2 | Yes | 14:00 | 15:30 |
We have two courts (i.e. 1 and 2). we can allow each court for either member or non member at a time. if it is understood that a court should be allowed for a fix duration then here we can select either (court,start time) as a key or (court,end time) as a key and if it is not then we need to focus on that a start time should not be overlap with or before end time. the member flag is not need to take as a key because we have a court and we can allow it to either a member or to a non member.
This solution seems more straightforward to answer the common question "What courts are booked today?". — Preceding unsigned comment added by Derekwork (talk • contribs) 21:01, 16 February 2011 (UTC)
For example, using the article's ammended BCNF tables, one might make the mistake of adding the bookings (PREMIUM-A, 13:30, 15:00) and (PREMIUM-B, 13:30, 1500). Neither would violate the primary key of the bookings table, but both would attempt to book court 2 for the same time period. —Preceding unsigned comment added by 206.209.102.35 (talk) 21:05, 25 March 2011 (UTC)
Not normalization
editThe amendment of the Tennis Court Example isn't normalization at all! It adds a new attribute, the membership flag, and that is not the job of normalization. Mind you that I am not saying that it is wrong to do this when you are designing a DB in practice. However, since the article is meant to show what normalizing is, it shouldn't do this. Instead the proper normalization would have been to split up the original table into a first table {Start Time, End Time, Rate Type} which has two keys {Start Time, Rate Type} and {End Time, Rate Type} and a second table {Rate Type, Court} where {Rate Type} is the key. LR. — Preceding unsigned comment added by 178.119.195.3 (talk) 13:06, 17 July 2016 (UTC)
- I would argue the example does not conform to 1NF because the Rate Type conflates the Court and Member Flag
- AntPraxis (talk) 18:49, 15 February 2021 (UTC)
Why is Rate Type a prime attribute?
editIf a candidate key is the minimum needed to uniquely identify the row and court + start (or end) time is a candidate key then why is count + start (or end) time + Rate Type also a candidate key? The extra column is not needed as a key at all. — Preceding unsigned comment added by Pedzsan (talk • contribs) 03:36, 14 April 2011 (UTC)
- A prime attribute is an attribute which belongs to a candidate key. Since {Rate Type, Start Time} is a candidate key => so Rate Type is a prime attribute.
- {count, start (or end) time, Rate Type} cannot be a candidate key, but only a superkey. Where is it written so in the article? Maybe it's deleted before I saw this article today.
- 石庭豐 (talk) 16:51, 30 April 2011 (UTC)
"Today's Court Bookings" table has more than four superkeys
editCurrently, it's written that "Today's Court Bookings" table only has four superkeys:
{Court, Start Time}
{Court, End Time}
{Rate Type, Start Time}
{Rate Type, End Time}
However, following superkey's definition, there are more than that. So I added them all.
石庭豐 (talk) 15:48, 30 April 2011 (UTC)
- No, they are not all superkeys. The superkey definition states, that a superkey must identify unique tuples in the relation for all possible values. Therefore, {Rate Type, Start Time} and {Rate Type, End Time} are not superkeys. Consider this counterexample:
- {1, 14:00, 15:30, STANDARD}
- {2, 14:00, 15:30, STANDARD}
- They are both identified by the same {Rate Type, Start Time} and {Rate Type, End Time} tuple. — Preceding unsigned comment added by 89.103.72.59 (talk) 10:48, 16 December 2019 (UTC)
- {2, 14:00, 15:30, STANDARD} is not valid in this database because it's defined that all STANDARD bookings are on court 1. Therefore they are superkeys
- AntPraxis (talk) 18:54, 15 February 2021 (UTC)
Can't see why {Court, Start Time} and {Court, End Time} are NOT candidate keys
editAccording to candidate key's definition (and even intuitively), I don't see why {Court, Start Time} and {Court, End Time} are NOT candidate keys. So I included them back.
As a matter of fact, the paragraphs after that make more sense when these two superkeys are ALSO candidate keys.
If someone could mathematically prove that they are not candidate keys, please rectify my change.
石庭豐 (talk) 15:54, 30 April 2011 (UTC)
Incorrect statement about functional dependency?
editIn the "Nearest Shops" example, the following is stated:
>The table is not in BCNF, however, as the Shop type attribute is functionally dependent on a non-superkey: Nearest shop.
The definition of a functional dependency is that X → Y if, and only if, each X value is associated with precisely one Y value. The above sentence states that Shop Type → Nearest Shop. However, the example shows two different 'nearest shop' values for the shop type of 'Hairdresser'. Even thinking about it logically, in the real world there can be multiple shops of the same type.
I believe what the sentence should say is:
>The table is not in BCNF, however, as the Nearest shop attribute is functionally dependent on a non-superkey: Shop type.
I won't make the change myself because I'm not an expert and it's very possible I've misunderstood something and got this wrong. Hopefully someone more knowledgable can weigh in. Frakur (talk) 12:37, 26 May 2020 (UTC)
- The statement about functional dependencies is in fact correct. If X refers to the nearest shop and Y refers to the shop type, each X value has precisely one Y value. In other words, each shop has precisely one shop type. The shop type is not unique between shops. Each X value having precisely one Y value does not mean 2 X values cannot share a Y value. "In the real world there can be multiple shops of the same type" is disproving Y → X rather than X → Y
- AntPraxis (talk) 18:44, 15 February 2021 (UTC)
NP Complete?
editThe article states
> It is NP-complete, given a database schema in third normal form, to determine whether it violates Boyce–Codd normal form.
However I can't see how this could be true based on the Wikipedia definition of NP-complete. One of the features of NP-complete is that the solution can be validated quickly but in this case the "solution" is a boolean value so surely if the solution can be validated quickly then the maximum time needed is 2x "quickly"? Mjaggard (talk) 10:31, 8 February 2023 (UTC)