Primary keys with informational content

edit

A commonly overlooked violation of 1NF is building intelligence into an identifier. The VIN for an automobile is not atomic, besides a sequence number it contains:

1) A code identifying the region where the vehicle was manufactured;

2) A code identifying the manufacturer;

3) Several codes identifying attributes of the vehicle;

4) A code indicating the vehicle model year;

5) A code indicating the plant where the vehicle was made;

Sadly, the VIN is probably the primary key to many tables in existence today. Business persons have a natural tendency to build inteligent keys. There is no mention of this in the main article, it only addresses the classic textbook example of eliminating redundant columns. Mooredc 17:56, 16 August 2006 (UTC)Reply

1NF does not have any prohibition against "intelligent keys". 1NF is only a prohibition against domains which contain relations. A VIN is not a relation, it is a simple string. There may be other arguments against primary keys encoding information, but this discussion is unrelated to 1NF. --80.62.117.218 (talk) 09:44, 12 June 2021 (UTC)Reply
GUIDs contain (or, did in the past) a mac address and a timestamp, does that make a GUID a bad key? Isn’t the question more to do with whether that “intelligence” is actually used for anything? —Random832 20:33, 19 September 2007 (UTC)Reply
In August 2006 when Mooredc wrote this, the article did not contain a section on atomicity; now it does, and hopefully it gives a sense of why Date and others consider it a mistake to hang one's definition of 1NF on the notion of atomicity. 1NF aside, there are many situations in which building "intelligence" into a domain - particularly when it's a primary key column's domain - is a bad idea. If CAR_ID = "RED16", and the "RED" substring refers to the colour of the car in question, then we have a dilemma when someone paints the car blue. The identifier isn't much of an identifier if we are compelled to alter its value from time to time; on the other hand, if we keep its value the same in these types of situations, its meaning becomes muddled. Further, if knowing the colour of the car is important, we may as well expose the colour to the RDBMS by making the colour a column in its own right. You are exactly right when you say that what matters is whether the intelligence is used for anything - if it is, we get problems of the type described; if it isn't, we don't. --Nabav 22:03, 2 October 2007 (UTC)Reply
edit

I don't know if this is the right place to add my two pence, but I was wondering if it wouldn't be a good idea to have some kind of links in the bottom of the article to move from 1NF to 2NF to 3NF and so on, instead of having to go back to the normalization in order to move from the article on one normal form to the other. ray 17:04, 21 August 2006 (UTC)Reply

Glad I'm not the only person thinking that. I'll get on with that template.--VinceBowdren 18:04, 21 August 2006 (UTC)Reply
Obviously, your comment here did get noticed. I've been treating Talk:Database normalization as the place for discussion of this set of articles in general, though I and the others working in this area have all the articles on our watchlists anyway. I don't think its worth setting up a WikiProject for this few linked articles.--VinceBowdren 19:12, 21 August 2006 (UTC)Reply

What if NULL is part of the applicable domain?

edit

I mean, take this:

First Name Last Name Middle Name
John Public Q
J Hacker Random
John Doe

“Has no middle name” is a valid answer, yet having a nullable column seems to violate that requirement —Random832 20:28, 19 September 2007 (UTC)Reply

The first point to make is that Null is never part of the applicable domain. The Null (SQL) article puts this succinctly: 'Since Null is not a member of any data domain, it is not considered a "value", but rather a marker (or placeholder) indicating the absence of value.'
Second, although you are completely right in saying that "has no middle name" is a valid answer, it does not follow that a Null in a Middle Name column is the only way of expressing the proposition. There are other ways of doing it. A logically-sound way that does not involve Nulls is:
Person ID First Name Last Name
1 John Public
2 J Hacker
3 John Doe
4 George Bush
Person ID Middle Name Sequence Num Middle Name
1 1 Q
2 1 Random
4 1 Herbert
4 2 Walker
Notice that with this method we also allow a person to have any number of middle names.
The two tables above are in 1NF even by Date's stringent definition. --Nabav 06:51, 2 October 2007 (UTC)Reply
Agreed. Also even the table in the primary example needn't necessarily contain a Null value in the sense mentioned in the Null (SQL) article to present as such. It could, in fact, contain instead a single character string comprised solely of the "nul" character (\0). Nul is a legitimate member of the set of possible characters in a string (generally serving to mark the end of a variable length string) and is therefore separate of the "Null" value (i.e. the empty data set) MerlinYoda (not signed in) 65.117.116.130 (talk) 21:34, 29 December 2010 (UTC)Reply
Is this "logically-sound way that does not involve Nulls", in which individual words of a person's name are stored in separate rows of some relation, ever implemented in a practical system? If so, which system? And if not, why not? --Damian Yerrick (talk | stalk) 15:58, 5 October 2011 (UTC)Reply
The trouble with this discussion is that the example of a domain which contains null is very wrong and all the discussion is about this straw man. The statement that null is never part of the applicable domain should not be left unchallenged. The definition of NULL in SQL has nothing to do with relational algebra and much to do with SQL. Christopher Strachey (the first director of the OU Programming Research Group, and Oxford's first full professor of computer science) worked on the use of domain theory to obtain a semantics for computation beginning in about 1967 and in later years continued this work in collaboration with Dana Scott when Scott was at Oxford (from 1972 onwards) until his death in 1975, after which Scott continued to elaborate the theory. The result of this collaboration was of course the denotational semantics for which the names "Scott and Strachey" are famous in both the computer science world and the mathematics world. The domains Strachey was using (and those which Scott used when he joined in) always had a NULL element (usually called "bottom"); in fact bottom has become very much a standard concept in the field of declarative programming (which most relational database theorists claims includes their relational algebra). The bottom element of a domain is that element which provides no information other than it is a member of the domain concerned. Since relational database theory (and normalisation too) are, unlike SQL, part of computer science and mathematics, domains in relational theory can and usually do include NULL, although it is extremely good practise to avoid its use wherever possible (which, luckily, is quite often). So now tell me: if I want to represent these domains, including NULLs, why can't I do so and how does it conflict with first normal form? Michealt (talk) 21:55, 1 May 2011 (UTC)Reply

Seems like the use of the word tuple is inconsistent

edit

The article contains the following:

A view whose definition mandates that results be returned in a particular order, so that the row-ordering is an intrinsic and meaningful aspect of the view.[5] This violates condition 1. The tuples in true relations are not ordered.

However, when I followed the link for tuple, the formal definition said that tuples are ordered. —Preceding unsigned comment added by 84.75.117.176 (talk) 10:44, 30 September 2007 (UTC)Reply

The formal definition is talking about how the values within a tuple are ordered, whereas here we're talking about something different, namely the possibility of tuples being ordered with respect to each other. I've changed the wording to make this more clear. --Nabav 22:49, 1 October 2007 (UTC)Reply

So the link for tuple contradicts the second point of the definition - that there is no left to right column ordering - by saying that the elements of a tuple have an intrinsic order. Michealt (talk) 23:57, 16 August 2011 (UTC)Reply

1NF Example is not appropriate

edit

Please revise the 1NF Example as it can be an example of 2NF also. —Preceding unsigned comment added by 122.164.254.49 (talk) 11:14, 1 October 2007 (UTC)Reply

done.Heathcliff (talk) 03:39, 8 February 2008 (UTC)Reply
There was nothing wrong with the 1NF example being in 2NF as well. As has been explained before on the discussion pages of some of the other NF articles, and in the main database normalisation article, normalisation is not an iterative process: we do not normalise to 1NF, then to 2NF, then to 3NF, etc. On the contrary, if a design problem prevents a table from meeting Nth Normal Form, correction of the problem typically causes the revised table(s) to meet not only Nth Normal Form, but the higher normal forms as well (thus for example the revised table(s) will be overwhelmingly likely to be in 5th Normal Form). Thus I've reverted the example back. Having said all this, I believe the article DOES need some additional material that briefly notes that some 1NF tables suffer from problems, and that the job of 2NF is to address those problems. An example of such a 1NF table could be given. All of this would be in a separate section called something like "Normalization Beyond 1NF", and, of course, a link to the 2NF article would be given within that section. --Nabav (talk) 18:49, 29 February 2008 (UTC)Reply
I've added the "Normalization Beyond 1NF" section now. --Nabav (talk) 19:41, 28 April 2008 (UTC)Reply

How is the example given under the heading "Normalization Beyond 1NF" a 1NF table? IF you assume the customer ID is the primary key here, then isn't there a violation of having unque key in rows 2 and 3? - Henry —Preceding unsigned comment added by 192.193.164.8 (talk) 07:52, 7 August 2008 (UTC)Reply

To assume that the customer ID is the key is to assume that Telephone Number must be functionally dependent on Customer. It isn't. The reason it isn't is that multiple telephone numbers can legitimately correspond to a single customer in our example. Upshot: the key is {Customer ID, Telephone Number}. I've made this explicit in the article now. --Nabav (talk) 21:02, 9 August 2008 (UTC)Reply
As someone new to 1NF I find this confusing, because your example here is essentially identical to the original problem in the 1st example. If you are saying that another acceptable solution to make the table conform to 1NF is to alter the key, then that needs to be explicitly stated in the article. —Preceding unsigned comment added by 67.85.254.145 (talk) 16:57, 29 March 2009 (UTC)Reply
Point taken, it was confusing. I've changed the example. --Nabav (talk) 16:50, 31 March 2009 (UTC)Reply

Pure 1NF Example mention seems to me necessary:

1NF Simple Solution of Repeating groups

edit
CUSTOMER_TELEPHONE
Customer ID First Name Surname Telephone Number
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659
456 Jane Wright 555-776-4100
789 Maria Fernandez 555-808-9633

This 1NF design, not in 2NF, is relational and allows further normalization. The new PK is (CustomerId,TelephoneNr). EnriqueVillar Jan 12, 2011 —Preceding unsigned comment added by 88.31.77.59 (talk) 17:32, 12 January 2011 (UTC)Reply

Atomicity

edit

Seems like there should be some mention of the importance of atomicity instead of just explaining that Date disagrees with Codd and listing the ways atomicity can be taken to extremes. Ostensibly, Codd was trying to say that putting a field in a field isn't generally a good idea. For example, you wouldn't want a flight number field to contain the codes for the airports of departure and arrival. That's the common sense way to interpret atomicity. Seems like that should be mentioned first instead of not at all. Then talk about all the theoretical mumbo jumbo that has limited practical value to someone just learning about 1NF. --Trweiss (talk) 22:09, 12 May 2008 (UTC)Reply

Cases in which somebody wants to place more than one of the same type of thing in a single field have been covered in the "Repeating Groups" section of the article. The case you mention, involving placing multiple different types of things (i.e. departure airport and arrival airport) in the same field, arguably hasn't been covered. And covering it might be a good way to introduce atomicity. As an intro to the Atomicity section, we could mention something like the departure airport / arrival airport case, and make the point that this is the sort of thing people are trying to discourage when they encourage "atomicity". So in summary: I agree! We just need to be careful not to give the reader the impression that atomicity is a clearer and more well-defined concept than it actaully is. Atomicity is in the eye of the beholder (a string field can contain multiple letters of the alphabet combined together into words, for example, and no one complains about that) ... If the smaller components of the data within the field are not worth separating out as fields in their own right, then don't; if they are, then do. --Nabav (talk) 08:10, 13 May 2008 (UTC)Reply
Yes Date disagreed with Codd about atomicity, but he expresses what he means no less obscurely than Codd did. His 4th condition "Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else)" is just Codd's atomicity condition (with something intended to help justify his exclusion of nulls tacked onto the end in partentheses) and I am sure Codd would have agreed with Date's wording (apart perhaps from the parenthetical ending). Codd said that a value in the domain should not be decomposable except by some special functions; Date said decomposable values are usually not allowed but are allowed sometimes. I actually found Codd's ideas less vague and undefined than Date's, because it was quite clear that Codd knew what he wanted to forbid (and, in my opinion, quite clear what that was). It's not at all clear what Date wanted to forbid, or even that he had any fixed idea about what he wanted to forbid (apart from forbidding NULL). Nabav's last sentence above seems to be quite a good description of Codd's objectives in atomicity, and also of what I think Date was probably trying to say, and probably ought to be in the article.MichealT (talk) 13:21, 7 March 2010 (UTC)Reply

1NF tables as representations of relations

edit

The requirement that tables represent mathematical relations (no duplication or implicit ordering of rows) is not part of 1NF, but is prior to the idea of 1NF; if Date says otherwise he is wrong. The article should reflect Codd's original article, which proceeds in two steps: first, the proposal to base data modeling on mathematical relations, and second, the proposal of "normalization", later called 1NF, which is the systematic elimination of "non-simple" value domains. The article should not confuse these two things like it does now. Rp (talk) 14:50, 3 February 2009 (UTC)Reply

In general, our NF articles reflect the original text in which the NF was introduced. The original text typically provides an absolutely rigorous definition of the NF. Not so in the case of 1NF, however. Codd never supplied a rigorous definition of 1NF - whereas, in his "Further Normalization..." article, he DID supply rigorous definitions of 2NF and 3NF.
What you get in Codd's "Relational Model of Data for Large Shared Data Banks" article is a rather informal description of a procedure which Codd called normalization. The description of the procedure is reliant on highly ambiguous notions such as that of a "non-simple domain", which is related to the equally ambiguous notion of "atomicity". Because of this ambiguity, it is quite impossible to make definite judgments as to whether a table is in (Codd's) 1NF or not.
Date's definition is not an attempt to express Codd's definition in different language. Rather, it is an attempt to define a version of 1NF that rests on unambiguous criteria and is (from Date's point of view) sensible. Fidelity to Codd's intentions is not particularly relevant here. Date disagrees with Codd on certain points, and would not claim that his definition of 1NF is in line with all of Codd's intentions (whatever they may have been).
Incidentally, I believe that in stipulating as part of his 1NF that a table has to be isomorphic to a relation, Date IS in agreement with Codd. The distinction you are making - between "a requirement prior to X" and "a requirement of X" - cannot hold. "A requirement prior to X" clearly IS a requirement of X. For example, the property of "being a fisherman" is a property of human beings, and I'm sure you'd agree that the idea of being human is prior to the idea of being a fisherman. But we speak the truth if we cite "being human" as one of the criteria for being a fisherman. Most definitions would not bother mentioning it, but a definition that does mention it is not wrong. ("A human being who fishes by profession" - this would not be wrong.)
This is exactly what Date is doing. And one can see why. Both he and Codd think that 1NF is a property of relations - well then, anything that is not a relation clearly cannot be in 1NF. A table with duplicate rows is not a relation. Therefore a table with duplicate rows is not in 1NF. Of course, at the time Codd was writing, there was no point even mentioning this, because SQL tables (which of course can accommodate duplicate rows if no unique key constraint has been defined) did not exist at that time! But they certainly do exist now, which is why it makes sense to say explicitly that a table that can accommodate duplicate rows is not in 1NF. --Nabav (talk) 15:21, 6 February 2009 (UTC)Reply
I disagree. Regardless of his inaccurate description, Codd's notion of 1NF is a property of mathematical relations, namely the absence of "non-simple" attribute values, and this is how mathematical treatments of relational databases use the term. Therefore, Date's extension of the notion, which includes certain requirements on implementations of mathematical relations, is confusing, and the article should be amended to correct this. Wikipedia should explain terms as they are normally used, and not restrict itself to particular ways of using them, especially when they are not the most common. At the same time, Date's books are so popular that his use of the term must also be explained here. Rp (talk) 10:03, 9 February 2009 (UTC)Reply
Perhaps you are right that the article currently doesn't devote enough attention to Codd's original account of 1NF. This can be corrected. However, I'd just sound one note of caution. When you say that Wikipedia should explain terms as they are "normally" used, I basically agree with you; but I don't agree that the term "first normal form" is normally used in either the way Codd uses it in his "Relational Model of Data for Large Shared Data Banks" article, or in the way Date uses it. The normal usage of the term, the most popular usage I have come across, is as a piece of practical database design guidance, according to which (for instance) examples 2 and 3 in this article are "violations" of 1NF - even though none of the attributes in those examples are of type RELATION or TABLE. The best one can say about such examples is that they somehow contravene the spirit of 1NF, if not the letter. Getting the right balance in an article about 1NF is very tricky, because the article must do justice not only to popular examples of that kind, but also to Codd's original characterization of 1NF, and also to Date's new definition of 1NF - all with a view to illuminating the actual implications of 1NF for databases, because those implications are what give 1NF its importance in the first place. Each of those three understandings of 1NF is controversial in its own way. Nevertheless, all we can do is continue to tweak the article until we do get the balance right! --Nabav (talk) 11:23, 9 February 2009 (UTC)Reply

Isomorphic to a relation

edit

There is a blatant contradiction between the between "isomorphic to a relation" and number 2 in the list of 5 conditions. A relation is nothing more or less that a subset of the cartesian product of several domains: each element is therefore an ordered finite sequence of values. The attributes don't have names. If a table is to be isomorphic to a relation, the columns must be ordered. This is an error on Date's part.

A better statement would be that a table must be a map between attribute names and values. One could also say that when a total order is assigned to the map's set of attribute names and is used to order the values in each each map element so as to produce a tuple (sequence of values) for each map element: the set of such tuples is required to be a relation (a fairly meaningless statement, as it's easy to show that any such construction from a finite map will generate a relation).

Of course it's possible to get the attribute names into a relation (replace each attribute domain by the cartesian product of the singleton set containing the attribute name and the original attribute domain) but the description is rather far-fetched and requires a lot of mental gymnastics to explain how one can handle the concept os the domain with attribute-name "pet" in one table holding a value equal to one in the domain with attribute-name "dog" in another table (redefining equality is such fun) and even then the order is still there - there's no way of eliminating it from a relation. So I prefer to say that a table must be (isomorphic to) a set-theoretic map, so that the fields are accessed by name and not by order.MichealT (talk) 13:04, 7 March 2010 (UTC)Reply

Inaccessible lead needs improvement

edit

The lead paragraph is too technical. It is my understanding that an 1NF can roughly be summarized as a table where the fields only contain one piece of information (no lists items). This statement may not be technically rigorous but it at least tries to capture the gist of the situation so that beginners can gain a basic understanding of the topic. The sentence, "A [1NF] table is a faithful representation of a relation and that it is free of repeating groups" absolutely inaccessible to the bulk of the readership. It is an instant turnoff that belongs nowhere in the lead. Technical jargon should be appear after a more down-to-earth introduction. I don't feel like I have the background to wisely word a less technical lead, so hopefully one of you can do it. Jason Quinn (talk) 14:48, 16 June 2011 (UTC)Reply

Perhaps Date puts it better, when he writes

"At every row-and-column position within the relation [table], there is always exactly one data value, never a set of multiple values. In other words, first normal form just means no repeating groups (loosely speaking).

:Of course, even this could be misleading to a beginner, because a value may be a relation value (and may therefore contain more than one primitive values); so to give beginners a basic understanding of the topic (i.e. not lead them into believing things which appear obvious but are fundamentally incorrect), we might—counterintuitively— need to give more, rather than fewer, technical details. This is because the beginner might think it is obvious what "data value" means, where it is not.--Boson (talk) 20:37, 16 June 2011 (UTC)Reply
Physics articles have a similar issue of trying to make technical topics accessible. This is done with some success there. Between the two, database concepts are far less complex. I see no reason why simple introductions cannot be given for the various normal forms. From my own reading of beginner's material on databases, I think that obfuscation through jargon is endemic to the whole field. It is almost as if the authors are trying to make the subject more complex than it is. It's just bad writing. I took courses in mathematics where the foundational mathematical concepts that underlie relational databases were presented with far more clarity and ease of access than some introductory database books manage to achieve through their "practical" approach. That's ass backwards. I do not see giving a non-precise "working" definition before a precise one as a problem. (Imagine if we did that with the real numbers: "Hey kids, the real numbers are the unique complete Archimedean ordered field up to isomorphism".) From a pedagogical perspective, it is necessary to motivate a complex topic by using less-complex examples. If some precision is lost because of this, it can always be clarified later. The value of learning by example cannot be underestimated. There's no good reason that examples of 1NF and non-1NF tables cannot be given before the technical sections. The human brain is a learning machine. We are able to extract abstract concepts from concrete examples even when the cause of the differences is not known. The examples that are given in the "Repeating groups" section might as well not exist at all because the technical jargon surrounding them has already alienated most of the audience likely to read this article. They should be moved up and outside of that section. Jason Quinn (talk) 02:06, 17 June 2011 (UTC)Reply

Primary Keys

edit

One big difference between the Date and Codd formulations of first normal form is that Codd regarded primary keys as crucial - every relation must have one or more primary keys (today those would be called candidate keys) and the definition of the relation must specify which of these primary keys is to be the primary key. The importance of the primary key is that it, together with an identification of the relation, is the means by which a row is identified by a user of the relational system - so the primary key is something meaningful to the user. As a consequence of the requirement for a primary key, a relation can not contain duplicate rows. The Date view seems to be that it is fundamental that there are no duplicate rows; so there is at least one superkey (the whole row) and so there must be at least one minimal superkey, that is at least one candidate key, and any one of those can be picked as the primary key. When it comes to views, since duplicates are eliminated from projections there is still guaranteed to be a primary key. Superficially, these two approaches look the same; in fact they turn out to have very different consequences, and probably explain why Codd could envisage a relation permitting attributes that could be NULL, while this is anathema to Date (Codd in his later work doiesn't require derived relations to have primary keys - but Date's starting from the position that there can never be duplicates and needing the existence of a primary key to be a consequence of this for base relations, can't allow nulls); it also leads to a difference of views about whether the meanigfulness (to users) of primary keys is or is not fundamental. It is a great pity that teh article does not treat primary keys at all - but I suppose it is an inevitable consequence of it s being written from a perspective based on Date's approach, instead of Codd's. I think the article could be greatly improved by inclusion of both approaches, instead of just one.Michealt (talk) 00:21, 17 August 2011 (UTC)Reply

edit

Hello Jerome Potts. The new lead sentence references normal form. This is likely to confuse new readers who will feel the need to read the Database normalization#Normal forms section in order to understand what first normal form is. It isn't easy to understand the linked section, and it isn't necessary in order to understand the definition. It is sufficient to say 'First normal form is a property of a relation', and reference the normal forms section later, for further reading. - Crosbie 18:43, 21 March 2013 (UTC)Reply

Statement of First Normal Form is Too Narrow

edit

The second sentence of the main article reads: "A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain."

There are two problems with this:

  1. Neither the relational model nor any of the well-known databases allows anything other than an atomic value as the domain. (For the present purposes, consider the newest additions to SQL, such as row and array values, to be atomic.) As the main article says, "A typical relational database management system will not allow fields in a table to contain multiple values in this way." Therefore, a reader would puzzled as to why he or she is being warned against doing something that isn't possible. (Like the sign on an elevator that says: "Out of order. Do not use.")
  2. The real problem is with tables that have columns that seem to repeat, such as a Department table with columns like Employee1, Employee2, etc. This badly designed table satisfied 1NF as the rule is stated here. This is because Employee1, Employee2, etc., are distinct columns and the rule is exclusively about what goes on in a single column. Also, Employee1, Employee2, etc., are distinct attributes. In many writings on relational database theory, attributes are given as A1, A2, etc., a naming scheme that suggests that the Ai are a list, just as the Employeei are.

So, something should be done to this page, which I'm willing to do, but I thought some discussion might be in order first. The choices that come to mind are:

  • Add a note that 1NF can never be violated, and that the bad design (Employee1, Employee2, etc.) that is normally thought to be in violation of it in fact is not, but that bad design is still bad, as are many other kinds of bad designs that happen to be normalized but are still bad.
  • Change the statement of 1NF so it somehow applies to the case where the columns (attributes) are distinct, but in violation of the rule because they appear to be the same attribute repeated.

It will be hard to phrase my second choice because while Employee1, Employee2, etc., appear to be a repeated column, in fact they are not. An example that better illustrates the problem would be a table with columns FirstName, MiddleName, LastName, which is very frequent. Is every table in the world with these columns in violation of 1NF? Probably most people would say they are not. But, suppose it's observed that first, middle, and last names don't apply to many cultures, and therefore better column names would be Name1, Name2, Name2. Not it looks like it's in violation of 1NF. But it's a bad rule that depends on how the columns are named. A good rule should be about the structure of the table, not what choices are made for column names (as long as they're distinct, that is). — Preceding unsigned comment added by Rochkind (talkcontribs) 14:02, 7 May 2013 (UTC)Reply

The current definition is sourced to Fundamentals of Database Systems, Fourth Edition. If we tinker with the definition it is no longer reliably sourced - it is original research. As you say, the article already states 'A typical relational database management system will not allow fields in a table to contain multiple values in this way'. It would even better to have a source for this statement also. - Crosbie 18:06, 7 May 2013 (UTC)Reply
So you'd go for my first bullet point? Rochkind (talk) —Preceding undated comment added 21:21, 7 May 2013 (UTC)Reply
Perhaps it is analogous to stating that a properly made three-legged stool has three legs, but I think we need to be very careful. Interpretation of "atomic" may also vary (historically). And I think we should be careful (probably more careful than the article currently is) to treat normalization as conceptually independent of (and prior to) an actual database. After all, relations do not contain nulls (because nulls are not values), but "relational" databases may do so. I think it is OK to state that relations are always in first normal form, but - at the stage of design where normalization takes place - it is something that has to be achieved deliberately. In other words, you can use some notation for depicting a relation and come up with something that is not in first normal form. Checking against the rules for first normal form is one of the tests to see if it is a valid relation. You don't check to see that it is a relation and then conclude that it must therefore be in first normal form. I take your point about "Employee1" and "Employee2", but I think we need to consider the concepts. We need to relate "Employee1" and "Employee2 to attributes of a conceptual data model and define the concepts in a meaningful way. And we need to avoid nulls. --Boson (talk) 23:25, 7 May 2013 (UTC)Reply
Agree. But the problem is that someone might be told that a database with employee1, employee2, etc., is not in First Normal Form (what just about everyone would say), and then going to Wikipedia, would find out that it IS in First Normal Form, even though it is obviously a bad design. If First Normal Form is not the rule that one uses against employee1, employee2, etc., then what is? Normalization may be about relations (conceptual model), but in practice it is also used on tables (physical model). Is this article only about mathematics, or is it also about computer-system development? Perhaps I should add a section about First Normal Form in Practice and see what people think of it? Rochkind (talk) —Preceding undated comment added 23:32, 7 May 2013 (UTC)Reply
There may be no absolute definition of atomicity, but if your data model has an entity "Employee" with, say, a one-to-many relationship between Department and Employee, and your database is meant to allow normal queries about employees (which implies the use of relational operations by the DBMS when implementing queries) then I would say that use of Employee1, Employee2 etc. (as foreign keys) is (normally) denormalization, so it does violate 1NF. I don't think that is changed by the fact that you can trick the DBMS into not noticing what you are really doing at the logical level. It is different if the "columns" really represent different concepts (as with Name1, Name2 used for the names written in different positions of the full name), and it is different if a set of employees is meant to be treated only as a unit. --Boson (talk) 11:35, 8 May 2013 (UTC)Reply
Couldn't agree more. But don't you think it's a problem that our Wikipedia page here says nothing whatsoever about this interpretation of 1NF? Most database-design books and articles discuss what's wrong with my table under the heading of 1NF. Rochkind (talk) 15:27, 8 May 2013 (UTC)Reply
I think I now better understand what you are saying. I agree that it is a problem. --Boson (talk) 23:43, 8 May 2013 (UTC)Reply
Rochkind - when you say 'Most database-design books and articles', it would be useful to provide a reference or a link to an example. Similarly, any changes to the article should be well-sourced. - Crosbie 05:16, 9 May 2013 (UTC)Reply
They will be. Rochkind (talk) 01:12, 11 May 2013 (UTC)Reply

I think it will be difficult to explain this to the average reader without discussing the relationship between the real world and the physical tables, via the conceptual data model and the logical (ER) data model, etc. It is easy to see that

  • Each employee is a person
  • An employee is either an active employee or an inactive employee
  • An active employee is employed in exactly one department

can lead to different types of employee, but - unless the table is seen in isolation - it is not easy to see how you can end up with a table containing a column Employee1 and a column Employee2, both of which must in effect be foreign keys referring to the same table, unless the two references derive from two different relationships (for instance if Employee2 expresses a relationship like "manages/is managed by"). To keep the discussion at the level of the relation, perhaps it would be useful to introduce the concept of "external predicates", the word "external" indicating that we are talking about "what relations mean to the user, rather than to the system". The "heading" of a relation "can be regarded as denoting a certain predicate . . . [and ] each tuple in the body . . . can be regarded as denoting a certain proposition (i.e., a statement that is unconditionally either true or false)" (this can be sourced, for instance, to Date, Darwen, Lorentzos: Temporal Data and the Relational Model).
Another thing that occurs to me is the historical perspective, which may help explain why certain ideas about atomicity may have evolved. I'm not sure how much can be stated without involving original research, but I suspect that the development of object-oriented systems served by relational back-ends and the "integration" of data models and object models may have helped emphasize the idea of single devoper-defined operations on complicated data types, supporting a different understanding of atomicity.
Perhaps we also need to separate different perspectives:

  • that of the "relational mathematician"
  • that of the system designer, explaining things in terms of the data model
  • that of the database implementor, explaining things in terms of tables, foreign keys, etc.
  • that of the man in the street (or on the Clapham omnibus), who needs the above perspectives explained in layman's terms

We might need to think about what information goes in which articles (including Database normalization, Relation (database), and the various normal forms). After all, this article and the other nNF articles are sub-articles of Database normalization. Perhaps all the articles should put more emphasis on the process of normalization - stressing that you don't start with the final database, but without giving the impression that one actually (as opposed to notionally?) progresses through different normal forms. At the moment I would be hard-pressed to define the readership that this article is targeted at. --Boson (talk) 15:40, 11 May 2013 (UTC)Reply

Is tree representation not in 1NF?

edit

As per the article, Date’s definition states that there’s no top-to-bottom ordering of the rows. If that’s the case, is a common representation of a tree structure, such as the table below, not in 1NF?

Person
ID Name Mother ID
1 Jane Doe
2 Alice Doe 1
3 Bob Doe 1
4 Edgar Doe 2

I find it hard to believe so there must be something else going on. It would be good if someone explicitly mentioned this or gave an example of a table which is not in 1NF because of the top-to-bottom ordering.— mina86 (talk) 16:16, 9 September 2015 (UTC)Reply

I would understand the definition to mean that the table that you gave is the same as this table:
Person
ID Name Mother ID
3 Bob Doe 1
4 Edgar Doe 2
1 Jane Doe
2 Alice Doe 1
--Boson (talk) 21:40, 9 September 2015 (UTC)Reply
PS: The null value for Mother ID would also be a problem, but that is a different issue. --Boson (talk) 21:47, 9 September 2015 (UTC)Reply
D'oh! So it was just a case of me being silly. Thanks; and forget I brought it up.— mina86 (talk) 23:06, 9 September 2015 (UTC)Reply
edit

Hello fellow Wikipedians,

I have just modified one external link on First normal form. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 15:15, 1 October 2017 (UTC)Reply

Examples to explain the requirements

edit

As someone coming here to learn about this stuff for the first time, with my only related knowledge being professional experience with some pretty basic SQL, I read the whole article a few times and still am not sure I know what 1NF is.

This is a common experience. When you read the original author, Codd, the meaning of 1NF is fairly clear: the values in table cells should be atomic. However, what this means exactly is open to dispute, and to make matters worse, other authors tend to list additional design criteria under first normal form, turning it into an umbrella term. As a result, the meaning of the term varies from author to author. Rp (talk) 15:33, 4 November 2018 (UTC)Reply

I think some tables violating each of the different restrictions would be helpful, especially with corrected versions. The only example given is about atomicity, but also makes a kind of confusing note about the spirit of related groups being violated, but maybe not technically being violated?

Eliminate repeating groups in individual tables

edit

Is this table in 1NF? Does it violate the Eliminate repeating groups in individual tables rule from the bullets in the lead? Additionally, does it violate the Identify each set of related data with a primary key rule?

Name Grade
Paul 3
Paul 3
Alice 4
No, this is not what repeating groups refers to. This design is right out even before you start talking about normalization. In the relational model, relations ("tables") are sets of tuples ("rows"). So what you're looking at here is the same as
Name Grade
Paul 3
Alice 4
and also the same as
Name Grade
Alice 4
Paul 3
They are all different visual representations of the same thing: the relation
 
In SQL databases, duplicated rows are possible, and rows are ordered. Hence, some authors say: treating SQL tables like sets, by purposely not making use of the order and possibly multiplicity of their rows, is part of the requirements of first normal form. It really is not. It would be better to call this 0th normal form, if you want to give it a name. It simply doesn't arise in Codd's original model. Rp (talk) 15:33, 4 November 2018 (UTC)Reply

Related, would this be an example of violating the Eliminate repeating groups in individual tables rule?

Name Guardian Guardian
Paul Sarah Michael
Paul Jaime Dora
Alice Bob Charlie
This, once again, depends on the definition you happen to have in front of you. As explained here, this is not what the original author. Codd, meant; his first normal form only refers to what can be in the individual cells of a table. For him. a repeating group was multiple values occurring in the same table cell, like this:
Name Guardians
Paul Sarah, Michael
Paul Jaime, Dora
Alice Bob, Charlie
Some authors say: sure, that clearly violates first normal form, but creating multiple columns for those values is a wrong way of fixing it, and Codd didn't address that in his normal forms, so let's sweep it under the rug of first normal form as well. That's what the present text of the article is trying to say. Rp (talk) 15:33, 4 November 2018 (UTC)Reply
edit

What is an example of something that violates Create a separate table for each set of related data? Based on the atomicity section, It looks like both of these sets of relations are allowed, even though Option 2 appears to have data that can be separated into a different table.

Option 1:

Customer Name
Customer ID First Name Surname
123 Pooja Singh
456 San Zhang
789 John Doe
Customer Telephone Number
Id Customer ID Telephone Number
1 123 555-861-2025
2 123 192-122-1111
3 456 (555) 403-1659 Ext. 53
4 456 182-929-2929
5 789 555-808-9633

Option 2:

Customer
Customer ID First Name Surname Telephone Number
123 Pooja Singh 555-861-2025
123 Pooja Singh 192-122-1111
456 San Zhang 182-929-2929
456 San Zhang (555) 403-1659 Ext. 53
789 John Doe 555-808-9633
I don't understand this criterion. 1NF does not indicate a preference for either design. However, many people would frown at your ("gratuitous") introduction of a surrogate key in option 1, and if you take it out, the result violates 2NF. So they might say it violates 2NF. No doubt you can find people who would call the introduction of surrogate keys a violation of 1NF, but that isn't what Codd meant by it. Rp (talk) 15:33, 4 November 2018 (UTC)Reply
edit

Would this violate the primary key requirement? Why?

FirstName LastName Grade
Paul Smith 3
Paul Rodríguez 3
Alice Oto 4
No, it doesn't. No single column in this design constitutes a primary key, but that's totally fine. Even
FirstName LastName Grade
Paul Smith 3
Paul Smith 4
Paul Rodríguez 3
Alice Oto 4
doesn't violate any normal form; in this case, the primary key is  . (Something looks wrong with this design, but that doesn't make it a violation of first normal form.) Rp (talk) 15:33, 4 November 2018 (UTC)Reply
I think the intention of this requirement is to say: always choose a primary key, and if   is actually a key for this table, then you should say it is the primary key. Which is very well and true, but not part of what Codd meant by 1NF. I think you can spot the pattern here. Rp (talk) 15:01, 5 November 2018 (UTC)Reply

nhinchey (talk) 19:55, 31 October 2018 (UTC)Reply

I hope this clears up things a little. I'm not sure how to amend the article. Perhaps by separating Codd's notion of the term from criteria that later authors added? Rp (talk) 15:33, 4 November 2018 (UTC)Reply

Informal language in definition of conditions to satisfy 1NF

edit

Does anyone else feel that the use of contractions in "1NF tables as representations of relations" chapter's 5 rules is inappropriate for a formal article such as this?

I wonder why it is written with contractions "there's" in the first 2 rules but without them in the other rules, e.g. "there's no top-to-bottom" instead of "there is no top-to-bottom"

I am not a native English speaker so it is possible I am drawing the wrong conclusions here... Aethalides (talk) 19:01, 15 December 2022 (UTC)Reply