Talk:Constant folding

Latest comment: 10 years ago by Sunrise in topic RfC: Mention string literal concatenation

Constant folding

edit
(added title. —Nils von Barth (nbarth) (talk) 04:19, 11 July 2014 (UTC))Reply

The below article appears to be more correct, so I'm moving it here.

Constant folding is the optimization done by compilers in early stage of the compilation of a program. In C it is the optimization that makes it possible to have constant expressions in array-declarations, like:


#define WIDTH 320
#define HEIGHT 240
char buffer[WIDTH*HEIGHT];


Constant folding is similar to constant propagation, however constant folding must be done before the high-level language is translated to three-address-code to make code like above work. Constant propagation is done on the three-address-code (preferably on SSA form)

return y * (28 / 14 + 2);

edit

Shouldn't that be "-2"? RainCT (talk) 19:43, 20 May 2012 (UTC)Reply

The expression "7 - x /2" is evaluated as (7 - (x/2)), so y is zero. Second term, parens are 4; 0 times 4 is zero. Glrx (talk) 17:30, 21 May 2012 (UTC)Reply

Constant folding and string literal concatenation

edit
Copied from User talk:Glrx/Archive 7#Constant folding and string literal concatenation . Glrx (talk) 15:31, 10 July 2014 (UTC)Reply

Hi Glrx, in this edit, I’ve taken another shot at discussion string literal concatenation in the context of constant folding. I’ve kept the main text brief, focusing on the similarities (it’s solving the same problem), and relegating the C-specific details to footnotes, so it’s less distracting. How does it look?

—Nils von Barth (nbarth) (talk) 17:38, 6 July 2014 (UTC)Reply
Nils, the addition is off topic; it is not about constant folding; it is about the syntax of constants. Even you made the comment "implicitly" concatenated. (The article you link also has Python considering deprecating the operation.)
The idea of constant folding is the language allows expressions that an operation (e.g., a + b, 2 * c, 3 + 4, 2 * 7) be done at runtime, the compiler recognizes the operation can be done at compile time, and the compiler does it. In C, there is no explicit string concatenation operator for runtime. Some languages have an explicit string concatenation operator that expects to be done at runtime. The user might write "Hello "+strWorld or "Hello " || strWorld, and the compiler will emit code to allocate the string and do the concatenation. If the compiler knows that strWorld is a constant, then the compiler can do the operation at compile time and just emit the constant result.
A C/C++ example is not doing that because no runtime concatenation operator involved. Neighboring quoted strings are just the syntax of specifying long strings (or building up a string).
Glrx (talk) 20:04, 6 July 2014 (UTC)Reply
I understand that there are differences between constant folding and string literal concatenation, as you note – they are distinct concepts and accordingly have distinct articles.
However, the similarities are also striking – in both cases, during compilation, string literal tokens (generated separately and evaluated during tokenization/evaluation) are then combined at a later step of compilation, and these two distinct mechanisms satisfy the exact same purpose: allowing a complex value (here a long string) to be built up in a complex way (perhaps using functions or, in the case of C/C++, preprocessor macros) but still evaluated at compile time.
With CF, you might write:
foo(x) { return "foo:" + x }
foo("bar")
…while in C you'd write:
#define FOO(x) "foo:" x
FOO("bar")
…but these both fulfill the same goal in very similar ways.
Thus it would be useful to give readers some way to learn about SLC when reading about CF. It would be possible to just put a link to SLC in the “See also” section, if that’s necessary to avoid distraction, but it’s generally preferred to try to incorporate these into the text, hence my proposed edits.
How would you suggest that readers learning about CF should learn about SLC?
—Nils von Barth (nbarth) (talk) 04:08, 7 July 2014 (UTC)Reply
They are separate ideas, and there's a lot of ugliness in the examples above. If an operation is never intended to be executed at runtime, then it has no business in an article about compiler optimizations. If it could never run, then there is nothing to optimize.
The C preprocessor is an expedient hack that is far away from any compiler optimizations. In the old days, it was a separate pass, so the C parser would never see the defines or their invocations -- just the results. As for instances of SLC, I'd expect them to be handled during parser reductions. A remark about C in the CF article seems headed for confusion.
I have seen programs where programmers use string concatenation operations on literal strings as a convenient method of building up a longer string. The use was esoteric. One would like to see the compiler optimize the string, but there's no great penalty if it doesn't.
I want things to be clear.
In a somewhat related matter, code optimization of string concatenation will not be a big benefit in most situations. As I understand it, the big benefits of code optimization were with array index calculations: lots of mundane index arithmetic could be saved. The same opportunites for optimizing string concatenation probably are not there.
Glrx (talk) 17:10, 7 July 2014 (UTC)Reply
Understood – the C preprocessor is a part of compilation (broadly speaking) that’s not relevant these days outside of C/C++, and is generally confusing and baroque, though these later are still in widespread use (avoiding the preprocessor as much as possible). (Historically it’s part of the transition between macro assemblers and proper compilers.) To avoid disrupting the flow of the article, shall I put a link to SLC in the “See also” section, perhaps with an (HTML) comment to the effect of “please don’t conflate this with CF”?
—Nils von Barth (nbarth) (talk) 02:08, 8 July 2014 (UTC)Reply
BTW, for reference: discussions of removing SLC from Python and D specifically mention replacing it with constant folding, as in:
—Nils von Barth (nbarth) (talk) 04:45, 10 July 2014 (UTC)Reply
In this proposed edit (diff), which I’ve reverted (it’s a proposal), I’ve given a very brief mention of SLC:
“A superficially similar feature is string literal concatenation, which concatenates adjacent string literals during lexical analysis, for example replacing "abc" "def" with "abcdef".”
This shows the similarity, but clearly flags it as distinct and how, and avoids any technical discussion of C.
How does it look?
—Nils von Barth (nbarth) (talk) 04:45, 10 July 2014 (UTC)Reply
I'm opposed to the edit. Your sources do not say the operations are superficially similar. I don't consider them reliable sources about CF. What they say is that SLC may make bugs difficult to detect. I'm not sure I buy into that premise: spotting - instead of + is also difficult. One source also did not want to make CF a required optimization.
The sources don't say what you want. They say SLC is dangergous and could be replaced with explicit concatenation. SLC does not belong in the CF article because it is language syntax rather than a compiler optimization.
I think you like the parallel, but don't talk about it in CF article.
I'm copying this discussion to CF; I should have moved it there earlier.
Glrx (talk) 15:29, 10 July 2014 (UTC)Reply
We seem to be at loggerheads; per WP:DR, shall we ask for a third opinion (WP:3O)?
—Nils von Barth (nbarth) (talk) 04:36, 11 July 2014 (UTC)Reply
You can ask for a 3o until someone else jumps in. However, where are the WP:RSs that support your edit? You admit the topics are different, and your edit is an awkward one that makes a negative claim that they aren't the same. That's bringing up confusion and then saying it's different. If they are different, then why bring them up? The 3 sources listed above are mailing lists, they are not reliable secondary sources, and they aren't really on point. They are talking about the evils of SLC rather than CF. You want to tell people about something that you see is neat, but that runs up against WP:UNDUE, WP:OR, and WP:SYN. Glrx (talk) 05:00, 11 July 2014 (UTC)Reply
Dear 3O, to summarize the discussion as I understand it:
  • I (Nils) believe that there should be a mention of and link to String literal concatenation (SLC) from Constant folding (CF), as these are similar enough concepts that someone interested in constant folding would benefit from a mention of string literal concatenation.
  • Glrx believes that there should be no mention of or link to String literal concatenation from Constant folding, as these are distinct enough concepts that this would cause confusion and provide little or no benefit.
—Nils von Barth (nbarth) (talk) 00:11, 13 July 2014 (UTC)Reply
To state my case and the root of the disagreement as I understand it:
  • CF converts code like "foo" + "bar" to "foobar" at compile time.
  • SLC converts code like "foo" "bar" to "foobar" at compile time.
These are ostensibly superficially very similar, and solve the same problem, as reflected by above-mentioned sources. Thus a brief mention of SLC on the CF page is warranted, such as this proposed edit (diff), which I’ve reverted (it’s a proposal):
“A superficially similar feature is string literal concatenation, which concatenates adjacent string literals during lexical analysis, for example replacing "abc" "def" with "abcdef".”
Other alternatives would be a sentence and a footnote (if explanation is necessary to elaborate), or simply listing in “See also”.
This functions largely as a backlink: SLC is a technical and largely historical feature (originally C), which people are proposing to remove from modern languages (D and Python) because it can be replaced with CF.
The topics CF and SLC are separate, and have distinct implementation, and thus are correctly separate pages, but given the similarities, linking both ways is helpful.
Glrx and I differ on two points, AFAICT:
  • The similarity/distinctness of the concepts.
  • The weight of the proposed edit compared to the relevance of the issues and weight of the sources.
I believe that a one-line mention is warranted (not UNDUE), given the close similarity, and is sufficiently supported by sources: this is a light-weight claim, and there are multiple acceptable sources supporting this point, so it is sufficiently RS.
In terms of sources, this is a minor technical point, primarily of interest to compiler development of existing languages. Both languages that implement SLC and CF of string literals (D and Python) explicitly suggest replacing SLC with CF in their main fora (mailing lists and bug trackers), so this is not OR or SYN.
Glrx, as I understand it, believes that these are sufficiently different that any mention is unacceptable, and effectively requests a quote from a textbook on compilers that states exactly this point. I believe that this is unreasonable given how technical the point is and how minor the proposed edit is.
(Glrx, please feel free to state the case as you understand it; I’ve refrained from elaborating your position.)
—Nils von Barth (nbarth) (talk) 00:34, 13 July 2014 (UTC)Reply
  • Comment. I'm not sure how to react to the above. It waives the requirements for sources. It invests a trivial convention for specifiying strings constants at lex time with a deeper issue of constant folding. SLC is not an issue that affects optimization. It is an issue in only a small number of languages that adopt SLC. Nils offers a poor summary of my position.
Nils edited the article on July 4.[1] It took a true statement about CF constant strings and injected a confusing statement about CF not being done in C or C++. The comment is about SLC issue and not about CF. I reverted it.
Nils edited the article on July 6.[2] This has SLC being a "variant" of CF. It is not constant folding. There's a waffle that SLC uses an implicit concatenation operator rather than an explicit one. That confuses syntax (implicit is really an explicit instruction to the compiler to concatenate at compile time).
BTW, the article's existing statement that "In some compilers, constant folding is done early so that statements such as C's array initializers can accept simple arithmetic expressions" is also wrong. Languages such as C have specifications for constant expressions that can be used to set array sizes and initializations. It is not part of the computational expressions; a programmer may use special operators such as sizeof() in those expressions, but the expression were never expected to be evaluated at runtime. A C programmer cannot pass in an integer argument and use it to declare an array; the C programmer must explicitly allocate the array.
I also have trouble with "Constant folding can be done in a compiler's front end on the IR tree that represents the high-level source language, before it is translated into three-address code, or in the back end, as an adjunct to constant propagation." The problems with that statement are much deeper.
Nils edited the article on July 10.[3] This edit takes the earlier statement about constant folding strings and adds the claim that SLC is superficially similar statement. It is distracting, incomplete, and unneeded. Instead of giving an example of where CF strings is usefully employed, it describes something that is superficially similar. The description is dense. The advantages of CF strings is minimal. There is a reason that language designers don't want to require CF strings: it's not that useful; there are simpler methods.
The comments Nils wants to make apply to SLC, so those comments belong in the SLC article (where they are restricted to handful of languages that employ SLC). His sources address SLC rather than CF.
Glrx (talk) 17:34, 15 July 2014 (UTC)Reply

Third opinion

edit

I have not tried to understand the complete arguments from both parties but I do find it hard to understand how an internal wikilink can be problematic; they are how WP works. We often have links from an article to similar concepts, related concepts, different concepts, and diametrically opposite concepts. It is up to the articles themselves to make the similarities or differences clear. If you think that I have completely misunderstood the dispute, please inform me. Martin Hogbin (talk) 09:51, 13 July 2014 (UTC)Reply

You misunderstand the dispute. Constant folding, in concert with other compiler optimizations, can offer reasonable performance gains in array calculations. In compiler textbooks, you'll find a lot about optimizing integer arithmetic expressions. There's an aside that other operations can be CF: strings, for example. There isn't a lot (if any) about optimizing string expressions. (Sorry, my dragon book is not handy; I don't think it says anything about optimizing strings). Then the CF string digression is pulled on still more to say CF and SLC are different. We are too far afield. SLC is a hack to avoid line breaks within strings; it removes doubt about whether there's a trailing space at EOL or which combinations of CR and LF should be used. Googling("string literal concatenation" "constant folding") turns up 4 page of hits. It is not a due topic for this article. Glrx (talk) 17:34, 15 July 2014 (UTC)Reply
Sorry, I do not understand what the disputed text is or how what you say relates to a dispute? Could you clarify please. Martin Hogbin (talk) 13:04, 17 July 2014 (UTC)Reply
Hi Martin, thanks for your comments, and apologies for the delayed response!
The disputed text is this proposed edit (diff), which I’ve reverted (it’s a proposal):
“A superficially similar feature is string literal concatenation, which concatenates adjacent string literals during lexical analysis, for example replacing "abc" "def" with "abcdef".”
(Right, Glrx?)
—Nils von Barth (nbarth) (talk) 04:12, 22 July 2014 (UTC)Reply
What is the objection to having that text in the article? Martin Hogbin (talk) 08:37, 22 July 2014 (UTC)Reply
I'm going to object to you rendering a 3O. Your comment that "I have not tried to understand the complete arguments from both parties" is not helpful.
It should not be my burden to prove the material does not belong; it should be Nbarth's burden to show it does belong. There are a series of objections above.
There's an objection because the insertion is confusing. The reader is going along. A tangent is brought up that CF and SLC are "superficially similar". It gives an example to prove its point, but the example is not helpful. Why not just write "abcdef"? What is the point of SLC? After saying that the SLC and CF are superificially the same, the insertion does not explain why the two are substantially different -- something that Nbarth has conceded. The insertion is a bad diversion that suggests the wrong thing. It's not worth bringing up.
There's an objection because the insertion is unsourced. Nbarth has not produced any sources (let alone reliable ones) that claim the CF and SLC are "superficially similar". When unsourced material is added, then editors should try to source it if they believe the statement, tag it as unsourced if they cannot find a source, or remove the material if it is problematic. We are in the last category.
There's an objection because the insertion is WP:UNDUE. The article is about the compiler optimization known as constant folding. Some optimizations can provide provide significant runtime savings; array indexing is often improved. Significant array index savings will occur within deeply nested loops; I doubt many constant string concatenations are deeply nested. Optimizing string concatenation is not normally considered an operation to optimize. If it were, then it should be easy to find references that address the problem. In the current article, there is an aside that constant folding can be applied to not only integers, but also strings. It's a throwaway comment; we don't expect huge gains by CF string operations. The comment should end there. The insertion wants to make a side comment to the original side comment by bringing in SLC. That goes too far afield. As a language feature, SLC only exists in a handful of programming languages, and in some of those languages, constant folding strings makes no sense at all. C, for example, does not have a string concatenation operator, so it cannot have an expression that can be constant folded. The topic is a descent into the noise.
There's an objection because the comment is Nbarth's WP:OR. There is a difference between evaluation and compilation. The insertion confuses the productions and grammar for expressions that must evaluate to a constant at compile time and expressions that are compiled for execution. The simple fix for an OR challenge is to produce sources. Where are the sources?
There's an objection because the comment is WP:SYN. Nbarth has taken some comments on some language mailing lists about the evils of SLC in those few languages that have SLC. Those comments are not reliable sources. They bemoan that if somebody forgets a comma in a list of string constants, then the compiler does not raise any warning. It seems to be a minor problem. If it were a significant problem, then the use of an explicit SLC lexical-phase pasting operator could be introduced (for example, the C preprocessor uses ## to paste identifiers together). The mailing-list gurus have a different fix: they've decided to replace SLC with an explicit runtime string concatenation operator. Well, that brings up other problems. C doesn't have such an operator, so the fix doesn't work for C. When a language has a string concatenation operator, it means that an unsophisticated implementation might actually concatenate the strings at runtime (at O(n2) cost). The gurus think that is a bad idea, so they want that damage mitigated by requiring the language implemenation to always constant fold strings. The gurus have traded something that is simple (SLC) for something that can be hard (CF). Even if we take the gurus as reliable sources, the gurus do not say that CF is "superficially similar" to string literal concatenation. They say that SLC can be replaced with actual string concatenation with no execution cost penalty if the compiler does constant folding on strings. More importantly, these mailing list gurus believe SLC is a bad idea. Step back a minute. If those "sources" believe SLC is a bad idea, then why should the WP constant folding article raise the bad idea of SLC at all?
Why is it important to mention SLC in this article?
Glrx (talk) 21:46, 23 July 2014 (UTC)Reply
I have not even given an opinion yet and you are complaining. Here it is anyway.
I see no problem with mentioning SLC somewhere in the article that does not break up the flow. I do not think that "superficially similar" requires a source, it is a very weak statement that can hardly be called contentious. Sort something out between yourselves and add it. Martin Hogbin (talk) 22:47, 28 July 2014 (UTC)Reply
Martin, Glrx, would you like to continue 3O discussion?
If not, should we move to Comments (at WP:RFC or Wikipedia talk:WikiProject Computer science) or Mediation (WP:RFM)?
—Nils von Barth (nbarth) (talk) 00:51, 28 July 2014 (UTC)Reply
By all means open an RfC but I doubt you will get much interest. Martin Hogbin (talk) 22:47, 28 July 2014 (UTC)Reply
Done! Listed RfC, and mentioned at Wikipedia talk:WikiProject Computer science#RfC: Mention string literal concatenation on constant folding page?
—Nils von Barth (nbarth) (talk) 02:01, 6 August 2014 (UTC)Reply

RfC: Mention string literal concatenation

edit

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


Should there be a mention of string literal concatenation (SLC) on the constant folding (CF) article or not? Concretely, proposed edit (diff):

“A superficially similar feature is string literal concatenation, which concatenates adjacent string literals during lexical analysis, for example replacing "abc" "def" with "abcdef".”
—Nils von Barth (nbarth) (talk) 01:56, 6 August 2014 (UTC)Reply

(Per discussion in above section #Constant folding and string literal concatenation.)

Threaded discussion

edit
  • I don't feel strongly about this, but I tend to think we shouldn't mention it. The reason is that string literal concatenation isn't an example of an operator that could be applied to variables at runtime, but is being shortcut at compile time; it's something that is defined to work only at compile time. So it's not really constant folding, and if included would require a long explanation of why it's not really constant folding, which would end up being more confusing than helpful. —David Eppstein (talk) 03:22, 6 August 2014 (UTC)Reply
  • I quite agree with David. What about a link in the See also section? Or if we mention it in the article, it should clearly state that the two are different, with something like "not to be confused with". The "superficially" in the proposed edit is maybe a bit too weak. -- Pintoch (talk) 08:56, 6 August 2014 (UTC)Reply
  • Oppose. It should not be mentioned at all. The SLC-stringcat case only applies in a small number of languages and is much more complicated than the proposer considers. Furthermore, SLC knows the intended result is a constant string; the stringcat operator does not know that; the compiler must infer the result is used as a constant. Copies and side-effects are a problem with strings that are not an issue with integers. What if the programmer intended to create a new string each time? The compiler must prove the resulting string is never modified. In Lisp, there is a huge difference between (cons 'A 'B) and '(A . B); the results are equal but not eq. I actually went to include SLC as a see also, but then didn't because the other see also topics were cleaner and on topic. SLC has nothing to do with constant folding. There are many other problems addressed above. Glrx (talk) 15:33, 6 August 2014 (UTC)Reply
  • I think that it may be worth mentioning SLC in such a way as to clearly mark it as something different, to alleviate any confusion as to the relationship between SLC and CF. If there is a page on SLC, it should be linked to, as there is a (weak) case for similarities. The proposed edit stresses the similarities between SLC and CF rather than the differences, which should be here stressed more because they are markedly different. I therefore am against the proposed edit as it is currently worded. (I am not an expert, but I think that I have something of a grasp of the differences.) I like Pintoch's wording of "not to be confused with". hwalter42 (talk) 16:47, 6 August 2014 (UTC)Reply
  • Elucidate. Hello. I saw the rather long discussion and the very fact that the discussion exists, plus the very appealing and seemingly logical rationale for including string concatenation seems to point to the fact that further elucidation of this matter in the article prose is actually required, even if it is to clarify that string concatenation is not constant folding. Best regards, Codename Lisa (talk) 18:00, 10 August 2014 (UTC)Reply
  • Elucidate. I agree with Codename Lisa. The article should make clear how constant folding and string concatenation are related. Martin Hogbin (talk) 11:48, 11 August 2014 (UTC)Reply
To explain a bit, that is the purpose of WP, to inform our readers; it is not to make points. Martin Hogbin (talk) 09:53, 17 August 2014 (UTC)Reply
  • It seems reasonable to remark that SLC and constant folding are two distinct features, despite apparent similarity. One unfamiliar with either of these concepts may get the impression that any form of string concatenation done by the compiler is constant folding, so an article that introduces the concept should include a statement that the two are not to be confused. At the very least, it deserves a 'see also' entry as a superficially related syntax feature. — daranzt ] 14:11, 12 August 2014 (UTC)Reply
  • Elucidate per Codename Lisa; revised text should describe that although the two are superficially similar, they're actually quite different because they're handled differently. --Ca2james (talk) 00:31, 17 August 2014 (UTC)Reply
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.