Talk:Static single-assignment form

attribution

The introduction to the article states "SSA was developed by Ron Cytron, Jeanne Ferrante, Barry Rosen, Mark Wegman, and Ken Zadeck, researchers at IBM in the 1980s".

The current terminology is indeed that popularized by the IBM team. The ideas are a decade older.

The final sentence of Wegman and Zadeck's POPL85 "Constant Propagation with Conditional Branches" is "We would also like to thank John Reif who made a special trip to Yorktown to help us understand his work". They are acknowledging the importance of the referenced Reif and Lewis POPL77 paper "Symbolic evaluation and the global value graph", where Phi-nodes are termed "birthpoints". Reif and Lewis in turn mention work done at Massachusetts Computer Associates: Shapiro and Saint's "The Representation of Algorithms" (CA-7002-1432, Feb 1970); Karr's "P-graphs" (CA-7501-1511, Jan 1975). Jsyjr (talk) 05:28, 31 January 2008 (UTC)

The notion of birthpoints precedes SSA form, but SSA has more than just birthpoints. Phi functions, which are indeed placed at birthpoints, carry more information, namely the names of the variables whose values reach the birthpoint. MarkWegman (talk) 03:06, 18 March 2024 (UTC)

mistake

Currently the main page says "if the node A defines a certain variable, then that definition and that definition alone will reach every node A dominates", but this is clearly untrue. The variable can be redefined in other nodes that A dominates. For instance:

 x = 0;
 if(foo())
   bar();
 else
   baz();
 x = 6;
 biz();

The entire code there is dominated by the initial block with x=0 (assuming no non-local control flow out of one of the functions), but the definition of x=0 does not reach biz().
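
For what it's worth, here is a minimal sketch (the names x1 and x2 are illustrative) of how SSA renaming sidesteps the issue: each assignment introduces a fresh name, and it is that name, rather than the original variable, for which "one definition reaches every dominated use" holds:

 x1 = 0;
 if(foo())
   bar();
 else
   baz();
 x2 = 6;
 biz();    /* any use of x after this point refers to x2, the sole definition of that name */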

I plan on correcting this once I decide how it should be worded.

arrays and structs

Arrays and structs require special treatment that ought to be described. Loisel 09:53, 22 Apr 2005 (UTC)

Sure. I don't know much about this currently, if you'd like to add it. I'd suggest a separate section, to avoid overwhelming the novice. Deco 16:44, 22 Apr 2005 (UTC)

There is an explanation of Tree SSA in GCC at Linux Weekly News.

can we have an example involving a loop

I think I can see how this would apply to loops, but I'm not sure enough to write an example. Plugwash 23:00, 2 May 2005 (UTC)

That's a good idea. It does make things a little more interesting. Will look at this when I have time, if no one else does first. Deco 05:14, 3 May 2005 (UTC)
I found a pretty clear example at https://www.cs.cmu.edu/~fp/courses/15411-f13/lectures/06-ssa.pdf, section 3. Perhaps someone can incorporate that. Yihkrys (talk) 11:12, 24 November 2015 (UTC)
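
A minimal sketch of the kind of loop example being asked for, along the lines of the CMU notes linked above (the label and variable names here are illustrative): a Φ-function at the loop header merges the value flowing in from before the loop with the value flowing around the back edge:

 i1 = 0;
 loop:
   i2 = Φ(i1, i3);    /* i1 from loop entry, i3 from the back edge */
   if (i2 >= 10) goto done;
   i3 = i2 + 1;
   goto loop;
 done:
   use(i2);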

Summer 2004

Please change summer 2004 (in GCC description) to a correct month --200.178.63.180 05:53, 8 May 2005 (UTC)

Removed from main article in SSA Extensions

Is it necessary to enumerate all known SSA extensions? Should each extension have a separate page? What about gated single assignment form, which is probably the most popular SSA extension?

I don't believe we're likely any time soon to have enough material to justify separate pages on each form. Let's keep it all here until one becomes too large. I'm not sure which SSA form is most popular, but I'd hazard a guess of "ordinary" SSA. Deco 02:05, 7 Jun 2005 (UTC)

SSA and AST

The article says SSA is not suitable for tree representations. I can see where this can be true, but I can also come up with tree representations where it could work (I think). More importantly, GCC now implements a kind of tree-SSA. Anyone care to elaborate on that? Madhu 03:59, 19 February 2006 (UTC)

It's more accurate to say that SSA is normally used with linear intermediate representations, not tree-based IRs, but there's no reason it can't be used with tree-based IR as long as reaching definitions are clear. I'll just remove the statement you noted.
As for Tree-SSA, just based on my initial impression it still uses a 3-address (linear) format, but each individual instruction is represented using a tree, for backward compatibility with GCC's tree IR. Deco 06:29, 19 February 2006 (UTC)

"Φ function" verses "Φ-function"

The literature (e.g. the classic Cytron et al. paper, Andrew Appel's books, most papers I have read) uses a hyphen between "Φ" and "function". Why is this not followed here? Is it the case that the original authors made a mistake and most literature follows them, either out of respect or ignorance? I don't think that you could say that "Φ" is used as an adjective, so it seems to me that a hyphen is indicated. However, I'm far from confident about this. I'd like to know definitively the correct way to write this term, and I believe that Wikipedia should be a guide in this respect. --Mike Van Emmerik 23:03, 5 August 2006 (UTC)

We should probably follow what the literature says. Regarding its English correctness, see hyphen. I would regard this as a "noun noun" phrase, which generally is not hyphenated (e.g., ice cream, department store manager), but again I wouldn't contradict the literature. Deco 01:41, 6 August 2006 (UTC)

idom(n)

Can you please explain to me what idom(n) is? It is given that idom(n) is the immediate dominator of node n, but if a node has two predecessors (dominators), what will be idom(n)? —Preceding unsigned comment added by Rudk (talk • contribs) 05:26, 27 March 2009 (UTC)

In a flow graph, a node n dominates a node m if every path from the graph's entry node to m must pass through n. If m has more than one predecessor, no single predecessor is guaranteed to dominate it. Instead, its dominator may be a node farther back along the paths to the entry node. (Use of "back" indicates traversing edges against their direction.)

Given the set of nodes that dominate m, often denoted DOM(m), its immediate dominator is the node in DOM(m), other than m itself, that lies closest to m in the graph. Another way to picture the relationship is that inside DOM(m), excluding m, you will find one node dominated by all the other nodes in that set. That node is m's immediate dominator, sometimes denoted IDOM(m). [The notation IDOM(m) suggests that IDOM is a set and might mislead you into thinking that IDOM(m) can be more than one node. IDOM(m) is a single node in the flow graph.] --Kdcooper (talk) 22:04, 26 August 2009 (UTC)
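
A small worked sketch (the graph and labels here are made up for illustration, not taken from the article) may make this concrete:

 entry → A
 A → B      A → C
 B → D      C → D

 D has two predecessors, B and C, yet neither dominates D, since D can be
 reached while avoiding either one. DOM(D) = {entry, A, D} and IDOM(D) = A.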

examples

PyPy uses SSA for both compilation of RPython and JIT. Should I add this to the list of implementations? It's already sort of long. —Preceding unsigned comment added by Fijal (talk • contribs) 04:41, 1 April 2010 (UTC)

Possible mistake in section Variations that reduce the number of Φ functions

When you describe the semi-pruned SSA, the last phrase says: "On the other hand, pruned SSA form will contain more Φ functions." But pruned SSA is better than semi-pruned, so it will contain fewer Φ functions. 93.145.207.190 (talk) 08:14, 6 August 2010 (UTC) by mickey_mouse

Notation interpretation

In the currently provided example control flow graphs, how should two arrows coming out of one box be interpreted? --Abdull (talk) 15:09, 8 October 2010 (UTC)

I didn't write this stuff, but I did read it, so I can take a guess. I think that it relates to the last line in the top box. The last line in the top box says "X < 3 ?". (meaning: "is X less than 3 ?"). That is, if the value of "X is less than 3" is TRUE, then [the graphical pseudo-code says] it will take one certain branch, and if not, then it will take the other branch.
Which branch is for "TRUE"? Well, by analogy with "if ... then ... else ...", (where "then" comes before "else"), I would guess that the LEFT (first) branch shown, is the one for "TRUE".
(However, I am not sure it matters, [which branch is for "TRUE"], for purposes of explaining the concept of Phi functions. Maybe what matters, for those purposes, is simply the fact that it takes one branch or the other, and the [interesting!] fact that, in some cases -- [as illustrated here!] -- the branch that will be taken may be able to be determined at compile time.)
TMI:
In the etiquette of old-fashioned flow-chart drawing, it was customary to use a "diamond" shape for a "decision point" -- and even today, it would probably be a good practice [it would be advisable] to clearly label the outgoing branches -- that is, to indicate which one is for which value of the "decision" expression. (but, "see also" the parenthesized paragraph above, beginning with "However"...) Usually, there will be two outgoing branches -- "TRUE" and "FALSE" -- but in some cases, especially if someone is used to dealing with old style IF statements (like those used in "FORTRAN II"), there might be three outgoing branches -- "negative", "zero", and "positive". Some of this might be "TMI"... [sorry]... but, it also might be responsive to what was puzzling Abdull (talk) on 8 October 2010.
I hope this helps. --Mike Schwartz (talk) 20:46, 23 March 2015 (UTC)

Is SSA an IR or a property of an IR?

The first sentence says that SSA form is an IR, but I think it is a property of an IR. 129.13.72.198 (talk) 13:14, 11 January 2011 (UTC)

Human after all

Humans can see that the first assignment is not necessary, and that the value of y being used in the third line comes from the second assignment of y. A program would have to perform reaching definition analysis to determine this.
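
(For context, the passage quoted above refers to the article's straight-line example, which is along these lines; reproduced here so the quote stands alone, though the exact form in the article may differ:)

 y := 1
 y := 2
 x := y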

I don't think people have any mystical computational power that stems from having a soul - a human needs to perform reaching definition analysis too... ~ Booya Bazooka 14:22, 16 March 2011 (UTC)

Terrible language

The article is a mostly unreadable convolute of terrible grammar and should _really_ be rewritten by someone with a firm grip of the English grammar. Example: "new variables typically indicated by the original name with a subscript in textbooks". *Shudder* — Preceding unsigned comment added by 195.212.29.186 (talk) 07:02, 3 May 2013 (UTC)

Definition of Φ functions needed

A definition of, or wikilink to, what "Φ functions" means is needed in this topic — Preceding unsigned comment added by 201.55.113.110 (talk) 13:01, 20 October 2014 (UTC)

Spelling: "phony" (rather than "phoney")

Some lookups using online dictionaries (such as https://www.wordnik.com/words/phoney which clearly says [under the heading "from The American Heritage® Dictionary of the English Language, 4th Edition"],

adj. Variant of phony.

for example) seem to indicate that "phony" is the primary spelling, and that the spelling "phoney" is an acceptable variant, but is a secondary choice.

So, I investigated, to see whether the spelling "phoney" was being used in this article for some good reason -- such as, because of its use in the source identified in the footnote. (Spoiler alert: it is not.)

The footnote in question is footnote number [1], at least in this version of the article. It appears right after (or, right 'on') the phrase "According to Kenny Zadeck", at the beginning of the last paragraph of the section Converting to SSA.

That footnote, footnote number [1], is coded with a link (called "Presentation on the History of SSA") which points to the URL "http://citi2.rice.edu/WS07/KennethZadeck.pdf". So, I went to that URL, and I found that (on page 43, which is entitled "The Origin of Ф-Functions and the Name"), it clearly uses this spelling: “phony functions”.

Conclusion:

The use of the spelling “phoney functions”, in the article (see above for a specific "oldid" [version] and for a specific place within the article) (OR, just do an "edit" or "show source" and search for the 4-character string "phon") was not chosen for any good reason -- such as, because of its use in the source identified in the footnote. Rather, the use of the spelling “phoney functions” in the article, is simply a TYPO which probably should be changed, to remove the 'e'.

Any comments? Even if there are, by that time I probably will already have edited (to remove the 'e') to correct the TYPO. No harm/no foul: -- in the unlikely event that there is more to this, than this Talk:-page section has covered, then my edit can always be reverted, or "re-" modified ... if/as appropriate.

--Mike Schwartz (talk) 22:22, 23 March 2015 (UTC)Reply

Potential mistake in introduction

The introduction text states that a program under SSA form must meet the following requirements:

  1. each variable is assigned exactly once
  2. every variable is defined before it is used

My understanding is that only (1) is required (unless we consider strict SSA form where we demand that each use of a variable is dominated by its definition).
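
To illustrate the distinction with a made-up two-line fragment (not taken from the article): the snippet below satisfies (1), since each name is assigned exactly once, yet violates (2), because x1 is used before it is defined. In strict SSA form the definition of x1 would have to dominate, and therefore precede along every path, that use.

  y1 = x1 + 1
  x1 = 0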

Any comments on the subject?

--Uncurry (talk) 20:04, 16 April 2019 (UTC)

Haskell using CPS?

The introduction gives Haskell as an example of a functional language that uses the Continuation Passing Style, but there is no citation. The source I guess would be the slightly obsolete "Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine" (S. P. Jones 1992), which describes the STG as similar to CPS, but not equivalent. The modern GHC does have an LLVM backend, so it does in fact rely on the SSA form.

83.24.255.228 (talk) 23:11, 8 May 2020 (UTC)

Yes, you are right. As section 4.8 of (S. P. Jones 1992) explicitly notes, the STG is different from CPS. I think it would be best to remove the reference to Haskell here. Uncurry (talk) 20:48, 9 May 2020 (UTC)

Suggested Changes

Ken Zadeck and I (Mark Wegman) suggest a number of improvements to the article. We would make the changes directly except that we're not clear that making changes to the description of your own work is appropriate in the context of the guidelines.

The first two paragraphs should be replaced with these three. They are more accurate and more to the point.

In compiler design, static single assignment form (often abbreviated as SSA form or simply SSA) is a type of intermediate representation (IR) that requires each variable be assigned exactly once. Existing variables in the original IR are split into versions, new variables typically indicated by the original name with a subscript, so that every definition gets its own version. Additional statements that assign to new versions of variables may also need to be introduced at the join point of two control flow paths. SSA makes numerous analyses needed for optimizations easier to perform, because when looking at a use of a variable there is only one place where that variable may have received a value.

SSA is used in most commercial quality optimizing compilers for imperative languages including LLVM and GCC, which form the basis of many open source efforts as well.

There are efficient algorithms for converting programs into SSA form and converting from SSA form to machine code. Once in SSA form, optimizations preserve SSA form, so that one optimization can be performed after another with no additional analysis. SSA-based optimizations are usually more efficient and more powerful than their non-SSA-form prior equivalents.

The history section contains several errors and should be replaced with:

SSA was developed in the 1980s by several researchers at IBM. Kenneth Zadeck, a key member of the team, moved to Brown University as development continued.[4][5] A 1986 paper introduced birthpoints, identity assignments, and variable renaming such that variables had a single static assignment.[6] A subsequent 1987 paper by Jeanne Ferrante and Ronald Cytron[7] proved that the renaming done in the previous paper removes all false dependencies for scalars.[5] In 1988, Barry Rosen, Mark N. Wegman, and Kenneth Zadeck replaced the identity assignments with Φ-functions, introduced the name "static single-assignment form", and demonstrated a now-common SSA optimization.[8] The name Φ-function was chosen by Rosen to be a more publishable version of "phony function".[5] Alpern, Wegman, and Zadeck presented another optimization, but using the name "single static assignment".[9] Finally, in 1989, Cytron, Ferrante, Rosen, Wegman, and Zadeck found an efficient means of converting programs to SSA form.

For details: Ken was originally at IBM when the first work started and continued at Brown. There were never two independent teams at IBM working on computing SSA form. Mark had mentioned the notion of dominance frontier both to Ken and to Ron and Jeanne, and since Ken was remote he later thought that Ron and Jeanne came up with it independently.

The list of compilers using SSA should be reorganized into two groups, Commercial and Research compilers.

Commercial should include:

The IBM family of XL compilers, which include C, C++ and Fortran. See https://www.cs.rice.edu/~vs3/PDF/ibmjrd97.pdf for validation that it uses SSA.

NVIDIA CUDA; see: https://www.researchgate.net/publication/235605681_CUDA_Compiling_and_optimizing_for_a_GPU_platform

Also several that are already in the list, including Microsoft C++ from Visual Studio, Oracle HotSpot Java, Go, LLVM, GCC, Swift, Android and Erlang.

The others should be listed as Research. The current list starts with Oberon and is in an apparently random order. The above list is probably the most important compilers. We believe all the Digital compilers were also based on SSA but haven't found a reference to justify that. — Preceding unsigned comment added by MarkWegman (talk • contribs) 11:15, 8 May 2024 (UTC)

I have done some changes. For the lead I tried a different structure, generally it is recommended to establish the importance of the topic in the first paragraph. The history I have adopted. For the list of compilers I went with three categories, I didn't feel comfortable categorizing GCC as commercial. Mathnerd314159 (talk) 04:28, 9 May 2024 (UTC)Reply