Talk:Pure function
This is the talk page for discussing improvements to the Pure function article. This is not a forum for general discussion of the article's subject. |
Article policies
|
Find sources: Google (books · news · scholar · free images · WP refs) · FENS · JSTOR · TWL |
This article has not yet been rated on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | |||||||||||
|
March 2009
editPure functions are required to construct pure expressions.
What does this mean? --Abdull (talk) 22:33, 31 March 2009 (UTC)
Examples?
edit- It is possible for a pure expression to yield an impure function (or more generally a value which contains one or more impure functions).
- It is also possible for an expression to be pure even if one or more of the argument subexpressions yields an impure function (or a value which contains one or more impure functions).
Can someone give examples for these cases? Thanks, --Abdull (talk) 11:02, 23 January 2010 (UTC)
- How about this Javascript for the first one:
var count; function foo() { return function() { count++; }; }
Foo is pure because it always returns the same function, but the function it returns is impure since it has the side effect of modifying the count variable. —Preceding unsigned comment added by GalaxiaGuy (talk • contribs) 13:24, 7 July 2010 (UTC)
- No this is not pure because doesn't pass the "referential transparency" criteria. Kasajian (talk) 15:44, 10 September 2022 (UTC)
Alternate usage of the term
editThe term definition provided here seems to be hardly in common usage. See here for discussion and reference list:
http://mathematica.stackexchange.com/a/64624/2079 — Preceding unsigned comment added by 140.32.183.252 (talk) 19:32, 5 November 2014 (UTC)
I agree. The article defines "pure function" in a quite unusual way, to say the least. I doesn't cite any sources (except for the subordinate issue of considering I/O as a side effect).
Here are some uses of "pure function" that disagree with the article:
- The Gnu C Compiler provides the attribute pure and const; see section 'Declaring Attributes of Functions' in the GCC manual. A pure function has "no effects except the return value and (its) return value depends only on the parameters and/or global variables" (p. 432-->418). A const function does not "examine any values except (its) arguments, and (has) no effects except the return value" (p. 424-->410).
- Fortran_95_language_features#Pure_Procedures seems to agree with the GCC's notion: a PURE procedure "alters no global variable, performs no I/O, has no saved variables (variables with the SAVE attribute that retains values between invocations), and ... does not alter any of its arguments".
- The external link constexpr attribute in C++ given in the article doesn't mention the word pure at all. Instead, it talk about functions and expressions that can be evaluated at compile time, which is quite different from all above notions of pure or const.
This one agrees with the article:
- The external link Pure attribute in D language given in the article presents the pure notion in the D language, which apparently coincides with that of the article: "a pure function does not read or write any global or static mutable state, cannot call functions that are not pure, can override an impure function, but cannot be overridden by an impure function, is covariant with an impure function, cannot perform I/O".
I suggest that more programming languages should be checked about their notions of pure and/or const. The article should be rewritten accordingly, presenting the groups of different notions. It may even be necessary to rename it (e.g. to "Const function"), or to split it. - Jochen Burghardt (talk) 09:39, 17 February 2017 (UTC)
- It is good to have an article on pure functions, but this article was not about that. Create a separate page or add a section for the discussion of other language-specific concepts. — Preceding unsigned comment added by Cerberus0 (talk • contribs) 17:32, 5 July 2018 (UTC)
@Cerberus0, Maggyero, and GregorB:
I suggest to consider (as in the version of 6 Jul) the following properties of a function:
- its evaluation has no side effect, such as mutation of objects or output to I/O devices,
- its output depends only on its arguments and not on any external state (such as a global variable).
We should distinguish functions that obey just 1 and functions that obey 1 and 2. I suggest to handle them both in this article, but in different sections. I'm not quite sure which names to suggest for both classes; for this discussion, I'll use side-effect free function (1 only) and pure function (1 and 2). Eventually , our naming should be based mainly on textbook references (which I haven't at hand).
Besides definition and examples, each section should describe the advantages (such as compiler optimizations) of each class. For the class satisfying 1 and 2, we should establish the relation to purely functional programming. In the lead, we should mention the different namings that are around. Best regards - Jochen Burghardt (talk) 13:55, 9 July 2018 (UTC)
- @Jochen Burghardt: "I'm not quite sure which names to suggest for both classes;" There's already a name for such functions, it is referentially transparent:
- "An expression is said to be referentially transparent if it can be replaced with its corresponding value without changing the program's behavior. This requires that the expression is stateless, that is to say the expression must be pure (have no side effects) and deterministic (always give the same value for the same input)."
- The current definition (pure function = side-effect free function) is sourced, the previous definition (pure function = side-effect free + deterministic function) was not sourced and unusual as you said. So unless @Cerberus0: provides solid sources that contradicts this, the current state of the article is fine to me.
- Maggyero (talk) 14:14, 9 July 2018 (UTC)
- Please do not feel disrespected, but I must ask: do you have any background in computer science? I am insisting on a very standard contemporary definition. I also sourced it. Here is another example: https://www.safaribooksonline.com/library/view/functional-php/9781785880322/ch02.html This is so standard that I am startled to be debating it. Cerberus (talk) 19:49, 9 July 2018 (UTC)
- @Jochen Burghardt: "I'm not quite sure which names to suggest for both classes;" They are called pure functions. This is standard usage. Here is YET ANOTHER reference:
- https://books.google.com/books?id=jAYvDwAAQBAJ&pg=PA98&lpg=PA98&dq=pure+functions+are+referentially+transparent&source=bl&ots=CGwF0nwy24&sig=KcoknhdxFqdaQobUFyNioo07xAY&hl=en&sa=X&ved=0ahUKEwjD_-bw5pLcAhUyTd8KHV2eBjg4WhDoAQhQMAY#v=onepage&q=pure%20functions%20are%20referentially%20transparent&f=false
- This standard fare in CS courses as well. See e.g.,
- https://courses.cs.washington.edu/courses/cse341/03sp/slides/Lisp3/tsld015.htm
- Cerberus (talk) 20:03, 9 July 2018 (UTC)
- @Jochen Burghardt, Maggyero, and GregorB: Please read the provided references. Please discuss before breaking the article again. Cerberus (talk) 13:09, 10 July 2018 (UTC)
- @Cerberus0: Thank you for providing those references (as you did not ping me the last time I had not seen them). I'll look at them and give you a feedback. However those references are not used in the current version of the article. And the deterministic property (ii) - which is the center of the debate - in the introduction of the article is not referenced at all.
- @Maggyero: This property is actually mentioned in both references, and it is listed first in the Higginbotham article. And again: this should not be controversial. It is the standard CS definition. Language-specific *usages* are an entirely separate question and do not bear on this core CS concept. (For example, in the Wolfram Language, the term "pure function" does not even rule out side effects!) Cerberus (talk) 15:12, 10 July 2018 (UTC)
- Also, it seems that you are overlooking the footnote I included to |Bartosz Milewski's definition. (He wrote C++ in Action as well as Basics of Haskell.) Cerberus (talk) 18:09, 10 July 2018 (UTC)
@Cerberus0: The provided references look solid and clear. Before my edits and yours there was no reference at all so the situation is way better. So according to your references:
- referentially transparent = output depends only on input;
- side-effect free = no visible state changes of the non-local space and time environment;
- pure = referentially transparent + side-effect free.
So the definition (referentially transparent = output depends only on input + side-effect free) in the referential transparency article is also wrong and needs to be corrected. But I am not sure about this since referentially transparent is first defined as:
"An expression is said to be referentially transparent if it can be replaced with its corresponding value without changing the program's behavior."
If we use the definition of your references (referentially transparent = output depends only on input), that is if we ignore the side-effect free property, that means that f
is refentially transparent in the following code, but it's not, since it cannot be replaced with its corresponding value without changing the program's behavior:
#include <iostream>
int x = 0;
int f() {
++x;
return 1;
}
int main() {
std::cout << x; // 0
std::cout << f() + f(); // 2
std::cout << x; // 2, but 1 if we had "std::cout << 2 * f();", 0 if we had "std::cout << 2;"
return 0;
}
- @Maggyero: Unlike the notion of a pure function, the notion of "referential transparency" has (afaict) somewhat inconsistent use in the CS literature. I did not use the term when defining a pure function. Nevertheless, a pure function is referentially transparent, under any usage I know of. I assume we can agree that in your example,
x
is not referentially transparent. I would also agree with you thatf
cannot be considered referentially transparent. The Higginbotham reference skirts this by not dealing explicitly with the notion of "rely on". In the present case, I would say thatf
does in the relevant sense rely on the mutable variablex
even though the valuef
returns does not depend on it. In the article, I chose to cite Milewski, who does not refer to referential transparency. He says that a pure function "has no state, nor can it access any external state." Yourf
does access external state, even though its return value does not depend on it. If your overall point is that referential transparency also seems to require no side effects, I would largely agree. (However, not all side effects are created equal: if you were just printingx
instead of modifying it, I would not say referential transparency had broken, yet under standard definitions the function would not be pure.) Cerberus (talk) 13:23, 12 July 2018 (UTC)
- @Maggyero: Unlike the notion of a pure function, the notion of "referential transparency" has (afaict) somewhat inconsistent use in the CS literature. I did not use the term when defining a pure function. Nevertheless, a pure function is referentially transparent, under any usage I know of. I assume we can agree that in your example,
- @Cerberus0: I agree that purity implies referential transparency and not the other way around. But I do not agree that performing I/O does not break referential transparency, since it modifies the program's behaviour. However there is a case where a referentially transparent function can have a side-effect: when it modifies the non-local time environment (what Milewski calls a function state, or internal state), such as mutating a local static variable—for memoisation for instance. This is a side-effect which does not change the program's behaviour, so the function call can be replaced by its returned value:
#include <iostream> int f() { static int x = 0; ++x; // side-effect (state change of the non-local time environment) return 1; } int main() { std::cout << f(); // 1 std::cout << 1; // 1 so f is referentially transparent return 0; }
- So it seems to me that the consistent definitions are (given in ascending order of restriction):
- side-effect free = no visible state changes of the non-local environment (space and time);
- referentially transparent = output depends only on input + side-effect free in space;
- pure = output depends only on input + side-effect free (space and time) = referentially transparent + side-effect free in time.
- What do you think?
- So it seems to me that the consistent definitions are (given in ascending order of restriction):
- @Maggyero: I admire your desire to pin down the meaning of referential transparency, but my interest was simply to correct the definition of 'pure function'. Your additional project is too big for me to take on right now, but you may want to read https://stackoverflow.com/questions/210835/what-is-referential-transparency/11740176#11740176 That said, although I am not ready to endorse your proposed hierarchy, I also do not object to it, with one exception: I would not want it in the beginning of this article. The reason is that I believe that it is pretty easy for people to garner a good first idea about what it means to say "no side effects" (especially when changes in global state and IO actions are offered as examples) and also of what it means to say "the value of the function depends only on its explicit inputs", but as soon as we speak of 'referential transparency' we seem to go down a rabbit hole. Sorry to abandon you on this interesting project. I hope you will ping me if you find a particularly useful resource. Cheers.
@Cerberus0: You are right about the difficulty to define referential transparency. As Uday Reddy said on Reddit:
Suppose we postulate a term, say "functional transparency", for this idea that functional programmers are after. What could it be?
No matter what it is, it should be something that makes sense for all programming languages, not only functional languages. Because the purpose is to come up with a property that is applicable to all languages so that we can then say how functional languages differ from the rest.
The Quine-Strachey definition of "referential transparency" is something that makes sense to all programming languages. It says that you can replace an expression by another expression that is equal in value. We are left with having to define what is meant by "equal in value". Strachey did that for imperative programming languages. Once we plug in the Strachey definition, the imperative languages become referentially transparent. But, now we find that this idea of "referential transparency" says essentially nothing. We can presumably always come up with a suitable notion of "equal in value" for all languages. (That hasn't been proven, but let us assume that for the moment.) Then essentially all languages become referentially transparent. So, the Quine-Strachey definition fits the bill, but it is essentially pointless.
Let us try the Wikipedia definition. A language is "functionally transparent" if every expression can be replaced by its "value". So, now, we are left with having to define what is meant by "value". One candidate is to say that the "value" is what is obtained by "evaluating" the expression. All we have done is to move the burden of definition from "value" to "evaluation". What does "evaluation" mean in a way that makes sense for all programming languages?
I believe that there isn't a single such notion of "evaluation" that is uniformly applicable to all programming languages. In fact, there may not be a single such notion even within a single language.
Take an expression like "x++". You say, if it evaluates to 42, then you should regard 42 as the "value" of "x++". Then, the C programmer will turn around and ask you, "what is the value of "x++" when you write it in Haskell?" You say "x++" in Haskell essentially evaluates to itself. It is a state transformer. It is only when you run "x++" as a state transformer and pipe its value to another function that 42 flows from "x++" to the other function. Ooops! Did you say "value"? What kind of "value" is that? Why don't you replace "x++" by its "value" just like you tried to do it for C? Aren't you being shifty, by using two notions of "value" for your own language, while you maintain there should be only one notion of "value" for my language?
So you are stuck. Unstick it please, if you can.
@Cerberus0: Do you intend to add to the article full references for your above links? Are there any computer science textbooks among them? I tried to find the main page of the O'Reilly / Safari link, but didn't succeed. - Jochen Burghardt (talk) 09:31, 13 July 2018 (UTC)
- @Jochen Burghardt: I intended those links only for our discussion. I consider Milewski's stature adequate to serve as the only needed reference. (Recall that I footnoted his online booklet.) However, I will add whatever other references you think are needed. (Perhaps the URL you were searching for is https://www.safaribooksonline.com/library/view/functional-php/9781785880322/ ) Another online book that is often referenced is Brian Lonsdorf's book (e.g., https://drboolean.gitbooks.io/mostly-adequate-guide-old/content/ch3.html) Cerberus (talk) 17:15, 13 July 2018 (UTC)
- @Cerberus0: I prefer Brian Lonsdorf's definition (https://drboolean.gitbooks.io/mostly-adequate-guide-old/content/ch3.html) over Gilles Crettenand's (https://www.safaribooksonline.com/library/view/functional-php/9781785880322/ch02s02.html) which is not clear and not open source (you have to create a free trial account on their website). I updated the article.
@Cerberus0 and Maggyero: Today, I implemented my above suggestion on an own section about side-effect free functions. I could reuse the non-pure function examples (length, encrypt) there. The remark on optimization in multi-threaded environments would belong to the "Pure expressions" section, but it doesn't fit there in the text very well. Moreover, I just guessed it from the Gcc quote; it would be better if we had a citation (maybe from a textbook on compiler construction for functional languages?). In case I made the wrong guess, it would be instructive to give a counter-example (i.e. a pure expression that can't be optimized by common subexpression elimination) in the article. - Jochen Burghardt (talk) 13:59, 15 July 2018 (UTC)
length example
edit- @Jochen Burghardt and Maggyero: I'm afraid I find this more confusing than helpful. I have trouble imagining a language where one cannot determine the value of `length(s)` by knowing nothing more than the algorithm for `length` and the value of `s`. In which case, the function is pure. To give an understandable example of a function that is not pure but has no side effects, one should use the usual example: a function that depends on but does not change a nonlocal variable. Cerberus (talk) 16:27, 16 July 2018 (UTC)
- @Cerberus0 and Maggyero: I had in mind the C language with its NUL-terminated strings and
s
having the typechar*
as usual. If you don't know the memory wheres
points to, you have no chance to computelength(s)
. - In a multithreaded environment, another thread may overwrite this memory by a new string between two sucessive calls oflength(s)
. In a single-threaded environment, this is impossible. - Jochen Burghardt (talk) 17:25, 16 July 2018 (UTC)
- @Cerberus0 and Maggyero: I had in mind the C language with its NUL-terminated strings and
- @Jochen Burghardt and Maggyero: While you are certainly making a case for functional programming, I do not think you are giving an example that violates function purity (and certainly not a standard example, which would be much more communicative). Here is what I think you are saying. You consider a variable `s` whose value is not referentially transparent, therefore we cannot know what `length(s)` will return. But that is not a violation of functional purity. Assuming it is correctly written, the function returns a value deterministically based on the value of its argument (at runtime, granted). If you write code that leads to the value of the argument being uncertain, that is not a strike against the function. (I can also pass a random variable into a mathematical function; that obviously does not break the purity of the function.) Cerberus (talk) 21:28, 17 July 2018 (UTC)
- @Cerberus0 and Jochen Burghardt: Jochen Burghardt is right in saying that
length
andencrypt
are not pure in C since they have a mutable pointer parameter. He's making an interesting point because more generally, I think that any function that has a function parameter that is of mutable reference type (in other words not of value type or immutable reference type) violates the 1st condition (output depends only on input) because it's an external state dependency. In fact, any indirection introduces an external state. That is also why I/O operations breaks purity: a function that takes a file parameter and reads the file content through it violates the 1st condition (output depends only on input) because the file parameter is just a reference to the file content which might vary between two consecutive calls, creating an external state. So we may say that I/O operations are just a particular case of impurity: mutable references are the general case.- Maggyero (talk) 06:27, 18 July 2018 (UTC)
- You're right that a length() function is not pure in C, but for instance, let's say you were to create a structure in C where the string of characters were immutable. Assuming, clients of the structure are are not manipulating the structure data directly but using supported functions to manage the string, they could be prevented from modifying the string. In such a scenario, a length() function would be pure. Kasajian (talk) 15:58, 10 September 2022 (UTC)
- @Cerberus0 and Jochen Burghardt: Jochen Burghardt is right in saying that
@Cerberus0: I don't understand many details of your argument: Didn't we agree to avoid the notion "referentially transparent" (I don't understand its precise meaning)? What do you mean by "deterministically ... (at runtime...)"? What do you mean by "uncertain"? if you consider length
as pure (in your stricter sense), what is property 1 supposed to mean? How could length
satisfy the requirement of your Milewski reference ("A function returns exactly the same result every time it's called with the same set of arguments.")? For example, consider the following C code:
char text[8] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', '\0'};
assert(length(text) == 7);
text[3] = '\0';
assert(length(text) == 3);
Both assert
s will succeed. Obviously, the function does not "return exactly the same value", although its is "called the the same (set of) argument(s)". Maybe you can explain what you mean by "deterministically" etc. along this exmaple? Note that there is no corresponding example with sin
; therefore sin
is pure. - Jochen Burghardt (talk) 10:44, 18 July 2018 (UTC)
- @Jochen Burghardt and Maggyero: First of all, I think illustrating a non-pure function should start with a standard example (e.g., direct dependence on a global state). Even if you are right, your example is too subtle and should come after a simple example. Now, to the issue: in C can a function that consumes an array be pure? If I understand you (and I do not use C), you say no: you cannot pass an array by value, so the value that is passed is just a memory location, and what is stored in memory is part of the global program state -- so there is a dependence on global state and thereby the function is not pure. Do I have you right? But surely we do not restrict the general (i.e., not language specific) notion of purity to say a function is only pure if it does not depend on the the contents of the memory addressed by its actual parameters. Additionally, the approach you want to take seems (iiuc) to imply that in a object-oriented language, any function that consumes a mutable object cannot be pure. This is not at all my understanding of function purity; it is orthogonal to considerations of data immutability. While your view is that you have passed the same (pointer) value both times, I think that the right way to think about it for the present purposes is that you have asked the function to operate on two different strings. Bottom line: the direction you are going seems to conflate two very different things: functional purity and data immutability. You are correct that consuming mutable data makes it harder to anticipate what a pure function will return, but that does not break its purity. Unfortunately, I cannot offer supporting references right now, but this is certainly my sense of the general usage. (That said, I have never before thought about the particular problem you are raising.) Cerberus (talk) 15:33, 18 July 2018 (UTC)
- @Cerberus0 and Jochen Burghardt: "Additionally, the approach you want to take seems (iiuc) to imply that in a object-oriented language, any function that consumes a mutable object cannot be pure." That's not what I said. I said mutable reference type function parameters break purity, not all mutable function parameters. This is a big difference because it means that mutable and immutable value type function parameters and immutable reference type function parameters keep purity, not only immutable function parameters. Indeed, mutable value types like C++'s
int
keep purity as we all know. Again, this is the indirection level that breaks purity (reference type vs value type), not the mutability level (mutable vs immutable). I could have even said that all reference type function parameters break purity. But I restricted this to mutable reference types because if the referred data cannot change, the function will always yield the same output, which does not break purity. And this restriction is actually necessary, otherwise all functions would be impure in programming languages that do not have value types but only reference types, such as Python and Javascript (other languages have both: C++, Java, C#, Swift, OCaml), which is clearly not what we observe. Indeed you can clearly create a pure function in these languages if you use only immutable reference type function parameters, such as Python'sbool
,int
,float
,complex
,str
,bytes
,tuple
,range
,memoryview
orfrozenset
(but not with mutable reference type function parameters, such asbytearray
,list
,dict
orset
). - "But surely we do not restrict the general (i.e., not language specific) notion of purity to say a function is only pure if it does not depend on the the contents of the memory addressed by its actual parameters." I disagree: if we interpreted the 1st property of pure functions (same input yields same output) that way—same input means same data after dereferencing the references in the input—, it would mean that functions that perform I/O reads would be pure, which contradicts the literature (in your own sources, Daniel Higginbotham give an example of a function that perform an I/O read that breaks purity, p. 99). So no, "same input" should be interpreted as the direct data (a memory location in case of references), not as the indirect data after dereferencing in case of references.
- Maggyero (talk) 07:54, 20 July 2018 (UTC)
- @Maggyero and Jochen Burghardt: This actually illustrates what is bothering me about the definition of purity that you prefer. (Again, I am not disputing your point, but I suspect that you are doing original research rather than reporting on existing usage.) If we take the Python example, consider the first hit from searching on Python and pure function (http://interactivepython.org/runestone/static/CS152f17/Lists/PureFunctions.html): "A pure function does not produce side effects. It communicates with the calling program only through parameters (which it does not modify) and a return value." You may be right that this is inadequately restrictive, but in my experience this usage is very standard. What is more, this definition focuses on how the function acts, not on the type of the arguments to the function. And of course it does: Python programmers think about how arguments behave, not what type they are (to only slightly oversimplify). The view you are pushing, in contrast, would say that e.g. Python's `map` is not pure because it does not restrict its inputs to being tuples. Again, I am not disputing your point about the problems with passing references to mutable inputs; I am disputing that this entry is correctly representing the standard (language agnostic) usage of 'pure function'. Hope that's clear. Cerberus (talk) 18:19, 23 July 2018 (UTC)
- @Cerberus0 and Jochen Burghardt: I see your point and you are right Cerberus0, the definition of a pure function should only be about the behavior of the function (so the current definition is totally fine and in accordance with the sources), not about its interface (its parameter types). I admit I was wrong when I said that a pure function cannot have a mutable argument of reference type: for instance if it does nothing with it (no read or write), it's perfectly fine. The function becomes impure only if it uses the mutable reference argument for its return value (violating property 1) or if it mutates it (violating property 2). I updated the article to add simple user-defined examples of impure functions illustrating each case of impurity that we've talked about in this thread. At this point, I find the article quite satisfying.
- @Maggyero and Jochen Burghardt: This actually illustrates what is bothering me about the definition of purity that you prefer. (Again, I am not disputing your point, but I suspect that you are doing original research rather than reporting on existing usage.) If we take the Python example, consider the first hit from searching on Python and pure function (http://interactivepython.org/runestone/static/CS152f17/Lists/PureFunctions.html): "A pure function does not produce side effects. It communicates with the calling program only through parameters (which it does not modify) and a return value." You may be right that this is inadequately restrictive, but in my experience this usage is very standard. What is more, this definition focuses on how the function acts, not on the type of the arguments to the function. And of course it does: Python programmers think about how arguments behave, not what type they are (to only slightly oversimplify). The view you are pushing, in contrast, would say that e.g. Python's `map` is not pure because it does not restrict its inputs to being tuples. Again, I am not disputing your point about the problems with passing references to mutable inputs; I am disputing that this entry is correctly representing the standard (language agnostic) usage of 'pure function'. Hope that's clear. Cerberus (talk) 18:19, 23 July 2018 (UTC)
- Maggyero (talk) 07:54, 20 July 2018 (UTC)
- @Cerberus0 and Jochen Burghardt: "Additionally, the approach you want to take seems (iiuc) to imply that in a object-oriented language, any function that consumes a mutable object cannot be pure." That's not what I said. I said mutable reference type function parameters break purity, not all mutable function parameters. This is a big difference because it means that mutable and immutable value type function parameters and immutable reference type function parameters keep purity, not only immutable function parameters. Indeed, mutable value types like C++'s
@Cerberus0 and Maggyero: I'm numbering my issues to ease referencing the in replys.
- No objection to a simple global variable example. First,
rand()
came to my mind, but it has a side effect.time()
doesn't have, but it is difficult to sketch the global read in its body. Linux'feof
has the same issue aslength
: its reads an "end-of-file-encountered" flag inside the struct _iobuf its parameter points to. Maybe, we should just use a non-prominent, self-invented examplefoo
. What do you think? - - Yes, I think a C function with an array parameter (which is always a reference in that language) cannot be pure (unless it doesn't use this parameter). There are types like
const int *
, presumably denoting a pointer to an immutable range of integers, but recently I had to learn that theconst
can simply be cast away, and no compiler is allowed to complain about that (cf. https://bts.frama-c.com/view.php?id=1179#c3006); so,const
is just decoration in C. I'm not sure whether really immutable objects can be expressed in C++. I don't know C++ good enough for that, let alone other object-oriented languages. - - I understand that you have a different idea in mind about what "pure" should mean. However, I think your quotations are not consistent with that idea (however, they are somewhat vague about what is 'external", when are two arguments "the same", etc.). Maybe, there are different meanings of "pure" around in the functional language community? -
- You said: "the right way to think about it for the present purposes is that you have asked the function to operate on two different strings". But that means that an external state (viz. the memory pointed to) needs to be considered to check whether the strings are different or not. An alternative view could be to consider the location the pointer points to not as external, but as internal. But then, the same memory region can be internal to arbitrarily many functions; moreover, such a "shared internal" region can be of arbitrary length. This seems weird, doesn't it? I think, the notions "shared" and "internal" are mutually exlusive. -
- I you like, we could tag the
length
example with a "disputed inline"[disputed – discuss] annotation linking to this talk section, for now. - - The referenced cgg manual says "Note that a function that has pointer arguments and examines the data pointed to must not be declared const", where gcc's "const" (defined there as "functions do not examine any values except their arguments, and have no effects except to return a value") is called "pure" in the article. I didn't browse though your references; I understand you'll do that. - Jochen Burghardt (talk) 21:32, 18 July 2018 (UTC)
- @Jochen Burghardt and Maggyero: At this point, I am not requesting a disputed annotation. I understand why you link gcc's "const" and functional purity. I am not fully persuaded, but I get your point and it has force. I'll ping you if I can find this addressed explicitly in the literature.
@Cerberus0 and Maggyero: Did you find anything yet? In the literature, the employed notion of purity might be inferrable from given examples of pure and non-pure functions. When a single language in concerned, and purity is supported or checked by a compiler, we may try out a few border-case test programs. If necessary, we could split the required properties even further (such as 1 --> 1a "... depends not on memory pointed to by pointer parameters" , 1b " ... depends not on other internal or external state"), and elaborate on which author requires which properties. Of course, the most common definition (which probably is the current one) should be used as the main one; the deviating ones could be mentioned as less usual, in order of decreasing popularity. This way, we could even mention the Wolfram Language's "pure" notion. - I still didn't come up with a good simple global variable example. Do you know a good one? - Jochen Burghardt (talk) 13:53, 1 August 2018 (UTC)
In order to concretize my above suggestion, I wrote the following border-line example code for C.
// Run
// gcc -Wsuggest-attribute=pure -Wsuggest-attribute=const -O -c pure.c
// to obtain gcc's idea of which function may be const/pure.
int last = 3;
int proc0(int s[1],int t) { return t+9; }
int proc1(int s[1],int t) { return s[0]+9; }
int proc2(int s[1],int t) { s[0] += 9; return s[0]; }
int proc3(int s[1],int t) { return last+t; }
int proc4(int s[1],int t) { last = t+9; return last; }
int proc5(int s[1],int t) { last = s[0]+9; return last; }
int proc6(int s[1],int t) { s[0] += 9; last = s[0]; return last; }
Running the indicated command reults in three messages saying that proc0 might be candidate for attribute 'const', and proc1 and proc3 might be candidate for attribute 'pure'; they are consistent with my understanding of gcc's notion of "const" and "pure", and with the "Side-effect free functions" section of the article. Maybe a similar example code could be checked for other languages, in particular functional ones? - Jochen Burghardt (talk) 14:34, 1 August 2018 (UTC)
- @Jochen Burghardt and Maggyero: Did you see my comment of 23 July? I am still rather concerned that this article deviates from common usage. (However justifiable that might be, the article is not supposed to contain original research.) As for the global variable example, I fear I have failed to understand your goal. The usual example would be to have a global (e.g., file scope) identifier `g` and a function that loses its purity by referring to it (e.g., function `addg` takes one argument and adds `g` to it). This can be relatively harmless if the constancy of `g` is enforced and dangerous if the program mutates `g`, but in either case the function is not pure (by standard, language-agnostic usage).
- Separately, I do not see any point in mentioning WL's odd usage of "pure function". I believe this is more likely to confuse than aid any reader that turns to this entry for a definition. So I am mildly opposed to bringing it up.
- Cerberus (talk) 02:46, 2 August 2018 (UTC)
- None of the procs, proc0() through proc6() are pure because you are passing a mutable reference. A pure function is something that the user can rely upon to be referentially transparent, so they could memoize, or if they call multiple times, for some reason, would be idempotent. Kasajian (talk) 16:07, 10 September 2022 (UTC)
@Cerberus0 and Maggyero: I saw your 23 July comment, but I don't get your point. If you think the article still deviates from common usag, could you make a suggestion how you would like to change it? - As for the global variable example, my goal is to find a prominent one, that is, a function that many people are already familiar with (like "sin" for the pure case). If you have a suggestion, I'd appreciate it. If not, we can of course give a completely fictitious example. - As for Wolfram language, I think the article serves two puposes: (1) to elaborate on the common usage of "pure function", (2) to refer readers that search for an unusual notion of "pure function" to appropriate places. For (2), appropriate {{distinguish}} hat notes may suffice (in particular, if the notion is completely unrelated). However, in cases where no article exists to link to, we may add a sentence here. Can you explain in which sense the Wolfram Language uses "pure"? There is nothing about it in the article. - Jochen Burghardt (talk) 10:09, 2 August 2018 (UTC)
- @Jochen Burghardt and Maggyero: (1) My July 23 point is that you provide a definition that rules out the purity of many functions that programmers typically call pure. (E.g., you rule out any Python function that accepts a mutable argument, even when the function does not change the argument.) I believe you are pushing a nonstandard definition. The article should reflect actual usage and not do original research to "improve" on the current usage. (2) As for the WL usage, it is *highly* nonstandard. In WL, one can create functions with the `Function` command (and its shortcuts) or via pattern matching. The Wolfram docs call the former "pure" without requiring *any* other properties at all. Essentially, the term "pure function" in WL just means "function literal". As I said, I think even bringing this up is likely to be confusing to most readers.
@Cerberus0 and Maggyero: Thanks for your explanations. (1) I understood your July 23 point now. My "length" (counter-)example was intended to be in accordance with your definition of "pure", and I still wonder how one could believe that it satisfies property 1. If this isn't standard, it should be easy for you to give some references for that. (I don't have references for or against; therefore my 1 Aug edit included a suggestion on how to achieve such references.) In case you are right, we should clarify property 1 accordingly; the sentence about "const" in C meaning property 1 and 2 should be adapted. (2) I agree with you that Wolfram's notion of "pure" is quite different. As a consequence, however, I suggest to warn about it rather than to conceal it. After all, a Wolfram programmer would look up pure function in order to learn more about the odd concept of his language; Wikipedia should tell him to look somewhere else. Maybe, you could add a sentence at Wolfram Language, and we could just add a "distingsuish" hat note? Would that be ok for you? - Jochen Burghardt (talk) 16:16, 5 August 2018 (UTC)
- @Jochen Burghardt and Maggyero: (i) See the new section Language-Specific Usages. Perhaps that addresses your concern about WL? I think C++ usages should be moved here too. (ii) My 23 July post includes a reference that I consider typical. (iii) Your most recent response still seems to me to conflate two things: what is the best definition of pure function for computer science, and what is the current language-agnostic usage of pure function in computer science. The former seems to me to involve original research (i.e., to outside the article scope); the later is (I believe) the intended scope of the present article. If you are not persuaded, I plan to let this point go. But then perhaps, based on the reference I provided July 23 (which I claim is typical), a clarification should be provided in the new section that essentially no Python functions are pure under the article definition. Cerberus (talk) 20:31, 8 August 2018 (UTC)
@Cerberus0 and Maggyero: (i) The new section Language-Specific Usages is fine; moving the gcc/Fortran stuff to there seems a good idea to me. (ii) I read again your Python citation from 23 July. When I translate it into C, it will read the a_list
data through the parameter pointer, which I considered non-pure (like my string length example), while the author calls it pure; so it seems I was wrong. However, before I give in, I wonder if a part (say, b
) of the actual parameter (say, c
) could be shared, and modified by another function (say, foo
), such that two calls of doubleStuff(c)
will return different lists (sorry for the syntax errors, I don't know Python details):
def foo(a_list):
a_list = [22, 33, 44]
b = [2, 3, 4]
c = cons(1,b)
val1 = doubleStuff(c)
foo(b)
val2 = doubleStuff(c)
Wouldn't that make val1=[1,2,3,4]
, but val2=[1,22,33,44]
? Possibly, my suggestion for foo
's body won't achieve that, but another implementation of foo
could? (iii) I don't understand what you mean by best definition. I think I agree with you that the main usage of pure function in computer science should be presented in the first place; from your citations I took it that it is language-agnostic and comprises property 1 and 2 as stated in the lead (I guess you agree with that). A remaining possible difference between us could be the issue of reading via parameter pointers, cf. (ii). - Jochen Burghardt (talk) 17:08, 17 August 2018 (UTC)
Advantages and disadvantages
editWould be very good to add advantages and disadvantages of pure functions in comparison with impure ones. Should everyone try to always stick to pure functions? — Preceding unsigned comment added by 192.76.172.134 (talk) 10:58, 25 May 2016 (UTC)
- I believe the pureness of a function is just a description of its behavioral characteristics. There's no necessary advantage or disadvantage between for example: sqrt(x) and write(fd,buf,len). Dannyniu (talk) 07:29, 31 October 2017 (UTC)
cf. Reentrancy
editIt would make sense to compare and contrast against Reentrancy (computing). They seem to be very similar concepts. Wootery (talk) 21:20, 3 December 2016 (UTC)
- I'd agree if that'd help the reader from getting confused. One of the key differences between functions being reentrant-safe and being pure is that: reentrant-safe functions may have observable side effects that are safe to ignore, while pure functions do ***not*** have any side effect at all. Also, in the "background" section, it is explicitly said reentrant-safe functions may produce different output on different invocation with same input, which is another key difference. Dannyniu (talk) 07:38, 31 October 2017 (UTC)
This article should be deleted
editA function is pure if behaves as a mathematical function which does not produce side effects. In functional languages programs are functions. The concept is simple, but many programmers confuse procedures with functions and memory locations with variables because those terms have been used so in imperative programming languages since 1953 when Fortran was written by John Backs. When they are aware of that and understand what is a mathematical function everything is clear to them. That is all. — Preceding unsigned comment added by 2806:107E:4:FEDB:649E:5342:FF86:D820 (talk) 11:59, 6 February 2018 (UTC)
- Yes, the article is horribly misleading. I tried to fix the introduction. Cerberus (talk) 17:29, 5 July 2018 (UTC)
- Keep, it's overall correct at this point. Also, it's John Backus. 199.227.43.198 (talk) 17:22, 31 January 2020 (UTC)
sin() is not pure.
editThe Examples section lists sin() as a pure function, but this is not the case. Since sin() operates with floats, it is subject to the IEEE rounding mode which can be changed at runtime.
If sin() were pure, the following assert would not fail:
#include <fenv.h> #include <math.h> #include <assert.h> #include <stdio.h> int main() { float a; float b; float c = 27.427896; fesetround(FE_DOWNWARD); a = sin(c); printf("%f\n", sin(c)); fesetround(FE_UPWARD); b = sin(c); printf("%f\n", sin(c)); assert(a == b); // this assert fails since sin depends on IEEE rounding mode. return 0; }
As sin() depends on external state, it should not be listed under "pure functions".
C++ constexpr
editconstexpr is not an indicator of "optimizable" purity. In particular, templated constexpr functions can still call impure functions, and as higher-order functions call other impure functions, so they are not safely optimizable, and as mentioned above, the supposed purity of constexpr is never mentioned in the C++ standard. SlowByte (talk) 20:06, 19 August 2019 (UTC)
- Maybe, an appropriate remark or/and an illustrative example should be added in the article, in order to avoid future re-insertion of the sentence you deleted today? - Jochen Burghardt (talk) 19:21, 19 August 2019 (UTC)
As someone mentioned above, I think "compile-time evaluation" is a separate aspect from purity, but not sure how to incorporate this into the article. Also, the article should probably discuss the "transitive" nature of purity, ie. for a function to be pure, all the functions or operations it calls must be pure as well. Leaving it as a TODO...
Here's some valid example C++ code to illustrate the constexpr
point:
#include <iostream>
using namespace std;
int foo()
{
cout << "foo" << endl;
return 2;
}
constexpr int bar(int (*f)())
{
return 1 + f();
}
template <typename T>
constexpr int bar2()
{
return 1 + foo();
}
int main(int, char **)
{
auto n = bar(foo) + bar2<void>();
cout << "bar=" << n << endl;
}
"mutation of a local static variable" example is misleading
editFollowing code snippet has been provided as an example of impure function due to "mutation of a local static variable":
void f() {
static int x = 0;
++x;
}
While this snippet is impure, it's not because of mutating a static, but because it's an UB to overflow a signed integer. It's allowed for pure functions to crash or never return, but they should do this consistently. Given that f()
in example accepts unit type input, it should either always produce UB or never, to be pure.
So, fixing UB,
void f() {
static unsigned int x = 0;
++x;
}
Now, given that overflowing unsigned values is well defined, this snippet is pure. Even despite it's mutating a static, value of x
is unobservable outside of f()
, no other function can read it to change their behavior, so it doesn't objectively exist for the purpose of reasoning about purity.
In the light of things explained above, I'm going to split that example in two. I want to keep an original example with explanation that it's UB that makes it unpure, and provide a counterexample that looks like impure, but is actually pure (but poorly written). Toriningen (talk) 20:30, 5 March 2020 (UTC)
- What is "UB" supposed to mean? - Jochen Burghardt (talk) 20:42, 5 March 2020 (UTC)
- It's a common abbreviation for undefined behavior. Toriningen (talk) 20:47, 5 March 2020 (UTC)
Reference arguments
editThe article is mistaken in stating that property 1 excludes only mutable reference arguments from affecting the return value, it actually excludes all reference arguments. It would be inconsistent to treat non local variables and reference arguments differently in this regard. Cf in c++ the difference between gcc attributes const and pure, where const modelizes a pure function as defined in this article, and pure is actually a relaxation of this concept which allows const (non-volatile) reference arguments and non-local variables to affect the return value (but still not allowing side effects, so observable writes on mutable ref args and non-local vars). To be specific, at the risk of being obvious, non-const variables bind to const reference arguments, that is to say, the const qualification on reference arguments only restricts the function implementation, not the original variables used as the reference arguments.
Example:
float foo(const int& a); // { return a; }
int i = 0;
float f1 = foo(i); // int& binds to const int&
i = 1;
float f2 = foo(i); // int& binds to const int&
assert(f1 == f2); // would hold if function was pure in the strict sense, but actually incorrect
So gcc considers reference arguments as pointer arguments for const/pure attribute purposes, and equality is pointer equality, not object equality, so the preclusion for gcc const on examining pointer data applies to reference arguments (we can't examine them in any way that affects the return value or the observable state of the program, so they have a very narrow use case).
This is actually related to the length example used in the article, which in the idea it gets right, but actually is wrong for a different reason... The c function strlen is indeed not pure (but is pure in the relaxed gcc sense) since, as the article states, it depends on the "memory contents where the string points to". However, this argument does not apply to the length function of std::string, since it does not directly depend on the memory and is usually stored separately in the std::string object. Still, this function is not pure, but due to being a member function and hence depend on the implicit this pointer (it is however pure in the relaxed gcc sense).
Article needs an overhaul
editSomething has gone terribly wrong in this entire article. So much of the information is simply incorrect. It also conflicts with the article "Purely functional programming".
In purely functional programming language such as Haskell, it is clearly defined that a function invoked in a different context or at a different time with the same arguments, will produce the same result.
The reason for the confusion are:
- Typically, pure functional languages assume everything is pure by default and those functions that are not pure are annotate or typed explicitly in as to what type of impurity is at play, such as I/O
- Whereas functions in imperative languages by default are not pure, those some of these languages, as they've been extended with functional capabilities have introduced the ability of adorning functions with a purity attribute.
Thus, there is more original source material out there that discusses what is meant by a pure function in the imperative world than the functional world because in the imperative the concept requires a lot more explanation and definition. Whereas in the functional world, it's the default.
Furthermore, many of the definitions in the imperative world have augmented the definition of purity to the point where it conflicts with the core definition, referential transparency being one of these requirements.
Wikipedia articles must refer to public sources not original material, so however wrong the definition of pure function in these other context may be, they must be dealt with. But this article seems to focus on these aspects as the primary examples of pure functions, which is misleading to say the least.
My request is for this article to be completely rewritten and fall inline with other articles such as "Purely functional programming". I do agree that it would be a good to have examples in imperative languages because of the wider audience, but let's ensure we stick to original source material of what is meant by functional purity in the mathematical sense, and have separate sections where deviations are indicated each case, for each popular language or framework that reasonable number of people may encounter.
This is a big ask, but if we can find the right person to do the job, I suggest there be a way to fund the activity to ensure it gets the attention, and gets done right. I would be willing to throw in a few bucks just to fix all this. — Preceding unsigned comment added by Kasajian (talk • contribs) 16:31, 10 September 2022 (UTC)