Talk:Self-modifying code
This article is rated C-class on Wikipedia's content assessment scale.
TODO
- an example and discussion of 'high-level' self-modifying code such as in LISP.
- examples and discussion of traditional uses of self-modifying code, such as in graphics blitting units, specialisation of algorithms (like a sort with an embedded cmp), and in interpreter kernels.
Is a thunk and/or a trampoline (computers) also a kind of self-modifying code? --DavidCary 03:01, 18 August 2005 (UTC)
- Not unless the code is generated at run time - and, even there, unless existing code is overwritten, I wouldn't call it self-modifying code. (I.e., I think this article should apply only to cases where existing code is modified; I don't think the generation of new code at run time can reasonably be described as code doing "self-modification".) Guy Harris (talk) 08:10, 15 May 2022 (UTC)
I have never written any self-modifying code, but the example of a state-dependent loop doesn't look quite right. Maybe a misplaced curly bracket? --(AG)
- I'll check the brackets. Actually, state-dependent loops are a sort of self-modifying code I've written a few times on 8-bit machines, when the state transition is not frequent, especially if altering just the argument of an opcode, thus using a faster instruction (e.g., on the 6502). Code generation is still relevant and useful, e.g. 'compiled bitmaps' during the 90s, and specific rendering code today. Oyd11 00:42, 13 June 2006 (UTC)
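A minimal sketch of the operand-patching idea in modern terms. This is not the 6502 code discussed above; it assumes x86-64, POSIX mmap, and a system that still allows writable-and-executable pages (many hardened systems forbid this). Like the 6502 trick, it rewrites only the immediate operand of an instruction, leaving the opcode alone:

    /* Hypothetical x86-64/POSIX sketch: patch an instruction's immediate
       operand at run time. mov eax,imm32 is B8 xx xx xx xx; ret is C3. */
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>
    #include <sys/mman.h>

    int main(void) {
        unsigned char tmpl[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 }; /* mov eax,42 ; ret */

        unsigned char *code = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (code == MAP_FAILED) return 1;
        memcpy(code, tmpl, sizeof tmpl);

        int (*f)(void) = (int (*)(void))code;   /* non-ISO cast, common in JITs */
        printf("%d\n", f());                    /* prints 42 */

        uint32_t new_state = 99;                /* the "state transition" */
        memcpy(code + 1, &new_state, 4);        /* rewrite only the operand */
        printf("%d\n", f());                    /* prints 99 */
        return 0;
    }

On x86 the instruction and data caches are kept coherent automatically, so no explicit flush is needed here; see the cache discussion further down this page.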
I suggest removing the entire Synthesis section, along with Massalin, Haeberli, and Karsh, on notability grounds. Marc W. Abel 15:12, 26 April 2006 (UTC)
- Futurist Programming should probably be the new link, although the original author of this document should check, as they would know whether this is the correct article.
JavaScript example: really self-modifying?
It seems to me that the JavaScript code example is not self-modifying. The action variable is merely a function pointer that points to two anonymous functions in the course of its life. All the code from this example could be put in read-only memory and it would execute without problems. Where am I wrong? Sarrazip 02:17, 19 December 2006 (UTC)
I agree with Sarrazip. In addition, I don't think the JavaScript example belongs under the section "Interaction of cache and self-modifying code".
I have removed the JavaScript code example, since no one has objected for several months. Sarrazip 03:05, 21 May 2007 (UTC)
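For the record, here is the shape of the removed example rendered in C (names are made up for illustration). As Sarrazip says, only a pointer is reassigned; the machine code could live in read-only memory throughout:

    /* Late binding via a function pointer - not self-modifying code.
       Mirrors the removed JavaScript example's 'action' variable. */
    #include <stdio.h>

    static void first_time(void)   { puts("one-time initialization"); }
    static void steady_state(void) { puts("normal operation"); }

    static void (*action)(void) = first_time;

    int main(void) {
        action();                /* runs first_time */
        action = steady_state;   /* retargets the pointer; no code bytes change */
        action();                /* runs steady_state */
        return 0;
    }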
Obj-C
Possibly Obj-C code in addition to LISP? It's the only object-oriented superset of ANSI C that I know of that really implements it as a base feature. [1] --Electrostatic1 08:51, 15 May 2007 (UTC)
Self-modifying code in self-referential machine learning systems
I think there should be a section on self-modifying code for machine learning along the lines of Jürgen Schmidhuber's work on meta-learning: http://www.idsia.ch/~juergen/metalearner.html Algorithms 20:54, 4 June 2007 (UTC)
Dead link
edit- "Synthesis: An Efficient Implementation of Fundamental Operating System Services" -Henry Massalin's PhD thesis on the Synthesis kernel
This appears to be a dead link. Which makes me sad. I was really looking forward to seeing a self-modifying kernel! Guess it's time to whip out Google.
66.93.224.21 11:24, 5 June 2007 (UTC)
- I replaced it with something found on Google.
Reentrant self-modifying code is possible
Many years ago, I had to write a piece of code which was reentrant - because of a number of constraints imposed by my interrupt-handling methods - but which also needed to be self-modifying. To explain: an input was N, the number of a record in a file, and the assembler supplied only one type of TRAP instruction - with a constant. An earlier generation of the application (it was on a PDP-11) used a TRAP instruction thus:
    READ_N: ; R5 points to (unsigned) record number N (assumed <=255 and non-zero)
            MOVB 2(R5),10$   ; Modify the TRAP instruction
    10$:    TRAP 0           ; Read "something"
            RETURN
and I needed to retain the mechanism, but also to make it reentrant.
The solution was simple: have a "virgin copy" of the code available (but never called directly). When it was needed, it was copied to the top of the stack, together with "cleanup code"; there the copy was modified and executed; finally, the cleanup wiped the stack of the defiled code. All I can say is that it worked.
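A hedged C sketch of the "virgin copy" scheme. Modern systems generally mark the stack non-executable, so a freshly mapped page stands in for the stack here; the template, the patch, and the cleanup are otherwise as described. The byte values are x86-64 stand-ins for the PDP-11 TRAP, not the original code:

    /* Reentrant self-modifying code: each call patches and runs its own
       private copy of a code template, so concurrent callers cannot clash. */
    #include <string.h>
    #include <sys/mman.h>

    /* Template: mov eax,imm32 ; ret - the imm32 low byte plays the role of
       the TRAP instruction's constant. The template is never executed directly. */
    static const unsigned char virgin[] = { 0xB8, 0x00, 0x00, 0x00, 0x00, 0xC3 };

    int read_record(unsigned char n) {
        unsigned char *copy = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (copy == MAP_FAILED) return -1;
        memcpy(copy, virgin, sizeof virgin);   /* take a private copy */
        copy[1] = n;                           /* modify the copy, not the original */
        int result = ((int (*)(void))copy)();  /* execute it */
        munmap(copy, 4096);                    /* the "cleanup code" */
        return result;
    }

Whether this counts as self-modification or as run-time code generation from a template is exactly the question debated just below.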
My simple statement about self-modifying code is this: in bootstrap code, it's fine - but elsewhere: DON'T EVEN THINK ABOUT DOING IT! (Especially where reentrancy is a prerequisite ...) Hair Commodore 18:57, 16 September 2007 (UTC)
- I've corrected the above code: the error was in the byte addressed - the low byte of a TRAP instruction was to be altered, not the high byte. (It's a long time since I've used a PDP-11 at assembler level - sorry!) Hair Commodore 20:16, 22 September 2007 (UTC)
- Awww, go on, it's not all that bad. What is required is a calm attitude and appreciation of the actual environment. By using the stack working area, you ensure the avoidance of clashes in a quite proper way. This is what multi-stack designs are all about, and by writing in assembler (with proper commentary) you need not be constrained by the shibboleths of the prating orthodoxists of flabbier computer languages that constrain themselves and declare it good. In other words, I have misbehaved also, and declare it good. NickyMcLean (talk) 19:51, 18 December 2008 (UTC)
- I'd call that "run-time code generation", not "self-modifying code". Unfortunately, this page covers both topics; there is some overlap, in that code could generate other code at run time on top of existing code, which involves both run-time code generation and self-modifying code, but some self-modifying code doesn't involve particularly sophisticated code generation, and run-time code generation may generate new code into "empty" memory rather than replacing existing code. I'd classify what you did as the latter; the generated code is on the stack, the code you copied is just a template used by the code-generation process. Guy Harris (talk) 08:18, 15 May 2022 (UTC)
JIT?
Maybe I'm being nit-picky, but I don't think a just-in-time compiler falls into the category of self-modifying code, any more than any other compiler would. It generates some code, and then transfers control to it. It doesn't really alter its own behavior. And in the same vein, I don't think that uncompressing some otherwise static code and then running it qualifies as self-modifying, either. I would reserve the term for code that modifies its own behavior as it is running. Maybe it's a rather vague concept, though. Deepmath (talk) 11:02, 15 July 2008 (UTC)
- I utterly agree with the above statement: JIT is not self-modifying. The code is only being generated, not self-modified. The compiler itself, or the virtual machine, never gets modified. Un(de)compressing doesn't yield any self-modification either. It'd be the same as saying that loading dynamic libraries (or any libs, for that matter) is self-modification; by that logic, any code run by an OS could be viewed as self-modifying.
Bestsss (talk) 12:43, 18 December 2008 (UTC)
- If I understand correctly, "Just-in-Time" compilation is equivalent to compiling the whole lot once at the start, in that the resulting executed code in the part that is being executed would be the same. The advantage is presumably that no compiler effort is wasted on execution paths that will not be taken on the particular invocation, and that the compiled code will run faster than interpretation of the text, especially if there are loops. By contrast, consider a program whose purpose is to assess the workings of some routines for numerical integration, such as Simpson's rule, etc. One requirement would be a variety of functions to be integrated, and they might be incorporated via a tiresome "case" statement or similar. Otherwise, the test program could read from an input file the arithmetic statement defining the function, encase that text in suitable text for the definition of a function f(x) in the language of choice, pass the whole to the compiler, and link to itself this new function, which could then be invoked by the testing procedures as if it had been a part of the whole compilation all along, at full compiled speed; no messy "case" statement and selection of function one, then function two, etc. The difference here is that arbitrarily different code would be produced, depending on the list of arbitrary test functions supplied to a particular run. NickyMcLean (talk) 19:37, 18 December 2008 (UTC)
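A sketch of that scheme in C, under stated assumptions: a POSIX system with a cc on the PATH and dlopen/dlsym available. The expression and file names here are made up for illustration, and error handling is trimmed:

    /* Take an arithmetic definition at run time, wrap it in a function,
       compile it with the system compiler, and link it back in. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <dlfcn.h>     /* may need -ldl when linking on older glibc */

    int main(void) {
        const char *expr = "x * x + 1.0";   /* would be read from the input file */

        FILE *src = fopen("f.c", "w");
        if (!src) return 1;
        fprintf(src, "double f(double x) { return %s; }\n", expr);
        fclose(src);

        if (system("cc -shared -fPIC -o f.so f.c") != 0) return 1;

        void *lib = dlopen("./f.so", RTLD_NOW);
        if (!lib) return 1;
        double (*f)(double) = (double (*)(double))dlsym(lib, "f");

        printf("f(3.0) = %g\n", f(3.0));    /* full compiled speed, no case statement */
        dlclose(lib);
        return 0;
    }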
- That's almost correct. JIT compiles when needed (which may mean, for example, interpreting a few lines that are never executed again, like the main method, saving the time of useless compilation), and a JIT may recompile with eager optimizations (escape analysis, inlining, etc.). It simply compiles; it doesn't ever modify itself. It can change the compiled code on the fly, but that's still not self-modification at any rate. I see self-modification only when a program changes the initial code that has been loaded from an external medium (the network can be considered one) and already run (so decompression doesn't fit). Bestsss (talk) 09:53, 21 December 2008 (UTC)
extremely fast operating systems and applications?
Under the heading "Henry Massalin's Synthesis kernel" it is claimed that
Such a language and compiler [based on Massalin's techniques] could allow development of extremely fast operating systems and applications.
This sounds like pure speculation to me. —Preceding unsigned comment added by 62.73.248.37 (talk) 20:10, 28 March 2009 (UTC)
Monkey patching
I think there should be a reference to the article about monkey patching, and vice versa. Monkey patching is a structured and formalised way to do self-modifying code in an interpreted language. At least, monkey patching could be listed in "See also". What do you think? --Jarl (talk) 06:12, 1 May 2009 (UTC)
Should Lisp get its own section?
Lisp has self-modifying code unlike any of the other languages; in fact, a running Lisp program modifies itself the whole time. I would say that Lisp is unique in how it modifies its own code because Lisp has no boundary between data and code. Data is stored in linked lists in Lisp, and instructions are data, with the head being the operation and the tail a list to operate on. Just 'dumping' data into the main runtime is interpreted as executing it according to that pattern. In that sense, unlike JavaScript or Perl, Lisp doesn't modify its own syntax or evaluate a string as if it were an expression. Lisp has no syntax; S-expressions are just a convention for encoding linked lists, and anything that encodes linked lists can create programs isomorphic to those in S-expressions.
Therefore, if it's okay with you people I'd like to add a section on Lisp families because they treat self-modifying code in a unique way. Rajakhr (talk) 22:23, 23 January 2010 (UTC)
- Interpreted languages generally have (or could have) self-modification arrangements though these are usually via some special form or modification of the disc file containing the statements. An "eval" statement is a step further away from self modification. But Snobol contains features that could be regarded as self-modification (as during pattern matches), and also contains its source statements as a text array open to manipulation. So it is not just Lisp. If you prepare some examples, explanations will be needed for non-Lispers. But would they introduce a new idea? Such as demonstrating some desirable action by routine use of self-modification? NickyMcLean (talk) 20:45, 25 January 2010 (UTC)
- Well, Lisp can be compiled and still have self-modifying code; the idea is that Lisp doesn't really have code/syntax. A Lisp implementation is an engine that rewrites symbols in lists, and any way to specify lists will do in the end. Rajakhr (talk) 18:51, 25 February 2010 (UTC)
- I wouldn't really say that Lisp uses self-modifying code in the traditional sense. Usually the code transformations happen at compile time (although run and compile times can be interleaved in Lisp), and actual runtime "code modification" happens the same way it would in a C program, e.g. by pointer reassignment. You can build new code by compiling S-expressions at runtime, but it's conceptually similar to (and sometimes implemented with) a C program externally compiling and linking in new code. TokenLander (talk) 20:20, 3 March 2010 (UTC)
Simplify maintenance?
Quote from the first line: "In computer science, self-modifying code is code that alters its own instructions while it is executing - usually to reduce the instruction path length and improve performance or simply to reduce otherwise repetitively similar code, thus simplifying maintenance."
How does self modifying code simplify maintenance? It seems like it actually makes maintenance harder since it is usually more difficult to figure out what the hell is going on. —Preceding unsigned comment added by 129.65.117.57 (talk) 23:39, 21 February 2010 (UTC)
- The answer to your question is in the statement quoted above: 1. "usually to reduce instruction path length", and 2. "simply to reduce otherwise repetitively similar code".
- If there are fewer instructions in a path, there is less code to verify for correctness (virtually or actually); if there are fewer repetitive lines of code, there are fewer instructions to check and/or to go wrong. 86.142.85.194 (talk) 06:14, 28 May 2013 (UTC)
bullshit
The paragraph claiming that late binding "can be regarded as self-modifying code" is pure, unadulterated, farcical bullshit. It is completely at odds with any useful definition of self-modifying code; that is, if virtual functions are self-modifying, *everything is*. Not only is the paragraph wrong, but it's also completely unsupported by actual citations and references to literature. I will remove it shortly if nobody objects. —Preceding unsigned comment added by Quotemstr (talk • contribs) 20:35, 17 May 2010 (UTC)
- Agreed. Oli Filth(talk|contribs) 21:23, 17 May 2010 (UTC)
King John question
The article fails to explain the relation between self-modifying code and "von Neumann" computer architecture. I think any hardware that can allow self-modifying code to run in at least one operating system is identically equal to a "modified von Neumann architecture" computer? Is that right? 82.131.210.163 (talk) 17:43, 24 April 2012 (UTC)
- I don't see "von Neumann" anywhere in the article, but I think that the relationship is that the von Neumann architecture envisaged a computer with a single memory space comprising both data and instructions. Self-modifying code would require that a program be able to treat a piece of memory as data and then reach it as an instruction, and would not be possible on a non-von Neumann computer with, for example, separate code and data spaces. (Who is King John?) Spike-from-NH (talk) 00:26, 25 April 2012 (UTC)
- I'm not sure what a "modified" von Neumann architecture is; a modified Harvard architecture could be one of at least three types:
- an architecture with one address space for both instructions and data, but with separate instruction and data caches and separate buses between the CPU and the two caches, i.e. what the "modified Harvard architecture" page calls a "split-cache architecture";
- an architecture with separate address spaces for instructions and data, but with instructions that allow fetches from, or stores into, the instruction address space, i.e. what the "modified Harvard architecture" page calls a "instruction-memory-as-data architecture";
- an architecture like that of the Maxim Integrated MAXQ family of processors, which the "modified Harvard architecture" page calls a "data-memory-as-instruction architecture", and which is not exactly easy to explain (MAXQ is different).
- The only one of those that I might call a "modified von Neumann architecture" would be the first of them, as, for the vast majority of operations, it is indistinguishable from an unmodified von Neumann architecture. The primary difference visible to most user-mode code is that, on some such architectures, stores must ensure that all caches, whether instruction, data, or unified, be updated or flushed (this is the case on x86, for backwards compatibility with older processors without any caches or with only a unified cache) and, on others, a store is not guaranteed to flush instruction caches, and the architecture defines an instruction or instructions to force a flush, and attempts to execute the code being modified are not guaranteed to work correctly until after the instruction completes (this is the case on SPARC, for example).
- In the case of x86, I think there might be instruction-pipeline issues that require some care when modifying code, dating all the way back to the 8086; a case could perhaps be made that any architecture on which there is no guarantee that casually storing into the instruction stream will Just Work is a "modified von Neumann architecture", even if it's not a "modified Harvard architecture" in the sense of "it has, at some level, separate buses for fetching instructions and data, even though code and data are in the same address space and the same physical memory".
- So I don't see any way in which "any hardware that can allow self-modifying code to run in at least one operating system" is required to be "a "modified von Neumann architecture" computer" - no modification to the von Neumann architecture is necessary to support that. A particular architecture might be "modified" in the sense that it requires that care be taken when storing into code space, but an architecture could also require that no care need be taken, and require implementations to do whatever is necessary to make that be the case.
- I.e., being a von Neumann architecture in the address-space sense is sufficient; a "split-cache architecture" is a pure or modified von Neumann architecture in the address-space sense, and a modified Harvard architecture in the "bus between the CPU and the lowest level of caches" sense.
- Even the other flavors of modified Harvard architecture could conceivably support self-modifying code. Guy Harris (talk) 07:22, 15 May 2022 (UTC)
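To make the cache-coherence point above concrete, here is a small sketch of what portable code-patching has to do, assuming GCC or Clang (which provide __builtin___clear_cache) and a buffer already mapped writable and executable:

    /* After storing new instructions, force them to be visible to
       instruction fetch. On x86 this is effectively a no-op (the caches
       are kept coherent); on SPARC, ARM, etc. it is required. */
    #include <stddef.h>
    #include <string.h>

    void patch_code(unsigned char *code, const unsigned char *bytes, size_t n) {
        memcpy(code, bytes, n);                    /* the data-side store */
        __builtin___clear_cache((char *)code,
                                (char *)code + n); /* sync instruction caches */
        /* Only now is it safe to jump to `code` on a weakly coherent machine. */
    }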
Apple II copy-protection citation?
Does anyone know if there is a citation for the anecdote of using self-mod code as a copy protection technique on the Apple II? I remember reading about it somewhere 20 years ago when I was 'into' the Apple II in high school (back when "20 megabytes" was considered "really in-humanly humungously big" LOL) Jimw338 (talk) 21:35, 14 March 2013 (UTC)
Really bad
This article is really bad, IMO. It seems to wander around without really declaring a purposeful path through a series of poorly illustrative examples. I'm not sure I truly understand what historically the term "self-modifying code" meant, but this article does little to remedy that. Some examples:
- distinctions are drawn for initialization and "on-the-fly" modifying of code, but it is really unclear why this is a meaningful distinction to be made
- a division between low-level and high-level languages is also embedded in the article
- this seems to imply that self-modifying code is somehow related to a choice of language, which seems at odds with generic computation theory
- the provided examples seem remote from an average reader
- the low-level examples seem to focus on arcana without grounding the examples in practical reality
- these examples seem teleological
- it's unclear whether the low-level examples are historical curiosities, or still have practical value
- the high-level examples seem to just be examples that it "can be done" in the language, leaving a practical example for the reader to intuit
- many of the listed languages aren't mainstream
- there seems to be some unstated assumption running through the article about the useful domain of applicability of self-modifying code
- there aren't "side-by-side" examples of self-modifying code and non-self-modifying code, allowing for an apples-to-apples comparison of techniques — Preceding unsigned comment added by 70.247.175.126 (talk) 05:56, 5 November 2015 (UTC)
External links modified
Hello fellow Wikipedians,
I have just added archive links to one external link on Self-modifying code. Please take a moment to review my edit. If necessary, add {{cbignore}}
after the link to keep me from modifying it. Alternatively, you can add {{nobots|deny=InternetArchiveBot}}
to keep me off the page altogether. I made the following changes:
- Added archive http://web.archive.org/web/20160105054118/http://public.carnet.hr/~jbrecak/sm.html to http://public.carnet.hr/~jbrecak/sm.html
When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).
An editor has reviewed this edit and fixed any errors that were found.
- If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
- If you found an error with any archives or the URLs themselves, you can fix them with this tool.
Cheers. —cyberbot II (Talk to my owner: Online) 02:57, 28 February 2016 (UTC)
Proposed merge with Out-of-order execution
This appears to just be a possible subset or alias of self-modifying code. Ethanpet113 (talk) 07:47, 18 November 2018 (UTC)
- Don't merge – Out-of-order execution appears nicely self-contained, is generally about pre-emptive execution, and has come to the fore with some recent exploits. I take Self-modifying code as a different subject, though there may arguably be a slight overlap. But in all events I wouldn't regard this merge as likely to produce a helpful result. Djm-leighpark (talk) 10:26, 18 November 2018 (UTC)
- Don't merge - absolutely not. OOE is a CPU's designers "modifying" a programmer's code whereas self-modifying code is a programmer's code modifying itself. Also, the former requires custom engineering in silicon and is more properly a subset of CPU design than any kind of subset of self-modifying code. Michaelmalak (talk) 17:59, 18 November 2018 (UTC)
- Don't merge - Different topics. Code isn't self modified when executed out of order. ~Kvng (talk) 14:08, 21 November 2018 (UTC)
- Don't merge - Regarding out-of-order execution as similar to self-modifying code requires quite a twist of thinking. Like, at run time, code protected by an IF-test is executed on the assumption that the result of the IF will be the same as before, but, this time it will not be and the code should not have been executed, so the results of that execution are undone (except, ho ho, for the contents of on-chip memory caches) so that could be regarded as code modification, in that the results of the code are modified by being undone. Sortof... So, nope. Or possibly, that the apparent machine code, as executed by the microcode interpreter on the fly by the actual hardware (well, with its own programmable logic gates) involves activities that are not in direct correspondence to the static machine code and further, the mix changes on each iteration as well, so all this is dynamic code modification as the hardware churns through the microcode that implements the nominal machine code. Humm. Nothing like a piece of machine code modifying some other stretch of machine code, without reference to hardware running microcode. NickyMcLean (talk) 11:16, 22 November 2018 (UTC)
- Closed as no consensus to merge - these are different concepts as used for different reasons. Ostrichyearning3 (talk) 00:36, 25 November 2018 (UTC)
Massalin's Synthesis kernel
Is Massalin's Synthesis kernel really relevant here? I propose removing it. Peter Flass (talk) 16:26, 11 January 2019 (UTC)
- Keep - surely it is an example of self-modifying code, and done by an operating system to itself? As distinct from an operating system modifying its various internal tables or linked lists to activate or terminate a new task. NickyMcLean (talk) 08:49, 12 January 2019 (UTC)