Talk:Xenos (graphics chip)

Move

Suggest moving to Xenos (GPU). There is no assertion of usage prominence over other uses of "Xenos". I would favour Xenos (Greek) taking first precedence.--ZayZayEM (talk) 05:55, 10 December 2007 (UTC)

It would make sense to me, as the concept is the source of the other names. Another possibility, though, would be to point this to the disambiguation page. --Falcorian (talk) 17:49, 10 December 2007 (UTC)


Should it be Xenos (GPU), Xenos GPU or Xenos (graphical processing unit)? --ZayZayEM (talk) 02:09, 11 December 2007 (UTC)
I third this, and am doing something about it: naming it in the same way the Hollywood chip was named, Hollywood (graphics_chip). Oldag07 (talk) 18:13, 9 December 2008 (UTC)

So, how was the vertex rate figured?

In this article, they seem to have come to a "1.5 billion vertices per second" figure from somewhere. I would assume they thought that, since a typical polygon is a triangle with three vertices, 1.5 billion vertices would form 500 million polygons. So it seems they worked backwards from the "500 million" figure released by Microsoft, and just assumed 1.5 billion vertices from there.

It's my understanding that typical gpu listings for this figure calculate vertex shader throughput by assuming it takes 4 clock cycles to complete the simplest vertex positional transform (matrix * vector). So each vertex shader you have contributes 1/4 of the clock frequency in vertices per second. That's true of all gpus, from what I can tell. And since all polygons in a mesh could (theoretically) be sharing a vertex with their neighbors in strips and fans, the theoretical maximum polygon count is sometimes "considered" 1 to 1 with the vertex rate.

The problem here is that (being unified) all of Xenos's alus could potentially be processing geometry at once, as would be the case in a z-only pre-pass. But it wouldn't make much sense to list that as a per-second figure of 6 billion. And technically, 1 block of alus devoted to vertex work would be 2 billion vertices per second. But again, even that wouldn't make sense, for a list of reasons.

So, Microsoft listed the "set-up" limit in their specifications. That would be the maximum you could actually draw on screen, after back-face and occlusion culling, etc. And with a reasonable number of vertex shader instructions (beyond the simple transform), you would avoid reaching that limit.

I'm not sure how it should technically be listed, but isn't the 1.5 billion figure ad-libbed? It's derived from the "500 million" set-up limit, and quoted as if it's the same measure you see for other gpus (which is a listing of transform rate, not set-up rate). And it lists 16, because that's the number of alus in a single simd block of shaders. You could just as easily devote more than one simd block of shader alus to vertex work if you found a use for it. Or fewer.

Anyone agree / disagree? Thanks. I won't modify it personally, but I doubt it'd be written up like that officially. Swapnil 404 (talk) 06:49, 15 September 2009 (UTC)
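For anyone who wants to check the numbers in the comment above, here is a rough sketch of the arithmetic. It only uses figures quoted in this thread (500 MHz clock, 48 ALUs, 16 ALUs per SIMD block, 4 cycles per simple transform); the function and constant names are made up for illustration, and the 4-cycle cost is the poster's stated assumption, not an official specification.

```python
# Back-of-the-envelope vertex rates, using only figures quoted in this thread.
CLOCK_HZ = 500_000_000        # Xenos core clock (500 MHz)
CYCLES_PER_TRANSFORM = 4      # assumed cost of one matrix * vector transform
TOTAL_ALUS = 48               # all unified shader ALUs
ALUS_PER_SIMD_BLOCK = 16      # ALUs in one SIMD block


def transform_rate(alus: int) -> int:
    """Peak vertices per second if `alus` ALUs only do simple transforms."""
    return alus * CLOCK_HZ // CYCLES_PER_TRANSFORM


one_block_rate = transform_rate(ALUS_PER_SIMD_BLOCK)  # one SIMD block
all_alus_rate = transform_rate(TOTAL_ALUS)            # every ALU on vertex work

# The article's figure appears to come from the set-up limit instead:
SETUP_LIMIT_TRIANGLES = 500_000_000
naive_vertex_figure = SETUP_LIMIT_TRIANGLES * 3       # three vertices each, no sharing

print(one_block_rate, all_alus_rate, naive_vertex_figure)
```

Under these assumptions one SIMD block gives 2 billion, all 48 ALUs give 6 billion, and the set-up limit times three gives the 1.5 billion figure, so the three numbers measure different things.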

Double shader performance for the Xenos?

One of the lead engineers who worked on the Xenos (his name escapes me at the moment) stated that the Xenos is capable of 96 billion shader ops per second. That's twice the amount stated by Microsoft. I'm assuming that the pipelines now do 2 vector4 ops and 2 scalar ops, so 4 ops per pipeline, and 48 * 4 * 500,000,000 = 96,000,000,000 shader ops per second. I don't know if this is true, or whether what I just said makes any sense. I'm just wondering if anyone can confirm or disprove this, and if it's right, could you please add it to the article? (If I'm right, I think it may affect everything, and definitely make the flop count per pipeline 20.) -- Preceding unsigned comment added by Gears, Gears, Gears (talk • contribs) 02:49, 29 January 2008 (UTC)
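A quick sketch of the arithmetic in the comment above, in case it helps. The 2-ops-per-pipeline case is inferred from "twice the amount stated by Microsoft", and the 20-flop figure assumes both the vector4 and the scalar ops are MADDs (2 flops per component); both are the poster's hypotheses, and the names below are illustrative only.

```python
CLOCK_HZ = 500_000_000   # 500 MHz core clock
PIPELINES = 48           # unified shader pipelines


def shader_ops_per_second(ops_per_pipeline_per_cycle: int) -> int:
    """Shader ops per second for a given per-pipeline issue rate."""
    return PIPELINES * ops_per_pipeline_per_cycle * CLOCK_HZ


# 1 vector4 + 1 scalar op per pipeline (half of the quoted figure)
stated_rate = shader_ops_per_second(2)
# 2 vector4 + 2 scalar ops per pipeline (the poster's guess)
quoted_rate = shader_ops_per_second(4)

# Flops per pipeline under the poster's guess, counting a MADD as 2 flops:
FLOPS_PER_MADD = 2
flops_per_pipeline = 2 * (4 * FLOPS_PER_MADD) + 2 * (1 * FLOPS_PER_MADD)

print(stated_rate, quoted_rate, flops_per_pipeline)
```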

Yeah, I remember when he said that. Feldstein, I think it was. But I don't think he was implying shader ops. Perhaps shader flops: 4 flop madds per cycle, per shader, rather than just listing vector and scalar. It could be that you can issue a vector, scalar, vertex fetch, and texture load, all within one instruction cycle. That would be 4. And it would contrast with Nvidia, because they hadn't decoupled their texturing ops from shader ops, etc., so one stalls the other. Swapnil 404 (talk) 02:00, 23 February 2008 (UTC)
I remember seeing 64 unified shaders way back before the official releases. Was that a downgrade or just speculation? -- Preceding unsigned comment added by 201.81.199.213 (talk) 18:02, 30 April 2008 (UTC)

Shader flop or op count

I thought that when they said "shader op" they were referring to shader ALU operations per second. But anyway, the same would apply to the PS3's RSX, because it performs 2 scalar, 2 vector and one fog op per pipe. (Is a fog op similar to a vector op, but missing a colour value?) So should we change it to shader flops per second for both? Or am I completely wrong? -- Preceding unsigned comment added by Gears, Gears, Gears (talk • contribs) 03:52, 25 February 2008 (UTC)

They were, for Xenos. Just vector + scalar. They didn't consider anything else, like interpolation units, fetch units, etc. I was just suggesting the fetching and loading as possible examples of what Feldstein meant by 4 ops, other than him simply misspeaking.
RSX can't compute a shader in the same cycle it issues a texture fetch. And there are still situations where one alu stalls the other, etc. Xenos has separate logic for such tasks. So on RSX, issuing a texture fetch (and some of its latency) cuts directly into the shader operations, while on Xenos it doesn't.
Really though, you'd have to specify what is meant by "shader op", as it could be any number of things. (An fp16 normalize could be considered one. I doubt they consider the fog alu an op, as I "think" it's there more for legacy software, and modern games compute fog in the shader itself. And things like those mini-alus seem to be there to meet the shader model 3.0 specification requirements.)
Just as a base figure, an RSX shader alu can do 4 flop madds, for vector and scalar ops (vector3 + scalar, vector2 + vector2, or vector4, etc.).
(madd is multiply + add, considered 2 flops)
So, 2 alus per shader, each capable of 4 flop madds, madd = 2 flops.
24 x 2 = 48 alus, and 48 x (4 x 2) = 384 flops per clock.
http://www.watch.impress.co.jp/game/docs/20060329/3dps309.htm
But then, you could ask where the vertex shaders are considered in those figures. (It could be just a pixel shader slide.)
But it is a slide meant for developers, so there's no need to count every flop you can find to inflate the number, etc. Swapnil 404 (talk) 20:47, 29 February 2008 (UTC)
Truth is, my knowledge of the subject is more limited than your own, but I would just like to know: should we consider the shader op count to be 96 billion per second and keep it so on this page, or should we change it to something you see as correct? Also, since you seem to be more informed about the Xenos than me, could you work out how many programmable flops there are per clock for the Xenos? Gears, Gears, Gears (talk) 09:12, 5 March 2008 (UTC)


Nah, I wouldn't change it on my own really. I've read the 96 billion quote, but I don't think it implies what's considered raw shaders. There are a bunch of things you could consider as a "shader op". A Microsoft rep has said Xenos could do "160 shader operations per cycle" or more, if you consider the 32 control flow ops, 16 texture fetches, and 16 programmable vertex fetches per clock, and consider that they can all be issued simultaneously, while on RSX they cut directly into shader operations to varying degrees. (The first alu in each of RSX's shaders doubles as a tmu for texture calls, for example.) And perhaps he's right. But then, there would be other things to consider on RSX as well.

And just for straight "shaders", it'd be just the 48 alus, all vector4 MADD + scalar special function. (The scalar seems to be 1 flop, from a few different places I've read, although a Microsoft rep had calculated it as 2 in one of their flops comparisons.) So, 48 x 4 x 2 = 384 for vector, plus 48 more for scalar: 432 generic shader flops per cycle. If we count the scalar as 2, then it's 480, which matches the Microsoft rep's figure of 240 billion per second (480 x 500 MHz). There are flops involved in other operations, but I think most would limit "programmable" to just those. Swapnil 404 (talk) 15:48, 5 March 2008 (UTC)
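The counts in the reply above can be reproduced with a short sketch. All inputs come from this thread; the breakdown of the "160" figure as 96 ALU ops plus the control flow and fetch ops is one reading that happens to add up, not something the Microsoft rep is confirmed to have meant. Names are illustrative.

```python
CLOCK_HZ = 500_000_000
ALUS = 48
VECTOR_WIDTH = 4
FLOPS_PER_MADD = 2


def xenos_flops_per_cycle(scalar_flops: int) -> int:
    """Programmable shader flops per cycle; scalar counted as 1 or 2 flops."""
    vector_flops = ALUS * VECTOR_WIDTH * FLOPS_PER_MADD   # 48 x 4 x 2
    return vector_flops + ALUS * scalar_flops


scalar_as_add = xenos_flops_per_cycle(1)
scalar_as_madd = xenos_flops_per_cycle(2)
flops_per_second = scalar_as_madd * CLOCK_HZ

# One way to arrive at "160 shader operations per cycle":
alu_ops = ALUS * 2            # one vector + one scalar op per ALU
control_flow_ops = 32
texture_fetches = 16
vertex_fetches = 16
ops_per_cycle = alu_ops + control_flow_ops + texture_fetches + vertex_fetches

print(scalar_as_add, scalar_as_madd, flops_per_second, ops_per_cycle)
```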

Thanks for clearing up all those things, Swapnil, but I would like to know one more thing: could you give a straight programmable shader flop performance comparison of the Xenos and RSX (including the vertex pipelines on the RSX)? If not, then that's ok, but it's definitely four flop MADDs for a vector4 op and 1 flop for a scalar op, right? Gears, Gears, Gears (talk) 07:58, 6 March 2008 (UTC)


Well, I'm sure it depends a lot on what the load is: the ratio of vertex shaders to pixel shaders, the number of texture fetches involved, etc., along with a list of other factors. It used to be thought that, on paper, RSX had more raw shader power, while Xenos was more "efficient". Of course, that was before RSX was clocked back to 500 MHz (650 MHz ram), and it doesn't consider any other components involved with "shading".

From the folks who've worked directly with the hardware, and would be in a position to know first hand (and are actually willing to talk about it), most have said Xenos > RSX. Especially for vertex shader work (by quite a bit), but it seems perhaps for pixel shaders as well in some cases. Code optimized to run really well on RSX could be expected to run ok on Xenos in many instances, but code optimized for Xenos would overwhelm RSX in some areas. (Of course, none of that assumes eventually using Cell to reduce RSX work with pre-culling, etc.) Overall, for gpu shader performance, most give the edge to Xenos to varying degrees.

And a vertex shader is vector4 + scalar. Xenos' alus need to be capable of both pixel and vertex work. I would guess the scalar is a single flop. But I guess it could be either, as I've heard it both ways. Swapnil 404 (talk) 01:44, 7 March 2008 (UTC)

Just one last question, and thanks for answering all the rest: what operations in the pipeline make up a pixel shader (or what you could call a general pixel shader)? And can you also tell me if this is right, to the best of your knowledge:

RSX total programmable flops (pixel and vertex pipes) = 24 x 2 x (4 x 2) + 8 x (4 x 2) + 8 = 456 shader flops per cycle (or 464 if scalar = 2 flops)

Xenos total programmable flops = 48 x (4 x 2) + 48 = 432 shader flops per cycle (or 480 if scalar = 2 flops)

Does the RSX have 24 x 2 because its instructions are co-issued for the pixel pipelines? Thanks for all the help anyway. Gears, Gears, Gears (talk) 09:58, 7 March 2008 (UTC)
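The two formulas in the question above check out arithmetically. Here is a small sketch that evaluates them for both scalar conventions; the pipe counts (24 pixel pipes with 2 ALUs each, 8 vertex pipes, 48 Xenos ALUs) are taken from this thread as given, and the function names are only for illustration.

```python
FLOPS_PER_MADD = 2
VEC4_MADD_FLOPS = 4 * FLOPS_PER_MADD   # 8 flops for a vector4 MADD


def rsx_flops_per_cycle(scalar_flops: int) -> int:
    """RSX as described above: 24 pixel pipes x 2 ALUs, plus 8 vertex pipes."""
    pixel = 24 * 2 * VEC4_MADD_FLOPS
    vertex_vector = 8 * VEC4_MADD_FLOPS
    vertex_scalar = 8 * scalar_flops
    return pixel + vertex_vector + vertex_scalar


def xenos_flops_per_cycle(scalar_flops: int) -> int:
    """Xenos as described above: 48 unified vector4 + scalar ALUs."""
    return 48 * VEC4_MADD_FLOPS + 48 * scalar_flops


for scalar_flops in (1, 2):
    print(scalar_flops,
          rsx_flops_per_cycle(scalar_flops),
          xenos_flops_per_cycle(scalar_flops))
```

Note that these are per-cycle paper figures only; they say nothing about stalls on texture fetches, which the rest of the thread argues matter more.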

Yeah, pretty much. There are two alus tied together in each pipeline. Pixel shaders are typically vector3 + scalar (red, green, blue, alpha).
An Nvidia pixel shader alu just did 4 flops at a time (madd capable). So it could do a vector3 + scalar, a vector4, a scalar + scalar or a vector2 + vector2, depending on what needs processing.
A typical ati gpu had 2 alus per pipe, but only one was madd capable; the other was just an add, and they were just standard "vector3 + scalar". Meaning that any time vector2 instructions came up, the alu could only process one at a time, and the other flops went wasted in that cycle. (But I don't think those came up very often.)
The difference for ati was that they had separate logic for issuing texture fetches. So they don't waste any alu cycles doing that, and could hide fetch latency with flops, by just processing something else until they get what they need. Nvidia alus were far more likely to stall waiting for a fetch result.
Xenos has 48 vector4 + scalar alus, all madd, and decoupled from texture fetches and filtering. Swapnil 404 (talk) 21:47, 7 March 2008 (UTC)
I think you may have already known this, but the Xenos's scalar ALU is MADD, so it's 2 shader flops, and so the 240 gigaflops comment for the Xenos is correct. Gears, Gears, Gears (talk) 08:48, 10 March 2008 (UTC)
Just a thought: since the ops performed are scalar MADD ops, does that mean you could use the scalar ops to perform pixel shader ops over four cycles (to make one pixel shader op)? Gears, Gears, Gears (talk) 06:39, 11 March 2008 (UTC)

That could be. I've heard it described as madd, but also as an add (Watch Impress Japan, I think it was). They implied that they reorganized an ati shader, cut the add vectors in favor of an additional madd, and kept the add scalar as a special function. Something to that effect. I would go with them being madd though, until I found otherwise elsewhere, because I can't find that article. It could have simply been speculation too.

And I would assume that if they're madd capable, they could be scheduled to issue a random scalar that happens to come up, in parallel to the vector. But I have no idea how versatile they are, or if they could process something in pieces like that. My guess would be no, but I wouldn't know for sure. I know that if you had just a vector2 add instruction to be processed, the mul capability goes unused that cycle, along with the other two vector madds. It couldn't just do two separate vector2s in parallel, or two vector2 adds and two vector2 multiplies, like the flops numbers would indicate.

8 potential flops in the alu just go unused that cycle, so I don't know how far they'd go to use the scalar more efficiently. I guess it's up to the compiler to vectorize efficiently. That's why G80 went to an all-scalar gpu. Swapnil 404 (talk) 01:25, 12 March 2008 (UTC)
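To make the "8 potential flops go unused" remark concrete, here is a hedged sketch of a vector4 + scalar ALU that can issue only one vector instruction per cycle. It assumes the scalar lane is MADD capable (2 flops), which this thread treats as likely but unconfirmed; with a 1-flop scalar the peak would be 9 instead of 10. The function name and the simple model are illustrative, not a description of the real scheduler.

```python
FLOPS_PER_MADD = 2
VECTOR_LANES = 4
SCALAR_LANES = 1   # assumed MADD capable, see the caveat above

PEAK_FLOPS = (VECTOR_LANES + SCALAR_LANES) * FLOPS_PER_MADD


def unused_flops(components: int, is_madd: bool) -> int:
    """Flops left idle in one cycle when a single instruction is issued.

    `components` is the width of the instruction (2 for a vector2 op);
    an add or mul uses 1 flop per component, a madd uses 2.
    """
    if not 1 <= components <= VECTOR_LANES:
        raise ValueError("instruction must be 1 to 4 components wide")
    used = components * (FLOPS_PER_MADD if is_madd else 1)
    return PEAK_FLOPS - used


vec2_add_waste = unused_flops(2, is_madd=False)   # the case described above
vec4_madd_waste = unused_flops(4, is_madd=True)   # only the scalar lane idles

print(PEAK_FLOPS, vec2_add_waste, vec4_madd_waste)
```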

I was assuming that since the R600's pipes are similar to the Xenos's, and its pipes are vector4 + scalar MADD, and it can assign the extra scalar to do shader operations (over multiple clock cycles, around 4), it could be possible. That's why they say the R600 has 320 scalar pipelines: 64 x 5 = 320 (5 flop madds per cycle per pipe). But what you say also makes sense, so I'll have to get some more proof. Gears, Gears, Gears (talk) 02:44, 13 March 2008 (UTC)

Well, it seems R600's are all MIMD; they're not locked at vector4 + scalar. They could process 1+1+1+1+1 if needed (or any other combination). Xenos vectors are SIMD, so they're vector4 with the additional scalar at all times. Like this: http://www.behardware.com/medias/photos_news/00/20/IMG0020142.gif Xenos would be listed as "4+1" if it were listed. They're 5D, but they couldn't break it up in the same way. An R600 alu has 5 separate madd capable units, with one also capable of other tasks when needed. The scalar in Xenos probably functions as a special function unit as well, for "sin, cos, exp, log, etc." like the R600. http://www.behardware.com/medias/photos_news/00/19/IMG0019979.jpg

"One R600 calculation unit is composed of 5 math units, one of them being able to handle special tasks, and one branch unit." Swapnil 404 (talk) 08:19, 13 March 2008 (UTC)

I get it, so with the information we have, it's no for now (and probably no for sure). Gears, Gears, Gears (talk) 03:55, 15 March 2008 (UTC)

Yeah, I would assume not. It'd probably have a lot of work as it is. But it's a console, so they could code "to the metal" if they wanted, so who knows. Swapnil 404 (talk) 17:23, 18 March 2008 (UTC)

I know this doesn't really relate to the Xenos, but could you clear something up? Many people assume that because the Cell can do many fp32 ops (because of its clock speed), it could be about as good at doing shader operations as the Xenos, the RSX or any other graphics card. But it's not designed to do the calculations that graphics cards can do, so would that negate the high clock speed (compared to graphics cards), since it can only do 4 MADD fp32 ops per cycle per SPE? If you find this is not supposed to be here, please tell me to delete it. Gears, Gears, Gears (talk) 06:27, 1 April 2008 (UTC)

Well, Cell's spes can do 2 flops on up to 4 32-bit pieces of data, since they have 128-bit registers. And clock frequency would help, but there are only 7 spes, compared to a far greater number of shaders. As a straight gpu though, it's been said that Cell isn't nearly as useful as a typical gpu for things like pixel shaders, texturing, etc. But I guess it would depend on what type of floating point processing you were doing, and how being more flexible with regards to ram, branching, etc. could benefit more than just straight flops. The idea of a gpu is to offload certain graphics tasks that are easily processed on simplified processors: powerful, but only at what they were designed to process. Things like real-time lighting calculations haven't progressed as much as other gpu functions have. For example: http://gametomorrow.com/blog/index.php/2005/11/30/gpus-vs-cell/ http://gametomorrow.com/blog/index.php/2007/09/05/cell-vs-g80/ (technically, this is comparing a G80 running a general ray-tracer to Cell running its own version, which I would assume is more tuned to it)

But anyway, Cell in PS3 has other tasks to take care of. One spe is lost to redundancy, one to security/os, one taken at any time for os. A number of developers were talking about using one for sound (which seems like overkill), and a few others to pre-cull geometry before it gets passed to RSX, plus ai, collision detection, particles, animation, physics, etc. Cell varies in its efficiency with several of those, compared to other cpus. But at certain other things, it seems extremely fast. You wouldn't get "that" kind of lighting performance, but perhaps they'll have some left over once they get more accustomed to the hardware. Swapnil 404 (talk) 20:07, 11 April 2008 (UTC)
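As a rough paper comparison for the question above, the sketch below turns "2 flops on up to 4 32-bit pieces of data" into a peak figure per spe. The 3.2 GHz clock is not stated in this thread and is an assumption added here, as are the names; the Xenos figure is the 240 billion per second quoted earlier on this page (scalar counted as 2 flops). Peak flops say nothing about how suitable either chip is for shading work.

```python
CELL_CLOCK_HZ = 3_200_000_000   # assumption: PS3 Cell clock, not from this thread
LANES_PER_SPE = 4               # four 32-bit values per 128-bit register
FLOPS_PER_LANE = 2              # a madd counts as 2 flops


def cell_peak_flops(spes: int) -> int:
    """Peak single-precision flops per second across `spes` SPEs."""
    return spes * LANES_PER_SPE * FLOPS_PER_LANE * CELL_CLOCK_HZ


XENOS_PEAK_FLOPS = 480 * 500_000_000   # 480 flops per cycle at 500 MHz

seven_spes = cell_peak_flops(7)   # the 7 spes mentioned above
six_spes = cell_peak_flops(6)     # if one more is reserved for the os

print(seven_spes, six_spes, XENOS_PEAK_FLOPS)
```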

Just a thought: the Xenos could render a scene by devoting all its processing power to either vertex shading or pixel shading each clock cycle, until that part of the scene is rendered. This would leave the Xenos with a large advantage over the RSX, and the Xenos would complete the same scene much faster than the RSX would. Would you agree with this or not? Gears, Gears, Gears (talk) 10:22, 21 June 2008 (UTC)

The Xenos does not use the 65nm process yet

Evidently someone has changed it incorrectly to that specification. Although the actual change to the hardware may be coming soon enough: http://www.tgdaily.com/content/view/37376/135/

I'll just leave it as is unless someone else wants to change it for the time left that it is being produced in the 90nm process.

--J5689 (talk) 01:16, 7 July 2008 (UTC)

Memory

Stupid question: How much memory does it have?

24.222.205.103 (talk) 16:32, 11 September 2008 (UTC)

The whole system has 512 MB. This is a shared memory pool, which is a bit like on-board GPU memory on the PC but more dynamic. Markthemac (talk) 14:40, 31 May 2009 (UTC)

The image Image:R500gpu.jpg is used in this article under a claim of fair use, but it does not have an adequate explanation for why it meets the requirements for such images when used here. In particular, for each page the image is used on, it must have an explanation linking to that page which explains why it needs to be used on that page. Please check

  • That there is a non-free use rationale on the image's description page for the use in this article.
  • That this article is linked to from the image description page.

This is an automated notice by FairuseBot. For assistance on the image use policy, see Wikipedia:Media copyright questions. --02:20, 2 October 2008 (UTC)

Requested move

The following is a closed discussion of the proposal. Please do not modify it. Subsequent comments should be made in a new section on the talk page. No further edits should be made to this section.

The result of the proposal was No move. Parsecboy (talk) 14:42, 15 December 2008 (UTC)


The above discussion is preserved as an archive of the proposal. Please do not modify it. Subsequent comments should be made in a new section on this talk page. No further edits should be made to this section.

Z-rate with aa enabled.

I notice the "maximum" Z-rate when msaa is enabled is wrong. It's listed as 16 gigasamples per second, but it's actually 32 (64 samples per clock).

http://www.beyond3d.com/content/articles/4/4

"8 pixels writes per cycle, as well as having the capability to double the Z rate when there are no colour operations. However, as the ROP's have been targeted to provide 4x Multi-Sampling FSAA at no penalty this equates to a total capability of 32 colour samples or 64 Z and stencil operations per cycle."

Which would be 32 gigasamples of Z per second (64 per clock * 500 MHz).


Some people might see it as being kinda misleading, being part of the "free" aa expansion in edram as opposed to raw z fill-rate, but the article already cites the raw z-fill as 8 rops * 2xZ = 16 samples per clock * 0.5 GHz = 8 gigasamples per second. As long as it specifies 4xmsaa (which it does), it should be fine.

Also outlined by AMD/ATI on page 3 from here: http://www.ati.amd.com/developer/eg05-xenos-doggett-final.pdf Swapnil 404 (talk) 05:05, 15 September 2009 (UTC)
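The per-clock and per-second figures in this section can be laid out as a short sketch. The inputs (8 ROPs, doubled Z rate without colour operations, 4x multi-sampling, 500 MHz) come from the Beyond3D quote and the comment above; the names are illustrative only.

```python
CLOCK_HZ = 500_000_000   # 500 MHz
ROPS = 8                 # pixel writes per cycle
Z_ONLY_MULTIPLIER = 2    # double Z rate when there are no colour operations
MSAA_SAMPLES = 4         # 4x multi-sampling at no penalty


def per_second(samples_per_clock: int) -> int:
    """Convert a per-clock sample count into samples per second."""
    return samples_per_clock * CLOCK_HZ


raw_z_per_clock = ROPS * Z_ONLY_MULTIPLIER                    # no AA
msaa_colour_per_clock = ROPS * MSAA_SAMPLES                   # colour samples, 4x MSAA
msaa_z_per_clock = ROPS * Z_ONLY_MULTIPLIER * MSAA_SAMPLES    # Z/stencil, 4x MSAA

print(per_second(raw_z_per_clock),
      per_second(msaa_colour_per_clock),
      per_second(msaa_z_per_clock))
```

So the 16 gigasamples per second figure corresponds to colour samples with 4x MSAA, while the Z/stencil rate with 4x MSAA is 32 gigasamples per second.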

Hello fellow Wikipedians,

I have just modified one external link on Xenos (graphics chip). Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers. --InternetArchiveBot (Report bug) 16:38, 16 July 2016 (UTC)

Claims of R5xx/x1900/x1800 'base' chip dubious/uncited

The claim that the chip is based on an R500 series chip is uncited and dubious. There are a number of sources which indicate that the chip is actually more similar in architecture to the R600/HD 2000 series, albeit an early version that does not yet support the full DirectX 10 API and instead supports an extended version of DirectX 9.0c. Consider: it supports hardware tessellation (as does the HD 2000 series). The claim that it is based on any specific consumer GPU is particularly dubious, as it is clearly a custom part with custom dedicated fixed function units (again, the tessellation unit, but the memory architecture is also massively different). http://rastergrid.com/blog/2010/09/history-of-hardware-tessellation/

See here, which claims that the GPU is based off of the TeraScale architecture (the codename for the architecture starting with the R600/HD2000 series of GPUs): https://www.techpowerup.com/gpu-specs/ati-xenos-xenon.g424

This article also claims that the Xenos uses a unified shader architecture, which again was not present until the R600 series in mainstream desktop computers: https://www.anandtech.com/show/1719/7

The R500 series still used separate, dedicated pixel and vertex shaders. This is a very significant departure in architectural design and clearly puts the Xenos chip taxonomically far closer to the R6xx series than the R5xx line, even though it predates it. In fact, the 5-wide vector unit as part of a unified architecture, which is described in the article, is again a distinctive feature of the R600/TeraScale unified shader architecture. This sounds quite similar to the 5D 'MADD' function of the R580's vertex processors as written here: https://m.hexus.net/tech/reviews/graphics/4477-ati-radeon-x1900-xt-xtx/?page=2

But the R580's fragment processors do not function like that, showing another distinction between the two parts and architectures.

The introduction of the unified shader architecture represents a very important part in graphics architecture history, having occurred in equivalent generations for both ATi and nVidia graphics processors in the R600 and G80 series of GPUs respectively as DirectX 10 was coming around on PCs and eventually enabling GPGPU programming. As such, labeling this as though it is based off of an architecture using split pixel/vertex shaders as in the R500 series is quite misleading.

-- Preceding unsigned comment added by 86.181.181.243 (talk • contribs) 22:02, 29 June 2020 (UTC)

According to the documents presented during the patent dispute between LG and ATI, most importantly the R400 document library folder history, the sequencer specification version 2.11, and the sq_alu emulator code, it's clear that the Xenos is based on the original R400 architecture, which was way ahead of its time and never appeared in PC graphics cards. The R400 series on the PC actually had an upgraded R300, and the R500 created later was their solution for Direct3D 9.0c features on the PC. The ALU instruction set in the internal emulator and the layout of shader inputs and exports in the specification (both the graphics pipeline ones and the passthrough shader exports, aka the "memexport" functionality of the Xenos) are identical to the ones used in the Xenia Xbox 360 emulator, as well as to the registers of the Adreno 200 (AMD Z430), which is based on the Xenos. The R300, the R500 and the R600, meanwhile, have a large number of major differences, even including the names of the core parts of the pipeline stages (both the R400 documents and the Adreno headers call the shader unit SQ (Sequencer) and the ROP unit RB (Render Backend), for example, while in R5xx Acceleration the former is called US and the latter is ZB and CB; on the R6xx the RB is also called DB and CB). The R600 (TeraScale) is way newer than the Xenos (and the R500), supporting Direct3D 10, co-issuing 5 separate instructions for scalars rather than one 4-component vector and one scalar instruction, and having an integer ALU. I've corrected the information in the article based on these findings. -- Triang3l (talk) 16:49, 10 December 2021 (UTC)