channel logs for 2004 - 2010 are archived at ·· can't be searched

#osdev2 = #osdev @ Libera from 23may2021 to present

#osdev @ OPN/FreeNode from 3apr2001 to 23may2021

all other channels are on OPN/FreeNode from 2004 to present

Tuesday, 14 June 2022

02:21:00 <geist> yeah
02:21:00 <geist> also yay a pair of loud bangs in the distance and then power outage
02:22:00 <geist> time to settle in for some lack of lights
02:22:00 <gamozo> Light a candle!
02:22:00 <gamozo> Get a good mood goin
02:27:00 <geist> yep, basically shutting everything down on ups so i can turn off the generator and save gas
02:39:00 <klange> I linked this before, but, macOS TPIDRRO_EL0 fun:
02:39:00 <bslsk05> ​ kuroko/vm.h at master · kuroko-lang/kuroko · GitHub
02:53:00 <Jari--> gamozo, geist, klange morning
02:53:00 <gamozo> morn morn
02:53:00 <Jari--> TODO: PCI scan, and reserve bits, of pages, from I/O memory space.
02:54:00 <Jari--> This is easy.
02:56:00 <Jari--> Physical memory.
03:20:00 <geist> Hmm, what are you doing with that klange?
03:20:00 <geist> Is that for user or kernel space?
03:21:00 <geist> oh I guess userspace because macOS puts something in it? What are they using the tpidrs for?
03:26:00 <klange> geist: TPIDR seems to be a core index, it varies on a given thread. TPIDRRO is just the regular thread pointer that everyone else puts in TPIDR, but the TLS model is horrible and does library calls to dyld for every reference, so this is just inlining the descriptor+offset lookup instead of doing the library call
03:26:00 <geist> Yah that’s really odd. I’d expect the core index to be in the RO one
03:26:00 <geist> Kinda like the top part of ruts park
03:27:00 <geist> Rdtscp (stupid autocorrect)
03:27:00 <geist> Actually using it as a scratch as j`ey pointed out makes a lot of sense. I never thought about that, will have to consider it for stack overflow
03:27:00 <klange> It is a mystery... that could possibly be solved by looking more at Darwin kernel code, I think all of this stuff is in their source dumps now - the dyld stuff was at least.
03:28:00 <geist> But that’s only if have no other use in user space
03:28:00 <geist> Maybe they had some security reason to not let user space mess with their thread pointer
03:28:00 <geist> So they switched it so the harmless thing is in the RW one
03:29:00 <geist> You could if you took that argument from the gs: side of things and didn’t allow fsgsbase instructions on x86
03:29:00 <klange> The weird thing is __builtin_thread_pointer on clang still gives you TPIDR - presumably because it's a GCC compatibility thing no one cares to update / make target-specific.
03:29:00 <geist> And only allow values in the TP register that’s vetted by the kernel
03:53:00 <klange> Not really getting any clearer information on TPIDR_EL0. _EL1 is used for the kernel thread pointer, they seem to be trying to zero _EL0 but something's leaking somewhere or I'm missing something somewhere else in the userspace side...
03:54:00 <klange> And TPIDRRO_EL0 actually does have the CPU number maybe in the low three bits? I missed that last time I looked at this, but I see them zeroing it here:
03:54:00 <bslsk05> ​ dyld/threadLocalHelpers.s at master · apple-opensource/dyld · GitHub
03:54:00 <klange> and setting it here:
03:54:00 <bslsk05> ​ darwin-xnu/pcb.c at 2ff845c2e033bd0ff64b5b6aa6063a1f8f65aa32 · apple/darwin-xnu · GitHub
03:57:00 <Jari--> Developer Charles W. Sandmann also hoped to eventually supply code for CWSDPMI r7 that allows CWSDPMI to map up to 64 GB memory into the address space upon request.[2][3]
03:57:00 <Jari--> Interesting, DOS protected mode extender with 64 gigs of RAM support.
04:00:00 <Jari-->
04:00:00 <bslsk05> ​ DOS ain't dead - > 3 GB of RAM (hypothetical), CWSDPMI
04:01:00 <klange> Hm, I don't do the mask, but they also seem uncertain about whether it's needed, and I just did a spot check and the low bits are always clear even when I'm definitely bouncing around between cores...
04:02:00 <Jari--> 2022, I probably should not use BIOS or HIMEM.SYS to get memory map.:) PCI bus scan rules dude.
04:03:00 <klange> I accidentally exited the shell on the serial console to my RPi, but the way I set that up means it just proceeded to launch the GUI. No more uptime checks, but I have a clock that ticks, and more going on continuously.
05:00:00 <mrvn> klange: they might have retrofitted TLS onto existing code and the register everyone else uses was already used.
05:01:00 * mrvn waits for a DOS clone that runs in 64bit with 32bit userspace.
05:05:00 <geist> oh my, putting the cpu number in the bottom bits is an incredibly short-sighted thing. how many bits did they reserve?
05:07:00 <geist> ah hah, already too small:
05:07:00 <bslsk05> ​ darwin-xnu/machine_machdep.h at 8f02f2a044b9bb1ad951987ef5bab20ec9486310 · apple/darwin-xnu · GitHub
05:07:00 <geist> looks like only 3 bits
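A minimal sketch of the layout being discussed, assuming (as geist reads the linked header) that the low 3 bits of TPIDRRO_EL0 hold the CPU number and the remaining bits hold the thread pointer. The function names and the example values are hypothetical and are not taken from XNU or dyld.

```python
CPU_NUM_BITS = 3                         # assumption: the width geist read from the header
CPU_NUM_MASK = (1 << CPU_NUM_BITS) - 1   # 0b111, so at most 8 distinct CPU numbers

def split_tpidrro(value):
    """Split a raw register value into (thread_pointer, cpu_number)."""
    return value & ~CPU_NUM_MASK, value & CPU_NUM_MASK

def pack_tpidrro(thread_pointer, cpu_number):
    """Combine an 8-byte-aligned thread pointer with a CPU number."""
    if thread_pointer & CPU_NUM_MASK:
        raise ValueError("thread pointer must be 8-byte aligned")
    if not 0 <= cpu_number <= CPU_NUM_MASK:
        raise ValueError("cpu number does not fit in the reserved bits")
    return thread_pointer | cpu_number

if __name__ == "__main__":
    raw = pack_tpidrro(0x16F000, 5)
    print(hex(raw), split_tpidrro(raw))
```

With only 3 bits, a CPU number of 8 or higher cannot be packed at all, which is the "already too small" problem.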
05:11:00 <zid> 3 bits of cpus should be enough
05:15:00 <geist> i remember beos had some sort of similar limitation. something like there's a global constant B_MAX_CPUS set to i think 4
05:15:00 <geist> and a few syscalls that just return B_MAX_CPUS number of items
06:23:00 <doug16k> 512 CPUs ought to be enough for anyone
06:25:00 <\Test_User> "ought to be enough for anyone" and so the hindrances trying to expand later when you want more begin
06:26:00 <doug16k> if 16 cpus is a reasonable high end cpu now, then we should have 512 by 2040 or so?
06:27:00 <Jari--> doug16k: Moores law indicates calculating power will be doubled every 18 months - so how does this imply to cores? I think thats more than 512 cores by 2040 in this sense?
06:28:00 <\Test_User> matters less when it happens so much as that it happens :P
06:28:00 <doug16k> the other way around. moores law only says there will be more logic, not faster
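The arithmetic behind the two estimates can be worked through as a rough sketch. It assumes core counts simply double at a fixed interval, which is a simplification: as doug16k notes, Moore's law is about the amount of logic, not core count or speed. The start year 2022 is the date of this log; the function names are illustrative.

```python
import math

def doublings_needed(start_cores, target_cores):
    """Number of doublings to get from start_cores to target_cores."""
    return math.log2(target_cores / start_cores)

def cores_after(start_cores, years, years_per_doubling):
    """Core count after `years` if it doubles every `years_per_doubling`."""
    return start_cores * 2 ** (years / years_per_doubling)

if __name__ == "__main__":
    n = doublings_needed(16, 512)
    print("doublings from 16 to 512:", n)
    print("implied years per doubling, 2022 to 2040:", (2040 - 2022) / n)
    print("cores in 2040 at an 18-month doubling:", cores_after(16, 18, 1.5))
```

So 512 cores by 2040 corresponds to a doubling roughly every 3.6 years, while a strict 18-month doubling would give far more than 512.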
06:29:00 <mats1> fat lot of good that does when nontrivial percentages of it are dedicated to windows
06:30:00 <Jari--> doug16k: Really Wondering how long it takes as this processor is Mobile Ready ? : Xeon Platinum, Total Cores 28. Total Threads 56. Max Turbo Frequency 3.80 GHz. Processor Base Frequency 2.50 GHz. Cache 38.5 MB L3 Cache. Max # of UPI Links 3. TDP 205 W.
06:30:00 <doug16k> that has a monstrous amount of I/O bandwidth
06:30:00 <doug16k> are you comparing that to a low power embedded cpu?
06:31:00 <doug16k> the compute of the mobile is fine, it's the I/O that is starved
06:31:00 <mats1> more bandwidth for windows
06:32:00 <mats1> amazing
06:32:00 <doug16k> I wonder what curve PCB manufacturing is following
06:33:00 <doug16k> they are not going to be able to go that much past 2000 pins or whatever
06:33:00 <doug16k> imagine routing the pcb under a 2000 pin cpu?
06:33:00 <doug16k> with over 100 amps going into it
06:33:00 <doug16k> and coming out the grounds
06:35:00 <geist> yah those boards have to be using 12-14 layer boards at least
06:35:00 <doug16k> imagine trying to minimize ground bounce with 120 amp peaks on a couple of mm of copper traces, planes desperately via'd together to get the current in
06:36:00 <geist> side note the new zen 4 AM5 socket is a bit strange if you haven't looked at it
06:37:00 <geist> it's just a single grid, no hole in the middle for any caps or whatnot
06:37:00 <doug16k> haven't seen it in detail
06:37:00 <geist> so they moved those to the top of the cpu, but outside of the heat sink, so it has these cuts around the edge
06:37:00 <clever> something ive been wondering about, is how that cpu "package" is made/assembled
06:37:00 <clever> is it just a regular fiberglass pcb, with a raw die wire-bonded or flip-chipp'd onto it?
06:37:00 <zid> Good news, it's 400 half amp supplies instead *stares in 1200 pins*
06:38:00 <Jari-->
06:38:00 <bslsk05> ​'Grand Theft Auto V PC - Ultra Settings 1080p60. Intel Xeon + GTX 970' by ModCollection (00:02:31)
06:38:00 <geist> etc
06:38:00 <bslsk05> ​ AMD Ryzen 9 7950X With 16 Zen 4 Cores Shows Up In AM5 'LGA 1718' Desktop CPU Installation Video Guide
06:38:00 <geist> a strange looking cpu
06:38:00 <zid> yea saw der bauer take a cool at it
06:39:00 <zid> It is really weird
06:39:00 <zid> s/cool/look
06:39:00 <clever>
06:39:00 <zid> I really want a zen4
06:39:00 <geist> but it guess it makes sense, the caps you usually see there (presumably bypass caps?) are typically on the bottom of the cpu in the gaps
06:39:00 <clever> photos like this, make it look like the raw die is BGA mounted to a regular pcb?
06:39:00 <clever> and then some epoxy underfill helps to bond it in place
06:41:00 <geist> unrelated: I saw Everything Everywhere all at Once yesterday, and i can't recommend this movie enough
06:41:00 <geist> it's astonishingly good
06:41:00 <zid> okay but what is it called
06:41:00 <geist> yes
06:41:00 <zid> no yes is on first
06:42:00 <geist> no thats Yes
06:42:00 <geist> Roundabout over there
06:42:00 <doug16k> I wonder what the threadripper package is like
06:43:00 <doug16k> giant lga?
06:43:00 <geist>
06:43:00 <bslsk05> ​ Socket SP3 - Wikipedia
06:43:00 <geist> dunno if they've announced the new one yet
06:43:00 <doug16k> zen4 though yeah
06:43:00 <doug16k> could be that same socket?
06:44:00 <geist> i doubt it. probably will get some new one as well. TR5 or whatnot
06:44:00 <doug16k> I guess it already is LGA so that would make sense
06:44:00 <doug16k> yeah DDR5 I guess
06:44:00 <doug16k> different fiddly impedance stuff or something
06:44:00 <geist> haven't looked into it, but DDR5 may physically require more traces
06:46:00 <geist> looks like they're going to skip a socket number and SP3 -> SP5, presumably to sync up with AM5
06:46:00 <geist> TR4 ->? TR5?
06:53:00 <doug16k> they would get it all lined up
06:53:00 <doug16k> except first digit of 4 digit desktop cpu models
06:53:00 <geist> that of course assumes they'll continue the split SPn/TRn socket thing
06:57:00 <clever> something ive thought about, who says the cpu power has to come in over the same interface as the data?
06:57:00 <clever> could you have a big honking molex power port on the top of the cpu?
06:57:00 <clever> and just cut the heatsink around it
06:57:00 <doug16k> you need that high performance VRM though
06:57:00 <zid> it has voltage domains
06:58:00 <zid> multiple phases
06:58:00 <zid> etc
06:58:00 <clever> it could be a different connector, with all of the right voltages, and properly sized pins
06:58:00 <zid> still better to have 400 half amp pins
06:58:00 <clever> rather then trying to force it thru 400 undersized pins
06:58:00 <zid> unless you want crackling and warping and stuff
06:59:00 <zid> arcing is *really* healthy for electronics
06:59:00 <geist> now you're playing with power!
06:59:00 <clever> another idea ive had before, what if you put fiber transceivers directly on the cpu package?
06:59:00 <clever> what if you replaced all of the data with fiber ports?
07:00:00 <clever> as-in, the cpu socket has a bunch of light pipes, connecting the cpu's fiber ports, to the motherboard's fiber ports, or even potentially directly into a fiber off to a new pci-e type thing
07:01:00 <clever> and then just beef up the size of the remaining power only pins
07:01:00 <doug16k> how much does that motherboard cost to manufacture?
07:01:00 <clever> good question
07:02:00 <clever> part of the thinking there, is that the motherboard can either convert the optical data back into standard pci-e
07:02:00 <doug16k> the ones we have now are cheap, but manufactured with such precision, it puts the cost in
07:02:00 <clever> or the board can just have a fiber routing the optical data right to an optical pcie port
07:02:00 <geist> you could just drive the light over to the storage crystals
07:02:00 <clever> so you dont have to deal with differential traces and routing anymore, just run a strand of fiber
07:03:00 <geist> your memory becomes the healing crystals that glow in the forest when you go search when both suns are down
07:03:00 <clever> i'm not that crazy :P
07:03:00 <geist> if you install windows 12 on them they become kyber crystals
07:05:00 <doug16k> if you could figure out how to use light for all data transfer, you would save power
07:06:00 <doug16k> right now, mosfet gates are essentially capacitors that we fill up and drain with huge pulses of current that are largely losses
07:07:00 <doug16k> every time something switches, there is a pulse of loss
07:07:00 <doug16k> extremely brief though
07:08:00 <doug16k> neat thing though, holding it on or off is almost free
07:09:00 <geist> he talks a bit about ryzen 7000 delidding
07:09:00 <doug16k> it charges up to whatever exactly cancels out your signal and no current flows at all
07:09:00 <bslsk05> ​'HW News - Intel Ships 1 GPU, Delidded Ryzen 7000 CPU, Apple M1 Vulnerability' by Gamers Nexus (00:26:06)
07:09:00 <geist> so you can see what's under it
07:12:00 <clever> doug16k: but which is costing more power, a gate inside the cpu switching a net that remains within the die, or a gate driving an external bus over the motherboard and into a pcie card?
07:13:00 <clever> i feel like the longer trace will have more capacitance, that you have to (dis)charge every time it changes state
07:13:00 <doug16k> yes, you would get parasitic capacitance on the traces, from that to the ground plane and adjacent traces
07:14:00 <doug16k> if the adjacent trace has the opposite value
07:14:00 <doug16k> and with ground if not low
07:15:00 <clever> and probably also against vcc if low
07:15:00 <clever> or even other voltage rails, if high but not that high
07:15:00 <clever> any time a voltage difference exists between 2 parallel-ish bits of copper?
07:15:00 <doug16k> yeah. driving them to a different value will cause that capacitance to charge
07:16:00 <doug16k> if they were the same, then there is no difference to charge it up
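The switching-loss argument can be put into numbers with the usual dynamic power estimate P = activity * C * V^2 * f. The sketch below compares a short on-die net with a long board trace. The capacitance, voltage and frequency values are illustrative assumptions, not measurements of any real CPU or motherboard.

```python
def switching_power(capacitance_f, voltage_v, freq_hz, activity=0.5):
    """Dynamic power in watts: activity * C * V^2 * f.

    `activity` is the fraction of clock cycles on which the node changes
    state; a node that is held high or low (activity 0) costs nothing here.
    """
    return activity * capacitance_f * voltage_v ** 2 * freq_hz

if __name__ == "__main__":
    on_die_net = switching_power(10e-15, 1.0, 1e9)   # assumed 10 fF net
    board_trace = switching_power(5e-12, 1.0, 1e9)   # assumed 5 pF trace
    print("on-die net :", on_die_net, "W")
    print("board trace:", board_trace, "W")
    print("ratio      :", board_trace / on_die_net)
```

Under these assumptions the external trace costs a few hundred times more per signal, and the power scales directly with how often the line toggles.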
07:16:00 <clever> yep
07:18:00 <clever> off the top of my head, isnt a modern cpu only really connecting to 3 things? 1: power, 2: every dram module, 3: pci-e lanes out the wazoo?
07:18:00 <doug16k> a whole bunch of stuff to communicate with VRM
07:19:00 <doug16k> the cpu participates in the feedback loop very much
07:19:00 <clever> ah right, power wont just be dumb voltage rails, and talking to it over pci-e is probably too much overhead?
07:19:00 <clever> need a dedicated comms channel for that feedback loop
07:19:00 <doug16k> needs to be super low latency yeah
07:21:00 <doug16k> you are right though, it is a huge number of power pins, same number of grounds, and tons of memory interface and pcie lanes, plus a bunch of SoC stuff like USB and audio
07:21:00 <clever> and something with an integrated gpu, would also have hdmi lanes over the cpu socket
07:22:00 <clever> or displayport
07:22:00 <doug16k> I'm not sure how they do the display output in detail
07:23:00 <clever> yeah, you could just have a dumb 2d only gpu on the motherboard, and a 3d core in the cpu
07:23:00 <geist> yah i think AMD in particular has N differential lanes that can be dynamically configured for PCI or SATA or USB or whatnot
07:23:00 <clever> some laptops kinda do similar, with both a 2d and a 3d gpu, and the ability to just cut power to the 3d gpu
07:23:00 <geist> but indeed, some of the pins on AM4 (and probably AM5) are video lanes
07:24:00 <clever> ive also heard of one cpu, that had 128 differential lanes, that can be configured as either 128 pcie lanes
07:24:00 <clever> or 64 pcie lanes, and 64 cpu<->cpu interconnect lanes
07:24:00 <clever> and the 2nd cpu would give you the other 64 pcie lanes
07:24:00 <clever> so it becomes numa
07:24:00 <geist> yep
07:25:00 <clever> but what if you just replaced that same idea, with 256 fiber transceivers?
07:25:00 <clever> and you just fiber the ram modules directly into the cpu
07:25:00 <clever> and fiber in the pci-e slots
07:25:00 <doug16k> one fibre can carry lots of different signals at once
07:25:00 <clever> now the motherboard is basically just doing VRM and decoding to slower electrical things
07:26:00 <clever> yeah, so you could get away with far fewer channels, say 1 fiber per ram module, 1 fiber per pcie slot
07:26:00 <doug16k> that would probably give you some feasibility
07:27:00 <clever> and the motherboard can either convert that fiber into the old pci-e 16x, or keep it as fiber for a new socket type
07:28:00 <clever> you could potentially even have the connections in new/weird places on the cpu
07:28:00 <clever> what if the transceivers are on the top of the cpu package, in a ring around the IHS?
07:28:00 <clever> and the clamp that holds the cpu down, includes the fiber couplers
07:46:00 <doug16k> I have been wondering if transparent system memory encryption has any negative effect on ram longevity
07:46:00 <doug16k> doesn't it cause it to be a 50% chance that the adjacent bit is a different value?
07:46:00 <geist> oh how you figure?
07:47:00 <geist> sure
07:47:00 <doug16k> so you maximize the chance of leakover happening
07:47:00 <geist> or i guess any given bit will flip to the opposite state more often than perhaps before
07:47:00 <Mutabah> It would probably help with predicting the lifetime
07:47:00 <doug16k> yeah, trigger the ecc
07:49:00 <doug16k> it also causes approximately half the data lines to be 1 and half to be 0, no matter what the values
07:49:00 <doug16k> does that increase data integrity due to it being less electrically bouncy?
07:52:00 <doug16k> it probably increases power losses a bit too. it causes it to be a 50% chance that each data line changes at each clock edge
07:52:00 <doug16k> where it might have got away with runs of 1 or 0 with low loss
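A small experiment illustrates the 50% claim. It is a sketch only: a SHA-256 based keystream stands in for real memory encryption, which uses a block cipher with address-based tweaks, and the key and function names are made up for the example. It compares how often adjacent bits differ in a zeroed page before and after scrambling.

```python
import hashlib

def keystream(n, key=b"demo key"):
    """Pseudo-random bytes from SHA-256 in counter mode (stand-in cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "little")).digest()
        counter += 1
    return bytes(out[:n])

def adjacent_bit_diff_rate(data):
    """Fraction of neighbouring bit pairs that hold different values."""
    bits = [(byte >> i) & 1 for byte in data for i in range(8)]
    diffs = sum(a != b for a, b in zip(bits, bits[1:]))
    return diffs / (len(bits) - 1)

if __name__ == "__main__":
    plain = bytes(4096)  # a page of zeros: long runs, no transitions
    scrambled = bytes(p ^ k for p, k in zip(plain, keystream(len(plain))))
    print("plain    :", adjacent_bit_diff_rate(plain))
    print("scrambled:", adjacent_bit_diff_rate(scrambled))
```

The zeroed page has no differing neighbours at all, while the scrambled page lands close to one half, regardless of what the plaintext was.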
07:52:00 <clever> hdmi scrambling also does something related, i think for emi reasons
07:53:00 <clever> there is an 8:10 encoding on the wire, so an 8bit value (the raw color) gets converted into a 10 bit symbol
07:53:00 <clever> that offers both ecc, and also some emi reduction, the symbols are chosen to have a low edge count
07:54:00 <clever> but, during the blanking intervals, it uses a different 2:10 encoding, with 4 specially chosen 10bit symbols, that instead have a very high edge density
07:54:00 <clever> so the receiver can calibrate its phase offset for sampling
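The edge-count difference can be checked with a sketch of the first (transition-minimising) stage of TMDS encoding, as I understand it from the DVI specification. The second, DC-balancing stage that produces the tenth bit is left out, and the four control symbols are written from memory of that specification, so treat both as assumptions to verify.

```python
def transitions(bits):
    """Count the 0->1 and 1->0 edges in a sequence of bits."""
    return sum(a != b for a, b in zip(bits, bits[1:]))

def tmds_stage1(byte):
    """Transition-minimising stage: 8 data bits in, 9 bits out (LSB first)."""
    d = [(byte >> i) & 1 for i in range(8)]
    ones = sum(d)
    # Pick XNOR when the byte has many ones, XOR otherwise.
    use_xnor = ones > 4 or (ones == 4 and d[0] == 0)
    q = [d[0]]
    for i in range(1, 8):
        x = q[i - 1] ^ d[i]
        q.append(x ^ 1 if use_xnor else x)
    q.append(0 if use_xnor else 1)  # bit 8 records which operation was used
    return q

# The four blanking-interval control symbols (assumed bit patterns).
CONTROL_SYMBOLS = ["1101010100", "0010101011", "0101010100", "1010101011"]

if __name__ == "__main__":
    worst = max(transitions(tmds_stage1(b)[:8]) for b in range(256))
    print("most edges in any encoded data byte:", worst)
    for sym in CONTROL_SYMBOLS:
        print(sym, "edges:", transitions(sym))
```

Data bytes come out with only a few edges each, while every control symbol has at least seven, which is what gives the receiver plenty of edges to align its sampling phase.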
07:56:00 <mrvn> clever: can you trace fibre optics on a PCB?
07:57:00 <clever> mrvn: first thought that comes to mind, cnc out a channel on an interior layer, and lay some fiber in there
07:57:00 <clever> but that means an entirely new way of fabbing a multi-layer board
07:57:00 <clever> a far simpler thing, is to just route some fibers on the back side, and glue them down
07:59:00 <clever> from memory, a 4 layer pcb, is just a pair of 2layer (double-sided) pcb's, with a copperless fiberglass separator, all 3 parts glued into a stack
07:59:00 <clever> but what if you cnc'd some slots into that spacer layer, and ran some fibers?
07:59:00 <doug16k> what about blind vias and other madness they have to deal with on motherboards
08:00:00 <clever> thats just drilling holes in the layers before you glue them together
08:00:00 <mrvn> clever: how thin can you make fibres? And what about crossing them?
08:01:00 <clever> for crossing, you could maybe come up with a kind of fiber via?
08:01:00 <mrvn> can you etch traces into the board and then fill them with something that becomes fibre optics?
08:01:00 <clever> where the fiber terminates with a 90 degree prism, and just shoots up/down
08:01:00 <mrvn> clever: can't do a 90° turn with fibre.
08:01:00 <clever> and the next pcb layer has a matching prism to catch it
08:01:00 <clever> thats why you have a prism on the end, that reflects the light
08:01:00 <mrvn> have fun placing those prisms.
08:01:00 <clever> the prism would be bonded to a pre-cut length of fiber
08:02:00 <clever> and yeah, you would have some loss at every one of those junctions
08:02:00 <mrvn> Does the fibre expand at the same rate as the board as it heats up?
08:02:00 <clever> and then a total budget of allowed loss over the whole link
08:03:00 <mrvn> Does fibre optics even get the kind of throughput your hugely parallel memory bus has?
08:03:00 <mrvn> Or were you thinking of having 256 parallel fibre lines?
08:04:00 <clever> what kind of bandwidth would you commonly get from todays ram?
08:04:00 <mrvn> 50GBit/s?
08:04:00 <clever> > Fiber optic Ethernet can typically achieve speeds up to or greater than 100 Gbps.
08:05:00 <clever> from a random hit in google
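For a rough comparison, the peak bandwidth of one DRAM channel is the transfer rate times the bus width. The sketch below uses DDR4-3200 with a 64-bit channel as an example configuration; nobody in the channel named a specific memory type, so that choice is an assumption, as is the 100 Gbit/s per-fiber figure taken from the quote above.

```python
import math

def dram_channel_gbit_per_s(mega_transfers_per_s, bus_width_bits=64):
    """Peak bandwidth of one DRAM channel in Gbit/s."""
    return mega_transfers_per_s * 1e6 * bus_width_bits / 1e9

def links_needed(bandwidth_gbit_per_s, link_gbit_per_s=100):
    """How many serial links of a given rate cover that bandwidth."""
    return math.ceil(bandwidth_gbit_per_s / link_gbit_per_s)

if __name__ == "__main__":
    one_channel = dram_channel_gbit_per_s(3200)   # DDR4-3200, 64-bit channel
    print("one channel :", one_channel, "Gbit/s")
    print("dual channel:", 2 * one_channel, "Gbit/s")
    print("100G links for one channel:", links_needed(one_channel))
```

Under these assumptions a single channel already exceeds the 50 Gbit/s guess by about four times, so matching it would take more than one 100 Gbit/s link or a faster one.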
08:05:00 <mrvn> that's with big transmitters and receivers.
08:05:00 <doug16k> you'd have to make sure the transceivers don't use more power than the copper
08:05:00 <clever> yeah, so you would need to find ones that are small enough to fit on a cpu module
08:05:00 <clever> one min
08:06:00 <mrvn> I doubt they would net you an energy save.
08:06:00 <clever>
08:06:00 <bslsk05> ​'Finally Revealing my BIG SECRET - Corning Optical Thunderbolt 3' by Linus Tech Tips (00:14:19)
08:06:00 <clever> skip back to 4:09
08:07:00 <zid> and they have to not fail at 100C
08:07:00 <doug16k> if you made an optical processor, and it was only light, then it would make sense
08:07:00 <clever> 3:48 shows the emitters used
08:08:00 <clever> and i think this specific cable, is doing pcie over thunderbolt
08:09:00 <doug16k> the reason you use optical for that application is noise immunity
08:09:00 <clever> and distance
08:09:00 <mrvn> things to do with your fibre optic cables.
08:09:00 <clever> he is crazy, and putting every computer in his new house, in a single server rack
08:10:00 <clever> including things like the xbox
08:10:00 <clever> and then fiber hdmi'ing them to every room, lol
08:10:00 <clever> so the same computer, has monitors in multiple rooms
08:10:00 <clever> and all of the heat/noise is contained in one place
08:11:00 <clever> doug16k: i'm thinking less avoid noise, and more about reducing the pin-count
08:11:00 <clever> what if you just entirely did away with the traditional cpu socket?
08:12:00 <clever> what if the cpu was a brick in a hdd bay, with a bunch of fibers coming out, and a VRM module stuck onto the side of it?
08:12:00 <zid> speed of light
08:12:00 <zid> there's a reason my RAM hits my cooler
08:12:00 <zid> it's not because they're too lazy to move it farther away
08:13:00 <clever> yeah, that could add latency
08:13:00 <doug16k> the closer it is, the easier it is to get it working too
08:14:00 <doug16k> because they are extremely pushing their luck on the data rate
08:14:00 <doug16k> imagine how high the frequencies are in the edges
08:15:00 <clever> well, you have 2 separate things there, data rate, and latency
08:15:00 <doug16k> somewhat more than the ram frequency, into the GHz
08:15:00 <clever> nothing says you cant have both high latency and high data rates
08:15:00 <zid> I mean, dram is already high latency
08:15:00 <clever> exactly
08:16:00 <zid> making it worse doesn't sound fun
08:16:00 <clever> but i think thats more about the dram module itself, taking a few clock cycles to come up with a response
08:16:00 <clever> rather then time of flight for the messages
08:16:00 <zid> well you're talking about transceiving it twice, that's going to add some nanos
08:16:00 <zid> plus some distance, that'll add 1 or 2 more
08:16:00 <clever> yeah
08:17:00 <zid> and it's already 'slow' at 10-20ns, I wouldn't wanna turn it into 30 or 40
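The distance part of that estimate can be sanity-checked with a short calculation: light in glass fiber travels at c divided by the refractive index. The index of 1.47 is a typical value for silica fiber and the 30 cm length is an arbitrary example, so both are assumptions. Transceiver conversion latency is not modelled.

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum

def one_way_delay_ns(length_m, refractive_index=1.47):
    """Propagation delay through a fiber of the given length, in ns."""
    return length_m * refractive_index / C_M_PER_S * 1e9

if __name__ == "__main__":
    one_way = one_way_delay_ns(0.3)
    print("30 cm one way   :", round(one_way, 2), "ns")
    print("30 cm round trip:", round(2 * one_way, 2), "ns")
```

That is roughly 5 ns per metre, so a few tens of centimetres adds on the order of 1 to 3 ns per round trip, in line with zid's "1 or 2 more".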
08:17:00 <clever> moar cache!
08:17:00 <clever> hide that latency!
08:17:00 <zid> so we just pair it with the 768MB ryzen
08:17:00 <mrvn> what's the ARM doc that deals with memory barriers and concurrency issues called again?
08:17:00 <kazinsal> as someone whose project suffers from high latency on DMA, please, don't hide it with cache :(
08:18:00 <clever> kazinsal: you also have the problem of dma being coherent or non-coherent!
08:18:00 <zid> Your DMA is going to time travel, clever
08:18:00 <mrvn> kazinsal: but without caches we won't get side channel attacks
08:18:00 <clever> if a pcie device is doing dma, and the pcie hub is in the cpu, then the pcie can read directly from the caches
08:19:00 <clever> so the caches help the dma
08:19:00 <doug16k> how hard is it to disable all the caches in linux?
08:19:00 <doug16k> it would be funny to make every instruction serializing and see what happens
08:19:00 <clever> doug16k: i have ran the rpi with the L2 cache disabled by accident before
08:19:00 <clever> it was noticeably slower
08:20:00 <mrvn> doug16k: on the RPI when I clear the screen without caches you can see it progressing line by line.
08:20:00 <doug16k> I have done it before on earlier processors where it wasn't that big of a change
08:20:00 <clever> mrvn: i assume your not doing write combining?
08:20:00 <mrvn> clever: that would leave artefacts in the framebuffer
08:20:00 <mrvn> (for a time)
08:20:00 <clever> which model of pi?
08:21:00 <mrvn> doesn't really matter. With the caches turned off they are all dead slow.
08:21:00 <mrvn> No instruction caches either
08:21:00 <clever> the axi port size differs by model
08:21:00 <doug16k> yeah, it's always awful. modern fancy out of order ones have really long pipelines, so it's worse
08:21:00 <clever> the bcm2835 has a 32bit axi port coming out of the arm, so it can only ever move 32bits per clock
08:21:00 <clever> 2836/pi2 and up, have a 64bit port, so it can potentially move 64 bits per clock
08:22:00 <clever> and then you have axi burst stuff, which i think is covered by write combining
08:23:00 <clever> and you then need a cache-flush thing, to prevent the framebuffer artifacting
08:24:00 <clever> mrvn: vs
08:25:00 <clever> you can also use the 2d sprite hw to perform much faster updates, that are always locked to vsync
08:25:00 <mrvn> clever: and none of that matters with the cpu running without any caches. It's dead slow.
08:25:00 <clever> yeah
08:26:00 <clever> the sprite hw would just let you hide that visually
08:26:00 <mrvn> I don't think it even schedules more than a single opcode per cycle without icaches.
08:26:00 <clever> or avoid needing to draw at all
08:29:00 <mrvn> there is no optimization better than removing code. :)
08:29:00 <clever> yep
08:29:00 <doug16k> does it dare fetch the next instruction before this instruction completes, when the instruction memory is uncacheable? it should expect the next instruction to change out from under it if part of being uncacheable is treating it as volatile
08:30:00 <mrvn> uncached or uncachable?
08:30:00 <mrvn> I don't think the latter is supported
08:30:00 <doug16k> in general not being cacheable means not prefetchable
08:31:00 <doug16k> depends on how the implementation treats it
08:32:00 <clever> on arm, you dont have to configure on a per-page basis, what can be i-cached, because its assumed that anything your executing is going to be code, and all code should be cached equally
08:32:00 <clever> but d-cache needs per-page controls, for mmio to not get cached
08:34:00 <mrvn> and if you try to execute the MMIO registers bad things will probably happen.
08:34:00 <clever> thats how a lot of credit-warp exploits work in older consoles
08:35:00 <clever> in one case, its using the button state register as an opcode!
08:35:00 <mrvn> doug16k: you could check the ARM specs to see if it does any pipelining with the mmu/caches disabled.
08:35:00 <clever> so the opcode it runs, is directly linked to what buttons your holding at that instant
08:35:00 <mrvn> clever: talk about frame perfect input :)
08:35:00 <doug16k> mrvn, yeah. I half expect them to recklessly prefetch regardless
08:36:00 <doug16k> which makes sense
08:36:00 <clever> mrvn: well, it only reads once at a known delay, so you can just hold the buttons before that point in time
08:55:00 <doug16k> have you seen that memory corruption glitch where you can complete SNES mario by going down a pipe that shouldn't end the game?
08:56:00 <doug16k> you have to do a sequence of moves to fill memory with particular values at particular frames
08:56:00 <clever> yeah
08:56:00 <clever> part of that is that you have a jmp into the sprite xy configs
08:57:00 <clever> so you need to set the coords of sprites correctly
08:59:00 <doug16k> if you make a game, leave all the bugs in that don't crash it. apparently everyone loves it when games glitch
09:00:00 <clever> that reminds me, one developer hooked all of the cpu exception vectors, and routed them to a "secret level selection screen"
09:00:00 <clever> because if the game crashed during review, it had to start the review all over again
09:00:00 <clever> but if it randomly opens a secret level selector, thats not a fail :P
09:00:00 <dminuoso> Redefining error behavior, nice!
09:01:00 <clever> and then decades later, people discovered that if you whack the console hard enough, you can open that menu
09:12:00 <dminuoso> Curious, how would sudden physical acceleration induce cpu exceptions?
09:12:00 <doug16k> capacitors microphoning for one, ringing
09:13:00 <clever> doug16k: in this case, it was whacked hard enough for the cartridge to come loose
09:13:00 <doug16k> physical shock also seriously disrupts crystals
09:14:00 <clever>
09:14:00 <bslsk05> ​'Why does PUNCHING Sonic 3D trigger a Secret Level Select?' by GameHut (00:03:02)
09:15:00 <doug16k> yeah connector bounce would happen
09:25:00 <doug16k> I was thinking more along the lines of the balding guy that smashed his keyboard 3x with his fist, then hit the monitor with the keyboard
09:27:00 <doug16k> I had a friend run over and rip out Street Fighter II and cave in one side from smashing it on his knee. worked fine
09:28:00 <doug16k> the whole back of the cartridge was gone, just board
09:28:00 <doug16k> near the middle
09:30:00 <doug16k> makes me wonder what you have to do to it to break a SNES and/or one of its games
09:30:00 <kingoffrance> "have you seen that memory corruption glitch where you can " theres something like that for zelda 3 snes too IIRC...beat it in 15 minutes or something?
09:31:00 <Mutabah> Nothing can beat the shenanigans people have done with Pokemon Yellow in a virtualboy
09:32:00 <Mutabah> (Or whatever that SNES GB cartridge adapter was)
09:32:00 <kingoffrance> super game boy
09:32:00 <kingoffrance> it added color :D
09:32:00 <kingoffrance> before the color game boy existed
09:32:00 <kazinsal> an acquaintance of mine is writing a new cycle-accurate game boy family emulator and they're running into so many bizarre things that some games do
09:32:00 <doug16k> when I imagine hitting a SNES with a sledgehammer, it damages it pretty bad, but also hilariously just pings up in the air and the sledgehammer bounces off it pretty nicely
09:33:00 <kazinsal> 80s/90s game dev was way more of a wild west than a lot of people realize
09:33:00 <Mutabah> A mix of pure magic/engineering... and "it runs, ship it"
09:33:00 <bradd> some early version of final fantasy didn't re-seed the rng if you died. So you had to kill in a different order and hope that worked. else you'd never get past the fight
09:34:00 <Mutabah> I once did some RE/decompilation work on Gen3 Pokemon, a whole lot of places where a BL instruction was used where a B was intended
09:35:00 <doug16k> how mad would you be on a processor where the only unconditional branch always sets ra
09:36:00 <doug16k> if you want to reduce the instruction set, get rid of the silly no-ra branch, right? hehe
09:36:00 <clever> bradd:
09:36:00 <bslsk05> ​'How We Solved the Worst Minigame in Zelda's History' by Linkus7 (00:24:32)
09:37:00 <clever> bradd: basically, any loaded entity can call the rng function at any time, potentially multiple times per frame
09:37:00 <doug16k> I am playing with a toy core in verilog where every branch sets ra
09:37:00 <clever> and these crazy guys found a way to reverse the rng function, and discover its current internal seed, based on the random numbers it was outputting, and the total time the game was running
09:37:00 <doug16k> because opcode space is pretty jam packed and I don't want a branch off at some weird value
09:37:00 <clever> and then used that to predict the next rng
09:38:00 <Mutabah> Pure stats, pretty darn cool
09:38:00 <clever> Mutabah: with the complication, that the total number of rng calls can vary, based on how long youve been playing
09:39:00 <clever> and even what direction the camera was facing
09:39:00 <Mutabah> Yep.
09:39:00 <Mutabah> Working from rough memory of that video (saw it not long after it was released)
09:39:00 <clever> same
09:39:00 <doug16k> You can just record the whole sequence and know the next one trivially
09:40:00 <Mutabah> They figured out expected spread of the RNG at a given elapsed time, combined that with some observations, and used that to cut down the probabilities
09:40:00 <clever> with the added complication that you're not actually getting numbers out of the rng code
09:41:00 <clever> the rng algo used, will repeat after 7 trillion outputs
09:42:00 <doug16k> in zelda
09:49:00 <doug16k> years ago on msvc toolchain, I made an array of 2^32 bools and called rand() 2^32 times and set the bit from the return value, and it returned every possible number and repeated exactly
09:49:00 <clever> neat
09:50:00 <clever> the light-house 2 protocol used in vr tracking, doesnt do that, by design
09:50:00 <clever> there are 7? channels, each with 2 seeds
09:50:00 <doug16k> it's one of these
09:50:00 <bslsk05> ​ Linear congruential generator - Wikipedia
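A minimal sketch of the full-period check doug16k describes, scaled down to a 16-bit state so it finishes instantly. The multiplier 214013 and increment 2531011 are the constants commonly cited for MSVC's rand(); the helper name `lcg_period` and the reduced modulus are assumptions made for illustration, not the original experiment.

```python
def lcg_period(a, c, m, seed=0):
    """Count steps until the LCG x -> (a*x + c) % m returns to its seed."""
    x = (a * seed + c) % m
    steps = 1
    # Stop after m + 1 steps so a generator that never returns still terminates.
    while x != seed and steps <= m:
        x = (a * x + c) % m
        steps += 1
    return steps


M = 1 << 16  # scaled-down modulus (assumption: the real state is 32 bits)

# c is odd and a - 1 is a multiple of 4, so every state is visited once.
full = lcg_period(214013, 2531011, M)

# A purely multiplicative generator (c = 0) covers only part of the space.
short = lcg_period(5, 0, M, seed=1)

print(full, short)
```

The second call shows the flip side mentioned later in the log: with poorly chosen constants the same recurrence repeats well before it has covered the whole state space.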
09:50:00 <clever> so you have 14 unique prng streams, that each repeat, but dont cover every possible value
09:51:00 <clever> and its not a list of 32bit ints, but a bit sequence
09:51:00 <clever> if you have 17? bits in a row, you can determine which seed you're on, and your position within the stream
09:52:00 <clever> the channel# is used to separate each lighthouse
09:52:00 <clever> while the 2 seeds, are used for a low rate comms channel, sending either seed-A or seed-B for a single sweep, giving you 1 bit per sweep
09:53:00 <clever> so the position in the stream, gives you the angular location of the tracker, within the LH's field of view
09:53:00 <clever> which pair of seeds, tells you which LH it is
09:53:00 <clever> and the low-speed comms, gives you a serial#, firmware version, and other stuff
09:54:00 <clever> if you then have multiple receivers getting hits from the same tracker, and you know the physical shape of the controller, you can then solve for distance
09:54:00 <doug16k> use ARC4 and use it to encrypt an infinite stream of zeros. output is ridiculously random, right?
09:54:00 <clever> so if 2 sensors are say 1 degree apart, in the LH's view, and 1 inch apart physically
09:54:00 <clever> then you can use basic math to figure out the distance from the LH
09:54:00 <clever> assuming its pointing square at the LH
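The "basic math" clever mentions can be sketched as plain trigonometry. It assumes, as he does, that the pair of sensors faces the lighthouse squarely; `distance_from_angle` is a hypothetical helper name, and real tracking solves for the full pose rather than a single distance.

```python
import math


def distance_from_angle(separation, angle_deg):
    """Distance to a pair of sensors `separation` apart that subtend
    `angle_deg` as seen from the lighthouse, assuming they face it squarely."""
    half_angle = math.radians(angle_deg) / 2
    # Each sensor sits separation/2 off the centre line, at half the angle.
    return (separation / 2) / math.tan(half_angle)


# clever's example: 1 degree apart in the LH's view, 1 inch apart physically.
print(distance_from_angle(1.0, 1.0))  # distance in inches
```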
10:02:00 <doug16k> ah. not random. ints aren't bits
10:05:00 <doug16k> you can pick constants for a simple LC generator that give a small period
10:05:00 <doug16k> if you had to do something like that with hardly any memory
10:05:00 <mrvn> but why would you ever want a period less than one less than the data type allows?
10:07:00 <doug16k> you probably wouldn't. why does RAND_MAX suck on several platforms
10:08:00 <mrvn> concerning ARC4. you don't encrypt a stream of 0. you encrypt a stream of something that occasionally changes its value, like the cpu temp.
10:08:00 <doug16k> dumb choices for the constants
10:09:00 <clever> has more info for what i was saying
10:09:00 <bslsk05> ​jdavidberger/lighthouse2tools - General tools for working with / figuring out the LH2 (index) technology stack (5 forks/7 stargazers/MIT)
10:09:00 <doug16k> watching a video on youtube that mentions spinning around makes me sick. I can't even touch a VR headset
10:09:00 <clever> its using a Linear feedback shift register for its rng generation
10:10:00 <clever> doug16k: but this tracking hardware can also be used for non-vr things
10:10:00 <clever> any time you want to know the position and rotation of an object in a space
10:12:00 <doug16k> yeah, lfsr is a really good generator for small memory and simple cpus
10:13:00 <doug16k> the one I mentioned earlier is for big fat processors that don't mind multiplication
10:14:00 <clever> yeah, a LFSR could be implemented entirely in an asic, with relatively few gates
10:16:00 <doug16k> yeah, almost nothing
10:16:00 <doug16k> it's more wire than gate
10:17:00 <clever> and this is only running at a measly 6mhz!
10:17:00 <doug16k> N flip flips in a chain with a couple of xors
10:17:00 <doug16k> flip flops*
10:18:00 <doug16k> picks off a couple of things to xor together to put into first flip flop input
10:19:00 <doug16k> at different bits
10:19:00 <clever> yep
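A sketch of the shift-register idea in software: a 16-bit Fibonacci LFSR where a few taps are XORed together and fed back into the top bit. The tap positions and the seed 0xACE1 are a common textbook choice and are an assumption here; they are not the polynomials used by the lighthouse hardware. The lookup at the end illustrates clever's earlier point that a short run of consecutive bits pins down the position in the stream.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR by one bit (taps at bits 0, 2, 3, 5)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr16_states(seed=0xACE1):
    """List every state visited before the register returns to the seed."""
    states = [seed]
    state = lfsr16_step(seed)
    while state != seed:
        states.append(state)
        state = lfsr16_step(state)
    return states


STATES = lfsr16_states()
POSITION = {s: i for i, s in enumerate(STATES)}


def position_of(window_bits):
    """Map 16 consecutive output bits (oldest first) to a stream position.

    The register shifts right and outputs bit 0, so the next 16 output
    bits are exactly the current state, least significant bit first.
    """
    state = 0
    for k, b in enumerate(window_bits):
        state |= b << k
    return POSITION[state]


bits = [s & 1 for s in STATES]          # the output bit stream
print(len(STATES), position_of(bits[1000:1016]))
```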
10:33:00 <doug16k> I wonder how bad ARC4 is if you naively initialize the state with 00 to FF in order and just start using it to encrypt immediately, no seed or anything
10:33:00 <doug16k> probably appears encrypted to the naked eye, you think?
10:38:00 <doug16k> everyone is trying to figure out the key when the key is "do nothing"
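doug16k's question can be tried directly. The sketch below runs the RC4 output loop on a state that was never keyed, just 00 to FF in order, and XORs it over the data. The function names are made up for this example. Working the first four steps by hand gives 2, 5, 7, 13, so the start of the stream is visibly patterned, and since there is no key everyone who knows the trick gets the same keystream.

```python
def rc4_keystream_unkeyed(n):
    """First n bytes of RC4 output from the identity state (no key schedule)."""
    s = list(range(256))
    i = j = 0
    out = []
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return out


def xor_with_keystream(data):
    """'Encrypt' or 'decrypt' bytes with the unkeyed stream (same operation)."""
    ks = rc4_keystream_unkeyed(len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


# Encrypting zeros just exposes the keystream itself.
print(list(xor_with_keystream(bytes(8))))
```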
10:39:00 <doug16k> it's the same logic that makes the password "password" seem good
10:50:00 <heat> no one expects the password password
10:50:00 <heat> security by stupidity
10:50:00 <doug16k> funniest thing - everyone is convinced somehow that you aren't allowed spaces in a password
10:51:00 <doug16k> it's one of the rarest things in password dbs
10:51:00 <heat> i personally enjoy the caesar cipher with a shift of 0
10:51:00 <heat> doug16k, depends on the service probably
10:51:00 <clever> doug16k: i never even thought to put one there!
10:51:00 <heat> some are quite picky
10:52:00 <doug16k> I read somewhere that the most unused character in password databases is <
10:52:00 <heat> ^^this is why chrome's password generation is very conservative
10:52:00 <heat> how do they know tho
10:52:00 <doug16k> breaches
10:52:00 <clever> ive heard a story before about a website that would truncate passwords that are too long
10:52:00 <heat> plaintext passwords are sus
10:52:00 <clever> and then your pw isnt valid, because its comparing the truncated to the non-truncated
10:53:00 <heat> use a salted sha256 and be done with it
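A sketch of both points using only Python's standard library. Instead of a single salted SHA-256 pass it uses PBKDF2-HMAC-SHA256, which is a salted, iterated construction built on the same hash; that swap is this example's choice, not heat's. `MAX_LEN` and the function names are hypothetical, and the buggy login reproduces the truncation story clever tells: the site truncates at registration but not at login.

```python
import hashlib
import hmac
import os

MAX_LEN = 16  # hypothetical length limit applied by the site


def hash_password(password, salt):
    """Salted, iterated hash (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def register(password):
    """Store only the salt and the hash; note the silent truncation."""
    salt = os.urandom(16)
    return salt, hash_password(password[:MAX_LEN], salt)


def login_buggy(password, record):
    """Hashes the untruncated input, so long passwords never match."""
    salt, stored = record
    return hmac.compare_digest(hash_password(password, salt), stored)


def login_fixed(password, record):
    """Applies the same truncation as register() before comparing."""
    salt, stored = record
    return hmac.compare_digest(hash_password(password[:MAX_LEN], salt), stored)


record = register("correct horse battery staple")
print(login_buggy("correct horse battery staple", record))
print(login_fixed("correct horse battery staple", record))
```

Truncating at all is still questionable; the point is only that registration and login have to agree on what they hash.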
10:53:00 <doug16k> people were way dumber about security back then
10:53:00 <clever> heat: i just used openid, let somebody else deal with the passwords :P
10:53:00 <heat> they still are
10:53:00 <doug16k> really really bad then
10:53:00 <GeDaMo> I've seen sites which say things like "enter the 3rd letter of your password" :/
10:53:00 <mrvn> I xor twice, for extra security.
10:54:00 <doug16k> I wonder how many security system installers face people saying "make it 1111"
10:54:00 <heat> the passcode for my apartment block is a straight fucking line
10:55:00 <clever> GeDaMo: when i was going thru account recovery with netflix, a dedicated dialog, separate from the chat popped up, to confirm the last 4 digits of my credit card number
10:55:00 <mrvn> heat: and the keys are probably showing wear.
10:55:00 <heat> the security of the whole building is compromised because "hurr durr pins are hard"
10:55:00 <doug16k> then after a couple of years, 1 is blank and the rest of the buttons are brand new
10:55:00 <clever> GeDaMo: i suspect its designed in such a way, that the support guy, only gets a boolean, and cant steal my number
10:56:00 <GeDaMo> Yeah, showing the last four digits of a card is common
10:56:00 <doug16k> I know what you mean. people pretend they are too dumb to remember 4 digits
10:56:00 <clever> GeDaMo: but in this case, its not even showing the last 4, its asking for the last 4, to confirm who i am, but also not revealing the answer to the random support dude
10:56:00 <mrvn> clever: But if they know the 3rd letter of the password then they probably have it stored somewhere. total security fail even if the GUI only shows a bool
10:57:00 <doug16k> how many people here know the IT crowd emergency services number?
10:57:00 <clever> mrvn: yep, plaintext == fail
10:57:00 <clever> doug16k: oh god, i dont remember it, lol
10:57:00 <mrvn> I would even go one step further: password == fail
10:57:00 <clever> mrvn: that reminds me, i had designed a "saved password" feature years ago, it used ssl client certs
10:57:00 <clever> it didnt actually save the pw, but instead registered your client cert to the acct
10:58:00 <mrvn> clever: we use zeromq public/private keys for our management software at work.
10:58:00 <mrvn> and one time tokens for new users to log in the first time.
10:58:00 <mrvn> token + public key creates an account, private key is only known to the user.
10:59:00 <clever> and if they lose the private key?
10:59:00 <mrvn> then they get a new token and can register a new private key.
11:02:00 <mrvn> And tokens are even encrypted with a password. So you can email them the token and send the pass via sms for 2 factor auth to become a new user.
15:24:00 <sbalmos> was doing some random reading this morning. if you're doing a C++ kernel, are the global constructors and destructors still in .ctors & .dtors, because it looks like some say .init_array should be used moreso nowadays?
15:26:00 <dminuoso> If you're doing a C++ kernel, you better define what global constructors/destructors mean and how they are implemented yourself.
15:27:00 <heat> no that's a compiler detail
15:28:00 <heat> sbalmos, I think they get put into init_array and fini_array
15:28:00 <sbalmos> I was about to say
15:28:00 <sbalmos> heat: That's what I thought. Older stuff online I had bookmarked from ages back was using ctors & dtors. But I figured it got moved more recently (FSVO "recently")
15:29:00 <sbalmos> heat: Does clang follow that also? I thought it was slightly different than GCC
15:29:00 <heat> clang does init_array only afaik
15:29:00 <heat> gcc can /if you enable it/
15:29:00 <sbalmos> thx
15:30:00 <heat> it's a configure time switch. usually it defaults to on because it can test if the host system supports them. it defaults to no when cross-compiling
15:30:00 <heat> so use --enable-init-fini-array(? need checking)
15:30:00 <sbalmos> that looks familiar
15:30:00 <sbalmos> although I'm only using clang, so ¯\_(ツ)_/¯
16:20:00 <geist> i think maybe it'll also change based on the triple and/or arch?
16:21:00 <geist> not sure it's the absolutely proper way but you can also just merge them together and define your own symbols for start/stop:
16:21:00 <bslsk05> ​ lk/system-onesegment.ld at master · littlekernel/lk · GitHub
16:28:00 <sbalmos> geist: oh cute. yeah, hadn't even started into anything non-amd64 yet. was kind of wondering what the others looked like
16:29:00 <sbalmos> linker scripts are a whole other gray-art area that I'm crash-learning too
17:39:00 <ddevault> is anyone aware of an OS project which has attempted to implement the linux loadable module API
17:39:00 <ddevault> to run linux drivers as-is
17:39:00 <ddevault> not that I want to do this, just curious if anyone has tried
18:55:00 <heat> ddevault, hmmmmm, the BSDs do kinda that with DRM
18:56:00 <heat> there's also the NDISWrapper stuff for windows network drivers on linux/BSD
18:56:00 <heat> but the whole API? Probably not
18:56:00 <heat> it would affect your whole design substantially I assume
18:56:00 <heat> (and it's not even stable!)
18:57:00 <ddevault> not much benefit, either
18:57:00 <ddevault> you get a bunch of drivers but at that point are you even particularly different from linux
18:58:00 <heat> you have become the thing you set out to destroy :0
18:58:00 <heat> it may be more feasible when rust on linux becomes a thing
18:59:00 <heat> the API surface will undoubtedly be smaller AFAIK
21:53:00 <geist> iirc the linux kernel module stuff is a pretty unstructured system in the sense that it loads raw .o files and just resolves symbols as it sees fit
22:19:00 * heat yawns
22:24:00 <heat> is gregkh at google? I thought he worked for the linux foundation but he has a and does a bunch of android work
22:24:00 <heat> weird
22:25:00 <j`ey> contracting maybe?
22:29:00 <heat> guess so
22:29:00 <heat> looks like all he does is merge upstream -stable stuff to the android kernel
22:30:00 * heat . o 0 { Reverse upstream - make a fork with so many unmergeable patches and hire the upstream maintainers to downstream the changes to your fork }
22:30:00 <geist> POWER MOVE
22:31:00 <heat> large phallus energy
22:35:00 <heat>
22:35:00 <heat> what's with the steve jobs picture linus
22:35:00 <heat> he's not even a tech ceo wtf
22:36:00 <zid`> I'm redesigning my emulator to be.. complicated :D
22:36:00 <heat> is it in enterprise C# or Java?
22:36:00 <zid`> oh god heat
22:37:00 <zid`>
22:37:00 <bslsk05> ​ JADE/InstructionManager.cs at master · BLNJ/JADE · GitHub
22:37:00 <heat> if it's not, i don't want to hear about it
22:37:00 <zid`> someone posted this a day or two ago
22:37:00 <heat> i know
22:37:00 <zid`> it runs at 0.00002fps he let slip
22:37:00 <heat> this is beautiful
22:37:00 <heat> very poor OO programming though
22:38:00 <zid`> anyway, I was thinking of having a fast path which does really lazy emulation until the next mmio, which runs for min(lcd, timer, audio) where those are 'how many cycles until that device will generate an interrupt'
22:38:00 <zid`> then it switches into an accurate mode to deal with all that interaction, then goes back to being fast
22:38:00 <heat> what's lazy emulation?
22:39:00 <zid`> full instructions rather than t-cycles
22:39:00 <zid`> vblank loop skipping
22:39:00 <zid`> for the lcd: entire scanlines rather than individual pixels
22:39:00 <zid`> for the timer: not incrementing and checking for overflow, just calculating when overflow will be then doing timer += 93039;
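A sketch of the timer part of zid`'s plan, with a per-cycle reference version to compare against. The `Timer` class is a hypothetical 8-bit up-counter that ticks every `period` cycles and raises an interrupt when it overflows; it is not taken from any real emulator. `cycles_until_next_event` is the min(lcd, timer, audio) idea: how long the fast path may run before some device needs attention.

```python
class Timer:
    """Hypothetical 8-bit timer: ticks every `period` cycles, reloads on overflow."""

    def __init__(self, counter=0, reload=0, period=16):
        self.counter = counter
        self.reload = reload
        self.period = period
        self.sub = 0    # cycles accumulated toward the next tick
        self.irqs = 0   # number of overflow interrupts raised

    def step_slow(self, cycles):
        """Accurate mode: advance one cycle at a time."""
        for _ in range(cycles):
            self.sub += 1
            if self.sub == self.period:
                self.sub = 0
                self.counter += 1
                if self.counter > 0xFF:
                    self.counter = self.reload
                    self.irqs += 1

    def cycles_until_irq(self):
        """How many cycles until the next overflow interrupt."""
        return (0x100 - self.counter) * self.period - self.sub

    def step_fast(self, cycles):
        """Lazy mode: jump straight over whole overflows, then add the rest."""
        while cycles >= self.cycles_until_irq():
            cycles -= self.cycles_until_irq()
            self.counter = self.reload
            self.sub = 0
            self.irqs += 1
        total = self.sub + cycles
        self.counter += total // self.period
        self.sub = total % self.period


def cycles_until_next_event(device_deadlines):
    """Length of the next fast-path run: the earliest device deadline."""
    return min(device_deadlines)


slow, fast = Timer(250, 10, 16), Timer(250, 10, 16)
slow.step_slow(5000)
fast.step_fast(5000)
print((slow.counter, slow.irqs) == (fast.counter, fast.irqs))
```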
23:28:00 <doug16k> qemu has an option for that in tcg, where it can just jump forward until the next event or make it really wait
23:29:00 <doug16k> icount related? I don't remember exactly
23:32:00 <doug16k> ah, yeah, icount sleep=off. makes the clock just jump forward if time would elapse in halt waiting for interrupt
23:32:00 <heat> how does that work in smp?
23:33:00 <doug16k> good question, but probably min deadline across cpus if all halted
23:36:00 <doug16k> I used it some, it seemed to work on smp, but I am not certain, I only used icount to make it deterministic for debugging and the sleep=off sped it up
23:37:00 <doug16k> when idling, it makes it seem like your kernel is pegging the cores but really, time is flying by on the clock
23:45:00 <doug16k> icount just sets a fixed amount of virtual time to elapse per instruction
23:45:00 <doug16k> another way of looking at it: executing an instruction bumps the clock forward
23:46:00 <heat> that scares me
23:46:00 <heat> might uncover some weird bugs in timer code
23:46:00 <doug16k> you can't feel it inside the guest
23:47:00 <doug16k> other than the cpu being faster or slower. it already expects that
23:48:00 <heat> i can see shit going south if the code between me getting the timestamp (deadline) and setting the timer is unrealistically slow
23:48:00 <heat> maybe my timer code is just crap idk
23:48:00 <doug16k> yeah you don't make it unreasonable
23:48:00 <doug16k> sure you could put it so it does nothing but dispatch timer irqs
23:49:00 <doug16k> it can't even execute one user instruction
23:49:00 <doug16k> don't do that, though
23:50:00 <doug16k> you set the shift. shift=N causes each instruction to take 2^N ns
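A toy model of what doug16k describes, not QEMU's actual implementation. Each executed instruction advances virtual time by 2^shift ns, and with sleep=off a halted machine jumps straight to the next deadline instead of waiting. Taking the minimum deadline across CPUs is doug16k's guess for the SMP case and is treated as an assumption here; `VirtualClock` and its methods are made-up names.

```python
class VirtualClock:
    """Toy icount-style clock: time moves only when instructions execute
    or when a halted machine skips ahead to a pending deadline."""

    def __init__(self, shift):
        self.shift = shift
        self.now_ns = 0

    def execute(self, instructions):
        """Each instruction costs 2**shift nanoseconds of virtual time."""
        self.now_ns += instructions << self.shift

    def halt_until(self, deadlines_ns):
        """All CPUs halted: jump to the earliest pending deadline
        (assumed behaviour for SMP; see the discussion above)."""
        earliest = min(deadlines_ns)
        if earliest > self.now_ns:
            self.now_ns = earliest
        return self.now_ns


clock = VirtualClock(shift=3)       # 8 ns per instruction
clock.execute(1000)                 # 1000 instructions executed
print(clock.now_ns)
print(clock.halt_until([50_000, 20_000]))
```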
23:50:00 <heat> do you remember details about the tsc deadline mode?
23:50:00 <doug16k> I don't use it because of errata
23:50:00 <heat> can't remember if it triggers if the counter == tsc or if counter <= tsc
23:51:00 <doug16k> are they crazy enough to try to make wraparound ok?
23:51:00 <doug16k> if they are sane it is <=
23:52:00 <doug16k> what year does it wrap around?
23:52:00 <heat> oh yes, indeed
23:52:00 <heat> it's sane
23:52:00 <heat> no, I don't think they make it wraparound
23:52:00 <doug16k> did you read the errata about it?
23:53:00 <doug16k> if you use it you really should
23:53:00 <heat> no
23:53:00 <heat> what errata
23:53:00 <doug16k> model specific bugs
23:53:00 <heat> how did intel screw this up
23:53:00 <heat> it's not that complex lol
23:54:00 <doug16k> if windows didn't use it, good luck
23:55:00 <doug16k> there is a microcode workaround
23:55:00 <heat> linux does though
23:55:00 <doug16k> can you degrade to the normal way if there is no deadline?
23:55:00 <doug16k> example: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0x52 (or later)
23:56:00 <doug16k> linux does that too
23:56:00 <heat> theoretically
23:56:00 <doug16k> right, just saying, you might want to make it chicken out and use the normal timer if you can tell it has the defect
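The "chicken out" logic could look roughly like the sketch below. The function names, the string return values, and the idea of passing in a minimum fixed microcode revision are all hypothetical; a real kernel would read the capability bit and the microcode revision from the CPU and consult a per-model table. The 0x52 value is only the one from the example message quoted above. The expiry check uses >= rather than ==, matching the "sane" reading in the discussion, and treats a deadline of 0 as disarmed.

```python
def pick_timer(has_tsc_deadline, microcode_rev, min_fixed_rev=None):
    """Choose a timer source, falling back to the regular LAPIC timer.

    min_fixed_rev is the first microcode revision without the erratum,
    or None if this CPU model is not known to be affected (hypothetical).
    """
    if not has_tsc_deadline:
        return "lapic"
    if min_fixed_rev is not None and microcode_rev < min_fixed_rev:
        return "lapic"  # affected and not yet patched
    return "tsc-deadline"


def deadline_expired(tsc_now, deadline):
    """Fires once the counter reaches or passes the deadline; 0 means disarmed."""
    return deadline != 0 and tsc_now >= deadline


print(pick_timer(True, 0x50, min_fixed_rev=0x52))
print(pick_timer(True, 0x52, min_fixed_rev=0x52))
```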
23:57:00 <heat> my gen's errata mentions messing with IA32_TSC_ADJUST makes IA32_TSC_DEADLINE trigger at the wrong time
23:57:00 <heat> which seems sane I guess?
23:57:00 <heat> technically a bug but that's not something I would feel comfortable about doing anyway
23:58:00 <doug16k> is that all? ok as long as you know that, you can tippy toe around IA32_TSC_ADJUST changes
23:58:00 <heat> for 7th gen seems so
23:58:00 <doug16k> what if the entire capability isn't there
23:59:00 <heat> i use the lapic
23:59:00 <doug16k> that's pretty good then
23:59:00 <doug16k> what makes tsc any better?
23:59:00 <heat> faster iirc and definitely more precise