Search logs:

channel logs for 2004 - 2010 are archived at http://tunes.org/~nef/logs/old/ (these can't be searched)

#osdev2 = #osdev @ Libera from 23may2021 to present

#osdev @ OPN/FreeNode from 3apr2001 to 23may2021

all other channels are on OPN/FreeNode from 2004 to present


http://bespin.org/~qz/search/?view=1&c=osdev2&y=23&m=1&d=27

Friday, 27 January 2023

00:00:00 <zid`> when a river erodes the turn it's going around and ends up straight again
00:00:00 <zid`> and kicks off a lake
00:00:00 <gog> ohhh yeah
00:00:00 <zid`> https://upload.wikimedia.org/wikipedia/commons/e/e3/Meander_Oxbow_development.svg
00:00:00 <gog> yes
00:00:00 <zid`> it's like a reverse atoll
00:01:00 <zid`> https://upload.wikimedia.org/wikipedia/commons/e/e6/Wake_Island_air.JPG I played a lot of bf2 here
01:40:00 <gorgonical> Question about GIC interrupt grouping: the GICD and GICR have separate registers for configuring the group and security level that an interrupt has.
01:41:00 <gorgonical> I'm guessing that the GICD is in charge of SPI interrupt configuration and the GICR is in charge of the per-CPU interrupts like SGI, PPI?
01:41:00 <gorgonical> The main question is whether there's a "hierarchy" since although SGIs originate at the CPU interface and go to the GICR, they have to go to the GICD to make it to another CPU. So then in that case the first GICR determines the type? The second one? The GICD?
02:28:00 <geist> think of the GICR as the local apic and the GICD as an ioapic, iirc
02:29:00 <geist> one of them is indeed per cpu, the other is more of a global thing
02:29:00 <geist> SGIs i think Just Happen on the other core and there's not really any real overall configuration, since the range is basically reserved
02:29:00 <geist> but this is just off of memory, so i might be wrong
02:50:00 <gorgonical> hmm
02:50:00 <gorgonical> If there's no configuration then that suggests anyone can SGI a secure core right?
04:15:00 <geist> oh in a hypervisor situation that's a different story, but you're right i think if there's a separate core, then yeah i think there'd need to be some way to mask it off
04:16:00 <geist> but i dont have the spec in front of me, there may be a mechanism to configure it locally
04:16:00 <geist> at least some sort of local interrupt mask for sure for that SGI
04:16:00 <geist> but i dont think there's necessarily a way of specifying which cores can SGI which other cores
04:16:00 <geist> *aside* from whatever virtualization extensions EL2 may implement
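The GICD/GICR split geist describes can be sketched as a small helper. This is a hedged illustration, not real driver code: it assumes a GICv3-style layout where the GICR's SGI frame covers the per-CPU SGIs and PPIs (INTID 0-31) and the GICD covers SPIs (32 and up), with GICD_IGROUPR<n> at offset 0x0080 + 4*n and GICR IGROUPR0 at 0x0080 in the SGI frame; the names `igroupr_loc` and `igroupr_for_intid` are made up for this example, and offsets should be checked against the spec.

```c
#include <stdint.h>
#include <stdbool.h>

/* Which interrupt-group register covers a given INTID, assuming a
 * GICv3-style split: GICR SGI frame for SGIs/PPIs (INTID 0-31),
 * GICD for SPIs. One bit per interrupt in the 32-bit IGROUPR regs. */
struct igroupr_loc {
    bool per_cpu;        /* true: GICR SGI frame, false: GICD */
    uint32_t reg_offset; /* byte offset of the 32-bit IGROUPR register */
    uint32_t bit;        /* bit within that register */
};

static struct igroupr_loc igroupr_for_intid(uint32_t intid)
{
    struct igroupr_loc loc;
    loc.per_cpu = intid < 32;                  /* SGI (0-15) or PPI (16-31) */
    loc.reg_offset = 0x0080 + 4 * (intid / 32); /* IGROUPR0 for the per-CPU range */
    loc.bit = intid % 32;
    return loc;
}
```

Under this model the "hierarchy" question resolves itself: an SGI's group is whatever the target CPU's own GICR says, since INTID 0-15 is always looked up in that redistributor's IGROUPR0.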
04:43:00 <moon-child> is it slow to send ipi, or just to receive them?
04:44:00 <Clockface> whats the most practical way to emulate a specific piece of hardware for other kernel mode code
04:44:00 <Clockface> will i have to just intercept every I/O thing from everything else
04:44:00 <Clockface> and then replicate all of it "for real"
04:44:00 <Clockface> except the stuff connecting to the fake device
12:51:00 <netbsduser`> a question about unified buffer caches: in general i know pages of these to get a different treatment from e.g. anonymous pages, because pages of a page cache get written out to their backing store regularly (i think on linux every 30s) rather than just in response to page replacement deciding that a page has to be put back to make room for another. but nonetheless they also get put back to disk in response to typical page replacement demands too
12:52:00 <netbsduser`> so consider the case of certain filesystems, which have to enact invariants, like "this journal block has to be written before that metadata block is, else all hell breaks loose." i know that there are a lot of filesystems which do in fact write journals lazily. what approach is usually taken in unified buffer caches to describe such invariants and to ensure that they are not violated by normal page replacement policy?
12:58:00 <netbsduser`> i have considered two approaches: one is to let the `struct buf`s associated with a UBC hold dependency information. this would allow the pageout daemon to continue to enact its own policy on page replacement (if it calls a page eligible for swapout, and beholds it contains bufs which have dependencies, it would then write those dependencies out first.) another is to have it handled at the filesystem level. the page descriptions (or bufs they
12:58:00 <netbsduser`> contain) would be marked to say, "fs driver will handle these ones"
13:05:00 <mrvn> you write out the dependencies then throw in a barrier/flush and only then the depending blocks.
13:06:00 <mrvn> the kernel will not reorder I/O across barriers
13:07:00 <mrvn> which is also a problem. Because when you fsync() a file the updates can be stuck behind barrier with tons of unrelated data and they can't be fast tracked because that would require crossing the barrier.
13:08:00 <mrvn> If you write your own IO system then having a dependency / order graph seems like an improvement over the simple queue strategy generally used.
13:12:00 <netbsduser`> mrvn: but who writes them by that order? would, let's say, the FS driver submit asynchronous writes to the I/O system and then it maintains the ordering information and if e.g. the pageout daemon wants to write out a page, the I/O system checks it against its queue of pending writes and orders appropriately? that might be a wiser approach than either of what i was considering
15:06:00 <mrvn> netbsduser`: each IO layer writes their queue in the order the barriers enforce
15:08:00 <mrvn> There is also no checking. The I/O layers simply perform the IO they are told to do. If you write out a page twice it gets written out twice if there is a barrier between them. Maybe even always.
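mrvn's barrier discipline can be sketched as a tiny queue model. This is a hedged illustration only; the names (`io_op`, `io_queue_push`, `io_queue_flush`) are invented for the example, and a real layer would issue the writes between two barriers concurrently, waiting for completion at each barrier, rather than strictly serializing.

```c
#include <stddef.h>

/* A FIFO of block writes where a BARRIER entry means nothing queued
 * after it may reach the disk before everything queued ahead of it. */
enum io_kind { IO_WRITE, IO_BARRIER };

struct io_op { enum io_kind kind; int block; };

#define QMAX 32
struct io_queue { struct io_op ops[QMAX]; size_t n; };

static void io_queue_push(struct io_queue *q, enum io_kind k, int block)
{
    if (q->n < QMAX)
        q->ops[q->n++] = (struct io_op){ k, block };
}

/* Issue in FIFO order; here we just record the order blocks would hit
 * the disk. Note a write queued twice is issued twice if a barrier
 * separates the two copies, as described above. */
static size_t io_queue_flush(struct io_queue *q, int *out, size_t outmax)
{
    size_t emitted = 0;
    for (size_t i = 0; i < q->n && emitted < outmax; i++)
        if (q->ops[i].kind == IO_WRITE)
            out[emitted++] = q->ops[i].block;
    q->n = 0;
    return emitted;
}
```

Queuing journal block, barrier, metadata block then guarantees the journal is durable before the metadata, which is exactly the filesystem invariant; the fsync() complaint above falls out of the same model, since an fsync'd write stuck behind a barrier cannot be fast-tracked past it.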
16:03:00 <kaichiuchi> hi
16:13:00 <heat> hai
16:19:00 <gog> hi
16:56:00 <heat> "However, the modularity of UEFI also makes it easier for HP to innovate. HP DayStarter is a simple value-add to the system allowing users to have access to productivity information while waiting for the system to boot"
16:56:00 <heat> oh my fucking god
16:56:00 <heat> https://i.imgur.com/gR89tr6.png
16:57:00 <gog> this is not what uefi is for but it's the inevitable consequence of making pre-boot application development easier
16:58:00 <gog> good job
16:58:00 <gog> we heard you liked operating systems so we put an operating system into your firmware
16:59:00 <heat> late stage capitalism EFI
16:59:00 <kof123> late stage osdev
16:59:00 <kof123> devours its children
17:10:00 <sakasama> Thank you HP DayStarter. Without this innovative technology I may never have known that useful fact about Chuck Norris.
17:12:00 <heat> i hope you all realize this is done in SMM
17:14:00 <sakasama> I've heard of that! It's kind of like BDSM but participants need double the masochism.
17:25:00 <heat> no, that is BSD
17:39:00 <clever> heat: isnt that just a clone of a minimal linux env in the flash? or does it run along side the os??
17:40:00 <clever> oh, checking the screenshot, it looks more like an odd overlay, after the bootloader has ran??
17:40:00 <clever> but where is it getting that data from
17:40:00 <heat> clever, The benefits to the customers are the instant-on user experience with user productivity information (such as calendar, to-do list and customizable information) available for display before and while Windows is booting. The main technology behind it is for the UEFI BIOS to locate the proper JPEG images and use the System Management Mode (SMM) to update the frame buffer content until Windows is ready for system login. At OS runtime, HP
17:40:00 <heat> implements an Outlook plug-in to capture the calendar information.
17:41:00 <heat> it uses fucking SMM
17:41:00 <heat> i hope they do jpeg decoding in SMM for the big funny
17:41:00 <clever> heat: windows already has a cheat for instant on, they renamed hibernate to shutdown :P
17:41:00 <clever> so when you think youve turned it off, it just went into hibernate
17:41:00 <heat> yes, this was in 2011
17:41:00 <clever> ah
17:41:00 <heat> imagine how much better DayStarter is these days!
17:42:00 <clever> smm also explains most of my questions
17:42:00 <clever> now it can be just as annoying as the HUD on my tv, getting in the way and covering up valuable UI elements
17:42:00 <clever> until it times out
17:43:00 <heat> modern daystarter should play youtube vids in SMM :v
17:43:00 <clever> or, you know, just boot faster :P
17:44:00 <heat> hmm, good point
17:44:00 <heat> there's room for a tiktok or two
17:44:00 <clever> but i have had a similar idea in the past, with that display on the apple keyboard
17:44:00 <clever> where they replaced the F1-F12 row, with what is basically an ipad
17:44:00 <clever> fully self-contained computer
17:45:00 <mats2> outlook in uefi
17:45:00 <mats2> amazing innovation
17:45:00 <clever> why not allow that to run on the keyboard, with the system off?
17:45:00 <clever> give it access to email, and calendar
17:45:00 <mats2> who needs windows when you have uefi
17:48:00 <GeDaMo> https://linux.slashdot.org/story/02/06/15/1416224/a-web-browser-in-your-bios
17:48:00 <bslsk05> ​linux.slashdot.org: A Web Browser in Your BIOS? - Slashdot
17:53:00 <acidx> with a web browser in the bios, who needs an operating system?
18:04:00 <mrvn> heat: oh how I would laugh to prevent crying when the SMM shows a popup that flash has to be updated.
18:14:00 <clever> acidx: thats basically what that linux in the bios did
18:14:00 <heat> linux in the bios is more alive than ever
18:15:00 <heat> i think google has been deploying LinuxBIOS at scale
18:15:00 <acidx> when I had Linux in the BIOS, I used it mostly as a makeshift "secure" bootloader
18:15:00 <heat> sorry, not linuxbios, linuxboot
18:16:00 <acidx> the kernel was even built without networking and whatnot
20:17:00 <gorgonical> how's everyone's fridays going?
20:18:00 <clever> its friday? lol
20:18:00 <gorgonical> unless my calendar is really wrong
20:18:00 <gorgonical> But I am in fact acutely aware of what day it is because of diet
20:19:00 <slidercrank> the day depends on the country
20:19:00 <gorgonical> yes I suppose for people like klange it is already Saturday
20:19:00 <gorgonical> And maybe Russians are far enough forward?
20:19:00 <heat> no
20:20:00 <heat> maybe in asia
20:20:00 <clever> cd
20:20:00 <heat> cd ~/clever
20:20:00 <gorgonical> yeah I'm -5 here and I don't know what russia is. They'd have to be +4
20:21:00 <gorgonical> According to a map almost nobody is just +4
20:21:00 <slidercrank> gorgonical, in part of Russia it's Saturday, in the other - still Friday
20:21:00 <gorgonical> It seems maybe the caucasus countries and oman are the only national +4
20:22:00 <gorgonical> wow this timezone map is awful. So many places completely misaligned with the longitudinal demarcation of the zone they're in
20:22:00 <heat> gmt 4 life
20:22:00 <gorgonical> gog what is the meaning of iceland being gmt
20:22:00 <gorgonical> the westfjords should even be in -2 based on position
20:25:00 <heat> if iceland shifts to -2 the brits will invade them again
20:26:00 <gorgonical> in other news my forth interpreter is getting pretty close to being "done" and I'll just have to write the rest in forth itself
20:27:00 <gorgonical> After catching a whole bunch of switched a0/a1 registers and memory alignment bugs it now actually runs whole words
20:27:00 <heat> you are disgusting
20:27:00 <gorgonical> i still don't know if riscv asm can do indirect jumps
20:27:00 <gorgonical> because I used a syntax that one manual says will do an indirect jump but it definitely did not in qemu
20:27:00 <heat> which one?
20:28:00 <gorgonical> jalr zero, (a0) should do it
20:28:00 <gorgonical> but for qemu that seems to just be equivalent to jalr zero, a0
20:28:00 <gorgonical> this one manual implied adding the memory access parens would suggest an indirect jump
20:30:00 <heat> yeah gcc doesn't seem to have anything of sorts
20:30:00 <heat> https://godbolt.org/z/cfh9vs5W9
20:30:00 <bslsk05> ​godbolt.org: Compiler Explorer
20:31:00 <GeDaMo> Can riscv do memory indirect or do you have to load to a register first?
20:31:00 <heat> wait, wrong example
20:31:00 <heat> GeDaMo, load afaik
20:31:00 <gorgonical> GeDaMo: load yeah
20:31:00 <gorgonical> I had to change it to ld a0, (a0); j a0
20:32:00 <heat> ld a0, 0(a0)
20:32:00 <heat> jalr a0
20:32:00 <heat> so that answers your question
20:32:00 <heat> if jalr zero, 0(a0) was ever a thing, it's syntactic sugar for ld + jalr
20:32:00 <gorgonical> must have been
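The ld + jalr point can be seen from C. This is a hedged sketch of the kind of godbolt experiment under discussion: a call through a pointer-to-function-pointer forces the compiler to emit an explicit load of the function address (`ld a0, 0(a0)` on riscv) followed by a register-indirect `jalr`, since there is no memory-indirect jump; the helper names `forty` and `demo` are invented for the example.

```c
typedef int (*fnptr)(void);

/* riscv has no memory-indirect jump: this compiles to a load of the
 * function pointer followed by jalr through the loaded register */
int func(fnptr *f)
{
    return (*f)() + 20;
}

static int forty(void) { return 40; }

static int demo(void)
{
    fnptr p = forty;
    return func(&p);   /* calls forty through the double indirection */
}
```

On x86 the same function can use the memory-indirect form `callq *(%rdi)` directly, which is the contrast drawn below.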
20:33:00 <heat> https://godbolt.org/z/zdhrM7fTY
20:33:00 <bslsk05> ​godbolt.org: Compiler Explorer
20:33:00 <heat> meanwhile chad x86
20:34:00 <gorgonical> don't taunt me
20:34:00 <gorgonical> though personally it does make programming directly in asm a lot easier
20:34:00 <mjg> who is highlighting me
20:35:00 <gorgonical> I have been writing a lot of aarch64 asm and I'm furious about it usually
20:35:00 <heat> wait
20:35:00 <heat> wtf is it doing
20:35:00 <heat> why is it saving %rax
20:35:00 <gorgonical> mjg: i don't see any mentions
20:36:00 <mjg> > chad
20:36:00 <mjg> that was it
20:36:00 <gorgonical> lmao
20:36:00 <heat> func: # @func
20:36:00 <heat> callq *(%rdi)
20:36:00 <heat> popq %rcx
20:36:00 <heat> addl $20, %eax
20:36:00 <heat> pushq %rax
20:36:00 <heat> retq
20:37:00 <heat> am I going cray-cray or does this make no sense?
20:37:00 <GeDaMo> Aligning the stack?
20:37:00 <heat> for int func(int(**f)(void)) { return (*f)() + 20; }
20:37:00 <heat> ooooooooh
20:37:00 <heat> maybe so
20:37:00 <gorgonical> does the stack need alignment on x86?
20:37:00 <heat> yes
20:38:00 <mjg> yes and no
20:38:00 <heat> GeDaMo, great one! seems to be it
20:38:00 <heat> gcc just does sub and add
20:38:00 <heat> now this makes me wonder, why does clang seem to codegen crap here?
20:38:00 <GeDaMo> 16 bytes
20:38:00 <heat> push %rax makes it depend on %rax
20:38:00 <heat> cc chad
20:39:00 <mjg> again with the highlights
20:39:00 <GeDaMo> The return address pushed by the call misaligns it
20:39:00 <gorgonical> i wasn't aware that the stack wanted/needs to be 16-byte aligned
20:39:00 <heat> gorgonical, yeah, it's there on sysv at least cuz of SSE
20:39:00 <mjg> gorgonical: that's only true if you use simd
20:39:00 <gorgonical> oooh
20:40:00 <heat> I think it's still true on -mgeneral-regs-only
20:40:00 <gorgonical> because I'm used to this on arm64, hence ldp instructions and stuff
20:40:00 <heat> mjg, but seriously mr chad doesn't that make like 0 sense
20:40:00 <mjg> dude i'm running on negative brainpower today
20:40:00 <heat> unless you did something like xor %eax, %eax; push %rax to break the dependency
20:41:00 <GeDaMo> You can directly alter the stack pointer too
20:41:00 <heat> yes, gcc does that
20:41:00 <gorgonical> then it is a good question why clang just pushes garbage
20:41:00 <mjg> lol it has tendra
20:50:00 <GeDaMo> The only reason that comes to mind is instruction size
21:24:00 <gog> gorgonical: my hypothesis is that it's to keep us more in line with business time in most of europe
21:24:00 <gog> particularly banking and securities trading
21:24:00 <gog> and that this is owing to iceland's recent history as a dubious and probably corrupt financial player
21:25:00 <heat> GeDaMo, would make little sense considering I passed -O3 and not -Os
21:25:00 <gog> and in the case of our infamous finance minister Bjarni Benediktsson, plainly corrupt
21:25:00 <GeDaMo> Pfft! You can't expect compilers to make sense :P
21:27:00 <zid`> clang pushes garbage because iceland is corrupt, got it
21:27:00 * zid` paying attention
21:28:00 <heat> Big Iceland controls the toolchains
21:30:00 <geist> re push vs add, i'm guessing it's a combination of instruction size and/or various optimizations for various microarches where sometimes pushes vs direct stack instructions are faster. if you're not specifying a -march it may be up to whatever each compiler thinks they're tuning for
21:31:00 <geist> i do remember there was a lot of back and forth on fiddling with stack pointer via anything other than push/pop being slow/fast/maybe
21:31:00 <heat> yeah but in this case you do not care about what you're pushing
21:31:00 <heat> so doing a mindless pushq %rax can stall the pipeline no?
21:31:00 <geist> right, and thus it's just there to align the stack
21:31:00 <geist> i doubt it, stack stuff is optimized out the wazoo
21:32:00 <geist> flip side is in some microarches, fiddling with SP directly may stall, because it may have to synchronize the stack engine, etc
21:32:00 <heat> you think the cpu will notice you never look at it?
21:32:00 <geist> the push probably not, the pop maybe?
21:33:00 <geist> as someone else mentioned, arm64 has a lot of these trash push/pops to keep alignment
21:33:00 <geist> via ldp/stp and sometimes using xzr as one of the regs
21:35:00 <heat> wait, how much can the CPU optimize the stack?
21:36:00 <zid`> I bet it doesn't matter unless eax isn't "settled" by the point of the push
21:36:00 <heat> if you do e.g 1: push %rax; pop %rax; jmp 1b, is %rax ever written to the stack?
21:37:00 <heat> can it do something really smart and e.g only write if you read that memory region from another thread? or if you get interrupted?
21:38:00 <GeDaMo> https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)#MSROM_.26_Stack_Engine
21:38:00 <bslsk05> ​en.wikichip.org: Skylake (client) - Microarchitectures - Intel - WikiChip
21:39:00 <zid`> depends how good the uop optimization bits are I guess
21:39:00 <zid`> I doubt that has a fuse though
21:39:00 <zid`> zen2/4 might be able to do it
21:39:00 <geist> yeah there's a ton of optimizations around the stack. it's one of the reason arm moved the SP out of the main register file as well
21:40:00 <heat> this asks for a benchmark doesn't it
21:40:00 <geist> i think it's fairly standard practice to have a cached copy of the SP floating around fairly early in the pipeline, outside of the general register file so it can be fast forwarded between stages to remove any interdependencies between instructions
21:41:00 <geist> historically i remember this meant something like if you did a bunch of push/pops in a row and then tried to read the ESP you'd get a stall because it'd have to 'write back' the cached SP to the main register file first
21:41:00 <zid`> heat we playing dark souls instead of this?
21:42:00 <heat> no
21:42:00 <zid`> even though I was *promised* dark souls? wow
21:46:00 <heat> pushpop 3.46 ns 3.45 ns 203580329
21:46:00 <heat> mov 0.968 ns 0.966 ns 704777624
21:46:00 <zid`> try it on zen2/4
21:47:00 <heat> benchmark of 11 push %rax; pop %rbx vs mov %rax, %rbx
21:47:00 <zid`> and you definitely didn't straddle an icache line, and you put some gumpf before so the decode was nice and old etc?
21:48:00 <heat> https://gist.github.com/heatd/789a400c749cc516fcfcc7fbdf2e9c45
21:48:00 <bslsk05> ​gist.github.com: mov-vs-push.cpp · GitHub
21:48:00 <heat> that's all I did
21:48:00 <zid`> doesn't account for many conflating effects then
21:48:00 <zid`> I imagine it's still slower though
21:49:00 <heat> push %rax; pop %rbx is actually smaller than mov %rax, %rbx
21:49:00 <heat> lol
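The two sequences being benchmarked can be written as inline-asm helpers. This is a hedged sketch of the comparison only, not heat's actual gist (which uses google benchmark); it is x86-64 specific, and the function names are invented. Both helpers copy a value from one register to another, one bouncing through the stack and one with a plain move.

```c
#include <stdint.h>

/* copy via the stack: pushq then popq, the sequence clang emits
 * for alignment padding (here repurposed as a register move) */
static uint64_t copy_via_pushpop(uint64_t x)
{
    uint64_t out;
    __asm__ volatile("pushq %1\n\tpopq %0" : "=r"(out) : "r"(x));
    return out;
}

/* copy via a plain register move, which the core can handle with
 * register renaming and no memory traffic at all */
static uint64_t copy_via_mov(uint64_t x)
{
    uint64_t out;
    __asm__ volatile("movq %1, %0" : "=r"(out) : "r"(x));
    return out;
}
```

Timing loops around these two helpers are what produce the ~3.5 ns vs ~1 ns numbers quoted below; the interesting question is whether a given microarchitecture's stack engine (or memory renaming, on newer Zen parts) can collapse the push/pop pair.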
21:53:00 <heat> ok, more weirdness: https://gist.github.com/heatd/96d246ec8d72b1a0ebe3911eb1da655b
21:53:00 <bslsk05> ​gist.github.com: mov-vs-push2.cpp · GitHub
21:53:00 <heat> with src and dst constantly swapped
21:53:00 <geist> well, yeah i mean of course the mov is faster
21:53:00 <heat> pushpop 13.6 ns 13.6 ns 51406861
21:53:00 <heat> mov 1.50 ns 1.50 ns 464182520
21:53:00 <geist> that just register renames
21:53:00 <zid`> stack renaming is also a thing sometimes though
21:53:00 <heat> yes, I was wondering if an x86 core could also rename that
21:53:00 <geist> ah
21:53:00 <zid`> zen2/zen4 is your best bet, it can do m emory renaming for sure
21:54:00 <geist> yeah re: the original thing the question is 'silly push/pop vs add/sub to rsp'
21:54:00 <zid`> and it may see the push and pop as a [rsp] that it renames
21:54:00 <heat> let me bench that as well
21:54:00 <zid`> no stop it
21:54:00 <heat> also no zid no dark soul
21:54:00 <geist> but even that might be hard to bench because it would be the interlocking of other stuff going around it at the time
21:54:00 <geist> ie, add to rsp when there is a call right before/after it (which also fiddles with the stack)
21:55:00 <zid`> wyhy no dark soul, you did a promise
21:55:00 <heat> i am lie
21:55:00 <heat> we are doing science here
21:55:00 <zid`> your science is rudimentary and flawed and also boring
21:56:00 <heat> i find it fascinating
21:56:00 <heat> can't debate with the other 2 though
21:56:00 <zid`> stick to locomotives like the rest of us
22:00:00 <heat> ok this one is interesting: https://gist.github.com/heatd/ed84257036e218f770fa93ae99bd07d7
22:00:00 <bslsk05> ​gist.github.com: pushpopvssubadd.cpp · GitHub
22:00:00 <heat> in my kabylake
22:00:00 <heat> pushpop 13.7 ns 13.6 ns 51325676
22:00:00 <heat> pushpop2 3.47 ns 3.45 ns 201544501
22:00:00 <heat> subadd 6.54 ns 6.52 ns 107096910
22:01:00 <heat> i suspect I successfully got pipeline stalls in pushpop
22:03:00 <zid`> can I have that last binary
22:03:00 <heat> ok, discording you
22:04:00 <zid`> what is libbenchmark
22:05:00 <zid`> and why is it not an .a
22:05:00 <heat> oh shoot
22:05:00 <heat> it's google benchmark
22:05:00 <zid`> I am not a google mainframe sadly
22:05:00 <heat> let me see if I can get a static
22:06:00 <heat> nope
22:06:00 <heat> i'll give you the so
22:08:00 <zid`> https://cdn.discordapp.com/attachments/1058163870453223465/1068653746525048993/image.png
22:08:00 <zid`> so my PC is better at moving but worse at pushing
22:17:00 <heat> https://cdn.discordapp.com/attachments/1058163870453223465/1068654716810166362/image.png https://cdn.discordapp.com/attachments/1058163870453223465/1068655166917722172/image.png https://cdn.discordapp.com/attachments/1058163870453223465/1068655641834573906/image.png
22:18:00 <heat> intel pt sampling on pushpop, pushpop2, subadd
22:19:00 <heat> i don't fully understand whats going on here but it seems interesting
22:19:00 <heat> my cpu does not seem to have a stalled cycles pmc
20:20:00 <mrvn> Why do you have a 5 opcode function? Why isn't that inlined? Embrace LTO and your whole benchmark becomes artificial.
22:22:00 <mrvn> What's the stack alignment on aarch64? 128bit?
22:23:00 <heat> yes 16b
22:23:00 <geist> also fun thing that you can enable but virtually all systems do, there's two control bits that you can set for EL0 and EL1 that cause it to instantly throw an exception if SP is ever for any reason unaligned to 16B
22:25:00 <mrvn> Other than the double register load/store does it even matter?
22:27:00 <heat> simd
22:27:00 <heat> probably perf
22:28:00 <mrvn> heat: I throw in simd load/store with double register load/store. Anything above 8 byte.
22:29:00 <moon-child> wtf is this benchmark
22:29:00 <moon-child> like what is it even trying to measure
22:29:00 <heat> sub add vs push pop
22:30:00 <moon-child> but why?
22:30:00 <moon-child> no one does just subs and adds or just pushes and pops
22:30:00 <heat> clang appears to
22:30:00 <mrvn> moon-child: except gcc vs. clang
22:31:00 <heat> for alignment stuff
22:31:00 <moon-child> yea they do that for stack alignment
22:31:00 <moon-child> and then they go and do other stuff
22:32:00 <mrvn> moon-child: The question remains though why one compiler prefers to push an extra reg while the other adds 8 to keep the alignment.
22:33:00 <moon-child> code size. push is better. But this doesn't demonstrate that because literally all it's doing is pushing and popping
22:36:00 <heat> why does that mean push is better?
22:37:00 <mrvn> heat: he means it's smaller.
22:37:00 <heat> no, he means better
22:38:00 <heat> "push is better"
22:38:00 <mrvn> prefixed by "code size"