Search logs: #osdev2 - 22 June 2022

#osdev2 = #osdev @ Libera from 23may2021 to present

http://bespin.org/~qz/search/?view=1&c=osdev2&y=22&m=6&d=22

Wednesday, 22 June 2022

01:04:00 <geist> hmm, so FWIW since i moved the server board to a new case (and thus a new power supply and better cooling) it's been up for 3 and a half days
01:05:00 <geist> but one variable i hadn't isolated before was when it was failing and when it was starting to fail more and more often, was if i had let it cool off before running it some more
01:05:00 <geist> so maybe something was slowly overheating somewhere that if i just hit the reset button would cause it to take a few more days to get up to
01:56:00 <sbalmos> odd. what about maybe bad fan speed sensor in the PSU or case fan, causing the CPU to go into thermal protection?
02:43:00 <geist> or the PSU maybe
02:44:00 <geist> well, we'll see. still got another 4 or 5 days till i get it to the usual first failure time (7 or 8 days)
02:44:00 <geist> got a batch of noctua case fans. ❤️ noctua
02:50:00 <sbalmos> yeah, I've had two PSUs go bad over my life. and it's always that wacky behavior shite, like it starts dropping a voltage on a rail /just enough/ to make for o_O behavior like that.
02:58:00 <geist> yah and maybe it's a heating up thing, because if i test it cold it looks fine
03:10:00 <pounce> my work just gave me a desktop with a hot swappable PSU, it's bonkers
03:11:00 <pounce> also dual Xeon Gold processors, and 24 dimm slots
03:11:00 <pounce> really want to do NUMA testing on it
03:15:00 <zid`> can you requisition me a w-2125
03:55:00 <moon-child> zid`: I got a w-2123 for really cheap on craigslist
03:55:00 <moon-child> just today
03:55:00 <moon-child> then I found out sourcing compatible motherboards is a complete pain
03:56:00 <moon-child> current plan is to get a used dell workstation mobo from ebay and hope it's compatible with a standard power supply
06:04:00 <mrvn> it's 8 in the morning and still too hot to work
06:18:00 <moon-child> remote?
06:18:00 <moon-child> if so, wet shirt
07:10:00 <vdamewood> I know people who wet their pants, but not their shirts.
07:11:00 <mrvn> wet shirt only works if the humidity is low.
07:13:00 <moon-child> I live in the pacific northwest. Works fine for me
07:13:00 <vdamewood> The pacific northwet?
07:19:00 <vdamewood> Dude. It's like midnight in the PNW.
07:19:00 <vdamewood> What are you doing up so late?
07:24:00 <Mutabah> Late? Midnight?
07:25:00 <\Test_User> it's a nice 3:25 for me
07:25:00 <\Test_User> 3:25 am ofc
07:31:00 <Mutabah> (To clarify - midnight isn't late)
07:33:00 <vdamewood> Mutabah: Actually, you're right. Midnight is just a few hours before bed time.
08:21:00 <Griwes> I've almost convinced myself that I should just use llvm's libunwind for the time being and only come back to the idea of writing my own once I'm much further into the project, as a side thing instead of a blocker
08:21:00 <Griwes> This is probably much healthier too, isn't it
08:21:00 <ddevault> the correct way to unwind stacks is by using %rbp
08:23:00 <Griwes> Only in languages where the only way to handle errors is to weave error handling around every line of the proper code
08:24:00 <ddevault> exceptions are super dumb
08:24:00 * moon-child grabs popcorn
08:24:00 <ddevault> exceptions are longjmp as a good practice
08:25:00 <moon-child> longjmp is greenspunned RETURN-FROM
08:25:00 * moon-child grabs more popcorn
08:30:00 <ddevault> in other news, got message passing working reliably
08:31:00 <klange> congrats; envy is settling in as I see you deliver a multitude of projects
08:31:00 <ddevault> ty, I get a lot of help
08:36:00 <mrvn> ddevault: why would %rbp have any sensible value?
08:36:00 <moon-child> sysv abi technically mandates that you maintain a frame pointer
08:37:00 <mrvn> The advantage of exceptions is supposedly that you don't have to handle them everywhere, they will just magically propagate to where you catch them. But with RAII you have to handle them at every } anyway. So what actually is the point?
08:37:00 <mrvn> moon-child: then it's a good thing I'm not doing sysv abi, nor even C.
08:38:00 <ddevault> I don't like magic
08:38:00 <ddevault> %rbp is the frame pointer
08:38:00 <ddevault> you can define any ABI you like but I like frame pointers
08:38:00 <mrvn> Basically, in any language with scopes and destructors, the exception can't be using longjmp, making it a bit pointless.
08:39:00 <moon-child> 'what actually is the point' two things. 1, it is handled automatically. Even comparing eg the way rust does it with c++, c++ is more modular, since I can call you and you can call me back, and I can catch in the outer stack frame an exception that I threw in the inner stack frame, and you don't have to care about it
08:39:00 <mrvn> ddevault: frame pointers are too little information for stack unwinding
08:39:00 <ddevault> not really
08:39:00 <moon-child> 2, in a language with tracing gc, the implementation doesn't need to spend nearly so much time with raii
08:39:00 <ddevault> it doesn't deal with inlining, sure
08:39:00 <moon-child> s/language/implementation/
08:40:00 <ddevault> but a frame pointer plus, well, a frame, is enough to walk over satck frames
08:40:00 <ddevault> stack*
08:40:00 <mrvn> moon-child: a language with GC doesn't have scopes and destructor. They are separated.
08:40:00 <moon-child> it can. Why can't it?
08:40:00 <mrvn> ddevault: if you know the frame layout you don't need the frame pointer, the SP will do the same
08:40:00 <moon-child> the only distinction is that you don't have to use destructors to manage lifetimes of pointers to allocated memory
08:40:00 <ddevault> yeah, but only if you know the frame layout
08:41:00 <ddevault> which calls for DWARF or something like it
08:41:00 <ddevault> much more complicated
08:41:00 <mrvn> moon-child: exactly. you don't call destructors at the end of the scope so you can just longjmp
08:41:00 <moon-child> you might have destructors for other things
08:41:00 <moon-child> such as files or mutexes
08:41:00 <mrvn> ddevault: you need the frame layout to unwind the stack and call all the destructors. That's my point.
08:42:00 <ddevault> not necessarily
08:42:00 <ddevault> hare does this by calling destructors before propagating errors
08:42:00 <ddevault> well, s/destructors/defers/
08:42:00 <moon-child> you can maintain a shadow stack. But that's just dwarf with extra steps
08:43:00 <mrvn> ddevault: ==> no longjmp for the exception.
08:43:00 <ddevault> again
08:43:00 <ddevault> exceptions are bad
08:43:00 <mrvn> that's an opinion
08:43:00 <ddevault> aye
08:44:00 <moon-child> error values are anti-modular. anti-modularity is a popular meme among the unix crowd, and I am not making a value judgement, but it is important to acknowledge the consequences of such a view
08:45:00 <ddevault> not defining your constraints within the type system is reckless
08:45:00 <ddevault> it's less modular, sure, but more reliable and predictable
08:45:00 <mrvn> Why do you think exceptions are about error values. I think that's the first mistake. Why should exceptions be exceptional and errors?
08:45:00 <moon-child> not defining your constraints within the type system is a _lot_ less reckless than permitting use-after-free
08:46:00 <mrvn> Not having exceptions as part of a function's signature is the second mistake imho,
08:46:00 <moon-child> also what mrvn said
08:46:00 <ddevault> different trade-offs
08:49:00 <ddevault> in any case, that would just make us both hypocrites :)
08:49:00 <mrvn> I think exceptions should be much more like std::expected.
08:51:00 <mrvn> So what if you have to check for error on every level? All the cases that shouldn't better call abort() are things you catch very quickly anyway. Make exceptions not exceptional.
08:51:00 <mrvn> most of it you can hide behind syntactic sugar.
08:51:00 <ddevault> agreed
08:51:00 <ddevault> errors are just errors
08:52:00 <mrvn> Something like not_found isn't even a real error. That can be very much be the expected result.
08:53:00 <mrvn> if (s.find(foo) == s.end()) who wants to write that instead of try s.find(foo) except not_found ?
08:53:00 <ddevault> if (s.find(foo) is void), rather, but: me
08:54:00 <mrvn> or in ocaml you have this nice syntax with pattern matching: match Map.find(map, key) with IntLit i -> ... | StringLit s -> ... | NotFound -> ....
08:54:00 <mrvn> exceptions are just another case in the pattern matching.
08:57:00 <mrvn> Note to self: port std::expected to my kernel stl.
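The "not_found is an expected result" idea above can be sketched without C++23. This is a minimal stand-in for std::expected built on std::variant, not the standard class and not mrvn's kernel port; `Expected`, `NotFound`, and `find_value` are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <variant>

// Hypothetical error type for this sketch.
struct NotFound {};

// Minimal stand-in for std::expected<T, E>: holds either a value or an error.
// Assumes T and E are different types.
template <typename T, typename E>
class Expected {
public:
    Expected(T value) : storage_(std::move(value)) {}
    Expected(E error) : storage_(std::move(error)) {}

    bool has_value() const { return std::holds_alternative<T>(storage_); }
    explicit operator bool() const { return has_value(); }

    const T& value() const {
        assert(has_value());
        return std::get<T>(storage_);
    }
    const E& error() const {
        assert(!has_value());
        return std::get<E>(storage_);
    }
    // Return the stored value, or the fallback when an error is stored.
    T value_or(T fallback) const {
        if (has_value()) return std::get<T>(storage_);
        return fallback;
    }

private:
    std::variant<T, E> storage_;
};

// Hypothetical lookup helper: "not found" is an ordinary result, not a throw.
inline Expected<int, NotFound> find_value(const std::map<std::string, int>& m,
                                          const std::string& key) {
    auto it = m.find(key);
    if (it == m.end()) return NotFound{};
    return it->second;
}
```

A caller can then write `if (auto r = find_value(m, "foo")) use(r.value());` or fall back with `value_or`, which is the "check at every level, hidden behind sugar" style discussed above.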
09:09:00 <Griwes> "with raii exceptions are handled at every brace" is a nonsense take. It's just a mechanism that allows you to forget that any form of early return exists for the purposes of cleanup
09:12:00 <Griwes> Anyway, I'm not planning to have a pointless conversation trying to convince people who have their opinions set and aren't interested in ever being convinced, so I'm afraid any popcorn grabbed for this will be wasted
09:14:00 <mrvn> Griwes: it's not so much RAII but the destruction at end of scope
09:18:00 <mrvn> Griwes: For me the problem is the requirement in languages like C++ that exceptions must have 0 cost unless you throw one. That makes throwing them usually very expensive and makes exceptions unsuitable for everything but the exceptional, usually stuff that aborts.
09:18:00 <ddevault> getting two OS developers to agree on anything is an exercise in frustration
09:18:00 <ddevault> building an OS is the ultimate exercise in NIH
09:19:00 <Griwes> ...exceptions are meant to be for the exceptional stuff? Almost like it's in the name!
09:19:00 <Griwes> Hard disagree on the claim that it's then "usually stuff that aborts" though
09:19:00 <mrvn> So we need a different name. What would you call something to do an early return that isn't exceptional?
09:20:00 <mrvn> Griwes: ever written an app that caught bad_alloc and keeps going?
09:20:00 <Griwes> Me? No. But I know people who have, with great success
09:21:00 <mrvn> Not that bad_alloc even gets thrown in most cases with overcommit.
09:23:00 <mrvn> Griwes: so what exceptions do you regularly catch and handle without having your program terminate eventually due to it?
09:29:00 <Griwes> depends on the domain, though once again, "regularly" is a funny word to use for something explicitly exceptional
09:29:00 <Griwes> the biggest boon of them is when you're writing a library and you aren't the one handling the error, they become transparent to anyone but whoever decides to catch them
09:30:00 <mrvn> Griwes: but for that case the fact that exceptions are not part of a function's signature makes them rather bad.
09:30:00 <Griwes> hard disagree
09:30:00 <Griwes> it allows middleware libraries to ignore error handling entirely and get it transparently handled a layer above
09:30:00 <Griwes> it's what makes them rather good
09:31:00 <Griwes> anyway, that's as far into this discussion as I'll allow myself to be dragged in
09:33:00 <mrvn> it makes it impossible for the compiler to see if exceptions are handled or not. Say the exceptions thrown by a function change: maybe the middleware library should handle a new exception, but it just silently propagates and terminates the program, and you won't find out for years because it's exceptional and doesn't happen till then.
09:34:00 <mrvn> I agree that it should be possible to pass exceptions along transparently. Like say "int foo() [throws everything LibBla::blub() throws]"
09:35:00 <kingoffrance> "anti-modularity is a popular meme among the unix crowd, and I am not making a value judgement, but it is important to acknowledge the consequences of such a view" some philosophies, everything contains the seeds of its own destruction. for that, the consequences are surely that it eventually leads to modularity :D
09:35:00 <mrvn> but it should also be possible to say "int foo [throws A | B | C]" and give an error if it can throw anything else.
09:36:00 <mrvn> what is anti-modularity?
09:36:00 <Griwes> "throws a b c" has been tried and it sucked major donkey balls
09:37:00 <Griwes> it's one of the really major pain points of java
09:37:00 <Griwes> it's also bad because it applies function coloring
09:37:00 <kingoffrance> sorry, was quoting "error values are anti-modular"
09:37:00 <mrvn> Griwes: not really. They only tried: "turn everything but a b c into terminate()"
09:37:00 <Griwes> though not as absurdly bad as making everything return std::expected, which is function coloring cubed
09:37:00 <Griwes> mrvn, yeah, and other languages tried the other option which happens to be even worse
09:38:00 <Griwes> anyway "terminate called because of an uncaught exception" is a fine thing to happen vOv
09:39:00 <mrvn> Griwes: that isn't what "throws a b c" does.
09:39:00 <mrvn> or rather it's missing the "fail to compile if there is a new exception d"
09:40:00 <mrvn> terminate is about the worst thing to happen for a lib
09:42:00 <mrvn> 'The “color” of a function is a metaphor for segmenting functions into two camps: async and normal functions.' How does that apply to "throws a b c"?
09:43:00 <Griwes> anyway "throws a b c" is rightly dead and shall never be alive in C++ again and that's good
09:43:00 <mrvn> Griwes: c++ throws was horrible
09:43:00 <Griwes> mrvn, the color of a function is whether you can just simply call it and get a result or whether you need to handle it in its own special way. with checked exceptions, you *have* to handle everything it may throw, which means it has a color
09:44:00 <mrvn> Griwes: you can have to handle it or throw it.
09:44:00 <Griwes> throwing it means handling it
09:44:00 <mrvn> but it in no way limits you what color of function you can call
09:44:00 <Griwes> there must exist code that handles the color
09:45:00 <mrvn> The point, generally, of exception is that there is no code to handle the exception, only the exception handler does that.
09:45:00 <Griwes> whether it's a catch {} or an annotation propagating the thrown exception info, that's handling a color
09:45:00 <mrvn> it's purely a compile time thing, no code generated.
09:45:00 <Griwes> you're missing the point I'm making
09:46:00 <mrvn> I'm not sure what point you want to make
09:46:00 <Griwes> `void foo() throw (whatever bar() throws) { bar(); }` <- `throw (whatever bar() throws)` is a piece of code that handles the color
09:46:00 <mrvn> it's a bit of source, no runtime component
09:47:00 <Griwes> yes
09:47:00 <mrvn> ok
09:47:00 <Griwes> colors are about programming overhead, not runtime overhead
09:47:00 <mrvn> I see that as no different as: int foo(float); that's a color too
09:47:00 <mrvn> should we go back to implicit prototypes?
09:47:00 <Griwes> sigh
09:48:00 <Griwes> I have no interest in continuing a discussion that's not being made in good faith
09:48:00 <Griwes> bye
09:49:00 <mrvn> Griwes: do you agree that what a function throws is part of its contract?
09:51:00 <mrvn> Because in my mind I'm just asking for the function's contract to be machine parsable.
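For reference, present-day C++ can express the "throws whatever bar() throws" annotation only at yes/no granularity, through a conditional noexcept. The sketch below shows that reduced form; it is not the checked-exception list argued over above, and the function names are hypothetical.

```cpp
#include <type_traits>

// Hypothetical callees for the sketch.
inline void may_throw() {}              // no promise: callers must assume it can throw
inline void never_throws() noexcept {}  // promises not to throw

// "Throws whatever the callee throws", reduced to C++'s yes/no granularity:
// the wrapper is noexcept exactly when the call it forwards to is.
inline void forward_may_throw() noexcept(noexcept(may_throw())) { may_throw(); }
inline void forward_never_throws() noexcept(noexcept(never_throws())) { never_throws(); }

static_assert(!noexcept(forward_may_throw()), "inherits 'may throw'");
static_assert(noexcept(forward_never_throws()), "inherits 'does not throw'");

// Since C++17 the exception specification is part of the function type,
// so this much of the contract is visible to the compiler.
static_assert(!std::is_same_v<decltype(&may_throw), decltype(&never_throws)>,
              "noexcept and non-noexcept functions have different types");
```

The annotation generates no code of its own, which matches the "bit of source, no runtime component" point. Violating a noexcept promise at run time still ends in std::terminate rather than a compile error.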
09:53:00 * mrvn likes google images for "function coloring"
09:55:00 <clever> mrvn: thats something i liked about java, where you had to formally declare what you can throw, and also what you're not catching that can be thrown from further down your call graph
09:55:00 <clever> it made it trivial to know what exceptions you can expect, and need to either choose to handle or let pass on
09:56:00 <mrvn> clever: did java fail if you didn't declare something or convert it into uncaught_exception or terminate?
09:56:00 <clever> i think it was a compile time error only
09:56:00 <clever> and some build systems didnt enable it
09:59:00 <mrvn> that's what I want. I can see the complained about it being too noisy. Do I really want to specify bad_alloc for every function that uses the heap? How many functions will your C++ code have that do not throw bad_alloc?
09:59:00 <mrvn> s/complained/complaint/.
10:00:00 <clever> i think there was a whitelist of exceptions it didnt care about, like divide by zero
10:00:00 <mrvn> That's not really one you "throw"
10:00:00 <clever> exactly
10:00:00 <clever> but it can still be caught
10:01:00 <clever> its more of the runtime throwing it for you, when you do something bad
10:01:00 <clever> or the runtime not even checking, and converting the signal into an exception
10:02:00 <mrvn> I can see that as an option: Some exceptions are global like bad_alloc and div_by_zero. I could also see a class defining a list of exceptions that would then apply to all methods.
10:02:00 <clever> there is also the question of should malloc ever return 0?
10:02:00 <clever> maybe the process should just die instead?
10:03:00 <clever> depends on the use-case
10:03:00 <mrvn> clever: you can handle it if you have the need for it. So yes, it should.
10:03:00 <clever> for large allocations, i can see that being valid
10:04:00 <clever> but for tiny allocations, just printing an error with some frameworks needs heap space
10:04:00 <mrvn> kind of should be an attribute in ELF so the kernel disables overcommit to binaries that handle malloc returning 0
10:04:00 <clever> and if a tiny allocation fails, more are going to fail soon
10:05:00 <mrvn> hehe, how do you allocate the bad_alloc exception when new fails? That needs new and that fails again.
10:05:00 <clever> for a more embedded case (kernel or mcu), you're more likely to avoid touching the heap whenever possible
10:05:00 <clever> exactly
10:05:00 <mrvn> bad_alloc kind of needs to be pre-allocated somewhere so you can throw an existing address.
10:07:00 <Griwes> <mrvn> hehe, how do you allocate the bad_alloc exception when new fails? That needs new and that fails again.
10:07:00 <Griwes> C++ abis are very specific about this
10:08:00 <Griwes> https://itanium-cxx-abi.github.io/cxx-abi/abi-eh.html#imp-allocate
10:08:00 <bslsk05> itanium-cxx-abi.github.io: C++ ABI for Itanium: Exception Handling
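The usual answer to "how do you allocate bad_alloc when new fails" is a reserve set aside in advance. The sketch below illustrates that idea only: it is not the ABI's `__cxa_allocate_exception` or any runtime's actual implementation, every name and size in it is hypothetical, and it is not thread-safe.

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>

// Hypothetical fixed reserve, part of the program image so it cannot fail later.
constexpr std::size_t kSlotSize = 128;
constexpr std::size_t kSlotCount = 4;
alignas(std::max_align_t) static unsigned char g_reserve[kSlotCount * kSlotSize];
static bool g_slot_used[kSlotCount] = {};

// Hypothetical switch so the sketch can simulate heap exhaustion.
static bool g_heap_exhausted = false;

static void* try_heap(std::size_t size) {
    return g_heap_exhausted ? nullptr : std::malloc(size);
}

// True when p points into the static reserve rather than the heap.
bool from_reserve(const void* p) {
    const auto* b = static_cast<const unsigned char*>(p);
    std::less<const unsigned char*> before;
    return !before(b, g_reserve) && before(b, g_reserve + sizeof(g_reserve));
}

// Try the heap first; fall back to a free reserve slot when the heap fails.
void* allocate_exception_storage(std::size_t size) {
    if (void* p = try_heap(size)) return p;
    if (size > kSlotSize) return nullptr;  // too big for the reserve
    for (std::size_t i = 0; i < kSlotCount; ++i) {
        if (!g_slot_used[i]) {
            g_slot_used[i] = true;
            return g_reserve + i * kSlotSize;
        }
    }
    return nullptr;  // reserve exhausted too: a real runtime would terminate
}

// Release storage to wherever it came from.
void free_exception_storage(void* p) {
    if (p == nullptr) return;
    if (!from_reserve(p)) {
        std::free(p);
        return;
    }
    const auto* b = static_cast<const unsigned char*>(p);
    g_slot_used[static_cast<std::size_t>(b - g_reserve) / kSlotSize] = false;
}
```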
10:10:00 <mrvn> Griwes: another of those things imposed by the "exceptions are exceptional" design.
10:11:00 <clever> stack unwinding and frame pointers are another tricky thing
10:12:00 <clever> my rough understanding of framepointers on x86/arm, is that the frame pointer forms a linked list
10:12:00 <clever> where each frame pointer, points to the previous framepointer on the stack, which is at the "middle" of a stack frame
10:12:00 <clever> positive offsets point to arguments (beyond that fit in the first few regs), negative offsets point to local vars
10:13:00 <clever> and a fixed offset from there, is the return addr, varying by platform
10:13:00 <mrvn> clever: the frame pointer is the top of each frame while the SP is the bottom of the frame. And there is a defined way to get the previous frame pointer given a frame pointer.
10:14:00 <clever> for x86, it would be a positive offset, because the call opcode pushes the return addr right below the args, and the prologue then saves the framepointer, and creates locals
10:14:00 <mrvn> On m68k frame pointers use the "link/unlink" opcodes so they are handled in hardware.
10:14:00 <HeTo> my rough understanding of frame pointers on x86 is that usually they don't exist. or at least you can't find the head of the list in a register. and I think it's the same for ARM too, actually (usually you can't get usable backtraces on ARM without debug symbols)
10:15:00 <mrvn> HeTo: the compiler can optimize them away and then perf doesn't work right.
10:15:00 <clever> x86 for example, the stack can look like an array of: [ local1, local2, framepointer, returnaddr, arg1, arg2, arg3 ] (exact numbers will be wrong)
10:15:00 <mrvn> There are options to force and eliminate all frame pointers in gcc/clang.
10:15:00 <clever> because the caller first pushes all args to the stack, then runs the "call" opcode, that pushes the return addr
10:16:00 <clever> and the first thing the prologue does, is push the old framepointer onto the stack, and copy sp->fp, to create a new stack frame
10:16:00 <clever> and then sp -= $locals_size to make room for local1/local2
10:16:00 <HeTo> mrvn: perf can alternatively use dwarf for the backtrace (not sure if it consults the symbols at runtime. I think it just saves a bunch of the stack at runtime, and maybe leaves interpreting that for the analysis?)
10:16:00 <mrvn> clever: yes. the frame pointer is always pushed at a fixed offset and the frame pointer register is then updated.
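The linked-list view of frame pointers described above can be sketched as a plain walk. The layout is the x86 one from the chat (saved caller frame pointer at the address in the frame pointer, return address right above it). `Frame` and `walk_frames` are hypothetical names, and the walk only works if every function on the stack keeps a frame pointer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed x86-style frame record, as seen from the frame pointer:
//   [fp + 0]            saved caller frame pointer (pushed by the prologue)
//   [fp + sizeof(void*)] return address (pushed by the call instruction)
struct Frame {
    const Frame* prev;
    std::uintptr_t ret_addr;
};

// Follow the chain of saved frame pointers, collecting return addresses.
// max_depth bounds the walk in case the chain is corrupt or never ends in null.
std::vector<std::uintptr_t> walk_frames(const Frame* fp, std::size_t max_depth) {
    std::vector<std::uintptr_t> trace;
    while (fp != nullptr && trace.size() < max_depth) {
        trace.push_back(fp->ret_addr);
        fp = fp->prev;
    }
    return trace;
}
```

On GCC or Clang the starting pointer could come from `__builtin_frame_address(0)` in a build with `-fno-omit-frame-pointer`. Whether the outermost frame really ends in a null pointer depends on the startup code, which is why the sketch takes a depth limit. Nothing here knows about locals or destructors, which is the "too little information for unwinding" objection raised earlier.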
10:16:00 <GeDaMo> Frame pointers were necessary on 8086 because you couldn't do sp-relative addressing
10:17:00 <clever> mrvn: for arm, the return address is slightly more complicated, because of the lr register, and it being on the stack is optional
10:17:00 <clever> *looks*
10:17:00 <mrvn> clever: what stack? ARM (hardware) doesn't have a stack.
10:17:00 <clever> yeah, the hardware doesnt enforce one, but gcc still has one
10:17:00 <mrvn> doesn't ruse have no stack?
10:18:00 <mrvn> rust
10:18:00 <mrvn> or was that go?
10:18:00 <clever> haskell's stack is a linked list on the heap, rather than the traditional stack
10:18:00 <HeTo> clever: I think the return address will be on the stack if you have a stack frame. leaf functions that don't use much stack might not have their own stack frame though
10:19:00 <clever> HeTo: i'm checking some disassembly to confirm things there
10:19:00 <mrvn> HeTo: on ARM the return address will only be on the stack if the return register gets clobbered, i.e. if you call other functions.
10:19:00 <clever> yeah, leaf vs non-leaf functions
10:19:00 <clever> but, is the return addr at a positive or negative offset from the framepointer?
10:20:00 <mrvn> and leaf functions might have an implicit stackframe with the red zone.
10:20:00 <mrvn> clever: hardware dependent
10:20:00 <clever> it feels more abi dependent to me?
10:20:00 <clever> whatever rule gcc set
10:20:00 <mrvn> clever: positive on x86 because CALL puts it on the stack before the function prolog saves the ebp
10:21:00 <clever> yep
10:21:00 <clever> but on arm, the prologue is responsible, and can do whatever it wants
10:21:00 <mrvn> theoretically the compiler you save the address of the return address or any other offset into the minimal stackframe. but normally you would just "push ebp"
10:22:00 <mrvn> s/compiler you/compiler could/
10:22:00 <clever> ok, r14 == linkreg, r15==pc
10:22:00 <mrvn> and then ebp = sp; sp -= size
10:22:00 <clever> 80000f1c: e92d4010 push {r4, r14}
10:22:00 <clever> 80000f38: e8bd8010 pop {r4, r15}
10:22:00 <clever> this is a non-leaf function, its saving r4+lr, but then restoring into r4+pc
10:23:00 <mrvn> clever: that's a nice way to pop and ret in one go
10:23:00 <clever> yep
10:23:00 <clever> but i think this was built without frame pointers
10:23:00 <clever> so my answer is missing
10:23:00 <HeTo> also really confusing reading disassembly if you aren't used to it
10:23:00 <mrvn> r4 is the 4th argument register, so no frame pointer
10:23:00 <clever> its also not clear, if that pushes r4 then r14, or r14 then r4
10:24:00 <HeTo> when you're looking for some form of branch or return instruction, you don't expect "pop" to be one if you're not familiar with ARM
10:24:00 <clever> HeTo: its clearer when it says pop {r4,pc}
10:24:00 <clever> but objdump can decode r15 as either r15 or pc, and this disassembly went for the confusing option
10:25:00 <mrvn> clever: for PC that's true. for some other registers the number is clearer for code that doesn't use the register in a conventional way
10:25:00 <clever> mrvn: but there is a 3rd arch, where framepointers and stockholme will drive you mad!
10:26:00 <clever> on VPU, register+immediate-offset doesnt pack negative offsets well
10:26:00 <mrvn> Does aarch still have a pop {pc}?
10:26:00 <mrvn> aarch64
10:26:00 <clever> so framepointer + -123 would be expensive
10:26:00 <clever> and the author of the gcc port, decided to violate the framepointer rules some
10:26:00 <clever> and now the framepointer is total nonsense
10:26:00 <mrvn> So I guess you don't have a red-zone on the VPU?
10:26:00 <clever> ive not seen any sign of a redzone
10:27:00 <mrvn> why does it have a frame pointer at all?
10:27:00 <clever> probably just because gcc generates one by default
10:28:00 <mrvn> so make the no-framepointer option default for VPU
10:28:00 <clever> let me find an example...
10:28:00 <mrvn> no need
10:29:00 <clever> ah, seems framepointer is already off
10:29:00 <clever> 80002f32: a1 03 stm r6-r7,lr,(--sp)
10:29:00 <clever> 80002f34: 59 c0 7c cf add sp,sp,-4
10:29:00 <clever> 8000309a: 59 c0 44 cf add sp,sp,4
10:29:00 <clever> 8000309e: 21 03 ldm r6-r7,pc,(sp++)
10:29:00 <mrvn> optimizer fail
10:29:00 <clever> a non-leaf function, it pushes r6/r7/lr, and decrements sp by 4 for locals, then undoes it all at the end, restoring lr into pc
10:30:00 <clever> mrvn: where is the fail? i'm not seeing one immediately
10:30:00 <mrvn> clever: why doesn't it push an extra register?
10:31:00 <clever> ah, as-in, "save" r8, just to get the sp another 32bits lower?
10:31:00 <mrvn> yep.
10:31:00 <clever> that could work, for small stack frames
10:31:00 <clever> but there is a range limit on store-many
10:31:00 <clever> 800002a0: a9 02 stm r6-r15,(--sp)
10:31:00 <clever> 800002a2: c7 02 stm r16-r23,(--sp)
10:32:00 <mrvn> sure, there is a limit for it and at some point writing to the stack costs more time than the extra opcode to add to sp.
10:32:00 <clever> yeah
10:32:00 <clever> which reminds me, this cpu is also dual-issue
10:32:00 <mrvn> but for 4 byte the extra opcode and writing a register should even out
10:32:00 <clever> not sure about this case of modifying sp back2back, but certain combinations of opcodes can run in the same clock cycle
10:33:00 <mrvn> the stm and add have a register dependency
10:33:00 <clever> yeah
10:33:00 <clever> that complicates things, and it would have to get really clever to merge them
10:34:00 <mrvn> which is why I think storing an extra register would be better. Makes SP available for other use earlier.
10:34:00 <mrvn> and the opcode after the "add" might not use SP at all
10:34:00 <mrvn> e.g. "xor r0, r0, r0"
10:35:00 <clever> there are 4 opcodes after the add, leading to a branch+link
10:35:00 <clever> 80002f38: 00 e8 04 30 00 7e mov r0,0x7e003004
10:35:00 <clever> and the very first one, is a rather fat load 32bit immediate
10:35:00 <clever> 48bit opcodes, something arm just cant do
10:35:00 <clever> at the cost of decoder complexity, of course
10:36:00 <mrvn> nothing compared to m68k. Or x86 with its 15 byte opcode limit.
10:36:00 <clever> yeah, vpu maxes out at 80bits (10 byte) for its vector opcodes
10:37:00 <mrvn> I miss the "(--An)" from m68k. auto increment/decrement is such a useful thing when working with arrays or strings.
10:37:00 <clever> the syntax of vpu asm implies it can do the same
10:37:00 <clever> but i think ive tried using it, and it actually cant
10:37:00 <mrvn> on x86 I mean
10:38:00 <clever> it only works on the stack pointer, and only in one direction
10:38:00 <clever> so store can only decrement sp, and load can only increment sp
10:38:00 <clever> its just being verbose about what its doing
10:39:00 <clever> ghidra has an abnormally good opcode decoder, where it clearly explains every bitfield in the opcode
10:40:00 <mrvn> should have just said: push/pop
10:40:00 <mrvn> I like the way on ARM how you can increment/decrement and toggle write-back of the result.
10:41:00 <clever> with that, i can see that `stm r6-r7, lr, (--sp)` has 3 operands encoded into it, r6 is a 2bit value of 1, r7 is a 5bit value of 1, and lr is a 12bit value of 101100000011
10:41:00 <clever> and then `stm r6-r10, lr, (--sp)` has a 100 (4 decimal) in the r10 slot
10:42:00 <mrvn> it's not a bitset of the registers?
10:42:00 <clever> so, r7=1, r8=2, r9=3, r10=4
10:42:00 <clever> i think its 2 ints, for a start and end register
10:42:00 <clever> hence the r6-r10 syntax
10:42:00 <mrvn> you don't always have just one range.
10:43:00 <clever> in that case, you use multiple stm's
10:43:00 <mrvn> I believe on ARM the stm just has a bitset.
10:43:00 <mrvn> r0-r4 is just syntactic sugar for r0, r1, r2, r3, r4
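The bitset reading can be checked against the two 32-bit ARM words pasted earlier (e92d4010 and e8bd8010). In that encoding the low 16 bits carry one bit per register. The helper names below are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Bits 0..15 of a 32-bit ARM LDM/STM (push/pop) word: one bit per register r0..r15.
std::vector<int> decode_register_list(std::uint32_t insn) {
    std::vector<int> regs;
    for (int r = 0; r < 16; ++r) {
        if (insn & (1u << r)) regs.push_back(r);
    }
    return regs;
}

// What the assembler does with "rFIRST-rLAST": set every bit in the range.
std::uint16_t range_mask(int first, int last) {
    std::uint32_t mask = 0;
    for (int r = first; r <= last; ++r) mask |= 1u << r;
    return static_cast<std::uint16_t>(mask);
}
```

Decoding 0xe92d4010 gives registers 4 and 14 (r4, lr) and 0xe8bd8010 gives 4 and 15 (r4, pc), matching the disassembly. Because the list is a set, the order written inside the braces carries no information; the lowest-numbered register goes to the lowest address.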
10:44:00 <clever> in this case, there are 32 registers, r0 thru r31, some of them having special names like sp/pc/lr, just like arm
10:44:00 <clever> so you would need 32bits just to allow specifying every reg
10:44:00 <mrvn> indeed
10:44:00 <mrvn> or 10 bits for start/end of a range.
10:45:00 <clever> vpu complicates things, by only allowing 2bits for the start, and i think its more of an enum
10:46:00 <mrvn> There are probably some register you stm far more often than others. Maybe it's logarithmical too: start at 0, 1, 2, 4
10:46:00 <clever> the first field is a 2bit int, where 1==r6, 0==r0, not finding other examples yet
10:46:00 <clever> i believe that is why the abi says that r6 and up are the preserved regs
10:46:00 <clever> and r0-r5 are clobbered
10:47:00 <clever> because `stm r6-r??, lr` is cheaper to encode
10:47:00 <mrvn> or just a "looks random" lookup table like 0==r0, 1==r6, 2==r9, 3==lr
10:47:00 <clever> yeah
10:47:00 <clever> the designer picked some random values, to suit an ABI
10:48:00 <mrvn> Clear sign of the CPU designers having some calling convention in mind and the STM is meant to save the clobber registers.
10:48:00 <clever> exactly
10:49:00 <mrvn> 0 == big function saving everything, 1 == normal function just saving clobbers, 2 == small function, 3 == leaf function
10:49:00 <clever> searching thru an example binary, i can see 3 forms of stm
10:49:00 <clever> a: just 1 register, is not many!
10:49:00 <clever> b: r0-r??, or r6-r??, storing just a range
10:49:00 <clever> c: r0-r??,lr or r6-r??,lr storing a range plus lr
10:50:00 <mrvn> ahh, start == 3 might mean just the end register, no range.
10:50:00 <clever> oh, and a 4th form
10:50:00 <clever> stm lr, (--lr)
10:51:00 <clever> again, its not many, but the range has been omitted, its now just lr!
10:51:00 <clever> oh, theres an odd decoding, but this looks like garbage binary data
10:51:00 <clever> stm gp-r12, (--sp)
10:52:00 <clever> where gp is an alias of r24
10:52:00 <clever> that makes no sense at all, the range is backwards
10:53:00 <clever> which makes sense! :P, this doesnt look like vpu asm, its some other form of binary data
10:54:00 <mrvn> Could that be r24-r31,r0-r12?
10:55:00 <clever> let me throw together some asm to brute-force it
10:55:00 <mrvn> have fun
10:56:00 <clever> mrvn: https://gist.github.com/cleverca22/9f0e424c01709c103836ed99d260a788 left-over code from my last brute-forcing session
10:56:00 <bslsk05> gist.github.com: gist:9f0e424c01709c103836ed99d260a788 · GitHub
10:57:00 <clever> 0x1234 encodes as a 16bit immediate tacked onto a 16bit opcode
10:57:00 <clever> but 0x12345 encodes as a 32bit immediate on a 16bit opcode, now wasting 1 byte
10:57:00 <clever> while 0x1c and below are more complicated, sharing the 16bits between both opcode and immediate
10:58:00 <clever> and you can see how the encoding varies wildly, depending on both the destination register and the immediate size
11:03:00 <clever> mrvn: went thru the entire range, for the single form (stm r?, (--sp)), it only supports 5 registers, r0, r6, r16, gp, and lr
11:03:00 <clever> ghidra claimed the first operand (for the range form) was 2 bits, 0-3, which would explain r0/r6/r16/gp, and lr is a special case ive seen elsewhere
11:05:00 <clever> and looking at the bytes i can confirm that, r0/r6/r16/gp have a 00, 01, 10, and 11 pattern in one of the bytes, and are otherwise identical
11:06:00 <clever> while the lr variant, is vastly different
11:16:00 <clever> mrvn: oh wow, at least at the binutils layer, r0-r1 all the way thru to r0-r31 encodes into something!
11:17:00 <clever> explains the 5bit int i saw in ghidra, 0-31
11:21:00 <clever> and a value of 0 for the end reg, is used for just r0, without a range
11:21:00 <clever> so its encoded more as r0-r0
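The observations so far fit a simple model: a 2-bit start selector and a 5-bit field counting registers past the start. The sketch below encodes only that inference from the values quoted in the chat (r6-r7 gives 1/1, r6-r10 gives 1/4, plain r0 gives 0/0). It is not taken from any VPU documentation, it does not model bit positions or the lr forms, and all names are hypothetical.

```cpp
#include <optional>

// Model of the two fields, not of their position in the opcode.
struct StmFields {
    unsigned start_code;  // 2-bit selector for the first register
    unsigned end_field;   // 5-bit value: how far the last register is past the first
};

// Assumption from the chat: selector values 0..3 pick r0, r6, r16 and gp (r24).
constexpr int kStartRegs[4] = {0, 6, 16, 24};

// Returns the modelled fields for "stm rFIRST-rLAST,(--sp)", or nothing when
// the range cannot be expressed with one of the four start registers.
std::optional<StmFields> model_stm_range(int first, int last) {
    if (last < first || last > 31) return std::nullopt;
    for (unsigned code = 0; code < 4; ++code) {
        if (kStartRegs[code] == first) {
            return StmFields{code, static_cast<unsigned>(last - first)};
        }
    }
    return std::nullopt;
}
```

Under this model a single register is a range of length zero, which would explain why r0 on its own is encoded "more as r0-r0".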
13:11:00 <mrvn> what about the other bits?
13:19:00 <clever> mrvn: https://gist.github.com/cleverca22/9f0e424c01709c103836ed99d260a788 updated
13:19:00 <bslsk05> gist.github.com: gist:9f0e424c01709c103836ed99d260a788 · GitHub
13:20:00 <clever> if i tell it to save r6-r6, it assembles fine, but then disassembles as just r6, no range
13:21:00 <clever> and keep in mind, not all 16bits of this can be used by this one opcode
13:21:00 <clever> there are other 16bit opcodes, and bigger opcodes that need the first 16bits to not look like a 16bit opcode
13:21:00 <clever> so some of those bits are just going to be constants
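A minimal sketch of the field model discussed above, assuming a 2-bit base selector (r0, r6, r16, gp) and a 5-bit value where 0 means a single register. The function name is hypothetical, bit positions inside the 16-bit opcode are deliberately not modelled, and two things are guesses rather than findings from the log: that the 5-bit value is an offset from the base, and that the range wraps modulo 32 (mrvn's reading of "gp-r12").

```cpp
#include <cassert>
#include <string>

// Hypothetical model of "stm <range>, (--sp)", based only on the chat:
//   base_sel : 2 bits, selects r0, r6, r16 or gp (an alias of r24)
//   count    : 5 bits, 0 means "just the base register"
// Assumption: count is an offset from the base and wraps modulo 32.
std::string stm_range(unsigned base_sel, unsigned count) {
    assert(base_sel < 4 && count < 32);
    static const unsigned base_regs[4] = {0, 6, 16, 24};
    auto name = [](unsigned r) {
        return r == 24 ? std::string("gp") : "r" + std::to_string(r);
    };
    const unsigned first = base_regs[base_sel];
    const unsigned last = (first + count) % 32;
    if (count == 0) return name(first);  // e.g. r6-r6 disassembles as plain r6
    return name(first) + "-" + name(last);
}
```

Under these assumptions, `stm_range(3, 20)` yields `gp-r12`, which would correspond to r24-r31 followed by r0-r12.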
13:51:00 <mrvn> When I use std::for_each(std::execution::par, std::begin(a), std::end(a), [&](auto x) { ... }); then what is creating threads? Or choosing how many threads?
13:52:00 <sbalmos> random uneducated guess is some automatic tie-in to the pthreads lib?
13:53:00 <mrvn> sbalmos: std::jthread uses pthread under the hood, yes.
13:53:00 <mrvn> the question is what creates the std::(j)thread objects
13:55:00 <ddevault> I said I got message passing working reliably
13:55:00 <ddevault> then I expanded the test suite
13:55:00 <mrvn> And how would I do the same for my BigNum add/sub/mul/div/sqr/sqrt?
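On the question above: the C++ standard leaves thread creation for std::execution::par to the implementation (libstdc++, for example, builds its parallel algorithms on Intel TBB, which manages its own worker pool). When the thread count needs to be chosen explicitly, a hand-rolled version is one option. The sketch below is not a standard API; `chunked_for_each` is a hypothetical name, and it assumes random-access iterators and a callable that is safe to run concurrently on distinct elements.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <thread>
#include <vector>

// Hypothetical helper: apply f to every element of [first, last) using at
// most n_threads std::thread workers, one contiguous chunk per worker.
template <class RandomIt, class F>
void chunked_for_each(RandomIt first, RandomIt last, F f,
                      unsigned n_threads = std::thread::hardware_concurrency()) {
    const std::size_t n = static_cast<std::size_t>(std::distance(first, last));
    if (n == 0) return;
    if (n_threads == 0) n_threads = 1;  // hardware_concurrency() may return 0
    const std::size_t workers = std::min<std::size_t>(n_threads, n);
    const std::size_t chunk = (n + workers - 1) / workers;  // ceil(n / workers)

    std::vector<std::thread> pool;
    pool.reserve(workers);
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, n);
        // Each worker gets its own copy of f and its own sub-range.
        pool.emplace_back([=] { std::for_each(first + begin, first + end, f); });
    }
    for (auto& t : pool) t.join();  // wait for all chunks before returning
}
```

This spawns fresh threads on every call, which is simple but costly for small inputs; a long-lived pool would amortise that.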
13:55:00 <mrvn> ddevault: that is usually how it goes: If it passes all tests then you didn't test enough.
13:56:00 <ddevault> to be fair, I knew my original statement was a qualified one
13:57:00 <kingoffrance> if it compiles on first attempt, be scared, be very scared
13:59:00 <sbalmos> void kmain() { start_reactor(); }
13:59:00 <sbalmos> whoops, missed the comment before start_reactor(). // Quaid
15:28:00 <mjg_> hrmpf
15:28:00 <mjg_> does gcc provide a way to tag a struct or a pointer as misaligned?
15:29:00 <mjg_> oh, there is an attribute aligned
15:31:00 <mrvn> mjg_: [[gnu::packed]], aligned can only increase alignment
15:31:00 <mrvn> you can pack and align but I don't think there is a way to mis-align but not pack.
15:32:00 <mrvn> on the other hand packed isn't recursive so you can double bag a struct
15:32:00 <mjg_> well let's see what's going to happen
15:33:00 <mrvn> pointers can't be mis-aligned at all, which I consider a bug in the packed extension
15:33:00 <mrvn> Tip: never ever pack something you access more than once. It's faster to copy it to a not-packed / aligned struct and work with that.
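The tip above can be sketched as follows. `WireRecord`, `Record`, `unpack`, and `pack` are hypothetical names invented for illustration, and `[[gnu::packed]]` is the GCC/clang extension mentioned in the chat, so this assumes one of those compilers.

```cpp
#include <cstdint>

// Hypothetical packed layout: the leading byte pushes `value` to offset 1,
// so the compiler has to treat it as potentially misaligned.
struct [[gnu::packed]] WireRecord {
    std::uint8_t  tag;
    std::uint32_t value;
    std::uint16_t flags;
};

// Naturally aligned working copy with the same fields.
struct Record {
    std::uint32_t value;
    std::uint16_t flags;
    std::uint8_t  tag;
};

// Each packed member is read once; further work happens on the aligned copy.
inline Record unpack(const WireRecord& w) {
    return Record{w.value, w.flags, w.tag};
}

// Write the (possibly modified) fields back in a single pass.
inline void pack(const Record& r, WireRecord& w) {
    w.tag = r.tag;
    w.value = r.value;
    w.flags = r.flags;
}

static_assert(sizeof(WireRecord) == 7, "packed layout has no padding");
```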
15:35:00 <mjg_> how about pre-existing big codebase which sometimes traps on 32 bit arm
15:35:00 <mjg_> where playing whack-a-mole is a non-starter
15:36:00 <mjg_> i already tried memcpy, does not help me as the target gets modified later
15:36:00 <mjg_> so i would have to memcpy the change back
15:36:00 <mjg_> and make sure i caught all the cases
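One way to keep the load and the copy-back together is to wrap both in a single helper, so the store cannot be forgotten at an individual call site. This is only a sketch: `update_unaligned` is a hypothetical name, not something from the codebase being discussed, and it does not remove the need to find the call sites in the first place.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: load a T from a possibly misaligned address, let fn
// modify an aligned local copy, then store the result back. memcpy expresses
// the unaligned access portably; the compiler chooses byte or word accesses
// depending on what the target allows.
template <class T, class Fn>
void update_unaligned(void* p, Fn fn) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    T tmp;
    std::memcpy(&tmp, p, sizeof tmp);  // unaligned-safe load
    fn(tmp);                           // work on an aligned object
    std::memcpy(p, &tmp, sizeof tmp);  // unaligned-safe store (the copy back)
}
```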
15:39:00 <mjg_> aand aligned(1) did not help, bummer but was worth giving it a shot
16:23:00 <mrvn> mjg_: as said aligned can only increase. iirc it's actually UB to try to lower it.
16:23:00 <mrvn> but packed should fix stuff, it's just horribly slow
16:24:00 <mrvn> can't you find out what traps? Did you align the stack right for double register loads?
16:25:00 <mjg_> the specific trap is now fixed, but i expect new cases to pop up here and there
16:25:00 <mjg_> well let me restate, clang 10 happened to generate code which did not trap (one byte loads for the struct)
16:26:00 <mjg_> but it would sometimes trap if the code changed
16:26:00 <mjg_> clang 11 and later always traps
16:26:00 <mjg_> in the specific place, which is now fixed
16:26:00 <mjg_> but i expect there will be more cases down the road and not easily fixable
16:26:00 <mjg_> having a simple hammer for them would be nice
16:31:00 <ddevault> aha, I think I know the problem
16:32:00 <ddevault> I bet both threads have the same IPC buffer because I was lazy
17:00:00 <mrvn> gcc and clang have different opinions about unaligned loads. iirc clang does it wrong and assumes the CPU doesn't trap on unaligned load.
18:24:00 <gorgonical> Getting close to a successful build
18:24:00 <gorgonical> Step 1 nearly completed
19:43:00 <geist> mrvn: depends on the architecture
19:43:00 <geist> modern arm it's just assumed you have the allowed unaligned access bit set
19:43:00 <geist> you can argue that's a bad idea, but that ship sailed years ago
19:55:00 <mrvn> there should be an option for the compiler
19:59:00 <geist> there is, though i think we went through all of this before
19:59:00 <geist> there's a switch that among other things disables the assumption that you can do unaligned accesses
19:59:00 <mrvn> iirc i read Android doesn't have the bit set
19:59:00 <geist> but i think it may only be arm64
20:00:00 <geist> of course because it generates shitty code
20:00:00 <geist> you use it for things like firmware before the mmu is brought up
20:00:00 <geist> *the point* of allowing unaligned accesses is it generates better code, and the architecture fully supports it and recommends it
20:01:00 <geist> you mean android doesn't have the 'allow unaligned accesses' bit? i seriously doubt that
20:01:00 <geist> but we have to be clear: do you mean arm32 or arm64?
20:01:00 <mrvn> arm32
20:01:00 <geist> i'm about 98% sure they have the bit set. i went through this fight years ago at a company that was a competitor to android, and the fact that android went ahead and set it sealed the deal
20:02:00 <mrvn> I think on some cpus the double register load/store still fails with unaligned addresses
20:02:00 <geist> you have to be very explicit about which cpus and which ones you're running your code on, android, etc
20:03:00 <geist> which versions. all the modern 'big' ones dont have the problem
20:03:00 <geist> i think some of the earlier embedded versions (armvN-m) do
20:03:00 <geist> and there *are* alignment of atomic issues you have to be aware of
20:04:00 <mrvn> anyway, my point was that gcc assumes the "allow unaligned access" bit is not set and clang assumes it's set.
20:05:00 <geist> i dont think that's right
20:05:00 <mrvn> if you access a packed struct then one does byte-by-byte access and the other just loads registers.
20:09:00 <mrvn> probably depends on the compiler version too
20:09:00 <geist> but i did just double check: indeed, in armv7 at least there's a whole table of what can cause unaligned faults with SCTLR.A=0 (alignment checks disabled)
20:09:00 <geist> notably, atomics, load/store double, load/store multiple
20:10:00 <geist> armv8 seems to have seriously relaxed it to basically just load/store multiple (which isn't really supported in 64bit) and atomics
20:10:00 <geist> which is why i hadn't thought about it. unclear if that means an armv8 core in 32bit mode also has less traps
20:12:00 <geist> ah never mind, found the 32bit thing. its the same on armv8
20:13:00 <mrvn> https://godbolt.org/z/8Mo1a6joz I remembered it wrong. gcc screws up
20:13:00 <bslsk05> godbolt.org: Compiler Explorer
20:14:00 <geist> basically 64bit has very few unaligned restrictions, aside from atomics
20:14:00 <qookie> you can tell gcc to emulate unaligned accesses with aligned ones only with -mno-unaligned-access
20:14:00 <qookie> or have it fail on unaligned accesses with -mstrict-align
20:15:00 <geist> exactly. also looks like the behavior started with gcc 11
20:15:00 <qookie> but the latter is aarch64 specific it seems
20:15:00 <geist> yah was gonna say
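Whether a given build was compiled with unaligned accesses assumed can be checked from inside the code. The sketch below relies on `__ARM_FEATURE_UNALIGNED`, an ACLE predefined macro that GCC and clang define on Arm targets when the compiler is allowed to emit unaligned accesses; the enum and function names are hypothetical. It reports only the compiler's assumption, not the runtime state of SCTLR.A.

```cpp
// Hypothetical names; only the predefined macros are real.
enum class UnalignedPolicy { NotArm, Allowed, Avoided };

// Report what the compiler was told it may assume for this translation unit.
constexpr UnalignedPolicy unaligned_policy() {
#if defined(__arm__) || defined(__aarch64__)
#  if defined(__ARM_FEATURE_UNALIGNED)
    return UnalignedPolicy::Allowed;  // compiler may emit unaligned loads/stores
#  else
    return UnalignedPolicy::Avoided;  // e.g. built with -mno-unaligned-access or -mstrict-align
#  endif
#else
    return UnalignedPolicy::NotArm;   // the macro carries no meaning on this target
#endif
}

constexpr const char* to_string(UnalignedPolicy p) {
    return p == UnalignedPolicy::Allowed ? "allowed"
         : p == UnalignedPolicy::Avoided ? "avoided"
                                         : "not an Arm target";
}
```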
20:16:00 <mrvn> Passing a packed struct by value produces odd code too: https://godbolt.org/z/j7ov168dz
20:16:00 <bslsk05> godbolt.org: Compiler Explorer
20:17:00 <mrvn> What is gcc thinking there?
20:19:00 <mrvn> doesn't even need packed. Why is gcc storing a struct passed as arg in regs to the stack just to ignore it?
20:28:00 <geist> yeah that's pretty odd
21:04:00 <gorgonical> I have gotten a build all the way up to undefined symbol errors. Unfortunately there's 1000 errors so I've done something wrong evidently
23:00:00 <heat> https://pbs.twimg.com/media/EWEqV_fWsAIv3tj?format=png&name=small
23:01:00 <psykose> cute selfie
23:05:00 <heat> I DO NOT USE FUCKING NIXOS
23:05:00 <heat> i use arch btw
23:05:00 <gog> how can you tell if somebody uses arch linux
23:06:00 <gog> they'll tell you
23:06:00 <gog> heh heh heh heh
23:07:00 <heat> sadly not anymore :(
23:07:00 <heat> now it's all about nixos
23:07:00 <mjg_> vegan arch crossfitter
23:07:00 <heat> you know what solves every problem ever? THE NIXOS PACKAGE MANAGER
23:08:00 * kingoffrance throws gog elephant meat for that joke
23:09:00 <FireFly> "the nixos package manager" sounds like a roundabout way to say "nix" :p
23:09:00 <gog> what is the nixos package manager
23:09:00 <gog> i'm not a fucken nerd so i wouldn't know
23:09:00 <heat> its the nixos package manager
23:10:00 <heat> you clearly can't read
23:11:00 <gog> why would i read
23:12:00 <heat> becuz you're a fucken nerd
23:14:00 <heat> screw it I'll shamelessly copy r/programmerhumor
23:14:00 <heat> i merged my first PR for my new job at cloudflare haha
23:14:00 <heat> so happy
23:14:00 <j`ey> congrats!
23:14:00 <j`ey> my first PR at my job was a single character change
23:15:00 <heat> didn't even test it lol just pushed it
23:15:00 <heat> hopefully nothing went wrong lol
23:15:00 <heat> (I like how you didn't get the joke, or maybe you're pretending you didn't get the joke and this is all some sort of meta joke)
23:15:00 <heat> what is joke
23:17:00 <j`ey> uh, i didnt get your joke ?_?
23:17:00 <heat> there was a huge outage yesterday
23:18:00 <j`ey> ohhhh lol
23:18:00 <heat> https://blog.cloudflare.com/cloudflare-outage-on-june-21-2022/
23:18:00 <bslsk05> blog.cloudflare.com <out of privacy tokens, poke puck>
23:18:00 <puck> oh ughhhhh
23:18:00 <psykose> lmao
23:18:00 <j`ey> brought back online and by 07:42 UTC
23:18:00 <j`ey> no wonder I didnt notice
23:27:00 <heat> it brought down cloudflare on and near every big population center
23:28:00 <heat> i would never have noticed because there's a PoP right here in lisbon, which wasn't affected
23:29:00 <heat> this sort of stuff makes me very happy that I'm not a network engineer
23:30:00 <j`ey> are you on call though?
23:31:00 <heat> no
23:31:00 <heat> my team doesn't have anyone on call afaik
23:31:00 <j`ey> nice
23:31:00 <heat> we're no goddamn SREs
23:34:00 <heat> and for my real first PR, it was like a week and a half ago and nothing broke yet sooo
23:34:00 <heat> i'm clear
23:34:00 * heat knows on wood
23:35:00 * heat learns how to type and this time knocks on wood
23:35:00 * psykose emits knowledge towards the wood
23:36:00 <heat> i am one with the wood and the wood is with me
23:40:00 * kingoffrance .oO( bruce lee spinning in his grave "i said water! be water, water!" )
23:42:00 <kingoffrance> well cremated, who knows, anyways, adios
23:42:00 * kingoffrance zzz