Search logs:

channel logs for 2004 - 2010 are archived at http://tunes.org/~nef/logs/old/ ·· can't be searched

#osdev2 = #osdev @ Libera from 23may2021 to present

#osdev @ OPN/FreeNode from 3apr2001 to 23may2021

all other channels are on OPN/FreeNode from 2004 to present


http://bespin.org/~qz/search/?view=1&c=osdev2&y=21&m=11&d=1

Monday, 1 November 2021

00:00:00 <junon> The lowest level facility to create build rules is the global `Rule()` function, which generates a ninja `rule` statement and returns another closure that you can call repeatedly to create ninja `build` statements.
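[editor's note: for readers unfamiliar with ninja, an illustrative fragment of the `rule` and `build` statements being described — plain ninja syntax, not output from junon's tool:]

```ninja
# a reusable rule: how to turn inputs into outputs
rule cc
  command = cc $cflags -c $in -o $out
  description = CC $out

# a build statement: apply the rule to concrete files
build out/foo.o: cc src/foo.c
  cflags = -Wall -Werror
```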
00:00:00 <gog> MelMalik: allt gott
00:01:00 <junon> I just added the first standard library module that builds on top of that and auto-configures a C compiler for you.
00:01:00 <junon> It also standardizes config params and converts them to compiler-specific flags, too.
00:01:00 <junon> But the resulting config is chef's kiss. I'm really excited to start actually using it now.
00:03:00 <gog> i love stuff like that
00:03:00 <gog> parsimony
00:04:00 <junon> I can compile a list of C files using `local cc = require 'cc'; return cc{ S'foo.c', S'bar.c', warn='strict' };`
00:06:00 <junon> Once I add the linker then I can do `local link = require 'link'; return link{ out=B'my-program', cc{ S'foo.c', S'bar.c', warn='strict' } }`
00:06:00 <gog> seems like it'd be pretty flexible too
00:06:00 <junon> Yeah
00:06:00 <junon> It's strict, so you can't access things above the configuration file's own directory.
00:06:00 <junon> `B` and `S` can be prefixed to a string to resolve the path to the build and source directories, respectively
00:07:00 <junon> and the build directory is always `<build_prefix>/<source_dir>`, where `<source_dir>` is the directory of the currently running config file, relative to the root config file directory
00:08:00 <junon> you can also do things like `local c_path = S'foo.c'; local o_path = B(c_path):ext' .o';`
00:08:00 <junon> err, switch the ' and the space right before the `.o`
00:09:00 <junon> The way I was trying to configure modules for my kernel via CMake was just not clean or flexible and the resulting code was just awful.
00:12:00 <junon> I need a good documentation generator now, though.
00:12:00 <junon> I like the idea of RST but it lacks the kind of flexibility I want.
00:30:00 <cooligans> Does anyone know why my AMD Ryzen 5500 doesn't come with TSC-Deadline mode
00:33:00 <geist> not sure AMD has implemented it
00:33:00 <cooligans> Oh
00:34:00 <cooligans> it looks to me like it's easy to implement
00:34:00 <geist> if it did it's going to be in Zen 3, since i know it's not in zen 2
00:34:00 <cooligans> intel CPUs have had it for quite a while
00:34:00 <geist> maybe there's patent reasons AMD didn't pick it up yet
00:34:00 <cooligans> true, true
00:35:00 <geist> but yeah it does seem like it'd be pretty easy
00:35:00 <geist> also took a while for Zen to pick up x2apic
00:35:00 <cooligans> Does QEMU have it implemented (without KVM/WHPX)
00:35:00 <cooligans> using TCG
00:35:00 <geist> actually perhaps that's the reason. it only makes sense if you have the full x2apic support because of the whole local apic access using MSR
00:35:00 <geist> yes
00:36:00 <cooligans> are you by any chance on the osdev discord
00:36:00 <geist> there have been multiple
00:36:00 <cooligans> the biggest one
00:36:00 <cooligans> ~4000 members
00:36:00 <geist> i have popped in from time to time, but haven't found it compelling to continue
00:36:00 <cooligans> you like the calm of IRC
00:37:00 <cooligans> discord is a bit more chaotic, since so many people are talking at once
00:37:00 <geist> well, more than that, i found, at least at the time, the level of discourse in the discord to be low
00:37:00 <geist> ie, lots and lots of folks without a lot of knowledge giving bad advice
00:37:00 <geist> i tried to help for a few months but didn't seem like anyone was interested
00:37:00 <cooligans> oh
00:37:00 <geist> technically i'm still there too
00:37:00 <geist> looking at it now, i just muted it forever ago
00:38:00 <cooligans> I mean, you seem quite more knowledgeable than i am
00:38:00 <cooligans> i'm sure your help would be appreciated for the more advanced topics
00:38:00 <geist> sure. like i said i tried to for a while, but it didn't seem like there was the appropriate level of... discourse. hard to describe
00:39:00 <cooligans> like there was someone with a PKS (Supervisor Protection Keys) problem the other day
00:39:00 <cooligans> I didn't even know that was a thing
00:39:00 <cooligans> till 2 days ago
00:39:00 <klange> I was on the current one really early on but left. I rejoined for a bit but I just find the active userbase has strong opinions. It's likely constantly being in a Hacker News comment section.
00:39:00 <geist> yeah exactly
00:39:00 <cooligans> lol
00:39:00 <klange> s/likely/like/
00:39:00 <geist> that's part of it, lots of folks with bad advice and a lot of opinions
00:39:00 <geist> which is fine up to a point, but after a while it's like swimming in sand
00:39:00 <cooligans> i get it
00:40:00 <geist> if you're trying to sort of raise the bar so to speak
00:40:00 <cooligans> anyways, has anyone implemented shadow stacks
00:40:00 <geist> that being said i'll try to pay a bit more attention
00:40:00 <cooligans> I'm trying to add it to my operating system
00:40:00 <geist> what do you precisely mean by shadow stacks?
00:40:00 <geist> there are tons of things called that
00:40:00 <cooligans> CR4.CET and the whole SSP set of instructions
00:41:00 <geist> ah no. it's some new intel thing i haven't looked at at all
00:41:00 <geist> has it made it into consumer cpus?
00:41:00 <cooligans> I think
00:41:00 <geist> that's the point where i start paying a bit of attention
00:41:00 <cooligans> it's also in the AMD System Developers manual
00:41:00 * geist nods
00:42:00 <cooligans> Wait, how did you do that
00:42:00 <cooligans> nods thing
00:42:00 <cooligans> i'm kinda new to IRC
00:42:00 <geist> honestly i spend most of my time in ARM64 land
00:42:00 <klange> Usually `/me` is the command.
00:42:00 <geist> type /me <emote>
00:42:00 * cooligans nods
00:42:00 <cooligans> there we go
00:42:00 <cooligans> I find ARM64 to be a bit too confusing
00:42:00 <geist> also a side note that i can't really get too worked up about but the discord server seems far more x86 centric than even this channel
00:43:00 <cooligans> now that's definitely true
00:43:00 <geist> virtually all conversations are implicitly x86, except one channel thats 'other-isas'
00:43:00 <geist> and since i spend most of my time on !x86 it's less compelling for me
00:43:00 <cooligans> 90% of people work on x86 based projects
00:44:00 <cooligans> there are only 2 projects that aren't x86 centralized afaik
00:44:00 <geist> which isn't really untrue of here as well, but i think non x86 stuff is more tolerated here since enough of the regulars are also either building portable OSes or also work on arm/riscv/etc stuff
00:44:00 <cooligans> managarm, whose creators are in this server
00:44:00 <cooligans> i think
00:45:00 <klange> There's two kinds of people in that Discord. The ones that make Managarm, and the ones that think their project is as cool as Managarm and are sorely mistaken.
00:45:00 <cooligans> I have one last question before I go, what IRC clients do you guys use?
00:45:00 <cooligans> managarm is impressive ngl
00:45:00 <gog> konversation
00:45:00 <klange> irssi
00:45:00 <cooligans> it's almost as impressive as zircon
00:45:00 <geist> managarm? never heard of it
00:45:00 <cooligans> I'm using hexchat
00:46:00 <cooligans> it's on github and it runs more than 50% of unix applications
00:46:00 <cooligans> https://github.com/managarm/managarm
00:46:00 <bslsk05> ​managarm/managarm - Pragmatic microkernel-based OS with fully asynchronous I/O (40 forks/728 stargazers/MIT)
00:46:00 <junon> geist: I think it's because all of the "hello world" tutorials e.g. on osdev.org exclusively target x86.
00:46:00 <geist> looks like x86-64 only
00:46:00 <geist> but it does at least do SMP so i'll give it that
00:46:00 <cooligans> check kernel/thor/arch
00:47:00 <cooligans> and kernel/eir/arch
00:47:00 <klange> geist: You should look into it as it's a rare gem of functionality; they're going for userspace Linux compatibility and can run Wayland.
00:47:00 <geist> cool
00:47:00 <cooligans> thats where the arch specific code is stored
00:47:00 <klange> They've earned my respect at least.
00:47:00 <geist> yeah looks like it's a thing for sure
00:47:00 <cooligans> the craziest thing is that it's a microkernel
00:47:00 <cooligans> most projects like this are monolithic
00:47:00 <geist> sure, but microkernels aren't *that* weird
00:47:00 <geist> see zircon, etc
00:48:00 <cooligans> well yeah, but microkernels are rare
00:48:00 <cooligans> is zircon still actively developed
00:48:00 <geist> fair. the heyday was back in the 90s
00:48:00 <klange> People make a big deal out of microkernels vs. not microkernels, but it's really a very minor thing. Minix is a microkernel, Hurd despite all of its stumbles is a microkernel and can run a full Debian XFCE desktop, Zircon is a microkernel.
00:48:00 <geist> absolutely
00:48:00 <geist> full disclosure: i'm one of the main creators of zircon
00:48:00 <geist> and i still work on it at work
00:49:00 <klange> I think people have gotten the wrong idea into their heads from the famed Tanenbaum debates, and even most "monolithic" OSes these days are far closer to microkernels than the ones that were relevant in the 80s.
00:49:00 <geist> it's a fork of littlekernel, which is also one of my old projects
00:49:00 <cooligans> does google plan to swap out chromeos with fuchsia
00:49:00 <cooligans> at least that was my first impression
00:50:00 <junon> geist: you mean Fuchsia's Zircon?
00:50:00 <geist> cant talk about future stuff
00:50:00 <geist> junon: yes
00:50:00 <junon> I'm still surprised Google has kept the project. They like to kill those a lot these days.
00:50:00 <geist> heh, not gonna argue with that
00:50:00 <geist> naw, fuchsia has actually shipped in things at this point
00:51:00 <cooligans> wow
00:51:00 <klange> They've made a nest for themselves, so hopefully they'll stick around :)
00:51:00 <geist> also side note littlekernel is all over Pixel 6 too
00:51:00 <geist> kinda proud of that
00:51:00 <cooligans> I've always had one problem with zircon though
00:51:00 <cooligans> you can't build it standalone
00:51:00 <geist> cooligans: you and me both
00:51:00 <cooligans> there was like a bug number 35***
00:51:00 * klange hopes that joke landed somewhere
00:51:00 <geist> used to, cant anymore. i argued against it, but lost that particular battle
00:51:00 <junon> geist: how many Fuchsia devs are in this channel, do you think?
00:52:00 <geist> oh probably 5 or 6
00:52:00 <geist> we have a discord server for it now if you want to ask technical questions
00:52:00 <cooligans> geist: why did they vote against it
00:52:00 <geist> just dont bother asking things like 'what is google going to do with it'
00:52:00 <cooligans> ooh, could I get the link
00:52:00 <geist> cooligans: unified build system
00:52:00 <klange> Can't say we haven't tried to increase the number...
00:52:00 <junon> klange are you on fuchsia as well?
00:52:00 * klange is quite happy with current employment situation.
00:52:00 <geist> makes sense. ease of building, one GN/ninja instance, etc
00:53:00 <klange> No, but I've applied and been through interview processes, back in the before times.
00:53:00 <geist> and there's some amount of cross-contamination of libraries and whatnot between user and kernel
00:53:00 <junon> I see
00:53:00 <geist> which i was initially against, but have since relented
00:54:00 <cooligans> geist: mind sending me the discord link
00:54:00 <geist> looks like https://discord.gg/pjfYkmbq69
00:54:00 <bslsk05> ​redirect -> discord.com: Fuchsia
00:54:00 <geist> i googled it.
00:54:00 <geist> it's not super busy but if you have technical questions we'd be happy to answer
00:55:00 <cooligans> thanks
00:55:00 <cooligans> anyways, it's kinda late, I gotta head out for the night
00:55:00 <geist> anyway, fairly proud of how well zircon has turned out
00:55:00 <geist> lots of fun decisions we made early on and most of them have turned out to be decent
00:55:00 <cooligans> since gn is the build system, it it possible to build fuchsia on windows
00:56:00 <cooligans> *is
00:56:00 <geist> on a posix environment yes
00:56:00 <cooligans> so wsl
00:56:00 <cooligans> works
00:56:00 <geist> otherwise no. there's a lot of prebuilt toolchain binaries
00:56:00 <geist> clang/rustc/etc
00:56:00 <geist> which are linux or mac only
00:56:00 <cooligans> ok
00:56:00 <cooligans> but since wsl is posix, it works
00:56:00 <geist> yep. WSL2 builds it fairly well
00:57:00 <geist> WSL1 is a trainwreck building fuchsia for Reasons
00:57:00 <cooligans> ok
00:57:00 <junon> WSL1 doesn't have a sleep syscall implementation :c
00:57:00 <junon> so any attempts to `sleep` and the like fail
00:57:00 <cooligans> wow
00:57:00 <junon> At least in every case I've personally tried.
00:57:00 <geist> yah we found that gn itself has a terrible antipattern of heap usage for WSL1
00:57:00 <geist> that causes it to take literally 20 minutes or so to run
00:58:00 <geist> then there are some Go based tools in the build that also run horribly on WSL1. probably for sleep() like reasons
00:58:00 <geist> WSL1 is a pretty amazing solution, but it's always the edge cases that fall over
00:58:00 <geist> WSL2 being just a VM it works pretty fine
00:58:00 <klange> WSL1 amazes me, and the fact that WSL2 happened just makes WSL1's entire existence even more crazy.
00:58:00 <junon> Yeah. But there are outstanding issues with WSL2 that make me nervous to switch
00:59:00 <geist> yah
00:59:00 <junon> Did they pull the plug on WSL1 entirely now, in terms of support?
01:00:00 <klange> I do a bunch of dev under WSL2 on a Surface, got it all set up for nested virtualization so I can use KVM with QEMU.
02:02:00 <raggi> geist: from what I could figure out the common cause was the ptmalloc strategy for small object allocations was hitting an extremely slow path (which is also slow on Linux, but 10x more impact on wsl1) - I'm anticipating starnix will have similar challenges to overcome eventually
02:02:00 <raggi> er, glibc malloc, strictly speaking
02:03:00 <raggi> jemalloc and tcmalloc with their map arenas operated much more efficiently
02:04:00 <raggi> I think, but I didn't get around to asserting it for sure, that it was the fine grained madvise causing the bulk problem, and more for threaded programs than serial ones, so assumption is it's hitting a global or widely shared lock
03:38:00 <klange> I just dd'd my ISO to a USB stick and popped it in my ThinkPad and my bootloader actually works - it's the first time I've tried that.
07:23:00 <vin> What does loads and stores being atomic with respect to each other mean? And why isn't this supported on x86?
07:23:00 <vdamewood> vin: Do you know what it means for an operation to be atomic?
07:24:00 <vin> yes vdamewood
07:24:00 <vin> So a load after a store should always return the stored value? Is that it?
07:25:00 <vdamewood> Yep, and a load before a store should load the value before the store.
07:25:00 <vin> Right but I thought this is guranteed on x86, this is the basic memory consistency one has to support
07:26:00 <vin> *guaranteed
07:26:00 <vdamewood> I'm pretty sure this is guaranteed for a single core, but not for a multicore setup.
07:28:00 <vin> So does that mean one could implement mutex locks with https://en.wikipedia.org/wiki/Peterson%27s_algorithm on a single core on x86 safely?
07:28:00 <bslsk05> ​en.wikipedia.org: Peterson's algorithm - Wikipedia
07:30:00 <vdamewood> No clue on my part.
07:33:00 <vin> Also why isn't multi-core guaranteed? Because of different L1/L2 caches and instruction reordering? The coherency protocol invalidates a dirty cache line to ensure consistency but a thread could do a load on it before the invalidation, making the change made by the other thread (other core) invisible?
07:45:00 <vin> vdamewood: the notes section in the above wiki page sort of hints that memory reordering of sequential accesses without explicit memory barriers can break this algorithm but then any normal sequential program without barriers will also then provide no guarantees of memory consistency!
07:45:00 <vin> That's absurd
07:56:00 <Griwes> the algorithm seems to write and read from _different_ variables
07:56:00 <Griwes> unless you use strong enough memory orderings or fences, those can be reordered with respect to each other
07:57:00 <Griwes> the notes don't talk about accesses to the same address, but accesses to different addresses
07:57:00 <Griwes> and those get reordered all the time
07:58:00 <Griwes> your sequential code that does not have (1) fences, (2) atomics, or (3) data dependencies between instructions won't execute the way you wrote it
08:00:00 <Griwes> there's also _at least_ two levels of reordering that happen to your program unless you directly write assembly: the compiler is going to reorder accesses (save for when it encounters memory order enforcement such as atomics or fences), so the assembly is not what you wrote; and the cpu is going to reorder instructions when actually executing (save for memory order enforcement such as atomics or fences)
08:00:00 <Griwes> you are _probably_ safe on x86 when accessing the same variable, and in a bunch more cases because the memory model semantics of x86 are _incredibly_ strong
08:01:00 <Griwes> but this looks like one case where you can easily get bitten
08:02:00 <zid> compiler barriers are fun
08:02:00 <zid> that's all you need on uniprocessor though
08:03:00 <zid> your barriers and fences etc can all define out to asm("" ::: "memory");
08:04:00 <Griwes> that's the strongest compiler fence, yes
08:04:00 <zid> compiler barrier, it isn't a fence
08:05:00 <Griwes> I'm not convinced that there's a meaningful distinction
08:06:00 <zid> There isn't any meaning to compiler fence, is all
08:06:00 <zid> couple hundred google results, mostly talking about fences inside compilers
08:07:00 <Griwes> C++ has a function that is called atomic_thread_fence for this, and it's definitely a compiler operation too
08:08:00 <vin> Griwes: just to clarify, even if I write the algorithm in asm (avoiding compiler reordering) I expect this algorithm to not work on a single core processor. Because x86 can reorder accesses to the two different memory locations this breaking the logic?
08:08:00 <vin> *thus breaking
08:08:00 <zid> I'd like to see how any of them would change in a uniprocessor system, though
08:09:00 <Griwes> vin: unsure, because I don't remember what the exact semantics on memory operations in x86 are
08:09:00 <Griwes> a few years back I'd be able to answer that :P
08:09:00 <zid> C's going to treat it all as a no-op unless they're actually volatile, which there's no need for them to be because it's uniprocessor, and compile it down to a nice fat nothing, given sufficient optimization
08:09:00 <Mutabah> vin: You will always read back what you last wrote
08:09:00 <Griwes> but I've moved safely into the realm of "I'll just do the thing that's correct from a language memory model point of view" and I'm happier not wondering about the details on the architecture
08:09:00 <Mutabah> The difference is the ordering to another CPU/thread
08:10:00 <moon-child> with multiple threads on the same core you retain atomicity
08:10:00 <moon-child> on the instruction level
08:11:00 <Mutabah> (by "thread" I meant hardware thread, aka hyper-threading)
08:11:00 <zid> ye HT tosses all this out the window, that's SMP again
08:11:00 <MelMalik> I really just want to be a soft animal
08:11:00 <vin> Mutabah: sure yes that's the MC an x86 supports but what about reordering of access made to different locations on a single core? Assuming single thread -- single core
08:12:00 <moon-child> yeah; meaning vcore, not phys etc.
08:12:00 <MelMalik> and i want my OS to represent that
08:12:00 <Mutabah> On a single thread, you will never observe the re-ordering
08:12:00 <Mutabah> (unless you have some way of observing the memory bus)
08:14:00 <moon-child> not even that, cuz ssb
08:14:00 <moon-child> and it'll get flushed if you get preempted
08:15:00 <Griwes> right. I guess it depends on what you're observing
08:15:00 <Griwes> if you had a way to observe memory reads (perhaps by being an mmio device), you can observe it
08:17:00 <vin> Okay so there can be reordering of accesses to different locations with 2 physical threads on a single core. Since they share the same L1/L2 any invalidations are instantly observed (the store buffer could have stores cached though right?) by either threads thus making this algorithm work?
08:17:00 <Griwes> re-reading the wiki page again, I'm not seeing any sentences that talk about single core systems
08:19:00 <vin> Griwes: I am just thinking about it from different perspectives trying to explain when this would and wouldn't work on x86. Also improving my x86 MC knowledge
08:19:00 <geist> hmmm
08:19:00 <geist> trying to wade into this discussion
08:20:00 <vin> It is clear why it wouldn't when threads are on different cores but if the threads are on same core with SMT (vthreads == pthreads)
08:20:00 <vin> Hi geist
08:20:00 <geist> SMT hardware threads behave pretty much identically to physical cpus
08:20:00 <geist> so any sort of ordering guarantees (or lack of) apply there
08:20:00 <geist> hi vin
08:20:00 <Mutabah> SMT cores will generally have their own L1 cache
08:21:00 <Mutabah> and will definitely have their own pipeline ordering
08:21:00 <vin> They share both L1 and L2, right Mutabah?
08:21:00 <Mutabah> yeah... just realised that was probably not write :)
08:21:00 <geist> right, there's no real interlocking between the SMT cores, except what may or may not happen as a side effect of a given implementation
08:21:00 <Mutabah> Pipeline point still stands
08:22:00 <geist> actually not entirely true. see Bulldozer
08:22:00 <geist> it was a hybrid of SMT where later versions had dedicated L1s
08:22:00 <Mutabah> ... "write", what the ___ is wrong with me today
08:22:00 <geist> kinda halfway between separate cores and full SMT
08:22:00 <vin> So are you saying the algorithm will not work on a single core with SMT geist? https://en.wikipedia.org/wiki/Peterson%27s_algorithm
08:23:00 <geist> i dunno, i dont particularly feel like trying to grok that algorithm right now
08:23:00 <vin> Because of reordering
08:23:00 <geist> but again i repeat: SMT for all practical purposes appears to be the same thing as separate cores
08:23:00 <Griwes> I spend too much time thinking about SIMT synchronization these days and not enough thinking about "normal" architectures
08:23:00 <vin> hmmm
08:24:00 <Mutabah> If you _ever_ have multiple cores accessing memory, you need to use atomic ops at some level
08:24:00 <geist> so if they do appear to be synchronized that's a side effect of the microarchitecture
08:24:00 <Griwes> insert the "why can't you be normal" meme with a screeching GPU on the second panel
08:24:00 <geist> also remember modern superscalar designs have *lots* of memory accesses going on in parallel, many times speculatively
08:24:00 <Griwes> anyway as soon as you have two instruction streams, you need atomics
08:24:00 <geist> a lot of what makes the memory model appear strong/etc is the dependency tracking of all these outstanding transactions
08:24:00 <geist> in the case of SMT you end up with a bunch of outstanding transactions, just spread across multiple hw threads
08:25:00 <geist> but then they wont have any explicit deps between them
08:25:00 <geist> so particular barriers or barrier events or ordering events will only apply to a particular thread
08:25:00 <vin> Wait, aren't loads and stores atomic? So even having two instruction streams shouldn't be a problem, correct Griwes?
08:26:00 <Mutabah> Depends on the architecture
08:26:00 <Griwes> aligned stores and loads on x86 are atomic, yes, but that only guarantees no tearing
08:26:00 <vin> x86
08:26:00 <geist> note that we're talking about strongly ordered arches like x86
08:26:00 <Griwes> it does not guarantee ordering
08:26:00 <geist> most other arches that are still active nowadays are weakly ordered
08:26:00 <Griwes> (between threads that is)
08:26:00 <geist> in which case even single threaded ordering is not guaranteed
08:26:00 <Griwes> it has some enforcement *within* a thread
08:26:00 <Griwes> but not across
08:27:00 <geist> and since other arches exist and are popular, you still have to deal with weakly ordered stuff
08:27:00 <Griwes> also, if you are above assembly at any point, the language you're writing in will usually say that unsynchronized accesses are always a data race and always undefined
08:27:00 <geist> unless you happen to be writing just the x86 portion of an x86 module
08:28:00 <geist> but weakly ordering isn't as bad as it sounds, it just means a bunch of guarantees aren't there so you can't rely on particular behaviors and you need barriers, implied or explicit
08:28:00 <geist> like, for example, an atomic variable with SEQ_CST or acquiring a mutex, etc
08:29:00 <Griwes> vin: if you ever touch the same variable with two different instruction streams that have a potential to execute concurrently (and that includes on different hyperthreads), you need to use atomics. x86 allows you to avoid tearing without atomics on aligned accesses, but that's it
08:29:00 <geist> to a certain extent a weakly ordered system is almost easier to reason about because the guard rails are off, so you can just imagine the cpu does what it does with less rules to constrain it
08:29:00 <vin> Interesting, I thought SMT threads would be different. Wait, the only strong ordering guarantee x86 provides is within a single thread and not across threads, correct geist? like Griwes mentioned
08:29:00 <geist> vin: correct
08:29:00 <geist> that's what i keep saying. SMT for all practical purposes on all implementations i know of makes no guarantees about cross thread sequences
08:30:00 <geist> they act as if they were separate cores
08:30:00 <vin> Cool, that way one less special case to design for.
08:30:00 <geist> since again modern superscalar x86s for example may have like 64 or 80 outstanding load/stores in flight, many speculatively, etc
08:31:00 <vin> I mean if one cares about performance then scheduling threads on same cores would make sense -- reuse L1/L2 hot cache lines.
08:31:00 <geist> the strong memory model x86 is guaranteeing is basically a complex sets of interdependencies between those load/stores to ensure they appear on the 'bus' in order, but the cpu may have long since moved on, etc
08:31:00 <vin> Got it!
08:31:00 <geist> but that only extends to a single hardware thread. if the same core is running another thread it may just have another set of interleaved memory transactions that are only sorted relative to other transactions for that thread
08:33:00 <geist> and yah, having software threads that are running code that deals with similar data can have a win with SMT for sure
08:33:00 <geist> or at least less of a penalty
08:33:00 <Griwes> it can have a cache benefit, but whether that will end up with perf benefit overall is something one needs to test
08:34:00 <Griwes> because you are kinda getting less overall cpu time compared to scheduling on two separate cores that aren't doing anything on the hyperthread
08:34:00 <vin> So to conclude, the only case where this algorithm would work on x86 would be on a single core with no SMT. The performance would be abysmal because of spin-waiting and lots of context switches.
08:34:00 <Griwes> depends on the system load, depends on the kind of work that the threads do (compute vs memory heavy and whatnot)
08:35:00 <geist> i haven't looked at it too closely but i wonder if this peterson thing would work on a weak memory model machine
08:35:00 <geist> probably, if the spinny variables are atomic
08:35:00 <vin> Makes sense Griwes this depends on the workload and yes it is a tradeoff.
08:35:00 <Griwes> scheduling's hard ;p
08:36:00 <geist> since atomic variables (at least on arches like ARM) can/do/may have memory barriers built into them
08:36:00 <geist> which then orders things before/after
08:36:00 <geist> which is generally not a thing you have to worry about with x86 because effectively every load/store has an implicit barrier with it
08:36:00 <Griwes> all of this reminded me of a funny (hardware) scheduling-related case of atomics doing funky stuff on a gpu
08:36:00 <geist> ie, things that happened before it happen before, things that happen after happen after (even if it's basically fiction)
08:37:00 <Griwes> we were testing a hashmap that did two loads, relaxed+relaxed vs seq_cst+relaxed and... seq_cst+relaxed was faster
08:37:00 <geist> huh!
08:38:00 <geist> question i guess is did the gpu actually implement relaxed
08:38:00 <Griwes> we aren't sure why but the working theory is that hitting a global seq_cst barrier synchronized all the warps so that it eliminated divergence
08:38:00 <geist> ARM for example allows a given core to 'relax' any of the lesser barriers to something stronger
08:38:00 <Griwes> but it's just a working theory
08:38:00 <geist> ah good point
08:38:00 <Griwes> yes, our gpus implement the full C++ memory model of atomics since a few generations ago
08:39:00 <geist> nice
08:39:00 <geist> relaxed atomics still make my head spin
08:39:00 <geist> at some point i think i was enlightened and grokked how a pipeline would allow that, but then the moment passed
08:40:00 <Griwes> our std::atomic implementation kinda translates from C++ enum names (like memory_order_acquire) to instructions with a matching part (i.e. it actually says "acquire" in the public ISA)
08:40:00 <geist> and of course ARM at least has a complex set of rules about whether or not a barrier applies to *all* memory transactions or just things in the same cache line, etc
08:40:00 <Griwes> relaxed is just "pls no tear" ;d
08:40:00 <Griwes> some time ago there was someone talking about proposing memory_order_tearing
08:41:00 <geist> yeah, arm64 does too. ldr and ldar and ldtr i think
08:41:00 <Griwes> which would give you _no_ guarantees, but would allow you to do a non-ub access even though you could get values never written
08:41:00 <geist> ldar (acquire) and stlr (release) is it yeah
08:41:00 <Griwes> not sure where that idea went
08:42:00 <geist> anyway, relaxed atomics are lovely. wish x86 had them
08:42:00 <vin> So a few months ago I read http://pages.cs.wisc.edu/~markhill/papers/primer2020_2nd_edition.pdf which was pretty interesting. I wish it covered modern protocols and guarantees used in x86 or arm
08:43:00 <moon-child> geist: are they though? Like if you don't have contention the strong stuff will be cheap, and if you are operating on the same memory concurrently, the relaxed stuff will lead to races
08:43:00 <geist> they're great for things like counters
08:43:00 <Griwes> relaxed needs to be used _very carefully_
08:43:00 <geist> you just bump some counter and move on, but dont have to synchronize to world for it
08:44:00 <Griwes> some number of years ago there was a really bad bug in one of the c++ stdlib implementations, in shared_ptr refcounting
08:44:00 <moon-child> you mean like perf counters where it's fine if the value is wrong? i guess that could work. but also just make it thread-local
08:44:00 <Griwes> something that needed to be release I think was relaxed and things broke badly
08:44:00 <geist> yah i can see that
08:44:00 <geist> we use them mostly for counters and stuff in the kernel yeah
08:44:00 <Griwes> define "if the value is wrong"
08:44:00 <geist> where if it's off by one that's fine
08:45:00 <Griwes> if you use them right, you get the right values
08:45:00 <Mutabah> or just out-of-date
08:45:00 <geist> right
08:45:00 <moon-child> Griwes: where what you care about is that the value is in the right ballpark, not precise value
08:45:00 <Griwes> if you're using relaxed to do say a lockfree list, you're doing it wrong
08:45:00 <Griwes> yeah
08:46:00 <Griwes> (I mean with a lockfree list you're probably going to initially do a relaxed load to obtain the old value before you enter a cmpxchg loop but you get my point)
08:46:00 <geist> for efficiency purposes we have all the kernel counters be per cpu but since you can be context switched in the middle of it we still do a relaxed atomic bump of it
08:46:00 <geist> so that it at least doesn't corrupt the value
08:46:00 <geist> 99.9% of the time it's local to the cpu that did it so it's even pretty efficient
08:46:00 <moon-child> ah yeah, that is sensible
08:46:00 <moon-child> language vs cpu memory model
08:47:00 <moon-child> (and actually kinda coincides with vin's question)
08:47:00 <geist> yah also built around the armv8.0 atomics where you have to do a multi instruction sequence. vs v8.1 atomics
08:47:00 <geist> which look much more like x86. riscv also did single instruction atomics.
08:48:00 <geist> a violation of the risc manifesto, but basically the best way to do it on modern machines
08:48:00 <moon-child> huh. so riscv w/atomics is not actually load-and-store?
08:49:00 <Griwes> I mean being pragmatic beats strictly adhering to a manifesto in engineering
08:49:00 <geist> they have both actually
08:49:00 <Griwes> usually
08:49:00 <geist> load/store conditional and a set of atomic alu single instruction ops
08:49:00 <vin> geist: a lot of modern file systems also now maintain bitmaps and inode tables per core to provide better concurrency.
08:50:00 <geist> which is fairly surprising considering how bare bones riscv tends to be
08:50:00 <moon-child> x86 you actually have a cmpxchg loop for atomic anything but add/sub
08:52:00 <geist> hmm, never thought about it but thought you could `lock or` or whatnot as well?
08:52:00 <geist> or does the lock prefix only really work on add
08:53:00 <Griwes> huh, that's news to me
08:53:00 <Griwes> how come they don't have something that's sufficient to implement things like fetch_or
08:54:00 <geist> i honestly haven't thought about it in a while. i try to use builtins anyway
08:54:00 <Griwes> yeah
08:54:00 <geist> https://gcc.godbolt.org/z/x8nhxvYco ah no, you can lock or as well
08:55:00 <geist> and thus i assume and/xor/etc
08:56:00 <Griwes> ...why does std::atomic not do that
08:56:00 <moon-child> ah huh; manual sez it can be applied to: add adc and btc btr bts cmpxchg dec inc neg not or sbb sub xor xadd xchg. For some reason I thought it was more restricted
08:56:00 <Griwes> ...huh, mystery deepens
08:57:00 <Griwes> gcc + libstdc++ (gcc's stdlib) does cmpxchg loop
08:57:00 <Griwes> oh
08:57:00 <Griwes> mystery solved
08:57:00 <Griwes> my -O flags weren't matching
08:57:00 <Griwes> :'D
08:58:00 <Griwes> new mystery, why does it do the cmpxchg loop at -O0
08:58:00 <geist> good question
08:58:00 <Griwes> clang does the same
08:58:00 <Griwes> loop at -O0, lock or at -O1 and up
09:02:00 <Griwes> it must be something in atomic, because the naked use of the intrinsic uses lock or at -O0 too
10:06:00 <MelMalik> would it be bad to extend risc5
10:14:00 <klange> Ah, that's why VGA text mode was not working on my ThinkPad, despite the kernel log showing up fine...
10:15:00 <klange> Was trying to map the region write-combining, which apparently doesn't work... the kernel log and bootloader weren't doing anything like that.
10:18:00 <klange> And that's why we test on real hardware~
10:18:00 <klange> I'm really starting to lose my patience with the ethernet port.
10:18:00 <klange> At some point the retaining ledge broke so cables don't stay in any more.
10:51:00 <junon> geist: Will Fuchsia work on iPhone/iPad devices in theory? Maybe you can't speak on that in any official capacity, and I know that that area is taboo because of no right-to-repair laws etc.
10:54:00 <j`ey> just as well as Linux would I think
10:54:00 <junon> That's probably true, if it were feasible it probably would have already been done with Linux.
10:57:00 <j`ey> https://wiki.postmarketos.org/wiki/Apple_iPhone_7/7%2B_(apple-iphone7)
10:57:00 <bslsk05> ​wiki.postmarketos.org: Apple iPhone 7/7+ (apple-iphone7) - postmarketOS
10:58:00 <junon> Doesn't the boot firmware verify the images against the root CA cert from Apple in modern devices?
10:58:00 <junon> So your image basically needs to be signed by Apple or something, right? You'd have to have their private key
10:59:00 <j`ey> this is why you need exploits
10:59:00 <junon> and the firmware can't be flashed since it's put on ROM at chip manufacture time according to this article I'm reading
10:59:00 <junon> Right okay, makes sense
10:59:00 <junon> so this is the goal of the jailbreak movement, pretty much directly, right?
11:00:00 <j`ey> I dont think they want to run linux
11:01:00 <junon> Right
11:08:00 <klange> Love a nice round number. https://klange.dev/s/Screenshot%20from%202021-11-01%2020-07-57.png
11:08:00 <Mutabah> :D
11:08:00 <gog> nice
11:08:00 <junon> Perfection
11:09:00 <Mutabah> :( 1520 here
11:09:00 <gog> show me 4KB: 4000 bytes
11:09:00 <gog> no the real 4KB: 4096 bytes
11:22:00 <junon> wow wtf, why is a hello world program in clang 14kb. That's bigger than in the past, right? Or am I imagining things?
11:23:00 <junon> with -O3 -g0 -s -DNDEBUG=1
11:23:00 <junon> just a single puts("hello world")...
11:23:00 <GeDaMo> Symbols? Dynamic loading?
11:24:00 <junon> no -s strips symbols
11:24:00 <GeDaMo> Ah
11:24:00 <junon> it*
11:24:00 <junon> dynamic loading might be in, but I thought that'd reduce the file size wouldn't it?
11:25:00 <junon> -static causes it to be 788k
11:25:00 <junon> wow
11:27:00 <Geertiebear> you can use bloaty to find out where all that space goes
11:28:00 <junon> That's a new one, have a link to bloaty?
11:30:00 <junon> just tried with both gcc and clang, about the same thing, and CMake actually produces a larger executable since in release mode it doesn't strip.
11:30:00 <junon> oh google's bloaty, got it
11:31:00 <Geertiebear> yeah, that's the one
11:36:00 <junon> It shows 9.34ki as "unmapped"
11:36:00 <junon> and 1.69ki as "ELF Section Headers"
11:36:00 <junon> seems... wrong
11:37:00 <junon> .text seems more or less correct though, just 376 bytes
11:38:00 <junon> asking in #llvm oftc right now
11:49:00 <junon> Seems platform specific but they agree it's kind of large. Oh well, I don't think there's anything I can add to a release build set of flags to make the binary any smaller. It's already smaller than CMake.
11:50:00 <klange> junon: btw re: hn comment, I actually had the first join to this channel on Libera, but the network was unstable and channel registration was not available, you got it after I d/c'd ;)
12:15:00 <junon> Oh :D
12:15:00 <junon> Yeah I remember now, the chanserv stuff was hugged to death
12:21:00 <junon> People were worried andrew was going to try to retaliate against libera
12:22:00 <junon> he forcibly took over ownership of most freenode channels that mentioned libera lol
12:23:00 <klange> It was definitely a thing that happened. The rapidity with which everyone migrated to Libera/OFTC was quite extraordinary.
12:23:00 <junon> Yes
12:24:00 <junon> I connected pretty much right as the first resignation letter drafts were leaked, and my eyes were glued to the screen for the next 8 hours after that just watching it all unfold. It was incredible how fast they got everything up and running.
12:24:00 <junon> cc jess :D lol
12:24:00 <junon> you all did a good job IMO
12:26:00 <junon> new servers, new site, managing permissions, getting everyone cloaked/registered/transferred, answering questions, doing downtime maintenance, dealing with internet drama/fallout, dealing with Andrew, dealing with the emotional end of it, all at once, the bulk of which within pretty much 48 hours from start to finish. Impressive.
19:20:00 <geist> indeed
19:21:00 <geist> it was pretty amazing how quickly that was all in the rearview mirror. for people that didn't have to admin anything at least
19:24:00 <Bitweasil> Yeah.
19:24:00 <Bitweasil> Not-Freenode popped up in a right hurry, and it's been solid!
19:26:00 <GeDaMo> The hardest part is remembering it's called libera.chat, not libera.net :P
19:46:00 <geist> hah yeah
19:49:00 <kazinsal> clever domain hacks are usually worth it. usually
19:50:00 <j`ey> usually.worth.it
21:13:00 * geist resists the urge to click on that
21:40:00 <Ermine> Spoiler: this domain is for sale
22:38:00 <junon> second spoiler: they don't tell you a price without an email
22:38:00 <junon> so now they have a domain quote request from a very childishly named test email address