Search logs: #osdev - 14 August 2019

#osdev @ OPN/FreeNode from 3apr2001 to 23may2021

http://bespin.org/~qz/search/?view=1&c=osdev&y=19&m=8&d=14

Wednesday, 14 August 2019

03:25:20 <graphitemaster> geist, do you know which apis lstopo actually uses to get such good topology information
03:37:35 <ybyourmom> I'd really like to turn on TCP_NODELAY on my ssh connection to my IRC host
03:37:47 <ybyourmom> Wish I knew how to do that
03:39:13 <jjuran> It's one of those nagling little details
03:39:40 <wcstok> It's a really good thing that guy wasn't named Smith or Jones
03:40:16 <ybyourmom> Wondering if I can ioctl on the ssh socket using an external program and set the option on it
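For a socket the program itself owns, TCP_NODELAY is a socket option rather than an ioctl. A minimal sketch, assuming a POSIX system; the helper names `set_tcp_nodelay` and `get_tcp_nodelay` are made up for illustration, and this does not address the harder part of the question (changing the option on a socket owned by another process such as ssh):

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle's algorithm on a TCP socket owned by this process.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_tcp_nodelay(int fd)
{
    int one = 1;
    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
}

/* Read the option back: 1 if set, 0 if clear, -1 on error. */
int get_tcp_nodelay(int fd)
{
    int value = 0;
    socklen_t len = sizeof value;
    assert(fd >= 0);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0)
        return -1;
    return value != 0;
}
```

Typical use is `int fd = socket(AF_INET, SOCK_STREAM, 0);` followed by `set_tcp_nodelay(fd)` before the connection starts carrying interactive traffic.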
03:40:26 <geist> graphitemaster: i straced it once, it seems to mostly fetch it out of /proc/cpuinfo and some other ones
03:40:51 <graphitemaster> geist, it uses hwloc library which does a ton of crazy shit, my god, there's even a kernel mode driver in here
03:41:34 <graphitemaster> I found this though https://github.com/pytorch/cpuinfo
03:41:39 <graphitemaster> which also does similar stuff
03:41:46 <graphitemaster> thinking of using it as a source to write my own
03:43:05 <geist> lstopo also works on non x86, so there's still some fallback stuff there
03:44:51 <graphitemaster> reading the intel manuals I think I even found a side channel attack
03:44:55 <zid`> https://www.servethehome.com/wp-content/uploads/2019/08/Dual-AMD-EPYC-7742-Topology.jpg speaking of lstopo, lel
03:45:03 <graphitemaster> there's a non-protected api for querying cache utilization
03:45:09 <graphitemaster> performance counter crap
03:45:20 <graphitemaster> so like, you can do the usual cache bleed crap with this
03:50:45 <graphitemaster> zid`, sexy
03:51:50 <zid`> oh noes azonenberg is back
03:51:52 <azonenberg> I know i've done this before but can't find the docs...
03:51:55 <zid`> he'll want me to write him more linker scripts
03:52:04 <azonenberg> Lol no actually my linker script is doing just fine now
03:52:10 <azonenberg> I have C running on the stm32 now bare metal
03:52:25 <azonenberg> my question was how to do memory mapped IO cleanly?
03:52:32 <zid`> cleanly?
03:52:44 <azonenberg> as in sane-looking header files or whatever
03:52:56 <azonenberg> basically, i need a bunch of named symbols at absolute addresses with C types associated with them
03:53:20 <graphitemaster> a person who writes linker scripts is a linker-linker
03:53:28 <zid`> #define ETHERNET_REG_SIX *((volatile unsigned int *)(0x1F801800))
03:53:59 <zid`> (That's actually the CD-ROM status register on a playstation)
03:54:08 <graphitemaster> it looked familiar
03:54:18 <zid`> everyone knows that one
03:54:32 <azonenberg> what, no 0xbfc00000?
03:55:05 <azonenberg> zid`: so there's not a good way to just make a .o file full of symbols i can link to?
03:55:12 <zid`> it needs to know they're volatile
03:55:19 <azonenberg> yeah but i can do that in a header
03:55:22 <zid`> and not regular objects
03:55:29 <azonenberg> ideally i'd want to say 'extern volatile int foo'
03:55:33 <zid`> you can do that then
03:55:34 <graphitemaster> volatile void *name = (volatile void*)0xdeadbeef;
03:55:36 <zid`> blah = 0x1f801800
03:55:36 <graphitemaster> would also work
03:55:57 <azonenberg> But i dont want the object to be pointer type is the thing
03:56:04 <graphitemaster> then use an integer
03:56:07 <azonenberg> i want it to be an ordinary int
03:56:10 <azonenberg> that happens to be at that address
03:56:15 <graphitemaster> what
03:56:16 <zid`> blah = 0x1f801800
03:56:16 <azonenberg> and have the linker do the magic to make it work
03:56:19 <zid`> that's what that does
03:56:25 <zid`> but it's very not what you want to do
03:56:29 <azonenberg> graphitemaster: I want &GPIOA to result in 0x40020000
03:57:18 <zid`> they'll all be global symbols, you won't be able to control the linkage, and you'll end up with volatility bugs
03:57:22 <graphitemaster> uint32_t var __attribute__((at(0x40020000)));
03:57:38 <azonenberg> graphitemaster: that sounds more like what i wanted
03:57:42 <graphitemaster> only works for ARM
03:57:42 <zid`> just use the macro or the explicit pointer declaration
03:57:45 <azonenberg> yeah this is arm
03:57:54 <graphitemaster> yeah, I could tell by the address :P
03:58:14 <azonenberg> graphitemaster: yeah its a stm32f777
03:58:24 <graphitemaster> gpio addresses
03:58:25 <azonenberg> i have not yet been crazy enough to do anything bare metal on a cortex-a
04:03:52 <azonenberg> Hmmm
04:03:58 <azonenberg> attribute(at) is apparently Keil specific
04:04:00 <azonenberg> and does not work on arm gcc
04:04:06 <azonenberg> There goes that idea
04:16:35 <graphitemaster> who uses arm gcc
04:16:52 <graphitemaster> everyone knows that armcc is the only c compiler you use for arm metal
04:45:45 <ybyourmom> Does armcc accept all the gcc __attribute__(())-isms?
04:58:30 <graphitemaster> most of 'em
04:58:32 <graphitemaster> not all of them though
05:07:07 <geist> graphitemaster: i assume you're kidding when you're talking about armcc
05:07:23 <geist> in which case i'd generally request you not spread misinformation that way
05:08:18 <geist> azonenberg: generally you either do some sort of pointer math like zid` pointed out, or you create a struct and simply create a local struct var that points at it
05:08:21 <geist> lots of ways to do it
05:08:26 <geist> you dont need any extensions for it
05:08:39 <geist> but, make sure you go through a pointer to a volatile
05:14:58 <graphitemaster> geist, what, everyone knows you write assembly only for arm because it's so easy, easier than C :P
05:17:55 <azonenberg> geist: yeah what i ended up doing was making a linker script
05:18:05 <azonenberg> and just putting each SFR as a volatile struct in its own special section
05:18:13 <azonenberg> then assigning them to locations in the linker script
05:18:16 <geist> azonenberg: you dont need to do any of that, that's pretty strange
05:18:21 <geist> just create a variable and use it
05:18:53 <geist> in fact i think using the linker script is much harder to understand, since its not actually defined in the source code
05:19:10 <geist> so someone coming along looking at the code wouldn't be able to figure out where the variable is declared
05:20:01 <azonenberg> geist: i declare it in the source
05:20:10 <geist> yes but the value comes from the linker script
05:20:16 <azonenberg> volatile gpio_t GPIOA __attribute__((section(".gpioa")));
05:20:18 <geist> you can also just declare the pointer and put the value in it
05:20:24 <azonenberg> But then it's a pointer and not a literal
05:20:28 <geist> without needing any compiler or linker script trickery
05:20:39 <azonenberg> I'm trying to be compatible with vendor libraries where you can say GPIOA.foo = 42;
05:20:49 <azonenberg> not GPIOA->foo = 42;
05:21:13 <geist> well i guess if you have that constraint... sure. do whatcha want
05:21:15 <jjuran> #define GPIOA (*GPIOA_)
05:21:38 <geist> what jjuran said would work too
05:22:14 <azonenberg> i guess i need to look at some vendor headers and see how they do this
05:22:20 <geist> and if GPIOA is declared const the compiler should generate efficient code for it
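The macro jjuran suggests, combined with a const pointer as geist describes, can be sketched as follows. This is an illustration under assumptions: the `gpio_t` fields, the `TARGET_STM32` build flag, and the `fake_gpioa` host stand-in are all hypothetical, and 0x40020000 is simply the address mentioned earlier in the conversation. With it, `GPIOA.mode = 42;` compiles and `&GPIOA` yields the register block's address, without any linker script or compiler extension.

```c
#include <stdint.h>

/* Hypothetical register layout; a real part's reference manual
 * defines the actual fields and their order. */
typedef struct {
    uint32_t mode;
    uint32_t output_type;
    uint32_t output_data;
} gpio_t;

#ifdef TARGET_STM32 /* hypothetical build flag for the real device */
#define GPIOA_ADDR ((volatile gpio_t *)0x40020000u)
#else
/* Host stand-in so the sketch can run without the hardware. */
static gpio_t fake_gpioa;
#define GPIOA_ADDR ((volatile gpio_t *)&fake_gpioa)
#endif

/* A const pointer to a volatile struct: every access goes through
 * volatile, and the compiler can fold the constant address. */
static volatile gpio_t *const GPIOA_ = GPIOA_ADDR;

/* GPIOA is now an lvalue of struct type, not a pointer. */
#define GPIOA (*GPIOA_)

void gpioa_set_mode(uint32_t value)
{
    GPIOA.mode = value; /* dot syntax, as with vendor-style headers */
}
```

The address stays visible in the source, which is the readability point geist raises against placing the symbol from the linker script.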
05:22:34 <geist> i can tell you that i have never seen using linker scripts like that for ARM GCC level code
05:22:51 <geist> i've seen it for things like AVRs and PICs and some other non gcc compilers for embedded stuff
05:22:59 <azonenberg> I'm pretty sure i've seen it used on mips
05:22:59 <geist> but by the time you get to ARM you dont see as much trickery with it
05:23:46 <geist> last time i saw it was in some TI DSP code, it had a default linker script with all the peripherals called out
05:24:09 <geist> but it was a very complex linker script, for a TI linker that wasn't binutils
05:25:11 <zid`> he brushed off doing anything sane earlier :P
05:25:27 <klange> sanity is for wimps
05:26:56 <graphitemaster> my imaginary friends agree with klange
05:30:40 <ybyourmom> My imaginary friends agree with me
05:31:06 <ybyourmom> I have so many imaginary friends who peer review my opinions that I'm never wrong
05:31:31 <zid`> The universe just spontaneously reforms so that whatever I say is correct
05:31:35 <zid`> it's way easier
05:31:43 <ybyourmom> I'm the world's leading authority on every subject
05:36:12 <geist> every few years i remember and end up reading about https://en.wikipedia.org/wiki/Lojban
05:36:22 <geist> (completely unrelated to any prior conversation)
05:53:12 <ybyourmom> Does Lojban have the multitude of expressive subtleties of english?
05:53:25 <ybyourmom> I really enjoy speaking in subtext
08:06:42 <bcos_> Yay - my compiler has "IRC style chat" now :-)
08:06:50 <j`ey> o_O
08:06:57 <klange> your... what... why
08:07:06 <aalm> .theo
08:07:06 <glenda> I wish more people would find real bugs to fix.
08:08:13 <bcos_> Sorry - my multi-user IDE backend that compiles in the background has IRC style chat :-)
08:09:35 <aalm> #warnings and #errors + #chat for users etc.?
08:10:30 <aalm> what does it compile? hello worlds?
08:11:14 <j`ey> why not just integrate slack into the ide
08:11:37 <bcos_> aalm: Technically, nothing yet
08:13:59 <klange> I finished converting all of my editor modes from being hardcoded switch statements to mappings of keys to actions. Some performance sacrificed for configurability and much more straightforward code.
08:14:17 <klange> And now I can automatically generate the documentation for the keybinds. https://github.com/klange/bim/blob/mapped-keys/docs/mappings.md
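The switch-to-table conversion klange describes can be sketched roughly like this. It is not bim's actual code; the struct, the two actions, and `handle_key` are invented for illustration. The point is that once bindings are data, the same table can drive both dispatch and generated documentation.

```c
#include <stddef.h>

typedef void (*action_fn)(void);

/* One binding: the key, what it does, and a human-readable description
 * that a documentation generator could print. */
struct key_mapping {
    int key;
    action_fn action;
    const char *description;
};

static int cursor_row;

static void cursor_up(void)   { if (cursor_row > 0) cursor_row--; }
static void cursor_down(void) { cursor_row++; }

static const struct key_mapping normal_mode[] = {
    { 'k', cursor_up,   "Move the cursor up one line" },
    { 'j', cursor_down, "Move the cursor down one line" },
};

/* Linear scan over the table: slower than a compiled switch, but the
 * bindings can be listed, documented, or replaced at runtime.
 * Returns 1 if the key was bound, 0 otherwise. */
int handle_key(int key)
{
    for (size_t i = 0; i < sizeof normal_mode / sizeof normal_mode[0]; i++) {
        if (normal_mode[i].key == key) {
            normal_mode[i].action();
            return 1;
        }
    }
    return 0;
}
```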
08:14:58 <j`ey> Im guessing negligible performance
08:15:06 * bcos_ wishes "switch()" worked in Java :-(
08:16:02 <bcos_> (it only works for ints, and I want "switch(myLong)" everywhere so...)
09:55:48 <xenos1984> geist: I think about having my kernel error messages in Lojban. And use that as primary language for gettext.
09:56:31 <klange> If you do it correctly it would ensure that your messages are completely unambiguous to the 0 people who can read them.
09:57:26 <xenos1984> Exactly - two reasons for 0 people misunderstanding are better than one. :)
09:58:54 <xenos1984> I recently saw a shirt saying "Kein Weltraum links auf dem Gerät.", which is a perfectly valid German translation for "No space left on device.", literally meaning "No cosmos on the left side of the device."
10:02:56 <xenos1984> azonenberg: If you happen to use C++, you might use a class template like my ConstObject to declare an object at a constant memory location (or ExtObject if the address is not constant, but determined by the linker): https://github.com/xenos1984/NOS/blob/master/kernel/Symbol.h
14:43:22 <geist> xenos1984: heh yah
14:46:08 <ybyourmom> Just finished implementing my DMA manager
14:46:17 <ybyourmom> It's been 3 years of part time work
14:46:19 <ybyourmom> Jesus
14:46:52 <geist> yah dma can be complicated
14:47:06 * ybyourmom nods, and I went the UDI route
14:47:51 <ybyourmom> From here on though, my kernel is "real" in that it can actually manage the hardware on the machine competently
14:50:13 <aalm> next you can try achieving s/competently/completely/ and no more fun for you:]
14:51:29 <ybyourmom> Ideally, I'll get there with some more work
14:57:04 <ybyourmom> git push -f to master
14:57:29 <ybyourmom> I'm a pretty rebellious guy
14:58:30 <geist> yolo
14:59:11 <program> hello
15:00:08 <program> why cant i ping my os running on qemu on tap (windows) which is even bridged to real network adapter?
15:00:24 <ybyourmom> Do you have an NIC driver?
15:00:28 <program> only UDPs gets thru
15:00:51 <geist> ICMP is not UDP, so may start with that
15:00:51 <program> in my OS? yes, i can see the frames, but it seems that no ICMP gets thru
15:00:58 <geist> ah, hmm
15:01:08 <geist> do you get any ARP requests through?
15:03:26 <geist> may also want to tcpdump on the host side to make sure the arp/icmp packets are going onto the TAP
15:03:33 <geist> and it's not some routing issue
15:14:38 <heat> does anyone know if it's possible to "relink" a dynamically linked program into a statically linked program?
15:15:36 <aalm> with good enough hex editor, everything is possible=]
15:18:51 <heat> preferably without a hex editor :)
15:23:28 <bauen1> technically dynamically linked programs get linked at runtime, so it is possible (except if the program itself uses its own dynamic linking information)
15:27:44 <heat> hmm yes but I was thinking of unlinking with libc.so and linking with libc.a
15:27:47 <geist> the trouble would be doing things like combining data and bss segments and then having to go back patch every reference
15:27:55 <heat> but this is starting to sound very hacky
15:28:11 <geist> most of that information is lost when doing the first link, so you'd have to go back and reparse all of the code in the system and tweak all the references
15:28:20 <geist> so it's probably technically possible but it would be very very difficult
15:28:37 <heat> I just wanted to make dash into a statically linked executable for my initrd and a dynamically linked dash for my root filesystem
15:28:48 <heat> but I guess I'll either have to recompile it all or forget about it
15:28:58 <heat> s/compile/link
15:29:11 <geist> it seems like it could be possible to jam them together and produce an ELF binary that has multiple RX/RO/RW segments, as if the images were somewhat catted together
15:29:21 <geist> but i suspect a lot of dynamic loaders wouldn't deal with that
15:29:37 <heat> why?
15:29:51 <geist> my guess is they are somewhat hard coded to only deal with a limited number of program segments
15:30:19 <heat> from what I've read of musl's ldso it looks like it's entirely possible
15:30:31 <j`ey> why not compile twice, seems like a much simpler solution!
15:30:32 <geist> and of course you'd still need to probably do a runtime patchup of references since the binaries dont really know they're 'statically linked'
15:30:57 <geist> so all you're really doing there is bundling a bunch of elf binaries into one, so i dont think there'd be a large advantage anyway
15:31:05 <geist> you wouldn't get any codegen advantage, etc
15:31:26 <heat> hmm maybe
15:31:27 <j`ey> and you need a copy in the initrd and on the fs anyway?
15:31:39 <heat> I want to have dash in the initrd as like a fallback shell
15:32:24 <heat> and because of that I need to have libc.so and libgcc_s.so.1, which after being fully stripped of unneeded symbols and debug stuff, have a size of ~600KB
15:34:05 <geist> which you get back the instant you have a second binary
15:34:13 <geist> honestly 600KB doesn't sound too bad
15:34:38 <geist> unless you're trying to do some sort of single binary busybox thing i dont think you're going to get a lot
15:34:38 <j`ey> what about busybox?
15:34:52 <heat> the problem is that because they're in the initrd and not on the fs the other programs don't get to share memory with init
15:34:55 <geist> and plus even if you only had a single binary a sizable chunk of the 600KB would just end up inside your single static binary
15:35:00 <geist> so you may only really be saving something like 200KB total
15:35:12 <geist> ah, that's true
15:35:34 <geist> but yeah sounds like you want to compile the static musl too, which makes sense anyway
15:35:42 <geist> just takes twice as long to compile
15:35:50 <heat> I compile both
15:36:29 <heat> I'm just looking at my system and I think I'm getting too wasteful with my memory
15:36:56 <geist> that's definitely a good thing
15:37:07 <heat> probably a fair bit of it are the struct pages
15:37:23 <geist> how many bytes per?
15:37:42 <geist> most big oses use something like 32 bytes per page, which seems to be a fairly decent sweet spot
15:38:19 <heat> 56
15:38:23 <heat> :/
15:38:40 <geist> yah might want to trim on that if you can
15:38:47 <geist> but it's not completely unreasonable
15:38:55 <j`ey> what does 'struct pages' mean here?
15:39:24 <geist> j`ey: most systems have per physical page of memory (usable RAM in this case, not all of physical address space)
15:39:30 <geist> a structure
15:39:48 <geist> generally allocated up front at boot time, and more or less permanently 'wired'
15:39:57 <j`ey> ah
15:40:00 <geist> used to track where pages are allocated as the VM gets more complicated
15:40:26 <geist> early on you can generally get away with only really tracking if a page is in use or not (a bitmap) and then maybe use the page table entries to track if it's mapped or not
15:40:58 <geist> but as your VM gets more sophisticated you need to start tracking metadata about the page (accessed, dirty, what state its in, linked list node, etc)
15:41:12 <mrvn> In debian there is a busybox and busybox-static package.
15:41:25 <geist> especially when you start having more layers in the VM (pages attached to vm objects that aren't mapped anywhere, or mapped in multiple places)
15:42:32 <mrvn> If you have COW then you need metadata per page
15:42:43 <geist> but, this means, for example, that if the struct page is 32 bytes and your page size is 4096, then you essentially up front burn 32/4096 (roughly 0.8%) of usable memory with overhead
15:42:53 <mrvn> If you have shared memory you need a reference count
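To put numbers on the overhead being discussed: 32 bytes per 4096-byte page is about 0.78% of RAM, and heat's 56 bytes is about 1.37%. The sketch below is illustrative only. The `struct page` fields are a guess at the kind of metadata mentioned in the conversation (list node, owning object, reference count, state flags), not any particular kernel's layout, and it comes to 32 bytes only on a typical 64-bit target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-frame descriptor, one per physical page of RAM. */
struct page {
    struct page *prev, *next; /* free list / queue linkage */
    void *owner;              /* VM object this frame belongs to, if any */
    uint32_t refcount;        /* for shared mappings and COW */
    uint32_t flags;           /* accessed, dirty, state, ... */
};

/* Bytes consumed by the descriptor array for a given amount of RAM. */
uint64_t page_array_bytes(uint64_t ram_bytes, uint64_t page_size,
                          uint64_t desc_size)
{
    assert(page_size != 0);
    return (ram_bytes / page_size) * desc_size;
}
```

For 1 GiB of RAM with 4 KiB pages, `page_array_bytes(1ULL << 30, 4096, 32)` gives 8 MiB, and the same call with 56 gives 14 MiB.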
15:43:37 <geist> right, also there are architectures that dont have page tables like x86 or arm, and in those you cant rely on the page table as being 'durable' (ie, not going away on its own)
15:43:53 <geist> so you need metadata up front almost immediately to track what is mapped where, at least virtually
15:44:53 <mrvn> geist: But that's probably better put into address space structure that deal in segments of memory. Not per page data.
15:45:12 <geist> correct, but then within the segments you need to track within what slot what page backs it
15:45:26 <geist> which is cleanly solved by having it hold references to the physical page structure
15:45:33 <Bitweasil> ybyourmom, which DMA engine are you interfacing with?
15:45:56 <Bitweasil> progra... seriously, how do people ping timeout so badly?
15:46:14 <ybyourmom> None right now
15:46:23 <geist> that's really what i was trying to express. ie, you want the upper level VM to track what physical pages are attached to what vm object anyway, and the actual mmu mappings are lower level
15:46:26 <ybyourmom> I wrote the kernel abstractions for my DMA management
15:46:42 <Bitweasil> Oh, ok. So you have the plumbing in place to then write the hard part. :)
15:47:03 <ybyourmom> yup
15:47:34 <geist> ybyourmom: have you started to consider iommus?
15:47:56 <geist> at some point that starts to intersect with dma in a pretty serious way
15:48:10 <ybyourmom> geist: Yes, I know where they fit in
15:48:13 * Bitweasil shudders
15:48:18 <ybyourmom> But I didn't implement anything for them
15:48:58 <ybyourmom> That's a good way off tbh
15:49:10 <geist> yah dont need to implement it up front. my only input there is if your dma api is physical address based, may want to consider abstracting that behind one level of handle
15:49:21 * heat 's DMA code is just a 130 line file to get the physical memory ranges that map to the virtual range
15:49:37 <geist> ie, some sort of opaque representation of a physical buffer of memory, which will interact with iommus later on when you implement it
15:49:39 * ybyourmom nods
15:49:48 <ybyourmom> That part is already done
15:49:49 <geist> ie, you 'map' physical memory into iommu space, then DMA on top of that
15:49:51 <geist> good good
15:50:08 <ybyourmom> The core of the DMA abstraction is these handles to scatter gather lists in the kernel
15:50:16 <geist> ah perfect
15:50:27 <ybyourmom> You can transfer them between address spaces and then map/unmap them in the current owner
15:51:01 <ybyourmom> The idea behind the transfer thing is to enable zero-copy data passing from the device (NIC, etc) straight through to userspace
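A scatter-gather list of the kind ybyourmom mentions can be sketched as an array of (physical address, length) pairs. The code below is a guess at the general shape, not ybyourmom's design; `sg_entry`, `sg_build`, and `SG_PAGE_SIZE` are invented names. It takes the physical frame backing each page of a virtually contiguous buffer and merges physically adjacent frames into one entry.

```c
#include <stddef.h>
#include <stdint.h>

#define SG_PAGE_SIZE 4096u

struct sg_entry {
    uint64_t phys; /* physical start address of this run */
    uint64_t len;  /* length of the run in bytes */
};

/* Build a scatter-gather list from the physical frames backing
 * consecutive virtual pages. Physically adjacent frames are merged.
 * Returns the number of entries written, or 0 if `out` is too small
 * (or if nframes is 0). */
size_t sg_build(const uint64_t *frames, size_t nframes,
                struct sg_entry *out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < nframes; i++) {
        /* Extend the previous run if this frame follows it directly. */
        if (n > 0 && out[n - 1].phys + out[n - 1].len == frames[i]) {
            out[n - 1].len += SG_PAGE_SIZE;
            continue;
        }
        if (n == max_out)
            return 0;
        out[n].phys = frames[i];
        out[n].len = SG_PAGE_SIZE;
        n++;
    }
    return n;
}
```

Hiding such a list behind an opaque handle, as geist suggests, leaves room to substitute IOMMU-translated addresses for raw physical ones later.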
15:51:02 <geist> are your drivers user space?
15:51:06 <ybyourmom> yup
15:51:16 <geist> ah yes, this stuff becomes a big deal really fast there
15:51:26 <geist> we have a slightly different mechanism in fuchsia, but not too dissimilar
15:51:36 * ybyourmom nods
15:51:49 <heat> I don't do zero-copy anywhere in my kernel
15:52:02 <heat> it's kinda bad
15:52:34 <mrvn> When you do a microkernel that passes buffers around from process to process you really should think about something that passes the buffer through without mapping it. Unmapping it takes way too long.
15:52:39 <ybyourmom> You'll get there
15:53:14 <heat> but I really don't know what else I could do if the fs stack is "read() -> read_vfs() -> lookup_file_cache() -> ext2_read() -> ahci_read()"
15:53:41 <heat> I'll have to copy somewhere
15:53:50 <mrvn> heat: why?
15:54:08 <ybyourmom> The only place you're forced to do a copy is at the front-end read() because of the posix API design
15:54:10 <geist> mrvn: yah exactly. fuchsia/zircon lives on VM objects that is generally unrelated to mapping or not
15:54:27 <ybyourmom> The rest of it can be zero copy all the way through to the end disk device
15:54:29 <mrvn> ybyourmom: not even there
15:54:29 <geist> you can create, do all your operations, including DMA, read/write/destroy a vmo without ever mapping it
15:54:35 <heat> because while ahci_read reads directly to the file cache page, I need to copy the page to the provided buffer
15:54:44 <geist> mapping a vmo is just another operation, but not required
15:54:54 <mrvn> heat: no, you can map the file cache pages to the address of the buffer with COW.
15:55:02 <mrvn> heat: assuming properly aligned buffers
15:55:05 <heat> IF it's page aligned
15:55:06 <heat> yeah
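The condition mrvn and heat agree on can be written down as a small decision function. This is a sketch under assumptions: `choose_read_strategy` and the enum are invented, and requiring the file offset to be page aligned as well is an added assumption (file cache pages are normally indexed by page-sized offsets). If any of the three values is not a multiple of the page size, the read falls back to copying.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_PAGE_SIZE 4096u

enum read_strategy {
    READ_COPY,      /* memcpy from the file cache into the user buffer */
    READ_REMAP_COW  /* map the cache pages at the buffer address, copy-on-write */
};

/* Decide whether a read() can be served by remapping file cache pages
 * instead of copying. Only whole, aligned pages qualify. */
enum read_strategy choose_read_strategy(uintptr_t user_buf, uint64_t file_off,
                                        size_t len)
{
    bool aligned = len != 0 &&
                   (user_buf % CACHE_PAGE_SIZE) == 0 &&
                   (file_off % CACHE_PAGE_SIZE) == 0 &&
                   (len % CACHE_PAGE_SIZE) == 0;
    return aligned ? READ_REMAP_COW : READ_COPY;
}
```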
15:55:24 <mrvn> heat: the linux AIO interface requires aligned buffers. posix read() does not.
15:55:44 <heat> yup
15:56:05 <heat> But asking for page alignment is a bit too much no?
15:56:23 <azonenberg> ybyourmom: you know, you can just not be posix... :p
15:56:25 <mrvn> if you do it from the start then no. Adding that later is a pain
15:56:49 <mrvn> heat: my read() doesn't provide a buffer. it only returns one.
15:56:57 <ybyourmom> heat: when you call fopen()/open() you supply a VFS path to a file which can be used to determine what the end-device (disk) will be
15:57:03 <azonenberg> In antikernel i dont have shared memory, you move bulk data by pushing a physical address to the other node
15:57:11 <mrvn> read(fd, off, size) ==> read-only buffer
15:57:17 <azonenberg> then map it or just use it as a raw physical address as needed
15:57:28 <azonenberg> modifying page tables is a fast, unprivileged userspace operation in antikernel
15:57:34 <ybyourmom> Your kernel can take note of the fact that all read/writes on that FD are to disk0:0, and that disk0:0 has X/Y/Z DMA requirements
15:57:38 <azonenberg> because all permissions are checked on physical address
15:58:12 <ybyourmom> So, your kernel can allocate all buffer caches, etc for reads/writes on that FD from RAM that meets disk0:0's DMA requirements
15:58:32 <heat> well yeah
15:58:34 <ybyourmom> And that way you don't have to copy anything but you can just send the frames directly off to DMA without any need for bounce buffer copying
15:59:00 <heat> yes but right now I just allocate regular pages
15:59:10 <ybyourmom> Np np
15:59:24 <heat> because my page allocator isn't designed for looking for aligned pages
15:59:47 <mrvn> heat: most hardware can scatter gather 4k pages.
15:59:47 <azonenberg> Also in antikernel i have a page allocator in hardware, so a hardware device can request a page and then DMA to it without any involvement from software
15:59:48 <heat> it's just a list of lists of pages
16:00:08 <azonenberg> because the allocator guarantees nobody else is using that page
16:00:51 * geist sheds a little tear of joy that real osdev talk is being done
16:00:53 <heat> mrvn: I know, except ATA, but ATA is garbage anyway
16:01:32 <bauen1> so there's this neat trick of recursively mapping the kernel pml4 (long mode), that can't really be used for process pml4s since that would eat a lot of virtual address space, right ? So if i want to use recursive mapping i need to write extra code just for the kernel pml4 ?
16:01:54 <heat> bauen1: the recursive mapping works for the entire PML4
16:01:56 <mrvn> bauen1: address space in 64bit is basically infinite. waste it
16:02:16 <geist> it chews up 512GB
16:02:29 <geist> since that's a size of the topmost 'slot' of a 4 level page table
16:02:40 <geist> ie, 39 bits
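The 512GB figure follows from one PML4 slot covering 2^39 bytes. With 4-level paging and a recursive entry in slot R, the page table entry for any virtual address sits at a computable address inside that slot's window. A minimal sketch of the arithmetic, assuming x86-64 4-level paging with 4 KiB pages; the function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* One PML4 slot spans 2^39 bytes = 512 GiB of virtual address space. */
#define RECURSIVE_WINDOW_BYTES (1ULL << 39)

/* Sign-extend bit 47 so the address is canonical. */
static uint64_t canonical(uint64_t va)
{
    if (va & (1ULL << 47))
        return va | 0xFFFF000000000000ULL;
    return va & 0x0000FFFFFFFFFFFFULL;
}

/* Start of the window through which all page tables are visible. */
uint64_t recursive_window_base(unsigned slot)
{
    assert(slot < 512);
    return canonical((uint64_t)slot << 39);
}

/* Virtual address of the 8-byte PTE that maps `va`. */
uint64_t recursive_pte_address(unsigned slot, uint64_t va)
{
    uint64_t page_number = (va >> 12) & 0xFFFFFFFFFULL; /* va bits 47:12 */
    return recursive_window_base(slot) | (page_number << 3);
}

/* Virtual address of the PML4 itself: the slot index at every level. */
uint64_t recursive_pml4_address(unsigned slot)
{
    uint64_t s = slot;
    assert(slot < 512);
    return canonical((s << 39) | (s << 30) | (s << 21) | (s << 12));
}
```

With slot 510, the window starts at 0xFFFFFF0000000000 and the PML4 appears at 0xFFFFFF7FBFDFE000. Since the entry points at whichever PML4 is currently loaded, the same slot serves every address space.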
16:02:44 * heat just uses a big direct mapping
16:03:18 <geist> same
16:03:27 <geist> it's really convenient, if a bit of a security issue
16:03:33 <Bitweasil> mrvn, noted. mrvn, I just ran out of address space, I figured I'd just pre-allocate memory for every byte of data on the internet. :p
16:04:25 <nepugia> how would you accurately calculate the value of bytes on the internet? ;)
16:04:30 <mrvn> Bitweasil: you have 512 PML4 slots. Each recursive mapping requires one slot. Most systems reserve half the slots for the kernel. So really not a problem.
16:04:48 <heat> geist: How hard is it to design a microkernel-ish thing that isn't too slow?
16:04:59 <mrvn> define too slow
16:05:00 <Bitweasil> nepugia: while(true){porn_bytes++;}
16:05:13 <bauen1> that would limit me to ~256 processes so i guess i'll just don't use recursive mapping for simplicity
16:05:28 <heat> bauen1, what
16:05:36 <ybyourmom> Microkernels became unviable again because of the transient execution bugs
16:05:45 <heat> Basically I just want something that's fast and elegant
16:05:48 <mrvn> bauen1: huh? You use the same slot in the PML4 for the active process every time. You don't keep the inactive processes mapped.
16:06:01 <Bitweasil> heat, on x86? "fast" and "elegant" are usually opposed.
16:06:18 <ybyourmom> Meltdown's requirement for SAS kernel makes microkernels suffer a lot
16:06:23 <Bitweasil> And if you do manage, you've probably failed at "possible to explain to anyone else."
16:06:24 <heat> because I'm a bit tired of my POSIX monolithic one and I have a spare kernel that isn't really a monolithic kernel yet and is a bit more advanced
16:07:09 <bauen1> mrvn: thanks a lot for that thought
16:07:16 <mrvn> ybyourmom: huh? microkernels already do the meltdown mitigations so they don't suffer at all
16:07:50 <ybyourmom> Meltdown = you can't map the kernel in the top 1G of the address space, correct?
16:07:58 <ybyourmom> Or else the attacker can read kernel memory
16:08:03 <mrvn> bauen1: only reason you need more than one recursive slot is to copy data from one process to another. You want to be able to walk the page tables from multiple processes.
16:08:30 <heat> Like, in an ideal world, I would like to make fuchsia 2 electric boogaloo but worse
16:08:30 <ybyourmom> Putting a microkernel in a separate vaddrspace and requiring more address space switches kills microkernels more than monolithic kernels
16:08:34 <heat> because it actually does things
16:08:37 <mrvn> ybyourmom: no. you have to flush the mappings so timing attacks don't reveal what was accessed
16:08:56 <ybyourmom> I'm pretty sure that's spectre
16:09:04 <mrvn> ybyourmom: switching the address spaces already flushed so you don't need extra flushes
16:09:21 <ybyourmom> Meltdown was the permission bits not being checked properly
16:09:28 <mrvn> ybyourmom: and extra address spaces already mean the kernel parts aren't mapped
16:09:29 <ybyourmom> And you had to use KAISER
16:10:10 <heat> honest question: Don't those attacks rely on being able to handle segfaults?
16:10:36 <heat> (talking about userspace here)
16:10:52 <ybyourmom> It's difficult for me to answer because "segfault" is too broad
16:11:01 <ybyourmom> And I don't want to be snippy
16:11:44 <mrvn> heat: you avoid the segfaults by putting the access into a speculative execution branch and then time it.
16:14:08 <mrvn> ybyourmom: I think with a microkernel the kernel part itself should be so small that having that mapped won't reveal secrets. Anything secret should be in other processes.
16:15:01 <ybyourmom> mrvn: In practice, even the most dogmatic microkernels (L4, seL4, etc) have a decently expansive and rich object heap in kernel space
16:15:24 <ybyourmom> And usually userspace directs the kernel to initialize objects in there which the kernel doesn't want to expose to userspace
16:15:30 <mrvn> ybyourmom: otherwise you are right and you get the same penalty for an extra address space switch on every syscall.
16:15:42 * ybyourmom nods np
16:18:21 <heat> hmm I don't understand how spectre works
16:20:02 <ybyourmom> I really hope that this whole transient execution bugs thing doesn't actually become baked into the job description for embedded security jobs
16:20:25 <ybyourmom> heat: Sorry, I don't remember details either, it's been a while
16:20:33 <bauen1> mrvn: i would still need to map one entry of the user pml4 to itself, right ? (eg. map kernel pml4 index 510 -> user pml4 ; user pml4 index 511 -> user pml4 )
16:20:49 <bauen1> long mode page tables are kind of melting my brain
16:21:22 <heat> bauen1: there are no kernel pml4s or user pml4s
16:21:22 <mrvn> more like index 510
16:21:40 <heat> Each address space is a PML4 + other VM stuff
16:21:58 <heat> the kernel runs on whatever address space is currently loaded
16:22:47 <Bitweasil> heat, you can run a lot of the attacks in TSX transactions as well, which is faster than signal handling.
16:23:29 <Bitweasil> Doesn't PCID help a lot with microkernels and multiple page mappings in the TLB?
16:23:36 <heat> oh yeah TSX exists
16:24:02 <mrvn> Bitweasil: yes. But I think then you get meltdown/spectre problems back again
16:24:26 <Bitweasil> I don't believe that's true. Modern Linux kernels are using PCID to split the user/kernel page table mappings.
16:24:45 <mrvn> Bitweasil: and they have to employ mitigations that make things slow
16:24:48 <Bitweasil> They used to use 0x00 and 0x80, now they're doing a LRU in the low byte and using 0x000/0x800 for the user/kernel mappings.
16:25:06 <Bitweasil> To the best of my knowledge, you can't speculate between page table PCIDs.
16:25:11 <heat> has anyone looked at spectre swapgs?
16:25:25 <Bitweasil> Sure, it's yet another speculative execution trainwreck.
16:25:42 <Bitweasil> "Look at how Windows handles kernel transitions, don't do that."
16:26:30 <Bitweasil> But if you've got different PCIDs for your user/kernel mappings, remember to set bit 63 on CR3 writes, and toggle back and forth, the hit isn't nearly as bad as it was when you had to blow the TLB every kernel transition.
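The CR3 value Bitweasil describes can be composed as below. This is a sketch assuming x86-64 with CR4.PCIDE enabled: the PCID occupies the low 12 bits, and setting bit 63 of the value written to CR3 asks the CPU not to invalidate the TLB entries tagged with that PCID. `make_cr3` and the macro names are invented, and the function only builds the value; the actual `mov` to CR3 needs ring 0 and is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR3_NOFLUSH   (1ULL << 63) /* keep TLB entries for this PCID */
#define CR3_PCID_MASK 0xFFFULL     /* PCID lives in bits 11:0 */

/* Compose a CR3 value from a page-aligned PML4 physical address, a
 * 12-bit PCID, and whether existing TLB entries should be preserved. */
uint64_t make_cr3(uint64_t pml4_phys, uint16_t pcid, bool keep_tlb)
{
    assert((pml4_phys & CR3_PCID_MASK) == 0); /* must be 4 KiB aligned */
    assert(pcid <= CR3_PCID_MASK);

    uint64_t cr3 = pml4_phys | pcid;
    if (keep_tlb)
        cr3 |= CR3_NOFLUSH;
    return cr3;
}
```

Toggling between a user and a kernel PCID with `keep_tlb` set is what avoids the full TLB flush on every kernel transition mentioned above.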
16:28:54 <heat> I like how Intel and AMD absolutely shat the bed
16:29:14 <heat> how many more exploits can you get in 2 years???
16:29:43 <Bitweasil> AMD?
16:29:46 <heat> they even broke freaking virtual machines
16:29:52 <Bitweasil> They're far, far cleaner than Intel.
16:30:04 <heat> yeah pretty sure AMD isn't completely immune to this shitshow
16:30:09 <Bitweasil> They're mostly hit by the Spectre stuff that even impacts ARM, but the rest of Intel's screwups don't cross to AMD.
16:30:20 <Bitweasil> I've talked to people who do deep work on both, and AMD stops speculating when they hit something weird.
16:30:26 <Bitweasil> Intel plays through.
16:30:37 <Bitweasil> So there /is/ a difference in design philosophy between the two companies.
16:30:45 <Bitweasil> AMD definitely doesn't play through swapgs if it doesn't know how it resolves.
16:31:11 <Bitweasil> I've been trying to answer this for a while, and conversations with people far deeper in that realm than I am indicate that AMD didn't ignore the issue like Intel did.
16:31:15 <Bitweasil> For whatever that's worth.
16:32:28 <heat> I mean I prefer intel but boy did they screw up
16:33:19 <heat> does anyone know if ice lake is going to be immune to these new attacks or what?
16:36:28 <Bitweasil> Even Intel can't answer that.
16:36:38 <Bitweasil> They can't reason about their chips anymore.
16:36:46 <heat> ah yes, amazing
16:36:50 <Bitweasil> Look at how many of these attacks have violated SGX.
16:37:03 <Bitweasil> Intel's flagship, fancy, high end security feature that you can literally introspect, step by step, with uarch bugs.
16:37:11 <Bitweasil> Intel didn't know about L1TF or they would have put mitigations in.
16:37:16 <heat> they inspire confidence
16:37:19 <Bitweasil> (it's easy enough to dump L1 on an SGX transition)
16:37:30 <Bitweasil> So that tells me that either Intel can't, or won't, reason about this stuff.
16:37:54 <Bitweasil> Why would you expect a company that doesn't know their L1 cache design catastrophically breaks their enclaves to be able to actually fix this quickly?
16:38:00 <Bitweasil> Fortunately, AMD is now quite competitive.
16:38:15 <Bitweasil> And, while they do have some issues, they don't seem to be nearly so "#YOLO" as Intel in chip design.
16:38:52 <mrvn> fast, cheap, secure. pick two.
16:39:07 <heat> unfortunately intel only picked fast
16:39:20 <Bitweasil> If I give you enough money, will you lie to me and tell me I can have all three? :D
16:40:18 <heat> only if you make me CEO of intel
16:40:30 <Bitweasil> I won't run hyperthreading on Intel boxes anymore, and I have an increasing amount of non-x86 hardware.
16:40:42 <Bitweasil> This IRC connection is on an ARM box, admittedly bounced through an x86 cloud instance.
16:40:46 <j`ey> heat: how can you be sure relocate.c doesnt need relocating itself?
16:41:00 <Bitweasil> relocate_relocate_c.c :p
16:41:08 <heat> j`ey: you're talking about my code?
16:42:07 <heat> If so, since all that code and data is in .boot it's still running at the physical base address
16:42:28 <j`ey> heat: yup, in carbon
16:42:43 <heat> so there's no issue, and the bootloader doesn't pass in relocations to .boot or .percpu
16:46:15 <heat> hmm, is there a name for an OS that allows drivers both in kernel space and user space?
16:46:18 <mrvn> heat: on ARM I don't know where my .boot will be
16:46:42 <mrvn> heat: yes, garbage
16:47:03 <mrvn> aka windows
16:47:25 <heat> mrvn: why?
16:48:28 <mrvn> heat: because the idea of a microkernel is to separate things and you just broke the whole idea
16:48:49 * ybyourmom feels attacked
16:49:11 <heat> but isn't it possibly beneficial to have some drivers in the kernel?
16:49:35 <mrvn> heat: sure. That's why windows NT put the whole graphics drivers in kernel.
16:49:39 <Bitweasil> Again, that's not the point of a microkernel.
16:49:58 <heat> s/drivers/subsystem/
16:49:58 <Bitweasil> Also, I'd argue Linux does exactly that with FUSE and some of the other userspace driver capabilities.
16:51:02 <mrvn> Bitweasil: that's a rather minuscule part of linux. Not really a design.
16:51:17 <heat> hmm
16:51:39 <heat> does a microkernel necessarily imply that drivers need to be themselves separated from each other?
16:51:46 <Bitweasil> "<heat> hmm, is there a name for an OS that allows drivers both in kernel space and user space?"
16:51:52 <Bitweasil> Linux /does/ allow it.
16:52:01 <mrvn> heat: no. but if you put them all in one address space what's the point?
16:52:39 <heat> it's faster and you're still not running as the kernel?
16:52:57 <mrvn> Bitweasil: it's better to say that linux has drivers that can communicate with userspace to implement extra functionality on top
16:53:23 <heat> please don't shoot me for my stupid questions I'm trying to think up a design
16:53:32 <mrvn> heat: if you trust the drivers not to corrupt any other driver then why do you fear they will corrupt the kernel?
16:53:56 <Bitweasil> eh, ok, I won't argue with that description too much. I'm not intimately familiar with the userspace driver capabilities/limitations.
16:54:24 <ybyourmom> formally verified zero-trust distributed blockchain microkernels
16:54:35 <ybyourmom> This is the only true path
16:54:52 <Bitweasil> As long as I can use crowdsourced machine learning to verify them.
16:55:02 <mrvn> Bitweasil: fuse is a kernel driver that creates /dev/fuse and userspace talks to the kernel driver through that. It's not like you can just write a user space driver. You always also have to write a kernel driver too.
16:55:42 <Bitweasil> Sure, but even in a microkernel setup, the kernel has to handle getting the userspace driver access to the hardware in the first place.
16:55:45 <heat> mrvn: Ah, okay, makes sense. I was thinking that the real danger here was having a way to crash the *whole* system or escalate privileges
16:56:02 <heat> But you do make a fair point since crashing the driver process would crash every single driver
16:56:04 <Bitweasil> Though a lot of those use the ring 1/2 levels, don't they?
16:56:06 <heat> making it a useless kernel
16:56:17 <mrvn> heat: that too. But if one driver can overwrite the data from all other drivers that crashes the whole system.
16:56:41 <heat> Bitweasil: AFAIK ring 1 and 2 don't exist in x86_64
16:56:56 <Bitweasil> ... hm. *pulls up the SDM*
16:56:59 <mrvn> heat: and some drivers have more rights. So if you trick them to do DMA for you then you suddenly have escalated privileges.
16:57:44 <heat> mrvn, what do you mean?
16:57:57 <mrvn> Bitweasil: the difference is that a microkernel has a fixed API for drivers to get access to hardware in a general way. If you write a new driver you can use the existing interfaces. On Linux you have to write a custom user space interface first.
16:58:16 <ybyourmom> ioctl tho
16:59:03 <mrvn> heat: e.g. the GPU or SATA driver may do DMA. Now any driver can fiddle with the data structures of the GPU or SATA driver to make it DMA to somewhere you shouldn't have access to
16:59:48 <mrvn> ybyourmom: have to be implemented by every driver in kernel
17:00:07 <heat> yes but I'll assume that a driver should be trusted?
17:00:14 <ybyourmom> touche
17:00:17 <heat> or signed
17:00:30 <Bitweasil> I don't believe x86_64 dropped CPL 1, 2, or I can't find any reference to this.
17:00:35 <Bitweasil> Now, nobody /uses/ them.
17:00:46 <Bitweasil> But microkernel arch can use that to let drivers talk to IO ports.
17:00:53 <mrvn> heat: if you trust everything then why use memory protection at all? Why have user space at all. Users surely won't reload the page table, right?
17:01:24 <mrvn> heat: and while I trust my own code I don't trust myself not to have bugs.
17:01:39 <Bitweasil> mrvn, and Intel said, "Well, indeed, why should we have memory protection?" :p
17:01:45 <heat> :D
17:01:52 <heat> mrvn: hmmm, okay, makes sense
17:01:54 <heat> thanks
17:02:19 <heat> now let's start converting carbon into a microkernel!
17:02:37 <Bitweasil> And that's fair re a fixed API for driver permissions. I still would argue Linux supports userspace drivers, at least in special cases, so it's an example of one that supports both modes, but it's primarily kernel drivers, certainly.
17:02:39 <heat> Btw will people kill me if I keep certain code in the kernel?
17:02:58 <Bitweasil> Yes.
17:03:13 <Bitweasil> There's a group of microkernel enforcers that will literally hunt you down at night, in your sleep.
17:03:18 <Bitweasil> :p
17:03:25 <heat> Bitweasil: GPU drivers are a pretty good example of a user mode driver in linux
17:03:26 <heat> D:
17:03:34 <Bitweasil> (seriously, Windows and Linux both do kernel drivers, it's fine)
17:03:58 <mrvn> Bitweasil: they aren't microkernels. They do basically everything in kernel
17:04:15 <mrvn> Bitweasil: modern windows != windows NT
17:05:04 <Bitweasil> I know, that's what I'm saying. It's fine to put drivers in the kernel.
17:05:16 <Bitweasil> The microkernel enforcer line apparently needed a </s> at the end.
17:05:20 <mrvn> heat: some people also say you should use exokernels to run single applications and then have one VM per application.
17:11:33 <heat> mrvn, that's a bit hardcore no?
17:12:37 <mrvn> to each their own
19:03:44 <j`ey> my gf just described virtual memory as a 'detour', I think that's a pretty decent layman's explanation!
19:04:41 <Bitweasil> I just don't talk about computers to people much.
19:05:22 <Bitweasil> I whiteboarded out Meltdown to a guy... can't recall if it was before or after it actually got announced. Meltdown was a poorly kept secret. Anyway, he's an IT guy, just... glossed hard. :/
19:11:02 <j`ey> Bitweasil: im lucky in that my gf is open to talking about this stuff, even though she isnt a programmer heh
19:13:24 <heat> virtual memory is a detour but it's the coolest detour ever
19:14:49 <aalm> detours with traps are at least as cool imo.
19:18:27 <j`ey> I really should set up my exception tables, so qemu doesn't just go into an infinite exception loop
19:19:00 <heat> j`ey: not having an exception table can be pretty useful sometimes though
19:19:06 <heat> keep that in mind
19:19:28 <j`ey> for why?
19:19:49 <aalm> .theo
19:19:49 <glenda> A related issue is lots of people don't know anything about anything.
19:19:58 <j`ey> mostly it just makes my logs from qemu massive, if I replaced them all with a b ., it would make the logs smaller
19:23:08 <heat> j`ey: imagine you fault. if you don't have an exception handler, you crash, but if you add "-no-reboot -no-shutdown" it freezes, and lets you inspect the kernel's state
19:23:12 <heat> with gdb, etc
19:23:43 <j`ey> to qemu?
19:24:03 <heat> yes, to the command line
19:24:11 <j`ey> Im not sure if this is different on aarch64 and x86, but with mine it just gets in an infinite undefined exception loop
19:24:20 <j`ey> (with mine == on aarch64)
19:24:25 <heat> I'm not 100% sure
19:24:37 <heat> works on x86, thought it should work with aarch64
19:25:22 <Bitweasil> x86 should triple fault on an "infinite exception loop."
19:25:31 <Bitweasil> Fault, double fault, reboot.
19:26:26 <heat> yes that's the point of passing -no-reboot -no-shutdown
19:26:34 <heat> effectively you just freeze the vm
19:26:49 <Bitweasil> Ok, yeah.
19:29:24 <eryjus> j`ey, i had the same problems on arm 32-bit -- i agree that the exception tables will be critical to debugging. Take the time to query all the proper CP registers when you do so you get the right causes (not just the CPU registers)...
19:30:02 <j`ey> eryjus: QEMU gives them already as output. the main thing I want to do is avoid the infinite exception loop
19:30:11 <j`ey> (but yes, I will have to write some real handler eventually)
19:30:40 <eryjus> hmmm... the CP registers..?? something to look forward to then.
19:31:16 <j`ey> I assumed they were something like the ESR (exception syndrome) registers that aarch64 has
19:31:19 <eryjus> as in cp15
19:31:33 <heat> also keep in mind that having an exception table where code can register their handlers and recover from page faults can be fairly useful
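One way to read heat's suggestion is a table that maps "instruction that is allowed to fault" to "where to resume instead", which the page fault handler consults before giving up. The sketch below is an assumption about the shape of such a table, loosely in the spirit of fixup tables in other kernels. The names `extable_entry` and `extable_lookup` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One registered recovery point: if the CPU faults while executing the
 * instruction at fault_ip, the handler resumes at fixup_ip instead of
 * panicking. */
struct extable_entry {
    uintptr_t fault_ip;
    uintptr_t fixup_ip;
};

/* Returns the fixup address registered for fault_ip, or 0 when the fault
 * is not covered by the table and should be treated as fatal. */
static uintptr_t extable_lookup(const struct extable_entry *table,
                                size_t count, uintptr_t fault_ip)
{
    assert(table != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (table[i].fault_ip == fault_ip)
            return table[i].fixup_ip;
    }
    return 0;
}
```

In a page fault handler this would be used roughly as: look up the saved instruction pointer, and if the result is nonzero, overwrite the saved instruction pointer with it and return. A typical user is a routine that copies from user memory and wants to return an error instead of crashing.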
19:31:36 <eryjus> maybe...
19:32:18 <j`ey> welp, looks like I will need to start looking at relocating my kernel
19:49:09 <mrvn> eryjus: the first time you get a triple fault (after you got your MMU working) is the perfect time to implement the double fault handler.
20:17:01 <eryjus> mrvn: for x86 for sure; j`ey is working on ARM.
20:26:16 <mrvn> eryjus: different name, same thing
20:27:03 <mrvn> You want to dump the CPU and CP registers to the serial and maybe add a stack backtrace.
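For the aarch64 case j`ey mentions, much of the "cause" information lives in the exception syndrome register. Below is a small sketch of splitting an ESR_EL1 value into its exception class (bits 31:26) and syndrome (bits 24:0). The helper names are made up for illustration and the name table covers only a handful of classes.

```c
#include <assert.h>
#include <stdint.h>

/* Exception Class: ESR_ELx bits [31:26]. */
static inline uint32_t esr_ec(uint64_t esr)
{
    return (uint32_t)((esr >> 26) & 0x3F);
}

/* Instruction Specific Syndrome: ESR_ELx bits [24:0]. */
static inline uint32_t esr_iss(uint64_t esr)
{
    return (uint32_t)(esr & 0x1FFFFFF);
}

/* Human-readable name for a few common exception classes; everything
 * else is reported as "other" in this sketch. */
static const char *esr_ec_name(uint32_t ec)
{
    assert(ec <= 0x3F);
    switch (ec) {
    case 0x00: return "unknown reason";
    case 0x15: return "SVC from AArch64";
    case 0x20: return "instruction abort, lower EL";
    case 0x21: return "instruction abort, same EL";
    case 0x22: return "PC alignment fault";
    case 0x24: return "data abort, lower EL";
    case 0x25: return "data abort, same EL";
    case 0x26: return "SP alignment fault";
    default:   return "other";
    }
}
```

A handler would read the register with `mrs` and print `esr_ec_name(esr_ec(esr))` together with ELR_EL1 and FAR_EL1 over the serial port.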
20:57:09 <belgarten> what happens when an external interrupt is triggered but the entry in the idt isn't present? Is the interrupt just ignored?
21:00:43 <bcos_> belgarten: You get a general protection fault
21:02:19 <belgarten> bcos_: thanks
21:54:25 <grimondo> Hi - anyone got any good information about how to read the clock frequency as a guest in qemu+kvm? Most of the documentation I see refers to cpuid flags that don't get exposed in qemu. How is this commonly achieved in qemu? Thanks!
21:56:26 <isaacwoods> grimondo: you can add some flags and it'll present the leaves, give me a sec
21:57:21 <isaacwoods> I pass -cpu vmware-cpuid-freq,invtsc
22:01:47 <grimondo> huh, let me look into that then - thanks
22:02:43 <bcos_> grimondo: Just assume that CPU clock frequency is a random number in the range from 0 to "host_max_clock - VM_overhead" that will change immediately after you determine what it was
22:06:43 <geist> grimondo: on x86 you can read it out of some hypervisor specific fields in cpuid
22:06:48 <geist> see around 0x4000.0000 leaf
22:07:06 <geist> that's reserved for hypervisor stuff, and there is a set of KVM specific fields that you can read and parse, including the tsc rate
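A rough sketch of what parsing those hypervisor leaves can look like. It assumes the register values were already fetched with CPUID (for example via a compiler intrinsic), that leaf 0x40000000 returns a 12-byte vendor signature in EBX:ECX:EDX, and that the timing leaf enabled by isaacwoods' `vmware-cpuid-freq` flag is 0x40000010 with the TSC frequency in kHz in EAX. That last point is the VMware convention as I understand it, so check it against the QEMU and Linux sources. All function and macro names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HV_CPUID_BASE      0x40000000u  /* start of the hypervisor-reserved range */
#define HV_CPUID_FREQ_LEAF 0x40000010u  /* assumed VMware-style timing leaf */

/* Unpack the 12 signature bytes returned in EBX, ECX, EDX (in that order,
 * least significant byte first) into a NUL-terminated string. */
static void hv_signature(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
    assert(out != 0);

    const uint32_t regs[3] = { ebx, ecx, edx };
    for (int r = 0; r < 3; r++) {
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFF);
    }
    out[12] = '\0';
}

/* The timing leaf reports kHz; convert to Hz without overflowing 32 bits. */
static uint64_t khz_to_hz(uint32_t khz)
{
    return (uint64_t)khz * 1000u;
}
```

Under KVM the signature is expected to read "KVMKVMKVM", padded with zero bytes. The maximum supported hypervisor leaf comes back in EAX of the base leaf, so a guest should check that it is at least 0x40000010 before trusting the frequency value.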
22:08:24 <bcos_> Hrm
22:08:39 <geist> i forget where the docs are, but i think it's in the linux source
22:09:05 <bcos_> Actually; most VMs support "live migration" now so it'd be more like a random number from 0 to "highest_host_max_clock_that_will_ever_exist - VM_overhead"
22:14:39 <grimondo> great thanks geist, bcos_ and isaacwoods :)
22:17:27 <bcos_> ...like, maybe next year Intel will release a CPU that (under some circumstances) can turbo-boost itself up to 10 GHz (but where all instructions take 3 times as many cycles so it's about the same as a 3.333 GHz now); and someone will migrate your VM to one of those..
22:18:32 <bcos_> :-)
22:26:26 <griddle> How should I go about implementing nested fs mounts
22:27:02 <griddle> Should I just have a list of mount points to filesystems, then when you want a file you search that list to find the filesystem, or should it be a tree where a filesystem contains another filesystem
22:44:50 <mrvn> griddle: yes
22:45:03 <mrvn> except s/list/hashtbl/
22:45:44 <mrvn> Having search and list in the same sentence usually means you are doing it wrong.
22:46:32 <mrvn> Note: Having a list of mounted filesystems wouldn't be nested
23:04:04 <mrvn> bcos_: doesn't the hypervisor send some signal when the VM is migrated?
23:15:03 <bcos_> mrvn: Not sure, maybe - would be unlike a real computer that it's trying to emulate though
23:26:36 <Nuclear_> mrvn: aphorisms are always wrong :)
23:27:24 <Nuclear_> it all depends on the amount of elements you expect
23:27:50 <Nuclear_> searching a list of mounts is absolutely fine. you'll never have to search more than a few tens of elements at most
23:28:09 <Nuclear_> and closer to 2-5 elements in the common case
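To make the flat-table option from griddle's question concrete, here is a sketch that resolves a path by picking the mount with the longest matching prefix, which is enough to make a mount inside another mount behave as expected. It uses the plain linear search Nuclear_ argues is fine for a handful of entries; swapping in a hash table as mrvn suggests would change only the lookup. `struct mount` and `find_mount` are illustrative names, and paths are assumed to be absolute and already normalized.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct mount {
    const char *path;     /* absolute mount point, no trailing slash except "/" */
    const char *fs_name;  /* stand-in for a pointer to the real filesystem object */
};

/* Return the mount whose mount point is the longest prefix of path that
 * ends on a path-component boundary, or NULL if nothing matches. */
static const struct mount *find_mount(const struct mount *mounts, size_t count,
                                      const char *path)
{
    assert(path != NULL && path[0] == '/');

    const struct mount *best = NULL;
    size_t best_len = 0;

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(mounts[i].path);

        if (strncmp(path, mounts[i].path, len) != 0)
            continue;
        /* "/mnt" must match "/mnt" and "/mnt/x" but not "/mntfoo". */
        if (len > 1 && path[len] != '\0' && path[len] != '/')
            continue;
        if (best == NULL || len > best_len) {
            best = &mounts[i];
            best_len = len;
        }
    }
    return best;
}
```

With mounts at "/", "/mnt" and "/mnt/usb", a lookup of "/mnt/usb/a.txt" lands on the "/mnt/usb" entry, and the remainder of the path is what gets handed to that filesystem.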
23:40:23 <isaacwoods> hmm, does anyone know the llvm inline assembly (on intel mode) equivalent of `mov r10, [gs:0x8]`