Search logs:

channel logs for 2004 - 2010 are archived at ·· can't be searched

#osdev2 = #osdev @ Libera from 23may2021 to present

#osdev @ OPN/FreeNode from 3apr2001 to 23may2021

all other channels are on OPN/FreeNode from 2004 to present

Tuesday, 24 January 2023

02:39:00 <wxwisiasdf> Hiiiiii
02:39:00 <wxwisiasdf> today is the day we consume RISCV 64 and embrace the greatness of RISCV 128
03:59:00 <mrvn> You went from riscv 32 to riscv 64 and it wasn't enough. What makes you think doing the same again will be any better? Come on, go up to the next operand. 64 * 64 = riscv 4096
03:59:00 <mrvn> 64^2
04:44:00 <wxwisiasdf> mrvn: riscv 96-bit
05:31:00 <geist> i think there is a prototype riscv128 in work though, i should dig up infos on it to see
06:02:00 <sham1> What reason would there be
06:03:00 <sham1> What would be the use of a 128 bit ISA? I mean, okay, arithmetic, but other than that
06:04:00 <geist> well, the arithmetic could be a thing
06:06:00 <moon-child> I thought it was just about addr space
06:06:00 <moon-child> I mean, you could do just 128 bit arithmetic and smaller address space. That would be fine imo
06:06:00 <moon-child> (though not too useful in practice--multiword arithmetic is fine when you need it)
06:07:00 * geist nods
06:09:00 <geist> but a 64bit aspace we're fairly close to exhausting in some extreme situations, so i can see extending that out in a natural way to be a thing to consider
06:15:00 <moon-child> which conditions?
06:15:00 <moon-child> I mean, you could have more than 2^64 bytes of data
06:15:00 <geist> big ass machines
06:15:00 <moon-child> but it's not clear to me that you can practically exceed a 64-bit address space
06:15:00 <geist> also mapping large storage things into the aspace
06:16:00 <moon-child> and you don't want to double the size of your regular pointers
06:16:00 <geist> it's already enough that arm and x86 are extending from 48 to 57, etc
06:16:00 <AttitudeAdjuster> moon-child: bring back weird segment addressing of the old days maybe?
06:16:00 <geist> i'm not saying it's something anyone needs right now, but in 10-20 years easy
06:16:00 <AttitudeAdjuster> 16bit segment pointer with 64bit addr pointer
06:16:00 <moon-child> that's my point; the considerations for large storage things are different than for main memory. And doubling the size of all pointers seems like a bad tradeoff
06:16:00 <moon-child> AttitudeAdjuster: pls
06:16:00 <geist> so the riscv folks at least left in a nice forward compatibility mechanism
06:17:00 * geist shrugs
06:17:00 <AttitudeAdjuster> moon-child: fine i'll see myself out :'(
06:17:00 <AttitudeAdjuster> jk
06:19:00 <moon-child> I wonder how many captchas on shady sites are fronts for captcha solving services
06:19:00 <moon-child> considering they get to sell a solved captcha and still get the benefit of a regular captcha
06:47:00 <sham1> moon-child: doesn't even need to be a shady site. reCaptcha and thus nowadays Google does it even now to get training data
06:48:00 <moon-child> obviously. That's different
07:20:00 <epony> that implies that GOOG is not a shady business operation.. but it is
07:21:00 <epony> nothing that happens in the inside is well understood and verifiable or validity evaluated by the public, it does some things that people speculate about and that's it
07:24:00 <epony> the primary purpose of that is to rate limit the concurrency overloaded (enum) n:M (many) problem of servers that everyone uses, but that "concentration" is not really natural or meaningful, it's artificial (and not very intelligent)
07:28:00 <epony> it only obstructs regular users, not intentional violators of policies and limitations, nor business nor criminals, not mechanised and serviceable solvers and bypasses, as with copy protection and copyright and patents (and other "intellectual property") in general.. and GOOG steals secrets from your computers, that's why it's banned in research and development institutions and facilities outside USA (for example in German universities and other places)
07:28:00 <geist> can you just stop
07:28:00 <epony> yes
07:29:00 <geist> then please do
07:29:00 <epony> ok
08:15:00 <dinkelhacker> does anyone know how to make qemu start at EL3?
08:17:00 <dinkelhacker> nvm, found it -machine secure=on,virtualization=on
08:18:00 <geist> bingo. yep
08:18:00 <geist> also means it won't emulate PSCI or whatnot, that's now your job (if you want to)
08:20:00 <dinkelhacker> as I don't know what it is I think I don't need it right now :D
08:22:00 <geist> yeah, if you just use virtualization=on you start at EL2 though
08:22:00 <geist> with PSCI emulated at a pseudo EL3
09:57:00 <ddevault> is this slide comprehensible
09:57:00 <zid`> not really
09:58:00 <zid`> Usually you'd put blank lines in and make a sort of diagram showing the switches
09:58:00 <ddevault> blank lines?
09:59:00 <zid`> making y the time axis
09:59:00 <ddevault> Y is the time axis here
09:59:00 <zid`> so like your (blocked) lines
09:59:00 <ddevault> but, hm
09:59:00 <ddevault> maybe a table is better than an enumeration
09:59:00 <zid`> your thing does two things at the same time on multiple rows, so it isn't cpu time on y
10:00:00 <ddevault> fixed some of the timing issues
10:00:00 <ddevault> could be multiple cores, the key is not CPU time but task states
10:00:00 <zid`> I'd say just outright remove (blocked)
10:00:00 <zid`> it's just making the screen busier
10:00:00 <zid`> -- at best
10:01:00 <zid`> I'd get rid of line 12 for similar reasons
10:02:00 <zid`> and is task 1 line 6 doing anything?
10:02:00 <zid`> seems like it could be folded into 5, and remove another 'two things on same line' case
10:02:00 <ddevault> latest
10:02:00 <ddevault> not sure what these numbers refer to after several edits
10:02:00 <zid`> same
10:05:00 <ddevault> any better?
10:05:00 <zid`> much nicer
10:05:00 <zid`> I prefer the old colours I think though
10:06:00 <zid`> no idea what they were trying to express
10:06:00 <zid`> but they were prettier
10:06:00 <ddevault>
10:06:00 <ddevault> orange is kernel, black is userspace
10:06:00 <ddevault> to be explained in narration
10:12:00 <ddevault> here's the whole slide deck, still not done expanding it for the full hour slot
10:16:00 <dinkelhacker> So if one compiled with -pic for pa space and you switch to va space you can't just write the va offset to the pc and sp, right? I mean it works as long as you don't have any static function pointer arrays which will contain the pa addresses...
10:19:00 <zid`> well if it's pic it's pic
10:19:00 <zid`> if it's not pic it's not
10:19:00 <zid`> tautology best ology
10:20:00 <zid`> If it's PIC, you can.. position it wherever you want, if it's not, you cannot
10:24:00 <dinkelhacker> I'll have to check later but I think I compiled with -fpic
10:36:00 <dinkelhacker> zid`: even if a global array contains pointers to functions? I mean the addresses stored in memory can only be one value?
10:39:00 <zid`> you'd need to process the GOT for that
10:39:00 <zid`> and do relocations
10:43:00 <dinkelhacker> Hmm.. seems like it would be much easier if the bootloader already sets up the vaspace and you directly compile the kernel for that?
10:44:00 <zid`> I get it easy because I use two binaries
10:44:00 <zid`> I turn the mmu on and jump to the pre-prepared kernel binary built to run at a specific VA
10:44:00 <zid`> achievable with a linker script fine though
10:44:00 <zid`> even as a single binary
10:46:00 <dinkelhacker> as a single binary? How? Tell the linker that this one part of the code is at pa and the rest at va?
10:46:00 <zid`> it's just two sections with two different virtual addresses
10:47:00 <zid`> . = 1M; .text.low : { bootstrap.o } . = -2GB; .text.high : { kernel.o } or such
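[editor's note: zid`'s one-liner, expanded into a full hypothetical script; section and object names are illustrative.]

```ld
ENTRY(_start)
SECTIONS
{
    /* low half: linked (and loaded) at 1M, runs before the MMU is on */
    . = 1M;
    .text.low : { bootstrap.o(.text*) }

    /* high half: linked at -2GB, runs once the MMU maps it there */
    . = 0xFFFFFFFF80000000;
    .text.high : { kernel.o(.text*) }
}
```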
10:49:00 <dinkelhacker> bootstrap.o would be at a physical address, then you turn on the mmu and jump to kernel.o which is at a virtual address? I don't get the "two different _virtual_ addresses" part.
10:50:00 <zid`> VA = PA
10:50:00 <zid`> you can still consider it a virtual address
10:50:00 <zid`> it's just identity mapped until the mmu is on
10:50:00 <zid`> your code doesn't give a shit about the physical address, just which virtual address things are visible through
10:51:00 <dinkelhacker> okay but wouldn't the image grow if they are far apart?
10:51:00 <zid`> We're only changing the virtual addressing
10:52:00 <zid`> the physical is still the load address of the ELF (1MB for me, text.low would be at like 0x1001000 and text.high would be at 0x1002000)
10:53:00 <dinkelhacker> and what exactly tells the linker that you are changing the virtual address?
10:54:00 <zid`> . =
10:54:00 <zid`> I made a test setup I can show you
10:58:00 <zid`>
10:58:00 <bslsk05> ​zid/test_va - Example (0 forks/0 stargazers)
10:58:00 <zid`> There
11:00:00 <zid`>
11:00:00 <zid`> f() and g() both know which address they will be running from, as shown by the disassembly
11:02:00 <zid`> you can also use AT() to disjoint what ends up in the program headers, if needed
11:02:00 <zid`> or >
11:04:00 <dinkelhacker> thx! I'll take a look. I thought I did that at some point and it ended up growing my image a lot. But now that you explained it I don't know why it should.
11:05:00 <zid`> you did . inside the {}
11:05:00 <zid`> so you had 'start of section at x, end of section at y'
11:06:00 <zid`> so it had to pad it
11:06:00 <zid`>
11:07:00 <zid`> That's a weird binary that says .text.low will be in physical memory at 10M but expects to run from 1M, and .text.high will be in physical memory at 20M but expects to run at 128M
11:08:00 <zid`> I have a 1M = 1M, and a 1.1M = 510TB for my actual thing, the 1M=1M low code runs with paging disabled, I use it to set up the 510TB -> 1.1MB mapping, then jump to 510TB
11:08:00 <dinkelhacker> oh okay! I think I got it
12:30:00 <dinkelhacker> zid`: thx, btw ;)
12:35:00 <ddevault> final slide deck
13:11:00 <dinkelhacker> zid`: ok so I've actually done it like you mentioned. But when I create a binary I use `objcopy -O binary out.elf out.img`. The -O binary makes it bigger actually
13:13:00 <dinkelhacker> at least when I have sections with addresses far apart. Without that my binary is actually smaller (30k instead of 250k)
13:22:00 <heat> dinkelhacker, how does your linker script look?
13:23:00 <heat> for a regular elf if you start jumping around the vaddr when objcopying to binary you are forced to have padding
13:24:00 <heat> so PHDR [1MiB, 2MiB], PHDR [4MiB, 4MiB + 4] will objcopy to ~3MiB + 4 bytes
13:25:00 <heat> sorry, not vaddr but probably paddr
13:27:00 <zid`> unrelated
13:34:00 <zid`> honestly the phys field in an elf loader is *incredibly* rarely useful
13:37:00 <dinkelhacker> heat: it looks like so
13:37:00 <bslsk05> ​ ENTRY(_start)__stack_core_0 = 0x160000 - 0x10000;__stack_core_1 = 0x1600 -
13:39:00 <dinkelhacker> so if I `objcopy -O binary` that I get roughly 30k. Without -O binary I have 250k.. just tried running that on the pi which did not work
13:41:00 <zid`> show readelf -l
13:41:00 <zid`> end is a bit of a mess btw
13:42:00 <dinkelhacker> I mean probably because without that its just an ELF file right? The pi expects a binary?
13:42:00 <zid`> . = align(4096); . = align(4096); bss_end = .; end = .;
13:42:00 <heat> I don't know. maybe?
13:42:00 <heat> they usually expect a flat binary
13:42:00 <heat> but idk about the pi
13:42:00 <zid`> idk what pi expects, qemu can probably deal with elf at least
13:42:00 <zid`> but, show readelf -l
13:42:00 <heat> btw, let me guess, your elf has debug info/syms
13:42:00 <heat> :))
13:43:00 <zid`> /DISCARD/ ho
13:44:00 <heat> btw, quick linker script tips: you can ALIGN(4096) when declaring your sections (like .text ALIGN(0x1000) : ...), you should do *(.data*), *(.text*) because the compiler sometimes generates stuff like, etc
13:45:00 <zid`> .text.startup
13:45:00 <zid`> is a classic
13:45:00 <zid`> and arm type devices always have a bunch of weird shit
13:45:00 <dinkelhacker> yeah qemu can but not the pi.. so that's my problem, I can't compile it in a way where I have some code in the pa space and some in the va space to get around the problem I had when switching to va space
13:45:00 <zid`> like .data.constpool.rel8
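[editor's note: heat's and zid`'s tips, put together as a hypothetical fragment; the section list is not exhaustive.]

```ld
.text ALIGN(0x1000) : { *(.text*) }     /* wildcard also catches .text.startup */
.rodata ALIGN(0x1000) : { *(.rodata*) }
.data ALIGN(0x1000) : { *(.data*) }     /* ...and .data.constpool.rel8 etc. */
.bss ALIGN(0x1000) : { *(.bss*) *(COMMON) }
```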
13:45:00 <zid`> dinkelhacker: readelf -l plskthx
13:45:00 <heat> dinkelhacker, why can't you
13:46:00 <zid`> You just need to make the rom be what the elf would be sans header, which will probably just be.. to do nothing besides move . around
13:47:00 <dinkelhacker> heat: bc. the binary would be huge?
13:47:00 <heat> it would not
13:47:00 <zid`> no, you're confusing file offsets with virtual addresses
13:47:00 <heat> you just need to do it properly
13:47:00 <zid`> file offsets should be linear and packed, you use the mmu to map some of that file into memory at high addresses
13:47:00 <heat> ELF supports vaddr != paddr
13:48:00 <zid`> paddr doesn't even matter here, heat
13:48:00 <zid`> we won't be using a physical loader for the elf
13:48:00 <heat> in the linker script you can use AT(...) to set up the paddr for your sections
13:48:00 <zid`> if paddr mattered, which it won't
13:48:00 <heat> wdym "we won't be using a physical loader for the elf" ?
13:48:00 <zid`> elf will be flashed to a rom or whatever
13:49:00 <zid`> nobody is then going to 'load' the elf section by section into physical memory
13:49:00 <zid`> it'll get splatted there in an -O binary blob
13:49:00 <clever> but objcopy to .bin, uses paddr rather than vaddr when laying out sections
13:49:00 <zid`> which is why you should ignore it
13:49:00 <clever> so you may have a gig of gap in the vaddr, but no gap in paddr
13:49:00 <zid`> just shove everything into .text starting at 0 if you're blobbing
13:49:00 <heat> sure, but you need to do it properly to get a usable ELF you can easily objcopy or use for debugging, etc
13:50:00 <zid`> ignore paddr, let the linker sort it out
13:50:00 <zid`> yea that tracks
13:50:00 <clever> zid`: the difference matters most in XIP targets, where you want the linker to put .data in ram, but objcopy to put .data into the ROM with .text
13:50:00 <dinkelhacker>
13:50:00 <bslsk05> ​ Elf file type is EXEC (Executable file)Entry point 0x160000There is 1 progra -
13:50:00 <zid`> clever: that's a loader
13:51:00 <zid`> jesus lol
13:51:00 <zid`> oh, -ffunction-sections?
13:51:00 <heat> what the fuck
13:51:00 <heat> why are you putting everything in one phdr?
13:51:00 <zid`> I mean, that's what I do, unless like you said, I need to run it through other tools
13:51:00 <zid`> like deboogers
13:51:00 <clever> looks like the linker script didnt merge .text.* into .text
13:52:00 <heat> it did not
13:53:00 <zid`> but I use the wildcard :p
13:53:00 <heat> dinkelhacker, btw your load address is bogus
13:53:00 <zid`> .text : { *.o (.text*); }
13:53:00 <zid`> --wide exists also btw
13:53:00 <dinkelhacker> guys I can't follow anymore >D
13:53:00 <kaichiuchi> hi
13:53:00 <zid`> sections go in, sections go out
13:53:00 <heat> ok everyone shut the fuck up
13:53:00 <heat> including kaichiuchi
13:53:00 <heat> fuck you
13:54:00 <kaichiuchi> fuck you too
13:54:00 <zid`> oh I have him ignored, makes sense
13:54:00 <heat> <3
13:54:00 <kaichiuchi> zid`: me!
13:54:00 <kaichiuchi> ?
13:54:00 <heat> dinkelhacker, pressing concerns: your load address is nothing a rpi will ever load
13:54:00 <zid`> pfft I checked logs he was being fine
13:55:00 <heat> dinkelhacker,
13:55:00 <bslsk05> ​ documentation/boot.adoc at develop · raspberrypi/documentation · GitHub
13:55:00 <kaichiuchi> wonder why zid` would ignore me
13:56:00 <kaichiuchi> i’d love to target an OS to rpi
13:56:00 <heat> dinkelhacker, I've also heard "0x80000 for older 64-bit kernels ("arm_64bit=1" set, flat image)"
13:56:00 <gog> hi
13:56:00 <kaichiuchi> hi
13:56:00 <heat> hell
13:56:00 <gog> did i miss some drama
13:56:00 <heat> no
13:56:00 <kaichiuchi> no
13:56:00 <gog> boring
13:56:00 <clever> heat: you can also just set kernel_address= in config.txt to force a certain load addr, as long as it doesnt conflict with other parts
13:57:00 <heat> dinkelhacker, TLDR your load address makes no sense and that would explain it if your thingy doesn't work
13:57:00 <heat> clever, any insight into the load address spaghetti fuckery?
13:57:00 <clever> i would need to see the linker script
13:57:00 <dinkelhacker> heat: The binary the pi will load is a small bootloader I have on the sd card which allows me to send the actual binary via uart. However, this bootloader has the same load address. I mean, the pi just loads the binary to address 0x80000 and jumps to it.
13:58:00 <dinkelhacker> and it works fine ... i don't think the load address matters at all for the pi?
13:59:00 <clever> dinkelhacker: it's more that if you're not writing PIC code, and your binary is loaded to a different address from where you linked it, things malfunction in fun ways
13:59:00 <heat> yes, the address you link your binary to run at needs to more or less match or here be dragons
14:00:00 <clever> but if the bootloader is loading your binary to 0x160000, it should be fine
14:00:00 <heat> s/more or less//
14:01:00 <dinkelhacker> yeah that is what I'm saying. The bootloader is linked to 0x80000, the pi loads that and executes it, and it loads the binary to 0x160000
14:01:00 <sham1> PIC without IP-relative addressing seems "fun". I hope that ARM has that
14:03:00 <clever> dinkelhacker: what is not working?
14:04:00 <dinkelhacker> but I still don't get how i should have one linker script where one code section is at 0x160000 and one at 0x40000000, and objcopy that to a flat binary without it being like 1GiB in size?
14:04:00 <clever> dinkelhacker: thats what AT and paddr is for, to tell objcopy how to layout the things in the .bin, something else (mmu or memcpy) then has to move them to the "right" addr later
14:06:00 <heat> sham1, i think arm and riscv are mostly PIC
14:07:00 <clever> ive recently been looking into the encoding more, `b label` is always PC-relative, but bits 0/1 of the addr are missing, because the target must be 32bit aligned
14:07:00 <clever> but if you're jumping to something that might be thumb, you need `bx r0`, and now you need to get the addr into r0 first, `ldr` is typical, but that's not usually PIC
14:08:00 <clever> and i vaguely remember an `adr` opcode, that is basically just `r0 = pc + offset`
14:08:00 <heat> dinkelhacker,
14:08:00 <bslsk05> ​ Onyx/linker.ld at master · heatd/Onyx · GitHub
14:09:00 <heat> this linker script has code at 16MiB and -2GiB (almost +256TiB)
14:09:00 <heat> as you may guess, I don't get a 256TiB blob :))
14:10:00 <dinkelhacker> I thought AT was irrelevant? o.O
14:10:00 <dinkelhacker> but that makes a lot more sense ^^
14:10:00 <clever> objcopy uses the paddr, AT sets the paddr
14:11:00 <dinkelhacker> Ok... that makes sense... THANK YOU
14:11:00 <sham1> I dislike this immensely. Why would you physically link your kernel at 16MiB heat
14:12:00 <sham1> Just make a separate thing that puts your kernel at -2TiB vaddr from the outset
14:21:00 <zid`> 16MB gives you loves of space for activities underneath
14:22:00 <zid`> lots
14:22:00 <zid`> like, 128 stacks
14:25:00 <clever> dinkelhacker: also, a handy trick, if you pass qemu a .elf file, it will respect the load addresses (i forget if its paddr or vaddr) and i think the entry-point
14:25:00 <clever> dinkelhacker: so you could skip your bootloader when in qemu
14:26:00 <dinkelhacker> I do that.
14:42:00 <dinkelhacker> how do you actually set the load address?
14:43:00 <clever> dinkelhacker:
14:43:00 <bslsk05> ​ gba-template/linker.ld at master · cleverca22/gba-template · GitHub
14:43:00 <clever> this defines various regions of memory that the linker should know about
14:43:00 <clever> and the > later on, says which region a section belongs in
14:44:00 <clever> .data has a vaddr within iwram, but a paddr within rom
14:45:00 <clever> any time c/asm refers to a symbol in .data, it will get the vaddr of the symbol
14:45:00 <clever> but objcopy -O binary, uses the paddr
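[editor's note: the region/AT> mechanics clever describes, sketched with the usual GBA addresses as an assumption based on the linked template.]

```ld
MEMORY
{
    rom   (rx)  : ORIGIN = 0x08000000, LENGTH = 32M
    iwram (rwx) : ORIGIN = 0x03000000, LENGTH = 32K
}
SECTIONS
{
    .text : { *(.text*) } > rom   /* vaddr == paddr, runs in place (XIP) */
    /* symbols get iwram vaddrs, but the bytes are stored in rom;
       objcopy -O binary lays this out at its paddr, and startup code
       must copy it from rom to iwram before use */
    .data : { *(.data*) } > iwram AT> rom
}
```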
14:47:00 <dinkelhacker> So my load address was weird because I never set a paddr in the linker script?
14:47:00 <clever> there was probably a default load addr somewhere
14:48:00 <clever> or you're using that trick others do, where they just shove 16mb of zeros into the .o file, via .space
14:48:00 <clever> and praying it all lines up
14:48:00 <dinkelhacker> where could that default load address be?
14:49:00 <clever> somewhere in the binutils source
14:49:00 <dinkelhacker> kk
14:50:00 <dinkelhacker> and that load address would normally be used by a proper loader?
14:51:00 <clever> when using a .bin file, the load address is basically lost
14:51:00 <clever> objcopy just gives you a binary, that spans the lowest addr to the highest addr
14:51:00 <clever> and its your responsibility, to ensure its loaded at the addr the linker was expecting
14:58:00 <clever> dinkelhacker: another option, is to just implement elf in the bootloader, and send it that
14:58:00 <clever> then the bootloader will respect the elf headers
14:58:00 <dinkelhacker> Yeah that lines up with what I knew. I think I just got completely confused bc. I didn't know about the paddr/vaddr/objcopy thing.
14:59:00 <dinkelhacker> Maybe once I have usb running. sending more than a couple of KB via uart is so slow
14:59:00 <dinkelhacker> heat: thx for the link of your linker file. That helps!
15:00:00 <zid`> Imagine having more than a couple of kb of project
15:00:00 <clever> i implemented xmodem a while back, for loading an entire .elf
15:00:00 <clever> it wound up taking 2 minutes to load
15:00:00 * zid` hides his 3.2MB kernel image
15:00:00 <clever> so i went back to using the official netboot
15:00:00 <zid`> It's water weight I swear
15:00:00 <clever> dinkelhacker: which reminds me, you can just have your rpi boot kernel.img over tftp, 100mbit or more!
15:00:00 <zid`> It *might* be the giant background bmp.
15:01:00 <sham1> I wouldn't imagine XModem being particularly fast
15:01:00 <dinkelhacker> zid`: I bet you don't send it via uart^^
15:01:00 <zid`> who needs a uart when you have have a background bmp
15:01:00 <clever> sham1: yeah, all it added was error detection and retry
15:02:00 <clever> i was also only running at 115200 baud
15:02:00 <zid`> lto breaks --wide even, stupid lto
15:02:00 <clever> but i have ran at 1,000,000 baud before, and could have tried that
15:03:00 <zid`> .gnu.lto_ayame.0.9b1d301769837a9b
15:03:00 <zid`> good section name
15:05:00 <sham1> Thanks .gnu
15:09:00 <clever> dinkelhacker: have you looked into the netboot on the pi yet? it works on every model
15:09:00 <clever> (that has ethernet)
15:10:00 <dinkelhacker> no i haven't
15:10:00 <clever> it lets you just throw start(4).elf + kernel.img onto a tftp server, whack reset on the pi, and boom, its running
15:10:00 <clever> no need to swap SD cards, no need to wait on uart
15:11:00 <dinkelhacker> seems like I tend to set life difficulty to `hard`....
15:11:00 <dinkelhacker> "wack reset" please don't tell me it has a reset button...
15:11:00 <clever> it has a reset pin
15:12:00 <clever> in the past, ive wired it to a giant arcade console button
15:12:00 <clever> so i can just smack it every time the build is done
15:12:00 <clever> but lately, ive wired reset to a pin on my uart adapter
15:12:00 <dinkelhacker> I've set up an interrupt on one of the gpios which i trigger through openocd and then let the watchdog timeout
15:13:00 <clever> openocd could also just halt the arm, and then write to the watchdog
15:14:00 <clever> and with the arm halted, it cant fight back!
15:14:00 <dinkelhacker> funny you say that ... i realized that today and tried it, which seems to segfault my openocd version
15:15:00 <clever> that sounds like a bug in openocd
15:15:00 <clever> also beware of the arm mmu, you might need to turn it off, if you're not sure where the mmio is mapped
15:17:00 <clever> also, with just 3 opcodes (and knowing which registers can be clobbered), you can write a single byte to the uart
15:17:00 <dinkelhacker> Follow up question on the linker topic: So now I get how I can compile the code so that one portion uses pa and the other va. If one part calls a function that normally lives in the other world, that won't work? Or will it with PIC?
15:18:00 <clever> in the past, ive made a putc ASM macro, so i could just print a char anywhere, to debug things
15:18:00 <zid`> It's all VA.
15:18:00 <zid`> Just sometimes the mmu is disabled, such that PA=VA
15:18:00 <zid`> (or identity mapped)
15:18:00 <dinkelhacker> okay right
15:19:00 <clever> the linker always acts on vaddr, and assumes the vaddr is always right
15:19:00 <clever> so when the mmu is off, you need to ensure the binary is loaded to that vaddr (or exclusively use PIC asm)
15:19:00 <clever> when the mmu is on, you then need to ensure the binary is mapped to that vaddr
15:20:00 <clever> 2023-01-24 10:09:07 < heat> this linker script has code at 16MiB and -2GiB (almost +256TiB)
15:20:00 <clever> this example, has 2 chunks of code, a pre-mmu code, with an addr that is in valid phys memory
15:20:00 <clever> and some post mmu code, that lives at the top of the virt addr space
15:21:00 <dinkelhacker> and you can't call the post mmu code from the pre mmu code
15:21:00 <clever> if the mmu is on, and youve mapped both to the respective addresses, you can call back and forth
15:21:00 <dinkelhacker> yeah of course
15:21:00 <dinkelhacker> well that might be the way to go
15:21:00 <clever> but typically, the pre-mmu half is only mapped for a short time, until you jmp to the post-mmu code
15:21:00 <clever> then the pre-mmu half is discarded
15:22:00 <zid`> va -> what address I want to jump to to run this code
15:22:00 <zid`> if your mmu is off at the time, that locks you into "it has to be the same as the physical address it is loaded to", if it's on, it can be whatever you like
15:23:00 <zid`> You know what the case is
15:23:00 <dinkelhacker> Right. So the first thing the pre-mmu part would do is map the post-mmu part. Now nothing can go wrong at this point and you can jump wherever. After that I jump to post-mmu code and disable the pre-mmu mapping
15:24:00 <dinkelhacker> Is that more or less what you would do?
15:24:00 <clever> yep
15:24:00 <zid`> <zid`> I have a 1M = 1M, and a 1.1M = 510TB for my actual thing, the 1M=1M low code runs with paging disabled, I use it to set up the 510TB -> 1.1MB mapping, then jump to 510TB
15:24:00 <zid`> and we're full circle again :p
15:29:00 <dinkelhacker> Yeah.. I'm a bit slow today.. woke up at 5 bc our central heating died
15:44:00 <mrvn> dinkelhacker: I found it becomes far easier to understand and implement if you separate the pre-mmu and post-mmu parts fully. Build your kernel to run in virtual address space and make a blob of that. Then make a tiny loader that has a bit of ASM code and the kernel blob and just activates the MMU, sets the page table and then calls into the actual kernel.
15:45:00 <mrvn> there shouldn't really be any shared code between the two.
15:45:00 <clever> that split design also makes it far simpler to have a pre-mmu printf, and a post-mmu printf
15:45:00 <clever> and you can just printf() from either, and it will call the right variant
15:46:00 <zid`> If you're really disgusting it can be the same printf twice
15:46:00 <zid`> using cool ifdefs to stop it including the wrong headers, yum yum
15:46:00 <clever> another option (little-kernel for example), is to hand write the pre-mmu part as PIC asm
15:47:00 <mrvn> not like the pre-mmu stuff should need a full printf. a puts() and put_hex() at most.
15:47:00 <clever> zid`: headers shouldnt matter, it can even be the same printf.o, its purely what the linker script does
15:47:00 <clever> mrvn: that as well
15:47:00 <clever> with LK, the pre-mmu part is as dumb as a brick, and i dont think it even has a stack
15:47:00 <clever> and because its PIC, the load addr can be "wrong"
15:47:00 <clever> and it will just configure the mmu to fix that
15:47:00 <mrvn> that's the ideal case.
15:48:00 <zid`> clever: Depends where your printf is
15:48:00 <clever> yep
15:48:00 <zid`> mine's in basically "string/stdlib except malloc.o"
15:48:00 <zid`> so you'd need to massage the source a small amount with some light ifdefs to stop it trying to pull in the rest of my kernel
15:48:00 <dinkelhacker> mrvn: I was thinking about that but wouldn't I end up with 2 binaries ?
15:48:00 <clever> zid`: ive had trouble getting newlib to work on my latest project, so i just grabbed the old rpi-open-firmware printf
15:48:00 <clever>
15:48:00 <bslsk05> ​ gba-template/xprintf.c at master · cleverca22/gba-template · GitHub
15:49:00 <mrvn> dinkelhacker: sort of. You build kernel.elf -> kernel.blob and that you link into the loader.
15:49:00 <clever> zid`: this basically just turns into a xprintf.o with a .text, and in theory, the linker could then include that in both the pre-mmu and post-mmu binaries
15:49:00 <zid`> I just wrote one, ignore the ghetto as fuck ega text parts :p
15:49:00 <bslsk05> ​ bootstrap/print.c at master · zid/bootstrap · GitHub
15:49:00 <zid`> but I started the kernel one by just copy pasting this file
15:49:00 <dinkelhacker> mrvn: okay so only one binary in the end?
15:49:00 <zid`> so really I could have just done stupid incestuous linking
15:50:00 <clever> dinkelhacker: yeah, the post-mmu binary gets baked into the second binary
15:50:00 <clever> either with cat, or .incbin
15:50:00 <zid`> lame
15:50:00 <mrvn> dinkelhacker: yes. In many cases you only have the option of a kernel and initrd. With multiboot you can do loader, kernel, initrd, other-blobs, ... but that is rare.
15:50:00 <clever>
15:50:00 <bslsk05> ​ lk-overlay/payload.S at master · librerpi/lk-overlay · GitHub
15:50:00 <clever> dinkelhacker: here is a .incbin example, where i'm taking the objcopy output of another build, and including it into the .rodata
15:51:00 <mrvn> dinkelhacker: On some hardware you even have to attach the initrd to the loader/kernel for a single file alltogether.
15:52:00 <clever> xen under grub, abuses the initrd api, to pass the true kernel to the xen "bootloader" kernel
15:52:00 <dinkelhacker> clever: so you just branch to bcm2835_payload_start and - abracadabra - you
15:52:00 <dinkelhacker> are in the other binary?
15:52:00 <clever> dinkelhacker: you would want to configure the mmu, so something like -2GiB maps to bcm2835_payload_start
15:53:00 <clever> and then turn on the mmu and jump to -2GiB
15:53:00 <clever> .align can be used, to ensure bcm2835_payload_start is page-aligned
15:54:00 <mrvn> dinkelhacker: in the simplest case the included binary just starts with the entry point and you just jump to it. But you can also have a blob that contains structured data telling you where the the .text, .rodata, .data, .bss section of the kernel is. Where the entry point is. A whole lot of relocation data so you can do address space randomization. But just calling the payload_start is a good begining.
15:59:00 <dinkelhacker> Okay... man I just wanted to run the thing in qemu, which made me realize all kinds of things have just worked by accident because of quirks of the pi, and now I'm basically back to the start >D But that's good, I feel like the picture gets much clearer.
16:00:00 <zid`> ye I rewrote my boot setup shit several times
16:00:00 <zid`> until I got something I was only vaguely unhappy with
16:00:00 <dinkelhacker> Haha yeah sometimes it's one step forward, 2 miles back
16:02:00 <clever> some things i need to look into in the future
16:02:00 <clever> 1: usb-device bootloader, for the device capable models
16:02:00 <clever> 2: usb-host bootloader, with msd/tftp support
16:02:00 <clever> 3: fixing u-boot
16:03:00 <clever> 4: implementing psci
16:27:00 <dinkelhacker> a lot to do
16:53:00 <mjg> heat:
16:53:00 <bslsk05> ​ Trying Out The BSDs On The Intel Core i9 13900K "Raptor Lake" - Phoronix Forums
16:53:00 <kaichiuchi> you forgot to highlight me as well
16:53:00 <kaichiuchi> :(
16:54:00 <kaichiuchi> since i am a bsd fan
16:55:00 <mjg> kaichiuchi:
16:55:00 <mjg> well in short it is already resolved, just not present in the release they tested
16:55:00 <kaichiuchi> thanks
16:56:00 <mjg> and it was not even a freebsd bug per se
16:56:00 <mjg> even so, makes you wonder how come openbsd did not have the problem
17:03:00 <zid`> That's exactly as working as I expected freebsd to be
17:04:00 <mjg> :)
18:23:00 <ddevault> back to EFI grief
18:29:00 <ddevault> yeah this ain't it
18:29:00 <ddevault> $ git add boot
18:29:00 <ddevault> $ git commit -m "some garbage that doesn't work"
18:30:00 <ddevault> $ git checkout master
18:32:00 <gog> that's programming
18:39:00 <ddevault> would be nice if someone wrote a good linker
18:39:00 <ddevault> a halfway decent linker that can build hare programs and/or helios is probably only a few weeks of work
18:39:00 <ddevault> hmm...
19:03:00 <zid`> ddevault: Yea I've considered a quick and dirty linker as a fun project
19:13:00 <sham1> Replace the GNU ecosystem from your OS build process one by one
19:13:00 <sham1> Where GNU of course is Giant, Nasty and Unavoidable
19:15:00 <mjg> and BSD is Bad, Stale and Dead
19:17:00 <sham1> Right. That's why we should all just use TempleOS
19:26:00 <heat> sortie, linker when????
19:29:00 <mjg> sortild
19:29:00 <mjg> not a good name
19:29:00 <sham1> sortie-link
19:30:00 <sham1> Could also say that it's an exit of some kind
19:30:00 <mjg> i would totally use Elon Musk linker
19:30:00 <mjg> would probably be named linkex
19:32:00 <heat> sortie-link is very microsoft
19:32:00 <heat> ...perfect for MAXSISTRING
19:33:00 <heat> CONST STATIC MAXSI_STRING gOutputName
19:34:00 <heat> mjg: mjg's object link editor
19:34:00 <heat> mold for short
19:35:00 <mjg> you are just jelly onyx does not run on a toaster
19:36:00 <heat> NetOnyx when
19:37:00 <mjg> here is a historical lolfact concerning netbsd
19:37:00 <mjg> when they decided to larp as a smp-capable os a bunch of code showed up which required the CAS instruction
19:37:00 <mjg> around 2009 or so
19:38:00 <mjg> apparently however the instruction is implemented on *VAX* it sucks terribly over there
19:38:00 <mjg> and some dude started protesting the smp effort because of it
19:39:00 <heat> they said it runs everywhere
19:39:00 <heat> they did not say it runs everywhere, *well*
19:39:00 <mjg> "of course it ruins netbsd"
19:40:00 <mjg> the official slogan misses a letter by accident
19:40:00 <heat> lol
19:41:00 <heat> still can't believe none of you idiots have /bin/python3
19:43:00 <zid`> I have a /usr/bin/python3
19:43:00 <zid`> does that help
19:43:00 <heat> no
19:43:00 <heat> the bsd idiots don't
19:45:00 <zid`> if you want it in /bin you need to root me first
19:45:00 * zid` passwd -L heat
19:46:00 <heat> i don't run bsd
19:46:00 <heat> what the fuck do you think I am
19:46:00 <zid`> heat did you ever figure out how to use mkisofs
19:46:00 <heat> no
19:51:00 <heat> i can't connect my xbox one controller to linux thru bluetooth
19:51:00 <heat> thank you desktop linux
19:52:00 <heat>
19:52:00 <heat> look at this shit
19:53:00 <mrvn> heat: lib is a link to usr/lib in most modern linuxes so a lot of people have it
19:59:00 <heat> the best part about using linux is that everything is fucking broken
20:00:00 <sham1> It's not broken, if you define it as not broken
20:02:00 <heat> ok so apparently I need to boot to windows to fix this shit
20:02:00 <heat> poggers
20:02:00 <heat> kill me now
20:08:00 <sortie> what u do to our heat
20:15:00 <sham1> Made 'em launch Windows
20:15:00 <zid`> You can't leave a conga line, only form a rival conga line that is in competition with the original
20:19:00 <gog> i run bsd
20:19:00 <gog> just kidding i don't hate myself
20:21:00 <zid`> too busy leading a rival conga line to run bsd
20:42:00 <kaichiuchi> sometimes being a programmer is annoying
20:43:00 <kaichiuchi> definitely feels like you can’t write a hello world without 500,000 lunatics criticizing it
20:45:00 <mrvn> kaichiuchi: you are missing punctuation. :)
20:45:00 <kaichiuchi> :)
20:49:00 <jimbzy> Constructive criticism doesn't bother me.
20:50:00 <kaichiuchi> that’s fine
20:50:00 <kaichiuchi> there’s nothing wrong with that
20:50:00 <kaichiuchi> it’s when you get completely shit on no matter what you do
20:50:00 <kaichiuchi> not that i’m a victim of that
20:51:00 <jimbzy> Yeah, I give those people a standard, ":D" response and go about my business.
20:51:00 <kaichiuchi> but I saw something at work that I did not want to see
20:55:00 <geist> yah also jokingly shitty comments bug me sometimes too
20:55:00 <jimbzy> ?
20:56:00 <kaichiuchi> essentially, there is an intern who is legitimately trying to learn and get better
20:56:00 <kaichiuchi> but his boss is completely shitting all over him
20:56:00 <kaichiuchi> not a good look
20:57:00 <gog> definitely not
20:58:00 <gog> the point of an internship is to learn, not to get beat up
20:58:00 <gog> and if the boss is just beating up somebody who has no power in the arrangement then the boss is a massive jerk
20:59:00 <gog> if the internship is unpaid double my condemnation
20:59:00 <jimbzy> Unpaid internships should be illegal.
21:06:00 <gog> agreed
21:06:00 <gog> they are in many places anyway
21:40:00 <immibis_> while watching emerge update my system I wonder why some kind of throughput scheduler isn't more common. Instead of running `make -j5` the system should have a queue of all remaining work, and it should pick the next item from the list whenever the CPU is idle.
21:41:00 <immibis_> it shouldn't be make's job to guess how many concurrent processes to run. It should queue them all as soon as they are ready to run, and the system decides when to start them
21:42:00 <immibis_> this scales properly when make runs make (or emerge runs make) without the need for a "job server"
21:42:00 <sham1> Wouldn't the kernel in that case count as a job server
21:42:00 <immibis_> recursive make normally uses a "job server" process which just hands out "concurrent process tokens" so that you get 5 concurrent processes instead of 25
21:45:00 <immibis_> sham1: only if you consider it to already be a job server since it already schedules processes
21:45:00 <mrvn> "This is the time when you run."
21:46:00 <immibis_> the time when I run is when a velociraptor is chasing me.
21:48:00 <mrvn> immibis_: If you start a new build whenever the cpu is idle then every time the compiler waits for a file to load from disk a new compiler spawns. You end up with all files being built in parallel.
21:49:00 <mrvn> Better would be to put all jobs into a group and always run the lowest PID in running state when a cpu is idle.
21:51:00 <mrvn> Picking the job that will run longest would be even better. Otherwise you end up with all jobs finished except one that takes forever.
21:51:00 <mrvn> and jobs that block many other jobs.
22:10:00 <\Test_User> could you just start the next one ahead of time and wait() for the previous?
22:10:00 <\Test_User> or would that totally break if one of 'em took an absurd amount of time
22:11:00 <moon-child> immibis_: see discussion of a few days ago. Kernel has limited knowledge of what userspace is actually doing
22:11:00 <\Test_User> or actually, have make itself be multithreaded wait()ing on stuff, then fork and start the next when the last ends
22:12:00 <\Test_User> waitpid*
22:13:00 <\Test_User> actually no, generic wait would do from a single thread bc it'd detect if any exit, so yeah
22:14:00 <mrvn> \Test_User: how many can you start before you run out of resources?
22:14:00 <\Test_User> ...but it should already be doing that, so where's the extra delay..
22:16:00 <mrvn> "start the next one ahead of time and wait()" is kind of what "make -j5" does. Every fork does a read() on the jobserver pipe instead of your wait but that's basically the same.
22:16:00 <mrvn> just fewer resources invested before the read()
22:16:00 <\Test_User> make -j5 runs 5 actively at once though, so more ram eaten
22:17:00 <mrvn> as it should. But all the extra ones wait on a read()
22:17:00 <\Test_User> though yeah... why would read be delaying long enough for an extra thread to make the difference ig isw the quest
22:18:00 <\Test_User> *question
22:18:00 <mrvn> The read blocks till one of the running 5 writes back a token.
22:18:00 <mrvn> Only then the new process starts up and allocates resources.
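The token-pipe mechanism mrvn describes is the core of the GNU make jobserver: a pipe pre-loaded with N-1 tokens, where each worker must read a token before doing work and writes it back when done. A minimal Python sketch (simplified — real make also hands the pipe fds to child makes via the environment, which is what makes recursion work without 25 concurrent jobs):

```python
import os

def make_jobserver(jobs):
    """Create a jobserver pipe holding jobs-1 tokens.
    The parent process itself counts as holding one implicit token."""
    r, w = os.pipe()
    os.write(w, b"+" * (jobs - 1))
    return r, w

def acquire(r):
    # Blocks until some finished job writes a token back.
    return os.read(r, 1)

def release(w, token):
    os.write(w, token)

r, w = make_jobserver(3)       # -j3: two tokens in the pipe
t1 = acquire(r)
t2 = acquire(r)                # pipe now empty; a third acquire would block
release(w, t1)                 # a "job" finished, token returned
t3 = acquire(r)                # succeeds immediately
print(len(t2 + t3))            # 2
```

This is also where the hang clever mentions would come from: if a token is ever lost (written nowhere after a job dies), every later acquire blocks forever.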
22:19:00 <\Test_User> and it's not writing as soon as it's done? or...
22:19:00 <clever> mrvn: that jobserver stuff might explain that weird bug ive noticed, where make sometimes hangs
22:19:00 <clever> but its been years since i saw it happen
22:19:00 <mrvn> it is. The only difference is that the resource allocation happens after the write, whereas in your start+wait idea it would be before
22:19:00 <clever> if i just whack the process with a non-fatal signal, it unhangs
22:20:00 <mrvn> clever: you should never lose tokens so make should never hang.
22:20:00 <clever> hence it being a bug
22:20:00 <clever> i never got good details on it, because it was so rare
22:20:00 <mrvn> kernel bug then, the read()s should wake up with pending data.
22:20:00 <clever> and now that i mention it, i realize i havent seen the fault in years
22:21:00 <mrvn> clever: did it maybe happen when 2+ processes finished and then you only wake up one read() even though that only processes 1 byte?
22:21:00 <clever> dont remember
22:21:00 <clever> i just know that make had no children, and wasnt using any cpu
22:22:00 <clever> its safe to assume its been fixed by now
22:33:00 <immibis_> \Test_User: starting the next one ahead of time and then waiting, seems equivalent to just running a certain number in parallel, like make already does
22:33:00 <immibis_> yes, RAM usage is a problem
22:33:00 <immibis_> CPU and I/O throughput are in some sense queue-able resources; if they are not available now, you can delay the task and get it later. Memory does not work that way.
22:34:00 <immibis_> of course this is a well-known fact in scheduler design
22:36:00 <mrvn> except it kind of does. you can swap out processes and run fewer compilers in parallel when ram gets tight.
22:36:00 <immibis_> What if the compiler was segregated into input/process/output phases - you could start a new input or output phase whenever the disk drive wasn't busy, and a processing phase whenever the CPU wasn't busy. With limits on the number of pending tasks in each state.
22:36:00 <mrvn> immibis_: use c++. I/O is quite irrelevant then.
22:36:00 <immibis_> you can swap processes out, but it seems slower than not having started them to begin with
22:37:00 <immibis_> mrvn: segregating the I/O phase avoids the problem of starting a new processing phase whenever a processing phase does I/O
22:37:00 <\Test_User> immibis_: having more waiting rather than running means less process switching
22:37:00 <mrvn> immibis_: only when you have to swap. if you have enough ram then running one compiler per core is worth it.
22:37:00 <mrvn> swapping is just to recover when you guessed wrong
22:37:00 <\Test_User> also removes the chance of enough ending at the same time
22:37:00 <immibis_> running more than one compiler per core can be better if they are I/O bound
22:38:00 <immibis_> or rather, partially I/O bound. If they are fully I/O bound you might want to run one per disk drive :)
22:38:00 * immibis_ 's system currently has 7 disk drives attached
22:38:00 <clever> that reminds me, twice now (on both linux and macos), ive seen bugs where not calling fsync on a file, and then copying it with cp, pokes giant holes in the file
22:39:00 <clever>
22:39:00 <clever> the linux case, was a zfs bug
22:39:00 <bslsk05> ​ Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency by behlendorf · Pull Request #12724 · openzfs/zfs · GitHub
22:39:00 <clever> i dont know how macos had nearly the identical bug
22:39:00 <mrvn> In most cases the whole thing is a non-issue anyway. Just run one compiler per core. They have enough ram and all file I/O will just use caches or close enough with ssd.
22:40:00 <immibis_> that's not a terrible heuristic. I tend to configure N+1 parallel processes.
22:40:00 <clever> but in both cases, the hole detection api lied, and then cp copied around the fake hole
22:40:00 <clever> resulting in giant nulls in a file
22:40:00 <immibis_> either way the kernel should still be responsible for the parallel processing limit
22:40:00 <immibis_> or at least for avoiding extra context switches of processes tagged for throughput
22:41:00 <mrvn> clever: asking the FS where holes are and then copying around them is race prone.
22:41:00 <immibis_> if I start 5 compilers on 4 cores, and they all want to use the CPU, suspend whichever one is last, until one of the earlier ones does I/O
22:41:00 <immibis_> copying a file that's currently being written to is race-prone
22:41:00 <mrvn> immibis_: yes, you should have a process group like that
22:42:00 <clever> mrvn: in the zfs case, the problem is that after you close() a file, but it only exists in the journal, the kernel reports holes where data actually exists
22:42:00 <mrvn> clever: if you don't fsync() then there is no sequence point. So I would say user error
22:42:00 <immibis_> close, or rather munmap, should probably clear up such inconsistencies
22:42:00 <immibis_> if it doesn't I'd say that's a bug
22:42:00 <clever> yeah, fsync or even plain /bin/sync was enough to mask the problem
22:42:00 <immibis_> if you are copying the file while still mapped, that's user error
22:43:00 <mrvn> "A successful close does not guarantee that the data has been successfully saved to disk, as the kernel uses the buffer cache to defer writes.
22:43:00 <clever> immibis_: in both cases, it occurred after the file was close()'d
22:43:00 <mrvn> " just close without sync is not enough
22:43:00 <immibis_> mrvn: but the kernel cache should be consistent
22:43:00 <clever> if you closed the file, then immediately copied with cp, it had chunks missing
22:43:00 <mrvn> immibis_: it should.
22:43:00 <clever> but if you closed the file, `sleep 120`, then cp, it didnt have chunks missing
22:43:00 <immibis_> apparently the ZFS bug is that ZFS did not update holes immediately on close/munmap
22:44:00 <clever> yeah
22:44:00 <mrvn> which is totally fine if the cp is not in the same process
22:44:00 <immibis_> no, it's not fine, because all processes use the same kernel cache
22:44:00 <mrvn> fine as in by specs
22:44:00 <immibis_> close/munmap (whichever one it was) should behave as a sequence point. anything else is crazy
22:44:00 <clever> macos is more of a black box, and bisection pointed to a commit where coreutils had sparse support re-added
22:45:00 <mrvn> immibis_: yeah. but the specs explicitly say it's not
22:45:00 <clever> which implied macos was always broken, and just removing sparse support from cp fixed it
22:45:00 <immibis_> mrvn: the specs are stupid then. It's excusable for cache to not be written back on close, but it's not excusable for the cache itself to be inconsistent
22:46:00 <clever> yeah, i agree with that
22:46:00 <clever> if read() says there is data at a given offset
22:46:00 <mrvn> immibis_: might not be kernel cache but per process IO buffers
22:46:00 <clever> then lseek should not claim there is a hole at that offset
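The invariant clever states — read() and lseek() must agree about where data is — matters because cp-style sparse copying walks the file exactly like this (Python sketch using os.SEEK_DATA/os.SEEK_HOLE; if the filesystem misreports holes, as in the ZFS bug above, this loop silently skips real data):

```python
import os
import tempfile

def data_ranges(path):
    """Return the (start, end) byte ranges that the filesystem
    reports as containing data, the way sparse-aware cp probes."""
    ranges = []
    fd = os.open(path, os.O_RDONLY)
    try:
        end = os.fstat(fd).st_size
        off = 0
        while off < end:
            try:
                data = os.lseek(fd, off, os.SEEK_DATA)
            except OSError:          # ENXIO: no data past this offset
                break
            hole = os.lseek(fd, data, os.SEEK_HOLE)
            ranges.append((data, hole))
            off = hole
    finally:
        os.close(fd)
    return ranges

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 10)
ranges = data_ranges(f.name)
print(ranges)    # a small fully-written file reports one data range
os.unlink(f.name)
```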
22:46:00 <immibis_> mrvn: per-process I/O buffers after closing and unmapping?
22:47:00 <mrvn> immibis_: sure. they take time to flush
22:47:00 <clever> mrvn: userland buffers were not the issue, it was basically a bash script that did: ghc foo.hs -o foo ; cp foo $out/bin/foo
22:47:00 <clever> and random holes appeared in the file
22:47:00 <immibis_> mrvn: explain where these per-process I/O buffers are implemented?
22:47:00 <mrvn> immibis_: anywhere between your code and the disk.
22:47:00 <mrvn> clever: in that case the process ending is a sequence point
22:48:00 <immibis_> mrvn: and where is that?
22:49:00 <mrvn> immibis_: in hypothetical land
22:52:00 <mrvn> immibis_: close can also fail before data is flushed to the FS.
22:52:00 <mrvn> (but has already closed the FD, so don't close it again)
22:53:00 <mrvn> Fun fact: If close() is interrupted by a signal that is to be caught, it
22:53:00 <clever> but in this case, it hasnt failed, because just running sync in a shell between ghc and cp fixes it
22:53:00 <mrvn> shall return -1 with errno set to EINTR and the state of fildes
22:53:00 <mrvn> is unspecified.
22:53:00 <mrvn> clever: obviously your case was a bug
22:53:00 <clever> yeah
22:54:00 <mrvn> clever: as said the process ending (the shell running waitpid) and starting the cp is a sequence/synchronization point.
22:54:00 <immibis_> there's another OS design problem here about flushing in general: how to square the desire that a process has really finished when it thinks it's finished, with the conflicting desire for efficiency when the file is temporary
22:54:00 <immibis_> when I run `cp -r ~/homework /mnt/usb/` I would like the command to finish when the copying has really finished
22:55:00 <immibis_> but when I run `cp foo.o build/foo.o` I would like the command to finish immediately so the command stream can run ahead. In fact I don't even care if the data is ever on the disk as I can remake it
22:55:00 <mrvn> immibis_: sync on close on removable drives?
22:55:00 <clever> immibis_: in the past, i wasnt aware of how much usb will buffer the crap out of things, and often thought "oh it crashed again" and forcibly remove the usb
22:55:00 <clever> i still dont know why usb lets the dirty memory hit 500mb+, while a hdd doesnt
22:55:00 <immibis_> probably because your hdd is faster to write back
22:55:00 <immibis_> because it's just faster
22:56:00 <mrvn> The "eject USB device" should show a popup with a progress bar showing the amount of buffers to be written.
22:56:00 <clever> immibis_: na, ive seen cp take 10 minutes to run before
22:56:00 <clever> its definitely blocking on the writes, and refusing to get dirty, heh
22:57:00 <mrvn> clever: dirty memory is kind of broken in linux. You get some 30% and then the data is flushed. While that happens you rack up gigabytes of more dirty data for the USB stick without it getting blocked.
22:57:00 <mrvn> clever: but not at first. Takes a few dirty/flush cycles before that happens. At first it blocks future writes correctly.
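The thresholds mrvn is describing are the Linux writeback sysctls; they are easy to inspect (the percentages shown are system defaults, and the byte-based variants are the usual way to keep slow USB flushes bounded):

```shell
# % of RAM where writing processes start blocking on writeback
cat /proc/sys/vm/dirty_ratio
# % of RAM where background flushing kicks in
cat /proc/sys/vm/dirty_background_ratio
# (as root, vm.dirty_bytes / vm.dirty_background_bytes set absolute
# limits instead, e.g.: sysctl -w vm.dirty_bytes=67108864)
```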
22:58:00 <clever> ah
22:58:00 <mrvn> Happens with USB sticks or NFS.
22:58:00 <mrvn> Somehow I don't see it with local disks, they might just be fast enough.
22:58:00 <clever> ive not seen it happen on nfs
22:58:00 <mrvn> write a few TB to NFS.
22:58:00 <clever> the cp on the nfs client always blocks for me
22:59:00 <clever> but ive not tried copying TB
22:59:00 <mrvn> Always worked for me for some 100GB and then suddenly it flips and has no limit.
22:59:00 <clever> the nfs server is also configured as async, for that client
22:59:00 <clever> so it should just lie and take everything
22:59:00 <clever> but i have noticed the write speed varies based on free space
23:00:00 <mrvn> that's likely the FS at fault if you are talking >90% full
23:00:00 <clever> 6gig free, out of ~8tb
23:01:00 <mrvn> any reserved free space?
23:01:00 <clever> just the usual zfs slop space
23:01:00 <clever> *looks*
23:01:00 <mrvn> zfs definitely has that slowdown when it gets full
23:01:00 <clever> [root@nas:~]# cat /sys/module/zfs/parameters/spa_slop_shift
23:01:00 <clever> 5
23:01:00 <mrvn> ext3/4 reserves 5% per default that only root can use and that isn't included in the free stats.
23:02:00 <clever> i forget the math, but this tunes how much zfs reserves, so the CoW doesnt hard jam from a full disk
23:02:00 <clever> if i echo an 8 into there, i suddenly have 105gig free
23:02:00 <clever> because i told it to reserve less
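The spa_slop_shift math clever is tuning works out roughly like this — the pool reserves about pool_size / 2**spa_slop_shift as slop so copy-on-write never hard-jams on a full disk (approximate: newer ZFS also clamps the slop between a minimum and maximum):

```python
def slop_bytes(pool_size, spa_slop_shift):
    """Approximate ZFS slop reservation: pool_size / 2**shift."""
    return pool_size >> spa_slop_shift

TiB = 1 << 40
GiB = 1 << 30

# Default shift of 5 on an 8 TiB pool reserves ~256 GiB;
# raising the shift to 8 shrinks that to ~32 GiB, which is why
# echoing 8 into the tunable suddenly "frees" space in df.
print(slop_bytes(8 * TiB, 5) // GiB)   # 256
print(slop_bytes(8 * TiB, 8) // GiB)   # 32
```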
23:02:00 <mrvn> both ext and zfs slow down towards the end. zfs gets really slow.
23:03:00 <clever> so df may claim i have 6gig free, but its actually over 100gig
23:03:00 <mrvn> .oO(Gives you time to buy more disks before it fails :)
23:03:00 <clever> in the zfs case, the major slowdown is the spacemap histograms
23:04:00 <clever> and zfs_metaslab_try_hard_before_gang being turned on
23:04:00 <clever> each metaslab (like an ext block group) has its own free space list, and they are rather memory costly
23:04:00 <clever> so zfs only has a few loaded at once
23:05:00 <clever> if zfs_metaslab_try_hard_before_gang is enabled, and zfs cant find a big enough hole, it will "try hard" (load more metaslab spacemaps) to find a properly sized hole
23:05:00 <mrvn> buy more disks and make some big holes.
23:05:00 <clever> without that, it can give up early (faster) and create a fragmented record, which harms performance more down the road
23:07:00 <clever>
23:07:00 <clever> i also wrote a patch to zfs, that lets me generate these graphs cheaply
23:07:00 <clever> if that orange line hits zero, then even with zfs_metaslab_try_hard_before_gang, it will fragment most writes
23:11:00 <immibis_> cp over NFS reminds me of yet another OS design problem which is how to efficiently accelerate things that can be more efficiently done by external hardware or other computers
23:11:00 <immibis_> you could expect to tell an NFS server "copy this byte range to that byte range" without downloading the entire byte range and uploading it again
23:11:00 <immibis_> and maybe NFS has that ability, and maybe it's even supported in cp, but it's all special-cased
23:12:00 <clever> i also have had other fun bombs go off with nfs
23:12:00 <immibis_> there's absolutely no ability for e.g. gcc -E to rewrite the unchanged segments of include files through that special case
23:12:00 <mrvn> immibis_: NFS doesn't but some filesystems have smart links for that
23:12:00 <clever> my server was graphing free disk space, and that involved running df in cron
23:12:00 <immibis_> and it would be completely absurd to expect gcc to write special-case code for it
23:12:00 <clever> the "server" had the laptop mounted over nfs (as an nfs client)
23:13:00 <clever> when i left for a trip, the laptop went with me
23:13:00 <clever> df then hung, because the nfs server was missing
23:13:00 <clever> cron kept forking out new df's, and swap just ate them all harmlessly
23:13:00 <clever> then the laptop returned....
23:13:00 <immibis_> on Windows, that would get you an ERROR_NETNAME_DELETED I think
23:13:00 <clever> every single df, woke up at once, and all demanded a share of the cpu, and ram
23:13:00 <immibis_> the decision of which errors to return to clients and which to attempt to paper over has no universal right answers
23:14:00 <clever> immibis_: thats what the soft vs hard mount flag controls, in nfs
23:14:00 <immibis_> I believe in MS-DOS, you could simulate a dual-drive system with a single drive. When accessing B: after accessing A:, the system would pause the running "process" and ask you to swap disks.
23:14:00 <clever> hard means retry forever
23:14:00 <mrvn> clever: I know that behavior. Takes a while but everything eventually recovers just fine
23:14:00 <clever> soft means give an io error if there is network problems
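The soft/hard distinction clever describes maps directly onto NFS mount options; as an fstab sketch (server name and mountpoint are placeholders):

```
# /etc/fstab sketch — soft: return EIO after timeo/retrans expire,
# so tools like df fail instead of hanging; hard: retry forever,
# processes sleep uninterruptibly until the server returns.
server:/export  /mnt/nfs  nfs  soft,timeo=30,retrans=3  0 0
server:/export  /mnt/nfs  nfs  hard                     0 0
```

With the hard mount, the pile of hung `df` processes from the cron story is exactly what you get; soft trades that for possible I/O errors on a flaky network.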
23:14:00 <immibis_> such emulation seems rather useful in odd cases and anti-useful in others
23:15:00 <clever> immibis_: ah, i had seen that on YT recently, to get 3 floppy drives working on 1 machine
23:15:00 <mrvn> immibis_: AmigaOS has disk names so you can open "fonts:bla.ttf" and it will access whatever drive has the fonts floppy in it or ask you to insert it.
23:15:00 <clever> he had physical switches to re-route the drive select lines
23:15:00 <clever> and enabled that DOS feature, and then re-routed things manually
23:15:00 <mrvn> immibis_: you can even remove a floppy during write operations and reinsert it in another drive and it will just keep going.
23:15:00 <clever> neat!
23:15:00 <immibis_> every problem can be solved by adding more abstraction except the problem of too much abstraction
23:16:00 <immibis_> Linux also has this ability, if you were to set up something to automount floppies, but mount them at consistent paths - but it wouldn't block on access, you'd probably need something like FUSE for that
23:16:00 <clever> zfs can recover from a block device going missing mid-write, but only if it comes back at the same /dev/ path
23:16:00 <immibis_> the Linux behaviour of "an unmounted drive is an empty folder" is not a particularly sensible default
23:16:00 <clever> renaming or symlinks can fool it enough to work, its a limitation of the userland tooling
23:16:00 <mrvn> immibis_: no it doesn't. You can't umount and remount a device and have open files continue to work
23:16:00 <immibis_> it just falls out of the design of how Linux mounts go over the top of existing folders
23:17:00 <immibis_> mrvn: as we see with clever's df thing, hanging the process until the drive comes back isn't always a good idea either
23:17:00 <mrvn> immibis_: starting a cron job again while the previous is still running is just plain broken.
23:17:00 <mrvn> cron should never do that as default.
23:18:00 <clever> systemd timers dont do that!
23:18:00 <immibis_> also not universally true
23:18:00 <clever> because its less of a cron job, and more of a service, that starts (if not already running) on a schedule
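The systemd behavior clever describes can be sketched as a timer/service pair (unit names hypothetical): the timer fires on a schedule, but firing just requests a start of the service, which is a no-op while the service is still running — so jobs never pile up the way cron's did:

```ini
# df-poll.timer (hypothetical unit)
[Timer]
OnCalendar=*:0/5
# fires every 5 minutes; if df-poll.service is still active,
# the activation request does nothing instead of forking another copy

# df-poll.service (hypothetical unit)
[Service]
Type=oneshot
ExecStart=/usr/bin/df -h
```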
23:18:00 <immibis_> and I bet if it was a flag, clever would've had a 50% chance of setting it to the wrong value because why would you even think to consider that?
23:18:00 <mrvn> immibis_: hence the "as default"
23:19:00 <clever> the df cronjob, was part of the cacti polling setup
23:19:00 <clever> but ive since moved to prometheus based graphing, which doesnt spawn a new process on every poll
23:19:00 <clever> so it would never fork-bomb the same way
23:19:00 <clever> more likely to just hang the entire exporter
23:19:00 <immibis_> instead it would just freeze the entire graphing system until the drive came back?
23:19:00 <mrvn> also bad
23:19:00 <clever> yeah, for that one machine
23:20:00 <clever> but not DoS level bad
23:20:00 <mrvn> you want to create a thread per resource so all other graphs still process
23:20:00 <immibis_> what we all want is a highly abstracted system, so everything is very flexible, with no abstractions so everything is very efficient
23:20:00 <clever> and thats why i just soft-mount everything now
23:20:00 <immibis_> the Cheetah/XOK webserver stores your static HTML files as pre-formatted TCP packets on disk
23:20:00 <mrvn> immibis_: sendfile to the rescue
23:20:00 <immibis_> sendfile is not the same level of abstractionlessness
23:20:00 <clever> mrvn: the exporter, is basically just an http endpoint, that returns all of the metrics
23:21:00 <clever> the central graphing server, has http timeouts, so it wont 100% die
23:21:00 <clever> it will just consider that 1 host as down
23:21:00 <immibis_> even being able to "store your HTML as TCP packets" requires cutting through a lot of abstractions and writing code that only works for the specific case of serving static files over TCP
23:21:00 <immibis_> and then you have to rebuild your static pages folder if the link MTU changes
23:21:00 <mrvn> how does that even work? You need the right sequence number
23:22:00 <clever> immibis_: oh, that reminds me, you can configure some http servers, to send a foo.html.gz file, but slap a content-encoding header on it
23:22:00 <immibis_> I assume it filled in the dynamic fields at runtime
23:22:00 <clever> so the client will decompress it on the fly
23:22:00 <clever> that then saves you cpu cycles on the server, having to re-compress the file for every request
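The pre-compressed-file trick clever mentions exists in nginx, for example, as the ngx_http_gzip_static_module (config sketch; requires the module to be compiled in):

```nginx
# If foo.html.gz exists next to foo.html, serve the .gz file with
# "Content-Encoding: gzip" instead of compressing on every request.
location / {
    gzip_static on;
}
```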
23:22:00 <immibis_> some network cards might support TCP Segmentation Offload, and then you have code that not only only works for TCP, but only works for your specific card and DMA controller, but it's very fast because it DMAs directly from the disk to the network card
23:23:00 <mrvn> immibis_: using writev() seems like a better way. Splice the ip packets together from the header and chunks of the file.
23:23:00 <clever> ive looked at the genet (bcm2711 ethernet) driver before, the tx ring is a big array of addr+size+flag sets, and flags include "start of packet" and "end of packet"
23:24:00 <immibis_> depending on the segment size the kernel might do more work to process a writev than it would cost to just copy the bytes
23:24:00 <immibis_> now, if you could store a DMA descriptor chain on disk...
23:24:00 <clever> so scatter-gather dma, is just giving it multiple addr+length pairs
23:24:00 <mrvn> immibis_: splice should be able to directly DMA stuff
23:24:00 <immibis_> I don't know consumer ethernet drivers but I worked on some kind of industrial router and this sometimes involved peeking under the hood. The engine takes a linked list of DMA descriptors.
23:25:00 <clever> so your writev() call, could prepend a few buffers for packet data, and gather-dma pieces out of both kernel and userland ram
23:25:00 <clever> but, if userland modifies the buffer mid-write, your checksums wont be right
23:25:00 <immibis_> you would, of course, want it to gather the same packet header over and over, while gathering different data pieces
23:26:00 <immibis_> maybe you add some kind of modification thing to the DMA chain, telling it how to increment sequence numbers and decrement checksums
23:26:00 <mrvn> clever: that's why my OS doesn't even allow that at all. If you write a buffer you give up rights to the buffer; you can't modify it. Having to COW every buffer on write is too much work.
23:26:00 <immibis_> or you hardcode logic in the network card telling it how to generate TCP packets from a plain old data stream
23:27:00 <immibis_> mrvn: how's the overhead of handling many smallish buffers?
23:27:00 <clever> mrvn: yep, that sounds like a valid solution
23:27:00 <mrvn> immibis_: On modern cards you write 64k frames to the NIC and it internally splits it into MTU chunks and generates all the right headers for it.
23:28:00 <immibis_> that would be called TCP Segmentation Offload
23:28:00 <mrvn> immibis_: horrible. 1 page for the message, 1 page for the buffer. Two INVLPGs. I'm not writing my OS to be fast, just simple.
23:30:00 <immibis_> that seems to be a common problem in message passing systems (and what is the difference between a message and a buffer?)
23:30:00 <mrvn> immibis_: and everything is a process. So every subsystem the message passes through is another INVLPG fest. You should not use an external buffer for short stuff. Better to include it in the message itself.
23:30:00 <clever> mrvn: i would say, if the buffer is under some size threshold (maybe under 1 page), just copy it
23:30:00 <immibis_> a silly thought: maybe messages should be passed in XMM/YMM registers
23:31:00 <mrvn> clever: that would require allocating pages inside interrupts. not a possibility.
23:31:00 <clever> or that, include the data in the same page as the message
23:31:00 <mrvn> clever: that's what I do.
23:31:00 <clever> the zfs journal does similar
23:31:00 <clever> for small writes, the data is in the journal itself
23:31:00 <clever> but for large writes, the data goes to its usual final destination, and the journal just holds the pointer
23:31:00 <immibis_> in XMM registers you have 256 bytes, with direct access from CPU instructions, that won't be stomped on by context switching code and in fact not stomping on them makes the context switch *faster*
23:32:00 <mrvn> If you write under 4000 bytes then just include it in the message itself.
23:32:00 <clever> so it can put off the more costly updates, until later
23:32:00 <clever> and the journal is enough of a promise to userland, that the data is secure
23:32:00 <immibis_> mrvn: software interrupts or real interrupts?
23:33:00 <clever> immibis_: that sounds like lazy fpu context switching
23:33:00 <clever> defer the fpu context switch, and set access control registers so it faults upon any access
23:33:00 <mrvn> Another thing I plan to do is to have buffers attached to a message but not map them. You just get a handle for the buffer and can pass that around and if you actually want to access the buffer you have to map it.
23:33:00 <mrvn> immibis_: yes
23:33:00 <immibis_> even lazier. deliberately leaking FPU context between processes as message passing
23:34:00 <clever> mrvn: that sounds like linux dma_buf api
23:34:00 <clever> but in linux, each buffer is a separate fd handle
23:34:00 <clever> so you need to be passing whole fd's around, potentially 1-3 per video frame
23:34:00 <mrvn> clever: for me it would be addr+size specifying a bunch of physical pages that then get turned into some VM handle.
23:34:00 <clever> immibis_: heh, thats one way, just read it, and context switch to the right destination
23:35:00 <immibis_> can Memory Protection Keys be used to switch between 16 different processes without changing page tables?
23:35:00 <mrvn> immibis_: ARM has 256 ASIDs you can switch between
23:35:00 <mrvn> x86_64 has 4096, right?
23:35:00 <immibis_> the ideal message-passing context switch is like "ASID = new_proc->ASID; jmp new_proc->MessageReceiver;"
23:36:00 <clever> immibis_: now youve reminded me of the centurion cpu6, it has 16 banks of registers, each with its own mmu config, and can switch between them freely, an irq will also force a switch to a specific set (each irq is bound to a diff one)
23:36:00 <immibis_> mrvn: don't know
23:36:00 <mrvn> immibis_: you still load a new page table but you don't lose any TLB or cache content.
23:37:00 <immibis_> oh well that's good. Last time I knew about context switches, page table flushing was the main overhead.
23:37:00 <clever>
23:37:00 <bslsk05> ​ Instructions · Nakazoto/CenturionComputer Wiki · GitHub
23:37:00 <mrvn> immibis_: with a microkernel you should definitely look into ASIDs.
23:38:00 <clever> i should get off to bed, its getting late here
23:38:00 <mrvn> then you still have time. it's not early yet.
23:38:00 <clever> lol
23:55:00 <kaichiuchi> I cannot believe CMake doesn't have reasonable line break support