Search logs: #osdev - 3 August 2020

#osdev @ OPN/FreeNode from 3apr2001 to 23may2021

http://bespin.org/~qz/search/?view=1&c=osdev&y=20&m=8&d=3

Monday, 3 August 2020

00:32:45 <unlord> doug16k: why do you take the hi value of the xsdt address here? https://github.com/doug65536/dgos/blob/master/boot/boottable_bios.cc#L61
00:32:46 <bslsk05> github.com: dgos/boottable_bios.cc at master · doug65536/dgos · GitHub
00:33:02 <unlord> I thought that was specified to be 0 always (so rsdt is in the first 4G)
00:33:09 <doug16k> no
00:33:19 <doug16k> whole point of xsdt is to have a 64 bit address for it
00:33:32 <unlord> https://wiki.osdev.org/RSDP#Xsdt_Address
00:33:33 <bslsk05> wiki.osdev.org: RSDP - OSDev Wiki
00:33:35 <doug16k> it could be above 4GB
00:33:56 <doug16k> it will be above 4G
00:34:06 <doug16k> often
00:34:13 <unlord> really?
00:34:24 <doug16k> yes
00:34:45 <doug16k> same with pci. lots of my drivers see BARs above 4G in qemu
00:35:03 <unlord> I was not planning on enabling long mode
00:35:05 <doug16k> when you boot efi, it stops with the fear and uses high addresses
00:35:24 <unlord> if I boot legacy will it be < 4G ?
00:35:32 <doug16k> in CSM boot or whatever it will almost certainly be < 4G
00:35:36 <doug16k> bios boot
00:35:43 <unlord> okay, I can live with that
00:36:00 <unlord> I will check though
00:37:14 <doug16k> fully expect to see an xsdt on a real machine, even if below 4G
00:37:33 <geist> yah i'd be a little surprised seeing one above 4GB, but an XSDT sure
00:38:02 <doug16k> in efi 64 bit boot there is no reason to be below 4G
00:38:24 <geist> true, though also no particular hard reason to be above either
00:39:12 <geist> only in that you'd only do it if there was a strong need
00:39:21 <geist> and it's possible the bios code that's filling it in is 32bit, etc
00:39:24 <doug16k> unlord is asking for permission to blindly assume addresses below 4G and just truncate
00:39:30 <doug16k> I am not granting permission
00:39:34 <geist> oh no, please do not :)
00:39:44 <doug16k> :D
00:39:55 <geist> just saying i'd be a teensy bit surprised it's that widespread
00:40:00 <geist> *yet*
00:40:46 <doug16k> you will get addresses above 4G in efi x64 boot. lots of them
00:40:56 <doug16k> if you count the BARs
00:41:13 <geist> yah but given that the bios probably still supports both boots and the ACPI stuff is likely set up in an early stage
00:41:28 <geist> basically you'd need a hard 64bit UEFI no backwards compat build
00:41:38 <geist> probably you'd see that on servers first
00:41:56 <unlord> wait what?
00:42:07 <unlord> if I'm CPM booting, it should be < 4G right?
00:42:24 <doug16k> yes bios boot will not put it high
00:42:28 <geist> ah
00:42:29 <geist> yah
00:42:34 <unlord> OK, that is all I will assume
00:43:04 <geist> ie if you're writing 32bit code and booting it then you can assume it's 32bit, because you got booted (unless you somehow dropped out of 64bit boot)
00:43:26 <geist> if you're booting 64bit code from the get go then you shouldn't assume
00:43:45 <geist> well booting 64bit in a 64bit method (ie, UEFI directly into 64bit mode)
00:43:45 <unlord> geist: I'm booting 16 bit code
00:44:14 <adu> 16 bit is where it's at
00:44:17 <geist> yah that would be broken if it booted you in 16bit mode but put the ACPI root out of view
00:44:23 <geist> but. then again who knows
00:44:32 <geist> all that matters is windows boots on it, and ship it
00:44:58 <doug16k> the acpi spec says that firmware should use an xsdt (starting at some ancient version (2?)). you should use the xsdt and just range check that it is below 4G, panic otherwise
00:45:18 <doug16k> should try to use the xsdt*
00:45:35 <doug16k> right?
00:45:43 <doug16k> I remember it strongly urging use of the xsdt
00:45:51 <geist> yah one of those check one first then the other
00:46:07 <geist> possible that you find a broken implementation that doesn't do xsdt because they figure what's the point
00:48:04 <doug16k> xsdt was introduced in acpi 2.0, 20 years ago. year 2000
00:48:08 <doug16k> I see what they did there
00:49:30 <adu> does that mean they are going to release v2.02 soon?
00:49:39 <doug16k> approximately P4 or newer should have xsdt
00:50:54 <geist> yah where i expect things to be weak are random emulators and VMs
00:51:01 <geist> frequently i've observed them really phone it in
00:51:24 <doug16k> yeah qemu guests see a weird patchwork of ancient and extremely modern stuff
00:51:55 <geist> ya and even qemu may get bug reports and someone fixes it. stuff like virtualbox or vmware or whatnot it would only get fixed if there's some business reason for it
00:52:01 <geist> does it boot windows? dont screw with it, etc
00:52:05 <doug16k> somehow it's that old version with some amazing numa locality info or whatever
00:52:27 <doug16k> exactly
00:52:31 <kingoffrance> cpm is back? oh, different cpm :/
00:53:18 <geist> heh yeah my mind thought the same thing
00:53:55 <unlord> will there always be a 1.0 RSDT ?
00:54:02 <doug16k> CPM booting is definitely below 4G. probably below 4K
00:54:10 <doug16k> lol
00:54:38 * doug16k is guessing CPM was a typo of CSM
00:55:19 <unlord> I said legacy boot
00:55:34 <doug16k> CSM is legacy boot, on a modern machine
00:55:45 <unlord> and then I typo'd CPM
00:55:46 <unlord> sorry
00:56:05 <doug16k> a modern EFI machine loads the CSM to provide bios boot apis and behaviour
00:56:28 <keegans> i'm running a linux kvm and need to do some funky virtual->physical mappings with page tables in long mode. i've identity mapped the first 128mb of memory, but I also want one page at 0xFFFFF78000000000 to point to the final page in the 128mb. 0xFFFFF78000000000 is index 1ef in P4 which points to the same P3 in my identity mapping, which means that 0xFFFFF78000000000 will just point to the exact
00:56:30 <keegans> same 128mb identity map, which is not what I want. my solution was introducing ANOTHER P1-3 table and then rigging that up. so now 0xFFFFF78000000000 and the final page in my 128mb point to the same physical address, which is what I wanted. the question is: do I really need this second set of page tables? is there a better way?
00:58:30 <doug16k> you need the one entry in PML4, and then in that new page it points to, a sequence of 64 large pages (128MB / 2MB)
00:58:45 <doug16k> oops, one more level between
01:00:36 <doug16k> keegans, you could use recursion
01:01:19 <doug16k> or put another way, you could make the high PD pages share the PT pages from the first 128MB
01:01:33 <unlord> huh, just rev 2 in virtualbox
01:03:14 <doug16k> i.e. need the branch of the tree sticking out of PML4[511] pointing to a PDPT with a single entry pointing to a PD with a sequence of entries sharing the PT from the 1st 128MB
01:03:43 <doug16k> keegans, is that what you mean?
01:04:14 <doug16k> you didn't say whether you used large pages
01:06:52 <doug16k> aliasing a range of memory (making the same physical memory appear at two places at once) can be done by sharing the same page tables for the region, alignment permitting
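The index arithmetic behind keegans's question and the "alignment permitting" caveat can be sketched as follows. This is a minimal illustration assuming standard x86-64 4-level paging with 4KB pages; the helper names `pt_index` and `can_share_pt` are hypothetical and do not come from anyone's code in this log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Index into one level of the x86-64 4-level page tables.
 * level 0 = PT, 1 = PD, 2 = PDPT, 3 = PML4. Each level consumes 9 bits
 * above the 12-bit page offset. */
static unsigned pt_index(uint64_t vaddr, unsigned level)
{
    return (unsigned)((vaddr >> (12 + 9 * level)) & 0x1FF);
}

/* Hypothetical check: if one PT page is referenced from two PD entries,
 * a physical page shows up at both virtual addresses only when the two
 * addresses have the same offset inside their 2MB region, because the
 * PT index is taken from those bits. */
static bool can_share_pt(uint64_t vaddr_a, uint64_t vaddr_b)
{
    return (vaddr_a & 0x1FFFFF) == (vaddr_b & 0x1FFFFF);
}
```

Under these assumptions, 0xFFFFF78000000000 has PML4 index 0x1ef and PT index 0, while the final page of a 128MB identity map (0x7FFF000) has PT index 0x1ff. Sharing the existing PT would therefore alias that page at 0xFFFFF780001FF000 rather than at 0xFFFFF78000000000, so a single-page alias at the exact address still needs a PT of its own.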
01:17:52 <ybyourmom> My youtube experience these days is "Let's play Raid: Shadow Legends" and "Sign up to DIDI now!"
01:17:57 <ybyourmom> Fking stop
01:18:03 <ybyourmom> Especially the DiDi thing
01:18:12 <Mutabah> Ads or sponsored?
01:18:33 <clever> ive noticed that the unskippable ad's are getting longer lately, and often 2 in a row
01:18:50 <ronsor> lol raid shadow legends
01:19:00 <ronsor> youtube is getting desperate
01:19:32 <doug16k> they recently went through and enabled interstitial ads on every video
01:19:50 <doug16k> channel owners have to go and turn it off on each one
01:20:24 <doug16k> even if they selected no interstitial ads when they first uploaded it
01:21:02 <doug16k> that's why you see the sudden drastic increase in ads
01:26:10 <clever> ah yeah, i saw one youtuber talking about that upcoming change
01:26:35 <clever> thats also the reason for all of those fishy 10:01 long videos, to get it over the old time limit for interstitial ads
01:30:11 <unlord> nice bochs is ACPI 1.0 and virtualbox is ACPI 2.0
01:30:39 <geist> hah yeah that sidesteps the XSDT issue
01:30:53 <unlord> I get to test both
01:31:07 <geist> yep
01:31:10 <unlord> and virtualbox XSDT is < 4G
01:31:19 <geist> i mean sidesteps it for them. as in it's pre XSDT
01:31:43 <unlord> now to figure out how many CPU's each of these see
01:31:55 <doug16k> weird, I tried moving __initial_stack to 0x20000, and on this instruction "movl $ ___initial_stack,%esp" I get "R_386_16 relocation truncated to fit"
01:32:47 <doug16k> the hell? why R_386_16
01:33:35 <geist> huh that's interesting
01:35:03 <doug16k> comes from here: https://gist.github.com/doug65536/44df60e4814ad2c4cf6d754b44348d69#file-bootfat-ld-L66
01:35:04 <bslsk05> gist.github.com: bootfat.ld · GitHub
01:38:43 <unlord> well, I guess my code needs to handle 64-bit pointers
01:39:00 <unlord> we're little endian, so I should just be able to take the lower 4 bytes
01:39:02 <geist> at the minimum assert or something
01:39:08 <geist> yah, and test the upper is 0
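The "take the lower 4 bytes, but test the upper is 0" advice can be sketched like this. It is only an illustration: the function names are hypothetical, and the two halves are assumed to have been read already from the RSDP's 64-bit XSDT address field (low dword first, since the field is little endian).

```c
#include <stdbool.h>
#include <stdint.h>

/* Combine the two 32-bit halves of the 64-bit XSDT address. */
static uint64_t xsdt_addr64(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* For code that can only reach the first 4GB: accept the address only
 * when the upper half is zero, instead of silently truncating it. */
static bool xsdt_addr32(uint32_t lo, uint32_t hi, uint32_t *out)
{
    if (hi != 0)
        return false;   /* caller should fall back to the RSDT or panic */
    *out = lo;
    return true;
}
```

A caller that gets `false` back can fall back to the 32-bit RSDT pointer or stop with an error, which matches the range-check-and-panic suggestion made earlier in the conversation.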
01:41:12 <doug16k> ah my bad. line number was oddly pointing at the mov, when it was another place where I had .hword __initial_stack
01:51:17 <unlord> so this is very interesting
01:54:22 <doug16k> it's interesting to get a 32 bit pointer like 0x20000 to be suitable as ss:sp pair. needs to be ss=0x1000, and sp=0 to wrap to 0x1FFFE
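The ss:sp arithmetic here can be checked with a small model. This is a sketch of real-mode address formation only (segment times 16 plus offset, with a 16-bit stack pointer that wraps); the function names are made up for the illustration.

```c
#include <stdint.h>

/* Real-mode linear address: segment * 16 + offset. */
static uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* SP after a 16-bit push: decremented by 2, wrapping modulo 64KB. */
static uint16_t sp_after_push16(uint16_t sp)
{
    return (uint16_t)(sp - 2);
}
```

With ss=0x1000 and sp=0, the first push moves sp to 0xFFFE, so the first stack word lands at linear 0x1FFFE, just below the intended stack top of 0x20000.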
01:57:45 <doug16k> I'm making all my real mode stuff setup a 64KB stack in 0x10000-0x1FFFF region
01:58:19 <unlord> doug16k: so why do you search 0x9fc00 when the EBDA is usually 0x9fc0 ?
01:58:28 <doug16k> ran into a problem with disk I/O using just a bit too much stack so I'm sledgehammering it all the way up to 64KB stack to make that a non issue
01:58:35 <ybyourmom> Mutabah: Ads
01:58:49 <ybyourmom> clever: Yes, way more unskippable ads now
01:59:00 <doug16k> unlord, ebda is above 64KB!
01:59:10 <doug16k> check your numbers
01:59:25 <unlord> ebda is a 2 byte segment
01:59:32 <doug16k> no
01:59:36 <doug16k> it is 1KB
02:00:03 <doug16k> typically the last 1KB of real mode address space below 640KB
02:00:05 <unlord> [0x40E] is a 2 byte segment
02:00:26 <doug16k> yes that is where the bios tells you where to find it
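The 0x9fc0 versus 0x9fc00 confusion comes down to segment versus physical address. A minimal sketch, assuming the 16-bit word at physical 0x40E has already been read into `bda_word` (the function name is hypothetical):

```c
#include <stdint.h>

/* The word at 0x40E in the BIOS data area holds the EBDA *segment*.
 * Shifting it left by 4 gives the physical address of the EBDA.
 * Returns 0 when the BIOS reported no EBDA. */
static uint32_t ebda_phys_from_bda_word(uint16_t bda_word)
{
    return (uint32_t)bda_word << 4;
}
```

A BIOS that stores 0x9FC0 at 0x40E is therefore describing an EBDA that starts at physical 0x9FC00, which is the hardcoded address used in the search.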
02:00:30 <unlord> https://github.com/doug65536/dgos/blob/master/boot/boottable_bios.cc#L168
02:00:31 <bslsk05> github.com: dgos/boottable_bios.cc at master · doug65536/dgos · GitHub
02:00:41 <unlord> you do both ebda and hardcoded p_9FC00
02:01:18 <doug16k> ah, and I have this too: https://github.com/doug65536/dgos/blob/master/boot/boottable_bios.cc#L14
02:01:19 <bslsk05> github.com: dgos/boottable_bios.cc at master · doug65536/dgos · GitHub
02:01:19 <doug16k> oops
02:01:40 <doug16k> oh I see!
02:01:46 <unlord> doug16k: enjoy having a second pair of eyes on the code :)
02:01:53 <doug16k> if the EBDA isn't at 0x9fc00, I check there anyway
02:02:00 <clever> ybyourmom: another problem/exploit i noticed, if you have 2 ads in a row, and you must watch 5 seconds min each
02:02:01 <unlord> belts and suspenders
02:02:15 <doug16k> just in case line 157 told me crap
02:02:18 <clever> ybyourmom: if you skip the 1st ad, you also skip the 2nd, but if the 1st ad ends, you must watch 5 sec of the 2nd ad
02:02:57 <doug16k> unlord, ah now I really see, line 168: p_ebda ? p_9FC00 : nullptr
02:03:00 <clever> id say thats a bug/exploit, because they dont want you skipping the 2nd ad, but you can
02:03:09 <unlord> doug16k: yeah, you at least don't search twice
02:03:23 <doug16k> if I got a null pointer from 0x40E then I force it to look at 0x9fc00 anyway
02:03:47 <doug16k> checksum should cover my ass
02:03:47 <unlord> doug16k: that isn't what that says
02:04:00 <doug16k> sure it does
02:04:02 <unlord> it says if p_ebda != p_9FC00 you search there
02:04:06 <unlord> not if it is nullptr
02:04:36 <doug16k> the p_ebda one searches it even if it is a nullptr
02:05:09 <doug16k> but later it sees 0 and doesn't look
02:05:27 <doug16k> https://github.com/doug65536/dgos/blob/master/boot/boottable_bios.cc#L32
02:05:28 <bslsk05> github.com: dgos/boottable_bios.cc at master · doug65536/dgos · GitHub
02:06:26 <doug16k> then 2nd search line says, "if ebda wasn't at 0x9fc00, force it to look at 0x9fc00 anyway"
02:06:45 <doug16k> that is MP-tables search though
02:06:48 <doug16k> really old stuff
02:07:06 <doug16k> acpi tables are heavily based on them though
02:07:13 <doug16k> not much effort to support it
02:08:01 <doug16k> for acpi I don't try to force 0x9fc00 to be checked (boottbl_find_acpi_rsdp)
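For the ACPI side, the signature-plus-checksum search can be sketched as below. This is not the dgos code: it is a hosted illustration that scans a caller-supplied buffer, assuming the usual rules that the RSDP starts with "RSD PTR " on a 16-byte boundary and that its first 20 bytes sum to zero modulo 256. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sum of bytes modulo 256; a valid table region sums to zero. */
static uint8_t acpi_byte_sum(const uint8_t *p, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + p[i]);
    return sum;
}

/* Scan a memory region on 16-byte boundaries for a checksummed RSDP.
 * Returns a pointer to it, or NULL if none is found. */
static const uint8_t *find_rsdp(const uint8_t *region, size_t len)
{
    for (size_t off = 0; off + 20 <= len; off += 16) {
        const uint8_t *p = region + off;
        if (memcmp(p, "RSD PTR ", 8) == 0 && acpi_byte_sum(p, 20) == 0)
            return p;
    }
    return NULL;
}
```

On real hardware the region would be the first 1KB of the EBDA and then 0xE0000 to 0xFFFFF; the checksum is what makes it reasonably safe to probe a location that may not actually hold a table.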
02:08:30 <unlord> doug16k: this is what my code does so far, https://dpaste.com/5GKQBWRA6
02:08:30 <bslsk05> dpaste.com: dpaste: stdin
02:09:50 <unlord> next up is to print all the ACPI tables
02:10:35 <doug16k> looks pretty good
02:13:27 <doug16k> line 55 output could be misleading. 1111:2222 looks like a seg:off pointer
02:13:43 <doug16k> use ' if you meant it as a separator
02:14:08 <unlord> also, should be %08x:%08x
02:14:14 <unlord> you prefer ' as the separator instead of :
02:14:26 <doug16k> well, I prefer not : when it isn't seg:off :)
02:14:35 <doug16k> ' is what C++ picked. good enough for me
02:14:49 <Mutabah> Underscore?
02:14:59 <unlord> my homebrew printf doesn't even support padding to leading zeros :)
02:19:31 <doug16k> why not: printf("Found 64-bit XSDT %#016" PRIx64 "\r\n", ((uint64_t)rsdp2->xsdt_addr_hi << 32) | rsdp2->xsdt_addr_lo); :D
02:21:43 <unlord> doug16k: isn't that just %08x%08x ?
02:21:48 <doug16k> no
02:21:58 <doug16k> PRIx64 would be "lx" on x86_64
02:22:16 <unlord> maybe you didn't catch that this is my own implementation of printf
02:22:34 <doug16k> yeah, that doesn't mean you don't support it :)
02:22:39 <unlord> hahaha
02:22:44 <doug16k> giving you credit :D
02:23:31 <unlord> this is the entire thing: https://dpaste.com/8BZQW34Z4
02:23:32 <bslsk05> dpaste.com: dpaste: stdin
02:23:32 <doug16k> you should get at least 64 bit hex formatting to work correctly. nevermind the other number bases at first
02:24:01 <doug16k> don't encumber yourself with that highhalf/lowhalf thing over and over
02:24:33 <unlord> I don't care, I will only be CSM so all pointers will be 32-bit
02:24:37 <doug16k> it's easy, no divide needed either
02:24:58 <doug16k> no 64 bit divide needed*
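A 64-bit hex formatter really does need no division, only shifts and masks. The following is a minimal sketch of the idea and is not taken from either participant's printf; `format_hex64` is a hypothetical name.

```c
#include <stdint.h>

/* Write `value` as exactly 16 lowercase hex digits plus a terminating NUL.
 * Each digit is one 4-bit nibble, extracted most significant first. */
static void format_hex64(uint64_t value, char out[17])
{
    static const char digits[] = "0123456789abcdef";
    for (int i = 0; i < 16; i++) {
        unsigned shift = 60u - 4u * (unsigned)i;
        out[i] = digits[(value >> shift) & 0xF];
    }
    out[16] = '\0';
}
```

A printf-style wrapper could call this for a 64-bit hex conversion and strip leading zeros when no padding is requested.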
02:25:17 <unlord> no divide needed at all
02:26:06 <unlord> although I guess uitoa has a divide
02:26:28 <unlord> I don't care because none of the printf's will exist when I'm done
02:27:15 <doug16k> done? what does that mean? :)
02:27:25 <unlord> it means the deadline has passed :)
02:28:29 <doug16k> I wish I had powers where I could see into the future and know I won't need to change it or add new capabilities
02:28:41 <ybyourmom> You guys ready for the UFOs
02:28:53 <ybyourmom> https://www.youtube.com/watch?v=FTdCDfEu6x0
02:28:54 <bslsk05> 'Ex-defense official on new details of UFO encounters soon to public' by CNN (00:05:43)
02:28:54 <unlord> this is basically a crazy hack for a glorified programming contest
02:29:01 <doug16k> ah
02:29:22 <ybyourmom> Who tryna clap some alien cheeks
02:30:58 <doug16k> if we met aliens, how long do you think until we murder one of them
02:31:48 <doug16k> first day, week, or month. what's your guess?
02:31:58 <ronsor> first hour
02:31:59 <ybyourmom> You mean, how long until we start intergalactic nuclear wars?
02:32:02 <ronsor> give or take a few minutes
02:32:53 <unlord> ahh, so here is where I get in trouble
02:32:57 <unlord> I need to mmap these addresses
02:32:58 <ybyourmom> But honestly, how cute do you think the green anime catgirl aliens are
02:33:46 <doug16k> unlord, paging on right?
02:34:00 <kingoffrance> ive only seen plan9 and solaris, unaware of any other alien oses
02:34:03 <doug16k> if you identity map you need to make sure you reach high enough
02:34:03 <ronsor> ybyourmom: are you a weeaboo?
02:34:05 <unlord> doug16k: how far down the rabbit hole do you want to go?
02:34:32 <doug16k> unlord, are you in real mode?
02:34:47 <ybyourmom> ronsor: My brain isn't, but my libido is still undecided
02:34:58 <unlord> doug16k: worse, I could be running under a DPMI host
02:35:19 <unlord> I have a DOS COM file
02:35:27 <ronsor> noted...
02:35:30 <doug16k> and you have to tolerate EMM386?
02:35:37 <unlord> although, the DPMI host can do the mmap for me
02:36:06 <unlord> doug16k: assume I don't have an EMM386 for now, I wrote some code that enters protected mode
02:36:13 <ronsor> where's EMM386 and why is it running on a modern OS?
02:36:31 <unlord> ronsor: safely in a VM
02:36:31 <doug16k> ronsor, DPMI is DOS Protected Mode Interface. DOS
02:36:50 <ronsor> but why are you running under DOS?!
02:37:05 <unlord> ronsor: 22:28 < unlord> this is basically a crazy hack for a glorified programming contest
02:37:21 <ronsor> I see
02:37:23 <ronsor> that's cursed
02:37:42 <unlord> everyone needs a hobby
02:38:45 <unlord> wait, these tables are within physical memory
02:38:55 <doug16k> of course
02:38:57 <unlord> n/m, we're good here :)
02:39:19 <doug16k> can't wait to see tables not in physical memory :D
02:40:36 <unlord> so I'm still holding out hope I can make this work inside NTVDM
02:40:53 <unlord> but if not, I'll be OK with that
02:40:54 <doug16k> 32 bit?
02:40:57 <unlord> yes
02:41:05 <doug16k> won't on 64 bit
02:41:07 <unlord> I know
02:41:11 <unlord> no WOWOW
02:41:18 <ronsor> NTVDM too? oh my
02:41:34 <doug16k> 32 bit windows would throw an old lady under a bus if it meant being compatible with one more thing
02:42:12 <unlord> doug16k: I briefly worked at Microsoft in the year 2000 and met some of the people who did terrible things for DOS compatibility to ship Windows 95
02:42:39 <ronsor> remember kids, cli & hlt in userspace is a-ok!
02:44:27 <zid> Yaaay
02:44:36 <zid> I finally got prehistorik man rendering correctly
02:44:39 <zid> the hardest gb game to emulate
02:44:55 <zid> https://cdn.discordapp.com/attachments/705057289966714962/739674997832024114/unknown.png
02:45:11 <clever> zid: have you been following the edge of emulation?
02:46:23 <clever> zid: https://byuu.org/articles/edge-of-emulation/
02:46:23 <bslsk05> byuu.org: byuu.org | An archive of the most important parts of byuu’s personal website.
02:46:35 <clever> zid: he's doing things like adding support for a sewing machine to the emulator
02:46:45 <zid> byuu's dead baby
02:46:52 <zid> he left a message
02:47:04 <zid> he's retiring the nick on every platform that lets you
02:47:15 <zid> and becoming uninvolved with emulation
02:47:20 <zid> apparently he was being harassed
02:47:30 <clever> https://shonumi.github.io/articles.html
02:47:31 <bslsk05> shonumi.github.io: Shonumi: Articles
02:47:37 <clever> dang, by who? why??
02:47:54 <clever> and maybe i'm getting byuu and shonumi mixed up?
02:48:28 <unlord> darn, exception 0E
02:51:04 <unlord> it did not like me dereferencing 0x1d50f30
02:54:07 <unlord> okay, that is probably enough for today. To fix this I need to tackle some technical debt in my linker
03:05:35 <unlord> ahh, so I was right
03:05:43 <unlord> I need to call the DPMI host to do the memory mapping
03:06:13 <unlord> doug16k: if you are running your own kernel what does the mmap do ?
03:08:41 <doug16k> in mine I am passing a flag to map a specific physical address
03:08:55 <doug16k> the 1st parameter then becomes interpreted as the physical address
03:09:09 <doug16k> it finds somewhere in virtual address space and maps it in there
03:09:18 <doug16k> returns a pointer to where it placed it in virtual memory
03:10:13 <unlord> but isn't your kernel running in ring 0 ?
03:10:18 <doug16k> yes
03:10:38 <unlord> why can't it just write to the physical address?
03:10:43 <doug16k> doesn't make the page tables mapping that virtual address to that physical address exist by themselves
03:10:58 <doug16k> you can't read or write physical addresses with paging on
03:11:20 <doug16k> you have to map that physical address into page frame(s) and access it through there
03:11:59 <doug16k> the only register that uses physical address is cr3
03:12:16 <doug16k> everything else is looked up in the virtual address space created by the page tables
03:12:47 <doug16k> the content of the page tables is also all by physical address
03:14:18 <unlord> okay, so if I want to use a VESA frame buffer, I'll still need to mmap this
03:15:56 <doug16k> so in my kernel, I might ask it to map 64KB at physaddr 0xD80000000. it finds a range of unallocated address space and chooses 0xFFFF_F00C_0000_4000. updates the page tables so a pointer to 0xFFFF_F00C_0000_4000+n is actually mapped to physical address 0xD80000000+n where n < 64K
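The relationship between the returned virtual pointer and the requested physical range can be modelled with a tiny bookkeeping struct. This is only a sketch of the arithmetic in the example above; it does not touch real page tables, and `struct phys_window` and `window_virt_to_phys` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Describes one physical range mapped at a chosen virtual base. */
struct phys_window {
    uint64_t virt_base;  /* where the kernel placed the mapping */
    uint64_t phys_base;  /* physical address that was requested */
    uint64_t size;       /* length of the mapping in bytes */
};

/* Translate a virtual address inside the window to its physical address.
 * Returns false when the address lies outside the mapped range. */
static bool window_virt_to_phys(const struct phys_window *w,
                                uint64_t vaddr, uint64_t *paddr)
{
    if (vaddr < w->virt_base || vaddr - w->virt_base >= w->size)
        return false;
    *paddr = w->phys_base + (vaddr - w->virt_base);
    return true;
}
```

With the numbers from the example (64KB at physical 0xD80000000 placed at 0xFFFF_F00C_0000_4000), offset n inside the window translates to 0xD80000000+n, and anything at or beyond 64KB is rejected.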
03:17:43 <unlord> where does your kernel handle SYS_mmap ?
03:18:05 <doug16k> sys_mem.cc
03:20:55 <unlord> but that just calls mmap()
03:22:33 <doug16k> of course it does
03:22:39 <doug16k> https://github.com/doug65536/dgos/blob/master/kernel/arch/x86_64/cpu/mmu.cc#L1775
03:22:42 <bslsk05> github.com: dgos/mmu.cc at master · doug65536/dgos · GitHub
03:27:32 <doug16k> unlord, in DPMI, you can get control in ring 0, and then your eyes start glowing and you can destroy everything and take over everything if you want
03:27:55 <unlord> doug16k: so with DPMI I think I can just ask it to do the physical to virtual mapping for me
03:28:14 <unlord> and I'm done (still need to remap though)
03:28:33 <doug16k> I thought so but I didn't see that in the API ref that I found
03:30:02 <unlord> doug16k: here is how I do it, https://dpaste.com/7L4JDH4UT#line-144
03:30:03 <bslsk05> dpaste.com: dpaste: stdin
03:33:42 <unlord> doug16k: so, pretend I'm an idiot, I see your mmu has some datastructures, but what does the actual mapping?
03:33:49 <doug16k> nice
03:34:02 <geist> usually there's a piece of code that just abstracts the mmu
03:34:16 <geist> i think doug16k does it in the usual way, as in puts it behind some api
03:34:20 <doug16k> modifying memory structure used by the cpu when doing virtual-to-physical translation
03:34:33 <doug16k> what geist said
03:35:20 <doug16k> you setup a series of pages, linked together in a tree, to sparsely specify what physical memory appears in each page frame
03:35:20 <unlord> I'm getting dangerously close to writing my own DPMI host
03:35:39 <geist> usually it's a nice match for object oriented design: AddressSpace { members: cr3, etc methods: map, unmap, query, etc }
03:35:56 <geist> and then you abstract away the implementation details of the architecture
03:37:53 <doug16k> I did it a bit oddly. I made it so kernel map and user mmap look the same in code, I just added extensions to the flags to allow physmap and specifying memory type
03:38:18 <doug16k> but it behaves like real mmap
03:38:31 <geist> yah i dont think that's particularly odd
03:38:34 <doug16k> i.e. mmap again gets new pages, if not MAP_POPULATE, demand paged
03:38:44 <geist> makes sense to try to reuse the kernel VM and user VM. that's what i've always done, and are doing in zircon
03:38:57 <geist> works basically the same way in the kernel, you just have a more direct version of the API, via the objects directly
03:39:18 <doug16k> oh good. I thought everybody else made up a set of different apis for kernel and call all that from something :)
03:40:10 <unlord> I see, this code I'm looking at works because it is being run in unreal mode
03:40:26 <doug16k> don't use unreal mode
03:40:32 <doug16k> use real protected mode
03:40:43 <geist> nah, i really learned a lot of design from BeOS which was very regular in the kernel API department. more or less user space syscalls were the same api that kernel used, though there may have been more powerful/less safe/etc apis
03:40:56 <geist> vs something like linux which seems to really go in the other direction in a lot of ways
03:41:27 <doug16k> yeah I just ban all the kernel-only flags if the user bit is set, and I force-set MAP_USER in mmap calls from user mode
03:41:31 <unlord> doug16k: I'm looking for the shortest path (in terms of bytes) that lets me use all the cores
03:41:54 <doug16k> you can just cheat and guess the apic ids
03:41:57 <doug16k> screw acpi
03:42:06 <unlord> doug16k: perfect
03:42:13 <doug16k> if the whole project is a total hack already
03:42:14 <doug16k> :D
03:42:39 <unlord> doug16k: :)
03:42:41 <doug16k> use cpuid to find out how many cpus and even how to calculate their apic ids, then there you go. assume one cpu package and done
03:44:00 <unlord> works for me
03:44:02 <doug16k> you can get the topology info from sufficiently recent cpus with cpuid and just know their ids
03:44:11 <geist> keep in mind it's a big hack, there's no one way to do it for all cpus, but it's kinda fun to be honest
03:44:25 <geist> it's like a big flow chart of 'if this exists then use that method, otherwise fall back to X or Y if vendor A or B'
03:44:35 <geist> which honestly i kinda like writing that sort of thing for some reason
03:44:56 <doug16k> kernels are largely mazes of control flow
03:45:10 <doug16k> not a lot of actual computation
03:45:20 <geist> yah
03:46:33 <ybyourmom> How does somebody submit a pull request where they modify a utility program and it segfaults as soon as it starts executing and they didn't notice
03:46:57 <doug16k> politely
03:47:01 <ybyourmom> lol
03:47:02 <doug16k> kill them with your kindness
03:47:07 <unlord> doug16k: I'm okay with hardcoding this
03:54:08 <unlord> doug16k: so even if I'm running under a DPMI host, I should be able to turn on the other cores
03:54:37 <unlord> they will all start in real mode and need to switch to their own PMODE
03:59:31 <doug16k> yes
03:59:41 <doug16k> the entry point is any 4KB boundary below 1MB
03:59:55 <doug16k> there are 256 possible entry points
04:00:39 <doug16k> sets cs to that address and ip = 0
04:01:34 <doug16k> you pass a byte parameter to the SIPI IPI, it makes the other cpu set cs to (parameter << 8) and ip to 0
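The vector arithmetic described here can be written down directly. A minimal sketch with hypothetical function names: the 8-bit SIPI vector selects a 4KB-aligned entry point below 1MB, so the AP starts with cs = vector << 8 and ip = 0, which is physical address vector << 12.

```c
#include <stdint.h>

/* CS value the application processor starts with for a given vector. */
static uint16_t sipi_cs(uint8_t vector)
{
    return (uint16_t)((uint16_t)vector << 8);
}

/* Physical address of the entry point: cs * 16 + ip, with ip = 0. */
static uint32_t sipi_entry_phys(uint8_t vector)
{
    return (uint32_t)vector << 12;
}

/* Vector for a trampoline placed at `phys`, or -1 if the address is not
 * a 4KB boundary below 1MB. */
static int sipi_vector_for(uint32_t phys)
{
    if (phys >= 0x100000u || (phys & 0xFFFu) != 0)
        return -1;
    return (int)(phys >> 12);
}
```

For example, a trampoline copied to physical 0x8000 uses vector 0x08, and the target CPU begins executing at 0800:0000.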
04:02:09 <geist> *thats* what it was. the other day i was saying that i thought you could start an x86 cpu at a unique address
04:02:15 <geist> and everyone was like 'naw you can't do that'
04:02:47 <geist> i thought i was just confusing it with ARM's PSCI (which can start at a unique address)
04:03:11 <geist> so yeah if you can pass a unique cs then that's a unique address
04:03:21 <doug16k> yes
04:10:00 <Belxjander> 68K has a similar trick but you can set both stack and initial address
04:10:37 <Belxjander> you write both address values as 2x 32bit values to location $0 and Location $4 in physical ram as mapped by the CPU and use the reset instruction
04:11:10 <Belxjander> so if you have multiple 68K in the same hardware setup... and change the "VBR" register for whatever chip boots...
04:11:35 <Belxjander> the other chip can launch into an alternate entrypoint and "jump live" into an already running OS environment
04:12:58 <geist> yah makes sense. VAX i think does too, which is probably where 68k got it from (first two vectors are reset and SP)
04:13:29 <geist> and interestingly cortex-m* picked it up. there's enough random things in the cortex-m series that seem to be direct homages to VAX that i have to think someone was a fan that designed it
04:13:46 <geist> and i know that 68k devs were obvious fans
04:14:06 <unlord> doug16k: but how do I send the SIPI IPI ?
04:14:11 <clever> http://www.weasner.com/etx/autostar/as_schematic.html
04:14:11 <bslsk05> www.weasner.com: Weasner's Meade Autostar Information
04:14:18 <unlord> I thought I had to write to some memory location
04:14:34 <clever> ive been looking into some telescope stuff lately, and the controller unit includes a 68k cpu
04:14:54 <clever> and it seems like the bank switching changes out the flash at the reset vector, so they had to put the reset handler in both banks
04:21:12 <unlord> wrmsr instruction
04:23:00 <geist> unlord: the whole SIPI stuff is kinda complicated and arcane and silly for historical reasons
04:23:11 <geist> but, both intel and AMD manuals actually have a whole section on how to do it
04:23:21 <doug16k> unlord, you write a command to the LAPIC command register
04:23:34 <doug16k> with the right destination and sipi delivery method
04:23:44 <geist> and there's some whacky bits like you wait so many ms and try again and whatnot
04:23:53 <geist> *presumably* you dont have to do any of that nonsense any more
04:23:58 <geist> except it's still recommended you do
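Writing the command to the LAPIC can be sketched as follows. This assumes xAPIC (memory-mapped) mode, where the interrupt command register is split into a low dword at offset 0x300 and a high dword at offset 0x310, and where writing the low dword sends the IPI. The function names are hypothetical, `lapic` is assumed to point at an already mapped LAPIC register page, and the delays and INIT-SIPI-SIPI retry sequence recommended by the vendor manuals are deliberately left out.

```c
#include <stdint.h>

/* xAPIC register offsets, in bytes from the LAPIC MMIO base. */
#define LAPIC_ICR_LOW     0x300u
#define LAPIC_ICR_HIGH    0x310u

/* ICR low dword fields. */
#define ICR_MODE_INIT     (5u << 8)   /* delivery mode 101b */
#define ICR_MODE_STARTUP  (6u << 8)   /* delivery mode 110b */
#define ICR_LEVEL_ASSERT  (1u << 14)

/* Destination APIC ID goes in bits 31:24 of the high dword. */
static uint32_t icr_high_for(uint8_t apic_id)
{
    return (uint32_t)apic_id << 24;
}

static uint32_t icr_low_init(void)
{
    return ICR_LEVEL_ASSERT | ICR_MODE_INIT;
}

static uint32_t icr_low_sipi(uint8_t vector)
{
    return ICR_LEVEL_ASSERT | ICR_MODE_STARTUP | vector;
}

/* Send one startup IPI: program the destination first, then write the
 * low dword, which is the write that actually triggers delivery. */
static void lapic_send_sipi(volatile uint32_t *lapic,
                            uint8_t apic_id, uint8_t vector)
{
    lapic[LAPIC_ICR_HIGH / 4] = icr_high_for(apic_id);
    lapic[LAPIC_ICR_LOW / 4] = icr_low_sipi(vector);
}
```

Because the register block is passed in as a pointer, the same code can be exercised against a plain array in a hosted test before pointing it at the real LAPIC mapping.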
04:24:48 <unlord> doug16k: I still need to mmap the address of the LAPIC registers
04:25:03 <geist> this is true
04:25:26 <geist> x2apic you can do it with MSR, but i think you have to mmap it once at least to flip it into x2apic mode, and most AMD cpus dont do x2apic
04:25:29 <geist> as well as lots of emulators
04:51:35 <doug16k> unlord, https://github.com/doug65536/dgos/blob/master/kernel/arch/x86_64/cpu/apic.cc#L2063
04:51:38 <bslsk05> github.com: dgos/apic.cc at master · doug65536/dgos · GitHub
04:52:42 <doug16k> actually this is where it starts https://github.com/doug65536/dgos/blob/master/kernel/arch/x86_64/cpu/apic.cc#L2117
04:52:44 <bslsk05> github.com: dgos/apic.cc at master · doug65536/dgos · GitHub
04:55:48 <doug16k> I'm supposed to time out and retry though
04:55:53 <doug16k> doesn't
04:57:12 <doug16k> I wait until each AP has jumped into the kernel, initialized itself, and transformed itself into an idle thread before starting the next one
04:58:22 <doug16k> it could do groups in parallel by making them use different entry points and inferring their initial stack from that
04:58:33 <doug16k> 32 cpus startup really quickly though
04:59:06 <geist> yah my experience is x86 cpus are up and running in just a few ms so it's not a big deal to do them serially
04:59:38 <geist> possibly a little slower on VMs, but i haven't seen that
05:00:10 <doug16k> it's utterly ludicrous speed when I run my LTO build smp kvm
05:00:54 <geist> what's interesting is i've experienced exactly the opposite on ARM on KVM
05:01:14 <geist> for some reason the PSCI call to bring up a secondary cpu on some of the ARM machines we use at work can sometimes take like a second to start a secondary
05:01:23 <doug16k> probably because I have the vapic enabled
05:01:30 <geist> so if you do them serially and you bring up 8 cores or so you're talking like a 10 second kernel boot
05:01:50 <doug16k> that would vmexit. I mean, the LAPIC is super super good in kvm
05:02:10 <geist> but thankfully ARM PSCI has a separate entry point *and* an argument in x0, so you can easily start a bunch of cpus in fire and forget mode
05:02:44 <doug16k> a second is ridiculous
05:02:52 <geist> point being that if you try to make your secondary cpu bootup code be as cross platform as you can you can get into some weird problems with assumptions as to how the cores are initialized and shutdown
05:03:21 <geist> cpu shutdown is similar. at least on x86 and ARM they're two completely different mechanisms and annoyingly different
05:03:39 <geist> x86 you can stop another core, so you can stop a cpu by putting it in a deep loop and then hitting it with a shutdown IPI
05:03:50 <geist> ARM you can only power a core off (when using PSCI) by making a call on the cpu itself
05:03:52 <clever> i believe with the official rpi firmware, you just write a different (or the same) entrypoint into a list of magic addr's (plain ram) and then IPI each core
05:04:05 <geist> basically a cpu_exit() call. which is great, except there's no way to actually wait for a second cpu to stop
05:04:11 <clever> and no x0/r0 control
05:04:11 <geist> you can't actually *know* that it has exited
05:04:47 <geist> yes, that's the 'cpu is trapped in a loop' cpu boot method. that's the fallback method that some ARM platforms use
05:04:49 <clever> pretty sure the pi has no on/off switch at the core level, all cores are always on
05:04:55 <doug16k> so people end up waiting really long to make sure?
05:05:00 <geist> doug16k: basically yeah
05:05:25 <geist> it's really mainly a problem when you're trying to do some sort of kexec() style thing and you want to tear all the secondary cores down to some initialized place
05:05:46 <geist> this is a thing where x86 is actually very easy, you can off cpus such that they look like they're ready to be born very easily
05:06:17 <clever> on the VPU side of things, there are 2 special registers, that seem to let you just write PC directly
05:06:27 <clever> so you can force either core to begin executing something
05:06:38 <clever> guessing it also wakes from halt
05:06:55 <clever> but obviously, you shouldnt use it when the core is doing things, it likely doesnt preserve any state
05:07:48 <clever> again, no off, but you can mask interrupts and then run a halt opcode, and it will just never wake up
05:08:02 <clever> though, the masking is in mmio, so you can unmask for another core...
05:08:12 <geist> that's almost always a bad idea is the problem
05:08:41 <geist> the reason things like PSCI exist is to make that 'safe' because traditionally doing things like ripping the clock or resetting a cpu at the low level can leave all sorts of bad state in the bus or whatnot
05:08:47 <clever> i think the only time that force-jmp thing is used, is in early boot, when the rules of the api say the core should be dead
05:09:05 <geist> since cpus have been getting more and more complicated arm tried to clean it up by adding the PSCI call interface to hide all these details
05:09:29 <geist> because it's specific about making sure the caches are properly synchronized and the bus is quiesced, outstanding memory accesses are done, etc
05:09:46 <geist> but if you just rip the clock or hit the reset line on a modern core it's Pretty Bad
05:10:00 <geist> unless you're going to go through a hard system reboot in which case everything will be put back
05:10:15 <clever> i do see evidence of that stuff in the power gating driver for the rpi (found in linux source)
05:10:33 <clever> there are many stages, including gating every bus out of that region, and doing a region local reset
05:10:47 <geist> yep. in the old days you had to jump through a bunch of hoops by stepping the cpu down through a series of flushes and loops and whatnot to make sure it's quiesced before you pull the plug
05:10:52 <geist> yep
05:10:52 <clever> so you can power it up, and get it into a sane state, before you give it access to the bus
05:11:05 <geist> the point of PSCI is to put all that logic in the firmware so the kernel doesn't have to
05:11:21 <clever> i think the 4 arm cores of the pi all share a single power domain
05:11:28 <clever> so linux would never have to deal with that for arm cores
05:11:49 <clever> it only comes into play if you're turning the 3d cores on/off, and other hefty peripherals
05:12:24 <geist> anyway this is another reason the broadcom stuff sucks because they've basically ignored 10 years of advice and good practice that ARM has been trying to get folks to do
05:12:48 <geist> it's like if VIA suddenly got interested in making another x86 core but just sort of screwed everything up and made it mostly incompatible
05:12:49 <clever> you can always add PSCI ontop of the pi, and some people have done that already
05:12:52 <geist> and then x86 folks had to all deal with it
05:13:20 <geist> oh sure, and that's whats so stupid about it. broadcom *could* have done the right thing but they just dont give a shit
05:13:39 <geist> and since a ton of people are exposed to ARM via the Pi, you end up with a fairly bad example
05:13:50 <clever> i think at this point, the rpi team doesnt want to deal with maintaining that large chunk of code
05:13:52 <geist> but, then most people dont do low level so it doesn't really matter
05:13:54 <clever> the current solution "works"
05:13:59 <geist> right
05:14:07 <geist> hack it till it works, ship it.
05:14:21 <clever> ive seen UEFI implementations that ignore efi vars, and only boot the windows .efi file
05:14:26 <clever> "it booted windows, ship it"
05:15:15 <clever> i think part of the problem, is that the rpi team is overloaded with many future plans, and they cant tell us what those are
05:15:20 <clever> so they dont have time to work on smaller things
05:15:37 <clever> and since its closed-source, nobody can help out
05:15:57 <clever> but they are working to improve things, and make linux control more hw directly, less firmware involvement
05:16:22 <clever> the h265 decode in the pi4, just skipped the firmware stage entirely, it only has linux drivers
05:17:14 <clever> my hope is that the firmware gets simple enough that it could be replaced by an outsider, and then we just need to convince them to adopt it as the official firmware
05:19:16 <clever> i think the official statement from the rpi team, is that nobody should be messing with the firmware, and it should (in the end) just setup clocks and boot the arm
05:19:40 <clever> but there are still feature requests that need firmware changes, like a custom splash screen, which youve likely done on android
05:20:11 <clever> 6 seconds of dead screen when you apply power isnt a good user experience
05:21:11 <clever> one weird "feature" ive discovered, is that the SPI clock for the boot eeprom, is on the same gpio as the activity LED
05:21:21 <clever> so you can monitor SPI activity via the LED, for zero cost in code
05:42:39 <geist> clever: yeah that's why i think a lot of the problem is broadcom, not the RPI team
05:52:32 <clever> geist: i'm not sure on the details, but they got a guy from the mainline drm team to sign an nda, and gave him docs for the pi4 gfx pipeline
05:52:51 <clever> and thats where the current gfx drivers are coming from
05:53:11 <clever> and the rpi guys have said on the forums that broadcom wont let them release those docs
05:53:31 <clever> makes me wonder what special sauce could be in there? is it really just the hdmi crypto, or is there something more?
05:53:56 <clever> and what can you do with docs, that you couldnt do with open-source drivers
05:54:11 <clever> i have plans to just re-create the docs from the source
05:54:27 <clever> whats the difference at that point?
05:55:37 <clever> one of the engineers has also recently said that modern dram can perform way faster than the benchmarks claim, and they cant give full details due to NDA
05:55:44 <clever> why is even ram under an NDA nowadays???
05:56:11 <clever> when will you need to sign a contract to use the stack pointer? lol
05:58:25 <geist> dram controller
05:58:36 <geist> but yes, that sounds like a company that is historically overprotective
05:59:44 <clever> the pi4 dram controller is a pretty large black box
06:00:26 <clever> on VC4, the dram controller is "relatively" simple, you just poke a series of registers in the right order, and the code is fully in plain c with well named variables
06:00:42 <clever> i dont really understand it, but you can at least get a rough idea, and it has comments
06:00:55 <clever> but on VC6, its an even worse black box, because it has its own firmware blob
06:01:12 <clever> ive yet to even figure out how to load that blob into the controller
06:01:41 <clever> VC4 is much more of a fire&forget thing
06:28:35 <clever> was just thinking, how crazy the rpi3 netboot stuff is
06:28:48 <clever> its driving a usb interface and doing network, with the dram offline
06:29:20 <clever> but it has 128kb of L2 cache to work from, and the .text is in rom, so it doesnt eat into that
06:33:47 <geist> what is the ip protocol it's using?
06:40:31 <clever> geist: dhcp and tftp, to fetch bootcode.bin
06:40:37 <clever> over v4
06:41:41 <clever> thats enough to bring dram online, then it has to fetch start.elf over tftp, then that inits more hw, and has to fetch the rest (config, linux, dtb's)
06:43:21 <clever> hmmm, both are udp, that kind of makes it simpler
06:43:57 <doug16k> udp is much simpler
06:44:16 <clever> it does still have some rather major bugs
06:44:28 <clever> a timeout at the tftp layer is treated as a silent file not found reply
06:44:39 <doug16k> tftp is not bad
06:44:43 <clever> so if kernel8.img isnt found, it silently falls back to kernel.img, the wrong one for that model
06:44:54 <clever> yet it is found, it just lost a packet
06:44:58 <doug16k> I have written a tftp server
06:45:11 <clever> i have confirmed that the tftp server software is not to blame
06:45:19 <clever> it gives a clear error packet, when a file isnt found
06:45:23 <doug16k> it's easy because it's client retries. server just sits there and responds
06:45:27 <clever> the tftp client is treating no reply at all, as not-found
06:45:36 <clever> the client isnt retrying :P
06:45:59 <doug16k> I mean the way the entire protocol works
06:46:11 <clever> a properly working client, yeah
06:46:12 <doug16k> normally it's sender retries
06:46:23 <doug16k> (in other protocols)
06:46:25 <clever> but the rpi client has a bug, where it just assumes no answer == doesnt exist
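A minimal sketch of the client behavior clever is asking for: silence is retried and finally reported as a timeout, and only an explicit ERROR packet with code 1 counts as "file not found". The opcodes and error code come from RFC 1350; the `tftp_xfer_t` transport hook, `tftp_probe`, and `flaky_xfer` are hypothetical names for illustration and say nothing about how the rpi boot ROM is actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum tftp_result { TFTP_OK, TFTP_NOT_FOUND, TFTP_TIMEOUT, TFTP_ERROR };

/* Hypothetical transport hook: sends a read request for name and waits for
 * one reply. Returns the reply length, or -1 if nothing arrived in time. */
typedef int (*tftp_xfer_t)(const char *name, uint8_t *reply, size_t cap);

/* RFC 1350: opcode 3 is DATA, opcode 5 is ERROR, and ERROR code 1 means
 * "File not found". */
enum tftp_result tftp_probe(tftp_xfer_t xfer, const char *name, int max_tries)
{
    uint8_t reply[516]; /* 4-byte header plus up to 512 bytes of data */
    assert(xfer != NULL && name != NULL && max_tries > 0);

    for (int attempt = 0; attempt < max_tries; attempt++) {
        int len = xfer(name, reply, sizeof reply);
        if (len < 0)
            continue;                  /* no answer: retry, do not guess */
        if (len < 4)
            return TFTP_ERROR;         /* too short to carry a header */

        unsigned opcode = ((unsigned)reply[0] << 8) | reply[1];
        unsigned code   = ((unsigned)reply[2] << 8) | reply[3];
        if (opcode == 3)
            return TFTP_OK;            /* first DATA block: file exists */
        if (opcode == 5 && code == 1)
            return TFTP_NOT_FOUND;     /* the server said so explicitly */
        return TFTP_ERROR;
    }
    return TFTP_TIMEOUT;               /* distinct from "not found" */
}

/* Stand-in transport for trying the sketch on a host: drops drops_left
 * requests, then answers with an empty DATA block 1. */
static int drops_left = 1;

static int flaky_xfer(const char *name, uint8_t *reply, size_t cap)
{
    (void)name;
    if (drops_left > 0) {
        drops_left--;
        return -1;
    }
    if (cap < 4)
        return -1;
    reply[0] = 0; reply[1] = 3;        /* opcode DATA */
    reply[2] = 0; reply[3] = 1;        /* block number 1 */
    return 4;
}
```

With this split, a caller would fall back from kernel8.img to kernel.img only on `TFTP_NOT_FOUND`, and treat `TFTP_TIMEOUT` as a boot failure or a reason to keep retrying.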
06:47:16 <clever> i should try making my own tftp server, and intentionally delay the answers
06:47:22 <clever> and see how i can break it and prove its a bug
06:48:38 <clever> hmmm, ive not checked, but i suspect the rpi tftp client (in the rom) cant use gateways either
06:48:45 <clever> seems a bit too advanced
06:50:16 <doug16k> that would be a bit of a surprise. it just did bootp/dhcp request
06:50:37 <clever> lan only just needs plain old arp
06:50:52 <clever> but a gateway involves sending the arp to the "wrong" ip, which needs extra code
06:51:02 <doug16k> ya maybe you are right
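The "extra code" for gateway support is small. A sketch of the next-hop choice, assuming IPv4 addresses in host byte order and a single default router; `arp_target` is a hypothetical helper, not anything from the rpi ROM.

```c
#include <assert.h>
#include <stdint.h>

/* Pick the IPv4 address to send the ARP request for.
 * Assumption: gateway is the default router handed out by DHCP and sits
 * on the same subnet as self. */
uint32_t arp_target(uint32_t self, uint32_t netmask, uint32_t gateway, uint32_t dest)
{
    /* A usable gateway must itself be reachable on-link. */
    assert((gateway & netmask) == (self & netmask));

    if ((self & netmask) == (dest & netmask))
        return dest;     /* on-link: resolve the destination itself */
    return gateway;      /* off-link: resolve the router's MAC instead */
}
```

The IP header still carries the real destination address; only the Ethernet destination MAC comes from the router's ARP reply.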
06:51:18 <doug16k> I did have to manually parse the dhcp data in my pxe bootloader
06:51:23 <clever> and they definitely have 3 separate builds, possibly from different codebases
06:51:28 <clever> rom, bootcode.bin, start.elf
06:51:44 <clever> the rom is directly from broadcom, so likely its own codebase
06:52:03 <clever> bootcode.bin isnt really a kernel, more a bootloader, and is relatively dumb
06:52:11 <clever> start.elf is a full threadx kernel
06:52:25 <clever> not sure how much code could be shared between the stages
06:53:24 <clever> even the logging within each is radically different
06:53:33 <clever> the rom can never log, it cant even blink an LED
06:53:45 <clever> bootcode.bin is just printf's to the uart with an if statement to on/off
06:53:57 <clever> start.elf has a whole ringbuffer in ram, which can be dumped over an RPC
06:57:21 <Colin_M_> start.elf and bootcode.bin run on the GPU, right? It's all very odd
06:57:29 <clever> Colin_M_: yep
06:57:48 <clever> Colin_M_: i think the model, is that originally it was only the "GPU" and the arm was an optional thing they just threw on the side for future use
06:58:11 <clever> Colin_M_: and that tumor on the side has just kept growing and taking over control of the host, lol
06:58:40 <clever> the older chips in this line, didnt even have an arm core
06:58:43 <Colin_M_> Hahaha yeah
06:59:45 <clever> the "GPU" cores also offer broadcom some trustzone/management engine style features
07:00:03 <clever> yes, they could just re-implement that in arm trustzone now, but why change what works?
07:01:11 <clever> geist: does arm maybe want an extra licensing fee if you setup trustzone? though that doesnt make much sense, every EL is fully implemented on the pi3/p4, its just not wired to the dram controller, which does have secure region support
07:14:01 <clever> (facepalm), somebody on the pi forums asking if they can desolder the arm core, and solder in an x86 one, lol
07:14:29 <Mutabah> weeeelll...
07:14:37 <Mutabah> I can see how someone would think that's possible...
07:14:43 <doug16k> if you are asking that, you can't
07:19:27 <clever> > Now that it looks likely that ARM will be bought by Nvidia it is perhaps time for all those building with ARM to consider getting together and cooking up a RISC-V replacement for their cores.
07:19:44 <clever> are people just over-reacting to that news? i dont see why nvidia would buy arm only to kill it off
07:36:45 <geist> clever: probably overreacting
07:36:57 <geist> my main concern would be if they just completely botch it
07:37:36 <Mutabah> Some companies that currently compete with NV might be concerned
07:38:05 <geist> it'd be stupid for them to fiddle with ARMs business, especially after spending as much as they'd have to do
07:38:16 <geist> but... companies also frequently do dumb stuff after acquisitions
09:11:53 <siberianascii> does anyone here by chance use or used to use DDD ?
09:12:42 <siberianascii> im trying to get bigger font with it. since
09:12:47 <siberianascii> https://www.gnu.org/software/ddd/manual/html_mono/ddd.html#Customizing%20Fonts
09:12:59 <siberianascii> i tried everything there but it still has small fonts
09:13:01 <bslsk05> www.gnu.org <no title>
10:48:45 <kingoffrance> siberianascii, you can use "xfontsel" to get fonts in the format X stuff wants
10:49:08 <kingoffrance> if you have many fonts it can take a while to load though
10:49:49 <kingoffrance> im sure theres gotta be something "newer" just cant think off the top of my head
11:00:03 <siberianascii> kingoffrance: i think sasm pretty much replaced DDD
11:00:18 <siberianascii> still i appreciate your help
11:02:53 <kingoffrance> well my guess was it had nothing to do with DDD -- just if you specify invalid x font, or a bitmap font that doesnt have a size, it doesnt exactly warn you (maybe stderr or .xerrors-whatever) -- it will just silently substitute with something else
11:05:34 <siberianascii> kingoffrance: yes it's probably just a matter to fix it in .Xresources to the right dpi/size
11:06:16 <siberianascii> i did something like did when i first used i3 with Xft fonts.. i used a manual i found online but i have no clue of how to do the same with helvetica* fonts
11:06:32 <siberianascii> s/did/it
12:39:43 <Piraty> siberianascii: what a coincidence. i just read a tutorial on gdb which recommended ddd as a frontend, but it's so broken on my system it doesn't even show letters (possibly missing the hardcoded font or sth) :D
12:40:58 <siberianascii> let me recommend you sasm. it's an IDE, not a frontend to gdb, but it has the same features you want with DDD
12:41:47 <clever> ive used ddd once when gdb was connecting to openocd and jtag, it was a bit buggy
12:43:52 <Piraty> i tried gede but it lacks a lot
12:44:26 <Piraty> siberianascii: thanks for that recommendation
12:44:51 <Piraty> qt4 :(
12:45:07 <Piraty> let's see if it works qt5, doubt it though
12:48:36 <Piraty> argh, qmake
12:57:32 <Piraty> qt5 seems to do it
12:57:40 <Piraty> will play with that, thanks siberianascii
13:03:50 <siberianascii> Piraty: np
13:16:41 <keegans> doug16k: sorry, machine crapped out. yeah, if I direct the 0xFFFFF78000000000 mapping table to the same one as the identity map it will mean that 0xFFFFF78000000000 maps the zero page, which won't work since I'm trying to write separate data to that address. I am using 2mb pages, yeah
13:17:18 <keegans> plus i'd like the mmu to trap if I write over that 2mb page since that's a problem
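A small sketch of why sharing the identity map's tables aliases that address onto the zero page, assuming standard x86-64 4-level paging with 2 MiB pages; `struct va_split` and `split_2m` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Indices used to walk PML4 -> PDPT -> PD for a 2 MiB mapping. */
struct va_split {
    unsigned pml4, pdpt, pd;
    uint64_t offset;   /* byte offset inside the 2 MiB page */
};

struct va_split split_2m(uint64_t va)
{
    /* Canonical address check: bits 63:47 must be all zeros or all ones. */
    uint64_t top = va >> 47;
    assert(top == 0 || top == 0x1FFFF);

    struct va_split s;
    s.pml4   = (unsigned)((va >> 39) & 0x1FF);  /* bits 47:39 */
    s.pdpt   = (unsigned)((va >> 30) & 0x1FF);  /* bits 38:30 */
    s.pd     = (unsigned)((va >> 21) & 0x1FF);  /* bits 29:21 */
    s.offset = va & ((1ull << 21) - 1);         /* bits 20:0  */
    return s;
}
```

For 0xFFFFF78000000000 this gives PML4 slot 495 with PDPT index 0 and PD index 0. If PML4 slot 495 points at the same PDPT as the identity map in slot 0, the walk ends at the identity map's PD entry 0, which is the 2 MiB page at physical address 0. Giving slot 495 its own PDPT and PD avoids the alias.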
17:04:08 * geist yawns
17:05:36 * froggey yawns back
18:55:56 <siberianascii> Piraty: it seems that in ddd you can investigate the stack while in sasm you can't...
19:10:50 <geist> hmm, i should give ddd a try
19:11:07 <geist> i think the last time i fiddled with it i was less than impressed for i think similar graphical reasons
19:11:13 <geist> but i probably gave it 2 minutes and gave up
19:13:58 <siberianascii> i just can't make it work with hidpi screen...
19:14:24 <siberianascii> but its seems a shame to give up on such a tool
20:03:55 <stuff_is> can someone teach me stuff please?
20:07:43 <geist> well, depends on what you want to learn
20:08:18 <stuff_is> I want to learn the knowledge and acquire the skills to make a good OS and compiler
20:08:22 <geist> ah
20:10:15 <stuff_is> could you do that?
20:10:31 <uplime> have you checked out the books stuff_is ?
20:10:39 <stuff_is> the main issue I struggle with is motivation or rather discipline, I haven't done anything productive
20:10:44 <stuff_is> what books?
20:10:56 <uplime> https://wiki.osdev.org/Books
20:10:57 <bslsk05> wiki.osdev.org: Books - OSDev Wiki
20:11:01 <stuff_is> I've read getting things done and some other books but it didn't help much
20:11:12 <stuff_is> thank you, I will take a look
20:11:22 <stuff_is> I am currently reading catch 22, funny book
20:11:37 <uplime> Jack Crenshaw's compiler book is also pretty decent, although it is a bit weird in how it compiles
20:12:40 <stuff_is> I have been following the crafting interpreters online book, and some googling and trying things since I wanted it to have typing
20:12:53 <stuff_is> but almost before midway in the book I thought I want to do a low level language instead
20:14:06 <siberianascii> stuff_is: hello :D
20:14:08 <stuff_is> is it possible to learn stuff faster than reading books? I mean I am a slow reader
20:14:14 <siberianascii> my other half
20:14:22 <stuff_is> lol hi?
20:14:30 <stuff_is> you struggle with discipline too?
20:14:36 <stuff_is> or are you reading catch 22 too?
20:15:29 <siberianascii> i have discipline ... i still try to sell shit on 0day today
20:15:46 <siberianascii> i learn one instruction at a time but i keep learning ...
20:15:54 <stuff_is> sell what stuff? what is 0day
20:16:10 <siberianascii> forget about it... i never made it to the first page
20:16:21 <siberianascii> how things in your end ?
20:16:26 <uplime> stuff_is: nothing that is discussed here
20:16:37 <stuff_is> is it some exploits selling site or something?
20:16:45 <stuff_is> or 0days lol
20:17:15 <stuff_is> things are well not so good on my end, I mean I'm not doing anything productive and I am upset about it
20:17:27 <geist> well, these are all things you will have to work through
20:17:37 <geist> if you're unmotivated there's not a lot we can do from here
20:18:05 <siberianascii> and you are not helping geist ...
20:18:06 <stuff_is> can't you tell me what to do? I seem to be able to follow stuff easier when others tell me what to do rather than myself
20:18:16 <geist> siberianascii: hmm?
20:18:26 <siberianascii> stuff_is... what do you want to know ?
20:18:33 <uplime> stuff_is: tell you what to do to write an os? or to get motivated
20:18:39 <stuff_is> how can I become disciplined and not waste time?
20:19:02 <geist> that's sort of the eternal question that everyone deals with
20:19:06 <stuff_is> well I mean, I am not doing anything productive, I think I should do stuff that can help me learn and understand things better, and could also potentially help my career
20:19:13 <stuff_is> it's better to do stuff rather than waste time IMO
20:19:16 <siberianascii> stuff_is: jordan chase is your best bet
20:19:37 <stuff_is> you keep referencing movies and tv shows and stuff lol
20:19:45 <stuff_is> you're not helping either
20:19:58 <siberianascii> you can learn alot from movies and tv and stuff
20:20:17 <siberianascii> i'm still impacted by the movie swordfish...
20:20:26 <geist> so back to the original question: what do you want to learn?
20:20:30 <siberianascii> it got me motivated... which seems that it is something you are lacking
20:20:39 <stuff_is> and also waste stuff, I doubt watching TV will make me learn stuff related to os and stuff
20:20:42 <geist> and once you find it, question is *why* do you want to learn those topics
20:20:48 <siberianascii> i disagree...
20:20:51 <geist> the why is pretty important
20:20:56 <stuff_is> well each person works differently, I doubt a movie will give me an epiphany and change me, I doubt it works for anyone really
20:21:05 <siberianascii> i just downloaded the kali linux advanced pentesting course .... i can tell you right now some cool shit is going on there
20:21:23 <stuff_is> personally I don't believe in the why, because at least for me, I believe nothing makes sense and life is strange, if I chase why I'll never find
20:21:33 <geist> again, what do you want to learn?
20:21:37 <stuff_is> I personally believe in doing stuff the same way you eat to not die and breathe, and stuff, like you just have to cope with it and do it
20:21:40 <geist> the why tells you how you should try to get motivated
20:21:56 <geist> ie, if you're doing it because you want to then that's much easier to self motivate
20:22:21 <geist> if you're doing it because you think you need to learn X before some point in time Y then you'll have to be a lot more disciplined, but you can potentially build some sort of schedule
20:22:24 <stuff_is> well it is hard to explain in words, but I guess I want to learn how to create an OS? or rather I want to learn the skills required to create an OS
20:22:26 <geist> etc etc
20:22:34 <geist> okay, so then why?
20:23:19 <stuff_is> well I mean that is a hard person, I want on one part to learn stuff, lower level stuff, on another part I think it might be useful career wise, and I also half believe I have some good ideas, that maybe many people have about oses but nobody does anything cause of status quo and other stuff
20:23:28 <stuff_is> sorry not person, question
20:23:31 <siberianascii> stuff_is: well you missed the 80's because the times of making oses are over
20:23:46 <geist> well that's not true at all
20:23:51 <stuff_is> I believe most people never challenge anything, that's why we're stuck
20:23:53 <stuff_is> and that is true everywhere
20:23:55 <siberianascii> now it's the time to exploit
20:23:57 <uplime> what career are you in and/or what do you want to be in?
20:24:03 <siberianascii> opportunities are everywhere
20:24:08 <stuff_is> like look at the state of the world, if people had principles we'd live in a better world
20:24:27 <geist> well, i'd suggest not getting all of that too wrapped up into your motivations
20:24:32 <geist> that really complicates things
20:24:52 <stuff_is> but people overall are cowards, and nobody wants to sacrifice for the better good, then there is the bystander effect
20:24:52 <siberianascii> i disagree ... i learned alot from my obsessions
20:25:05 <siberianascii> i want to sacrifice for the better good ...
20:25:17 <siberianascii> i want the world to see how crap software is
20:25:32 <stuff_is> well I am not sufficiently obsessed or it doesn't last long enough
20:25:49 <siberianascii> try viagra...
20:25:51 <stuff_is> I believe more in habits and doing stuff without liking it, I mean maybe some people are passionate but it doesn't work for me
20:26:09 <siberianascii> and i believe in digital vandalism
20:26:12 <stuff_is> well it seems to me, that you always give such unhelpful stuff, at times I even wonder if youre not a bot
20:26:53 <stuff_is> if you want to sacrifice for the better good, start by sacrificing time and resources for me XD
20:27:16 <siberianascii> im trying but you keep on saying you dont like my methods
20:27:47 <stuff_is> well what methods? talking about TV show characters, or some substance from some movie, or viagra?
20:27:50 <siberianascii> think about computers as the pharmacy world ...
20:28:24 <stuff_is> I'd rather geist help me lol, he seems more conventional and rational, no offence
20:28:29 <siberianascii> there are no real pharmaceuticals that will make you feel better ... they want you ill and keep on spending money
20:28:35 <stuff_is> you seem like those alternative medicine, that doesn't work
20:28:51 <siberianascii> it's the same in computers ...
20:29:01 <siberianascii> there is no security
20:29:17 <siberianascii> because if they were ... you could throw a billion worth of industry to the trash
20:29:41 <siberianascii> but they want you to believe... that you are safe and your data is in "the cloud"
20:30:43 <siberianascii> keep on trying to see how shit work and make money
20:32:06 <siberianascii> stuff_is: we done ?
20:33:24 <siberianascii> i guess we are
20:33:42 <siberianascii> geist this is how you battle trolling without banning
20:33:58 <siberianascii> you just need to be fast on the keyboard and be determined
20:34:02 <geist> hmm?
20:34:12 <siberianascii> his gone ...
20:34:15 <siberianascii> thank me later
20:36:01 <stuff_is> his...
20:36:11 <siberianascii> gone
20:36:22 <stuff_is> you talk like a bot, so amusing
20:36:41 <siberianascii> im not a bot
20:36:55 <stuff_is> why do you talk this way? with so many mistakes and things that make no sense
20:37:14 <siberianascii> im immersing my self in the l33t literature
20:38:01 <kkd> hi
20:38:08 <siberianascii> hi
20:38:11 <kkd> I don't understand this sentence: "The CPU will eventually execute the unlock operation (which preceded the lock operation in the assembly code), which will unravel the potential deadlock, allowing the lock operation to succeed." from https://www.kernel.org/doc/Documentation/memory-barriers.txt
20:38:52 <kkd> is it something CPUs do in OoO that they proceed if a certain reordering makes it spin?
20:39:19 <kkd> i can see the case they make for sleeping locks, in that the barrier ensures everything before it is issued before itself, so the lock is eventually lifted
20:39:32 <siberianascii> kkd: i dont know bro i used to ask you questions in ##C...
20:39:44 <siberianascii> if you have a problem it must be no joke ...
20:39:50 <siberianascii> i must call the cavalry
20:40:01 <kkd> (this is about ACQUIRE floating up before a RELEASE operation)
20:40:01 <siberianascii> geist: doug16k zid
20:40:14 <kkd> i figure compilers wouldn't do it, only the CPU would, but I might be wrong
20:40:43 <siberianascii> kkd: i wish i could be of any help
20:41:56 <siberianascii> kkd: this is not ##C or ##asm .... people here are snobs
20:42:22 <siberianascii> you must raise your hand first before you get to speak here ...
20:43:15 <siberianascii> geist: he is a good guy from ##C help him already lol
20:43:50 <siberianascii> kkd: fucking brits bro ...
20:46:26 <siberianascii> msg stuff_is which means im fucked lol...
20:46:38 <siberianascii> /////////
20:52:42 <stuff_is> man, I had such high hopes for this channelle lol
21:02:11 <stuff_is> kkd: from my understanding when the CPU reorders stuff, it always ensures it behaves as if it wasn't reordered, so if it finds a dependency between instructions or stuff like that, it will resolve it or w/e
21:03:05 <kkd> that's what i was thinking, there's no other meaningful explanation (other than that the CPU decides to proceed forward with execution of stuff that was supposed to happen earlier)
21:03:10 <kkd> and that indeed lifts the lock
21:03:30 <kkd> though I am puzzled if this can occur for the same location release; acquire; pair, which CPU does this!?
21:03:43 <stuff_is> well reading wikipedia, it says that cpu can reorder stuff in multi-threaded mode that breaks stuff, hence why you need memory barriers
21:04:24 <stuff_is> it says it can reorder stuff all the time, but it only guarantees it works "correctly" in single thread code, in multi-thread you need to manually tell the cpu don't reorder from what I understand
21:04:28 * stuff_is shrugs
21:04:37 <doug16k> kkd, store release prevents the store from being reordered ahead of prior stores. load acquire prevents subsequent loads from being reordered ahead of the load acquire
21:05:22 <doug16k> they are like check valves
21:05:48 <kkd> yes, that part is clear, consulting the c11 memory model it says that for the same location, release cannot jump ahead of an acquire that pairs with it, but the kernel doesn't really follow the c11 model
21:06:37 <kkd> they're one way, so they say it can pass the release barrier (float up) since nothing prevents stuff below the release from floating up
21:07:17 <kkd> well, ok, maybe the kernel memory model allows for that, and some arch does it, but i didn't get that sentence at first
21:07:45 <kkd> but i guess it's that the release that was supposed to be executed before will be eventually executed by the cpu and hence break the deadlock
21:08:04 <kkd> in the spinning case, the sleeping case is fine where the barrier ensures both are executed before itself
21:09:09 <kkd> but all i can say is that this reordering for the same location is pessimistic, since more cycles were spent acquiring a potentially free lock
21:09:44 <doug16k> kkd, think of it as not letting it cheat and run the ops early just because they will hit the cache
21:10:00 <doug16k> don't think of it as a big delay
21:10:44 <kkd> perhaps i need to read a book on cpu architecture :-) maybe it would have been clearer if i would have known how OoO works in general
21:11:17 <doug16k> watch professor Mutlu's computer architecture channel on youtube
21:11:48 <stuff_is> can you teach me stuff too please doug16k ?
21:12:02 <doug16k> https://www.youtube.com/user/cmu18447
21:12:04 <bslsk05> www.youtube.com: Carnegie Mellon Computer Architecture - YouTube
21:12:24 <kkd> thanks
21:12:29 <doug16k> goes in depth on OoO and cache coherency
21:13:02 <kkd> btw, what are the ordering constraints if these were pure fences (and not tied to a store/load), then the same location clause doesn't hold
21:13:31 <kkd> e.g. atomic_thread_fence(release); atomic_thread_fence(acquire); can this pair undergo reordering?
21:14:15 <doug16k> pure fence blocks both ways. later stores won't reorder ahead of the barrier, subsequent loads won't reorder before the barrier
21:14:31 <doug16k> full barrier blocks both ways*
21:14:37 <kkd> ah, so they're two way
21:15:03 <kkd> and the mo_flag just denotes the type of fence (i see power does different things for release/acquire)
21:15:14 <kkd> thanks, that makes it a lot clearer
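A minimal sketch of the standalone fences kkd asks about, using C11 `atomic_thread_fence` with relaxed atomics on the flag. `payload`, `ready`, `publish`, and `try_consume` are hypothetical names for illustration; the intended use is one thread calling `publish` and another polling `try_consume`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;         /* plain data, written before the flag */
static atomic_bool ready;   /* flag, only touched with relaxed atomics */

void publish(int value)
{
    payload = value;
    /* Release fence: the payload write above cannot become visible after
     * the flag store below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

bool try_consume(int *out)
{
    assert(out);
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    /* Acquire fence: the payload read below cannot be satisfied before
     * the flag load above. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

The fences are not tied to one location, so they order all earlier accesses against all later atomic stores (release) or all earlier atomic loads against all later accesses (acquire), which is why a fence is generally a stronger constraint than the same ordering attached to a single store or load.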
21:15:37 <kkd> arm seems to just emit dmb
21:16:47 <geist> more modern arm has more fine grained barriers, forward and backward
21:16:52 <geist> plus load/stores that can have a barrier built in
21:18:20 <kkd> are forward/backward something new or same as release/acquire?
21:27:01 <doug16k> kkd, do you know the classic example of why you would need a store release?
21:27:39 <doug16k> running a constructor and storing a pointer to it somewhere that another thread can see the pointer
21:28:04 <doug16k> if the pointer store isn't store release, then the cpu is allowed to update the pointer before it has even done the stores in the constructor
21:28:22 <doug16k> (from the perspective of other agents)
21:28:57 <doug16k> store release tells it "dude, make sure the prior stores are done before you let this one go through"
21:29:13 <doug16k> done meaning, visible from other agents
21:30:25 <doug16k> make it so the stores in the constructor happen before the store to the pointer
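doug16k's constructor example can be sketched with C11 atomics as follows. `struct widget`, `shared`, `publish_widget`, and `peek_widget` are hypothetical names for illustration, and the caller supplies the storage.

```c
#include <assert.h>
#include <stdatomic.h>

struct widget { int a, b; };

/* Pointer that other threads poll; starts out null. */
static _Atomic(struct widget *) shared;

void publish_widget(struct widget *w, int a, int b)
{
    assert(w);
    w->a = a;   /* the "constructor" stores */
    w->b = b;
    /* Store release: the stores above are visible to any thread that
     * observes the new pointer value through a load acquire. */
    atomic_store_explicit(&shared, w, memory_order_release);
}

struct widget *peek_widget(void)
{
    /* Load acquire: reads through the returned pointer cannot be
     * satisfied with data from before the pointer was published. */
    return atomic_load_explicit(&shared, memory_order_acquire);
}
```

With a relaxed store in `publish_widget`, a reader on a weakly ordered machine could see the non-null pointer and still read stale values of `a` and `b`.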
21:30:26 <geist> also note that the reason a lot of this doesn't show up in user space code is that mutexes and other locks have built in barriers
21:30:51 <geist> so it's hard to actually construct an object and get it handed to another thread (on another cpu) without probably having gone through some sort of lock
21:30:55 * Bitweasil smashes the barrier to prove... uh... wait, is this Seattle?
21:31:05 <geist> or a new thread where a barrier was implied
21:31:55 <Bitweasil> Memory models and such are definitely not something I've thought about much.
21:31:59 <Bitweasil> x86 kinda lets you ignore most of them.
21:32:05 <doug16k> yeah, my example applies when you are cheating and doing a trick where you peek at the pointer outside a lock to see if it needs constructing, and only acquire it when it needs constructing
21:32:26 <geist> right
21:32:45 <geist> 'publishing' data via waiting for a global variable to change is where you get in trouble
21:32:54 <geist> but thats not safe for precisely these sort of reasons
22:06:32 <kkd> doug16k: yeah, this is probably why C++ shared_ptr does relaxed rmw increments but decrement is acq_rel to get the correct view of object when destructing
22:07:58 <kkd> though for trivially destructible it could avoid paying that cost since in that case all that is needed is freeing the storage
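The reference counting pattern kkd describes, sketched in C11. This is an illustration of the ordering choice only, not code from any particular `shared_ptr` implementation; `struct refcounted`, `ref_get`, and `ref_put` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct refcounted {
    atomic_uint refs;
};

void ref_get(struct refcounted *r)
{
    assert(r);
    /* Relaxed is enough: the caller already holds a reference, so the
     * object cannot be destroyed while this increment runs. */
    atomic_fetch_add_explicit(&r->refs, 1, memory_order_relaxed);
}

/* Returns true when the caller dropped the last reference and is now
 * responsible for destroying the object. */
bool ref_put(struct refcounted *r)
{
    assert(r);
    /* acq_rel: the release half publishes this thread's writes to the
     * object; the acquire half lets the final owner see every other
     * thread's writes before it runs the destructor. */
    return atomic_fetch_sub_explicit(&r->refs, 1, memory_order_acq_rel) == 1;
}
```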
22:09:06 <doug16k> on x86, on writeback memory type (normal memory), stores appear to occur in program order from the perspective of other agents
22:09:15 <doug16k> it's like every store is a store release
22:09:41 <doug16k> deeply queued though, with excellent store to load forwarding
22:10:08 <doug16k> a load from a queued store completes immediately
22:10:26 <doug16k> before it has even done the store
22:11:28 <doug16k> on modern ones, if all the bytes of the load are contained in the store (arbitrarily positioned/sized) then it can forward it
22:12:16 <doug16k> on early ones the store and load address had to match to get forwarding
22:13:12 <kkd> you mean there's no stall on modern intel if load operand size is different?
22:13:37 <doug16k> I'm sure zen2 is that way. pretty sure intel is
22:14:02 <doug16k> so if you did a 64 bit store then two 32 bit loads from it, they'd complete immediately from forwarding the store data
22:14:32 <doug16k> they had to do that. that is the way it passed parameters on i386
22:14:40 <kkd> yeah, and this happens quite often, due to compilers liking to tear loads
22:14:56 <doug16k> push push push call framesetup load-from-push load-from-push load-from-push ...
22:15:39 <doug16k> the x86 calling convention made store forwarding extremely important
22:15:51 <doug16k> so they made forwarding extremely good
22:17:08 <doug16k> push will be 32 bit practically every time. it has to be able to forward the low 8 bits of a 32 bit push if the function had a uint8_t parameter
22:19:11 <doug16k> the code would likely do a byte load from n(%ebp)
22:20:56 <kkd> interesting stuff. i also remember reading about some experiment where they cut the stall latency of multiple bad SLF cases by interleaving stores and loads instead of a simple unroll
22:20:58 <doug16k> that's the easy case. lately they can handle the hard case of the load bytes lying anywhere entirely within a store too
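The two access patterns discussed above (a 64 bit store read back as two 32 bit loads, and a 32 bit push read back as a byte) look roughly like this in C++. It is only a sketch of the memory access shape: `Halves`, `load_halves` and `load_first_byte` are hypothetical names, and an optimizing compiler may keep everything in registers, so this shows what the hardware would be asked to forward rather than measuring anything.

```cpp
#include <cstdint>
#include <cstring>

struct Halves {
    std::uint32_t first;   // bytes 0..3 of the stored value
    std::uint32_t second;  // bytes 4..7 of the stored value
};

// One 64 bit store followed by two 32 bit loads from the same bytes.
Halves load_halves(std::uint64_t value) {
    unsigned char slot[sizeof value];
    std::memcpy(slot, &value, sizeof value);  // the wide store
    Halves h;
    std::memcpy(&h.first, slot, 4);           // narrow load at offset 0
    std::memcpy(&h.second, slot + 4, 4);      // narrow load at offset 4
    return h;
}

// A 32 bit "push" followed by a byte load from the start of the slot,
// as a callee would do for a uint8_t parameter. On little-endian x86
// this byte is the low 8 bits of the pushed value.
std::uint8_t load_first_byte(std::uint32_t pushed) {
    unsigned char slot[sizeof pushed];
    std::memcpy(slot, &pushed, sizeof pushed);  // the push
    return slot[0];                             // byte load from it
}
```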
22:22:36 <doug16k> yeah, both amd and intel have one store port, and two load ports
22:22:48 <doug16k> you shouldn't burst a bunch of stores back to back if you can help it
22:23:01 <doug16k> put stuff between them
22:23:41 <doug16k> the stuff between the stores will be effectively done for free
22:24:55 <doug16k> could be as many as 3 free simple instructions between stores
22:25:02 <kkd> https://easyperf.net/blog/2018/03/09/Store-forwarding
22:25:04 <bslsk05> easyperf.net: Store forwarding by example. | Easyperf
22:25:16 <kkd> "One more interesting experiment"
22:25:49 <doug16k> that 1st quote is totally misleading
22:26:00 <doug16k> "the processor can ...".... oh? which processor?
22:26:06 <doug16k> presents it as if it is a universal truth
22:28:22 <doug16k> zen2 optimization guide: "The LS unit supports store-to-load forwarding (STLF) when there is an older store that contains all of the load's bytes, and the store's data has been produced and is available in the store queue. The load does not require any particular alignment relative to the store or to the 64B load alignment boundary as long as it is fully contained within the store"
22:29:28 <doug16k> the end part means, "and it's so badass it can even forward a misaligned store that crosses a cache line boundary"
22:31:32 <kkd> lol
22:37:57 <doug16k> intel tries to cover 20 years of processors in one annoying pdf, so it says: "The first requirement pertains to the size and alignment of the store-forwarding data. This restriction is likely to have high impact on overall application performance. Typically, a performance penalty due to violating this restriction can be prevented. The store-to-load forwarding restrictions vary from one micro-architecture to another."
22:38:17 <doug16k> translation: old ones suck. new ones are awesome
22:38:28 <doug16k> thanks intel for letting me know that
22:42:23 <doug16k> intel's isn't as easy to describe as zen2, has a bunch more restrictions
22:44:20 <doug16k> "The store must be the last store to that address prior to the load" "The load cannot cross a cache line boundary" "The load cannot cross an 8-Byte boundary. 16-Byte loads are an exception to this rule."
22:45:11 <doug16k> "The load must be aligned to the start of the store address, except for the following exceptions" ... 64 bit store may be loaded as 2 32 bit halves, 128 bit store may be forwarded as 4 32 bit quarters or two 64 bit halves
22:48:06 <doug16k> then, annoyingly, later it adds more crap that basically relaxes it to be closer to what zen2 said
22:48:33 <doug16k> vaguely
22:49:15 <doug16k> shows a picture that indicates it can arbitrarily forward, but neglects to mention whether it still fails at cache line boundaries
22:50:05 <doug16k> sorry, not arbitrarily. has to be aligned properly with the bigger store.
22:50:20 <doug16k> 16 bit load will only forward if the offset from the 64 bit store is a multiple of 2
22:51:33 <doug16k> 32 bit multiple of 4, 64 bit multiple of 8, etc
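The two rules quoted above can be written down as a toy predicate. This is a simplified model of the rules as paraphrased in this conversation only: it ignores cache line boundaries, the "last store to that address" condition, and everything else the vendor guides list, and it makes no claim about any specific CPU. `Access`, `contained` and `aligned_within` are hypothetical names.

```cpp
#include <cstdint>

// A memory access: starting address and size in bytes.
struct Access {
    std::uint64_t addr;
    std::uint32_t size;
};

// "Fully contained" rule, as in the zen2 quote: every byte of the load
// lies inside the older store.
constexpr bool contained(Access store, Access load) {
    return load.addr >= store.addr &&
           load.addr + load.size <= store.addr + store.size;
}

// Stricter rule as paraphrased for intel: contained, and the load's
// offset from the start of the store is a multiple of the load size.
constexpr bool aligned_within(Access store, Access load) {
    return load.size != 0 && contained(store, load) &&
           (load.addr - store.addr) % load.size == 0;
}

// 16 bit load at offset 3 of a 64 bit store: contained, but not aligned.
static_assert(contained({0x1000, 8}, {0x1003, 2}), "contained");
static_assert(!aligned_within({0x1000, 8}, {0x1003, 2}), "odd offset");
// 32 bit load at offset 4 of a 64 bit store satisfies both rules.
static_assert(aligned_within({0x1000, 8}, {0x1004, 4}), "upper half");
```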
23:12:36 <doug16k> ya if you keep reading, it eventually says it's as good as zen2 as of nehalem
23:12:55 <doug16k> still unclear about cache line boundaries
23:27:28 <geist> pretty neat stuff though