#osdev2 = #osdev @ Libera from 23may2021 to present

#osdev @ OPN/FreeNode from 3apr2001 to 23may2021

http://bespin.org/~qz/search/?view=1&c=osdev&y=19&m=2&d=17

Sunday, 17 February 2019

12:06:11 <jmp9> ok i have problems with AHCI
12:06:18 <jmp9> it doesn't work and i don't know why
12:06:29 <jmp9> PxCMD.ST is 1
12:06:52 <jmp9> but PxTFD doesn't have BSY and DRQ bits set
12:09:25 <Mutabah> jmp9: When are you checking TFD? Just in a loop? or in an interrupt handler?
12:09:35 <jmp9> loop
12:10:37 <Mutabah> Do you set FRE when issuing your read?
12:10:49 <Mutabah> (Actually, on init)
12:12:08 <jmp9> yes
12:12:27 <jmp9> s_ahci.sata->ci = (1<<0);
12:12:32 <jmp9> s_ahci.sata->cmd |= SCMD_FRE|SCMD_ST|(1<<2);
12:13:44 <Mutabah> And to start the slot you're writing both SACT and CI?
12:14:55 <jmp9> SACT?!
12:15:04 <jmp9> i writing to CI
12:16:10 <Mutabah> Try both? That's what my code does
12:19:33 <jmp9> it didn't worked
12:20:40 <jmp9> https://pastebin.com/PY10r9Bq
12:27:24 <jmp9> okay
12:27:36 <jmp9> where in specification described how to send commands?
12:29:32 <Mutabah> The AHCI specs?
12:30:14 <klys> jmp9, is s_ahci.sata a mmio or a port io device?
12:30:21 <jmp9> mmio
12:30:26 <jmp9> port
12:30:32 <jmp9> hba_port_t*
12:30:39 <jmp9> its a structure
12:30:55 <klys> and it exists within memory space
12:31:28 <jmp9> yes
12:31:38 <klys> kk
12:31:49 <jmp9> do i need spin up drive before sending commands?
12:32:05 <jmp9> when i testing on laptop my drive silent
12:33:48 <klys> you mean like this? 26. s_ahci.sata->is = 0xFFFFFFFF;
12:34:32 <jmp9> i added in hope that it will help
12:34:36 <jmp9> i don't know what it does
12:35:35 <klys> so in addition to being a mmio device, it actually points to other mmio devices in linear memory?
12:36:38 <jmp9> sata points to sata
12:36:47 <klys> kk
12:36:49 <jmp9> identity mapping
12:37:06 <klys> looks like I read that wrong anyway
12:37:30 <jmp9> page_map(u32 __virt virtual,u32 __phys phys,u32 size,u32 flags);
12:40:43 <klys> so this is all relative to hba_phys, and there's a physical address mapped into linear memory by that function
12:42:04 <jmp9> hba_phys just store physical address
12:42:10 <jmp9> which later going to be identity mapped
12:42:17 <klys> yeah
12:43:01 <mrvn> My maze algorithm seems to work as intended now: http://pasteall.org/pic/show.php?id=771cee12be311d407f90b77e0f1c100b
12:45:17 <klys> jmp9, do those maps for ->clb and ->fb also include the mmaps for ->is ->ci ->sact ->tfd ->serr that you use later?
12:45:28 <jmp9> i did
12:45:35 <jmp9> page_map(s_ahci.sata->clb,s_ahci.sata->clb, AHCI_CLB_SIZE,PAGE_DIR_READWRITE);
12:45:54 <jmp9> mmaps for is ci sact tfd?
12:46:13 <jmp9> ports is a part of HBA memory region
12:46:18 <jmp9> which a fully mapped
12:46:23 <jmp9> page_map(s_ahci.hba_phys,s_ahci.hba_phys, 16384,PAGE_TABLE_READWRITE);
12:46:33 <jmp9> 16kb
12:47:28 <klys> I'm wondering if you need to verify that ->is ->ci ->sact ->tfd ->serr are all within that range
12:48:33 <jmp9> hm
12:50:21 <jmp9> 0xFEBF1000 0xFEBF1110
12:50:26 <jmp9> yes, they're mapped
12:50:32 <klys> okok
12:51:31 <jmp9> ain't it supposed throw page fault if there is no mapping?
12:51:42 <klys> yeah likely true
12:54:35 <klys> on line 41, are you actually writing to a register?
12:55:55 <klys> that would be kmemzero(fis,sizeof(fis_reg_h2d_t));
12:58:34 <jmp9> fis is a pointer to cfis buffer
12:58:37 <jmp9> and i'm zeroing it
01:00:05 <klys> is that part of the above mapping too? it would have to be a physical address, right?
01:00:56 <jmp9> >page fault
01:01:21 <jmp9> if there something wrong with mappings, i'll get page fault
01:01:28 <klys> so you mean, if they aren't identity mapped you'd get a page fault
01:01:37 <jmp9> yes
01:01:44 <jmp9> i didn't get any page fault
01:01:52 <jmp9> and they are identity mapped
01:08:49 <jmp9> huh
01:09:09 <jmp9> ci is always zero
01:10:16 <klys> jmp9, I've been reading at https://wiki.osdev.org/AHCI at void start_cmd(HBA_PORT *port)
01:11:49 <jmp9> okay
01:11:51 <jmp9> i tested now
01:12:09 <jmp9> PxCMD.ST, PxCMD.FRE, PxCMD.CR, PxCMD.FRE is 1
01:14:34 <GwenNelson> so, anyone got any ideas on sensible ways to do linux-style modules that can be either statically linked or dynamically loaded?
01:14:41 <GwenNelson> when statically loaded how do you find them?
01:14:57 <GwenNelson> do you just need to iterate through the kernel symbol table or what?
01:16:03 <jmp9> okay interesting thing
01:16:15 <jmp9> hdd led on my laptop is always on
01:18:35 <jmp9> i did everything as everyone does
01:18:40 <jmp9> as it in example
01:18:44 <jmp9> as it in other drivers
01:18:47 <jmp9> and it doesn't work
01:18:52 <klys> jmp9, #define HBA_PxCMD_FRE 0x0010 ?
01:19:27 <klys> now you said it was 1
01:19:45 <klys> did you just mean bit 0x0010 was set?
01:19:54 <jmp9> #define SCMD_FRE (1<<4)
01:20:01 <klys> ok
01:21:33 <jmp9> while(s_ahci.sata->ci & (1<<0))
01:21:37 <jmp9> this loop is never executed
01:21:43 <jmp9> that means that CI is always zero
01:22:00 <jmp9> also as this
01:22:01 <jmp9> while(s_ahci.sata->tfd & (ATA_DEV_BUSY|ATA_DEV_DRQ))
01:22:11 <jmp9> BUSY&DRQ never 1
01:23:26 <klys> jmp9, how about that example: while (port->cmd & HBA_PxCMD_CR); // Wait until CR (bit15) is cleared
01:24:05 <jmp9> if(port->cmd & (SCMD_ST|SCMD_FR|SCMD_CR|SCMD_FRE)) //Isn't IDLE { port->cmd &= ~SCMD_ST; while(port->cmd & SCMD_CR)
01:24:15 <jmp9> line 155
01:24:28 <Mutabah> GwenNelson: I use a special linker section that's populated with module descriptions
01:24:32 <Mutabah> GwenNelson: Then iterate that at boo
01:24:34 <Mutabah> *at boot
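
A minimal sketch of the linker-section approach Mutabah describes, assuming a GCC or Clang toolchain and a GNU-style ELF linker. The `struct module_desc` type, the `DEFINE_MODULE` macro and the section name `modules` are hypothetical names chosen for illustration, not taken from anyone's kernel. It relies on the linker providing `__start_<section>` and `__stop_<section>` symbols for sections whose names are valid C identifiers.

```c
#include <stddef.h>

/* Hypothetical descriptor; the field names are illustrative only. */
struct module_desc {
    const char *name;
    int (*init)(void);
};

/* Emits a descriptor and places a pointer to it in the "modules" section.
 * Storing pointers keeps the section a plain, tightly packed array. */
#define DEFINE_MODULE(ident, init_fn)                                   \
    static const struct module_desc ident##_desc = { #ident, init_fn }; \
    static const struct module_desc *const ident##_ptr                  \
        __attribute__((section("modules"), used)) = &ident##_desc

/* Provided by the linker (assumption: GNU ld or lld, ELF output). */
extern const struct module_desc *const __start_modules[];
extern const struct module_desc *const __stop_modules[];

/* Two example modules that would normally live in separate source files. */
static int disk_init(void) { return 0; }
static int net_init(void)  { return 0; }

DEFINE_MODULE(disk, disk_init);
DEFINE_MODULE(net, net_init);

/* Calls init on every statically linked module; returns how many succeeded. */
static size_t modules_init_all(void)
{
    size_t ok = 0;
    for (const struct module_desc *const *m = __start_modules;
         m != __stop_modules; ++m) {
        if ((*m)->init() == 0)
            ++ok;
    }
    return ok;
}
```

A dynamically loaded module could expose the same descriptor, so the loader and the boot-time walk can share one init path; that part is not shown here.
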
01:27:50 <klys> jmp9, this part looks puzzling, what does it even do for you?: port->cmd &= ~SCMD_ST;
01:29:34 <jmp9> hba_port_t* port = &HBA->ports[i];
01:30:10 <Mutabah> jmp9: Just checking, are these register fields defined as volatile?
01:30:24 <jmp9> volatile?
01:30:47 <klys> I'm wondering what would it be if you comment that line out
01:30:49 <Mutabah> I'd assume not then
01:31:07 <Mutabah> Have you triple-checked the assembly to see that your register writes are happening?
01:31:18 <Mutabah> and that reads of status values aren't being ignored/deferred
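
A small sketch of what Mutabah's `volatile` question is about. Without `volatile`, the compiler is free to read a register once and reuse the value inside a polling loop, or to drop a store it considers dead. The helper names `mmio_read32`, `mmio_write32` and `mmio_wait_clear` are hypothetical; an alternative with the same effect is to declare the register struct itself `volatile`.

```c
#include <stdint.h>

/* Every call performs a real 32-bit load; the compiler may not cache it. */
static inline uint32_t mmio_read32(const volatile void *addr)
{
    return *(const volatile uint32_t *)addr;
}

/* Every call performs a real 32-bit store; the compiler may not elide it. */
static inline void mmio_write32(volatile void *addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

/* Polls until all bits in mask read as 0, or until max_spins reads have
 * been made. Returns 0 on success and -1 on timeout. */
static int mmio_wait_clear(const volatile void *addr, uint32_t mask,
                           unsigned long max_spins)
{
    while (max_spins--) {
        if ((mmio_read32(addr) & mask) == 0)
            return 0;
    }
    return -1;
}
```

Note that `volatile` only constrains the compiler. It does not make a mapping uncached and does not order accesses against other memory, so the page attributes of the HBA mapping still matter on real hardware.
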
01:37:32 <klys> to initialize the port, or start command engine, there are a couple of things you added to the routine. that includes port->cmd &= ~SCMD_ST;
01:38:07 <klys> which isn't in the original example I'm reading on the wiki page
01:39:26 <jmp9> mov eax, ds:0C051300Ch .text:C0103202 mov eax, [eax] .text:C0103204 and eax, 8000000h .text:C0103209 test eax, eax
01:39:29 <jmp9> this is reading
01:39:39 <jmp9> " to initialize the port, or start command engine, there are a couple of things you added to the routine. that includes port->cmd &= ~SCMD_ST;"
01:39:40 <jmp9> yes
01:39:46 <jmp9> i read init routine from spec
01:40:10 <jmp9> they told me that is when device is not in IDLE state, so it must be switched to IDLE state
01:40:18 <jmp9> anyway i didn't hear any sounds of hard drive
01:40:21 <jmp9> is it powered on?
01:40:42 <klys> that however is in the example code for stop command routine.
01:41:06 <klys> s/routine/engine/
01:41:18 <jmp9> well
01:41:23 <jmp9> things changed
01:41:31 <jmp9> CI has changed state
01:42:51 <jmp9> "For each implemented port, clear the PxSERR register, by writing ‘1s’ to each implemented bit location. "
01:42:54 <jmp9> what is that mean
01:43:01 <jmp9> port->serr = 0xFFFFFFFF ?
01:43:29 <Mutabah> yep
01:44:04 <Mutabah> R/WC is a pretty common register type
01:44:13 <Mutabah> Reads the state of a flag, and you write a 1 to clear that flag
01:44:30 <jmp9> WHAT
01:44:40 <jmp9> UHHHH
01:45:12 <jmp9> looks like my clearing code is totally wrong
01:46:00 <eryjus> jmp9 -- yes, the concept of write-to-set and write-to-clear registers is a detail not to miss
01:46:16 <eryjus> been there; done that
01:46:19 <jmp9> how ironically that i missed that thing
01:52:41 <jmp9> RW1
01:52:44 <jmp9> what it does menas
01:52:45 <jmp9> means
01:54:52 <klys> port -> serr = 0x07ef0f03; // try that
01:55:22 <klys> from page 39 and 40 of: https://www.intel.com/content/www/us/en/io/serial-ata/serial-ata-ahci-spec-rev1-3-1.html
01:56:27 <Mutabah> jmp9: I think RW1 is another shorthand for "read, write 1 to clear"
01:56:35 <Mutabah> jmp9: There should be a summary in the same document
01:56:51 <jmp9> RW1 Read/Write ‘1’ to set
01:57:04 <jmp9> so actually you can't set bit to 0
01:57:40 <Mutabah> Ah, so it's the opposite
01:58:21 <Mutabah> as said, it's a pretty common pattern - Packing a bunch of booleans into one address, and allowing individual single-operation control
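
The register access types discussed above can be modelled in plain software to see why ordinary clearing code goes wrong. This is only a model of the semantics: each function returns what the register would hold after a write, given its current contents. The function names are hypothetical.

```c
#include <stdint.h>

/* RWC (write 1 to clear): each written 1 clears that bit, 0s are ignored. */
static uint32_t rwc_after_write(uint32_t current, uint32_t written)
{
    return current & ~written;
}

/* RW1 (write 1 to set): each written 1 sets that bit, 0s are ignored. */
static uint32_t rw1_after_write(uint32_t current, uint32_t written)
{
    return current | written;
}

/* Plain RW, for comparison: the register simply takes the written value. */
static uint32_t rw_after_write(uint32_t current, uint32_t written)
{
    (void)current;
    return written;
}
```

Two consequences follow from the model. Writing 0 to an RWC register changes nothing, so `port->serr = 0` clears no errors. A read-modify-write such as `reg &= ~BIT` on an RWC register writes back every other pending bit as 1 and BIT as 0, so it clears everything except the bit that was meant to be cleared.
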
02:01:29 <jmp9> okay so PxCI reg changes it state
03:01:48 <jmp9> any suggestions how to fix this?
03:02:02 <jmp9> ahci doesn't want to run my commands
03:02:24 <geist> is this against an emulator?
03:02:34 <jmp9> on against everything works
03:02:39 <jmp9> i can even read sectors in LBA mode
03:02:43 <geist> is this against an emulator?
03:03:11 <geist> if so you can debug it by instrumenting the emulator and/or reading its code
03:03:20 <geist> it's a highly effective strategy that i've employed many times
03:03:25 <jmp9> that's not so helpful
03:03:30 <jmp9> i must write for real hardware
03:03:43 <geist> since most of the time it's your fault, and it's a logic error, then sometimes debugging it against an emulator gets you most of the way there
03:03:49 <geist> so you cannot use an emulator or you dont want to?
03:04:12 <jmp9> i can't use emulator because it must work perfectly on real laptop
03:04:12 <geist> or it only reproduces against real hardware?
03:04:23 <jmp9> it only doesn't works on real laptop
03:04:36 <geist> so in other words it only fails on real hardware, and it works on an emulator?
03:04:41 <jmp9> yes
03:04:45 <jmp9> that's it
03:04:50 <geist> thank you
03:04:57 <geist> then i dont have any more suggestions
03:05:38 <jmp9> i have little feeling that it executed command
03:05:39 <jmp9> BUT
03:05:43 <jmp9> not FIS
03:05:53 <jmp9> because PxCI reg updated
03:22:00 <jmp9> okay
03:22:02 <jmp9> i'm high now
03:22:06 <jmp9> and i hope it will help me
03:25:06 <geist> heh, kinda doubt it
03:28:01 <jmp9> can anyone show their AHCI driver code?
03:28:10 <jmp9> i want figure out what the hell i did wrong
03:28:17 <jmp9> because i followed spec and osdev tutorials
03:28:26 <jmp9> i fixed RWC registers
03:28:32 <jmp9> i've
03:32:07 <geist> https://fuchsia.googlesource.com/fuchsia/+/master/zircon/system/dev/block/ahci/ahci.c and https://fuchsia.googlesource.com/fuchsia/+/master/zircon/system/dev/block/ahci/sata.c
03:33:05 <jmp9> thanks very much
03:38:46 <jmp9> okay
03:38:47 <jmp9> question
03:38:52 <jmp9> what is purpose of FIS_TYPE_REG_D2H?
03:43:55 <rajasrijan> jmp9 , do you use qemu? you can enable trace for HDD controller. It prints out highly detailed logs which I've found very useful.
03:44:04 <jmp9> how
03:44:20 * rajasrijan googling
03:44:20 <jmp9> it will be helpful
03:45:16 <rajasrijan> https://wiki.qemu.org/Features/Tracing
03:45:25 <rajasrijan> try this as start
03:45:54 <rajasrijan> https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/devel/tracing.txt;hb=HEAD
03:46:27 <rajasrijan> ^this is more detailed
03:53:13 <jmp9> okay simple question
03:53:20 <jmp9> how do i should wait for command completion?
03:53:33 <jmp9> wait when PxCMD.CR or .FR register clears?
03:53:46 <jmp9> or when my command disappears in PxCI?
03:54:23 <doug16k> GwenNelson, I have kernel modules
03:54:58 <doug16k> GwenNelson, the module loader -> https://github.com/doug65536/dgos/blob/master/kernel/arch/x86_64/elf64.cc
03:55:33 <doug16k> the kernel is linked with --export-dynamic -fvisibility=hidden then, in code an EXPORT macro expands to __attribute__((__visibility__("default")))
03:55:44 <doug16k> then modules are linked -shared
03:56:03 <doug16k> then you call modload_load with the module and it loads and dynamically links to the kernel
03:58:46 <jmp9> okay i figured out problem
03:58:48 <jmp9> my laptop
03:58:51 <jmp9> doesn't execute FIS commands
03:59:11 <doug16k> FIS commands?
03:59:34 <jmp9> fis->fis_type = FIS_TYPE_REG_H2D; fis->c = 1; fis->command = 0x25;
03:59:35 <jmp9> like this
03:59:49 <jmp9> that's might be something wrong with PRDT
04:00:37 <doug16k> without that the controller is completely non-functional
04:00:43 <doug16k> you are mistaken
04:01:37 <doug16k> is that really what you are doing though?
04:01:52 <doug16k> is the controller mind reading what LBA and count you want to use?
04:02:08 <doug16k> what's c
04:03:23 <doug16k> what makes you sure that the other fields are zeros?
04:04:21 <doug16k> jmp9, why do you ask how to wait for command completion? it's spelled out in extreme detail in the spec
04:04:21 <geist> and i guess is the struct appropriately packed, etc
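
To make doug16k's and geist's points concrete, here is a hedged sketch of filling in a complete host-to-device register FIS for READ DMA EXT (0x25). The struct name, field names and the helper `fis_build_read_dma_ext` are hypothetical, and the C bit is kept in a plain flags byte instead of the bitfield (`fis->c`) used in jmp9's snippet. The `packed` attribute assumes GCC or Clang.

```c
#include <stdint.h>
#include <string.h>

#define FIS_TYPE_REG_H2D      0x27
#define ATA_CMD_READ_DMA_EXT  0x25
#define FIS_H2D_FLAG_COMMAND  0x80  /* byte 1, bit 7: 1 = command register update */
#define ATA_DEVICE_LBA        0x40  /* device register, bit 6: LBA addressing */

typedef struct __attribute__((packed)) {
    uint8_t fis_type;
    uint8_t flags;        /* port multiplier port in bits 3:0, C in bit 7 */
    uint8_t command;
    uint8_t feature_lo;
    uint8_t lba0, lba1, lba2;
    uint8_t device;
    uint8_t lba3, lba4, lba5;
    uint8_t feature_hi;
    uint8_t count_lo, count_hi;
    uint8_t icc;
    uint8_t control;
    uint8_t reserved[4];
} fis_reg_h2d;

/* Catches a mis-packed struct at compile time. */
_Static_assert(sizeof(fis_reg_h2d) == 20, "H2D register FIS is 5 dwords");

/* Zeroes the whole FIS, then sets every field the command needs:
 * type, C bit, opcode, 48-bit LBA, LBA mode and sector count. */
static void fis_build_read_dma_ext(fis_reg_h2d *fis, uint64_t lba,
                                   uint16_t sectors)
{
    memset(fis, 0, sizeof *fis);
    fis->fis_type = FIS_TYPE_REG_H2D;
    fis->flags    = FIS_H2D_FLAG_COMMAND;
    fis->command  = ATA_CMD_READ_DMA_EXT;
    fis->lba0     = (uint8_t)(lba >> 0);
    fis->lba1     = (uint8_t)(lba >> 8);
    fis->lba2     = (uint8_t)(lba >> 16);
    fis->lba3     = (uint8_t)(lba >> 24);
    fis->lba4     = (uint8_t)(lba >> 32);
    fis->lba5     = (uint8_t)(lba >> 40);
    fis->device   = ATA_DEVICE_LBA;
    fis->count_lo = (uint8_t)(sectors & 0xFF);
    fis->count_hi = (uint8_t)(sectors >> 8);
}
```

The command header and PRDT still have to describe the FIS length and the destination buffer; this sketch covers only the FIS bytes.
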
04:06:34 <doug16k> it depends on whether you are using NCQ or not
04:06:57 <doug16k> the mechanism for handling completion is completely different for non-NCQ and NCQ
04:07:57 <doug16k> 0x25 is non-NCQ read DMA EXT
04:10:05 <doug16k> jmp9, are you using qemu?
04:10:32 <geist> i think the problem statement originally is they're using a particular laptop that has trouble
04:10:40 <doug16k> add -trace ahci*
04:10:40 <geist> the implication is that the driver works against other hardware and/or qemu
04:10:51 <doug16k> the trace might tell you that you are doing something bad
04:11:04 <doug16k> also, add -d guest_errors
04:11:31 <geist> that's a handy switch
04:11:36 <jmp9> yes
04:11:40 <jmp9> i'm using qemu
04:12:17 <doug16k> geist, it's one of the older mechanism for reporting the guest did something bad. there is still a bunch of use of it in qemu
04:12:47 <geist> very nice. btw i'm sorry i said something mean to you the other day
04:12:57 <geist> i forget what it was but you logged off
04:13:05 <geist> probably hadn't had my coffee yet
04:13:17 <doug16k> ya no worries. I wasn't in the best mood either
04:13:28 <geist> it happens
04:18:37 <doug16k> I added a bunch of UB detection messages when I added tracing in nvme, they had me update the patch to also write UB messages to the "-d guest errors" logger too, so it's being maintained
04:25:41 <jmp9> Okay i copy pasted code
04:25:44 <jmp9> and it doesn't work
04:26:00 <jmp9> okay i'm doing it without rebasing
04:26:06 <jmp9> do i really need rebase port?
04:26:55 <doug16k> if you blindly accept all their pointers, are you being careful that you aren't trashing that address?
04:27:30 <doug16k> you should allocate some system memory things like I did in my driver and update those pointers to something you know isn't overwritten by then
04:28:41 <doug16k> that's what the rebase in my driver is doing. it takes over the control controller and puts things in sensible locations and gets pointers to them for the driver to manipulate them, and tells the AHCI where to find them
04:29:10 <doug16k> ...where to find them in physical memory
04:29:48 <doug16k> if you don't then are you sure those data structures are in a place you didn't touch? are you sure they are in a decent location on your real machine?
04:35:21 <jmp9> something strange happens
04:35:26 <jmp9> when i first execute command
04:35:32 <jmp9> it just executes and no result in buffer
04:35:35 <jmp9> the second time
04:35:43 <jmp9> PxCI command slot doesn't resetting
04:35:53 <doug16k> first one works, then stops working from then on?
04:35:59 <jmp9> yes
04:36:19 <doug16k> did you look for RW1C bits in the spec like I mentioned yesterday?
04:36:22 <jmp9> first it works (even that it doesn't copy result to my buffer. My buffer is still empty)
04:36:26 <jmp9> yes
04:36:28 <doug16k> you probably need to write 1 to one of those
04:37:02 <doug16k> the controller is a complicated state machine. you need to do certain things for it to return to a state where it will report another completion
04:38:35 <doug16k> jmp9, when you write to PxCI, what do you write? 1 every time?
04:38:45 <jmp9> yes
04:39:06 <jmp9> when i first run it clears that bit in CI
04:39:10 <jmp9> in second try, it doesn't
04:42:07 <jmp9> okay i'm gonna sleep
05:36:05 <Telyra> Jeeeeeesus, the datasheet for the Intel X550 is 1200 pages long.
05:42:31 <Telyra> Oh god the X710 series datasheet is 1700.
05:50:08 <geist> oh that's not that bad
05:50:30 <geist> i've seen *far* worse. and really a lot of datasheet is generally good. means it's actually documented
05:50:44 <geist> vs say a 100 page datasheet where it only mentions stuff and doesn't actually tell you what's going on
06:04:55 <bluezinc> geist: the real question is how much of what's in the datasheet is _wrong_
06:07:07 <geist> that's the rub
06:13:28 <doug16k> qemu `info mem` doesn't show executable bit in -rw part? that sucks!
06:14:30 <doug16k> people would want that right? separate r-x from rw- from r--
06:15:00 <doug16k> or, rwx for that matter (eww)
06:15:25 <geist> doug16k: didn't you do some work on that?
06:15:39 <doug16k> ya on the addresses
06:15:51 <doug16k> they are canonical now for 48 or 57 bit addressing modes
06:15:53 <geist> sounds like a good addition. i've thought it'd be neat to add the info mem/info tlb stuff for arm
06:16:31 <doug16k> it has no info mem or tlb at all?
06:16:41 <geist> negative. i assume that the guts of it are highly arch specific
06:16:53 <geist> probably just a call into the target layer with 'figure out what to do here'
06:16:58 <doug16k> it should "know" the format it implements though
06:17:15 <geist> arm of course is a little more complicated with all the page sizes and whatnot
06:17:18 <geist> so may be a bit more work
06:17:19 <mahackemu> hmm i see -rw, what is the first field supposed to be?
06:17:33 <doug16k> mahackemu, it might be x for executable. that's what I was expecting
06:17:53 <doug16k> it would show scary rwx ranges too at a glance
06:17:55 <geist> could be its inverted logic, like the bit?
06:18:13 <doug16k> I thought so but every line of my output has - there, and I have NX enabled
06:19:03 <geist> and you have them set to nx or not?
06:19:10 <geist> - would imply no X
06:19:28 <doug16k> info TLB shows miles of pages with X
06:19:36 <doug16k> lowercase tlb*
06:19:41 <geist> aaah so it's info mem specifically
06:19:45 <doug16k> ya
06:19:47 <geist> i always forget the difference there
06:19:56 <doug16k> mem is summary. tlb is insane verbose
06:20:14 <geist> ah
06:20:27 <doug16k> every single PTE is listed by info tlb
06:20:41 <doug16k> the last level ones that is
06:20:45 <doug16k> inner arent shown at all
06:21:36 <doug16k> I've been wishing I could give info tlb a range for ages. maybe I'll grant myself that wish sometime toon :D
06:21:58 <doug16k> soon*
06:23:54 <doug16k> this is my workaround, set up telnet port server on qemu and run telnet with expect script that runs a qemu monitor command and exits, then I can grep it or poll it in another window with watch -> https://github.com/doug65536/dgos/blob/master/emu/qemu/qemu_monitor_cmd
06:26:15 <geist> yah info tlb is darn interesting when you boot <random OS>
06:28:19 <mahackemu> do all the newer architectures force you to do paging for memory protection?
06:28:45 <doug16k> force you? you can't force the willing
06:28:52 <Telyra> Segmentation was a bad 70s hack that was thankfully contained to the x86.
06:29:12 <geist> mahackemu: mostly yes. most architectures you simply run with no protection at all, in physical space, until you enable paging
06:29:24 <geist> funny enough, powerpc calls running without the mmu on 'real mode'
06:30:02 <mahackemu> i thought i saw a slide or something about riscv being able to create 4 zones or something funky
06:30:09 <doug16k> intel's real mode should have been called "paragraph mode" :)
06:30:22 <geist> but it's not entirely true... some arches like mips or sh-4 or even vax have some fixed mappings baked into the architecture
06:30:36 <geist> like, for example, hard coding that 'top half' of the virtual address space is supervisor only
06:30:49 <geist> and having an identity map of physical ram starting at say 0x8000.0000
06:30:59 <geist> only accessible from supervisor
06:31:04 <doug16k> so you can wire the MSB of the address to the supervisor control signal ?
06:31:06 <geist> that's a fairly common hack
06:31:11 <geist> basically
06:31:22 <geist> mips has something like this, sh-4 does too
06:31:27 <doug16k> neat
06:31:53 <geist> it's popular for architectures where you have TLB software fill, because you can run the software fill routine (or the entire kernel if you want) out of the hardware identity map area
06:32:06 <geist> and thus avoids the recursion issue there
06:33:00 <geist> the other day when nyc` was talking about kseg0 and whatnot on mips, that's what they were talking about. that's the identity supervisor only map at the base of kernel space
06:33:04 <geist> which is hard coded to be the top half
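
As a concrete illustration of the fixed mapping geist describes, here is a sketch for 32-bit MIPS, where kseg0 (cached) and kseg1 (uncached) are unmapped windows onto the first 512 MiB of physical memory. The helper names are hypothetical; the segment bases are the MIPS32 ones, and other architectures with similar windows use different addresses.

```c
#include <stdint.h>

#define KSEG0_BASE 0x80000000u  /* unmapped, cached window */
#define KSEG1_BASE 0xA0000000u  /* unmapped, uncached window */
#define KSEG_SIZE  0x20000000u  /* each window covers 512 MiB */

/* Only the low 512 MiB of physical memory is reachable this way. */
static inline int phys_in_kseg(uint32_t phys)
{
    return phys < KSEG_SIZE;
}

/* Translation is a fixed offset; no TLB entry is involved. */
static inline uint32_t phys_to_kseg0(uint32_t phys)
{
    return KSEG0_BASE + phys;
}

static inline uint32_t phys_to_kseg1(uint32_t phys)
{
    return KSEG1_BASE + phys;
}

/* Valid for addresses inside kseg0 or kseg1. */
static inline uint32_t kseg_to_phys(uint32_t virt)
{
    return virt & (KSEG_SIZE - 1u);
}
```

Because these addresses never miss in the TLB, a software TLB refill handler placed in kseg0 can run without triggering a nested refill.
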
06:33:20 <doug16k> ah
06:34:44 <geist> mahackemu: it may have been referring to the 4 runtime levels. arm64 (armv8) has that too
06:34:54 <geist> it's not exactly the same thing, that's much closer to ring0 through ring3 on intel
06:37:36 <mahackemu> might have been something like this, fuck patent pending, https://riscv.org/2018/09/hex-five-security-adds-multizone-trusted-execution-environment-to-the-sifive-software-ecosystem/
06:37:58 <geist> ah, that sounds like some software thing
06:38:01 <mahackemu> GDT+TSS is prior art?
06:39:20 <geist> oh who even knows
06:39:34 <doug16k> arm has 4 privilege levels like x86?
06:39:53 <doug16k> you don't mean that do you
06:40:00 <geist> armv8 yes. some of them are technically optional, but pretty much all implementations implement all 4
06:40:13 <geist> yep. that's why when i say EL1, EL0, EL3, etc
06:40:17 <geist> that's referring to the 'exception level'
06:40:22 <geist> EL1 == supervisor, EL0 == user
06:40:33 <geist> EL2 and 3 are used for hypervisors and security monitors, respectively
06:40:44 <mahackemu> lol of course it's backwards
06:40:57 <geist> it's not backwards. it's forwards
06:41:04 <doug16k> ok ya I vaguely remembered those last two after I asked
06:41:05 <geist> makes total sense, number from least privileged up
06:41:15 <geist> since the higher privileged levels are more optional as you go
06:41:24 <geist> you can easily make a core with just EL0 and EL1
06:42:09 <doug16k> so it has binary privilege level then, user/supervisor
06:42:16 <geist> yes
06:42:17 <doug16k> nothing like CPL 0 thru 3
06:42:33 <geist> well, sort of. but it's assumed that other code controls the other ELs
06:42:44 <geist> ie, you have your hypervisor running at EL2, so it's separate from the EL1 binary
06:43:50 <geist> that's how ARM implements a hypervisor: simply another level. looks like another kernel, basically, where a 'process' is a whole nother kernel
06:43:52 <doug16k> so it's safe to say only x86 has those intermediate privilege levels and practically nothing else so never use 1 or 2 ever
06:44:03 <geist> and EL2 has it's own page tables that EL1 fetches through. it's a very clean design
06:45:04 <doug16k> that level would be the guest physical to host physical paging?
06:45:17 <geist> right
06:45:21 <doug16k> nice
06:46:28 <geist> EL3 is somewhat like SMM. it runs in physical mode
06:47:07 <geist> it's where you implement your super low level rom or whatnot
07:20:11 <doug16k> that bouncing balls test I showed, I said it was a bunch of divs, actually that was the canvas one, it was rendering lots of circles with canvas 2d drawing api
07:20:30 <doug16k> idk where the corresponding div one is
07:20:57 <ryoshu> doug16k: hi
07:21:07 <doug16k> hi
07:21:18 <ryoshu> doug16k: solaris has the same issue with reading IA32_EFER with !PAE && PG
07:21:42 <doug16k> it should allow it
07:22:20 <ryoshu> I'm waiting for feedback from upstream for the rationale to block this operation
07:24:41 <doug16k> they seem to be enforcing the SDM vol 3 section 9.8.5 procedure, real cpus let you do lots of wacky things that work that aren't mentioned there
07:25:33 <doug16k> Intel says, "The operating system must be in protected
07:25:33 <doug16k> mode with paging enabled before attempting to initialize IA-32e mode". really? no.
07:25:51 <doug16k> why with paging enabled?
07:26:02 <doug16k> that alone discredits 9.8.5 quite a bit
07:26:30 <geist> yah you dont need to do that
07:26:57 <geist> actually that kinda doesn't work anyway, since you have to load cr3 with something that's pointing at 4 level page table
07:27:08 <geist> so you can't actually do that with paging already enabled in 32bit mode
07:27:58 <doug16k> ya one of the steps is set PG=0
07:28:08 <doug16k> step 1 actually, lol
07:28:41 <geist> i think there is a bit of a question as to if you can officially switch directly from real mode to long mode 64bit, but that's different
07:29:34 <geist> or maybe that's 16bit protected mode
07:29:40 <doug16k> I followed the silly SDM procedure, except the "must have paging on then turn it off in step 1" nonsense
07:30:24 <doug16k> I probably should throw an identity map somewhere and flick it on and off so I can say I fully followed it :)
07:31:06 <ryoshu> why do we need instruction emulator in vtx?
07:31:15 <ryoshu> one reason is MMIO
07:31:24 <ryoshu> but another some 'legacy' not sure what does it really mean
07:36:12 <geist> well, the mmio is a big one
07:36:32 <geist> the pio stuff is less of a problem because it can fairly easily decode that for you. vtx has some decode assists for that, svm too
07:37:05 <ryoshu> decode assists in sw or hw?
07:37:24 <geist> in hw
07:37:55 <geist> as in there are some regs that are filled in with PIO traps that say which type of pio it was, and then generally its implicit that its using the a and d registers
07:37:57 <ryoshu> but we still need it in sw?
07:38:18 <geist> well, you have to at least still decode that register so you can then figure out how you want to emulate it
07:39:03 <ryoshu> and what does mean that for legacy OSes we need emulation as well? why?
07:39:27 <geist> not sure exactly the context of 'legacy' here
07:40:09 <doug16k> in general I'd guess it means some old stuff does strange things that worked but not anymore, so workarounds
07:40:19 <geist> could be wonky TSS and whatnot stuff
07:40:31 <geist> certainly in earlier versions of VTX that was not hw accelerated
07:40:38 <ryoshu> https://news.ycombinator.com/item?id=13487241
07:40:44 <geist> i think that got picked up around nehalem, being able to run < 32bit protected mode
07:40:48 <ryoshu> '
07:40:48 <ryoshu> bonzini on Jan 25, 2017 [-]
07:40:49 <ryoshu> Apart from the legacy case, you need it for MMIO---KVM for ARM also has a mini parser for LDR/STR instructions.'
07:40:52 <geist> so sometimes the docs may refer to that
07:41:02 <geist> aaaaaah so you've been talking about ARM all this time?
07:41:15 <ryoshu> 'Wait, x86 still requires instruction emulation for non-weirdo non-legacy cases? My vague recollection of the KVM Forum talk G. did was that you don't need it for "modern" guests.'
07:41:48 * geist shrugs
07:41:51 <ryoshu> is this arm?
07:41:54 <doug16k> he was talking about strange bootstrap code in a couple of cases turned PAE on when PG=1
07:42:13 <doug16k> then a hypervisor that insists that PAE=1 or else you can't read EFER
07:42:15 <ryoshu> aah, right ARM in a comment
07:42:18 <doug16k> right?
07:42:28 <ryoshu> yes
07:42:43 <ryoshu> just trying to understand HAXM
07:42:58 <geist> so arm64 at least is very straightforward for trapping. the hw fully decodes a subset of LDR/STR instructions
07:42:59 <ryoshu> so legacy is for arm, probably older generations
07:43:02 <ryoshu> of isa
07:43:07 <geist> and spills it all out in a register
07:43:27 <geist> so you can easily emulate it without needing to decode an instruction
07:43:44 <geist> and there's nothing really else that you need to trap like that, since there's no pio
07:43:57 <ryoshu> I see
07:44:12 <doug16k> ryoshu, ok good! I thought I had you mixed up with someone else :)
07:44:15 <geist> all other instruction traps have a specific exception code
07:45:09 <ryoshu> but IA32_EFER is independent doubt to instruction emulator, I was just surprised to see it in the source code
07:45:48 <doug16k> you have to know whether LMA is 1 to know if it is long mode. how else can you know?
07:46:05 <doug16k> other than instruction decoder tricks
07:46:37 <ryoshu> doug16k: OpenBSD reads IA32_EFER and enables NXE and then PAE
07:47:07 <ryoshu> with PG and PE enabled
07:47:08 <doug16k> right but I meant, when you are instruction emulating, you'd consult some data behind the emulated EFER, which holds LMA flag
07:47:27 <ryoshu> EFER is unrelated totally :)
07:47:42 <ryoshu> to a general topic of emulating MOVS etc for MMIO
07:47:47 <ryoshu> sorry for mixing 2 topics
07:48:42 <doug16k> I was responding to "but IA32_EFER is independent doubt to instruction emulator", and I disagreed, you need to know the value of LMA to begin
07:49:05 <ryoshu> ah right
07:49:22 <ryoshu> hypervisor has some logic
07:49:37 <ryoshu> and it maintains LMA LME bits
08:06:51 <doug16k> geist, what arch would be most useful to have info tlb on that's missing? aarch64?
08:07:02 <geist> yah probably so
08:07:40 <geist> and it's fairly close to x86. a bit more bits in play
08:07:47 <doug16k> is there a daily build iso of fuchsia or something that'll spin up in 2 seconds on it somewhere?
08:07:56 <geist> negative
08:08:48 <geist> if you just want to build zircon, it's fairly easy
08:09:01 <doug16k> I can "make" an iso?
08:09:11 <geist> yes and no
08:09:30 <geist> basically yes
08:09:44 <geist> but you're far better off asking on #fuchsia where folks that actually know may be able to help
08:10:57 <doug16k> that might be better than what I need. I should be able to find a bootable iso of linux or something I can throw at it that works in qemu under aarch64
08:11:17 <doug16k> at least at first
08:11:17 <geist> we generally boot with -kernel
08:11:21 <geist> theres a whole script to do it
08:11:35 <ryoshu> are there plans to port HAXM to fuchsia? :)
08:11:41 <doug16k> ah, yeah that'd be just as good actually. I see. just build fuchsia and I can -kernel it?
08:11:47 <geist> i doubt it ,since we already have our own KVM
08:11:52 <geist> well, our own KVM like thing
08:12:10 <ryoshu> there is now preparation to switch dosemu to HAXM
08:12:10 <geist> and haxm is intel only, so it doesn't cover amd and arm
08:12:14 <ryoshu> well, research
08:12:33 <ryoshu> so 1 API to support all mainstream OSes (Win,Lin,Mac,NetBSD)
08:13:17 <ryoshu> there is work on performance bottlenecks with MMIO now (needed for efficient VGA)
08:17:39 <geist> but it's intel only, right? or is that the other thing?
08:17:47 <doug16k> efficient and vga are a contradiction
08:18:11 <ryoshu> doug16k: good enough to play DOOM in this case :)
08:18:28 <ryoshu> geist: yes, intel only.. but there are people ready to work svm
08:18:36 <ryoshu> on SVM
08:18:41 <geist> but i thought it was based on an intel project?
08:18:53 <ryoshu> intel is the maintainer
08:19:01 <geist> so isn't that going to be a problem?
08:19:04 <ryoshu> but they are open to ARM/SVM/other patches
08:19:16 <geist> yeah, time will tell there
08:19:22 <ryoshu> no problem, they just cannot spare any resources from intel team on it
08:19:42 * geist nods
08:20:00 <doug16k> impartiality and fairness would be in question though
08:20:08 <ryoshu> but intel people formally added svm port as gsoc proposal this year (and there is a student precandidate interested to work on it, and a mentor to help him)
08:21:15 <ryoshu> in a same way intel just needs to maintain Darwin and Windows ports, Linux and NetBSD are done by the community
08:21:42 <geist> anyway, if the api is not too wonky it's possible to make it work with zircon
08:21:54 <geist> except the kernel side would have to be in the kernel, which would be problematic
08:22:37 <ryoshu> is this microkernel with drivers in userspace?
08:22:40 <geist> yes
08:23:48 <geist> but we currently have the hypervisor side stuff in the kernel
08:23:52 <ryoshu> it needs to run in a privileged mode, splitting it would be too much refactoring (and certainly performance issue)
08:23:58 <geist> it's more of a where does the code live, etc
08:24:24 <ryoshu> first we need real users of this api, beyond qemu
08:24:30 <ryoshu> and androidstudio
08:28:56 <ryoshu> for an experienced person and assuming that the kernel has all the features (atomics, mutexes, spinlocks, memory pinning (wiring), allocators, mapping kernel into user and user into kernel memory, few other utility routines).. it's a weekend of porting
08:29:13 * geist nods
08:29:26 <geist> and getting the existing implementation of this stuff out of the way
08:31:25 <ryoshu> at least in other OSes it doesn't conflict much (if at all)
08:32:12 <ryoshu> I mean, no need to patch host kernels
08:33:11 <geist> yah
08:33:25 <ryoshu> http://polprog.net/blog/netbsd-hax/ we are fixing now issues with guests
08:36:08 <geist> guess it depends on what the kernel/user api looks like
08:36:14 <geist> to see if it'd be compatible with zircon's way of doing things
08:36:53 <ryoshu> https://github.com/intel/haxm/blob/master/docs/api.md
08:37:40 <ryoshu> UNIX systems use ioctl(2), Windows something distinct
08:38:49 * geist nods
08:42:22 <Telyra> DeviceIoControl(), which... basically works exactly like ioctl(2)
08:42:53 <Telyra> Instead of an fd it takes a HANDLE, but other than that it's basically the same thing.
08:43:13 <ryoshu> I see
08:44:06 <Telyra> Aaaaaand it takes a bunch of other parameters for things like input and output buffer size and a pointer to a DWORD for buffer filled size and optionally a pointer to an asynchronous I/O structure
08:44:18 <ryoshu> I still need to read Windows Internals
08:44:29 <Telyra> Welcome to the world of NT, where the system calls are built like a goddamn tank
08:44:47 <ryoshu> they just need to be quick
08:51:00 <rakesh4545> hello, I am getting back into kernel dev. Is it a bad idea to try and implement a GUI for my kernel (more like a bootable program rather than a kernel) being myself quite a beginner?
08:53:27 <ryoshu> is this ring0 OS?
08:54:56 <geist> ryoshu: yeah that's not too far off with what we do currently
08:55:00 <klys> probably better off to use a common graphics lib to make your gui, and keep in mind most folks that write an os go all the way into memory management, so you will need support routines for things like malloc() and free().
08:55:08 <geist> currently we have a handle to one or more vcpus that you do a series of syscalls on
08:55:32 <ryoshu> geist: is there qemu available on fuchsia?
08:55:44 <geist> https://fuchsia.googlesource.com/fuchsia/+/master/zircon/docs/syscalls/guest_create.md etc
08:55:55 <geist> no we have our own runtime that acts like qemu
08:56:47 <geist> https://fuchsia.googlesource.com/fuchsia/+/master/garnet/bin/guest/ i believe
08:57:14 <ryoshu> thanks!
08:58:07 <klys> while discussing garnet features, is there a filesystem driver for a common filesystem?
09:01:04 <geist> yah, there's minfs which is a fairly traditional looking unixy thing
09:01:17 <klys> is that for a minix partition
09:01:51 <geist> no. not at all
09:02:09 <klys> okay, I guess time to look up minfs fuse for linux
09:02:17 <geist> heh
09:03:05 <geist> it doesn't exist, in case that's what you mean
09:03:54 <klys> https://github.com/minio/minfs
09:03:58 <klys> you sure?
09:04:10 <geist> absolutely
09:05:44 <klys> okay you convinced me
09:06:30 <geist> minfs was invented for zircon. it's a simple 'minimal' fs
09:06:41 <geist> though i think at this point it's grown COW features and whatnot
09:06:47 <ryoshu> this garnet/bin/guest/ is just userspace single program code, am I right?
09:07:12 <geist> it's the user side of the virtual machine stuff
09:07:40 <ryoshu> I wonder if it (ported) could be used with a different hypervisor
09:08:03 <geist> probably not. it's likely to be highly tuned for the particular way zircon delivers its messages from the kernel
09:08:11 <geist> and how shared memory works in zircon
09:12:24 <ryoshu> uint8_t mov_dh[] = {0x88, 0b00'110'110};
09:12:32 <ryoshu> is this valid syntax with ' ?
09:12:38 <geist> yep!
09:12:40 <ryoshu> https://fuchsia.googlesource.com/fuchsia/+/master/garnet/bin/guest/vmm/arch/x64/decode_unittest.cc#164
09:12:43 <ryoshu> in c++?
09:12:44 <geist> that's a c++11 thing i believe
09:13:03 <geist> you can put ' anywhere in a numeric literal and it just ignores it
09:13:10 <geist> may be some other chars too
09:13:41 <ryoshu> I see
09:22:42 <Telyra> Yeah that's a c++11 thing, binary constants are c++14
09:26:46 <ryoshu> anyway I have got enough to learn with HAXM still, need to fix few annoying bugs that are left
09:33:01 <geist> yah binary constants have been a gcc extension forever, but nice that they got made into an official thing
09:33:20 <geist> we're generally compiling everything as c++17
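The literal ryoshu pasted combines two features: the 0b binary prefix and the ' digit separator. As far as the standard goes, both were adopted in C++14 (binary literals existed earlier as a GCC extension, as geist says). A minimal sketch, assuming the second byte is read as an x86 ModRM byte grouped as mod'reg'rm; the names ModRM and decode_modrm are made up for illustration and are not taken from the Fuchsia source:

```cpp
#include <cstdint>

// Hypothetical helper type: the three fields of an x86 ModRM byte.
struct ModRM {
    unsigned mod;  // bits 7..6
    unsigned reg;  // bits 5..3
    unsigned rm;   // bits 2..0
};

// Split a ModRM byte into its fields (illustrative, not Fuchsia's decoder).
constexpr ModRM decode_modrm(std::uint8_t b) {
    return ModRM{static_cast<unsigned>(b >> 6),
                 static_cast<unsigned>((b >> 3) & 7u),
                 static_cast<unsigned>(b & 7u)};
}

// The separators are ignored by the compiler; they only group the bits
// so the mod/reg/rm split is visible in the source.
static_assert(0b00'110'110 == 0x36, "separators do not change the value");

// Same bytes as in the log: opcode 0x88 (MOV r/m8, r8) plus the ModRM byte.
constexpr std::uint8_t mov_dh[] = {0x88, 0b00'110'110};
```

With -std=c++14 or later this compiles as is; decode_modrm(mov_dh[1]) yields mod 0, reg 6, rm 6.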
09:35:27 <ryoshu> garnet == userland?
09:35:50 <geist> not exactly. the garnet/peridot/topaz stuff is mostly going away, but they have up until now been a 'layer' sort of thing
09:36:04 <geist> as in zircon is the core, garnet is a layer of code above that, peridot/topaz are above that
09:36:18 <ryoshu> stdlib
09:36:22 <geist> as in higher layers can depend on lower, but not vice versa
09:36:44 <geist> but most of that is going away, it's getting mashed into a single repo now
09:36:52 <ryoshu> I see
09:37:19 <geist> garnet was the second deepest layer though, just above zircon
09:37:55 <ryoshu> I'm mostly interested whether there is gui (x window replacement)
09:38:04 <geist> oh there is. it's higher up
09:48:59 <geist> yeah looking at this haxm api it'd be interesting to extend it to ARM
09:49:08 <geist> since about 80% of it is in some way or another specific to x86
09:50:00 <ryoshu> we have now new native api in netbsd.. http://netbsd.gw.com/cgi-bin/man-cgi?libnvmm++NetBSD-current what do you think?
09:50:17 <ryoshu> it's inspired by windows one
09:50:54 <geist> at first glance it at least seems to acknowledge that there are other architectures, so that's good
09:51:43 <ryoshu> vmx/svm supported now, aarch64 planned
09:52:27 <geist> yah the main cpu stuff should be fairly straightforward on arm64. it's generally simpler than x86
09:52:45 <geist> the GIC interrupt controller stuff is a bit more complicated and flexible than x86's so there is some stuff there that can get wonky
09:52:53 <geist> but aside from that it's not too different, interface wise
09:53:04 <geist> the kernel side is of course absolutely totally different in every way
09:53:13 <geist> but at the end of the day you end up with the same set of problems
09:53:58 <ryoshu> I see
10:03:39 <ryoshu> too much information, I need more reading.
10:03:43 <ryoshu> thanks!
10:19:13 <doug16k> hey I wonder how complete the flush is when you write the PAT MSR -> http://www.sandpile.org/x86/coherent.htm
10:20:06 <doug16k> I wonder if that'd be faster than toggling PGE back and forth to flush G pages
10:20:20 <doug16k> if it even flushes them
10:21:27 <doug16k> CR0.WP changes flush TLB? that's a surprise
01:15:12 <zhiayang_> is there a point in separate libs for libc and (eg.) libsyscall?
01:17:53 <Mutabah> IMO, yes
01:20:06 <sortie> I tend to disagree
01:20:34 <sortie> Depends on the flexibility you need of course, and whether you want your libc to work on multiple platforms, or want to integrate with multiple proglangs
01:20:46 <sortie> But libc will be highly tied to the underlying platform no matter what
01:20:59 <sortie> So it might as well just invoke the syscalls directly
01:21:26 <sortie> On the other hand, maybe you want apps to ship their own libc, and have libsyscall be stable between major OS releases. That's the Windows method.
01:24:39 <nyc`> I'd like to make significant departures from UNIX/POSIX, but probably have to limit the scope of the project to VM internals on a smaller number of platforms than I'd like in order to get things done fast enough.
01:26:27 <nyc`> So I'll probably end up doing a vanilla libc and bending over backwards to provide UNIX/POSIX semantics despite my ultimate preferences.
01:28:52 <jmp9> okay i'm planning in future port my os onto x86-64
01:32:18 <renopt> nyc`: same
01:32:31 * renopt starts linux syscall compatibility layer
01:32:57 <renopt> musl soon :D
01:33:17 <jmp9> i heard that x86-64 doesn't uses GDT, right?
01:34:15 <nyc`> BTW how does YeOS sound for a name?
01:35:24 <mrvn> jmp9: it does
01:35:48 <mrvn> jmp9: x86_64 uses all of x86 except a bunch of bits in the structures are ignored.
01:35:59 <jmp9> it uses gdt, whaT??
01:36:21 <jmp9> i read in IA-64 spec that it assumes segment selectors as zero
01:36:22 <mrvn> jmp9: How else would you switch between code64 and code32 segments?
01:36:22 <klange> nyc`: Don't worry about names at this stage... remember, it can be hard to change them later ;)
01:36:36 <zhiayang_> nyc`: aren't you unnecessarily limiting yourself by forcing an 'os' suffix
01:36:47 <mrvn> nyc`: how about nycos?
01:36:52 <zhiayang_> jmp9: fyi ia-64 != ia-32e
01:37:00 <jmp9> uuuuh
01:37:04 <mrvn> jmp9: ia-64 is itanium
01:37:08 <jmp9> fuck
01:37:15 <renopt> YeetOS
01:37:16 <jmp9> i read wrong spec :D
01:37:24 <mrvn> itanium has selectors?
01:37:40 <nyc`> zhiayang: Possibly. I'm trying to come up with something.
01:38:06 <jmp9> ok recommend by x86-64 spec
01:38:09 <mrvn> nyc`: you know every name you mention will soon have a domain squatter
01:38:15 <jmp9> recommend me*
01:38:17 <zhiayang_> Mutabah, sortie: hm, interesting. i'll probably end up splitting them up
01:38:24 <mrvn> jmp9: can't go wrong with amd64
01:38:36 <zhiayang_> mrvn: lol if intel added segmentation to ia-64 they were just plain stupid
01:38:48 <zhiayang_> but i wouldn't be surprised in the slightest :D
01:39:05 <nyc`> mrvn: I want to do other things later, so it's not the embodiment of me as OS.
01:39:22 <zhiayang_> jmp9: i typically recommend the amd manual esp. for 64-bit development, but there are some differences so you should cross-check with the intel one once in a while to be sure
01:40:15 <jmp9> also who the hell is itanium?
01:41:27 <nyc`> mrvn: Ye means page (or leaf) in Chinese, which is thematic because the project is centered around superpaging etc.
01:43:42 <zhiayang_> jmp9: you could google this, but basically intel made a new 64-bit architecture and accompanying processor with a different paradigm from x86
01:43:58 <zhiayang_> performance was shit and the performance of their x86 emulator was even worse
01:44:15 <zhiayang_> meanwhile amd released x86-64 afterwards and itanium basically went "welp"
01:44:50 <mrvn> I still think having an emulator (half in hardware at the start) was a mistake. Just meant nobody ported their software.
01:45:53 <zhiayang_> yea, that makes sense
01:46:36 <nyc`> jmp9: I liked Itanium. The register count was massive plus it had register windowing plus it had nice things for numerics like rotating registers.
01:46:46 <jmp9> heh
01:46:57 <jmp9> okay so i should disable red zone for libgcc
01:46:58 <jmp9> https://wiki.osdev.org/Libgcc_without_red_zone#Preparations
01:47:18 <jmp9> should i put this file before configure or after configure? (because that file doesn't exist)
01:47:21 <mrvn> yes. Imho it's a total waste
01:47:29 <mrvn> and for kernel you must.
01:47:50 <mrvn> well, not must but it gets real painful with.
01:49:31 <zhiayang_> (it sounds quite interesting, but i think their initial implementation wasn't good enough)
01:50:07 <zhiayang_> who knows, maybe things would've been different if intel didn't cave to amd
01:50:36 <nyc`> I don't see why they had to NIH, though. They bought the silicon side of DEC. They might as well have just rebranded the planned 21464 as the 64-bit successor.
01:51:04 <mrvn> doubtful. amd64 killed any incentive for a major cpu arch switch
01:51:43 <zhiayang_> hm, that's true i guess.
01:51:47 <nyc`> zhiayang: Intel never gave it time on the fabs with the latest fastest processes.
01:51:51 <zhiayang_> was the x86 emu a feature from the start?
01:52:03 <zhiayang_> nyc`: oh really?
01:52:10 <mrvn> zhiayang_: it was in hardware and later moved to software.
01:53:13 <nyc`> mrvn: The 21464 was designed before 2000 and didn't have any radical features that OS's and compilers were unprepared to utilize.
01:53:41 <mrvn> nyc`: we were talking about ia64
01:54:32 <jmp9> okay i'm building toolchain
01:54:41 <jmp9> ok i find a lot of amd64 spec
01:54:45 <nyc`> mrvn: Intel's 64-bit strategy was poorly executed on multiple dimensions.
01:54:50 <jmp9> which one i should use?
01:55:01 <zhiayang_> jmp9: get volumes 2 and 3
01:55:15 <jmp9> thanks
01:55:27 <jmp9> AMD64 Architecture Programmer’s Manual Volume 2
01:55:28 <jmp9> this?
01:55:33 <zhiayang_> yes
01:56:27 <nyc`> zhiayang: IA64 hardware was always a generation down in the fabs I guess because of it being prototype or low count production runs.
01:56:46 <jmp9> ia64 is a meme
01:56:59 <zhiayang_> maybe they wanted to "trial run" it before taking up more fab space, i guess
01:57:13 <zhiayang_> looking at their sales forecast graph i'd've thought they were pretty confident
01:57:36 <nyc`> mrvn: There was no good reason to design a 64-bit successor from scratch.
01:57:47 <mrvn> zhiayang_: because you always make the sales forecast show how bad you are doing.
01:58:07 <zhiayang_> nyc`: well if intel had that idea why would they base it on another ISA vs their own x86
01:58:11 <zhiayang_> mrvn: ok true, true
01:59:06 <nyc`> mrvn: And they already owned all the DEC hardware technology. Hyperthreading came out of that.
02:00:09 <nyc`> zhiayang: They knew x86 had serious issues limiting performance and power efficiency.
02:00:18 <zhiayang_> and look where we are today
02:00:24 <zhiayang_> i love technology
02:00:31 <nyc`> zhiayang: Alpha did not.
02:00:55 <mrvn> zhiayang_: have you looked at the W/MIPS for alphas, x86, amd64, mips, arm?
02:01:23 <zhiayang_> w/mips?
02:01:28 <zhiayang_> i don't seem to be able to google what that is
02:01:30 <nyc`> zhiayang: Still being stuck with x86 on the mass market is a bad thing.
02:01:41 <mrvn> zhiayang_: millions instructions per second
02:01:49 <nyc`> Mega Instructions Per Second
02:02:24 <zhiayang_> oh, i thought it was about MIPS the isa
02:02:54 <mrvn> and don't forget BOGO MIPS.
02:03:37 <nyc`> All those MIPS numbers are fabrication process affairs.
02:04:32 <nyc`> Comparisons on equal fabrication processes are necessary to get a real picture.
02:04:51 <jmp9> okay where is in amd64 spec is gdt entry structure?
02:04:59 <jmp9> i'm kinda new to spec navigation
02:05:14 <Mutabah> jmp9: Intel or AMD docs?
02:05:14 <zhiayang_> jmp9: do you have a pdf reader with chapter navigation
02:05:22 <jmp9> yes
02:05:22 <mrvn> or search function?
02:05:25 <jmp9> amd64
02:05:26 <zhiayang_> chapter 4.6
02:05:28 <jmp9> search yes
02:05:30 <jmp9> thanks
02:05:34 <zhiayang_> actually just the whole of chapter 4
02:07:29 <jmp9> i have question
02:07:36 <jmp9> so we use grub loader to load os
02:07:40 <jmp9> its loads 32 bit binary
02:07:47 <jmp9> in this binary we load 64 bit kernel elf
02:07:50 <jmp9> parse and load
02:07:58 <jmp9> then we switching to long mode
02:08:03 <jmp9> but what happens
02:08:09 <jmp9> we doesn't have 64 bit gdt
02:08:21 <jmp9> or what will happen if we enable paging
02:08:35 <zhiayang_> the wiki has a bunch of articles on this
02:08:59 <jmp9> i'm kinda confused because it will collapse if there is 64 bit gdt before paging
02:09:14 <jmp9> gdt representing segments in virtual mem
02:09:31 <mrvn> jmp9: RTFM
02:09:39 <jmp9> what
02:10:40 <nyc`> Actually, one has to compare on both equal fabrication processes and equal clock speeds for meaningful comparisons between architectures.
02:11:25 <mrvn> jmp9: you are asking questions that are explained in the manual in detail
02:11:43 <mrvn> nyc`: you can compare W/MIPS for any fabrication process and clock speed.
02:12:42 <nyc`> mrvn: You're just comparing fabrication processes and clock speeds, not the architectures.
02:13:17 <mrvn> nyc`: nope.
02:13:46 <mrvn> nyc`: I'm comparing how much power a cpu needs per instruction.
02:15:02 <jmp9> "Fields Ignored in 64-Bit Mode. Segmentation is disabled in 64-bit mode, and code segments span all of virtual memory. In this mode, code-segment base addresses are ignored. For the purpose of virtual-address calculations, the base address is treated as if it has a value of zero. "
02:15:08 <nyc`> mrvn: Jack up the clock speed and use a tiny fabrication process beyond everything else out there and grossly inferior architectures will beat everything else in the world.
02:15:30 <mrvn> nyc`: jack up the clock speed and the power goes way up. W/MIPS drops like a stone in water.
02:15:33 <jmp9> that tells amd64 spec
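The passage jmp9 quotes says the base (and limit) fields are ignored in 64-bit mode, not that the GDT disappears: the descriptor still carries the L bit that selects 64-bit versus compatibility-mode code, which is mrvn's point about switching between code64 and code32. A minimal sketch of packing such descriptors, assuming the classic 8-byte segment descriptor layout; make_descriptor and the constant names are hypothetical helpers, not from any particular kernel:

```cpp
#include <cstdint>

// Pack an 8-byte code/data segment descriptor (illustrative helper).
// access: P, DPL, S, type bits. flags: G (bit 3), D/B (bit 2), L (bit 1), AVL (bit 0).
constexpr std::uint64_t make_descriptor(std::uint32_t base, std::uint32_t limit,
                                        std::uint8_t access, std::uint8_t flags) {
    return  static_cast<std::uint64_t>(limit & 0xFFFFu)                  // limit 15..0
         | (static_cast<std::uint64_t>(base & 0xFFFFFFu) << 16)          // base 23..0
         | (static_cast<std::uint64_t>(access) << 40)                    // access byte
         | (static_cast<std::uint64_t>((limit >> 16) & 0xFu) << 48)      // limit 19..16
         | (static_cast<std::uint64_t>(flags & 0xFu) << 52)              // G, D/B, L, AVL
         | (static_cast<std::uint64_t>((base >> 24) & 0xFFu) << 56);     // base 31..24
}

constexpr std::uint8_t kAccessKernelCode = 0x9A;  // present, DPL 0, code, readable
constexpr std::uint8_t kFlagsCode32 = 0xC;        // G=1, D=1, L=0
constexpr std::uint8_t kFlagsCode64 = 0xA;        // G=1, D=0, L=1

// A 32-bit flat code segment and a 64-bit code segment differ only in D and L.
constexpr std::uint64_t kGdtCode32 = make_descriptor(0, 0xFFFFF, kAccessKernelCode, kFlagsCode32);
constexpr std::uint64_t kGdtCode64 = make_descriptor(0, 0xFFFFF, kAccessKernelCode, kFlagsCode64);
```

The base and limit passed for the 64-bit entry are there only for symmetry, since the quoted text says the CPU treats the base as zero in that mode. The GDT is also consulted by linear address, so it does not have to be rebuilt just because paging is turned on, provided the table stays mapped.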
02:15:49 <mrvn> nyc`: use a tiny fabrication process and power goes down so that helps.
02:16:43 <mrvn> nyc`: but if intel manages to do 10nm processes and amd doesn't that is their benefit then.
02:17:16 <mrvn> nyc`: as someone who has to pay the electric bill I truly don't care if the cpu is 20nm, 15nm, 12nm or whatever.
02:17:55 <nyc`> mrvn: You're still mostly comparing semiconductor features or at least comparing things where the semiconductor technology completely dominates the actual design of the architecture.
02:19:27 <nyc`> mrvn: Congratulations, you're more interested in semiconductor technology than computer science.
02:20:27 <mrvn> nyc`: you are grossly overstating the effect of the fabrication process on the cpu.
02:22:33 <nyc`> You can resuscitate architectures from before 1980 if you're convinced semiconductor technology is such a modest effect.
02:23:33 <mrvn> nyc`: Yeah, sure. Lets take a C64 and crank it up to 2GHz. Still will only do 1/16-1/32 the number of instructions as say a Ryzen 5.
02:23:52 <mrvn> And thT's ignoring that it has no L1/L2/L3 cache for ram access.
02:23:57 <mrvn> that's
02:25:28 <nyc`> Apart from cache effects, I'm not convinced that's an accurate assessment.
02:26:36 <mrvn> nyc`: A C64 iirc takes 4 cycles for an instruction: fetch, decode, execute, writeback. Modern CPUs pipeline, are superscalar, have multiple execution units, speculative execution, ...
02:27:16 <nyc`> Circuit depths were far shallower and that translates to low latencies.
02:27:47 <mrvn> So crank it up to 4GHz, still 1/8th the MIPS.
02:28:42 <mrvn> nyc`: If a C64 made with modern methods were faster than an amd64 then amd would build C64s.
02:31:49 <nyc`> It's faster at what, not if it's faster. They're not 32-bit, they don't have memory protection, I'm not sure they even fully support interrupt-driven IO, and (the real performance issue) caches weren't there.
02:32:51 <nyc`> Oh, no floating point, either.
02:34:14 <mrvn> nyc`: The MIPS is already lower and you need a lot more of them to do the same thing. See. No contest.
02:34:22 <nyc`> Not having (or using) virtual memory is a speedup.
02:35:27 <nyc`> mrvn: You're missing something major here, like any concept of what the circuits look like.
02:36:19 <nyc`> (And how that relates to timings.)
02:37:16 <nyc`> Circuit depth is a real measure in algorithmic analysis.
02:37:18 <mrvn> nyc`: so you are saying a modern made C64 will be faster? dream on.
02:38:16 <mrvn> Note: Circuit depth is offset by pipeline depth.
02:38:34 <nyc`> mrvn: There's something very strange about the ways you analyze things.
02:40:46 <ashkitten> cpus are complicatedddddd
02:40:55 <nyc`> Pipelines are there to compensate for the depth of different processing stages of instructions because they got deep enough for some stages to sit idle while others were actively processing.
02:41:35 <ashkitten> nyc`: so risc might help alleviate the need for pipelining?
02:41:55 <ashkitten> or am i misinterpreting
02:43:00 <nyc`> ashkitten: No, it has more to do with mutual interference between registers and the instruction decoder as a huge bottleneck.
02:43:09 <ashkitten> ah,
02:43:44 <nyc`> ashkitten: I guess addressing modes also created dependencies on memory artificially too.
02:43:46 <mrvn> pipeline allows splitting up a very long circuit depth into multiple stages.
02:44:18 <mrvn> and then you can run the stages in parallel instead one per cycle.
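mrvn's two lines can be put into numbers with the usual idealized model: without a pipeline each instruction occupies all stages in turn, while with a pipeline a new instruction enters every cycle once the pipe is full. A rough sketch, assuming one cycle per stage and no stalls or hazards; the function names are made up for illustration:

```cpp
#include <cstdint>

// Cycles to run n instructions when each one passes through every stage
// before the next one starts (the "fetch, decode, execute, writeback" model).
constexpr std::uint64_t cycles_unpipelined(std::uint64_t n, std::uint64_t stages) {
    return n * stages;
}

// Cycles with an ideal pipeline: the first instruction needs `stages` cycles
// to fill the pipe, then one instruction completes per cycle.
constexpr std::uint64_t cycles_pipelined(std::uint64_t n, std::uint64_t stages) {
    return n == 0 ? 0 : stages + n - 1;
}
```

For 1000 instructions and 4 stages this gives 4000 cycles against 1003, so throughput approaches one instruction per cycle while the latency of a single instruction stays at 4 cycles. Real pipelines lose some of that to stalls and mispredicted branches, which this model ignores.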
02:44:55 <ashkitten> ah
02:44:58 <ashkitten> neat
02:45:35 <nyc`> Yes, so when you don't have that circuit depth in the first place, pipelines are meaningless as ways to compensate for latency with bandwidth.
02:45:58 <mrvn> nyc`: doesn't change though that a C64 doesn't run the stages in parallel.
02:46:14 <mrvn> nyc`: nor power down the parts of the cpu currently not used.
02:50:03 <ashkitten> the commodore 64 would barely qualify as a general purpose computer these days given its lack of many features in modern cpus. there's a reason specialization makes things faster, yeah?
02:51:45 <nyc`> I'm going to caffeinate myself and work on an OS instead of filling the channel with philosophical differences about hardware (to put it kindly).
02:51:50 <ashkitten> there's no arguing that the c64 is not as useful as a modern x86 machine, but if it was remade with modern tech there should be no doubt that it could outperform a more complex cpu i think
02:52:14 <ashkitten> as in, it would outperform in the specific tasks it can do
02:52:49 <nyc`> ashkitten: Well, I acknowledged a cache memory proviso.
02:53:03 <ashkitten> there is that, yeah
02:57:19 <mrvn> ashkitten: except modern cpus can run 4+ instructions per cycle in parallel while a C64 takes at least 4 cycles each.
02:57:24 <ashkitten> but generally that is why there are asics, it's much more efficient to run something on hardware specifically made to run that task
02:59:36 <ashkitten> a c64 is not an asic but it's not nearly as general purpose as an x86 cpu. i don't know enough about cpu hardware architecture to say but i'm sure there are some other gains that could be made, certainly with regard to power consumption
03:01:04 <ashkitten> i'm gonna stop talking now because this is outside my realm of knowledge
03:01:17 <nyc`> I think there are some x86 chips that literally have ARM systems embedded in them to do some kind of control of power consumption affairs.
03:01:48 <ashkitten> that sounds familiar
03:01:51 <nyc`> Try that one on for irony.
03:02:16 <ashkitten> like the intel management engine?
03:03:17 <ashkitten> i can't remember but i saw a talk that contained some info about that.. maybe it was the one about rings below 0 and -1
03:04:34 <ashkitten> iirc that talk didnt say it was arm but some unknown risc architecture
03:05:13 <ashkitten> oh, it might have been the one about undocumented instructions
03:06:52 <nyc`> Seriously, when you're embedding multiple systems of other architectures to manage the state in your CPU's, it's time to admit there's a serious problem.
03:08:29 <ashkitten> lol yep
03:34:00 <nyc> ashkitten: https://en.wikipedia.org/wiki/Circuit_complexity could be a little enlightening.
03:44:02 <nyc> wrt. domain squatters, there's a Singaporean soy drink named Yeo's whose maker has http://www.yeos.com/
03:47:37 <nyc> (I guess technically the original inventor of it was from Fujian, but they migrated to Singapore and the business was based out of there.)
04:04:29 <zhiayang_> nyc: hey now what you saying about my country
04:06:14 <nyc> zhiayang: Someone from China named Yeo started a soft drink company in Singapore that has www.yeos.com so the people that have YeOS -related domains aren't really domain squatters.
04:10:27 <nyc> zhiayang: I guess paper was invented there, so alluding to that history for the name of the OS I'm working on seemed meaningful because its purpose is to fiddle with page sizes. I thought about using the name of paper's inventor, but TsaiOS sounded too close to iOS.
04:11:32 <zhiayang_> i see, i see
04:12:44 <nyc> zhiayang: My first attempt, TalOS, was even worse as far as having video game characters, a Cisco security service, and a Greek god all called that when I was trying to allude to the event where paper-making got transmitted to the Abbasid caliphate.
04:22:50 <nyc> I guess if I ever get anywhere with it, this OS will have an official soft drink.
04:24:26 <zhiayang> protip: as far as i know they don't make any drinks with fizz
04:24:36 <zhiayang> just drinks loaded with sugar
04:24:37 <zhiayang> :D:
04:26:38 <nyc`> zhiayang: I'm not 100% sure what to call them if not soft drinks.
04:27:15 <zhiayang> sugary drinks?
04:28:30 <nyc`> zhiayang: Bottled drinks? Who knows.
04:58:18 <knebulae> @nyc: you were TalOS?
04:58:26 <knebulae> @nyc: talk about coming full circle.
04:58:42 <nyc> knebulae: No, I considered that as an OS name for about 24-48 hours.
04:58:52 <knebulae> @nyc: oh, ok.
04:59:27 <knebulae> @nyc: I believe someone else in the past (maybe distant) used that name.
05:00:24 <nyc> knebulae: I'm waiting to see how YeOS (really YèOS or 頁OS) holds up now. I don't think they used it for an OS, but a video game character, a Cisco security service, and a Greek god all had Talos as a name.
05:00:51 <knebulae> @nyc: gotcha
05:01:09 <knebulae> @nyc: I don't read too good
05:02:15 <knebulae> @nyc: sorry, Poe's Law: ;)
05:02:30 <nyc> knebulae: If TsaiOS didn't sound too much like iOS, it would have been fun because of it presenting a UNIX-like interface while its namesake Tsai Lun was a eunuch.
05:03:13 <knebulae> @nyc: if Tsai is pronounced the same as chef Ming Tsai, that's not really that close to iOS. Sounds like psy-OS.
05:03:51 <nyc> knebulae: Yes, it would have been like psy-OS, but it's still too close in sound to iOS.
05:04:01 <knebulae> @nyc: to each their own.
05:09:39 <nyc> knebulae: I've got bigger fish to fry than the name, like actually getting the thing running. It's at least building on mips64, sparc64, and arm64. I can probably stub things out to get riscv64 and ppc64 going. Sadly, or1k doesn't have Ubuntu cross compilers and my attempts to build my own didn't go so well, at least for bare metal OS-less ${cpu}-elf target triplets.
05:55:48 <nyc> There's a bare bones IBM POWER asm stub.
05:59:33 <zhiayang> ok, i found out the hard way via heisenbug that you can get interrupted between syscall and swapgs
06:00:52 <nyc> zhiayang: =(
06:03:10 <zhiayang> added this to the wiki
06:03:22 <zhiayang> this cost me a good two hours ):
06:05:25 <doug16k> zhiayang, you can setup an MSR to mask IF during syscall until you get the stack switched over and all proper
06:05:27 <nyc> ... and there is a bare bones RISC-V asm stub.
06:05:50 <zhiayang> doug16k: yep, i did set the sf_mask msr
06:06:09 <zhiayang> but it did not occur to me that the interrupt might interrupt at an inopportune moment
06:06:28 <zhiayang> (i only realised after looking through your code, tbh :D )
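The MSR doug16k refers to is SFMASK: on SYSCALL the CPU saves RFLAGS in R11 and then clears every RFLAGS bit that is set in SFMASK, so putting IF in the mask keeps maskable interrupts off until the kernel has done swapgs and switched stacks. A small model of that rule, assuming the AMD64 definition of SYSCALL; kSyscallFlagMask is one plausible mask choice, not zhiayang's or doug16k's actual value:

```cpp
#include <cstdint>

// MSR number of SFMASK (the SYSCALL flag mask) in the AMD64 manual.
constexpr std::uint32_t MSR_SFMASK = 0xC0000084;

constexpr std::uint64_t RFLAGS_TF = 1ull << 8;   // trap flag
constexpr std::uint64_t RFLAGS_IF = 1ull << 9;   // interrupt enable
constexpr std::uint64_t RFLAGS_DF = 1ull << 10;  // direction flag

// Hypothetical mask: clear TF, IF and DF on kernel entry.
constexpr std::uint64_t kSyscallFlagMask = RFLAGS_TF | RFLAGS_IF | RFLAGS_DF;

// Models the RFLAGS update SYSCALL performs: bits set in SFMASK are cleared.
constexpr std::uint64_t rflags_after_syscall(std::uint64_t rflags,
                                             std::uint64_t sfmask) {
    return rflags & ~sfmask;
}
```

In a kernel the mask would be written once per CPU with wrmsr to MSR_SFMASK. Masking IF only covers maskable interrupts; NMIs and exceptions can still arrive before swapgs, so handlers that may run in that window cannot assume GS already points at kernel data.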
06:06:44 <doug16k> cool
06:12:28 <zhiayang> on the bright side, basic ipc is working now
06:14:27 <nyc`> zhiayang: =)
06:21:56 <c32> hello, i can't enter protected mode
06:22:22 <c32> qemu does weird things when i set %ds after the long jmp
06:22:37 <c32> what could it be?
06:25:57 <c32> maybe i should try to use gdb with qemu..
06:29:25 <nyc> Okay, ARM, MIPS, POWER, RISC-V, and SPARC all have some sort of executable building for all their 64-bit variants. Now to get some asm hello worlds for ARM, POWER, and RISC-V (sadly, SPARC has some issues with kernel loading on qemu). MIPS is outputting again, as before the custom toolchain attempt.
06:43:59 <nyc> https://lists.gnu.org/archive/html/qemu-devel/2014-09/msg01352.html <<<=== this thread will probably tell me everything I need for the hello world on arm64.
06:50:47 <c32> i didn't have a $ before a constant...
06:52:51 * bcos_ is wondering... If one person has a degree in CS and another person has a degree in marketing; which person has a higher chance of writing a successful OS?
06:55:53 <c32> depending on whether the one with the marketing degree can hire a programmer
06:58:28 <nyc> bcos_: It probably depends more on factors like how connected their parents are, how much wealth they inherited, how much corrupt access to deals like Microsoft's OEM exclusivity affairs they can get, and the like.
06:58:56 <bcos_> I'm guessing the person with a degree in marketing would start with some kind of market analysis (figure out where competitors are weak and where the opportunities are) and is more likely to create a useful "strategic plan"; but (without help) is a lot less able to implement it
06:59:21 <bcos_> ..but the CS person is far more likely to end up with "working code that nobody wants"
06:59:55 <bcos_> In other words, it'd be "good plan with bad implementation" vs. "bad/no plan with good implementation"
07:00:06 <bcos_> But
07:00:32 <bcos_> Then the marketing person would use scummy advertising tricks to make people think theirs is "god plan with good implementation"
07:00:38 <bcos_> *good
07:01:04 <bcos_> Which implies, the marketing person is far more likely to succeed where a CS person won't
07:02:15 <c32> but a marketing person likes marketing and making an os requires a deep interest in how a computer works
07:02:29 <c32> from what i know which is not too much
07:02:40 <c32> the marketing person would give up
07:06:03 <bcos_> c32: Not sure about that - motivation is a scarce resource for anyone
07:08:52 <doug16k> the way I see it, a true engineer type person can't resist solving problems put in front of them, ones that can write an operating system keep presenting themselves with new problems to solve
07:09:51 <nyc> Ouch! ARM64 removed predication!
07:10:09 <geist> hmm?
07:10:24 <geist> you mean conditional instructions? yes
07:10:49 <nyc> geist: https://static.docs.arm.com/100898/0100/the_a64_Instruction_set_100898_0100.pdf says ```Conditional operations A64 only enables conditional execution on branch instructions. This is in contrast to A32 and T32 where most instructions can be used with a condition code. The A64 instructions are:```
07:11:41 <doug16k> if you have a big kick ass branch predictor, then predication is probably not going to be as good as prediction. if you want to save transistors, then predication is better
07:11:50 <bcos_> nyc: Determine the opcodes for "conditional branch over next instruction" and pretend they're "predicate prefixes".. ;-)
07:13:05 <nyc> doug16k: Saving transistors is a big deal.
07:13:23 <bcos_> No...
07:14:06 <nyc> I liked ia64's predication affair where they had predicate bits floating around you could set with comparator etc. ops and then you could make probably most operations predicated on them.
07:14:42 <geist> nyc: that's correct
07:14:48 <bcos_> (typically it's the opposite - the engineers start with a budget like "10 million transistors" and try to find ways to make the most effective use of all those transistors and run out of ideas and end up saying "Heck, lets just have a honking great cache to use up the last 5 million transistors")
07:14:51 <geist> and what doug16k said is exactly right
07:15:20 <geist> there are two reasons arm removed it in arm64: a) you get 4 bits back in the instruction that you can put to better use and b) modern branch predictors are really good
07:16:17 <geist> so it's very much like x86 now. if you compare and then branch that can get folded into a single op (arm64 even has a cbz and cbnz instruction for the common compare with zero and not zero)
07:16:37 <geist> and then let the branch predictor do its job so that a not taken branch folds away
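Although A64 dropped per-instruction condition codes, it kept a small set of conditional data-processing instructions such as CSEL and CSINC, so short "pick one of two values" sequences need not become branches. A sketch of source that compilers commonly lower to a conditional select on arm64; whether a given compiler actually emits CSEL here is an assumption and depends on the compiler and optimization level. The function names are made up for illustration:

```cpp
#include <cstdint>

// A value select with no side effects on either arm. On arm64 this kind of
// expression is a typical candidate for CSEL instead of a compare-and-branch.
constexpr std::int64_t select_i64(bool cond, std::int64_t a, std::int64_t b) {
    return cond ? a : b;
}

// Example use: clamp negative values to zero without writing an if/else.
constexpr std::int64_t clamp_to_zero(std::int64_t x) {
    return select_i64(x < 0, 0, x);
}
```

Anything longer than a select like this is what geist describes: a compare plus a conditional branch (or CBZ/CBNZ for comparisons against zero), left to the branch predictor.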
07:17:18 <mrvn> also the predication is pretty much a branch prediction except with fixed offset.
07:17:54 <geist> sort of. you still have to support all the guts of interrupting it in the middle, consulting the cpsr on every instruction, etc
07:18:03 <mrvn> So you need extra hardware to make the opcode into a jump+opcode internally.
07:18:04 <geist> i think its fairly hard to implement in a high performance superscalar design
07:19:15 <geist> but in as much as most arm64 cores still can run arm32 code, they still need all the stuff wired up there and ready to go in the pipeline
07:19:18 <mrvn> On the other hand you can tweak the condition into the write back. Evaluate everything like normal and then in the write back check the condition before writing to the register file.
07:19:28 <mrvn> it really depends on how you design the internals.
07:19:48 <mrvn> Saving the 4bits in the opcode was probably the deciding factor.
07:20:26 <geist> right. in many cases the arm64 ISA is simpler than the arm32 one. it got rid of some of the fancy nice-for-humans stuff
07:20:51 <geist> conditional instructions, barrel shifter, conditional writeback to the cpsr (the S bit), load/store multiple to name a few
07:21:03 <geist> all of these are things that a superscalar design doesn't want to have to deal with
07:21:28 <bcos_> I suspect that; assuming CPU has an "external instructions converted to internal micro-ops by front-end" design; the front-end can convert external predicates to internal forward branches (by detecting "sequences of ops with same predicate"), and/or convert external forward branches to internal predicates (by "smearing" flags and conditions to subsequent "jumped over" ops), by front-end
07:22:49 <geist> probably. i think there are some general rules you should follow about a sequence of conditional instructions
07:23:08 <geist> ie, do a run of one and then a run of another, dont go back and forth
07:23:20 <geist> the IT block on thumb2 is i think the much harder one to deal with
07:23:49 <geist> it's similar, but more dynamic, basically a way of saying 'conditionally skip the next N instructions and run M after that'
07:23:53 <geist> or vice versa
07:24:05 <geist> but the trouble is the state is a little counter that lives in the cpsr, and the cpu can interrupt in the middle of it
07:24:09 <geist> i'm sure they wish that didn't exist
07:24:28 <bcos_> Can't save counter at start of IRQ and restore after?
07:24:35 <geist> you can. it's in the cpsr itself
07:24:53 <geist> but it would be harder for a superscalar design if you wanted to do the thing where you fold it out
07:24:53 <geist> clearly they can do it, and maybe it's not too weird, but i've heard ARM engineers grumble about it
07:25:28 <geist> i think it's one of those good ideas at that time that cause a lot of headaches later on
07:25:59 <geist> basically with modern superscalar design you just want compare and branch and nothing too funny, and no instructions that run a lot of microcode or take a lot of time
07:26:12 <geist> hence arm removing load/store multiple instructions
07:29:53 <jmp9> okay i compiled test 64 bit kernel
07:29:56 <jmp9> and question
07:30:01 <jmp9> why it's 2 mb in size?
07:30:27 <gamozo> Well your pointers got 2x larger!
07:30:48 <jmp9> it's only 10 lines of code
07:30:54 <jmp9> in one file that i compiling - kmain.c
07:31:12 <gamozo> What are you linking against?
07:31:35 <jmp9> nothing
07:31:58 <geist> it's probably a linker script thing, padding out to 2MB
07:32:08 <geist> try passing -max-page-size=4096 to your linker
07:32:09 <jmp9> https://imgur.com/Mlrzukbl.png
07:32:12 <jmp9> yes
07:32:12 <geist> or whatever that switch is
07:32:13 <jmp9> padding
07:32:17 <jmp9> i saw in ELF header
07:32:21 <jmp9> that padding is 2 mb
07:32:23 <jmp9> how to fix that
07:32:24 <geist> the default padding size is 2MB on x86-64
07:32:27 <geist> see above
07:32:31 <jmp9> wtf
07:32:51 <geist> what is wtf about it, i told you where to look
07:32:58 <geist> there's a reason for it but i suspect you dont care to hear
07:34:52 <jmp9> how to fix that padding in linker script
07:35:06 <jmp9> .text BLOCK(4K) : ALIGN(4K)
07:36:17 <geist> i just told you
07:36:23 <geist> see -max-page-size=4096
07:37:08 <geist> i dont remember the precise switch format, but it's basically that. you should be able to search for it
07:37:12 <mrvn> geist: beq 1f; mov r0, r1; 1: is the same as cmov except with more bits. So I doubt it makes that much of a difference.
07:37:25 <geist> mrvn: <shrug>
07:37:33 <doug16k> nyc, oh ia64 went way beyond that - it had the capability for compilers to reorder loads to before it is sure it is a good address, that instruction carries with it an exception flag that taints all its dependent operations if it faulted. if not, the compiler can get away with scheduling outside what should be possible
07:37:40 <geist> mrvn: though arm64 actually has a cmov
07:38:09 <nyc> I guess one of the things that had me thrilled about ARM was how nice it was to write asm for on 32-bit, but I guess the relevant features that made it so nice are gone for arm64.
07:38:12 <jmp9> thanks
07:38:16 <mrvn> geist: i.e. I don't think they save much metal by removing the conditional on opcodes. They just recovered a lot of space in the opcodes.
07:38:17 <jmp9> it fixed problem
07:38:20 <doug16k> i.e. it could start a load before it even does the "if" that checks something
07:38:34 <geist> mrvn: well, take it up with arm
07:38:39 <nyc> doug16k: I remember NaT well.
07:39:04 <geist> mrvn: and no, i dont think they saved a lot of space. they saved some complexity. or at least will when they finally get to build a 64bit only core
07:39:10 <mrvn> geist: they probably looked at existing code and checked what opcodes had the most conditionals and whether changing them to cmov or branch would cost anything.
07:39:16 <geist> at the moment their cores are all still 32/64, so they can't remove any of the machinery
07:39:31 <geist> mrvn: right. that and cbz/cbnz, which are common enough occurrences
07:39:40 <geist> though iirc they got added in thumb2 so there's already prior art
07:40:24 <geist> nyc: arm64 is pretty nice too. once you get the hang of it it's a fairly nice and expressive risc machine. just a little less clever than arm32
07:40:59 <geist> it's not as hard core risc as say mips or riscv so it's a little more useful to humans
07:41:08 <geist> since it is a little more irregular than both of those
07:41:43 <geist> in terms of how constants are encoded for alu and bit instructions and whatnot. you can tell they designed it to be a workhorse and less of a pure hard core design
07:42:15 <geist> it's basically a data driven design, and so far it's turned out to be pretty nice
07:45:42 <doug16k> jmp9, you probably need that thing geist said already, -z max-page-size=4096
07:45:49 <jmp9> i did
07:45:52 <geist> ah yeah, that
07:45:52 <jmp9> it helped
07:46:02 <jmp9> i did it before in gcc options
07:46:10 <jmp9> and i wondered why it doesn't changed
07:46:15 <geist> yah it's a linker thing
07:46:29 <doug16k> it's trying to set you up so you could theoretically choose large pages
07:46:51 <doug16k> you probably won't want that in kernel code, hence max-page-size
07:46:52 <geist> right. usually with a specific triple for your os, like say -linux, it may override that
07:47:08 <geist> but the default -elf triple picks the most conservative option
07:48:08 <geist> would be nice if you could just set it as a variable at the top of the linker script or something
07:49:08 <geist> also depending on how the linker works, it doesn't necessarily have to pad out the elf file, it can simply tell it to map the same 2MB page twice
07:49:14 <nyc> The no OS ELF environment triple for MIPS is missing some pieces for binutils but I didn't have binutils trouble on most arches. Some obscure ones are missing it for gcc, too.
07:49:29 <geist> but when you flatten it to a .bin file it'd get padded out because a .bin file is effectively an in memory image of how it'd look if you loaded it
07:49:50 <geist> nyc: yep, that's what i was talking about way back when about how mips toolchain/abi/etc is more fiddly than most
07:50:00 <geist> i fought it for a bit too
07:50:17 <geist> also for some reason gcc mips-elf doesn't build on mac for some internal reason
07:53:30 <doug16k> when ld does that 2MB alignment "magic" thing, it weirdly overlaps things so data is in executable when that's on. totally, utterly unacceptable to me. I don't have one byte of data in executable pages in my project that I am aware of
07:53:40 <nyc> geist: Sometime when I'm not dead set on writing kernel code I'll go back to try to deal with it, but for now, the custom toolchain goes out the window, and for some things (e.g. sparc64) the entire port goes out the window. It's been 10 years. I want to be bit twiddling and doing raw hardware access in asm, not tweaking autoconf scripts.
07:54:07 <doug16k> I even have my readonly data first so my sizeofheaders overlaps the readonly data pages
07:54:59 <doug16k> ...not executable .text
07:56:17 <doug16k> I even patch my ld so when it tries to do that "everything starts at the base overlapping" crap it won't it will push data 2MB ahead to the next page so permissions work
07:56:26 <doug16k> but my build won't even try to do it
07:57:59 <doug16k> it is all 4K because nothing is even close to needing a 2MB page yet in the kernel's code and data sections
07:58:20 <geist> doug16k: so actually that overlap is a source of a discussion internally on fuchsia
07:58:27 <geist> it's a big difference in the way binutils and lld behaves
07:58:44 <geist> lld will never overlap source pages like that. at the expense of actually making the size of the binary increase on disk
07:58:57 <doug16k> good, I like it
07:59:00 <geist> but for the same reason, that way you never ever get data in your X pages, and vice versa
07:59:05 <doug16k> correctness trumps file size
07:59:06 <geist> trouble is, of course, it makes the binaries larger
07:59:17 <geist> welllll it's tough, because file size can matter depending on the situation
07:59:25 <doug16k> can't it be sparse though?
07:59:32 <doug16k> are they really that big size it says?
07:59:41 <doug16k> ah you mean in memory footprint
07:59:43 <geist> hypothetically, except clang/lld also likes to pad things out with !0
07:59:50 <geist> in both, really
08:00:25 <geist> so on x86 generally you just override it to max page size 4k, because 2MB is a bit much
08:00:55 <geist> but on arm64 it gets more interesting: max page size = 64k, which is somewhat more reasonable, especially if you choose to use 64K base page granules, where it becomes mandatory that the elf files were linked that way
08:01:11 <mrvn> When I boot over serial having pages >4K is a pain because the padding needs to be sent over serial too.
08:01:19 <geist> so if you look at a linux arm machine, for example, the elf files tend to be a bit larger than they need to be
08:01:30 <mrvn> geist: isn't 64k a bit small for the max?
08:01:46 <geist> mrvn: it's a reasonable size, since it's the largest smallest page you can get on arm
08:02:06 <geist> but more importantly to be forward compatible with using larger base pages, it's the smallest you can actually use
08:02:34 <geist> we've had this discussion in fuchsia and decided on at the moment using 16K as the padding
08:02:41 <mrvn> geist: it's something I consider using as default/only page size.
08:02:45 <geist> a tradeoff of size (since we use clang/lld and it pads out the binaries)
08:02:55 <mrvn> kind of bad if there is no room to grow
08:02:58 <geist> and future desire to use larger base page granules
08:03:32 <mrvn> geist: can't you use 1M pages one level higher or something?
08:03:34 <geist> on x86 there's no larger smallest page size on the horizon, but if there were then all of this use of max page size = 4k would be a problem too
08:03:57 <geist> mrvn: it's complicated, depends on what the page granule is set to
08:04:00 <geist> if 4k you get
08:04:12 <geist> 4k 64k 2MB (maybe 16MB) 1GB
08:04:17 <geist> with 16k i think you get
08:04:25 <geist> 16k 256k 16MB... something like that
08:04:28 <geist> and 64k i forget
08:04:54 <geist> basically it scoots over the sizes of the combined page sizes and the ones that are terminated at a higher leaf
08:05:35 <geist> of course the kernel is free to pretend any number of intermediate page sizes for accounting purposes, but it just doesn't get a TLB gain by using a non hardware page size
08:05:36 <mrvn> geist: terminating the page walk early is easy to implement. That's why x86 only have 1G, 2M, 4k.
08:05:44 <geist> indeed.
08:05:50 <mrvn> geist: what's the size of a page table on arm64?
08:06:10 <geist> mrvn: depends on the base page granule. same as x86: if your base page granule is 4K you get 4K page tables
08:06:13 <geist> if 16K 16k, etc
08:06:40 <mrvn> geist: that gives you 4k and 2M. So 64k is something in between levels.
08:06:44 <geist> which is why everything shifts over. each page is now larger, and each page table then has a correspondingly larger number of entries, etc
08:06:51 <geist> yes, 64k is between levels
08:07:04 <geist> you can combine 8 or 16 entries (depending on situations) to get to an intermediate page
08:07:06 <mrvn> geist: same deal as on ARM32 that you repeat the entry 16 times?
08:07:09 <geist> correct
08:07:34 <geist> the page table format is a little cleaner: you set the C (combined) bit so it's a bit more clear
08:07:44 <mrvn> geist: so basically the cpu doesn't have to have 16k page even if you use them
08:08:00 <geist> that's not true. that's where it's complicated
08:08:19 <nyc`> I need to reread the ARM manuals.
08:08:20 <geist> the cpu can declare that it supports 4K, 16K and 64K page granules
08:08:34 <geist> 16K is optional, and most low end arm cores dont yet support it, but the newer higher end ones do
08:08:47 <mrvn> geist: yes. But it doesn't have to do 16k even if it says it does.
08:09:00 <bluezinc> Just finished doing my taxes. Time for some OSDEVing.
08:09:03 <geist> well.... i think it does. there's some verbiage in there
08:09:05 <mrvn> geist: or can you leave out 15 of the 16 entries if the 16k feature bit is set?
08:09:19 <geist> i think you're misreading what i'm saying here
08:09:34 <geist> when you set it to 16K base page granule, that *is* the smallest page
08:09:43 <geist> 1 entry, there is no repeating of entries
08:09:51 <mrvn> The verbiage on ARM32 was that you had to set all 16 entries and the cpu might combine the pages or not. Your page table has to work in both cases.
08:10:05 <doug16k> ah I was confused too. earlier it seemed you said yes you repeat entries
08:10:06 <geist> yes, but combined pages and base page granule are not the same thing
08:10:12 <geist> you can do both
08:10:19 <geist> that's where its confusing
08:10:29 <mrvn> geist: I'm not talking about granularity. I'm talking about the in-between levels size
08:10:31 <geist> it's a mixture of both.
08:10:50 <geist> well, in that case 16K isn't a good example because that's not a valid 'in between' size
08:10:59 <geist> 4K can be combined to 64K
08:10:59 <mrvn> So 4k granularity and 64k page.
08:11:04 <geist> correct
08:11:08 <geist> and i think 2M -> 16MB
08:11:17 <mrvn> And then the cpu may or may not do 64k pages.
08:11:33 <geist> but i do think you're right. except in armv8 it does define 64K as a hard implemented size
08:11:40 <geist> so i think in practice it does
08:12:03 <mrvn> Hmm. In armv6/7 it was definitely optional.
08:12:26 <geist> i'll go reread and write it down, since i keep repeating this stuff
08:12:31 <mrvn> If it's hard defined then setting just the first of every 16 entries should work.
08:12:32 <geist> all the page sizes that you can get to
08:13:11 <geist> mrvn: correct. you can easily use 64k runs in 4K base page granules, just like armv6/7
08:14:13 <geist> i'll reread the verbiage as to whether or not the hardware needs to support it or can it split
08:18:29 <aalm> isnt it 1M -> 16M
08:19:49 <aalm> section to supersection
08:20:25 <geist> looking at it now.... will get the data shortly
08:21:26 <geist> it's not symmetric. the number of 'contiguous' entries you can add together at different levels in different page granules is not constant
08:21:27 <mrvn> looking forward to it
08:21:31 <geist> sometimes its 16, sometimes its 128, etc
08:21:43 <mrvn> 128? Wow
08:22:11 <mrvn> does the 128 combined size correspond to a full level with the other granularity?
08:25:04 <geist> hang on.
08:25:15 <geist> i'm making a table of everything
08:25:43 <geist> and yes, basically. it looks like there are really a finite number of TLB sizes, and depending on what base page granule and what level you're at you can combine them to get to the next TLB size up
08:26:07 <geist> also as you say: "The architecture does not require a PE to cache TLB entries in this way. To avoid TLB coherency issues, any TLB maintenance by address must not assume any optimization of the TLB tables that might result from use of the Contiguous bit."
08:26:08 <mrvn> geist: that was my thought when I saw 128
08:26:31 <geist> yah 128 is a way to hoist 16k -> 2MB
08:27:06 <mrvn> geist: That's the part that really put me off trying 16k pages. Do I get this right? When you want to invalidate a 128 combined paged you have to invalidate 128 addresses?
08:28:23 <geist> correct
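The arithmetic behind the contiguous runs discussed above (16 x 4K = 64K, 128 x 16K = 2MB), and the point that by-address maintenance has to touch every base page in the run, can be sketched as follows. This is only an illustration: `contig_span` and `contig_run_pages` are hypothetical helper names, and the real invalidate (a TLBI by VA plus barriers on arm64) is left out, so the code just lists the addresses that would need one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes covered by a run of `entries` contiguous-hinted pages. */
uint64_t contig_span(uint64_t page_size, unsigned entries)
{
    return page_size * entries;
}

/* Write the base address of every page in the run into out[] (at most
 * cap of them) and return how many were written. Each address stands
 * for one by-address TLB invalidate the kernel would have to issue,
 * since it cannot assume the TLB actually merged the run. */
size_t contig_run_pages(uint64_t base_va, uint64_t page_size,
                        unsigned entries, uint64_t *out, size_t cap)
{
    assert(out != NULL);
    size_t n = 0;
    for (unsigned i = 0; i < entries && n < cap; i++)
        out[n++] = base_va + (uint64_t)i * page_size;
    return n;
}
```

With a 16K granule and a 128-entry run, `contig_run_pages` produces 128 addresses, which is the cost mrvn is asking about.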
08:28:59 <geist> furthermore it gets complicated if your cpu supports the A and D bits. if it writes back to the page table entry, it definitely does not write it back to all 128
08:28:59 <mrvn> So it really still is "the TLB may or may not combine, who knows, code like it didn't"
08:29:11 <geist> it'll write back to any one of those depending on it actually implementing the size
08:29:33 <mrvn> geist: oh yeah. So you have to check all entries for that too
08:30:29 <mrvn> geist: would that be a valid test to see if it supports combined pages? Or could the cpu do combined when it feels like it?
08:31:03 <geist> i think contig pages are mandatory
08:32:29 <mrvn> geist: that's not what the above blurb says.
08:34:56 <geist> i mean they're mandatory in that the feature must work
08:35:03 <geist> not that it does or doesn't implement it directly in hardware
08:36:19 <mrvn> but that is simple. Just ignore the bit.
08:36:40 <geist> hmm, you're right. i think
08:37:20 <mrvn> That's why you have to assume the TLB didn't do contiguous. That's when it simply ignores the bit.
08:37:43 <geist> but i think it works out cleaner than that. it just so happens that the contig page sizes line up pretty nicely with higher levels
08:37:51 <geist> give me a second, still working on this
08:39:51 <geist> http://newos.org/txt/arm64_pages.txt
08:40:28 <geist> i think it works out that based on the assumption that the non contig pages have to be directly supported in the TLB, then the contig pages at 16K line up with existing ones
08:40:41 <geist> and in general every contig page size lines up with a non contig page size at another level
08:41:36 <mrvn> I bet 16k pages are only supported on TLBs that do contiguous pages
08:41:51 <mrvn> it's the same feature in the TLB.
08:42:11 <geist> exactly. or the other way around. the contig pages always line up with a TLB entry size that you'd probably have to implement directly
08:42:26 <mrvn> well, except 16k != 64k.
08:42:36 <geist> sure, which is why 16k is the optional one
08:42:54 <geist> whereas 4K/64K are mandatory
08:43:32 <geist> feature wise. whether or not the TLB directly supports it is i guess not required, as long as you dont also implement the A and D bit, which most lower end/older armv8s dont
08:43:35 <mrvn> 64k is 4k*16
08:43:41 <nyc> ./sys/arm/boot.S:14: Error: immediate out of range at operand 2 -- `ldr x2,#0x9000000'
08:43:55 <geist> mrvn: yes. see the table i just linked
08:44:05 <geist> oh hrm.
08:44:07 <mrvn> geist: you have 64 (4k*8)
08:44:57 <mrvn> With 64k pages do you still have 4 levels of page tables?
08:45:14 <nyc> No, they use it to trim off a level IIRC.
08:45:51 <jmp9> x86-64 paging is complicated
08:46:02 <geist> mrvn: depends. yes if you trim the size
08:46:21 <geist> so independent of this you can per address space(it's in the TTBR) set the max number of bits you support
08:46:28 <mrvn> 64k = 16bit, 64k table == 13 bit address space. 16 + 4 * 13 = 68. A bit much..
08:46:49 <geist> yes. so in the case of 64k you start off with 3 levels
08:47:05 <geist> if you then limit the size to.... 42 bits you can drop to 2 levels
08:47:12 <mrvn> e levels = 55 bit. .oO(640k are enough for everybody)
08:47:17 <mrvn> 3
08:47:40 <mrvn> geist: can I do 1 level?
08:48:03 <geist> if you use 512MB pages, sure
08:48:16 <mrvn> No, 1 level with 64k pages. So 29 bits.
08:48:17 <geist> or in this case 4TB pages (which i think may be some other optional feature)
08:48:24 <geist> ah, yes i think so
08:48:53 <mrvn> Or even 1 level 4k pages = 21 bits. 2MB are enough for everybody :)
08:49:49 <geist> not sure you can go that low there, but in general yes, you can limit the size of the vaspace dynamically (per address space) to limit the number of levels. it's very slick
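The bit counting in this exchange follows from one rule: a table fills one granule and holds 8-byte descriptors, so each level resolves (granule shift - 3) address bits. A rough sketch of that arithmetic, with hypothetical function names; it only reproduces the numbers quoted above and ignores the architectural caps (48 or 52 bits) that leave the top level partially populated in practice:

```c
#include <assert.h>
#include <stdint.h>

/* Index bits resolved by one table level for a 4K/16K/64K granule
 * (granule_shift = 12, 14 or 16): granule / 8 entries per table. */
unsigned index_bits_per_level(unsigned granule_shift)
{
    assert(granule_shift == 12 || granule_shift == 14 || granule_shift == 16);
    return granule_shift - 3;
}

/* Address bits covered by `levels` full table levels plus the page offset. */
unsigned va_bits_covered(unsigned granule_shift, unsigned levels)
{
    return granule_shift + levels * index_bits_per_level(granule_shift);
}

/* Bytes mapped by a single entry that ends the walk `levels_above_leaf`
 * levels early (0 means an ordinary page). */
uint64_t entry_span(unsigned granule_shift, unsigned levels_above_leaf)
{
    unsigned bits = va_bits_covered(granule_shift, levels_above_leaf);
    assert(bits < 64);
    return 1ULL << bits;
}
```

With a 64K granule this gives 29, 42, 55 and 68 bits for one to four levels, and entry spans of 512MB and 4TB one and two levels above the leaf, matching the figures mentioned in the discussion.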
08:51:00 <mrvn> I played around a bit on ARMv6+ with microprocesses. One 4k page for the process containing 256byte L1 table, 1K L2 table, process state for task switching, stack and whatever is left for heap.
08:51:47 <mrvn> It's fun to run 200k processes on a 1GB ram system.
08:51:50 <geist> there's yet another feature called combined page tables, that you can do at the top level of your paging structure to instead of having a top level that has like say 8 entries in it (much like the x86 PAE thing) you can simply say there are 8 next level page tables in a row
08:52:21 <mrvn> geist: so basically cut out a level at the top.
08:52:45 <mrvn> (without losing bits)
08:52:51 <geist> yah. i think there are limits there, and then the corresponding larger next level has to be aligned, etc
08:55:07 <doug16k> geist, is contig vs config intended? (headings)
08:56:21 <geist> no i keep mistyping it
08:56:37 <geist> fixed
08:57:31 <geist> ah i see. the 4TB stuff is hidden behind a new armv8.2 feature that extends the VA range out to 52bits
08:57:46 <geist> (from 48) but only when using 64K page granules
08:59:27 <geist> by extending out the number of top level page table entries
09:00:56 <geist> armv8.2-LVA and armv8.2-LPA are the features. the first extends the VA range to 52 bits, and the second extends the physical range to 52 bits and adds 4TB page sizes
09:01:45 <nyc> Okay, I think I've got ARM asm hello written but have yet to test it. (The idea is to see some sort of output before trying to run C/C++/Ada/Zig code.)
09:09:42 <nyc> The assembler is complaining that _edata isn't defined. Maybe it's because it doesn't know how to resolve mov to a more precise instruction. I guess I need the real arch docs instead of the quick guide to get the mnemonics for the fully-detailed instructions.
09:10:38 <geist> is this arm32 or arm64?
09:10:51 <geist> in arm32 the canonical way to load a large constant is
09:10:57 <geist> ldr reg, =value
09:11:26 <geist> then it will potentially emit a load from a nearby constant pool, one that you can generally emit off the end of your function with .ltorg
09:11:53 <nyc> arm64
09:12:31 <geist> in arm64 its a bit more complicated. depends on what you're loading
09:12:35 <nyc> ./sys/arm/boot.S:32: Error: operand 1 must be an integer register -- `ldr sp,=_edata'
09:12:44 <geist> yes that's not a arm64 opcode
09:12:54 <geist> furthermore you can't load into sp directly
09:12:58 <geist> that's what it's actually complaining about
09:13:08 <geist> load into a reg first, then mov into sp
09:13:11 <nyc> It's a pseudoinstruction and I guess it doesn't like me trying to set sp.
09:13:30 <geist> its because sp isn't a real integer reg in arm64
09:14:23 <nyc> And the destination is first in arm mov/ldr/etc. I hope.
09:14:50 <geist> yes
09:15:06 <nyc> I'll work out the linker script offsets later. Time for RISC-V.
09:15:57 <nyc> Actually, if I'm going in alphabetical order, POWER should be next.
09:16:47 <jmp9> okay
09:16:48 <jmp9> page->table[table+(dir<<9)+(pdp<<18)]
09:16:57 <jmp9> is this correct indexing in x86-64 page table?
09:17:53 <doug16k> is that supposed to be for large pages or something? not enough context
09:18:12 <jmp9> 4k pages
09:18:57 <mrvn> jmp9: That only works if the page table is one big array of entries.
09:19:06 <mrvn> the page->table
09:19:14 <graphitemaster> do exec-family functions in the kernel copy the contents of argv and envp at the point of the call, or does the parent own the memory that the child gets in int main()?
09:19:25 <mrvn> graphitemaster: COW
09:19:42 <jmp9> yes
09:19:44 <jmp9> it's very big
09:19:45 <jmp9> 256 mb
09:19:49 <jmp9> for 128 gb virtual space
09:19:51 <mrvn> graphitemaster: the parent can't alter the args after the call
09:19:55 <graphitemaster> this is really confusing me, because I'm writing a simple fork + exec thing and the child process does some memmove of envp/argv
09:20:13 <graphitemaster> but the contents of the memory when I go to execve no longer exists in the child
09:20:18 <graphitemaster> it goes out of scope
09:20:22 <graphitemaster> and I don't know if this should work
09:20:39 <mrvn> graphitemaster: might be the libc working together with the kernel.
09:20:44 <graphitemaster> basically I just want to know if contents are copied by execve
09:20:46 <graphitemaster> implicitly
09:21:13 <graphitemaster> I can't find any documentation explaining how the exec family functions actually work
09:21:13 <mrvn> graphitemaster: fork makes them COW and exec must save them into the new address space somewhere.
09:21:45 <doug16k> jmp9, the indexing goes like this: you take the linear address, and the 47:39 index into PML4, 38:30 index into pdpt, 29:21 index into pd, and 20:12 index into PT
09:21:48 <mrvn> graphitemaster: the kernel used to copy them, which is why they are limited in size to prevent DOS attacks.
09:22:13 <mrvn> doug16k: that's what the hardware does. Not necessarily what the kernel does.
09:22:50 <doug16k> the physical address in the PML4 entry tells you the base physical address of the PDPT. the PDPT entry tells you the base physical address of the PD, the PD entry tells you the base physical address of the PT
09:23:16 <doug16k> then finally you index into PT with those last bits and get base physical address of the mapped memory
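The bit ranges doug16k lists can be written down directly. The sketch below splits a linear address into its four 9-bit indices for 4K pages, and also shows the flat index jmp9's expression computes, which, as mrvn says, is only meaningful if all the PTEs live in one big contiguous array. The struct and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Per-level indices of an x86-64 linear address with 4K pages. */
struct pt_indices {
    unsigned pml4;    /* bits 47:39 */
    unsigned pdpt;    /* bits 38:30 */
    unsigned pd;      /* bits 29:21 */
    unsigned pt;      /* bits 20:12 */
    unsigned offset;  /* bits 11:0  */
};

struct pt_indices split_vaddr(uint64_t va)
{
    struct pt_indices ix;
    ix.pml4   = (unsigned)((va >> 39) & 0x1FF);
    ix.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    ix.pd     = (unsigned)((va >> 21) & 0x1FF);
    ix.pt     = (unsigned)((va >> 12) & 0x1FF);
    ix.offset = (unsigned)(va & 0xFFF);
    return ix;
}

/* Flat PTE index in the style of table + (dir << 9) + (pdp << 18).
 * It ignores the PML4 index, so it is only valid within one PML4
 * entry (512GB) and only if the PTEs are one contiguous array. */
uint64_t flat_pte_index(unsigned pdpt, unsigned pd, unsigned pt)
{
    assert(pdpt < 512 && pd < 512 && pt < 512);
    return (uint64_t)pt + ((uint64_t)pd << 9) + ((uint64_t)pdpt << 18);
}
```

For addresses below 512GB the flat index is simply `va >> 12`, which is why a 128GB space at 8 bytes per entry needs the 256MB array jmp9 mentions.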
09:23:31 <graphitemaster> basically, imagine `char argv[] = { ... }` on the stack in the parent, I do `if (fork() == 0) { execve("ls", argv, nullptr); }` and this parent function now goes out of scope, so the `char argv[]` is basically out of scope now, does the "ls" main() get a copy of the stack allocated argv or is it literally referencing the stack allocated argv? if so when the parent runs another function the stack gets filled with what ever is pushed
09:23:31 <graphitemaster> and popped there, does "ls" get a clobbered argv ?
09:24:19 <doug16k> graphitemaster, CoW doesn't cover that?
09:24:34 <doug16k> wait it's not even fork
09:24:42 <doug16k> not directly anyway
09:25:35 <graphitemaster> the reason I bring it up is because apparently different OSes treat it differently, on macOS, if you do this, the child processes main references the direct content of the parent
09:25:37 <doug16k> good question though ya. the data for it is in the pages that got replaced?
09:25:52 <doug16k> you must have to copy it out
09:26:49 <graphitemaster> if you have to make a copy of the contents before the call and wait for the child to die before you can free it
09:26:53 <graphitemaster> well then that sucks
09:26:53 <doug16k> graphitemaster, direct content of the parent as if it didn't really replace the core with the other executable?
09:27:28 <graphitemaster> doug16k, as in if the child modifies the argv array passed into int main, the parent can see the change
09:27:37 <doug16k> man that's awful
09:28:14 <graphitemaster> this whole interface seems awful
09:28:20 <graphitemaster> I would've expected the kernel to make a copy of it
09:28:35 <graphitemaster> and not directly reference the contents of the parent in the child
09:28:35 <doug16k> that would be sane and obvious yes
09:28:45 <doug16k> how could that ever work reliably without it?
09:29:27 <graphitemaster> no clue
09:30:01 <doug16k> the obsession with blazing insanely fast spawning of processes in unix is a bit much
09:30:37 <doug16k> spawning doesn't need to be _that_ good does it? :)
09:30:58 <geist> i dont think so, but a lot of the unix mentality is based on the idea of lots oflittle helper processes in a chain
09:31:05 <geist> and thus the speed of it matters
09:31:20 <doug16k> ya
09:31:45 <geist> it's certainly not a path that all systems take
09:32:07 <doug16k> I heard an anecdote of a security fix patch being rejected because it added 2 microseconds to process creation
09:32:12 <graphitemaster> congrats, every unix system is insecure since child processes can write into parent memory by writing into argv/envp then
09:32:26 <graphitemaster> only if the parent process is buggy mind you
09:32:58 <graphitemaster> and doesn't do a copy
09:33:03 <graphitemaster> I wonder how many shells don't
09:36:05 <graphitemaster> fast process creation is important yes
09:36:19 <graphitemaster> doesn't seem like copying argv/envp would be a slow thing to do
09:36:22 <mrvn> graphitemaster: exec MUST copy, even if it's just COW.
09:36:49 <mrvn> graphitemaster: argv+envp is limited to 128k so that exec finishes in reasonable time.
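One way to check the behaviour mrvn describes on a given system is a small experiment: put the argument string on the parent's stack, fork, exec in the child, and scribble over the parent's copy straight away. This is a minimal sketch assuming a POSIX system with an `echo` binary on PATH; `echo_via_exec` is a made-up name, error handling is thin, and it says nothing about the macOS behaviour graphitemaster reports.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `echo <msg>` in a child and capture its stdout into out.
 * Returns the number of bytes captured, or -1 on error. */
ssize_t echo_via_exec(const char *msg, char *out, size_t outlen)
{
    char arg[64];                 /* argument string lives on our stack */
    int fds[2];

    assert(msg != NULL && out != NULL);
    if (outlen == 0 || strlen(msg) >= sizeof arg || pipe(fds) != 0)
        return -1;
    strcpy(arg, msg);

    pid_t pid = fork();           /* child gets its own (COW) view of arg */
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        char *argv[] = { "echo", arg, NULL };
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execvp("echo", argv);     /* argv strings must reach the new image */
        _exit(127);               /* only reached if exec failed */
    }

    /* Overwrite the parent's copy; the child should not observe this. */
    memset(arg, 'X', sizeof arg - 1);
    arg[sizeof arg - 1] = '\0';

    close(fds[1]);
    size_t total = 0;
    ssize_t n;
    while (total < outlen - 1 &&
           (n = read(fds[0], out + total, outlen - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';
    close(fds[0]);

    int status;
    waitpid(pid, &status, 0);
    return (ssize_t)total;
}
```

If the child still prints the original message, the strings were copied into the child's new image rather than referenced in the parent.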
09:38:13 <doug16k> if two microseconds pissed off linux then, I can imagine his coffee going all over the wall and smashing the cup when they put the meltdown and spectre mitigations in
09:38:27 <doug16k> linus*
09:38:37 <geist> indeed
09:39:39 <nyc> doug16k: heh
09:40:15 <rajasrijan> doug16k: lol
09:42:57 <nyc> doug16k: Who knows what his thought process is when he's all for a CPU with so much state to dump on context switches and process spawning etc. and is somehow obsessed with the timings of these things the bloated CPU is burdening. There are literally multiple embedded ARM systems in each x86 CPU to manage different pieces of its bloated state.
09:44:10 <mrvn> and you better get things you access together into the same cache line.
09:44:31 <rajasrijan> iirc there was a mail thread where he was ranting, and a corresponding Phoronix article 🤣
09:44:44 <dminuoso> rajasrijan: There's always a mail thread with linus ranting.
09:44:46 <dminuoso> It's what he does.
09:44:54 <doug16k> when I time stuff on x86 computers, my usual reaction is "holy crap that is awesome"
09:45:23 <doug16k> I'm old enough to have had PCs so slow you can read dir as it goes by
09:45:29 <geist> exactly
09:45:32 <rajasrijan> then later that patch corrupted firmware in some of the Lenovo laptops, he was livid
09:45:45 <eryjus> doug16k -- "awesome" is not very descriptive... "awesome good" or "awesome bad"?? lol
09:45:46 <nyc> Most of what performance is measuring is semiconductor technology.
09:46:16 <doug16k> good
09:46:48 <doug16k> usually leads to a "jeez, how many iterations is it overlapping to go that fast??!"
09:47:02 <mrvn> doug16k: is that the PC being slow or the VGA card having horrible memory access?
09:47:23 <doug16k> mrvn, largely that yeah and reading from VGA BIOS ROM across abysmally slow ISA
09:47:30 <nyc> The difference that CPU design and software and such make isn't well-isolated, and CPU design etc. can't really be traded out anywhere near as easily as software.
09:47:36 <rajasrijan> dminuoso: he's supposedly changed now, much friendlier.
09:47:38 <mrvn> doug16k: slow is when you can see the memcpy going down top to bottom when the console scrolls.
09:47:47 <doug16k> ya
09:47:58 * rajasrijan waiting for next meltdown patch
09:48:29 <mrvn> doug16k: doing a memmove of a 1920x1280 32bit framebuffer on an RPi with caches disabled is zen.
09:48:37 <graphitemaster> I mean I have a 4.2ghz CPU and can still read dir as it goes by doug16k
09:48:42 <graphitemaster> terminal emulators are shit slow
09:48:49 <graphitemaster> lxterminal anyways
09:49:22 <mrvn> graphitemaster: have you tried writing one? They are horribly complex.
09:49:42 <doug16k> the one I use has insane performance. a big ls puts me instantly at the end of the listing
09:49:59 <graphitemaster> there's lots of really fast ones but they look like crap
09:50:03 <doug16k> gnome-terminal I think
09:50:09 <graphitemaster> plus I need tabs
09:50:18 <graphitemaster> the killer feature is tabs
09:50:43 <graphitemaster> lxterminal is the only one that seems to do subpixel font rendering correctly
09:50:44 <doug16k> it has tabs. I also use terminator to wrap them so I can split one window into lots of terminals
09:50:52 <mrvn> graphitemaster: I want a terminal emulator that uses screen/tmux for the tabs.
09:51:03 <graphitemaster> never looked at gnome-terminal
09:51:33 <nyc> https://patchwork.kernel.org/patch/8300341/ is my best indicator of how to do an hcall.
09:51:53 <graphitemaster> oof, gnome terminal does not do font rendering right either
09:52:04 <nyc> H_PUT_TERM_CHAR is also #defined to 0x58 in the same patch.
09:52:05 <graphitemaster> at least not by default
09:52:38 <nyc> I mostly need fcitx etc. multilingual input support.
09:53:12 <graphitemaster> I think the problem is GTK font rendering (probably freetype / cairo backed) is using the front subpixel pattern
09:53:25 <graphitemaster> the only apps that appear to get it right are lxterminal, geany, hexchat and firefox-nightly
09:53:35 <graphitemaster> s/front/font
09:58:01 <doug16k> I've never seen a terminal that doesn't get at least one thing wrong yet
09:58:14 <nyc> So I can hammer out things in Turkish (e.g. kaplumbağa, aşçilar, kız), Russian (безусловно), and such. I'm still at a loss for adequate Vietnamese, Korean, or Hindi input, and haven't learned enough Chinese to attempt to use what's there regardless of whether it would work (for Korean and Hindi, I'm mostly still learning the writing system; Vietnamese is slow back burner work, and none of what I mentioned is anywhere near reading, writing, speaking, or listening competency --- the only things getting anywhere on that front are German and a few Romance languages).
09:59:45 <nyc> file:///home/nyc/Downloads/LoPAPR_DRAFT_v11_24March2016_cmt.pdf p. 584 documents more detail about how to invoke or otherwise pass arguments to H_PUT_TERM_CHAR.
10:01:04 <doug16k> I timed a big find. I got 14,779 logical lines per second in gnome-terminal (some wrapping to two physical lines)
10:01:14 <nyc> You're kidding me.
10:01:30 <nyc> Who knows where that PDF came from.
10:02:34 <doug16k> oops hang on. screwed up
10:03:18 <doug16k> 126,839 lines per second
10:03:58 <nyc``> doug16k: That's a lot of lines.
10:04:42 <doug16k> I bet a lot of them never made it to my screen due to 60 fps compositing, but it is as if it is that fast
10:05:36 <graphitemaster> doug16k, timed rxvt-unicode, about 455k logical lines per second
10:05:48 <doug16k> nice
10:06:02 <graphitemaster> timed lxterminal, about 8000 lines per second
10:06:37 <graphitemaster> I swear it's because lxterminal is written in Vala or something now
10:07:55 <nyc``> I hope the whole semiconductor technology bubble or whatever collapses ASAP so computer science actually matters instead of just whatever sort of physics or chemistry is behind clock speeds and fabrication processes.
10:08:09 <dminuoso> graphitemaster: what exactly did your test look like?
10:09:43 <doug16k> my test was `find /usr` to warm up disk cache, then `time find /usr` then find /usr | wc -l then divide the lines by wallclock time
10:10:30 <graphitemaster> const char msg[] = "hello world\n"; for (int i = 0; i < 1000000; i++) { write(STDOUT_FILENO, msg, sizeof msg - 1); }
10:10:38 <graphitemaster> then time divided by 1m
10:10:42 <dminuoso> I like graphitemaster's test more ;)
10:10:43 <graphitemaster> *divided
10:10:58 <doug16k> ya it is definitely better
10:11:31 <graphitemaster> you can't really test the raw terminal performance with stdio either since buffering
10:11:36 <graphitemaster> so I went straight to the syscall
10:11:51 <graphitemaster> and yeah your find involves disk io / vfs crap too
10:11:51 <doug16k> I went with what a program will probably be doing as my test
10:12:16 <doug16k> ya, has a tad of work to do around printing, also realistic
10:13:13 <doug16k> but ya to raw bench blasting lines, it isn't a proper test.
10:14:52 <dminuoso> graphitemaster: which time did you take?
10:15:24 <dminuoso> real?
10:16:32 <dminuoso> Im getting to about 500k lines per second with alacritty, which seems fine.
10:16:46 <graphitemaster> realtime
10:20:10 <doug16k> I get 423,728 lines/s with graphitemaster's test
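The raw-write test described above (one unbuffered write per line, then lines divided by wall-clock time) can be sketched in Python as follows. This is only an illustration: the function name `blast_lines` and its parameters are made up here, and the interpreter's loop overhead means the numbers will not be comparable to the C figures quoted in the channel.

```python
import os
import time

def blast_lines(fd, count=1_000_000, msg=b"hello world\n"):
    """Write `count` copies of `msg` to `fd`, one write(2) call per line,
    and return the achieved rate in lines per second (wall-clock)."""
    start = time.perf_counter()
    for _ in range(count):
        os.write(fd, msg)          # unbuffered: goes straight to the syscall
    elapsed = time.perf_counter() - start
    # Guard against a zero interval for very small counts.
    return count / max(elapsed, 1e-9)
```

Running `blast_lines(1)` inside the terminal under test and printing the result to stderr gives a lines-per-second figure for that terminal; pointing `fd` at `/dev/null` instead gives a rough upper bound for the loop itself.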
10:26:39 <nyc> The ppc64 manuals are in a rather strange style. It's tough to smoke out how to load an immediate from them.
10:29:43 <jmp9> kernel/start.asm:(.text+0x1): relocation truncated to fit: R_X86_64_32 against `.bss'
10:29:44 <jmp9> what is this
10:30:59 <jmp9> oh
10:31:17 <jmp9> i forgot that i use 64 bit assembly
10:31:21 <doug16k> jmp9, you tried to reference something that is linked at such a high address it cant fit the offset in the address encoding
10:32:02 <nyc> I suspect I can actually fit the entire string "Hello, world!\r\n" into a single hcall.
10:32:53 <doug16k> you need either -mcodemodel=kernel and link in 0xFFFFFFFF80000000-0xFFFFFFFFFFFFFFFF range, or -fPIE position independent executable or -mcmodel=large and link anywhere
10:33:04 <jmp9> why that range?
10:33:07 <doug16k> s/mcodemodel/mcmodel/
10:33:23 <doug16k> x86_64 didn't bloat everything up to a full 64 bits in the instruction encoding
10:33:24 <jmp9> i use this address as base kernel mapping
10:33:25 <jmp9> 0x3FFFC01FC0000000
10:34:02 <jmp9> kernel start 0b1111111111111111 00000000 01111111 000000000 000000000 000000000000
10:34:05 <doug16k> if you want to do that you need to use PIC which uses offsets from the next instruction for addressing
10:34:28 <doug16k> you cant just slap it anywhere like that unless you take steps to make it generate code that can do that, mentioned above
10:34:46 <doug16k> it has to change codegen somewhat for that to work
10:35:33 <doug16k> can I see the instruction it is complaining about?
10:35:46 <doug16k> I'll explain why it can't work if I can
10:36:40 <doug16k> put -fPIE in your compile and you'll be fine, the problem will disappear if you link things sanely close together
10:36:41 <jmp9> i just tried to put 64 bit address in eax
10:36:52 <jmp9> in assembly
10:37:01 <jmp9> mov esp,stack_top
10:37:15 <doug16k> what code model are you using then?
10:37:17 <jmp9> it should be rsp instead of esp
10:38:46 <jmp9> i don't
10:39:12 <doug16k> code linked at 0x3FFFC01FC0000000 with data within +/- 2GB of it won't work, regardless of that bug, unless it is position independent or uses large model. if you somehow dereference zero global variables, you won't notice
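The linker error comes down to whether an absolute address fits in a 32-bit field. In the x86-64 psABI, R_X86_64_32 is a zero-extended 32-bit value and R_X86_64_32S is a sign-extended one; the kernel code model relies on the latter, which is why the top 2GB works. A small sketch of the two range checks (the helper names are invented for illustration):

```python
MASK64 = (1 << 64) - 1

def fits_r_x86_64_32(addr):
    """True if `addr` can be stored in a zero-extended 32-bit field."""
    return 0 <= addr <= 0xFFFF_FFFF

def fits_r_x86_64_32s(addr):
    """True if `addr` equals its own low 32 bits sign-extended to 64 bits."""
    addr &= MASK64
    low = addr & 0xFFFF_FFFF
    if low & 0x8000_0000:                  # sign bit of the 32-bit field
        low |= 0xFFFF_FFFF_0000_0000       # extend the sign into the upper half
    return low == addr

# Addresses mentioned in the discussion:
KERNEL_MODEL_BASE = 0xFFFF_FFFF_8000_0000  # start of the -mcmodel=kernel range
JMP9_BASE = 0x3FFF_C01F_C000_0000          # jmp9's chosen kernel base
```

With these, `fits_r_x86_64_32s(KERNEL_MODEL_BASE)` holds, while `JMP9_BASE` fits neither form, so only RIP-relative or full 64-bit addressing could reach it.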
10:39:36 <doug16k> oh lol
10:39:44 <doug16k> and there is a much worse problem
10:40:10 <doug16k> you can't put stuff there, there is only 48 bits of address space implemented
10:40:23 <doug16k> sorry, "only" 256TB of address space available
10:41:14 <doug16k> you can only set bit 47:0 with an address, and bit 63:48 must be the same value (0 or 1) as bit 47
10:42:40 <doug16k> if you attempt to dereference a pointer that isn't "canonical" (sign extended to 64 bits from 48 bits) then you get #GP
10:42:54 <doug16k> the page tables can't represent virtual addresses outside that either
10:44:21 <doug16k> your kernel addresses will look like this: 0xFFFF'X___'____'____ where the X digit is 8 thru F
10:44:47 <doug16k> you get 11.75 hex digits to play with
10:45:42 <doug16k> not 16
10:47:42 <doug16k> userspace addresses would look like: 0x0000'X___'____'____ where the X digit is 0 thru 7
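The canonical-address rule described above can be checked with a few lines of Python. This is a sketch that assumes 48 implemented virtual address bits (4-level paging); the function names are illustrative only.

```python
MASK64 = (1 << 64) - 1
VA_BITS = 48  # assumption: 48-bit virtual addresses (4-level paging)

def is_canonical(addr, va_bits=VA_BITS):
    """True if bits 63..(va_bits-1) of `addr` are all 0 or all 1."""
    top = (addr & MASK64) >> (va_bits - 1)      # bit 47 and everything above
    all_ones = (1 << (64 - va_bits + 1)) - 1    # 17 one-bits for va_bits=48
    return top == 0 or top == all_ones

def sign_extend(addr, va_bits=VA_BITS):
    """Build the canonical 64-bit address from the low `va_bits` bits."""
    low_mask = (1 << va_bits) - 1
    addr &= low_mask
    if addr >> (va_bits - 1):                   # bit 47 set: upper half
        addr |= MASK64 & ~low_mask
    return addr
```

For example, `is_canonical(0x3FFFC01FC0000000)` is false, which is why that base cannot be used, while `sign_extend(0x800000000000)` yields `0xFFFF800000000000`, the lowest kernel-half address.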
10:52:06 <doug16k> jmp9, I assume you're using a system compiler? it's probably configured for address space randomization and generates position independent code by default, and it's just a fluke that you didn't need to set any code model
10:53:38 <nyc``> ppc64 doesn't seem to prefix its register numbers with anything.
10:57:41 <nyc``> The register numbers are just there with no R or anything.
11:01:12 <doug16k> nyc``, gnu assembler has lots of totally different syntax variations across architectures
11:02:02 <qookie> hi, does anyone know why even after setting the correct bit in the TSS I/O bitmap a process would receive a general protection fault?
11:02:38 <qookie> the problem is random, and some of the time it works, and port I/O operations work fine
11:02:40 <bcos_> qookie: Intel's manual (the part describing GPF) has a list of possible causes - there's about 50 things that can cause it
11:02:41 <doug16k> nyc``, if you're doing stuff on wildly different architectures, you'll notice
11:05:11 <doug16k> qookie, in qemu?
11:05:16 <qookie> yes
11:05:45 <bcos_> qookie: If it's "random GPF when accessing IO port", maybe get the GPF handler to check if the TR is right and if IO permission bitmap is right
11:06:06 <geist> nyc``: you can specify r when writing assembly, and when doing disassembly there's a switch on objdump to use R prefixes
11:06:06 <doug16k> did you break into it and get the actual tss base that is active from qemu monitor `info registers` then really check the bitmap right at that moment?
11:06:26 <geist> but yes, the official IBM docs dont use any prefixes. it's highly annoying, especially for long ass instructions like rlwinmin
11:06:40 <geist> rlwinm
11:07:17 <qookie> the bitmap offset is always the same, I copy each processes I/O bitmap into the bitmap array pointed at by the TSS right before a task switch occurs
11:07:27 <nyc``> doug16k: I'm seemingly the only person here doing non-x86 or multiple architectures.
11:07:29 <doug16k> you can put that breakpoint as early as possible at your #GP entry point and it will be early enough
11:07:38 <doug16k> nyc``, no way. tons of arm here
11:07:43 <geist> nyc``: i take some amount of offence at that
11:07:54 <geist> also my LK project runs on about 5 or 6 different arches
11:07:58 <doug16k> geist has done a wide variety of arches too
11:08:06 <doug16k> ya that :D
11:08:08 <geist> and i'm about to sit down and fiddle with riscv64 in a few minutes
11:08:22 <geist> plus newos runs on x86, ppc, and sh-4
11:08:24 <bluezinc> speaking of multiple arches... why does x86 just have to be a pain about everything?
11:08:31 <nyc``> geist: You're more of a host than a guest.
11:08:38 <geist> ?
11:08:59 <doug16k> nyc``, I'm deeply into x86 because they have been closest to my face most of the time, but I have done lots of little embedded cpus and stuff too
11:09:18 <doug16k> and stuff with fpgas and whatnot
11:09:28 <geist> but yes, on the average most folks work with x86-pc, for pretty good reason
11:09:32 <nyc``> doug16k: Excelente!
11:09:55 <geist> microblaze is a fun one too. it's a nice little risc machine and easy to punch out when dealing with xilinx stuff
11:10:08 <geist> dont think there's a 64bit version though
11:10:19 <bluezinc> ah, microblaze is fun.
11:10:45 <bcos_> nyc``: "Embedded folk" tend to have more variety (compared to "server folk" who mostly only bother with 80x86 because everything else is unobtainable) ;-)
11:10:53 <geist> yah it's about as simple and straightforward and yet still fully capable arch you can find
11:12:09 <geist> yep. still some interesting variety in the embedded side of things
11:12:32 <geist> and i still love harping about arm64 since it's at least as interesting and powerful as x86 and yet new and actually obtainable
11:13:33 <bcos_> Sadly, for servers "ARM" is just "same old PC everything (ACPI, UEFI, PCI, ...), just with different CPU"
11:14:26 <doug16k> if arm relented on the cleanliness of their instruction frontend and allowed a few bigger instructions, I think x86 would have no chance
11:14:54 <bcos_> (I really really really don't like ACPI - UEFI and PCI I don't mind)
11:15:52 <geist> doug16k: oh yeah? what are you thinkin?
11:16:04 <doug16k> make those load constants go away 1st
11:16:11 <geist> what do you mean?
11:16:24 <doug16k> not fitting a full sized immediate in one insn
11:16:27 <qookie> thanks for pointing me in the right direction with putting a breakpoint right at the entry to the GPF handler
11:16:40 <qookie> now I know the I/O bitmap has incorrect bits set
11:16:44 <geist> oh. yeah that's not gonna happen, but.... they actually did do a lot of things in that space that actually work a lot better than you think
11:16:49 <geist> depending on what kind of constant you're trying to get
11:16:50 <doug16k> qookie, nice
11:16:51 <nyc``> I don't know anything. My career is ten years dead and I'll never be heard from again once the laptop that's the last remnant of my former income breaks down.
11:16:55 <geist> ie, the adrp instruction is good for loading address
11:17:16 <geist> and stuff like the logical ops (orr, eor, etc) have a fairly powerful bitfield generating immediate thing
11:17:36 <doug16k> geist, yeah, just thinking it will hurt very very slightly not to have the straight thru locality of an immediate
11:17:45 <geist> it really doesn't show up as much as you think. its only really annoying if you're trying to load a gigantic 'random' looking constant
11:18:13 <geist> in general arm64 doesn't do the 'load constant from thing just off the end of the function' anymore
11:18:23 <geist> that's got some terrible cache polluting behavior anyway
11:18:23 <doug16k> usually a PC relative encoding will do and it fits then?
11:18:35 <qookie> also I know the GPF is most likely because of the in instruction without permission to access that port
11:18:37 <doug16k> usually fits I mean?
11:18:40 <qookie> so my assumption was correct
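For checking a dumped I/O permission bitmap by hand, the lookup rule is simple: one bit per port, a clear bit permits access, a set bit denies it, and a multi-byte access needs every covered port's bit clear. A minimal sketch, with hypothetical helper names, that ignores IOPL and the trailing 0xFF byte the CPU expects after the bitmap:

```python
def io_allowed(bitmap, port, width=1):
    """Return True if a `width`-byte access at `port` passes the bitmap check.
    `bitmap` is the raw I/O permission bitmap (bit clear = allowed)."""
    for p in range(port, port + width):
        byte_index, bit = divmod(p, 8)
        if byte_index >= len(bitmap):
            return False                   # beyond the bitmap: treated as denied
        if bitmap[byte_index] & (1 << bit):
            return False                   # set bit: access faults with #GP
    return True

def allow_ports(bitmap, port, width=1):
    """Clear the bits for `width` consecutive ports starting at `port`."""
    for p in range(port, port + width):
        bitmap[p // 8] &= ~(1 << (p % 8)) & 0xFF
```

Feeding it the bytes read from the active TSS at the moment of the fault shows directly whether the faulting port was actually permitted.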
11:18:41 <geist> yah. you can easily synthesize an address within +/- 4GB which is where a lot of it works
11:18:47 <geist> due to the adr/adrp instructions
11:19:09 <geist> i think riscv picked this up too, it's quite powerful
11:19:30 <jmp9> doug16k: i use cross compiler
11:19:37 <jmp9> for x86-64 target
11:19:57 <geist> and for bitfield twiddling, the immediate encoding is pretty neat. iirc it's an 8 bit immediate with arbitrary rotation, negation, and duplication
11:20:11 <doug16k> jmp9, you should use -fPIE and use rip relative addressing for stuff in .data and .bss
11:20:11 <geist> so you can do stuff like orr x1, x1, 0x5555555555555555 and whatnot
11:20:17 <geist> or any number of runs of bits set to 1 or 0
11:20:34 <doug16k> ya that rotating one is amazingly clever
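As far as the AArch64 encoding goes, a logical immediate is a run of consecutive one bits, rotated within an element of 2, 4, 8, 16, 32 or 64 bits, with that element replicated across the register; all-zeros and all-ones are not encodable. The following sketch tests whether a 64-bit constant has that shape. It is a brute-force check written for clarity, not how an assembler derives the N:immr:imms fields, and the function names are made up.

```python
def _is_rotated_run(elem, size):
    """True if `elem` (a `size`-bit value) is a rotated run of one bits."""
    emask = (1 << size) - 1
    if elem == 0 or elem == emask:
        return False
    for r in range(size):
        rot = ((elem >> r) | (elem << (size - r))) & emask
        if (rot & (rot + 1)) == 0:         # of the form 0...01...1
            return True
    return False

def is_a64_logical_imm(value, reg_bits=64):
    """True if `value` fits the AArch64 logical-immediate pattern."""
    mask = (1 << reg_bits) - 1
    value &= mask
    if value == 0 or value == mask:
        return False
    size = 2
    while size <= reg_bits:
        elem = value & ((1 << size) - 1)
        replicated = 0
        for shift in range(0, reg_bits, size):
            replicated |= elem << shift    # tile the element across the register
        if replicated == value and _is_rotated_run(elem, size):
            return True
        size *= 2
    return False
```

`0x5555555555555555` passes (a 2-bit element `01` replicated), as does `0x00FF00FF00FF00FF`, while a "random looking" constant such as `0x1234` does not and would need a mov/movk sequence instead.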
11:20:40 <graphitemaster> I wish compilers were any good at optimizing bitfields
11:20:55 <graphitemaster> hand coding manual bitfield manip usually produces more optimal code
11:21:06 <geist> it's also got a lot of fancy bitfield insertion/extraction instructions
11:21:10 <jmp9> okay i created mappings for my 64 kernel
11:21:19 <jmp9> then i should enter long mode 32 bit mode?
11:21:25 <geist> that's the sort of thing why i say arm64 is not a particularly 'hard core' risc machine
11:21:47 <geist> a hard core risc machine would never include complicated instructions like bitfield insertion since you can synthesize it with a few less powerful instructions
11:21:49 <doug16k> jmp9, procedure is in intel SDM volume 3, section 9.8.5
11:22:26 <doug16k> when I first looked at arm, the instructions looked CISC as hell
11:22:42 <doug16k> the addressing modes actually
11:22:44 <geist> indeed
11:22:56 <geist> load/store with pre and post decrement? wow!
11:23:31 <geist> i do miss writing code in old fashioned arm32, but that ship sailed a long time ago when everyone mostly switched to thumb2 anyway, which is far less fun to write code in
11:24:40 <mahackamus> graphitemaster: terminal emulators aren't all shit slow, xterm is obscenely fast, so fast that PCI bottlenecks it. i just put in a PCIe card after getting a FULL-HD monitor, and the speed improvement is absurd. but maybe it's something in the linux ttm buffer manager.
11:26:07 <mahackamus> *user experience may vary depending on rendering stack*
11:28:28 <graphitemaster> desktop composited terminal emulators written in garbage languages tend to be slow
11:28:32 <graphitemaster> the linux tty is fast
11:29:41 <mahackamus> /dev/dri/card0 dumb buffer is plenty fast, the problem is when you start putting mesa in the mix, and rendering API of the current decade
11:29:55 <mahackamus> then constraining it to vsync
11:30:43 <doug16k> man you know what drives me nuts? bad UI code (games) that require at least one mousemove event on a button before a click works.
11:31:14 <bluezinc> doug16k: how often do you see that?
11:31:16 <doug16k> you cant have the mouse already there ready to click it - have to wait, then perturb the mouse position, then click
11:31:38 <geist> yah i've seen that sort of thing
11:31:42 <doug16k> when a new dialog appears or whatever
11:31:55 <bluezinc> ok.
11:32:06 <bluezinc> at least I can see how they screwed that one up.
11:32:34 <geist> doubleplus so when it's a crummy port of a platform title
11:32:47 <geist> which sadly happens more often than not nowadays
11:33:04 <nyc> -mregnames
11:33:05 <geist> or where the ui is crammed up in the upper left quarter of the screen because it wasn't designed for high rez
11:33:13 <geist> nyc: yah that's it
11:33:30 <geist> i dont think the assembler requires it, but it's handy when disassembling
11:39:51 <nyc> The docs on how to spread a string across the argument registers are scanty enough I'm going to punt and just call the hypervisor for one character at a time.
11:44:51 <graphitemaster> in video games, more often than not the UIs are immediate mode, which means frame-to-frame the entire contents of the UI are generated for that frame (usually there's about a frame of latency for double buffering) all the logic is checked for that frame too. This is terribly inefficient as you can imagine
11:45:14 <graphitemaster> so the common optimization is to cap the frame rate of the UI updates so you're not doing too much excessive work for nothing
11:45:37 <graphitemaster> the other optimization they do is basically avoid flushing the contents of that frame of UI updates until something interacts with the UI, like mouse movement
11:45:41 <graphitemaster> doug16k, ^
11:47:51 <graphitemaster> immediate mode has pros and cons, the pro is that you have no retained state and can basically declare the UI as function calls and the calls themselves are the logic, so you do if (button("contents")) { button_was_clicked(); }, but it complicates many things because to build the UI you have to run through all your UI logic
11:48:07 <graphitemaster> every frame in this case
11:48:21 <graphitemaster> animations are difficult too
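The `if (button(...))` pattern described above can be sketched in a few lines. Everything here (`FrameInput`, `button`, `run_frame`, the coordinates) is invented for illustration and no drawing is done; the point is only that the call both declares the widget and reports the click, and that the whole UI function runs again every frame.

```python
from dataclasses import dataclass

@dataclass
class FrameInput:
    mouse_x: int
    mouse_y: int
    mouse_pressed: bool   # True only on the frame the button went down

def button(inp, label, x, y, w, h):
    """Declare a button for this frame and report whether it was clicked."""
    hovered = x <= inp.mouse_x < x + w and y <= inp.mouse_y < y + h
    # A real toolkit would also emit draw commands for `label` here.
    return hovered and inp.mouse_pressed

def run_frame(inp):
    """The entire UI is re-declared and re-evaluated on every frame."""
    events = []
    if button(inp, "OK", 10, 10, 80, 20):
        events.append("ok clicked")
    if button(inp, "Cancel", 100, 10, 80, 20):
        events.append("cancel clicked")
    return events
```

Because the hit test uses the pointer position sampled for the current frame, a click registers even if the mouse never moved over the button first; skipping `run_frame` until a mouse-move arrives is the kind of shortcut that would produce the behaviour doug16k complained about.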
11:53:25 <jmp9> Switch to IA32e (compatibility mode), by setting bit 8 (Long Mode Enable) in MSR 0xC0000080
11:53:30 <jmp9> sorry for dumb question
11:53:32 <jmp9> what is MSR?
11:55:41 <nyc> Model Specific Register
11:58:55 <jmp9> oh yes
11:58:58 <jmp9> rdmsr instruction
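For reference, MSR 0xC0000080 is IA32_EFER. `rdmsr` takes the MSR index in ECX and returns the 64-bit value split across EDX:EAX, and `wrmsr` writes it back the same way. The bit manipulation involved can be modelled like this (a sketch of the arithmetic only; the helper names are made up and nothing here touches real hardware):

```python
IA32_EFER = 0xC0000080   # MSR index, loaded into ECX for rdmsr/wrmsr
EFER_LME = 1 << 8        # Long Mode Enable

def split_msr(value):
    """Split a 64-bit MSR value into (eax, edx) as rdmsr returns it."""
    return value & 0xFFFF_FFFF, (value >> 32) & 0xFFFF_FFFF

def join_msr(eax, edx):
    """Combine EDX:EAX back into one 64-bit value."""
    return ((edx & 0xFFFF_FFFF) << 32) | (eax & 0xFFFF_FFFF)

def with_long_mode_enabled(efer):
    """Return the EFER value to write back with LME set."""
    return efer | EFER_LME
```

In assembly this is the usual read-modify-write: load ECX with the index, `rdmsr`, set bit 8 in EAX, `wrmsr`.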