Search logs: #osdev - 10 February 2019

channel logs for 2004 - 2010 are archived at http://tunes.org/~nef/logs/old/ ·· can't be searched

#osdev2 = #osdev @ Libera from 23may2021 to present

#osdev @ OPN/FreeNode from 3apr2001 to 23may2021

all other channels are on OPN/FreeNode from 2004 to present

http://bespin.org/~qz/search/?view=1&c=osdev&y=19&m=2&d=10

Sunday, 10 February 2019

12:01:01 <mrvn> and when that happens the first thing you should do is implement your double fault handler. Don't try to fix the actual crash, fix the catching of the crash. Then fix catching the crash and last fix the crash.
12:24:38 <froggey> zhiayang: to disable the io bitmap I set the bitmap base address to the TSS limit, and to enable it again I set it to the real bitmap base
12:25:31 <froggey> changing the limit should work too, as long as the bitmap base ends up at or above the limit
12:26:13 <froggey> and I double checked - setting the bitmap base to 0 will cause the cpu to treat the start of the tss as the bitmap instead of disabling it. so don't do that
12:31:31 <jmp9> okay
12:31:35 <jmp9> I have question about NASM
12:31:40 <jmp9> how do I do this thing
12:31:56 <jmp9> mov esp,(0xC0000000 - stack_bottom)
12:32:11 <mrvn> you just do
12:32:26 <jmp9> error: invalid operand type
12:32:41 <mrvn> stack is set to the top though
12:32:47 <jmp9> yes i know
12:32:52 <jmp9> after this line i do
12:32:58 <jmp9> add esp,0x7FFC
12:33:14 <mrvn> should be 0x8000
12:33:25 <jmp9> Ok
12:33:26 <jmp9> But
12:33:30 <jmp9> How do this
12:33:31 <mrvn> not that it matters
12:33:35 <jmp9> mov esp,(0xC0000000 - stack_bottom)
12:34:08 <Mutabah> ... why are you doing that subtraction?
12:34:19 <mrvn> jmp9: Get a proper assembler
12:34:27 <jmp9> I setting up paging
12:34:48 <Mutabah> Actually, it's probably not working becuse that relation can't be encoded in the relocation entries
12:35:10 <Mutabah> jmp9: Do you actually wan to do `mov esp, (stack_bottom - 0xC0000000)`?
12:35:25 <mrvn> jmp9: are you sure you don't mean stack_bottom - 0xC0000000 or 0xC0000000 + stack_bottom?
12:35:25 <jmp9> yes
12:35:28 <Mutabah> To get an esp value that's in physical memory
12:35:54 <Mutabah> Also - It's usually a good idea to have a symbol defined for the top of the stack too
12:36:03 <mrvn> jmp9: Many people define a PHYS_TO_VIRT or VIRT_TO_PHYS
12:36:06 <Mutabah> so you just do `mov esp, (stack_top - KERNEL_BASE)`
12:36:13 <jmp9> yes
12:36:18 <jmp9> and this doesn't work
12:36:36 <mrvn> jmp9: re you in 32bit mode?
12:36:37 <jmp9> wait
12:36:39 <jmp9> it works
12:36:41 <jmp9> nvm
12:36:42 <jmp9> thanks
12:37:07 <mrvn> Mutabah: relocations can't do const - label?
12:37:27 <mrvn> qqdf -h
12:37:32 <mrvn> ups
12:37:49 <Mutabah> Don't think so, iirc they can only do `label + literal` (where a subtraction can be encoded as a large addition)
12:38:13 <jmp9> it didn't work because I do it wrong
12:38:18 <jmp9> it must be address - KERNEL_BASE
12:38:58 <mrvn> jmp9: you can also do stack_bottom + stack_size - KERNEL_BASE
12:39:13 <mrvn> No need to add the size as second opcode.
01:01:10 <jmp9> Okay
01:01:16 <jmp9> If i want map it to 0xC0000000
01:01:18 <jmp9> mov edi,(page_directories - 0xC0000000 + 0x300)
01:01:29 <jmp9> this is correct offset in page directories?
01:05:26 <geist> well, do the math. do you know how?
01:06:01 <jmp9> oh, I get it
01:06:11 <jmp9> I mapped 0xC0000000
01:06:20 <jmp9> but I didn't mapped first 1 MB
01:06:21 <geist> what did you figure out?
01:06:30 <jmp9> so it's crashed when I enabled paging
01:06:51 <geist> yep, you have to keep the current thing you're unning in mapped
01:06:59 <geist> unforunately there's no way to enable paging and jump at the same time
02:15:57 <jmp9> Okay it doesn't work anyway
02:51:28 <jmp9> oh i'm fucking hate when i do mistakes in assembly
02:51:52 <jmp9> why i can't just write code and it works, why i'm making mistaeks
02:52:44 <zhiayang> froggey: ah ok, thanks!
02:53:22 <mischief> jmp9: relax.
02:53:32 <mischief> programming is hard, let's go shopping
02:57:35 <nyc> As long as someone else is paying, let's please go shopping!
02:57:43 <nyc> I need a new laptop!
03:02:21 <nyc> I can't figure out how to configure laptops for more RAM etc. on lenovo.com
03:05:34 <jmp9> i have installed 16 GB of ram on my laptop
03:05:35 <jmp9> asus k56c
03:05:39 <jmp9> everything fine
03:05:44 <jmp9> and i'm happy with this
03:06:32 <doug16k> 16,777,216 KB ought to be enough for anyone
03:07:14 <nyc> Well, I could really use more computing power so I can code for larger simulated systems in qemu.
03:07:33 <nyc> Though I'm not to the point where it makes that big a difference yet.
03:08:38 <eryjus> nyc, you're having enough trouble with lenovo (isn't your current laptop a lenovo?) -- why go back for more pain?
03:10:08 <nyc> eryjus: I just don't have the parts (bolts & replacement fans), tools (torx screwdriver), and supplies (screw glue/paste/etc.) to service it. It's actually held up really well. It is a bit small to run large VM's on, though, and it's a wee bit short on SSD space, too.
03:12:04 <nyc> eryjus: Something with more cores & threads, more RAM, and more SSD space would be helpful beyond still having intact fans and case and such.
03:12:41 <nyc> esp. once I start running SMP VM's.
03:14:57 <eryjus> i get the upgrade -- you misunderstand.. i thought your current laptop was a lenovo and I thought it was not old enough to justify the problem you were having.
03:15:11 <knebulae> I got a hell of a deal on an aluminum HP @ Microcenter (if you have one in the tristate). quad i7, 16GB, 512GB NVME, GeForce GT 940 (bleh, I know, but not integrated), 15" 1080p non-touch for $799. It ticked all the boxes, despite being an HP. It's been a year, holding up great.
03:17:03 <eryjus> i invested in an esxi server with a drobo several years ago. the biggest problem I have had was the extra physical disk I wrote my file-level backups to crapped on me. if you can afford it, the drobo is the way to go for disk.
03:17:16 <nyc`> eryjus: It dates to 2013.
03:17:46 <knebulae> @eryjus: you too?
03:18:26 <eryjus> which me too?
03:19:30 <knebulae> @eryjus: likes drobo stuff
03:19:34 <knebulae> sorry
03:19:54 <eryjus> nyc, that's older than i thought.
03:20:05 <nyc`> I just need to get to where I can crowdfund hosts and have people ship engineering samples to port to.
03:21:05 <eryjus> knebulae: yup! totally sold on it.
03:25:33 <jmp9> I don't know why
03:25:37 <jmp9> but some weird shit happens
03:25:42 <jmp9> that breaks everything
03:25:48 <jmp9> esp register get trashes
03:29:30 <nyc`> I think some of the higher-end packages have 24 cores.
03:29:58 <nyc`> Maybe they're only threads.
03:31:11 <nyc`> I think there are some that have b4 threads per core, too.
03:34:21 <nyc> https://www.intel.com/content/www/us/en/products/processors/xeon/scalable/platinum-processors/platinum-8176.html
03:37:16 <jmp9> I don't know why it doesn't work
03:37:31 <jmp9> page tables and page directories FINE
03:37:55 <jmp9> after it do or 0x80010000 to cr0
03:38:07 <jmp9> mov eax,cr0 or eax,0x80010000 mov cr0,eax
03:38:09 <jmp9> it crashes
03:39:14 <eryjus> jmp9: cr3 is set?
03:39:17 <jmp9> yes
03:39:23 <jmp9> mov eax,(page_directories - 0xC0000000)
03:39:26 <jmp9> mov cr3,eax
03:39:38 <nyc> https://en.wikipedia.org/wiki/SPARC_T5 <--- whoa that would be bloody awesome
03:40:01 <eryjus> what is the value of page_directories - 0xc0000000?
03:40:18 <eryjus> in particular, is that a physical address?
03:40:21 <jmp9> yes
03:40:34 <eryjus> your stack is mapped?
03:40:34 <jmp9> it have correct page directory entries
03:40:42 <jmp9> stack yes
03:40:52 <jmp9> my kernel uses only 1 MB, but i mapeed 16 BM
03:40:54 <jmp9> 16 mb
03:41:03 <jmp9> and stack is 32 KiB
03:41:08 <jmp9> so no reasons to worry about stack
03:41:08 <nyc> ```The 64-bit SPARC Version 9 based processor has 16 cores supporting up to 128 threads per processor, and scales up to 1,024 threads in an 8 socket system.[4] Other changes include the support of PCIe version 3.0 and a new cache coherence protocol.[5]```
03:41:19 <eryjus> and the code you are executing is identity mappes as well as the upper memory?
03:41:23 <jmp9> yes
03:41:23 <eryjus> *mapped
03:41:33 <jmp9> it's identity maps first 16 MB
03:41:40 <jmp9> and maps first 16 MB to 0xC0000000
03:41:47 <eryjus> ok, do you have interrupts set up?
03:42:07 <jmp9> I disabled interrupts
03:42:10 <eryjus> can you take a #PF exception?
03:42:14 <jmp9> no
03:42:36 <nyc> ```On October 26, 2015, Oracle announced a family of systems built on the 32-core, 256-thread SPARC M7 microprocessor.[15] Unlike prior generations, both T- and M-series systems were introduced using the same processor.```
03:42:36 <jmp9> is it important?
03:42:46 <jmp9> because all pages that I use it present in memory
03:42:49 <eryjus> ok, i recommend taking a diversion to get that running... produce a register dump when you ahve an exception
03:42:55 <mischief> nyc: look at the fujitsu sparc servers
03:43:11 <jmp9> can I do this in debugger
03:43:12 <eryjus> your abillity to debug is going to be severely limited without that info.
03:43:50 <jmp9> I don't know why I should take care about #PF because all pages are present, and i'm just mapping frist 16 MB to execute kernel code
03:43:54 <jmp9> and then did normal mapping in C
03:44:36 <eryjus> because I'm betting *something* is not quite right in the paging tables and you want to be able to know about the cpu state when you have that execption
03:44:39 <jmp9> it crashes immediately after i set cr0 mapping bit
03:44:57 <eryjus> it crashes immediately wheny ou enable paging
03:45:03 <jmp9> yes
03:45:03 <Ameisen> those sparc systems are... a little expensive.
03:45:17 <jmp9> i put jmp $ after mov cr0,eax
03:45:23 <jmp9> and it doesn't hang in jmp $
03:45:26 <jmp9> didn't
03:45:34 <nyc> ```Huge capacity for the largest workloads and consolidation/modernization implementations. Up to a maximum of 384 cores, 3072 threads, 32 TB of DDR4 memory and 928 PCIe slots.```
03:45:43 <jmp9> 32 TB of ram
03:45:44 <eryjus> You are getting a triple fault?
03:45:51 <jmp9> how i can get triple fault
03:45:56 <jmp9> there is disabled interrupts
03:46:14 <eryjus> ahhh.... that's where there is a disconnect
03:46:36 <jmp9> I followed this tutorial
03:46:37 <jmp9> https://wiki.osdev.org/Higher_Half_x86_Bare_Bones
03:47:05 <eryjus> depending on your system/emulator, a triple fault might reset (reboot) the computer or just hand and report a fatal problem (hardware typically reboots)
03:47:15 <jmp9> QEMU
03:47:32 <jmp9> triple faults triggers when there is exception in exception handler
03:47:37 <eryjus> now, think this through -- if you have a problem with a page when you enable paging, you will get a #PF (page fault)
03:47:40 <jmp9> but there is no exception handler yet
03:47:46 <jmp9> huh
03:47:46 <jmp9> ok
03:48:14 <eryjus> if you are not able to handle the expction because you do not have the interrupt tables set up, then you will get a #DF (double fault)
03:48:36 <jmp9> how I can supposed to do it in fcking assembly entry?
03:48:41 <jmp9> entry function
03:48:43 <eryjus> if you are not able to handle the #DF, then the system panics because it is a triple fualt and reboots.
03:49:12 <eryjus> even with interrupts disabled and no interrupt tables set up -- you still need to handle CPU faults and traps
03:49:28 <jmp9> how can I do this with QEMU?
03:49:55 <eryjus> which part? You need to write a basic exception handler and build an IDT table
03:50:29 <eryjus> then (similar to your GDT) you load its location to the cpu
03:50:43 <nyc> I don't believe my ideas about 2**16 = 16K CPU's was that far off. We've already got 3K on Fujitsu boxen.
03:50:52 <jmp9> align 4096; page_directories resb 4096; page_tables resb 4194304
03:51:08 <jmp9> this in .bss section
03:52:15 <jmp9> oh, osdev is a hard sex, but i'm getting in love with this masochism
03:52:47 <knebulae> @nyc: it's coming. it's going to get crazy in about 10 years.
03:53:19 <knebulae> on the desktop / workstation front
03:54:27 <nyc> knebulae: Retrofitting will get to be a big big problem.
03:54:38 <nyc> Scalability to RAM will be an issue, too.
03:55:10 <knebulae> Lots of challenges, but so much fun!
03:55:14 <jmp9> 0x00106003 0x00107003 0x00108003 0x00109003
03:55:19 <jmp9> this is my page directory
03:55:20 <knebulae> Seriously... So awesome.
03:55:21 <jmp9> looks legit
03:56:00 <jmp9> wait
03:56:23 <jmp9> i get it
03:56:30 <jmp9> page directory entry points to wrong location
03:56:37 <nyc> knebulae: 2**16 CPU's * 64GB = 2**36 * 2**16 B RAM = 2**52 B RAM = 4PB RAM
03:56:47 <jmp9> whoever will need so much memory
03:56:51 <jmp9> it's like TempleOS
03:56:58 <jmp9> running with 16 gb of ram and 64 cores
03:57:17 <jmp9> he show this in one of his streams
03:57:38 <jmp9> but entire TempleOS fits in 2.1 megs
03:57:57 <eryjus> jmp9: i offer my advice. paging is one of those things that is complete misery to debug and get right but once you do you really don't have to look back on it.. while you are in misery, give yourself the tools to make that process as short as possible.
03:59:09 <nyc> jmp9: You generally want your kernel overhead to be minimal. One might want to argue from asymptotics rather than absolute overheads, though, e.g. O(lg(RAM)) overhead or similar.
03:59:47 <jmp9> I will not use swapping
03:59:50 <jmp9> huh
03:59:53 <jmp9> but which tools?
04:01:13 <nyc> One thing that burns a lot of kernels is having an O(RAM) set of data structures like Linux' mem_map[].
04:01:32 <nyc> You can't hope to get to O(lg(RAM)) with something like that.
04:01:39 <eryjus> jmp9 -- create an exception handler for your kernel and for all the exceptions dump the registers
04:01:49 <jmp9> oh okay
04:01:51 <eryjus> you really want cr2 and ip
04:01:57 <jmp9> I already created exception handlers
04:02:05 <jmp9> and i have C handlers so this will not be a problem
04:02:24 <eryjus> you have an IDT?
04:03:25 <eryjus> i got distracted... you will want the other registers as well, but ip and cr2 will be thr first 2 registers you look at in your dump
04:03:33 <jmp9> yes
04:03:34 <jmp9> and
04:03:39 <jmp9> i fixed page dirs
04:03:43 <jmp9> it works!
04:04:42 <jmp9> Yeah
04:04:43 <jmp9> It works!
04:05:19 <eryjus> awesome.
04:09:09 <jmp9> Wow
04:09:16 <jmp9> I passed to IDT virtual address
04:09:18 <jmp9> and it works
04:09:21 <jmp9> huh
04:21:28 <jmp9> if i do push ds
04:21:34 <jmp9> it will push 4 bytes to stack?
04:21:51 <nyc> Hmm. jal to save the return address and then trying to compute from there the distance to _GLOBAL_OFFSET_TABLE_ might be the only way to compute this.
04:23:35 <nyc> I can't entirely figure out the difference between what's being held in $gp and _GLOBAL_OFFSET_TABLE_, though. It's clear they are different, but there aren't many hints as to where the offset is coming from.
04:24:59 <doug16k> jmp9, it will in 32 bit mode
04:25:18 <jmp9> ds is 16 bit or 32 bit in pmode?
04:26:16 <doug16k> it will push 32 bits unless you tell it to push 16 bits with an operand size override
04:27:12 <jmp9> push ds will take 4 bytes, right?
04:30:02 <nyc> I'm trying to cmdline arg the GOT away.
04:32:36 <jmp9> how MUCH will take push ds?
04:32:43 <jmp9> in 32 bit pmode
04:33:04 <doug16k> yes 4 bytes, 32 bits
04:34:32 <jmp9> thank you very much!
04:35:29 <doug16k> but if you tell the assembler to push word, it will push 16 bits, so don't tell it that. just say to push %ds and it should do the right thing
04:36:57 <jmp9> ok
04:37:12 <jmp9> struct __attribute__((packed) name {};
04:37:15 <jmp9> or
04:37:25 <jmp9> struct name __attribute__((packed)) {};
04:37:28 <doug16k> if you did want to pack them together and they are uint16_t or otherwise 16 bit, then you do want to push word
04:38:15 <doug16k> I put that attribute before the ;
04:38:26 <doug16k> at the end
04:40:42 <doug16k> in headers you should use __attribute__((__packed__)) in case some nut does #define packed
04:43:22 <doug16k> if they define __packed__ then it's UB and you don't really care what happens
04:44:01 <nyc> https://www.forbes.com/sites/aarontilley/2017/05/16/hpe-160-terabytes-memory/#6f27b8c5383f <--- 160TB system
04:47:07 <doug16k> 47 bits of address lines used if I counted correctly
04:48:26 <nyc> https://www.quora.com/What-is-the-most-RAM-a-computer-has-ever-had <--- A Chinese 1375TB box existed once. UIUC had a 1.5PB box. Oak Ridge 2.7PB. So multi-PB boxen are definitely starting to go around.
04:48:30 <doug16k> no must be all 48 actually
04:50:30 <doug16k> 128GB dimm though?
04:51:14 <geist> yah a 1TB machine is easy now, and that's 40 bits
04:51:19 <geist> so it gets up there pretty fast
04:51:59 <doug16k> hints at an optocoupled interconnect to handle the distance better
04:52:07 <nyc> doug16k: NFI. My main thought is that the scale of memory needs to be thought of as getting much larger than present-day kernels are designed for.
04:54:08 <doug16k> my physical allocator runs out of breath at 8TB. soon to be 8TB per node
04:54:37 <doug16k> 2^31 * 4KB in case you wonder where that comes from
04:54:49 <geist> yeah 44 bits is a common one
04:55:01 <geist> i think linux may do that as well, without at least setting some flag to expand it
04:55:31 <nyc> Signed 32-bit page frame numbers?
04:55:39 <geist> unsigned
04:55:45 <geist> 32 + 12
04:55:49 <doug16k> I reserve the upper bit in my case
04:55:59 <geist> oh for 8, yeah
04:56:05 <nyc> doug16k: So that's not a sign bit, okay.
04:56:27 <nyc> It doesn't seem like a hard limit to lift.
04:56:28 <doug16k> right, but is conveniently places so that the compiler can do tricks as though it is signed :)
04:56:34 <doug16k> placed*
04:57:01 <doug16k> js will check that top bit even if it isn't really a sign bit, etc
04:58:03 <doug16k> yeah could expand it to infinity by using 64 bit values
04:58:18 <doug16k> 2^64 ~= inf
04:58:36 <doug16k> for low values of inf
04:59:28 <nyc> I don't think that there will be so much difficulty representing the memory as there will be with the efficiency of algorithms to manage that much memory.
04:59:55 <doug16k> it would allow for 2^75 bytes of memory per node
05:00:36 <geist> an example of where that screws up everything: a cavium thunder x1 machine has a highly irregular physical memory map
05:00:52 <nyc> doug16k: Well, just the bits to track pfn's etc. isn't such a big problem.
05:01:12 <geist> with node 0s ram at 0 and node 1...Ns ram being something like [N:44 bits of address]
05:01:22 <doug16k> I handle arbitrarily bad memory maps. if it is excessively bad then I'll have higher overhead
05:01:27 <geist> ie bit 45 ond 46 encode the node number
05:01:38 <nyc> geist: Just tracking ranges would have no problem with that.
05:01:45 <geist> or something like that. you end up with a gigantic jump there
05:01:46 <doug16k> ah, in that case you'd use separate allocators per node
05:01:59 <geist> right, but if you start passing around a PFN with a 32bit number it'll exceed it, is all
05:02:03 <geist> because of the 44 bit thing
05:03:13 <nyc> geist: That doesn't seem like too big an issue. I think discontiguous memory isn't handled well by a lot of kernels and there's a lot of linear searching out there, too.
05:03:53 <geist> it isn't, just pointing out that the assumption that ram is largely 0...+ is not always the case
05:04:06 <geist> ie, ram doesnt *have* to be clustered around 0, like it is on PC platforms
05:05:07 <nyc> I'm trying to cmdline the GOT away on MIPS. Something should eventually pop up. I may have to ditch gcc as a wrapper around as or some such.
05:05:48 <geist> just load the $gp, i suspect you can't on mips64
05:06:02 <geist> i played around with godbolt a bunch last night and it seems that the GOt is far more of a thing on mips64
05:06:27 <geist> it sees to act effectively as a PC relative like mechanism
05:06:33 <geist> presumably because i think it's fairly hard to get to PC on mips?
05:06:41 <nyc> geist: My big triple whammy for big RAM involves (a) secret sauce (b) contiguity reduction to O(fragments) (c) further reduction to O(lg(fragments))
05:07:27 <nyc> geist: jal 1f; 1: addiu $t0, $ra, 0 or some such.
05:07:51 <geist> yah which is probably not a good idea
05:08:06 <geist> because of the overhead of branches, branch predictorr, return predictor, etc
05:08:14 <geist> and the delay slot
05:08:29 <nyc> Well, saving $ra so as not to clobber it helps. And I'm not too worried about efficiency in early boot.
05:08:31 <geist> hence why the ABI just hard pinned a register to make it much simpler
05:08:56 <geist> you could pretty easily use the trick to load $gp by doing some sort of 'get current pc, add to it the distance betwee _gp and .
05:08:59 <geist> '
05:09:03 <geist> seems like that should work
05:09:11 <geist> the linker should be able to patch up that constant
05:09:58 <nyc> I get errors from the assembler about expressions being too complex from trying to subtract two addresses that should resolve to a small difference at link time.
05:10:09 <geist> well, okay then. i dont care
05:10:13 <geist> figure it out.
05:10:47 <nyc> Something will work out.
05:11:18 <geist> yah
05:11:31 <geist> i need to quit trying to solve this. you keep taunting me with a problem
05:11:38 <geist> every time you state the problem i can't help but try to fix it
05:11:43 <geist> so... i need to let it lie
05:12:06 <nyc> With memory maps like MIPS' or a lot of things, the actual RAM capacity of the 64-bit physical address space is actually nowhere near a full 64 bits.
05:12:46 <doug16k> what's a non-x86 modern desktop with lots of pcie and ddr4, usb, etc?
05:12:50 <geist> yah that static carving up of virtual memory like that is not really a new thing
05:13:01 <nyc> So I foresee a PAE-like affair ongoing with 64-bit to compensate for the "routing bits" that reduce the actual capacity.
05:13:09 <geist> doug16k: the thunderx2 machine at work i have is pretty much the same thing, just with an arm64 machine
05:13:25 <geist> and it take easily take at least one TB ram, since a few folks at work have upgraded theirs
05:13:59 <doug16k> it's an aarch64 then?
05:14:03 <geist> yes
05:14:13 <geist> UEFI, ACPI, PCI, etc
05:14:16 <nyc> I have my doubts anything will really move to > 64-bit virtual addressing, though.
05:14:18 <geist> pretty much a standard looking thing
05:14:37 <doug16k> can qemu emulate it easily?
05:14:58 <geist> not precisely it, but it can emulate a pretty generic machine
05:15:02 <doug16k> sounds like a lot of PCI hardware would plug right in
05:15:04 <geist> effectively the same thing. -machine virt
05:15:19 <geist> the memory map is different and whatnot but you can use uefi/acpi/pci
05:15:26 <geist> so in tht regarrd it's a 'compatible' arm64 machine
05:15:26 <doug16k> ah so if I got that to work it wouldn't be too far off from really working ?
05:15:32 <geist> basically yes
05:15:47 <doug16k> thanks
05:16:04 <geist> -machine virt is a good starting point on a lot of qemu platforms
05:16:09 <doug16k> do I remember you (geist) mentioning that there is an OVMF for arm ?
05:16:10 <geist> there's one for rriscv too
05:16:17 <geist> yep. you can apt get it even
05:16:21 <doug16k> awesome!
05:16:27 <geist> apt search for uefi and aarch64
05:16:33 <doug16k> my EFI code should just work then
05:16:34 <geist> i forget the precise name. it installs in /usr/share somewhere
05:16:52 <doug16k> or nearly work anyway
05:16:56 <geist> you can then pass -rom <ovmf image> to qemu and it boots that
05:17:07 <doug16k> yes I do that for EFI now
05:17:13 <doug16k> the ubuntu apt get one works fine
05:17:15 <geist> and i found hat if you pass -kernel <your PE loader> it hands it dirrectly to the ovmf without actually reading from storage
05:18:30 <doug16k> I prefer to pump out an actual disk or whatever and actually boot it, so I continually test my actual boot process
05:18:33 <geist> but otherwise it'll do the full storage boot thing. the default boot file is a different name for arm64, but it works the same way for uefi
05:18:38 <geist> boot-aarch64.efi or something
05:18:45 <doug16k> cool
05:18:54 <geist> yeah it's kind of neat
05:19:06 <doug16k> nice so a single distro could have all arches right there ready to go
05:19:16 <doug16k> all two anyway :P
05:19:22 <geist> yah exactly. i dont think anyone does in practice but you could
05:19:34 <geist> i remember reading the uefi and acpi specs and riscv has at least got their extensions in too
05:19:45 <geist> i dunno if there's a prebuilt ovmf for riscv, but i suspect there will be soon if not
06:39:27 <nyc> All I had to do was cpp separately from as instead of calling gcc as a driver.
06:47:26 <nyc> Yay, no more GOT.
06:50:02 <nyc> https://pastebin.com/LehSYprc <--- look ma, no GOT
06:51:34 <nyc> https://pastebin.com/bX3602Jc <--- disassembly sans GOT
06:56:33 <nyc`> I should stop faulting and be able to work on bitbanging serial IO. Then with that I should move on to getting C to work. Then I'll move on to getting other arches up to hello world in asm and jumping to C. Then I sweep them all to get gdb stubs going.
06:58:29 <geist> how did you get rid of the got?
06:59:01 <geist> _gp is still there, i suspect you're still going to end up being required to use it
06:59:05 <nyc> geist: I cpp'd with actual cpp instead of using gcc as a driver.
06:59:06 <geist> keep an eye out for it
06:59:23 <geist> uh.okay.
06:59:49 <nyc> geist: Well, I can actually write the code to set it up in asm now instead of having all address calculations to compute what to put in $gp mangled by the gcc driver.
07:00:01 <geist> but isn't the C code just going to end up using it anyway?
07:00:15 <geist> as in, why do you think you dont need to set $g0?
07:01:08 <nyc> geist: Yeah, getting C working will mean putting things into $gp. I'm being super-duper-minimal and outputting a string through an asm port without even setting up $sp, never mind $gp.
07:01:28 <geist> okay
07:01:42 <nyc> s/asm port/serial port in asm/
07:02:33 <nyc> I'm trying to track down VAX microcode for simh as we speak.
07:02:44 <nyc> NetBSD eat my dust!
07:03:15 <geist> i think the micrrocode/rom stuff is part of simh
07:03:20 <geist> you shouldnt need to track it down
07:04:46 <nyc> Ubuntu's package description says: DEC VAX (but cannot include the microcode due to copyright)
07:04:57 <geist> ah, just build it from source
07:05:07 <geist> it's super easy. plus i think the ubuntu package is incredibly out of date
07:05:24 <nyc> I'll work it out probably a long while after I move on to SPARC etc.
07:05:55 <nyc> geist: My special sauce will make things like VAX with 512B pages and MIPS with 1KB pages very happy.
07:05:55 <geist> alrright
07:06:05 * geist tries not to get involved
07:07:02 <nyc> geist: I was trying to say that there's an architectural feature that makes there a meaningful point to doing something with the VAX.
07:07:26 <geist> right. i understand your point
07:15:20 <nyc> Actually, it does something useful on x86, too, so I guess it might make sense to (gulp) do things there, too, but I've already done bare metal from zero code on x86 before.
07:18:30 <nyc> It's emulating 128MB RAM so 64MB into kseg0 should actually work. Hmm. It looks like it's faulting on the first instruction, which isn't a memory access.
07:19:51 <nyc> (qemu) x/2i 0xffffffff84001000
07:19:52 <nyc> 0xffffffff84001000: lui t5,0x0
07:19:52 <nyc> 0xffffffff84001004: lui at,0x8401
07:20:06 <nyc> x seems to think the instructions are there.
07:48:31 <doug16k> any ideas on a good way to associate modules with pci or usb device classes, vendors, and ids? I was thinking of having a system defined program header type that points out some kind of device info data structure within the module, then I can iterate each module and cache the info in a file
07:50:04 <doug16k> then at boot time match each pci device against the cached module info and load needed driver modules that way
07:51:40 <nyc`> I should probably look up the physical addressing in VAX MMU's to see what the theoretical RAM capacity of the VAX was. There are probably hacks I could do to simulators to make for even more dramatic demonstrations.
07:52:47 <nyc`> doug16k: The precise type of the ID's to match will probably need to depend on the bus.
07:53:32 <nyc`> doug16k: That'll probably matter more when you get ti systems with different buses.
07:53:39 <doug16k> yes, some overlap some specific
07:54:02 <doug16k> well, I already have multiple bus drivers. the pci drivers, and pci class drivers on the usb bus
07:54:12 <doug16k> usb class*
07:55:26 <doug16k> but they are highly similar. both have the concept of device classes and vendors and device ids
07:55:29 <geist> nyc: i think something like 256 or 512
07:55:45 <nyc> geist: MB?
07:55:48 <geist> yes
07:56:00 <geist> iirc vax has a similar thing to kseg0 or whatnot
07:56:07 <geist> so i think the max ram is limited to whatever that aperture is
07:56:48 <geist> https://en.wikipedia.org/wiki/VAX_8000 down in the memory thing says something like 512MB
07:56:57 <geist> which is i think right. a vax 8000 was a big machine
07:57:49 <geist> ah a vax 9000 may have been more
07:58:39 <nyc> https://en.wikipedia.org/wiki/VAX_7000/10000 <-- max 3.5GB
07:59:03 <geist> ah yes. i was just about to paste that same line
07:59:12 <geist> so i guess the kseg0 thing wasn't a limit per se
07:59:28 <geist> most likely the .5 was were peripherals and whatnot were
08:00:33 <geist> vax paging is a little funny. worth looking at
08:00:42 <nyc> 3.5GB is approaching the limits of what I can reasonably simulate/emulate, so it's enough to make a point with 512B pages and kernel virtualspace strictly limited to 1GB.
08:00:46 <geist> it's kind of a radix tree page table, except iirc it's not heirarchial
08:01:09 <nyc> geist: Recursive IIRC.
08:01:11 <geist> it's a single flat page table, one for kernel (in physical) and then the userr page table is virtual
08:01:20 <geist> so it probably lives in kernel space somewhere, and has a max len
08:01:56 <geist> and the kernel one doesn't need to be translated since you probably locate it in the direct physical map region
08:02:12 <nyc> Four base/bound pairs, okay.
08:02:14 <geist> note that simh emulates a fairly old vax. one of the originals
08:02:49 <geist> well, a 780 and microvax i think. it's not going to emulate the mega ones. probably 64MB or so max, which is likely to be larger than the original machine its emuating would have ever realistically had
08:03:20 <geist> also fun fact: the living computer museum up here has a few vaxen you can get accounts on
08:03:29 <geist> one is running VMS, it's a dual core 7000 i think
08:03:40 <geist> and another one is a 11/785 running one of the BSDs
08:03:47 <nyc> geist: I'll probably just have to hack the source to enable large memory usage.
08:03:50 <geist> a third one is a 11/730 running a current BSD
08:04:02 <geist> nyc: i doubt that'll work. it'll piss off the firmware
08:04:21 <nyc> If I can build the firmware, too, then it'll be okay.
08:04:21 <geist> these things boot up a fairly large amount of firmware that does self tests and whatnot
08:04:27 <geist> you can't build the firmware. they're rom
08:04:32 <geist> rom dumps from original machines
08:04:34 <nyc> Ouch.
08:05:00 <geist> simh is there to emulate a few old legacy machines. it's not intended to let you tweak tons of knobs and do unrealistic things. it's far more about properly emulating a few real machines
08:05:07 <geist> primarily so you can go run old software on it
08:05:55 <mrvn> You know, for those mission critical things in case one of the real machines goes down and you have to "replace" it.
08:06:02 <geist> and for that
08:06:15 <nyc> I'll give putting more RAM in things a shot just to see if I can show off some things my algorithms will let me handle well, but won't really stop the world to make it work or anything.
08:06:16 <geist> i have a pair of rpis here emulating a pdp8 and pdp11 24/7
08:06:27 <geist> running simh with an attached fake front panel
08:06:46 <geist> nyc: heh, wait until you get ahold of the source code to simh. it's.... esoteric
08:07:00 <nyc> What was the PDP-11 vendor called? TOAD?
08:07:30 <nyc> PDP-10 sorry.
08:07:33 <geist> case in point: https://github.com/simh/simh/blob/master/VAX/vax_cpu.c
08:07:50 <geist> TOAD i think made an asic version of a PDP-10 yes. there are a few at the living computer mseum
08:07:59 <geist> looks like a 1u rrackmount
08:08:01 <nyc> https://en.wikipedia.org/wiki/XKL
08:09:02 <geist> they also have a full PDP10 and decsystem 20
08:09:08 <geist> it's a really neat museum
08:09:28 <nyc> I can probably get more awe-inspiring than just the VAX doing some sort of PAE analogue (if there isn't an emulated hardware one preexisting) on 32-bit MIPS with 1KB pages.
08:09:51 <nyc> The only trouble will be with the capacity of my systems to emulate it.
08:10:24 <geist> https://photos.app.goo.gl/myMrh5NLsyWEQ7zB9
08:10:38 <geist> vax directly in the back, the little blue thing on the table is a TOAD
08:10:46 <geist> back right corner is a PDP-10
08:11:32 <nyc> I think 32-bit MIPS specifies a 36-bit physical address space.
08:12:28 <geist> https://photos.app.goo.gl/WrrY2NS3BoAQPzHUA is a fun machine. pdp-11 running unix v7
08:12:41 <geist> you can telnet into it and send messages to the line printer and surprise people walking around the museum
08:14:43 <nyc> With a 512MB kernel virtualspace and 1KB pages a 64GB MIPS32 box will be an impressive demo. It would be like a 64GB 3/1 split x86 with a factor of 2 (for address space) * 4 (for page size) = 8 times as many pages.
08:16:32 <nyc> So beyond just the spectrum of page sizes, the 32-bit MIPS affairs have a lot going for them.
08:22:24 <nyc> The VAX is still worth doing to make a point though.
08:25:17 * mobile_c when 80% of 91,123 results are ".dt_init = mdm4x_pon_dt_init," and "if (l->l_info[DT_INIT] == NULL" with a total of 100 pages to look through -_- https://github.com/search?l=C&p=67&q=DT_INIT&type=Code starting to think its 20% l->info 60% mdm4x_pon
08:27:33 <nyc> Hmm! It's taking an exception on an address in the IO space with the PC logged at a different instruction!
08:28:18 <mobile_c> definitly 20% libc 60% android kernel
08:31:25 <nyc> Okay, a different address didn't fault.
08:31:57 <nyc> Now I'm foggy on why there's no output.
08:32:49 <mobile_c> its like people just fork tf out of popular software to fill the search results with spam
08:33:33 * geist tries so hard not to keep trying to help nyc
08:34:04 <nyc> I already found the linuxmips docs on the hardcoded addresses for ISA IO space.
08:34:40 * doug16k hands geist a cold glass of water
08:35:24 <nyc> Well, it's executing the IO space accesses without raising exceptions, if that's any consolation.
08:37:15 <nyc> I'm in for the long haul. With what I've got in mind to prove in all this, I'm en route to developing an intense and long relationship with MIPS.
08:38:25 <nyc`> I'll bug people if I stay stuck on something too basic for too long.
08:39:02 <mobile_c> https://github.com/search?l=C&p=100&q=DT_INIT&type=Code AND it is 99% spam ._.
08:51:41 <doug16k> R_AMD64_PC16 signed? must be right? that'd be what? an address size override near call to something within +/- 32KB?
08:52:36 <nyc> I don't understand the question.
08:52:55 <doug16k> it's a relocation type, probably rarely needed. I cover every one though
08:53:03 <doug16k> just not 100% sure of its required semantics
08:53:07 <nyc> I know it's a relocation type.
08:53:29 <geist> doug16k: it's not documented in the abi?
08:53:57 <doug16k> let me double check. extremely vaguely IIRC.
08:54:21 <geist> usually they do something like S + A + R or whatnot, and they fairly concretely define what thetypes of those are
08:54:47 <doug16k> word16
08:54:54 <doug16k> S + A
08:55:16 <doug16k> ah that wouldn't be pc relative actually
08:55:19 <doug16k> it'd have P
08:55:39 <geist> hmm, you sure?
08:55:53 <doug16k> S + A means symbol + addend
08:56:01 <doug16k> PC relative ones have "- P" term
08:56:28 <geist> aaah okay, yes
08:56:53 <geist> that being said, the manual i'm looking at calls it
08:57:19 <geist> R_X86_64_PC16 and the formula is S + A - P
08:57:28 <geist> the one just above it is R_X86_64_16 and is just S + A
08:57:46 <doug16k> ya that one
08:57:48 <geist> type 13 and 12, respectively
08:57:51 <doug16k> that one must be absolute
08:57:59 <doug16k> 12 must be absolute
08:58:00 <geist> so the PC16 does have a - P in it
08:58:50 <geist> my guess is the non PC16 one is extremely rare
08:58:54 <geist> i wonder what would cause it?
08:59:15 <doug16k> .short something something: .int 42
08:59:22 <doug16k> then something linked < 64KB line
08:59:39 <geist> and only absolute, which is not likely to ever happen
09:00:11 <geist> and even then the .short something will likely not assemble
09:00:40 <doug16k> trying it now :D
09:00:42 <geist> if it's patching an instruction with a 16bit field in it
09:00:58 <doug16k> ya an address size override
09:01:06 <geist> maybe then maybe
09:01:07 <doug16k> 16 bit addressing mode
09:01:24 <geist> but dunno if you can actually do that in 64bit mode
09:01:30 <geist> i always forget those sort of details
09:01:31 <doug16k> mov something+42(%bx),%eax
09:01:42 <doug16k> ridiculous yes, but possible
09:01:50 <geist> anyway.
09:01:55 <doug16k> going to be relocation truncated to fit every time for me
09:02:00 <geist> right
09:02:10 <geist> that's usually what happens with that sort of thing in assembly
09:02:11 <doug16k> I don't allow use of 0 to 4MB
09:02:20 <geist> right, nor does pretty much any OS
09:02:41 <geist> so then if you're linking it PIC with the base address of 0, the PIC code should also not assume the pointer is short
09:02:49 <geist> so it'd expect a full 64bit in that case
09:03:10 <geist> so what might do it is if it had some sort of split field somewhere
09:03:18 <geist> where it needs to put the bottom 16bits of the address somewhere
09:03:30 <geist> this sort of thing happens all the time on risc machines, where you can't load the entire address in one instrruction
09:03:43 <geist> arm64 has a bunch of relocation types that are like 'take the bottom N bits of address and put here'
09:04:17 <geist> there's even an assembler instristic for it as a prefix to the symbol
09:04:24 <geist> somethng like :lo12:symbol
09:04:51 <doug16k> neat. bit like the @GOT thing or whatever
09:04:59 <geist> yah kinda
09:05:31 <geist> arm64 has a very powerful PC relative address calculation instruction: adrp
09:06:06 <geist> so you get something like adrp x0, <symbol>; orr x0, x0, :lol2:<symbol>
09:06:18 <geist> that will constrruct the full address of a symbol within 4GB ofPC
09:06:35 <geist> since ldr/str instructions have 12 bit reach you can actually avoid one instruction
09:06:52 <geist> adrp x0, <symbol>; ldr x1, [x0, :lol2:<symbol>]
09:06:58 <doug16k> cool
09:07:15 <geist> adrp has a 22 bit immediate and gives you the page aligned address of a thing relative to PC
09:07:20 <geist> within 4GB
09:08:02 <geist> i think riscv has something like this too
09:11:29 <nyc> It looks like my ISA addresses are off by a nybble.
09:12:26 <doug16k> how can an address be off by half a byte?
09:12:50 <nyc> doug16k: Someone typed an extra 0.
09:13:19 <geist> oh more like they're shifted over an extra nibble
09:13:38 <nyc> yes
09:15:04 <nyc> At least some ISA-relevant memory region says that its starting point was different from what was listed on https://www.linux-mips.org/wiki/QEMU#Memory_map
09:15:14 <nyc> That is, in gdb.
09:16:04 <doug16k> info qtree in qemu monitor should tell you those things quite explicitly
09:16:19 <doug16k> it will show addresses and things for every device
09:18:38 <nyc> Hmm, mtree/qtree say it's basically at zero.
09:20:03 <doug16k> what -machine do you use?
09:20:18 <nyc> -M mips
09:21:24 <doug16k> look at address-space: cpu-memory in info mtree
09:21:50 <nyc> 0000000010000000-0000000010ffffff (prio 0, i/o): isa-mem
09:22:03 <nyc> 0000000014000000-000000001400ffff (prio 0, i/o): alias isa-io @io 0000000000000000-000000000000ffff
09:22:45 <doug16k> ya
09:34:58 <nyc> I might as well throw in something to loop until the line status register says it's ready.
09:49:59 <nyc> hmm. It seems to be seeing a 1 bit in the CTS bit of the MSR but I'm not seeing the output.
09:54:03 <nyc`> I'm betting that I'm tripping over some issue where the serial output is isn't visible.
10:27:27 <nyc> It now works.
10:33:51 <nyc> https://pastebin.com/wu1Xm9iT
11:44:05 <nyc`> Great, I'm getting some kind of linker garbage trying to link in an empty C main().
11:47:33 <zhiayang> what's the point of ds/ss in long mode?
11:47:42 <zhiayang> cs determines the cpl, but the data segments do what, exactly?
11:49:05 <zhiayang> (the real question here is whether or not i need to save/restore them)
11:49:22 <knebulae> @zhiayang: you need valid seg regs
11:50:01 <zhiayang> knebulae: elaborate, please?
11:50:13 <knebulae> @zhiayang: cs(64); ds=es; fs/gs are for tls.
11:50:27 <zhiayang> yes, ds=es
11:50:33 <zhiayang> but does it matter what they actually ar
11:50:34 <zhiayang> e
11:50:53 <knebulae> @zhiayang: you need pl0 code32, code64, data; pl3 code32, code64, data
11:51:14 <knebulae> @zhiayang: and whatever (if anything) you decide to do with fs/gs
11:51:35 <knebulae> @zhiayang: so it could be ds=es=fs=gs
11:51:56 <knebulae> @zhiayang: oh, and ss
11:53:24 <knebulae> @zhiayang: make sense?
11:54:09 <knebulae> @zhiayang: also a wrinkle- if you use fs/gs for tls, there are instructions to swap usermode/kernelmode selectors (see swapgs instr)
11:57:12 <zhiayang> experiment conclusion: i can load null selectors into everything
11:59:21 <zhiayang> knebulae: yes, i am aware of those
11:59:43 <zhiayang> my question was whether i needed to save/restore ds and es, and the conclusion is no
11:59:44 <knebulae> @zhiayang: it should be noted that code32 segments are not required if you're not executing any 32-bit code (compatibility mode).
12:00:45 <knebulae> @zhiayang: across what boundaries?
12:00:52 <zhiayang> any boundaries
12:01:05 <zhiayang> i don't plan to run any 32-bit code at all
12:01:10 <knebulae> slaps forehead;
12:01:20 <zhiayang> ?
12:03:50 <knebulae> @zhiayang: wait, I'm confused. It's too early. Are you saying you came to the conclusion you don't need to save/restore data segments on, say, entry to an isr or exception handler?
12:04:29 <zhiayang> yes, that's my conclusion
12:04:34 <zhiayang> is that not a legit conclusion
12:04:55 <knebulae> @zhiayang: you're not taking into account that the segments are different at different privilege levels
12:05:21 <zhiayang> does the cpu even check dpl for the segments?
12:05:23 <knebulae> @zhiayang: you have to save/restore them on context switches
12:05:37 <knebulae> @zhiayang: yes
12:05:56 <zhiayang> "A data-segment-descriptor DPL field is ignored in 64-bit mode, and segment-privilege checks are not performed on data segments. System software can use the page-protection mechanisms to isolate and protect data from unauthorized access."
12:06:57 <knebulae> Well, hmm. I just ported to x64, so I guess I am unclear about this.
12:08:10 <zhiayang> pretty much all long mode code i've seen appears to want to load valid selectors into ds/es/ss
12:08:22 <zhiayang> but maybe they assume the kernel will want to run code in compat mode
12:13:24 <knebulae> @zhiayang: maybe that's it.
12:14:00 <nyc> My hello world stopped working.
12:14:10 <knebulae> @zhiayang: but only the code segment differs. 32 vs 64 doesn't matter for data segs.
12:14:55 <knebulae> @nyc: 1 step forward, 2 steps back :)
12:16:40 <knebulae> @zhiayang: since there's no privilege checks or anything else, if you're not running 32-bit code, you may very well be correct- that is, it is not really necessary to even have more than 1 data segment in long mode.
12:17:10 <knebulae> @zhiayang: other than for tls
12:17:25 <zhiayang> lol, i wonder if even 1 is needed
12:17:43 <knebulae> It'd be interesting to test
12:17:47 <zhiayang> you don't even need descriptors for fs and gs
12:18:21 <knebulae> @zhiayang: have you verified that on real hardware and not an emu?
12:18:27 <bcos_> Would check the behaviour of stack segment loads (they were different to data semgent loads for 32-bit, so ...); and check what happens when "NULL segment" is loaded in all the cases
12:18:40 <knebulae> I just wouldn't expect any mainstream operating system to behave that way, due to 32-bit backcompat.
12:18:52 <zhiayang> i haven't tested on hardware, no
12:20:31 <bcos_> Note that there's some funky stuff with SS at CPL=3 (where AMD decided "NULL implies nested" because "SS = NULL" isn't legal for any other case); so if you need a "not NULL, present" descriptor for SS then there's nothing lost by making it look sane for 32-bit)
12:20:44 <bcos_> D'oh. "funky stuff with SS at CPL=0"
12:20:55 <zhiayang> oh right
12:21:07 <zhiayang> the cpu will gpf if ss contains null on iret
12:21:15 <zhiayang> if the target cpl is 3
12:21:16 <zhiayang> ._.
12:21:24 <zhiayang> bamboozled
12:21:38 <knebulae> @zhiayang: you almost got away with it too!
12:21:58 <zhiayang> if it weren't for these meddling teenagers
12:28:08 <nyc> Okay, it's back.
12:28:18 <zhiayang> nyc: mips really sounds like a massive pain
12:28:24 <zhiayang> based entirely on your irc logs
12:30:10 <nyc> zhiayang: Most of it reflects my unfamiliarity with (a) privileged MIPS arch esp. early boot (b) qemu and (c) being 10 years out of practice.
12:32:00 <nyc> Okay, rodata looks like the issue.
12:35:15 <nyc> zhiayang: No matter how painful, MIPS is a goldmine for showing the effectiveness of the algorithms I've cooked up.
12:39:07 <mrvn> zhiayang: I think you need a data segment to switch to 64bit mode and then ds is set.
12:39:18 <zhiayang> uefi does the switch for me
12:39:37 <zhiayang> but yea i know at least in 32-bit mode there needs to be valid descriptors
12:40:10 <mrvn> not sure if syou need it if you use the hack to go from 16bit to 64bi directly.
12:41:06 <mrvn> anyway, I have a data segment and set ds,es,fs,gs and ss at boot just in case.
12:42:57 <nyc> Okay, C stack is set up and hello world is output.
12:45:26 <nyc> It's time to move on to SPARC.
12:46:18 <knebulae> @zhiayang: well I can report that that particular experiment with seg regs resulted in a triple fault post-haste under VBox. Sooo... Hmm.
12:46:32 <knebulae> @zhiayang: sorry, with my codebase.
12:46:39 <nyc> I might take a crack at my build system for a bit, too.
12:46:42 <zhiayang> hm... you mean setting ds/es to 0?
12:46:57 <knebulae> @zhiayang: not saving/restoring seg regs on context switch
12:47:04 <zhiayang> hm, interesting.
12:47:08 <knebulae> @zhiayang: even in pure64 code
12:47:10 <zhiayang> i'll try it in vbox
12:49:31 <knebulae> @zhiayang: sorry, I had an invalid build option. I can build with msvc or clang, but not with clang in msvc mode. Stand by.
12:55:29 <knebulae> @zhiayang: still t.f. on enabling interrupts without saving/restoring data segment :/ But it could be anything, not just *that*.
12:57:15 <knebulae> @zhiayang: and this is on switching code that at least works otherwise.
12:57:41 <nyc> knebulae: https://pastebin.com/wu1Xm9iT <--- it hasn't really changed much apart from a call to main(); the rest really has more to do with other files.
01:00:19 <knebulae> @nyc: right; I do a lot of old school serial port work with the games, but I'm less than worthless at looking at MIPS asm at the moment.
01:01:03 <knebulae> I'm an intel guy; I see three operands and I slowly back away.
01:01:07 <knebulae> :)
01:01:29 <knebulae> Hope nobody saw me come into the room.
01:01:51 <nyc> knebulae: My hitting a milestone of sorts might be nice to see.
01:02:16 <knebulae> @nyc: oh, absolutely. I was actually just making sure my qemu had the mips flavor :)
01:02:36 <knebulae> I was just letting you know I probably couldn't actually be useful.
01:02:39 <mrvn> knebulae: did you set ds/es/fs/gs/ss once at boot?
01:02:52 <knebulae> @mrvn: uefi
01:03:09 <mrvn> knebulae: who knows what that sets
01:03:58 <knebulae> @mrvn: you know, that's probably why there's a triple fault. I never thought to normalize the seg regs after taking over from uefi.
01:04:20 <knebulae> @mrvn: once interrupts start firing, it gets papered over.
01:04:24 <mrvn> do you use es/fs/gs? Might just be ss that is a problem.
01:04:49 <knebulae> @mrvn: not yet; no usermode at all atm.
01:04:59 <knebulae> @mrvn: no kernel threading either.
01:07:03 <mrvn> knebulae: I noticed I don't reload ss, only ds,es,fs,gs but I'm booting with bios+grub.
01:08:55 <knebulae> @mrvn: gotcha. I'm kind of old-school. The very first thing I want is my own stack.
01:09:21 <mrvn> knebulae: me too: movl $(stack + STACK_SIZE), %esp
01:09:23 <mrvn> /* Reset EFLAGS. */
01:09:23 <mrvn> pushl $0
01:09:23 <mrvn> popf
01:09:43 <knebulae> @mrvn: fair enough;
01:10:12 <mrvn> or is there any other way to set EFLAGS?
01:10:27 <knebulae> @mrvn: with uefi, I actually want to invalidate the previous selector entirely, because I have no way of knowing what was actually running on the machine prior.
01:11:10 <mrvn> You can assume cs/ds is valid or you are screwed. But ss doesn't have to be sane for code to work.
01:11:21 <knebulae> @mrvn: ok
01:11:51 <knebulae> @mrvn: right; it'll just chew things up until it gags.
01:11:57 <mrvn> except. Maybe uefi expects you to set the gdt and realod ds without any data access.
01:12:30 <knebulae> @mrvn: by the time you get to setting your own gdt, uefi is long gone.
01:12:41 <mrvn> lgdt gdt
01:12:42 <mrvn> mov $0x10, %ax
01:12:42 <mrvn> mov %ax, %ds
01:12:48 <mrvn> That should work without ds set, right?
01:12:49 <knebulae> In my experience you get dpl0 code, code64 and data. Everything else is a crapshoot
01:13:16 <knebulae> yes
01:13:46 <mrvn> Ok. Then only CS must be set when efi calls you. No idea what the specs say what segments are set though.
01:14:09 <knebulae> well - I don't see how it can't; if the processor isn't verifying privilege, it treats the base as 0 and the limit as MAX, then how could it not work? It's not checking anything.
01:14:44 <mrvn> knebulae: it checks some bits. base+limit is ignored in 64bit but others aren't. At leat for cs.
01:15:01 <knebulae> @mrvn: I'm speaking to data segments, not code.
01:15:44 <mrvn> I have this:
01:15:46 <mrvn> .quad 0x00cf92000000ffff /* __KERNEL_DS */
01:15:47 <knebulae> @mrvn: that was zhiayang's point; if you're pure64, there is effectively no reason to even define a data segment for any pl.
01:15:48 <mrvn> .quad 0x00cff2000000ffff /* __USER_DS */
01:16:00 <mrvn> Must be a reason why I made two different segment descriptors.
01:16:18 <knebulae> @mrvn: that's what everyone else does so they can run 32-bit code.
01:16:48 <mrvn> SYSCALL meddles weith the descriptors for you. Maybe you need it for that.
01:16:52 <knebulae> @mrvn: nvm. that doesn't make sense either.
01:17:02 <mrvn> (been years since I wrote that boot.S code for x86_64)
01:17:55 <mrvn> 32bit needs a USER_CS32 but I think the 64bit USER_DS works there too.
01:18:01 <knebulae> @mrvn: I will have to read the relevant section of the intel manual on sysenter/sysexit (or syscall). I never implemented a fast system call mechanism, so I'm unclear on the semantics of its use.
01:18:30 <mrvn> knebulae: I do remember that it expects segment descriptors in a certain layout to work.
01:19:38 <mrvn> I would also expect a NULL descriptor loaded into a segment register to throw an exception when the segment is used.
01:20:15 <knebulae> @mrvn: right; but you should be able to get by with just 1.
01:20:39 <knebulae> @mrvn: and by 1, I mean 1 valid data segment descriptor.
01:21:28 <knebulae> @mrvn: and you can eliminate saving/restoring those registers from all context-switching code
01:21:42 <mrvn> try it. You need user space though to see if the CPL in the descriptor matters.
01:22:08 <knebulae> It doesn't privilege is not checked; zhayang posted the relevant section earlier this morning.
01:22:37 <knebulae> No privilege check, no limit check on data segments in long mode. Intel says use paging to enforce access.
01:23:35 <knebulae> That would also explain why AMD removed the opcodes to push the seg regs to the stack in long mode. No one could figure it out. The likelihood is this is a performance advantage that may not be on other people's radar. Or I'm just not thinking it through...
01:24:21 <mrvn> knebulae: restoring segments takes a lot of time. No point doing that if it's useless.
01:24:51 <knebulae> @mrvn: right, but others have said (I have not verified) that current, modern oses still do that.
01:25:29 <knebulae> @mrvn: could be out of legacy, or just an incomplete understanding of AMD's intent.
01:26:42 <mrvn> in 64bit mode FS/GS are special and have MSRs to set the full 64bit. DS needs to be sane or nothing works and no idea what ES would be for in 64bit mode.
01:27:13 <mrvn> Can user space load es?
01:28:02 <mrvn> knebulae: I didn't see the full paste earier. But isn't the CPL checked when loading segments?
01:28:16 <knebulae> @mrvn: code only
01:29:09 <knebulae> which makes we wonder if there's a glaring security vulnerability out there. Do people assume the CPL *is* verified?
01:31:50 <mrvn> all OSes use paging so that is covered.
01:35:01 <knebulae> @mrvn: right, but I'm thinking of instances where data from different privilege levels might be mapped into the same process, but only intended for code at a different pl.
01:35:28 <knebulae> @mrvn: like a trampoline, for instance.
01:35:44 <mrvn> it's either mapped KERNEL or USER.
01:36:26 <mrvn> and should be using W^X
01:36:42 <knebulae> @mrvn: right;
01:37:47 <knebulae> @mrvn: I have also seen some misconceptions relating to whether you can have code you can't read on x86en. From the manual it appears you can (I think it's explicit), but the most popular answers online say you can't. :/
01:38:00 <zhiayang> mrvn: ds needs to be sane or nothing works?
01:38:04 <mrvn> knebulae: Note: If there were an opcode to push segment registers to the stack then processes could exchange secret messages by loading stuff in es and reading it back after a context switch.
01:38:33 <knebulae> @mrvn: I hadn't considered the user-accessible seg regs.
01:39:31 <mrvn> zhiayang: I assumed so but maybe not. You need a valid SS because SS=null is used to detect recursion in the exceptions.
01:39:41 <zhiayang> right, that one i realised
01:39:54 <knebulae> @mrvn: 16 year old me would've seen that as an opportunity to increase IPC speed!
01:40:07 <mrvn> should be easy enough to test loading null in DS and see if the kernel keeps running
01:40:23 <zhiayang> it has, so far
01:40:27 <mrvn> knebulae: like put the syscall number in es?
01:40:39 <zhiayang> currently debugging ring3 stuff, so i can't test that yet
01:40:46 <knebulae> Yes, or a pointer to a buffer
01:41:03 <ashkitten> if i ever write a kernel it will never have a version 5.0, because i think linus is a coward for not sticking to his boycott of 5.0
01:41:14 <mrvn> knebulae: problem is you can only load stuff from the GDT. So a small index of preset values.
01:41:30 <ashkitten> it will also never have a 1.0, because i'm a dumbass and can't code for shit
01:42:02 <klange> will i ever have a 2.0? we'll see
01:42:17 <mrvn> I'm tempted to make my kernel version aproach the golden ratio.
01:43:04 <knebulae> @mrvn: why couldn't use simply use es as a gpr? If you have no code that uses it.
01:43:14 <nyc> I'll have my first release be X for the extra buzz.
01:43:19 <knebulae> nvm. it's triple fault.
01:43:25 <knebulae> I'm a dumb sh*t right now.
01:43:31 <mrvn> knebulae: because you can't load a 64bit value into it. No MSR for that.
01:43:37 <knebulae> Trying to do too many things at once.
01:43:47 <nyc> I think IBM marketing always said to have your first release be 3.2.
01:43:50 <mrvn> knebulae: AMD only did that for FS/GS.
01:44:08 <mrvn> knebulae: One of them isn kernel only and the other is used for TLS.
01:44:09 <knebulae> @mrvn: right. I understand.
01:45:45 <knebulae> Ok, so zhiayang, your conclusion is that the 1 data segment to rule them all on x64 is legit?
01:46:03 <mrvn> knebulae: at least you need one for SS.
01:46:22 <mrvn> And you need one for SYSCALL.
01:46:35 <zhiayang> wait what
01:46:37 <knebulae> @mrvn: can't it be the same one?
01:46:38 <zhiayang> i need one for syscall?
01:46:45 <mrvn> knebulae: I bet it can.
01:46:57 <knebulae> @mrvn: remember, no limit, no privilege checks
01:47:22 <mrvn> zhiayang: SYSCALL expects a DS and CS in a specific order in the gdt.
01:47:26 <zhiayang> ah, ok
01:48:02 <knebulae> this is a nice little wrinkle. I like seeing savings on every context switch.
01:48:16 <mrvn> it might need a user CS/DS in specific oder in the GDT for SYSRET. And CS32/DS32 for 32bit legacy.
01:48:49 <knebulae> @mrvn: all doable though
01:49:09 <mrvn> sure. But that would give you 6 segments, 3 of them data.
01:49:54 <knebulae> @mrvn: no; pl0 code32/code64; pl3 code32/code64; then just setup a pl0 data segment - the one data segment to rule them all.
01:50:42 <mrvn> Check the specs for SYSCALL. I only remember that it takes a pair of CS/DS. So you can't palce them randomly in the gdt.
01:50:52 <knebulae> ds=ss=fs, gs has its own sh*t with tls.
01:51:09 <knebulae> @mrvn: will do
01:52:03 <zhiayang> hm, you can write to the STAR msr to set the selectors
01:52:29 <zhiayang> oh, but only the offset
01:52:43 <zhiayang> sysret-ing to 64-bit loads ss = cs+0x10
01:52:51 <zhiayang> wait, ss = cs+0x8
01:53:33 <zhiayang> "if sysret is returning to 64-bit mode, the cs selector is set to this field+16"
01:54:11 <zhiayang> so it apparently requires <32-bit cs> <32-bit ss> <64-bit r0 cs> <64-bit r0 ss> <64-bit r3 cs> <64-bit r3 ss>
01:54:22 <knebulae> @zhiayang: just have to play with the gdt layout on paper to make sure your rhythm is on point.
01:55:08 <knebulae> @zhiayang: and by rhythm, I just mean how your switching and interrupts flow.
01:55:54 <knebulae> oh, I see; they can't be separate.
01:55:56 <knebulae> bummer.
01:56:08 <knebulae> So 2 data segments -- restricted by gdt layout.
01:57:01 <knebulae> sorry 3.
01:57:23 <zhiayang> "SS.Sel is set to this field + 8, regardless of the target mode"
01:57:26 <zhiayang> wtaf
01:57:40 <knebulae> yeah, that was stupid with the x64 segmentation changes
01:57:56 <zhiayang> ok was it really that difficult to make one more MSR
01:58:00 <zhiayang> call it SSTAR or whatever
01:58:15 <zhiayang> let us specify ring0 cs/ss and ring3 cs/ss
01:58:23 <zhiayang> 16 bits each fits nicely into one msr
01:59:01 <knebulae> sounds like an optimization that will be made available on the Phi.
01:59:15 <zhiayang> the what?
02:00:10 <knebulae> I mean, this only manifests when using fast syscall instructions, which have the limitations on the gdt entries. So if you use the fast syscall function, you then *have* to save and restore those seg regs. Did they just move the cost elsewhere and hornswaggle us?
02:00:32 <zhiayang> huh? you don't need to save/restore it
02:01:15 <knebulae> I guess it's going away, but here: https://en.wikipedia.org/wiki/Xeon_Phi - this type of product is likely not going away though.
02:01:21 <zhiayang> it only touches cs and ss, which shouldn't really change between context switches
02:01:30 <knebulae> @zhiayang: duh.
02:01:46 <knebulae> ok. I call timeout on myself. See you guys later!
02:01:58 <zhiayang> \o
02:02:49 <zhiayang> yay, usermode works
02:02:50 <knebulae> @zhiayang: I was completely overlooking that the whole point of sysenter/sysexit was to eliminate that code in the first place.
02:03:06 <zhiayang> well at least they had the foresight to give us swapgs
02:03:48 <knebulae> I try :)
02:03:59 <zhiayang> unrelated: what's the "right" way to handle the flags for intermediate page tables
02:04:11 <zhiayang> eg. i want to map address V, but the pdpt/pdir/ptab are not present
02:04:36 <zhiayang> what flags should those have, given that a non-user pdpt will make the entire 512gb non user-readable
02:05:06 <zhiayang> or should i do it such that for pdpt < 256 i set the user flag on intermediate things, and for pdpt >= 256 i don't?
02:05:52 <knebulae> @zhiayang: P | W (no bit for supervisor)
02:06:47 <knebulae> @zhiayang: P | R code (again, no bit for supervisor, user bit for user)
02:07:22 <knebulae> @zhiayang: with appropriate size & global flags where appropriate
02:08:13 <knebulae> @zhiayang: and the important one is the PS bit, which tells the cpu whether it is looking at a page, or a further level of tables.
02:09:24 <knebulae> If they are actually not even there, just zeros.
02:09:57 <zhiayang> i'm asking about the intermediate structures
02:10:24 <zhiayang> let's say i want to map some address X, but the pdpt at the index say 200 does not exist
02:10:38 <zhiayang> so i need to put the physical address of the pdpt into index 200 of the pml4
02:10:43 <zhiayang> but what should the flags be for that pdpt?
02:10:50 <knebulae> AFAICT, you need 1 PML4 entry (not sure if all 512 entries are required), and 1 pdpt. That would map a single 1GB page and mark all other memory non-present.
02:11:09 <zhiayang> i think we're not on the same page here
02:11:11 <knebulae> I see
02:11:12 <zhiayang> pun not intended
02:11:18 <zhiayang> ok maybe slightly intended
02:11:26 <knebulae> Yeah, I missed your post while I was typing my last
02:11:31 <nyc> My build system is taking shape.
02:11:41 <knebulae> @nyc: \o/
02:12:23 <knebulae> my flags were correct for that scenario
02:12:24 <zhiayang> eg: https://hastebin.com/odumusatef.php
02:12:29 <zhiayang> what should <????> be
02:13:08 <zhiayang> presuming i only use 4k pages
02:15:27 <knebulae> @zhiayang: I pm'ed you with some info
02:21:53 <klange> *pops champagne bottle* 2500 stars on github
02:22:25 <FireFly> \o/
02:24:37 <zhiayang> yay
02:27:55 <zhiayang> investigation concluded: parent structs need to be set user mode for child structs to be user-moded https://i.imgur.com/wfGBCk6.png
02:34:12 <zhiayang> investigation concluded: parent structs need to be set user mode for child structs to be user-moded https://i.imgur.com/wfGBCk6.png
02:34:14 <zhiayang> oops
02:34:30 <knebulae> @zhiayang: again, which was counterintuitive to me on its surface. I contend the mere presence of a privilege flag on sub-structures indicates the potential for override. But I was wrong.
02:35:07 <knebulae> @zhiayang: and thanks for doing the legwork on that.
02:36:16 <zhiayang> yep, no problem
02:37:58 <mrvn> zhiayang: I have one entry for kernel that is identical in every process and everything else is user space. 255 entries in the higher half are reserved for kernel so far.
02:38:33 <zhiayang> so is it the case that the intermediate flags for the lower-half have the user bit, and those for the higher-half do not?
02:38:37 <mrvn> zhiayang: sub structs can override, but only to give less rights.
02:44:55 <knebulae> @mrvn: he said the manual says no.
02:45:09 <knebulae> A user page under a parent supervisor structure will fault
02:46:04 <knebulae> Or did I miss the point twice now?
02:46:28 <zhiayang> knebulae: i think what he's saying is that a supervisor page in a user pdir will fault when cpl=3
02:47:29 <knebulae> he said "sub structs can override, but only to give less rights," which would imply a user page in a supervisory parent structure.
02:47:55 <zhiayang> wouldn't less rights mean a supervisor page in a user dir?
02:48:35 <zhiayang> (with the notion that supervisor pages are "less rights" than user pages)
02:49:15 <knebulae> I'm going PML4->PDPT->PD->PT. So a parent (to the left; i.e. against the arrows) would be supervisor, granting less privileges to a page mapped by it.
02:49:30 <zhiayang> yes
02:49:53 <knebulae> You indicated that that scenario would cause a fault, no?
02:49:58 <zhiayang> it would
02:50:12 <knebulae> ok
02:50:32 <zhiayang> supervisor PD -> user PT would cause a fault
02:50:40 <zhiayang> user PD -> supervisor PT would also cause a fault
02:50:56 <zhiayang> basically if there is no user bit everywhere along the chain you will fault
02:51:05 <mrvn> The reason is this: The page walk check the permissions on every level. If that check fails it stops walking.
02:51:27 <mrvn> So Access to 0xFFFFFFFF00000000 will check the PLM4 entry and fault from userspace.
02:51:28 <knebulae> @mrvn: so the question becomes, which end does it start checking permission at first?
02:51:38 <mrvn> knebulae: obviosuly the top
02:51:43 <knebulae> Ok, so top down, just like I thought
02:51:50 <knebulae> @mrvn: right
02:51:58 <mrvn> Any other way wouldn't be able to fault early.
02:52:45 <knebulae> I get it. So it faults either way; basically it's up to the os whether or not to treat the mapping as an override.
02:52:59 <knebulae> Either fix the fault or don't.
02:53:19 <mrvn> For the kernel this has another benefit: You can map another processes PT, PD, PDPT under a KERNEL only PML4 entry to copy stuff between processes and the user space won't get access to it accidentally.
02:54:15 <knebulae> @mrvn: the semantics are clear in my head now. Let's just hope it stays that way.
02:56:42 <knebulae> @zhiayang: my sticking point is that I kept thinking there were still 2 cases (a user page under a supervisory parent or a supervisory page under a user parent), but obviously since you can only restrict privilege, that just leaves the 1 case.
03:12:37 <nyc> Hmm. I might want to deal with generic code and dependencies with C and headers.
03:31:26 <vladoski> Hi, I'm reading Tanenbaum's Modern Operating System book. I can't understand where is the page table located.
03:31:51 <vladoski> Because first it says that the page table it's inside the MMU, then in main memory
03:32:10 <vladoski> If it's the second one, how can the MMU manage the page table in main memory?
03:32:33 <mrvn> It's in physical memory. The MMU caches it.
03:33:16 <mrvn> Some CPUs also have no page table at all, only the TLB and others allow recusive page tables where the lower ones are in virtual memory mapped by the top one.
03:36:01 <knebulae> @vladoski: on x86en, you tell the cpu where the base paging structure is located by modifying the cr3 register.
03:39:23 <vladoski> Ah okay thanks!
03:40:58 <vladoski> mrvn, why can I achieve with a recursive page table? Because Tanenbaum doesn't speak about it
03:41:03 <vladoski> what*
03:42:26 <mrvn> It's a trick specific to x86 that handles mapping the page tables for the kernel with a single PML4 entry. It's not portable nor widely used.
03:43:18 <mrvn> It's makes accessing page tables trivial but you have to take extra care unmapping them again.
03:45:00 <vladoski> Understood, thanks
04:44:27 <zhiayang> oh no
04:44:32 <zhiayang> my swapgses are unbalanced somehow
04:59:46 <nyc> My assignments to variables with computed names are failing in my makefiles. What hit me?
05:03:25 <zhiayang> currently facing a problem where i need %gs to know whether or not i need to swapgs
05:03:29 <zhiayang> ._.
05:04:12 <zhiayang> ok -- to handle per-cpu data, i store a pointer to stuff in gsbase
05:04:35 <zhiayang> on entering the scheduler, i swapgs so i can get the per-cpu scheduler stuff, then i swapgs when i exit the scheduler
05:04:49 <zhiayang> problem: outside of the critical section (the scheduler), gsbase is the user gs (which is 0)
05:05:17 <zhiayang> so, i can't do anything involving the scheduler while in kernel mode, but outside the scheduler
05:05:21 <zhiayang> eg. add threads to the queue
05:05:28 <zhiayang> am i doing it wrong or is there a good solution to this?
05:09:04 * eryjus is interested in zhiayang's current problem
05:10:12 <zhiayang> a potentially hacky solution i'm thinking of is to queue up scheduler-related modifications, and only do those while in the critical section
05:10:19 <zhiayang> but that doesn't seem like a smart move
05:10:36 <knebulae> @zhiayang: are you swapping gs on *every* switch (even kernel->kernel) or only on privilege transitions?
05:11:10 <zhiayang> knebulae: as yet undecided
05:11:26 <zhiayang> at first i was doing it every time, which i quickly realised to be a problem
05:11:38 <knebulae> I believe you should only swapgs when you transition between privilege levels.
05:11:52 <zhiayang> the ideal is of course to only swapgs when moving between rings, but the problem is i can't know if i'm moving between rings without knowing what ring i'm currently in
05:11:56 <zhiayang> and guess where that information is stored
05:11:57 <zhiayang> gsbase!
05:12:34 <knebulae> wasn't there a question on here yesterday about how to use a couple instructions to see if you were at CPL=0?
05:12:44 <zhiayang> oh?
05:12:46 <knebulae> It didn't rely on gs
05:12:59 <zhiayang> cs?
05:13:24 <zhiayang> cpl=(%cs & 0x3)
05:16:21 <zhiayang> wait no, i need to schedule while in ring0
05:17:07 <knebulae> @zhiayang: I'm drawing blanks atm, but I *know* I came across a 3 or 4 instruction sequence that would tell you the current cpl with no side effects.
05:17:24 <knebulae> in user or supervisor mode
05:17:55 <knebulae> Unless I'm just being a hard head again, and I should read your posts before typing........
05:20:31 <zhiayang> hm, i can't find any relevant mention of 'cpl' or 'ring' on the irc logs
05:22:30 <zhiayang> when entering the critical section, nobody knows whether the previous context was ring0 or ring3, so nobody knows whether to or not to swapgs
05:22:53 <zhiayang> even though i can decide whether or not to swap when *leaving* the scheduler
05:23:14 <knebulae> @zhiayang: I know I just came across this. Give me a few.
05:23:31 <zhiayang> (potentially look upwards at the stack to find the cpu-pushed value of cs???)
05:23:46 <zhiayang> (sounds v hacky too)
05:30:26 <knebulae> @zhiayang: in order to maintain a high-level of performance, I have to think there is a method to know which transition you're making, but it may be reliant on the design of your os.
05:30:27 <nyc`> I must really be rusty if I'm tripping over my shoestrings on makefiles. Then again I am doing something a little unusual.
05:33:07 <knebulae> @zhiayang: I hate to post stack overflow, but this discussion seems relevant: https://stackoverflow.com/questions/5223813/how-does-the-kernel-know-if-the-cpu-is-in-user-mode-or-kenel-mode
05:34:18 <knebulae> most recommend just trapping a privileged instruction, but that not appropriate in a kernel context.
05:34:35 <nyc`> I guess every kernel needs a baroque build system.
05:34:48 <knebulae> @nyc: comes with the territory :/
05:35:05 <zhiayang> according to this: https://www.kernel.org/doc/Documentation/x86/entry_64.txt linux just checks the cpu-pushed cs
05:35:19 <klange> why am I awake at 2:35am...
05:35:31 <klange> My build system is ostensibly just a Makefile with a bit of extra plumbing.
05:35:37 <knebulae> @zhiayang: that presumes your operating system has the same mechanisms and flow as linux.
05:35:51 <knebulae> but maybe not.
05:35:55 <klange> And the extra plumbing just automagically generates more Makefiles for the userspace...
05:36:00 <zhiayang> well i have an interrupted stack frame, cs should always be in the same spot
05:36:12 <knebulae> right
05:36:48 <klange> https://github.com/klange/toaruos/blob/master/Makefile pretty straightforward
05:37:08 <klange> works with https://github.com/klange/toaruos/blob/master/util/auto-dep.py to automate building userspace libs/apps
05:38:51 <nyc`> I'm working on things largely related to multiple architectures.
05:38:54 <knebulae> @klange: very nice
05:39:46 <Ameisen> https://godbolt.org/z/j8gF3x
05:39:52 <Ameisen> more experiments with branching codegen
05:43:25 <zhiayang> ok, this works: https://hastebin.com/ixayibiher.pl
05:43:33 <zhiayang> i just put this on the entry/exit points of the scheduler
05:44:11 <knebulae> @Ameisen: I'm jealous. My modern C++ fu is weak, and I am envious.
05:44:28 <zhiayang> (searching 'swapgs' on github was surprisingly useful)
05:48:40 <knebulae> @zhiayang: thank you
05:48:52 <Ameisen> Also more experimentation into the differences between GCC and Clang codegen
05:48:53 <Ameisen> https://godbolt.org/z/yx--OF
05:48:53 <nyc`> My makefile stuff is relatively dull. I'm just figuring out how to line up per-arch deps on generic code (right now, just an empty main () function in C) without repeating myself too much. I'm building for sparc64, MIPS64, and aarch64 at the moment, modulo the build process being actively developed.
05:49:05 <Ameisen> note that Clang, by default, disables the inc/dec instructions for sandy bridge up
05:50:35 <knebulae> @Ameisen: other than atomics (which are obvs assembled), is it problematic?
05:50:35 <zhiayang> knebulae: whatever for?
05:51:21 <knebulae> @zhiayang: the swapgs snippet
05:51:54 <zhiayang> ah, sure
05:52:34 <Ameisen> knebulae - no, the codegen is different though
05:52:40 <Ameisen> the timings are a bti different, lengths are...
05:52:56 <Ameisen> I find the branch differences curious between clang and gcc
05:53:04 <Ameisen> clang generates the same code for three of the functions
05:53:12 <Ameisen> gcc does too... but for a different set of three functions.
05:53:45 <Ameisen> clang generates the same code for 0, 1, and 3. gcc does for 1, 2, and 3
05:53:51 <knebulae> I wonder what msvc does.
05:53:59 <Ameisen> generally worse.
05:54:09 <knebulae> Codegen is fascinating to me, but I'm just not there yet.
05:54:17 <Ameisen> MSVC also has the issue that it has no ability to mark the likelyhood of branches.
05:54:17 <knebulae> @Ameisen: so, as expected ;)
05:54:31 <Ameisen> so the b3 test case is useless ther
05:54:33 <Ameisen> ethere*
05:55:25 <knebulae> @Ameisen: I wonder if that'll still be true with the upcoming 2019 release. It's not really relevant to me as I use Clang for my codegen, but I have been ensuring I get a clean build with msvc.
05:56:34 <Ameisen> they have not added any new functionality in that regard yet.
05:56:43 <zhiayang> hm. it's 2 am, should i sleep or should i do syscalls
05:56:46 <Ameisen> right now all of my macros for that are basically nullops on msvc
05:56:49 <klange> nyc`: arch/$ARCH/, do automated discovery in that for arch-specific files, do the same for an extra include path if you have arch-specific headers, easy peasy
05:57:04 <klange> * note: haven't done this myself, toaruos is still thoroughly x86 only
05:57:05 <knebulae> @Ameisen: I really wouldn't expect them to. My inkling is that they'll move to llvm within 10 years.
05:57:17 <nyc`> Ameisen: My big wish is for GURRR etc.
05:57:22 <klange> 2019 is the year of the multi-platform ToaruOS desktop
05:57:24 <eryjus> i used lk as a template...
05:57:53 <eryjus> nyc: https://github.com/littlekernel/lk
05:58:06 <nyc`> klange: I guess I treat make as a programming in prolog sort of affair.
05:59:04 <zhiayang> it just occurred to me that i'm doing all of this per-cpu safety and stuff, but i haven't even looked into starting up APs
05:59:16 <zhiayang> it'll probably end up being a dumpster fire when i actually start them and crash everywhere
06:01:38 <Ameisen> /homeparams generates fun code on MSVC
06:01:53 <Ameisen> it forces all parameters to be copied to the stack
06:02:09 <Ameisen> knebulae - well, clang/c2 is dead
06:02:26 <Ameisen> so... there is presently no version of clang that can generate microsoft IL
06:03:20 <Ameisen> do note - in MSVC, never ever ever use /Ox
06:08:40 <nyc> Okay, I guess the thing to say that I'm doing that is a rarely-encountered wrinkle is wanting to parallelize multiple cross-builds nonrecursively.
06:16:36 <geist> nyc: you mean build different versions of the system at the same time?
06:17:15 <geist> thats usually quite possible, if you make sure that absolutely all the output goes in a separate dir, and then you name the dirs according to the configuration
06:17:31 <geist> i do it in LK all the time. you can build all projects simultaneously if youw ant
06:17:38 <geist> but.. not in the same make invocation
06:18:07 <Ameisen> if you change char_t from char to int, llvm generates the same code for each function
06:18:10 <Ameisen> which is weird
06:18:16 <Ameisen> it should be able to widen it if it wants.
06:18:37 <geist> that's likely to be part of the ABI, if the caller or the calee widens
06:19:05 <Ameisen> char_t is internal to the function. It's iterating over a const char * array
06:19:14 <Ameisen> and storing the temporaries as char_t
06:19:14 <nyc> giest: Yes.
06:22:56 <geist> nyc: yah thats not a strange requirement
06:23:06 <geist> basically allowing multiple configurations of a build to coexist is fairly common
06:27:54 <knebulae> @Ameisen: yeah, I see that. Honestly, it was worthless. Better to use msvc OR clang, not try to have one pretend to be the other.
06:28:54 <nyc`> geist: MIPS is hello worlding and sets up C. I'm on SPARC once I get the build system good enough.
06:28:57 <knebulae> @Ameisen: but I still think maintaining their own compiler in the face of llvm getting ever more powerful is just not a great long-term investment. It's a compiler. And not a great one.
06:29:13 <Ameisen> https://godbolt.org/z/Frt8cP
06:29:25 <Ameisen> not sure why it generates the extra branch
06:29:27 <geist> nyc`: grats. be warned. the register window thing you have to handle pretty fast
06:29:36 <Ameisen> (c0 == c1) already asserts they're equal
06:29:44 <Ameisen> so (c0 != 0) && (c1 != 0) should be equivlaent.
06:29:49 <Ameisen> and thus rolled into one branch
06:31:40 <geist> Ameisen: you may be talking to the wrong crowd?
06:32:02 <Ameisen> possibly. Figured y'all would be interested as well, though.
06:32:11 <Ameisen> on that note, compilers are weird.
06:32:51 <geist> usually there's some tuning reason it does something different. or at least a mis-tune
06:33:14 <geist> but compilers doing weird things is not that particularly novel. the far more interstring part is digging in and figuring out *why*
06:33:28 <geist> there are tons of debug switches to clang or whatnot where you can tell it to dump what it was thinking
06:33:31 <geist> that's more interesting
06:33:32 <Ameisen> yup.
06:33:41 <Ameisen> There's also a ton of passes that are enabled or disabled by default
06:33:46 <geist> right
06:33:50 <Ameisen> it's quite possible there's a pass that handles this case that is just disabled.
06:34:02 <geist> pointing out a compiler is generating weird code is like pointing out the sky is blue
06:41:50 <zhiayang> https://wiki.osdev.org/SWAPGS
06:42:02 <zhiayang> not sure where else to link this from, but i've added a link to the x86-64 page.
06:46:33 <geist> hmm?
06:46:51 <geist> oh you just wrote this? thanks!
06:46:52 <zhiayang> existing osdev resources on swapgs were a little... nonexistent
06:46:54 <zhiayang> yep
06:47:11 <geist> yah i found it subtle and tricky
06:47:18 <mrvn> Ameisen: it reorders it to (c0 != 0) && (c1 != 0) && (c0 == c1)
06:47:52 <geist> it's possible it reorders it that way because its actually better to microschedule
06:48:03 <geist> ie, the two comparisons can happen in parallel, and comparing against 0 may be slightly simpler
06:48:04 <mrvn> Ameisen: probably because by the time the memory read happens those two test will be deep in the pipeline and cost no time.
06:48:12 <geist> yah
06:48:19 <mrvn> and what geist said too
06:48:53 <geist> i know simple tests against constants and branch are folded into a single op on most modern x86s
06:48:58 <mrvn> Note that the streq1 reorders the tests too
06:49:08 <geist> but i suspect a two register comparison doesn't get that treatment, since they have more than one register dependency
06:50:03 <geist> zhiayang: there's another thing that's related: fsgsbase instructions
06:50:20 <geist> probably not worth writing it there, but in newer processors there are instructions to directly manipulate the MSRs, faster
06:50:31 <geist> and can be done in user space with at least fs.base and gs.base
06:50:49 <mrvn> geist: newer? That's an long mode thing iirc.
06:51:14 <zhiayang> i'm a bit curious about those; what purpose would there be for user code to modify gs/fs?
06:51:15 <mrvn> or rather all amd64 have it. 32bit may or may not have it.
06:51:25 <geist> mrvn: no. it's added around haswell or so
06:51:26 <Ameisen> geist - there are plenty of cultures that would disagree that the sky is blue (though that's linguistics)
06:51:31 <mrvn> zhiayang: fs/gs is used for TLS (thread local storage).
06:51:35 <Ameisen> which was what I originally studiewd
06:51:37 <Ameisen> studied*
06:51:46 <zhiayang> right, so why would user code want to modify the base address for its tls structure?
06:51:57 <geist> because it's a user space concern
06:52:06 <mrvn> zhiayang: because it just created a new thread and needs to store the pointer to the TLS memory.
06:52:11 <Ameisen> mrvn - even reordered, would it still be faster to have both of the checks against 0?
06:52:22 <geist> right. many kernels take a hands off approach to user space fs.base and gs.base.
06:52:29 <zhiayang> hm, i see
06:52:35 <geist> as in, give user space the ability to put what it wants in it
06:52:35 <mrvn> Ameisen: wrong question. Right question is: Would it be faster to optimize that away?
06:52:37 <zhiayang> i was under the impression stuff like that would be done in kernel space
06:52:46 <zhiayang> but i guess i'm too monolithic in my thinking :D
06:52:47 <Ameisen> mrvn: only one way to find out
06:52:50 <geist> prior to fsgsbase instructions you needed a syscall to do it, since user space couldn't directly write to it
06:53:09 <geist> fsgsbase instructions let CPL=3 code directly modify those, if enabled
06:53:10 <mrvn> zhiayang: can't be done in kernel space since all libraries need to run their TLS init code aagin.
06:53:15 <Ameisen> well, two ways, but I don't want to build a planet-sized supercomputer to figure it out.
06:53:29 <mrvn> Ameisen: 42
06:53:34 <geist> so, for example, in zircon we have a syscall that takes a thread handle and sets fsbase or gsbase
06:53:37 <zhiayang> wait a minute, so i need to save/restore user gs and user fs on context switch..?
06:53:49 <geist> user fs.base and user gs.base, be precise about it
06:53:54 <mrvn> zhiayang: only one of them, the other can only be set by the kernel.
06:53:56 <geist> and yes. you do, if you enable this feature
06:54:03 <geist> mrvn: not true. most systems allow both
06:54:11 <zhiayang> right, the actual selector doesn't matter
06:54:14 <mrvn> geist: CPU side I mean
06:54:15 <zhiayang> hm, i see
06:54:21 <geist> mrvn: explain?
06:54:33 <mrvn> geist: iirc one of them can't be set from user space through the MSR.
06:54:43 <geist> none of them can be set from user space through the MSR
06:54:53 <geist> the only way to set them from usr space is to use the fsgsbase instructions
06:54:53 <zhiayang> wrgsbase can't modify kernelgsbase is what i think mrvn means
06:55:02 <geist> ah yes that's correct
06:55:04 <zhiayang> but it shouldn't
06:55:08 <zhiayang> that would be pretty bad
06:55:19 <Ameisen> mrvn: the fact that we are not yet using mice to compile software is shameful.
06:55:29 <geist> the fsgsbase instructions can only modify the plain FS_BASE and GS_BASE, which are essentially 'owned' by the user space context
06:55:34 <mrvn> hmm, must be remembering it wrong. And yes, there is a user and kernel gsbase. So one more to save.
06:55:54 <geist> the kernel one you dont save as part of the context of the thread, it's owned by the kernel
06:55:58 <zhiayang> the kernel gsbase should be the same across all contexts, no?
06:56:03 <zhiayang> (on the same cpu anyway)
06:56:12 <mrvn> geist: unless you store something thread/process specific in it.
06:56:17 <geist> yes. of course it's confusing because when you're in the kernel you swapgs and then the kernel one is actually holding user context
06:56:39 <geist> but the point is of the 3 base contexts, two are owned by the user thread, the third one is the kernels
06:56:54 <mrvn> zhiayang: I store the current_task pointer in gsbase. So every context switch writes the new task to it.
06:57:09 <geist> the naming of the MSRs is confusing but think of it as named as if you were running ring3 code
06:57:18 <mobile_c> what does this mean in terms of code
06:57:20 <mobile_c> "To compute the base address, one determines the memory address associated with the lowest p_vaddr value for a PT_LOAD segment. This address is truncated to the nearest multiple of the maximum page size. The corresponding p_vaddr value itself is also truncated to the nearest multiple of the maximum page size. The base address is the difference between the truncated memory address and the truncated p_vaddr value."
06:57:26 <geist> ie, the GS_BASE_KERNEL is the MSR that will be holding the swapped out, kernel gsbase while a process is running
06:57:39 <geist> they should have named it GS_BASE_ALT or something
06:58:08 <geist> mrvn: be careful, it's slow. we timed it on a modern skylake and writing to those MSRs is in the order of 40 cycles or so
06:58:24 <geist> so another thing you kinda want to do is delay writing the MSRs as long as you can
06:58:29 <geist> and/or avoid it
06:58:34 <zhiayang> how so?
06:58:47 <geist> or, if present, use the fsgsbase instructions, which seem to not be serializing and thus stall out the processor for 40 cycles
06:59:06 <mrvn> geist: but it is so nice if you can access %gs:0x24 to get the threads foo value.
06:59:17 <geist> zhiayang: well one thing you can do, for example, is remember that a process has or hasn't set anything interesting in fs.base, which is possible
06:59:21 <geist> and then not save/restore it
06:59:28 <geist> mrvn: sre, but there's a cost, is all
06:59:46 <geist> i understand why it's handy. usually systems use gs: to point at the current cpu structure
06:59:53 <geist> and then the current thread pointer hangs off that
06:59:59 <geist> such that you get somehting like (gs:0x8)
07:00:09 <geist> a single deref, to get to current thread
07:00:17 <zhiayang> wouldn't checking if fsbase has something "interesting" involve reading fsbase
07:00:26 <geist> zhiayang: yes
07:00:31 <zhiayang> so... where's the savings?
07:00:37 <geist> but reading the MSR is much faster than writing it
07:00:39 <zhiayang> oh
07:00:42 <zhiayang> i see...
07:00:50 <geist> writing it is a serializing instructino, as it is to write most MSRs
07:00:55 <zhiayang> makes sense, makes sense
07:00:56 <geist> so i think iirc it takes on the orer of 40 cycles or so
07:01:18 <zhiayang> somewhat related: my asm-fu is not that great, does lea not work with segments
07:01:25 <zhiayang> so i can't use lea %gs:0, %rax
07:01:35 <geist> https://fuchsia.googlesource.com/fuchsia/+/master/zircon/kernel/arch/x86/thread.cpp#116 is our little blurb about it
07:01:53 <geist> also has the workaround about writing 0 to gs cloberring gs.base
07:01:54 <olsner> it works, it's just that the "effective address" is the part without the segment base applied
07:02:08 <geist> also uses fsgsbase if it's there
07:02:14 <zhiayang> welp, so it doesn't really work
07:02:47 <olsner> iirc the "linear address" is the name for the effective address plus the segment offset
07:02:47 <mrvn> So it just adds a useless GS prefix byte to the opcode?
07:03:34 <geist> zhiayang: yes, it's a common practice to store a pointer to itself at offset 0 in things that you point fs and gs at
07:03:41 <mrvn> geist: so restoring gs should be: if (saved_gs != read_gs()) write_gs(saved_gs);
07:03:44 <zhiayang> geist: ah, good to know
07:03:47 <geist> that way you can get a pointer to yourself by derefing fs:0 and gs:0
07:04:21 <zhiayang> does gsbase get hit when writing to gs though?
07:04:30 <geist> mrvn: correct, however it's highly unlikely that gs base will be the same between threads so it's probably not worth doing
07:04:41 <mrvn> geist: why? Is that because the C compiler can't be told to access a pointer through fs/gs so one derefs that once manually?
07:04:44 <geist> zhiayang: i forget the specific details and on which microarch
07:05:00 <geist> mrvn: yes
07:05:01 <mrvn> (why %fs:0?)
07:05:03 <geist> yes
07:05:17 <mrvn> geist: stupid compilers. I want my far pointers back.
07:05:30 <geist> zhiayang: but. the gist iirc (you should look it up and verify) is that if gs was previously nonzero and you write a zero to it, it nulls out the internal .base and other registers
07:05:39 <olsner> iirc clang has address space attributes for fs and gs-relative addresses
07:05:45 <geist> mrvn: well, or if you put an object there, or if you want to memcpy to it, etc
07:05:49 <zhiayang> "When a null selector is loaded into FS or GS, the contents of the corresponding hidden descriptor register are not altered."
07:05:54 <geist> it's not something you do all the time, but if you need to get a pointer to it you can
07:06:02 <zhiayang> (unless .base is not part of the hidden descriptor register?)
07:06:10 <geist> zhiayang: yes, but which manual are you reading?
07:06:17 <geist> i think it's a microarch detail
07:06:19 <zhiayang> ahh
07:06:23 <geist> exactly
07:06:26 <zhiayang> i prefer to use the amd one
07:06:31 <zhiayang> intel's manual is horseshit
07:06:36 <geist> yes. i think this is one of the few major intel vs amd differences
07:06:40 <geist> intel i believe does the zeroing
07:06:50 <Griwes> mrvn, you take that comment back, far pointers are why STL allocators are a thing :|
07:07:15 <geist> i say all of this because i personally got burned by it when writing this code
07:07:30 <geist> those type of details you remember much more vividly :)
07:07:58 <mrvn> Griwes: STL allocators are good for many things.
07:08:07 <geist> note to self, see if we should be doing this to fs as well, line 126
07:08:09 <Griwes> they are also not good for many reasons
07:08:45 <Griwes> their design sucks, because initially they were meant to just tag the pointers
07:08:48 <geist> but in this exact case, the trouble is that since GS_BASE is pointing at the kernels in this instant, when loading 0 into gs it blows away the kernel gs context, which is Very Bad
07:08:56 <Griwes> only C++11 started making them _actually_ useful
07:09:20 <Griwes> and designs like pmr (or a static version of that that I recently did for Thrust) are what actually makes life sensible
07:09:45 <Griwes> (because allocator becomes mostly just a pointer to an underlying structure it _actually_ allocates from)
07:10:16 <Griwes> anyway, let's go back to the usual osdev discussions :P
07:11:07 <mrvn> Griwes: I nearly used them last month but then I noticed that I can splice std::list and other node based containers. So instead of having a memory pool for a million same sized structs I now just splice them from one container to the other.
07:11:14 <mobile_c> "To compute the base address, one determines the memory address associated with the lowest p_vaddr value for a PT_LOAD segment. This address is truncated to the nearest multiple of the maximum page size. The corresponding p_vaddr value itself is also truncated to the nearest multiple of the maximum page size. The base address is the difference between the truncated memory address and the truncated p_vaddr value."
07:11:17 <mobile_c> how do i do that
07:11:34 <mrvn> mobile_c: what are you traing to do? write a linker?
07:11:40 <mrvn> trying
07:11:42 <mobile_c> yes
07:11:51 <geist> so you can see how the itnel vs amd chose different internal microarches for hidden parts of segments early on and they're stuck with it. loading a 0 into a segment disables it in classic x86 protected mode
07:11:55 <mobile_c> as my current way is apparently incorrect
07:11:58 <mobile_c> as this is my mapping function https://paste.pound-python.org/show/UUebi8ECV1eO5nU9wSwF/
07:12:12 <geist> so intel x86 slams zeros across it, and AMD probably chose to just have an enable bit and not actually zero it out
07:12:23 <geist> and normally that didn't matter, because it's all hidden
07:12:25 <mobile_c> and atttempting to assign to the base_address + roffset results in the program seg faulting
07:12:26 <mrvn> "nearest multiple of the maximum page size"? That sounds like 0xFFF should round up to 0x1000
07:13:13 <geist> max page size may be larger than that. it's a concept in binutils at least, and it's (depending on the arch) sometimes things like 64k or 2M
07:13:15 <mrvn> mobile_c: pagesize for amd64 is 2MB and not 4k.
07:13:17 <Griwes> mrvn, be advised that std::list has recently been fucked up; since C++11, size is required to be O(1), which means splicing lists together now has to iterate over the iterators to figure out how the sizes change
07:13:39 <geist> correct. you can override it with -max-page-size=foo to the linker, i believe
07:13:58 <Griwes> which is super frustrating (and basically takes away the only reason I had for wanting to use list in some situations
07:14:01 <Griwes> )
07:14:22 <geist> the idea is that if the OS or environment wanted to use large pages to map this binary, if it's linked to have padding and be aligned to the large page size, the OS could do it
07:14:23 <mrvn> Griwes: ouch. That is realy fucked up. Who made that change?
07:14:35 <geist> whereas if everything was crammed into 4K aligned pages you basically couldn't use large ones
07:14:41 <mrvn> geist: and disk space is cheap
07:14:43 <Griwes> dunno exactly why that happened, it was, so to say, before my time
07:15:01 <Griwes> (it feels interesting to be able to say that in this context :V)
07:15:03 <mrvn> Griwes: List shouldn't even have a size()
07:15:10 <geist> arm64 for example has a max page size of 64k, becaus eyou literally may be using 64k as a base page granule if you'd like
07:15:42 <zhiayang> oh no it is 3am
07:15:47 <Griwes> mrvn, it should, but it should be like it was before C++11: O(n)
07:15:57 <zhiayang> bye guys
07:16:01 <Griwes> though I guess splice() got fucked because people didn't like that
07:16:05 <Griwes> zhiayang, night
07:16:07 <mrvn> Griwes: that would be more list.length() or something different from the O(1) calls
07:16:14 <geist> yah soundslike std::list is required to store a size_ field
07:16:38 <mrvn> Luckily I only splice single elements. So still O(1)
07:16:59 <geist> zhiayang: thanks for writing the page
07:17:00 <Griwes> yeah, that's still O(1)
07:17:13 <mrvn> Griwes: splicing is O(n) if you change allocators. Otherwise iirc it is said to be O(1).
07:17:34 <Griwes> splicing has to iterate over the elements to figure out the change in sizes
07:18:09 <Griwes> it doesn't have to modify anything on that pass, but since list iterators are just bidirectional, you have to chase the pointers to calculate the distance
07:18:26 <mrvn> https://en.cppreference.com/w/cpp/container/list/splice
07:18:33 <mrvn> 3) Constant if other refers to the same object as *this, otherwise linear in std::distance(first, last).
07:18:53 <mrvn> You are right. splicing to a different list is O(n) now
07:18:55 <Griwes> yes
07:19:29 <geist> well, the c++11 ship sailed long ago
07:19:33 <geist> so probably not worth getting upset about
07:19:58 <Griwes> well, it still makes me angry, because it effectively killed most of uses of list that I was interested in
07:20:04 <olsner> according to the discussion, splice() had the same complexity requirement in older C++
07:20:24 <zhiayang> geist: no worries, it occurred to me that a lot of useful information changes hands on irc but isn’t well documented
07:20:38 <geist> yah and it's nice that someone writes it down :)
07:21:03 <geist> re: using gs in user space. it's not in the ABI, but i have seen a few runtimes, like androids JVM I believe, use it to point at additional runtime state
07:21:15 <geist> and/or maybe Go's runtime
07:21:46 <zhiayang> :o
07:21:54 <geist> we had to add it pretty quickly and treat fs.base and gs.base equivalently as stuff was brought up on zircon
07:21:58 <mrvn> Griwes: totaly. I'm not sure where I read that splice would be O(1) as long as the allocator was the same. Maybe it wasn't for List but some other node container.
07:23:02 <Griwes> there's some weird cases when you do operations on objects with non-equal allocators
07:23:18 <Griwes> it's super annoying to deal with sometimes :P
07:23:57 <mrvn> Griwes: I'm not quite sure how that us supposed to work. Copy the items over from one allocator to the other?
07:24:23 <Griwes> you basically have to reallocate
07:25:38 <mrvn> as in call the move constructor/operator.
07:25:49 <Griwes> don't ask me too many questions about this, because you may learn about things you wish you didn't know about
07:26:07 <geist> zhiayang: i'm trying to find where the intel manual mentions the zeroing behavior
07:26:11 <geist> i remember it being pretty subtle
07:26:28 <zhiayang> i can’t find it either
07:26:33 <Griwes> mrvn, like the complexity of the second overload of https://en.cppreference.com/w/cpp/container/vector/operator%3D
07:26:46 <zhiayang> there’s loads of stupid shit like that, which is why i prefer the amd manuals
07:26:53 <Griwes> at which point you start learning about allocator propagation
07:27:23 <Griwes> and it is only downhill from there
07:27:28 <zhiayang> it could be them saying that when a selector is loaded to the register, it loads the low 32 bits from the descriptor to the hidden register
07:27:33 <zhiayang> and the high 32 bits are zeroes
07:27:38 <zhiayang> *zeroed
07:27:49 <zhiayang> that’s the least subtle paragraph i could find
07:28:09 <mrvn> Griwes: ahh, nices. So moving a vector can move the allocator with it if the allocator allows it.
07:28:24 <Griwes> it'd _copy_ the allocator
07:28:43 <Griwes> if allocators compare equal, just swap the storages
07:28:58 <geist> zhiayang: yeah that's probably what it is, or in this case the null selector has a zero in it
07:28:59 <zhiayang> (so if gs was set to 0, would it load the 32-bit bass address of 0 from the null descriptor?)
07:29:01 <geist> so it loads zero
07:29:08 <zhiayang> yea
07:29:11 <Griwes> otherwise, if allocator propagates on container move assignment, free the storage, copy the allocator, steal the storage from rhs
07:29:11 <zhiayang> bloody intel
07:29:12 <geist> i think the special case is that AMD *doesn't* do this
07:29:26 <doug16k> amd loads the base from the segment descriptor
07:29:31 <Griwes> otherwise, allocate storage using your allocator and then move the *elements* of rhs into your new storage
07:29:33 <geist> but you see why the code i linked makes a special case, because in this casei fyou write zeros to gs in the kernel, it immediately trashes the kernel context
07:29:35 <Griwes> it's slightly insane
07:29:38 <doug16k> but it is limited to 32 bits wide
07:29:57 <Griwes> and implementing it correctly for CUDA and random __host__ __device__ annotated function is heaps of fun
07:29:58 <mrvn> Griwes: makes perfect sense to me
07:30:12 <doug16k> ah but maybe you meant zero segment? that's not what I meant
07:30:15 <geist> but i think the slightly more subtle detail is it only does itif you're writing zeros to something that did't previously have a zero in it
07:30:21 <zhiayang> geist: yep, that would be bad
07:30:35 <mrvn> geist: but why would the kernel load gs?
07:30:45 <Griwes> mrvn, want to hear the insane thing? std::pmr allocators don't propagate
07:30:55 <Griwes> for some fucking reason
07:31:17 <Griwes> which means std::pmr::vector move assignment can _super_ easily fall into that last otherwise
07:31:27 <Griwes> it's a mess
07:32:01 <Griwes> a mess that makes me glad that Thrust isn't _literally_ an implementation of the standard library and that we can do our own shit in a less insane way
07:32:17 <geist> mrvn: to zero it out, so that whatever user space wrote into it doesn't leak into other processes
07:32:41 <zhiayang> doug16k: the amd manual explicitly states that writing 0 to fs or gs will not clear the cached base address
07:32:41 <geist> since you can't just tell user space that i can't write to the segment registers, then you have to at least do *something* about it
07:32:47 <geist> or it's a info leak between processes
07:32:49 <Griwes> x86 is such a hacked-together mess
07:32:52 <doug16k> on smp x86_64 you should be using gs for cpu-local context, swapgs and associated msr are there for that
07:33:07 <geist> doug16k: you're a bit late to this discussion
07:33:18 <geist> zhiayang wrote up a nice page on swapgs btw, in the wiki
07:33:21 <mrvn> Griwes: wouldn't you do that once you swapped out the kernel GS and before restoring the user GS on task switch?
07:33:57 <Griwes> did you mean to hl geist?
07:33:58 <doug16k> two minutes is late. ok
07:34:02 <geist> the point is sif you ever write to fs and gs in the kernel (or anywhere for that matter) it'll trash the fs.base an gs.base msrs
07:34:05 <mrvn> geist: ^^^
07:34:10 <mrvn> Griwes: yes, too many Gs
07:34:12 <doug16k> I can leave
07:34:21 <geist> no i mean if you scroll back...
07:34:25 <geist> ack. well, sorry man
07:34:41 <zhiayang> welp
07:34:43 <Griwes> what.
07:34:51 <geist> well, i know the feeling
07:35:08 <mrvn> geist: seems like a minor thing though. The kernel would only write to fs/gs at one point, the task switch.
07:35:09 <geist> annnyway, yah i think i found the part in the intel manual that mentions that it basically always loads the .base part
07:35:16 <geist> mrvn: yes. precisely
07:35:24 <geist> which is why it's important to note this, because it will bite you in the ass
07:35:33 <geist> and it will behave differently on intel and amd
07:35:47 <geist> section 3.4.4 in the current intel manual vol 3
07:36:03 <mrvn> geist: the dangerous part is that it will probably work for years because nobody sets fs/gs in userspace unless you want to crash the kernel.
07:36:24 <geist> not true. lots of code does
07:36:36 <mrvn> 64bit code?
07:36:37 <geist> with TLS and whatnot being a thing, fs/gsbase is very important nowadays
07:36:39 <geist> of course
07:36:51 <mrvn> geist: TLS writes the full 64bit
07:36:59 <geist> yessss
07:37:04 <geist> that doesn't matter
07:37:20 <geist> oh you mean sets fs and gs *registers*. yes
07:37:31 <geist> that's true. no one should be writing to that, but you can't just block it (which would be fantastic)
07:37:37 <mrvn> yes, I mean load fs/gs the old way, which would trash the upper 32bit on intel.
07:37:58 <geist> yah usually it's fairly fatal anyway, except you can probably do it with gs all you want in most code that's not using gs.base
07:38:05 <mrvn> I blame AMD for not making loading es, fs, gs illegal.
07:38:15 <geist> but if you wrote to fs in your user program you probably immediately trash fs.base and thus you end up destroying your TLS
07:38:29 <geist> well, it can't be made illegal across the board
07:38:38 <geist> but it *could* have been made illegal in CPL=3 perhaps
07:38:38 <zhiayang> like poorly written user code will crash, as long as nothing happens to the kernel
07:39:08 <geist> you can't remove the functionality, because you still need to be able to load segments in a 64bit kernel as you're switching to a 32bit process
07:39:24 <geist> so the whole mechanism needs to work, it just doesn't get enforced in 64bit mode
07:39:40 <mrvn> geist: what happens when you load ds, es, fs, gs, ss with the NULL desciptor or 32bit data descriptor in 64bit mode? earlier we discussed that nothing checks the register anyway so (unless you use TLS) their contents doesn't matter
07:40:01 <geist> it loads the segment stuff from the descriptor
07:40:06 <geist> into the hidden parts of the regs
07:40:14 <geist> but then doesn't enforce it
07:40:17 <mrvn> and then just keeps on running, right?
07:40:21 <geist> (except for fs.base and gs.base)
07:40:42 <geist> this is so you can still preload all of that stuff just before you iret to a 32bit process or whatnot
07:41:01 <zhiayang> (i wonder if qemu emulates the intel it amd behaviour)
07:41:12 <mrvn> So do you actually need a DS segment descriptor for 64bit mode? One thing we figured out is that SS can't be null as that signals a recursion.
07:41:21 <geist> mrvn: no, not that i'm aware of
07:41:41 <geist> see the code i linked a long time ago. it zeros out ds/es/fs/gs on context switch
07:41:53 <geist> but, if you were running a 32bit process, ds had better be pointing at something valid before you switch to it
07:42:00 <geist> it's all about 32bit compatibility
07:42:01 <mrvn> geist: totaly.
07:42:33 <geist> zhiayang: good question. lemme see. probably pretty easy to find
07:42:34 <zhiayang> thought experiment: would it be possible to make an x64 processor that did 0 legacy stuff
07:42:34 <mrvn> Does loading 0 into ss cause a fault?
07:42:46 <zhiayang> mrvn: not immediately, no
07:42:56 <mrvn> zhiayang: like ARM64 that don't have 32bit support?
07:43:54 <mrvn> zhiayang: Should be easy to do. Boot it with UEFI in 64bit mode and run a Linux without 32bit compat mode and it should just work.
07:44:35 <zhiayang> mrvn: i mean not in terms of not touching legacy code
07:44:45 <geist> yah seems easy enough to do
07:44:55 <zhiayang> but like a processor that physically does not have the transistors to deal with the old nonsense
07:45:07 <geist> sure. you'd probably save a little bit too
07:45:08 <nyc`> zhiayang: Well, if you're going that far, just get ARM64. The entire arch is legacy.
07:45:13 <mrvn> zhiayang: me too. Without 32bit compat compiled in it shouldn't touch any compat stuff so a CPU that doesn't have any would work.
07:45:32 <geist> nyc`: what do you mean the entire arch is legacy?
07:45:48 <zhiayang> i’m hoping to start an aarch64 port once i get my x64 system working sufficiently well
07:46:13 <zhiayang> i’m guessing he means x86 and not arm?
07:46:19 <mrvn> geist: amd64 has too much baggage left over from 8086
07:46:28 <geist> mrvn: not news to me
07:46:39 <nyc`> geist: The 32-bit emulation issues with IA64 drove x86-64 to a large degree.
07:46:55 <geist> nyc`: i'd be a little surprised, frankly, since AMD did the x86-64 stuff
07:47:14 <geist> what i did hear was that intel supposedly had their own prototype implementation of x86-64 that did things a little differently
07:47:59 <geist> if you really wanted to drop compat you'd definitely get rid of the segment stuff, move the cpu state (user vs supervisor) into control registers so it can be saved like any other regs on iret or whatnot
07:48:09 <geist> and clean up syscall/sysenter, since those are highly tied to segments
07:48:17 <geist> and get rid of TSS. move all the IST stuff into a series of MSRs
07:48:22 <mrvn> geist: that would make a new arch, not just drop the legacy bits
07:48:34 <geist> correct, but i suspect that's the gist of the line of questioning
07:49:03 <mrvn> geist: no, I think he really ment just drop all the legacy bits that are unused in long mode and fail if someone tries to switch to 32bit.
07:49:08 <geist> this is effectively what ARM did with armv8. they decided that since the future will come eventually, when you run a 64bit EL1 (supervisor mode) it switches gears and does exceptions/etc in a completely new and cleaner way
07:49:13 <mrvn> or 16 bit for that matter.
07:49:22 <geist> which means temporarily an armv8 cpu has two sets of microcode to act like different arches in supervisor mode
07:49:32 <geist> but oncey ou fully drop 32bit mode in supervisor, then it starts to really clean up
07:49:35 <nyc> The feedback I was getting from the enterprise space was that the hardware emulation for x86-32 getting dropped or getting shaky on the IA64 roadmaps stirred a feeding frenzy of ISV's and IHV's begging for x86-64.
07:50:02 <geist> nyc: yep, and AMD stepped in to fill the gap
07:50:17 <mrvn> nyc: they should have ported their code to IA64 if they wanted speed.
07:50:29 <zhiayang> at least amd made some wise decisions
07:50:42 <geist> yah they didn't completely nail it, but they did a solid effort
07:50:51 <zhiayang> imagine if x64 was literally x86 with bigger address space and more registers
07:50:52 <zhiayang> ugh
07:50:59 <geist> at least the hackyness level of x86-64 is in line with the rest of the hackyness of the architecture
07:51:14 <nyc> zhiayang: Well, that is pretty much what it is.
07:51:15 <mrvn> they produced the least effort upgrade path. Not the best way but the most popular.
07:51:41 <geist> yep. what x86 taught us was at least at that time (80s through mid 2000s), compatibility was more important than anything else
07:51:59 <geist> and i dont think anyone predicted that quite yet, even intel didnt with ia-64
07:52:01 <mrvn> With open source that is changing though as ARM shows.
07:52:05 <geist> they assumed that the world would just come along
07:52:07 <zhiayang> nyc`: it’s marginally better
07:52:19 <geist> mrvn: correct, but even then we are still only really using 2 or 3 major arches
07:52:21 <zhiayang> geist: there’s a funny graph on the wikipedia article for ia64
07:52:24 <geist> vs the 10 or so back in the late 80s and 90s
07:52:33 <zhiayang> where intel predicted exponential growth in its jun
07:52:36 <zhiayang> *itanium sales
07:52:39 <geist> and things like big endian are going away, effectively, etc
07:52:40 <nyc> zhiayang: You're saying that as you're fiddling with segment registers on the supposed 64-bit cleanup.
07:52:50 <zhiayang> then as the years went by the graph for more and more sad
07:52:50 <geist> zhiayang: sbsolutely. i graduated college right in the middle of that
07:53:02 <mrvn> geist: but now when you get a new CPU you just download the other linux image and compile a few inhouse sources new. You don't have to buy a new compiler for $50k
07:53:06 <geist> the entire job market was full of 'help transition X to ia64'
07:53:35 <zhiayang> :o
07:53:35 <mrvn> geist: and C/C++ has become a bit more portable with stdint and friends.
07:53:40 <geist> at the same time i was really into different arches, and i saw them get killed off one by one
07:53:44 <geist> alpha, sparc, etc
07:53:50 <mrvn> I miss my alpha
07:54:14 <geist> thats why i'm pretty happy that we at least have newish things now in arm64 and riscv
07:54:25 <geist> in as much as it does or doesn't matter, it's my cup of tea
07:54:44 <zhiayang> nyc: they cleaned up some stuff at least, like having a consistent interupt stack, and ignoring segmentation
07:55:04 <geist> yep, and they didnt' bother dragging hw task switching into 64bit
07:55:14 <geist> nyc`, nyc: which handle is active here?
07:55:16 <nyc> zhiayang: How is segmentation ignored when swapgs is such a beast?
07:55:33 <mrvn> nyc: the limit isn't checked
07:55:42 <geist> well, it's not 100% ignored, but it basically is
07:55:50 <nyc> geist: nyc` is my phone, which I'm using when I get up every 5 or 10 minutes to do various things.
07:55:57 <geist> the only details are exactly what we've been blabbing about for an hour. it's the gist of it, basically
07:56:08 <geist> kay
07:56:19 <zhiayang> i’m sure part of the choices were driven by the need to remain compatible
07:56:29 <geist> in that the only parts that remain in 64bit mode are pretty much everything we talked about, not much more than that
07:56:43 <geist> which is *far* simpler than the mess that is segmentation and all the call gates and nonsense
07:56:44 <mrvn> zhiayang: as geist said you need to be able to load segments to switch to 32bit mode.
07:56:50 <geist> right
07:57:11 <geist> that i think s the main reason. you can't fundamentally get rid of GDT and LDT and all the old selectors and all that old stuff
07:57:15 <geist> because compatibility mode
07:57:31 <nyc> Those who are into numerics (e.g. myself) can get truly offended about floating point on it all.
07:57:40 <geist> from a HW point of view that's probably almost entirely just piles and piles of microcode that you can nuke if you dropped it
07:57:54 <mrvn> It's too bad they don't have a virtual 32bit protected mode where the hidden segment base is 64bit so you can access all of memory.
07:58:06 <geist> i remember talking to someone at ARM about droppined 32bit EL1 support but retaining EL0
07:58:23 <mrvn> geist: so 32bit userland but not kernel?
07:58:26 <geist> apparently that's a fairly large win. the 32bit decoder remains, but you can throw out all of that microcode dealing with the old ARM eception model
07:58:41 <geist> and all of those legacy cpu modes and banked registers and whatnot
07:58:44 <geist> mrvn: right
07:59:02 <geist> and the armv8 model is perfectly okay with that. it's flexible in that it states that you can build exactly a cpu that does that
07:59:22 <mrvn> sounds like that would save a lot. Half the registers for one thing. Although you said the 32bit regs + banked regs just show up as the 64bit regs.
07:59:31 <geist> so now the only real penalty you have is the arm32 decoder, and that's not too bad apparently
07:59:44 <geist> no yo dont get the registers back, because the 64bit arm state is >= arm32 state
07:59:59 <geist> the arm32 register state fits nicely within all of the 64bit regs
08:00:26 <mrvn> geist: that's what I said. So you don't actually save any regs, just the banking logic.
08:00:31 <geist> so once again they wer epretty forward thinking by taking the hit in the short term for needing to support two different exception/EL1 models
08:00:41 <geist> with the idea that you can eventually transition everyone off the old one
08:01:25 <mrvn> ARM design is pretty slick with their limited version backward compatibility
08:01:25 <geist> and the new 64bit EL1 model is much cleaner than the old one
08:01:42 <mrvn> +s
08:01:45 <geist> right. it's interesting talking to their engineers. they work in 5 and 10 year time frames
08:02:00 <geist> sort of interesting to think that way, but they have to plan on where things will be then, and they architect things accordingly
08:02:19 <geist> i'm fairly impressed with their engineering and their thouroughness
08:02:30 <mrvn> and my RPi is probably 10 years older than that on top.
08:02:40 <geist> exactly, which is why i wish it weren't so popular
08:02:57 <geist> its like everyone getting excited about intel atom processors, and thinking it's all you get in x86
08:03:16 <zhiayang> in 10 years: microcode is written in js and cpus are just chromium in hardware
08:03:37 <mrvn> geist: RPi3 is 64bit so that is somewhat nearer to the curve.
08:03:51 <geist> correct, except that all the damn rPi distros are still 32bit only
08:04:01 <geist> so that they dnt need to maintain two versions
08:04:03 <mrvn> Not much point with <= 2GB ram.
08:04:31 <geist> perhaps, but still, it's effectively a cortex-a7. an -a53 is about the same speed as an a7, just 64bit capable
08:05:01 <mrvn> Would 64bit mode even be faster for users?
08:05:39 * CompanionCube doesn't think the improvements are lesser than on x86
08:05:53 <mrvn> I imagine the memory overhead would cancel out any speed improvement the opcodes give you
08:06:15 <mrvn> CompanionCube: x86 is so register starved that amd64 gives a HUGE boost.
08:06:28 <CompanionCube> yep
08:06:43 <mrvn> ARM32 already has lots of registers.
08:07:09 <geist> exactly
08:07:32 <geist> i dont think it's as big of a boost, and the code density is definitely far worse than an equivalent thumb2 piece of code
08:07:54 <CompanionCube> geist: iirc archlinuxarm supports aarch64
08:07:59 <CompanionCube> though i don't use it
08:08:01 <mrvn> Which means a 64bit linux on RPi3 would be a negative for the user.
08:08:08 <geist> oh of course. ubuntu/debian/etc all have 64bit versions
08:08:22 <geist> but the raspbian distros which it hink most folks use on rpi are all 32bit only
08:08:43 <mrvn> geist: well, raspian is ARMv6. you want something newer.
08:08:43 <geist> mrvn: entirely likely
08:09:03 <mrvn> I think the RPi3 can run Debian arm.
08:09:04 <geist> i guess my summary of RPI is it's a fine board to do stuff with if you're using linux
08:09:21 <geist> if you want to low level bare metal hack, it's really crappy
08:09:22 <mrvn> geist: it's not quite fine. It's just cheap and common.
08:09:36 <geist> correct
08:09:45 <mrvn> And for hacking it's as good as most other boards.
08:09:55 <geist> i've used it in a few embedded things, and it works okayish for that
08:10:03 <geist> but right, other boards are just as good if not better for that
08:10:15 <geist> i just hate it when folks want to bare metal hack it as their first project
08:10:21 <mrvn> which leaves the price as the deciding factor.
08:10:23 <geist> i like to steer folks away from it
08:10:36 <mrvn> geist: give me an alternative.
08:10:46 <geist> odroid c2
08:10:52 <mrvn> Something that isn't using USB for everything please.
08:10:58 <geist> stuff with RK something something in it
08:11:05 <geist> that's showing up a lot
08:11:28 <geist> anything with zynq in it (though those tend to be more expensive)
08:11:36 <geist> but the xilinx docs for it are fantastic compared to almost anything else out there
08:12:02 <geist> like i said: there's hack on the board with linux and there's bare metal hacking
08:12:25 <geist> for the latter, you want something that's more standard and at least somewhat documented. the broadcomm cpus tend to fail those tests much harder that most
08:12:39 <mrvn> geist: do you have docs how to do video decoding with the MALI?
08:12:40 <geist> they're highly nonstandard and docs are not great (though folks hve figured it out)
08:12:50 <geist> nope. if you want to do bare metal video there's not much you can do
08:13:00 <geist> aside from sit on top of whatever SW goop there is on broadcomm
08:13:13 <geist> if that's a hard requirement, then you're RPI and life's a bitch
08:13:16 <mrvn> geist: yeah, that's the part I still miss. The RPi has specs for the VC and a fully free GL code for it. But I want video.
08:13:27 <geist> yep. that's a constraint that severely limits you
08:13:36 <geist> but one that frankly most folks that are just starting with osdev dont have
08:13:47 <mrvn> But the video interface for the original VC firmware is documented.
08:13:53 <geist> probably so
08:14:01 <geist> i'm not saying it's not true, i dont care to argue that point
08:14:16 <geist> it's all a big matrix of what you get with this or that board and what you need to do
08:14:28 <geist> RPi i think has far more Xs in the boxes than checkmarks, but it depends on what you want to do
08:14:53 <geist> since i'm primarily interested in low level, serial port is okay, get working on the cpu level stuff, RPi is really crummy
08:15:03 <zesterer> Hi all. Long time, no see.
08:15:25 <mrvn> geist: have you worked with ARMADA 388 socs?
08:15:41 <geist> i have not. my experience is marvell stuff is fairly clean and well designed
08:15:46 <zesterer> klange: I saw a video of you doing a talk at some university a few years back. It was a really good talk, but I can't seem to find it on YT. Do you happen to have a link so I can show a friend?
08:15:49 <geist> but they're really bad about releasing docs to the world
08:15:55 <knebulae> Well that convo went on very nicely guys. Zhiyang, nice job man. Very well written article. I have to say that my biggest question about all of this is whether it's simply more economical to run 32-bit code under vmx extensions rather than putting any code into the context switching that supports 32-bit.
08:16:11 <mrvn> yeah. I tried for a weekend to get the 2nd core started on mine and didn't manage.
08:16:31 <geist> knebulae: i seriously doubt it. the code needed to support VMX is at least a few orders of magnitude more complex than context switching
08:16:43 <geist> plus you'd have to o it all over again for SVM for AMD machines
08:16:51 <mrvn> geist: do you know if the ethernet on the odroid-c2 is USB based? I don't see it in the web.
08:16:58 <geist> nope it's a hard eth mac
08:16:59 <knebulae> @geist: true, but worth it if I can go faster.
08:17:15 <geist> knebulae: okay
08:17:35 <geist> i can tell you the answer is a hard no, but hey, worth a try right?
08:17:37 <knebulae> but I know you're right; 20-30 instructions vs. 1,000s.
08:17:49 <geist> and the thousand or so cyces it takes to vmenter/vmexit
08:17:57 <knebulae> @geist: understood
08:18:08 <mrvn> geist: No SATA though. It's hard to get a cheap board with real SATA. The BananaPi had it but they replaced it with USB-SATA in the next revision.
08:18:22 <geist> mrvn: yah we went through this whole discussion a few weeks ago. not a lot of choices if you want hard sata
08:18:40 <geist> your demands from the ARM community simply dont exist for the most part
08:19:14 <mrvn> geist: well, is it to much to ask for a GBit that isn't only 300MBit like 90% of the boards?
08:19:16 <geist> the only vendor that really makes fairly cheap arm SoCs with sata and pci and whatnot are marvell, and they're kind of like a closed state
08:19:31 <geist> they only deal with high volume customers and have no real interest in engaging with the open source community
08:19:38 <geist> mrvn: odroid c2 has that
08:19:47 <mrvn> geist: yep, one of the 10% ones.
08:19:57 <geist> and a fairly fast MMC interface. its the cheapest, simplest, straightforward quad core a53 i know of
08:20:24 <geist> it has mali too, i dunno the state of the linux drivers for it
08:20:36 <geist> but it's fairly easy to get code on the cpu. the loader is just uboot and it'll just blat you into ram
08:20:44 <mrvn> does it have 64bit secure mode and hardware VM?
08:20:49 <geist> amlogic isn't great about documenting things, but my experience is they dont do too much wonky shit
08:20:52 <geist> yes
08:21:00 <geist> all a53s have full 4 levels
08:21:18 <mrvn> I always wanted to check if the RPi actually has all 4 levels.
08:21:35 <geist> it does, but their timers and interrupt controllers are non standard and not virtualizable
08:21:45 <geist> which basically nerfs if not outright kills something like KVM
08:22:00 <geist> there are rumors that there are some patches to enable KVM on rpi, but its to workaround all the nonstandad timer/interrupt controller shit
08:22:13 <mrvn> geist: yeah. It was never ment for it. They just glued a newer ARM design onto their existing VC
08:22:27 <geist> basically all cortex-a* cores implement all of the 4 run levels and nested paging
08:22:38 <geist> of the armv8s at least
08:22:56 <geist> including all the way down to a35 at least. a32 is 32bit only, dunno if it has all of it
08:23:24 <geist> the problem is that some vendors have goop already running in El2. Qualcomm does this
08:23:45 <geist> so for those SoCs you usually can't get hypervisor level access. the cpu starts in El1 after running through all the bootloader stages
08:24:49 <geist> basically if you wait for the perfect arm board you'll be waiting a long time
08:24:54 <mrvn> geist: I know.
08:25:17 <geist> may have better luck with some newer riscv boards, since theyre somewhat targetting hobbyist and 'i wanna build a completely free desktop' crowd
08:25:25 <geist> may end up paying a ton and having crappy erformance but that's what you get
08:26:12 <mrvn> Give me an FPA that can run a riscv at 1GHz and I'm happy.
08:26:32 <mrvn> (give, not tell me how much it costs :)
08:27:33 <geist> on that's easy: get a zynq dev board
08:27:39 <geist> about $250 and you should be able to get all you want
08:27:53 <geist> oh at 1Ghz? well, that's harder
08:28:12 <mrvn> yeah. those synq are no way that fast.
08:28:24 <geist> no but you can probably get up in the 500 mhz range, maybe
08:28:36 <geist> 7 series xilinx fabric is pretty fast
08:29:00 <mrvn> wasn't the fpga based riscv board around $1000?
08:29:37 <geist> yes
08:29:50 <geist> i think it's an asic connected to a fpga
08:30:01 <geist> it's clearly a pre-production dev board. priced as such too
08:30:19 <mrvn> .oO(apt dist-upgrade ... installing riscv 1.2.3-4 ... compiling new CPU ... please reboot now)
08:30:30 <geist> no the cpu is implemented in asic on that one
08:30:36 <geist> the sifive unleased is what you're talking about
08:30:37 <mrvn> :(
08:31:01 <geist> and i dunno why you think that's funny. that's *precisely* what happens if you're using fpga emulation of the cpu you're on
08:31:17 <geist> you either reload the fabric at run time or you blat it out to a spi flash and reboot
08:31:22 <mrvn> geist: the funny part would be to apt it.
08:31:22 <geist> that's fpga work for ya!
08:31:27 <geist> ah yeah
08:31:51 <geist> whic is probably what you want. i think compiling a large riscv thing like that would easily be in the hour or more size
08:32:20 <mrvn> you probably would download the binary image and not the source.
08:32:44 <geist> right
08:33:13 <geist> these sort of large fpga compiles are precisely why large workstations exist
08:33:19 <mrvn> And you need some small ARM or something as boot processor that then loads up the fpga. Can't reprogramm itself while it's running, right?
08:33:24 <geist> it's one of those things that really makes you wish you had an incredibly beefy machine
08:33:38 <geist> yah a zynq would be neat for that, because that's exactly what you get
08:33:49 <geist> you could boot linux on the arm side of the world and then dynamically load the rest of it
08:33:57 <geist> even shut it own, reload it, restart it
08:33:59 <mrvn> geist: yesterday my factorio game crashed on save because 16GiB ram just wasn't enough.
08:34:28 <immibis> the flash chip can be modified while the FPGA is running since the FPGA isn't actually using it. woe betide you if it doesn't work though
08:34:42 <geist> yah you'd have to go back to jtag to reload it
08:35:04 <mrvn> immibis: you mean have a 2nd small FPGA for flashing the full one?
08:35:09 <immibis> no different from recompiling a kernel and having to hope it works. maybe they can have a backup flash chip
08:35:16 <geist> yeah
08:35:19 <immibis> mrvn: no? the FPGA can rewrite the boot flash because it's not actually using the boot flash
08:35:34 <immibis> mrvn: the boot flash is read into RAM when the FPGA starts up
08:35:43 <mrvn> immibis: ahh, that you mean. It should have enough flash for more than one image so you can boot the previous one.
08:35:49 <geist> right. generally what the fpga does is it has some hard codeed logic to load its image out of the flash, but after that it leaves it alone
08:36:30 <mrvn> Or flash and SD card with a switch to set boot order.
08:36:37 <geist> yep
08:36:51 <mrvn> You would play with the SD card and the flash would be backup.
08:36:58 <immibis> if your FPGA supportrs reading the bitfile from the SD card
08:37:05 <immibis> of course you could also just have a small microcontroller to do that
08:37:32 <immibis> if you have a micro load the bitfile you can have whatever shenanigans you want
08:37:54 <mrvn> immibis: even something as simple as an AVR could do that.
08:38:05 <immibis> yes, it could'
08:40:05 <mrvn> time for supper. What will it be today? Chicken+Chinakohl on rice or Pig with bacon and salbei. Choices, choices.
08:41:03 <mrvn> chicken has the lower best-until date so chicken it is.
08:41:15 <mrvn> Back in 30-40 minutes *wave*
08:41:39 <geist> same, i need to go try to unhook a draping tree from a phone line outside
09:16:01 <nyc> My all-singing all-dancing makefiles are close to being there.
09:19:47 <nyc> I can live without cpp-generated deps until I actually have files that include other files.
09:22:48 <mrvn> nyc: WTF? generated debs is the second thing you implement.
09:22:57 <mrvn> DEPENDFLAGS := -MD -MP
09:23:19 <mrvn> include $(wildcard *.d)
09:23:20 <mrvn> done
09:23:39 <mrvn> nyc: do you have out-of-tree build yet?
09:24:27 <nyc> mrvn: I have to redirect the output to computed build dirs and also run the rules for generic code through m4 to support simultaneous build on multiple targets.
09:24:53 <mrvn> nyc: VERBOSE/silent flag? builds recursiley in subdirs? LTO?
09:25:12 <nyc> mrvn: I also have no way to test it because nothing actually includes anything else yet.
09:25:39 <nyc> mrvn: I wrote my makefiles nonrecursively.
09:26:39 <mrvn> nyc: why do you need m4? I've implemented out-of-tree builds and all compiled now happen in _build-<arch-tripplet>. So different targets don't conflict.
09:27:36 <mrvn> nyc: no stdarg.h and alloca.h yet?
09:27:57 <nyc> mrvn: It's an artifact of having to generate dependencies for the nonrecursive build. It's not particularly difficult, it's just obscure enough to not be documented anywhere.
09:28:19 <mrvn> nyc: implement printf. It's a usefull function and you can include krpintf.h
09:28:47 <nyc> mrvn: I'm doing hello world out of just registers and setting up the C stack for a bunch of different arches. I'm only on my second arch at the moment, SPARC.
09:29:33 <mrvn> I heard. But how do you test the C setup if you don't printf("Hello %s!\r\n", "World");?
09:29:45 <nyc> mrvn: When I do that for all the arches I have in mind for the first pass, my plan is for the next thing to be enough interrupt handling for a gdb stub.
09:30:22 <nyc> mrvn: I hand-wrote the assembly to output a constant string to serial with just registers and no stack accesses on MIPS.
09:30:49 <mrvn> nyc: hehe, that's how I always start too. Step one: output 'x' on serial over and over.
09:31:06 <mrvn> Just for fun smetimes I switch to '#' or 'X'
09:31:55 <nyc> mrvn: Well, I'm getting used to the toolchains and emulators and just plain doing code again after 10 years of mostly not writing a line of code.
09:32:32 <nyc> mrvn: Plus it sets up the arch target etc. basics.
09:34:34 <nyc> I think a good first pass would be the 64-bit variants of ARM, MIPS, OpenRISC, RISC-V, and SPARC, though 32-bit MIPS is probably the "ultimate target architecture" for the algorithms I'm looking to implement.
09:35:39 <nyc> I'll throw in x86 and VAX once I get far enough along to bring up secondary arches.
09:36:30 <nyc> You know, POWER would be a good idea to have going, too.
09:37:21 <nyc> x86 and VAX both actually have fine points (PAE, large gap in page sizes, tiny page size) that make them meaningful demonstrations of my algorithms as well.
09:37:36 <mrvn> my list is arm, arm64, amd64, mips, m68k, ppc
09:38:16 <nyc> mrvn: I'm not really distinguishing between 32-bit and 64-bit variants of the same general architecture.
09:38:46 <mrvn> for bootstrap they are pretty different.
09:39:21 <nyc> mrvn: Page size spectra are relatively aligned.
09:40:08 <mrvn> page tables, cpu+mmu ctrl flags, interrupt+exception handlers. Pretty much a rewrite.
09:41:06 <nyc`> mrvn: Which arches?
09:41:20 <mrvn> all of them
09:41:45 <nyc`> I've never actually done early boot on anything but ia32 before.
09:43:50 <nyc`> IA64 was not only running C but outright running userspace on SDV's by the time I got to it.
09:44:43 <nyc`> The earliest things ever went down on my retrocomputing farm was after console output.
09:45:45 <nyc> The earliest in boot things had trouble for me was when I was doing pgcl because the boot-time ia32 asm had constants calculated based on PAGE_SIZE.
09:47:21 <nyc> I guess Russell King had me debug a little bootmem on his Netwinder and I can't remember the name of the guy at SGI who gave me logs of the boot-time timer interrupt livelock.
09:47:47 <mawk> when I boot linux with correct segment register but bad GDTR will it work ?
09:48:11 <mawk> eg will linux overwrite GDTR with its own thing
09:48:32 <mrvn> it will overwrite everything
09:48:49 <mawk> nice
09:48:52 <nyc> I thought the GDTR was where the segment attributes were gotten from.
09:48:56 <mawk> yeah
09:48:58 <mawk> but they're cacehd
09:48:59 <mawk> cached
09:49:06 <mawk> so unless linux reloads them it will be fine
09:49:54 <mawk> the selector inside the gdt array is in %cs %ds %es %fs %gs %ss, alongside with the cached value of the gdt entry
09:50:03 <mawk> and the gdt itself is at address indicated in %gdtr
09:50:53 <mrvn> the kernel, with mutliboot, gets called in 32bit mode. It sets up ad gdt, loads it and reloads all segments to switch to 64bit mode. There, all gone.
09:51:12 <mawk> yeah I call it in 32 bits mode directly
09:51:38 <mawk> I provide it with its struct boot_params, a heap, stack, correct segment selectors
09:51:43 <mawk> then it's supposed to take on from that
09:51:57 <mawk> but I wasn't sure if the gdtr itself must be correct
09:52:24 <mawk> I'm making a thin linux loader for kvm
09:52:33 <mawk> eg what qemu is doing when you call with -kernel I guess
09:52:56 <mawk> read that hideous bzImage thing, split it apart, load the correct stuff, configure, and run it
09:53:41 <mawk> also there are a few other registers I'm not sure about, eg the apic_base register
09:54:17 <mawk> am I supposed to make up a reserved mmio area for tha apic thing and set it up on both the vCPU and the virtual APIC ? or should I just let kvm do its business ? the docs are pretty lacking
09:54:23 <mawk> the docs are "coming in some time" it's said
09:55:57 <nyc> It may take me a bit to figure out where the serial port is on Niagara.
09:57:39 <nyc> mrvn: What are your kernel's design goals?
09:57:56 <mrvn> microkernel with message passing IPC
09:58:21 <nyc> mrvn: Amoeba 2.0?
09:58:31 <mrvn> everything async
09:58:40 <nyc> Everything async is definitely good.
09:59:43 <mrvn> also everything KISS
10:02:04 <nyc> mrvn: That one scares me.
10:02:14 <mrvn> you like overly complex?
10:03:04 <nyc> mrvn: I'm more MIT than New Jersey.
10:06:26 <nyc> I'm probably actually more perfectionistic than the Wikipedia-stated MIT philosophy.
10:25:01 <jjuran> I wouldn't mind switching away from Raspberry Pi for the various reasons stated, but I have a concerns about graphics. With RPi I can change the display resolution with an ioctl on /dev/fb0, and the GPU will auto-scale to fill the screen. This doesn't work on either of the desktop PC's I've tried, so apparently it's not standard functionality.
10:25:30 <mrvn> jjuran: Normaly the monitor can do that too
10:26:28 <mrvn> It works on the RPi because the framebuffer is just a texture for the VC that gets rendered on every frame.
10:27:29 <mrvn> But seriously speaking: Why would you ever want to use something other than the monitors native resolution for the display?
10:27:56 <jjuran> mrvn: For running an application that only works in, say, 512x342.
10:28:15 <mrvn> jjuran: why would you port such a stupid application to your own kernel?
10:28:35 <jjuran> Not porting — emulating.
10:29:00 <mobile_c> what does "0x200e78 to the nearest multiple of 0x1000" mean
10:29:09 <mrvn> then get GL to work and stick the framebuffer in a texture and let the GPU scale it.
10:29:16 <mobile_c> what does "0x200e78 truncated to the nearest multiple of 0x1000" mean *
10:29:24 <jjuran> I've resorted to centering the emulated screen in the default case, and leaving fullscreen as an option.
10:29:58 <jjuran> That sounds like a lot more work than "make one ioctl call"
10:29:58 <mrvn> mobile_c: nearest would be 0x201000 but I think they mean 0x200000
10:30:30 <mrvn> jjuran: just make an ioctl for it
10:31:44 <jjuran> mrvn: I'm not deving the native OS: https://www.v68k.org/advanced-mac-substitute/
10:33:14 <mrvn> if your running that on linux then just let xorg change the video mode.
10:34:37 <jjuran> framebuffer, not X
10:39:13 <geist> mobile_c: that's round down and round up, basicaly
10:40:00 <geist> but yes, their terminology is not great
10:40:21 <geist> usually with these sort of things in programming you almost always want a hard round up or round down
10:40:27 <geist> not 'find the nearest, higher or lower'
10:40:37 <mrvn> A little bit of pseudocode would make it so much clearer: res = addr & (pagesize - 1), done. no questions left.
10:40:52 <mrvn> +~
10:41:01 <geist> well, be careful there. that already implies a bunch of fancy bits
10:41:11 <geist> as in it's a power of 2, its using 2s compliment, etc
10:41:12 <nyc> Hmm. I'm not seeing a SPARC virtualspace reservation scheme. It looks like it pretty much just has bits in the MMU for cache and privilege attributes and comes up with the MMU off to give a kernel time to initialize paging.
10:41:17 <mrvn> geist: but bits every programmer knows.
10:41:34 <geist> well, sure, but that's not necessarly the sort of thing ou want to put in a documen
10:41:40 <geist> if you're documenting it you document precisely what it is
10:41:45 <geist> not using clever programming tricks
10:41:54 <mrvn> geist: you want to put in both.
10:42:04 <geist> so that there's no ambiguity, and in 20 years the document still stands, etc
10:42:16 <geist> it's one of he differences between good and bad docs
10:42:27 <geist> stuff written by programmers tends to be too clever. you get a tech writer to make it clearer
10:42:30 <mrvn> Like the ARM docs with pseudocode what the CPU does for an instruction step by step
10:42:47 <geist> right, and in that case it all works, because they can precisely define what it means
10:43:03 <nyc`> SPARC bringup looks like it's going to make MIPS bringup look like a cake walk.
10:43:14 <mobile_c> so i just either hardcode it to round down or up?
10:43:23 <geist> mobile_c: yes, almost assuredly
10:43:27 <geist> depending on how you interpret the statement
10:43:31 <mobile_c> ok
10:43:34 <mrvn> mobile_c: pretty sure it is down in that case
10:43:40 <geist> same
10:44:09 <mobile_c> which would be appropriate for this context:
10:44:13 <mobile_c> An executable or shared object file's base address is calculated during execution from three values: the virtual memory load address, the maximum page size, and the lowest virtual address of a program's loadable segment. To compute the base address, one determines the memory address associated with the lowest p_vaddr value for a PT_LOAD segment. This address is truncated to the nearest multiple of the maximum page size. The
10:44:15 <mobile_c> corresponding p_vaddr value itself is also truncated to the nearest multiple of the maximum page size. The base address is the difference between the truncated memory address and the truncated p_vaddr value.
10:44:31 <mrvn> mobile_c: the idea behind that whole complex sentence is that the lower bits of the address remain as offset inside the page in both the file and in memory. The upper bits, multiples of page size can be anything.
10:44:35 <geist> yah 'truncate to the nearest' is almost assuredly meaning 'round down'
10:44:58 <geist> and of course wit power of 2 on a binary machine, rounding down is trivial
10:45:33 <mrvn> mobile_c: have you considered simply using libelf?
10:47:23 <mobile_c> so it would be this? https://paste.pound-python.org/show/k9MwAlFCfBzNEijtp87B/
10:47:40 <nyc`> I can probably get to hello world with paging off, but there will be linking issues with jumping to code in kernel virtualspace at addresses usable at runtime.
10:49:07 <geist> mobile_c: that is some seriously hard to read code
10:49:21 <mawk> linker script nyc`
10:49:32 <mawk> you can play with virtual vs physical and all kind of things
10:49:38 <mawk> but the syntax is pretty rebutting
10:50:17 <geist> those two if/else clauses can be combined for sure
10:50:32 <nyc> mawk: Hmm. Actually, I don't think there's an issue now that I think about it. I can just 1:1 map kernelspace like MMU disabled/bypassed and everything in userspace runs in other ASID's.
10:52:43 <mawk> hm
10:52:47 <mawk> yeah I guess
10:54:31 <nyc> mawk: It's still a bit of a bridge of asses to cross to have to handle TLB misses on kernelspace in software.
10:55:55 <nyc> At least if I'm reading this right.
10:55:58 <mawk> why do you have to do that ?
10:57:13 <nyc> mawk: TLB misses are handled in software in general and there doesn't appear to be any kind of special arrangement for the kernel to avoid having to handle them like exists on MIPS.
10:58:41 <mawk> so it's up to the kernel to maintain its own page tables or something ?
10:58:48 <mawk> the hardware can't read the page tables itself ?
11:01:20 <nyc> mawk: It wasn't designed to.
11:01:34 <mawk> I see
11:02:31 <geist> yes thats common
11:06:43 <nyc> I'm pretty sure I can do everything out of registers like on MIPS, and so avoid register windowing issues.
11:06:51 <geist> for simple asm, sure
11:07:37 <nyc> (1) bitbang hello world out the uart (2) set up registers & stack for C (3) call main() is all that's going on.
11:08:03 <geist> alrighty
11:17:30 <nyc> The OpenBIOS page dates from 2013 and building that from source seems to be the closest approximation to a supply of firmware images.
11:22:03 <nyc> I sort of punted on getting MIPS firmware and went for the completely unrealistic system with no firmware. This is going to turn into a mess and I'm tired from having stayed up all night. Fingers crossed #qemu will turn up something.
11:32:42 <mobile_c> so the base address would be (nil) ?
11:32:44 <mobile_c> o.o
11:34:40 <mobile_c> library[library_index]._elf_program_header[lowest_idx].p_paddr = 0x000000200e78
11:34:42 <mobile_c> library[library_index]._elf_program_header[lowest_idx].p_vaddr = 0x000000200e78
11:40:29 <mobile_c> The base address is the difference between the truncated memory address and the truncated p_vaddr value.
11:53:36 <nyc> mobile_c: Most of the time p_paddr and p_vaddr are the same. It only makes a difference for kernels and such.
11:55:27 <zesterer> klange: Do you happen to have a link to that talk you did some years ago?
11:55:53 <klange> never did a talk; did a talk at a former employer, it's way back in my youtube channel
11:57:15 <klange> never did a talk at a uni*
11:57:51 <zesterer> Hm, I can't seem to find it. Looked through your channel history several times. I might be derping though
11:59:09 <klange> oh for some reason it's unlisted... I don't recall marking it as such https://www.youtube.com/watch?v=Wp5kl-NfpM8
11:59:15 <klange> it's super outdated
11:59:26 <klange> for, like, a lot of reasons
11:59:37 <FireFly> hehe
11:59:39 <zesterer> Ah, thanks very much! And yeah, I'm aware. I just wanted to show a friend, since I found it really interesting when I watched it.
11:59:53 * mrvn just ordered a "Intel 660p Series SSD 2TB QLC PCIe NVMe 3.0 x4 - M.2 2280". Lowest price / GB of all listed cards.