Search logs:

channel logs for 2004 - 2010 are archived at http://tunes.org/~nef/logs/old/ ·· can't be searched

#osdev2 = #osdev @ Libera from 23may2021 to present

#osdev @ OPN/FreeNode from 3apr2001 to 23may2021

all other channels are on OPN/FreeNode from 2004 to present


http://bespin.org/~qz/search/?view=1&c=osdev&y=18&m=11&d=11

Sunday, 11 November 2018

12:14:45 <aalm> .roa
12:14:45 <glenda> 95 Expand or die.
12:14:48 <aalm> .theo
12:14:48 <glenda> No way.
12:16:46 <klange> sounds about right
12:49:32 * geist yawns
01:45:19 * eryjus is away: Gone away for now
01:46:22 * eryjus is back
02:45:27 * eryjus is away: Gone away for now
02:48:52 <Mutabah> eryjus: Could you disable that script? It's useless line-noise
02:50:28 * klange is away: hunting down eryjus
03:14:24 * eryjus is back
03:15:58 <Mutabah> eryjus: Good. Now can you turn that script off?
03:15:58 <eryjus> klange: lol!!
03:16:32 <Mutabah> (A better script is one that auto-replies if you're messaged while away)
03:16:36 <klange> eryjus: seriously though please don't post away messages as actions
03:16:59 <eryjus> already done
03:18:38 * aalm never left
03:19:18 <eryjus> i was only using what was available by default in the client -- i've been on IRC for only about 24 hours by the clock... still learning
03:19:41 <aalm> np.
03:22:27 <Mutabah> Was that option enabled by default?
03:24:01 <eryjus> no, but wasn't sure what it would print... On the webchat, I know I could hide all those messages
03:54:42 <klys> sup
03:55:02 * klys is now running linux-4.19.1
03:57:07 <klange> toaru-1.8.0-17fca7a0
11:54:22 <cormet_> hi
11:55:45 <cormet_> how does paging-structure caching work? are all PML4/PDPTE/PDE entries per process cached?
11:56:13 <lkurusa> i don't think they are cached per se
11:56:21 <lkurusa> the address translation result is cached in the TLB
11:56:38 <lkurusa> i.e., when you access $virtaddr, and it translates to $physaddr via the pagewalk
11:56:45 <Mutabah> Well, they are kinda... the CPU (most architectures) stores individual mappings in the TLB (as lkurusa said)
11:56:46 <lkurusa> then $virtaddr -> $physaddr will be stored in the TLB
11:56:58 <cormet_> In addition to the TLBs, a processor may cache other information about the paging structures in memory
11:57:10 <cormet_> this is what I read from intel doc, so it is not TLB cache, but something else?
11:57:23 <isaacwoods> I mean it's normal memory, it can probably be cached
11:57:31 <isaacwoods> When a page-walk happens
11:57:37 <Mutabah> Maybe it caches parts of the page walk?
11:57:41 <bcos_> cormet_: Yes - newer CPUs have "paging structure caches" (to cache higher level stuff like PDPT entries)
11:57:42 <lkurusa> Yah
11:57:59 <lkurusa> I remember hearing about what bcos_ just said
11:58:09 <cormet_> bcos_: what exactly does this cache contain? the phys addresses of all paging structures?
11:59:15 <bcos_> cormet_: Exact contents depend on which CPU (might be PDPT entries, might be PML4 entries, might be both or neither or something else; might be the whole entries, might just be parts of it)
11:59:40 <isaacwoods> interesting, didn't know there were specialised caches!
11:59:46 <isaacwoods> cormet_: check out http://kib.kiev.ua/x86docs/SDMs/317080-003.pdf
11:59:57 <cormet_> isaacwoods: this pdf seems to be very old
12:00:14 <bcos_> ..Intel's manual is carefully worded to allow this - using words like "may cache" without any clue if it does/doesn't, so that Intel are free to do things differently in future
12:00:55 <cormet_> bcos_: so, this cache is completely handled by CPU, nothing from kernel side?
12:01:04 <lkurusa> cormet_: it might be old, but it looks like what you are trying to learn more about
12:01:27 <cormet_> I will take a look at this doc then, thanks :)
12:02:10 <lkurusa> cormet_: from a quick glance at the doc:
12:02:10 <lkurusa> Software should rely on neither their presence nor their absence.
12:02:13 <bcos_> cormet_: Sort of - when you use INVLPG (to invalidate TLB) CPU might also invalidate higher level paging structure caches; but apart from that it should be mostly transparent (OS can mostly ignore it)
12:02:19 <lkurusa> (sorry, wow that paste broke)
12:02:55 <cormet_> ok, so I don't care about those cache...yet :)
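The "paging-structure caches" bcos_ describes can be pictured with a toy model (hypothetical Python, purely illustrative — real CPUs keep these structures in undocumented, implementation-specific forms): the CPU may remember the result of the upper levels of a walk, keyed by the high virtual-address bits, so a later access in the same region can skip the PML4/PDPT/PD lookups, and INVLPG may drop the covering entry.

```python
# Toy model of a paging-structure cache. With 4-KiB pages and a 4-level
# walk, bits 47:21 select the PML4E/PDPTE/PDE; caching the page-table
# pointer found for those bits lets later lookups in the same 2 MiB
# region skip three memory accesses.

class PagingStructureCache:
    def __init__(self):
        self.pde_cache = {}   # top bits of vaddr -> phys addr of page table

    def lookup(self, vaddr):
        key = vaddr >> 21     # bits above the 2 MiB region offset
        return self.pde_cache.get(key)

    def fill(self, vaddr, pt_phys):
        self.pde_cache[vaddr >> 21] = pt_phys

    def invlpg(self, vaddr):
        # INVLPG may also invalidate the covering paging-structure entries
        self.pde_cache.pop(vaddr >> 21, None)
```

Two addresses in the same 2 MiB region share a cached walk; invalidating either drops it for both.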
12:03:18 <cormet_> I am curious to understand how it works with PCID
12:04:57 <bcos_> PCID is conceptually simple - instead of having all "things" (TLB entries, paging structure caches) depend on the current CR3 and flushing it all when CR3 changes, it just adds an "ID" to the tag of all those things so they don't need to be flushed when CR3 changes (it just checks if the tag is right when the thing is used)
12:06:19 <cormet_> it is still not so clear to me; with PCID, when cr3 changes, only TLB entries of the old cr3 address space are flushed, right?
12:06:23 <bcos_> In practice it's messy though (it adds up to a lot more complexity for things like "multi-CPU TLB shootdown" because you can't assume that a translation for a different virtual address space isn't cached)
12:06:33 <bcos_> No..
12:06:41 <bcos_> When CR3 is changed, nothing is flushed
12:06:54 <bcos_> (if PCID is being used)
12:07:18 <lava> even if not using PCID cr3 changes are suspiciously fast
12:07:37 <mrvn> cormet_: when the TLB tries to match an address it also matches the current PCID against the PCID stored in the TLB entry
12:07:39 <lava> i'd bet they have some optimizations even when not using PCIDs
12:07:46 <lkurusa> oh no, not another sidechannel
12:08:12 <bcos_> lava: Well.. there's "global" pages too
12:08:20 <cormet_> but PCIDs are limited to 2^12 values, when does the CPU decide to flush TLB entries out?
12:08:44 <mrvn> that many?
12:09:23 <cormet_> so, with more than 4k processes, how does it work? :)
12:09:44 <bcos_> cormet_: CPU uses some kind of "(pseudo?) least recently used" to evict TLB entries when it needs to make space for new ones; but apart from that (when PCID is used) it doesn't flush any TLB entries unless the OS tells it to
12:09:52 <mrvn> cormet_: INVLPG flushes entries and I bet there is some PCID opcode to invalidate a PCID
12:10:33 <cormet_> ok, so it is up to the OS
12:10:38 <mrvn> yes.
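The tagging scheme bcos_ and mrvn describe can be sketched as a toy software model (hypothetical; entry formats and eviction policy are implementation details on real CPUs): each TLB entry carries a PCID in its tag, so switching address spaces makes the other space's entries miss without flushing them.

```python
# Toy PCID-tagged TLB: entries are keyed by (pcid, vpn), so a CR3 switch
# to a different PCID flushes nothing -- the old entries simply stop
# matching until that PCID is current again.

class TaggedTLB:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = {}              # (pcid, vpn) -> pfn, insertion-ordered

    def insert(self, pcid, vpn, pfn):
        if len(self.entries) >= self.capacity:
            # evict the oldest entry (real CPUs use some pseudo-LRU)
            self.entries.pop(next(iter(self.entries)))
        self.entries[(pcid, vpn)] = pfn

    def lookup(self, current_pcid, vpn):
        # tag check: only entries whose PCID matches the current one hit
        return self.entries.get((current_pcid, vpn))

    def invlpg(self, pcid, vpn):
        # explicit OS-requested invalidation of a single translation
        self.entries.pop((pcid, vpn), None)
```

A "CR3 switch" here is just changing which PCID you pass to `lookup`; entries for other PCIDs stay cached until evicted or explicitly invalidated.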
12:16:25 <lava> cormet_: see figure 2 and the text on that part here: https://gruss.cc/files/prefetch.pdf
12:18:44 <cormet_> lava: nice. Are those paging struct caches per process? if I invalidate a pcid, are these paging caches also flushed?
12:19:16 <lava> per core
12:19:36 <lava> these caches are what is casually called "the TLB"
12:20:03 <lava> it's just a multi-level cache, which makes a lot of sense for the implementation
12:20:20 <cormet_> intel docs says "in addition to TLBs"
12:20:40 <lava> yeah, but people usually mean the entire thing when they speak about "the TLB"
12:21:11 <cormet_> so, this paging cache can contain page structs from different processes too
12:22:10 <cormet_> how complicated this part of the CPU is :) I have a lot of questions, like how do IPIs handle this? if a process switches to another core..
12:22:42 <lava> well, generally, at one point in time only one process can be scheduled on a core
12:22:56 <lava> for hyperthreading they have some mechanism iirc
12:23:18 <lava> maybe it was tagging which hyperthread it was or so, i don't remember what exactly it was
12:23:20 <cormet_> so if a process switches from core1 to core2, the core1 paging cache is flushed to avoid de-sync?
12:23:49 <cormet_> (a desync can happen if core2 changes some page permissions, for example)
12:30:09 <lava> not sure
12:30:15 <lava> lots of things to try there
12:31:16 <mrvn> cormet_: the OS has to keep them in sync
12:32:03 <cormet_> or just flush the paging cache for this process on core1
12:32:17 <lava> the os would have to flush it mostly
12:32:33 <lava> or: architecturally the OS has to
12:33:01 <lava> what happens actually in the hardware can only be found out using side channels i guess ^^
12:34:33 <mrvn> cormet_: remember that processes can have threads running on multiple cores. If thread1/core1 munmap()s some pages then you have to send an IPI to all cores the process runs on to invalidate the pages.
12:34:34 <cormet_> I am not at you level yet to check this :P
12:35:02 <cormet_> mrvn: right
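mrvn's munmap scenario can be sketched as OS-side pseudocode (a hypothetical model in Python, not any particular kernel's implementation): the unmapping core must interrupt every core that may hold a stale translation and have it invalidate the entry locally.

```python
# Sketch of multi-CPU TLB shootdown: on munmap, IPI every core that might
# cache a stale translation; each target drops the entry from its own TLB.

class Core:
    def __init__(self, core_id, active_process):
        self.core_id = core_id
        self.active_process = active_process
        self.tlb = {}                     # vaddr -> paddr translations

    def invalidate(self, vaddr):
        self.tlb.pop(vaddr, None)         # models INVLPG in the IPI handler

def tlb_shootdown(process, vaddr, cores, send_ipi):
    """send_ipi(core, handler, arg) models delivering an IPI that runs
    handler(arg) on that core. Returns the cores that were targeted;
    the caller must wait for their acks before reusing the frame."""
    targets = [c for c in cores if c.active_process == process]
    for core in targets:
        send_ipi(core, core.invalidate, vaddr)
    return targets
```

Note this only targets cores currently running the process; with PCIDs the OS may instead have to target any core that could still cache entries for that address space, which is the extra complexity bcos_ mentioned.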
04:28:30 <john_cephalopoda> Hi.
04:30:08 <SopaXorzTaker> john_cephalopoda, have I brought you there? :D
04:30:16 <SopaXorzTaker> Go on, poke fun at my ideas :3
04:31:26 <bluezinc> SopaXorzTaker: what ideas?
04:32:30 <SopaXorzTaker> bluezinc, in #forth I proposed my idea of writing an OS around a very lightweight virtual machine for something Forth-like
04:32:43 <SopaXorzTaker> (with drivers and such being interpreted, of course)
04:32:48 <bluezinc> I've actually been considering something similar.
04:33:03 <john_cephalopoda> The platform-dependent code is not very big. It would have to define some basic words, the rest could be done in Forth.
04:33:07 <SopaXorzTaker> If done properly, imagine how secure that approach can be
04:33:58 <SopaXorzTaker> Like, writing a network stack in C without serious security experience is probably a terrible idea
04:34:03 <john_cephalopoda> DUP, DROP, OVER, SWAP, +, -, *, /, !, @, the same for floats. Maybe also c@ and c!.
04:34:16 <SopaXorzTaker> john_cephalopoda, what's ! and @?
04:34:18 <john_cephalopoda> KEY and EMIT must be defined, too.
04:34:34 <SopaXorzTaker> john_cephalopoda, + strings and arrays for convenience
04:34:38 <bcos_> Erm
04:34:44 <john_cephalopoda> SopaXorzTaker: ! saves to memory, @ loads from memory.
04:35:08 <john_cephalopoda> What would + do on strings and arrays?
04:35:12 <SopaXorzTaker> john_cephalopoda, like pop data; pop addr; mem[addr] = data?
04:35:17 <bluezinc> john_cephalopoda: concatenate?
04:35:28 <SopaXorzTaker> john_cephalopoda, no, I suggested implementing strings and arrays
04:35:34 <SopaXorzTaker> but "+" would concat them indeed
04:35:42 <bcos_> Instead of security problems of the underlying hardware + security problems of the OS; you want underlying hardware + security problems of the VM + performance disaster + security problems of the OS?
04:35:49 <john_cephalopoda> bluezinc: Could also add element n of array 1 with element n of array 2.
04:36:25 <SopaXorzTaker> bcos_, well, yes
04:36:29 <bluezinc> john_cephalopoda: I'd rather consign that to ".+"
04:36:38 <SopaXorzTaker> that sounds a bit more like a sound approach
04:36:42 <john_cephalopoda> Strings in ANS Forth are usually done in a way that you have the base address and the length given.
04:37:13 <SopaXorzTaker> but I'm probably playing a young, stupid cypherpunk here by being naive about the potential security issues of the VM itself
04:37:19 <bluezinc> SopaXorzTaker: I'd also assume that you'd need some kind of JIT/compiler so the performance isn't completely terrible.
04:37:27 <SopaXorzTaker> john_cephalopoda, how do you alloc memory in ANS?
04:37:47 <SopaXorzTaker> bluezinc, well, for practical purposes, yes
04:37:55 <SopaXorzTaker> but the point is building around a VM
04:38:05 <SopaXorzTaker> how that VM would work is entirely implementation-defined
04:38:23 <john_cephalopoda> There is "ALLOT", which basically reserves memory.
04:38:45 <SopaXorzTaker> john_cephalopoda, and free?
04:38:48 <john_cephalopoda> I wouldn't define too many things in the base word set. Appending arrays and strings is something that should probably be reserved for an extension.
04:38:54 <bluezinc> isn't half the point of writing in a higher level language _not_ needing malloc()?
04:39:39 <SopaXorzTaker> bluezinc, Forth is as high-level as you want it to be
04:39:51 <john_cephalopoda> I am not sure if there is a "free".
04:40:04 <SopaXorzTaker> like, it can be treated as assembly for a stack machine
04:40:05 <bluezinc> SopaXorzTaker: right, so why would you need to worry about memory allocation?
04:40:18 <SopaXorzTaker> bluezinc, for potentially implementing a VM in it?
04:40:21 <SopaXorzTaker> s/in/for
04:40:49 <john_cephalopoda> In Forth you got a pre-reserved memory area and a pointer for where in that area you are.
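The memory words discussed above can be modeled in a few lines (a Python sketch of the semantics for illustration, not a real Forth): `!` and `@` are exactly the "pop data; pop addr" operations SopaXorzTaker guessed, and `ALLOT` just advances the dictionary pointer into that pre-reserved area.

```python
# Tiny model of Forth memory words: a cell-addressed memory array, a data
# stack, and a dictionary pointer (HERE) that ALLOT advances.

class MiniForth:
    def __init__(self, cells=1024):
        self.mem = [0] * cells
        self.stack = []
        self.here = 0                 # next free cell in the dictionary area

    def store(self):                  # !  ( value addr -- )
        addr = self.stack.pop()       # addr is on top of the stack
        value = self.stack.pop()
        self.mem[addr] = value

    def fetch(self):                  # @  ( addr -- value )
        self.stack[-1] = self.mem[self.stack[-1]]

    def allot(self):                  # ALLOT  ( n -- ) reserve n cells
        self.here += self.stack.pop()
        # ANS Forth has no matching "free"; a negative ALLOT can give
        # dictionary space back, but only in LIFO order
```

So `42 3 !` stores 42 at cell 3, and `3 @` pushes it back — which also shows why there's no general `free`: allocation is just a moving pointer.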
04:40:50 <SopaXorzTaker> that would require a lot of higher-level code for everything that isn't handled natively
04:40:52 <bluezinc> wait... so you're writing a VM with the hypervisor being in forth?
04:41:01 <SopaXorzTaker> >> s/in/for
04:41:17 <bluezinc> to be fair, I probably wouldn't go for pure forth, per se...
04:41:23 <SopaXorzTaker> I'm not writing anything yet, just thinking about how esoteric one can get with OSdev
04:41:40 <SopaXorzTaker> >> inb4 someone mentions TempleOS
04:41:43 <bluezinc> SopaXorzTaker: it's not that esoteric...
04:41:53 <glauxosdever> Use brainfsck
04:41:58 <bluezinc> I've had stranger ideas before.
04:42:12 <glauxosdever> It definitely checks the fs on the brain medium
04:42:14 <SopaXorzTaker> glauxosdever, brainfsck?
04:42:17 <john_cephalopoda> Forth isn't esoteric. It has been used before to write small OSes.
04:42:42 <SopaXorzTaker> And brainfsck would of course be a Turing-complete interpreted language that is programmed in by deliberate filesystem corruption
04:42:43 <john_cephalopoda> And there is even an OS that uses colorforth.
04:43:08 <glauxosdever> lol
04:43:10 <bluezinc> the great thing about forth/rpl/etc. is that they're very easy to expand.
04:43:30 <bluezinc> so you can just go defining new datatypes/etc.
04:43:34 <glauxosdever> So, corrupt the fs, then check how corrupt it is and attempt to "fix" it
04:43:57 <aalm> .roa
04:44:07 <SopaXorzTaker> glauxosdever, hopefully causing memory corruption in fsck and making it do what you want
04:44:13 <bluezinc> aalm: what?
04:44:26 <aalm> nvm.
04:44:36 <john_cephalopoda> Modify CRC sums and let the program correct the data so in the end you come out with the data you want.
04:45:08 <glauxosdever> Hm, self-modifying code that corrupts itself
04:45:18 <aalm> *corrects
04:46:32 <virtx> hello
04:47:03 <virtx> what is the name of intel sdm manual for arm?
04:49:09 <froggey> you want the ARM ARM (Architecture Reference Manual)
04:49:51 <john_cephalopoda> Nice name.
04:54:14 <bluezinc> why not just call it the ARM^2
04:54:34 <bluezinc> (pronounced arm-square, like arm-chair)
05:34:57 <pZombie> hello friends
05:40:04 <pZombie> Why does the memory benchmark i use show the L1 cache go from ~2000 GB/s to ~600 GB/s when i turn off 5 out of 8 cores, when L1 caches isn't supposed to be shared?
05:41:46 <pZombie> hm, it might actually compute the value by summing up all the cores' scores
07:17:34 <virtx> on x86 with 4 level pages, the bit R/W is present in all array level? (PML4, PDP, PDE, PTE)
07:24:03 <pZombie> that A12 apple CPU is crazy. Not just the CPU but the GPU on it as well. It is almost as fast as the fastest intel IGP, the p580
07:25:24 <pZombie> should rate at about 1-1.2 teraflops
07:27:05 <pZombie> i wonder how difficult it would be to get OS X running on the A12
07:37:42 * pZombie drops a pin
07:38:01 <clever> pZombie: https://homes.cs.washington.edu/~bornholt/post/z3-iphone.html
07:40:38 <geist> virtx: pretty sure yes. an inner level with R set overrides the lower level
07:41:38 <virtx> geist: why does this perm exist for dirs also and not only for pages?
07:41:51 <radens> geist: do you display the name for the Instruction Fault Status Code value in the ESR? I think I found a bug in qemu and want to make sure it doesn't affect zircon.
07:44:18 <geist> virtx: not sure i understand that question
07:44:38 <geist> radens: probably not. lets see
07:45:09 <geist> what are you seeing? there's some funny business with IFSR that took me a while to figure out
07:45:43 <pZombie> clever It's just one benchmark where A12's low latency cache helps a lot. There are probably benchmarks where intel would wipe the floor with A12. But the point is that A12 has more than enough performance, both as CPU and GPU to suffice for a desktop that even allows for latest games to be played at low resolutions and frames
07:45:59 <virtx> geist: I mean, why is the W bit needed for PML4, PDP and PDE entries and not for PTEs only? for example if there is a PML4 with W=0, it doesn't mean that all PDP, PDE and all PTEs are read-only, right?
07:46:03 <pZombie> and it does all this with a fraction of the energy consumption
07:46:12 <geist> virtx: yes, that's a feature
07:46:42 <geist> that lets you map an existing sub page table structure and override the read/write bits
07:46:49 <geist> or at least make it more restrictive
07:47:20 <clever> pZombie: yeah, cache is the main thing that lets it win
07:47:45 <virtx> geist: but it is useful only for write perm, right? because if a PDE has W=0, it is possible to have a PTE W=1?
07:48:06 <geist> virtx: i forget the details. you should consult the intel and amd manual for futher details
07:48:10 <radens> geist: it's like an off by one error when qemu does its encoding.
07:48:17 <geist> radens: can you be more specific?
07:48:37 <geist> i thought you were about to point out that it doesn't fully decode the instruction in EL1. and that turns out to be how it's designed
07:48:39 <virtx> geist: I will try to find the right doc
07:48:54 <geist> it only fills out the instruction decode stuff at EL2
07:49:10 <geist> but, we use it in the hypervisor bits, so i'm pretty sure its correct
07:49:13 <radens> geist: look at target/arm/internals.h where it encode ARMFault_Permission in arm_fi_to_lfsc
07:49:18 <radens> then look at this link:
07:49:41 <radens> infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0488c/CIHIDFFE.html
07:49:53 <radens> 4.3.50. Exception Syndrome Register, EL1 and EL3
07:50:19 <geist> okay. so what differs?
07:51:41 <radens> Wait never mind, they start numbering the levels at 1 not zero.
07:52:07 <geist> ah
07:53:30 <clever> pZombie: and on the subject of L1 caches being shared, my AMD FX(tm)-8350 is claiming to share some of the L1's with pairs of cores, due to its partial hyperthreading design https://imgur.com/a/azWYuWl
07:53:40 <radens> still I'm seeing 0xc for that bit, which is not a valid status code and would happen when fi->level is zero. So there's a bug, but not in the encoding.
07:53:56 <mrvn> virtx: The higher level allows setting the mode for a big region so the MMU doesn't have to page walk the whole way.
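The rule geist and mrvn are describing can be stated directly (an illustrative sketch; the full access-rights rules, including CR0.WP and supervisor-mode accesses, are in the Intel SDM Vol. 3, "Access Rights"): for user-mode writes, a page is writable only if R/W is set in every entry of the walk, so clearing W in one upper-level entry makes the whole subtree effectively read-only.

```python
# Effective user write permission across a 4-level x86-64 page walk:
# W=0 at any level (PML4E, PDPTE, PDE, PTE) makes the access read-only,
# regardless of the bits in the levels below. This is what lets an OS
# share an existing page-table subtree while mapping it read-only.

def user_writable(pml4e_w, pdpte_w, pde_w, pte_w):
    return all((pml4e_w, pdpte_w, pde_w, pte_w))
```

So the answer to virtx's question: a PTE with W=1 under a PDE with W=0 is legal to construct, but the resulting user mapping is still read-only.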
07:54:00 <clever> pZombie: each core has its own L1d, but then shares the L1i with its neighbor
07:54:52 <pZombie> clever - neighbor as in logical core neighbor or real physical core?
07:55:07 <geist> clever: yes the first few bulldozer designs did that. eventually i think by piledriver they split the decoders
07:55:13 <clever> pZombie: i think the fx8350 blurs the lines with that definition
07:55:27 <geist> at the end of that line i believe only the fpus ended up shared, the rest of the front end is pretty independent
07:55:32 <mrvn> pZombie: obviously physical. the cache will be placed between the two units.
07:56:19 <clever> L2's are also shared between the same pairs, and likely between them as well?
07:56:24 <geist> but the effect of having shared L1i/L1d caches definitely kicks in when hyperthreading is involved
07:57:03 <mrvn> If you have threads doing the same thing then the shared cached might even be helpful.
07:57:31 <clever> my laptop is showing very clear signs of hyperthreading, https://i.imgur.com/T9DIYii.png
07:57:41 <clever> lstopo renders it as 4 cores, rather than the 8 the desktop has
07:57:53 <clever> i7-7700HQ for the laptop
07:57:59 <geist> yep
07:58:09 <geist> that's a pretty standard looking intel design.
07:58:11 <mrvn> what? only one socket?
07:58:17 <pZombie> well, on wikipedia it says that my 5960x only shares the L3 cache. However, when i run the AIDA64 benchmark, turning off cores results in much lower cache bandwidth.
07:58:22 <clever> mrvn: let me fetch the router...
07:58:40 <pZombie> much lower L1 cache bandwidth*
07:59:08 <geist> hmm, what does the kaveri look like (/me goes to boot the machine)
07:59:38 <clever> mrvn: https://i.imgur.com/U0fYAaE.png my router is a dual-socket Intel(R) Xeon(TM) CPU 3.20GHz, cpuinfo doesnt say much more
08:00:27 <mrvn> clever: boring. you need sockets, cores and threads to make the biggest picture
08:00:44 <clever> that reminds me....
08:00:50 <geist> http://tkgeisel.com/pics/thunderx2.topo.png anyone?
08:01:17 <clever> mrvn: i have ssh to a friends box, with 24 cores, lstopo word-wraps and becomes unreadable
08:01:36 <mrvn> see, geist has real hardware.
08:02:13 <clever> https://i.imgur.com/Iu3vgbu.png
08:02:31 <clever> geist: holy crap, lol
08:02:34 <geist> http://tkgeisel.com/pics/kaveri-topo.png is the little derpy kaveri amd machine
08:03:03 <pZombie> to settle this issue, i would have to find some benchmark which relies strongly on fast L1 cache, then run the benchmark on a single core only. Then turn off 6 out of 8 cores in the bios and run the benchmark again on a single core. If it runs slower with the 6 cores turned off, then the L1 cache is probably shared in one way or another
08:03:11 <geist> http://tkgeisel.com/pics/cavium-topo.png is a fun one too
08:03:14 <geist> that's a thunder x1
08:04:06 <mrvn> pZombie: run a memcpy with one core. then run it again with a different second core also doing memcpy of a different region for each core.
08:04:17 <mrvn> pZombie: or simply check the topology
08:04:20 <clever> geist: wow, lol
08:04:32 <geist> oh interesting, the thunderx2 one doesn't dump the cache. probably lstopo doesn't know about that architecture
08:04:50 <geist> but it's pretty straightforward, each of the cores has as fairly traditional L1/L2 stack, and there's a L3 per socket
08:04:54 <geist> much more intel-looking
08:05:04 <pZombie> mrvn - but if memcpy also relies on L3 cache, then i might get bad results, because L3 cache is definitely shared
08:05:29 <mrvn> pZombie: L3 cache is way slower
08:05:30 <clever> geist: does that X1 have a separate set of ethernet cards for each numa node?
08:06:19 <geist> yeah, i think each socket has like 8 eth interfaces built into the SoC, those are the ones at pci 1774:
08:06:28 <mrvn> pZombie: don't even memcpy, just read memory the size of the cache in a loop. If a second core shared the cache the speed will dive.
08:06:32 <pZombie> mrvn and how exactly do i check the topology?
08:06:35 <geist> we also put in an e1000 you can see
08:07:27 <mrvn> pZombie: lstopo
08:07:43 <geist> the cavium ethernet interface is i think fully supported by linux, but for whatever reason the guy that installed it wanted an e1000 as well
08:08:28 <geist> http://tkgeisel.com/pics/varmit-topo.png is a pretty boring dual socket xeon workstation
08:09:02 <geist> this stuff is all fresh on my mind because we're curently designing the topology code for zircon, and there are a bunch of edge cases
08:09:14 <geist> especially with split LLCs and whatnot
08:09:22 <mrvn> My cpu is much more interesting: Machine (32KiB) CPU#0 +-- serial
08:09:49 <geist> http://tkgeisel.com/pics/ripper-topo.png being a pretty bad case
08:10:01 <geist> since it's dual socket (effectively) and then has a split LLC. all of the zens have that
08:10:17 <geist> so detecting the split LLC and appropriately accounting for it requires some smarts
08:10:19 <mrvn> geist: no shared L2 caches?
08:10:37 <pZombie> wonder if i can check with lstopo on a virtualbox
08:10:49 <pZombie> brb, need to reboot this virtualmachine with all cores enabled
08:11:02 <geist> virtualbox will.... okay he's gone
08:11:06 <geist> mrvn: on which one?
08:11:14 <mrvn> geist: the last one
08:11:26 <geist> the ripper? no. it's just like two ryzen cores on a single chip
08:11:38 <geist> http://tkgeisel.com/pics/ryzen-topo.png is a plain ryzen 1700x
08:11:51 <mrvn> worst case could be L1i shared 2 ways, L2 shared 4 ways and L3 shared 8 ways.
08:12:11 <geist> yah. so as far as i know nothing like that exists yet, where there's multiple levels of split caches
08:12:22 <geist> at least up to the point that you leave the 'die'
08:12:41 <geist> so i think most systems check for a split LLC (last level cache) prior to going off die
08:12:44 <mrvn> Is there a system with L4 cache yet?
08:13:11 <geist> there are, but they're exceedingly rare. intel shipped almost a one-off core back 5 years ago or so that had a large L4 victim cache
08:13:23 <lkurusa> what's that app that draws you those topologies?
08:13:27 <geist> there was even an article written about it on anandtech the other day, seeing if the L4 actually held up
08:13:31 <clever> mrvn: https://en.wikichip.org/wiki/intel/crystal_well
08:13:41 <clever> mrvn: i know somebody that has a laptop with an L4 cache of 128mb
08:13:58 <geist> yep. they apparently abandoned that, since i think they only made one of those
08:14:11 <geist> lkurusa: lstopo is the app, it's part of the hwloc package on debian at least
08:14:28 <lkurusa> geist: sweet - thanks
08:15:00 <pZombie> my topology https://imgur.com/a/xJcOQD8
08:15:05 <geist> but... the L4 cache was not interesting topology-wise, it was just a large 4th level cache that completely encompassed the entire L3
08:15:21 <geist> pZombie: that's not correct
08:15:27 <geist> did you run that in virtualbox?
08:15:35 <pZombie> yes
08:15:52 <geist> cache topologies are largely meaningless inside a VM, since it presents its own take
08:16:04 <geist> most VMs just export a multi socket machine with a bunch of single cores
08:16:10 <geist> like this
08:16:10 <pZombie> even with vt-d extensions?
08:16:29 <mrvn> pZombie: Please use a service that doesn't track you a million ways
08:16:41 <geist> it's entirely up to the VM to present whatever topology it wants, doesn't matter what tech it uses
08:16:56 <pZombie> i am already tracked in a million ways. 1 million +1 won't make a difference
08:16:57 <geist> most VMs will not expose the internal topology of the host machine, since you're running on a subset of cores (presumably)
08:17:20 <geist> and gives the VM the ability to decide what host cores to run things on
08:17:30 <pZombie> technology has made us all transparent. No way to escape it anymore
08:17:31 <lkurusa> https://ahti.space/~lkurusa/images/1471385a.png
08:17:37 <lkurusa> my topology isn't too nice haha
08:17:43 <lkurusa> intel core 2 quad
08:18:05 <geist> ah yes. was just thinking about that. that was basically two dies glued together
08:18:12 <geist> well, same die, but two designs stamped out
08:18:21 <geist> Q6600, what a workhorse. i loved it
08:18:38 <geist> that was one of my favorite machines, served me well in the late 2000s
08:19:08 <geist> i have a marvell ARM machine (macciatobin) that has a split design like that too
08:20:21 <geist> http://tkgeisel.com/pics/macciatobin-topo.png though it doesn't know the numbers, it's because there are two L2 caches
08:20:26 <lkurusa> This is a Q9550, still a great machine
08:20:33 <geist> indeed
08:20:49 <geist> it's like my old 2010 era nehalem box. E5520. still a trooped
08:21:01 <geist> trooper. i think it'll end up being my longest running machine in service
08:21:02 <pZombie> great is relative. Only if someone else has to pay your power bill
08:21:08 <geist> since i still can't find a good reason to replace it
08:21:22 <geist> http://tkgeisel.com/pics/four-topo.png
08:21:40 <pZombie> the reason is a low electricity bill, which in the course of 2 years would probably buy you a new system
08:21:53 <geist> not this again
08:22:22 <pZombie> yes Mr 30kwh per day
08:22:26 <geist> fuck off
08:22:30 <geist> seriously, just fuck off
08:22:39 <pZombie> the truth hurts
08:23:09 <pZombie> really?
08:24:19 <pZombie> people are so sensitive those days.
08:24:58 <rain1> yeah
08:25:50 <lkurusa> wow dbus-daemon just segfaulted
08:25:53 <lkurusa> brought my entire machine down
08:27:26 <jjuran> related: https://www.youtube.com/watch?v=z9nkzaOPP6g
08:27:48 <glauxosdever> Pretty sure the energy consumption of your computer is just a tiny fraction of the electricity bills (unless you have a 16-core server)
08:28:29 <rain1> yikes
08:28:33 <aalm> pZombie, i run my 8core 4.7ghz amd 24/7 :]
08:28:55 <jjuran> https://upload.wikimedia.org/wikipedia/commons/b/b7/Sun_Starfire_10000.jpg
08:29:38 <pZombie> aalm - That does not tell me much. If this machine is performing work 24/7 and is of the latest generation, then there is nothing wrong with that
08:30:00 <aalm> like 5years old atleast i guess
08:31:25 <pZombie> aalm - but if for example you were instead using some old q6600 to do 24/7 work, you would end up consuming about 2.4kwh more daily than with an equivalent modern machine. Which is only 50 cents more per day on average, depending on how much you pay per kwh. But in a year, this would be around $185
08:31:49 <pZombie> in two years, $365 - the cost of a laptop
08:32:03 <aalm> laptops suck anyway
08:32:56 <lkurusa> $365 == cost of a laptop?
08:33:00 <lkurusa> huh? in what world
08:33:11 <aalm> 4th
08:33:12 <lkurusa> typing this on a $2500 ssh client
08:33:19 <aalm> lol
08:33:20 <pZombie> well, you can build a server with $365 i guess. In the case of having a q6600 machine, you would have to just replace the motherboard+ram+cpu
08:33:33 <glauxosdever> I got mine for 600€, without Windows preinstalled
08:33:44 <lkurusa> sure enough that's more reasonable
08:34:05 <glauxosdever> That makes it about 650$ I think?
08:34:17 <pZombie> well, you can get one of the newest intel NUCs with an 8th generation CPU + iris plus 655 graphics for $280
08:34:29 <lkurusa> that's not a laptop
08:34:44 <pZombie> well, there are even cheaper laptops than $365
08:34:52 <lkurusa> used ones?
08:34:58 <pZombie> nah, new ones
08:35:01 <lkurusa> for real?
08:35:04 <lkurusa> link me one please
08:35:54 <lkurusa> i highly doubt i can work on that machine effectively so the electricity savings of $365 over two years are not worth my time or loss in efficiency, imo
08:36:23 <pZombie> https://geizhals.de/lenovo-v130-15ikb-81hn00jage-a1840624.html a quick look, this one is €399 with 8gb ram and 256gb ssd. I can find cheaper ones for sure
08:36:41 <pZombie> good i5-7200u cpu too
08:37:33 <glauxosdever> Well, I think there are some really cheap ones. But I wouldn't assume they are actually useful (unless you ditch the preinstalled software and only use a basic desktop environment -- forget about doing complicated work with blender, gimp or audacity)
08:37:44 <lkurusa> ^
08:37:51 <lkurusa> exactly
08:37:58 <virtx> mrvn: yes, but is it possible to have a higher level read-only and a lower w/r?
08:38:56 <pZombie> https://www.newegg.com/Product/Product.aspx?Item=N82E16834316607 8th generation 8250u cpu. Refurbished ok, but who cares?
08:39:32 <lkurusa> isn't U an underclocked mess?
08:39:32 <glauxosdever> lol, that first one says the OS is FreeDOS, yet Windows 10 is shown
08:40:31 <glauxosdever> Well, the second one says 1.60 GHz
08:40:34 <pZombie> 8250u has a single core performance of 141 in cinebench r15 and multi 568.9 https://www.notebookcheck.net/Intel-Core-i5-8250U-SoC-Benchmarks-and-Specs.242172.0.html
08:40:50 <pZombie> it has the same single core performance as my 5960x
08:41:19 <pZombie> and about half the multicore, even though it has only 2 cores vs 8 mine
08:41:45 <pZombie> 1.6ghz is only the base frequency
08:41:57 <pZombie> 3.4ghz with turbo boost
08:42:41 <pZombie> what's not so good about the 2nd laptop i linked is that it uses only one 4gb module, which means they castrated it
08:42:50 <glauxosdever> ^ That will use more electricity.. ;-)
08:42:54 <pZombie> no dual memory bandwidth
08:43:10 <glauxosdever> Turbo boost, that is
08:44:05 <glauxosdever> Anyway, why do you care that much about the electricity bills?
08:45:14 <pZombie> i am just pointing out that it does not make sense to use old hardware, when it would cost you the money of a more modern system in electric bills within the course of 2-3 years
08:45:26 <pZombie> and after that you would be stuck with an old system still
08:46:30 <pZombie> this laptop i linked will have more than twice the performance a q6600 and consume less than 1/4 of the energy while doing so
08:46:39 <pZombie> q6600 has*
08:46:46 <glauxosdever> You are probably overestimating the energy consumption of your machine
08:47:15 <pZombie> no i am not, because i am not estimating. I am going by data available
08:47:56 <ArsenArsen> where are AH=42h INT 12h error codes defined (those stored in AH if CF is set)
08:50:03 <glauxosdever> Did you mean: INT 13h?
08:50:08 <ArsenArsen> I do, excuse my typo
08:50:48 <glauxosdever> http://www.ctyme.com/intr/rb-0606.htm#Table234
08:51:49 <ArsenArsen> 0Ch unsupported track or invalid media - yeah that aligns with the test error I produced
08:51:51 <ArsenArsen> thanks!
08:52:02 <glauxosdever> np :-)
08:56:03 <pZombie> Suffice to say that i am getting annoyed at informing people about their bad decisions and being told to fuck off
08:57:06 <aalm> .roa
08:57:37 <lkurusa> it broke again, huh
08:57:39 <aalm> glenda<3r.i.p.
08:57:48 <lkurusa> 2018 - 2018
09:00:53 <aalm> i wonder if mischief is just saving up on the electricity bill
09:01:16 <lkurusa> holy shit i'm laughing so hard
09:01:30 <mischief> i just dont have it autoreconnecting on the shell
09:01:35 <mischief> .theo
09:01:35 <glenda> No way.
09:01:39 <lkurusa> .ken
09:01:39 <glenda> A well installed microcode bug will be almost impossible to detect.
09:01:43 <aalm> .roa
09:01:44 <glenda> 113 Always have sex with the boss.
09:01:50 <lkurusa> oookay
09:01:54 <aalm> -_-
09:07:00 <geist> hrm, the pdp-11 thing wedged up
09:50:31 <geist> huh, was just thinking. in the old days when linux (or whatever) would emulate x87 fpu state for you in kernel space
09:50:42 <geist> i guess it had to keep a copy of the virtual fpu regs in kernel space
09:51:05 <geist> since it's having to emulate the entire state, not just the lack of instructions
09:52:00 <dormito> geist: was it actually done in kernel space (and not say, as part of a signal)? (I definitely was not using linux at the time)
09:52:19 <geist> i think it could provide emulation for you, yes
09:52:25 <geist> not that it'd necessarily be a good idea
09:54:25 <dormito> IMHO, and in retrospect, I think basically everything about the x87 was terrible
09:54:35 <geist> yeah pretty much
09:57:17 <geist> https://lwn.net/Articles/728382/ last comment is interesting: older pre-sse machines don't necessarily see sse instructions as illegal
09:57:38 <geist> never occurred to me that that would happen, but i guess they recycled the rep prefix for sse
10:04:07 <geist> ah i see. so sse instructions are pretty much all [0x66, 0xf2, 0xf3] 0x0f ....
10:04:17 <geist> where 0x66, 0xf2, 0xf3 are all old prefixes
10:04:40 <geist> basically it's those prefixes against the 0x0f opcode which was the legacy 2 byte opcode escape byte
10:04:49 <geist> gross!
10:05:55 <geist> and f2 and f3 are repne/rep
10:07:08 <geist> looks like all the VEX stuff starts off with a 0xc4 or 0xc5 prefix (the old LES/LDS opcodes) and then escapes out from there
10:18:26 <mrvn> geist: I guess you have to check the cpu for being new enough before using SSE opcodes.
10:23:44 <geist> yah
10:25:56 <mrvn> how likely are you to find such an old system still in working order, able to execute a modern kernel+libc+software? You probably don't even have enough ram to boot.
10:27:33 <mrvn> well, I'm off with 1kg of lasagne and an episode of Dr Who.
10:29:19 <geist> godspeed!
10:29:40 <lkurusa> fun evening
10:29:41 <lkurusa> enjoy!
11:26:28 <geist> well, that's interesting. so even on a 386 a manual context switch (between supervisor mode threads) is still quite a bit faster than a tss
11:26:50 <geist> ie, 20500 switches/sec vs 12345 (on this particularly slowass 386sx)
11:27:24 <geist> i guess i shouldn't be surprised. and the tss switch can still be technically a bit more useful since it can do a direct ring transfer
11:29:45 <geist> and it's substantially faster on qemu emulation, which also makes sense. the tss emulation i guess just takes a lot of logic
11:54:26 <eryjus> poll for those who have been there: I am happy with v0.1 of my microkernel for now and ultimately my plan is to support other architectures -- would you recommend going wide and adding an architecture first, or deep and building out other functionality first?
11:54:50 <geist> probably adding at least one more is useful
11:55:09 <geist> it tends to point out where your architecture abstractions are weak
11:55:38 <eryjus> I'm certain there is a ton of opportunity there....