Search logs: #osdev2 - 12 May 2022

#osdev2 = #osdev @ Libera from 23may2021 to present

http://bespin.org/~qz/search/?view=1&c=osdev2&y=22&m=5&d=12

Thursday, 12 May 2022

00:23:00 <geist> well been looking at it
00:23:00 <geist> it's not running LK :( but it seems to be riscv
00:23:00 <geist> gsp.bin is a straight up riscv64 ELF file, though stripped
00:24:00 <geist> looks like the core doesn't support compressed instructions FWIW
00:24:00 <klange> imagine being in a position where you can be disappointed it's not running _your_ thing ;)
00:24:00 <geist> i do know that nvidia does like to generally use LK, but seems to be mostly in the Tegra department
00:31:00 <heat> from what I've read in #dri-devel they were speculating that the newer archs got open-sourced because they have riscv cores
00:32:00 <heat> which make them able to do a lot more stuff
00:32:00 <heat> (possibly, the stuff they don't want to open source)
00:33:00 <geist> yah that's what graphitemaster was saying. you can move the secret bits into riscv and now the kernel driver may be sending more high level stuff to it
00:33:00 <geist> it's not ideal, but it does mean you can at least use say iommus or whatnot to keep it from doing damage
00:33:00 <clever> that kinda sounds like how the old broadcom GL stack worked on the rpi
00:33:00 <geist> since at least that part of it that's interacting with the kernel is known
00:33:00 <heat> unfortunately this driver really can't be open sourced in its current state
00:34:00 <clever> where the opengl library was just an RPC shim, forwarding nearly every api call to the firmware blob
00:34:00 <Griwes> yes, a non zero chunk of the binary blob kernel driver has been offloaded to the on-board cpu, and that's a reason why all this is feasible in the first place
00:35:00 <Griwes> and arguably a lot of what has been offloaded should've been in firmware in the first place :p
00:35:00 <heat> the secret(tm) bits
00:35:00 <Griwes> vOv call it what you want :P
00:36:00 <Griwes> there's always secret bits in almost any device you plug into your computer
00:36:00 <heat> yup
00:36:00 <heat> i honestly don't get why they're secret in the first place
00:36:00 <heat> maybe I'm just too naive
00:37:00 <geist> i am pleased to see that the firmware is not encrypted or otherwise that hidden
00:37:00 <geist> someone can easily decompile it and try to figure out what its doing
00:38:00 <heat> clever, btw making direct opengl calls to a device sounds horrible
00:39:00 <Griwes> eh, it's a piece of the card. that's the correct way to look at it. but I know that some people will make a crime out of not open-sourcing every goddamned bit of the product
00:39:00 <clever> heat: its got kinda lower latency, due to the firmware core being on the same silicon as the arm core
00:40:00 <heat> Griwes, my way to look at it: it's just some firmware, is it really that critical to the IP and the card's performance?
00:40:00 <Griwes> yes
00:40:00 <Griwes> yes, it is
00:40:00 <clever> in the case of the broadcom gl stack on the pi, it contains the GLSL->QPU compiler
00:40:00 <heat> ok I stand corrected then
00:41:00 <clever> so there could be secret sauce in its optimizer
00:41:00 <clever> where you can make the shaders run better
00:41:00 <heat> I'm mostly more picky with the machine's actual firmware, I don't really mind device firmware
00:42:00 <clever> with a functional iommu, you can firewall off the device
00:42:00 <Griwes> I believe that all of our compiler bits are in userspace driver libs
00:42:00 <heat> like, memory training isn't rocket science, and so isn't an x86 board
00:42:00 <heat> intel figured it out, amd did as well, so do other smaller companies
00:42:00 <clever> Griwes: the rpi's mesa gl stack moved shader and CL generation into userland
00:43:00 <heat> shader compilation in userspace is the only way to do it tbh
00:43:00 <heat> sane way
00:43:00 <clever> the kernel driver just audits the CL's to ensure they are valid, and replaces opaque tokens with physical addresses
00:43:00 <Griwes> anyway, my main gripe is with people who somehow are fine with proprietary silicon, but not with proprietary bits of software on that same silicon?
00:43:00 <clever> so your opengl cant abuse the 3d driver as an arbitrary physical memory write primitive
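The audit-and-relocate step clever describes can be sketched roughly as follows. This is a minimal illustration, not the actual rpi kernel driver: the `Command` layout, the `BufferTable` type and the `validate_and_relocate` name are all hypothetical. The point it shows is that userland only ever names buffers by opaque handle, and the kernel rejects any list that mentions a handle it did not hand out.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical command-list entry. Userland fills `operand` with an opaque
// buffer handle; after relocation it holds a physical address.
struct Command {
    std::uint32_t opcode;
    std::uint64_t operand;
};

// Hypothetical per-client table: handle -> physical address of a buffer
// that the kernel allocated for this client.
using BufferTable = std::unordered_map<std::uint64_t, std::uint64_t>;

// Returns the relocated list, or nothing if any entry names an unknown
// handle. Userland never gets to supply a raw physical address.
std::optional<std::vector<Command>> validate_and_relocate(
        const std::vector<Command>& user_list, const BufferTable& buffers) {
    std::vector<Command> relocated;
    relocated.reserve(user_list.size());
    for (const Command& cmd : user_list) {
        const auto it = buffers.find(cmd.operand);
        if (it == buffers.end()) {
            return std::nullopt;  // reject the whole list
        }
        relocated.push_back(Command{cmd.opcode, it->second});
    }
    return relocated;
}
```

A real validator would also have to check opcodes, sizes and offsets within each buffer; the sketch covers only the handle substitution.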
00:44:00 <Griwes> and like, it's only in software because it was easier to do it that way than doing it in hw
00:44:00 <Griwes> it's still an integral part of what's going on lol
00:44:00 <clever> Griwes: my view on that, is that silicon cant change, so its less likely to be manipulated by an evil maid and turned against you
00:45:00 <Griwes> but it could've been evil from the start
00:45:00 <clever> where-as firmware blobs, could be doing almost anything, and change what they are doing when things get updated
00:45:00 <Griwes> also "it can't change" is a lie
00:45:00 <clever> yes, you do still need to audit it some
00:45:00 <clever> i mean more like an upgrade in the future, containing a payload that specifically targets you
00:45:00 <Griwes> there's bits you can flip and fuses you can burn through to change the hardware
00:45:00 <clever> yep
00:46:00 <clever> but can you change a fuse so it will catch fire when the string "Griwes" is discovered in ram?
00:46:00 <Griwes> so I really don't get why there's this perceived fundamental difference
00:46:00 <clever> with a firmware update, you could
00:46:00 <clever> and with closed-source firmware, nobody is going to know
00:47:00 <clever> but with open source firmware, the PR is going to get rejected, and never make it to any end-user
00:47:00 <Griwes> there could be a hidden piece of hardware that could be programmable with a string
00:48:00 <Griwes> "pr is going to get rejected" assumes that noone is going to be fed a malicious load that's not a direct build from the OSS source
00:48:00 <clever> yeah, thats where other security comes in, either getting builds from trusted source or building yourself
00:49:00 <clever> as for hidden hw, i would assume that a skilled user with a microscope can identify flash memory cells
00:49:00 <clever> and fuses seem to be in tight supply
00:49:00 <Griwes> you could have programmable "memory" with a series of fuses
00:50:00 <clever> ] otp_dump_all
00:50:00 <clever> full otp dump
00:50:00 <clever> 00:00000000
00:50:00 <clever> ...
00:50:00 <Griwes> there's ways to obfuscate
00:50:00 <clever> 66:00000000
00:50:00 <heat_> do you have an example of closed source firmware being programmed in bad faith?
00:50:00 <clever> the rpi hardware has 67 OTP slots, of 32 bits each
00:50:00 <clever> Griwes: so thats a total of 2144 bits worth of fuses
00:50:00 <clever> heat: i cant think of any off the top of my head
00:50:00 <Griwes> now compare the number of transistors in an rpi and in an H100 :P
00:51:00 <Griwes> anyway a vendor publishing a malicious firmware load is a suicide
00:51:00 <heat> clever, exactly
00:51:00 <Griwes> and if it's a third party, it's arguably harder to create a malicious load when it's not open source
00:51:00 <clever> Griwes: i would also assume that there is no erasable flash in any rpi silicon, given the lack of use and the need for an spi flash chip
00:51:00 <heat> i'm not really worried about security
00:52:00 <clever> Griwes: what if they are forced by court order to reveal the secrets to a TLA?
00:52:00 <clever> or other govt agency
00:52:00 <Griwes> to a three-letter acronym?
00:52:00 <heat> three letter agency
00:52:00 <clever> yeah
00:52:00 <heat> CIA, NSA, FBI, etc
00:52:00 <clever> yep
00:52:00 <clever> they could force the secrets out of a company, then make their own malicious firmware
00:53:00 <clever> and mitm it into your updates
00:53:00 <heat> vote?
00:53:00 <clever> ?
00:54:00 <Griwes> how is that different from them obtaining a court order to get themselves mitm'd into your trusted provider's update channel?
00:54:00 <heat> if you disagree with the people that made those laws and gave all that power to those agencies, vote
00:54:00 <heat> nothing is actually going to stop them yeah
00:54:00 <Griwes> I beg you to not use that argument or I will become agitated
00:54:00 <Griwes> (I've seen too many 'just vote' takes on twitter over the past two weeks and I'm already angry at all those people)
00:54:00 <clever> Griwes: foss is more likely to use gpg ontop of https, and throw a bigger fit when somebody tried to force them to hand over the gpg keys?
00:54:00 <clever> i would assume
00:55:00 <Griwes> under a gag order?
00:55:00 <clever> heat: what if i'm in the wrong country, say canada
00:55:00 <clever> and the gpu i'm using was made by an american company
00:55:00 <clever> i cant vote on what america is doing with those gpu secrets
00:55:00 <Griwes> anyway I don't think there's a fundamental difference here
00:56:00 <Griwes> if you want to be sure there's no backdoors, you need to make all of your hardware yourself
00:56:00 <clever> Griwes: the other major factor, separate from all of that mess, is customization
00:56:00 <gog> brb building my own fab
00:56:00 <clever> as an example, if you use the official rpi firmware, you dont have any control over what media the system is booting from
00:56:00 * Griwes hands heat a znc
00:57:00 <clever> so there is no way to just jam a copy of tianocore onto a bigger SPI chip, and make the system fully efi compatible
00:57:00 <heat> my router is crap
00:57:00 <clever> you must have a start4.elf on one of: sd/usb/nvme/tftp
00:57:00 <clever> start4.elf on spi isnt a valid option
00:57:00 <heat> anyway, as I said, I didn't want to agitate anyone :)
00:57:00 <clever> heat: 2022-05-11 21:56:22 < clever> Griwes: the other major factor, separate from all of that mess, is customization
00:57:00 <clever> (you missed that line)
00:58:00 <heat> i personally don't mind device firmware
00:58:00 <Griwes> Customization has its own host of problems
00:58:00 <heat> I do mind machine firmware
00:58:00 <Griwes> What if I get a device with fully customizable firmware, replace it with a malicious load, and resell the device?
00:58:00 <clever> and the rpi blurs the lines, because the firmware is required both to boot, and provides runtime services
00:59:00 <clever> Griwes: in the case of the rpi, you can just boot an SD card with recovery.bin to re-flash the firmware, and its been cleansed
00:59:00 <heat> the rpi is crap in a cheap pcb
00:59:00 <heat> the way you describe it, it makes me wonder at how that thing even boots
00:59:00 <clever> heat: what makes it crap? the specs? the lack of docs? the firmware?
00:59:00 <heat> yes
00:59:00 <heat> :)
01:00:00 <Griwes> How does a person who just wants to game and who just buys a gpu ensure they don't have malicious firmware on it?
01:00:00 <heat> sign it
01:00:00 <heat> and trust the vendor
01:00:00 <clever> Griwes: thats a bit harder, there isnt a clear way to re-flash the firmware without first booting your system with it
01:01:00 <clever> in the case of the rpi4(00), the 1st stage is signed by both an hmac-sha1 key and an rsa keypair
01:01:00 <Griwes> Heat: I'm specifically talking about someone buying a second hand device
01:01:00 <clever> but the hmac-sha1 is weak (i already know the key)
01:01:00 <Griwes> Most people aren't savvy enough to run some magic commands to verify the firmware isn't malicious if you allow customization
01:01:00 <clever> and the rsa isnt enabled by default
01:01:00 <heat> cryptographically signed firmware yeah, make the GPU refuse to load any non signed firmware
01:01:00 <gamozo> Until code isn't riddled with 10 billion security bugs, firmware being open kinda doesn't really matter for security. It's gonna have exploitable bugs kinda either way
01:02:00 <clever> Griwes: that reminds me, ive seen a LTT video, where they bought a "dead" gpu off ebay, and i think it just had crypto mining firmware, which lacks video output
01:02:00 <heat> do I want to customise my nvidia gpu's firmware? no, what's the point?
01:02:00 <clever> they re-flashed it with the stock firmware, and boom, working gpu
01:02:00 <Griwes> Yeah I think we're aligned on this, heat
01:02:00 <heat> yup
01:02:00 <clever> and signed firmware leads to new issues
01:02:00 <Griwes> clever did bring the customization point up though :P
01:03:00 <clever> if the firmware is signed, by which keys? how do you customize it then?
01:03:00 <Griwes> You don't
01:03:00 <Griwes> You either have strong validation from the vendor, or you have customization
01:03:00 <Griwes> I really don't see how you can have both
01:04:00 <clever> exactly
01:04:00 <clever> the rpi4 kinda has both
01:04:00 <gamozo> That's always a hard problem I've thought of in my head. What's the best way to control root keys on something like that, and I don't think there is one
01:04:00 <heat> if you want the firmware to be open, you could have reproducible builds + signed
01:04:00 <clever> Griwes: the 1st stage can be verified by a broadcom rsa keypair, it will then enforce that the 2nd stage matches a customer generated rsa keypair
01:04:00 <clever> and the sha256 of that customer pubkey is held in OTP
01:05:00 <clever> but its an optional thing, only recommended for industrial use on the CM4
01:05:00 <clever> the only major complaint, is that broadcom holds the keys to the 1st stage
01:05:00 <Griwes> Rpi *is* a very special case thing: a dev board / specialized device. In both those uses it doesn't really matter that much because people handling it know what they are doing to an extent
01:06:00 <clever> so broadcom or a TLA could still modify it maliciously
01:06:00 <Griwes> A gpu tho? Thing that gets resold to non tech savvy people?
01:06:00 <clever> yeah
01:06:00 <clever> once you enable CM4 secure-boot, its permanently on
01:06:00 <clever> and you must know the private key to modify /boot
01:06:00 <clever> if you lose it, its bricked
01:11:00 <heat> anyway, point being that h i r e m e n v i d i a
01:11:00 <clever> lol
01:11:00 <heat> thank u, i like program kernel very much
01:11:00 <heat> it make gpu go vruuuuuuuuuuuum
01:12:00 <clever> before: https://ext.earthtools.ca/private/rpi/standard-gfxconsole.mp4 after: https://ext.earthtools.ca/private/rpi/faster-console-1.mp4
01:12:00 <clever> heat: this is what ive done most recently, with my gpu stuff
01:13:00 <heat> nice
01:13:00 <heat> gpu acceled?
01:13:00 <clever> yeah
01:13:00 <clever> previously, it just used a copyrect function to scroll
01:13:00 <clever> now it just treats a single bitmap like a ringbuffer
01:13:00 <clever> and then tells the gpu to just display 2 crops of that bitmap, with different y offsets
01:14:00 <clever> so the new design, means you never have to copy image data upon scrolling
01:15:00 <clever> the only time image data is ever manipulated, is when you're drawing a char into the bitmap
01:15:00 <clever> plus some fixed math to manipulate the width/height/offsets
01:16:00 <clever> oh, and the clear-line function
01:17:00 <clever> which blanks an entire line out with bg color
01:17:00 <clever> but its never reading the bitmap, which is probably a major cost
01:19:00 <clever> so the performance no longer depends on the height at all
01:20:00 <clever> and the width is far less of a factor
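The ring-buffer scrolling idea can be sketched in a few lines. This is a simplified model under stated assumptions, not clever's code: one `char` stands in for a whole character cell of pixels, and `RingConsole`, `Crop`, `push_line`, `crops` and `screen_row` are made-up names. Scrolling only advances an index and overwrites the recycled row; the display side is described as two crops of the same backing store with different y offsets.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// One strip of the backing bitmap as the display hardware would show it.
struct Crop {
    std::size_t src_row;  // first row taken from the backing bitmap
    std::size_t dst_row;  // screen row where the strip starts
    std::size_t rows;     // strip height (may be 0)
};

class RingConsole {
public:
    RingConsole(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), cells_(rows * cols, ' ') {
        assert(rows > 0 && cols > 0);
    }

    // Scroll by one row and draw `text` on the new bottom row.
    // Only the recycled row is written; no other row is read or copied.
    void push_line(const std::string& text) {
        const std::size_t recycled = top_;  // old top becomes new bottom
        top_ = (top_ + 1) % rows_;
        for (std::size_t x = 0; x < cols_; ++x) {
            cells_[recycled * cols_ + x] = x < text.size() ? text[x] : ' ';
        }
    }

    // The two crops that together cover the screen.
    std::array<Crop, 2> crops() const {
        std::array<Crop, 2> out;
        out[0] = Crop{top_, 0, rows_ - top_};  // rows top_..end, shown first
        out[1] = Crop{0, rows_ - top_, top_};  // rows 0..top_-1, shown below
        return out;
    }

    // What the viewer sees on screen row `y`.
    std::string screen_row(std::size_t y) const {
        const std::size_t src = (top_ + y) % rows_;
        return cells_.substr(src * cols_, cols_);
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::string cells_;  // backing store, row-major
    std::size_t top_ = 0;
};
```

The cost of a scroll here is one row write plus an index update, which matches the observation that performance stops depending on the console height.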
01:20:00 <clever> heat: https://www.youtube.com/watch?v=l7lIewA9fm4 is an entirely different form of gpu accel ive done recently
01:20:00 <bslsk05> 'vpu accelerated mandelbrot, final version' by michael bishop (00:00:18)
01:21:00 <clever> this time, vector opcodes to compute mandelbrot in parallel
01:21:00 <heat> nice
01:21:00 <heat> i dont do 3d acceleration yet
01:21:00 <heat> nor do I know how I'm going to approach the thing
01:23:00 <clever> neither of those examples are using the 3d core either
01:23:00 <clever> https://www.youtube.com/watch?v=GHDh9RYg6WI this one is using the 3d core, with just a simple fragment shader and no vertex shaders
01:23:00 <bslsk05> '2d and 3d demo' by michael bishop (00:00:21)
01:24:00 <clever> i also have a rough theory on how to use a fragment shader for mandelbrot, but havent tried it
01:25:00 <heat> 1) port mesa?; 2) write vulkan drivers and maybe steal some stuff from fuchsia? Then possibly use mesa's zink on top of them
01:26:00 <clever> porting mesa is a rather big task
01:26:00 <heat> swiftshader is also a possible solution for fast-ish software vulkan if I want to avoid mesa
01:26:00 <clever> ive got a cheat-code, i can use an x86 mesa to compile glsl to qpu-asm
01:26:00 <clever> and then .incbin that asm into my code
01:26:00 <heat> porting mesa would require me to port DRM and then integrate it into my build system
01:27:00 <clever> in my case, i'm more after fixed-function programs
01:27:00 <clever> rather than any opengl program
01:27:00 <clever> so i can pre-compile the shaders on another box
01:27:00 <heat> virtio-gpu doesn't have venus upstream just yet so there's technically no vulkan support
01:28:00 <heat> theoretically I could add opengl for that as I've heard it's not too hard
01:29:00 <clever> https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/v3d/v3d.c
01:29:00 <bslsk05> github.com: lk-overlay/v3d.c at master · librerpi/lk-overlay · GitHub
01:29:00 <clever> this 600 line file, is the complete driver behind that triangle example from earlier
01:30:00 <clever> lines 143-153 is the fragment shader
01:30:00 <clever> lines 349-382 is a vertex array
01:31:00 <clever> lines 409-411 creates a triangle from 3 indexes into the vertex array
01:31:00 <heat> yup
01:31:00 <clever> the rest you can leave as-is for the most part
01:31:00 <heat> i'm considering that rewriting mesa would be too hard
01:31:00 <heat> like mesa can be staggeringly quick
01:32:00 <clever> this example lacks vertex shading
01:32:00 <heat> when radv (the radeon vulkan driver) got started it was beating the AMD proprietary vulkan driver by quite a lot
01:32:00 <clever> so you need to vertex shade yourself first
01:32:00 <heat> i think it still does
01:34:00 <heat> upstreamed mesa nvidia vulkan driver though 👀
01:35:00 <heat> if I got my OS to run a 3D program on an nvidia GPU I would stan them forever
01:35:00 <heat> or a vidya game
01:35:00 <heat> but that's probably wishful thinking
01:36:00 <heat> i would need a linux compat layer at least
01:36:00 * heat wonders how kernel-demanding a video game is
01:37:00 <clever> that reminds me, one of my side-projects is porting the openlara
01:37:00 <clever> https://github.com/XProger/OpenLara
01:37:00 <bslsk05> XProger/OpenLara - Classic Tomb Raider open-source engine (338 forks/3842 stargazers/BSD-2-Clause)
01:38:00 <heat> yup i've heard of it
01:40:00 <klange> i should port mesa again
01:42:00 <heat> if you fully port mesa you could get to use the new intel gpus in toaruos
01:42:00 <heat> think of the possibilities :O
02:06:00 <heat> the module just got its first contribution from someone called "bigswag420"
02:06:00 <heat> their bio is "sussy balls"
02:07:00 <heat> also a pull request from "poopbarrel"
02:09:00 <heat> poopbarrel submitted a pull request that deletes the whole repo
02:10:00 <heat> i feel sorry for nvidia
02:10:00 <heat> and above all, andy ritger who is literally responding to every pull request and issue
02:11:00 <heat> man's going to have nightmares with "fix typo" pull requests
03:21:00 <moon-child> https://godbolt.org/z/8xKPx1EPo I think gcc is wrong to use rep movsq here, and clang is wrong to use memcpy
03:21:00 <bslsk05> godbolt.org: Compiler Explorer
03:21:00 <moon-child> because they have no guarantee that the pointers do not overlap
03:21:00 <moon-child> fwiw tcc uses memmove
03:22:00 <moon-child> am I missing something?
03:26:00 <Mutabah> might be strict aliasing?
03:26:00 <Mutabah> not sure
03:26:00 <moon-child> an S is always allowed to alias an S though...
03:26:00 <Mutabah> (The compiler might be assuming that the two either perfectly overlap, or don't overlap at all)
03:27:00 <moon-child> oh, so then gcc is ok but clang is wrong?
03:27:00 <moon-child> interesting
03:41:00 <heat> clang is also right
03:42:00 <moon-child> why?
03:42:00 <heat> if two objects perfectly overlap, the memcpy will = rep movsq
03:43:00 <moon-child> which memcpy?
03:43:00 <moon-child> c spec sez: 'If copying takes place between objects that overlap, the behavior is undefined.'
03:43:00 <heat> the one it's doing
03:43:00 <heat> spec stupid
03:44:00 <heat> none of this makes sense anyway because the alignment of S is 1
03:44:00 <heat> the two objects can overlap as far as I can see
03:45:00 <moon-child> 'spec stupid' so? Doesn't mean you get to ignore it unilaterally; if you're in llvm's position, you have a contract with functions like memcpy, and that contract says you don't get to give them overlapping objects
03:46:00 <klange> "overlap" is distinct from "are the same".
03:46:00 <heat> making both gcc and clang wrong
03:46:00 <heat> yes
03:46:00 <heat> there may be something we're missing
03:46:00 <klange> Which is an obtuse pedanticism, but important for the spec.
03:46:00 <moon-child> klange: if they're the same, then they overlap completely. And spec literally specifies dest/src as 'restrict', which means they can't be the same
03:46:00 <klange> And for once, the reason for non-equivalent overlapping things to be undefined is one that is obvious for everyone who has had a memcpy vs. memmove fuckup
03:47:00 <moon-child> sure, I know how it can break in practice. That doesn't mean I think it's ok to break the contract just because it'll probably work fine
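For context on Mutabah's guess: the C standard's rule for simple assignment (C11 6.5.16.1 paragraph 3) says that if the stored value is read from an overlapping object, the overlap must be exact, otherwise the behavior is undefined. That lets a compiler assume `*a = *b` never sees a partial overlap; the memcpy contract moon-child quotes is stricter and excludes any overlap at all. The sketch below is not the code behind the godbolt link: `S`, `copy_may_overlap` and `copy_disjoint` are hypothetical names showing the two library calls and what each one requires of the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the struct in the linked example.
struct S {
    unsigned char bytes[64];
};

// Defined for every arrangement of dst and src, including partial overlap.
// Takes raw pointers because two live S objects cannot partially overlap.
void copy_may_overlap(void* dst, const void* src) {
    std::memmove(dst, src, sizeof(S));
}

// Only valid when the caller guarantees the two objects are disjoint,
// because memcpy's behavior is undefined for overlapping ranges.
void copy_disjoint(S* dst, const S* src) {
    std::memcpy(dst, src, sizeof(S));
}
```

When nothing is known about the pointers, memmove is the conservative choice, which is what tcc reportedly emits.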
03:48:00 <graphitemaster> <geist> i am pleased to see that the firmware is not encrypted or otherwise that hidden
03:48:00 <graphitemaster> geist, should be noted that the security coprocessor stuff is separate from those cpus and that stuff is locked down hard
03:48:00 <graphitemaster> plus the risc-v isa nv has is non standard so you need to reverse engineer instructions too :P
03:48:00 <geist> makes sense. the riscv perhaps doesn't have much more ability beyond what was already done on the host
03:49:00 <geist> possibly. i ran it through the disassembler but i should run it again and see what instructions it couldn't decode
03:49:00 <geist> but it's probably just assist stuff, so not likely to be too difficult
03:49:00 <moon-child> heat: re 'alignment of 1' I don't think it's legal for two S to alias each other, regardless of alignment
03:49:00 <moon-child> though -fno-strict-aliasing doesn't change gcc's position...
03:51:00 <graphitemaster> geist, some people have already tried disassembling the firmware and found some instruction encodings are disassembling to the wrong instructions but still decoding indicating nv switched the encoding around on some instructions and added instructions that share the same encoding as standard riscv ones
03:51:00 <graphitemaster> little shuffled around *shrug*
03:52:00 <graphitemaster> could just be bad decoding too though, no one knows what is different about it from standard riscv
03:52:00 <graphitemaster> except nv of course
03:53:00 <geist> yah though really smart people could almost certainly figure it out pretty easily. much more complicated reverse engineering of ISAs have been done
03:53:00 <geist> alas, its not my wheelhouse. probably shouldn't due to work, etc etc
03:54:00 <graphitemaster> yeah, plus there's the vbios on the gpu that you don't have access to either, since updates only ever patch that
03:54:00 <heat> don't worry i hope you can still submit typo patches to the nvidia driver
03:54:00 <heat> that's the only patches you'll ever need
03:54:00 <graphitemaster> dunno if anyone has ever dumped that yet
03:54:00 <heat> those are*
03:55:00 <graphitemaster> the nouveau people have been talking about a rewrite all day
03:56:00 <graphitemaster> a new nouveau which uses the firmware for new cards and then there's an open source vulkan driver for nv just around the corner too
03:57:00 <graphitemaster> then gl via mesa on zink targeting vulkan as the driver
03:57:00 <graphitemaster> there's your new nv open sores linux graphics stack
03:57:00 <heat> they could port the kernel driver
03:57:00 <heat> probably easier than rewriting 1 million LOC
03:58:00 <graphitemaster> most of kernel driver code is shared by all the drivers
03:58:00 <graphitemaster> the actual nouveau part is quite small
04:03:00 <heat> how many drivers are there and why doesn't nouveau cover all of that?
04:05:00 <graphitemaster> I mean most graphics drivers on Linux work ontop of DRM and KMS (on the kernel side), with most of everything else in userspace handled by DRM (for talking to the kernel side) and then the whole API side handled by gallium 3d + DRI part of mesa
04:06:00 <graphitemaster> The actual device driver for the hardware in the kernel is very small comparatively speaking.
04:06:00 <heat> well yes but the nvidia one does too
04:06:00 <graphitemaster> It's nowhere near 1m lines of code, NV's KMD is though because it has to implement all that crap that already exists
04:07:00 <graphitemaster> nouveau doesn't though
04:07:00 <graphitemaster> So that's why they're discussing a rewrite
04:07:00 <graphitemaster> They want to use all the stuff that exists already in the ecosystem
04:07:00 <graphitemaster> Not stuff NV provides
04:08:00 <graphitemaster> To put things in perspective, something like this is probably closer to ~100k lines of C
04:08:00 <graphitemaster> Maybe even less
05:29:00 <sikkiladho> what's the difference between Gathering and "merge access" in AArch64 Memory Model?
05:30:00 <klange> merge is a verb; the CPU can determine that code is making multiple accesses and turn them into one access - it can _merge_ them
05:31:00 <klange> Gathering is the name of the attribute that specifies whether that is allowed
05:34:00 <sikkiladho> Thank you! So that is the same thing basically.
05:36:00 <klange> I suppose what I mean to say is the difference is linguistic; they are different aspects of one piece of functionality - a bit and an action :)
05:45:00 <klys> ]
10:28:00 <mrvn> How do I multiply 2 uint16_t in a portable way? In a*b integer promotion first turns both into int (32/64bit systems) or unsigned int (16bit systems). So on 32/64bit systems it can then overflow a signed integer which is UB.
10:28:00 <mjg> just -fwrapv man
10:29:00 <mrvn> that isn't portable
10:30:00 <mjg> it's an artificial problem imo
10:31:00 <kingoffrance> hmmm, someone asked something like that a week or so ago. i just have a bunch of 2unit library/functions. that doesnt solve anything, just "big int" code uses that, so theoretically just one single place to fix anyhow
10:34:00 <kingoffrance> i.e. the result for that i call a "2unit" (because i just use char/short/etc. not the 16_t etc.)
10:35:00 <kingoffrance> or, 65535*65535 -> fits in 32 bits
10:41:00 <mrvn> kingoffrance: but not int
10:44:00 <kingoffrance> yes, i believe i just avoided long long as "base unit" for "big int" stuff because multiply and maybe divide wants something like that
10:46:00 <mrvn> For big int you want to cast to the next bigger unsigned type instead of having it overflow. So kind of different thing.
10:47:00 * mrvn still wants a std::more_bits<T>::value_type
11:03:00 <moon-child> mrvn: you are looking for something portable? uint16_t isn't portable
11:05:00 <mrvn> it might not exist but then the code won't compile. That's ok. I just don't want UB.
11:06:00 <mrvn> + I want to rant about the integer promotion being stupid :)
11:06:00 <moon-child> cast to uint32_t then?
11:10:00 <mrvn> int might be 64bit
11:14:00 <moon-child> what if it is?
11:14:00 <moon-child> you won't overflow it
11:16:00 <zid> yea this is all a non-issue, if you have uint16_t you have uint32_t
11:16:00 <moon-child> .oO( 17-bit ptrdiff_t? )
11:16:00 <moon-child> but seriously just -fwrapv
11:18:00 <zid> technically you could use uint_least32_t
11:18:00 <zid> To support a weird machine that has uint16_t and uint64_t but no uint32_t
11:18:00 <zid> cast one side, if it compiles, then uint16_t * uint16_t is safe on that cpu
11:39:00 <mrvn> moon-child: if int is 64bit you have the same problem for uint32_t as starting point.
11:40:00 <moon-child> I thought you want to multiply uint16_ts
11:40:00 <moon-child> not uint32_ts
11:40:00 <mrvn> it's just an example
11:41:00 <moon-child> you can do conservative overflow check with clz
11:41:00 <mrvn> lets make it generic: I want to multiply a template<typename T>
11:42:00 <moon-child> log(x) + log(y) = log(x*y). So ceil(log(x)) + ceil(log(y)) >= log(x*y). So check if ceil(log(x)) + ceil(log(y)) > 32
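A conservative check along these lines might look like the sketch below. It uses the bit width of each operand (the value `32 - clz(x)` would give) instead of a ceiling log, because bit widths make the bound airtight: x < 2^w(x) and y < 2^w(y), so x*y < 2^(w(x)+w(y)). The names `bits_needed` and `mul_fits_u32` are made up for illustration, and a plain loop stands in for a clz intrinsic to keep the example portable.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits needed to represent x; 0 for x == 0.
constexpr unsigned bits_needed(std::uint32_t x) {
    unsigned n = 0;
    while (x != 0) {
        ++n;
        x >>= 1;
    }
    return n;
}

// true:  x * y is guaranteed to fit in 32 bits.
// false: the product might not fit (some false alarms are expected).
constexpr bool mul_fits_u32(std::uint32_t x, std::uint32_t y) {
    return bits_needed(x) + bits_needed(y) <= 32;
}
```

The check never claims safety wrongly, but it does reject some products that would fit, for example 0x80000000 * 1.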
11:44:00 <moon-child> hmm there's a uintmax_t right? Just cast to that for all your intermediates
11:44:00 <zid> I already told you the portable way to do it
11:44:00 <zid> So just do exactly what I say, all of the time, imo
11:50:00 <mrvn> zid: template<typename T> T mul(T a, T b) { return ???; }
11:50:00 <zid> Not what was asked unfortunately
11:51:00 <mrvn> zid: I'm asking now
11:51:00 <zid> I don't actually know C++
11:51:00 <psykose> just C+?
11:51:00 <zid> Just C
11:52:00 <mrvn> template<typename T> T mul(T a, T b) { return (unsigned decltype(a*b))a * b; } works, right?
11:53:00 <mrvn> s/mul/umul/
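`unsigned decltype(a*b)` is not valid C++ syntax, but the same idea can be spelled with `std::make_unsigned`. The following is one possible sketch of mrvn's `umul`, assuming only unsigned `T` needs to be supported: the operands are converted to the unsigned counterpart of whatever type integer promotion picks for `a * b`, so the multiplication itself wraps instead of overflowing a signed int.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

template <typename T>
constexpr T umul(T a, T b) {
    static_assert(std::is_unsigned<T>::value, "umul expects an unsigned type");
    // decltype(a * b) is the promoted type (int for uint16_t on common
    // platforms); its unsigned counterpart has well-defined wraparound.
    using Wide = typename std::make_unsigned<decltype(a * b)>::type;
    return static_cast<T>(static_cast<Wide>(a) * static_cast<Wide>(b));
}
```

For example, `umul<std::uint16_t>(65535, 65535)` yields the low 16 bits of 0xFFFE0001 without ever forming a signed product.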
13:21:00 <j`ey> I had '.got.plt : { *(.got.plt) }' in my linkerscript, I then deleted it and put '*(.got.plt)' in my .rodata section and I get 'error: undefined section `.got.plt' referenced in expression', weird?
13:24:00 <Mutabah> does it point at the newly added line?
13:24:00 <Mutabah> if so, kinda weird
13:25:00 <j`ey> ohh good shout, it doesn't indeed!
13:25:00 <j`ey> Mutabah: thanks
13:26:00 <Mutabah> quack
13:26:00 <j`ey> :-)
13:27:00 <gog> meow
13:39:00 * vdamewood gives gog a fishy
13:39:00 * gog eats fishy
13:40:00 * sbalmos dangles yarn over gog
13:40:00 * gog knits yarn
13:40:00 <sbalmos> talented cat
13:51:00 <vdamewood> Kitty go prrr
14:10:00 <mrvn> *wuff wuff* Kitty go up the tree.
14:51:00 <heat> computer go
14:51:00 <heat> beep boop
14:52:00 <vdamewood> No pet computer.
14:52:00 <gog> what about the commodore PET?
14:53:00 <heat> computer sad because vdamewood no pet for beep boop
15:11:00 <mrvn> heat: https://www.youtube.com/watch?v=XPT6PIFGK94
15:11:00 <bslsk05> 'Computerliebe (Die Module spielen verrückt)' by Paso Doble - Topic (00:03:39)
18:42:00 <gamozo> Mornin everyone!
18:44:00 <GeDaMo> Hi gamozo :)
18:45:00 <mrvn> Why does std::byte alias, but redefining byte the way https://en.cppreference.com/w/cpp/types/byte says std::byte should be defined suddenly doesn't alias anymore? https://godbolt.org/z/qTxqd9Koe
18:45:00 <bslsk05> en.cppreference.com: std::byte - cppreference.com
18:45:00 <bslsk05> godbolt.org: Compiler Explorer
18:48:00 <Griwes> The aliasing allowance isn't because of how it's defined in code
18:49:00 <Griwes> There's an explicit provision for std::byte specifically that says it can alias
18:49:00 <j`ey> yay special cases
18:52:00 <mrvn> how does the compiler detect that? type named "byte" in namespace "std" only?
18:52:00 <Griwes> Yes
18:53:00 <Griwes> it's conceivable for an implementation to have an attribute that it attaches to such a type too, though
18:54:00 <j`ey> that's how rust does things
18:56:00 <mrvn> And here I was thinking I should use std::byte* for every buffer so it doesn't trigger an aliasing with everything.
18:57:00 <mrvn> "using foo = std::byte" preserves the aliasing super powers.
18:57:00 <Griwes> yeah, because it looks specifically at the *linkage* name, which is canonical
18:58:00 <Griwes> if you look at libc++ for instance, you'll see that byte is directly in std, with a comment pointing out that it is purposefully not versioned
18:58:00 <mrvn> have to be careful replicating that in my kernel
18:59:00 <mrvn> .oO(and here I was not having namespace std in the kernel)
19:02:00 <geist> yeah ugh. i was thinking of doing a using into my namespace for that purpose (if i was to go that route)
19:02:00 <geist> ie namespace lk { using byte = std::byte; } kinda stuff
19:02:00 <mrvn> namespace MyKernel { using byte = std::byte; }?
19:02:00 <mrvn> hehe
19:04:00 <mrvn> That's exactly why I tested if "using" preserves the super powers
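The alias approach can be sketched as below, assuming a hosted `<cstddef>` that provides `std::byte` (C++17). A `using` declaration introduces a new name for the same type, so the alias keeps the aliasing exemption the language grants to `std::byte`, whereas a separately declared `enum class byte : unsigned char` would be a distinct type without it. The `mykernel` namespace and `sum_bytes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

namespace mykernel {      // hypothetical kernel namespace
using byte = std::byte;   // same type as std::byte, not a copy of it
}

static_assert(std::is_same<mykernel::byte, std::byte>::value,
              "the alias must name std::byte itself");

// Reads the object representation of `value` through a byte pointer,
// which is permitted for std::byte (as it is for unsigned char).
unsigned sum_bytes(const std::uint32_t& value) {
    const mykernel::byte* p = reinterpret_cast<const mykernel::byte*>(&value);
    unsigned sum = 0;
    for (std::size_t i = 0; i < sizeof value; ++i) {
        sum += std::to_integer<unsigned>(p[i]);
    }
    return sum;
}
```

Summing the bytes keeps the result independent of endianness, so `sum_bytes(0x01020304)` is 10 on any byte order.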
19:10:00 <geist> thats disappointing about the compiler hack
19:13:00 <mrvn> the std::byte hack?
19:15:00 <geist> yeah
19:39:00 <sbalmos> always have to be special rules somewhere. ;)
19:45:00 <mrvn> writing to byte sucks, you always have to cast first.
20:17:00 <heat> Griwes, wait the compiler looks at the linkage name in those situations?
20:17:00 <Griwes> By "linkage name" I don't mean "mangled name", if that's what you're thinking
20:17:00 <Griwes> Just the name that is used for linkage purposes
20:19:00 <Griwes> So foo in struct foo {}, bar in typedef struct {} bar (yes, this exact case is handled by assigning bar as the "canonical" name for the type, to match with what C does in this specific case)
20:26:00 <heat> ah right
20:27:00 <heat> i would've thought you could use something like may_alias to make that possible
20:27:00 <heat> vs hardcoding the namespace
20:42:00 <zid> C has bits of the standard that say if the tag name matches between TUs the structs have to match.. but nobody checks
20:42:00 <zid> insert fun punning
20:54:00 <heat> funning
21:13:00 <gamozo> is that related to funrolling?
23:05:00 <zid> everybody talking about nvidia's news and as far as I can tell.. it's just a case of who does the mmap for the blob, to allow HPC machines to boot properly without the package manager
23:17:00 <geist> i think it's a bit more complicated than that but that's the gist of it in general
23:21:00 <klange> It is both not the "holy shit they actually did it" unprecedented move from Nvidia everyone wanted, but it's also not completely useless.
23:23:00 <Griwes> it's an approach that allows everyone to eventually have a fully featured kernel-level driver vOv binary firmware blobs are nothing new
23:24:00 <klange> for cards no one owns *shrug*
23:28:00 <Griwes> For cards where this is feasible.
23:29:00 <Griwes> Also saying noone owns Pascal+ is... very silly
23:29:00 <klange> Your own docs say it's Turing+.
23:30:00 <Griwes> Oh, is it turing+? I thought it went further back
23:30:00 <klange> (I have a Pascal card, so that difference matters to me.)
23:31:00 <Griwes> Okay that's slightly less silly but saying "noone owns them" is still somewhat silly
23:32:00 <klange> I _am_ being obtuse and hyperbolic, but it does seem like ownership by actual human beings is shockingly low for anything from the last five years.
23:34:00 * heat wonders how different the amdgpu driver is in terms of firmware/driver ratio
23:35:00 * Griwes wonders why he still has "pascal+" in his brain as the support chart for this
23:35:00 <klange> does Pascal have the same "fun stuff moved to firmware" or was that introduced in RTX/Turing?
23:35:00 <heat> because blaise pascal was a very cool individual
23:36:00 <klange> (does the Turing GTX 16-series chipset have that?)
23:36:00 <heat> i think so
23:36:00 <heat> for the second question that is
23:43:00 <Griwes> Yeah it seems that only Turing+ has the current gen of the on-card coprocessor; and I'm fairly certain it's an architectural feature, so whether you have rt cores or not doesn't matter (but don't quote me on this)