Miguel Ventura's blog : Dotnet core debugging in Apple Silicon (M1)

Apple is transitioning its processors' architecture from Intel (x86_64) to ARM (arm64e) for its line of laptop and desktop computers. The M1 is the first processor in this transition. Programs built for one architecture don't run on the other, but to make this transition smooth, Apple is using a few tricks it already used in past transitions. One of these tricks is Rosetta, which effectively acts as an emulator, allowing x86_64 applications to run in arm64e as if they were running in x86_64. The other trick is fat binaries, which Apple calls universal binaries. These are binaries that contain multiple copies of the code, one for each target architecture. When the time comes to run the program, the system selects the copy of the code that matches the architecture where it's running. The disadvantage is that the binaries get a lot larger, but it's great for distribution as users don't need to worry about downloading the right thing for the architecture they have.

During this transition phase, there are a few apps already using fat binaries, but a lot of apps are still running only on x86_64. At the time this text is being written, this is the case for dotnet core, and for all dotnet core based applications.

Dotnet runs well on the arm64 architecture, and has been running on Raspberry Pi, smartphones, AWS Graviton, etc. But on the M1, under macOS, the runtime installation defaults to x86_64.

m1mac$ file  `which dotnet`
/usr/local/share/dotnet/dotnet: Mach-O 64-bit executable x86_64

This isn't a problem in itself: Rosetta allows the code to run flawlessly. The problem comes when we attempt to debug the process using LLDB and dotnet-sos. It starts right after running dotnet sos install. From then on, launching lldb prints the following errors before the prompt:

error: this file does not represent a loadable dylib
error: 'setsymbolserver' is not a valid command.

This means that LLDB is trying to load the SOS plugin, but there's an architecture mismatch — LLDB is running in arm64 and attempting to load a library built for x86_64. Rosetta doesn't help here, because Rosetta only applies to programs being run, not to libraries being loaded into programs that are already running.

We can fix the architecture of the SOS library being loaded by running dotnet sos install --architecture arm64. This appears to work (the errors at the beginning of the session are gone), but no SOS command actually works.

(lldb) dbgout
Debug output logging enabled

(lldb) sos ClrStack
LoadLibrary(/usr/local/share/dotnet/shared/Microsoft.NETCore.App/5.0.8/libmscordaccore.dylib) FAILED 00000000
Failed to load data access module, 0x80131c4f
You can run the debugger command 'setclrpath <directory>' to control the load of libmscordaccore.dylib.
If that succeeds, the SOS command should work on retry.

From the complaint, it looks like SOS can't locate the CLR runtime libraries, but the problem is actually that we're running LLDB+SOS in arm64 and those CLR libraries are … in x86_64.

Fortunately, there's another workaround. LLDB is installed as a fat binary, which we can easily validate by running

m1mac$ file  `which lldb`
/usr/bin/lldb: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64e:Mach-O 64-bit executable arm64e]
/usr/bin/lldb (for architecture x86_64):    Mach-O 64-bit executable x86_64
/usr/bin/lldb (for architecture arm64e):    Mach-O 64-bit executable arm64e
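
Reading these `file` reports by eye gets tedious when comparing several binaries (the debugger, the runtime, the SOS plugin). Below is a minimal sketch of a helper that reduces `file` output to a bare list of architectures. The name `macho_archs` is made up for this post, and it only assumes the two output shapes shown above (a single-architecture line, and the universal listing with one "(for architecture ...)" line per slice).

```shell
# macho_archs (hypothetical helper): read `file` output on stdin and print
# one architecture name per line.
macho_archs() {
    awk '
        # Universal binary: one "(for architecture X):" line per slice.
        /\(for architecture / {
            sub(/.*\(for architecture /, "")   # drop everything before the name
            sub(/\).*/, "")                    # drop everything after the name
            print
            found = 1
            next
        }
        # Single-architecture binary: the architecture is the last word.
        { last = $NF }
        END { if (!found && last != "") print last }
    '
}
```

It would be used as `file "$(which dotnet)" | macho_archs` or `file "$(which lldb)" | macho_archs`. If the two lists share no entry, the debugger and the runtime cannot load each other's libraries without going through Rosetta.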

Since we'll be debugging a dotnet process (x86_64) and we'll need to load the dotnet libraries (x86_64), forcing LLDB to start as x86_64 under Rosetta should fix all our issues. This can be done by passing the command through /usr/bin/arch -x86_64, so instead of lldb you would run /usr/bin/arch -x86_64 lldb.

m1mac$ /usr/bin/arch -x86_64 lldb -p 21787
Added Microsoft public symbol server
(lldb) process attach --pid 21787
Process 21787 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00007fff2031bcde libsystem_kernel.dylib`__psynch_cvwait + 10
libsystem_kernel.dylib`__psynch_cvwait:
->  0x7fff2031bcde <+10>: jae    0x7fff2031bce8            ; <+20>
    0x7fff2031bce0 <+12>: movq   %rax, %rdi
    0x7fff2031bce3 <+15>: jmp    0x7fff20319ad9            ; cerror_nocancel
    0x7fff2031bce8 <+20>: retq
Target 0: (dotnet) stopped.

Executable module set to "/usr/local/share/dotnet/dotnet".
Architecture set to: x86_64-apple-macosx-.

We finally have a working debugger, which we can easily validate by issuing some SOS commands.

(lldb) sos ClrStack
OS Thread Id: 0x358c9 (1)
        Child SP               IP Call Site
0000000304A6B6D8 00007fff2031bcde [HelperMethodFrame: 0000000304a6b6d8] System.Threading.Thread.SleepInternal(Int32)
0000000304A6B820 000000010F3808B7 System.Threading.Thread.Sleep(Int32) [/_/src/coreclr/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 257]
0000000304A6B830 000000010F81F443 TestProj.Program.D() [~/TestProj/Program.cs @ 32]
0000000304A6B840 000000010F81F3FE TestProj.Program.C() [~/TestProj/Program.cs @ 27]
0000000304A6B850 000000010F81F3BE TestProj.Program.B() [~/TestProj/Program.cs @ 22]
0000000304A6B860 000000010F81F37E TestProj.Program.A() [~/TestProj/Program.cs @ 17]
0000000304A6B870 000000010F815AFB TestProj.Program.Main(System.String[]) [~/TestProj/Program.cs @ 12]
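
The /usr/bin/arch -x86_64 prefix is easy to forget, so here is a rough sketch of a shell function that picks the lldb invocation from the `file` description of the target binary. The name `lldb_cmd_for` is hypothetical, and the rule it applies (native lldb if the target has an arm64 slice, Rosetta otherwise) is my own assumption based on the behaviour described in this post. It prints the command rather than running it, so it can be inspected first.

```shell
# lldb_cmd_for (hypothetical helper): print an lldb command line that matches
# the architecture of the target binary.
#   $1     the `file` description of the target binary
#   $2...  arguments passed through to lldb
lldb_cmd_for() {
    [ "$#" -ge 1 ] || return 2
    desc=$1
    shift
    case "$desc" in
        # Target has an arm64/arm64e slice: assume it runs natively.
        *arm64*)  set -- lldb "$@" ;;
        # x86_64 only: start lldb as x86_64 so it goes through Rosetta.
        *x86_64*) set -- /usr/bin/arch -x86_64 lldb "$@" ;;
        # Anything else: leave lldb untouched.
        *)        set -- lldb "$@" ;;
    esac
    echo "$*"
}
```

For the dotnet install shown earlier, `lldb_cmd_for "$(file "$(which dotnet)")" -p 21787` prints `/usr/bin/arch -x86_64 lldb -p 21787`. Note that this only helps if the SOS plugin on disk is also the x86_64 build, which is what a plain dotnet sos install produced in the session above.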

It took me a lot more time than I'm proud to admit to get to this point, and it was disappointing that throughout the journey, Googling for most error messages yielded no useful results. Hopefully this post will be found by someone hitting the same problems. At the pace dotnet evolves, it's likely that this information will soon be made obsolete by a native dotnet arm64 version for macOS.