Jump once for x86_64 PLT

DJ: Direct jump
IJ: Indirect jump
PLT: Procedure linkage table


#include <stdio.h>

int main (int argc, char *argv[])
{
    puts ("Hello");
    puts ("World");
    return 0;
}

x86_64 PLT(DJ + IJ)

main:
    callq   puts@plt
puts@plt:
    jmpq    *0x2fe2(%rip) # .got.plt
                          # 1st call: points to 1:
                          # non-1st call: points to puts:
1:
    pushq   $0x0          # symbol index
    jmpq    _dl_runtime_resolve

1st call:
main -> puts@plt -> _dl_runtime_resolve -> puts

non-1st call:
main -> puts@plt -> puts

x86_64 PLT(IJ)

main:
    callq   *0x2fe2(%rip) # .got.plt
                          # 1st call: points to puts@plt:
                          # non-1st call: points to puts:
puts@plt:
    pushq   $0x0          # symbol index
    jmpq    _dl_runtime_resolve

1st call:
main -> puts@plt -> _dl_runtime_resolve -> puts

non-1st call:
main -> puts
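
The difference between the two schemes can be modelled in plain C. This is only a simulation under stated assumptions: no real dynamic linker is involved, and the names got_slot, resolver_stub, real_puts, plt_entry, call_dj_ij and call_ij are hypothetical stand-ins for the .got.plt entry, the push/jmp stub, the resolved function, the PLT entry and the two kinds of call site.

```c
#include <stdio.h>

/* Hypothetical model of lazy binding; not the real dynamic linker. */
typedef int (*func_t) (const char *);

static int resolver_stub (const char *s);

/* Stands in for the .got.plt entry; initially points at the stub. */
func_t got_slot = resolver_stub;
/* Counts how often the slow path (the resolver) was taken. */
int resolve_count;

/* Stands in for the resolved function (puts in the text). */
static int
real_puts (const char *s)
{
    return puts (s);
}

/* Stands in for "pushq index; jmpq _dl_runtime_resolve". */
static int
resolver_stub (const char *s)
{
    resolve_count++;
    got_slot = real_puts;   /* patch the slot once */
    return got_slot (s);    /* then continue into the real target */
}

/* The PLT entry: one indirect jump through the slot. */
static int
plt_entry (const char *s)
{
    return got_slot (s);
}

/* DJ + IJ: the call site calls the PLT entry directly. */
int
call_dj_ij (const char *s)
{
    return plt_entry (s);
}

/* IJ only: the call site itself calls through the slot. */
int
call_ij (const char *s)
{
    return got_slot (s);
}
```

In this model the first call through either path runs resolver_stub once. Every later call reaches real_puts with two hops via call_dj_ij and a single hop via call_ij.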


Run GNOME on NanoPi M4 (RK3399)


/usr/bin/gnome-shell: symbol lookup error: /lib/libmutter-4.so.0: undefined symbol: gbm_bo_get_offset

Arch Linux
Refer: https://archlinuxarm.org/platforms/armv8/generic

Mali GPU user space drivers
Drivers: https://github.com/heiher/libmali-rk3399

git clone https://github.com/heiher/libmali-rk3399
cd libmali-rk3399
# Build gbm wrapper
# Install to system
sudo cp conf/mali.conf /etc/ld.so.conf.d/
sudo cp -rd lib /usr/lib/mali
# Update ld.so cache
sudo ldconfig

Force the use of GLESv2

# /etc/clutter-1.0/settings.ini

Fix crash when maximizing window
Patch: https://gitlab.gnome.org/GNOME/mutter/merge_requests/515

GPU device access permission

# /etc/udev/rules.d/50-mali.rules 
KERNEL=="mali0", MODE="0666"
sudo chmod 0666 /dev/mali0

Start gdm.


[MIUI] Fix the problem communicating with Google servers

There was a problem communicating with Google servers.

How to fix
1. Download the Open GApps pico package.
2. Extract PrebuiltGmsCorePi.apk.

unzip open_gapps-arm64-9.0-pico-20190309.zip
lunzip Core/gmscore-arm64.tar.lz
tar xf Core/gmscore-arm64.tar
adb push gmscore-arm64/nodpi/priv-app/PrebuiltGmsCorePi/PrebuiltGmsCorePi.apk /sdcard

3. Replace PrebuiltGmsCorePi.apk.

dd if=/sdcard/PrebuiltGmsCorePi.apk of=/system/priv-app/PrebuiltGmsCorePi/PrebuiltGmsCorePi.apk bs=1M


Run QEMU with hardware virtualization on macOS

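If you want to run other operating systems in a virtual machine on macOS without using commercial software, the open-source QEMU is a good choice. QEMU's feature support is fairly comprehensive. Beyond features, what virtual machine users care about most is performance. One piece of good news is that macOS 10.10 and later include a hardware virtualization framework, Hypervisor.framework. Another is that QEMU already supports this framework through its hvf accelerator.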

1. macOS 10.10+
2. MacPorts

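Users who have already tried it may have noticed that QEMU has problems when the hvf accelerator is used with multiple cores. Indeed, QEMU works fine with the hvf accelerator on a single core, but when multiple cores are specified with the -smp option, the virtual machine will very likely hang before hardware initialization even completes.
The good news is that this problem has been fixed. It was caused by the hvf accelerator code not accounting for the fact that, once the virtual machine starts, all hvf vCPUs execute instructions in parallel, including the I/O emulation operations used for hardware initialization. Multiple CPUs initializing the same hardware at the same time clearly does not work.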

Patch (submitted upstream, under review, hopefully merged soon)

Install QEMU

cd ~
git clone https://github.com/hevz/macports
sudo vim /opt/local/etc/macports/sources.conf
# Add local repositories
file:///Users/[YOUR USER NAME]/macports
rsync://rsync.macports.org/macports/release/tarballs/ports.tar [default]
cd ~/macports
sudo port install qemu

Run Arch Linux
1. Download the Arch Linux installation ISO image.
2. Create a virtual machine disk image.
3. Start installing the new system.
4. Boot the installed system.

mkdir -p ~/system/images
qemu-img create -f qcow2 ~/system/images/arch.qcow2 40G
qemu-system-x86_64 -no-user-config -nodefaults -show-cursor \
    -M pc-q35-3.1,accel=hvf,usb=off,vmport=off \
    -cpu host -smp 4,sockets=1,cores=2,threads=2 -m 4096 \
    -realtime mlock=off -rtc base=utc,driftfix=slew \
    -drive file=$HOME/system/images/arch.qcow2,if=none,format=qcow2,id=disk0 \
    -device virtio-blk-pci,bus=pcie.0,addr=0x1,drive=disk0 \
    -netdev user,id=net0,hostfwd=tcp::2200-:22 \
    -device virtio-net-pci,netdev=net0,bus=pcie.0,addr=0x2 \
    -device virtio-keyboard-pci,bus=pcie.0,addr=0x3 \
    -device virtio-tablet-pci,bus=pcie.0,addr=0x4 \
    -device virtio-vga,bus=pcie.0,addr=0x5 \
    -cdrom ~/archlinux-2019.01.01-x86_64.iso -boot d



Transparent proxy per application on Linux

This is a transparent proxy per app based on iptables + network classifier cgroup on Linux, and it’s more general than proxychains.

Build and install tproxy

git clone --recursive https://github.com/heiher/hev-socks5-tproxy
cd hev-socks5-tproxy
sudo cp bin/hev-socks5-tproxy /usr/local/bin/
sudo cp conf/main.ini /usr/local/etc/hev-socks5-tproxy.conf

Install systemd service

# /etc/systemd/system/hev-socks5-tproxy.service
ExecStart=/usr/local/bin/hev-socks5-tproxy /usr/local/etc/hev-socks5-tproxy.conf

Install tproxy wrapper

# /usr/local/bin/tproxy
if [ ! -e ${NET_CLS_DIR} ]; then
	sudo sh -c "mkdir -p ${NET_CLS_DIR}; \
		chmod 0666 ${NET_CLS_DIR}/tasks; \
		echo ${NET_CLS_ID} > ${NET_CLS_DIR}/net_cls.classid; \
		iptables -t nat -D OUTPUT -p tcp \
			-m cgroup --cgroup ${NET_CLS_ID} \
			-j REDIRECT --to-ports ${TP_TCP_PORT}; \
		iptables -t nat -D OUTPUT -p udp --dport 53 \
			-m cgroup --cgroup ${NET_CLS_ID} \
			-j REDIRECT --to-ports ${TP_DNS_PORT}; \
		iptables -t nat -I OUTPUT -p tcp \
			-m cgroup --cgroup ${NET_CLS_ID} \
			-j REDIRECT --to-ports ${TP_TCP_PORT}; \
		iptables -t nat -I OUTPUT -p udp --dport 53 \
			-m cgroup --cgroup ${NET_CLS_ID} \
			-j REDIRECT --to-ports ${TP_DNS_PORT};" 2> /dev/null
fi
echo $$ > ${NET_CLS_DIR}/tasks
exec "$@"

How to use?

tproxy COMMAND
# For example
tproxy wget http://xxx.com/xxx
tproxy makepkg


Dump VDSO via GDB

gdb /bin/bash
(gdb) b main
(gdb) r
(gdb) info proc map
Mapped address spaces:
          Start Addr           End Addr       Size     Offset objfile
      0x7ffff7fd1000     0x7ffff7fd3000     0x2000        0x0 [vdso]
(gdb) dump binary memory /tmp/vdso.so 0x7ffff7fd1000 0x7ffff7fd3000
(gdb) quit
file /tmp/vdso.so
/tmp/vdso.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=1a3fac101214fe3ecfb3788d4f8af3018f1f2667, stripped
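
The same image can also be dumped from inside a process, without GDB. The following is a minimal sketch, assuming Linux with /proc mounted. It finds the [vdso] range in /proc/self/maps (the same information as "info proc mappings") and writes that memory to a file. The helper name dump_vdso is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy this process's [vdso] mapping to `path`.
 * Returns the number of bytes written, or -1 on failure (Linux only). */
long
dump_vdso (const char *path)
{
    FILE *maps = fopen ("/proc/self/maps", "r");
    char line[512];
    unsigned long start = 0, end = 0;
    int found = 0;

    if (!maps)
        return -1;
    /* Each line starts with "start-end" in hexadecimal. */
    while (fgets (line, sizeof (line), maps)) {
        if (strstr (line, "[vdso]") &&
            sscanf (line, "%lx-%lx", &start, &end) == 2) {
            found = 1;
            break;
        }
    }
    fclose (maps);
    if (!found || end <= start)
        return -1;

    FILE *out = fopen (path, "wb");
    if (!out)
        return -1;
    /* The mapping is readable in our own address space. */
    size_t n = fwrite ((const void *) start, 1, end - start, out);
    if (fclose (out) != 0 || n != end - start)
        return -1;
    return (long) n;
}
```

Calling dump_vdso ("/tmp/vdso.so") from main and then running file on the result should identify an ELF shared object, as in the GDB session above.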


Reordering on an Alpha processor

A very non-intuitive property of the Alpha processor is that it allows the following behavior:

Initially: p = & x, x = 1, y = 0

    Thread 1         Thread 2
  y = 1         |    
  memoryBarrier |    i = *p
  p = & y       |
Can result in: i = 0

This behavior means that the reader needs to perform a memory barrier in lazy initialization idioms (e.g., Double-checked locking) and creates issues for synchronization-free immutable objects (e.g., ensuring that other threads see the correct value for fields of a String object).
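
As an illustration of the reader-side barrier in a lazy initialization idiom, here is a minimal sketch of double-checked locking using C11 atomics and POSIX threads. It is not taken from the quoted note: struct config, get_config and the value 42 are hypothetical, and the sketch assumes the compiler maps the acquire load to whatever barrier the target needs (an MB on Alpha).

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical lazily created object. */
struct config { int value; };

static _Atomic (struct config *) instance;
static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;

struct config *
get_config (void)
{
    /* Acquire pairs with the release store below, so a reader that
     * sees the pointer also sees the initialized fields. */
    struct config *c =
        atomic_load_explicit (&instance, memory_order_acquire);

    if (c == NULL) {
        pthread_mutex_lock (&init_lock);
        /* Second check, now serialized by the mutex. */
        c = atomic_load_explicit (&instance, memory_order_relaxed);
        if (c == NULL) {
            c = malloc (sizeof (*c));
            if (c != NULL) {
                c->value = 42;
                /* Publish only after the fields are written. */
                atomic_store_explicit (&instance, c,
                                       memory_order_release);
            }
        }
        pthread_mutex_unlock (&init_lock);
    }
    return c;
}
```

With a relaxed first load instead of an acquire load, the reader could see the new pointer but stale fields on a machine that behaves as described here.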

Kourosh Gharachorloo wrote a note explaining how it can actually happen on an Alpha multiprocessor:
The anomalous behavior is currently only possible on a 21264-based system. And obviously you have to be using one of our multiprocessor servers. Finally, the chances that you actually see it are very low, yet it is possible.

Here is what has to happen for this behavior to show up. Assume T1 runs on P1 and T2 on P2. P2 has to be caching location y with value 0. P1 does y=1 which causes an “invalidate y” to be sent to P2. This invalidate goes into the incoming “probe queue” of P2; as you will see, the problem arises because this invalidate could theoretically sit in the probe queue without doing an MB on P2. The invalidate is acknowledged right away at this point (i.e., you don’t wait for it to actually invalidate the copy in P2’s cache before sending the acknowledgment). Therefore, P1 can go through its MB. And it proceeds to do the write to p. Now P2 proceeds to read p. The reply for read p is allowed to bypass the probe queue on P2 on its incoming path (this allows replies/data to get back to the 21264 quickly without needing to wait for previous incoming probes to be serviced). Now, P2 can dereference p to read the old value of y that is sitting in its cache (the inval y in P2’s probe queue is still sitting there).

How does an MB on P2 fix this? The 21264 flushes its incoming probe queue (i.e., services any pending messages in there) at every MB. Hence, after the read of p, you do an MB which pulls in the inval to y for sure. And you can no longer see the old cached value for y.
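
The two-thread example with a barrier on both sides can be sketched with C11 fences and POSIX threads. This is an illustration under assumptions, not code from the note: writer, reader and run_once are hypothetical names, and the fences are assumed to be compiled to the MB instructions discussed above on Alpha.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int x = 1, y = 0;
static _Atomic (int *) p = &x;

/* Thread 1: y = 1; memoryBarrier; p = &y */
static void *
writer (void *arg)
{
    (void) arg;
    y = 1;
    atomic_thread_fence (memory_order_release);  /* the MB on P1 */
    atomic_store_explicit (&p, &y, memory_order_relaxed);
    return NULL;
}

/* Thread 2: i = *p, with the barrier between the two reads */
static void *
reader (void *arg)
{
    int *q = atomic_load_explicit (&p, memory_order_relaxed);
    atomic_thread_fence (memory_order_acquire);  /* the MB on P2 */
    *(int *) arg = *q;
    return NULL;
}

/* Runs both threads once and returns i, or -1 if a thread
 * could not be created. */
int
run_once (void)
{
    pthread_t t1, t2;
    int i = -1;

    x = 1;
    y = 0;
    atomic_store (&p, &x);
    if (pthread_create (&t1, NULL, writer, NULL) != 0)
        return -1;
    if (pthread_create (&t2, NULL, reader, &i) != 0) {
        pthread_join (t1, NULL);
        return -1;
    }
    pthread_join (t1, NULL);
    pthread_join (t2, NULL);
    return i;
}
```

With both fences in place the reader sees either p = &x (so i = 1) or p = &y together with y = 1, so run_once returns 1. Dropping the reader's fence is what permits i = 0 on the 21264.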

Even though the above scenario is theoretically possible, the chances of observing a problem due to it are extremely minute. The reason is that even if you set up the caching properly, P2 will likely have ample opportunity to service the messages (i.e., inval) in its probe queue before it receives the data reply for “read p”. Nonetheless, if you get into a situation where you have placed many things in P2’s probe queue ahead of the inval to y, then it is possible that the reply to p comes back and bypasses this inval. It would be difficult for you to set up the scenario though and actually observe the anomaly.

The above addresses how current Alphas may violate what you have shown. Future Alphas can violate it due to other optimizations. One interesting optimization is value prediction.

From: http://www.cs.umd.edu/~pugh/java/memoryModel/AlphaReordering.html