Sep 2018¶

Sep 20¶

```c [ 54.602054] nr_pcache_pee_free: 0 [ 54.602537] nr_pcache_pee_free_kmalloc: 0 [ 1468.765410] mlx4_msi_x_interrupt(): IRQ: 27 CPU: 1 [ 1468.766956] event PORT_MNG_CHG arrived [ 1468.768193] handle_port_mgmt_change_event: rereg
[ 1468.813660] ib_cache: ib_cache_update(): Updated port 1 of dev 0000:00:08.0 [ 1468.815097] ib_sa_event(): TODO [ 1479.178651] mlx4_msi_x_interrupt(): IRQ: 27 CPU: 1 [ 1479.180201] event PORT_MNG_CHG arrived [ 1479.181430] handle_port_mgmt_change_event: rereg [ 1479.190813] bad: scheduling from the idle thread! [ 1479.192158] CPU: 1 PID: 0 Comm: swapper/1 4.0.0-lego+ #146 [ 1479.193622] Stack: [ 1479.194408] ffff88083fddf980 ffffffff8101eefc ffff88083fc45d80 ffff88083fc45d80 [ 1479.196826] ffff88083fddf9a8 ffffffff8101ace4 00000001810067d4 ffff88083fe43000 [ 1479.199226] ffffffffffff0000 ffff88083fddf9e0 ffffffff81078bf6 ffffffff8100e8ea [ 1479.203615] ffffffffffff0000 0000000000000000 ffff88083fe43000 ffff88083fe43000 [ 1479.206532] ffff88083fddf9f8 ffffffff81078ca3 7fffffffffffffff ffff88083fddfa68 [ 1479.208791] Call Trace: [ 1479.209606] [ 1479.210322] [] dequeue_task_idle+0x48/0x60 [ 1479.211726] [] deactivate_task+0x44/0x50 [ 1479.213092] [] __schedule+0x146/0x1e0 [ 1479.214410] [] ? smp__apic_timer_interrupt+0x6a/0x70 [ 1479.215960] [] schedule+0x13/0x30 [ 1479.217211] [] schedule_timeout+0x12a/0x1a0 [ 1479.218625] [] __down_common+0xaa/0x103 [ 1479.219904] [] __down+0x18/0x1a [ 1479.221046] [] down+0x3c/0x40 [ 1479.222163] [] __mlx4_cmd+0x1d7/0x3c0 [ 1479.223397] [] mlx4_MAD_IFC+0x22e/0x490 [ 1479.224666] [] __mlx4_ib_query_pkey+0x181/0x240 [ 1479.226045] [] mlx4_ib_query_pkey+0x13/0x20 [ 1479.227365] [] ib_query_pkey+0x14/0x20 [ 1479.228617] [] ib_cache_update+0x237/0x480 [ 1479.229862] [] ib_cache_event+0x28/0x30 [ 1479.231026] [] ib_dispatch_event+0x40/0x70 [ 1479.232222] [] handle_port_mgmt_change_event+0x158/0x1c0 [ 1479.233602] [] mlx4_ib_event+0x7c/0xa0 [ 1479.234744] [] mlx4_dispatch_event+0x65/0x90 [ 1479.235968] [] mlx4_eq_int+0x273/0x4f0 [ 1479.237113] [] mlx4_msi_x_interrupt+0x36/0x40 [ 1479.238352] [] handle_irq_event_percpu+0x24/0xa0 [ 1479.239584] [] handle_irq_event+0x28/0x50 [ 1479.240696] [] handle_edge_irq+0x5e/0xc0 [ 1479.241794] [] do_IRQ+0x43/0xd0 [ 1479.242779] [] ? apic_timer_interrupt+0x54/0x90 [ 1479.243971] [] common_interrupt+0x6a/0x6a [ 1479.245084] [] ? cpu_idle+0x10/0x30 [ 1479.246123] [] start_secondary_cpu+0x55/0x60 [ 1479.247278]

```

Sep 17¶

Can not believe I’m wasting time on this crap X again.

Sep 16¶

Tests done today:

Setting	Log	nr_workers	Tracing (strace/counter/profiling)	Runtime (s)	pcache_flush_net (us)
TF-MNIST, Linux				13.2s
TF4-MNIST, 128MB	0916-w14-1	1	ON	avg 48.5s	9891
TF4-MNIST, 128MB	0916-w14-2	1	OFF	(46.1+44.6+45.5+45.7+44)/5 = 45.2s	N/A
TF4-MNIST, 128MB	0916-w14-4	4	ON	(43.4+44+43.9+42.6+42.1)/5=43.2	8351
TF4-MNIST, 128MB	0916-w14-3	4	OFF	(40.1+42.1+42.0+41.7+42.1)/5 = 41.6	N/A
TF4-Cifar, Linux				235.5s
TF4-Cifar, 128MB	0916-w14-5	4	OFF	(636.2+635.0+636.8+637.2+634.1)/5=635.8	N/A
TF4-Cifar, 128MB	0916-w14-6	1	OFF	(660.2+662.2+662.8+663.8+661+5)/5=663s	N/A
TF4-Cifar, 256MB	0916-w14-7	1	OFF	486s	N/A

Sep 15¶

DAMN.

Let us summarize today. Okay. Fixed the double-post-cqe issue. Hehe. The post part is the only fucking left code that I did not look into at fit_poll_recv_cq. And, ironically, there is no error checking for ib_post_recv(), which won’t generate any error/warning.

error checking error checking…

Anyway fuck it.

Today I created a new tag v0.0.9, hope we have a stable net. The RPC profile code is very stressing, and fit survived.

The following wanring is fixed by post rx_depth/2. c [ 1812.017204] fit: To align first QPN, we skipped: #72 #72 #73 #74 #75 #76 #77 #78 #79 [ 1812.157570] fit: fit_post_receives_message()-628 CPU 2 Fail to post recv conn_id: 12 [ 1812.166013] ------------[ cut here ]------------ [ 1812.171152] WARNING: CPU: 2 PID: 16 at net/lego/fit_internal.c:629 fit_post_receives_message.isra.7+0xce/0x100 [ 1812.182302] CPU: 2 PID: 16 Comm: ib-initd 4.0.0-lego+ #95 [ 1812.188314] Stack: [ 1812.190544] ffff880ff98bfd50 ffffffff8101299b 0000000000000cff 0000000000000060 [ 1812.198689] 0000000000000d00 0000000000000100 ffff880ff98dc030 ffff880ff98bfd60 [ 1812.206834] ffffffff81012a8f ffff880ff98bfdc8 ffffffff810743de fffffff4fffffff4 [ 1812.214978] ffff880ff98bfd80 0000000000000000 0000000000000cff 0000000000000000 [ 1812.223124] 0000000000000000 ffff880ff98dc000 0000000000000000 000000000000000c [ 1812.231269] Call Trace: [ 1812.233984] <TSK> [ 1812.236116] [<ffffffff810129a7>] __warn.constprop.0+0xa7/0x100 [ 1812.242613] [<ffffffff81012a8f>] warn_slowpath_null+0xf/0x20 [ 1812.248915] [<ffffffff810743de>] fit_post_receives_message.isra.7+0xce/0x100 [ 1812.256770] [<ffffffff81076a1a>] fit_add_newnode+0xca/0x170 [ 1812.262974] [<ffffffff81079d10>] fit_establish_conn+0x7b0/0xaa0 [ 1812.269568] [<ffffffff81073ce8>] ? ibv_add_one+0x98/0x120 [ 1812.275580] [<ffffffff810741f0>] ? ibapi_get_node_id+0x20/0x20 [ 1812.282076] [<ffffffff81074258>] lego_ib_init+0x68/0xf0 [ 1812.287893] [<ffffffff81023261>] kthread+0x111/0x130 [ 1812.293421] [<ffffffff81023150>] ? __kthread_parkme+0x70/0x70 [ 1812.299820] [<ffffffff8100eaf2>] ret_from_fork+0x22/0x30 [ 1812.305735] <EOT> [ 1812.307868] ---[ end trace 0000000000000000 ]---

Sep 11¶

Got this log, 5 machine, p2s_open, S side has this issue. Damn.

```c [ 1672.962279]

***** Fail to to get the CQE from send_cq after 20 seconds!
***** This means the packet was lost and something went wrong
***** with your NIC… ***** connection_id: 11 dest node: 0

[ 1673.061668] ------------[ cut here ]------------ [ 1673.074937] WARNING: CPU: 10 PID: 4624 at /root/ys/LegoOS_2M/linux-modules/fit/fit_internal.c:956 fit_internal_poll_sendcq+0xda/0x130 fit [ 1673.101557] Modules linked in: storage(OF) fit(OF) xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT tun bridge stp llc ebtable_filter ebtable s ip6table_filter ip6_tables iptable_filter xprtrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_c m iw_cm ib_addr x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ipmi_devintf ablk_helper cryptd ipmi_si iTCO_wdt ipmi_msghandler iTCO_vendor_support dcdbas sg pcspkr shpchp acpi_power_meter lpc_ich mfd_core wmi mperf uinput binfmt_misc ip_tables ext4 mbcache jbd2 mlx4_ib [ 1673.182609] ib_sa ib_mad ib_core mlx4_en sd_mod crc_t10dif mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm ahci crc32c_intel libahci mlx4_core libata tg3 nvme megaraid_sas ptp i2c_core pps_core dm_mirror dm_region_hash dm_log dm_mod [ 1673.222604] CPU: 10 PID: 4624 Comm: lego-storaged Tainted: GF W O 3.11.1-vanilla #1 [ 1673.235825] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.5.4 10/002/2015 [ 1673.248883] 0000000000000009 ffff88102186b9f8 ffffffff8159a5a4 0000000000000000 [ 1673.261795] ffff88102186ba30 ffffffff810641bd ffff882027180400 00000004a817c800 [ 1673.274499] 00000180dc3abde5 0000000000000000 0000000000000000 ffff88102186ba40 [ 1673.287034] Call Trace: [ 1673.299259] [] dump_stack+0x45/0x56 [ 1673.311371] [] warn_slowpath_common+0x7d/0xa0 [ 1673.323268] [] warn_slowpath_null+0x1a/0x20 [ 1673.334892] [] fit_internal_poll_sendcq+0xda/0x130 [fit] [ 1673.346348] [] ? check_preempt_curr+0x85/0xa0 [ 1673.357575] [] fit_send_message_with_rdma_write_with_imm_request+0x107/0x3f0 [fit] [ 1673.368777] [] ? wake_up_worker+0x24/0x30 [ 1673.379741] [] fit_reply_message+0x89/0xa0 [fit] [ 1673.390497] [] ibapi_reply_message+0x1b/0x20 [fit] [ 1673.401039] [] handle_open_request+0xa5/0xe0 [storage] [ 1673.411367] [] storage_manager+0x106/0x300 [storage] [ 1673.421470] [] ? 0xffffffffa0645fff [ 1673.431297] [] ? 0xffffffffa0645fff [ 1673.440797] [] kthread+0xc0/0xd0 [ 1673.450034] [] ? insert_kthread_work+0x40/0x40 [ 1673.459063] [] ret_from_fork+0x7c/0xb0 [ 1673.467837] [] ? insert_kthread_work+0x40/0x40 [ 1673.476400] —[ end trace f9b19a31d409f910 ]— [ 1695.867276] storage_self_monitor(): in_handler=1 [ 1695.875906] handle_replica_flush: 0 [ 1695.884613] handle_replica_vma: 0 [ 1695.893265] handle_replica_read: 12740 [ 1695.901920] handle_replica_write: 0 [ 1713.012565] INFO: rcu_sched self-detected stall on CPU { 10} (t=60001 jiffies g=7646 c=7645 q=0) [ 1713.013339] sending NMI to all CPUs: [ 1713.013573] INFO: rcu_sched detected stalls on CPUs/tasks: { 10} (detected by 15, t=60002 jiffies, g=7646, c=7645, q=0) [ 1713.014807] NMI backtrace for cpu 0 [ 1713.015685] CPU: 0 PID: 4591 Comm: wq_handler Tainted: GF W O 3.11.1-vanilla #1 [ 1713.016624] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.5.4 10/002/2015 [ 1713.017575] task: ffff88201f193b40 ti: ffff88101a34a000 task.ti: ffff88101a34a000 [ 1713.018530] RIP: 0010:[] [] waiting_queue_handler+0x75/0x140 [fit] [ 1713.018530] RIP: 0010:[] [] waiting_queue_handler+0x75/0x140 [fit] [ 1713.019512] RSP: 0018:ffff88101a34be78 EFLAGS: 00000296 [ 1713.020444] RAX: 0000000000080080 RBX: ffff8820200253f0 RCX: ffff88201f193b40 [ 1713.021364] RDX: 0000000000000001 RSI: ffff88103f414760 RDI: ffff88103f4146c0 [ 1713.022260] RBP: ffff88101a34bec8 R08: 0000000000000000 R09: 0000000000000001 [ 1713.023138] R10: 0000000000000001 R11: ffffffffa0636b55 R12: ffff8820200253c0 [ 1713.024000] R13: ffff881022005000 R14: ffffffffa063b8e4 R15: ffffffffa063b8e4 [ 1713.024839] FS: 0000000000000000(0000) GS:ffff88103f400000(0000) knlGS:0000000000000000 [ 1713.025676] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1713.026481] CR2: 00007fd77ef46000 CR3: 0000000001876000 CR4: 00000000001407f0 [ 1713.027273] Stack: [ 1713.028035] ffff881000000000 ffff881000300660 ffff882000000003 0000000000000000 [ 1713.028818] ffff881000000000 ffff881021eafc38 ffff881022005000 ffffffffa0636ae0 [ 1713.029585] 0000000000000000 0000000000000000 ffff88101a34bf48 ffffffff81085ec0 [ 1713.030350] Call Trace: [ 1713.031098] [] ? fit_send_message_with_rdma_write_with_imm_request+0x3f0/0x3f0 [fit] [ 1713.031875] [] kthread+0xc0/0xd0 [ 1713.032641] [] ? insert_kthread_work+0x40/0x40 [ 1713.033407] [] ret_from_fork+0x7c/0xb0 [ 1713.034171] [] ? insert_kthread_work+0x40/0x40 ```

Sep 08¶

Check this log out: ``` ]— [ 427.218569] STDOUT: —[ INFO:tensorflow:Graph was finalized.

]— [ 427.416043] BUG: unable to handle kernel NULL pointer dereference at (null) [ 427.424583] IP: [] fit_poll_recv_cq+0x5cb/0x860 [ 427.431370] mlx4_msi_x_interrupt(): IRQ: 27 CPU: 0 [ 427.436702] PGD 0 [ 427.438932] CQ_ERROR CQ overrun on CQN 000082 [ 427.443780] Oops: 0002 [#1] SMP PROCESSOR [ 427.448240] event qp_event arrived [ 427.452022] CPU: 6 PID: 18 Comm: FIT_RecvCQ-0 4.0.0-lego+ #23 [ 427.458421] event qp_event arrived [ 427.462203] RIP: 0010:[] [] fit_poll_recv_cq+0x5cb/0x860 [ 427.471704] RSP: 0000:ffff881023e3fe60 EFLAGS: 00010287 [ 427.477618] RAX: 0000000000000000 RBX: 000000002aaaaaab RCX: 0000000000000004 [ 427.485570] RDX: 0000000000000000 RSI: 0000000000000053 RDI: 0000000000000000 [ 427.493520] RBP: ffff881023e3fec0 R08: 0000000000000001 R09: ffff881039900000 [ 427.501470] R10: 0000000000000000 R11: ffff881039918000 R12: ffff8810398f2000 [ 427.509421] R13: 0000000000000000 R14: 0000000000000001 R15: ffff881023e25008 [ 427.517371] event qp_event arrived [ 427.521153] FS: 0000000000000000(0000) GS:ffff88107fc60000(0000) knlGS:0000000000000000 [ 427.530169] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 427.536569] CR2: 0000000000000000 CR3: 000000000117a000 CR4: 00000000000406a0 [ 427.544519] event qp_event arrived

```

Trying to tune FIT’s number polling threads. This could be the throughput/latency killer.

128M

P num_polling	M worker	M num_polling	Runtime (s)
1	1	1	46.8s
1	4	1

Sep 07¶

Set up Infiniswap again. What a fucking crap code, and crash the kernel out of nowhere. crap crap crap.

Hmm, Linux will tune the CPU freq during runtime, will be higher than 2.4GHz. So disable it, make it a fair comparison with Lego.

intel_pstate=disable.

Sep 06¶

Did two optimizations on pcache, both are buffer management. Especially the pcache rmap case. In both opts, we kind of use static/pre-allocated array to serve dynamic allocation.

This is a better solution than using kmem_cache, faster. kmem_cache will be a more general solution here.

kmem_cache, FIFO queue (thpool buffer), static preallocated array (rmap, clflush)… Buffer management is really a very important thing in system building. I should be aware at the beginning next time.

These changes are in commits: 6e0cf6c5c64edbe445a27cf55f86ac51f8a897b3 73377cafce95ffa0cfb155f77cac97456a5e4a71

Sep 05¶

Alright. Besides some flaws/bugs in some kfree stuff, LegoOS now actually is very robust! Ran a quick git summary:

project : LegoOS repo age : 1 year, 11 months active : 358 days commits : 1540 files : 1161 authors : 1317 Yizhou Shan 85.5% 120 root 7.8% 36 hythzz 2.3% 27 yilun 1.8% 16 Yutong Huang 1.0% 10 Build Android 0.6% 8 Yiying Zhang 0.5% 4 sumukh1991 0.3% 1 Yizhou SHan 0.1% 1 Sumukh Hallymysore Ravindra 0.1%

Of course, there are still PLENY room for improvement, and I know where. At this time, I really think we need something like kmem_cache, which is so fucking useful. It can probably further reduce much overhead.

Sep 04¶

Trying the perset eviction list mechanism, instead of victim cache. The benefit of using this is: we will no longer be bottelnecked by victim cache anymore. Each faulting thread will do eviction/flush within its own context.

For 4 threads MNIST, I saw 3 seconds reduction.

Removed the bitmap, use per pcache set counter for quick reference.

Sep 03¶

With DEBUG_MM, try enable HAVE_FREE directory by directory

-¶

-

update_wall_time+0x44 is where we call tsc_read. And this has been called many times (HZ per second). All of a sudden, the pointer got crashed. Who wrote to this code memory?? Remote RDMA? ```c [ 1052.470714] general protection fault: 0000 [#1] SMP PROCESSOR [ 1052.477113] CPU: 0 PID: 15 Comm: ib_mad1 4.0.0-lego+ #509 [ 1052.483125] RIP: 0010:[] [] update_wall_time+0x44/0x6f0 [ 1052.492530] RSP: 0000:ffff88103ad9fc88 EFLAGS: 00010046 [ 1052.498445] RAX: 4510ffffffff8118 RBX: 0380ffffffffffff RCX: 0000000000000001 [ 1052.506396] RDX: ffff88103ad9fd28 RSI: 0000000000000000 RDI: 4510ffffffff8118 [ 1052.514346] RBP: ffff88103ad9fcd0 R08: 000000000000001f R09: 0000000000000000 [ 1052.522298] R10: 0000000000000029 R11: ffff881013f8e130 R12: aaff0000024a2677 [ 1052.530248] R13: 0000000000000000 R14: ffff88103ad85228 R15: ffff88103ae0c000 [ 1052.538199] FS: 0000000000000000(0000) GS:ffff88107fc00000(0000) knlGS:0000000000000000 [ 1052.547216] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1052.553616] CR2: 0000000000000000 CR3: 000000000117b000 CR4: 00000000000406b0 [ 1052.561567] Stack: [ 1052.563797] 0000000000000086 ffff88107fc05d80 ffff88103ad85000 0000000000000000 [ 1052.571941] ffff88107fc04980 0000000000000000 0000000000000000 ffff88103ad85228 [ 1052.580085] ffff88103ae0c000 ffff88103ad9fce8 ffffffff81017557 000000003ad9fe10 [ 1052.588230] ffff88103ad9fd10 ffffffff810067a4 ffffffff81088040 ffff88107fc05d80 [ 1052.596375] ffff88103ad85000 ffff88103ad9fdf8 ffffffff8100e8ea ffff88103ad9fd28 [ 1052.604520] Call Trace: [ 1052.607236] [ 1052.609368] [] tick_handle_periodic+0x67/0x70 [ 1052.615961] [] apic_timer_interrupt+0x54/0x90 [ 1052.622555] [] smp__apic_timer_interrupt+0x6a/0x70 [ 1052.629633] [] ? __schedule+0xf8/0x1e0 [ 1052.635548] [] schedule+0x13/0x30 [ 1052.640978] [] ib_mad_completion_handler+0x5de/0xc20 [ 1052.648250] [] ? dequeue_task_rt+0x1b/0x180 [ 1052.654648] [] ? ib_mad_send_done_handler.isra.22+0x4e0/0x4e0 [ 1052.662793] [] kthread+0xf6/0x110 [ 1052.668223] [] ? __kthread_parkme+0x70/0x70 [ 1052.674622] [] ret_from_fork+0x22/0x30 [ 1052.680538] [ 1052.682670] Code: db e4 16 00 79 0d f3 90 80 3d d0 e4 16 00 00 7e f5 eb ea 48 8b 1d fd fa 1f 00 48 8b 05 e6 fa 1f 00 4c 8b 25 f7 fa 1f 00 48 89 c7 50 28 49 89 c7 48 89 d8 4d 29 e7 48 d1 e8 49 21 df 48 f7 d0 [ 1052.703711] RIP [] update_wall_time+0x44/0x6f0 [ 1052.710498] RSP

```

Last update: October 9, 2018