mdadm

floating point exception creating a very large array

Bug #1851724 reported by Michael Hudson-Doyle on 2019-11-07

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	mdadm	New	Undecided	Unassigned

Bug Description

Running these commands:

t=$(mktemp -d)
mount -t tmpfs tmpfs $t
truncate -s 2400T $t/0.img
truncate -s 2400T $t/1.img
l1=$(losetup -f --show $t/0.img)
l2=$(losetup -f --show $t/1.img)
devname=/dev/md/test-$(uuidgen)
mdadm --verbose --create --metadata default --level raid1 --run -n 2 \
--assume-clean $devname $l1 $l2

results in:

mdadm: size set to 2576980245504K
mdadm: automatically enabling write-intent bitmap on large array
Floating point exception

A little light gdb-ing shows that the crash is in super1.c:calc_bitmap_size where __le32_to_cpu(bms->chunksize) is 0. The backtrace is this:

#0 0x00005555555a2126 in bitmap_bits (array_size=5153960491008, chunksize=0) at mdadm.h:1419
#1 0x00005555555a21a6 in calc_bitmap_size (bms=0x55555562d000, boundary=4096) at super1.c:179
#2 0x00005555555a4b05 in getinfo_super1 (st=0x555555620360, info=0x555555625d70, map=0x0) at super1.c:1043
#3 0x000055555557f18d in Create (st=0x555555620360, mddev=0x7fffffffd940 "/dev/md/test-2acca3e1-7896-4a4c-a9cb-c603bb429e3c",
name=0x7fffffffd948 "test-2acca3e1-7896-4a4c-a9cb-c603bb429e3c", uuid=0x0, subdevs=2, devlist=0x555555620450,
s=0x7fffffffdef0, c=0x7fffffffdf40, data_offset=1) at Create.c:913
#4 0x000055555555f900 in main (argc=14, argv=0x7fffffffe468) at mdadm.c:1586

super1.c:1043 is this line:

unsigned long long size = calc_bitmap_size(bsb, 4096);

bsb doesn't seem totally wrong; the magic is right here:

(gdb) p/x bsb[0]
$5 = {magic = 0x6d746962, version = 0x4, uuid = {0x5, 0x11, 0x6c, 0xd9, 0x8, 0x9e, 0x3c, 0xd3, 0x74, 0xd4, 0x68, 0xae, 0x43, 0x90,
    0x2b, 0x45}, events = 0x0, events_cleared = 0x0, sync_size = 0x4affffbf800, state = 0x0, chunksize = 0x0, daemon_sleep = 0x5,
  write_behind = 0x0, sectors_reserved = 0x0, nodes = 0x0, cluster_name = {0x0 <repeats 64 times>}, pad = {
    0x0 <repeats 120 times>}}

But clearly chunksize being 0 is bogus.
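
For reference, frame #0 is called with chunksize=0, and on x86 an integer division by zero raises SIGFPE, which the shell reports as "Floating point exception". Below is a minimal C sketch of a rounding-up division of that kind with a zero check added. demo_bitmap_bits is a hypothetical helper written for illustration, not the mdadm source, and it assumes array_size is counted in 512-byte sectors and chunksize in bytes.

```c
#include <stdint.h>

/*
 * Hypothetical illustration, not mdadm code.
 * Computes how many bitmap bits are needed to cover array_size sectors
 * (assumed to be 512 bytes each) when every bit covers chunksize bytes.
 * Returns 0 and stores the result in *bits on success.
 * Returns -1 when chunksize is 0, instead of dividing by zero.
 * Overflow of array_size * 512 is not checked in this sketch.
 */
static int demo_bitmap_bits(unsigned long long array_size,
                            uint32_t chunksize,
                            unsigned long long *bits)
{
    if (chunksize == 0)
        return -1;  /* an unchecked division here would raise SIGFPE */

    /* Round up so a partial trailing chunk still gets a bit. */
    *bits = (array_size * 512ULL + chunksize - 1) / chunksize;
    return 0;
}
```

With a check like this, a bogus chunksize of 0 would surface as an error return that the caller can report, rather than as a crash.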

Michael Hudson-Doyle (mwhudson) wrote on 2019-11-07:

Oh, the issue is that add_internal_bitmap1 computes the chunksize to be 1LL<<32, but it is stored as an int32 and so is truncated to 0. Now obviously devices this large are pretty implausible, but this failure mode is unfortunate!
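
The truncation can be sketched in isolation in C. Everything here is hypothetical and for illustration only: demo_bitmap_super stands in for a structure with a 32-bit chunksize field, and store_chunksize and chunksize_fits are made-up helpers, not functions from mdadm.

```c
#include <stdint.h>

/* Hypothetical stand-in for a superblock whose chunksize field is 32 bits wide. */
struct demo_bitmap_super {
    uint32_t chunksize;
};

/*
 * Store a 64-bit chunk size into the 32-bit field with no range check.
 * The conversion keeps only the low 32 bits, so 1ULL << 32 becomes 0.
 */
static uint32_t store_chunksize(unsigned long long chunk)
{
    struct demo_bitmap_super bms;
    bms.chunksize = (uint32_t)chunk;
    return bms.chunksize;
}

/* Return 1 if chunk is non-zero and fits the 32-bit field without loss, else 0. */
static int chunksize_fits(unsigned long long chunk)
{
    return chunk != 0 && chunk <= UINT32_MAX;
}
```

A check along the lines of chunksize_fits before the store is one possible way to turn this into a clean error; whether that is the right fix for mdadm is not something this sketch settles.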

Dimitri John Ledkov (xnox) wrote on 2019-11-08: Re: [Bug 1851724] Re: floating point exception creating a very large array

It's only 2.5 petabytes, and there are 3.6 petabyte "servers" which I guess
one could assemble into a single RAID array, no?!

https://www.broadberry.co.uk/petarack.php
