floating point exception creating a very large array

Bug #1851724 reported by Michael Hudson-Doyle
Affects   Status   Importance   Assigned to   Milestone
mdadm     New      Undecided    Unassigned

Bug Description

Running these commands:

t=$(mktemp -d)
mount -t tmpfs tmpfs $t
truncate -s 2400T $t/0.img
truncate -s 2400T $t/1.img
l1=$(losetup -f --show $t/0.img)
l2=$(losetup -f --show $t/1.img)
devname=/dev/md/test-$(uuidgen)
mdadm --verbose --create --metadata default --level raid1 --run -n 2 \
      --assume-clean $devname $l1 $l2

results in:

mdadm: size set to 2576980245504K
mdadm: automatically enabling write-intent bitmap on large array
Floating point exception

A little light gdb-ing shows that the crash is an integer division by zero in bitmap_bits, called from super1.c:calc_bitmap_size, where __le32_to_cpu(bms->chunksize) is 0. The backtrace is this:

#0 0x00005555555a2126 in bitmap_bits (array_size=5153960491008, chunksize=0) at mdadm.h:1419
#1 0x00005555555a21a6 in calc_bitmap_size (bms=0x55555562d000, boundary=4096) at super1.c:179
#2 0x00005555555a4b05 in getinfo_super1 (st=0x555555620360, info=0x555555625d70, map=0x0) at super1.c:1043
#3 0x000055555557f18d in Create (st=0x555555620360, mddev=0x7fffffffd940 "/dev/md/test-2acca3e1-7896-4a4c-a9cb-c603bb429e3c",
    name=0x7fffffffd948 "test-2acca3e1-7896-4a4c-a9cb-c603bb429e3c", uuid=0x0, subdevs=2, devlist=0x555555620450,
    s=0x7fffffffdef0, c=0x7fffffffdf40, data_offset=1) at Create.c:913
#4 0x000055555555f900 in main (argc=14, argv=0x7fffffffe468) at mdadm.c:1586

super1.c:1043 is this line:

   unsigned long long size = calc_bitmap_size(bsb, 4096);

bsb doesn't seem totally wrong; the magic is right:

(gdb) p/x bsb[0]
$5 = {magic = 0x6d746962, version = 0x4, uuid = {0x5, 0x11, 0x6c, 0xd9, 0x8, 0x9e, 0x3c, 0xd3, 0x74, 0xd4, 0x68, 0xae, 0x43, 0x90,
    0x2b, 0x45}, events = 0x0, events_cleared = 0x0, sync_size = 0x4affffbf800, state = 0x0, chunksize = 0x0, daemon_sleep = 0x5,
  write_behind = 0x0, sectors_reserved = 0x0, nodes = 0x0, cluster_name = {0x0 <repeats 64 times>}, pad = {
    0x0 <repeats 120 times>}}

But clearly chunksize being 0 is bogus.
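
For illustration only, here is a minimal sketch of why a zero chunksize ends in "Floating point exception". It assumes bitmap_bits is a ceiling division of the array size in bytes by the chunk size, which the backtrace suggests but this report does not quote. The name bitmap_bits_checked and its error convention are hypothetical, not mdadm code. Integer division by zero is undefined behaviour in C, and on x86 Linux it typically raises SIGFPE, which the shell reports with that message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guarded variant of a ceiling division: how many chunks of
 * chunksize bytes are needed to cover array_sectors sectors of 512 bytes.
 * Returns 0 and writes the count to *bits, or -1 if chunksize is 0.
 * Overflow of array_sectors * 512 is not handled in this sketch. */
static int bitmap_bits_checked(uint64_t array_sectors, uint64_t chunksize,
                               uint64_t *bits)
{
    assert(bits != NULL);
    if (chunksize == 0)
        return -1; /* an unguarded division here is what raises SIGFPE */
    *bits = (array_sectors * 512 + chunksize - 1) / chunksize;
    return 0;
}
```

With the values from the backtrace (array_size=5153960491008, chunksize=0) this returns -1 instead of crashing. With a chunk size of 1ULL << 32 it would report 614400 chunks.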

Michael Hudson-Doyle (mwhudson) wrote :

Oh, the issue is that add_internal_bitmap1 computes the chunksize to be 1LL<<32, but it is stored as an int32 and so is truncated to 0. Now obviously devices this large are pretty implausible, but this failure mode is unfortunate!
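
The truncation can be reproduced in isolation. The following is a sketch, not the mdadm code: it assumes the field behaves like an unsigned 32-bit integer, and the helper names truncate_to_u32, chunksize_fits and store_chunksize are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not mdadm code: what a plain assignment of a 64-bit
 * chunk size into a 32-bit field keeps, namely the low 32 bits. */
static uint32_t truncate_to_u32(uint64_t chunk)
{
    return (uint32_t)chunk;
}

/* Hypothetical check: 1 if the chunk size is non-zero and survives the
 * round trip through a 32-bit field, 0 otherwise. */
static int chunksize_fits(uint64_t chunk)
{
    return chunk != 0 && (uint64_t)truncate_to_u32(chunk) == chunk;
}

/* Hypothetical store that refuses, via assert, to truncate silently. */
static uint32_t store_chunksize(uint64_t chunk)
{
    assert(chunksize_fits(chunk));
    return truncate_to_u32(chunk);
}
```

Here truncate_to_u32(1ULL << 32) yields 0, matching the chunksize = 0x0 seen in the gdb dump, while chunksize_fits(1ULL << 32) returns 0. A check of this kind before the store would let the tool fail with an error message rather than a crash.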

Dimitri John Ledkov (xnox) wrote : Re: [Bug 1851724] Re: floating point exception creating a very large array

It's only 2.5 petabytes, and there are 3.6 petabyte "servers" which I guess
one could assemble into a single raid array, no?!

https://www.broadberry.co.uk/petarack.php
