inconsistent number of bands in 'ms' of Full data set | Reply
For example, airport_0_0_ms.tif in train has only 4 bands instead of 8.
This is also confirmed by the metadata in airport_0_0_ms.json (which mentions only four wavelengths).

I would like to know how to handle this, if anybody is using the full data set.

Thanks
Re: inconsistent number of bands in 'ms' of Full data set (response to post by tcghanareddy) | Reply
Hi! I'd love to suggest some ideas, but the competition rules prohibit sharing information:

"Use the match forum to ask general questions or report problems, but please do not post comments and questions that reveal information about possible solution techniques."


I really wish it wasn't this way; other competition sites allow sharing information as long as it is done publicly.

Best - Andres

p.s. I was going to just submit a general idea based on a paper about how to handle the case where the images you need to feed to a NN have an inconsistent number of bands.
Re: inconsistent number of bands in 'ms' of Full data set (response to post by antor) | Reply
Hmm, actually I posted it wrong; I should have addressed it to the admins/copilots.

According to the data description in the problem statement, every sample should have 8 bands.
I thought I would proceed with the 4 bands, but I am getting TIFF errors on some of the files: sometimes byte offsets and byte counts are missing from the TIFF header. So it is clear that the data was not tested for errors.

No known libraries are able to read them. I tried to fix the headers but could not solve it. In the end, I gave up on that data.

Counting on this data, I had stopped training on the RGB data.
Having bought two 4TB HDDs (with another SSD on the way) for this task, and having lost around two weeks, I feel cheated.

I wonder why no one shared this information on the forum.
Re: inconsistent number of bands in 'ms' of Full data set (response to post by tcghanareddy) | Reply
Apologies for the confusion about the full dataset; we should have specified clearly that it contains both 4-band and 8-band images. As stated in the stakeholders' paper (linked from the fmow_baseline thread):
"All imagery used in fMoW was collected from the DigitalGlobe constellation. Images were gathered in pairs, consisting of 4-band or 8-band multispectral imagery in the visible to near-infrared region, as well as a pan-sharpened RGB image that represents a fusion of the high-resolution panchromatic image and the RGB bands from the lower-resolution multispectral image. 4-band imagery was obtained from either the QuickBird-2 or GeoEye-1 satellite systems, whereas 8-band imagery was obtained from WorldView-2 or WorldView-3."


> No known libraries are able to read them.
Can anyone recommend a library to be used? Such discussion is totally fine if you don't share code that you have written.
Re: inconsistent number of bands in 'ms' of Full data set (response to post by walrus71) | Reply
Hi Walrus, thanks for the response.

Regarding the 4 and 8 bands, I get the point. But a sample (i.e. a unique train box_id) contains only one kind, either 4-band or 8-band, not both. So it is not given which pairs of samples represent the same location (one could infer this with some processing, but that is highly discouraging with so much data). Even if a sample contained both, they would not be paired.

Otherwise, one can simply use the NIR band from either kind; it provides significant extra information compared to the visible bands. That's what I tried.
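
As a rough illustration of the "use the bands both kinds share" idea, here is a minimal sketch that picks blue, green, red and one NIR band from either a 4-band or an 8-band stack. The band orders in the comments are the usual product orders for these sensors and are an assumption here, not something confirmed in this thread; check them against each image's metadata. The names COMMON_BANDS and select_common_bands are hypothetical.

```python
# ASSUMED 0-based band order (verify against the per-image metadata):
#   4-band (QuickBird-2 / GeoEye-1):   blue, green, red, NIR
#   8-band (WorldView-2 / WorldView-3): coastal, blue, green, yellow,
#                                       red, red edge, NIR1, NIR2
COMMON_BANDS = {
    4: {"blue": 0, "green": 1, "red": 2, "nir": 3},
    8: {"blue": 1, "green": 2, "red": 4, "nir": 6},
}


def select_common_bands(bands):
    """Return [blue, green, red, nir] from a sequence of 4 or 8 bands.

    `bands` can be any indexable sequence, e.g. a list of 2D arrays.
    """
    try:
        index = COMMON_BANDS[len(bands)]
    except KeyError:
        raise ValueError("expected 4 or 8 bands, got %d" % len(bands))
    return [bands[index[name]] for name in ("blue", "green", "red", "nir")]
```

With this, every sample ends up with the same four channels regardless of which satellite produced it, at the cost of discarding the extra WorldView bands.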

The reading issue is only with the erroneous (badly formatted) files, not all of them. I gave up because it is possible that the test data also has corrupted files.

Anyway, it is too late to use that data; only 18 days are left. I could consider using it if the deadline is extended by 1-2 weeks, provided the test data has no errors.
Re: inconsistent number of bands in 'ms' of Full data set (response to post by tcghanareddy) | Reply
Hi, I tested a few multi-band images with:

from osgeo import gdal


and for the few I tested I didn't see any issues, but this was just manually testing it.

Can you let me know a few files that give you errors to see if I can read them?
Re: inconsistent number of bands in 'ms' of Full data set (response to post by antor) | Reply
Try this:
fMoW-full/train/airport/airport_228/airport_228_4_ms.tif
I used tifffile. It gives "warnings.warn("invalid page offset > file size")" and produces a zero-sized array.

There are other frequent cases with an "empty byte count" warning, after which the read fails. I don't remember the examples for this.
You could just try to read the airport class to check.
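
Checking a whole class directory by hand is tedious, so a small scan helps. The sketch below walks a directory and collects the files that a given opener cannot open. The function name find_unreadable and its parameters are hypothetical; the opener is passed in so that any library can be tried.

```python
import os


def find_unreadable(root, open_fn, suffix="_ms.tif"):
    """Walk `root` and return the paths ending in `suffix` that `open_fn`
    fails on. A failure is either a raised exception or a None result."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            try:
                ok = open_fn(path) is not None
            except Exception:
                ok = False
            if not ok:
                failures.append(path)
    return sorted(failures)
```

With GDAL, for instance, one could pass gdal.Open as open_fn: it returns None on failure unless exceptions have been enabled with gdal.UseExceptions(), and the sketch treats both outcomes as a failure. Note that a successful open only shows that the header was parsed; reading the pixel data may still fail.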
Re: inconsistent number of bands in 'ms' of Full data set (response to post by tcghanareddy) | Reply
Empty byte count case:
fMoW-full/train/airport/airport_137/airport_137_3_ms.tif

page header tags for this file:
strip_offsets (8, 63096, ... 246674088, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
strip_byte_counts (63088, ... 63088, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)

These trailing zeros are not expected given the image size.
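
For anyone who wants to check these tags without a TIFF library, here is a minimal sketch that reads the StripOffsets (tag 273) and StripByteCounts (tag 279) tables from the first IFD and flags entries that are zero or point past the end of the file. It assumes a classic (non-BigTIFF), strip-based file and only understands SHORT and LONG values; the function names are hypothetical. A flagged strip is only a hint: as noted further down the thread, GDAL opens this particular file without complaint.

```python
import os
import struct

STRIP_OFFSETS = 273      # tag ids from the TIFF 6.0 specification
STRIP_BYTE_COUNTS = 279
TYPE_FORMATS = {3: "H", 4: "I"}  # field types SHORT and LONG


def _read_exact(fh, size):
    data = fh.read(size)
    if len(data) < size:
        raise ValueError("file is truncated")
    return data


def read_strip_tables(path):
    """Return (offsets, byte_counts, file_size) for the first IFD."""
    file_size = os.path.getsize(path)
    with open(path, "rb") as fh:
        header = _read_exact(fh, 8)
        if header[:2] not in (b"II", b"MM"):
            raise ValueError("not a TIFF file")
        endian = "<" if header[:2] == b"II" else ">"
        magic, ifd_offset = struct.unpack(endian + "HI", header[2:8])
        if magic != 42:
            raise ValueError("not a classic TIFF (BigTIFF is not handled)")
        if ifd_offset + 2 > file_size:
            raise ValueError("IFD offset %d is past end of file" % ifd_offset)
        fh.seek(ifd_offset)
        (n_entries,) = struct.unpack(endian + "H", _read_exact(fh, 2))
        entries = _read_exact(fh, 12 * n_entries)
        tables = {}
        for i in range(n_entries):
            raw = entries[12 * i:12 * i + 12]
            tag, ftype, count = struct.unpack(endian + "HHI", raw[:8])
            if tag not in (STRIP_OFFSETS, STRIP_BYTE_COUNTS):
                continue
            if ftype not in TYPE_FORMATS:
                raise ValueError("unsupported field type %d" % ftype)
            fmt = endian + str(count) + TYPE_FORMATS[ftype]
            size = struct.calcsize(fmt)
            if size <= 4:
                data = raw[8:8 + size]  # small values are stored inline
            else:
                (value_offset,) = struct.unpack(endian + "I", raw[8:12])
                fh.seek(value_offset)
                data = _read_exact(fh, size)
            tables[tag] = struct.unpack(fmt, data)
    return (tables.get(STRIP_OFFSETS, ()),
            tables.get(STRIP_BYTE_COUNTS, ()),
            file_size)


def find_suspect_strips(path):
    """Return indices of strips with a zero offset, a zero byte count,
    or data that would extend past the end of the file."""
    offsets, counts, file_size = read_strip_tables(path)
    return [i for i, (off, cnt) in enumerate(zip(offsets, counts))
            if off == 0 or cnt == 0 or off + cnt > file_size]
```

The same header check also catches a first IFD offset that lies beyond the end of the file, which is one way a truncated download can show up.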
Re: inconsistent number of bands in 'ms' of Full data set (response to post by tcghanareddy) | Reply
There you go:

$ python tif_test.py /fMoW-full/train/airport/airport_137/airport_137_3_ms.tif
{'TIFFTAG_RESOLUTIONUNIT': '1 (unitless)', 'TIFFTAG_XRESOLUTION': '1', 'TIFFTAG_YRESOLUTION': '1'}
Band 1 has type uint16 and shape (3969, 3943)
Band 2 has type uint16 and shape (3969, 3943)
Band 3 has type uint16 and shape (3969, 3943)
Band 4 has type uint16 and shape (3969, 3943)
Band 5 has type uint16 and shape (3969, 3943)
Band 6 has type uint16 and shape (3969, 3943)
Band 7 has type uint16 and shape (3969, 3943)
Band 8 has type uint16 and shape (3969, 3943)


$ python tif_test.py /fMoW-full/train/airport/airport_228/airport_228_4_ms.tif
{'TIFFTAG_RESOLUTIONUNIT': '1 (unitless)', 'TIFFTAG_XRESOLUTION': '1', 'TIFFTAG_YRESOLUTION': '1'}
Band 1 has type uint16 and shape (3845, 3915)
Band 2 has type uint16 and shape (3845, 3915)
Band 3 has type uint16 and shape (3845, 3915)
Band 4 has type uint16 and shape (3845, 3915)
Band 5 has type uint16 and shape (3845, 3915)
Band 6 has type uint16 and shape (3845, 3915)
Band 7 has type uint16 and shape (3845, 3915)
Band 8 has type uint16 and shape (3845, 3915)


I've visualized the images (they look great), so there are no errors in those two that I can see.

I downloaded the torrent, btw.

$ sha1sum /fMoW-full/train/airport/airport_228/airport_228_4_ms.tif
4f2679e59d7b7dbc6ed6f80d9bb2e17462cc130b /fMoW-full/train/airport/airport_228/airport_228_4_ms.tif
 
$ sha1sum /fMoW-full/train/airport/airport_137/airport_137_3_ms.tif
0e5925e1da1baff7981b3a36c6ab5c962aaaeb39  /fMoW-full/train/airport/airport_137/airport_137_3_ms.tif
Re: inconsistent number of bands in 'ms' of Full data set (response to post by antor) | Reply
Does this test just read the header info, or does it also load the image array?
Yes, the sha1sum matches (thanks for this).
Maybe tifffile doesn't handle these files!
Re: inconsistent number of bands in 'ms' of Full data set (response to post by tcghanareddy) | Reply
The code is just general knowledge about loading a multi-band TIFF (see https://stackoverflow.com/questions/12278653/how-can-generate-a-raw-file-from-multi-band-tif-file), so I don't think the organizers will object to this:

from osgeo import gdal
import numpy as np
#import iterm
import sys

f = sys.argv[1]
ds = gdal.Open(f)
if ds is None:
    sys.exit('GDAL could not open %s' % f)
print(ds.GetMetadata())

def to_uint8_raster(a):
    # Min-max scale a band to 0..255 for display
    _min, _max = a.min(), a.max()
    print(a.shape, _min, _max)
    if _max == _min:
        # constant band: avoid dividing by zero
        return np.zeros(a.shape, dtype=np.uint8)
    return (255. * (a - _min) / (_max - _min)).astype(np.uint8)

# loop through each band
for bi in range(ds.RasterCount):
    band = ds.GetRasterBand(bi + 1)
    # Read this band into a 2D NumPy array
    ar = band.ReadAsArray()
    #iterm.show_image(to_uint8_raster(ar))
    print('Band %d has type %s and shape (%d, %d)' % (bi + 1, ar.dtype, ar.shape[0], ar.shape[1]))
    # raw bytes of the band (tobytes() replaces the deprecated tostring())
    raw = ar.tobytes()


The commented-out iterm.show_image(to_uint8_raster(ar)) call shows the image in the console; change it to save to a JPG or PNG at that point and you will be able to inspect the images.
Re: inconsistent number of bands in 'ms' of Full data set (response to post by antor) | Reply
Great, so tifffile has some problem (I did not test gdal, though).
It is too late anyway.
Re: inconsistent number of bands in 'ms' of Full data set (response to post by tcghanareddy) | Reply
It looks like I reported the wrong file before. Testing with gdal gives the following error:

gdal.Open('./fMoW-full/train/airport/airport_228/airport_228_4_rgb.tif')
ERROR 1: TIFFFetchDirectory:./fMoW-full/train/airport/airport_228/airport_228_4_rgb.tif: Can not read TIFF directory count
ERROR 1: TIFFReadDirectory:Failed to read directory at offset 719949068

sha1sum is 2a66dfb896a97449fca72d0fbc625e85ace63bb8

The file with the zero byte counts opens fine, though.
I installed GDAL 2.2.2.
Re: inconsistent number of bands in 'ms' of Full data set (response to post by tcghanareddy) | Reply
My computer died and I will not be able to test anything before either tomorrow (with some luck) or THU....
Re: inconsistent number of bands in 'ms' of Full data set (response to post by tcghanareddy) | Reply
I was able to open that image, but my SHA1 of 86F1A99C359B6F6481BDD182C40784D57BFBB2AC for airport_228_4_rgb.tif is different than yours.
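
Since the two checksums differ, comparing SHA-1 digests is a quick way to tell a damaged download from a library problem. A minimal sketch using only the standard library follows; the function names are hypothetical, and the comparison ignores case because the digests posted in this thread use both upper and lower case.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks so that
    large images do not have to fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_checksum(a, b):
    """Compare two hex digests, ignoring case and surrounding whitespace."""
    return a.strip().lower() == b.strip().lower()
```

For example, same_checksum applied to the two airport_228_4_rgb.tif digests posted above returns False, which suggests the two copies of the file differ.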