caffe.io.array_to_datum fails with some dtypes #2391

Closed
tubaybb321 opened this issue Apr 29, 2015 · 7 comments

@tubaybb321

I think there may be a problem with caffe.io.array_to_datum. It only seems to convert arrays of type uint8 or float64. I would have expected it to handle arrays of type int32 and float32.

This is what I saw:

>>> import caffe
>>> from caffe.proto import caffe_pb2 
>>> import numpy as np
>>> ar = np.array([[[1., 2., 3., 4.,],[5., 6., 7., 8.,],[11., 22., 33., 44.]]])
>>> ar.shape
      (1, 3, 4)

>>> arU8 = ar.astype(np.uint8)
>>> arI32 = ar.astype(np.int32) 
>>> arF32 = ar.astype(np.float32)
>>> arF64 = ar.astype(np.float64)

>>> a =caffe.io.array_to_datum(arU8)

>>> b = caffe.io.array_to_datum(arI32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/smgutstein/Caffe/caffe/python/caffe/io.py", line 89, in array_to_datum
    datum.float_data.extend(arr.flat)  #SG sub
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/internal/containers.py", line 128, in extend
    new_values.append(self._type_checker.CheckValue(elem))
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/internal/type_checkers.py", line 103, in CheckValue
    raise TypeError(message)
TypeError: 1 has type <type 'numpy.int32'>, but expected one of: (<type 'float'>, <type 'int'>, <type 'long'>)

>>> c = caffe.io.array_to_datum(arF32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/smgutstein/Caffe/caffe/python/caffe/io.py", line 89, in array_to_datum
    datum.float_data.extend(arr.flat)  
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/internal/containers.py", line 128, in extend
    new_values.append(self._type_checker.CheckValue(elem))
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/internal/type_checkers.py", line 103, in CheckValue
    raise TypeError(message)
TypeError: 1.0 has type <type 'numpy.float32'>, but expected one of: (<type 'float'>, <type 'int'>, <type 'long'>)

>>> d = caffe.io.array_to_datum(arF64) 
>>>

The problem, I believe, lies in the use of extend to store any non-uint8 data in the float_data field of a Datum object. For a plain Python list, extend accepts any element, but float_data is a google.protobuf.internal.containers.RepeatedScalarFieldContainer, whose extend type-checks every element, as can be seen in the error trace:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/smgutstein/Caffe/caffe/python/caffe/io.py", line 89, in array_to_datum
    datum.float_data.extend(arr.flat)  
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/internal/containers.py", line 128, in extend
    new_values.append(self._type_checker.CheckValue(elem))
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/internal/type_checkers.py", line 103, in CheckValue
    raise TypeError(message)
TypeError: 1.0 has type <type 'numpy.float32'>, but expected one of: (<type 'float'>, <type 'int'>, <type 'long'>)
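
The asymmetry between float64 and float32 is consistent with how NumPy scalar types relate to Python's built-in types. The following is a minimal sketch, assuming Python 3 and NumPy are available; `ACCEPTED` and `is_accepted` are illustrative names that mimic the check reported in the traceback, not protobuf's actual code:

```python
import numpy as np

# Types the traceback lists as acceptable for a float field.
# (Python 2 also listed `long`, which Python 3 folds into `int`.)
ACCEPTED = (float, int)


def is_accepted(value):
    """Mimic an isinstance-style check like the one in the traceback."""
    return isinstance(value, ACCEPTED)


results = {
    "float64": is_accepted(np.float64(1.0)),  # np.float64 subclasses Python float
    "float32": is_accepted(np.float32(1.0)),  # np.float32 does not
    "int32": is_accepted(np.int32(1)),        # np.int32 is not a Python int
}
print(results)  # {'float64': True, 'float32': False, 'int32': False}
```

Only the float64 scalar passes, which matches the behaviour reported above.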

The two ways I see to fix this are:

Either change protobuf's type checking so that it accepts numpy types. I believe this would involve modifying the following section of code (lines 196-208 in type_checkers.py):

# Type-checkers for all scalar CPPTYPEs.
_VALUE_CHECKERS = {
    _FieldDescriptor.CPPTYPE_INT32: Int32ValueChecker(),
    _FieldDescriptor.CPPTYPE_INT64: Int64ValueChecker(),
    _FieldDescriptor.CPPTYPE_UINT32: Uint32ValueChecker(),
    _FieldDescriptor.CPPTYPE_UINT64: Uint64ValueChecker(),
    _FieldDescriptor.CPPTYPE_DOUBLE: TypeChecker(
        float, int, long),
    _FieldDescriptor.CPPTYPE_FLOAT: TypeChecker(
        float, int, long),
    _FieldDescriptor.CPPTYPE_BOOL: TypeChecker(bool, int),
    _FieldDescriptor.CPPTYPE_STRING: TypeChecker(bytes),
    }

Or modify array_to_datum so that, when it flattens an array, the returned iterator does the type casting for protobuf:

def array_to_datum(arr, label=0):
    """Converts a 3-dimensional array to datum. If the array has dtype uint8,
    the output data will be encoded as a string. Otherwise, the output data
    will be stored in float format.
    """
    class npIterCast():                                 # +                  
      '''Class added so that iterator                   # +
         over numpy array will return                   # +
         values of float or int type,                   # +
         not np.float or np.int'''                      # +
      def __init__(self, myIter, myCast):               # +
        self.myIter = myIter                            # +
        self.myCast = myCast                            # +
      def __iter__(self):                               # +
        return self                                     # +
      def next(self):                                   # +
        return self.myCast(self.myIter.next())          # +

    if not isinstance(arr,np.ndarray):                  # +
         raise TypeError('Expecting a numpy array')     # +
    if arr.ndim != 3:
        raise ValueError('Incorrect array shape.')
    datum = caffe_pb2.Datum()
    datum.channels, datum.height, datum.width = arr.shape
    if arr.dtype == np.uint8:
        datum.data = arr.tostring()
    else:
        if np.issubdtype(arr.dtype, np.int) or \                                     # +
           np.issubdtype(arr.dtype, np.unsignedinteger):                             # +
            myCast = int                                                             # +
        elif np.issubdtype(arr.dtype, np.float):                                     # +
            myCast = float                                                           # +
        else:                                                                        # +
            raise TypeError('Expecting a numpy array of either a float or int type') # +

        castIter = npIterCast(arr.flat, myCast)  # +
        datum.float_data.extend(castIter)        # +
        datum.float_data.extend(arr.flat)        # -
    datum.label = label
    return datum
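
The cast-on-iteration idea in the patch above can be sketched more compactly with a generator. This is only an illustration, assuming Python 3 and NumPy; `native_values` is a hypothetical helper name, and it does not touch Caffe or protobuf:

```python
import numpy as np


def native_values(arr):
    """Return an iterator yielding each element as a built-in int or float."""
    if np.issubdtype(arr.dtype, np.integer):
        cast = int
    elif np.issubdtype(arr.dtype, np.floating):
        cast = float
    else:
        raise TypeError("Expecting a numpy array of either a float or int type")
    # arr.flat walks the array in row-major order, like the original patch.
    return (cast(v) for v in arr.flat)


ints = list(native_values(np.array([[[1, 2, 3]]], dtype=np.int32)))
floats = list(native_values(np.array([[[1.5, 2.5]]], dtype=np.float32)))
print(ints, floats)  # [1, 2, 3] [1.5, 2.5]
```

`arr.ravel().tolist()` is another way to obtain built-in Python scalars from a NumPy array.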

This gave me the following results, so it appears to be working fine:

>>> import caffe 
>>> from caffe.proto import caffe_pb2
>>> import numpy as np
>>> ar = np.array([[[1., 2., 3., 4.,],[5., 6., 7., 8.,],[11., 22., 33., 44.]]])
>>>
>>> arU8 = ar.astype(np.uint8)
>>> arI32 = ar.astype(np.int32)
>>> arF32 = ar.astype(np.float32)
>>> arF64 = ar.astype(np.float64)
>>>
>>> a =caffe.io.array_to_datum(arU8)
>>> b = caffe.io.array_to_datum(arI32)
>>> c = caffe.io.array_to_datum(arF32)
>>> d = caffe.io.array_to_datum(arF64)
>>>
>>> a.float_data
[]
>>> a.data
'\x01\x02\x03\x04\x05\x06\x07\x08\x0b\x16!,'
>>>
>>> b.float_data
[1, 2, 3, 4, 5, 6, 7, 8, 11, 22, 33, 44]
>>> c.float_data
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 11.0, 22.0, 33.0, 44.0]
>>> d.float_data
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 11.0, 22.0, 33.0, 44.0]
>>>

One remaining issue is that, apart from uint8, every integer type (signed or unsigned) is converted to the system's default int type (i.e. either 32 or 64 bit), and every float type is converted to the default float type. But, for now, unless there is a better solution, I think it's OK to use this. Am I correct in believing this? Is there a better approach?

Thanks,

Steven

@rohrbach added the JL label May 4, 2015
@longjon changed the title from "Possible Error & Proposed Fix for caffe.io.array_to datum" to "caffe.io.array_to_datum fails with some dtypes" May 8, 2015
@longjon added the bug and Python labels and removed the JL label May 8, 2015
@longjon
Contributor

longjon commented May 8, 2015

Thanks for reporting the issue. I've changed the title to be a little more specific. We should probably just change arr.flat to arr.astype(float).flat; you're welcome to PR that.
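
A minimal sketch of why that cast is enough, assuming Python 3 and NumPy are available (the array is made up for illustration): `astype(float)` produces float64 elements, and NumPy's float64 scalar is a subclass of Python's `float`, so a check that accepts `float` accepts them.

```python
import numpy as np

arr = np.array([[[1, 2], [3, 4]]], dtype=np.int32)  # illustrative (1, 2, 2) array

# Elements of the original int32 array are not Python floats...
before = [isinstance(v, float) for v in arr.flat]
# ...but elements of the float64 copy are.
after = [isinstance(v, float) for v in arr.astype(float).flat]

print(any(before))              # False
print(all(after))               # True
print(arr.astype(float).dtype)  # float64
```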

@tubaybb321
Author

@longjon - Thanks. I like your fix better. I suppose I got too fussy about maintaining variable type, which may not be truly important for this function.

I'd be happy to "PR" this, but I don't know what it means to "PR" something....

@seanbell

@tubaybb321 PR stands for pull request (docs).

ajschumacher added a commit to ajschumacher/caffe that referenced this issue May 19, 2016
As recommended by @longjon, this will allow `caffe.io.array_to_datum` to handle, for example, numpy.float32 arrays.

It might be worth noting that `datum.float_data` is stored as protobuf type 2, which is float32, as opposed to protobuf type 1, which is float64. It is a little unintuitive that caffe currently requires data to be passed in as float64 but then writes float32 to LMDB. To demonstrate this:

```python
datum = caffe.io.array_to_datum(np.array([[[0.9]]]))
caffe.io.datum_to_array(datum)
# array([[[ 0.9]]])
datum_str = datum.SerializeToString()
new_datum = caffe.proto.caffe_pb2.Datum()
new_datum.ParseFromString(datum_str)
caffe.io.datum_to_array(new_datum)
# array([[[ 0.89999998]]])
```

This behavior is somewhat hidden because `datum_to_array` returns type float64, even though the data doesn't actually have that resolution if it has been stored as protobuf text anywhere (for example in LMDB).

Alternative solutions:
 * Require and return float32, consistent with the protobuf representation.
 * Change the protobuf to allow float32 or float64 and update surrounding code to support this.
@ajschumacher
Contributor

Hi @longjon, @seanbell, @tubaybb321! Pull request #4182 contains the suggested fix, with a little additional discussion in a commit message there as well. I'll copy in that message for those finding this thread:

As recommended by @longjon, [PR #4182] will allow caffe.io.array_to_datum to handle, for example, numpy.float32 arrays.

It might be worth noting that datum.float_data is stored as protobuf type 2, which is float32, as opposed to protobuf type 1, which is float64. It is a little unintuitive that caffe currently requires data to be passed in as float64 but then writes float32 to LMDB. To demonstrate this:

datum = caffe.io.array_to_datum(np.array([[[0.9]]]))
caffe.io.datum_to_array(datum)
# array([[[ 0.9]]])
datum_str = datum.SerializeToString()
new_datum = caffe.proto.caffe_pb2.Datum()
new_datum.ParseFromString(datum_str)
caffe.io.datum_to_array(new_datum)
# array([[[ 0.89999998]]])

This behavior is somewhat hidden because datum_to_array returns type float64, even though the data doesn't actually have that resolution if it has been stored as protobuf text anywhere (for example in LMDB).

Alternative solutions:

  • Require and return float32, consistent with the protobuf representation.
  • Change the protobuf to allow float32 or float64 and update surrounding code to support this.
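
The precision loss described above can be reproduced without Caffe. The following is a minimal sketch using only the standard library, on the assumption that the stored value is a 4-byte IEEE 754 float; `through_float32` is an illustrative helper name:

```python
import struct


def through_float32(value):
    """Round-trip a Python float (a 64-bit double) through a 4-byte float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


stored = through_float32(0.9)
print(stored)         # 0.8999999761581421
print(stored == 0.9)  # False

# Integers above 2**24 can also change, which matters for int32 input.
print(through_float32(16777217.0))  # 16777216.0
```

Values that float32 represents exactly, such as 0.5 or small integers, survive the round trip unchanged.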

@Coderx7
Contributor

Coderx7 commented Oct 16, 2016

Is this fixed in newer builds? I'm facing the same issue: I'm stuck with float64 and I run out of memory because of it (I can't save the normalized data into a LevelDB, and I have 32 GB of RAM, by the way!)

@shelhamer
Member

Fixed in #2391

shelhamer added a commit that referenced this issue Apr 13, 2017
convert non-uint8 dtypes to float; refs #2391
stingshen pushed a commit to stingshen/caffe-faster-rcnn that referenced this issue Jun 7, 2017
acmiyaguchi pushed a commit to acmiyaguchi/caffe that referenced this issue Nov 13, 2017
volgy pushed a commit to Fazecast/caffe that referenced this issue Jan 17, 2018
gauenk pushed a commit to PurdueCAM2Project/caffe that referenced this issue Feb 7, 2018
oscarriddle pushed a commit to oscarriddle/caffe that referenced this issue Mar 18, 2018
@sara-eb

sara-eb commented Jun 1, 2018

@longjon Thanks, your solution worked for me; it is very simple and helpful. I just cast the array to float before passing it to caffe.io.array_to_datum:

arr=arr.astype(float)
arr_dat=caffe.io.array_to_datum(arr)
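
If memory is a concern, as mentioned earlier in the thread, one option is to apply this cast one sample at a time instead of converting the whole dataset to float64 (8 bytes per element instead of 4 for float32). This is a rough sketch, assuming Python 3 and NumPy; `iter_float64_samples` and the dataset are hypothetical, and the Caffe call is only indicated in a comment:

```python
import numpy as np


def iter_float64_samples(dataset):
    """Yield a float64 copy of each sample, one at a time."""
    for sample in dataset:
        # Only this one sample is copied; the dataset keeps its dtype.
        yield sample.astype(float)


# Hypothetical dataset: 4 samples of shape (channels, height, width).
dataset = np.ones((4, 3, 2, 2), dtype=np.float32)

dtypes = []
for sample in iter_float64_samples(dataset):
    # In real use, caffe.io.array_to_datum(sample) would be called here.
    dtypes.append(sample.dtype)

print(dtypes[0], dataset.dtype)  # float64 float32
```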
