xr.Dataset.from_dataframe / xr.DataArray.from_series does not preserve DatetimeIndex with timezone #3291

Description

@fjanoos

Problem Description

When Dataset.from_dataframe (or DataArray.from_series) converts a pandas DataFrame whose DatetimeIndex has a timezone, xarray turns the index into an object-dtype array of integer nanoseconds instead of keeping it as a datetime index.

MCVE Code Sample

print(df.index)
DatetimeIndex(['2000-01-03 16:00:00-05:00', '2000-01-03 16:00:00-05:00',
               '2000-01-03 16:00:00-05:00', '2000-01-03 16:00:00-05:00',
               ...
               '2019-08-20 16:00:00-05:00', '2019-08-20 16:00:00-05:00'],
              dtype='datetime64[ns, EST]', name='time', length=12713014, freq=None)
ds = xr.Dataset.from_dataframe(df.head(1000))
print(ds['time'])
<xarray.DataArray 'time' (time: 7)>
array([946933200000000000, 947019600000000000, 947106000000000000,
       947192400000000000, 947278800000000000, 947538000000000000,
       947624400000000000, ...], dtype=object)
Coordinates:
  * time     (time) object 946933200000000000 ... 947624400000000000
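
The sample above depends on an existing df that is not shown. A self-contained sketch of the same conversion might look like the following. The synthetic index, the "America/New_York" zone (standing in for the EST zone in the report), and the "value" column are assumptions made for illustration, and the printed dtype depends on the installed xarray and pandas versions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Seven business-day timestamps at 16:00 local time, timezone-aware.
# The zone name and the "value" column are assumptions for illustration.
index = pd.date_range(
    "2000-01-03 16:00", periods=7, freq="B", tz="America/New_York", name="time"
)
df = pd.DataFrame({"value": np.arange(7.0)}, index=index)

ds = xr.Dataset.from_dataframe(df)

# On an affected xarray version the coordinate is not datetime64;
# print both dtypes to see what the installed versions produce.
print(df.index.dtype)
print(ds["time"].dtype)
```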

Expected Output

After removing the timezone from the DataFrame's DatetimeIndex, the conversion to a Dataset preserves the time index as datetime64[ns] instead of converting it to integer nanoseconds:

df.index = df.index.tz_convert('UTC').tz_localize(None)
ds = xr.Dataset.from_dataframe(df.head(1000))
print(ds['time'])
<xarray.DataArray 'time' (time: 7)>
array(['2000-01-03T21:00:00.000000000', '2000-01-04T21:00:00.000000000',
       '2000-01-05T21:00:00.000000000', '2000-01-06T21:00:00.000000000',
       '2000-01-07T21:00:00.000000000', '2000-01-10T21:00:00.000000000',
       '2000-01-11T21:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2000-01-03T21:00:00 ... 2000-01-11T21:00:00
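
The workaround above can be wrapped in a small helper so the original zone is not lost. This is only a sketch under stated assumptions: the function name to_naive_utc_dataset and the "source_timezone" attribute are hypothetical and not part of xarray, and the sample data is synthetic.

```python
import numpy as np
import pandas as pd
import xarray as xr


def to_naive_utc_dataset(frame: pd.DataFrame) -> xr.Dataset:
    """Hypothetical helper: drop the timezone (as UTC) before conversion."""
    out = frame.copy()
    tz = out.index.tz
    if tz is not None:
        # Same steps as the workaround: convert to UTC, then strip the tz.
        out.index = out.index.tz_convert("UTC").tz_localize(None)
    result = xr.Dataset.from_dataframe(out)
    if tz is not None:
        # Record the original zone name so it can be restored later.
        result.attrs["source_timezone"] = str(tz)
    return result


# Synthetic, timezone-aware sample data (an assumption for illustration).
idx_tz = pd.date_range(
    "2000-01-03 16:00", periods=7, freq="B", tz="America/New_York", name="time"
)
df_tz = pd.DataFrame({"value": np.arange(7.0)}, index=idx_tz)

ds_utc = to_naive_utc_dataset(df_tz)
print(ds_utc["time"].dtype)
print(ds_utc.attrs["source_timezone"])
```

Storing the zone name as a Dataset attribute is one possible design choice; the timestamps themselves stay in UTC, so converting back is a tz_localize("UTC") followed by tz_convert on the pandas side.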

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.9.0-9-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None

xarray: 0.12.3+81.g41fecd86
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.2.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 1.1.4
distributed: 1.26.0
matplotlib: 3.0.3
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 40.8.0
pip: 19.0.3
conda: 4.7.11
pytest: 4.3.1
IPython: 7.4.0
sphinx: 1.8.5
