Pandas integer handling

Python 3.x has no limit on integer size. You can use integers as big as you like, as long as they fit into your machine's memory. Pandas, however, uses the int64 dtype by default, which does have limits.
I ran some experiments (in pandas 0.20.3) and here's what I found:
  • the default pandas integer dtype is int64
  • maximum int64 value is 2**63-1
  • creating a series with 2**63 results in dtype uint64
  • creating a series with 2**63 AND with a negative value results in dtype object
  • creating a series with 2**64 results in OverflowError
  • creating a series with a negative value and 2**64 afterwards results in dtype object
  • creating a series with 2**64 and a negative value afterwards results in OverflowError
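The limits behind these findings can be sketched in plain Python. This is only an illustration of the int64 and uint64 ranges, not pandas' actual inference logic: the helper name `narrowest_dtype` is made up for this example, and unlike pandas 0.20.3 it never raises OverflowError and does not depend on the order of the values.

```python
# Range limits of the two 64-bit integer dtypes.
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
UINT64_MAX = 2**64 - 1


def narrowest_dtype(values):
    """Name the narrowest of int64 / uint64 / object that can hold every value.

    Hypothetical helper for illustration only; it is not part of pandas.
    """
    if all(INT64_MIN <= v <= INT64_MAX for v in values):
        return "int64"
    if all(0 <= v <= UINT64_MAX for v in values):
        return "uint64"
    # Anything else needs arbitrary-precision Python ints.
    return "object"


for sample in ([0, 1], [0, 2**63], [0, 2**63, -1], [0, 2**64]):
    print(sample, "->", narrowest_dtype(sample))
```

Under these assumptions, a value of 2**64 always falls through to object, which matches only the "negative value first" case from the experiments.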
Here's some code to test (I'll reformat it one day):

In[2]: import pandas as pd
In[3]: pd.Series([0, 1])
Out[3]: 
0    0
1    1
dtype: int64

In[4]: pd.Series([0, 2**63-1])
Out[4]: 
0                      0
1    9223372036854775807
dtype: int64

In[5]: pd.Series([0, 2**63-1, 2**63])
Out[5]: 
0                      0
1    9223372036854775807
2    9223372036854775808
dtype: uint64

In[6]: pd.Series([0, 2**63-1, 2**63, -1])
Out[6]: 
0                      0
1    9223372036854775807
2    9223372036854775808
3                     -1
dtype: object

In[7]: pd.Series([0, 2**63-1, 2**63, 2**64])
...
OverflowError: Python int too large to convert to C unsigned long

In[8]: pd.Series([0, 2**63-1, 2**63, 2**64, -1])
...
OverflowError: Python int too large to convert to C unsigned long

In[9]: pd.Series([0, 2**63-1, 2**63, -1, 2**64])
Out[9]: 
0                       0
1     9223372036854775807
2     9223372036854775808
3                      -1
4    18446744073709551616
dtype: object
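The same cases can be run as a single script, which makes it easier to check another pandas version. This is a rough sketch: the names `cases` and `describe` are made up here, and since the results above come from pandas 0.20.3, other versions may well print something different.

```python
import pandas as pd

# The seven inputs from the session above, in the same order.
cases = [
    [0, 1],
    [0, 2**63 - 1],
    [0, 2**63 - 1, 2**63],
    [0, 2**63 - 1, 2**63, -1],
    [0, 2**63 - 1, 2**63, 2**64],
    [0, 2**63 - 1, 2**63, 2**64, -1],
    [0, 2**63 - 1, 2**63, -1, 2**64],
]


def describe(values):
    """Return the inferred dtype name, or the error text if construction fails."""
    try:
        return str(pd.Series(values).dtype)
    except OverflowError as exc:
        return "OverflowError: {}".format(exc)


print("pandas", pd.__version__)
results = [(values, describe(values)) for values in cases]
for values, outcome in results:
    print(values, "->", outcome)
```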
