I’ve been exploring how the different Spark SQL DataTypes are imported from line-delimited JSON, to understand which DataTypes can be used for a semi-structured data set I’m converting to Parquet files. The data won’t all be processed at once and the schema will need to grow, so it’s imperative that the Parquet files have compatible schemas.

The only type I can’t get working yet is CalendarIntervalType.

Looking at the Spark source files literals.scala and CalendarInterval.java, I would assume that CalendarInterval.fromString is called with the value. However, I’m just getting nulls back when reading a value like ‘interval 2 days’, even though that same string returns a non-null CalendarInterval when passed directly to CalendarInterval.fromString.

Source code for the tests is at: https://github.com/dyanarose/dlr-spark
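
To make the compatibility requirement from the opening paragraph concrete, here is a minimal sketch in plain Python, not Spark code. It assumes a schema can be modelled as a dict of column name to type name, and that "compatible" means a column never changes type between batches while new columns may appear. The function name `merge_schemas` and the sample batches are hypothetical; Spark's own Parquet schema merging has its own rules.

```python
def merge_schemas(existing, incoming):
    """Return the union of two {column: type} dicts, or raise on a type clash."""
    merged = dict(existing)
    for column, type_name in incoming.items():
        known = merged.get(column)
        if known is not None and known != type_name:
            raise ValueError(
                f"column {column!r} changes type: {known} -> {type_name}"
            )
        merged[column] = type_name
    return merged


# Hypothetical batches: the second adds a column, the third changes a type.
batch_1 = {"id": "long", "created": "timestamp"}
batch_2 = {"id": "long", "price": "decimal(6,3)"}
batch_3 = {"id": "string"}

grown = merge_schemas(batch_1, batch_2)  # the schema only grows, so this succeeds
```

Merging `batch_3` into `grown` would raise a `ValueError`, which is the situation a fixed schema per column is meant to avoid.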


-------------- DecimalType --------------
------- DecimalType Input
{'decimal': 1.2345}
{'decimal': 1}
{'decimal': 234.231}
{'decimal': Infinity}
{'decimal': -Infinity}
{'decimal': NaN}
{'decimal': '1'}
{'decimal': '1.2345'}
{'decimal': null}

------- DecimalType Inferred Schema
 |-- decimal: string (nullable = true)

|    decimal|
|     1.2345|
|          1|
|    234.231|
| "Infinity"|
|      "NaN"|
|          1|
|     1.2345|
|       null|

------- DecimalType Set Schema
 |-- decimal: decimal(6,3) (nullable = true)

|decimal|
|  1.235|
|  1.000|
|   null|
|   null|
|   null|
|   null|
|   null|
|   null|
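
The 1.235 and 1.000 values above are consistent with rounding half-up to three decimal places. The following is a small sketch using Python's `decimal` module rather than Spark; the helper name `fit_decimal` is hypothetical, and returning `None` for non-finite or over-wide values is an assumption made to mirror the nulls in the output, not a description of Spark's internals.

```python
from decimal import Decimal, ROUND_HALF_UP


def fit_decimal(text, precision=6, scale=3):
    """Round `text` to `scale` fractional digits, or return None if it cannot fit."""
    value = Decimal(text)
    if not value.is_finite():
        # Assumption: Infinity and NaN have no decimal(6,3) representation.
        return None
    quantum = Decimal(1).scaleb(-scale)  # Decimal('0.001') when scale is 3
    rounded = value.quantize(quantum, rounding=ROUND_HALF_UP)
    # Total number of digits must not exceed the declared precision.
    if len(rounded.as_tuple().digits) > precision:
        return None
    return rounded


print(fit_decimal("1.2345"))      # 1.235
print(fit_decimal("1"))           # 1.000
print(fit_decimal("12345.6789"))  # None: needs 8 digits at scale 3
```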

-------------- BooleanType --------------
------- BooleanType Input
{'boolean': true}
{'boolean': false}
{'boolean': 'false'}
{'boolean': 'true'}
{'boolean': null}
{'boolean': 1}
{'boolean': 0}
{'boolean': '1'}
{'boolean': '0'}
{'boolean': 'a'}

------- BooleanType Inferred Schema
 |-- boolean: string (nullable = true)

|boolean|
|   true|
|  false|
|  false|
|   true|
|   null|
|      1|
|      0|
|      1|
|      0|
|      a|

------- BooleanType Set Schema
 |-- boolean: boolean (nullable = true)

|boolean|
|   true|
|  false|
|   null|
|   null|
|   null|
|   null|
|   null|
|   null|
|   null|
|   null|

-------------- ByteType --------------
------- ByteType Input
{'byte': 'a'}
{'byte': 'b'}
{'byte': 1}
{'byte': 0}
{'byte': 5}
{'byte': null}

------- ByteType Inferred Schema
 |-- byte: string (nullable = true)

|byte|
|   a|
|   b|
|   1|
|   0|
|   5|

------- ByteType Set Schema
 |-- byte: byte (nullable = true)

|byte|
|   1|
|   0|
|   5|

-------------- CalendarIntervalType --------------
------- CalendarIntervalType Input
{'calendarInterval': 'interval 2 days'}
{'calendarInterval': 'interval 1 week'}
{'calendarInterval': 'interval 5 years'}
{'calendarInterval': 'interval 6 months'}
{'calendarInterval': 10}
{'calendarInterval': 'interval a'}
{'calendarInterval': null}

------- CalendarIntervalType Inferred Schema
 |-- calendarInterval: string (nullable = true)

| calendarInterval|
|  interval 2 days|
|  interval 1 week|
| interval 5 years|
|interval 6 months|
|               10|
|       interval a|
|             null|

------- CalendarIntervalType Set Schema
 |-- calendarInterval: calendarinterval (nullable = true)

|calendarInterval|
|            null|
|            null|
|            null|
|            null|
|            null|
|            null|
|            null|
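
Since the set schema only yields nulls here, one possible fallback is to keep the column as a string and parse it in a later step. The sketch below is plain Python and is not a port of CalendarInterval.fromString; the function name `parse_interval`, the supported units, and the `(months, microseconds)` return shape are all assumptions chosen for illustration.

```python
MONTHS_PER_UNIT = {"year": 12, "month": 1}
MICROS_PER_UNIT = {
    "week": 7 * 86_400_000_000,
    "day": 86_400_000_000,
    "hour": 3_600_000_000,
    "minute": 60_000_000,
    "second": 1_000_000,
}


def parse_interval(text):
    """Parse 'interval <n> <unit> ...' into (months, microseconds), else None."""
    if not isinstance(text, str):
        return None
    tokens = text.strip().lower().split()
    # Expect 'interval' followed by one or more (amount, unit) pairs.
    if len(tokens) < 3 or tokens[0] != "interval" or len(tokens) % 2 == 0:
        return None
    months = micros = 0
    for amount, unit in zip(tokens[1::2], tokens[2::2]):
        unit = unit[:-1] if unit.endswith("s") else unit  # 'days' -> 'day'
        try:
            count = int(amount)
        except ValueError:
            return None
        if unit in MONTHS_PER_UNIT:
            months += count * MONTHS_PER_UNIT[unit]
        elif unit in MICROS_PER_UNIT:
            micros += count * MICROS_PER_UNIT[unit]
        else:
            return None
    return months, micros
```

Applied to the inputs above, `'interval 2 days'` gives `(0, 172800000000)` and `'interval 5 years'` gives `(60, 0)`, while `10`, `'interval a'` and `null` all give `None`.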

-------------- DateType --------------
------- DateType Input
{'date': '2016-04-24'}
{'date': '0001-01-01'}
{'date': '9999-12-31'}
{'date': '2016-04-24 12:10:01'}
{'date': 1461496201000}
{'date': null}

------- DateType Inferred Schema
 |-- date: string (nullable = true)

|               date|
|         2016-04-24|
|         0001-01-01|
|         9999-12-31|
|2016-04-24 12:10:01|
|      1461496201000|
|               null|

------- DateType Set Schema
 |-- date: date (nullable = true)

|      date|
|      null|
|      null|

-------------- DoubleType --------------
------- DoubleType Input
{'double': 1.23456}
{'double': 1}
{'double': 1.7976931348623157E308}
{'double': -1.7976931348623157E308}
{'double': Infinity}
{'double': -Infinity}
{'double': NaN}
{'double': '1'}
{'double': '1.23456'}
{'double': null}

------- DoubleType Inferred Schema
 |-- double: string (nullable = true)

|              double|
|             1.23456|
|                   1|
|          "Infinity"|
|         "-Infinity"|
|               "NaN"|
|                   1|
|             1.23456|
|                null|

------- DoubleType Set Schema
 |-- double: double (nullable = true)

|              double|
|             1.23456|
|                 1.0|
|            Infinity|
|           -Infinity|
|                 NaN|
|                null|
|                null|
|                null|
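
The bare Infinity, -Infinity and NaN tokens in the input are extensions to standard JSON, which is why they behave differently from the quoted strings. As an illustration only, Python's `json` module happens to accept the same tokens by default and can be told to reject them; Spark uses a different JSON parser, so this shows the distinction rather than Spark's behaviour. The function names are hypothetical, and the keys are double-quoted because Python's parser does not accept single quotes.

```python
import json
import math


def parse_lenient(line):
    """Parse one JSON line, accepting bare NaN / Infinity / -Infinity."""
    return json.loads(line)


def _reject_constant(token):
    raise ValueError(f"non-standard JSON token: {token}")


def parse_strict(line):
    """Parse one JSON line, raising ValueError on bare NaN / Infinity tokens."""
    return json.loads(line, parse_constant=_reject_constant)


value = parse_lenient('{"double": -Infinity}')["double"]
print(math.isinf(value) and value < 0)          # True
print(parse_lenient('{"double": "1"}'))         # {'double': '1'}  (still a string)
```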

-------------- FloatType --------------
------- FloatType Input
{'float': 1.23456}
{'float': 1}
{'float': 3.4028235E38}
{'float': -3.4028235E38}
{'float': Infinity}
{'float': -Infinity}
{'float': NaN}
{'float': '1'}
{'float': '1.23456'}
{'float': null}

------- FloatType Inferred Schema
 |-- float: string (nullable = true)

|        float|
|      1.23456|
|            1|
| 3.4028235E38|
|   "Infinity"|
|  "-Infinity"|
|        "NaN"|
|            1|
|      1.23456|
|         null|

------- FloatType Set Schema
 |-- float: float (nullable = true)

|        float|
|      1.23456|
|          1.0|
| 3.4028235E38|
|     Infinity|
|    -Infinity|
|          NaN|
|         null|
|         null|
|         null|

-------------- IntegerType --------------
------- IntegerType Input
{'integer': 1}
{'integer': 2147483647}
{'integer': -2147483648}
{'integer': 2147483648}
{'integer': '1'}
{'integer': 1.23456}
{'integer': '1.23456'}
{'integer': null}

------- IntegerType Inferred Schema
 |-- integer: string (nullable = true)

|    integer|
|          1|
| 2147483647|
| 2147483648|
|          1|
|    1.23456|
|    1.23456|
|       null|

------- IntegerType Set Schema
 |-- integer: integer (nullable = true)

|    integer|
|          1|
| 2147483647|
|       null|
|       null|
|       null|
|       null|
|       null|
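
An input such as 2147483648 lies just outside the signed 32-bit range, which would account for it not surviving as an integer. A minimal range check in plain Python, not Spark's parsing code, with a hypothetical helper name `fits`:

```python
# Bit widths of Spark SQL's signed integral types.
INTEGRAL_BITS = {"byte": 8, "short": 16, "integer": 32, "long": 64}


def fits(type_name, value):
    """Return True if `value` is inside the signed range of `type_name`."""
    bits = INTEGRAL_BITS[type_name]
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return lowest <= value <= highest


print(fits("integer", 2147483647))  # True
print(fits("integer", 2147483648))  # False
print(fits("short", 32768))         # False
```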

-------------- LongType --------------
------- LongType Input
{'long': 1}
{'long': 9223372036854775807}
{'long': -9223372036854775808}
{'long': '1'}
{'long': 1.23456}
{'long': '1.23456'}
{'long': null}

------- LongType Inferred Schema
 |-- long: string (nullable = true)

|                long|
|                   1|
| 9223372036854775807|
|                   1|
|             1.23456|
|             1.23456|
|                null|

------- LongType Set Schema
 |-- long: long (nullable = true)

|                long|
|                   1|
| 9223372036854775807|
|                null|
|                null|
|                null|
|                null|

-------------- MapType --------------
------- MapType Input
{'map': {'a_key': 'a value', 'b_key': 'b value'}}
{'map': {'key': 1, 'key1': null}}
{'map': null}

------- MapType Inferred Schema
 |-- map: struct (nullable = true)
 |    |-- a_key: string (nullable = true)
 |    |-- b_key: string (nullable = true)
 |    |-- key: long (nullable = true)
 |    |-- key1: string (nullable = true)

|                 map|
|[a value,b value,...|
|  [null,null,1,null]|
|                null|

------- MapType Set Schema
 |-- map: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

|                 map|
|Map(a_key -> a va...|
|Map(key -> 1, key...|
|                null|

-------------- NullType --------------
------- NullType Input
{'null': null}
{'null': true}
{'null': false}
{'null': 1}
{'null': 0}
{'null': '1'}
{'null': '0'}
{'null': 'a'}

------- NullType Inferred Schema
 |-- null: string (nullable = true)

| null|
| null|
| true|
|    1|
|    0|
|    1|
|    0|
|    a|

------- NullType Set Schema
 |-- null: null (nullable = true)


-------------- ShortType --------------
------- ShortType Input
{'short': 0}
{'short': 1}
{'short': 32767}
{'short': -32768}
{'short': 32768}
{'short': 1.23456}
{'short': '0'}
{'short': '1'}
{'short': '1.23456'}
{'short': null}

------- ShortType Inferred Schema
 |-- short: string (nullable = true)

|  short|
|      0|
|      1|
|  32767|
| -32768|
|  32768|
|      0|
|      1|
|   null|

------- ShortType Set Schema
 |-- short: short (nullable = true)

| short|
|     0|
|     1|
| 32767|
|  null|
|  null|
|  null|
|  null|
|  null|
|  null|

-------------- TimestampType --------------
------- TimestampType Input
{'timestamp': '2016-04-24'}
{'timestamp': '2016-04-24 12:10:01'}
{'timestamp': 1461496201000}
{'timestamp': '0001-01-01'}
{'timestamp': '9999-12-31'}
{'timestamp': null}

------- TimestampType Inferred Schema
 |-- timestamp: string (nullable = true)

|          timestamp|
|         2016-04-24|
|2016-04-24 12:10:01|
|      1461496201000|
|         0001-01-01|
|         9999-12-31|
|               null|

------- TimestampType Set Schema
 |-- timestamp: timestamp (nullable = true)

|           timestamp|
|2016-04-24 00:00:...|
|2016-04-24 12:10:...|
|2016-04-24 12:10:...|
|0001-01-01 00:00:...|
|9999-12-31 00:00:...|
|                null|
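
The numeric input 1461496201000 is an epoch value in milliseconds. A quick check in plain Python (not Spark) shows it corresponds to 11:10:01 UTC on 2016-04-24, so the 12:10 in the output above would be consistent with a session time zone of UTC+1. That time zone is an assumption, since the output does not state it, and the helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis_to_utc(millis):
    """Convert milliseconds since 1970-01-01T00:00:00Z to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


print(epoch_millis_to_utc(1461496201000).isoformat())  # 2016-04-24T11:10:01+00:00
```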