Incompatible changes and deprecated APIs in 0.10.0
Gerrit #3737 The Java client has been repackaged under org.apache.kudu instead of org.kududb. Import statements for Kudu classes must be modified in order to compile against 0.10.0.
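In practice only the package prefix changes; for example (a minimal sketch, shown in Scala, for the synchronous client class):

    // Before 0.10.0:
    // import org.kududb.client.KuduClient
    // From 0.10.0 onward:
    import org.apache.kudu.client.KuduClient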
Gerrit #3055 The Java client’s synchronous API methods now throw KuduException instead of Exception. Existing code that catches Exception should still compile, but code that introspects an exception’s message may be affected. This change was made so that thrown exceptions can be inspected more easily by calling KuduException.getStatus and then one of Status’s methods. For example, an operation that tries to delete a table that doesn’t exist now produces a Status for which isNotFound() returns true.
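A minimal sketch of the new error-handling pattern (Scala against the Java client; the master address and table name are placeholders):

    import org.apache.kudu.client.{KuduClient, KuduException}

    val client = new KuduClient.KuduClientBuilder("kudu-master:7051").build()
    try {
      // Deleting a table that does not exist now surfaces a typed KuduException.
      client.deleteTable("no_such_table")
    } catch {
      case e: KuduException if e.getStatus.isNotFound =>
        // The status carries a structured error code; no message parsing is needed.
        println(s"Table was already gone: ${e.getStatus}")
    } finally {
      client.shutdown()
    }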
The Java client’s KuduTable.getTabletsLocations family of methods is now deprecated. Additionally, these methods now take an exclusive end partition key instead of an inclusive key. Applications are encouraged to use the scan tokens API instead.
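As a sketch of the recommended replacement, scan tokens can be built centrally, serialized, and rehydrated into scanners wherever the work runs (Scala; the master address and table name are placeholders):

    import scala.collection.JavaConverters._
    import org.apache.kudu.client.{KuduClient, KuduScanToken}

    val client = new KuduClient.KuduClientBuilder("kudu-master:7051").build()
    val table = client.openTable("my_table")

    // One token per scan unit; each token can be shipped to a remote worker as bytes.
    val tokens = client.newScanTokenBuilder(table).build().asScala
    for (token <- tokens) {
      val scanner = KuduScanToken.deserializeIntoScanner(token.serialize(), client)
      while (scanner.hasMoreRows) {
        val rows = scanner.nextRows()
        // ... process rows ...
      }
    }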
The C++ API for specifying split points on range-partitioned tables has been improved to make it easier for callers to properly manage the ownership of the provided rows.
The TableCreator::split_rows API took a vector, which made it very difficult for the calling application to do proper error handling with cleanup when setting the fields of the KuduPartialRow. This API has now been deprecated and replaced by a new method, TableCreator::add_range_split, which allows easier use of smart pointers for safe memory management.
The Java client’s internal buffering has been reworked. Previously, the number of buffered write operations was constrained on a per-tablet-server basis. Now, the configured maximum buffer size limits the total number of buffered operations across all tablet servers.
This change can negatively affect the write performance of Java clients which rely on buffered writes. Consider using the setMutationBufferSpace API to increase a session’s maximum buffer size if write performance seems degraded after upgrading.
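A minimal sketch of raising the per-session limit (Scala; the buffer size shown is an arbitrary example, not a recommendation):

    import org.apache.kudu.client.{KuduClient, SessionConfiguration}

    val client = new KuduClient.KuduClientBuilder("kudu-master:7051").build()
    val session = client.newSession()
    session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND)
    // The limit now applies across all tablet servers, so size it for the whole session.
    session.setMutationBufferSpace(10000)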
The "remote bootstrap" process used to copy a tablet replica from one host to another has been renamed to "Tablet Copy". This resulted in the renaming of several RPC metrics. Any users previously explicitly fetching or monitoring metrics>
The SparkSQL datasource for Kudu no longer supports the Overwrite mode. Users should use the new KuduContext.upsertRows method instead. Additionally, inserts using the datasource are now upserts by default. The older behavior can be restored by setting the operation parameter to insert.
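A workload that previously relied on Overwrite mode might instead upsert through the context, roughly like this (Scala; the master address, table name, and DataFrame are placeholders):

    import org.apache.spark.sql.DataFrame
    import org.apache.kudu.spark.kudu.KuduContext

    def writeToKudu(df: DataFrame): Unit = {
      val kuduContext = new KuduContext("kudu-master:7051")
      // Existing rows (matched by primary key) are updated; new rows are inserted.
      kuduContext.upsertRows(df, "my_table")
    }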
New features
Users may now manually manage the partitioning of a range-partitioned table. When a table is created, the user may specify a set of range partitions that do not cover the entire available key space. Range partitions may also be added to or dropped from existing tables (see the sketch below).
This feature can be particularly helpful with time series workloads in which new partitions can be created on an hourly or daily basis. Old partitions may be efficiently dropped if the application does not need to retain historical data past a certain point.
This feature is considered experimental for the 0.10 release.
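A sketch of how this might look through the Java client API from Scala (the schema, table name, and time ranges are illustrative placeholders):

    import scala.collection.JavaConverters._
    import org.apache.kudu.{Schema, Type}
    import org.apache.kudu.ColumnSchema.ColumnSchemaBuilder
    import org.apache.kudu.client.{AlterTableOptions, CreateTableOptions, KuduClient}

    val client = new KuduClient.KuduClientBuilder("kudu-master:7051").build()

    // A single int64 key column "ts", range-partitioned by time.
    val schema = new Schema(List(
      new ColumnSchemaBuilder("ts", Type.INT64).key(true).build()
    ).asJava)

    // Create the table covering only the range [0, 100); the rest of the
    // key space is intentionally left without a partition.
    val create = new CreateTableOptions().setRangePartitionColumns(List("ts").asJava)
    val lower = schema.newPartialRow(); lower.addLong("ts", 0L)
    val upper = schema.newPartialRow(); upper.addLong("ts", 100L)
    create.addRangePartition(lower, upper)
    client.createTable("metrics", schema, create)

    // Later: add a partition for the next time range and drop the oldest one.
    val nextLower = schema.newPartialRow(); nextLower.addLong("ts", 100L)
    val nextUpper = schema.newPartialRow(); nextUpper.addLong("ts", 200L)
    client.alterTable("metrics", new AlterTableOptions().addRangePartition(nextLower, nextUpper))

    val oldLower = schema.newPartialRow(); oldLower.addLong("ts", 0L)
    val oldUpper = schema.newPartialRow(); oldUpper.addLong("ts", 100L)
    client.alterTable("metrics", new AlterTableOptions().dropRangePartition(oldLower, oldUpper))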
Support for running Kudu clusters with multiple masters has been stabilized. Users may start a cluster with three or five masters to provide fault tolerance despite a failure of one or two masters, respectively.
Note that certain tools (e.g. ksck) still lack complete support for multiple masters. These deficiencies will be addressed in a following release.
Kudu now supports the ability to reserve a certain amount of free disk space in each of its configured data directories. If a directory’s free disk space drops to less than the configured minimum, Kudu will stop writing to that directory until space becomes available. If no space is available in any configured directory, Kudu will abort.
This feature may be configured using the fs_data_dirs_reserved_bytes and fs_wal_dir_reserved_bytes flags.
The Spark integration’s KuduContext now supports four new methods for writing to Kudu tables: insertRows, upsertRows, updateRows, and deleteRows. These are now the preferred way to write to Kudu tables from Spark.
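A rough sketch of these methods in use (Scala; the master address, table name, and DataFrames are placeholders):

    import org.apache.spark.sql.DataFrame
    import org.apache.kudu.spark.kudu.KuduContext

    def syncWithKudu(newRows: DataFrame, changedRows: DataFrame, deletedKeys: DataFrame): Unit = {
      val kuduContext = new KuduContext("kudu-master:7051")
      kuduContext.insertRows(newRows, "my_table")     // plain inserts; duplicates fail
      kuduContext.updateRows(changedRows, "my_table") // updates to existing rows
      kuduContext.deleteRows(deletedKeys, "my_table") // deletes rows matched by primary key
    }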