[SPARK-56221][SQL][PYTHON] Feature parity between spark.catalog.* vs DDL commands#55025

Closed
HyukjinKwon wants to merge 1 commit into apache:master from HyukjinKwon:SPARK-56221

Conversation

Member

@HyukjinKwon HyukjinKwon commented Mar 26, 2026

What changes were proposed in this pull request?

SQL

  • SHOW CACHED TABLES: lists relations cached with an explicit name (CACHE TABLE, catalog.cacheTable, etc.); unnamed Dataset.cache() entries are not listed.

Catalog API (Scala / Java / PySpark)

  • listCachedTables(): same information as SHOW CACHED TABLES (CachedTable: name, storage level).
  • dropTable / dropView: drop persistent table or view (with ifExists, purge where applicable).
  • createDatabase / dropDatabase: create or drop a namespace (with options: ifNotExists / ifExists, cascade, properties map).
  • listPartitions: partition strings for a table (aligned with SHOW PARTITIONS).
  • listViews: list views in the current or given namespace; optional name pattern.
  • getTableProperties: all table properties (aligned with SHOW TBLPROPERTIES).
  • getCreateTableString: DDL from SHOW CREATE TABLE (optional asSerde).
  • truncateTable: remove all table data (not for views).
  • analyzeTable: ANALYZE TABLE ... COMPUTE STATISTICS (optional noScan).
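
The new catalog methods above mirror existing DDL commands; a sketch of the SQL counterparts (`demo_db` and `sales` are placeholder names used only for illustration):

```sql
-- Counterparts of createDatabase / dropDatabase
CREATE DATABASE IF NOT EXISTS demo_db;
DROP DATABASE IF EXISTS demo_db CASCADE;

-- Counterpart of listPartitions
SHOW PARTITIONS demo_db.sales;

-- Counterpart of getTableProperties
SHOW TBLPROPERTIES demo_db.sales;

-- Counterpart of getCreateTableString (optionally AS SERDE)
SHOW CREATE TABLE demo_db.sales;

-- Counterpart of analyzeTable (optionally NOSCAN)
ANALYZE TABLE demo_db.sales COMPUTE STATISTICS NOSCAN;

-- Counterpart of truncateTable
TRUNCATE TABLE demo_db.sales;
```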

Why are the changes needed?

Gives stable programmatic ways to do what users already do with SQL (SHOW CACHED TABLES, SHOW PARTITIONS, etc.), without routing everything through raw SQL.

Does this PR introduce any user-facing change?

Yes. A new SQL command and new Catalog APIs.

How was this patch tested?

Unit tests were added.

Was this patch authored or co-authored using generative AI tooling?

No.

```scala
// [SQL] SafeJsonSerializer.safeMapToJValue: second parameter widened from Function1 to
// Function2 so the key is passed to the value serializer (progress.scala). Binary-incompatible
// vs spark-sql-api 4.0.0; not part of the public supported API (private[streaming] package).
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.streaming.SafeJsonSerializer.safeMapToJValue"),
```
Member Author


This is from 72fc87b

Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM.

@HyukjinKwon force-pushed the SPARK-56221 branch 3 times, most recently from 6e0d342 to 45ceac2 on March 26, 2026 06:00
@HyukjinKwon force-pushed the SPARK-56221 branch 4 times, most recently from f8e7eec to d04360a on March 26, 2026 11:38
Comment on lines +41 to +44

```
| tableName|     storageLevel|
+----------+-----------------+
|  my_table|  MEMORY_AND_DISK|
+----------+-----------------+
```
Contributor


Let's expose the fully qualified name here.
Ideally as three columns.

Contributor


Actually.. How do you refer to this table? How would it qualify? Is it like a temp table? What's the namespace...?

Member Author

@HyukjinKwon HyukjinKwon Mar 26, 2026


Yeah. SHOW CACHED TABLES shows the string stored when the cache was registered, not a normalized three-part name. CACHE TABLE my_table AS SELECT ... creates a session-local temp view named my_table and caches under that name.

So you refer to it like any other temp view in the session (my_table). There is no separate metastore catalog.database.table for that entry; only that registration string appears in SHOW CACHED TABLES (e.g. my_table).

CACHE TABLE spark_catalog.default.t (or CACHE TABLE on an existing table) uses the resolved multipart identifier, serialized as a single string (multipartIdentifier.quoted in the executor), so you typically see something like spark_catalog.default.t.

```sql
CREATE OR REPLACE TEMPORARY VIEW src AS SELECT 1 AS id;
CACHE TABLE my_table AS SELECT * FROM src;
CREATE TABLE default.demo_cached (id INT) USING parquet;
INSERT INTO demo_cached VALUES (1), (2);
CACHE TABLE spark_catalog.default.demo_cached;
SHOW CACHED TABLES;
```

```
+---------------------------------+--------------------------------------+
|tableName                        |storageLevel                          |
+---------------------------------+--------------------------------------+
|spark_catalog.default.demo_cached|Disk Memory Deserialized 1x Replicated|
|my_table                         |Disk Memory Deserialized 1x Replicated|
+---------------------------------+--------------------------------------+
```


@HyukjinKwon
Member Author

Merged to master.

### Examples

```sql
CACHE TABLE my_table AS SELECT * FROM src;
```
Contributor


This seems like a new feature instead of feature parity, but anyway: can we add the more common example CACHE TABLE my_catalog.my_schema.my_table? CACHE TABLE my_table AS SELECT * FROM src is a shortcut for creating a temp view and then caching it. So generally speaking, the SQL CACHE TABLE command can cache permanent tables and temp views (how about permanent views? I don't know, let's verify as well).
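
A sketch of the qualified form the reviewer asks about (my_catalog, my_schema, and my_table are placeholder names, not from this PR):

```sql
-- Cache an existing permanent table by its fully qualified name.
CACHE TABLE my_catalog.my_schema.my_table;

-- A storage level can also be specified when caching.
CACHE TABLE my_catalog.my_schema.my_table OPTIONS ('storageLevel' 'MEMORY_ONLY');

-- Release the cache entry when done.
UNCACHE TABLE my_catalog.my_schema.my_table;
```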

```scala
 * name of the partitioned table; may be qualified with catalog and database (namespace).
 * @since 4.2.0
 */
def listPartitions(tableName: String): Dataset[CatalogTablePartition] = {
```
Contributor


CatalogTablePartition is an internal API. We shouldn't return it.

See listTables, it returns a dedicated Table class added for this API, instead of the internal CatalogTable.

```scala
 * @since 4.2.0
 */
@Stable
class CachedTable(val name: String, val storageLevel: String) extends DefinedByConstructorParams {
```
Contributor


shall we follow Table and add namespace and catalog fields?

cloud-fan added a commit that referenced this pull request Apr 8, 2026
… SHOW CACHED TABLES / listCachedTables

### What changes were proposed in this pull request?

Follow-up to #55025 addressing post-merge review comments:

**1. Rename `CatalogTablePartition` → `TablePartition`**

The public API class `org.apache.spark.sql.catalog.CatalogTablePartition` has the same name as the internal `org.apache.spark.sql.catalyst.catalog.CatalogTablePartition`, causing confusion and potential import ambiguity.

**2. Remove `SHOW CACHED TABLES` SQL command and `listCachedTables()` catalog API**

No other database has a `SHOW CACHED TABLES` command, and the programmatic `listCachedTables()` API was designed to complement it. Both are removed, along with the `CachedTable` class, connect proto `ListCachedTables`, and all related infrastructure.

For SQL users who want to check cache status, a better approach would be to add an `isCached` column to the existing `SHOW TABLES` output, which achieves feature parity with the Scala/Python `isCached()` API.
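
If that route were taken, the output might look like this (a hypothetical sketch only: the isCached column does not exist yet, and the table names are placeholders; the other three columns are what SHOW TABLES prints today):

```sql
SHOW TABLES;
```

```
+---------+-----------+-----------+--------+
|namespace|tableName  |isTemporary|isCached|
+---------+-----------+-----------+--------+
|default  |demo_cached|false      |true    |
|         |my_table   |true       |true    |
+---------+-----------+-----------+--------+
```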

### Why are the changes needed?

- `CatalogTablePartition` name clashes with an existing internal class.
- `SHOW CACHED TABLES` is a non-standard SQL command with no precedent in other databases; the programmatic API that complemented it is also unnecessary.

### Does this PR introduce _any_ user-facing change?

Yes, within the unreleased master branch (4.2.0):
- `CatalogTablePartition` is renamed to `TablePartition` (both Scala and Python).
- `SHOW CACHED TABLES` SQL command is removed.
- `spark.catalog.listCachedTables()` and the `CachedTable` class are removed.

### How was this patch tested?

Removed tests for `SHOW CACHED TABLES`, `listCachedTables`, and the parser. Existing tests for `TablePartition` / `listPartitions` remain.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-sonnet-4-6, claude-opus-4-6)

Closes #55139 from cloud-fan/followup.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>