
Commit 57f3d46

Feature parity

1 parent 9bbc7b7

30 files changed

Lines changed: 2081 additions & 65 deletions

docs/sql-performance-tuning.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -30,6 +30,8 @@ Spark SQL can cache tables using an in-memory columnar format by calling `spark.
 Then Spark SQL will scan only required columns and will automatically tune compression to minimize
 memory usage and GC pressure. You can call `spark.catalog.uncacheTable("tableName")` or `dataFrame.unpersist()` to remove the table from memory.
 
+To list relations cached with an explicit name, use `SHOW CACHED TABLES` in SQL or `spark.catalog.listCachedTables()`. Entries cached only via `Dataset.cache()` without a name are not included.
+
 Configuration of in-memory caching can be done via `spark.conf.set` or by running
 `SET key=value` commands using SQL.
```
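For context, a minimal PySpark sketch of the behavior this paragraph documents, assuming the `spark.catalog.listCachedTables()` API introduced later in this commit; the `numbers` view name is illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.range(100).createOrReplaceTempView("numbers")

# Cache under an explicit catalog name so the entry is visible to
# SHOW CACHED TABLES / listCachedTables().
spark.sql("CACHE TABLE numbers")

# Each entry carries the name used when caching and a storage-level string.
for entry in spark.catalog.listCachedTables():
    print(entry.name, entry.storageLevel)

# Release the memory when done.
spark.catalog.uncacheTable("numbers")
```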

docs/sql-ref-syntax-aux-cache-cache-table.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -79,6 +79,7 @@ CACHE TABLE testCache OPTIONS ('storageLevel' 'DISK_ONLY') SELECT * FROM testDat
 
 ### Related Statements
 
+* [SHOW CACHED TABLES](sql-ref-syntax-aux-show-cached-tables.html)
 * [CLEAR CACHE](sql-ref-syntax-aux-cache-clear-cache.html)
 * [UNCACHE TABLE](sql-ref-syntax-aux-cache-uncache-table.html)
 * [REFRESH TABLE](sql-ref-syntax-aux-cache-refresh-table.html)
```

docs/sql-ref-syntax-aux-cache-uncache-table.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -49,6 +49,7 @@ UNCACHE TABLE t1;
 ### Related Statements
 
 * [CACHE TABLE](sql-ref-syntax-aux-cache-cache-table.html)
+* [SHOW CACHED TABLES](sql-ref-syntax-aux-show-cached-tables.html)
 * [CLEAR CACHE](sql-ref-syntax-aux-cache-clear-cache.html)
 * [REFRESH TABLE](sql-ref-syntax-aux-cache-refresh-table.html)
 * [REFRESH](sql-ref-syntax-aux-cache-refresh.html)
```
docs/sql-ref-syntax-aux-show-cached-tables.md (new file)

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@

---
layout: global
title: SHOW CACHED TABLES
displayTitle: SHOW CACHED TABLES
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---

### Description

The `SHOW CACHED TABLES` statement returns every in-memory cache entry that was registered with an explicit table or view name, for example via [`CACHE TABLE`](sql-ref-syntax-aux-cache-cache-table.html) or `spark.catalog.cacheTable`. The result has two columns: `tableName` (the name used when caching) and `storageLevel` (a string description of how the data is cached).

Relations cached only through `Dataset.cache()` / `DataFrame.cache()` without assigning a catalog name are **not** listed.

### Syntax

```sql
SHOW CACHED TABLES
```

### Examples

```sql
CACHE TABLE my_table AS SELECT * FROM src;

SHOW CACHED TABLES;
+----------+------------------+
| tableName|      storageLevel|
+----------+------------------+
|  my_table|   MEMORY_AND_DISK|
+----------+------------------+
```

### Related Statements

* [CACHE TABLE](sql-ref-syntax-aux-cache-cache-table.html)
* [UNCACHE TABLE](sql-ref-syntax-aux-cache-uncache-table.html)
* [CLEAR CACHE](sql-ref-syntax-aux-cache-clear-cache.html)
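As a cross-check of the "not listed" caveat in the description above, a hedged PySpark sketch; it assumes a live `spark` session and the `src` relation from the SQL example:

```python
# An anonymous DataFrame cache: no catalog name is registered.
df = spark.range(10).cache()
df.count()  # materialize the cache

# A named cache entry, as in the SQL example above.
spark.sql("CACHE TABLE my_table AS SELECT * FROM src")

# Only `my_table` is reported; the anonymous cache is skipped.
spark.sql("SHOW CACHED TABLES").show()
```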

docs/sql-ref-syntax-aux-show.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -19,6 +19,7 @@ license: |
 limitations under the License.
 ---
 
+* [SHOW CACHED TABLES](sql-ref-syntax-aux-show-cached-tables.html)
 * [SHOW COLUMNS](sql-ref-syntax-aux-show-columns.html)
 * [SHOW CREATE TABLE](sql-ref-syntax-aux-show-create-table.html)
 * [SHOW DATABASES](sql-ref-syntax-aux-show-databases.html)
```

docs/sql-ref-syntax.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -124,6 +124,7 @@ You use SQL scripting to execute procedural logic in SQL.
 * [RESET](sql-ref-syntax-aux-conf-mgmt-reset.html)
 * [SET](sql-ref-syntax-aux-conf-mgmt-set.html)
 * [SET VAR](sql-ref-syntax-aux-set-var.html)
+* [SHOW CACHED TABLES](sql-ref-syntax-aux-show-cached-tables.html)
 * [SHOW COLUMNS](sql-ref-syntax-aux-show-columns.html)
 * [SHOW CREATE TABLE](sql-ref-syntax-aux-show-create-table.html)
 * [SHOW DATABASES](sql-ref-syntax-aux-show-databases.html)
```

python/docs/source/reference/pyspark.sql/catalog.rst

Lines changed: 11 additions & 0 deletions
```diff
@@ -25,30 +25,41 @@ Catalog
 .. autosummary::
     :toctree: api/
 
+    Catalog.analyzeTable
     Catalog.cacheTable
     Catalog.clearCache
+    Catalog.createDatabase
     Catalog.createExternalTable
     Catalog.createTable
     Catalog.currentCatalog
     Catalog.currentDatabase
     Catalog.databaseExists
+    Catalog.dropDatabase
     Catalog.dropGlobalTempView
+    Catalog.dropTable
     Catalog.dropTempView
+    Catalog.dropView
     Catalog.functionExists
+    Catalog.getCreateTableString
     Catalog.getDatabase
     Catalog.getFunction
     Catalog.getTable
+    Catalog.getTableProperties
     Catalog.isCached
+    Catalog.listCachedTables
     Catalog.listCatalogs
     Catalog.listColumns
     Catalog.listDatabases
     Catalog.listFunctions
+    Catalog.listPartitions
     Catalog.listTables
+    Catalog.listViews
     Catalog.recoverPartitions
     Catalog.refreshByPath
     Catalog.refreshTable
     Catalog.registerFunction
     Catalog.setCurrentCatalog
     Catalog.setCurrentDatabase
     Catalog.tableExists
+    Catalog.truncateTable
     Catalog.uncacheTable
```

python/pyspark/sql/catalog.py

Lines changed: 131 additions & 1 deletion
```diff
@@ -17,7 +17,7 @@
 
 import sys
 import warnings
-from typing import Any, Callable, NamedTuple, List, Optional, TYPE_CHECKING
+from typing import Any, Callable, Dict, NamedTuple, List, Optional, TYPE_CHECKING
 
 from pyspark.errors import PySparkTypeError
 from pyspark.storagelevel import StorageLevel
@@ -77,6 +77,15 @@ class Function(NamedTuple):
     isTemporary: bool
 
 
+class CachedTable(NamedTuple):
+    name: str
+    storageLevel: str
+
+
+class CatalogTablePartition(NamedTuple):
+    partition: str
+
+
 class Catalog:
     """User-facing catalog API, accessible through `SparkSession.catalog`.
 
@@ -161,6 +170,127 @@ def listCatalogs(self, pattern: Optional[str] = None) -> List[CatalogMetadata]:
         )
         return catalogs
 
+    def listCachedTables(self) -> List[CachedTable]:
+        """Lists named in-memory cache entries (same as ``SHOW CACHED TABLES``).
+
+        .. versionadded:: 4.2.0
+        """
+        iter = self._jcatalog.listCachedTables().toLocalIterator()
+        out: List[CachedTable] = []
+        while iter.hasNext():
+            j = iter.next()
+            out.append(CachedTable(name=j.name(), storageLevel=j.storageLevel()))
+        return out
+
+    def dropTable(self, tableName: str, ifExists: bool = False, purge: bool = False) -> None:
+        """Drops a persistent table.
+
+        .. versionadded:: 4.2.0
+        """
+        self._jcatalog.dropTable(tableName, ifExists, purge)
+
+    def dropView(self, viewName: str, ifExists: bool = False) -> None:
+        """Drops a persistent view.
+
+        .. versionadded:: 4.2.0
+        """
+        self._jcatalog.dropView(viewName, ifExists)
+
+    def createDatabase(
+        self, dbName: str, ifNotExists: bool = False, properties: Optional[Dict[str, str]] = None
+    ) -> None:
+        """Creates a namespace (database).
+
+        .. versionadded:: 4.2.0
+        """
+        ju = self._sc._gateway.jvm.java.util
+        m = ju.HashMap()
+        if properties:
+            for k, v in properties.items():
+                m.put(k, v)
+        self._jcatalog.createDatabase(dbName, ifNotExists, m)
+
+    def dropDatabase(self, dbName: str, ifExists: bool = False, cascade: bool = False) -> None:
+        """Drops a namespace.
+
+        .. versionadded:: 4.2.0
+        """
+        self._jcatalog.dropDatabase(dbName, ifExists, cascade)
+
+    def listPartitions(self, tableName: str) -> List[CatalogTablePartition]:
+        """Lists partitions (same as ``SHOW PARTITIONS``).
+
+        .. versionadded:: 4.2.0
+        """
+        iter = self._jcatalog.listPartitions(tableName).toLocalIterator()
+        out: List[CatalogTablePartition] = []
+        while iter.hasNext():
+            j = iter.next()
+            out.append(CatalogTablePartition(partition=j.partition()))
+        return out
+
+    def listViews(self, dbName: Optional[str] = None, pattern: Optional[str] = None) -> List[Table]:
+        """Lists views in a namespace.
+
+        .. versionadded:: 4.2.0
+        """
+        if pattern is not None and dbName is None:
+            dbName = self.currentDatabase()
+        if dbName is None:
+            iter = self._jcatalog.listViews().toLocalIterator()
+        elif pattern is None:
+            iter = self._jcatalog.listViews(dbName).toLocalIterator()
+        else:
+            iter = self._jcatalog.listViews(dbName, pattern).toLocalIterator()
+        views = []
+        while iter.hasNext():
+            jtable = iter.next()
+            jnamespace = jtable.namespace()
+            if jnamespace is not None:
+                namespace = [jnamespace[i] for i in range(0, len(jnamespace))]
+            else:
+                namespace = None
+            views.append(
+                Table(
+                    name=jtable.name(),
+                    catalog=jtable.catalog(),
+                    namespace=namespace,
+                    description=jtable.description(),
+                    tableType=jtable.tableType(),
+                    isTemporary=jtable.isTemporary(),
+                )
+            )
+        return views
+
+    def getTableProperties(self, tableName: str) -> Dict[str, str]:
+        """Returns table properties as a dict.
+
+        .. versionadded:: 4.2.0
+        """
+        jm = self._jcatalog.getTableProperties(tableName)
+        return {k: jm.get(k) for k in jm.keySet()}
+
+    def getCreateTableString(self, tableName: str, asSerde: bool = False) -> str:
+        """Returns ``SHOW CREATE TABLE`` DDL for a relation.
+
+        .. versionadded:: 4.2.0
+        """
+        return self._jcatalog.getCreateTableString(tableName, asSerde)
+
+    def truncateTable(self, tableName: str) -> None:
+        """Truncates a table.
+
+        .. versionadded:: 4.2.0
+        """
+        self._jcatalog.truncateTable(tableName)
+
+    def analyzeTable(self, tableName: str, noScan: bool = False) -> None:
+        """Runs ``ANALYZE TABLE ... COMPUTE STATISTICS``.
+
+        .. versionadded:: 4.2.0
+        """
+        self._jcatalog.analyzeTable(tableName, noScan)
+
     def currentDatabase(self) -> str:
         """
         Returns the current default database in this session.
```
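Taken together, a short usage sketch of the new `Catalog` methods, assuming only the signatures added in the diff above; the database and table names are illustrative:

```python
# Create a namespace with properties, then a small table inside it.
spark.catalog.createDatabase("demo_db", ifNotExists=True, properties={"owner": "docs"})
spark.sql("CREATE TABLE demo_db.t (id INT) USING parquet TBLPROPERTIES ('k' = 'v')")

print(spark.catalog.getTableProperties("demo_db.t"))    # {'k': 'v', ...}
print(spark.catalog.getCreateTableString("demo_db.t"))  # CREATE TABLE ... DDL

spark.catalog.analyzeTable("demo_db.t", noScan=True)    # statistics without a full scan
spark.catalog.truncateTable("demo_db.t")                # drop all rows, keep the table

# Tear down: drop the table, then the namespace and anything left in it.
spark.catalog.dropTable("demo_db.t", ifExists=True)
spark.catalog.dropDatabase("demo_db", ifExists=True, cascade=True)
```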
