Commit f54c099

BryanFauble and claude committed
Rewrite and expand CLAUDE.md coverage to 16 files
Rewrote 5 existing files with enhanced behavioral conventions (reusable utilities, conditional behavior, concurrency patterns). Added 11 new module-level files for full directory coverage: operations, models/mixins, models/services, models/protocols, core/upload, core/download, core/constants, core/credentials, extensions/curator, synapseutils, and docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent eb74b2b commit f54c099

16 files changed

Lines changed: 446 additions & 28 deletions

CLAUDE.md

Lines changed: 28 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -38,14 +38,20 @@ pip install -e ".[docs]" && mkdocs serve
3838
### Async-first with generated sync wrappers
3939
All new methods must be async with `_async` suffix. The `@async_to_sync` class decorator (`core/async_utils.py`) auto-generates sync counterparts at class definition time. Never write sync methods manually on model classes — the decorator handles it.
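
To make the convention above concrete, here is a minimal sketch of how a class decorator could generate sync twins for `*_async` methods. It is not the actual `@async_to_sync` implementation from `core/async_utils.py`; the names `async_to_sync_sketch` and `ProjectSketch` are hypothetical, and the sketch assumes no event loop is already running in the calling thread.

```python
import asyncio
import functools
import inspect


def async_to_sync_sketch(cls):
    """Hypothetical stand-in: add a sync twin for every *_async coroutine method."""
    for name, attr in list(vars(cls).items()):
        if not (name.endswith("_async") and inspect.iscoroutinefunction(attr)):
            continue
        sync_name = name[: -len("_async")]

        def make_wrapper(async_method):
            @functools.wraps(async_method)
            def wrapper(self, *args, **kwargs):
                # Assumption: no event loop is running in this thread.
                return asyncio.run(async_method(self, *args, **kwargs))

            return wrapper

        setattr(cls, sync_name, make_wrapper(attr))
    return cls


@async_to_sync_sketch
class ProjectSketch:
    def __init__(self, name):
        self.name = name

    async def store_async(self):
        await asyncio.sleep(0)  # placeholder for real async I/O
        return f"stored {self.name}"
```

With this in place, `ProjectSketch("demo").store()` runs the coroutine to completion, which is why only the `_async` variant needs to be written by hand.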
4040

41+
### `wrap_async_to_sync()` for standalone functions
42+
Use `wrap_async_to_sync()` (not `@async_to_sync`) for free-standing async functions outside of classes — see `operations/` layer for the pattern. The class decorator only works on classes.
43+
4144
### Protocol classes for sync type hints
4245
Each model in `models/` has a corresponding protocol in `models/protocols/` defining the sync method signatures. When adding a new async method to a model, add its sync signature to the protocol class so IDE type hints work.
4346

4447
### Dataclass models with `fill_from_dict()`
4548
Models are `@dataclass` classes, NOT Pydantic. REST responses are deserialized via `fill_from_dict()` methods on each model. New models must follow this pattern.
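
A minimal sketch of the dataclass-plus-`fill_from_dict()` pattern might look like the following. The class `FolderSketch`, its fields, and the response keys are illustrative assumptions rather than the real model definitions.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class FolderSketch:
    """Hypothetical model; the real models carry many more fields."""

    id: Optional[str] = None
    name: Optional[str] = None
    parent_id: Optional[str] = None

    def fill_from_dict(self, synapse_response: Dict[str, Any]) -> "FolderSketch":
        # Copy REST fields (camelCase keys assumed) onto snake_case attributes.
        self.id = synapse_response.get("id")
        self.name = synapse_response.get("name")
        self.parent_id = synapse_response.get("parentId")
        return self


folder = FolderSketch().fill_from_dict(
    {"id": "syn123", "name": "raw-data", "parentId": "syn1"}
)
```

Returning `self` lets callers chain construction and deserialization in a single expression.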
4649

4750
### Concrete types are Java class names
48-
`core/constants/concrete_types.py` maps Java class names (e.g., `org.sagebionetworks.repo.model.FileEntity`) for polymorphic entity deserialization. When adding new entity types, register the concrete type string here.
51+
`core/constants/concrete_types.py` maps Java class names (e.g., `org.sagebionetworks.repo.model.FileEntity`) for polymorphic entity deserialization. When adding new entity types, register the concrete type string here AND in `api/entity_factory.py` AND in `models/mixins/asynchronous_job.py` if it's an async job type.
52+
53+
### Options dataclass pattern
54+
The `operations/` layer uses dataclass option objects (`StoreFileOptions`, `FileOptions`, `TableOptions`, etc.) to bundle type-specific configuration for CRUD operations. Follow this pattern for new entity-type-specific options.
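
As a rough illustration of the options pattern, the sketch below bundles type-specific settings into small dataclasses. The class names, fields, and the `plan_get` helper are all hypothetical; the real `FileOptions` and `TableOptions` may expose different attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileOptionsSketch:
    # Hypothetical fields, chosen only for illustration.
    download_file: bool = True
    download_location: Optional[str] = None


@dataclass
class TableOptionsSketch:
    # Hypothetical field, chosen only for illustration.
    include_columns: bool = True


def plan_get(synapse_id, file_options=None, table_options=None):
    """Hypothetical helper: merge per-type options into one request plan."""
    file_options = file_options or FileOptionsSketch()
    table_options = table_options or TableOptionsSketch()
    return {
        "id": synapse_id,
        "download_file": file_options.download_file,
        "download_location": file_options.download_location,
        "include_columns": table_options.include_columns,
    }


plan = plan_get("syn123", file_options=FileOptionsSketch(download_file=False))
```

Grouping settings this way keeps the top-level function signature short while letting each entity type grow its own options independently.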
4955

5056
### Mixin composition for shared behavior
5157
Shared functionality lives in `models/mixins/` (AccessControllable, StorableContainer, AsynchronousJob, etc.). Prefer adding to existing mixins over duplicating logic across models.
@@ -60,23 +66,30 @@ Use `SYNPY-{issue_number}` or `synpy-{issue_number}` prefix for feature branches
6066

6167
```
6268
synapseclient/
63-
├── client.py # Synapse class — public entry point, REST methods, auth
64-
├── api/ # REST API layer — one file per resource type
65-
├── models/ # Dataclass entities (Project, File, Table, etc.)
66-
│ ├── protocols/ # Sync method type signatures for IDE hints
67-
│ ├── mixins/ # Shared behavior (ACL, containers, async jobs)
68-
│ └── services/ # Model-level business logic
69-
├── operations/ # High-level CRUD: get(), store(), delete()
69+
├── client.py # Synapse class — public entry point, REST methods, auth (9600+ lines)
70+
├── api/ # REST API layer — one file per resource type (21 files)
71+
│ └── entity_factory.py # Polymorphic entity deserialization via concrete type dispatch
72+
├── models/ # Dataclass entities (Project, File, Table, etc.) (28 files)
73+
│ ├── protocols/ # Sync method type signatures for IDE hints (18 files)
74+
│ ├── mixins/ # Shared behavior (ACL, containers, async jobs, tables) (7 files)
75+
│ └── services/ # Model-level business logic (storable_entity, search)
76+
├── operations/ # High-level CRUD: get(), store(), delete() — factory dispatch
7077
├── core/ # Infrastructure: upload/download, retry, cache, creds, OTel
7178
│ ├── upload/ # Multipart upload (sync + async)
7279
│ ├── download/ # File download (sync + async)
73-
│ ├── credentials/ # Auth chain (PAT, OAuth, env vars, config file)
74-
│ └── constants/ # Concrete types, config keys, limits
75-
├── extensions/ # Optional modules (curator)
76-
└── entity.py, table.py, ... # Legacy classes (pre-OOP rewrite)
80+
│ ├── credentials/ # Auth chain (PAT, env var, config file, AWS SSM)
81+
│ ├── constants/ # Concrete types, config keys, limits, method flags
82+
│ ├── models/ # ACL, Permission, DictObject, custom JSON serialization
83+
│ └── multithread_download/ # Threaded download manager
84+
├── extensions/
85+
│ └── curator/ # Schema curation (pandas, networkx, rdflib) — optional
86+
├── services/ # JSON schema validation services
87+
└── entity.py, table.py, ... # Legacy classes (pre-OOP rewrite, read-only)
88+
89+
synapseutils/ # Legacy bulk utilities (copy, sync, migrate, walk) — sync-only
7790
```
7891

79-
Data flows: Client REST methods → API service functions → Models with `fill_from_dict()` → returned to caller. The `operations/` layer provides a simpler interface over this chain.
92+
Data flow: User → `operations/` factory → model async methods → `api/` service functions → `client.py` REST calls → Synapse API. Responses are deserialized via `fill_from_dict()` on model instances.
8093

8194
## Constraints
8295

@@ -85,6 +98,8 @@ Data flows: Client REST methods → API service functions → Models with `fill_
8598
- Unit tests must not make network calls — `pytest-socket` blocks all sockets. Use `pytest-mock` for HTTP mocking.
8699
- `develop` is the default/main branch, not `main` or `master`. PRs target `develop`.
87100
- Legacy classes in root `synapseclient/` (entity.py, table.py, etc.) are kept for backwards compatibility. New features go in `models/` using the dataclass pattern.
101+
- Avoid adding new methods to `client.py` (9600+ lines) — prefer the `api/` + `models/` layered pattern.
102+
- `synapseutils/` is legacy sync-only (uses `requests`, NOT `httpx`). Do not add async methods there — new async equivalents go in `models/` or `operations/`.
88103

89104
## Testing
90105

docs/CLAUDE.md

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,53 @@
1+
<!-- Last reviewed: 2026-03 -->
2+
3+
## Project
4+
5+
User-facing documentation for the Synapse Python Client. Built with MkDocs + Material theme, deployed via GitHub Pages. Follows the Diataxis documentation framework with four content types: tutorials, guides, reference, and explanations.
6+
7+
## Stack
8+
9+
MkDocs with Material theme, mkdocstrings (Google-style docstrings), termynal (CLI animations), markdown-include (file embedding).
10+
11+
## Conventions
12+
13+
### Content types (Diataxis framework)
14+
- **tutorials/** — Step-by-step learning (competence-building). Themed around a biomedical researcher working with Alzheimer's Disease data. Progressive build-up: Project → Folder → File → Annotations → etc.
15+
- **guides/** — How-to guides for specific use cases (problem-solution oriented). Includes extension-specific guides (curator).
16+
- **reference/** — API reference auto-generated from docstrings via mkdocstrings. Split into `experimental/sync/` and `experimental/async/` for new OOP API.
17+
- **explanations/** — Deep conceptual content ("why" not just "how"). Design decisions, internal machinery.
18+
19+
### File inclusion pattern (markdown-include)
20+
Tutorial code lives in `tutorials/python/tutorial_scripts/*.py` and is embedded in markdown via line-range includes:
21+
```markdown
22+
{!docs/tutorials/python/tutorial_scripts/annotation.py!lines=5-23}
23+
```
24+
Single source of truth — edit the `.py` file, not the markdown. Changing line numbers in scripts requires updating the line ranges in the corresponding `.md` files.
25+
26+
### mkdocstrings reference generation
27+
Reference markdown files use `::: synapseclient.ClassName` syntax to trigger auto-generation from docstrings. Key configuration:
28+
- `docstring_style: google` — parse Google-style docstrings
29+
- `members_order: source` — preserve source code order
30+
- `filters: ["!^_", "!to_synapse_request", "!fill_from_dict"]` — private members, `to_synapse_request()`, and `fill_from_dict()` are excluded from docs
31+
- `inherited_members: true` — shows mixin methods on inheriting classes
32+
- Member lists are explicit — each reference page specifies which methods to document
33+
34+
### Anchor links for cross-referencing
35+
Pattern: `[](){ #reference-anchor }` in reference pages. Tutorials link to reference via `[API Reference][project-reference-sync]`. Explicit type hints use: `[syn.login][synapseclient.Synapse.login]`.
36+
37+
### termynal CLI animations
38+
Terminal animation blocks marked with `<!-- termynal -->` HTML comment. Prompts configured as `$` or `>`. Used in authentication.md and installation docs.
39+
40+
### Custom CSS (`css/custom.css`)
41+
- API reference indentation: `doc-contents` has 25px left padding with border
42+
- Smaller table font (0.7rem) for API docs
43+
- Wide layout: `max-width: 1700px` for complex content
44+
45+
### Navigation structure
46+
Defined in `mkdocs.yml` nav section. 6 main sections: Home, Tutorials, How-To Guides, API Reference, Further Reading, News. API Reference has ~85 markdown files (~40 legacy, ~45 experimental).
47+
48+
## Constraints
49+
50+
- Do not edit tutorial code inline in markdown — edit the `.py` script file in `tutorial_scripts/` and update line ranges if needed.
51+
- Reference docs auto-generate from source docstrings — to change method documentation, edit the docstring in the Python source, not the markdown.
52+
- `mkdocs.yml` is at the repo root, not in `docs/` — it configures the entire doc build.
53+
- Docs deploy via `mkdocs gh-deploy --force` targeting the `master` branch (not `develop`).

synapseclient/api/CLAUDE.md

Lines changed: 9 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -18,28 +18,31 @@ async def verb_resource(
1818
- All functions are `async def`
1919
- `synapse_client` is always the last parameter, keyword-only (after `*`)
2020
- Use `Synapse.get_client(synapse_client=synapse_client)` to get the client instance
21-
- Use `TYPE_CHECKING` guard for `Synapse` import to avoid circular dependencies
21+
- Use `TYPE_CHECKING` guard for `Synapse` import — avoids circular dependencies between `api/` and `client.py`
2222

2323
### REST call pattern
2424
```python
2525
client = Synapse.get_client(synapse_client=synapse_client)
2626
return await client.rest_post_async(uri="/endpoint", body=json.dumps(request))
2727
```
28-
Available methods: `rest_get_async`, `rest_post_async`, `rest_put_async`, `rest_delete_async`. Pass `endpoint=client.fileHandleEndpoint` for file handle operations; omit for the default repository endpoint.
28+
Available methods: `rest_get_async`, `rest_post_async`, `rest_put_async`, `rest_delete_async`. Pass `endpoint=client.fileHandleEndpoint` for file handle operations; omit for the default repository endpoint. Use `json.dumps()` for request bodies — not raw dicts.
2929

3030
### Return values
3131
- Most functions return raw `Dict[str, Any]` — transformation happens in the model layer via `fill_from_dict()`
32-
- Some return dataclass instances (e.g., `EntityHeader`) when the data is only used internally
32+
- Some return typed dataclass instances (e.g., `EntityHeader` from `entity_services.py`) when the data is only used internally
3333
- Delete operations return `None`
3434

3535
### Pagination
3636
Use helpers from `api_client.py`:
37-
- `rest_get_paginated_async()` — for GET endpoints with limit/offset. Expects `results` or `children` key.
38-
- `rest_post_paginated_async()` — for POST endpoints with `nextPageToken`. Expects `page` array.
37+
- `rest_get_paginated_async()` — for GET endpoints with limit/offset. Expects `results` or `children` key in response.
38+
- `rest_post_paginated_async()` — for POST endpoints with `nextPageToken`. Expects `page` array in response.
3939
Both are async generators yielding individual items.
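
The limit/offset case can be sketched as an async generator that keeps fetching pages until a short page arrives. This is not the code of `rest_get_paginated_async()`; the function `paginate_limit_offset`, the `fetch_page` callable, and the stopping rule are assumptions made for illustration.

```python
import asyncio

FAKE_ROWS = ["a", "b", "c", "d", "e"]


async def fake_fetch_page(offset, limit):
    """Hypothetical stand-in for a REST GET returning one page."""
    return {"results": FAKE_ROWS[offset : offset + limit]}


async def paginate_limit_offset(fetch_page, limit=2):
    """Yield items one by one, advancing the offset after each full page."""
    offset = 0
    while True:
        page = await fetch_page(offset, limit)
        # Accept either key, mirroring the `results` / `children` note above.
        items = page.get("results") or page.get("children") or []
        for item in items:
            yield item
        if len(items) < limit:  # assumed stop condition: a short page is the last
            break
        offset += limit


async def collect():
    return [item async for item in paginate_limit_offset(fake_fetch_page)]


collected = asyncio.run(collect())
```

Callers consume the generator with `async for`, so they never see page boundaries.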
4040

41+
### Entity factory (`entity_factory.py`)
42+
Polymorphic entity deserialization via concrete type dispatch. Maps Java class names from `core/constants/concrete_types.py` to model classes. When adding a new entity type, register the type mapping here.
43+
4144
### Adding a new service file
4245
1. Create `synapseclient/api/new_service.py`
43-
2. Import and add all public functions to `api/__init__.py` and its `__all__`
46+
2. Add all public functions to `api/__init__.py` imports and `__all__` — every public function must be re-exported
4447
3. Use `json.dumps()` for request bodies (not dict)
4548
4. Reference `entity_services.py` for CRUD pattern, `table_services.py` for pagination pattern

synapseclient/core/CLAUDE.md

Lines changed: 31 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -9,16 +9,19 @@ Infrastructure layer — authentication, file transfer, retry logic, caching, Op
99
### async_to_sync decorator (`async_utils.py`)
1010
- Scans class for `*_async` methods and creates sync wrappers stripping the suffix
1111
- Uses `ClassOrInstance` descriptor — methods work on both class and instance
12-
- Detects running event loop: uses `nest_asyncio.apply()` for nested loops, raises on Python 3.14+
13-
- `wrap_async_to_sync()` for standalone functions (not class methods)
14-
- `wrap_async_generator_to_sync_generator()` for async generators — must `aclose()` in finally block
12+
- Detects running event loop: uses `nest_asyncio.apply()` for nested loops (Python <3.14), raises `RuntimeError` on Python 3.14+ instructing users to call async directly
13+
- `wrap_async_to_sync()` for standalone functions (not class methods) — used in `operations/` layer
14+
- `wrap_async_generator_to_sync_generator()` for async generators — must call `aclose()` in finally block
15+
- `@skip_async_to_sync` decorator excludes specific methods from sync wrapper generation (sets `_skip_conversion = True`)
16+
- `@otel_trace_method()` wraps async methods with OpenTelemetry spans. Format: `f"{ClassName}_{Operation}: ID: {self.id}, Name: {self.name}"`
1517

1618
### Retry patterns (`retry.py`)
17-
- `with_retry()` — simple exponential backoff, fixed retry count (default 3)
18-
- `with_retry_time_based_async()` — time-bounded (default 20 min), exponential backoff with jitter
19+
- `with_retry()` — count-based exponential backoff (default 3 retries), jitter 0.5-1.5x multiplier
20+
- `with_retry_time_based_async()` — time-bounded (default 20 min), exponential backoff with 0.01-0.1 random jitter
1921
- Default retryable status codes: `[429, 500, 502, 503, 504]`
20-
- `NON_RETRYABLE_ERRORS` list overrides status code retry (e.g., "is not a table or view")
22+
- `NON_RETRYABLE_ERRORS` list overrides status code retry (currently: `["is not a table or view"]`)
2123
- 429 throttling: wait bumps to 16 seconds minimum
24+
- Sets OTel span attribute `synapse.retries` on retry
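
The count-based behavior listed above can be approximated by the sketch below. It is not the `with_retry()` implementation: the function name `retry_with_backoff`, its parameters, and the `(status, body)` return shape are assumptions. Only the retryable status codes, the 0.5-1.5x jitter, and the 16-second floor for 429 come from the notes above.

```python
import random
import time

RETRYABLE_STATUS_CODES = [429, 500, 502, 503, 504]


def retry_with_backoff(call, retries=3, wait=1.0, back_off=2.0, sleep=time.sleep):
    """Hypothetical helper: call() must return a (status_code, body) tuple."""
    attempt = 0
    while True:
        status, body = call()
        if status not in RETRYABLE_STATUS_CODES or attempt >= retries:
            return status, body
        delay = wait * random.uniform(0.5, 1.5)  # jitter multiplier
        if status == 429:
            delay = max(delay, 16.0)  # throttling: wait at least 16 seconds
        sleep(delay)
        wait *= back_off  # exponential growth of the base wait
        attempt += 1
```

Passing `sleep` as a parameter is a design choice for testability: unit tests can record delays instead of actually waiting.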
2225

2326
### Credentials chain (`credentials/`)
2427
Provider chain tries in order: login args → config file → env var (`SYNAPSE_AUTH_TOKEN`) → AWS SSM. Credentials implement `requests.auth.AuthBase`, adding `Authorization: Bearer` header. Profile selection via `SYNAPSE_PROFILE` env var or `--profile` arg.
@@ -30,7 +33,28 @@ Provider chain tries in order: login args → config file → env var (`SYNAPSE_
3033
- Progress via `tqdm`; multi-threaded uploads suppress per-file messages via `cumulative_transfer_progress`
3134

3235
### concrete_types.py
33-
Maps Java class names from Synapse REST API for polymorphic deserialization. When adding a new entity type, add its concrete type string here AND in `api/entity_factory.py` type map.
36+
Maps Java class names from Synapse REST API for polymorphic deserialization. When adding a new entity type, add its concrete type string here AND in `api/entity_factory.py` type map AND in `models/mixins/asynchronous_job.py` ASYNC_JOB_URIS if it's an async job type.
37+
38+
### Key reusable utilities (`utils.py`)
39+
- `delete_none_keys(d)` — removes None-valued keys from dict. MUST call before all API requests — Synapse rejects null values.
40+
- `id_of(obj)` — extracts Synapse ID from entity, dict, or string
41+
- `concrete_type_of(entity)` — gets the concrete type string from an entity
42+
- `get_synid_and_version(id_str)` — parses "synXXX.N" strings into (id, version) tuples
43+
- `merge_dataclass_entities(source, dest, ...)` — merges fields from one dataclass into another
44+
- `log_dataclass_diff(obj1, obj2)` — logs field-by-field differences between two dataclass instances
45+
- `snake_case(name)` — converts camelCase to snake_case
46+
- `normalize_whitespace(s)` — collapses whitespace
47+
- `MB`, `KB`, `GB` — byte size constants
48+
- `make_bogus_data_file()`, `make_bogus_binary_file(n)`, `make_bogus_uuid_file()` — test file generators (in production code, used by tests)
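
The rule about stripping `None` values before a request can be illustrated as follows. `drop_none_values` is a hypothetical, non-mutating stand-in written for this sketch; the real `delete_none_keys()` may behave differently (for example, by modifying the dict in place).

```python
import json


def drop_none_values(d):
    """Hypothetical stand-in: return a copy of d without None-valued keys."""
    return {key: value for key, value in d.items() if value is not None}


request = {"name": "My Project", "description": None, "parentId": None}

# Serialize only the populated fields so no JSON nulls reach the API.
body = json.dumps(drop_none_values(request))
```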
49+
50+
### Exception hierarchy (`exceptions.py`)
51+
`SynapseError` base with 14+ subclasses: `SynapseHTTPError`, `SynapseMd5MismatchError`, `SynapseFileNotFoundError`, `SynapseNotFoundError`, `SynapseAuthenticationError`, etc. `_raise_for_status()` and `_raise_for_status_httpx()` handle HTTP error responses with Bearer token redaction via `BEARER_TOKEN_PATTERN` regex.
52+
53+
### Rolled-up subdirectories
54+
55+
**`core/models/`** — Internal dataclasses for ACL, Permission, DictObject (dict-like base class), and custom JSON serialization utilities. `DictObject` (`dict_object.py`) provides dot-notation access to dict entries.
56+
57+
**`core/multithread_download/`** — Threaded download manager with `shared_executor()` context manager for external thread pool configuration. Uses `DownloadRequest` dataclass. Default part size: `SYNAPSE_DEFAULT_DOWNLOAD_PART_SIZE`.
3458

3559
## Constraints
3660

Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
<!-- Last reviewed: 2026-03 -->
2+
3+
## Project
4+
5+
Centralized constants used across the codebase — concrete type mappings, API limits, collision modes, and config file keys.
6+
7+
## Conventions
8+
9+
### concrete_types.py — 3-way registration required
10+
Maps Java class name strings (e.g., `org.sagebionetworks.repo.model.FileEntity`) for polymorphic entity deserialization. When adding a new entity or job type, register in THREE places:
11+
1. `concrete_types.py` — add the constant string
12+
2. `api/entity_factory.py` — add to the type dispatch map
13+
3. `models/mixins/asynchronous_job.py` `ASYNC_JOB_URIS` — add if it's an async job type
14+
15+
### limits.py
16+
`MAX_FILE_HANDLE_PER_COPY_REQUEST = 100` and other API batch size limits.
17+
18+
### method_flags.py
19+
Collision handling modes for file downloads: `COLLISION_OVERWRITE_LOCAL`, `COLLISION_KEEP_LOCAL`, `COLLISION_KEEP_BOTH`.
20+
21+
### config_file_constants.py
22+
Section and key names for the `~/.synapseConfig` file. `AUTHENTICATION_SECTION_NAME` identifies the auth section.
Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
<!-- Last reviewed: 2026-03 -->
2+
3+
## Project
4+
5+
Authentication credential providers implementing a chain-of-responsibility pattern for token resolution.
6+
7+
## Conventions
8+
9+
### Provider chain order (priority)
10+
1. **UserArgsCredentialsProvider** — explicit login args passed to `syn.login()`
11+
2. **ConfigFileCredentialsProvider** — `~/.synapseConfig` file (profile-aware via sections)
12+
3. **EnvironmentVariableCredentialsProvider** — `SYNAPSE_AUTH_TOKEN` env var
13+
4. **AWSParameterStoreCredentialsProvider** — AWS SSM Parameter Store (via `SYNAPSE_TOKEN_AWS_SSM_PARAMETER_NAME` env var)
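
The priority order above amounts to "first provider that returns a token wins". A minimal sketch, using plain functions in place of the real provider classes (every function name and the fake environment dict are hypothetical):

```python
FAKE_ENVIRON = {"SYNAPSE_AUTH_TOKEN": "env-token"}  # stand-in for os.environ


def from_login_args():
    return None  # no explicit token was passed


def from_config_file():
    return None  # no matching profile in the config file


def from_env():
    return FAKE_ENVIRON.get("SYNAPSE_AUTH_TOKEN")


def from_aws_ssm():
    return "ssm-token"  # never reached when an earlier provider succeeds


def resolve_token(providers):
    """Return the first non-None token, trying providers in priority order."""
    for provider in providers:
        token = provider()
        if token is not None:
            return token
    return None


token = resolve_token([from_login_args, from_config_file, from_env, from_aws_ssm])
```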
14+
15+
### Profile selection
16+
Select profile via `SYNAPSE_PROFILE` env var or `--profile` CLI arg. If username provided in login args differs from config file username, config credentials are rejected — prevents ambiguity.
17+
18+
### Token handling
19+
`SynapseAuthTokenCredentials` implements `requests.auth.AuthBase`, adding `Authorization: Bearer` header. JWT validation failure is silent (logs warning, does not raise) — allows tokens with unrecognized formats to attempt API calls.
20+
21+
## Constraints
22+
23+
- Bearer tokens must never appear in logs — redact with `BEARER_TOKEN_PATTERN` regex before logging.
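
Redaction before logging could look roughly like this. The regex below is an assumption written for this sketch and is not the actual `BEARER_TOKEN_PATTERN`; the `redact` helper is likewise hypothetical.

```python
import re

# Hypothetical pattern: keep the "Bearer " prefix, replace the token itself.
TOKEN_PATTERN_SKETCH = re.compile(r"(Bearer\s+)[A-Za-z0-9._~+/=-]+")


def redact(message):
    """Replace any bearer token in message with a fixed placeholder."""
    return TOKEN_PATTERN_SKETCH.sub(r"\1[REDACTED]", message)


safe = redact("Authorization: Bearer abc.def-123")
```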
Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
<!-- Last reviewed: 2026-03 -->
2+
3+
## Project
4+
5+
File download from Synapse storage with MD5 validation, collision handling, and progress tracking.
6+
7+
## Conventions
8+
9+
### Primary download path
10+
`download_async.py` is the primary async download implementation. `download_functions.py` contains shared helpers and the sync download wrapper.
11+
12+
### MD5 validation
13+
Post-transfer MD5 validation is mandatory. Raises `SynapseMd5MismatchError` on mismatch — the download is retried automatically (60 retries spanning ~30 minutes).
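
A post-transfer check of this kind can be sketched with the standard library. `Md5MismatchSketch` is a hypothetical stand-in for `SynapseMd5MismatchError`, and the helper names are assumptions; retry handling is deliberately left out.

```python
import hashlib
import os
import tempfile


class Md5MismatchSketch(Exception):
    """Hypothetical stand-in for SynapseMd5MismatchError."""


def md5_of_file(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so large downloads do not load into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_download(path, expected_md5):
    actual = md5_of_file(path)
    if actual != expected_md5:
        raise Md5MismatchSketch(f"expected {expected_md5}, got {actual}")
    return path


# Demo: write a small file and validate it against its known digest.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "payload.bin")
    with open(demo_path, "wb") as handle:
        handle.write(b"hello")
    validated = validate_download(demo_path, hashlib.md5(b"hello").hexdigest())
    validated_name = os.path.basename(validated)
```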
14+
15+
### Collision handling
16+
Controlled by `if_collision` parameter, using constants from `core/constants/method_flags.py`:
17+
- `overwrite.local` — replace existing local file
18+
- `keep.local` — skip download if local file exists
19+
- `keep.both` — rename downloaded file to avoid collision
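
The three modes can be sketched as a small decision function. The helper `resolve_collision` and the `name(1).ext` renaming scheme are assumptions for illustration; only the mode strings come from the list above.

```python
import os
import tempfile


def resolve_collision(path, if_collision):
    """Return the path to write to, or None when the download should be skipped."""
    if not os.path.exists(path):
        return path
    if if_collision == "overwrite.local":
        return path
    if if_collision == "keep.local":
        return None
    if if_collision == "keep.both":
        root, ext = os.path.splitext(path)
        counter = 1
        # Assumed naming scheme: data.csv -> data(1).csv, data(2).csv, ...
        while os.path.exists(f"{root}({counter}){ext}"):
            counter += 1
        return f"{root}({counter}){ext}"
    raise ValueError(f"unknown if_collision value: {if_collision}")


# Demo: an existing local file triggers each mode in turn.
with tempfile.TemporaryDirectory() as tmp_dir:
    existing = os.path.join(tmp_dir, "data.csv")
    open(existing, "w").close()
    overwrite_target = resolve_collision(existing, "overwrite.local") == existing
    keep_local_target = resolve_collision(existing, "keep.local")
    keep_both_name = os.path.basename(resolve_collision(existing, "keep.both"))
```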
20+
21+
### Progress tracking
22+
Uses `shared_download_progress_bar` from `core/transfer_bar.py` for tqdm-based progress. Multi-file downloads track cumulative progress via `cumulative_transfer_progress`.
23+
24+
### Key helpers
25+
- `ensure_download_location_is_directory()` — validates/creates download directory
26+
- `download_by_file_handle()` — downloads a file given its handle metadata
