Commit 02e168f

Add documentation site with zensical and rdoc API reference.
1 parent c042ec2 commit 02e168f

25 files changed: +2154 -0 lines changed

.github/workflows/docs.yml

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
name: Deploy Docs

on:
  push:
    branches: [master]
    paths:
      - 'docs/**'
      - 'lib/**'
      - 'ext/**'
      - 'zensical.toml'
      - '.github/workflows/docs.yml'
  workflow_dispatch:

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: pages
  cancel-in-progress: false

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: '4.0'
          bundler-cache: false

      - name: Install zensical
        run: pip install zensical

      - name: Copy changelog to docs
        run: cp CHANGELOG.md docs/changelog.md

      - name: Build guide docs
        run: zensical build --clean

      - name: Build API reference
        run: rdoc --format aliki --output site/reference --title 'LibXML Ruby API' --line-numbers --charset=utf-8 --exclude lib/xml.rb --exclude lib/xml/libxml.rb --main README.md ext/**/libxml.c ext/**/ruby_xml.c ext/**/*.c lib/**/*.rb README.md

      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: site

  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4

docs/architecture/memory.md

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
# Memory Management

libxml-ruby automatically manages memory for the underlying libxml2 C library. This page explains the ownership model and how the bindings keep Ruby objects and libxml2 C structures in sync.

## Ownership Model

libxml2 has a simple ownership rule: an `xmlDocPtr` owns the tree attached to it, and `xmlFreeDoc` frees the document and the entire attached tree. When code unlinks a node with `xmlUnlinkNode`, that detached subtree is no longer document-owned and must either be reattached or freed with `xmlFreeNode`.

libxml-ruby sits on top of that model. In the normal case, the document is the owner. Ruby node and attr objects do not own the libxml node or attr they point at. They are references into libxml-owned memory, and their mark functions keep the owning Ruby document alive while Ruby still has live references into the tree.

In the diagram below:

- solid lines mean `owns`
- blue dashed lines mean `references` a libxml C object
- red dashed lines mean `mark`, which is a Ruby-to-Ruby GC reference

```mermaid
flowchart TB
    DocWrap["Ruby XML::Document"]
    XDoc["xmlDocPtr"]
    NodeWrap["Ruby XML::Node"]
    XNode["xmlNodePtr"]
    AttrWrap["Ruby XML::Attr"]
    XAttr["xmlAttrPtr"]

    DocWrap -->|owns| XDoc
    XDoc -->|owns| XNode
    XNode -->|owns| XAttr

    NodeWrap -.references.-> XNode
    AttrWrap -.references.-> XAttr
    NodeWrap -.mark.-> DocWrap
    AttrWrap -.mark.-> DocWrap
    DocWrap ~~~ NodeWrap
    DocWrap ~~~ AttrWrap
    NodeWrap ~~~ AttrWrap

    classDef ruby fill:#f4a0a0,stroke:#8b1f1b,stroke-width:2px;
    classDef xml fill:#e8f1ff,stroke:#5b84c4,stroke-width:2px;
    class DocWrap,NodeWrap,AttrWrap ruby;
    class XDoc,XNode,XAttr xml;
    linkStyle 3,4 stroke:#5b84c4,stroke-width:2px,stroke-dasharray: 6 4;
    linkStyle 5,6 stroke:#cc342d,stroke-width:2px,stroke-dasharray: 6 4;
```

The solid ownership chain is the important part. `XML::Document` owns the `xmlDocPtr`. The `xmlDocPtr` owns the tree, and the `xmlNodePtr` owns its attrs. The dashed lines are references, not ownership. The blue dashed edges mean Ruby objects reference libxml objects. The red dashed `mark` edges mean a live Ruby node or attr keeps the Ruby document alive during GC so the underlying tree is not freed while Ruby still references it.

## Detached Root Nodes

[Detached nodes](../xml/nodes.md#detached-nodes) are the one exception to the document-owns-everything model. A newly created node is Ruby-owned until it is inserted into a document tree. Removing a node transfers ownership back to Ruby.

Internally, this is managed by `rxml_node_manage` (Ruby takes ownership), `rxml_node_unmanage` (libxml takes ownership), and `rxml_node_free` (frees a detached node on GC).
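
The ownership hand-off described in this section can be pictured with a small pure-Ruby model. This is only a sketch: `ToyNode`, `ToyDocument`, and the `owner` field are hypothetical names invented for illustration. The real bookkeeping happens in C inside `rxml_node_manage` and `rxml_node_unmanage`, which this code does not call.

```ruby
# Toy model of node ownership; NOT the libxml-ruby implementation.
class ToyNode
  attr_reader :name
  attr_accessor :owner   # :ruby while detached, :document once attached

  def initialize(name)
    @name = name
    @owner = :ruby       # a newly created node starts out Ruby-owned
  end
end

class ToyDocument
  attr_reader :nodes

  def initialize
    @nodes = []
  end

  # Attaching hands ownership to the document (the "unmanage" direction).
  def attach(node)
    node.owner = :document
    @nodes << node
    node
  end

  # Removing hands ownership back to Ruby (the "manage" direction).
  def remove(node)
    @nodes.delete(node)
    node.owner = :ruby
    node
  end
end

doc = ToyDocument.new
node = ToyNode.new('item')
states = [node.owner]
doc.attach(node)
states << node.owner
doc.remove(node)
states << node.owner
p states
```

The point of the model is that a node has exactly one owner at any time, so exactly one party is responsible for freeing it.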

## Object Identity

Because temporary wrappers are created on demand, accessing the same node twice may return different Ruby objects:

```ruby
child1 = node.children[0]
child2 = node.children[0]

child1 == child2       # => true (same underlying node)
child1.equal?(child2)  # => false (different Ruby objects)
```

Use `==` or `eql?` to compare nodes, not `equal?`.

Documents and detached root nodes do maintain identity through the [registry](registry.md): retrieving the same document or detached root always returns the same Ruby object.
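
The difference between value equality and object identity can be reproduced without libxml at all. The sketch below uses a hypothetical `ToyWrapper` class whose `ptr` integer stands in for the address of an `xmlNodePtr`. The `hash` override is an assumption of this toy model, added so that `eql?` works for Hash keys and `uniq`; the source does not say how the real wrappers define `hash`.

```ruby
# Toy wrapper: equality is defined by the wrapped pointer, not by object identity.
class ToyWrapper
  attr_reader :ptr

  def initialize(ptr)
    @ptr = ptr           # stands in for the address of an xmlNodePtr
  end

  def ==(other)
    other.is_a?(ToyWrapper) && ptr == other.ptr
  end
  alias eql? ==

  # Keep hash consistent with eql? so wrappers behave sensibly as Hash keys.
  def hash
    ptr.hash
  end
end

a = ToyWrapper.new(0x1000)
b = ToyWrapper.new(0x1000)

puts a == b          # value equality: same underlying pointer
puts a.equal?(b)     # object identity: two separate Ruby objects
puts [a, b].uniq.size
```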

## Preventing Premature Collection

Keep a reference to the document (or a managed root node) as long as you use any of its nodes:

```ruby
# Safe - doc stays in scope
doc = XML::Parser.file('data.xml').parse
nodes = doc.find('//item')
nodes.each { |n| process(n) }

# Risky - doc may be collected
nodes = XML::Parser.file('data.xml').parse.find('//item')
GC.start  # doc could be freed here
nodes.first.name  # potential crash
```

## GC Sweep Order

During garbage collection (or at program exit), Ruby does not guarantee the order in which objects are freed, so the document object may be freed before its child node wrappers. This is safe because child node wrappers are non-owning: they have no free function. The document's free function calls `xmlFreeDoc`, which recursively frees the entire tree. The child wrappers simply become stale and are collected without action.
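
Why sweep order does not matter can be checked with a small simulation. This is a toy model, not the real GC or the real C free functions: `simulate_sweep` and the node names are hypothetical. Only the document has a free action, so every node is released exactly once whatever order the sweep runs in.

```ruby
# Toy model of sweep order; NOT the real GC or the real C free functions.
def simulate_sweep(seed)
  freed = Hash.new(0)    # node name => number of times it was freed

  # Only the document has a free function; it releases the whole tree.
  doc_free = -> { %w[root child_a child_b].each { |name| freed[name] += 1 } }
  # Child wrappers are non-owning: sweeping them does nothing.
  wrapper_free = -> {}

  # Run the three "free functions" in a pseudo-random order.
  [doc_free, wrapper_free, wrapper_free].shuffle(random: Random.new(seed)).each(&:call)
  freed
end

results = (1..5).map { |seed| simulate_sweep(seed) }
p results.uniq
```

If the wrappers also freed their nodes, the counts would depend on the order and some nodes would be freed twice.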

docs/architecture/registry.md

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
# Pointer Registry

The bindings need to map libxml2 C pointers back to their Ruby wrapper objects. This is used for two purposes:

1. **Object identity** - returning the same Ruby object when the same C pointer is encountered again (documents and detached root nodes)
2. **GC reachability** - mark functions look up the owning Ruby document to keep it alive while Ruby references exist into the tree

## Design

The registry is a pointer-keyed `st_table` in `ruby_xml_registry.c` with three operations:

```c
void  rxml_registry_register(void *ptr, VALUE obj);
void  rxml_registry_unregister(void *ptr);
VALUE rxml_registry_lookup(void *ptr);   /* Qnil on miss */
```

The registry is **not** a GC root. It does not keep objects alive. Objects stay alive through the normal mark chains; mark functions look up the registry instead of holding direct references.
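
The three operations can be mirrored in a few lines of Ruby. This is a sketch for illustration only: `ToyRegistry` is a hypothetical class, and integers stand in for C pointer values. It also differs from the C table in one important way: a Ruby `Hash` holds strong references to its values, whereas the real registry does not keep objects alive.

```ruby
# Toy registry keyed by pointer address; mirrors register/unregister/lookup.
# Unlike the C st_table, a Ruby Hash holds strong references to its values.
class ToyRegistry
  def initialize
    @table = {}
  end

  def register(ptr, obj)
    @table[ptr] = obj
  end

  def unregister(ptr)
    @table.delete(ptr)
  end

  # Returns nil on a miss, where the C lookup returns Qnil.
  def lookup(ptr)
    @table[ptr]
  end
end

registry = ToyRegistry.new
doc = Object.new                 # stands in for a Ruby XML::Document
registry.register(0x2000, doc)
puts registry.lookup(0x2000).equal?(doc)
registry.unregister(0x2000)
puts registry.lookup(0x2000).inspect
```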

## What Gets Registered

Only objects that own their underlying C structure are registered:

| C pointer | Ruby wrapper | Registered when |
|-----------|--------------|-----------------|
| `xmlDocPtr` | `XML::Document` | Document is created or parsed |
| detached root `xmlNodePtr` | `XML::Node` | Node is created or detached via `remove!` |

Document-owned child nodes are **not** registered. They are lightweight, non-owning wrappers that get fresh Ruby objects each time they are accessed.

## How Mark Functions Use It

When Ruby's GC runs the mark phase, node and attr mark functions look up the owning document through the registry:

```mermaid
flowchart TD
    Registry["internal registry"]
    DocWrap["Ruby XML::Document"]
    XDoc["xmlDocPtr"]
    DetachedWrap["Detached Ruby XML::Node"]
    DetachedNode["detached root xmlNodePtr"]
    ChildWrap["Ruby XML::Node"]
    ChildNode["document-owned xmlNodePtr"]

    DocWrap -->|owns| XDoc
    XDoc -->|owns| ChildNode
    DetachedWrap -->|owns| DetachedNode

    ChildWrap -.references.-> ChildNode
    ChildWrap -.mark.-> DocWrap

    XDoc -.references.-> Registry
    DetachedNode -.references.-> Registry
    Registry -.references.-> DocWrap
    Registry -.references.-> DetachedWrap

    classDef ruby fill:#f4a0a0,stroke:#8b1f1b,stroke-width:2px;
    classDef xml fill:#e8f1ff,stroke:#5b84c4,stroke-width:2px;
    classDef registry fill:#f5ebcf,stroke:#b89632,stroke-width:2px;
    class DocWrap,DetachedWrap,ChildWrap ruby;
    class XDoc,DetachedNode,ChildNode xml;
    class Registry registry;
    linkStyle 3,5,6 stroke:#5b84c4,stroke-width:2px,stroke-dasharray: 6 4;
    linkStyle 4,7,8 stroke:#cc342d,stroke-width:2px,stroke-dasharray: 6 4;
```

For an attached node, the mark function reads `xnode->doc` (maintained by libxml2), looks up the document in the registry, and marks the Ruby document object. For a detached subtree, it walks to the root via parent pointers, looks up the root in the registry, and marks it.
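
The two lookup paths can be sketched in plain Ruby. Everything here is a hypothetical stand-in: `ToyXmlNode` models only the `parent` and `doc` fields of a libxml2 node, `owner_key` is an invented helper, and an identity-keyed `Hash` plays the role of the registry. The real mark functions are written in C.

```ruby
# Toy stand-in for an xmlNode: only the fields the lookup needs.
ToyXmlNode = Struct.new(:name, :parent, :doc)

# Returns the key a mark function would look up in the registry:
# the document pointer for an attached node, otherwise the topmost
# ancestor of a detached subtree.
def owner_key(node)
  return node.doc if node.doc
  node = node.parent while node.parent
  node
end

# Identity-keyed table: keys are compared like raw pointers.
registry = {}.compare_by_identity

doc_ptr = Object.new             # stands in for an xmlDocPtr
doc_wrapper = Object.new         # stands in for the Ruby XML::Document
registry[doc_ptr] = doc_wrapper

root = ToyXmlNode.new('loose_root', nil, nil)
mid  = ToyXmlNode.new('mid', root, nil)
leaf = ToyXmlNode.new('leaf', mid, nil)
root_wrapper = Object.new        # stands in for the detached root's Ruby XML::Node
registry[root] = root_wrapper

attached = ToyXmlNode.new('item', nil, doc_ptr)

puts registry[owner_key(attached)].equal?(doc_wrapper)
puts registry[owner_key(leaf)].equal?(root_wrapper)
```

In both cases the wrapper being marked is found through the registry rather than through a reference stored on the child wrapper itself.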

## Lifecycle

Registered pointers must be unregistered before the underlying C structure is freed:

- `rxml_document_free` unregisters the `xmlDocPtr` before calling `xmlFreeDoc`
- `rxml_node_free` unregisters the detached root before calling `xmlFreeNode`
- `rxml_node_unmanage` unregisters when a detached node is attached to a document (libxml takes ownership)

docs/getting_started.md

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
# Getting Started

## Requiring the Library

There are several ways to load libxml-ruby:

```ruby
# Recommended - keeps everything under the LibXML namespace
require 'libxml-ruby'
document = LibXML::XML::Document.new
```

```ruby
# Convenience - mixes LibXML into the global namespace
require 'xml'
document = XML::Document.new
```

```ruby
# In your own namespace
require 'libxml-ruby'

module MyApp
  class Processor
    # Include LibXML in the class so that XML resolves through its ancestors
    include LibXML

    def parse(file)
      XML::Document.file(file)
    end
  end
end
```

## Choosing a Parser

libxml-ruby provides four parsers, each suited to different use cases:

| Parser | Best For |
|--------|----------|
| `XML::Parser` | General-purpose DOM parsing. Loads the entire document into a tree. |
| `XML::Reader` | Large documents that don't fit in memory. Pull-based streaming API. |
| `XML::HTMLParser` | Parsing HTML documents (tolerates malformed markup). |
| `XML::SaxParser` | Event-driven parsing with callbacks. |

## Data Sources

All parsers support multiple data sources:

```ruby
# From a file
doc = XML::Parser.file('data.xml').parse

# From a string
doc = XML::Parser.string('<root/>').parse

# From an IO object
File.open('data.xml') do |io|
  doc = XML::Parser.io(io).parse
end
```

## A Complete Example

```ruby
require 'libxml-ruby'

# Parse
doc = LibXML::XML::Document.file('books.xml')

# Navigate
root = doc.root
puts root.name

# Find nodes with XPath
doc.find('//book[@year > 2000]').each do |book|
  title = book.find_first('title').content
  puts title
end

# Create new content
new_book = LibXML::XML::Node.new('book')
new_book['year'] = '2024'
new_book << LibXML::XML::Node.new('title', 'New Book')
root << new_book

# Save
doc.save('books_updated.xml', indent: true)
```

docs/index.md

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
# libxml-ruby

Ruby language bindings for the [GNOME Libxml2](http://xmlsoft.org/) XML toolkit. It is free software, released under the MIT License.

libxml-ruby stands out because of:

* **Speed** - Much faster than REXML
* **Features** - Full DOM, SAX, Reader, Writer, XPath, validation (DTD, RelaxNG, XML Schema) and more
* **Conformance** - Passes all 1800+ tests from the OASIS XML Tests Suite

## Quick Example

```ruby
require 'libxml-ruby'

# Parse a document
doc = LibXML::XML::Document.file('books.xml')

# Find nodes with XPath
doc.find('//book').each do |node|
  puts node['title']
end

# Validate against a schema
schema = LibXML::XML::Schema.new('books.xsd')
doc.validate_schema(schema)
```

## Requirements

libxml-ruby requires Ruby 3.2 or higher and depends on [libxml2](http://xmlsoft.org/).

## License

libxml-ruby is released under the [MIT License](https://github.com/xml4r/libxml-ruby/blob/master/LICENSE).
