SQL Domains: When to Use Them (And When Not To)

SQL Domains are powerful, but like any abstraction layer, they work better in some situations than others. Here’s a practical guide to the decision.

Use SQL Domains when:

The same semantic type appears in 3+ columns across your schema. Email addresses, phone numbers, status codes, country codes, monetary amounts — these are domain candidates.

The constraint logic is non-trivial. A REGEXP_LIKE email validation or a multi-value IN list is worth centralizing. A simple NUMBER column without constraints is not.

You want schema-level documentation to drive tooling. Domains are queryable — your data catalog, code generators, or validation tools can introspect them (see the dictionary query after this list).

You’re building a new schema. Retrofitting domains is harder than designing with them from the start.
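
Domain metadata lives in the data dictionary. A minimal sketch, assuming the 23ai dictionary views USER_DOMAINS and USER_DOMAIN_CONSTRAINTS (verify column names against your release):

-- Introspect domain definitions and their check conditions
SELECT * FROM user_domains;
SELECT * FROM user_domain_constraints WHERE domain_name = :domain_name;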

Be cautious when:

The constraint might need to vary per table. Domains enforce the same constraint everywhere they’re used. If one table needs CHECK (VALUE IN ('A','B')) and another needs CHECK (VALUE IN ('A','B','C')), that’s two domains, not one.

You have existing applications that don’t expect domain-level errors. Domain constraint violations raise the same ORA-02290 CHECK constraint error as inline constraints, but the constraint name includes the domain name — applications parsing constraint names may need updating.

You’re mid-migration on a legacy schema. Adding domains to existing columns requires an ALTER TABLE MODIFY that adds the domain. This can be done incrementally, but plan it carefully.
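
The incremental path looks roughly like this. A sketch, assuming a hypothetical d_email domain, an existing customers.email column whose data already satisfies the check, and the 23ai ALTER TABLE ... DOMAIN clauses (verify the exact syntax against your release):

-- Associate the domain with an existing column (existing rows must pass the check)
ALTER TABLE customers MODIFY email DOMAIN d_email;

-- Back out the association without touching data if something breaks
ALTER TABLE customers MODIFY email DROP DOMAIN;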

Creating domain variations correctly:

-- Don't force one domain to cover all cases
CREATE DOMAIN d_status_basic AS VARCHAR2(10)
    CONSTRAINT status_basic_chk CHECK (VALUE IN ('ACTIVE','INACTIVE'));

CREATE DOMAIN d_status_extended AS VARCHAR2(10)
    CONSTRAINT status_extended_chk CHECK (VALUE IN ('ACTIVE','INACTIVE','PENDING','ARCHIVED'));

Two focused domains are better than one overly permissive domain or one domain with workaround constraints.
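
Applying a domain is then a per-column choice at table creation. A sketch, with hypothetical orders and order_archive tables:

-- Each table picks the domain that matches its lifecycle
CREATE TABLE orders (
    order_id NUMBER,
    status   VARCHAR2(10) DOMAIN d_status_basic
);

CREATE TABLE order_archive (
    order_id NUMBER,
    status   VARCHAR2(10) DOMAIN d_status_extended
);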

GROUP BY ALL: Advanced Analytics Patterns

GROUP BY ALL simplifies the common case, but it also works with advanced grouping operations. Let’s look at patterns where it really pays off.

Complex analytical queries with many dimensions:

-- Before 23ai: manually listing 6 non-aggregated columns in GROUP BY
SELECT
    region, country, product_line, product_family, channel, quarter,
    SUM(revenue)       AS total_revenue,
    SUM(units_sold)    AS total_units,
    AVG(unit_price)    AS avg_price
FROM sales_facts
GROUP BY region, country, product_line, product_family, channel, quarter;

-- With GROUP BY ALL:
SELECT
    region, country, product_line, product_family, channel, quarter,
    SUM(revenue), SUM(units_sold), AVG(unit_price)
FROM sales_facts
GROUP BY ALL;

Adding a new dimension (say, sales_rep_id) to the analysis: before 23ai, you must add it to both SELECT and GROUP BY. With GROUP BY ALL, you only add it to SELECT.
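
Here is that change sketched against the query above, assuming sales_facts has a sales_rep_id column:

-- Only the SELECT list changes; GROUP BY ALL picks up the new dimension
SELECT
    region, country, product_line, product_family, channel, quarter,
    sales_rep_id,
    SUM(revenue), SUM(units_sold), AVG(unit_price)
FROM sales_facts
GROUP BY ALL;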

GROUP BY ROLLUP ALL — subtotals without listing columns:

SELECT region, country, SUM(revenue) AS total
FROM   sales_facts
GROUP BY ROLLUP ALL;
-- Equivalent to: GROUP BY ROLLUP(region, country)
-- Produces: region+country subtotals, region subtotals, grand total

GROUP BY CUBE ALL — cross-dimensional analysis:

SELECT channel, quarter, SUM(revenue)
FROM   sales_facts
GROUP BY CUBE ALL;
-- All combinations: channel+quarter, channel only, quarter only, grand total

Interaction with window functions:

GROUP BY ALL does not apply to window functions — they’re not aggregate expressions in the GROUP BY sense. Window functions work the same as always:

SELECT region, product_line, SUM(revenue) AS group_total,
       SUM(SUM(revenue)) OVER (PARTITION BY region) AS region_total
FROM   sales_facts
GROUP BY ALL;
-- GROUP BY ALL includes: region, product_line
-- Window function operates on the grouped results

GROUP BY ALL is at its best in analytical queries with 4+ dimensions where GROUP BY maintenance becomes error-prone.

Vector Search in Production: Patterns for AI-Native Applications

If you’re building a Retrieval-Augmented Generation (RAG) application on Oracle 23ai, here are the architectural patterns that work best in production — based on real implementations.

The basic RAG pipeline with Oracle:

-- 1. Store documents and embeddings
INSERT INTO knowledge_base (doc_id, content, embedding)
VALUES (:id, :text, TO_VECTOR(:embedding_array));

-- 2. At query time: find the most relevant documents
SELECT content, VECTOR_DISTANCE(embedding, :query_embed, COSINE) AS score
FROM   knowledge_base
WHERE  category = :filter_category    -- combine with relational filter
ORDER BY score
FETCH FIRST 5 ROWS ONLY;

Chunking strategy matters:

Large documents should be split into chunks (paragraphs, sentences, fixed token windows) before embedding. Each chunk becomes a separate row. The optimal chunk size depends on your embedding model — typically 256-512 tokens for OpenAI models.

-- Store chunks with parent document reference
CREATE TABLE knowledge_chunks (
    chunk_id   NUMBER GENERATED ALWAYS AS IDENTITY,
    doc_id     NUMBER REFERENCES knowledge_base(doc_id),
    chunk_seq  NUMBER,          -- position in document
    chunk_text CLOB,
    embedding  VECTOR(1536, FLOAT32),
    CONSTRAINT pk_chunk PRIMARY KEY (chunk_id)
);
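
Loading is then one INSERT per chunk. A minimal sketch, assuming chunking and embedding happen in application code and arrive as bind variables:

-- chunk_seq preserves document order so retrieved chunks can be re-assembled
INSERT INTO knowledge_chunks (doc_id, chunk_seq, chunk_text, embedding)
VALUES (:doc_id, :seq, :chunk_text, TO_VECTOR(:chunk_embedding));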

Hybrid search — vector + keyword:

Pure semantic search misses exact matches. Combine vector similarity with Oracle Text full-text search:

SELECT doc_id, chunk_text, semantic_score, keyword_score
FROM  (SELECT k.doc_id, k.chunk_text,
              VECTOR_DISTANCE(k.embedding, :q_vec, COSINE) AS semantic_score,
              SCORE(1) AS keyword_score
       FROM   knowledge_chunks k
       WHERE  CONTAINS(k.chunk_text, :keyword, 1) > 0)   -- keyword match
-- Inline view needed: column aliases can't be used inside ORDER BY expressions.
-- Lower distance is better, higher keyword score is better, so subtract.
ORDER BY 0.6 * semantic_score - 0.4 * (keyword_score / 100)
FETCH FIRST 10 ROWS ONLY;

Metadata filtering before vector search:

Always apply relational filters before the vector index scan to reduce the candidate set:

SELECT content, VECTOR_DISTANCE(embedding, :q_vec, COSINE) AS score
FROM   knowledge_chunks
WHERE  doc_category = 'LEGAL'           -- narrow the search space first
AND    created_date > DATE '2024-01-01'
ORDER BY score
FETCH FIRST 5 ROWS ONLY;

Oracle’s vector index (HNSW) performs much better when the candidate pool is pre-filtered to relevant documents.
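
A sketch of creating such an index on the knowledge_chunks table above; the accuracy target and the resulting memory footprint are workload-dependent, so treat the numbers as placeholders:

-- In-memory neighbor graph (HNSW) index over the chunk embeddings
CREATE VECTOR INDEX chunk_vec_idx ON knowledge_chunks (embedding)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
DISTANCE COSINE
WITH TARGET ACCURACY 95;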

Oracle 23ai in Production: What We’ve Learned After a Year

Oracle Database 23ai has been generally available since May 2024. After roughly 18 months of real-world deployments, a picture is emerging: where the new features deliver and where organizations run into friction.

What’s working well:

IF [NOT] EXISTS for DDL has been universally praised. It’s the kind of feature that pays dividends every single sprint. Migration scripts are cleaner, deployments are more predictable, and the volume of “deployment script failed on object already exists” incidents has dropped for teams that adopted 23ai.
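
A typical idempotent migration step (object names hypothetical):

-- Re-runnable DDL: succeeds whether or not the objects already exist
CREATE TABLE IF NOT EXISTS audit_log (
    log_id    NUMBER GENERATED ALWAYS AS IDENTITY,
    logged_at TIMESTAMP DEFAULT SYSTIMESTAMP
);

DROP INDEX IF EXISTS idx_audit_log_old;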

SQL Domains are seeing strong adoption in organizations with mature data governance programs. Teams building new schemas from scratch in 23ai are embracing domains as a standard practice. Teams migrating existing schemas are more cautious — and rightly so. Retrofitting domains onto an existing schema requires careful planning.

The VECTOR data type and AI Vector Search have attracted significant interest from teams building RAG applications. The ability to store embeddings alongside relational data and combine vector similarity with SQL predicates is a genuine architectural advantage over separate vector stores.

Where organizations are moving cautiously:

JSON Relational Duality Views require careful design. Teams that tried to map complex, highly normalized schemas (10+ tables per “document”) found the duality view definition difficult to maintain. The sweet spot is 2-5 tables per document.

True Cache is still new enough that most organizations haven’t deployed it in production. Documentation and operational tooling (monitoring, sizing guidance) are still maturing.

Upgrade advice for 2026:

If you’re on 19c, evaluate 23ai for development environments now. Most organizations targeting 23ai for production are planning 2026-2027 timelines. That’s the right approach — test, validate, and migrate deliberately.