Overview
Academic paper presenting a Boolean encrypted search system using approximate sets for secure indexes. Each index supports membership tests for trapdoors derived from plaintext search keys, with explicit information retrieval performance measures based on false positive rates.
Cryptographic Model
Trapdoor Function
Converts plaintext search keys to encrypted form:
y ← h(x ++ k)
where:
h
is a cryptographic hash function (approximates random oracle)x
is the plaintext search keyk
is the secret key++
denotes concatenation- Collisions between search keys assumed negligible
Core Algorithms
1. Hidden Query Generation (Algorithm 1)
Simple substitution cipher mapping plaintext queries to trapdoor sets:
- Input: Plaintext query (set of keywords)
- Output: Hidden query (set of trapdoors)
- Each keyword mapped to trapdoor via
h(keyword ++ k)
2. Contains Function (Algorithm 2)
Tests if all trapdoors in a query are present in a secure index:
- Input: Hidden query, secure index
- Output: Boolean (approximate membership test)
- Returns true if all trapdoors found (may have false positives)
3. Secure Index Construction (Algorithm 3)
Creates approximate set from document bag-of-words:
- Input: Document keywords, secret key
- Output: Secure index (approximate set of trapdoors)
- Supports configurable false positive rate
4-5. Biword Phrase Search
Extensions for phrase queries using bigram trapdoors:
- Indexes adjacent word pairs (bigrams)
- Enables phrase search: “distributed systems” → trapdoor(“distributed systems”)
- Higher precision than single-keyword queries
Extensions
k-graphs
Higher-order extensions storing combinations of up to k search keys:
Digraphs (k=2):
- Space complexity:
n(n+1)/2 log₂ε
bits - Stores all 2-combinations of keywords
- Enables more precise phrase matching
Trigraphs (k=3) and beyond:
- Trade storage for fewer query round trips
- Exponential space growth: O(n^k)
- Practical for small k (2-4) and moderate vocabularies
Information Retrieval Measures
Performance characterized by:
- False Positive Rate (FPR): Probability secure index incorrectly matches irrelevant document
- True Positive Rate (TPR): Probability of matching relevant document
- Precision: Fraction of returned documents that are relevant
- Recall: Fraction of relevant documents returned
All measures derived from underlying approximate set error rates.
Security Considerations
Attack Vectors
Frequency Analysis: Observing query patterns may reveal common keywords
Index Poisoning: Adversary may inject trapdoors to inflate false positives
Substitution Cipher Pattern: Known plaintext attacks if query patterns revealed
Defenses
- Query padding and dummy trapdoors
- Minimum false positive rate requirements
- Query mixing and batching
- Periodic re-keying and index rotation
Space Complexity
Basic secure index: O(n log ε) bits for n keywords with FPR ε
Biword index: O(n² log ε) bits for phrase search
k-graph index: O(n^k log ε) bits for k-keyword combinations
Applications
- Cloud document search with untrusted server
- Encrypted email search
- Privacy-preserving web search
- Secure database queries
- Federated search across encrypted repositories
Resources
- PDF: Boolean Encrypted Search
- Code: GitHub Repository
Related Work
- Encrypted Search Thesis - Theoretical foundations
- Cipher Trapdoor Sets - Implementation framework
- Probabilistic Confidentiality Estimator - Privacy measurement
- Bernoulli Types - Approximate set theory