Decades ago, I was privileged to attend a production of Luigi Pirandello's play, Six Characters in Search of an Author. The play opens with the characters on a bare stage, bewildered, explaining to those at the edges of the stage acting as director and stagehands that the playwright hasn't finished writing them, and that they cannot proceed until an author is found to complete their story. I had no idea that, all those years later, I would be that author (developer), with an Ethereum identity application to complete.
The Identity Challenge: Characters Without Their Script
"Reality and illusion are not things you can have for yourself unless you accord them to all others," Pirandello might have written had he been contemplating not the existential crisis of his six characters, but the modern quest for digital identity validation. What is now a conquerable challenge of distributed trust was once a foreign and frightening notion to many. As one of a myriad of technology architects who must lever this monumental shift in database consciousness, my quest has taken me to constructing a modern, Ethereum blockchain-based identity system, where each identity exists much like Pirandello's unfinished characters—in search of their complete stories, one where a comprehensive database can fully articulate those stories at each phase of life. I have been developing a proof-of-concept using Pinecone, a specialized vector database whose approach to high-dimensional searches has rendered it not a mere translator of the stored vector data but an alchemist transmuting ordinary searches into the gold of pattern recognition. Yet even with that AWESOME power, there are limitations.
After attending ScyllaDB's Monster Scale Summit 2025, I will turn to a potentially more comprehensive author for my Ethereum identity platform, which connects public records such as birth, marriage, and death certificates; state-sponsored records such as passports and security clearances; and opt-in records such as academic transcripts, professional designations, and research identifiers (ORCID) across borders. This review explores why ScyllaDB might offer the missing scenes in our technological play, particularly when blockchain integration forms the stage upon which our characters must perform.
Understanding the Contenders: The Actors in This Technological Drama
Both databases excel in specific roles, but their fundamental character philosophies suggest significant differences in their performative capacities:

The Promised Performance: Beyond the Script and Into Reality
As in Pirandello's metatheatrical masterpiece, tension exists between what is promised and what materializes on stage. Based on my experience with Pinecone and today’s research into ScyllaDB, the performance differences could transform the end production in subtle and profound ways.
Raw Processing Power: The Vital Energy of Performance
From my review of benchmarks and documentation, Pinecone caps out at approximately 10,000 operations per second per pod, while ScyllaDB claims to deliver over 1 million operations per second per node. For my identity system's narrative:
Ethereum produces one block every 12 seconds, each counted as a fragment of dialogue in this distributed narrative.
During peak periods, my identity verification requests surge unpredictably, like audience members suddenly demanding participation.
ScyllaDB reportedly processes 142,000 Ethereum transactions per second compared to Pinecone's 8,200 with equivalent hardware—the difference between a whisper and a resonant soliloquy.
In my experience with Pinecone, we've encountered moments when the character seems to forget its lines (latency). If ScyllaDB's claims hold, this could represent the difference between a fluid performance and one punctuated by awkward silences.
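To put those figures in perspective, here is a rough sizing sketch in Python. The 12-second block interval and the per-unit throughput ceilings come from the numbers above; the peak verification load is purely an illustrative assumption, not a measurement.

# Back-of-envelope sizing: how many pods or nodes would a verification surge demand?
# All inputs are either the figures quoted above or illustrative assumptions.
ETH_BLOCK_INTERVAL_S = 12                 # one Ethereum block roughly every 12 seconds
PEAK_VERIFICATIONS_PER_SECOND = 50_000    # assumed surge of identity checks (hypothetical)

PINECONE_OPS_PER_POD = 10_000             # approximate per-pod ceiling noted above
SCYLLA_OPS_PER_NODE = 1_000_000           # per-node ceiling that ScyllaDB claims

def units_needed(required_ops: int, ops_per_unit: int) -> int:
    """Smallest number of pods/nodes covering the required throughput (ceiling division)."""
    return -(-required_ops // ops_per_unit)

requests_per_block = PEAK_VERIFICATIONS_PER_SECOND * ETH_BLOCK_INTERVAL_S
print(f"Verification requests arriving per block at peak: {requests_per_block:,}")
print("Pinecone pods needed: ", units_needed(PEAK_VERIFICATIONS_PER_SECOND, PINECONE_OPS_PER_POD))
print("ScyllaDB nodes needed:", units_needed(PEAK_VERIFICATIONS_PER_SECOND, SCYLLA_OPS_PER_NODE))

Under those assumptions the surge fits on a single ScyllaDB node but needs five Pinecone pods, which is the gap the rest of this section keeps circling back to.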
Theoretical Scaling: The Expansion of the Stage
When considering how the identity platform will grow, the databases appear to offer different approaches to scale:
My Experience with Pinecone:
Adds capacity through "pods" (compute units), like adding individual actors who must each learn their role anew.
Temporarily suspends write operations during expansion—the equivalent of stopping the show to change scenery.
Requires manual intervention across cloud regions.
ScyllaDB's Documented Approach:
Seamlessly adds nodes without service interruption, the theatrical equivalent of expanding the stage while the performance continues.
Claims near-linear throughput growth (a quick arithmetic check follows this list):
1 node: 150,000 operations/second
3 nodes: 440,000 operations/second (≈2.9× a single node)
6 nodes: 870,000 operations/second (≈5.8× a single node)
Automatically synchronizes across global data centers—ensuring the same play appears identical whether viewed in New York or Tokyo.
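Those claimed figures are easy to sanity-check: dividing cluster throughput by node count shows how much per-node capacity survives as the stage expands. The arithmetic below uses only the vendor numbers quoted above, not my own measurements.

# Scaling efficiency = cluster throughput / (node count x single-node throughput).
# Figures are ScyllaDB's claims as quoted above, not independent benchmarks.
claimed_ops = {1: 150_000, 3: 440_000, 6: 870_000}
single_node = claimed_ops[1]

for nodes, ops in claimed_ops.items():
    efficiency = ops / (nodes * single_node)
    print(f"{nodes} node(s): {ops:>7,} ops/s -> {efficiency:.0%} of perfectly linear scaling")
# Roughly 98% at 3 nodes and 97% at 6 nodes, i.e., genuinely near-linear if the claims hold.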
As the identity characters seek their author across an increasingly global stage, ScyllaDB's approach could mean audiences never experience the "please stand by" message that has occasionally marred the Pinecone production (proof of concept).
Real-Time Responsiveness: When Milliseconds Form the Heartbeat of Drama
For critical identity verification scenarios—like border crossings or emergency services access—response time isn't merely technical but existential:
Pinecone (our measured performance): 50-100 ms for nearest-neighbor searches, a noticeable pause in the dialogue.
ScyllaDB (claimed performance): Sub-millisecond reads and writes for most operations, the fluid delivery of an accomplished actor.
While Pinecone optimizes for vector similarity searches (finding which character most resembles another), ScyllaDB excels at the full spectrum of operations needed in an identity narrative, from individual monologues to complex ensemble scenes.
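To make those numbers concrete, here is a small, illustrative latency budget for a single border-crossing verification. Only the 50-100 ms Pinecone figure and the sub-millisecond ScyllaDB claim come from the comparison above; the 200 ms end-to-end budget, the assumed on-chain lookup cost, and the number of reads per check are hypothetical.

# Illustrative latency budget for one identity verification at a border checkpoint.
# Only the two database figures come from the comparison above; the rest are assumptions.
TOTAL_BUDGET_MS = 200        # assumed end-to-end budget for the whole check
CHAIN_LOOKUP_MS = 40         # assumed cost of confirming the Ethereum anchor
RECORD_READS = 3             # assumed identity-record reads per verification

PINECONE_READ_MS = 75        # mid-range of the 50-100 ms we have measured
SCYLLA_READ_MS = 1           # sub-millisecond claim, rounded up

for label, read_ms in (("Pinecone", PINECONE_READ_MS), ("ScyllaDB", SCYLLA_READ_MS)):
    used = CHAIN_LOOKUP_MS + RECORD_READS * read_ms
    print(f"{label}: {used} ms used, {TOTAL_BUDGET_MS - used} ms left for everything else")
# Under these assumptions the Pinecone path already overruns the budget (265 ms),
# while the ScyllaDB path uses 43 ms and leaves generous headroom.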
The Data Flexibility Challenge: Characters Seeking Their Full Dimension
The identity system I am working on involves diverse data types that, like Pirandello's characters, refuse to be constrained by conventional forms.
Pinecone's Limitations (from my proof-of-concept implementation):
40KB metadata limit per vector—as if forcing a complex character to express themselves in a fixed number of words.
Cannot store complete identity documents, leaving parts of our characters' backstories untold.
Forced us to split data across multiple systems—creating a fragmented narrative that audiences struggle to follow (a sketch of that split appears below).
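The sketch below shows the kind of split that 40KB ceiling forces, assuming a separate document store: only the vector, a content hash, and a pointer fit inside Pinecone's metadata, while the full identity document has to live elsewhere. The store_document callable and the field names are illustrative placeholders, not the actual PoC code.

import hashlib
import json

MAX_METADATA_BYTES = 40 * 1024  # Pinecone's per-vector metadata ceiling

def upsert_identity(index, eth_address, orcid_vector, identity_document, store_document):
    """Keep the Pinecone metadata tiny; push the full document to a separate store."""
    raw = json.dumps(identity_document).encode("utf-8")
    doc_hash = hashlib.sha256(raw).hexdigest()
    location = store_document(doc_hash, raw)  # placeholder for S3, IPFS, or similar
    metadata = {"eth_address": eth_address, "doc_hash": doc_hash, "doc_location": location}
    assert len(json.dumps(metadata).encode("utf-8")) < MAX_METADATA_BYTES
    index.upsert(vectors=[{"id": eth_address, "values": orcid_vector, "metadata": metadata}])

The character's backstory and its vector end up on different stages, which is exactly the fragmented narrative described above.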
ScyllaDB's Potential Advantage:
Supports both structured records and vector storage in a single platform—allowing each character their full complexity.
Could accommodate the full spectrum of identity data:
CREATE TABLE identity_data (
    eth_address   TEXT PRIMARY KEY,   -- Ethereum address anchoring the identity
    passport_hash BLOB,               -- hash of the state-sponsored document
    orcid_vector  LIST<FLOAT>,        -- embedding used for researcher matching
    last_updated  TIMESTAMP
);
This flexibility could unify an identity’s fragmented script, allowing each identity to tell its complete story without artificial constraints or disjointed narratives.
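As a small illustration of that unified script, the sketch below writes one identity's records into the table above through the standard Cassandra/CQL driver, which ScyllaDB speaks natively. The host name, keyspace, and sample values are placeholders rather than details of a real deployment.

from datetime import datetime, timezone
from cassandra.cluster import Cluster  # ScyllaDB is wire-compatible with CQL drivers

# Placeholder connection details; adjust for a real cluster.
session = Cluster(["scylla.example.internal"]).connect("identity_keyspace")

insert = session.prepare(
    "INSERT INTO identity_data (eth_address, passport_hash, orcid_vector, last_updated) "
    "VALUES (?, ?, ?, ?)"
)

# One character's full record in a single row: on-chain anchor, document hash, and embedding.
session.execute(insert, (
    "0x0000000000000000000000000000000000000000",  # placeholder Ethereum address
    bytes.fromhex("ab" * 32),                      # placeholder passport document hash
    [0.12, -0.34, 0.56],                           # truncated, illustrative ORCID embedding
    datetime.now(timezone.utc),
))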
Cost Considerations: The Economics of Production
The financial implications of the database choice extend beyond license fees—they shape the entire economic structure of the production:

These differences could translate into hundreds of thousands of dollars in annual savings when managing millions of identities at scale, while delivering a more compelling performance.
The Best of Both Worlds: A Hybrid Production
Rather than an all-or-nothing decision, I will also explore a strategic hybrid approach that echoes Pirandello's blending of reality and illusion:
Core Narrative & Production: ScyllaDB would handle the primary identity records, blockchain pointers, and transactional operations—the structural backbone of the theatrical (wide scale) experience.
Specialized Character Development: Pinecone would provide targeted support for ORCID similarity searches and other vector-specific operations—those moments when an identity must find their most similar counterparts.
This approach could be implemented through a coordinated architecture:
# Assumes scylla_session (a cassandra-driver Session pointed at ScyllaDB) and
# pinecone_index (a Pinecone Index) have already been initialized, and that
# eth_address holds the identity being verified; names follow the schema above.

# ScyllaDB retrieves the structured identity data
identity = scylla_session.execute(
    "SELECT * FROM identity_data WHERE eth_address = %s",
    [eth_address],
).one()

# Pinecone handles the specialized vector matching
orcid_matches = pinecone_index.query(
    vector=identity.orcid_vector,
    top_k=10,
)
Planned Proof of Concept: Rehearsals Before Opening Night
To validate these theoretical advantages—to determine if ScyllaDB is genuinely the author our characters seek—I’m planning a comprehensive proof of concept that will:
Benchmark real-world performance of ScyllaDB against our existing Pinecone implementation (a minimal latency-sampling sketch appears after the lists below)
Test scaling capabilities under simulated audience surges
Measure blockchain integration efficiency with Ethereum transaction processing
Evaluate development complexity for our specific identity narratives
I may seek cloud credits (hours) from ScyllaDB to enable thorough testing without significant upfront investment for this proof of concept. This would allow me to evaluate:
Performance with real identity data sets
Scaling behavior under various load conditions
Development experience in our blockchain context
Total cost of ownership projections
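As a starting point for that benchmarking work, here is a minimal latency-sampling sketch. It assumes an already-initialized ScyllaDB session and Pinecone index plus a list of test Ethereum addresses; timing individual reads across a set of addresses is only a first approximation of a full benchmark methodology.

import statistics
import time

def sample_latencies(run_query, test_addresses):
    """Time one read per address and return (median, approximate p99) in milliseconds."""
    samples = []
    for address in test_addresses:
        start = time.perf_counter()
        run_query(address)
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return statistics.median(samples), samples[int(0.99 * (len(samples) - 1))]

# Illustrative usage, assuming scylla_session, pinecone_index, and test_addresses exist:
# scylla_p50, scylla_p99 = sample_latencies(
#     lambda addr: scylla_session.execute(
#         "SELECT * FROM identity_data WHERE eth_address = %s", [addr]).one(),
#     test_addresses)
# pinecone_p50, pinecone_p99 = sample_latencies(
#     lambda addr: pinecone_index.query(id=addr, top_k=10),
#     test_addresses)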
Preliminary Conclusion: Characters Finding Their Author
While I have successfully staged parts of my application with Pinecone and value its vector search virtuosity, ScyllaDB's comprehensive capabilities appear to make it a potentially superior author for our blockchain-based identity narrative:
Superior Scalability: Could grow our production seamlessly without interrupting the performance
Blockchain Compatibility: Might handle our constant stream of transactions without dropped lines
Diverse Data Support: Would accommodate all aspects of identity information in a unified script
Global Distribution: Could maintain consistent performances across international stages
Cost Efficiency: May deliver more theatrical impact at lower production cost
The initial research suggests a clear direction for this identity validation project: Build the core narrative on ScyllaDB while strategically incorporating Pinecone for specialized character development.
This hybrid approach could deliver the best of both worlds—ScyllaDB's industrial-strength storytelling and structural integrity, enhanced by Pinecone's specialized talent for finding similarities among our cast of billions.
Watch This Space: The Ongoing Search for Our Author
Like Pirandello's characters, my technological journey exists in a state of becoming, waiting for the perfect author (tool) to complete the identity capture story. I’ll be documenting my entire proof of concept journey, complete with benchmarks, challenges, and implementation details on the Bauhaus Technology blog: www.bauhausgroup.net/blog
Follow along as I put these theoretical advantages to the test and determine whether ScyllaDB truly delivers on its promises—whether it can be the record keeper of the world’s six billion characters over time.
References
Zilliz. (2024). Horizontal Scaling in Vector Databases. https://zilliz.com/ai-faq/what-does-it-mean-for-a-vector-database-to-scale-horizontally
57Blocks. (2024). Vector Database Comparison. https://57blocks.io/blog/pinecone-vs-lancedb
Estuary. (2024). Pinecone Technical Guide. https://estuary.dev/blog/what-is-pinecone-ai/
Y Combinator. (2022). Pinecone Scaling Discussion. https://news.ycombinator.com/item?id=32487856
ScyllaDB. (2024). ScyllaDB versus Other Databases. https://www.scylladb.com/2024/10/15/scylladb-versus-other-databases/
Simplyblock. (2025). ScyllaDB Optimization. https://www.simplyblock.io/glossary/what-is-scylladb/
Airbyte. (2025). Pinecone Vector Database: A Complete Guide. https://airbyte.com/data-engineering-resources/pinecone-vector-database
Blocks & Files. (2024). Pinecone Integrates AI Inferencing. https://blocksandfiles.com/2024/12/02/pinecone-integrates-ai-inferencing-with-its-vector-database/
Pirandello, L. (1921). Six Characters in Search of an Author.