Monday, June 2, 2008

CSQL Cache - Improve database performance 100 times

Accelerate database performance using In-Memory CSQL Cache

Abstract

CSQL Cache is an open-source high performance, bi-directional updateable data-caching infrastructure that sits between the clustered application process and back-end data sources to provide unprecedented high throughput to your application.

Improving Database Performance Using Database Cache

Many applications today are being developed and deployed on multi-tier environments that involve browser-based clients, web application servers and backend databases. These applications need to generate web pages on-demand by talking to backend databases because of their dynamic nature, making middle-tier database caching an effective approach to achieve high scalability and performance. Following are the advantages of database caching

Scalability: distribute query workload from backend to multiple cheap front-end systems.
Flexibility: achieve QoS, where each cache hosts different parts of the backend data, e.g., the data of Platinum customers are cached while that of ordinary customers is not.
Availability: by continued service for applications that depend only on cached tables even if the backend server is unavailable.
Performance: by potentially responding fast because of locality of data and smoothing out load peaks by avoiding round-trips between middle-tier and data-tier

In order to overcome the throughput barrier, application scales through deployment of multiple small systems. Companies have developed various homegrown solutions that involve database caching. These caching solutions can help accelerate database performance to some extent, but they are fairly ineffective as most of them support only result set caching and some are poor at dealing with the scalability. Some of these caching solutions use another heavy weight full-fledged database management system to cache the data, which yields less performance gain. These caching solutions are mostly read only or else provide tools for doing manual lazy updates. For frequently changing data, it will be holding "dirty" cached data, resulting in long latency periods that may be entirely unacceptable for applications requiring immediate access to current data.

Requirements of Caching Solution

Updateable Cache Tables
Most of the existing cache solutions are read only which limits their usage to small segment of the applications, non-real time applications.

Bi-Directional Updates
For updateable caches, updates, which happen in cache, should be propagated to the target database and any updates that happen directly on the target database should come to cache automatically.

Synchronous and Asynchronous update propagation
The updates on cache table shall be propagated to target database in two modes. Synchronous mode makes sure that after the database operation completes the updates are applied at the target database as well. In case of Asynchronous mode the updates are delayed to the target database.
Synchronous mode gives high cache consistency and is suited for real time applications. Asynchronous mode gives high throughput and is suited for near real time applications.

Multiple cache granularity: Database level, Table level and Result-set caching
Major portions of corporate databases are historical and infrequently accessed. But, there is some information that should be instantly accessible like premium customer’s data, etc

Recovery for cached tables
Incase of system or power failure, during the restart of caching platform all the committed transactions on the cached tables should be recovered.

Tools to validate the coherence of cache
In case of asynchronous mode of update propagation, cache at different cache nodes and target database may diverge. This needs to be resolved manually and the caching solution should provide tools to identify the mismatches and take corrective measures if required.

Horizontally Scalable
Clustering is employed in many solutions to increase the availability and to achieve load balancing. Caching platform should work in a clustered environment spanning to multiple nodes thereby keeping the cached data coherent across nodes.

Transparent access to non-cached tables reside in target database
Database Cache should keep track of queries and should be able to intelligently route to the database cache or to the origin database based on the data locality without any application code modification.

Transparent Fail over
There should not be any service outages, incase of caching platform failure. Client connections should be routed to the target database.

No or very minimal changes to application for the caching solution
Support for standard interfaces JDBC, ODBC etc that will make the application to work seamlessly without any application code changes. It should route all stored procedure calls to target database so that they don’t need to be migrated.


CSQL Cache

CSQL Cache is a high performance, bi-directional updateable data-caching infrastructure that sits between the clustered application process and back-end data sources to provide unprecedented high throughput to your application by offloading the computing cycles from expensive backend systems along with reduction in costly network calls, thereby enabling real time application to provide faster and predictive response time.

CSQL Cache uses the fastest Main Memory Database (CSQL MMDB) designed for high performance and high volume data computing for caching the table and provides most flexible and cost-effective way to cache and manage enterprise information without compromising on transactional and indexed access to the data. This main memory database is 10-20 times faster than traditional disk based database system as the database completely resides in main memory and developed to be used on real time high computing data platforms.

CSQL implements most of the requirements of good caching solution mentioned in the previous section except few in the latest version. For complete set of features supported by CSQL cache in the latest version, refer the data sheet on the product web site. http://www.csqlcache.com

Open source web site:

http://sourceforge.net/projects/csql

Monday, May 19, 2008

CSQL 2.0 Released

CSQL 2.0 , Transaction Data Cache Released.

This release contains
  • Operation synchronous table cache,
  • Gateway(access to non-cached tables in target database),
  • Primitive ODBC Driver
  • Improved Test Coverage
  • csqldump tool(to dump all database tables and records into file)
  • cachetable tool(to load, unload, reload tables from target database)


CSQL - Transaction Data Cache for any DB

Lead / Introduction
--------------------
CSQL is compact main memory open source SQL database engine that delivers ultra fast performance. It can handle 100K selects/sec and 50K updates/sec.
It can also work as a middle tier caching solution for any open source and commercial database thereby increasing the throughput of the application by 20 to 100 times without making any code changes.

CSQL - Main memory Database
-------------------------------

Main memory databases are times faster than disk based database systems, as all the data is available in physical memory. It also avoids the buffer manager overhead which is found in disk based database systems. Moreover data access algorithms can work efficiently as compared to traditional disk based algorithms. One of the major factor which determines the performance of database management is disk I/O. Access time for main memory is orders of magnitude less than that of disk ( 100ns vs 10ms).

Real time applications perform random access(point lookups => f1 = ?) 90% of the time, for which records from different disk blocks needs to be read. This hampers the performance of the application. Main memory cost has reduced significantly in the recent years and even desktops now have 1 GB RAM. Database managment system which depends only on memory and does not any disk I/O will be many times faster than disk based database sytems. This led to design a main memory database engine which does not involve any disk I/O.

Focus
-----
CSQL compact main memory open source SQL database engine, uses shared memory architecture, wherein the database will be available to the application process in its own process address space and can be accessed by simple pointer semantics. OS mutex mechanisms are costly as it requires context switch, csql implements its own mutex mechanism through atomic assembly instructions to reduce the machine cycles. CSQL is designed for one single purpose
'performance'. It is not a feature rich database product. It does only limited things, but with ultra fast high performance.

SQL and Standard interface support
-----------------------------------
CSQL supports all DML statements which involve single table and DDL statements related to table and index.
Standard JDBC and ODBC drivers are also available to access the SQL kernel. It also provides proprietary C++ SQL interface to access the SQL Kernel and C++ interface to access storage kernel. Apart from primitive data types, it supports Date, Time and TimeStamp data types.

Storage Engine Capabilities
----------------------------
CSQL storage engine supports all the ACID properties.
Atomicity is implemented by using physical and logical undo logs.
It supports Unique, Not null and primary key, foreign key contraints.
It supports the first three isolation levels, READ_UNCOMMITTED, READ_COMMITTED, READ_REPEATABLE.
Disk performance is stagnated and it is around 7.2K rpm for IDE and 15K for SCSI disks. But network speed is increasing at good pace. 1 Gbps Ethernet is reality now. Because of these reasons, CSQL decides to rely on network so that it performance improves with increase in network speed rather than getting stagnated as in case with disk based database systems. Because of the above said reason, durability property of transaction is supported in csql using synchronized replication.
Hash Index and Tree indexing mechanisms make sure that equality predicates and range predicates are executed faster. Index structures are highly concurrent as only short duration locks are only taken on them.
Multi granular locking aids to improve the concurency. Row level locking gives the highest level of concurrency.
Custom built fixed size and variable size allocators are highly concurrent and uses latch free algorithms which aids in increasing the performance and gives high level of concurrency.

Transaction Data Cache
--------------------------
CSQL can act as updateable real time caching for relational table of any database system (commercial and open source database systems including ORACLE and Mysql). This cache resides in application tier (same host where application resides) and reduces the computation and network overhead involved in executing SQL statements. It shall be used by both read intensive applications as well as update intensive applications, thereby increasing the throughput by 20 to 100 times.

Caching is done at the table level granularity and application shall either specify tables to be cached in a configuration file or it can use CLI tool or API's to cache the table dynamically.It loads the full table from the target database into CSQL database. After loading, any select query will be handled by the csql itself. Non -select DML statements are executed at both csql and target database, so that the cache and target database are in sync. It supports three modes to execute DML statements at target database

Applications shall also access tables which are not there in cache, using the same driver provided by CSQL. These statements on non cached tables are handled by the 'Gateway' component in the caching subsystem, whose responsibility is to check, where the table is present and route it either csql or target database based on the SQL query. If application executes complex queries which are not supported in CSQL Engine, then they are also routed to target database for execution. This will allow the application to adapt the caching solution without any application code changes.

a)DSYNC - statement execute will return after the operation is succesfully executed at csql as well as target database. This gives high consistency between cached data and target database. If csql server goes down, during recovery it automatically loads the tables from the target database.


b)ASYNC - statement execute will return after the operation is successfully executed at csql and update logs are generated. These update logs and sent to target database host by another process called "Propagator". CSQLExecutor process will run on the host where target database is running whose responsibility is to receive the update logs from the propagator and execute them on target database.

Applications can choose appropriate mode based on durability and consistency requirements. For applications, where updates shall happen directly to the target database, it provides CLI and API to reload the table fully or incremental. Incremental loading will work best if only insert and delete operations needs to be loaded from target database. Direct updates to target database shall be fetched to cache by using full table reloading. Application also can configure to have automatic loading at regular intervals, either fully or incremental.

Incase of foreign key constraints, applications shall either use DYNC mode, in which constaint violations are detected during target database execution. Other alternative is to load all the tables which are involved in the relationship to cache and use either NSYNC or ASYNC options based on the requirement.

Product Page
-------------
http://sourceforge.net/projects/csql