
Create Cool Videos Without Pricey Software

WANT TO MAKE a custom video, the kind with photos, music, and video clips? Normally, it's a time-consuming or expensive hassle. You could use Microsoft's free Windows Live Movie Maker, but it's pretty limited (and kind of a pain in the neck, in my humble opinion). Or you could spend about $100 on an editing app like Adobe Premiere Elements or Pinnacle Studio. But those are big and complex, and like Movie Maker, they have to be installed. Surely there must be some kind of cloud-based alternative?

There is, and it's called Animoto (animoto.com). This service (which we selected for last month's roundup of incredibly useful sites; see find.pcworld.com/70057) makes movie creation quick and easy, and the results look like something that took days or weeks to produce in a commercial editing program.

First, you upload photos and videos. If your media is already online somewhere, no problem: Animoto can pull from Facebook, Flickr, Picasa, and other sites. Second, you choose music. You can upload a favorite MP3 or choose a track from Animoto's extensive (but mostly indie) library. Finally, you select a pace: normal, half speed, or 2X speed.

With that done, Animoto assembles everything into a slick video, with titles, transitions, and special effects. Don't like the finished product? You can make changes manually or just let Animoto take another whack at it; the site will generate different results every time. When you're satisfied, you can share the video via Facebook, Twitter, or e-mail, or download it for your own use.

I like Animoto's pricing options. You can test-drive the service for free, but that limits you to a 30-second movie. You can buy a full-length (10-minute) flick for just $3. If you plan to use Animoto a lot, $30 pays for a one-year membership (and all the videos you can make).

Source of Information : PC World July 2010

WinZip Pro 14.5

WinZip Pro 14.5 adds lots of useful features to the program's roster of tools, most notably a Microsoft Office-style ribbon that puts all of the program's options within easy reach so you can start tasks with a simple click. Ribbon haters can opt for the earlier WinZip interface.

Among the software's new features are a Zip-file previewer (for looking inside Zip files when you use Windows Explorer or Microsoft Outlook) and the ability to back up files to phones and digital cameras in .zip format. Goodies specific to Windows 7 include integration with the OS's libraries and jump lists to perform many Zip-related tasks, such as opening a Zip archive and creating a new Zip archive.

WinZip Pro 14.5 offers improved security for encrypted files. It will automatically destroy the temporary copies of encrypted files that it creates for viewing, and it works with Intel-based hardware that has built-in AES encryption. In addition, it can easily zip and mail files, and extract files from .iso images. Also new to this version of WinZip Pro is support for the .zipx compression standard, which compresses files more efficiently than did the previous standard (.zip).

If using Zip archives isn't part of your routine, there's no compelling reason to buy WinZip Pro 14.5; Windows does simple jobs perfectly well. But if you want a ribbon interface, higher compression ratios, better integration with Windows 7, and useful extra features, the new WinZip Pro is worth buying.

Source of Information : PC World July 2010

Why Use Sphinx - Aggregating Sharded Data

Building a scalable system often involves sharding (partitioning) the data across different physical MySQL servers.

When the data is sharded at a fine level of granularity, simply fetching a few rows with a selective WHERE (which should be fast) means contacting many servers, checking for errors, and merging the results together in the application. Sphinx alleviates this problem, because all the necessary functionality is already implemented inside the search daemon.

Consider an example where a 1 TB table with a billion blog posts is sharded by user ID over 10 physical MySQL servers, so a given user’s posts always go to the same server. As long as queries are restricted to a single user, everything is fine: we choose the server based on user ID and work with it as usual.

Now assume that we need to implement an archive page that shows the user's friends' posts. How are we going to display page 50, with entries 981 to 1000, sorted by post date? Most likely, the various friends' data will be on different servers. With only 10 friends, there's about a 90% chance that more than 8 servers will be used, and that probability increases to 99% if there are 20 friends. So, for most queries, we will need to contact all the servers. Worse, we'll need to pull 1,000 posts from each server and sort them all in the application. We could trim the required data down to just the post ID and timestamp, but that's still 10,000 records to sort in the application. Most modern scripting languages consume a lot of CPU time for that sorting step alone. In addition, we'll either have to fetch the records from each server sequentially (which will be slow) or write some code to juggle the parallel querying threads (which will be difficult to implement and maintain).

In such situations, it makes sense to use Sphinx instead of reinventing the wheel. All we’ll have to do in this case is set up several Sphinx instances, mirror the frequently accessed post attributes from each table—in this example, the post ID, user ID, and timestamp—and query the master Sphinx instance for entries 981 to 1000, sorted by post date, in approximately three lines of code. This is a much smarter way to scale.
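As a rough sketch, such a query through the Sphinx PHP API could look like the following; the index and attribute names (posts_distributed, user_id, created_at) are illustrative assumptions, not part of the original example:

<?php
// Page 50 of friends' posts: entries 981 to 1000, newest first,
// answered by the master Sphinx instance instead of 10 MySQL shards.
require ( "sphinxapi.php" );

$friend_ids = array ( 101, 102, 103 ); // friend user IDs known to the application

$cl = new SphinxClient ();
$cl->SetServer ( "localhost", 3312 );
$cl->SetFilter ( "user_id", $friend_ids );              // only the friends' posts
$cl->SetSortMode ( SPH_SORT_ATTR_DESC, "created_at" );  // sort by post date
$cl->SetLimits ( 980, 20 );                             // offset 980, 20 entries
$result = $cl->Query ( "", "posts_distributed" );       // empty query = scan all
?>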

Source of Information : O'Reilly High Performance MySQL Second Edition

Why Use Sphinx - Scaling

Sphinx scales well both horizontally (scaling out) and vertically (scaling up).

Sphinx is fully distributable across many machines. All the use cases we’ve mentioned can benefit from distributing the work across several CPUs.

The Sphinx search daemon (searchd) supports special distributed indexes, which know which local and remote indexes should be queried and aggregated. This means scaling out is a trivial configuration change. You simply partition the data across the nodes, configure the master node to issue several remote queries in parallel with local ones, and that’s it.
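For illustration, a distributed index on the master node might look like the following sketch (host names, ports, and index names here are assumptions for the example):

index posts_distributed
{
    type  = distributed
    local = posts_shard0                       # indexed on the master itself
    agent = box1.example.com:3312:posts_shard1 # remote shard
    agent = box2.example.com:3312:posts_shard2 # remote shard
    agent = box3.example.com:3312:posts_shard3 # remote shard
}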

You can also scale up, as in using more cores or CPUs on a single machine to improve latency. To accomplish this, you can just run several instances of searchd on a single machine and query them all from another machine via a distributed index. Alternatively, you can configure a single instance to communicate with itself so that the parallel “remote” queries actually run on a single machine, but on different CPUs or cores.

In other words, with Sphinx a single query can be made to use more than one CPU (multiple concurrent queries will use multiple CPUs automatically). This is a major difference from MySQL, where one query always gets one CPU, no matter how many are available. Also, Sphinx does not need any synchronization between concurrently running queries. That lets it avoid mutexes (a synchronization mechanism), which are a notorious MySQL performance bottleneck on multi-CPU systems.

Another important aspect of scaling up is scaling disk I/O. Different indexes (including parts of a larger distributed index) can easily be put on different physical disks or RAID volumes to improve latency and throughput. This approach has some of the same benefits as MySQL 5.1’s partitioned tables, which can also partition data into multiple locations. However, distributed indexes have some advantages over partitioned tables. Sphinx uses distributed indexes both to distribute the load and to process all parts of a query in parallel. In contrast, MySQL’s partitioning can optimize some queries (but not all) by pruning partitions, but the query processing will not be parallelized. And even though both Sphinx and MySQL partitioning will improve query throughput, if your queries are I/O-bound, you can expect linear latency improvement from Sphinx on all queries, whereas MySQL’s partitioning will improve latency only on those queries where the optimizer can prune entire partitions.

The distributed searching workflow is straightforward:
1. Issue remote queries on all remote servers.
2. Perform sequential local index searches.
3. Read the partial search results from the remote servers.
4. Merge all the partial results into the final result set, and return it to the client.

If your hardware resources permit it, you can search through several indexes on the same machine in parallel, too. If there are several physical disk drives and several CPU cores, the concurrent searches can run without interfering with each other. You can pretend that some of the indexes are remote and configure searchd to contact itself to launch a parallel query on the same machine:

index distributed_sample
{
    type  = distributed
    local = chunk1                # resides on HDD1
    agent = localhost:3312:chunk2 # resides on HDD2, searchd contacts itself
}

From the client’s point of view, distributed indexes are absolutely no different from local indexes. This lets you create “trees” of distributed indexes by using nodes as proxies for sets of other nodes. For example, the first-level node could proxy the queries to a number of the second-level nodes, which could in turn either search locally themselves or pass the queries to other nodes, to an arbitrary depth.

Source of Information : O'Reilly High Performance MySQL Second Edition

Why Use Sphinx - Generating Parallel Result Sets

Sphinx lets you generate several result sets from the same data simultaneously, again using a fixed amount of memory. Compared to the traditional SQL approach of either running two queries (and hoping that some data stays in the cache between runs) or creating a temporary table for each search result set, this yields a noticeable improvement.

For example, assume you need per-day, per-week, and per-month reports over a period of time. To generate these with MySQL you’d have to run three queries with different GROUP BY clauses, processing the source data three times. Sphinx, however, can process the underlying data once and accumulate all three reports in parallel. Sphinx does this with a multi-query mechanism. Instead of issuing queries one by one, you batch several queries and submit them in one request:

<?php
$cl = new SphinxClient ();
$cl->SetSortMode ( SPH_SORT_EXTENDED, "price desc" );
$cl->AddQuery ( "ipod" );
$cl->SetGroupBy ( "category_id", SPH_GROUPBY_ATTR, "@count desc" );
$cl->AddQuery ( "ipod" );
$cl->RunQueries ( );
?>

Sphinx will analyze the request, identify query parts it can combine, and parallelize the queries where possible.

For example, Sphinx might notice that only the sorting and grouping modes differ, and that the queries are otherwise the same. This is the case in the sample code just shown, where the sorting is by price but the grouping is by category_id. Sphinx will create several sorting queues to process these queries. When it runs the queries, it will retrieve the rows once and submit them to all queues. Compared to running the queries one by one, this eliminates several redundant full-text search or full scan operations.
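The reporting case mentioned earlier can be batched the same way. Here is a rough sketch, assuming a posted Unix-timestamp attribute and a reports index (both names are illustrative):

<?php
$cl = new SphinxClient ();

$cl->SetGroupBy ( "posted", SPH_GROUPBY_DAY );   // per-day report
$cl->AddQuery ( "", "reports" );

$cl->SetGroupBy ( "posted", SPH_GROUPBY_WEEK );  // per-week report
$cl->AddQuery ( "", "reports" );

$cl->SetGroupBy ( "posted", SPH_GROUPBY_MONTH ); // per-month report
$cl->AddQuery ( "", "reports" );

// Sphinx reads the underlying data once and fills all three reports in parallel.
$results = $cl->RunQueries ( );
?>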

Note that generating parallel result sets, although it’s a common and important optimization, is only a particular case of the more generalized multi-query mechanism. It is not the only possible optimization. The rule of thumb is to combine queries in one request where possible, which generally allows Sphinx to apply internal optimizations. Even if Sphinx can’t parallelize the queries, it still saves network round-trips. And if Sphinx adds more optimizations in the future, your queries will use them automatically with no further changes.

Source of Information : O'Reilly High Performance MySQL Second Edition

Why Use Sphinx - Optimizing GROUP BY Queries

Support for everyday SQL-like clauses would be incomplete without GROUP BY functionality, so Sphinx has that too. But unlike MySQL’s general-purpose implementation, Sphinx specializes in solving a practical subset of GROUP BY tasks efficiently. This subset covers the generation of reports from big (1–100 million row) datasets when one of the following cases holds:

• The result is only a “small” number of grouped rows (where “small” is on the order of 100,000 to 1 million rows).

• Very fast execution speed is required and approximate COUNT(*) results are acceptable, when many groups are retrieved from data distributed over a cluster of machines.

This is not as restrictive as it might sound. The first scenario covers practically all imaginable time-based reports. For example, a detailed per-hour report for a period of 10 years will return fewer than 90,000 records. The second scenario could be expressed in plain English as something like “as quickly and accurately as possible, find the 20 most important records in a 100-million-row sharded table.”

These two types of queries are useful not only for general-purpose reporting; you can also use them in full-text search applications. Many applications need to display not only full-text matches, but some aggregate results as well. For example, many search result pages show how many matches were found in each product category, or display a graph of matching document counts over time. Another common requirement is to group the results and show the most relevant match from each category.

Sphinx’s group-by support lets you combine grouping and full-text searching, eliminating the overhead of doing the grouping in your application or in MySQL.
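A sketch of the "matches per product category" case might look like this; the products index and category_id attribute are assumptions for the example:

<?php
$cl = new SphinxClient ();
$cl->SetGroupBy ( "category_id", SPH_GROUPBY_ATTR, "@count desc" );
$result = $cl->Query ( "ipod", "products" );

// Each returned match carries the virtual @groupby and @count attributes:
// the category ID and how many matching documents fell into that category.
foreach ( $result["matches"] as $id => $match )
    printf ( "category %d: %d matches\n",
        $match["attrs"]["@groupby"], $match["attrs"]["@count"] );
?>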

As with sorting, grouping in Sphinx uses fixed memory. It is slightly (10% to 50%) more efficient than similar MySQL queries on datasets that fit in RAM. In this case, most of Sphinx’s power comes from its ability to distribute the load and greatly reduce the latency. For huge datasets that could never fit in RAM, you can build a special disk-based index for reporting, using inline attributes (defined later). Queries against such indexes execute about as fast as the disk can read the data—about 30–100 MB/sec on modern hardware. In this case, the performance can be many times better than MySQL’s, though the results will be approximate.

The most important difference from MySQL’s GROUP BY is that Sphinx may, under certain circumstances, yield approximate results. There are two reasons for this:

• Grouping uses a fixed amount of memory. If there are too many groups to hold in RAM and the matches are in a certain “unfortunate” order, per-group counts might be smaller than the actual values.

• A distributed search sends only the aggregate results, not the matches themselves, from node to node. If there are duplicate records in different nodes, per-group distinct counts might be greater than the actual values, because the information that can remove the duplicates is not transmitted between nodes.

In practice, it is often acceptable to have fast approximate group-by counts. If this isn’t acceptable, it’s often possible to get exact results by tuning the daemon and client application carefully.

You can generate the equivalent of COUNT(DISTINCT attribute), too. For example, you can use this to compute the number of distinct sellers per category on an auction site.

Finally, Sphinx lets you choose criteria to select the single “best” document within each group. For example, you can select the most relevant document from each domain, while grouping by domain and sorting the result set by per-domain match counts. This is not possible in MySQL without a complex query.
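A sketch combining these last two ideas, assuming illustrative category_id and seller_id attributes on an auctions index:

<?php
$cl = new SphinxClient ();
$cl->SetGroupBy ( "category_id", SPH_GROUPBY_ATTR, "@count desc" );
$cl->SetGroupDistinct ( "seller_id" );     // per-category COUNT(DISTINCT seller_id)
$result = $cl->Query ( "", "auctions" );   // empty query = scan the whole index

// Each group returns one "best" match plus @count (matches in the group)
// and @distinct (distinct sellers in the group).
?>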

Source of Information : O'Reilly High Performance MySQL Second Edition Jun 2008

Why Use Sphinx - Finding the Top Results in Order

Web applications frequently need the top N results in order. As we discussed in “Optimizing LIMIT and OFFSET” on page 193, this is hard to optimize in MySQL.

The worst case is when the WHERE condition finds many rows (let’s say 1 million) and the ORDER BY columns aren’t indexed. MySQL uses the index to identify all the matching rows, reads the records one by one into the sort buffer with semirandom disk reads, sorts them all with a filesort, and then discards most of them. It will temporarily store and process the entire result, ignoring the LIMIT clause and churning RAM. And if the result set doesn’t fit in the sort buffer, it will need to go to disk, causing even more disk I/O.

This is an extreme case, and you might think it happens rarely in the real world, but in fact the problems it illustrates happen often. MySQL’s limitations on indexes for sorting—using only the leftmost part of the index, not supporting loose index scans, and allowing only a single range condition—mean many real-world queries can’t benefit from indexes. And even when they can, using semirandom disk I/O to retrieve rows is a performance killer.

Paginated result sets, which usually require queries of the form SELECT ... LIMIT N, M, are another performance problem in MySQL. They read N + M rows from disk, causing a large amount of random I/O and wasting memory resources. Sphinx can accelerate such queries significantly by eliminating the two biggest problems:

Memory usage
Sphinx’s RAM usage is always strictly limited, and the limit is configurable. Sphinx supports a result set offset and size similar to the MySQL LIMIT N, M syntax but also has a max_matches option. This controls the equivalent of the “sort buffer” size, on both a per-server and a per-query basis. Sphinx’s RAM footprint is guaranteed to be within the specified limits.

I/O
If attributes are stored in RAM, Sphinx does not do any I/O at all. And even if attributes are stored on disk, Sphinx will perform sequential I/O to read them, which is much faster than MySQL’s semirandom retrieval of rows from disks.
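A sketch of the paginated case, assuming a products index with a price attribute; the third SetLimits() argument is the max_matches cap mentioned above:

<?php
$cl = new SphinxClient ();
$cl->SetSortMode ( SPH_SORT_ATTR_DESC, "price" );
$cl->SetLimits ( 980, 20, 1000 );            // offset, page size, max_matches
$result = $cl->Query ( "ipod", "products" ); // page 50 of the sorted results
?>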

You can sort search results by a combination of relevance (weight), attribute values, and (when using GROUP BY) aggregate function values. The sorting clause syntax is similar to a SQL ORDER BY clause:

<?php
$cl = new SphinxClient ();
$cl->SetSortMode ( SPH_SORT_EXTENDED, 'price DESC, @weight ASC' );
// more code and Query( ) call here...
?>

In this example, price is a user-specified attribute stored in the index, and @weight is a special attribute, created at runtime, that contains each result’s computed relevance. You can also sort by an arithmetic expression involving attribute values, common math operators, and functions:

<?php
$cl = new SphinxClient ();
$cl->SetSortMode ( SPH_SORT_EXPR, '@weight + log(pageviews)*1.5' );
// more code and Query( ) call here...
?>

Source of Information : O'Reilly High Performance MySQL Second Edition

Why Use Sphinx - Applying WHERE Clauses Efficiently

Sometimes you'll need to run SELECT queries against very large tables (containing millions of records), with several WHERE conditions on columns that have poor index selectivity (i.e., return too many rows for a given WHERE condition) or could not be indexed at all. Common examples include searching for users in a social network and searching for items on an auction site. Typical search interfaces let the user apply WHERE conditions to 10 or more columns, while requiring the results to be sorted by other columns.

With the proper schema and query optimizations, MySQL can work acceptably for such queries, as long as the WHERE clauses don't contain too many columns. But as the number of columns grows, the number of indexes required to support all possible searches grows exponentially. Covering all the possible combinations for just four columns strains MySQL's limits. It becomes very slow and expensive to maintain the indexes, too. This means it's practically impossible to have all the required indexes for many WHERE conditions, and you have to run the queries without indexes. More importantly, even if you can add indexes, they won't give much benefit unless they're selective. The classic example is a gender column, which isn't much help because it typically selects half of all rows. MySQL will generally revert to a full table scan when the index isn't selective enough to help it.

Sphinx can perform such queries much faster than MySQL. You can build a Sphinx index with only the required columns from the data. Sphinx then allows two types of access to the data: an indexed search on a keyword or a full scan. In both cases, Sphinx applies filters, which are its equivalent of a WHERE clause. Unlike MySQL, which decides internally whether to use an index or a full scan, Sphinx lets you choose which access method to use.

To use a full scan with filters, specify an empty string as the search query. To use an indexed search, add pseudokeywords to your full-text fields while building the index and then search for those keywords. For example, if you wanted to search for items in category 123, you’d add a “category123” keyword to the document during indexing and then perform a full-text search for “category123.” You can either add keywords to one of the existing fields using the CONCAT( ) function, or create a special full-text field for the pseudokeywords for more flexibility. Normally, you should use filters for nonselective values that cover over 30% of the rows, and fake keywords for selective ones that select 10% or less. If the values are in the 10–30% gray zone, your mileage may vary, and you should use benchmarks to find the best solution. Sphinx will perform both indexed searches and scans faster than MySQL. Sometimes Sphinx actually performs a full scan faster than MySQL can perform an index read.
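The two access methods could look like the following sketch; the items index, the category_id attribute, and the category123 pseudokeyword are assumptions carried over from the example above:

<?php
$cl = new SphinxClient ();

// Full scan with a filter: empty query string, WHERE-like condition on an attribute.
$cl->SetFilter ( "category_id", array ( 123 ) );
$scan_result = $cl->Query ( "", "items" );

// Indexed search on a pseudokeyword that was added to the full-text field
// while building the index.
$cl->ResetFilters ( );
$keyword_result = $cl->Query ( "category123", "items" );
?>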

Source of Information : O'Reilly High Performance MySQL Second Edition Jun 2008

The Titanium

The Titanium open-source platform lets Web developers leverage their Web skills for creating desktop applications.

Titanium is an open-source platform that enables developers to build rich desktop applications using standard Web technologies. Titanium applications run natively on Linux, Mac OS X and Windows operating systems. At a high level, Titanium competes directly with Adobe AIR, although it differs from AIR in three major ways. First, Titanium is open source; it's licensed under the Apache Public License (version 2). Second, Titanium is fully extensible; Titanium extensions can be written using a number of popular languages, including C++, JavaScript, Ruby and Python. Finally, Titanium opens up user interface programming to popular languages like Ruby and Python, a job typically reserved only for JavaScript. Both Ruby and Python have full access to the Document Object Model (DOM), which puts these languages on par with JavaScript for building rich, dynamic user interfaces.

It is important to note that Titanium is not a system that provides a point-and-click ability to build a single application that runs both on the Web and on the desktop; however, that is not to say code sharing across the Web interface and desktop interface is impossible. Some developers may choose to develop with a share-and-segregate pattern: write a common set of shared libraries, then write platform-specific code for use in a Web interface and other code for use in a desktop interface. In this case, you'll still have a single codebase, but you'll end up with two different apps. Other developers may choose to develop using progressive enhancement. With progressive enhancement, you start by implementing a basic set of features, then as new resources become available, you build up functionality to make use of these new resources.

A good example is Google Docs. There’s a basic set of features you can access on-line, but if you install Google Gears, you get off-line access and other features as well. The same goes for Titanium apps. Developers can enhance their Web applications progressively by adding features and functions that will be available only when the app is run on a Titanium instance. Using this approach you have just a single app. Both of these techniques are valid choices when it comes to developing apps. Both techniques have pros and cons, and it’s up to you as the developer to choose which method to use. No matter which technique you choose—two separate codebases, one codebase and two apps, or one app—at the very least, Titanium allows you to leverage your Web development knowledge to build desktop applications. It lets you use HTML and JavaScript, as well as other languages most often associated with Web development, to develop desktop applications.

No More Limits on Web Development
Titanium is a development platform with one clear goal: leverage Web technologies to create rich, cross-platform desktop applications. Using Titanium, you can create desktop applications using HTML and JavaScript, yet still get features not available on browser applications. For example, Titanium Web applications built for the desktop can access the filesystem and interact with the underlying operating system. The idea behind Titanium isn’t new, but Titanium clearly separates itself by giving you something unique: unlimited possibilities with open-source choices. You aren’t forced to use anything proprietary—you can use any library or framework you want. All technological decisions are yours to make. Although I mainly program with JavaScript for Web applications, it isn’t the only technology that powers the Web. Titanium works well with Python, PHP, Ruby, Java, Flash and Flex, and Silverlight. So whatever technology you’re using right now to develop your Web applications, you’ll likely be able to use it with Titanium. Because Titanium is distributed under the open-source Apache Public License v2, you can download the source code, play with it, fork it and extend it. It’s this extensibility that makes Titanium a platform that developers can grow with in the future. The platform can morph and evolve into different forms as new needs emerge.

Rapidly Evolving Web Development Platform
Titanium is evolving rapidly and has experienced several major changes to its architecture in the past few months. The initial preview release of Titanium (PR1) incorporated WebKit and a modified version of Google Gears. Essentially, Titanium PR1 used WebKit as its main component, and additional features were exposed to the runtime via a native extensions system, which gave developers access to features from a modified version of Gears. Soon after this initial preview release, the Titanium team started to re-architect the platform. Google Gears was removed, and instead, a new system for exposing new features was created: Kroll. Kroll is the microkernel that powers Titanium and extends the framework. This compact microkernel, written in C++, is a cross-language, cross-platform “binding” and invocation framework that enables mixing and matching code within the kernel. All the features that Titanium exposes are accomplished via Kroll modules. By using Kroll, Titanium gains the ability to support a multitude of languages and technologies. And, because Kroll is fully extensible, anyone can add more features to the platform, using any technology. You don’t need to be a C++ guru to extend Titanium. You can create new modules using Python and Ruby, or even just plain-old JavaScript. Titanium’s use of WebKit was retained during the rewrite from PR1, and for good reasons. Not only is WebKit one of the best, standards-compliant engines available today, but it also features lots of goodies, such as HTML5 client database storage, CSS transformations and animations, and a fast JavaScript engine. All of these, of course, are available on Titanium.

A Rich API for Rich Application Development
All languages supported by Titanium have access to a window object. This is the shared global object and is used to bind methods and objects that need to be available to all languages. The main namespace for the Titanium API is also bound to this global object and can be accessed via window.Titanium. Aside from WebKit goodies, such as client-side database storage and CSS animations, Titanium's current API also contains many of the features needed for desktop application development:

• Titanium.Desktop: for launching third-party applications and opening URLs on the default browser.

• Titanium.Filesystem: for working with the filesystem for things like reading and writing files, creating and managing directories and so on.

• Titanium.Media: for working with media files, such as audio and video.

• Titanium.Network: for working with network-related tasks, such as socket connections and IRC clients.

• Titanium.Notification: for custom system notifications, as well as hooks to platform-dependent notification systems like Growl and Snarl.

• Titanium.Platform: for getting information about the underlying platform.

• Titanium.Process: for working with system processes, as well as launching and executing system commands.

• Titanium.UI: for working with native windows, menus and system chrome.

Unfortunately, going over all of these APIs would require an article (or two) in itself. Fortunately, the official Titanium site provides documentation with more details.

Source of Information : Linux Journal 185 September


Why Use Sphinx - Efficient and Scalable Full-Text Searching

MySQL's full-text search capability is fast for smaller datasets but performs badly when the data size grows. With millions of records and gigabytes of indexed text, query times can vary from a second to more than 10 minutes, which is unacceptable for a high-performance web application. Although it's possible to scale MySQL's full-text searches by distributing the data in many locations, this requires you to perform searches in parallel and merge the results in your application.

Sphinx works significantly faster than MySQL's built-in full-text indexes. For instance, it can search over 1 GB of text within 10 to 100 milliseconds, and that scales linearly up to 10–100 GB per CPU. Sphinx also has the following advantages (a minimal search call is sketched after the list):

• It can index data stored with InnoDB and other engines, not just MyISAM.

• It can create indexes on data combined from many source tables, instead of being limited to columns in a single table.

• It can dynamically combine search results from multiple indexes.

• In addition to indexing textual columns, its indexes can contain an unlimited number of numeric attributes, which are analogous to “extra columns.” Sphinx attributes can be integers, floating-point numbers, and Unix timestamps.

• It can optimize full-text searches with additional conditions on attributes.

• Its phrase-based ranking algorithm helps it return more relevant results. For instance, if you search a table of song lyrics for “I love you, dear,” a song that contains that exact phrase will turn up at the top, before songs that just contain “love” or “dear” many times.

• It makes scaling out much easier.
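For reference, a minimal full-text query through the Sphinx PHP API might look like this sketch; the lyrics index name is an assumption, and 3312 was the default searchd port in this era:

<?php
require ( "sphinxapi.php" );

$cl = new SphinxClient ();
$cl->SetServer ( "localhost", 3312 );
$result = $cl->Query ( "i love you dear", "lyrics" );

if ( $result !== false )
    foreach ( $result["matches"] as $id => $match )
        echo "document $id, weight {$match['weight']}\n";
?>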


Source of Information : O'Reilly High Performance MySQL Second Edition

Why Use Sphinx?

Sphinx can complement a MySQL-based application in many ways, bolstering performance where MySQL is not a good solution and adding functionality MySQL can’t provide. Typical usage scenarios include:

• Fast, efficient, scalable, relevant full-text searches

• Optimizing WHERE conditions on low-selectivity indexes or columns without indexes

• Optimizing ORDER BY ... LIMIT N queries and GROUP BY queries

• Generating result sets in parallel

• Scaling up and scaling out

• Aggregating partitioned data

We explore each of these scenarios in the following sections. This list is not exhaustive, though, and Sphinx users find new applications regularly. For example, one of Sphinx’s most important uses—scanning and filtering records quickly—was a user innovation, not one of Sphinx’s original design goals.


Source of Information : O'Reilly High Performance MySQL Second Edition

Sphinx

Sphinx (http://www.sphinxsearch.com) is a free, open source, full-text search engine, designed from the ground up to integrate well with databases. It has DBMS-like features, is very fast, supports distributed searching, and scales well. It is also designed for efficient memory and disk I/O, which is important because they're often the limiting factors for large operations.

Sphinx works well with MySQL. It can be used to accelerate a variety of queries, including full-text searches; you can also use it to perform fast grouping and sorting operations, among other applications. Additionally, there is a pluggable storage engine that lets a programmer or administrator access Sphinx directly through MySQL. Sphinx is especially useful for certain queries that MySQL's general-purpose architecture doesn't optimize very well for large datasets in real-world settings. In short, Sphinx can enhance MySQL's functionality and performance.

The source of data for a Sphinx index is usually the result of a MySQL SELECT query, but you can build an index from an unlimited number of sources of varying types, and each instance of Sphinx can search an unlimited number of indexes. For example, you can pull some of the documents in an index from a MySQL instance running on one remote server, some from a PostgreSQL instance running on another server, and some from the output of a local script through an XML pipe mechanism.


Source of Information : O'Reilly High Performance MySQL Second Edition

