amazon s3

Amazon Web Services Success Stories

Posted on November 29, 2006. Filed under: adaptiveblue, alexiskold, amazon, amazon s3, amazon web services, smugmug, web services, webmail.us, web_patform |

We have written before about the innovative Amazon Web Services Platform. This stack was officially announced by Amazon CEO Jeff Bezos during the recent Web 2.0 summit and is now considered part of the core business strategy for Amazon. While analysts, competitors and Wall Street are pondering what to make of this move from a business sense, in this post we look at who is utilizing Amazon Web Services – and how. This post is based on personal communication with those people, along with the set of success stories available on the Amazon Web Services site.

The fact is that many small, medium and even large businesses (even Microsoft) rushed to put Amazon Web Services to use. Why did they do it? Because Amazon offers a decade of experience in running one of the largest internet enterprises – and has wrapped this expertise into a set of pre-packaged services and APIs.

To remind you, here again is the Amazon Web Services Stack:

The Amazon Web Services stack is impressive in its scale and also well thought through. Amazon is methodical about this strategy and is aiming to create an offering which can truly be called the Operating System for the new Web. Many companies have already recognized the power and ROI of the Amazon platform and are literally betting their business on it.

Webmail.us – email hosting provider

Webmail.us is probably the most compelling success story for Amazon Web Services because of its huge ROI. It is an established business with over 27,000 customers. It had a real and simple business need – to improve the cost and reliability of its backup system. After considering many alternative solutions, the company decided to use Amazon S3, the Simple Queue Service and the Elastic Compute Cloud to address all of its needs.

The company claims to have improved its backup process and cut costs by 75%. The Webmail.us success story on Amazon contains a paragraph that nicely summarizes the technical and business gains:

“Amazon’s Web Scale Computing model shifts the focus from do-it-yourself to let-the-experts-do-it. It allows businesses to scale up or down based on requirements and demand, and provides pay-as-you-go billing models. This combination allows businesses to turn fixed costs into variable costs, while knowing that their data or services will always be available.”

SmugMug – online photo provider

SmugMug is another interesting success story. It is a straightforward one, because it uses Amazon S3 exactly how it was intended to be used – for storing large media. Today SmugMug hosts over half a billion photos on Amazon. And here is the real “wow factor” in this story: one week after writing the first line of code, SmugMug was storing all of its new images in Amazon S3.

The Amazon S3 API consists of just a few simple method calls. It is equally easy to implement in Java, PHP, JavaScript, Perl, C#, Python and many other languages. As the SmugMug success story illustrates, a simple API means a very quick and painless adoption.

To pepper this with more numbers: SmugMug is now backing up all of its new images to S3, which amounts to 10 terabytes of data monthly. The SmugMug site has not gone down since adopting Amazon S3, and the company estimates it will save half a million dollars annually on disk storage. As the company points out, S3 makes it possible for SmugMug to compete head to head with bigger companies that have deep pockets, without having to raise massive amounts of cash for hardware. This is game changing.

Altexa, ElephantDrive and JungleDisk – backup providers

As soon as S3 came out, many companies recognized an opportunity to deliver business and personal backup solutions. The model is simple – charge a small premium on top of the Amazon S3 storage costs.

With that approach it is essentially a user acquisition battle, where the implementation and marketing become paramount. Altexa targets small businesses, while ElephantDrive and JungleDisk target consumers – but all of them share the benefit and ease of use of S3. In their success stories, the companies emphasized incredibly quick (literally a few days) adoption, cost savings and reliability.

Scanbuy – mobile shopping solution provider

The success stories that we have covered so far mostly use Amazon’s S3 storage service. Scanbuy, however, is using the Amazon eCommerce API to bring unique comparison shopping solutions to mobile phones. Its claim to fame is allowing users to look up prices by simply scanning the barcodes of items in a store. This is a clever approach that is made possible by a combination of technologies.

One of the key technologies here is the Amazon eCommerce API, which offers unlimited and complete access to most Amazon items. Scanbuy uses the API to fetch the latest pricing information, letting the user decide if they are really getting a good deal in the store. And as the company explains, it simply could not have done what it is doing now without the Amazon eCommerce API.

Conclusion

So why are analysts not sure what to make of Amazon Web Services? An article in BusinessWeek is entitled Jeff Bezos’ Risky Bet, and its main concern seems to be: will businesses use this? Well, in this post we’ve shown that for some businesses, the AWS stack provides a set of very compelling value propositions – both technical and business. And having real business success stories with ROI and cost savings in the 50-75% range makes it basically a no-brainer.

We think that the real question is: does this work for Amazon? Is it ready to be a software and Web Services company? Will Amazon be able to scale this business indefinitely… and most importantly: are the margins high enough for it to be worthwhile? We have to believe that Jeff Bezos and the Amazon team did the math and that the answer is absolutely yes.

Amazon Rolls Out its Visionary WebOS Strategy

Posted on November 3, 2006. Filed under: adaptiveblue, alexiskold, amazon, amazon s3, web services, web20, webos, web_patform |

There is a very long, but interesting, cover story in today’s BusinessWeek entitled Jeff Bezos’ Risky Bet. The article focuses on the transformation of the e-commerce giant into a software company. The growing stack of Amazon Web Services clearly points to a sea change in the Seattle e-commerce giant. Indeed Amazon is beginning to look more like an alternative Microsoft for the web computing era!

In short, Jeff Bezos’ big bet is a bet on the software infrastructure of the Web. We here at Read/WriteWeb think this is a visionary strategy by Amazon – and it is likely to pay off…

Amazon completes its Web Services stack

In August, I wrote a series of articles about Amazon’s Web Services strategy for a Web 2.0 magazine. The article that summed up what Amazon is up to was called: Amazon – the Real Web Services Company. Based on the piece in BusinessWeek, it is clear that during the Web 2.0 conference next week, Amazon’s Web Services strategy will become official. As a software engineer, I can’t hide my joy. This is indeed a triumph of software engineering – a large company has managed to productize the pieces of its own infrastructure.

Not only that, but Amazon is very serious about making money on this endeavor. The web giant is carefully and methodically rolling out the building blocks of its next generation Web Platform. It started with the Amazon eCommerce API and Alexa services. But not until the Simple Storage Service rolled out did it become clear that Amazon is building a full web services stack. Here is our diagram showing what it looks like:

Web as a Platform

Amazon’s Web Services stack is evidence of a new computing paradigm, where web services in aggregate give rise to a new web-based operating system. Like a classical operating system, this new one has the key ingredients – infinitely scalable storage, dynamic indexing service, adaptive grid, etc. These pieces, put together, provide a compelling new way to think about application development. Amazon is actively working to both define and implement the ingredients of this new Web Platform.

Why this makes sense

Building large-scale web software is a big challenge. Amazon solves this problem by offering the infrastructure that has powered one of the biggest online stores for the past decade. Amazon hides complexity behind simple, minimalist APIs and offers its services at a very reasonable cost. The Amazon team takes the concepts of search, storage, lookup and management of data – and turns them into pay-per-fetch and pay-per-space web services.

To begin with, it’ll be small and medium businesses that take up Amazon’s services. As BusinessWeek points out, Wall Street is not going to jump on this. But the SmugMug photo service did, and other startups and small businesses will follow suit. So even if large corporations do not come, there is plenty of money to be made. The Long Tail, anyone?

What can we expect?

In the near term, we will probably see more services from Amazon which focus on completing their Web Services stack. For example, S3 does not have querying capabilities – which is a fairly big limitation. The elastic cloud is very powerful, but at the same time complex – so we can expect additional offerings that simplify deployment and management of the grid.

We are also likely to see other players entering the WebOS market. Google has already made moves with its Google Base API and is rumored to be working on the GDrive. Microsoft also has Live Drive in the works. Both Google and Microsoft are no doubt working on other web services initiatives. Also watch out for smaller but more innovative players, like 3Tera – which we profiled in September.

Regardless of the provider, WebOS services are going to be utilized by thousands of companies – and will power the next generation of web applications. Amazon is at this point leading the charge of the big Internet companies to capture this potentially huge market.

In upcoming posts, we will highlight the use cases for Amazon and other web services. In the meantime, let us know if you’re currently using Amazon Web Services – and what you think of the experience so far.

See Also: Web Platform Primer – what’s available via API?; GData API for Google Base released; Amazon Launches Elastic Compute Cloud

From attention economy to attention architecture

Posted on September 19, 2006. Filed under: adaptiveblue, alexiskold, amazon s3, attention, attention architecture, attention economy, attentiontrust, blueorganizer, del.icio.us, rootmarkets, seth goldstein, steve gillmore |

This article was originally published in Web 2.0 journal.

I had an interesting chat last night with Chris Saad of Touchstone about their platform and the attention market. The conversation was prompted by the announcement on TechCrunch that one of the leaders in the attention space, RootMarkets, received funding from the Chicago Board of Trade. This conversation with Chris and the post on TechCrunch got me thinking: we all agree that we are heading towards the attention economy, but what does the “architecture of attention” look like?

To make the concept of attention compelling and to prove to consumers that their attention information is important, we need to build applications that provide useful services. And to build these applications we need a platform for the attention players to plug into. In short, we need an attention ecosystem, where application providers can interplay and deliver definitive value to the end users.

The Roots of Attention

It may not be widely known, but the foundations of the attention economy and architecture have already been laid. Steve Gillmor and Seth Goldstein established AttentionTrust.org – a non-profit organization with a mission both to educate people about the value of their attention and to establish the infrastructure for capturing individual attention.

The AttentionTrust serves as a forum for discussing and establishing principles, values and rights of consumers. The founders have outlined the following principles for its operation:

  • Property: You own your attention and can store it wherever you wish. You have CONTROL.
  • Mobility: You can securely move your attention wherever you want whenever you want to. You have the ability to TRANSFER your attention.
  • Economy: You can pay attention to whomever you wish and receive value in return. Your attention has WORTH.
  • Transparency: You can see exactly how your attention is being used. You can DECIDE who you trust.

These founding principles capture the essence of the attention economy. With every click, with every look at the computer screen, you are paying attention. This attention information has huge value, and it can be used to provide you with valuable services in return.


The Foundations of Attention Architecture

AttentionTrust.org and RootMarkets, the attention company founded by Seth Goldstein, worked out an architecture for capturing and storing user attention, shown in Figure 1 below.

The attention is captured by the browser extension, called AttentionRecorder. The recorder simply records which URL the user went to and when. With the recorder the user has an option of either storing the click stream locally or directing it to an AttentionVault.

The vault is essentially a remote database of the user click stream. Since AttentionRecorder defines an HTTP-based API for communicating with the vault, there can be multiple implementations.

As shown in Figure 2 below, the user can configure the AttentionRecorder to send the data to one of the vaults approved by AttentionTrust. This approach facilitates competition between the vault providers and, as advertised, puts the user in control. That is, the user can choose the most trustworthy, fastest or cheapest vault. Another big benefit is that the consumer is explicitly part of the attention recording process. The consumer has to understand how the system works, and is therefore more likely to trust it.
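The recorder-and-vault split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Click`, `Vault`, `LocalVault`, `RemoteVault` and `Recorder` are hypothetical, not the actual AttentionRecorder code, and the remote vault is simulated in memory rather than reached over HTTP.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Protocol


@dataclass(frozen=True)
class Click:
    """One click-stream entry: which URL the user visited and when."""
    url: str
    visited_at: datetime


class Vault(Protocol):
    """Anything that can accept a click (hypothetical interface)."""
    def store(self, click: Click) -> None: ...


@dataclass
class LocalVault:
    """Keeps the click stream on the user's own machine (here: in memory)."""
    clicks: List[Click] = field(default_factory=list)

    def store(self, click: Click) -> None:
        self.clicks.append(click)


@dataclass
class RemoteVault:
    """Stand-in for a remote vault; a real one would send an HTTP request."""
    name: str
    received: List[Click] = field(default_factory=list)

    def store(self, click: Click) -> None:
        self.received.append(click)


class Recorder:
    """Records visits and forwards them to whichever vault the user chose."""
    def __init__(self, vault: Vault) -> None:
        self.vault = vault

    def record(self, url: str) -> Click:
        click = Click(url=url, visited_at=datetime.now(timezone.utc))
        self.vault.store(click)
        return click


# The user picks a vault; switching providers is a one-line change.
recorder = Recorder(RemoteVault(name="example-vault"))
recorder.record("http://example.com/")
recorder.vault = LocalVault()
recorder.record("http://example.org/")
```

Because the recorder only depends on the `store` method, any vault provider that implements it can be swapped in, which is the property that lets vault providers compete.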

Beyond the basic attention

The initial version of the attention architecture is simple, but it is not complete. As the attention space evolves it becomes clear that there is a need to expand it and to re-conceptualize how various applications and services fit in. Let's consider a few examples. The most obvious thing that is not being captured is whether the user liked what she saw. While it is certainly possible to build a good prediction model based on the time that the user spent looking at the information, no such model will be complete or exact. In other words, the current version of AttentionRecorder captures only implicit attention, but there is also a rich variety of explicit attention.

Consider the popular social bookmarking sites like del.icio.us. We can think of them as vaults for explicit individual attention. As a step up from implicit attention, when the user bookmarks a site and sends it to del.icio.us we know for a fact that she liked the site. We still do not know how much she liked it. This information would be captured by another system – one that also allows ratings. At adaptiveblue we are developing a higher level attention capturing service called blueorganizer. Beyond the basic URLs and ratings, the blueorganizer captures the semantic information contained on the page, such as movies, books, wines and cars, creating basically a vault for semantically-rich attention.

So since there are different kinds of attention, the current architecture needs to be expanded to accommodate them. In particular, we need to recognize that:

  • Attention can be captured by different sources
  • Attention can come in different formats
  • Attention can be stored in different ways

By focusing on these issues, we can extend the current architecture to a flexible and rich attention platform.


The key next step is to redesign the protocol to ensure that any kind of attention data can be stored. The types of attention data would need to be established, and then each attention source can be paired with one or more attention vaults, again putting the user in charge of her data. Also, since attention data can be of different types, it might be beneficial to have different kinds of vaults. Some data would naturally lend itself to a relational database. At adaptiveblue we built the vault using Amazon S3, which I reviewed for Web 2.0 journal earlier.

Factoring in the services

So far the old and the new attention architectures have been focused on capturing and storing the user attention. These are of course important, but they are the least exciting aspects of the attention platform from the end user perspective. After all, where are the end user benefits? The benefits must come from the plethora of services that analyze the user’s attention and do something interesting with it. Personalized recommendations, personalized alerts, personalized news filters, personalized search and personalized shopping are just a few exciting services that can be built on top of the attention platform.

The user will sign up for a subset of these services and point them to her AttentionVault(s). The services will then use the user attention information and seamlessly plug into various aspects of on-line and off-line life to deliver a huge productivity boost and time savings. For this to happen, the attention vaults need to offer a standard access API in addition to the standard input API. The actual format and the protocol for the attention data should be the same as for storing the data in the vaults.
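To make the input API / access API distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `AttentionRecord`, `AttentionVault`, `put`, `query` and the `top_sites` service are hypothetical names, and the vault is an in-memory list rather than a real remote store. The point is only that a service reads records in the same shape in which they were written.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class AttentionRecord:
    """One attention event; the same shape is used for input and access."""
    url: str
    kind: str                     # e.g. "visit", "bookmark", "rating"
    rating: Optional[int] = None  # only meaningful when kind == "rating"


class AttentionVault:
    """Toy vault exposing an input API (put) and an access API (query)."""
    def __init__(self) -> None:
        self._records: List[AttentionRecord] = []

    def put(self, record: AttentionRecord) -> None:
        self._records.append(record)

    def query(self, kind: Optional[str] = None) -> List[AttentionRecord]:
        return [r for r in self._records if kind is None or r.kind == kind]


def top_sites(vaults: Iterable[AttentionVault], limit: int = 3) -> List[str]:
    """A sample service: rank hosts by how often the user visited them."""
    counts: Counter = Counter()
    for vault in vaults:
        for record in vault.query(kind="visit"):
            counts[urlparse(record.url).netloc] += 1
    return [host for host, _ in counts.most_common(limit)]
```

A service such as `top_sites` accepts several vaults, mirroring the idea that the user points a service at one or more of her AttentionVaults.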


Conclusion

The recent explosion in the quantity and types of information puts us on a very fast track to the attention economy. Now more than ever before it is critical to understand and harness the value of individual attention information. The AttentionTrust organization is the forum for discussing attention issues from privacy, business and technical perspectives. To truly harness the value of the user’s attention, the players in the attention space will need to work together and extend the existing implicit attention architecture to include a wider variety of attention data and to create a standards-based infrastructure for attention services.

Survey of Client Apps using the Web Platform

Posted on August 27, 2006. Filed under: adaptiveblue, alexiskold, amazon, amazon s3, amazon web services, blueorganizer, google api, readwriteweb, thick client, virtual storage, web services, web20, web_patform |

Originally published on Read/WriteWeb.

In this post, we survey a range of client applications which utilize the new web platform. This is a follow-up to our Web Platform Primer post a few days ago, in which we explained the building blocks of the new Web infrastructure:

The Web Computing Platform

Essentially the building blocks are foundational services from Internet companies such as Amazon, Google and Microsoft – which combine to form a Web development platform. Indeed, a couple of days ago we saw Amazon add to the platform with a limited beta ‘Compute’ service, called Elastic Compute Cloud. All of these services facilitate a new breed of software: smart desktop and browser applications that use the Web Platform as their backbone.

Storage Services

In this category there are Amazon S3 and openomy. Amazon S3 has a wide variety of clients using it. Firstly, there are personal backup applications like Jungle Disk and Elephant Drive. Another common use case for S3 is storing large media files – the Amazon S3 success stories page features MediaSilo video storage and SmugMug on-line photo sharing. A webtop application called YouOS is also using Amazon S3 to store user information. Finally, there are two other applications listed in the success stories section: MyOwnDB, which allows users to define and store their personal information in the form of database tables; and the blueorganizer smart browser extension for Firefox, developed by my [Alex's] company adaptiveblue.

The only example app using the openomy site is a very basic RSS application, built using Ruby on Rails.

Messaging and Compute Services

In the previous article we gave an example of a Messaging service: Amazon Simple Queue Service. There are no success stories listed on the Amazon site for this service but – as we noted – it is likely that Amazon.com itself utilizes this service.

After our previous article was published earlier this week, Amazon released the first example of a black-box compute service – called Amazon Elastic Compute Cloud. The service is currently in limited beta, but we are likely to start hearing of success stories soon.

Information Services

We start this broad category with the applications that use Amazon eCommerce Service, one of the most widely used APIs on the web. Among the success stories listed on Amazon’s page, most fall into the category of shopping and store fronts. For example:

  • ActionEngine and Scanbuy use the Amazon API to enable wireless shopping.
  • Associate-o-matic uses the Amazon API to help its customers create store fronts.
  • Inside C uses the Amazon API to bring shopping into the instant messaging space.

There are other interesting uses of the API as well. For UNIX lovers there is the Amazon Command Line interface, marketed as 0-click shopping. Also there is RightCart, which enables a web-wide shopping experience on blogs and regular sites.

Note that adaptiveblue also uses the eCommerce API, to dynamically look up product information – when a user selects the title of a book, or the name of a gadget.

The most popular information API is Google Maps. A comprehensive list of usages can be found at the Google Maps Mania blog. They range from housing market sites to travel logs. These, however, are more mashups or utilities than applications – because they do not provide an end-to-end user experience, but rather provide a solution to a particular information problem. In general, we are seeing a big surge in so-called mashups fueled by Information Services and Web 2.0 APIs. A comprehensive list of these mashups, along with APIs and other great information, is maintained by John Musser at Programmable Web.

Search Services

The Alexa Web Search Platform was launched in December 2005. At the time Richard wondered if it would make Amazon a major search player. As of now there are no references to a major vertical search engine built on top of Alexa. The Alexa web site features a few applications – a Camera search and Zip File search – but that just scratches the surface of what is possible with the Alexa platform.

I still think that this platform will pick up and we will see some really interesting vertical search applications built on it. In the meantime, the blogging community does not live a day without checking the Alexa Information service for traffic rankings: alexa.com and alexaholic.com.

Web 2.0 Services

Thanks to del.icio.us, APIs are back in style. So-called Web 2.0 companies rush to open up their information, in order to enable cross-pollination of data and mashups. Here is the current chart of Top APIs from Programmable Web:

Google Maps is a clear front runner. Among other popular APIs are Flickr, Amazon, Yahoo! Maps and del.icio.us. Also, according to the ‘last 14 days’ chart, the YouTube API is on the rise.

Conclusion

It is exciting to see this new wave of applications developed on top of the emerging Web Platform. As the platform matures, we are sure to witness more and more applications using it as their primary infrastructure. This allows businesses to focus on innovation and domain knowledge, rather than worrying about the scalability of their backend systems.

Google plays API catch up with Amazon

Posted on August 23, 2006. Filed under: adaptiveblue, ajax, alexiskold, amazon, amazon s3, amazon web services, google, google api, google base, rss, virtual storage, web services, web20 |

This article was originally published in Web 2.0 journal.

Just a few days ago I wrote an article about the Amazon Web Services stack, in which I praised Amazon’s vision and ability to deliver the elegant, generic web services platform of the future. At the end of the article I mentioned that it would be difficult for Google and Microsoft to catch up. I could still be right, but tonight Google made it clear that they are going to be in this race.

The Google Base API is like Amazon S3 on steroids. In addition to pure storage capability, this API comes with the concept of RSS-based structured data types, the ability to automatically index and search the data, as well as storing and publishing things via RSS. It is an interesting, unexpected move, since the service seems to mash storage and publishing together.

Apples become Oranges?

So how do we go about comparing these services? There are several angles and criteria that might lead us to different conclusions. As a software engineer, I am subconsciously drawn to Amazon’s simple and canonical approach. Each service has a very basic, minimalistic API and is focused on accomplishing a very specific task. For example, Amazon S3 just stores the data and allows it to be fetched, but is not concerned with things like RSS. When the entire stack of services is aggregated together, you get a powerful playground where you can pick and choose what you need to address your specific needs.

On the other hand, at this point everyone acknowledges that RSS has become a basic building block of the web. So you cannot help but wonder if it makes sense to have it wired right into your data store. While I am not quite ready to make this leap myself, I can see how a lot of people would. My rule of thumb is that technologies, unfortunately, come and go, so I would not bet everything on RSS as it is right now. But time, of course, will tell.

Hello and welcome to the world of Google semantics

The basic mechanics of posting and managing objects is similar to Amazon S3. You can read my detailed article about this service to learn about the rudimentary operations of storing and retrieving items.

Let's zoom in now on some of the exciting new things that come with Google Base. The first feature of note is the introduction of attributes and types. This is very welcome, because today’s web is not a random collection of words and letters. We talk about friends, books, music, politics, housing – in short, we discuss life, where things naturally have meaning and semantics. Google introduces an attribute/type system with a set of pre-defined attributes and types, which can be augmented by developers. This is an excellent move, since it encourages a common sense standard while leaving room for flexibility and exceptions.

The system leverages the standard RSS attributes such as title and item but, because of its XML-based nature, does not play with microformats. This is not necessarily bad, since an XML-based annotation system is at least as powerful as the microformats languages. In fact, from my point of view, even this system has a few loose ends. For example, a review attribute may contain text to indicate that it is a review of a movie, a book or a restaurant. This is not going to be sufficient for situations when the actual underlying object needs to be identified exactly. However, since the defined attribute/type system is extensible, these sorts of things can be corrected in the future.

Search is still the king

Google is the undisputed master of the search domain. All Google services leverage the success of this Google granddaddy. The new Google Base API is no exception. This is one of the features which puts S3 behind at this point. The ability to slice and dice the stored information each and every way is absolutely essential. What Google is doing for you automatically is creating a gigantic set of indices for all the things that you publish, so that anything can be found very, very quickly.

The query language is powerful. It even allows comparison queries for types that are declared as numbers; here is an example of a query:

[item type:products] (ipod | "mp3 player") [price <= 150.0 USD]

Personally, I would have liked this to be more RESTful, but I guess this is shorter and more powerful. For those of you who miss your programming languages class, here is the BNF of the grammar.
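As a rough illustration of how a client might assemble a query of that shape, here is a small Python helper. It is a sketch only: `quote_term` and `build_query` are hypothetical names, and the helper simply mirrors the bracket, pipe and quoting layout of the single example above rather than implementing the full grammar.

```python
from typing import Iterable


def quote_term(term: str) -> str:
    """Wrap multi-word terms in double quotes, as in the example query."""
    return f'"{term}"' if " " in term else term


def build_query(item_type: str, any_of: Iterable[str],
                max_price: float, currency: str = "USD") -> str:
    """Compose a query string shaped like the example above."""
    keywords = " | ".join(quote_term(t) for t in any_of)
    return (f"[item type:{item_type}] ({keywords}) "
            f"[price <= {max_price} {currency}]")


print(build_query("products", ["ipod", "mp3 player"], 150.0))
```

The resulting string would still need to be URL-encoded before being sent as a request parameter.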

The query results can be paginated much like in S3. The difference is that, unlike S3, this paging works on indices instead of prefixes. These differences are due to the specifics of Google’s vs. Amazon’s implementation and do not make much difference to the end user.

Batch processing

Like search, this feature is noticeably absent from the S3 repertoire. The ability to execute multiple fetches is invaluable, since it enables, for example, generating a web page based on certain criteria. Specifically, with S3, to get the list of latest items posted by a user, we need to first query the keys and then fetch the item for each key in a separate request. This is unacceptably slow, especially when it comes to generating a web page on demand. So Google definitely did the right thing by having the batch mode built right in.
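The cost difference is easy to see by counting round trips. The sketch below uses an in-memory stand-in, not either real service: `CountingStore` and its `list_keys`, `get` and `get_many` methods are hypothetical, with `get_many` playing the role of a batch call.

```python
from typing import Dict, List


class CountingStore:
    """In-memory key/value store that counts simulated round trips."""
    def __init__(self, items: Dict[str, bytes]) -> None:
        self._items = dict(items)
        self.requests = 0

    def list_keys(self, prefix: str) -> List[str]:
        self.requests += 1
        return sorted(k for k in self._items if k.startswith(prefix))

    def get(self, key: str) -> bytes:
        self.requests += 1
        return self._items[key]

    def get_many(self, keys: List[str]) -> Dict[str, bytes]:
        """Hypothetical batch call: many items, one round trip."""
        self.requests += 1
        return {k: self._items[k] for k in keys}


data = {f"alice/{i}": b"post" for i in range(20)}

# List the keys, then fetch each item separately.
one_by_one = CountingStore(data)
for key in one_by_one.list_keys("alice/"):
    one_by_one.get(key)

# List the keys, then fetch everything in a single batch.
batched = CountingStore(data)
batched.get_many(batched.list_keys("alice/"))

print(one_by_one.requests, batched.requests)  # 21 2
```

With 20 items the one-by-one approach needs 21 requests against 2 for the batched one, and the gap grows linearly with the number of items on the page.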

Privacy differences

Similar to S3, there is a concept of privacy, but it is not quite the same. In S3, there is a simple way of marking each item as public or private for both read and write. Google’s approach seems to be different. First, there is a distinction between an item and a snippet. Here is Google’s definition:

- /feeds/snippets : for the general public and provides a slightly shortened description

- /feeds/items : a private customer-specific feed for customers to insert, update, delete, and query their own data. This feed requires authentication.

I find this pretty confusing, particularly because of the way privacy is defined. Here is the definition:

   You can control whether attributes are visible by specifying the XML attribute access="private".

So it sounds like you cannot make an entire entry private? Also, does this apply to both snippet and item attributes? It is not apparent to me from the provided description.
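One way to read the definition is that privacy is decided element by element. The Python sketch below shows that reading with the standard `xml.etree.ElementTree` module. The entry and its element names (`title`, `price`, `phone`) are made up for illustration, and the filtering rule is my interpretation of the quoted sentence, not documented Google Base behavior.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry; the element names are made up for this sketch.
ENTRY = """
<entry>
  <title>Used bicycle</title>
  <price>80 USD</price>
  <phone access="private">555-0100</phone>
</entry>
"""


def public_view(entry_xml: str) -> dict:
    """Return only the child elements not marked access="private"."""
    root = ET.fromstring(entry_xml)
    return {
        child.tag: (child.text or "").strip()
        for child in root
        if child.get("access") != "private"
    }


print(public_view(ENTRY))
```

Under this reading, hiding a whole entry would mean marking every one of its attributes private, which is exactly the awkwardness raised above.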

What about performance?

That's a good question that needs to be answered soon. Performance benchmarks on these services would be a very valuable addition to the feature-by-feature comparison, so we hope to see them in the near future.

Coming soon…

So with this cat out of the bag, we can make a few predictions. First, we will soon be seeing Google UI in many Google products, particularly Google Reader, that is going to render these extended RSS feeds in a nice way. They will probably look something like the bluemarks that we developed at adaptiveblue. The big difference is that we had to embed the display information in the form of a fairly verbose chunk of HTML. Google will enjoy the luxury of styling these feeds using elegant, client-side stylesheets.

Another likely thing is that Google is going to promote this new format, and will work on getting other products and services to embrace it. I’d like to hear how this plays with microformats and generic HTML pages, because having more different formats for capturing semantics is not taking us any closer to the semantic web.

Finally, we can bet on seeing more of these sorts of services, probably from Microsoft, maybe from Yahoo! and definitely from small startups that are going to jump in with innovation and twists. Different approaches and APIs are likely to create a public debate on the topic.

The debate, competition and creativity are great for us developers. We get to enjoy the fight, but more importantly we get to jump in and voice our opinions and concerns. Not only do we get to use these technologies, we also get a chance to influence how they evolve. This is very important, and we should not miss the opportunity. I am sure these companies are willing to listen and are looking for your feedback, so drop them a line.


How Amazon S3 is going to change the world, 5GB at a time

Posted on July 11, 2006. Filed under: adaptiveblue, alexiskold, amazon, amazon s3, amazon web services, blueorganizer, virtual storage |

How Amazon S3 is going to change the world was originally published in Web 2.0 Journal.

We are observing the transformation of the web from an ecosystem into an operating system. Building blocks such as websites, blogs, web services, podcasts and RSS are coming together and giving rise to a new computing platform. The web operating system is emerging and it is bigger than the sum of its parts.

Remember your operating systems class, when you learned that every operating system has a handful of fundamental concepts such as storage, virtual memory and scheduling? The new web is no exception. However, since the Internet is a gigantic network of computers working in parallel, the basic operating system concepts take on a different shape.

For example, when you try to save a file on your computer, there is a (rare) possibility that the disk is full, and the write will fail. But with the new web operating system, this does not have to be the case. If the disk of one computer becomes full, the web operating system can switch and store the file on another computer. So on the web, you practically never run out of space.

Amazon S3 – the new virtual storage service from Amazon

Virtual storage has been in the news for some time now. Dion Hinchcliffe has written a survey of virtual storage providers in his recent post. He particularly commended Amazon S3 (Simple Storage Service) for its innovative API.

In essence, Amazon S3 offers developers a huge hashtable. The minimalistic API, available in both SOAP and REST, is focused on basic management of objects – write, read and delete. By default, the service works over HTTP and supports storage of objects up to 5 gigabytes in size. There is also support for BitTorrent and a plan to add other protocols in the future. To use the service, you have to have an Amazon Web Services account.
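The hashtable analogy can be made concrete with a toy model. This is only an in-memory sketch of the write, read and delete operations; the class name `ToyObjectStore` and its methods are made up for illustration and are not the S3 API. Treating 5 gigabytes as 5 × 1024³ bytes is also an assumption.

```python
class ToyObjectStore:
    """In-memory stand-in for a key-value object store (hypothetical, not the S3 API)."""

    # Assumption: "5 gigabytes" is read here as 5 * 1024**3 bytes.
    MAX_OBJECT_BYTES = 5 * 1024 ** 3

    def __init__(self):
        # Maps (bucket, key) pairs to the stored bytes, like one giant hashtable.
        self._objects = {}

    def put(self, bucket, key, data):
        """Write (or overwrite) the object stored under bucket/key."""
        if len(data) > self.MAX_OBJECT_BYTES:
            raise ValueError("object exceeds the per-object size limit")
        self._objects[(bucket, key)] = bytes(data)

    def get(self, bucket, key):
        """Read an object; raises KeyError if nothing is stored under bucket/key."""
        return self._objects[(bucket, key)]

    def delete(self, bucket, key):
        """Delete an object; deleting a missing key is a no-op in this sketch."""
        self._objects.pop((bucket, key), None)


store = ToyObjectStore()
store.put("my-bucket", "greeting.txt", b"hello")
data = store.get("my-bucket", "greeting.txt")
store.delete("my-bucket", "greeting.txt")
```

The real service adds authentication, access control and durability on top of this, but the mental model of "one key in, one blob out" is the same.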

Amazon has done a very thorough job documenting and supporting the service. The resources page contains a wealth of useful information to get you going, most notably the API and the user forums. There are also code samples available in various languages that illustrate how to use the Amazon S3 API.

Storing and retrieving objects

The objects in an S3 account are placed into buckets. Each account is allowed to have up to 100 buckets, and each bucket name has to be unique across all S3 users.


Figure 1: Example from Amazon S3 API shows CREATE BUCKET request

Each object inside a bucket has to have a unique UTF-8 compliant key assigned by the developer. Since S3 imposes no specific key structure, developers are free to do what best suits their needs. The documentation hints at using slashes to create a directory-like structure, but does not insist on it. The absence of a prescribed key structure and of a directory interface in the API is not a limitation but added flexibility: people’s needs differ, and implementing directory-like storage is just a matter of following naming conventions in the code.
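As a rough illustration of the naming-convention idea, here is a minimal sketch that treats slashes in flat keys as directory separators. The function `list_directory` and the sample keys are hypothetical; the code works on a plain list of key strings and does not call S3.

```python
def list_directory(keys, prefix="", delimiter="/"):
    """Return (files, subdirs) found directly under prefix in a flat list of keys."""
    files = []
    subdirs = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # The key continues past another delimiter, so it lives in a "subdirectory".
            subdirs.add(prefix + head + delimiter)
        else:
            # No further delimiter: the key sits directly under the prefix.
            files.append(key)
    return files, sorted(subdirs)


keys = [
    "photos/2006/cat.jpg",
    "photos/2006/dog.jpg",
    "photos/readme.txt",
    "music/song.mp3",
]
files, subdirs = list_directory(keys, prefix="photos/")
# files == ["photos/readme.txt"], subdirs == ["photos/2006/"]
```

Nothing in the store knows about directories; the structure exists only in how the client chooses and interprets key names.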


Figure 2: Example from Amazon S3 API shows GET BUCKET request

The API also allows the developer to list all the keys in a particular bucket. This is implemented using a flavor of the Iterator pattern and the concept of a marker. With the first query no marker is supplied. If the bucket contains more objects than specified in the max-keys parameter, then a marker for the next starting point is returned. To obtain the next set of results, the marker is passed back in the subsequent request.
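The marker mechanism can be sketched without touching the network. In the example below, `list_keys` stands in for a single listing request against a sorted list of keys, and `iterate_all_keys` is the client-side loop; both names are hypothetical, and using the last key of a page as the marker is an assumption about how such a service might behave rather than a transcription of the API.

```python
import bisect


def list_keys(bucket_keys, marker=None, max_keys=1000):
    """Simulate one listing request; bucket_keys must be sorted.

    Returns (page, next_marker), where next_marker is None on the last page.
    """
    # Resume strictly after the marker; start from the beginning if none is given.
    start = 0 if marker is None else bisect.bisect_right(bucket_keys, marker)
    page = bucket_keys[start:start + max_keys]
    more = start + max_keys < len(bucket_keys)
    next_marker = page[-1] if more and page else None
    return page, next_marker


def iterate_all_keys(bucket_keys, max_keys=1000):
    """Yield every key by feeding each returned marker into the next request."""
    marker = None
    while True:
        page, marker = list_keys(bucket_keys, marker, max_keys)
        yield from page
        if marker is None:
            break


all_keys = sorted(f"key-{i:03d}" for i in range(7))
collected = list(iterate_all_keys(all_keys, max_keys=3))
```

With seven keys and `max_keys=3`, the loop issues three requests and `collected` ends up equal to `all_keys`.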


Figure 3: Amazon S3 diagram

Since it might be expensive to fetch the entire object from S3, the API allows you to associate metadata with each object. The metadata is returned together with the key when a GET BUCKET operation is requested. This functionality is particularly handy for storing and looking up information about large media files.

S3 Security

S3 has a built-in security model for both connecting to the service and setting the access policy for individual objects and buckets. To access the service, each developer is required to obtain an Access Key ID and a Secret Key. The Access Key ID, the same ID used for all Amazon Web Services, has to be passed in with every request. The Secret Key is never sent; instead it is used to compute a keyed hash (an HMAC-SHA1 signature) over pieces of the request in order to prove the requestor’s identity.
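To give a feel for how the Secret Key proves identity, here is a minimal sketch of signing a REST request with a keyed hash (HMAC-SHA1) and Base64-encoding the result. The layout of the string being signed follows my understanding of the S3 REST authentication scheme and should be checked against the official documentation; the function names, the credentials and the request values below are made-up placeholders.

```python
import base64
import hashlib
import hmac


def sign_request(secret_key, verb, resource, date,
                 content_md5="", content_type="", amz_headers=""):
    """Return a Base64-encoded HMAC-SHA1 signature over selected parts of the request.

    amz_headers is expected to be already canonicalized (empty if there are none).
    """
    string_to_sign = (
        "\n".join([verb, content_md5, content_type, date]) + "\n"
        + amz_headers + resource
    )
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key_id, signature):
    """Combine the public Access Key ID with the signature into one header value."""
    return f"AWS {access_key_id}:{signature}"


# Placeholder credentials and request, for illustration only.
signature = sign_request(
    secret_key="example-secret-key",
    verb="GET",
    resource="/my-bucket/photos/cat.jpg",
    date="Tue, 11 Jul 2006 12:00:00 GMT",
)
header = authorization_header("EXAMPLEACCESSKEYID", signature)
```

The service can recompute the same signature from the request and its own copy of the Secret Key, so the key itself never travels over the wire. A single wrong newline in the signed string changes the signature completely, which goes some way toward explaining the authentication questions on the forums.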

The authentication is somewhat involved, and there are quite a few posts on the forums about authentication problems. The Amazon API documentation has a page dedicated to it, which you can see here. In addition, there are code samples in various languages which illustrate the correct usage of authentication. If you do not know much about cryptography, look for the code sample in your language – it will save you hours of debugging.

In addition to authentication, the S3 API comes with an Access Control List (ACL) for every bucket and every object. The ACL can be set via a separate request or at the time when the bucket or object is created. Here is the set of currently supported ACL choices.


Figure 4: Current S3 ACL choices
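As a rough sketch of how such policies might be evaluated, the example below models four policy names as I understand S3’s predefined ACLs (private, public-read, public-read-write, authenticated-read). It is not a transcription of the figure, and the `is_allowed` helper and its rules are hypothetical simplifications.

```python
# Permissions granted to non-owners under each policy (assumed names and semantics).
ACL_POLICIES = {
    "private": {"anonymous": set(), "authenticated": set()},
    "public-read": {"anonymous": {"READ"}, "authenticated": {"READ"}},
    "public-read-write": {
        "anonymous": {"READ", "WRITE"},
        "authenticated": {"READ", "WRITE"},
    },
    "authenticated-read": {"anonymous": set(), "authenticated": {"READ"}},
}


def is_allowed(policy, requester, owner, permission):
    """Return True if requester may perform permission ("READ" or "WRITE").

    requester is None for an anonymous request; the owner is always allowed.
    """
    if requester is not None and requester == owner:
        return True
    group = "anonymous" if requester is None else "authenticated"
    return permission in ACL_POLICIES[policy][group]


can_anyone_read = is_allowed("public-read", None, "alice", "READ")
can_anyone_write = is_allowed("public-read", None, "alice", "WRITE")
```

Here `can_anyone_read` is True and `can_anyone_write` is False: a publicly readable object stays writable only by its owner.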

Using AJAX to access S3

S3 opens an intriguing possibility of dramatically simplifying the back end for some applications. You can envision an architecture in which an application simply consists of a client that communicates directly with S3. This client can be a desktop application, an applet or an Ajax application embedded in the browser. Such a model is not appropriate for enterprise systems that require complex data processing, transactions and caching, but it could work well for things like Google Sync or del.icio.us. Simplicity, instant scalability, a built-in security model and Amazon’s reputation make a strong case.

If you are writing a Firefox extension or an Ajax-based web application, you can either write your own S3 wrapper in JavaScript or use S3Ajax, developed by Les Orchard. S3Ajax basically mimics the S3 API and takes away the pain of formatting the raw requests and signing them.

It appears that the developer has plans to build this out further. It would be good to have a higher level of abstraction built in, as well as a way to loop through the objects in a bucket and the ability to fetch multiple objects concurrently.

This all sounds very sweet, but what about the performance?

Needless to say, performance is one of the top questions on the Amazon S3 forum. Amazon does not provide any specific data and does not have an SLA in place. But the design requirements and the design principles from the S3 creators show a strong focus on performance and scalability.

There are a few benchmarks that independent developers have posted on the forums. I’ve seen these benchmarks improve since S3 launched. The latest results indicate low latency. A benchmark on March 17 said: “Putting a 2 MB mp3 took 0.8s, retrieving it took 1.085s — quite fast, quite responsive”.

So who is using Amazon S3 right now?

Even though the service launched just a few months ago, there are already a number of companies leveraging it for a wide range of purposes. In the personal online storage space we note ElephantDrive (Windows), JungleDisk (Windows, Linux, Mac) and Filicio.us (Web-based). From the discussion forums it is clear that a few companies are planning to use S3 to store media files, but we cannot tell who these companies are. At adaptiveblue we are using S3 to store the bluemarks of our users’ favorite books, movies and music. And if you want to get a feel for what other things are possible, take a look at this top10 list.

Conclusion

Amazon S3 is an innovative, exciting new service that is going to change the way we do computing on the web. Michael Arrington of TechCrunch said this in one of his posts:

“S3 provides a terrific opportunity for startups with great ideas for a storage user interface to avoid building a back end storage infrastructure. Amazon is offering extremely low pricing and a very dependable infrastructure. For some people, S3 will allow them to launch a service that they otherwise couldn’t have built.”

If you have not done so already, take a look at S3 and see for yourself. A good place to start playing with it is jSh3ll, which offers a shell-like command-line interface to the service. Then join the Amazon Web Services program and start coding. The possibilities are endless!


