Programmatically detecting the type / platform of Amazon Machine Images
Yesterday I was talking with one of the Libcloud users on our IRC channel. The user was trying to figure out if there is a programmatic way to detect the type of the image (also called a platform) used by an EC2 instance (e.g. Linux, RHEL, Windows, Windows with SQL Server, etc.).
This information is important because EC2 instance pricing depends on the type of the image used (more on that below).
I had already looked into this in the past while trying to extend the pricing information which is available in Libcloud. I didn’t have much luck back then, but I decided to look into it again and dig deeper this time.
After a lot of research and poking at the API, it turned out that there still seems to be no reliable, programmatic way to determine the image type (if I missed something, please let me know).
In this post I’m going to have a quick look at how EC2 instance pricing works and at some of the less than ideal approaches which can be used to determine the image type.
How EC2 instance pricing works
First, let’s have a quick look at how EC2 instance pricing works.
Compared to a lot of other cloud providers, EC2 pricing is very complex and depends on multiple factors:
- Region (us-east-1, us-west-1, eu-west-1, …)
- Instance type (t1.micro, m1.small, m1.xlarge, …)
- Image type (Linux, RHEL, SLES, Windows, Windows with SQL Server standard, …)
- Is the instance EBS optimized
- Is the instance on-demand, reserved or spot
- Volume discounts
- Data transfer
- Other resources associated with this instance (e.g. EBS volumes)
If you want to calculate accurate instance pricing, you need to take into account all of the factors mentioned above.
Amazon EC2 pricing information
Amazon offers all the pricing information in a human-readable format on their pricing page, but they don’t offer a documented API which could be used to consume this information programmatically.
Luckily, the pricing page reads JSON files (e.g. http://aws.amazon.com/ec2/pricing/json/linux-od.json) which can also be consumed programmatically.
Those JSON files are undocumented, and the bad thing about any undocumented feature is that it can be changed or removed at any time without prior notice.
Sadly, that’s the best we’ve got so far, so we need to stick with it for now.
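Since the files are undocumented, any code consuming them should fail loudly when the layout changes. Here is a minimal sketch of that idea. Note that the nested layout it expects (config, regions, instanceTypes, sizes, valueColumns) is only my assumption of how such a file might look, not a documented schema, and flatten_pricing is a hypothetical helper name, not part of Libcloud.

```python
import json


def flatten_pricing(raw_text):
    """Flatten a pricing document into {(region, size, column): price}.

    ASSUMED layout (hypothetical, the real files are undocumented):
    {"config": {"regions": [{"region": ..., "instanceTypes": [
        {"sizes": [{"size": ..., "valueColumns": [
            {"name": ..., "prices": {"USD": ...}}]}]}]}]}}
    """
    try:
        document = json.loads(raw_text)
        prices = {}
        for region in document["config"]["regions"]:
            for instance_type in region["instanceTypes"]:
                for size in instance_type["sizes"]:
                    for column in size["valueColumns"]:
                        key = (region["region"], size["size"], column["name"])
                        prices[key] = float(column["prices"]["USD"])
    except (ValueError, KeyError, TypeError) as exc:
        # Anything unexpected most likely means the file format changed.
        raise ValueError("unexpected pricing file layout: %r" % (exc,))
    return prices
```

The function works on the already-downloaded text, so the fetching part (and any caching of the last known good copy) can be handled separately.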
Programmatically detecting the image type / platform
I’ve spent a bunch of time researching and poking at the API and the web interface, but I had no luck finding an API method which would return that information.
The DescribeImages API method does return a platform attribute, but only for Windows-based images. This means you still need to use a different approach to detect RHEL, SLES and the other types of Windows images (e.g. Windows with SQL Server).
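In code, this means a missing platform attribute has to be treated as "unknown" and not as "Linux". A minimal sketch, assuming the image is represented as a plain dict with an optional "platform" key (a hypothetical representation, client libraries expose this attribute in their own way):

```python
def platform_from_attribute(image):
    """Return "windows" if the image reports it, otherwise None (unknown).

    `image` is ASSUMED to be a dict with an optional "platform" key.
    """
    value = (image.get("platform") or "").strip().lower()
    # The attribute is only set for Windows images, so anything else
    # tells us nothing: it could be Linux, RHEL, SLES, ...
    return "windows" if value == "windows" else None
```

Even a "windows" result doesn’t tell you which type of Windows image it is, so it only narrows things down.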
The EC2 API has some undocumented features, like the undocumented max-instances, max-elastic-ips and vpc-max-elastic-ips values for the AttributeName filter used by the DescribeAccountAttributes API method. Because of that, I also tried a bunch of undocumented things and filter values, but I had no luck retrieving a platform attribute for all the images or retrieving only RHEL-based images.
The interesting thing is that the web interface does show an image type / platform, but it seems to use a private method to obtain this information.
1. Inferring platform from the image details
Each image has a name, a description and a bunch of other attributes associated with it.
This information can be used to infer the platform or to build a static list which maps image IDs to platforms.
Inferring platform from the name and description should work reasonably well for the standard images, but it breaks down for private or copied images with custom names and descriptions.
On the other hand, the problem with a static list approach is that it doesn’t scale and it’s time consuming and error prone to keep it up to date.
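Both ideas can be combined: consult a static list first and fall back to keyword matching on the name and description. The sketch below is just an illustration. The keyword rules, the platform labels and the infer_platform helper are all my own assumptions, not something the EC2 API or Libcloud provides.

```python
import re

# Hypothetical rules: (platform label, words that must all be present).
# Order matters, the most specific rule comes first.
KEYWORD_RULES = [
    ("windows-sql-server", ["windows", "sql"]),
    ("windows", ["windows"]),
    ("rhel", ["rhel"]),
    ("rhel", ["red", "hat"]),
    ("sles", ["sles"]),
    ("sles", ["suse"]),
]


def infer_platform(image_id, name="", description="", known_images=None):
    """Guess the platform of an image, or return None if nothing matches."""
    # 1. Static list: image id -> platform (has to be maintained by hand).
    if known_images and image_id in known_images:
        return known_images[image_id]
    # 2. Keyword heuristic on the lower-cased name and description.
    text = ("%s %s" % (name or "", description or "")).lower()
    words = set(re.findall(r"[a-z0-9]+", text))
    for platform, required in KEYWORD_RULES:
        if all(word in words for word in required):
            return platform
    return None
```

Returning None for unmatched images is deliberate: with a custom name and description there is simply nothing to go on, and silently assuming Linux would produce wrong prices.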
2. Scraping The Cloud Market website
The Cloud Market website provides details (including platform / image type) for every publicly available Amazon Machine Image.
This approach basically just builds on the static list approach, but instead of putting the burden of keeping this list up to date on you, it puts it on the Cloud Market team.
The Cloud Market website provides an API, but you can only use it to retrieve details for the images you own. This means that to retrieve the platform for any other image, you need to scrape the website, which again is very hacky and far from ideal.
Conclusion
As you can see, all of the approaches I have described are hacky and far from ideal, but sadly that’s the best we have so far.
Let’s just hope Amazon will get their act together and finally provide an official API for this in the near future.