Web hosting

Tuesday, July 26, 2011

GroupWare Evaluation

This project was born from the fact that the company wanted an inexpensive option (or options) for intra-company communication. To this end we looked at the Open Source webmail, groupware, and other solutions that were currently available. Note that these results are from an evaluation of the primary options that were available about 1.5 years before the date of publication of this post and may or may not be applicable to your particular situation or needs. Please do your own due diligence.

Seems quite responsive but not sure of the specifications of their evaluation box. Standard/Advanced GUIs that are accessible at the touch of a button. When installed on local (desktop) class hardware was not as responsive as expected. Checked minimum/recommended specifications for hardware and project does not seem feasible from a financial perspective any longer. Even the minimum specifications on their website seem to be far in excess of the financial resources available to this company. Great desktop client although it does seem to be resource intensive on our Dell GX260/270/280's. Obviously one of the first real Open Source alternatives as well so has a strong development history and community behind it.

Has most of the capabilities that we require but doesn't integrate particularly well with Outlook. Obviously there is the learning curve problem with regards to the web based interface. What we currently use but based on user feedback it could be better. Latest version has come quite a long way. Surprisingly, synchronisation with your PDA is a realistic/simple option now.

Only took a very cursory look at this. LiveCD in German. Interface was unwelcoming and highly doubt that users will be inclined to use it. Did not continue with evaluation. Moreover, project seems to have become dormant. Last post on homepage for website seems to be from 2009. Does provide integration with various MUA's though.

Basically, this is what our users want but due to financial constraints we are unable to pursue it as our network is composed of a number of Linux servers as well as a small number of Windows based terminal servers which cannot be re-deployed to meet GroupWare needs. Moreover, the price of properly specified hardware for a Windows based server that is able to run Exchange does not fit into our budget. We calculate that the project may range anywhere between five and six figures depending on whether or not we factor in future expansion needs.


The fact that this software is an atrociously good representation of Microsoft Exchange has done it many favours. Basically works on top of existing Linux stack with a web based interface on top. Just like Exchange though it uses MAPI error codes which can be almost impossible to decipher without a proper reference source. During evaluation we discovered that some services may need to be restarted to ensure proper/continuous operation without intervention but we believe that this may have more to do with mis-configuration since there don't seem to be many other users online experiencing this same problem at this stage. Strong integration with Outlook but does seem to have a few problems with offline caching mode which we later decided to turn off (may be fixed by the time of publication of this post). Pricing is reasonable (four to five figures).

Web interface reminiscent of Exchange. Pricing structure is reasonable at first glance (four figures). Has Outlook plugin to allow for better integration. A possible contender but was not as mature as Zarafa at the time of evaluation.


- as usual thanks to all of the individuals and groups who purchase and use my goods and services

Introduction to Modern Finance

While finance/trading/commerce has existed for several thousand years, never have the number and type of financial options available been so extravagant and exotic. Below is a reading list of the basic asset classes and some of the terms that you will commonly find in today's financial markets. This list will be updated over time.


Friday, July 22, 2011

WAN Acceleration on a Budget

Once upon a time computers were completely independent. With the advent of networking (especially the Internet), latency, bandwidth, and redundancy are playing increasingly important roles. This is where WAN optimisation technology comes in. Depending on the deployment it can be configured to act as a proxy and/or a reverse proxy, or even in a multi-node configuration that is used to accelerate LAN (most likely between multiple sites/nodes in a VPN) as well as WAN traffic. There are obviously many different vendors attempting to sell their wares, which include hardware appliances, virtualised appliances, and complete software solutions, whether installed on a dedicated server and/or your local desktop. Based on what I've seen they operate on multiple levels, some of which are protocol dependent and some of which are not. Below are some of the more inexpensive options out there along with premium options further down.

Thursday, July 21, 2011

Inexpensive NAS/SAN Device Evaluation Results

These are the results of experimentation using both servers and desktop hardware using software based NAS/SAN solutions. The servers were single/dual chip Xeon systems with 4GB RAM and 7.2K/10K SAS/SATA drives and GigE while the desktop systems (which were always the location of the NAS/SAN software installation) were either a Celeron with 512MB/1GB/2GB and 7.2K SATA drives or an Intel E8400 with 4GB RAM and 7.2K SATA drives. GigE was used whenever possible although we had to resort to using 10/100 Ethernet sometimes due to resource constraints. OpenFiler as well as FreeNAS were obviously tested. There is no guarantee that these results are valid across the board due to the low spec hardware being used and are as much for personal interest as for experimental validation. Moreover, the software versions being used are probably about 1.5 years old from the date of publication of this post. Note that Windows based NAS/SAN solutions are also available such as from StarWind who make both target as well as initiator software for Windows Server based operating systems.

OpenFiler stats in RAID1 configuration using NTFS with default blocks over iSCSI over GigE on server-d3. Results are very consistent.
~50-60MB/s read via iSCSI
~50-60MB/s write via iSCSI

[root@server-c1 test]# smbclient '//'
WARNING: The "printer admin" option is deprecated
Domain=[OPENFILER.OPENDO] OS=[Unix] Server=[Samba 3.2.6]
smb: \> mget *
Get file CentOS-5.3-i386-bin-1of6.iso? yes
getting file \CentOS-5.3-i386-bin-1of6.iso of size 654186496 as
CentOS-5.3-i386-bin-1of6.iso (19788.6 kb/s) (average 19788.6 kb/s)
smb: \> mget *
Get file CentOS-5.3-i386-bin-4of6.iso? y
getting file \CentOS-5.3-i386-bin-4of6.iso of size 662644736 as
CentOS-5.3-i386-bin-4of6.iso (21133.0 kb/s) (average 20443.0 kb/s)
Get file CentOS-5.3-i386-bin-3of6.iso? y
getting file \CentOS-5.3-i386-bin-3of6.iso of size 665085952 as
CentOS-5.3-i386-bin-3of6.iso (19755.4 kb/s) (average 20207.0 kb/s)
Get file CentOS-5.3-i386-bin-5of6.iso? y
getting file \CentOS-5.3-i386-bin-5of6.iso of size 668745728 as
CentOS-5.3-i386-bin-5of6.iso (20591.9 kb/s) (average 20302.7 kb/s)

OpenFiler stats in RAID1 configuration using NTFS with default blocks over iSCSI over GigE. Results are extremely variable depending on the file/s being tested.
~60-110MB/s read via iSCSI
~10-30MB/s write via iSCSI

OpenFiler stats in RAID0 configuration using NTFS over iSCSI over 10/100 Fast Ethernet.
10.4MB/s via iSCSI

OpenFiler stats in RAID0 configuration using Ext3 over SMB over 10/100 Fast Ethernet.

smb: \> mput ENDIAN*
9150.8 kb/s)
smb: \> mput openfiler*
Put file openfiler-2.3-x86-disc1.iso? yes
putting file openfiler-2.3-x86-disc1.iso as
\openfiler-2.3-x86-disc1.iso (8907.6 kb/s) (average 8990.8 kb/s)
smb: \> mput elastix*
Put file Elastix-1.5.2-stable-i386-bin-31mar2009.iso? yes
putting file Elastix-1.5.2-stable-i386-bin-31mar2009.iso as
\Elastix-1.5.2-stable-i386-bin-31mar2009.iso (9147.2 kb/s) (average
9077.9 kb/s)

OpenFiler stats in RAID1 configuration using Ext3 over SMB over 10/100 Fast Ethernet.

[root@server-m ISOS]# smbclient '//'
Domain=[OPENFILER.OPENDO] OS=[Unix] Server=[Samba 3.2.6]
smb: \> dir
. D 0 Wed Jun 10 16:54:56 2009
.. D 0 Wed Jun 10 16:54:56 2009

44010 blocks of size 16777216. 41762 blocks available
smb: \> mput ENDIAN*
7962.9 kb/s)
smb: \> mput openfiler*
Put file openfiler-2.3-x86-disc1.iso? y
putting file openfiler-2.3-x86-disc1.iso as
\openfiler-2.3-x86-disc1.iso (8888.7 kb/s) (average 8542.7 kb/s)
smb: \> mput elastix*
Put file Elastix-1.5.2-stable-i386-bin-31mar2009.iso? y
putting file Elastix-1.5.2-stable-i386-bin-31mar2009.iso as
\Elastix-1.5.2-stable-i386-bin-31mar2009.iso (8855.3 kb/s) (average
8715.4 kb/s)

FreeNAS stats in a RAID 0 formatted configuration using 10/100 Fast Ethernet with UFS using NFS.

-rw-r--r-- 1 root root 180313088 May 9 05:01

[root@localhost Hyperic]# time `cp hyperic-hq-installer-4.1.2-win32.msi /mnt/freenas/`

real 0m33.350s
user 0m0.034s
sys 0m0.503s

-rwxr----- 1 root root 482002540 Apr 22 22:57

[root@localhost ~]# time `cp zcs-5.0.14_GA_2850.RHEL5.20090303142201.tgz /mnt/freenas/`

real 0m48.765s
user 0m0.081s
sys 0m1.566s

-rwxr--r-- 1 root root 19509248 May 5 09:48 WindowsXP-KB936929-SP3-x86-ENU.exe

[root@localhost ~]# time `cp WindowsXP-KB936929-SP3-x86-ENU.exe /mnt/freenas/`

real 0m3.386s
user 0m0.005s
sys 0m0.052s

Definitely hit a bottleneck somewhere here. Dead flat line on transfer test now...
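
To put the three NFS copies above on a common scale, the file sizes and `real` times can be converted to throughput and compared with the nominal 100 Mbit/s line rate of Fast Ethernet (12.5 MB/s before any protocol overhead). This is only a rough sketch: the helper name is made up for illustration, and the first two sizes are paired with the copies that follow them on the assumption that each listing belongs to the file copied next.

```python
# Convert the sizes and `real` times from the copies above into MB/s.
FAST_ETHERNET_MBPS = 100  # nominal link speed in megabits per second


def throughput_mb_s(size_bytes: int, seconds: float) -> float:
    """Return throughput in decimal megabytes per second."""
    return size_bytes / seconds / 1_000_000


# (file name, size in bytes, wall-clock seconds); the pairing of the first
# two sizes with their file names is an assumption, see the note above.
copies = [
    ("hyperic-hq-installer-4.1.2-win32.msi", 180313088, 33.350),
    ("zcs-5.0.14_GA_2850.RHEL5.20090303142201.tgz", 482002540, 48.765),
    ("WindowsXP-KB936929-SP3-x86-ENU.exe", 19509248, 3.386),
]

line_rate = FAST_ETHERNET_MBPS / 8  # 12.5 MB/s before protocol overhead

for name, size, secs in copies:
    rate = throughput_mb_s(size, secs)
    print(f"{name}: {rate:.2f} MB/s ({rate / line_rate:.0%} of line rate)")
```

Small files finish before the transfer settles, so the largest copy is probably the most representative of sustained throughput.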

FreeNAS stats in a RAID 0 formatted configuration using 10/100 Fast Ethernet with UFS.
11MB/s FTP
8.6MB/s SMB

FreeNAS stats in a RAID 0 formatted configuration using 10/100 Fast Ethernet with 2GB RAM/64 KByte Blocks and then remounted as an iSCSI target.
9.2MB/s iSCSI via HD-Tune

FreeNAS stats in a RAID 0 Configuration using 10/100 Fast Ethernet with 2GB RAM/64 KByte Blocks.
10.4MB/s iSCSI via HD-Tune

FreeNAS stats in a RAID 0 Configuration using 10/100 Fast Ethernet with 2GB RAM/4096 Byte Blocks.
10.4MB/s iSCSI via HD-Tune

FreeNAS stats in a RAID 0 Configuration using 10/100 Fast Ethernet with 2GB RAM.
10.4MB/s iSCSI via HD-Tune

FreeNAS stats in a RAID 0 Configuration using 10/100 Fast Ethernet with 512MB RAM.
10.2MB/s iSCSI via HD-Tune

FreeNAS stats in a RAID 1 Configuration using 10/100 Fast Ethernet with 512MB RAM.
11MB/s FTP
10MB/s SMB
9.8MB/s iSCSI via HD-Tune
1.3 MB/s HTTP

Test LACP later using ProCurve (Switch_C) in order to determine impact of link aggregation.

OpenFiler was attempted and worked in a single disk configuration with LDAP authentication but when it ran with RAID 1 and LDAP the entire graphical interface seemed to break. For instance, evaluation page seemed to stop.

Things to definitely check for when building a software based NAS are cable connections (especially IDE/SATA cables), power supply capacity, adequate bandwidth, and a means through which to test the bandwidth of your connection.
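
As a rough illustration of "a means through which to test the bandwidth of your connection", the sketch below pushes a fixed amount of data through a TCP socket and reports MB/s. It runs over loopback so that it is self-contained; the function names are invented for this example, and on a real network the receiving half would have to run on the NAS end of the link (a dedicated tool such as iperf is the more usual choice).

```python
import socket
import threading
import time


def _sink(server: socket.socket, counter: list) -> None:
    """Accept one connection and count every byte received."""
    conn, _ = server.accept()
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            counter[0] += len(chunk)


def measure_throughput(total_bytes: int = 8 * 1024 * 1024,
                       host: str = "127.0.0.1") -> tuple:
    """Send at least total_bytes over TCP; return (bytes_received, MB/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    counter = [0]
    receiver = threading.Thread(target=_sink, args=(server, counter))
    receiver.start()

    payload = b"\0" * 65536
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        sent = 0
        while sent < total_bytes:
            client.sendall(payload)
            sent += len(payload)
    receiver.join()  # wait until the receiver has drained the socket
    elapsed = time.perf_counter() - start
    server.close()
    return counter[0], counter[0] / elapsed / 1_000_000
```

Calling `measure_throughput()` returns the byte count seen by the receiver together with the measured rate; a loopback figure only shows that the harness works, not what the network can do.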

Obviously, FC is not within our current budgetary means.

If I had the budget to build a hardware based DAS/NAS/SAN solution for under 1K I would most likely look at the following solutions:

- Synology DS-209 (good feature set, high price, fast transfer rates)
- Netgear ReadyNAS Duo and its variants (good feature set, reasonable price, reasonable transfer rates)
- Thecus N2050B (very few reviews available though and is technically a DAS device that uses eSATA)

Options that I would definitely avoid are the following:

- DLink DNS-323 (has problems with RAID rebuild based on reviews)
- Linksys NAS200 (slow transfer rates, 10/100 only)
- NetGear SC-101/101T (not a real NAS/SAN, 10/100 only, limited number/type drives can be used)

Wednesday, July 20, 2011

Apache Web Server Configuration

It's become almost essential for any IT professional to be able to set up a web server nowadays. While the advent of so called personal web server solution stacks (such as XAMPP) has aided testing/development, it's still necessary to understand a number of concepts in order to configure one.

First, a web server is just a means of delivering static/non-static content to an end user via computer networking.

Second, content is going to be delivered from a single actual directory which is 'pointed to' via DNS records on the Internet that resolve to a physical/virtual host.

Third, through the advent of Virtual Hosts many web servers are able to serve more than one website at a time (based on IP and/or name).

Fourth, the basic web server is often able to only host basic content or needs to be reconfigured in order to provide additional functionality. For example, you may require additional modules such as MySQL or a 'Handler' in order to provide for this.

Fifth, not all web servers are created equal. Like most other products and services, a web server can be designed with certain parameters in mind such as speed, security, or even certain technology frameworks. These additions can be particularly important in deciding which server you may end up choosing depending on the circumstance/s. For example, so called application servers can have fairly specific capabilities that are not often available in conventional web servers.
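
The second and third points (one directory per site, several sites per server selected by name) can be sketched in a few lines. This is not Apache's implementation, only an illustration of the idea behind name based virtual hosts; the host names, paths, and function name are hypothetical.

```python
# Hypothetical site table: host names and paths are examples only.
VIRTUAL_HOSTS = {
    "www.example.com": "/var/www/example",
    "blog.example.com": "/var/www/blog",
}
DEFAULT_ROOT = "/var/www/html"  # served when no site matches


def document_root(host_header: str) -> str:
    """Pick the directory to serve from, based on the HTTP Host header."""
    # Drop an optional ":port" suffix and normalise case before the lookup.
    name = host_header.split(":", 1)[0].strip().lower()
    return VIRTUAL_HOSTS.get(name, DEFAULT_ROOT)
```

For example, `document_root("blog.example.com:8080")` resolves to the blog directory, while an unknown name falls back to the default site. IPv6 literals in the Host header are not handled by this sketch.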


Tuesday, July 19, 2011

Financial Protocol Interfacing

It used to be the case that if you had an interest in a particular field/domain you had to purchase/trial expensive proprietary software to enable you to 'play with it'. That is no longer the case. A lot of the major protocols have been standardised, which has led to a proliferation of commercial implementations as well as open source alternatives which can be used to simulate real environments through test driven development based on black box testing. Play around with virtual interfaces, a bit of fuzzing/random data, random lagging, as well as a proper test bench which is able to supply a large set of data and you could very well accurately simulate real world communications exchange. For the sake of brevity though, we'll leave details of how to achieve this for the reader.

From a personal perspective, while you may not be able to directly interface with an exchange you are still more than able to track movements of various assets and indexes by interfacing with various web based sources of information through a web scraper, using this information in combination with internal/external algorithms which provide a recommendation of whether to buy/sell, and then interfacing with an Internet based broker to conduct your transactions. While there is evidence to suggest that technical/human trading can fall behind tracking an index over the long term (especially after taxes are factored in), it is said that contrarian theory may be able to buck this trend, and a small group of firms are, rarely, able to beat the market over the long term. However, I'm not entirely sure this factors in all types of trading out there.
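
To make the "algorithm which provides a recommendation" step concrete, here is a deliberately simple sketch. The moving-average rule is a stand-in chosen purely for illustration, not a strategy this post endorses, and the function name and window sizes are assumptions. Prices would come from whatever the scraper collects; here they are passed in as a plain list.

```python
from statistics import mean


def recommend(prices: list, short: int = 3, long: int = 5) -> str:
    """Return 'buy', 'sell' or 'hold' from a moving-average comparison."""
    if len(prices) < long:
        return "hold"  # not enough history to compare the two averages
    short_avg = mean(prices[-short:])  # average of the most recent prices
    long_avg = mean(prices[-long:])    # average over the longer window
    if short_avg > long_avg:
        return "buy"
    if short_avg < long_avg:
        return "sell"
    return "hold"
```

A rising series such as `[1, 2, 3, 4, 5]` yields "buy" and a falling one yields "sell". Anything used with real money would also need to account for fees, taxes, and data quality.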

Monday, July 18, 2011

Application Servers

As the world has moved towards the cloud/Internet so has the need for applications designed to manage them. Among these are so called 'Application Servers' which serve to manage web based applications. Often, they run language/platform specific applications and are able to significantly manage the environment in which the application operates. In most modern and competitive environments you're able to achieve fine grained control of security, the amount of memory available to a particular application, the number of threads that are available to an application at any one time, and increasingly, application level load balancing/redundancy. In fact, in some of the more advanced application servers it's almost like having a virtual server.
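
One of the controls mentioned above, limiting the number of threads available to an application at any one time, can be sketched with a bounded worker pool. This shows the general idea only and is not how any particular application server implements it; the limit and function name are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

MAX_THREADS = 2  # hypothetical per-application thread limit


def run_with_thread_cap(jobs: int, max_threads: int = MAX_THREADS) -> int:
    """Run `jobs` short tasks and return the peak number running at once."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def task() -> None:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for real request handling
        with lock:
            running -= 1

    # The pool never runs more than max_threads tasks concurrently;
    # leaving the with-block waits for all submitted tasks to finish.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        for _ in range(jobs):
            pool.submit(task)
    return peak
```

However many jobs are submitted, the returned peak never exceeds `max_threads`; the extra work simply queues.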


Online Development Hosting Platforms

For Open Source developers (and even contractors and other people who are on a low budget and/or don't have the time/resources to set up a local server) there is nothing better than free when it comes to finding online development hosting platforms. Increasingly these platforms are providing greater functionality and give you not only a place to upload code but also version control systems (public/private), documentation, mailing lists, bug tracking systems, etc. Some of the major hosts are outlined below, with comparisons of their strengths/weaknesses by other writers towards the bottom of this post.

Sunday, July 17, 2011

Database Performance Optimisation Techniques

A while back I was working on a project which involved database underperformance in a web based application. Obviously, there are many ways to address this. Some of these methods are outlined below and include modern as well as traditional techniques.

Of late, there has been a trend towards so called non-relational/NoSQL type databases which scale horizontally rather than vertically. The reasoning behind this seems to be based on a paper by Edward Schaffer which implies that, since the UNIX/Linux shell is capable of producing almost any result required through piping of various commands, and since it doesn't have to be abstracted to run across multiple operating systems, it should theoretically be possible to build a high performance database system on this base. The exact quote is as follows.

"The UNIX file structure is the fastest and most readily-available database engine ever built: directories may be viewed as catalogs and tables as plain ASCII files. Commands are common UNIX utilities, such as grep, sed and awk. Nothing should be reinvented.

NoSQL was born with these ideas in mind: getting the most from the UNIX system, using some commands that glue together various standard tools. Although NoSQL is a good database system, this is not a panacea for all you problems. If you have to deal with a 10 gigabytes table that must be updated each second from various clients, NoSQL doesn't work for you since it lacks of performance on very big tables, and on frequent updates you must be in real time. For this case, I suggest you use a stronger solution based on Oracle, DB2 or such packages on a Linux cluster, AS/400 or mainframes.

However, if you have a web site containing much information and more reading occurs than writing, you will be surprised how fast is it. NoSQL (pronounced noseequel, as the author suggests) derives most of its code from the RDB database developed at RAND Organization, but more commands have been built in order to accomplish more tasks."

Obviously, there are problems with using these relatively immature technologies. Research seems to indicate that, depending on the implementation being used, they can be hard to modify, ad-hoc reporting can be difficult to produce, and there is a general lack of expertise within the IT community in this field, since most organisations have no need to switch and existing relational databases offer good enough performance as well as a mature toolset and reporting capabilities.

Although I haven't personally experimented with in-memory databases there's no questioning their usefulness. Obviously, a database that runs purely from RAM is going to be faster from both an access and transfer perspective than one that requires the use of secondary storage (even though this may change with the advancement of SSD technology and RAM based cards that have been modified to act as secondary storage).
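
For readers who want to try the idea without installing anything, SQLite (bundled with Python) can hold an entire database in RAM. This is a stand-in to show the concept, not one of the dedicated in-memory products; the table and data are invented for the example.

```python
import sqlite3


def build_in_memory_db() -> sqlite3.Connection:
    """Create a throwaway database that lives entirely in RAM."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    conn.execute("CREATE TABLE hits (page TEXT, views INTEGER)")
    conn.executemany(
        "INSERT INTO hits VALUES (?, ?)",
        [("/index", 120), ("/about", 15), ("/contact", 7)],
    )
    conn.commit()
    return conn


conn = build_in_memory_db()
busiest = conn.execute(
    "SELECT page FROM hits ORDER BY views DESC LIMIT 1"
).fetchone()[0]
print(busiest)
```

The trade-off is durability: the data disappears as soon as the connection is closed, so anything that must survive a restart still has to be persisted elsewhere.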

"Inefficient schemas. Adding indexes can help improve performance. However, their impact may be limited if your queries are inefficient because of poor table design that results in too many join operations or in inefficient join operations. Schema design is a key performance factor. It also provides information to the server that may be used to optimize query plans. Schema design is largely a tradeoff between good read performance and good write performance. Normalization helps write performance. Denormalization helps read performance."

Due to the fact that databases are 'abstractions' that run on top of existing software/hardware, the usual performance bottlenecks should be looked at. These include hardware specification, RAID level, operating system (and tuning options), and even network issues, which can play a factor especially if a distributed database is being used and geography starts to become a problem. Strategies for dealing with these problems are outlined in the 'Building a Cloud Computer Service' document, http://sites.google.com/site/dtbnguyen/

Obviously the type of join as well as the number of joins can impact on performance.

Use of indexes (whether they are partial, complete over table, compound, full-text, and/or column based only).
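
The effect of an index on a query can be seen by asking the database for its execution plan before and after the index is created. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in (MySQL has its own `EXPLAIN` statement with different output); the table, index name, and helper function are invented for the example.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's human-readable plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the plan description text.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [(f"customer{i % 50}", i * 1.5) for i in range(1000)],
)

lookup = "SELECT total FROM orders WHERE customer = 'customer7'"
before = query_plan(conn, lookup)  # no index yet, so a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
after = query_plan(conn, lookup)   # the plan should now name the index
print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, so it is better read by eye than parsed.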

While there are two main MySQL storage engines (InnoDB, MyISAM) there are also others that cater specifically to certain types of transactions.

"Yes, each MySQL storage engine will perform differently but the result will depend on the type of transaction. Reads, Writes, Transaction size, HA requirements are a few of the factors that would effect performance for each storage engine choice.

Prior to MySQL 5.5 the default storage engine is MyISAM which performs great on reads. With MySQL 5.5 the default storage engine is InnoDB which has improved performance over prior releases and allows a nice mix of performance and ACID compliance. These are the 2 most popular storage engines but there are other storage engines like memory, blackhole, ndb and others that offer a unique value for specific applications.

Generally speaking the safe bet is to use InnoDB for most of your tables and revisit the choice after you hit a performance bottleneck and are looking to tune things. The nice thing is it will be a simple task to change the storage engine for a table and provided the table size is not too great it shouldn't take too long to change and test each storage engine for your specific application."

While I have never personally used database proxy technology it is clear that they have their place. Like other proxy type technologies though they are obviously subject to local caching problems.

Creation of temporary tables on tables where other methods are unsuitable. Can be particularly useful when you have a resource intensive query that may be executed many times but whose actual data is rarely updated.
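
A minimal sketch of the temporary table approach, again using SQLite as a stand-in for the production database: the expensive aggregate is run once, stored in a temporary table, and repeated lookups are answered from that copy. Table and function names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 5.0), ("south", 7.5)],
)

# Run the expensive aggregate once and keep the result in a temporary table.
conn.execute(
    "CREATE TEMP TABLE sales_summary AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)


def region_total(region: str) -> float:
    """Answer repeated lookups from the cached summary, not the base table."""
    row = conn.execute(
        "SELECT total FROM sales_summary WHERE region = ?", (region,)
    ).fetchone()
    return row[0]
```

The summary is a snapshot, so it has to be rebuilt whenever the underlying data changes, which is why this suits data that is rarely updated.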

Finally, you can conduct tuning of the actual database server software itself. Something that you need to learn especially if you are working with limited resources and you aren't actually able to alter the database itself (can often be the case if the work has been outsourced to an external contractor).

Below is the result of one of these benchmarking/tuning scripts.

[root@db1 ~]# ./tuning-primer.sh

Using login values from ~/.my.cnf


Testing for stored webmin passwords: None Found

Could not auto detect login info!

Found Sockets:

Using: /var/lib/mysql/mysql.sock
Would you like to provide a different socket?: [y/N]
Do you have your login handy ? [y/N] : y
User: root

Would you like me to create a ~/.my.cnf file for you? [y/N] :

- By: Matthew Montgomery -

MySQL Version 5.0.45-log i686

Uptime = 0 days 1 hrs 20 min 24 sec
Avg. qps = 58980
Total Questions = 284523063
Threads Connected = 27

Warning: Server has not been running for at least 48hrs.
It may not be safe to use these recommendations

To find out more information on how each of these
runtime variables effects performance visit:
Visit http://www.mysql.com/products/enterprise/advisors.html
for info about MySQL's Enterprise Monitoring and Advisory Service

The slow query log is enabled.
Current long_query_time = 10 sec.
You have 93 out of 284727673 that take longer than 10 sec. to complete
Your long_query_time may be too high, I typically set this under 5 sec.

The binary update log is NOT enabled.
You will not be able to do point in time recovery
See http://dev.mysql.com/doc/refman/5.0/en/point-in-time-recovery.html

Current thread_cache_size = 8
Current threads_cached = 3
Current threads_per_sec = 0
Historic threads_per_sec = 0
Your thread_cache_size is fine

Current max_connections = 100
Current threads_connected = 27
Historic max_used_connections = 56
The number of used connections is 56% of the configured maximum.
Your max_connections variable seems to be fine.

Max Memory Ever Allocated : 1.08 G
Configured Max Per-thread Buffers : 1.20 G
Configured Max Global Buffers : 426 M
Configured Max Memory Limit : 1.61 G
Physical Memory : 3.95 G
Max memory limit seem to be within acceptable norms

Current MyISAM index space = 93 M
Current key_buffer_size = 384 M
Key cache miss rate is 1 : 162
Key buffer free ratio = 85 %
Your key_buffer_size seems to be fine

Query cache is enabled
Current query_cache_size = 32 M
Current query_cache_used = 7 M
Current query_cache_limit = 1 M
Current Query cache Memory fill ratio = 24.79 %
Current query_cache_min_res_unit = 4 K
Your query_cache_size seems to be too high.
Perhaps you can use these resources elsewhere
MySQL won't cache query results that are larger than query_cache_limit in size

Current sort_buffer_size = 2 M
Current read_rnd_buffer_size = 7 M
Sort buffer seems to be fine

Current join_buffer_size = 132.00 K
You have had 182 queries where a join could not use an index properly
You should enable "log-queries-not-using-indexes"
Then look for non indexed joins in the slow query log.
If you are unable to optimize your queries you may want to increase your
join_buffer_size to accommodate larger joins in one pass.

Note! This script will still suggest raising the join_buffer_size when
ANY joins not using indexes are found.

Current open_files_limit = 1134 files
The open_files_limit should typically be set to at least 2x-3x
that of table_cache if you have heavy MyISAM usage.
Your open_files_limit value seems to be fine

Current table_cache value = 512 tables
You have a total of 26 tables
You have 81 open tables.
The table_cache value seems to be fine

Current max_heap_table_size = 16 M
Current tmp_table_size = 32 M
Of 319 temp tables, 3% were created on disk
Effective in-memory tmp_table_size is limited to max_heap_table_size.
Created disk tmp tables ratio seems fine

Current read_buffer_size = 1 M
Current table scan ratio = 44612 : 1
You have a high ratio of sequential access requests to SELECTs
You may benefit from raising read_buffer_size and/or improving your use of indexes.

Current Lock Wait ratio = 1 : 5
You may benefit from selective use of InnoDB.
If you have long running SELECT's against MyISAM tables and perform
frequent updates consider setting 'low_priority_updates=1'
If you have a high concurrency of inserts on Dynamic row-length tables
consider setting 'concurrent_insert=2'.
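
Several of the headline figures in the report above are simple ratios that can be re-derived from the raw counters, which is a useful sanity check when reading this kind of output. A small sketch using only numbers copied from the report (the variable names are my own):

```python
# Figures copied from the tuning-primer output above.
uptime_seconds = 1 * 3600 + 20 * 60 + 24  # 0 days 1 hrs 20 min 24 sec
total_questions = 284523063
max_connections = 100
max_used_connections = 56
slow_queries = 93
total_queries_at_check = 284727673

# Integer division reproduces the "Avg. qps" figure in the report.
avg_qps = total_questions // uptime_seconds
connection_usage_pct = 100 * max_used_connections / max_connections
slow_fraction = slow_queries / total_queries_at_check

print(f"Avg. qps         = {avg_qps}")
print(f"Connections used = {connection_usage_pct:.0f}%")
print(f"Slow query share = {slow_fraction:.2e}")
```

Note the report's own warning that the server had been up for well under 48 hours, so these averages describe a short window only.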

Thursday, July 14, 2011

Maximising the User Experience in a Desktop Operating System

One of the bonuses of working with a server based operating system is being able to closely monitor and manage resource utilisation (and as a side effect heat and power consumption) using existing tools created by either the software vendor or third party suppliers. Obviously, when working with desktop class operating systems you have no such luxury. Hence, one must rely on the latter. Some of the ones that I have trialled include the following.

While they may seem wonderfully complex they all seem to use the same means through which to achieve their goals. In fact, I was even able to build something as a proof of concept on Linux myself a while back.
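
The common mechanism appears to be: sample per-process CPU usage, pick out the processes above some threshold, and lower their scheduling priority. The sketch below illustrates that loop on a Unix-like system. It is not the proof of concept mentioned above; the threshold, niceness value, and function names are assumptions, and the CPU samples are expected to be supplied by the caller (for example from `top` or `/proc`).

```python
import os

CPU_THRESHOLD = 80.0  # percent; hypothetical cut-off for "excessive" use
CLAMP_NICENESS = 15   # higher niceness means lower scheduling priority


def pick_offenders(cpu_by_pid: dict, threshold: float = CPU_THRESHOLD) -> list:
    """Return the PIDs whose sampled CPU usage exceeds the threshold."""
    return sorted(pid for pid, cpu in cpu_by_pid.items() if cpu > threshold)


def clamp(pid: int, niceness: int = CLAMP_NICENESS) -> None:
    """Lower the scheduling priority of one process (Unix only)."""
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
```

A monitoring loop would call `pick_offenders` on each sample and `clamp` on every PID it returns. Lowering priority does not cap CPU usage outright; it only lets other processes get scheduled first.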

"ThreadMaster monitors all running applications, and detects when a application starts to use excessive processor resources. It dynamically hunt down the offending thread, and clamp the CPU for this thread. This feature ensures that other users can remain working without noticing anything. The clamped process will eventually finish, and the clamp will be removed automatically."

"You can have a wide range operations performed, or settings applied, each time a process is run. You can choose at what priority processes should run, and which CPUs should be assigned to them. You can also disallow certain processes from running, log all processes run, automatically restart processes when they terminate or reach a resource consumption threshold, limit the number of instances, and much more. You can even indicate processes that should induce entrance into the High Performance power scheme and/or prevent the PC from sleeping."


Wednesday, July 13, 2011

Online Shopping Comparators

Over the last few weeks I've been working on a project on/off that involves web crawling, scraping of data online/offline from various sources, and analysing/re-formatting it. However, only recently have I been reminded of the increasing importance of this practice.

When you work for a company that has financial and/or philosophical limitations you need to be resourceful. To this end, research becomes a critical aspect of your job from time/efficiency perspectives and you'll soon discover the wonderful world of comparators/aggregators such as the ones for shopping.

From a technical standpoint they're not much different from a standard web scraper. A web crawler looks for certain patterns within the web sites that it has crawled and this information is used to create dynamically generated websites containing pertinent information with regards to a particular subject domain. The alternative would be to rely on a publisher-driven mechanism, similar to the way RSS/Atom feeds work, where each site publishes relevant information at regular intervals, although this would obviously require co-operation at both ends. Hence, the former approach is favoured. The basic code/structure for such a piece of software is as follows.
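
A minimal sketch of the pattern-matching and comparison steps, not the original code for this project. To keep it self-contained the "crawled" pages are hard-coded strings; the site names, markup, and price pattern are all hypothetical, and real pages would need a pattern (or an HTML parser) tailored to each site.

```python
import re

# Hypothetical pages standing in for crawled shop listings.
PAGES = {
    "shop-a.example": '<div class="item">USB drive <span class="price">$12.50</span></div>',
    "shop-b.example": '<div class="item">USB drive <span class="price">$9.99</span></div>',
}

# Matches a dollar amount inside an element carrying class="price".
PRICE_PATTERN = re.compile(r'class="price">\$(\d+(?:\.\d{2})?)<')


def extract_price(html: str):
    """Return the first price found in the page, or None if absent."""
    match = PRICE_PATTERN.search(html)
    return float(match.group(1)) if match else None


def cheapest(pages: dict):
    """Return (site, price) for the lowest price across all pages."""
    prices = {site: extract_price(html) for site, html in pages.items()}
    found = {site: p for site, p in prices.items() if p is not None}
    return min(found.items(), key=lambda item: item[1]) if found else None
```

Calling `cheapest(PAGES)` picks the lower of the two listings. A full comparator would add the crawling step in front and a page generator behind, but the core is this extract-then-compare loop.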


Low Resource Internet Browsing

Over the last decade or so people have been moving further and further towards Internet/Cloud based services. As such, our dependence on browser technologies has been increasing over time. Some of the more popular out there include:

What about the little known alternatives though? Clearly, as programming languages have become more featureful, development of browsers and GUIs has become trivial. While I was working at a company with a limited IT budget I discovered the need to reduce system requirements in order to maintain quality of service on terminal servers. To this end, I explored many of the options that were available in terms of low resource usage browsers. Some of these included turning Flash off entirely through the use of browser add-ons such as Firefox's No-Script and Flashblock, turning off various client side technologies such as JavaScript, and so on. Obviously turning off features would lead to usability/accessibility issues though. As such, I had to explore browsers which were specially designed with low resource utilisation in mind. Originally, Opera was classified as one of these but I don't believe that this philosophy continues with it or any of the major browsers out there, since consumer hardware has developed to such an extent that these particular issues have become de-emphasised. Moreover, it seems to be a case where software is mostly designed to scale up rather than down now. While I would normally list my research findings with regards to low resource browsers here, there seems to be quite a lot of research out there already.

Let's just put it this way though. In my personal experiments I found that browsing the exact same websites on the exact same hardware/software configuration could result in memory utilisation that was half that of a conventional browser and CPU utilisation that was a fraction of it (and this was for a browser that was built on a very similar code base! Firefox vs K-Meleon).

Automated Audiobook Maker Script, Random Stuff, and More

- wanted to find a way to automate the building of audiobooks. Built the following: https://sites.google.com/site/dtbnguyen/audiobook_maker-...