Web Scraping is the process of extracting data from websites, typically using a program that simulates human exploration by sending simple HTTP requests or by emulating a full web browser. Web Scraping, Content Scraping, Screen Scraping, Web Harvesting and Web Data Extraction are all roughly synonymous terms. In general, anything that you can see on the internet can be extracted, and the process can be automated.
There is a close resemblance between web scraping and web indexing. However, one stark difference is that web scraping focuses on gathering a particular type of data, such as contact information, whereas the objective of web indexing is to gather all the data that is present. Web scraping has been used effectively in many fields, such as online price comparison (BuyHatke) and web mashups (Frrole). (more…)
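To make the idea of gathering one particular type of data concrete, here is a minimal sketch in Python that pulls e-mail addresses out of a page. It works on a hard-coded HTML string rather than a live site, and the names ContactScraper, scrape_contacts and SAMPLE_PAGE, as well as the simplified e-mail pattern, are assumptions made for this illustration only.

```python
import re
from html.parser import HTMLParser

# Simplified e-mail pattern (an assumption; real addresses can be more varied).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class ContactScraper(HTMLParser):
    """Collects e-mail addresses from mailto: links and from visible text."""

    def __init__(self):
        super().__init__()
        self.emails = []

    def _add(self, address):
        # Keep the first occurrence of each address, in page order.
        if address not in self.emails:
            self.emails.append(address)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.startswith("mailto:"):
                    self._add(value[len("mailto:"):])

    def handle_data(self, data):
        for match in EMAIL_PATTERN.findall(data):
            self._add(match)


def scrape_contacts(html):
    """Return the e-mail addresses found in an HTML document string."""
    parser = ContactScraper()
    parser.feed(html)
    parser.close()
    return parser.emails


# Hypothetical page content used instead of a real HTTP request.
SAMPLE_PAGE = """
<html><body>
  <h1>Example Store</h1>
  <p>Questions? Write to support@example.com or call us.</p>
  <a href="mailto:sales@example.com">Contact sales</a>
</body></html>
"""

print(scrape_contacts(SAMPLE_PAGE))
```

In a real scraper the HTML would come from an HTTP response, and the site's terms of use and robots.txt should be checked first.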
In this season of gratitude, we are thankful for your continued support.
Wishing you a Merry Christmas and Happy New Year!
In the previous posts on high availability architecture, we have already talked about scaling databases and Content Delivery Networks. We have mentioned several times the need to distribute requests evenly across different nodes, and to avoid downtime when a node or component fails. In this post, the prime objective is to talk about these processes of load balancing and failover in detail.
Load balancing is a technique for distributing your requests over a network, typically needed when a single server is maxing out its CPU, disk or database IO rate. The objective of load balancing is to optimize resource use and minimize response time, so that no single resource is overburdened.
The goal of failover is to let another network component or server take over the work of the first one, should it fail. Failover also allows you to perform maintenance on individual servers or nodes without any interruption to your services. (more…)
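The two ideas can be sketched together in a few lines of Python. This is only an illustration under simple assumptions: requests are handed out in round-robin order, and a node that has been marked as down is skipped until it is marked as up again. The class name RoundRobinBalancer and the node names are hypothetical, and real load balancers detect failures with health checks rather than manual calls.

```python
class RoundRobinBalancer:
    """Hands out nodes in rotation and skips nodes marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._next = 0

    def mark_down(self, node):
        # Failover: stop sending requests to a failed (or maintained) node.
        self.healthy.discard(node)

    def mark_up(self, node):
        # Bring a known node back into rotation.
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self):
        # Try each node at most once per call, starting from the cursor.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


balancer = RoundRobinBalancer(["node-a", "node-b", "node-c"])
first_round = [balancer.pick() for _ in range(3)]

balancer.mark_down("node-b")  # simulate a failure of node-b
after_failure = [balancer.pick() for _ in range(4)]

print(first_round)
print(after_failure)
```

After node-b is marked as down, requests keep flowing to the remaining nodes, which is the behaviour failover is meant to provide.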
We have already discussed the general ideas behind High Availability architecture. The scaling of databases is a crucial component of implementing a high availability architecture. In this post, we will explore the various techniques for scaling databases. The distribution models discussed here apply to both relational and non-relational (NoSQL) databases.
As is the case with the other components of high availability architecture that I will discuss in subsequent posts, it is imperative that the database solution you plan to implement is suited to the needs of your product or application. Let us first see why databases need to be scaled. (more…)
Virtual Private Networks (VPNs) give businesses the ability to let remote employees and vendors into their private network from outside its physical boundaries. Using the Internet as the transport, VPNs connect a remote client to the private network as if it were physically connected to an internal switch. Once connected, the client workstation receives an internal private address and can access applications, file shares, and printers normally restricted to local access networks. Many different types of VPN connectivity solutions exist today that offer a range of features and security, but why would a business consider a VPN solution?
Connecting Remote Sites
VPN tunnels not only allow individual workstations to connect into the network, they can also allow entire remote locations to access the LAN. In doing so, a VPN connection between two sites essentially creates a WAN to allow two networks, in two separate physical locations, to communicate.
For example, assume a small business is opening a new location across the street. The owners want the primary customer database server to stay at the original location, yet be accessible from the new location as well. One option would be to run a physical cable across the street from one building to the other. This option is costly and could prove to be insecure or unreliable. A solution utilizing a VPN tunnel would be more cost effective, more secure, and more reliable. Each location would most likely already have a connection to the Internet. Utilizing a VPN connectivity device, such as a firewall or software solution, the two buildings can be connected via a VPN tunnel that communicates over these Internet connections. Doing so will create a logical connection between the two locations, over the Internet, and allow devices within the two buildings to communicate as if they were physically connected. Imagine this same solution being used on a larger scale by nationwide or even global companies, and you can see how VPN tunnels allow large corporations to interlink their local networks into a single private network.
The physical workplace may be common to many workers today, but the number of remote workers is growing. These employees work from home (or any Internet-enabled location), utilizing VPN connections into a central database. Instead of entire buildings being interconnected, these employees connect directly from a VPN client on their laptop or smartphone into the company’s private network. In doing so, the employee gains the benefit of working from home while still having access to every aspect of the network as if working inside the physical company building. In return, businesses are seeing a reduction in overhead costs. Office space no longer has to be purchased or leased to house workers during business hours. This also reduces their electrical, heating, and office supply bills.
Similarly, VPNs allow companies to create a broader disaster recovery plan by deploying VPN client enabled laptops to their employees if a disaster occurs. Essentially, if a company experiences a disaster where a location is offline or employees cannot report for a length of time, a VPN connection can be utilized to replace their reliance on the physical workplace. A VPN tunnel connection could also be a lifesaver when vendor access is needed. If a vendor would otherwise have to be onsite simply to control or view the screen of a computer or server, a VPN tunnel could be utilized instead to allow the vendor to connect remotely. This could bring the company a quicker solution and save the cost of vendors traveling to an on-site location.
VPN connections offer a secure method to companies who must connect remote locations or wish to reduce the overhead of a physical workplace. As the technology progresses and more benefits are found, VPN tunnels could become the primary method employees utilize to connect into a private WAN.
VPN Connection Options
Two primary technologies exist today for connecting to a remote network: SSL and IPSEC. Both offer a secure means of accessing internal networks remotely; however, they differ in how the connection is established. IPSEC establishes a secure connection utilizing software installed on the client PC. The client software establishes the connection to the remote VPN server and authenticates the user, either through Active Directory credentials or through a shared passphrase. Unlike the client-based IPSEC, an SSL connection can be established through an Internet browser. This makes the VPN connection more manageable, as a user does not need to install any software to connect. Utilizing the web-based client, a remote user can access the SSL VPN server device from any compatible browser.
In the real world, there can be situations when the performance of your servers dips because of events ranging from a sudden spike in traffic to a sudden power outage. It can be much worse, and your servers can be crippled, irrespective of whether your applications are hosted in the cloud or on a physical machine. Such situations are unavoidable. However, rather than hoping that they don’t occur, what you should actually do is gear up so that your systems don’t encounter failure.
The answer to the problem is the use of a High Availability (HA) configuration or architecture. High availability architecture is an approach to defining the components, modules or implementation of services of a system so that it ensures optimal operational performance, even at times of high load. Although there are no fixed rules for implementing HA systems, there are generally a few good practices that you should follow so that you gain the most out of the least resources. (more…)
A .htaccess (hypertext access) file is a directory-level plain text configuration file for web servers, which, in simple terms, controls access to a certain directory on your server. The use of .htaccess files became popular because they could be used to override global server settings related to directory access. Nowadays, however, .htaccess files can override many other configuration settings as well.
Several modern web servers like Apache support .htaccess or related files. Although some other popular servers like Nginx do not have direct support for .htaccess files, there are ways to convert .htaccess rules to work in Nginx.
.htaccess rules apply to a directory and all its subdirectories, unless there are more .htaccess files present within the subdirectories. The permissions of the .htaccess file should allow universal read access but write access for the owner only. (more…)
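On a Unix-like server, universal read access with owner-only write access corresponds to the permission mode 644. The following Python sketch applies that mode to a placeholder file in a temporary directory; the function name set_htaccess_permissions and the file contents are assumptions for illustration, and the example presumes a POSIX file system.

```python
import os
import stat
import tempfile

# rw-r--r--: owner can read and write, everyone else can only read.
HTACCESS_MODE = 0o644


def set_htaccess_permissions(path):
    """Apply mode 644 to the file and return the mode that is now set."""
    os.chmod(path, HTACCESS_MODE)
    return stat.S_IMODE(os.stat(path).st_mode)


# Work in a throwaway directory so no real site files are touched.
with tempfile.TemporaryDirectory() as site_root:
    htaccess_path = os.path.join(site_root, ".htaccess")
    with open(htaccess_path, "w") as handle:
        handle.write("# placeholder rules\n")
    applied_mode = set_htaccess_permissions(htaccess_path)
    print(oct(applied_mode))
```

The same result is usually achieved from a shell with chmod 644 on the file.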
VLANs (Virtual Local Area Networks) are two or more LAN subnets that exist on the same networking equipment, such as a switch or firewall. Given that ports on a switch function independently, this creates the ability to treat each port as if it is its own network. Grouping these ports together creates a VLAN, essentially creating subsets of logical networks on a physical switch.
For example, assume you are using an eight port switch. If no VLANs existed, assume the entire switch operated on the 10.81.44.X network. Any devices attached to the switch could communicate with one another as long as their IP addresses fall between 10.81.44.1 and 10.81.44.254. Now assume we have implemented VLANs on the switch. The first four ports are still associated with the 10.81.44.X network; however, we have configured the last four ports to act on the 192.168.1.X network. Doing so, we have essentially created two logical networks on one physical network switch. Devices on the first four ports can now communicate only with each other, and the same goes for devices attached to the last four ports.
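The eight port example can be modelled with Python's ipaddress module. This is a rough sketch rather than switch configuration: it assumes that 10.81.44.X and 192.168.1.X mean /24 networks, and the names PORT_NETWORKS and can_communicate are made up for the illustration.

```python
import ipaddress

# Assumption: the "X" in each network stands for a /24 subnet.
FIRST_NETWORK = ipaddress.ip_network("10.81.44.0/24")
SECOND_NETWORK = ipaddress.ip_network("192.168.1.0/24")

# Ports 1-4 stay on the first network, ports 5-8 act on the second one.
PORT_NETWORKS = {port: FIRST_NETWORK for port in range(1, 5)}
PORT_NETWORKS.update({port: SECOND_NETWORK for port in range(5, 9)})


def can_communicate(port_a, ip_a, port_b, ip_b):
    """True if both devices sit on ports of the same logical network
    and both addresses belong to that network."""
    network_a = PORT_NETWORKS[port_a]
    network_b = PORT_NETWORKS[port_b]
    if network_a != network_b:
        return False
    return (ipaddress.ip_address(ip_a) in network_a
            and ipaddress.ip_address(ip_b) in network_b)


print(can_communicate(1, "10.81.44.10", 3, "10.81.44.20"))
print(can_communicate(2, "10.81.44.10", 6, "192.168.1.20"))
```

Two devices on ports 1 and 3 can reach each other, while a device on port 2 cannot reach one on port 6 without a router between the two networks.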
So what benefits do VLANs give us?
Each network has its own broadcast domain. Whenever a broadcast packet is sent out, this packet gets sent to every device on the network. As the number of devices attached to the network grows, so does the number of broadcast packets being sent throughout the network. As the amount of traffic grows, these broadcast packets can congest the network and could potentially slow things down. Splitting the traffic into two networks created by VLANs can greatly reduce the broadcast traffic and reduce congestion on the network.
VLANs offer the ability to keep data packets from multiple networks separated. Organizations that wish to offer wireless Internet in their workspace, yet still wish to maintain a private and secure network, can utilize VLANs to achieve this goal. Take the example used earlier where two networks exist: 10.81.44.X and 192.168.1.X. The 10.81.44.X network is a private network that contains critical file servers, e-mail servers, and potentially private data that should only be accessed by internal employees. If the company simply attached a wireless router to this network, anyone with some computer knowledge could hack into the router from within the wireless range and access this private data network. This is where VLANs and the 192.168.1.X network come into play. On the company’s switch, a VLAN can be created specifically for the new wireless network of 192.168.1.X. The ports on the switch associated with the wireless VLAN would communicate only with the Internet, and traffic would never pass between the two networks. A router would need to be placed between these two networks in order for the two to communicate. As a switch does not function as a router, the packets pass only to those ports associated with the same VLAN, and the switch functions as if there were two physical networks in place.
Dividing Critical Network Traffic
Often, networks will have some sort of device or system that requires a large amount of network bandwidth. One example is VOIP phones, which require voice packets to travel at a higher priority than file or email packets. VLANs offer a chance to segregate this higher priority traffic onto its own network to keep voice traffic from clogging network bandwidth. Similar to the example explained above, a new network could be created utilizing VLANs, without purchasing any more switching hardware. The 10.81.44.X network would remain as the primary data network and a new network, 192.168.1.X, would be created for the VOIP traffic. The difference here is that the same ports can be utilized for both voice and data VLANs, meaning a single port can function on two VLANs at once. Doing so still divides the traffic, as the data packets from each network will be tagged with a specific ID number corresponding to each VLAN. Assume the data VLAN has a VLAN ID of 1 and the voice VLAN has a VLAN ID of 200. When a packet travels to a switch port with both a computer and a VOIP phone attached, the port looks at the VLAN ID and knows which device to pass the packet to. Devices also check this VLAN ID and discard any packets that do not match their own network. Through the use of VLANs and unique VLAN IDs, devices can reside on the same physical switch port yet still function on two logical networks.
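The tagging described above is commonly done with the IEEE 802.1Q tag, a four byte field inserted into the Ethernet frame that carries a 3-bit priority and a 12-bit VLAN ID. The Python sketch below builds and reads such a tag for the data VLAN (ID 1) and the voice VLAN (ID 200). The function names are hypothetical, and the priority value of 5 for voice is an assumption based on common practice, not something a switch requires.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tagged frame


def build_vlan_tag(vlan_id, priority=0):
    """Pack a 4-byte 802.1Q tag: TPID, then priority (3 bits),
    drop-eligible bit (left at 0) and VLAN ID (12 bits)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority must be between 0 and 7")
    tci = (priority << 13) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_tag(tag):
    """Return the priority and VLAN ID carried in a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return {"priority": tci >> 13, "vlan_id": tci & 0x0FFF}


data_tag = build_vlan_tag(1)                # data VLAN from the example
voice_tag = build_vlan_tag(200, priority=5)  # voice VLAN, assumed priority 5

print(data_tag.hex(), parse_vlan_tag(data_tag))
print(voice_tag.hex(), parse_vlan_tag(voice_tag))
```

A device or port that reads the tag can compare the VLAN ID with its own and discard frames that belong to the other logical network.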
Configuring a VLAN on a network brings multiple benefits to the security and functionality of a network, without the need to purchase more hardware. If bandwidth issues or the need for a separate wireless network arise, first turn to VLANs to save the day. You’ll save yourself some money and learn a lot about how networks function along the way.
Most companies that have to share large files online know the limitations of FTP. Employees who share large files have to download an FTP client or type a cryptic login into a web browser. To collaborate at all, once your files have been updated, you have to email everyone involved to let them know there’s a new version. When you take into consideration how many emails are ignored or fall into a spam filter, it’s no wonder collaboration can come to a screeching halt with FTP.
What About Email?
While email is great for sending and receiving small files, most of us have learned the hard way that if you send anything over 2 MB in size, you don’t know if it’s going to arrive at the other end. Even though some companies have increased the size of emails that can flow through their servers, you never know if yours is going to make it. To top it off, sometimes you don’t even get a notification when the email fails to be delivered.
File Sharing to the Rescue
Online file sharing makes handling large files easy. Anyone with a web browser (and proper access) can post and retrieve files securely. Collaboration improves with file updates that let everyone on the list know when a change is made. In addition, advanced features like version tracking make collaboration easy. File update notifications can tell people when a file needs to be reviewed.
Using a private cloud solution allows you to maintain your own hardware and security. You have more flexibility with your customer data and critical enterprise files. Regulatory compliance is much easier, as some regulations will not allow a public cloud provider.
Tonido filecloud easily meets these needs with a private cloud solution that gives you total control over your secure file sharing. Filecloud allows administrators to set up shares that are accessed securely by employees and customers alike. Sharing files is as easy as sending a link via email.
Filecloud supports auditing so you no longer have to worry about who changed which files when. You can easily tell if a person has seen a file and made appropriate changes.
Since you own the product, filecloud will allow your company to set up your own branding with custom logos and a domain. This can be great for establishing trust amongst employees and customers.
Mobility has become a way of life for business today. Most employees need access to their files from phones or other mobile devices. Filecloud allows file access from anywhere using a mobile device for Apple, Android, and Windows.
Tonido filecloud gives you a simple solution that is installed on your network. You control the permissions of any users who have access to it. Once it is set up, employees can access the system from their computers or mobile devices from anywhere. Click on http://www.getfilecloud.com to find out more today.