We talk with customers all the time about their web application issues, and we regularly get questions about how to approach the browser migration process itself. Customers invariably come to us with one or more known applications that are not compatible with the version of Internet Explorer they want to run. Beyond that small set of known applications lie all of the unknowns. Donald Rumsfeld once famously spoke of the “unknown unknowns”, and that is what customers are really asking about – how do they figure out the scope and breadth of their situation? A fair number of customers we work with have no idea how bad (or good) the state of their web applications may be, because they don’t know what they have. We’ll address readiness and issue assessment in a future blog post. This post is focused on the first step: the web application discovery process.
Our good friend Chris Jackson has a short and very clear formula for weighing the time and money spent on the discovery and assessment process. Few people in the industry see the breadth of native client and web application compatibility issues and environments as clearly as Chris does, so I deeply respect his opinion. From a business owner’s standpoint, I can see why Chris’s approach might make someone uncomfortable, but he has done the math and removed emotion and fear from the situation. I believe our methodology encompasses his view and provides the business owner with the factual data points needed to evaluate risk – the ‘numbers’ to plug into Chris’s formula – letting customers solve compatibility issues in a properly measured way.
We talk with many organizations that start the conversation by saying they want everything to work exactly like it does today in IE6. What ‘exactly like IE6’ means is a deep topic unto itself that I’ll address another day. For now we’ll focus on desired application functionality – users should be able to do the same tasks as before and get the same results. Browsium Ion was designed to solve the application functionality issue. But before we can talk about having the right solution, we need to define the problem and understand the process and costs of discovery. I’m hopeful this discussion will enable customers to look at that expense in the context of real impact on their users. While I’ll build the financial case against a broad discovery process, since it can be incredibly costly, I’ll also provide a general view of the discovery process itself, since it can be used on a smaller scale where the costs are manageable. The final decision is specific to each situation, but this information should provide context for everyone to evaluate their approach.
I’ll lead off by saying there is no simple answer or single tool that solves the discovery and inventory problem. The answer lies in process, combined with a variety of tools available today. To avoid showing preference for (or giving offense to) any specific vendor, I won’t discuss a particular product or solution. This discussion isn’t designed to be a step-by-step guide either.
There are actually two questions we need to answer to get at the numbers for the formula: the first is web application discovery (what do you have) and the second is impact (what does it do). With those answers in hand, you’ll be in a position to run the numbers and select Browsium Ion as the solution to the relevant issues.
We’ve seen loads of assessment tools on the market, and we have yet to find a discovery tool that can spider an organization’s web servers and report back what is running where, complete with version and vendor information. Creating an inventory of web applications is complicated by the fact that web apps don’t really exist in the same way native Windows apps do, so the inventory tools available today need to be combined to get the data you need. Web applications have two things in common: they all run on a web server and render in a web browser. We need to use the available tools to look at both of these points in order to find the unknown unknowns and plug them into the formula.
First you need to figure out where to look. Getting the sampling methodology right for your organization is essential. Use a sample that is too small and you miss things; too large and you have noise to filter. You also need to think about business cycles – different things happen at different points of the month in sales, HR, finance, and so on, and the same is true at end of month and end of quarter. If you look at the wrong time, you may miss a critical web application.
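To make the timing point concrete, here’s a minimal sketch (in Python, with entirely hypothetical window lengths) of a capture schedule that deliberately spans mid-month, month-end and quarter-end activity rather than sampling a single arbitrary week:

```python
from datetime import date, timedelta

def capture_windows(year, quarter, days=5):
    """Build log-capture windows spanning the business cycles most likely
    to surface rarely used but critical web applications: mid-month,
    month-end, and quarter-end. Window length is a hypothetical default."""
    first = 3 * (quarter - 1) + 1
    windows = []
    for m in (first, first + 1, first + 2):
        mid = date(year, m, 15)
        windows.append((mid, mid + timedelta(days=days)))        # mid-month
        nxt = date(year + (1 if m == 12 else 0), m % 12 + 1, 1)
        windows.append((nxt - timedelta(days=days), nxt))        # month/quarter-end
    return windows

for start, end in capture_windows(2012, 4):
    print(start, "->", end)
```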
You can use a variety of tools – proxy or firewall logs, as well as simple home-grown setups – to collect the list of URLs people are accessing. But in and of itself, a URL can be too broad (‘intranet.mycompany.com’ or ‘finance.mycompany.com’) or too vague (‘web02.apps.mycompany.com’) to accurately identify a specific web application. The tools also lack any mechanism to automatically pull apart the URLs and determine whether a site is one web application or 100 small applications with unique functions. I want to be clear that I’m not saying the tools are bad – the problem is just really hard, and the likely success rate for automation is low. We can use the URL data they collect, but it needs to be combined with other information to provide a useful filter.
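As an illustration of what that filtering might look like, here’s a rough Python sketch that collapses raw proxy-log URLs into host-plus-first-path-segment buckets so a human can review them. The field position is an assumption based on Squid’s default access.log layout – adjust it for whatever your proxy or firewall actually emits:

```python
from collections import Counter
from urllib.parse import urlsplit

def bucket_urls(log_lines):
    """Collapse raw proxy-log URLs into (host, first path segment)
    buckets to make the list reviewable by a human. A bucket may still
    hide many applications, so this narrows the search; it does not
    identify applications on its own."""
    buckets = Counter()
    for line in log_lines:
        # Assumption: the URL is the 7th whitespace-separated field,
        # as in a default Squid access.log; adjust for your log format.
        fields = line.split()
        if len(fields) < 7:
            continue
        url = urlsplit(fields[6])
        top = url.path.strip("/").split("/")[0]   # e.g. 'finance' from /finance/report.aspx
        buckets[(url.hostname or "?", top)] += 1
    return buckets

with open("access.log") as f:   # hypothetical exported proxy log
    for (host, app), hits in bucket_urls(f).most_common(25):
        print(f"{hits:>8}  {host}/{app}")
```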
Let’s switch over and look at the financial feasibility issues for a second. Chris suggested to me that an average organization will collect anywhere from 10 million to 200 million unique URLs through this discovery process; 90-130 million is the range he sees most often. That’s a lot of data. I have no idea how many times it would circle the globe when laid end to end, but it’s a big number. Simple math shows it’s massively impractical to wade through that much data. Assume we have (only) 25 million unique lines of data with one URL per line, and a staffer can review 200 lines per hour. That’s going to take 125,000 hours…more than 60 ‘work’ years. You would need a team of people to work on the task; even using ‘cheap’ labor at $15/hr, it would cost $1.875 million just to know what you have. If your dataset is closer to the numbers Chris sees, you’re going to spend upwards of $5 million. Looking at everything inside the network to ensure it ‘works like IE6’ will never show even a fractional ROI.
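The arithmetic is easy to reproduce, and worth re-running with your own labor rate and review speed. A quick sketch using the assumptions above:

```python
def review_cost(urls, lines_per_hour=200, rate=15, hours_per_year=2080):
    """Back-of-envelope cost of manually reviewing a list of unique URLs,
    using the review speed and labor rate assumed in this post."""
    hours = urls / lines_per_hour
    return hours, hours / hours_per_year, hours * rate

for n in (25_000_000, 90_000_000, 130_000_000):
    hours, years, dollars = review_cost(n)
    print(f"{n:>12,} URLs: {hours:>10,.0f} hrs = {years:4.0f} work-years = ${dollars:>12,.2f}")
    # 25M URLs works out to 125,000 hours, ~60 work-years, $1,875,000;
    # the 90-130M range lands well north of $5 million.
```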
We can shift back now to the process. Assume you’ve targeted the samples well and have a small data set, you’ve decided to focus on only the topmost percentage of URLs, or – better still – you were able to get the business to tell you what they know. Now you need to see which web applications are actually running where, so the next step is to look at the servers. Standard inventory tools can help gather a lot of useful data here. Our experience shows that customers should cross-reference server hosts against the list of URLs, since we have seen too many examples of ‘rogue’ servers deployed outside of IT’s control, invisible to internal audits and running none of the standard agent tools. A surprising number of the critical line-of-business applications we are brought in to remediate with Ion are outside the IT radar.
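One low-tech way to surface those rogue hosts is a simple set difference between the hostnames seen in the URL logs and the hostnames your inventory tool knows about. A sketch, with hypothetical inputs:

```python
import socket

def find_rogue_hosts(url_hosts, inventory_hosts):
    """Flag hosts that appear in proxy-log URLs but not in the IT asset
    inventory -- candidates for 'rogue' servers stood up outside of IT's
    control. Names are compared case-insensitively; resolving each host
    confirms it is still alive."""
    known = {h.lower() for h in inventory_hosts}
    rogue = sorted({h.lower() for h in url_hosts} - known)
    for host in rogue:
        try:
            addr = socket.gethostbyname(host)
            print(f"not in inventory: {host} ({addr})")
        except socket.gaierror:
            print(f"not in inventory (no longer resolves): {host}")
    return rogue

# Hypothetical inputs: hostnames parsed from the URL list, and an
# export of hostnames from your inventory/CMDB tool.
find_rogue_hosts(
    ["web02.apps.mycompany.com", "finance.mycompany.com"],
    ["finance.mycompany.com"],
)
```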
Side note: We see several themes emerge as we talk with so many great companies, and one of them is an expectation that IT is responsible for gathering this great big list of everything in discovery. Those same businesses have business units standing up services outside of the IT process … so how did it suddenly become IT’s responsibility to figure it all out? It’s backwards. I’ve run IT organizations before and know firsthand what it’s like to be ‘responsible’ for a line-of-business system my team had no part in spec’ing, building or provisioning. Maybe that observation alone will help persuade business owners that they need to own discovery – if an application is critical, they’ll know about it.
Inventory and configuration harvesting tools should indicate whether a server is running IIS, Apache, SharePoint, SQL, etc., as well as provide the configuration and details like application pool data. Applications with server-side installed components, like many common ERP/CRM systems, will show up in the system data gathered by the collection tools. Some web applications may not leave these kinds of breadcrumbs, requiring customers to expend a little more effort to uncover them. Rooting out the details on those applications means examining the IIS (or Apache) configuration to find the document root for each instance, then looking through the files in that directory for a signed binary that indicates the vendor and potentially the version.
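As a rough illustration of that last step, the sketch below assumes an IIS 7-style applicationHost.config (IIS 6 keeps this in the metabase, and Apache in DocumentRoot directives) and simply walks each document root for binaries; identifying the signer is left to a tool like Sysinternals sigcheck:

```python
import os
import xml.etree.ElementTree as ET

def site_roots(config_path):
    """Pull each site's physical document root out of an IIS 7+
    applicationHost.config. (IIS 6 stores this in the metabase instead,
    and Apache in DocumentRoot directives.)"""
    tree = ET.parse(config_path)
    for vdir in tree.iter("virtualDirectory"):
        path = vdir.get("physicalPath")
        if path:
            yield os.path.expandvars(path)   # e.g. %SystemDrive%\inetpub\wwwroot

def candidate_binaries(root):
    """Walk a document root for binaries whose version resources or
    Authenticode signatures can identify the vendor; inspect the hits
    with a tool such as Sysinternals sigcheck."""
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.lower().endswith((".dll", ".exe", ".ocx")):
                yield os.path.join(dirpath, name)

for root in site_roots(r"C:\Windows\System32\inetsrv\config\applicationHost.config"):
    for binary in candidate_binaries(root):
        print(binary)
```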
With the list of URLs and an understanding of what is actually running at those locations, you can begin to analyze usage to determine the cost and probability-of-failure numbers for Chris’s formula. Again, good sample user data is critical – in terms of both the macro view of ‘all’ company employees and micro views at the department and functional business unit level. For example, if your sample shows that 18% of the overall company population – and 95% of the finance team – uses a web application you have determined to be a critical finance application, a failure in the upgraded browser would likely cost a great deal. It’s worth the time to look at that application in detail. Likewise, if another application is used by only a small fraction of users, it should be prioritized lower and possibly pushed off to be fixed when users report specific issues, as Chris suggests.
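Computing those macro and micro views is straightforward once you have user IDs from the logs and department rosters from HR. A sketch mirroring the finance example above, with made-up data:

```python
def usage_profile(app_users, all_users, departments):
    """Compare an application's reach across the whole company with its
    reach inside each department; a modest overall share combined with a
    very high departmental share marks a likely critical
    line-of-business application."""
    print(f"overall: {len(app_users) / len(all_users):.0%} of all users")
    for dept, members in departments.items():
        print(f"{dept}: {len(app_users & members) / len(members):.0%} of department")

# Hypothetical data mirroring the example above: user IDs seen hitting
# the finance app, the full user population, and a department roster.
finance_app = {f"u{i}" for i in range(180)}          # 180 of 1,000 users -> 18%
everyone = {f"u{i}" for i in range(1000)}
usage_profile(finance_app, everyone,
              {"finance": {f"u{i}" for i in range(190)}})  # 180 of 190 -> ~95%
```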
Here at Browsium, we’re always pushing to find better ways to solve these issues and we’re regularly debating the tools that we find, or others we might build, to address some of these gaps. We’ll continue to share updates as we discover better tools and more effective processes to help customers with their application assessments. For now I hope the information provided in this post will help get organizations moving toward a successful Windows 7 and IE migration. Please share your thoughts, including what’s working (or not working) for you, in the comments.
– Matt