Make sure you've read Part 1 on how two-way devices impact performance and Part 2 on setting up the test environment.
With our test steps defined, test data created, and device simulators implemented, we could begin our actual testing. We started by building automated scripts to execute our test steps repeatedly and consistently. These scripts executed the steps either by mimicking a series of user web requests or by calling existing IntelliSOURCE APIs. To help us monitor the tests, the scripts pushed intra-test logging and preliminary results into our Slack channel.
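As a sketch of how that progress reporting can work, the snippet below builds the JSON payload Slack's incoming-webhook API expects and posts it. The webhook URL, message text, and method names are illustrative placeholders, not our actual configuration:

```ruby
require "net/http"
require "json"
require "uri"

# Build the JSON payload Slack's incoming-webhook API expects.
def slack_payload(text)
  { text: text }.to_json
end

# Placeholder webhook URL, not a real endpoint.
WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"

# Post a test-progress message to the Slack channel.
def notify_slack(text)
  Net::HTTP.post(URI(WEBHOOK_URL), slack_payload(text),
                 "Content-Type" => "application/json")
end

# Example call (requires a valid webhook URL):
# notify_slack("Cycle 42: 25,000 devices targeted, 24,991 acks received")
```

Pushing intermediate results this way means anyone watching the channel can spot a failing cycle without tailing logs on the test hosts.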
So that we could understand the performance trend as we scaled, we started our tests with 25,000 devices and progressively built our way up to our goal of 100,000. In the future we'll scale to 500,000 to 1,000,000 devices. We built additional scripts that allowed us to automatically add simulated devices to our test environments. Removing devices proved more complicated, so we saved system snapshots before adding more simulated devices.
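The scale-up step can be sketched as a batch loader. The method and identifier format below are hypothetical stand-ins; the real scripts call the IntelliSOURCE provisioning APIs:

```ruby
# Add simulated DirectLink devices in fixed-size batches so each
# run can be snapshotted before the next scale-up step.
# (Names and identifier format are illustrative.)
def add_simulated_devices(start_id, count, batch_size: 1000)
  added = []
  (start_id...start_id + count).each_slice(batch_size) do |batch|
    batch.each do |id|
      # In the real scripts this would call the provisioning API;
      # here we just record the simulated device identifier.
      added << format("SIM-%06d", id)
    end
  end
  added
end

devices = add_simulated_devices(1, 25_000)
```

Batching keeps each provisioning run bounded, so a failed run leaves the environment in a state the saved snapshot can recover.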
While our test scripts ensured we executed our tests in a consistent and repeatable manner, we never observed two identical test cycles. The interactions of unrelated components (from the application layer to the testing tools to the physical hardware) and the intentionally created randomness (different devices targeted for demand response, distributed lag rates, etc.) both contribute to differences in results. While this variability can be frustrating, successful performance testing must embrace (or at least accept) it. We managed the variability by executing our tests numerous times, ultimately ending up with more than 250 test cycles and over 1,500 data points.
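Managing variability across many cycles comes down to summarizing distributions rather than single runs. A minimal sketch of that summarization, using made-up completion times in seconds:

```ruby
# Nearest-rank percentile over a pre-sorted array.
def percentile(sorted, p)
  sorted[((sorted.length - 1) * p).round]
end

# Summarize a set of per-cycle completion times.
def summarize(samples)
  sorted = samples.sort
  {
    min:    sorted.first,
    median: percentile(sorted, 0.5),
    p95:    percentile(sorted, 0.95),
    max:    sorted.last,
    mean:   samples.sum / samples.length.to_f
  }
end

# Illustrative completion times from seven hypothetical cycles.
cycles = [88.2, 91.5, 95.1, 84.7, 99.8, 93.0, 90.4]
stats = summarize(cycles)
```

Reporting the median and a high percentile together makes it clear whether an outlier cycle is noise or a trend worth investigating.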
One of the benefits of two-way devices is knowing exactly which device received a message and when the message reached it. This information is stored in IntelliSOURCE and formed the basis of our measurements. For example, we recorded when outgoing messages were created and when responses from each device were stored in the database. We began with manually executed SQL queries and analysis in Excel. This was cumbersome but gave us a great way to iterate on our queries and experiment with different presentations. Once we were happy with our measurements we built a simple Ruby on Rails application to automatically fetch data, perform analysis, and produce reports for each test event. A huge benefit of this investment is that it can also be applied to measure how our customers' production systems are currently performing.
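The kind of query and calculation we iterated on can be sketched as follows. The table and column names are hypothetical, not the actual IntelliSOURCE schema:

```ruby
require "time"

# Illustrative measurement query: pair each outgoing message's
# creation time with the time its device response was stored.
MEASUREMENT_SQL = <<~SQL
  SELECT m.device_id,
         EXTRACT(EPOCH FROM (r.stored_at - m.created_at)) AS round_trip_seconds
  FROM outgoing_messages m
  JOIN device_responses r ON r.message_id = m.id
  WHERE m.event_id = $1
SQL

# The same round-trip computation in Ruby for one message/response pair.
def round_trip_seconds(created_at, stored_at)
  Time.parse(stored_at) - Time.parse(created_at)
end

rt = round_trip_seconds("2014-06-01 12:00:00 UTC", "2014-06-01 12:01:35 UTC")
```

Because both timestamps come from the same database, the measurement avoids clock skew between the test hosts and the system under test.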
We followed a similar approach to our log file analysis. We began with standard UNIX tools to parse the logs and Excel to perform analysis. This evolved into a set of custom Splunk reports and dashboards enabled by automatic log forwarding.
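The kind of parsing the UNIX tools performed can be sketched in Ruby. The log line format shown here is illustrative, not our actual log layout:

```ruby
# Match an illustrative log line: "<date> <time> [LEVEL] message".
LOG_LINE = /\A(?<ts>\S+ \S+) \[(?<level>\w+)\] (?<msg>.*)\z/

# Tally log lines by severity level, skipping lines that don't match.
def tally_levels(lines)
  lines.each_with_object(Hash.new(0)) do |line, counts|
    if (m = LOG_LINE.match(line))
      counts[m[:level]] += 1
    end
  end
end

sample = [
  "2014-06-01 12:00:01 [INFO] message dispatched device=SIM-000001",
  "2014-06-01 12:00:02 [WARN] slow ack device=SIM-000214",
  "2014-06-01 12:00:03 [INFO] ack stored device=SIM-000001"
]
counts = tally_levels(sample)
```

Once the parsing rules stabilized, codifying them as Splunk field extractions gave us the same breakdowns automatically on every forwarded log.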
One of the key capabilities that both our custom measurement application and Splunk provide is data visualization. Meaningful data visualization is invaluable when comparing different test cycles, uncovering performance issues, and communicating with others. It is also hard; it requires strong knowledge of the application to determine what to present and patience to experiment with the best way to present it. Throughout our analysis we used column charts, scatter charts, pie charts, and candlestick charts to help us answer different questions. Effective data visualizations expose patterns that are difficult to discern in raw measurements.
At the end of this project we demonstrated that, with 100,000 DirectLink devices, the IntelliSOURCE DRMS can calculate which devices to include in a demand response event, send messages to each device, and receive two-way acknowledgements from all of the devices in under 100 seconds. This result met our goal and surpassed all of our customers' service level agreements. It also gives us a great foundation as we continue to scale to 500,000 to 1,000,000 devices.
No performance testing effort would be complete without finding and fixing a few bottlenecks:
- Optimizing data structures. While newer programming languages have made it easier and more enjoyable to write code, they have also made it easier to write code that performs poorly (especially at large scale). Refactoring data structures by choosing the most efficient type for each use case yielded significant performance improvements.
- Optimizing SQL queries. This is one of the first places people look when addressing performance concerns. We found some performance increases by simply changing SQL queries (particularly by removing IN clauses) but the real payoff was de-normalizing specific parts of the schema to make the queries significantly simpler.
- Separating and buffering operations. It's natural to implement a single functional requirement (receive and store telemetry) as a single process. However, at large scale a delay in one operation (storing telemetry) can impact the other operation (receiving messages). Separating those operations into multiple processes connected by a queuing mechanism can alleviate this bottleneck.
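The separate-and-buffer pattern can be sketched with Ruby's in-process Queue. In a real deployment the buffer would live outside the process (for example, a Redis list consumed with blocking pops) so the receiver and the storer can be independent processes; this single-process version only illustrates the decoupling:

```ruby
# Receiving and storing run as separate workers joined by a queue,
# so a slow database write never blocks message intake.
queue  = Queue.new
stored = []

# Storer worker: drains the queue and persists each reading.
storer = Thread.new do
  while (telemetry = queue.pop)
    stored << telemetry # stand-in for the database write
  end
end

# Receiver: only enqueues, which is fast and never blocks on storage.
100.times { |i| queue << { device: format("SIM-%06d", i), reading: i } }

queue << nil # sentinel value tells the storer to stop
storer.join
```

The queue also absorbs bursts: if telemetry arrives faster than it can be stored, the backlog grows in the buffer instead of dropping messages at the socket.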
Lastly, we've been able to build a framework for testing our solutions in an environment that more closely resembles our customers' production systems. This is an important addition to our existing automated unit and integration tests. We've already re-used that framework for other projects within Comverge and are excited to continue enhancing it.

Thanks for reading this series of posts! We hope you've enjoyed it. We've also assembled all three blog posts into one white paper, which you can download here. If you have any questions, please contact me at firstname.lastname@example.org
A few additional details worth noting: beyond the devices themselves, our scripts added the associated accounts, premises, and enrollments managed by the IntelliSOURCE DRMS. The tradeoff of de-normalizing the schema is that we store more data. And for the queuing mechanism between processes, we're using Redis.