JMeter: Regular Expression Extractor

In this example, we will demonstrate the use of the Regular Expression Extractor post processor in Apache JMeter. We will parse the response data with a regular expression, extract a portion of it, and use the extracted value in a different sampler. Before we look at the usage of the Regular Expression Extractor, let’s look at the concept.

Regular Expression

A regular expression is a pattern matching language that performs a match on a given value, content or expression. It is written as a series of characters that denote a search pattern, and the pattern is applied to strings to find and extract the match. Regular expression is often shortened to regex. Pattern based searching has become very popular and is supported by most mainstream languages like Perl, Java, Ruby, JavaScript, Python etc. Regex is also commonly used on UNIX operating systems with commands like grep, sed and awk and editors like ed and vi. The language of regex uses meta characters like . (matches any single character), [] (matches any one character from the set listed inside the brackets), ^ (matches the start position), $ (matches the end position) and many more to devise a search pattern. Using these meta characters, together with conditional constructs and replace features, one can write a powerful search pattern. A full discussion of regex is beyond the scope of this article. You can find plenty of articles and tutorials on regular expressions on the net.
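As a quick illustration of the four meta characters mentioned above, here is a minimal sketch in Python. Python's re module is used only for convenience (JMeter itself uses a different engine), and the sample strings are made up for the example.

```python
import re

# . matches any single character (but not a newline by default)
dot_matches = re.findall(r"c.t", "cat cut coat")

# [] matches any one character from the set listed inside the brackets
set_matches = re.findall(r"[bc]at", "bat cat hat")

# ^ anchors the pattern at the start of the string
starts_with_hello = re.search(r"^Hello", "Hello World") is not None

# $ anchors the pattern at the end of the string
ends_with_world = re.search(r"World$", "Hello World") is not None

print(dot_matches)        # ['cat', 'cut']
print(set_matches)        # ['bat', 'cat']
print(starts_with_hello)  # True
print(ends_with_world)    # True
```

Note that "coat" is not matched by c.t because there are two characters between the c and the t.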

Regular Expression Extractor

The regular expression (regex) feature in JMeter is provided by the Jakarta ORO framework, which is modelled on the Perl5 regex engine. With JMeter, you can use regex to extract values from the response during test execution and store them in a variable (also called the reference name) for further use. The Regular Expression Extractor is a post processor that applies a regex to the response data. The value extracted by the regex can then be used dynamically in a different sampler later in the test plan execution. The Regular Expression Extractor control panel allows you to configure the following fields:

Apply to : The regex extractor is applied to test results, that is, the response data from the server. A response to the primary request is considered the main sample, while a response to a sub request is a sub sample. A typical HTML page (primary resource) may have links to various other resources like images, JavaScript files, CSS etc. These are embedded resources, and a request to an embedded resource produces a sub sample. The HTML page response itself is the main sample. A user has the option to apply the regex to the main sample, to the sub samples, or to both.

Field to check : Here you choose which part of the response the regex is matched against. There are several fields to choose from. You can apply the regex to the plain response body or to a document that is returned as the response data. You can also apply it to the request or response headers, to the URL, or to the response code.

Reference Name : This is the name of the variable that can be referenced later in the test plan using ${}. After applying the regex, the final extracted value is stored in this variable. Behind the scenes, JMeter generates more than one variable, depending on the match. If you have defined groups in your regex by using parentheses (), it generates one variable per group. These variable names are suffixed with _g followed by the group number (for example _g0, _g1). The complete match is always available as the zeroth group, or group 0, and it is the only group when the regex defines none. Variable values can be checked by using a Debug Sampler. This will enable you to verify whether your regular expression worked or not.

Regular Expression : This is the regex itself that is applied to the response data. A regex may or may not have a group. A group is a portion of the matched string that is captured separately. For example, if the response data is ‘Hello World’ and my regex is Hello (.+)$, then it matches ‘Hello World’ but extracts the string ‘World’. The part of the regex enclosed in parentheses () is the group that is captured or extracted. You may have more than one group in your regex, so which ones to extract is configured through the Template field described next.

Template : Templates are references or pointers to the groups. A regex may have more than one group. The template allows you to specify which group value to extract by giving the group number as $1$ or $2$ or $1$$2$ (extract both groups). In the ‘Hello World’ example from the Regular Expression field above, $0$ points to the complete matched expression, that is ‘Hello World’, and $1$ points to the string ‘World’. A regex without parentheses () has only $0$ (the default group). Based on the template specified, that group value is stored in the variable (reference name).

Match no. : A regex applied to the response data may produce more than one match. You can specify which match should be returned. For example, a value of 2 returns the second match. A value of 0 returns a match chosen at random. A negative value returns all the matches.

Default value : The regex match is stored in a variable. But what happens when the regex does not match? In that case the variable is not set. If you specify a default value, however, the variable is set to that default whenever the regex does not match. It is recommended to provide a default value so that you know whether your regex worked or not. It is a useful feature for debugging your test.
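To make the interplay between the Regular Expression, Template, Match no. and Default value fields more concrete, the following is a rough Python emulation. It is only a sketch: the function name extract and its parameters are invented for illustration, Python's re module stands in for JMeter's engine, and a negative match number simply returns a list here, whereas JMeter stores the matches in numbered variables.

```python
import random
import re


def extract(response, regex, template="$1$", match_no=1, default=None):
    """Hypothetical helper that mimics the extractor fields for one variable."""
    matches = list(re.finditer(regex, response))

    def fill(match):
        # Replace every $n$ in the template with the text of group n.
        return re.sub(
            r"\$(\d+)\$",
            lambda ref: match.group(int(ref.group(1))) or "",
            template,
        )

    if match_no < 0:
        # Negative match number: return every match, in order.
        return [fill(m) for m in matches]
    if not matches or match_no > len(matches):
        # No usable match: fall back to the default value.
        return default
    if match_no == 0:
        # Zero: return one of the matches, chosen at random.
        return fill(random.choice(matches))
    # Positive match number: 1 is the first match, 2 the second, and so on.
    return fill(matches[match_no - 1])


# (?m) lets $ match at the end of each line of this made-up response.
response = "Hello World\nHello JMeter"
pattern = r"(?m)Hello (.+)$"

print(extract(response, pattern))                   # World
print(extract(response, pattern, template="$0$"))   # Hello World
print(extract(response, pattern, match_no=2))       # JMeter
print(extract(response, pattern, match_no=-1))      # ['World', 'JMeter']
print(extract(response, r"Goodbye (.+)", default="error"))  # error
```

The last call shows why a default value helps with debugging: a result of ‘error’ makes a failed match obvious.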

Regular Expression Extractor Example

We will now demonstrate the use of the Regular Expression Extractor by configuring a regex that will extract the URL of an article from the Google website. After extracting the URL, we will use it in an HTTP Request sampler to test the same. The extracted URL will be stored in a variable.

Configuring Regular Expression Extractor

Before we configure the regex extractor, we will create a test plan with a ThreadGroup named ‘Single User’ and an HTTP Request Sampler named ‘Google Homepage’. It will point to the server www.google.com. The below image shows the configured ThreadGroup (Single User) and HTTP Request Sampler (Google Homepage).

Next, we will apply the regex to the response body (main sample). When the test is executed, it will send an HTTP request to www.google.com and receive the response data, which is an HTML page. This HTML web page contains articles, the titles of which are wrapped in <h2> tags. We will write a regular expression that will match the first <h2> tag and extract the URL of the article. The URL will be part of an anchor <a> tag. Right click on the Google Homepage sampler and select Add -> Post Processors -> Regular Expression Extractor.

The name of our extractor is ‘Google Article URL Extractor’. We will apply the regex to the main sample and directly to the response body (HTML page). The Reference Name or variable name provided is ‘article_url’. The regex used is <h2 .+?><a href="http://(.+?)".+?</h2>. We will not go into the details of the regex as this is a different discussion altogether. In a nutshell, this regex will match the first <h2> tag and extract the URL from the anchor tag. It skips the http:// prefix and captures everything after it up to the closing quote, which here is the server part of the URL. The part to extract is placed in parentheses (), forming our first group. The Template field is set to $1$, which points to our first group (the URL), and the Match No. field is set to 1 to select the first match. The Default Value is set to ‘error’. So if our regex fails to match, the variable article_url will hold the value ‘error’. If the regex makes a successful match, the article URL will be stored in the article_url variable.
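To see what this regex captures, it can be tried outside JMeter. The sketch below runs the same pattern with Python's re module against a made-up HTML fragment. The markup, the example.com addresses and the helper name extract_article_url are assumptions for illustration only, and the real page will look different.

```python
import re

# The regex from the extractor configuration, unchanged.
ARTICLE_REGEX = r'<h2 .+?><a href="http://(.+?)".+?</h2>'

# Hypothetical response body with two article headings.
html = (
    '<div>'
    '<h2 class="title"><a href="http://www.example.com" rel="bookmark">First article</a></h2>'
    '<h2 class="title"><a href="http://www.example.org">Second article</a></h2>'
    '</div>'
)


def extract_article_url(body, default="error"):
    """Return group 1 of the first match, or the default when nothing matches."""
    match = re.search(ARTICLE_REGEX, body)
    return match.group(1) if match else default


print(extract_article_url(html))                    # www.example.com
print(extract_article_url("<p>no headings here</p>"))  # error
```

Only the first heading is used, which corresponds to a Match No. of 1, and the http:// prefix is not part of the captured group.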

We will use this article_url variable in another HTTP Request sampler named Google Article. Right click on Single User ThreadGroup and select Add -> Sampler -> HTTP Request.

As you can see from the above, the server name is ${article_url}, which is nothing but the value that the regex extracted from the response of the previous sampler. You can verify the results by running the test.

View Test Results

To view the test results, we will configure the View Results Tree listener. But before we do that, we will add a Debug Sampler to see the variables and the values generated when the test is executed. This will help you understand whether your regex successfully matched an expression or failed. Right click on the Single User ThreadGroup and select Add -> Sampler -> Debug Sampler.

As we want to debug the generated variables, set the JMeter variables field to True. Next, we will view and verify test results using View Results Tree listener. Right click on Single User ThreadGroup and select Add -> Listener -> View Results Tree.

First let’s look at the response data of the Debug Sampler. It shows our variable article_url, whose value is the URL that we extracted. The test has also generated the group variables article_url_g0 and article_url_g1. Group 0 is the complete match and group 1 is the string that is extracted from the complete match. This string is also stored in our article_url variable. The variable named article_url_g tells you the number of groups in the regex. Our regex contained only 1 group (note the single pair of parentheses () in our regex). Now let’s look at the result of our Google Article sampler:
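The naming scheme of these variables can be reproduced with a short Python sketch. This only imitates the names described above for a single match with the $1$ template. The function debug_variables and the sample HTML are hypothetical, and the values shown by the real Debug Sampler depend on the actual response.

```python
import re


def debug_variables(ref_name, regex, response, default="error"):
    """Hypothetical helper: build the variables described above for the first match."""
    match = re.search(regex, response)
    if match is None:
        # No match: only the reference name is set, to the default value.
        return {ref_name: default}

    group_count = match.re.groups  # number of () groups in the regex
    variables = {
        ref_name: match.group(1),            # template $1$
        f"{ref_name}_g": str(group_count),   # number of groups
    }
    for n in range(group_count + 1):
        # _g0 is the complete match, _g1 the first group, and so on.
        variables[f"{ref_name}_g{n}"] = match.group(n) or ""
    return variables


regex = r'<h2 .+?><a href="http://(.+?)".+?</h2>'
sample = '<h2 class="title"><a href="http://www.example.com">Title</a></h2>'

variables = debug_variables("article_url", regex, sample)
for name in sorted(variables):
    print(f"{name}={variables[name]}")
```

With this sample, article_url and article_url_g1 hold the same string, article_url_g0 holds the whole <h2> element, and article_url_g is 1.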

The Google Article sampler successfully made the request to the server URL that was extracted using regex. The server URL was referenced using ${article_url} expression.

Conclusion

The Regular Expression Extractor is one of the significant features of JMeter. It can parse different types of values from different response fields. These values are stored in variables that can be referenced by other samplers in the test plan. The ability to define groups in the regex and capture portions of a match makes it an even more powerful feature. Regular expressions are best used when you need to parse text and apply the result dynamically to subsequent samplers in your test plan. The objective of the article was to highlight the significance of the Regular Expression Extractor and its application in test execution.

Please download the example script from the link below:

https://app.box.com/not-just-a-tester-repository