					
				About Question enthuware.ocpjp.v7.2.1425 :
				Posted: Wed Mar 27, 2013 6:52 am
				by aleksey232
				Q: Which of the following patterns will correctly capture all Hex numbers that are delimited by at least one whitespace at either end in an input text?
A: (\s|\b)0[xX][0-9a-fA-F]+(\s|\b)
The input "0x22" does not contain any whitespace, yet the number is still captured.
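You can check this claim by running the quiz pattern against the bare string. The sketch below is a minimal illustration, not part of the original question. The class name `QuizPatternDemo` and the helper `findAll` are hypothetical. The `\b` alternative at each end matches the zero-width boundary at the start and end of the input, so no whitespace is needed for a match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuizPatternDemo {
    // The quiz option (\s|\b)0[xX][0-9a-fA-F]+(\s|\b) as a Java string literal.
    static final Pattern QUIZ = Pattern.compile("(\\s|\\b)0[xX][0-9a-fA-F]+(\\s|\\b)");

    // Collects every substring the pattern matches, scanning left to right with find().
    static List<String> findAll(Pattern pattern, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // No whitespace at all: \b matches the zero-width boundary at index 0 and at the end.
        System.out.println(findAll(QUIZ, "0x22")); // prints [0x22]
    }
}
```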
			 
			
					
				Re: About Question enthuware.ocpjp.v7.2.1425 :
				Posted: Wed Mar 27, 2013 8:47 am
				by admin
				You need the delimiter if there are multiple numbers in the string. In your example there is only one number, and it matches the pattern, so the question of a delimiter does not arise.
In other words, the question does not ask you to match white space. It asks you to use white space as a delimiter.
HTH,
Paul.
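One way to treat whitespace purely as a delimiter, without making it part of the match, is to use zero-width lookarounds. This is a swapped-in technique for illustration, not the quiz's answer. `(?<!\S)` requires that the number is not preceded by a non-whitespace character, and `(?!\S)` requires that it is not followed by one, so both the start/end of the input and whitespace qualify. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDelimiterDemo {
    // (?<!\S): not preceded by a non-whitespace char (start of input or whitespace before).
    // (?!\S):  not followed by a non-whitespace char (end of input or whitespace after).
    // Lookarounds are zero-width, so the delimiting whitespace is checked but never consumed.
    static final Pattern HEX = Pattern.compile("(?<!\\S)0[xX][0-9a-fA-F]+(?!\\S)");

    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = HEX.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("0x1 0x2"));      // prints [0x1, 0x2]
        System.out.println(findAll("0x1+0x2"));      // prints []
        System.out.println(findAll(" 0xFF\t0x10 ")); // prints [0xFF, 0x10]
    }
}
```

Because the single space in "0x1 0x2" is never consumed, it can serve as the right delimiter of the first number and the left delimiter of the second.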
			 
			
					
				Re: About Question enthuware.ocpjp.v7.2.1425 :
				Posted: Thu Mar 28, 2013 5:09 am
				by aleksey232
				Then I think the question should state that the input string must contain whitespace delimiters.
			 
			
					
				Re: About Question enthuware.ocpjp.v7.2.1425 :
				Posted: Thu Mar 28, 2013 7:04 am
				by admin
				Hi Aleksey, 
I am not sure what you mean because the question does say, "...are delimited by at least one whitespace...". So it is clear that whitespace is to be used as a delimiter.
HTH,
Paul.
			 
			
					
				Re: About Question enthuware.ocpjp.v7.2.1425 :
				Posted: Fri Sep 20, 2013 9:06 am
				by The_Nick
				Would it make sense to use the & operator in a regex? Or rather, is there a chance of getting a question on it? :)
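Assuming "&" here refers to Java's `&&` operator, which in `java.util.regex` is only meaningful inside a character class (set intersection), here is a small sketch. `[0-9a-zA-Z&&[^g-zG-Z]]` means "alphanumeric and not g-z/G-Z", which leaves exactly the hex digits. Whether the exam asks about it is not something this sketch can answer. The class name `IntersectionDemo` is hypothetical.

```java
import java.util.regex.Pattern;

public class IntersectionDemo {
    // Character-class intersection: [0-9a-zA-Z] AND NOT [g-zG-Z] leaves 0-9, a-f, A-F.
    static final Pattern HEX_BY_INTERSECTION = Pattern.compile("0[xX][0-9a-zA-Z&&[^g-zG-Z]]+");
    // The plain class used elsewhere in this thread, for comparison.
    static final Pattern HEX_PLAIN = Pattern.compile("0[xX][0-9a-fA-F]+");

    public static void main(String[] args) {
        for (String s : new String[] {"0x1F", "0xab", "0x1G", "0xzz"}) {
            // matches() requires the whole string to match the pattern.
            System.out.println(s + " -> " + HEX_BY_INTERSECTION.matcher(s).matches()
                    + " / " + HEX_PLAIN.matcher(s).matches());
        }
    }
}
```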
The_Nick.
			 
			
					
				Re: About Question enthuware.ocpjp.v7.2.1425 :
				Posted: Sat Mar 15, 2014 5:08 pm
				by accurate_guy
				"0x1+0x2" contains two hex numbers delimited by "+" but not a space. Nevertheless they are matched by the pattern.
The problem with the pattern is that some characters that are not whitespace still form a word boundary. This applies to all non-word characters (e.g. "0x1@0x2").
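The following sketch reproduces this report with the quiz pattern. `+` and `@` are non-word characters, so `\b` fires next to them and both numbers are found even though no whitespace is present. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryLeakDemo {
    // The quiz option: whitespace OR a word boundary on each side.
    static final Pattern QUIZ = Pattern.compile("(\\s|\\b)0[xX][0-9a-fA-F]+(\\s|\\b)");

    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = QUIZ.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The boundary between '1' and '+' (or '@') satisfies \b, so both numbers match.
        System.out.println(findAll("0x1+0x2")); // prints [0x1, 0x2]
        System.out.println(findAll("0x1@0x2")); // prints [0x1, 0x2]
    }
}
```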
			 
			
					
				Re: About Question enthuware.ocpjp.v7.2.1425 :
				Posted: Sat Mar 15, 2014 9:22 pm
				by admin
				You are right. The pattern should be: (\s|^)0[xX][0-9a-fA-F]+(\s|$)
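A quick check of the corrected pattern is below. It rejects "0x1+0x2", and it accepts numbers bounded by whitespace or by the start/end of the input. Note that `\s` is consumed, so the surrounding whitespace is part of `group()`. Without the `MULTILINE` flag, `^` and `$` mean the start and end of the whole input. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchoredPatternDemo {
    // Whitespace or start of input before; whitespace or end of input after.
    static final Pattern ANCHORED = Pattern.compile("(\\s|^)0[xX][0-9a-fA-F]+(\\s|$)");

    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = ANCHORED.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("0x1+0x2").isEmpty()); // prints true: '+' is not whitespace
        System.out.println("[" + findAll(" 0x1 ").get(0) + "]"); // prints [ 0x1 ] (spaces included)
        System.out.println(findAll("0x22"));               // prints [0x22] via ^ and $
    }
}
```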
			 
			
					
				Re: About Question enthuware.ocpjp.v7.2.1425 :
				Posted: Sun Mar 16, 2014 4:20 am
				by accurate_guy
				Thanks for your fast replies.
A problem with the new pattern is that it doesn't match two hex numbers separated by just one space, e.g. "0x1 0x2". Only the first hex number is matched, because the space has already been consumed by the first match.
To fix this I've added \G (the end of the previous match):
(^|\s|\G)0[xX][0-9a-fA-F]+(\s|$)
I am not sure whether \G needs to be known for the exam.
In practice I would put the hex number itself inside parentheses (as a capturing group) to exclude the spaces from the match.
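The sketch below compares the two patterns on "0x1 0x2". For the `\G` version it also applies the capturing-group idea from the post, with the hex number as group 2. `\G` matches where the previous `find()` ended, which here is index 4, right after the consumed space. The class name and the `findAll(pattern, input, group)` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GAnchorDemo {
    // Admin's corrected pattern: the delimiting space is consumed by the match.
    static final Pattern WITHOUT_G = Pattern.compile("(\\s|^)0[xX][0-9a-fA-F]+(\\s|$)");
    // accurate_guy's pattern, with the hex number wrapped in its own group (group 2).
    static final Pattern WITH_G = Pattern.compile("(^|\\s|\\G)(0[xX][0-9a-fA-F]+)(\\s|$)");

    // Collects the given capturing group (0 = whole match) for every find().
    static List<String> findAll(Pattern pattern, String input, int group) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group(group));
        }
        return found;
    }

    public static void main(String[] args) {
        // The space at index 3 is used up by the first match, so "0x2" has no left delimiter.
        System.out.println(findAll(WITHOUT_G, "0x1 0x2", 0)); // prints [0x1 ]
        // \G succeeds at index 4, where the previous match ended.
        System.out.println(findAll(WITH_G, "0x1 0x2", 2));    // prints [0x1, 0x2]
    }
}
```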
Regards
			 
			
					
				Re: About Question enthuware.ocpjp.v7.2.1425 :
				Posted: Sun Mar 16, 2014 5:24 am
				by admin
				Are you sure? I just tried it and it matches correctly. Here is the code:
Code:
            Pattern pattern = 
            Pattern.compile("(\\s||^)0[xX][0-9a-fA-F]+(\\s||$)");
            Matcher matcher = pattern.matcher("0x22 0x44");
            while (matcher.find()) {
                System.out.println("Found the text "+matcher.group()+" starting at " +matcher.start()+" and ending at index "+ matcher.end());
            }
Output:
Code:
Found the text 0x22  starting at 0 and ending at index 5
Found the text 0x44 starting at 5 and ending at index 9
 
			 
			
					
				Re: About Question enthuware.ocpjp.v7.2.1425 :
				Posted: Sun Mar 16, 2014 6:53 am
				by accurate_guy
				Yes, I am sure. Somehow a double pipe (||) crept into your code.
Try using 
Code:
Pattern pattern = Pattern.compile("(\\s|^)0[xX][0-9a-fA-F]+(\\s|$)");
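A side-by-side run on the same input "0x22 0x44" makes the difference visible. With `||` both numbers are found. With a single `|` only the first one is found, because its trailing space is consumed. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipeComparisonDemo {
    // Accidental version: "||" adds an empty alternative on each side.
    static final Pattern DOUBLE_PIPE = Pattern.compile("(\\s||^)0[xX][0-9a-fA-F]+(\\s||$)");
    // Intended version.
    static final Pattern SINGLE_PIPE = Pattern.compile("(\\s|^)0[xX][0-9a-fA-F]+(\\s|$)");

    static List<String> findAll(Pattern pattern, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll(DOUBLE_PIPE, "0x22 0x44")); // prints [0x22 , 0x44]
        System.out.println(findAll(SINGLE_PIPE, "0x22 0x44")); // prints [0x22 ]
    }
}
```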
 
			 
			
					
				Re: About Question enthuware.ocpjp.v7.2.1425 :
				Posted: Sun Mar 16, 2014 9:09 pm
				by admin
				You are right. I'm not sure why || works. It must be fixed.
			 
			
					
				Re: About Question enthuware.ocpjp.v7.2.1425 :
				Posted: Tue Mar 18, 2014 3:56 am
				by accurate_guy
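				(\\s||^) "works" because it means
* whitespace
* or nothing (the empty string)
* or the beginning of the input
The problem with it is that the empty alternative makes the delimiter optional, so it also matches numbers delimited by characters other than whitespace, or by nothing at all.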
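The effect of the empty alternative can be shown directly. The `||` pattern finds "0x1" embedded in "a0x1g" and splits "0x1+0x2". The single-pipe pattern finds nothing in "a0x1g". The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyAlternativeDemo {
    // The empty alternative between the two pipes always succeeds, so no delimiter is required.
    static final Pattern DOUBLE_PIPE = Pattern.compile("(\\s||^)0[xX][0-9a-fA-F]+(\\s||$)");
    static final Pattern SINGLE_PIPE = Pattern.compile("(\\s|^)0[xX][0-9a-fA-F]+(\\s|$)");

    static List<String> findAll(Pattern pattern, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll(DOUBLE_PIPE, "a0x1g"));   // prints [0x1]
        System.out.println(findAll(SINGLE_PIPE, "a0x1g"));   // prints []
        System.out.println(findAll(DOUBLE_PIPE, "0x1+0x2")); // prints [0x1, 0x2]
    }
}
```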
			 
			
					
				Re: About Question enthuware.ocpjp.v7.2.1425 :
				Posted: Mon Jul 21, 2014 2:43 am
				by bptoth
				In the example for the first option, 0x1a actually seems to be captured, not just 0x1.
In the description of the second option, is the underscore in "[a-zA-Z_0-9]" needed?
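The option text itself isn't quoted here, so this sketch assumes the description is defining word characters. For that definition the underscore matters. The `java.util.regex.Pattern` documentation defines `\w` as `[a-zA-Z_0-9]`, and `\b` likewise treats `_` as a word character. As a result there is no word boundary between `_` and `0`. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class UnderscoreBoundaryDemo {
    static final Pattern WORD_BOUNDED_HEX = Pattern.compile("\\b0[xX][0-9a-fA-F]+\\b");

    // True if a word-bounded hex number occurs anywhere in the input.
    static boolean containsHex(String input) {
        return WORD_BOUNDED_HEX.matcher(input).find();
    }

    public static void main(String[] args) {
        // '_' is a word character, just like letters and digits.
        System.out.println(Pattern.matches("\\w", "_")); // prints true
        // '+' is a non-word character, so there is a boundary before "0x1".
        System.out.println(containsHex("+0x1"));          // prints true
        // '_' and '0' are both word characters, so \b cannot match between them.
        System.out.println(containsHex("_0x1"));          // prints false
    }
}
```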
			 
			
					
				Re: About Question enthuware.ocpjp.v7.2.1425 :
				Posted: Tue Mar 31, 2015 1:11 am
				by pfilaretov
				Regex is hard for me to understand, so please clarify one point.
The regex "[\s\b]0[xX][0-9a-fA-F]+[\s\b]" won't compile. The error is "Illegal/unsupported escape sequence near index 4".
Square brackets ('[' and ']') mean "OR" or "RANGE", don't they? So why is "[abc]" OK but "[\s\b]" is not?
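A likely explanation is that a character class always matches exactly one character. `\s` is a set of characters, so it can appear inside `[...]`. `\b` is a zero-width assertion about the position between two characters, so it is not a character and cannot be a member of a class. Index 4 in "[\s\b]" is the `b` of `\b`, which fits this reading. Alternation `(\s|\b)` can mix the two because each branch is a full sub-pattern. The sketch below assumes that `Pattern.compile` rejects `\b` inside a class with a `PatternSyntaxException`, as in the error reported above. The class name and the `compiles` helper are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CharClassBoundaryDemo {
    // True if the regex compiles, false if Pattern.compile rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println(e.getDescription() + " near index " + e.getIndex());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("[abc]"));    // a class of three single characters
        System.out.println(compiles("[\\s\\b]"));  // \b is a position, not a character
        System.out.println(compiles("(\\s|\\b)")); // alternation accepts both
    }
}
```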
			 
			
					
				Re: About Question enthuware.ocpjp.v7.2.1425 :
				Posted: Tue Mar 31, 2015 7:27 am
				by admin
				Not really sure why 

 
			 
			
					
				Re: About Question enthuware.ocpjp.v7.2.1425 :
				Posted: Tue Jan 03, 2017 3:00 pm
				by jagoneye
				The explanation should use matcher.end()-1 to print the correct ending index, since end() returns the index one past the last matched character.
			 
			
					
				Re: About Question enthuware.ocpjp.v7.2.1425 :
				Posted: Tue Jan 03, 2017 10:24 pm
				by admin
				In Java, the ending index is almost always one past the last. For example, substring(1, 3) returns the characters at indexes 1 and 2 (not 3). The ending character (or element, in the case of a list) is excluded. The explanation just prints the value of matcher.end() from that perspective. Changing it to end()-1 would just cause confusion.
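The sketch below illustrates the exclusive-end convention. `Matcher.end()` is one past the last matched character, so `input.substring(m.start(), m.end())` reproduces `m.group()` exactly. The class name and the `describe` helper are hypothetical, and the half-open `[start, end)` notation in the output is only a labeling choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherEndDemo {
    static final Pattern HEX = Pattern.compile("0[xX][0-9a-fA-F]+");

    // Describes each match as "text [start, end)", rebuilding the text via substring.
    static List<String> describe(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = HEX.matcher(input);
        while (m.find()) {
            // end() is exclusive, like substring's endIndex, so this equals m.group().
            String text = input.substring(m.start(), m.end());
            out.add(text + " [" + m.start() + ", " + m.end() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("0x22 0x44")); // prints [0x22 [0, 4), 0x44 [5, 9)]
    }
}
```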