In the following example we examine two ways of extracting text that matches a specific pattern from an input string. We can use either the -match operator or the Select-String cmdlet. The former is perfectly fine if we're sure that there is only one match in the input string we process. The Select-String cmdlet can return multiple matches, so it's more versatile.
We have a string with a chat dump. The dump contains attachment identifiers, like d32b83a2-3645-44f5-a4ff-b20542d080ce, and the following scripts extract those for us.
Method 1 - The -match operator
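The regular expression is simple: we match every pattern that starts with id=" followed by the id itself, which is five hyphen-separated groups of word characters inside a single capturing group. PowerShell treats alphanumeric characters and the underscore as word characters.
In other words, for ASCII input such as our ids: \w = [a-zA-Z0-9_]. The .NET regex engine behind PowerShell also lets \w match Unicode letters and digits, so it is slightly broader than that class.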
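As a quick check of what \w covers, the sketch below compares it with the explicit ASCII class. The variable names are made up for illustration. The accented letter is built from its code point so the result does not depend on how the script file is encoded. It shows that \w is somewhat broader than [a-zA-Z0-9_] in the .NET regex engine, which makes no difference for the ids in this example.

```powershell
# U+00E9 (e with acute accent), created from its code point.
$Accented = [string][char]0x00E9

# \w accepts Unicode letters, the explicit ASCII class does not.
$AccentedIsWordChar  = $Accented -match '^\w$'
$AccentedIsAsciiChar = $Accented -match '^[a-zA-Z0-9_]$'

# Letters, digits and the underscore are word characters, the hyphen is not.
$UnderscoreIsWordChar = 'a_9' -match '^\w+$'
$HyphenIsWordChar     = 'a-9' -match '^\w+$'

$AccentedIsWordChar, $AccentedIsAsciiChar, $UnderscoreIsWordChar, $HyphenIsWordChar
```

This is also why the pattern spells out the hyphens between the \w groups: a hyphen is never matched by \w.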
# Sample input
$InputStuff = 'Upload from chat<attachment id="d32b83a2-3645-44f5-a4ff-b20542d080ce"></attachment><attachment id="35182f26-2e8d-4ec4-b60b-d971b6165946"></attachment>'
$Regex = 'id="(\w{8}-\w{4}-\w{4}-\w{4}-\w{12})'

# Method 1
# -match returns True or False and stores the first match in $Matches
$InputStuff -match $Regex
# Index 1 is the first capturing group, the id itself
$Matches[1]
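A slightly more defensive variant of Method 1, as a minimal sketch: because -match returns a Boolean and only refreshes $Matches when the match succeeds, it can be safer to read $Matches inside an if block. The names $WholeMatch and $FirstId are chosen here for illustration only.

```powershell
$InputStuff = 'Upload from chat<attachment id="d32b83a2-3645-44f5-a4ff-b20542d080ce"></attachment><attachment id="35182f26-2e8d-4ec4-b60b-d971b6165946"></attachment>'
$Regex = 'id="(\w{8}-\w{4}-\w{4}-\w{4}-\w{12})'

# Read $Matches only when the match succeeded, so a stale value
# from an earlier -match is never picked up by accident.
if ($InputStuff -match $Regex) {
    $WholeMatch = $Matches[0]   # full matched text, including the id=" prefix
    $FirstId    = $Matches[1]   # first capturing group: the id itself
}

$FirstId
```

Only the first attachment id ends up in $FirstId; the second one in the input is ignored by -match.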
Method 2 - With the Select-String cmdlet
The previous solution returned only the first match in our source string. If we need all matches, in our case all file ids, we have to use Select-String with the -AllMatches switch. The command returns a set of matches, one for each file id. Each match has two groups: the first (index 0) holds the whole matched text, and the second (index 1) holds our extracted file id, so we read the second one.
# Sample input
$InputStuff = 'Upload from chat<attachment id="d32b83a2-3645-44f5-a4ff-b20542d080ce"></attachment><attachment id="35182f26-2e8d-4ec4-b60b-d971b6165946"></attachment>'
$Regex = 'id="(\w{8}-\w{4}-\w{4}-\w{4}-\w{12})'

# Method 2
# -AllMatches collects every match in the string; Groups[1] is the captured id
(Select-String -InputObject $InputStuff -Pattern $Regex -AllMatches).Matches |
    ForEach-Object { $_.Groups[1].Value }
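And the result is both attachment ids, d32b83a2-3645-44f5-a4ff-b20542d080ce and 35182f26-2e8d-4ec4-b60b-d971b6165946, one per line.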
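For comparison, all matches can also be collected without Select-String by calling the .NET [regex]::Matches method directly. This is an alternative sketch, not one of the two methods described above, and $Ids is a name introduced here for illustration. Unlike -match and Select-String, [regex]::Matches is case-sensitive by default, which does not matter for this pattern's lowercase id=" prefix.

```powershell
$InputStuff = 'Upload from chat<attachment id="d32b83a2-3645-44f5-a4ff-b20542d080ce"></attachment><attachment id="35182f26-2e8d-4ec4-b60b-d971b6165946"></attachment>'
$Regex = 'id="(\w{8}-\w{4}-\w{4}-\w{4}-\w{12})'

# [regex]::Matches returns every match; Groups[1] is the captured id.
# @() keeps the result an array even when there is only one match.
$Ids = @([regex]::Matches($InputStuff, $Regex) | ForEach-Object { $_.Groups[1].Value })

$Ids
```

Storing the ids in an array makes it easy to check $Ids.Count or loop over them later.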