Chapter 5. Strings and Unstructured Text

Introduction

Creating and manipulating text has long been one of the primary tasks of scripting languages and traditional shells. In fact, Perl (the language) started as a simple (but useful) tool designed for text processing. It has grown well beyond those humble roots, but its popularity provides strong evidence of the need it fills.

In text-based shells, this strong focus continues. When most of your interaction with the system happens by manipulating the text-based output of programs, powerful text processing utilities become crucial. Text parsing tools such as awk, sed, and grep form the keystones of text-based systems management.

In PowerShell's object-based environment, this traditional tool chain plays a less critical role. You can accomplish most of the tasks that previously required these tools much more effectively through other PowerShell commands. However, being an object-oriented shell does not mean that PowerShell drops all support for text processing. Dealing with strings and unstructured text continues to play an important part in a system administrator's life. Since PowerShell lets you manage the majority of your system in its full fidelity (using cmdlets and objects), the text processing tools can once again focus primarily on actual text processing tasks.

Create a String

Problem

You want to create a variable that holds text.

Solution

Use PowerShell string variables to store and work with text.

To define a string that supports variable expansion and escape characters in its definition, surround it with double quotes:

$myString = "Hello World"

To define a literal string (that does not interpret variable expansion or escape characters), surround it with single quotes:

$myString = 'Hello World'

Discussion

String literals come in two varieties: literal (nonexpanding) and expanding strings. To create a literal string, place single quotes ($myString = 'Hello World') around the text. To create an expanding string, place double quotes ($myString = "Hello World") around the text.

In a literal string, all the text between the single quotes becomes part of your string. In an expanding string, PowerShell expands variable names (such as $myString) and escape sequences (such as `n) with their values (such as the content of $myString and the newline character, respectively).
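
As a small sketch of that difference (the variable names are chosen here only for illustration), the same characters produce different results in each kind of string:

```powershell
$name = "World"

# Expanding string: $name and `n are replaced with their values
$expanding = "Hello $name`n"

# Literal string: every character is kept exactly as typed
$literal = 'Hello $name`n'

$expanding.Length   # 12: "Hello World" plus one newline character
$literal.Length     # 13: includes the $, the backtick, and the n
```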

For a detailed explanation of the escape sequences and replacement rules inside PowerShell strings, see the section called “Strings”.

One exception to the "all text in a literal string is literal" rule comes from the quote characters themselves. In either type of string, PowerShell lets you place two of that string's quote characters together to add the quote character itself:

$myString = "This string includes ""double quotes"" because it combined quote characters."
$myString = 'This string includes ''single quotes'' because it combined quote characters.'

This helps prevent escaping atrocities that would arise when you try to include a single quote in a single-quoted string. For example:

$myString = 'This string includes ' + "'" + 'single quotes' + "'"

Note

This example shows how easy PowerShell makes it to create new strings by adding other strings together. This is an attractive way to build a formatted report in a script but should be used with caution. Due to the way that the .NET Framework (and therefore PowerShell) manages strings, adding information to the end of a large string this way causes noticeable performance problems. If you intend to create large reports, see the section called “Generate Large Reports and Text Streams”.
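
One common way to avoid that cost is the .NET StringBuilder class. The following is only a sketch of that general technique, not necessarily the approach the referenced section takes, and the line count and text are made up for illustration:

```powershell
# Assumption: a StringBuilder is an acceptable substitute for repeated += here
$builder = New-Object System.Text.StringBuilder

foreach ($i in 1..1000)
{
    # AppendLine() returns the builder itself; [void] discards that output
    [void] $builder.AppendLine("Line $i")
}

# Convert the accumulated text to a single string once, at the end
$report = $builder.ToString()
```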

Create a Multiline or Formatted String

Problem

You want to create a variable that holds text with newlines or other explicit formatting.

Solution

Use a PowerShell here string to store and work with text that includes newlines and other formatting information.

$myString = @"
This is the first line
of a very long string. A "here string"
lets you create blocks of text
that span several lines.
"@

Discussion

PowerShell begins a here string when it sees the characters @" followed by a newline. It ends the string when it sees the characters "@ on their own line. These seemingly odd restrictions let you create strings that include quote characters, newlines, and other symbols that you commonly use when you create large blocks of preformatted text.

Note

These restrictions, while useful, can sometimes cause problems when you copy and paste PowerShell examples from the Internet. Web pages often add spaces at the end of lines, which can interfere with the strict requirements of the beginning of a here string. If PowerShell produces an error when your script defines a here string, check that the here string does not include an errant space after its first quote character.

Like string literals, here strings may be literal (and use single quotes) or expanding (and use double quotes).
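
A brief sketch of the two forms side by side (the $user variable and its value are hypothetical):

```powershell
$user = "Administrator"

# Expanding here string (double quotes): $user is replaced with its value
$expanding = @"
Hello, $user. Here are some "quotes".
"@

# Literal here string (single quotes): $user stays exactly as typed
$literal = @'
Hello, $user. Here are some "quotes".
'@
```

In both forms, the quote characters inside the text need no doubling or escaping.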

In PowerShell version one, here strings were frequently used as the equivalent of block comments to disable lines in a script. PowerShell version two now supports this fully through multiline comments. For more information, see the section called “Comments”.

Place Special Characters in a String

Problem

You want to place special characters (such as tab and newline) in a string variable.

Solution

In an expanding string, use PowerShell's escape sequences to include special characters such as tab and newline.

PS > $myString = "Report for Today`n----------------"
PS > $myString
Report for Today
----------------

Discussion

As discussed in the section called “Create a String”, PowerShell strings come in two varieties: literal (or nonexpanding) and expanding strings. A literal string uses single quotes around its text, while an expanding string uses double quotes around its text.

In a literal string, all the text between the single quotes becomes part of your string. In an expanding string, PowerShell expands variable names (such as $ENV:SystemRoot) and escape sequences (such as `n) with their values (such as the SystemRoot environment variable and the newline character).

Note

Unlike many languages that use a backslash character (\) for escape sequences, PowerShell uses a back-tick (`) character. This stems from its focus on system administration, where backslashes are ubiquitous in path names.
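
As a short illustration of that design choice (the path and text below are invented for the example), backslashes pass through an expanding string untouched, while backtick sequences are interpreted:

```powershell
# Backslashes need no escaping, so Windows paths read naturally
$path = "C:\temp\new"

# Backtick sequences: `t is a tab, `" is a literal double quote
$row = "Name`tValue"
$quoted = "She said `"hello`""

$path.Length    # 11: the \t and \n in the path are not escape sequences
```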

For a detailed explanation of the escape sequences and replacement rules inside PowerShell strings, see the section called “Strings”.

Insert Dynamic Information in a String

Problem

You want to place dynamic information (such as the value of another variable) in a string.

Solution

In an expanding string, include the name of a variable in the string to insert the value of that variable.

PS > $header = "Report for Today"
PS > $myString = "$header`n----------------"
PS > $myString
Report for Today
----------------

To include information more complex than just the value of a variable, enclose it in a subexpression:

PS > $header = "Report for Today"
PS > $myString = "$header`n$('-' * $header.Length)"
PS > $myString
Report for Today
----------------

Discussion

Variable substitution in an expanding string is a simple enough concept, but subexpressions deserve a little clarification.

A subexpression is the dollar sign character, followed by a PowerShell command (or set of commands) contained in parentheses:

$(subexpression)

When PowerShell sees a subexpression in an expanding string, it evaluates the subexpression and places the result in the expanding string. In the solution, the expression '-' * $header.Length tells PowerShell to make a line of dashes $header.Length long.
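
One place where this matters is property access. As a sketch (reusing the $header value from the solution), a bare variable reference expands only the variable, while a subexpression evaluates the whole expression:

```powershell
$header = "Report for Today"

# Only the variable name is expanded; ".Length" is treated as plain text
$plain = "Length: $header.Length"       # Length: Report for Today.Length

# A subexpression evaluates the whole expression before inserting it
$subexpr = "Length: $($header.Length)"  # Length: 16
```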

Another way to place dynamic information inside a string is to use PowerShell's string formatting operator, which is based on the rules of the .NET string formatting:

PS > $header = "Report for Today"
PS > $myString = "{0}`n{1}" -f $header,('-' * $header.Length)
PS > $myString
Report for Today
----------------

For an explanation of PowerShell's formatting operator, see the section called “Place Formatted Information in a String”. For more information about PowerShell's escape characters, type Get-Help About_Escape_Characters or type Get-Help About_Special_Characters.

Prevent a String from Including Dynamic Information

Problem

You want to prevent PowerShell from interpreting special characters or variable names inside a string.

Solution

Use a nonexpanding string to have PowerShell interpret your string exactly as entered. A nonexpanding string uses the single quote character around its text.

PS > $myString = 'Useful PowerShell characters include: $, `, " and { }'
PS > $myString
Useful PowerShell characters include: $, `, " and { }

If you want to include newline characters as well, use a nonexpanding here string, as in Example 5.1, “A nonexpanding here string that includes newline characters”.

Example 5.1. A nonexpanding here string that includes newline characters

PS > $myString = @'
>> Tip of the Day
>> -------------
>> Useful PowerShell characters include: $, `, ', " and { }
>> '@
>>
PS > $myString
Tip of the Day
-------------
Useful PowerShell characters include: $, `, ', " and { }

Discussion

In a literal string, all the text between the single quotes becomes part of your string. This is in contrast to an expanding string, where PowerShell expands variable names (such as $myString) and escape sequences (such as `n) with their values (such as the content of $myString and the newline character).

Note

Nonexpanding strings are a useful way to manage files and folders that contain special characters that might otherwise be interpreted as escape sequences. For more information about managing files with special characters in their name, see the section called “Manage Files That Include Special Characters”.

As discussed in the section called “Create a String”, one exception to the "all text in a literal string is literal" rule comes from the quote characters themselves. In either type of string, PowerShell lets you place two of that string's quote characters together to include the quote character itself:

$myString = "This string includes ""double quotes"" because it combined quote characters."
$myString = 'This string includes ''single quotes'' because it combined quote characters.'

Place Formatted Information in a String

Problem

You want to place formatted information (such as right-aligned text or numbers rounded to a specific number of decimal places) in a string.

Solution

Use PowerShell's formatting operator to place formatted information inside a string.

PS > $formatString = "{0,8:D4} {1:C}`n"
PS > $report = "Quantity Price`n"
PS > $report += "---------------`n"
PS > $report += $formatString -f 50,2.5677
PS > $report += $formatString -f 3,9
PS > $report
Quantity Price
---------------
    0050 $2.57
    0003 $9.00

Discussion

PowerShell's string formatting operator (-f) uses the same string formatting rules as the String.Format() method in the .NET Framework. It takes a format string on its left side, and the items you want to format on its right side.

In the solution, you format two numbers: a quantity and a price. The first number ({0}) represents the quantity and is right-aligned in a box of 8 characters (,8). It is formatted as a decimal number with 4 digits (:D4). The second number ({1}) represents the price, which you format as currency (:C).
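
The pieces of a format item can be tried in isolation. The following sketch uses made-up values to show the alignment and D4 parts; note that the currency (:C) format follows the current culture, so the $ symbol in the solution's output assumes a US English system:

```powershell
# Positive alignment right-aligns within the field; negative left-aligns
$right = "[{0,8}]" -f "abc"     # [     abc]
$left = "[{0,-8}]" -f "abc"     # [abc     ]

# D4 pads an integer with leading zeros to at least four digits
$padded = "{0:D4}" -f 50        # 0050
```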

Note

If you find yourself hand-crafting text-based reports, STOP! Instead, emit custom objects so that your users can work with your script as easily as they work with regular PowerShell commands, and let PowerShell's built-in commands do the formatting work for you. For more information, see the section called “Create and Initialize Custom Objects”.

For a detailed explanation of PowerShell's formatting operator, see the section called “Simple Operators”. For a detailed list of the formatting rules, see Appendix D, .NET String Formatting.

Although primarily used to control the layout of information, the string-formatting operator is also a readable replacement for what is normally accomplished with string concatenation:

PS > $number1 = 10
PS > $number2 = 32
PS > "$number2 divided by $number1 is " + $number2 / $number1
32 divided by 10 is 3.2

The string formatting operator makes this much easier to read:

PS > "{0} divided by {1} is {2}" -f $number2, $number1, ($number2 / $number1)
32 divided by 10 is 3.2

In addition to the string formatting operator, PowerShell provides three formatting commands (Format-Table, Format-Wide, and Format-List) that let you easily generate formatted reports. For detailed information about those cmdlets, see the section called “Formatting Output”.

Search a String for Text or a Pattern

Problem

You want to determine if a string contains another string, or want to find the position of a string within another string.

Solution

PowerShell provides several options to help you search a string for text.

Use the -like operator to determine whether a string matches a given DOS-like wildcard:

PS > "Hello World" -like "*llo W*"
True

Use the -match operator to determine whether a string matches a given regular expression:

PS > "Hello World" -match '.*l[l-z]o W.*$'
True

Use the Contains() method to determine whether a string contains a specific string:

PS > "Hello World".Contains("World")
True

Use the IndexOf() method to determine the location of one string within another:

PS > "Hello World".IndexOf("World")
6

Discussion

Since PowerShell strings are fully featured .NET objects, they support many string-oriented operations directly. The Contains() and IndexOf() methods are two examples of the many features that the String class supports. To learn what other functionality the String class supports, see the section called “Learn About Types and Objects”.

Although they use similar characters, simple wildcards and regular expressions serve significantly different purposes. Wildcards are much simpler than regular expressions and, because of that, more constrained. While you can summarize the rules for wildcards in just four bullet points, entire books have been written to help teach and illuminate the use of regular expressions.

Note

A common use of regular expressions is to search for a string that spans multiple lines. By default, the . character in a regular expression does not match a newline, so a pattern such as .* stops at the end of a line. You can use the singleline (?s) option to let it match across lines:

PS > "Hello `n World" -match "Hello.*World"
False
PS > "Hello `n World" -match "(?s)Hello.*World"
True

Wildcards lend themselves to simple matches, while regular expressions lend themselves to more complex matches.
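
As a rough illustration of where that line falls (the file names and pattern here are invented for the example), a wildcard can say "anything goes here," while a regular expression can say exactly what must go there and capture it:

```powershell
$name = "report-2010.txt"

# Wildcard: * matches any run of characters
$wildcardHit = $name -like "report-*.txt"               # True

# Regular expression: require exactly four digits, and capture them
$regexHit = $name -match '^report-(\d{4})\.txt$'        # True
$year = $matches[1]                                     # 2010

# The wildcard cannot express "four digits", so it also accepts this name
$draftWildcard = "report-draft.txt" -like "report-*.txt"            # True
$draftRegex = "report-draft.txt" -match '^report-(\d{4})\.txt$'     # False
```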

For a detailed description of the -like operator, see the section called “Comparison Operators”. For a detailed description of the -match operator, see the section called “Simple Operators”. For a detailed list of the regular expression rules and syntax, see Appendix B, Regular Expression Reference.

One difficulty sometimes arises when you try to store the result of a PowerShell command in a string, as shown in Example 5.2, “Attempting to store output of a PowerShell command in a string”.

Example 5.2. Attempting to store output of a PowerShell command in a string

PS > Get-Help Get-ChildItem

NAME
    Get-ChildItem

SYNOPSIS
    Gets the items and child items in one or more specified locations.

(...)

PS > $helpContent = Get-Help Get-ChildItem
PS > $helpContent -match "location"
False

The -match operator searches a string for the pattern you specify but seems to fail in this case. This is because all PowerShell commands generate objects. If you don't store that output in another variable or pass it to another command, PowerShell converts it to a text representation before it displays it to you. In Example 5.2, “Attempting to store output of a PowerShell command in a string”, $helpContent is a fully featured object, not just its string representation:

PS > $helpContent.Name
Get-ChildItem

To work with the text-based representation of a PowerShell command, you can explicitly send it through the Out-String cmdlet. The Out-String cmdlet converts its input into the text-based form you are used to seeing on the screen:

PS > $helpContent = Get-Help Get-ChildItem | Out-String
PS > $helpContent -match "location"
True

For a script that makes searching textual command output easier, see the section called “Program: Search Formatted Output for a Pattern”.

Replace Text in a String

Problem

You want to replace a portion of a string with another string.

Solution

PowerShell provides several options to help you replace text in a string with other text.

Use the Replace() method on the string itself to perform simple replacements:

PS > "Hello World".Replace("World", "PowerShell")
Hello PowerShell

Use PowerShell's regular expression -replace operator to perform more advanced regular expression replacements:

PS > "Hello World" -replace '(.*) (.*)','$2 $1'
World Hello

Discussion

The Replace() method and the -replace operator both provide useful ways to replace text in a string. The Replace() method is the quickest but also the most constrained. It replaces every occurrence of the exact string you specify with the exact replacement string that you provide. The -replace operator provides much more flexibility, since its arguments are regular expressions that can match and replace complex patterns.

Given the power of the regular expressions it uses, the -replace operator carries with it some pitfalls of regular expressions, as well.

First, the regular expressions that you use with the -replace operator often contain characters (such as the dollar sign that represents a group number) that PowerShell normally interprets as variable names or escape characters. To prevent PowerShell from interpreting these characters, use a nonexpanding string (single quotes) as shown by the solution.
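
A sketch of that pitfall, reusing the pattern from the solution. It assumes a default session (no strict mode, and no variables named 1 or 2 defined):

```powershell
# Single quotes: $2 and $1 reach the regular expression engine untouched
$good = "Hello World" -replace '(.*) (.*)','$2 $1'        # World Hello

# Double quotes: escape each dollar sign with a backtick for the same result
$alsoGood = "Hello World" -replace '(.*) (.*)',"`$2 `$1"  # World Hello

# Unescaped in double quotes, PowerShell expands $2 and $1 as (normally
# undefined) variables before the -replace operator ever sees them
$surprise = "Hello World" -replace '(.*) (.*)',"$2 $1"
```

In the last case the group references are gone by the time the replacement runs, so the words are not swapped.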

Another, less common, pitfall is wanting to match text that contains characters with special meaning to regular expressions. For example:

PS > "Power[Shell]" -replace "[Shell]","ful"
Powfulr[fulfulfulfulful]

That's clearly not what we intended. In regular expressions, square brackets around a set of characters mean "match any of the characters inside of the square brackets." In our example, this translates to "Replace each of the characters S, h, e, and l with 'ful'."

To avoid this, we can use the regular expression escape character to escape the square brackets:

PS > "Power[Shell]" -replace "\[Shell\]","ful"
Powerful

However, this means knowing all of the regular expression special characters and modifying the pattern by hand. Sometimes the pattern comes from input that we don't control, so the [Regex]::Escape() method comes in handy:

PS > "Power[Shell]" -replace ([Regex]::Escape("[Shell]")),"ful"
Powerful

For more information about the -replace operator, see the section called “Simple Operators” and Appendix B, Regular Expression Reference.

Split a String on Text or a Pattern

Problem

You want to split a string based on some literal text, or a regular expression pattern.

Solution

Use PowerShell's -split operator to split on a sequence of characters or specific string:

PS > "a-b-c-d-e-f" -split "-c-"
a-b
d-e-f

To split on a pattern, supply a regular expression as the first argument:

PS > "a-b-c-d-e-f" -split "b|[d-e]"
a-
-c-
-
-f

Discussion

In PowerShell version one, the String.Split() and [Regex]::Split() methods were the two options available for splitting strings. While still available in PowerShell version two, PowerShell's -split operator provides a more natural way to split a string into smaller strings. When used with no arguments (the unary split operator), it splits a string on whitespace characters.

Example 5.3. PowerShell's unary split operator

PS > -split "Hello World `t How `n are you?"
Hello
World
How
are
you?


When used with an argument, it treats the argument as a regular expression, and then splits based on that pattern.

PS > "a-b-c-d-e-f" -split 'b|[d-e]'
a-
-c-
-
-f

If the split pattern avoids characters that have special meaning in a regular expression, you can use it to split a string based on another string.

PS > "a-b-c-d-e-f" -split '-c-'
a-b
d-e-f

If the split pattern has characters that have special meaning in a regular expression (such as the . character that represents 'any character'), use the -split operator's SimpleMatch option:

Example 5.4. PowerShell's SimpleMatch split option

PS > "a.b.c" -split '.'
(A bunch of newlines. Something went wrong!) 





PS > "a.b.c" -split '.',0,"SimpleMatch"
a
b
c


For more information about the -split operator's options, see Get-Help about_split.
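
The 0 in Example 5.4 is the operator's maximum-substrings argument, where 0 means "no limit." As a small sketch of a nonzero limit (the limit of 3 is chosen arbitrarily):

```powershell
# The second argument limits the number of substrings returned;
# the last substring holds the unsplit remainder
$parts = "a-b-c-d-e-f" -split "-",3

$parts.Count    # 3
$parts[2]       # c-d-e-f
```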

While regular expressions offer an enormous amount of flexibility, the -split operator gives you ultimate flexibility by letting you supply a script block for the split operation. For each character, it invokes the script block and splits the string based on the result. In the script block, $_ represents the current character. For example, to split a string on even numbers:

Example 5.5. Using a script block to split a string

PS > "1234567890" -split { ($_ % 2) -eq 0 }
1
3
5
7
9


To split an entire file by a pattern, use the -Delimiter parameter of the Get-Content cmdlet.

For more information about the -split operator, see the section called “Simple Operators” and Get-Help about_split.

Combine Strings into a Larger String

Problem

You want to combine several separate strings into a single string.

Solution

Use PowerShell's unary -join operator to combine separate strings into a larger string using the default empty separator:

PS > -join ("A","B","C")
ABC

If you want to define the string that PowerShell uses to combine the strings, use PowerShell's binary -join operator.

PS > ("A","B","C") -join "`n"
A
B
C

Discussion

In PowerShell version one, the [String]::Join() method was the primary option available for joining strings. While still available in PowerShell version two, PowerShell's -join operator provides a more natural way to combine strings. When used with no arguments (the unary join operator), it joins the list using the default empty separator. When used between a list and a separator (the binary join operator), it joins the strings using the provided separator.

Aside from its performance benefit, the -join operator solves an extremely common difficulty that arises from trying to combine strings by hand.

When first writing the code to join a list with a separator (for example, a comma and a space), you usually end up leaving a lonely separator at the beginning or end of the output:

PS > $list = "Hello","World"
PS > $output = ""
PS >
PS > foreach($item in $list)
>> {
>>     $output += $item + ", "
>> }
>>
PS > $output
Hello, World,

You can resolve this by adding some extra logic to the foreach loop:

PS > $list = "Hello","World"
PS > $output = ""
PS >
PS > foreach($item in $list)
>> {
>>     if($output -ne "") { $output += ", " }
>>     $output += $item
>> }
>>
PS > $output
Hello, World

Or, save yourself the trouble and use the -join operator directly:

PS > $list = "Hello","World"
PS > $list -join ", "
Hello, World

For a more structured way to join strings into larger strings or reports, see the section called “Place Formatted Information in a String”.

Convert a String to Upper/Lowercase

Problem

You want to convert a string to uppercase or lowercase.

Solution

Use the ToUpper() and ToLower() methods of the string to convert it to uppercase and lowercase, respectively.

To convert a string to uppercase, use the ToUpper() method:

PS > "Hello World".ToUpper()
HELLO WORLD

To convert a string to lowercase, use the ToLower() method:

PS > "Hello World".ToLower()
hello world

Discussion

Since PowerShell strings are fully featured .NET objects, they support many string-oriented operations directly. The ToUpper() and ToLower() methods are two examples of the many features that the String class supports. To learn what other functionality the String class supports, see the section called “Learn About Types and Objects”.

Note

Neither PowerShell nor the methods of the .NET String class directly support capitalizing only the first letter of a word. If you want to capitalize only the first character of a word or sentence, try the following commands:

PS > $text = "hello"
PS > $newText = $text.Substring(0,1).ToUpper() +
>>    $text.Substring(1)
>> $newText
>>
Hello

One thing to keep in mind as you convert a string to uppercase or lowercase is your motivation for doing it. One of the most common reasons is for comparing strings, as shown in Example 5.6, “Using the ToUpper() method to normalize strings”.

Example 5.6. Using the ToUpper() method to normalize strings

## $text comes from the user, and contains the value "quit"
if($text.ToUpper() -eq "QUIT") { ... }

Unfortunately, explicitly changing the capitalization of strings fails in subtle ways when your script runs in different cultures. Many cultures follow different capitalization and comparison rules than you may be used to. For example, the Turkish language includes two types of the letter "I": one with a dot, and one without. The uppercase version of the lowercase letter "i" corresponds to the version of the capital I with a dot, not the capital I used in QUIT. Those capitalization rules cause the string comparison code in Example 5.6, “Using the ToUpper() method to normalize strings” to fail in the Turkish culture.
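
The effect can be reproduced without changing your system's culture by passing a culture to ToUpper() explicitly. This is a sketch that assumes the tr-TR culture data is available on the machine:

```powershell
# Assumption: the tr-TR culture data is available on this system
$turkish = [System.Globalization.CultureInfo]::GetCultureInfo("tr-TR")

# Under Turkish rules, "i" uppercases to a dotted capital I (U+0130)
$upperTurkish = "quit".ToUpper($turkish)

# The invariant culture gives the result that Example 5.6 expects
$upperInvariant = "quit".ToUpperInvariant()

[int] $upperTurkish[2]         # 304 (U+0130), not 73 (the letter I)
$upperInvariant -ceq "QUIT"    # True
```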

To compare some input against a hard-coded string in a case-insensitive manner, the better solution is to use PowerShell's -eq operator without changing any of the casing yourself. The -eq operator is case-insensitive and culture-neutral by default:

PS > $text1 = "Hello"
PS > $text2 = "HELLO"
PS > $text1 -eq $text2
True

For more information about writing culture-aware scripts, see the section called “Write Culture-Aware Scripts”.

Trim a String

Problem

You want to remove leading or trailing spaces from a string or user input.

Solution

Use the Trim() method of the string to remove all leading and trailing whitespace characters from that string.

PS > $text = " `t Test String`t `t"
PS > "|" + $text.Trim() + "|"
|Test String|

Discussion

The Trim() method cleans all whitespace from the beginning and end of a string. If you want just one or the other, you can also call the TrimStart() or TrimEnd() method to remove whitespace from the beginning or the end of the string, respectively. If you want to remove specific characters from the beginning or end of a string, the Trim(), TrimStart(), and TrimEnd() methods provide options to support that. To trim a list of specific characters from the end of a string, provide that list to the method, as shown in Example 5.7, “Trimming a list of characters from the end of a string”.

Example 5.7. Trimming a list of characters from the end of a string

PS > "Hello World".TrimEnd('d','l','r','o','W',' ')
He
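
For comparison, here is a brief sketch of the one-sided methods and of trimming specific characters from both ends (the sample strings are made up):

```powershell
$text = "  padded  "

$startTrimmed = "|" + $text.TrimStart() + "|"    # |padded  |
$endTrimmed = "|" + $text.TrimEnd() + "|"        # |  padded|

# With arguments, only the listed characters are trimmed
$title = "--==Title==--".Trim('-','=')           # Title
```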

Note

At first blush, the following command that attempts to trim the text "World" from the end of a string appears to work incorrectly:

PS > "Hello World".TrimEnd(" World")
He

This happens because the TrimEnd() method takes a list of characters to remove from the end of a string. PowerShell automatically converts a string to a list of characters if required, and in this case converts your string to the characters W, o, r, l, d, and a space. These are in fact the same characters as were used in Example 5.7, “Trimming a list of characters from the end of a string”, so it has the same effect.
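
To remove a whole word from the end of a string rather than a set of characters, one possible approach is sketched below. The $suffix variable is hypothetical, and both options are illustrations rather than the only way to do it:

```powershell
$text = "Hello World"
$suffix = " World"    # hypothetical text to remove from the end

# Option 1: check for the suffix, then cut it off by length
$trimmed = $text
if ($text.EndsWith($suffix))
{
    $trimmed = $text.Substring(0, $text.Length - $suffix.Length)
}

# Option 2: anchor an escaped pattern to the end of the string with $
$alsoTrimmed = $text -replace ([Regex]::Escape($suffix) + '$'),''

$trimmed        # Hello
$alsoTrimmed    # Hello
```

Keep in mind that -replace is case-insensitive by default, so the second option would also remove " WORLD".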

If you want to replace text anywhere in a string (and not just from the beginning or end), see the section called “Replace Text in a String”.

Format a Date for Output

Problem

You want to control the way that PowerShell displays or formats a date.

Solution

To control the format of a date, use one of the following options:
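
As a sketch of the approaches that the discussion below refers to, the following commands format the same date four ways. The $date value and the format strings are chosen here for illustration:

```powershell
$date = [DateTime] "05/09/1998 1:23 PM"

# The -Format parameter of Get-Date (.NET format string)
$viaFormat = Get-Date -Date $date -Format "dd-MM-yyyy"

# The string formatting operator
$viaOperator = "{0:dd-MM-yyyy}" -f $date

# The ToString() method of the DateTime object
$viaToString = $date.ToString("dd-MM-yyyy")

# The -UFormat parameter of Get-Date (Unix-style format string)
$viaUFormat = Get-Date -Date $date -UFormat "%d-%m-%Y"
```

On a system whose culture uses the Gregorian calendar, each variable holds the text 09-05-1998.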

Discussion

Except for the -UFormat parameter of the Get-Date cmdlet, all date formatting in PowerShell uses the standard .NET DateTime format strings. These format strings let you display dates in one of many standard formats (such as your system's short or long date patterns), or in a completely custom manner. For more information on how to specify standard .NET DateTime format strings, see Appendix E, .NET DateTime Formatting.

If you are already used to the Unix-style date formatting strings (or are converting an existing script that uses a complex one), the -UFormat parameter of the Get-Date cmdlet may be helpful. It accepts the format strings accepted by the Unix date command, but does not provide any functionality that standard .NET date formatting strings cannot.

When working with the string version of dates and times, be aware that they are the most common source of internationalization issues: problems that arise from running a script on a machine with a different culture than the one it was written on. In North America "05/09/1998" means "May 9, 1998." In many other cultures, though, it means "September 5, 1998." Whenever possible, use and compare DateTime objects (rather than strings) to other DateTime objects, as that avoids these cultural differences. Example 5.8, “Comparing DateTime objects with the -gt operator” demonstrates this approach.

Example 5.8. Comparing DateTime objects with the -gt operator

PS > $dueDate = [DateTime] "01/01/2006"
PS > if([DateTime]::Now -gt $dueDate)
>> {
>>     "Account is now due"
>> }
>>
Account is now due

Note

PowerShell always assumes the North American date format when it interprets a DateTime constant such as [DateTime] "05/09/1998". This is for the same reason that all languages interpret numeric constants (such as 12.34) in the North American format. If it did otherwise, nearly every script that dealt with dates and times would fail on international systems.
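
As a sketch of the difference, the following compares a cast with the [DateTime]::Parse() method, which accepts culture-specific input. The en-GB culture is only an illustrative choice:

```powershell
# A cast always applies the culture-invariant (North American style) rules
$fromCast = [DateTime] "05/09/1998"

# Parse() with an explicit culture follows that culture's rules instead
$british = [System.Globalization.CultureInfo]::GetCultureInfo("en-GB")
$fromParse = [DateTime]::Parse("05/09/1998", $british)

$fromCast.Month     # 5 (May)
$fromParse.Month    # 9 (September)
```

When you call Parse() with only the string argument, it uses the current culture of the machine running the script. That is convenient for text typed by a user, but risky for dates embedded in the script itself.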

For more information about the Get-Date cmdlet, type Get-Help Get-Date. For more information about dealing with dates and times in a culturally aware manner, see the section called “Write Culture-Aware Scripts”.

Program: Convert Text Streams to Objects

One of the strongest features of PowerShell is its object-based pipeline. You don't waste your energy creating, destroying, and recreating the object representation of your data. In other shells, you lose the full-fidelity representation of data when the pipeline converts it to pure text. You can regain some of it through excessive text parsing, but not all of it.

However, you still often have to interact with low-fidelity input that originates from outside PowerShell. Text-based data files and legacy programs are two examples.

PowerShell offers great support for two of the three text-parsing staples:

Sed

Replaces text. For that functionality, PowerShell offers the -replace operator.

Grep

Searches text. For that functionality, PowerShell offers the Select-String cmdlet, among others.

The third traditional text-parsing tool, Awk, lets you chop a line of text into more intuitive groupings. PowerShell offers the Split() method on strings, but that lacks some of the power you usually need to break a string into groups.
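
The following sketch puts the three tasks side by side on a made-up line of text (the $line value is purely illustrative). It also shows one way the Split() method falls short: splitting on a single space leaves empty entries wherever the input contains runs of spaces, while a regular-expression split treats each run as one separator.

```powershell
$line = "Error   42   Disk full"

# Sed-like: collapse runs of whitespace with the -replace operator
$collapsed = $line -replace "\s+", " "

# Grep-like: keep only the lines that match a pattern
$found = "Info 1 OK", $line | Select-String "Disk"

# Awk-like, first attempt: Split() on a single space leaves empty entries
$naive = $line.Split(" ")

# Awk-like, regex split: each run of whitespace counts as one separator
$fields = [Regex]::Split($line, "\s+")

$naive.Count     # 8
$fields.Count    # 4
```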

The Convert-TextObject script presented in Example 5.9, “Convert-TextObject.ps1” lets you convert text streams into a set of objects that represent those text elements according to the rules you specify. From there, you can use all of PowerShell's object-based tools, which gives you even more power than you would get with the text-based equivalents.

Example 5.9. Convert-TextObject.ps1

param(
    [string] $delimiter, 
    [string] $parseExpression, 
    [string[]] $propertyName, 
    [type[]] $propertyType
    )

function Main(
    $inputObjects, $parseExpression, $propertyType, 
    $propertyName, $delimiter)
{
    $delimiterSpecified = [bool] $delimiter
    $parseExpressionSpecified = [bool] $parseExpression

    if($delimiterSpecified -and $parseExpressionSpecified)
    {
        Usage
        return
    }

    if(-not $($delimiterSpecified -or $parseExpressionSpecified))
    {
        $delimiter = "\s+"
        $delimiterSpecified = $true
    }

    foreach($inputObject in $inputObjects)
    {
        if(-not $inputObject) { $inputObject = "" }
        foreach($inputLine in $inputObject.ToString())
        {
            ParseTextObject $inputLine $delimiter $parseExpression `
                $propertyType $propertyName
        }
    }
}

function Usage
{
    "Usage: "
    " Convert-TextObject"
    " Convert-TextObject -ParseExpression parseExpression " +
        "[-PropertyName propertyName] [-PropertyType propertyType]"
    " Convert-TextObject -Delimiter delimiter " + 
        "[-PropertyName propertyName] [-PropertyType propertyType]"
    return
}

function ParseTextObject
{
    param(
        $textInput, $delimiter, $parseExpression,
        $propertyTypes, $propertyNames)

    $parseExpressionSpecified = -not $delimiter

    $returnObject = New-Object PSObject

    $matches = $null
    $matchCount = 0
    if($parseExpressionSpecified)
    {
        [void] ($textInput -match $parseExpression)
        $matchCount = $matches.Count
    }
    else
    {
        $matches = [Regex]::Split($textInput, $delimiter)
        $matchCount = $matches.Length
    }

    if(-not $matchCount)
    {
        return
    }

    $counter = 0
    if($parseExpressionSpecified) { $counter++ }
    for(; $counter -lt $matchCount; $counter++)
    {
        $propertyName = "None"
        $propertyType = [string]

        if($parseExpressionSpecified)
        {
            $propertyName = "P$counter"

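            if($counter -le $propertyNames.Length)
            {
                # Check the caller-supplied names ($propertyNames), not the
                # default name that was just assigned to $propertyName
                if($propertyNames[$counter - 1])
                {
                    $propertyName = $propertyNames[$counter - 1]
                }
            }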

            if($counter -le $propertyTypes.Length)
            {
                if($propertyTypes[$counter - 1])
                {
                    $propertyType = $propertyTypes[$counter - 1] 
                }
            }
        }
        else
        {
            $propertyName = "P$($counter + 1)"

            if($counter -lt $propertyNames.Length) 
            {
                if($propertyNames[$counter])
                {
                    $propertyName = $propertyNames[$counter] 
                }
            }

            if($counter -lt $propertyTypes.Length)
            {
                if($propertyTypes[$counter])
                {
                    $propertyType = $propertyTypes[$counter] 
                }
            }
        }

        Add-Note $returnObject $propertyName `
            ($matches[$counter] -as $propertyType)
    }

    $returnObject
}

function Add-Note ($object, $name, $value) 
{
     $object | Add-Member NoteProperty $name $value
}


Main $input $parseExpression $propertyType $propertyName $delimiter
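
To make the core idea easier to follow, here is a condensed sketch of the delimiter-based branch only. The ConvertTo-SimpleObject function name is hypothetical and is not part of the script above. It leaves out the parse expression and type conversion support.

```powershell
# Hypothetical helper: a condensed form of the delimiter-based technique
function ConvertTo-SimpleObject([string] $line, [string[]] $propertyName)
{
    # Split the line on runs of whitespace
    $fields = [Regex]::Split($line, "\s+")
    $result = New-Object PSObject

    for($counter = 0; $counter -lt $fields.Length; $counter++)
    {
        # Default to P1, P2, ... unless the caller supplied a name
        $name = "P$($counter + 1)"
        if(($counter -lt $propertyName.Length) -and $propertyName[$counter])
        {
            $name = $propertyName[$counter]
        }

        $result | Add-Member NoteProperty $name $fields[$counter]
    }

    $result
}

$item = ConvertTo-SimpleObject "notepad 1234 56" "Name","Id"
$item.Name    # notepad
$item.Id      # 1234
$item.P3      # 56
```

The full script follows the same pattern: when you supply neither a delimiter nor a parse expression, it splits on whitespace and names the resulting properties P1, P2, and so on.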
      

Generate Large Reports and Text Streams

Problem

You want to write a script that generates a large report or large amount of data.

Solution

The best approach to generating a large amount of data is to take advantage of PowerShell's streaming behavior whenever possible. Opt for solutions that pipeline data between commands:

Get-ChildItem C:\ *.txt -Recurse | Out-File c:\temp\AllTextFiles.txt

rather than collect the output at each stage:

$files = Get-ChildItem C:\ *.txt -Recurse
$files | Out-File c:\temp\AllTextFiles.txt

If your script generates a large text report (and streaming is not an option), use the StringBuilder class:

$output = New-Object System.Text.StringBuilder
Get-ChildItem C:\ *.txt -Recurse |
    Foreach-Object { [void] $output.Append($_.FullName + "`n") }
$output.ToString()

rather than simple text concatenation:

$output = ""
Get-ChildItem C:\ *.txt -Recurse | Foreach-Object { $output += $_.FullName }
$output

Discussion

In PowerShell, combining commands in a pipeline is a fundamental concept. As scripts and cmdlets generate output, PowerShell passes that output to the next command in the pipeline as soon as it can. In the solution, the Get-ChildItem commands that retrieve all text files on the C: drive take a very long time to complete. However, since they begin to generate data almost immediately, PowerShell can pass that data on to the next command as soon as the Get-ChildItem cmdlet produces it. This behavior, called streaming, applies to any commands that generate or consume data. The pipeline completes almost as soon as the Get-ChildItem cmdlet finishes producing its data, and it uses memory very efficiently as it does so.

The second Get-ChildItem example (that collects its data) prevents PowerShell from taking advantage of this streaming opportunity. It first stores all the files in an array, which, because of the amount of data, takes a long time and an enormous amount of memory. Then, it sends all those objects into the output file, which takes a long time as well.

Most commands can consume data produced by the pipeline directly, as illustrated by the Out-File cmdlet. For those commands, PowerShell provides streaming behavior as long as you combine the commands into a pipeline. For commands that do not support data coming from the pipeline directly, the Foreach-Object cmdlet (with the aliases foreach and %) still lets you work with each piece of data as the previous command produces it, as shown in the StringBuilder example.
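
One way to see streaming in action is to record the order in which two pipeline stages run. This sketch uses a small range of numbers in place of a slow command, and a hypothetical $log list to capture the order:

```powershell
$log = New-Object System.Collections.ArrayList

1..3 |
    Foreach-Object { [void] $log.Add("produced $_"); $_ } |
    Foreach-Object { [void] $log.Add("consumed $_") }

$log -join ", "
# produced 1, consumed 1, produced 2, consumed 2, produced 3, consumed 3
```

The second stage handles each item before the first stage produces the next one, which is why a streaming pipeline never needs to hold the whole data set in memory.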

Creating large text reports

When you generate large reports, it is common to store the entire report into a string, and then write that string out to a file once the script completes. You can usually accomplish this most effectively by streaming the text directly to its destination (a file or the screen), but sometimes this is not possible.

Since PowerShell makes it so easy to add more text to the end of a string (as in $output += $_.FullName), many initially opt for that approach. This works great for small-to-medium strings, but causes significant performance problems for large strings.

Note

As an example of this performance difference, compare the following:

PS > Measure-Command {
>>    $output = New-Object Text.StringBuilder
>>    1..10000 |
>>        Foreach-Object { $output.Append("Hello World") }
>> }
>>

(...)
TotalSeconds : 2.3471592

PS > Measure-Command {
>>    $output = ""
>>    1..10000 | Foreach-Object { $output += "Hello World" }
>> }
>>

(...)
TotalSeconds      : 4.9884882

In the .NET Framework (and therefore PowerShell), strings never change after you create them. When you add more text to the end of a string, PowerShell has to build a new string by combining the two smaller strings. This operation takes a long time for large strings, which is why the .NET Framework includes the System.Text.StringBuilder class. Unlike normal strings, the StringBuilder class assumes that you will modify its data—an assumption that allows it to adapt to change much more efficiently.
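
A small sketch of that difference, using illustrative variable names: appending to a string leaves the original string untouched and creates a new one, while appending to a StringBuilder changes the one object that every reference shares.

```powershell
# Strings: += builds a new string, so $copy still holds the old text
$original = "Hello"
$copy = $original
$original += " World"
$copy                      # Hello

# StringBuilder: Append() modifies the existing buffer in place
$builder = New-Object System.Text.StringBuilder
$sameBuilder = $builder
[void] $builder.Append("Hello")
[void] $builder.Append(" World")
$sameBuilder.ToString()    # Hello World
```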

Generate Source Code and other Repetitive Text

Problem

You want to simplify the creation of large amounts of repetitive source code or other text.

Solution

Use PowerShell's string formatting operator (-f) to place dynamic information inside of a pre-formatted string, and then repeat that replacement for each piece of dynamic information.

Discussion

Code generation is a useful technique in nearly any technology that produces output from some text-based input. For example, imagine having to create an HTML report to show all of the processes running on your system at that time. In this case, "code" is the HTML code understood by a web browser.

HTML pages start with some standard text (<html>, <head>, <body>), and then you would likely include the processes in an HTML <table>. Each row would include columns for each of the properties in the process you're working with.

Generating this by hand would be mind-numbing and error-prone. Instead, you can write a function to generate the code for the row:

function Get-HtmlRow($process)
{
    $template = "<TR> <TD>{0}</TD> <TD>{1}</TD> </TR>"
    $template -f $process.Name,$process.ID
}

Then generate the report in milliseconds, rather than hours:

"<HTML><BODY><TABLE>" > report.html
Get-Process | Foreach-Object { Get-HtmlRow $_ } >> report.html
"</TABLE></BODY></HTML>" >> report.html
Invoke-Item .\report.html

In addition to the formatting operator, you can sometimes use the String.Replace method:

$string = @'
Name is __NAME__
Id is __ID__
'@

$string = $string.Replace("__NAME__", $process.Name)
$string = $string.Replace("__ID__", $process.Id)

This works well (and is very readable) if you have tight control over the data you'll be using as replacement text. If the replacement text can possibly contain one of the special tags ("__NAME__" or "__ID__", for example), then a later replacement will also replace that tag inside the text you already inserted and corrupt your final output.
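
A short sketch of the problem, using a made-up process name that happens to contain one of the tags:

```powershell
$template = "Name is __NAME__, Id is __ID__"

# Hypothetical replacement text that happens to contain a tag
$name = "Test__ID__Process"

$result = $template.Replace("__NAME__", $name)
$result = $result.Replace("__ID__", "1234")

$result
# Name is Test1234Process, Id is 1234
```

The second call to Replace() cannot tell the tag in the template apart from the tag that arrived with the data, so the name is silently altered.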

To avoid this issue, you can use the Format-String script:

Example 5.10. Format-String.ps1


<#

.SYNOPSIS
Replaces text in a string based on named replacement tags

.EXAMPLE
PS >.\Format-String "Hello {NAME}" @{ NAME = 'PowerShell' }
Hello PowerShell
  
#>

param($string, [hashtable] $replacements)

$currentIndex = 0
$replacementList = @()

foreach($key in $replacements.Keys)
{
    $string = $string.Replace("{$key}", "{$currentIndex}")
    $replacementList += $replacements[$key]
    
    $currentIndex++
}

$string -f $replacementList
      

PowerShell includes several commands for code generation that you've probably used without recognizing their "code generation" aspect. The ConvertTo-Html cmdlet generates HTML reports from incoming objects. The ConvertTo-Csv cmdlet does the same for CSV files, and the ConvertTo-Xml cmdlet for XML files.

Code generation techniques seem to come up naturally when you realize you are writing a report, but are often missed when writing source code of another programming or scripting language. For example, imagine you need to write a C# function that outputs all of the details of a process. The System.Diagnostics.Process class has a lot of properties, so that's going to be a long function. Writing it by hand is going to be difficult, so you can have PowerShell do most of it for you.

For any object (for example, a process that you've retrieved from the Get-Process command), you can access its PsObject.Properties property to get a list of all of its properties. Each of those has a Name property, so you can use that to generate the C# code:

$process.PsObject.Properties |
    Foreach-Object {
        'Console.WriteLine("{0}: " + process.{0});' -f $_.Name }

This generates over 60 lines of C# source code, rather than having you do it by hand:

Console.WriteLine("Name: " + process.Name);
Console.WriteLine("Handles: " + process.Handles);
Console.WriteLine("VM: " + process.VM);
Console.WriteLine("WS: " + process.WS);
Console.WriteLine("PM: " + process.PM);
Console.WriteLine("NPM: " + process.NPM);
Console.WriteLine("Path: " + process.Path);
Console.WriteLine("Company: " + process.Company);
Console.WriteLine("CPU: " + process.CPU);
Console.WriteLine("FileVersion: " + process.FileVersion);
Console.WriteLine("ProductVersion: " + process.ProductVersion);
(...)

Similar benefits come from generating bulk SQL statements, repetitive data structures, and more.

PowerShell code generation can help you with large-scale administration tasks even when PowerShell is not available on the target system. Given a large list of input (for example, a complex list of files to copy), you can easily generate a cmd.exe batch file or Unix shell script to automate the task. Generate the script in PowerShell, then invoke it on the system of your choice!
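
As a minimal sketch of that idea, the following generates the lines of a cmd.exe batch file that copies a list of files. The file paths, destination, and output file name are all hypothetical:

```powershell
# Hypothetical list of files and destination, for illustration only
$files = "C:\data\report1.txt", "C:\data\report2.txt"
$destination = "D:\backup"

# Build one copy command per file with the formatting operator
$batchLines = @("@echo off")
$batchLines += $files |
    Foreach-Object { 'copy "{0}" "{1}"' -f $_, $destination }

$batchLines
# @echo off
# copy "C:\data\report1.txt" "D:\backup"
# copy "C:\data\report2.txt" "D:\backup"

# To save the script (hypothetical file name):
# $batchLines | Set-Content copyfiles.cmd
```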
