Chapter 12. Internet-Enabled Scripts

Introduction

Although PowerShell provides an enormous benefit even when your scripts interact only with the local system, working with data sources from the Internet opens exciting and unique opportunities. For example, you might download files or information from the Internet, interact with a web service, store your output as HTML, or even send an email that reports the results of a long-running script.

Through its cmdlets and access to the networking support in the .NET Framework, PowerShell provides ample opportunities for Internet-enabled administration.

Download a File from the Internet

Problem

You want to download a file from a web site on the Internet.

Solution

Use the DownloadFile() method from the .NET Framework's System.Net.WebClient class to download a file:

PS > $source = "http://www.leeholmes.com/favicon.ico"
PS > $destination = "c:\temp\favicon.ico"
PS >
PS > $wc = New-Object System.Net.WebClient
PS > $wc.DownloadFile($source, $destination)

Discussion

The System.Net.WebClient class from the .NET Framework lets you easily upload and download data from remote web servers.

The WebClient class acts much like a web browser, in that you can specify a user agent, proxy (if your outgoing connection requires one), and even credentials.

All web browsers send a user agent identifier along with their web request. This identifier tells the web site what application is making the request—such as Internet Explorer, Firefox, or an automated crawler from a search engine. Many web sites check this user agent identifier to determine how to display the page. Unfortunately, many fail entirely if they can't determine the user agent for the incoming request. To make the System.Net.WebClient identify itself as Internet Explorer, use the following commands, instead:

$userAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2;)"
$wc = New-Object System.Net.WebClient
$wc.Headers.Add("user-agent", $userAgent)
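
The proxy and credentials mentioned earlier are set through properties on the same object. The following is a minimal sketch rather than a required configuration; the proxy address is a hypothetical placeholder, and whether you need default or explicit credentials depends on the server you are contacting.

```powershell
$wc = New-Object System.Net.WebClient

# Route requests through a proxy (hypothetical address, for illustration only)
$wc.Proxy = New-Object System.Net.WebProxy "http://proxy.example.com:8080"

# Authenticate to the web server as the currently logged-on user...
$wc.UseDefaultCredentials = $true

# ...or supply explicit credentials instead (this prompts interactively):
#   $wc.Credentials = (Get-Credential).GetNetworkCredential()
```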

Notice that the solution uses a fully qualified path for the destination file. This is an important step: given a relative path, the DownloadFile() method saves the file relative to the directory in which PowerShell.exe started (the root of your user profile directory by default), not relative to your current location in PowerShell.
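
If you would rather start from a relative filename, one option is to build the fully qualified path yourself before calling DownloadFile(). This sketch assumes your current location is on the filesystem; the filename is only an example.

```powershell
# Hypothetical relative filename (the solution uses c:\temp\favicon.ico)
$relativeName = "favicon.ico"

# Anchor the name to PowerShell's current location, rather than to the
# process working directory that .NET methods use
$destination = Join-Path (Get-Location).ProviderPath $relativeName

# $destination can now be passed to DownloadFile() as in the solution:
#   $wc.DownloadFile($source, $destination)
```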

You can use the DownloadFile() method to download web pages just as easily as you download files. Just supply a URL as the source (such as http://blogs.msdn.com/powershell/rss.xml) instead of a filename. If you ultimately intend to parse or read through the downloaded page, the DownloadString() method may be more appropriate.

For more information on how to download and parse web pages, see the section called “Download a Web Page from the Internet”.

Download a Web Page from the Internet

Problem

You want to download a web page from the Internet and work with the content as a plain string.

Solution

Use the DownloadString() method from the .NET Framework's System.Net.WebClient class to download a web page or plain text file into a string.

PS > $source = "http://blogs.msdn.com/powershell/rss.xml"
PS >
PS > $wc = New-Object System.Net.WebClient
PS > $content = $wc.DownloadString($source)

Discussion

The most common reason to download a web page from the Internet is to extract unstructured information from it. Although web services are becoming increasingly popular, they are still far less common than web pages that display useful data. Because of this, retrieving data from services on the Internet often comes by means of screen scraping: downloading the HTML of the web page and then carefully separating out the content you want from the vast majority of the content that you do not.

The technique of screen scraping has been around much longer than the Internet! As long as computer systems have generated output designed primarily for humans, screen scraping tools have risen to make this output available to other computer programs.

Unfortunately, screen scraping is an error-prone way to extract content.

Note

That's not an exaggeration! As proof, Example 12.2, “Get-Answer.ps1” broke four or five times while the first edition of this book was being written, and then again after it was published. Such are the perils of screen scraping.

If the web page authors change the underlying HTML, your code will usually stop working correctly. If the site's HTML is written as valid XHTML, you may be able to use PowerShell's built-in XML support to more easily parse the content.

For more information about PowerShell's built-in XML support, see the section called “Access Information in an XML File”.
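
As a rough illustration of the XML approach, the sketch below parses a small, well-formed sample page and lists its links. The sample markup is an assumption made for this example; real XHTML pages usually declare an XML namespace, in which case the XPath query needs a namespace manager.

```powershell
# A small, well-formed sample page (invented for illustration)
$page = @'
<html>
  <body>
    <ul>
      <li><a href="/one.html">One</a></li>
      <li><a href="/two.html">Two</a></li>
    </ul>
  </body>
</html>
'@

# Casting to [xml] parses the text; it fails if the markup is not well-formed
$xml = [xml] $page

# Select every <a> element and read its href attribute
$links = @($xml.SelectNodes("//a") | Foreach-Object { $_.GetAttribute("href") })
$links
```

For this sample, $links holds /one.html and /two.html.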

Despite its fragility, pure screen scraping is often the only alternative. Since screen scraping is just text manipulation, you have the same options you do with other text reports. For some fairly structured web pages, you can get away with a single regular expression replacement (plus cleanup), as shown in Example 12.1, “Search-Twitter.ps1”.

Example 12.1. Search-Twitter.ps1


param($term = "PowerShell")

Add-Type -Assembly System.Web
$queryUrl = 'http://integratedsearch.twitter.com/search.html?q={0}'
$queryUrl = $queryUrl -f ([System.Web.HttpUtility]::UrlEncode($term))

$wc = New-Object System.Net.WebClient
$wc.Encoding = [System.Text.Encoding]::UTF8
$results = $wc.DownloadString($queryUrl)

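# Avoid the name $matches, which PowerShell reserves as an automatic
# variable populated by the -match operator
$tweetMatches = $results | 
    Select-String -Pattern '(?s)<div[^>]*msg[^>]*>.*?</div>' -AllMatches

foreach($match in $tweetMatches.Matches)
{
    # Strip the remaining HTML tags, leaving only the text of the message
    $tweet = $match.Value -replace '<[^>]*>', ''
    
    [System.Web.HttpUtility]::HtmlDecode($tweet.Trim()) + "`n"
}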
	

Others, while possible to accomplish with complicated regular expressions, can often be made much simpler through more straightforward text manipulation. Example 12.2, “Get-Answer.ps1” uses this second approach to fetch "Instant Answers" from Bing.

Example 12.2. Get-Answer.ps1


$question = $args -join " "

function Main
{
    Add-Type -Assembly System.Web

    $encoded = [System.Web.HttpUtility]::UrlEncode($question)
    $url = "http://www.bing.com/search?q=$encoded"
    $text = (new-object System.Net.WebClient).DownloadString($url)

    $startIndex = $text.IndexOf('<div class="ans">')
    
    $endIndex = $text.IndexOf('<div class="sn_att2">')
    if($endIndex -lt 0) { $endIndex = $text.IndexOf('<div id="results">') }

    if(($startIndex -ge 0) -and ($endIndex -ge 0))
    {
        $partialText = $text.Substring($startIndex, $endIndex - $startIndex)

        $partialText = $partialText -replace '<div[^>]*>',"`n"
        $partialText = $partialText -replace '<tr[^>]*>',"`n"
        $partialText = $partialText -replace '<li[^>]*>',"`n"
        $partialText = $partialText -replace '<br[^>]*>',"`n"
        $partialText = $partialText -replace '<span[^>]*>'," "
        $partialText = $partialText -replace '<td[^>]*>',"    "
           
        $partialText = CleanHtml $partialText
           
        $partialText = $partialText -split "`n" |
            Foreach-Object { $_.Trim() } | Where-Object { $_ }
        $partialText = $partialText -join "`n"

        [System.Web.HttpUtility]::HtmlDecode($partialText.Trim())
    }
    else
    {
        "`nNo answer found."
    }
}

function CleanHtml ($htmlInput)
{
    $tempString = [Regex]::Replace($htmlInput, "(?s)<[^>]*>", "")
    $tempString.Replace("&nbsp&nbsp", "")
}

. Main
      

For more information about running scripts, see the section called “Run Programs, Scripts, and Existing Tools”.

Program: Get-PageUrls

When working with HTML, it is common to require advanced regular expressions that separate the content you care about from the content you don't. A perfect example of this is extracting all the HTML links from a web page.

Links come in many forms, depending on how lenient you want to be. They may be well-formed according to the various HTML standards. They may use relative paths, or they may use absolute paths. They may place double quotes around the URL, or they may place single quotes around the URL. If you're really unlucky, they may accidentally include quotes on only one side of the URL.

Example 12.3, “Get-PageUrls.ps1” demonstrates some approaches for dealing with this type of advanced parsing task. Given a web page that you've downloaded from the Internet, it extracts all links from the page and returns a list of the URLs in that page. It also fixes URLs that were originally written as relative URLs (for example, /file.zip) to include the server from which they originated.

Example 12.3. Get-PageUrls.ps1

param(
    [string] $filename = $(throw "Please specify a filename."),
    
    [string] $base = $(throw "Please specify a base URL."),
    
    [string] $pattern = ".*"
     )          

Add-Type -Assembly System.Web
     
$regex = "<\s*a\s*[^>]*?href\s*=\s*[`"']*([^`"'>]+)[^>]*?>"

function Main
{
    $base = $base.Replace("\", "/")

    if($base.IndexOf("://") -lt 0)
    { 
        throw "Please specify a base URL in the form of " +
            "http://server/path_to_file/file.html" 
    }

    $base = $base.Substring(0,$base.LastIndexOf("/") + 1)
    $baseSlash = $base.IndexOf("/", $base.IndexOf("://") + 3)
    
    if($baseSlash -ge 0)
    {
        $domain = $base.Substring(0, $baseSlash)
    }
    else
    {
        $domain = $base
    }


    $content = [String]::Join(' ', (get-content $filename))
    $contentMatches = @(GetMatches $content $regex)

    foreach($contentMatch in $contentMatches)
    {
        if(-not ($contentMatch -match $pattern)) { continue }
        if($contentMatch -match "javascript:") { continue }

        $contentMatch = $contentMatch.Replace("\", "/")

        if($contentMatch.IndexOf("://") -gt 0)
        {
            $url = $contentMatch
        }
        elseif($contentMatch[0] -eq "/")
        {
            $url = "$domain$contentMatch"
        }
        else
        {
            $url = "$base$contentMatch"
            $url = $url.Replace("/./", "/")
        }

        [System.Web.HttpUtility]::HtmlDecode($url)
    }
}

function GetMatches([string] $content, [string] $regex)
{
    $returnMatches = new-object System.Collections.ArrayList

    $resultingMatches = [Regex]::Matches($content, $regex, "IgnoreCase")
    foreach($match in $resultingMatches) 
    {
        $cleanedMatch = $match.Groups[1].Value.Trim()
        [void] $returnMatches.Add($cleanedMatch)
    }

    $returnMatches   
}

. Main
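
Example 12.3 resolves relative URLs through string manipulation. As an alternative sketch (not the approach the script takes), the System.Uri class from the .NET Framework can resolve a link against a base URL. The URLs below are placeholders chosen for illustration.

```powershell
$base = New-Object System.Uri "http://server/path_to_file/file.html"

$urls = foreach($link in "/file.zip", "images/logo.png", "http://other/page.html")
{
    # Uri(Uri, String) resolves $link against $base; links that are
    # already absolute pass through unchanged
    $resolved = New-Object System.Uri $base, $link
    $resolved.AbsoluteUri
}

$urls
```

Here the first link resolves against the server root, the second against the directory of the base page, and the third is returned as written.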

      

For more information about running scripts, see the section called “Run Programs, Scripts, and Existing Tools”.

Connect to a Webservice

Problem

You want to connect to and interact with an Internet webservice.

Solution

Use the New-WebserviceProxy cmdlet to work with a webservice.

PS > $url = "http://terraservice.net/TerraService.asmx"
PS > $terraServer = New-WebserviceProxy $url -Namespace Cookbook
PS > $place = New-Object Cookbook.Place
PS > $place.City = "Redmond"
PS > $place.State = "WA"
PS > $place.Country = "USA"
PS > $facts = $terraserver.GetPlaceFacts($place)
PS > $facts.Center

                                  Lon                                  Lat
                                  ---                                  ---
                    -122.110000610352                     47.6699981689453

Discussion

Although screen scraping (parsing the HTML of a web page) is the most common way to obtain data from the Internet, web services are becoming increasingly common. Web services provide a significant advantage over HTML parsing, as they are much less likely to break when the web designer changes minor features in a design.

A more stable interface isn't the only benefit of web services, however. When working with web services, the .NET Framework lets you generate proxies that let you interact with the web service as easily as you would work with a regular .NET object. That is because to you, the web service user, these proxies act almost exactly the same as any other .NET object. To call a method on the web service, simply call a method on the proxy.

The New-WebserviceProxy cmdlet simplifies all of the work required to connect to a web service, making it just as easy as a call to the New-Object cmdlet.

The primary differences you will notice when working with a web service proxy (as opposed to a regular .NET object) are the speed and Internet connectivity requirements. Depending on conditions, a method call on a web service proxy could easily take several seconds to complete. If your computer (or the remote computer) experiences network difficulties, the call might even return a network error message (such as a timeout) instead of the information you had hoped for.
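
Because a call can fail for transient network reasons, you may want to retry it a few times before giving up. The helper below is a hypothetical sketch, not part of PowerShell or of this chapter's scripts: Invoke-WithRetry, its parameter names, and its defaults are all assumptions. It retries only on terminating errors, such as the exception a proxy method throws on a timeout.

```powershell
function Invoke-WithRetry
{
    param(
        [scriptblock] $Action,
        [int] $MaxAttempts = 3,
        [int] $DelayMilliseconds = 500
    )

    for($attempt = 1; $attempt -le $MaxAttempts; $attempt++)
    {
        try
        {
            # Run the caller's script block and return whatever it produces
            return (& $Action)
        }
        catch
        {
            # Give up (rethrowing the original error) after the last attempt
            if($attempt -eq $MaxAttempts) { throw }
            Start-Sleep -Milliseconds $DelayMilliseconds
        }
    }
}

# Example use with the proxy from the solution:
#   $facts = Invoke-WithRetry { $terraServer.GetPlaceFacts($place) }
```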

If the webservice requires authentication in a domain, specify the -UseDefaultCredential parameter. If it requires explicit credentials, use the -Credential parameter.

When you create a new webservice proxy, PowerShell creates a new .NET object on your behalf that connects to that webservice. All .NET types live within a namespace to prevent them from conflicting with other types that have the same name, so PowerShell automatically generates the namespace name for you. You normally won't need to pay attention to this namespace. However, some web services require input objects that the web service also defines, such as the Place object in the solution. For these web services, use the -Namespace parameter to place the web service (and its support objects) in a namespace of your choice.

Note

Support objects from one webservice proxy cannot be consumed by a different webservice proxy, even if they are two proxies to a webservice at the same URL. If you need to work with two connections to a webservice at the same URL, and your task requires creating support objects for that service, be sure to use two different namespaces for those proxies.

The New-WebserviceProxy cmdlet was introduced in version two of PowerShell. If you need to connect to a webservice from version one of PowerShell, see the section called “Program: Connect-WebService”.

For more information about running scripts, see the section called “Run Programs, Scripts, and Existing Tools”.

Program: Connect-WebService

The section called “Connect to a Webservice” discusses how to connect to a webservice on the Internet. However, the New-WebserviceProxy cmdlet was introduced in version two of PowerShell. If you need to connect to a webservice from version one of PowerShell, Example 12.4, “Connect-WebService.ps1” is your solution. It lets you connect to a remote webservice if you know the location of its service description file (WSDL). It generates the web service proxy for you, letting you interact with it as you would any other .NET object.

Example 12.4. Connect-WebService.ps1

      
      


For more information about running scripts, see the section called “Run Programs, Scripts, and Existing Tools”.

Export Command Output As a Web Page

Problem

You want to export the results of a command as a web page so that you can post it to a web server.

Solution

Use PowerShell's ConvertTo-Html cmdlet to convert command output into a web page. For example, to create a quick HTML summary of PowerShell's commands:

PS > $filename = "c:\temp\help.html"
PS >
PS > $commands = Get-Command | Where { $_.CommandType -ne "Alias" }
PS > $summary = $commands | Get-Help | Select Name,Synopsis
PS > $summary | ConvertTo-Html | Set-Content $filename

Discussion

When you use the ConvertTo-Html cmdlet to export command output to a file, PowerShell generates an HTML table that represents the command output. In the table, it creates a row for each object that you provide. For each row, PowerShell creates columns to represent the values of your object's properties.

If the table format makes the output difficult to read, ConvertTo-Html offers the -As parameter that lets you set the output style to either Table or List.

While the default output is useful, you can customize the structure and style of the resulting HTML as much as you see fit. For example, the -PreContent and -PostContent parameters let you include additional text before and after the resulting table or list. The -Head parameter lets you define the content of the HEAD section of the HTML. Even if you want to generate most of the HTML from scratch, you can still use the -Fragment parameter to generate just the inner table or list.
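
The following sketch exercises these parameters on a couple of sample objects. The data, titles, and styling are invented for illustration; substitute your own command output.

```powershell
# Sample objects standing in for real command output (illustrative data)
$data = @(
    New-Object PSObject -Property @{ Name = "alpha"; Size = 10 }
    New-Object PSObject -Property @{ Name = "beta"; Size = 20 }
) | Select-Object Name, Size

# A full page with a custom HEAD section plus text before and after the table
$head = "<title>Size Report</title><style>td, th { padding: 4px }</style>"
$page = $data | ConvertTo-Html -Head $head `
    -PreContent "<h1>Size Report</h1>" `
    -PostContent "<p>End of report.</p>"

# The same data as a list rather than a table
$list = $data | ConvertTo-Html -As List

# Just the inner table, for embedding in HTML that you write yourself
$fragment = $data | ConvertTo-Html -Fragment
```

Each variable holds an array of strings, so you can pipe any of them to Set-Content as in the solution.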

For more information about the ConvertTo-Html cmdlet, type Get-Help ConvertTo-Html.

Send an Email

Problem

You want to send an email.

Solution

Use the Send-MailMessage cmdlet to send an email.

PS > Send-MailMessage -To guide@leeholmes.com `
>>     -From user@example.com `
>>     -Subject "Hello!" `
>>     -Body "Hello, from another satisfied Cookbook reader!" `
>>     -SmtpServer mail.example.com

Discussion

The Send-MailMessage cmdlet supports everything you would expect an email-centric cmdlet to support: attachments, plain text messages, HTML messages, priority, receipt requests, and more. The most difficult aspect is usually remembering the correct SMTP server to use.

The Send-MailMessage cmdlet helps with this problem as well. If you don't specify the -SmtpServer parameter, it uses the server specified in the $PSEmailServer variable, if any.

The Send-MailMessage cmdlet was introduced in version two of PowerShell. If you need to send an email from version one of PowerShell, see the section called “Program: Send-MailMessage”.

Program: Send-MailMessage

The Send-MailMessage cmdlet is the easiest way to send an email from PowerShell, but was introduced in version two of PowerShell. If you need to send an email from version one of PowerShell, you can use Example 12.5, “Send-MailMessage.ps1”.

In addition to the fields shown in the script, the System.Net.Mail.MailMessage class supports properties that let you add attachments, set message priority, and much more. For more information about working with classes from the .NET Framework, see the section called “Work with .NET Objects”.

Example 12.5. Send-MailMessage.ps1


param(
    [string[]] $to = $(throw "Please specify the destination mail address"),
    [string] $subject = "<No Subject>",
    [string] $body = $(throw "Please specify the message content"),
    [string] $smtpHost = $(throw "Please specify a mail server."),
    [string] $from = "$($env:UserName)@example.com"
  )

$email = New-Object System.Net.Mail.MailMessage 

foreach($mailTo in $to)
{
    $email.To.Add($mailTo)
}

$email.From = $from
$email.Subject = $subject
$email.Body = $body

$client = New-Object System.Net.Mail.SmtpClient $smtpHost
$client.UseDefaultCredentials = $true
$client.Send($email)
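
As a sketch of the additional MailMessage properties mentioned earlier, the following builds a high-priority HTML message with an attachment. The addresses, subject, and temporary file are placeholders invented for this example, and the message is constructed but not sent.

```powershell
# Create a throwaway file to attach (stands in for a real report)
$report = [System.IO.Path]::GetTempFileName()
Set-Content $report "Sample report contents"

$email = New-Object System.Net.Mail.MailMessage
$email.From = "user@example.com"
$email.To.Add("admin@example.com")
$email.Subject = "Nightly report"

# Send the body as HTML and flag the message as high priority
$email.Body = "<h1>Nightly report</h1><p>See the attached file.</p>"
$email.IsBodyHtml = $true
$email.Priority = [System.Net.Mail.MailPriority]::High

# Attach the file
$attachment = New-Object System.Net.Mail.Attachment $report
$email.Attachments.Add($attachment)

# Sending works as in Example 12.5:
#   $client = New-Object System.Net.Mail.SmtpClient $smtpHost
#   $client.Send($email)
# Call $email.Dispose() afterward to release the attachment's file handle.
```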
      

For more information about running scripts, see the section called “Run Programs, Scripts, and Existing Tools”.

Program: Interact with Internet Protocols

While it is common to work at an abstract level with web sites and web services, an entirely separate style of Internet-enabled scripting comes from interacting with the remote computer at a much lower level. This lower level (called the TCP level, for Transmission Control Protocol) forms the communication foundation of most Internet protocols—such as Telnet, SMTP (sending mail), POP3 (receiving mail), and HTTP (retrieving web content).

The .NET Framework provides classes that let you interact with many of the Internet protocols directly: the System.Net.Mail.SmtpClient class for SMTP, the System.Net.WebClient class for HTTP, and a few others. When the .NET Framework does not support an Internet protocol that you need, though, you can often script the application protocol directly if you know the details of how it works.

Example 12.6, “Interacting with a remote POP3 mailbox” shows how to receive information about mail waiting in a remote POP3 mailbox, using the Send-TcpRequest script given in Example 12.7, “Send-TcpRequest.ps1”.

Example 12.6. Interacting with a remote POP3 mailbox

## Get the user credential
if(-not (Test-Path Variable:\mailCredential))
{
   $mailCredential = Get-Credential
}
$address = $mailCredential.UserName
$password = $mailCredential.GetNetworkCredential().Password

$pop3Commands = "USER $address","PASS $password","STAT","QUIT"
$output = $pop3Commands | Send-TcpRequest mail.myserver.com 110
$inbox = $output.Split("`n")[3]

$status = $inbox |
    Convert-TextObject -PropertyName "Response","Waiting","BytesTotal","Extra"
"{0} messages waiting, totaling {1} bytes." -f $status.Waiting, $status.BytesTotal

In Example 12.6, “Interacting with a remote POP3 mailbox”, you connect to port 110 of the remote mail server. You then issue commands to request the status of the mailbox in a form that the mail server understands. The format of this network conversation is specified and required by the standard POP3 protocol. Example 12.6, “Interacting with a remote POP3 mailbox” uses the Convert-TextObject command, which is provided in the section called “Program: Convert Text Streams to Objects”.
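
In POP3, the reply to the STAT command is a single line of the form "+OK", then the number of messages, then their total size in bytes. If you don't have Convert-TextObject at hand, a minimal sketch of the same parsing step looks like this; the sample reply is made up for illustration.

```powershell
# A STAT reply has the form "+OK <message count> <total bytes>"
$statLine = "+OK 2 320"

# Split on whitespace and assign each field to its own variable
$response, $waiting, $bytesTotal = $statLine.Trim() -split '\s+'

$summary = "{0} messages waiting, totaling {1} bytes." -f $waiting, $bytesTotal
$summary
```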

Example 12.7, “Send-TcpRequest.ps1” supports the core functionality of Example 12.6, “Interacting with a remote POP3 mailbox”. It lets you easily work with plain-text TCP protocols.

Example 12.7. Send-TcpRequest.ps1

##############################################################################
param(
        [string] $remoteHost = "localhost",
        [switch] $test,
        [int] $port = 80,
        [switch] $UseSSL,
        [string] $inputObject,
        [int] $commandDelay = 100
     )

[string] $SCRIPT:output = ""

$currentInput = $inputObject
if(-not $currentInput)
{
    $currentInput = @($input)
}
$scriptedMode = ([bool] $currentInput) -or $test

function Main
{
    if(-not $scriptedMode)
    {
        write-host "Connecting to $remoteHost on port $port"
    }

    try
    {
        $socket = New-Object System.Net.Sockets.TcpClient($remoteHost, $port)
    }
    catch
    {
        if($test) { $false }
        else { Write-Error "Could not connect to remote computer: $_" }

        return
    }

    if($test) { $true; return }

    if(-not $scriptedMode)
    {
        write-host "Connected.  Press ^D followed by [ENTER] to exit.`n"
    }

    $stream = $socket.GetStream()
    
    if($UseSSL)
    {
        $sslStream = New-Object System.Net.Security.SslStream $stream,$false
        $sslStream.AuthenticateAsClient($remoteHost)
        $stream = $sslStream
    }

    $writer = new-object System.IO.StreamWriter $stream

    while($true)
    {
        $SCRIPT:output += GetOutput

        if($scriptedMode)
        {
            foreach($line in $currentInput)
            {
                $writer.WriteLine($line)
                $writer.Flush()
                Start-Sleep -m $commandDelay
                $SCRIPT:output += GetOutput
            }

            break
        }
        else
        {
            if($output) 
            {
                foreach($line in $output.Split("`n"))
                {
                    write-host $line
                }
                $SCRIPT:output = ""
            }

            $command = read-host
            if($command -eq ([char] 4)) { break; }

            $writer.WriteLine($command)
            $writer.Flush()
        }
    }

    $writer.Close()
    $stream.Close()

    if($scriptedMode)
    {
        $output
    }
}

function GetOutput
{
    $buffer = new-object System.Byte[] 1024
    $encoding = new-object System.Text.AsciiEncoding

    $outputBuffer = ""
    $foundMore = $false

    do
    {
        start-sleep -m 1000

        $foundmore = $false
        $stream.ReadTimeout = 1000

        do
        {
            try
            {
                $read = $stream.Read($buffer, 0, 1024)

                if($read -gt 0)
                {
                    $foundmore = $true
                    $outputBuffer += ($encoding.GetString($buffer, 0, $read))
                }
            } catch { $foundMore = $false; $read = 0 }
        } while($read -gt 0)
    } while($foundmore)

    $outputBuffer
}

. Main
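
To see the write, flush, and read cycle at the heart of the script without a remote server, you can connect a TcpClient to a listener on your own machine. This is only an illustrative sketch: it assumes loopback connections are permitted, and the command text is arbitrary.

```powershell
$loopback = [System.Net.IPAddress]::Loopback

# Listen on any free local port (port 0 asks the system to choose one)
$listener = New-Object System.Net.Sockets.TcpListener $loopback, 0
$listener.Start()
$port = $listener.LocalEndpoint.Port

# Connect a client, then accept the server side of that connection
$client = New-Object System.Net.Sockets.TcpClient
$client.Connect($loopback, $port)
$server = $listener.AcceptTcpClient()

# Client side: write one line of text and flush it onto the wire
$clientStream = $client.GetStream()
$writer = New-Object System.IO.StreamWriter $clientStream
$writer.WriteLine("STAT")
$writer.Flush()

# Server side: read the line that the client sent
$serverStream = $server.GetStream()
$reader = New-Object System.IO.StreamReader $serverStream
$received = $reader.ReadLine()
$received

# Clean up both ends of the connection
$writer.Close()
$reader.Close()
$client.Close()
$server.Close()
$listener.Stop()
```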

For more information about running scripts, see the section called “Run Programs, Scripts, and Existing Tools”.
