Although PowerShell provides an enormous benefit even when your scripts interact only with the local system, working with data sources from the Internet opens exciting and unique opportunities. For example, you might download files or information from the Internet, interact with a web service, store your output as HTML, or even send an email that reports the results of a long-running script.
Through its cmdlets and access to the networking support in the .NET Framework, PowerShell provides ample opportunities for Internet-enabled administration.
Use the DownloadFile() method from the .NET
Framework's System.Net.WebClient
class to download a file:
PS > $source = "http://www.leeholmes.com/favicon.ico"
PS > $destination = "c:\temp\favicon.ico"
PS >
PS > $wc = New-Object System.Net.WebClient
PS > $wc.DownloadFile($source, $destination)
The System.Net.WebClient class from the .NET
Framework lets you easily upload and download data from remote web
servers.
The WebClient class acts much like a web browser,
in that you can specify a user agent, proxy (if your outgoing connection
requires one), and even credentials.
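These settings correspond to properties on the WebClient object. The following is a minimal sketch rather than part of the original recipe: the user agent string and the proxy address (http://proxy.example.com:8080) are placeholders, and it assumes that the current user's logon credentials are acceptable to the server.

```powershell
$wc = New-Object System.Net.WebClient

# Identify the client to the server. The string here is a placeholder.
$wc.Headers.Add("user-agent", "Cookbook-Example/1.0")

# Route requests through an outgoing proxy. The address is a placeholder.
$wc.Proxy = New-Object System.Net.WebProxy "http://proxy.example.com:8080"

# Authenticate as the current user. To use a different account, assign
# (Get-Credential).GetNetworkCredential() to $wc.Credentials instead.
$wc.UseDefaultCredentials = $true
```

No request is made until you call a method such as DownloadFile() or DownloadString(), so you can configure the object once and reuse it for several downloads.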
All web browsers send a user agent identifier
along with their web request. This identifier tells the web site what
application is making the request—such as Internet Explorer, Firefox, or
an automated crawler from a search engine. Many web sites check this
user agent identifier to determine how to display the page.
Unfortunately, many fail entirely if they can't determine the user agent
for the incoming request. To make the System.Net.WebClient identify itself as
Internet Explorer, use the following commands, instead:
$userAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2;)"
$wc = New-Object System.Net.WebClient
$wc.Headers.Add("user-agent", $userAgent)
Notice that the solution uses a fully
qualified path for the destination file. This is an important step, as
the DownloadFile() method otherwise saves its
files to the directory in which PowerShell.exe
started (the root of your user profile directory by default).
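If you would rather start from a relative filename, one way to obtain a fully qualified path is to resolve the name against PowerShell's current location before handing it to the .NET method. This is a sketch under that assumption; the filename is illustrative.

```powershell
# Resolve a relative name against PowerShell's current location. Unlike
# Resolve-Path, this also works when the file does not exist yet.
$sessionPath = $ExecutionContext.SessionState.Path
$destination = $sessionPath.GetUnresolvedProviderPathFromPSPath("favicon.ico")

# $destination is now a fully qualified path, suitable for .NET methods
# such as DownloadFile().
$destination
```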
You can use the DownloadFile() method to download web pages
just as easily as you download files. Just supply a URL as a
source (such as http://blogs.msdn.com/powershell/rss.xml) instead of a
filename. If you ultimately intend to parse or read through the
downloaded page, the DownloadString()
method may be more appropriate.
For more information on how to download and parse web pages, see the section called “Download a Web Page from the Internet”.
You want to download a web page from the Internet and work with the content as a plain string.
Use the DownloadString() method from the .NET
Framework's System.Net.WebClient
class to download a web page or plain text file into a string.
PS > $source = "http://blogs.msdn.com/powershell/rss.xml"
PS >
PS > $wc = New-Object System.Net.WebClient
PS > $content = $wc.DownloadString($source)
The most common reason to download a web page from the Internet is to extract unstructured information from it. Although web services are becoming increasingly popular, they are still far less common than web pages that display useful data. Because of this, retrieving data from services on the Internet often comes by means of screen scraping: downloading the HTML of the web page and then carefully separating out the content you want from the vast majority of the content that you do not.
The technique of screen scraping has been around much longer than the Internet! As long as computer systems have generated output designed primarily for humans, screen scraping tools have risen to make this output available to other computer programs.
Unfortunately, screen scraping is an error-prone way to extract content.
That's not an exaggeration! As proof, Example 12.2, “Get-Answer.ps1” broke four or five times while the first edition of this book was being written, and then again after it was published. Such are the perils of screen scraping.
If the web page authors change the underlying HTML, your code will usually stop working correctly. If the site's HTML is written as valid XHTML, you may be able to use PowerShell's built-in XML support to more easily parse the content.
For more information about PowerShell's built-in XML support, see the section called “Access Information in an XML File”.
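As a rough illustration of the XHTML approach, the sketch below parses a small, hand-written document that stands in for a downloaded page. It assumes the content is well-formed XML; real pages that declare a DOCTYPE or use HTML-only entities may need extra handling before the cast succeeds.

```powershell
# A small, well-formed XHTML document standing in for a downloaded page.
$page = @'
<html>
  <body>
    <ul>
      <li><a href="http://example.com/one">First</a></li>
      <li><a href="http://example.com/two">Second</a></li>
    </ul>
  </body>
</html>
'@

# The [xml] cast parses the text. Elements and attributes then behave
# like properties, so no regular expressions are needed.
$xhtml = [xml] $page
$links = $xhtml.html.body.ul.li | ForEach-Object { $_.a.href }
$links
```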
Despite its fragility, pure screen scraping is often the only alternative. Since screen scraping is just text manipulation, you have the same options you do with other text reports. For some fairly structured web pages, you can get away with a single regular expression replacement (plus cleanup), as shown in Example 12.1, “Search-Twitter.ps1”.
Example 12.1. Search-Twitter.ps1
param($term = "PowerShell")
Add-Type -Assembly System.Web
$queryUrl = 'http://integratedsearch.twitter.com/search.html?q={0}'
$queryUrl = $queryUrl -f ([System.Web.HttpUtility]::UrlEncode($term))
$wc = New-Object System.Net.WebClient
$wc.Encoding = [System.Text.Encoding]::UTF8
$results = $wc.DownloadString($queryUrl)
## Store the results under a name other than the automatic $matches variable
$tweetMatches = $results |
    Select-String -Pattern '(?s)<div[^>]*msg[^>]*>.*?</div>' -AllMatches
foreach($match in $tweetMatches.Matches)
{
    $tweet = $match.Value -replace '<[^>]*>', ''
    [System.Web.HttpUtility]::HtmlDecode($tweet.Trim()) + "`n"
}
Others, while possible to accomplish with complicated regular expressions, can often be made much simpler through more straightforward text manipulation. Example 12.2, “Get-Answer.ps1” uses this second approach to fetch "Instant Answers" from Bing.
Example 12.2. Get-Answer.ps1
$question = $args -join " "
function Main
{
    Add-Type -Assembly System.Web
    $encoded = [System.Web.HttpUtility]::UrlEncode($question)
    $url = "http://www.bing.com/search?q=$encoded"
    $text = (new-object System.Net.WebClient).DownloadString($url)
    $startIndex = $text.IndexOf('<div class="ans">')
    $endIndex = $text.IndexOf('<div class="sn_att2">')
    if($endIndex -lt 0) { $endIndex = $text.IndexOf('<div id="results">') }
    if(($startIndex -ge 0) -and ($endIndex -ge 0))
    {
        $partialText = $text.Substring($startIndex, $endIndex - $startIndex)
        $partialText = $partialText -replace '<div[^>]*>',"`n"
        $partialText = $partialText -replace '<tr[^>]*>',"`n"
        $partialText = $partialText -replace '<li[^>]*>',"`n"
        $partialText = $partialText -replace '<br[^>]*>',"`n"
        $partialText = $partialText -replace '<span[^>]*>'," "
        $partialText = $partialText -replace '<td[^>]*>'," "
        $partialText = CleanHtml $partialText
        $partialText = $partialText -split "`n" |
            Foreach-Object { $_.Trim() } | Where-Object { $_ }
        $partialText = $partialText -join "`n"
        [System.Web.HttpUtility]::HtmlDecode($partialText.Trim())
    }
    else
    {
        "`nNo answer found."
    }
}
function CleanHtml ($htmlInput)
{
$tempString = [Regex]::Replace($htmlInput, "(?s)<[^>]*>", "")
$tempString.Replace("  ", "")
}
. Main
For more information about running scripts, see the section called “Run Programs, Scripts, and Existing Tools”.
When working with HTML, it is common to require advanced regular expressions that separate the content you care about from the content you don't. A perfect example of this is extracting all the HTML links from a web page.
Links come in many forms, depending on how lenient you want to be. They may be well-formed according to the various HTML standards. They may use relative paths, or they may use absolute paths. They may place double quotes around the URL, or they may place single quotes around the URL. If you're really unlucky, they may accidentally include quotes on only one side of the URL.
Example 12.3, “Get-PageUrls.ps1” demonstrates
some approaches for dealing with this type of advanced parsing task. Given
a web page that you've downloaded from the Internet, it extracts all links
from the page and returns a list of the URLs in that page. It also fixes
URLs that were originally written as relative URLs (for example, /file.zip) to include the server from which they
originated.
Example 12.3. Get-PageUrls.ps1
param(
    [string] $filename = $(throw "Please specify a filename."),
    [string] $base = $(throw "Please specify a base URL."),
    [string] $pattern = ".*"
)
Add-Type -Assembly System.Web
$regex = "<\s*a\s*[^>]*?href\s*=\s*[`"']*([^`"'>]+)[^>]*?>"
function Main
{
    $base = $base.Replace("\", "/")
    if($base.IndexOf("://") -lt 0)
    {
        throw "Please specify a base URL in the form of " +
            "http://server/path_to_file/file.html"
    }
    $base = $base.Substring(0,$base.LastIndexOf("/") + 1)
    $baseSlash = $base.IndexOf("/", $base.IndexOf("://") + 3)
    if($baseSlash -ge 0)
    {
        $domain = $base.Substring(0, $baseSlash)
    }
    else
    {
        $domain = $base
    }
    $content = [String]::Join(' ', (get-content $filename))
    $contentMatches = @(GetMatches $content $regex)
    foreach($contentMatch in $contentMatches)
    {
        if(-not ($contentMatch -match $pattern)) { continue }
        if($contentMatch -match "javascript:") { continue }
        $contentMatch = $contentMatch.Replace("\", "/")
        if($contentMatch.IndexOf("://") -gt 0)
        {
            $url = $contentMatch
        }
        elseif($contentMatch[0] -eq "/")
        {
            $url = "$domain$contentMatch"
        }
        else
        {
            $url = "$base$contentMatch"
            $url = $url.Replace("/./", "/")
        }
        [System.Web.HttpUtility]::HtmlDecode($url)
    }
}
function GetMatches([string] $content, [string] $regex)
{
    $returnMatches = new-object System.Collections.ArrayList
    $resultingMatches = [Regex]::Matches($content, $regex, "IgnoreCase")
    foreach($match in $resultingMatches)
    {
        $cleanedMatch = $match.Groups[1].Value.Trim()
        [void] $returnMatches.Add($cleanedMatch)
    }
    $returnMatches
}
. Main
For more information about running scripts, see the section called “Run Programs, Scripts, and Existing Tools”.
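The script above resolves relative links with its own string handling. As a possible alternative (not what the script itself does), the .NET System.Uri class can resolve a link against a base address. The sketch below assumes the same style of base URL that the script expects; the sample links are made up for illustration.

```powershell
$base = New-Object System.Uri "http://server/path_to_file/file.html"
$sampleLinks = "/file.zip", "other.html", "./images/logo.png",
    "http://elsewhere/page.html"

$results = foreach($link in $sampleLinks)
{
    # The Uri(Uri, string) constructor resolves $link relative to $base.
    # Links that are already absolute pass through unchanged.
    $resolved = New-Object System.Uri $base, $link
    $resolved.AbsoluteUri
}

$results
```

With these inputs, the root-relative link resolves against the server, the two page-relative links resolve against the /path_to_file/ directory, and the absolute link is returned as written.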
You want to connect to and interact with an Internet webservice.
Use the New-WebserviceProxy
cmdlet to work with a webservice.
PS > $url = "http://terraservice.net/TerraService.asmx"
PS > $terraServer = New-WebserviceProxy $url -Namespace Cookbook
PS > $place = New-Object Cookbook.Place
PS > $place.City = "Redmond"
PS > $place.State = "WA"
PS > $place.Country = "USA"
PS > $facts = $terraserver.GetPlaceFacts($place)
PS > $facts.Center
              Lon              Lat
              ---              ---
-122.110000610352 47.6699981689453
Although screen scraping (parsing the HTML of a web page) is the most common way to obtain data from the Internet, web services are becoming increasingly common. Web services provide a significant advantage over HTML parsing, as they are much less likely to break when the web designer changes minor features in a design.
A more stable interface isn't the only benefit of web services, however. When working with web services, the .NET Framework lets you generate proxies that let you interact with the web service as easily as you would work with a regular .NET object. That is because to you, the web service user, these proxies act almost exactly the same as any other .NET object. To call a method on the web service, simply call a method on the proxy.
The New-WebserviceProxy
cmdlet simplifies all of the work required to connect to a web service,
making it just as easy as a call to the New-Object
cmdlet.
The primary differences you will notice when working with a web service proxy (as opposed to a regular .NET object) are the speed and Internet connectivity requirements. Depending on conditions, a method call on a web service proxy could easily take several seconds to complete. If your computer (or the remote computer) experiences network difficulties, the call might even return a network error message (such as a timeout) instead of the information you had hoped for.
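Because a network failure surfaces as an error from the method call, one way to cope with transient problems is to wrap the call in a retry loop. The helper below is a hypothetical sketch: the name Invoke-WithRetry, its parameters, and the retry policy are assumptions for illustration, not part of PowerShell or of the web service proxy.

```powershell
# Hypothetical helper: runs a script block, retrying when it throws.
function Invoke-WithRetry
{
    param(
        [scriptblock] $Action,
        [int] $MaxAttempts = 3,
        [int] $DelaySeconds = 2
    )

    for($attempt = 1; $attempt -le $MaxAttempts; $attempt++)
    {
        try
        {
            return (& $Action)
        }
        catch
        {
            # Give up after the final attempt and surface the error.
            if($attempt -eq $MaxAttempts) { throw }

            Write-Warning "Attempt $attempt failed: $_"
            Start-Sleep -Seconds $DelaySeconds
        }
    }
}

# Usage with the proxy from the solution (requires network access):
# $facts = Invoke-WithRetry { $terraServer.GetPlaceFacts($place) }
```

Retrying suits temporary conditions such as timeouts. Errors that will not go away on their own, such as invalid input, are better reported immediately.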
If the webservice requires authentication in a
domain, specify the -UseDefaultCredential parameter.
If it requires explicit credentials, use the
-Credential parameter.
When you create a new webservice proxy,
PowerShell creates a new .NET object on your behalf that connects to
that webservice. All .NET types live within a
namespace to prevent them from conflicting with
other types that have the same name, so PowerShell automatically
generates the namespace name for you. You normally won't need to pay
attention to this namespace. However, some web services require input
objects that the web service also defines, such as the
Place object in the solution. For these web
services, use the -Namespace parameter to place the
web service (and its support objects) in a namespace of your
choice.
Support objects from one webservice proxy cannot be consumed by a different webservice proxy, even if they are two proxies to a webservice at the same URL. If you need to work with two connections to a webservice at the same URL, and your task requires creating support objects for that service, be sure to use two different namespaces for those proxies.
The New-WebserviceProxy
cmdlet was introduced in version two of PowerShell. If you need to
connect to a webservice from version one of PowerShell, see the section called “Program: Connect-WebService”.
For more information about running scripts, see the section called “Run Programs, Scripts, and Existing Tools”.
The section called “Connect to a Webservice”
discusses how to connect to a webservice on the Internet. However, the
New-WebserviceProxy cmdlet was introduced in version
two of PowerShell. If you need to connect to a webservice from version one
of PowerShell, Example 12.4, “Connect-WebService.ps1” is your solution.
It lets you connect to a remote webservice if you know the location of its
service description file (WSDL). It generates the web service proxy for
you, letting you interact with it as you would any other .NET
object.
Example 12.4. Connect-WebService.ps1
(The listing for this example is omitted here. It is unchanged from the first edition of the book.)
For more information about running scripts, see the section called “Run Programs, Scripts, and Existing Tools”.
You want to export the results of a command as a web page so that you can post it to a web server.
Use PowerShell's
ConvertTo-Html cmdlet to convert command output into
a web page. For example, to create a quick HTML summary of PowerShell's
commands:
PS > $filename = "c:\temp\help.html"
PS >
PS > $commands = Get-Command | Where { $_.CommandType -ne "Alias" }
PS > $summary = $commands | Get-Help | Select Name,Synopsis
PS > $summary | ConvertTo-Html | Set-Content $filename
When you use the ConvertTo-Html cmdlet to export command output
to a file, PowerShell generates an HTML table that represents the
command output. In the table, it creates a row for each object that you
provide. For each row, PowerShell creates columns to represent the
values of your object's properties.
If the table format makes the output difficult
to read, ConvertTo-Html offers the
-As parameter that lets you set the output style to
either Table or List.
While the default output is useful, you can
customize the structure and style of the resulting HTML as much as you
see fit. For example, the -PreContent and
-PostContent parameters let you include additional
text before and after the resulting table or list. The
-Head parameter lets you define the content of the
HEAD section of the HTML. Even if you want to generate most of the HTML
from scratch, you can still use the -Fragment
parameter to generate just the inner table or list.
For more information about the ConvertTo-Html cmdlet, type Get-Help ConvertTo-Html.
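The parameters described above can be tried on a couple of sample objects. This is a minimal sketch: the sample data, the title, and the surrounding text are placeholders chosen for illustration.

```powershell
# Sample objects to report on: each string and its length.
$items = "alpha", "beta" |
    Select-Object @{ Name = "Text"; Expression = { $_ } }, Length

# A complete page with custom HEAD content, plus text placed before
# and after the generated table.
$page = $items | ConvertTo-Html -Head "<title>Item report</title>" `
    -PreContent "<h1>Item report</h1>" `
    -PostContent "<p>End of report.</p>"

# The same data with one property per row rather than one object per row.
$list = $items | ConvertTo-Html -As List

# Only the table markup, for embedding in HTML that you write yourself.
$fragment = $items | ConvertTo-Html -Fragment
```

Each variable holds an array of HTML lines, so you can pipe any of them to Set-Content as the solution does.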
Use the Send-MailMessage
cmdlet to send an email.
PS > Send-MailMessage -To guide@leeholmes.com `
>>     -From user@example.com `
>>     -Subject "Hello!" `
>>     -Body "Hello, from another satisfied Cookbook reader!" `
>>     -SmtpServer mail.example.com
The Send-MailMessage cmdlet
supports everything you would expect an email-centric cmdlet to support:
attachments, plain text messages, HTML messages, priority, receipt
requests, and more. The most difficult aspect is usually remembering the
correct SMTP server to use.
The Send-MailMessage cmdlet
helps with this problem, as well. If you don't specify the
-SmtpServer parameter, it uses the server specified
in the $PSEmailServer variable, if any.
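As a sketch of how these pieces fit together, the following sets $PSEmailServer once and then prepares an HTML message with high priority. The server name, addresses, and message text are placeholders, and the final command is left commented out so that nothing is sent by accident.

```powershell
# Set once (for example, in your profile) so that Send-MailMessage can
# omit -SmtpServer. The server name is a placeholder.
$PSEmailServer = "mail.example.com"

# Collect the parameters in a hashtable and pass them by splatting.
$mailParams = @{
    To         = "admin@example.com"
    From       = "reports@example.com"
    Subject    = "Nightly report"
    Body       = "<h1>Nightly report</h1><p>All checks passed.</p>"
    BodyAsHtml = $true
    Priority   = "High"
}

# Uncomment to send. To attach a file, add an Attachments entry
# (a file path) to the hashtable.
# Send-MailMessage @mailParams
```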
The Send-MailMessage cmdlet
was introduced in version two of PowerShell. If you need to send an
email from version one of PowerShell, see the section called “Program: Send-MailMessage”.
The Send-MailMessage cmdlet
is the easiest way to send an email from PowerShell, but was introduced in
version two of PowerShell. If you need to send an email from version one
of PowerShell, you can use Example 12.5, “Send-MailMessage.ps1”.
In addition to the fields shown in the script,
the System.Net.Mail.MailMessage class
supports properties that let you add attachments, set message priority,
and much more. For more information about working with classes from the
.NET Framework, see the section called “Work with .NET Objects”.
Example 12.5. Send-MailMessage.ps1
param(
    [string[]] $to = $(throw "Please specify the destination mail address"),
    [string] $subject = "<No Subject>",
    [string] $body = $(throw "Please specify the message content"),
    [string] $smtpHost = $(throw "Please specify a mail server."),
    [string] $from = "$($env:UserName)@example.com"
)
$email = New-Object System.Net.Mail.MailMessage
foreach($mailTo in $to)
{
    $email.To.Add($mailTo)
}
}
$email.From = $from
$email.Subject = $subject
$email.Body = $body
$client = New-Object System.Net.Mail.SmtpClient $smtpHost
$client.UseDefaultCredentials = $true
$client.Send($email)
For more information about running scripts, see the section called “Run Programs, Scripts, and Existing Tools”.
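The additional MailMessage properties mentioned earlier can be set before the message is handed to the SMTP client. The sketch below builds such a message without sending it. The addresses and content are placeholders, and a temporary file stands in for a real attachment.

```powershell
$email = New-Object System.Net.Mail.MailMessage
$email.From = "reports@example.com"
$email.To.Add("admin@example.com")
$email.Subject = "Nightly report"
$email.Body = "<p>Report attached.</p>"
$email.IsBodyHtml = $true
$email.Priority = [System.Net.Mail.MailPriority]::High

# Attach a file. A temporary file stands in for a real report here.
$reportPath = [System.IO.Path]::GetTempFileName()
Set-Content $reportPath "sample,data"
$attachment = New-Object System.Net.Mail.Attachment $reportPath
$email.Attachments.Add($attachment)
$attachmentCount = $email.Attachments.Count

# At this point the message could be sent with an SmtpClient, as in the
# script. Disposing the message afterward releases the attached file.
$email.Dispose()
Remove-Item $reportPath
```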
While it is common to work at an abstract level with web sites and web services, an entirely separate style of Internet-enabled scripting comes from interacting with the remote computer at a much lower level. This lower level (called the TCP level, for Transmission Control Protocol) forms the communication foundation of most Internet protocols—such as Telnet, SMTP (sending mail), POP3 (receiving mail), and HTTP (retrieving web content).
The .NET Framework provides classes that let you
interact with many of the Internet protocols directly: the System.Web.Mail.SmtpMail class for SMTP, the
System.Net.WebClient class for HTTP,
and a few others. When the .NET Framework does not support an Internet
protocol that you need, though, you can often script the application
protocol directly if you know the details of how it works.
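The mechanics of such a conversation can be tried without any remote server. The following is a minimal sketch, not taken from the scripts in this chapter: it opens a listener on the loopback address, connects to it with a TcpClient, and exchanges one command and one reply as plain text lines. The STAT command and its reply are illustrative stand-ins for a real protocol exchange.

```powershell
# Listen on the loopback address. Port 0 asks the system for any free port.
$loopback = [System.Net.IPAddress]::Loopback
$listener = New-Object System.Net.Sockets.TcpListener $loopback, 0
$listener.Start()
$port = $listener.LocalEndpoint.Port

# Connect a client, just as a script would connect to a real server.
$client = New-Object System.Net.Sockets.TcpClient
$client.Connect("127.0.0.1", $port)
$serverSide = $listener.AcceptTcpClient()

# The client sends one command line.
$clientWriter = New-Object System.IO.StreamWriter $client.GetStream()
$clientWriter.WriteLine("STAT")
$clientWriter.Flush()

# The stand-in server reads the command and sends a one-line reply.
$serverReader = New-Object System.IO.StreamReader $serverSide.GetStream()
$received = $serverReader.ReadLine()
$serverWriter = New-Object System.IO.StreamWriter $serverSide.GetStream()
$serverWriter.WriteLine("+OK 2 320")
$serverWriter.Flush()

# The client reads the reply.
$clientReader = New-Object System.IO.StreamReader $client.GetStream()
$reply = $clientReader.ReadLine()

$client.Close()
$serverSide.Close()
$listener.Stop()

"Server received: $received"
"Client received: $reply"
```

Scripting a real protocol follows the same pattern: write a line, flush, then read whatever the server sends back.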
Example 12.6, “Interacting with a remote POP3 mailbox” shows how to receive
information about mail waiting in a remote POP3 mailbox, using the
Send-TcpRequest script given in Example 12.7, “Send-TcpRequest.ps1”.
Example 12.6. Interacting with a remote POP3 mailbox
## Get the user credential
if(-not (Test-Path Variable:\mailCredential))
{
    $mailCredential = Get-Credential
}
$address = $mailCredential.UserName
$password = $mailCredential.GetNetworkCredential().Password
$pop3Commands = "USER $address","PASS $password","STAT","QUIT"
$output = $pop3Commands | Send-TcpRequest mail.myserver.com 110
$inbox = $output.Split("`n")[3]
$status = $inbox |
    Convert-TextObject -PropertyName "Response","Waiting","BytesTotal","Extra"
"{0} messages waiting, totaling {1} bytes." -f $status.Waiting, $status.BytesTotal
In Example 12.6, “Interacting with a remote POP3 mailbox”, you connect to port
110 of the remote mail server. You then issue commands to request the
status of the mailbox in a form that the mail server understands. The
format of this network conversation is specified and required by the
standard POP3 protocol. Example 12.6, “Interacting with a remote POP3 mailbox” uses the Convert-TextObject command, which is provided in
the section called “Program: Convert Text Streams to Objects”.
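If Convert-TextObject is not at hand, the STAT reply is simple enough to split directly. In POP3, a successful reply has the form "+OK", then the number of messages, then their total size in bytes. The sketch below parses a sample reply; the literal string is an assumption standing in for real server output.

```powershell
# A sample STAT reply. A real one would come from the mail server.
$inbox = "+OK 2 320"

# Split on whitespace: status indicator, message count, total bytes.
$response, $waiting, $bytesTotal = $inbox.Trim() -split "\s+"

if($response -eq "+OK")
{
    $summary = "{0} messages waiting, totaling {1} bytes." -f
        $waiting, $bytesTotal
}
else
{
    $summary = "Server reported an error: $inbox"
}

$summary
```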
Example 12.7, “Send-TcpRequest.ps1” supports the core functionality of Example 12.6, “Interacting with a remote POP3 mailbox”. It lets you easily work with plain-text TCP protocols.
Example 12.7. Send-TcpRequest.ps1
##############################################################################
param(
    [string] $remoteHost = "localhost",
    [switch] $test,
    [int] $port = 80,
    [switch] $UseSSL,
    [string] $inputObject,
    [int] $commandDelay = 100
)
[string] $SCRIPT:output = ""
$currentInput = $inputObject
if(-not $currentInput)
{
    $currentInput = @($input)
}
$scriptedMode = ([bool] $currentInput) -or $test
function Main
{
    if(-not $scriptedMode)
    {
        write-host "Connecting to $remoteHost on port $port"
    }
    try
    {
        $socket = New-Object System.Net.Sockets.TcpClient($remoteHost, $port)
    }
    catch
    {
        if($test) { $false }
        else { Write-Error "Could not connect to remote computer: $_" }
        return
    }
    if($test) { $true; return }
    if(-not $scriptedMode)
    {
        write-host "Connected. Press ^D followed by [ENTER] to exit.`n"
    }
    $stream = $socket.GetStream()
    if($UseSSL)
    {
        $sslStream = New-Object System.Net.Security.SslStream $stream,$false
        $sslStream.AuthenticateAsClient($remoteHost)
        $stream = $sslStream
    }
    $writer = new-object System.IO.StreamWriter $stream
    while($true)
    {
        $SCRIPT:output += GetOutput
        if($scriptedMode)
        {
            foreach($line in $currentInput)
            {
                $writer.WriteLine($line)
                $writer.Flush()
                Start-Sleep -m $commandDelay
                $SCRIPT:output += GetOutput
            }
            break
        }
        else
        {
            if($output)
            {
                foreach($line in $output.Split("`n"))
                {
                    write-host $line
                }
                $SCRIPT:output = ""
            }
            $command = read-host
            if($command -eq ([char] 4)) { break; }
            $writer.WriteLine($command)
            $writer.Flush()
        }
    }
    $writer.Close()
    $stream.Close()
    if($scriptedMode)
    {
        $output
    }
}
function GetOutput
{
    $buffer = new-object System.Byte[] 1024
    $encoding = new-object System.Text.AsciiEncoding
    $outputBuffer = ""
    $foundMore = $false
    do
    {
        start-sleep -m 1000
        $foundmore = $false
        $stream.ReadTimeout = 1000
        do
        {
            try
            {
                $read = $stream.Read($buffer, 0, 1024)
                if($read -gt 0)
                {
                    $foundmore = $true
                    $outputBuffer += ($encoding.GetString($buffer, 0, $read))
                }
            } catch { $foundMore = $false; $read = 0 }
        } while($read -gt 0)
    } while($foundmore)
    $outputBuffer
}
. Main
For more information about running scripts, see the section called “Run Programs, Scripts, and Existing Tools”.