using (WordprocessingDocument doc = WordprocessingDocument.Open("c:\\Test.docx", false))
{
    var build = doc.MainDocumentPart
        .GetXDocument()
        .Descendants(w + "sdt")
        .Where(e => ((string)e.Elements(w + "sdtPr")
            .Elements(w + "alias")
            .Attributes(w + "val")
            .FirstOrDefault()).ToLower() == "build")
        .Select(b => GetTextFromContentControl(b));
    foreach (var b in build)
        textBox1.Text += b + "\n";
}
From the blog title, I believe this is the LINQ part. The .Select(b => GetTextFromContentControl(b)); is confusing to me because it passes b (an XMLElement) into GetTextFromContentControl, but b hasn't been declared at this point yet. And then the follow-on foreach statement confuses me even more. I guess b was assigned a value in that "b =>" part? And since there is a foreach, it leads me to believe that there could be more than one b in the document, but when I added more content controls to the test doc, it still only printed out one time.
More questions: what are the statements that begin with dots called? I see that the var build line doesn't end until the semicolon in the select line. I am used to functions that look like myFunc(param1, param2), but I don't get what these dot statements do. I'll be happy to look them up and read about them myself, but I don't know what they are called. Although if someone wants to give me a little insight here, I'll be even happier.
Ultimately, I would like this code to parse through all of the content controls in a document, reading the control name and assigning a value to them. I could also use a clue as to how to modify the code here so that it could handle multiple controls within a document. Again, I don't need the full answer, but just a little nudge in the right direction.
what are the statements that begin with dots called?
These are actually nothing special, and are really nothing you haven't seen before.
It's just the good ol' member access operator (".").
As you know, when you have an object, you would use "." to access its members, which include methods.
Now, if a method returns an object, you can chain a new method call on the result:
someVar.SomeMethod().SomeOtherMethod();
This is the same as:
var result = someVar.SomeMethod();
result.SomeOtherMethod();
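To make the chaining idea concrete, here is a small sketch using ordinary string methods. The class and method names (ChainingDemo, Chained, StepByStep) are made up for illustration; both methods are meant to do exactly the same work.

```csharp
using System;

static class ChainingDemo
{
    // One expression: each call is made on the object returned by the previous call.
    public static string Chained(string input)
    {
        return input.Trim().ToLowerInvariant().Replace(' ', '_');
    }

    // The same work, with every intermediate result stored in a temporary variable.
    public static string StepByStep(string input)
    {
        string trimmed = input.Trim();
        string lowered = trimmed.ToLowerInvariant();
        string result = lowered.Replace(' ', '_');
        return result;
    }

    static void Main()
    {
        Console.WriteLine(Chained("  Hello World  "));    // hello_world
        Console.WriteLine(StepByStep("  Hello World  ")); // hello_world
    }
}
```

The chained form just skips naming the intermediate objects; nothing else changes.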
Sometimes people use this to do funky things with the API they design, and create a domain-specific language. For example, you may see code like:
var result = deckOfCards.PickFrom().Spades().Random();
But I wouldn't go crazy with that approach...
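Anyway, since statements in C# must end in ";", any white space in between (including line breaks) doesn't matter.
Which means that
var result = deckOfCards.PickFrom().Spades().Random();
can be rewritten as
var result = deckOfCards
    .PickFrom()
    .Spades()
    .Random();
if someone finds it convenient for whatever reason.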
Now, LINQ and lambda expressions.
LINQ is, under the hood, just a set of methods that generally operate on collections of objects, and accept lambda expressions and the collection in question as parameters.
Several of these methods are integrated into the language syntax as keywords (query syntax), but the original syntax is also available (method syntax), and these can even be mixed. Whatever you can do using the query syntax, you can do using the method syntax, and somewhat more.
But, basically, you pass in a collection, and often a predicate of some sort, and the method returns a new collection based on that. (There are some details, like deferred execution, but never mind that.)
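As a rough illustration of the two syntaxes, the sketch below writes the same query both ways. The names (SyntaxDemo, QuerySyntax, MethodSyntax) and the sample numbers are just assumptions for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SyntaxDemo
{
    // Query syntax: keywords built into the language.
    public static List<int> QuerySyntax(IEnumerable<int> numbers)
    {
        var q = from n in numbers
                where n % 2 == 0
                select n * n;
        return q.ToList();
    }

    // Method syntax: the same query as plain (extension) method calls with lambdas.
    public static List<int> MethodSyntax(IEnumerable<int> numbers)
    {
        return numbers.Where(n => n % 2 == 0).Select(n => n * n).ToList();
    }

    static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5, 6 };
        Console.WriteLine(string.Join(", ", QuerySyntax(numbers)));  // 4, 16, 36
        Console.WriteLine(string.Join(", ", MethodSyntax(numbers))); // 4, 16, 36
    }
}
```

The compiler translates the query form into the method form, so the choice is mostly about readability.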
Are you familiar with delegates and anonymous methods?
These are basically built on top of that. In a nutshell, a delegate is an object-oriented, CLR-supported function pointer. It can hold a reference to one or more methods with a specific signature, and it can be used to execute them all - you would use it the same way you'd use any method, except you'd type in the name of the delegate variable instead of a method name. Events are based on these, so you probably have some familiarity with them. Events use this mechanism to inform all the subscribed methods that something has happened.
Now, since these are proper objects, there's no reason why they can't be passed as parameters to methods. This is one of the ways to dynamically customize the behavior of a certain method; when it comes to some operation or a decision, this method could simply use the delegate to call whatever it points to, obtain the result, and proceed based on that.
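If delegates are new to you, a minimal sketch might help. The delegate type Notify and the two target methods below are invented for the example; the point is only that one delegate variable can reference several methods and is invoked like a method.

```csharp
using System;
using System.Collections.Generic;

// A delegate type: it can reference any method that takes a string and returns void.
delegate void Notify(string message);

static class DelegateDemo
{
    public static readonly List<string> Log = new List<string>();

    static void WriteToLog(string message) { Log.Add("log: " + message); }
    static void WriteToAudit(string message) { Log.Add("audit: " + message); }

    public static void Run()
    {
        Notify notify = WriteToLog; // reference one method
        notify += WriteToAudit;     // add a second one (multicast)
        notify("saved");            // invoked like a method; calls both, in order
    }

    static void Main()
    {
        Run();
        foreach (string entry in Log)
            Console.WriteLine(entry);
        // log: saved
        // audit: saved
    }
}
```

Subscribing to an event with += works on the same principle.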
(For example, you could pass in a delegate that points to a function that can compare instances of a custom class.)
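Here is one way that could look, as a sketch. Card, Max and CompareByRank are hypothetical names; Comparison<T> is the standard delegate type from the System namespace. Max doesn't know how to compare cards, it just asks the delegate.

```csharp
using System;
using System.Collections.Generic;

class Card
{
    public string Name;
    public int Rank;
    public Card(string name, int rank) { Name = name; Rank = rank; }
}

static class ComparisonDemo
{
    // Returns the "largest" item; what "largest" means is decided by the delegate.
    public static T Max<T>(IList<T> items, Comparison<T> compare)
    {
        T best = items[0];
        for (int i = 1; i < items.Count; i++)
            if (compare(items[i], best) > 0)
                best = items[i];
        return best;
    }

    // A method whose signature matches Comparison<Card>.
    public static int CompareByRank(Card a, Card b)
    {
        return a.Rank.CompareTo(b.Rank);
    }

    static void Main()
    {
        var cards = new List<Card> { new Card("Seven", 7), new Card("King", 13), new Card("Two", 2) };
        Card highest = Max(cards, CompareByRank);
        Console.WriteLine(highest.Name); // King
    }
}
```

Passing a different comparison method (or a lambda) changes the behavior of Max without touching its code.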
If this style is employed a lot, it all begins to look like functional programming. People can use this to implement things like continuations.
All this represents a rather different approach to programming, quite unlike what we're used to with traditional OOP, which is an imperative programming style/paradigm.
Anyway, don't let all that confuse you.
Time to answer another of your questions:
The .Select(b => GetTextFromContentControl(b)); is confusing to me because it passes b (an XMLElement) into GetTextFromContentControl, but b hasn't been declared at this point yet.
IMHO, the "b" shouldn't be named just "b". LINQ can be confusing as it is, and the current trend of not giving more descriptive names to variables used in LINQ queries is not helping.
OK, let's decompose the Select method. This is its declaration:
public static IEnumerable<TResult> Select<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, TResult> selector)
This is just an extension method - as indicated by the first parameter and the fact that it's static. BTW, extension methods are just a way to tell C# to pretend that the method in question belongs to some specific type, so that it can be accessed via the member access operator (the ".").
In reality, the object gets passed as the first argument to the static method, which then uses it.
This specific method, as you can see by the first parameter, is an extension method for types that implement IEnumerable<T>, that is, collections.
So, for any collection, you can write: someCollection.Select(/* whatever */).
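To see that there's no magic in extension methods, here's a tiny sketch. Shout and StringExtensions are invented names; the two calls in Main are equivalent, the first being just a nicer way to write the second.

```csharp
using System;

static class StringExtensions
{
    // "this" on the first parameter makes Shout an extension method for string.
    public static string Shout(this string text)
    {
        return text.ToUpperInvariant() + "!";
    }
}

static class ExtensionDemo
{
    static void Main()
    {
        Console.WriteLine("hello".Shout());                 // HELLO!
        Console.WriteLine(StringExtensions.Shout("hello")); // HELLO!  (the plain static call)
    }
}
```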
The Select() method "projects each element of a sequence into a new form" - that is, it takes each element, applies a user provided transformation to it (obtaining the result as a new object), and returns a collection of the transformed elements.
The second parameter is a delegate, a reference to a method that tells Select() how to transform (project) each element. This method must accept a parameter of type TSource, and return an object of type TResult. (Sometimes, TResult == TSource.)
For example, you may have a collection of complex objects, but only need to have a collection of values of some string property of each object. You can use Select() to project each of the complex objects into a corresponding string property, and obtain a collection of those strings. Or, you can use it to create even more complex, composed objects, or perform a math operation on each element and get a collection of results, and stuff like that...
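A minimal sketch of both cases, assuming a made-up Employee class: one projection pulls out a string property, the other does a bit of math on each element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Employee
{
    public string Name;
    public int Age;
}

static class ProjectionDemo
{
    // Complex objects in, just their Name strings out.
    public static List<string> Names(IEnumerable<Employee> staff)
    {
        return staff.Select(employee => employee.Name).ToList();
    }

    // A math operation on each element, collected into a new list.
    public static List<int> AgesNextYear(IEnumerable<Employee> staff)
    {
        return staff.Select(employee => employee.Age + 1).ToList();
    }

    static void Main()
    {
        var staff = new List<Employee>
        {
            new Employee { Name = "Ann", Age = 41 },
            new Employee { Name = "Bob", Age = 29 }
        };
        Console.WriteLine(string.Join(", ", Names(staff)));        // Ann, Bob
        Console.WriteLine(string.Join(", ", AgesNextYear(staff))); // 42, 30
    }
}
```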
Long story short, you could write a regular method that takes an element of a collection such as List<T>, applies some transformation as required by the program logic, and returns the transformed element (that is, turned into a different type, or otherwise in a different form); and then pass a delegate to this method.
But, as these methods are usually relatively simple, you often don't want to pollute your class with these otherwise not really relevant methods, and you don't want to go through the trouble of creating a new class just to hold such methods - so C# enables you to do it inline.
Before lambda expressions, you had to use anonymous methods, which were declared inline using the delegate keyword (think of it as of declaring a delegate to an anonymous method).
So, you could write (assuming this will convert to Func<T, TR>):
var result = stringList.Select(
    // everything that follows here and is indented is the single (second) parameter to the Select method (the first one is implicit - it's the stringList collection)
    delegate(string slElem)
    {
        /* project (transform) slElem into something else */
        return slElem.ToUpper(); // for example
    });
The Select() method will then, in its body, pass each element of the list referenced by stringList to the anonymous method, and add the result of that method to the new collection. When the new collection is complete, it is returned as the result of the Select() method.
But, with lambda expressions, you can do the same in a more compact manner:
var result = stringList.Select(slElem => slElem.ToUpper()); // for example
Read up on lambda expressions to find out about various ways to use them, and the varieties of their syntax. But, as you can see, what comes before the "=>" is the parameter list, and that's why the parameter doesn't need to be declared anywhere else in the code - just as with regular functions, the parameters are declared in the parameter list.
The neat, but confusing thing is that the compiler can infer the types of the parameters, so these aren't written. What you need to keep in mind here is what the parameters actually represent. Here, slElem represents the current element of the stringList collection, as it is enumerated by the Select() method.
What follows after "=>" is either a single expression, or a method body (code block).
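For comparison, here is a sketch of the same projection written three ways: as an anonymous method, as a lambda with a single expression, and as a lambda with a code block. The projection itself (taking the string length) is just an assumption picked for the example; stringList and slElem are the names used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LambdaDemo
{
    // Pre-lambda style: an anonymous method declared with the delegate keyword.
    public static List<int> WithAnonymousMethod(List<string> stringList)
    {
        return stringList.Select(delegate(string slElem) { return slElem.Length; }).ToList();
    }

    // Lambda with a single expression after "=>"; the type of slElem is inferred.
    public static List<int> WithExpressionLambda(List<string> stringList)
    {
        return stringList.Select(slElem => slElem.Length).ToList();
    }

    // Lambda with a code block after "=>"; it needs an explicit return.
    public static List<int> WithStatementLambda(List<string> stringList)
    {
        return stringList.Select(slElem =>
        {
            int length = slElem.Length;
            return length;
        }).ToList();
    }

    static void Main()
    {
        var stringList = new List<string> { "a", "bb", "ccc" };
        Console.WriteLine(string.Join(", ", WithAnonymousMethod(stringList)));  // 1, 2, 3
        Console.WriteLine(string.Join(", ", WithExpressionLambda(stringList))); // 1, 2, 3
        Console.WriteLine(string.Join(", ", WithStatementLambda(stringList)));  // 1, 2, 3
    }
}
```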
The "=>" itself is read as "goes to". So, in your example
b => GetTextFromContentControl(b)
reads: "(the parameter b) goes to the GetTextFromContentControl(b) method". Here, "b" is an element of whatever collection was returned by the Where() method. The indentation here works in your favor - this is why the dots are lined up, to increase readability: by looking at the indentation, you can tell that .Select() is called on the collection returned by Where(). Indentation doesn't matter to the compiler, it's just a coding convention intended to help a human reader keep track of things. Also, note that everything between "Where(" and ").Select()" is a parameter to the Where() method.
Now... what does this thing do?
var build = doc.MainDocumentPart // (1) Gets the main document part
    .GetXDocument() // (2) This is an extension method that returns a related XDocument object,
                    //     which is an XML representation of the Word document's main part.
    .Descendants(w + "sdt") // (3) Gets a collection of descendant elements of the XDocument, filtered by w + "sdt"
    // (4) LINQ Where: filters a collection based on the lambda predicate passed in. The lambda here is everything
    //     from the first "(" to the last ")" before ".Select(/**/)".
    //     Where accepts a delegate (or a lambda) which takes a single element of the collection as a parameter and tests it for a
    //     condition, returning a bool value indicating the result of the test. Where then selects only those elements for
    //     which the test was true.
    //     So basically this is what it says below: .Where(e => /* result of some operation on e */ == "build")
    //     Here, e is an element of the collection that was returned by Descendants(w + "sdt"), for which Where() was called.
    .Where(e => ((string)e.Elements(w + "sdtPr") // (5) A method of XElement that gets the child elements of e, filtered by w + "sdtPr"
        .Elements(w + "alias") // (6) An extension method that gets a collection of child elements of every element of what (5) returned, filtered by w + "alias"
        .Attributes(w + "val") // (7) An extension method, returns a collection of the attributes of every element in what (6) returned, filtered by w + "val"
        .FirstOrDefault()).ToLower() == "build") // (8a) FirstOrDefault() returns the first element of what (7) returned, or a default value (null) if the sequence contains no elements.
    // (8b) The resulting single attribute from (8a) is then type-cast to string (which gives its value), converted to lower case, and compared to "build".
    //      The comparison evaluates to bool, and the result of that evaluation is returned as the result of the lambda.
    //      Thus, the Where selects all elements that somewhere along the line (as specified by the filters), two levels down, contain a "val" attribute that says "build" (in any letter case).
    .Select(b => GetTextFromContentControl(b)); // (9) From the collection that Where() returned, project each element b into the result of the
    // GetTextFromContentControl(b) method; compose the results into a new collection of transformed elements,
    // and assign that collection as the value of the build variable, up there, back at (1). The compiler infers the correct type.
Long story short: this code takes the document main part, represented as an XML document, and gets all its descendant elements with a certain name for which, after some digging through the hierarchy, there's a descendant element (as specified by the XName filters) with an attribute whose value is "build"; finally, the collection of such elements is projected by the Select() method into a collection of their corresponding content control text entries (that is, strings).
In the end, each string contained in build is then added to the (presumably) multiline textbox.
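If you want to play with the query without a Word file, something like the following sketch could work. It makes several assumptions: the XML is a hand-written stand-in for a document's main part, GetText is a hypothetical replacement for the blog's GetTextFromContentControl (I don't know what the real one does internally), and the comparison uses string.Equals with OrdinalIgnoreCase instead of ToLower(), so a control without an alias doesn't cause a NullReferenceException.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class ContentControlDemo
{
    static readonly XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical stand-in for GetTextFromContentControl: joins all w:t text under the control.
    static string GetText(XElement contentControl)
    {
        return string.Concat(contentControl.Descendants(w + "t").Select(t => (string)t));
    }

    // Same shape as the original query, with the alias passed in as a parameter.
    public static List<string> FindText(XDocument xdoc, string alias)
    {
        return xdoc
            .Descendants(w + "sdt")
            .Where(e => string.Equals(
                (string)e.Elements(w + "sdtPr")
                    .Elements(w + "alias")
                    .Attributes(w + "val")
                    .FirstOrDefault(),
                alias,
                StringComparison.OrdinalIgnoreCase))
            .Select(b => GetText(b))
            .ToList();
    }

    // Hand-written sample: three content controls, two of them aliased "build".
    public static XDocument Sample()
    {
        return XDocument.Parse(
            "<w:body xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>" +
            "<w:sdt><w:sdtPr><w:alias w:val='Build'/></w:sdtPr><w:sdtContent><w:t>1.0.42</w:t></w:sdtContent></w:sdt>" +
            "<w:sdt><w:sdtPr><w:alias w:val='Author'/></w:sdtPr><w:sdtContent><w:t>Jane</w:t></w:sdtContent></w:sdt>" +
            "<w:sdt><w:sdtPr><w:alias w:val='build'/></w:sdtPr><w:sdtContent><w:t>1.0.43</w:t></w:sdtContent></w:sdt>" +
            "</w:body>");
    }

    static void Main()
    {
        foreach (string text in FindText(Sample(), "build"))
            Console.WriteLine(text);
        // 1.0.42
        // 1.0.43
    }
}
```

Note that only the controls whose alias matches are returned, which may be why adding other controls to the test document didn't change the output. Dropping the Where() call, or filtering on a different alias, would be one way to reach the other controls.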
Now, some LINQ advocates over here might disagree, but I personally think that this is all a bit cumbersome and confusing, and that there's got to be a better way to do this.
Maybe decompose this whole process into several traditional methods that internally use LINQ. Dunno, I like my methods arranged in an orderly fashion - I (perhaps irrationally) dislike all this chaining and nesting, feels somehow... messy. But, if the process was decomposed, would it be as efficient as the original query? Worse? Better?
Last edited by TheGreatCthulhu; December 14th, 2011 at 08:44 AM.
OMG, I was very tired and sleepy when I wrote that last post, I can't believe I got a bit confused about a few things regarding the LINQ extension methods.
Don't get me wrong, most of what I wrote is still valid, but when it comes to the Where() and Select() methods, I need to write some corrections - I apologize if this caused some confusion.
I'll edit the previous post to correct the errors, but I first want to point out what they were here, otherwise someone who read the previous post might not notice the edits at all.
First, I said that the second parameter of Select() and Where() methods is a delegate that represents a function that returns a collection of objects. This is not true; here's the deal:
Select() operates on an IEnumerable<TSource> and takes a Func<TSource, TResult> as a parameter. It returns an IEnumerable<TResult>. Where() operates on an IEnumerable<TSource> and takes a Func<TSource, bool> as a parameter. It returns an IEnumerable<TSource>.
Func<TSource, TResult> is a delegate that can reference any method that:
takes a TSource as its only parameter, and
returns a TResult (performing some processing first as required).
Func<TSource, bool> is the same thing, except that TResult must be bool.
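A short sketch of what that means in practice; CountLetters and IsLong are made-up methods whose signatures happen to match the two delegate types.

```csharp
using System;

static class FuncDemo
{
    public static int CountLetters(string word) { return word.Length; }
    public static bool IsLong(string word) { return word.Length > 5; }

    static void Main()
    {
        Func<string, int> projection = CountLetters;                // TSource = string, TResult = int
        Func<string, bool> predicate = IsLong;                      // TResult is bool
        Func<string, bool> samePredicate = word => word.Length > 5; // a lambda fits just as well

        Console.WriteLine(projection("lambda"));  // 6
        Console.WriteLine(predicate("lambda"));   // True
        Console.WriteLine(samePredicate("linq")); // False
    }
}
```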
In the case of Select(), the delegate (or lambda) that you pass in is invoked for each element of the collection on which Select() was called - the current element is passed in (as the TSource parameter), and is then transformed in the body of the referenced method (or lambda) into an object of type TResult, which is then returned as the result of the referenced method (or lambda). Each of these results is then added to the resulting collection to be returned by Select(). Select basically builds and returns a transformed collection of type IEnumerable<TResult> - again, deferred execution aside - by invoking the delegate passed to it for each member of the original collection.
This is called element projection - the Select() method projects each element in a new form, and returns a collection of transformed objects.
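To demystify it a bit, here is roughly how one could write a Select-like method by hand. This is only a sketch under my own assumptions (MySelect is a made-up name, and the real implementation also validates its arguments and is more optimized), but the idea is the same: invoke the delegate once per element and hand back the results.

```csharp
using System;
using System.Collections.Generic;

static class MySelectExtensions
{
    // A simplified, hand-rolled stand-in for Select().
    public static IEnumerable<TResult> MySelect<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, TResult> selector)
    {
        foreach (TSource element in source)
            yield return selector(element); // call the delegate for the current element
    }
}

static class MySelectDemo
{
    static void Main()
    {
        var words = new List<string> { "one", "three", "seven" };
        foreach (int length in words.MySelect(word => word.Length))
            Console.WriteLine(length);
        // 3
        // 5
        // 5
    }
}
```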
As for Where(), the delegate parameter it accepts is called a predicate - it is a method used to filter the elements of the collection for which Where() was called. It invokes the delegate (usually a lambda) for each of the elements in the source collection, passes the current element to it, and waits for the referenced method/lambda to test if some condition is met. If so, the referenced method/lambda returns true, otherwise it returns false (remember, it must be a method/lambda that returns a bool).
If the result of this operation is true, the element is added to the resulting collection (of the same type) that is to be returned by the Where() method (once again, deferred execution aside).
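The filtering logic can be sketched by hand in much the same spirit. MyWhere is a made-up name and this is a simplification, not the actual library code: the predicate is invoked once per element, and only elements for which it returns true are passed along.

```csharp
using System;
using System.Collections.Generic;

static class MyWhereExtensions
{
    // A simplified, hand-rolled stand-in for Where().
    public static IEnumerable<TSource> MyWhere<TSource>(
        this IEnumerable<TSource> source,
        Func<TSource, bool> predicate)
    {
        foreach (TSource element in source)
            if (predicate(element))   // ask the delegate whether to keep this element
                yield return element; // same type as the source elements
    }
}

static class MyWhereDemo
{
    static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };
        foreach (int n in numbers.MyWhere(x => x % 2 == 0))
            Console.WriteLine(n);
        // 2
        // 4
        // 6
    }
}
```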
So, finally, if you wanted to give a meaningful variable name to a lambda expression parameter, you could write something like this:
var q = infoNodeCollection
    .Select(inElement =>
    {
        // do something with inElement,
        // transform it to something else, and
        // return the new object
    })
    .Where(transformedElement =>
    {
        // Test if transformedElement
        // satisfies some condition, and
        // return a bool as an indicator
    });
// q ends up being a collection of transformed elements
// that satisfy the given condition
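Filled in with something concrete, that skeleton might look like the sketch below. The transformation (trimmed string length) and the condition (greater than 3) are arbitrary choices for illustration, as is the assumption that infoNodeCollection holds strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class NamedParametersDemo
{
    public static List<int> Run(IEnumerable<string> infoNodeCollection)
    {
        var q = infoNodeCollection
            .Select(inElement =>
            {
                // transform inElement into something else and return the new object
                return inElement.Trim().Length;
            })
            .Where(transformedElement =>
            {
                // test the transformed element and return a bool as an indicator
                return transformedElement > 3;
            });
        return q.ToList();
    }

    static void Main()
    {
        var result = Run(new[] { " alias ", "sdt", "sdtPr " });
        Console.WriteLine(string.Join(", ", result)); // 5, 5
    }
}
```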
Second, I said, for some inexplicable reason, that the delegate/lambda passed to the Select() method selects a subset of the original elements, and that Select() then returns the subset as the result. As you had the chance to read above, it's not true. Again, Select() creates and returns a new collection by projecting each element of the source collection, using a user-provided projection rule.
Side note - because of deferred execution (lazy execution), the variable q is often regarded as representing the LINQ query itself, and not the resulting collection - but these are details. The query is not actually executed until the object is enumerated.
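A small experiment makes the deferred part visible. In this sketch (the names and numbers are made up), the lambda counts how many times it gets called: zero times when q is created, and once per element only when q is finally enumerated, including an element added to the source in the meantime.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    // Returns { calls before enumeration, calls after enumeration, number of results }.
    public static int[] Run()
    {
        var numbers = new List<int> { 1, 2, 3 };
        int calls = 0;

        // Nothing runs here; q only describes the work to be done.
        IEnumerable<int> q = numbers.Select(n => { calls++; return n * 10; });
        int callsBefore = calls;

        numbers.Add(4); // the source changes before q is enumerated

        List<int> results = q.ToList(); // enumeration happens here, over all four elements
        return new[] { callsBefore, calls, results.Count };
    }

    static void Main()
    {
        int[] r = Run();
        Console.WriteLine(r[0]); // 0
        Console.WriteLine(r[1]); // 4
        Console.WriteLine(r[2]); // 4
    }
}
```

Calling ToList() (or ToArray()) is a common way to force the query to run once and keep the results.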
EDIT: All corrected - well, as far as I can tell.
Last edited by TheGreatCthulhu; December 14th, 2011 at 08:30 AM.
Wow! What an incredible response. This must have taken at least an hour or more to write! I am impressed. And enlightened. Your explanations are perfectly clear and I have already made adjustments to the code that allow it to meet my needs. I have also started reading the LINQ sections in my Deitel C# book. Perhaps you should consider writing a book as well??
If you are ever in the Virginia Beach area please allow me to buy you a beer or other beverage of your choosing