You'll need to do some string manipulation. Look up String.Split, String.Substring, and String.LastIndexOf, to name a few that immediately pop into my head.
You probably also want to use the String.Split(char[], StringSplitOptions) overload, passing StringSplitOptions.RemoveEmptyEntries as the second argument.
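As a rough sketch of that suggestion (the class and method names here are made up for illustration), splitting on '.' gives the sentences and splitting each sentence on ' ' gives the words. RemoveEmptyEntries drops the empty strings that doubled spaces or the trailing period would otherwise produce:

```csharp
using System;

static class SplitSketch
{
    // Hypothetical helper: returns the number of words in each sentence of the text.
    public static int[] WordsPerSentence(string text)
    {
        // Split into sentences on '.', discarding empty pieces (e.g. after the final period).
        string[] sentences = text.Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
        int[] counts = new int[sentences.Length];
        for (int i = 0; i < sentences.Length; i++)
        {
            // Split into words on ' ', discarding the empty pieces that repeated spaces produce.
            counts[i] = sentences[i].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
        }
        return counts;
    }

    static void Main()
    {
        int[] counts = WordsPerSentence("Sentence one.  Sentence two contains some more words.");
        for (int i = 0; i < counts.Length; i++)
        {
            Console.WriteLine("Sentence {0} = {1} words.", i + 1, counts[i]);
        }
    }
}
```

Note that a piece containing only spaces (for example, after a period followed by a trailing space) still comes back as a sentence with 0 words, so you may want to skip those.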
You might also think about canonicalizing your string to contain only a restricted character set (namely "A-Za-z0-9. "). That should avoid any problems if non-space whitespace characters are present.
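One way to do that canonicalization might look like the sketch below. The helper name is hypothetical, and turning other whitespace into a space while dropping every other character is my assumption; you could just as well replace the dropped characters with something else.

```csharp
using System;
using System.Text;

static class CanonicalizeSketch
{
    // Hypothetical helper: keeps A-Z, a-z, 0-9, '.' and ' ';
    // turns any other whitespace into a space and drops everything else.
    public static string Canonicalize(string input)
    {
        StringBuilder sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            bool allowed = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                        || (c >= '0' && c <= '9') || c == '.' || c == ' ';
            if (allowed)
            {
                sb.Append(c);
            }
            else if (char.IsWhiteSpace(c))
            {
                sb.Append(' '); // Tabs, newlines, etc. become plain spaces
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Canonicalize("Sentence\tone.\r\nSentence two; thinking."));
    }
}
```

After this step, a space is the only whitespace left, so splitting on ' ' alone is enough.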
Best Regards,
BioPhysEngr http://blog.biophysengr.net
--
All advice is offered in good faith only. You are ultimately responsible for effects of your programs and the integrity of the machines they run on.
Hey,
I'm just learning to program, so I've tried to add some of the cool stuff that I've learned into this (LINQ, yield, etc.).
class Program
{
    static void Main(string[] args)
    {
        // Sample input; the full program below takes the sentence from args[0] instead.
        const string paragraph = "Sentence one. Sentence two contains some more words. Sentence three; thinking.";
    }
}
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace TestApplications
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length == 1 && args[0] == "/?")
            {
                Console.WriteLine("SYNTAX: TestApplications <Sentence>");
#if DEBUG
                Pause();
#endif
                return;
            }
            if (args.Length < 1)
            {
                Console.WriteLine("Invalid Syntax! /? for help.");
#if DEBUG
                Pause();
#endif
                return;
            }
            string sentence = args[0];
            if (String.IsNullOrEmpty(sentence))
            {
                Console.WriteLine("String is empty!");
                return;
            }
            foreach (List<int> count in ParseSentence(sentence))
            {
                Console.WriteLine("Sentence {0} = {1} words.", count[0], count[1]);
            }
#if DEBUG
            Pause();
#endif
        }
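        // Yields one two-element list per sentence: [sentence number, word count].
        // Text after the last period is not reported.
        private static IEnumerable<List<int>> ParseSentence(string input)
        {
            int wordCount = 0;
            int sentenceCount = 0;
            bool inWord = false; // True while the cursor is inside a word
            for (int i = 0; i < input.Length; i++)
            {
                char c = input[i];
                if (c == '.') // End of sentence
                {
                    if (inWord)
                    {
                        ++wordCount; // The period closes the last word
                        inWord = false;
                    }
                    ++sentenceCount;
                    yield return new List<int> { sentenceCount, wordCount };
                    wordCount = 0;
                }
                else if (c == ' ') // End of word
                {
                    // Only count a word if one was open; this keeps the space
                    // after a period (or a doubled space) from adding a phantom word.
                    if (inWord)
                    {
                        ++wordCount;
                        inWord = false;
                    }
                }
                else
                {
                    inWord = true;
                }
            }
        }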
        private static void Pause()
        {
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey(true);
        }
    }
}
Last edited by Deranged; June 7th, 2012 at 06:44 PM.
I find it too inconsistent, or rather too generic. I parse files all day, and if the file has a lot of mistakes, you're going to grab those mistakes with .Split.
For example, if you had a CSV file that was quote-qualified:
"Field1","Field2",_"Field3","Field,4"
I would probably end up using a while loop with three ints (a start, an end, and a cursor) and go through it character by character to remove the _ (which I'm using to signify whitespace) and use the qualifier to grab Field,4 with its comma intact. String.Split wouldn't be able to account for Field,4, unless there's something about it I don't know.
Anyway, I've used it before, but only for something extremely simple, like a name delimited by a dash: |00912323048|John-Doe|yyyy/MM/dd|etc
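For the quote-qualified line above, a character-by-character pass could be sketched roughly like this. Note that it uses a cursor plus a StringBuilder rather than the start/end indexes described, the names are hypothetical, and it assumes that quotes are never escaped inside a field and that real data is always quote-qualified (whitespace outside quotes is thrown away):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class QuotedCsvSketch
{
    // Hypothetical helper: splits one line on commas, treating commas
    // inside double quotes as data and skipping whitespace outside quotes.
    // Assumes quotes are never escaped inside a field.
    public static List<string> ParseLine(string line)
    {
        List<string> fields = new List<string>();
        StringBuilder current = new StringBuilder();
        bool inQuotes = false;

        for (int cursor = 0; cursor < line.Length; cursor++)
        {
            char c = line[cursor];
            if (c == '"')
            {
                inQuotes = !inQuotes;            // Enter or leave a qualified field
            }
            else if (inQuotes)
            {
                current.Append(c);               // Everything inside quotes is data
            }
            else if (c == ',')
            {
                fields.Add(current.ToString());  // An unquoted comma ends the field
                current.Length = 0;
            }
            else if (!char.IsWhiteSpace(c))
            {
                current.Append(c);               // Unquoted data; stray whitespace is skipped
            }
        }
        fields.Add(current.ToString());          // The last field has no trailing comma
        return fields;
    }

    static void Main()
    {
        foreach (string field in ParseLine("\"Field1\",\"Field2\", \"Field3\",\"Field,4\""))
        {
            Console.WriteLine(field);
        }
    }
}
```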
Well, fair enough. As for me, I do a bunch of bioinformatics parsing, and my preferred format is tab-separated values. It's portable, easily human-readable and editable, and works with lots of existing tools (e.g. GNU tools and String.Split).
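With tab-separated data, the String.Split side of that can be as small as the sketch below (the helper name and the sample row are invented for illustration). Unlike the word-counting case, it deliberately does not use RemoveEmptyEntries, on the assumption that an empty column should still count as a column:

```csharp
using System;

static class TsvSketch
{
    // Hypothetical helper: splits one tab-separated row into its columns.
    // Empty columns are kept so that column positions stay meaningful.
    public static string[] ParseRow(string line)
    {
        return line.Split('\t');
    }

    static void Main()
    {
        // Made-up sample row with an empty third column.
        string[] fields = ParseRow("geneA\tchr1\t\t100");
        for (int i = 0; i < fields.Length; i++)
        {
            Console.WriteLine("Column {0}: '{1}'", i + 1, fields[i]);
        }
    }
}
```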
Perhaps the greater lesson is to encourage your data sources to provide you machine-friendly data in the first place!
Best Regards,
BioPhysEngr http://blog.biophysengr.net
I work with hundreds, if not a thousand, insurance companies, and they fail, hardcore. I have seen hundreds of delimited files, and each one is different in some way. Ugh. You're right that tab-delimited is nice, but it is also the format most likely to have extra or missing columns, because someone entering data by hand hit an extra tab. That's not always the cause, though: sometimes these insurance companies hire people straight out of trade school, and nothing against trade schools, but they don't teach a lot and these students are still very new... Anyway.
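A cheap guard against the extra/missing column problem is to check each row's column count before trusting it. This is only a sketch with invented names and made-up rows, and it assumes you know how many columns the file is supposed to have:

```csharp
using System;
using System.Collections.Generic;

static class ColumnCheckSketch
{
    // Hypothetical helper: returns the 1-based numbers of the lines whose
    // tab-separated column count differs from expectedColumns.
    public static List<int> FindBadRows(IEnumerable<string> lines, int expectedColumns)
    {
        List<int> bad = new List<int>();
        int lineNumber = 0;
        foreach (string line in lines)
        {
            lineNumber++;
            if (line.Split('\t').Length != expectedColumns)
            {
                bad.Add(lineNumber);
            }
        }
        return bad;
    }

    static void Main()
    {
        // Made-up rows: the second has an extra tab, the third is missing a column.
        string[] rows = { "1001\tJohn\tDoe", "1002\t\tJane\tDoe", "1003\tSam" };
        foreach (int n in FindBadRows(rows, 3))
        {
            Console.WriteLine("Line {0} has the wrong number of columns.", n);
        }
    }
}
```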