-
[2010?] Excel split/corrections
Hey everyone, I am new to the forums (Sorry if my thread title is incorrect, let me know if I need to change it) and kind of hope to grow and get some insight from this fantastic site. I am having a little bit of trouble I would really love some insight or some kind of help with something small I have been writing in Visual Studios 2010.
At my job, we deal with a lot of excel files, mainly CSV but others as well. We import these CSV files into our databases at work and we can only import 1000 lines at a time. I have already created something that can split a header line and 999 to equal 1000. So now I am having a problem by simply implementing a simple Replace string. So what I have so far is:
Code:
Imports System.IO
Imports System.Text
Public Class Form1
Inherits System.Windows.Forms.Form
Windows Form Designer generated code
Public Delegate Sub UpdateProgressSub(ByVal CurrentLine As Integer)
Private _IsAbort As Boolean
Sub SplitCSV(ByVal FilePath As String, ByVal LineCount As Integer, ByVal MaxOutputFile As Integer, ByVal Status As UpdateProgressSub, ByRef IsAbort As Boolean)
' Validate first
If LineCount < 100 Then Throw New Exception("Number of lines must be more than 100.")
' Open the csv file for reading
Dim Reader As New IO.StreamReader(FilePath)
' Create the output directory
Dim OutputFolder As String = FilePath & "_Splits"
If Directory.Exists(FilePath) = False Then
Directory.CreateDirectory(OutputFolder)
End If
' Check/read the csv column's header
Dim strHeader As String = Reader.ReadLine
' Assign string to invalid characters.
Dim value1 As String = "'"
Console.WriteLine(value1)
' Replace every instance of the string with the excel file.
Dim value2 As String = value1.Replace("'", "")
Console.WriteLine(value2)
' Start splitting
Dim FileIndex As Integer
Do
' Update progress
FileIndex += 1
If Not Status Is Nothing Then
Status.Invoke((FileIndex - 1) * LineCount)
End If
' Check if the number of splitted files doesn't exceed the limit
If (MaxOutputFile < FileIndex) And (MaxOutputFile > 0) Then Exit Do
' Create new file to store a piece of the csv file
Dim PiecePath As String = OutputFolder & "\" & Path.GetFileNameWithoutExtension(FilePath) & "_" & FileIndex & Path.GetExtension(FilePath)
Dim Writer As New StreamWriter(PiecePath, False)
Writer.AutoFlush = False
Writer.WriteLine(strHeader)
' Read and writes precise number of rows
For i As Integer = 1 To LineCount
Dim s As String = Reader.ReadLine()
If s <> Nothing And _IsAbort = False Then
Writer.WriteLine(s)
Else
Writer.Flush()
Writer.Close()
Exit Do
End If
Next
' Flush and close the splitted file
Writer.Flush()
Writer.Close()
Loop
Reader.Close()
End Sub
Private Sub btnSplit_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnSplit.Click
Dim th As New Threading.Thread(AddressOf SplitIt)
th.Start()
End Sub
Sub SplitIt()
btnSplit.Enabled = False
Try
SplitCSV(txtCSV.Text, numLinesCount.Value, numMaxFiles.Value, AddressOf UpdateProgress, _IsAbort)
If Not _IsAbort Then
MsgBox("Completed Successfully!", MsgBoxStyle.Information)
Else
_IsAbort = False
MsgBox("Spliting proccess was aborted by user.", MsgBoxStyle.Critical)
End If
Catch ex As Exception
MsgBox("Unable to split the CSV File. Reason: " & ex.Message, MsgBoxStyle.Critical)
Finally
btnSplit.Enabled = True
End Try
End Sub
Sub UpdateProgress(ByVal CurrentLine As Integer)
lblStatus.Text = "Aprox. " & CurrentLine.ToString & " lines splitted"
End Sub
Private Sub btnBrowse_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnBrowse.Click
If OpenFileDialog1.ShowDialog = DialogResult.OK Then
txtCSV.Text = OpenFileDialog1.FileName()
End If
End Sub
Private Sub btnAbort_Click(ByVal sender As System.Object, ByVal e As System.EventArgs)
_IsAbort = True
End Sub
Private Sub Form1_Closing(ByVal sender As Object, ByVal e As System.ComponentModel.CancelEventArgs) Handles MyBase.Closing
_IsAbort = True
End Sub
Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
End Sub
Private Sub numLinesCount_ValueChanged(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles numLinesCount.ValueChanged
End Sub
Private Sub CheckBox1_CheckedChanged(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles CheckBox1.CheckedChanged
' Assign string to literal constant.
Dim value1 As String = "'"
Console.WriteLine(value1)
' Replace every instance of the string.
Dim value2 As String = value1.Replace("'", "")
Console.WriteLine(value2)
End Sub
End Class
So I bold what I added, and I cut out the windows generated form. I am fairly new and would really appreciate insight. Honestly I would love it done two ways. I put a check box on my form, if checked = fix invalid characters and if un-checked = ignore
Invalid characters I need replaced before it is split is, (" , ') these three and possibly more. Then if there is a way to replace a line within the split process if something is missing in Column A, B or C.
Is this even possible in the first place to do replacing?
Thanks everyone!
-
Re: [2010?] Excel split/corrections
Seems like the hard way to do this.
Read ALL the lines at once, split them into 100 lines, and then use Replace() 3 times for each group. You don't need to split lines, then write them.
-
Re: [2010?] Excel split/corrections
Hey dglienna, thanks for the quick reply. I failed to mention I couldn't get my current code to replace any invalid characters. Can you elaborate how I can integrate the Repalce() into my code?
Also do you know of a way I can add a check to my code that if Column A , B , C if they have an empty field to delete the row and move the others up?
And I thought I had it Read All lines already? I need to take a look at it again, been really busy but I need to try and get this working by the end of the week.
Thanks for your time!
-
Re: [2010?] Excel split/corrections
Have you looked at the SAMPLES?
-
Re: [2010?] Excel split/corrections
I will now, thank you dglienna. I will see what I can do from reading up in there.
-
Re: [2010?] Excel split/corrections
Alright, so I am just not comprehending it my current problem. I took a lot of samples and spent hours reading and trying things. I am completely lost now and just went back to step 1. I have my splitting method working. I can take a CSV file with 20000 lines and split it 20 times keeping the for each of them.
I cannot get a replace funtion or a delete row function to work. Once again what I that is working:
Code:
Sub SplitCSV(ByVal FilePath As String, ByVal LineCount As Integer, ByVal MaxOutputFile As Integer, ByVal Status As UpdateProgressSub, ByRef IsAbort As Boolean)
' Validate first
If LineCount < 100 Then Throw New Exception("Number of lines must be more than 100.")
' Open the csv file for reading
Dim Reader As New IO.StreamReader(FilePath)
' Create the output directory
Dim OutputFolder As String = FilePath & "_Splits"
If Directory.Exists(FilePath) = False Then
Directory.CreateDirectory(OutputFolder)
End If
' Read the csv column's header
Dim strHeader As String = Reader.ReadLine
' Start splitting
Dim FileIndex As Integer
Do
' Update progress
FileIndex += 1
If Not Status Is Nothing Then
Status.Invoke((FileIndex - 1) * LineCount)
End If
' Check if the number of splitted files doesn't exceed the limit
If (MaxOutputFile < FileIndex) And (MaxOutputFile > 0) Then Exit Do
' Create new file to store a piece of the csv file
Dim PiecePath As String = OutputFolder & "\" & Path.GetFileNameWithoutExtension(FilePath) & "_" & FileIndex & Path.GetExtension(FilePath)
Dim Writer As New StreamWriter(PiecePath, False)
Writer.AutoFlush = False
Writer.WriteLine(strHeader)
' Read and writes precise number of rows
For i As Integer = 1 To LineCount
Dim s As String = Reader.ReadLine()
If s <> Nothing And _IsAbort = False Then
Writer.WriteLine(s)
Else
Writer.Flush()
Writer.Close()
Exit Do
End If
Next
' Flush and close the splitted file
Writer.Flush()
Writer.Close()
Loop
Reader.Close()
End Sub
What I need is to somehow replace invalid characters within my CSV file for example = , ' " .
What I also need is to somehow delete a entire row if a cell is empty in Column A,B,C
I can already tell this is most likely a simple process and somehow its just blowing through me. I am still a beginner in the learning process.
Thank you.
-
Re: [2010?] Excel split/corrections
This will read ALL lines. Then, you can do ONE replace command
Code:
Imports System.IO
Module Module1
Sub Main()
Dim SourceFile As StreamReader
Try
SourceFile = New StreamReader("test.txt")
Console.WriteLine(SourceFile.ReadToEnd())
SourceFile.Close()
Catch Except As Exception
Console.WriteLine("Debug file does not yet exist")
End Try
End Sub
End Module
-
Re: [2010?] Excel split/corrections
I'm not sure how a , nor " is an invalid character in a CSV as these are required especially the , and may also occur in many data items.
I would also advise reading the entire file as suggested above.
Would be much simpler if you were using a tab delimited file rather than a CSV
-
Re: [2010?] Excel split/corrections
I know it seems strange but at work we can only use CSV files for our data uploads to our servers. I spend hours cleaning up CSV files with these characters our servers cannot accept. Then what really kills me is if a cell is empty in Column A,B,C it can corrupt my whole upload. So I was hoping to make something to save my time by a long shot. I sometimes get files of 20k-100k of lines and so far I can split them into 1000 a piece per file which is what I needed, I guess I am just having a hard time implementing a replace command and delete rows. Everything I tried nothing happened.
-
Re: [2010?] Excel split/corrections
What DB can't read a large file? Or, even OPEN the Excel files?
Might want to look up PowerPivot. It might do what you want?
-
Re: [2010?] Excel split/corrections
Unfortunately it's not my call, I work for a medical software company and I import patient CSV list cause its the only file accepted. I am like I said making a tool to make my life easier with our very badly set up system lol
So I got your code to work dglienna but not with my Header row check.. man I am very newbish at this but thanks for your help so far its much appreciated as it makes me one step further to efficiency heh
-
Re: [2010?] Excel split/corrections
-
Re: [2010?] Excel split/corrections
Ugh I replaced the upper lines of my code with yours as I figured thats where I needed it. Got the split to work again with the headers. I could not get Replace() to work, I tried using strings again and still couldn't. I couldn't even get row's to delete with missing cells.
I think I'm in over my head, I only have so much time to read and learn myself. I start school with software development in December. I apologize if I am annoying to help dglienna. I had help with the original code and I only understand basics.
Thanks for helping this far, I think I used what you gave me right now I don't know how to get Replace() to work. Sorry :(
-
Re: [2010?] Excel split/corrections
It looks like you are only calling the replace() function on the header row and not on the rows within the file.
-
Re: [2010?] Excel split/corrections
-
1 Attachment(s)
Re: [2010?] Excel split/corrections
Sorry, I just reverted back to what I originally had.
What I originally did was replace my validate and reader with yours, not sure if that was supposed to be done.
I uploaded my source if you don't mind taking a look at it. Thats all that I have, thanks for your time.
-
Re: [2010?] Excel split/corrections
The problem is that if you replace the , characters in the text from the file you no longer have a field seperator so there will be no way to tell which data belongs in which field or as you refer to it cell.
You would need to break the file down into fields then replace the unwanted characters within the fields, check for missing field data then rebuild the row and write it to the new file in the desired format.
The other problem is that if commas can appear in the data as well as being field seperators reading the data and deciding which commas to remove can be rather tricky.
One easy way to get around most of these issues is to open the file with Excel and save it as a TAB delimited text file then read the entire file into memory, use replace as needed to replace unwanted characters, then split to place the rows into an array and then loop through the array splitting each row into fields so you can check the data. If the data is to be saved you then rebuild the row in the desired output format and write to a file. When the number of rows written reach the desired number you would change to the next file name and continue until all the rows have been processed.
-
Re: [2010?] Excel split/corrections
My mistake, I don't need to replace commas since all the CSV files I work with will not have them as thats what splits the raw data into separate cells (since you already know this) I was just using it as a example of what I need help with.
See my problem is, I spend hours day in and day out going through CSV files using the replace command in excel and then scrolling through thousands of rows deleting them, I was hoping to take my split tool and add these two features into it doing all my hard work.
It would be great too if I did get that far in the process of adding those features have some kind of log file generate of what rows were deleted since I deal with medical records all day but that is just something I thought of as a 3rd feature. I am unfortunately not too skilled with .net let alone any other languages either. I am still very new a starting school, I was really hoping for some assistance, only if someone had the time to spare though.
-
Re: [2010?] Excel split/corrections
Ok Then if you do not need to replace the commas it gets a lot easier.
Something like this will allow you to read an entire file, replace characters then break it down into rows then fields, check the data and optionally write to output file.
Code:
Dim SR As New System.IO.StreamReader("c:\temp\test.csv")
Dim FileData As String = SR.ReadToEnd
SR.Close()
FileData = FileData.Replace(Chr(10), "")
FileData = FileData.Replace(Chr(34), "")
FileData = FileData.Replace("'", "")
Dim FileRows() As String = FileData.Split(Chr(13))
Dim FileRow() As String
Dim BlankFlag As Boolean = False
For x As Integer = 0 To FileRows.Length - 1
FileRow = FileRows(x).Split(",")
For y As Integer = 0 To FileRow.Length - 1
If FileRow(y) = "" Then
BlankFlag = True
Exit For
End If
Next
If BlankFlag = False Then
'write filerows(x) to output file
Else
'write the row to the log file
End If
BlankFlag = False
Next
-
Re: [2010?] Excel split/corrections
btw the code above is only for example of how it may be done. You will need to add code to actually write to the output and log files where I have commented on the matter.
-
Re: [2010?] Excel split/corrections
So I tried using what you came up with, I am still only able to split the file and it isn't giving me any kind of error yet it's still not replacing or removing rows :\
-
Re: [2010?] Excel split/corrections
I have no idea how you tried to use it but the code does do the replace without issue. There is no code there to write any data to file, you must add that. Rather than just pasting it in and hoping it works you should read it and especially take note of where I added the comments about writing the data.
-
Re: [2010?] Excel split/corrections
Oh I didn't try pasting it in lol
Just took your idea's and tried using them but didn't get any results.
-
Re: [2010?] Excel split/corrections
Well, NOT showing what you tried isn't going to help you...
-
Re: [2010?] Excel split/corrections
Well I know I completely messed it up so I didn't post it. I honestly cant figure out what to do at this point even with suggestions. I just need to probably drop this and learn a bit more to be honest.