[2010?] Excel split/corrections
Hey everyone, I am new to the forums (Sorry if my thread title is incorrect, let me know if I need to change it) and kind of hope to grow and get some insight from this fantastic site. I am having a little bit of trouble I would really love some insight or some kind of help with something small I have been writing in Visual Studios 2010.
At my job, we deal with a lot of excel files, mainly CSV but others as well. We import these CSV files into our databases at work and we can only import 1000 lines at a time. I have already created something that can split a header line and 999 to equal 1000. So now I am having a problem by simply implementing a simple Replace string. So what I have so far is:
Code:
Imports System.IO
Imports System.Text
Public Class Form1
Inherits System.Windows.Forms.Form
Windows Form Designer generated code
Public Delegate Sub UpdateProgressSub(ByVal CurrentLine As Integer)
Private _IsAbort As Boolean
Sub SplitCSV(ByVal FilePath As String, ByVal LineCount As Integer, ByVal MaxOutputFile As Integer, ByVal Status As UpdateProgressSub, ByRef IsAbort As Boolean)
' Validate first
If LineCount < 100 Then Throw New Exception("Number of lines must be more than 100.")
' Open the csv file for reading
Dim Reader As New IO.StreamReader(FilePath)
' Create the output directory
Dim OutputFolder As String = FilePath & "_Splits"
If Directory.Exists(FilePath) = False Then
Directory.CreateDirectory(OutputFolder)
End If
' Check/read the csv column's header
Dim strHeader As String = Reader.ReadLine
' Assign string to invalid characters.
Dim value1 As String = "'"
Console.WriteLine(value1)
' Replace every instance of the string with the excel file.
Dim value2 As String = value1.Replace("'", "")
Console.WriteLine(value2)
' Start splitting
Dim FileIndex As Integer
Do
' Update progress
FileIndex += 1
If Not Status Is Nothing Then
Status.Invoke((FileIndex - 1) * LineCount)
End If
' Check if the number of splitted files doesn't exceed the limit
If (MaxOutputFile < FileIndex) And (MaxOutputFile > 0) Then Exit Do
' Create new file to store a piece of the csv file
Dim PiecePath As String = OutputFolder & "\" & Path.GetFileNameWithoutExtension(FilePath) & "_" & FileIndex & Path.GetExtension(FilePath)
Dim Writer As New StreamWriter(PiecePath, False)
Writer.AutoFlush = False
Writer.WriteLine(strHeader)
' Read and writes precise number of rows
For i As Integer = 1 To LineCount
Dim s As String = Reader.ReadLine()
If s <> Nothing And _IsAbort = False Then
Writer.WriteLine(s)
Else
Writer.Flush()
Writer.Close()
Exit Do
End If
Next
' Flush and close the splitted file
Writer.Flush()
Writer.Close()
Loop
Reader.Close()
End Sub
Private Sub btnSplit_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnSplit.Click
Dim th As New Threading.Thread(AddressOf SplitIt)
th.Start()
End Sub
Sub SplitIt()
btnSplit.Enabled = False
Try
SplitCSV(txtCSV.Text, numLinesCount.Value, numMaxFiles.Value, AddressOf UpdateProgress, _IsAbort)
If Not _IsAbort Then
MsgBox("Completed Successfully!", MsgBoxStyle.Information)
Else
_IsAbort = False
MsgBox("Spliting proccess was aborted by user.", MsgBoxStyle.Critical)
End If
Catch ex As Exception
MsgBox("Unable to split the CSV File. Reason: " & ex.Message, MsgBoxStyle.Critical)
Finally
btnSplit.Enabled = True
End Try
End Sub
Sub UpdateProgress(ByVal CurrentLine As Integer)
lblStatus.Text = "Aprox. " & CurrentLine.ToString & " lines splitted"
End Sub
Private Sub btnBrowse_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnBrowse.Click
If OpenFileDialog1.ShowDialog = DialogResult.OK Then
txtCSV.Text = OpenFileDialog1.FileName()
End If
End Sub
Private Sub btnAbort_Click(ByVal sender As System.Object, ByVal e As System.EventArgs)
_IsAbort = True
End Sub
Private Sub Form1_Closing(ByVal sender As Object, ByVal e As System.ComponentModel.CancelEventArgs) Handles MyBase.Closing
_IsAbort = True
End Sub
Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
End Sub
Private Sub numLinesCount_ValueChanged(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles numLinesCount.ValueChanged
End Sub
Private Sub CheckBox1_CheckedChanged(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles CheckBox1.CheckedChanged
' Assign string to literal constant.
Dim value1 As String = "'"
Console.WriteLine(value1)
' Replace every instance of the string.
Dim value2 As String = value1.Replace("'", "")
Console.WriteLine(value2)
End Sub
End Class
So I bold what I added, and I cut out the windows generated form. I am fairly new and would really appreciate insight. Honestly I would love it done two ways. I put a check box on my form, if checked = fix invalid characters and if un-checked = ignore
Invalid characters I need replaced before it is split is, (" , ') these three and possibly more. Then if there is a way to replace a line within the split process if something is missing in Column A, B or C.
Is this even possible in the first place to do replacing?
Thanks everyone!
Re: [2010?] Excel split/corrections
Seems like the hard way to do this.
Read ALL the lines at once, split them into 100 lines, and then use Replace() 3 times for each group. You don't need to split lines, then write them.
Re: [2010?] Excel split/corrections
Hey dglienna, thanks for the quick reply. I failed to mention I couldn't get my current code to replace any invalid characters. Can you elaborate how I can integrate the Repalce() into my code?
Also do you know of a way I can add a check to my code that if Column A , B , C if they have an empty field to delete the row and move the others up?
And I thought I had it Read All lines already? I need to take a look at it again, been really busy but I need to try and get this working by the end of the week.
Thanks for your time!
Re: [2010?] Excel split/corrections
Have you looked at the SAMPLES?
Re: [2010?] Excel split/corrections
I will now, thank you dglienna. I will see what I can do from reading up in there.
Re: [2010?] Excel split/corrections
Alright, so I am just not comprehending it my current problem. I took a lot of samples and spent hours reading and trying things. I am completely lost now and just went back to step 1. I have my splitting method working. I can take a CSV file with 20000 lines and split it 20 times keeping the for each of them.
I cannot get a replace funtion or a delete row function to work. Once again what I that is working:
Code:
Sub SplitCSV(ByVal FilePath As String, ByVal LineCount As Integer, ByVal MaxOutputFile As Integer, ByVal Status As UpdateProgressSub, ByRef IsAbort As Boolean)
' Validate first
If LineCount < 100 Then Throw New Exception("Number of lines must be more than 100.")
' Open the csv file for reading
Dim Reader As New IO.StreamReader(FilePath)
' Create the output directory
Dim OutputFolder As String = FilePath & "_Splits"
If Directory.Exists(FilePath) = False Then
Directory.CreateDirectory(OutputFolder)
End If
' Read the csv column's header
Dim strHeader As String = Reader.ReadLine
' Start splitting
Dim FileIndex As Integer
Do
' Update progress
FileIndex += 1
If Not Status Is Nothing Then
Status.Invoke((FileIndex - 1) * LineCount)
End If
' Check if the number of splitted files doesn't exceed the limit
If (MaxOutputFile < FileIndex) And (MaxOutputFile > 0) Then Exit Do
' Create new file to store a piece of the csv file
Dim PiecePath As String = OutputFolder & "\" & Path.GetFileNameWithoutExtension(FilePath) & "_" & FileIndex & Path.GetExtension(FilePath)
Dim Writer As New StreamWriter(PiecePath, False)
Writer.AutoFlush = False
Writer.WriteLine(strHeader)
' Read and writes precise number of rows
For i As Integer = 1 To LineCount
Dim s As String = Reader.ReadLine()
If s <> Nothing And _IsAbort = False Then
Writer.WriteLine(s)
Else
Writer.Flush()
Writer.Close()
Exit Do
End If
Next
' Flush and close the splitted file
Writer.Flush()
Writer.Close()
Loop
Reader.Close()
End Sub
What I need is to somehow replace invalid characters within my CSV file for example = , ' " .
What I also need is to somehow delete a entire row if a cell is empty in Column A,B,C
I can already tell this is most likely a simple process and somehow its just blowing through me. I am still a beginner in the learning process.
Thank you.
Re: [2010?] Excel split/corrections
This will read ALL lines. Then, you can do ONE replace command
Code:
Imports System.IO
Module Module1
Sub Main()
Dim SourceFile As StreamReader
Try
SourceFile = New StreamReader("test.txt")
Console.WriteLine(SourceFile.ReadToEnd())
SourceFile.Close()
Catch Except As Exception
Console.WriteLine("Debug file does not yet exist")
End Try
End Sub
End Module
Re: [2010?] Excel split/corrections
I'm not sure how a , nor " is an invalid character in a CSV as these are required especially the , and may also occur in many data items.
I would also advise reading the entire file as suggested above.
Would be much simpler if you were using a tab delimited file rather than a CSV
Re: [2010?] Excel split/corrections
I know it seems strange but at work we can only use CSV files for our data uploads to our servers. I spend hours cleaning up CSV files with these characters our servers cannot accept. Then what really kills me is if a cell is empty in Column A,B,C it can corrupt my whole upload. So I was hoping to make something to save my time by a long shot. I sometimes get files of 20k-100k of lines and so far I can split them into 1000 a piece per file which is what I needed, I guess I am just having a hard time implementing a replace command and delete rows. Everything I tried nothing happened.
Re: [2010?] Excel split/corrections
What DB can't read a large file? Or, even OPEN the Excel files?
Might want to look up PowerPivot. It might do what you want?
Re: [2010?] Excel split/corrections
Unfortunately it's not my call, I work for a medical software company and I import patient CSV list cause its the only file accepted. I am like I said making a tool to make my life easier with our very badly set up system lol
So I got your code to work dglienna but not with my Header row check.. man I am very newbish at this but thanks for your help so far its much appreciated as it makes me one step further to efficiency heh
Re: [2010?] Excel split/corrections
Re: [2010?] Excel split/corrections
Ugh I replaced the upper lines of my code with yours as I figured thats where I needed it. Got the split to work again with the headers. I could not get Replace() to work, I tried using strings again and still couldn't. I couldn't even get row's to delete with missing cells.
I think I'm in over my head, I only have so much time to read and learn myself. I start school with software development in December. I apologize if I am annoying to help dglienna. I had help with the original code and I only understand basics.
Thanks for helping this far, I think I used what you gave me right now I don't know how to get Replace() to work. Sorry :(
Re: [2010?] Excel split/corrections
It looks like you are only calling the replace() function on the header row and not on the rows within the file.
Re: [2010?] Excel split/corrections