pandas parse csv with newlines


A while ago I read data that uses quote markers on both sides into pandas (see my earlier question "pandas parse csv with left and right quote chars"). Now I also need to support newlines and some weird characters inside the values.
Minimal sample below: the first string parses just fine, but the second one does not parse properly.
import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in recent pandas

# This sample parses fine:
temp_ok = u"""<first>$$><$$<second>$$><$$<first>
<foo>$$><$$<bar>$$><$$<baz>"""

# This one does not: the middle value of the last record spans several lines.
temp = u"""<first>$$><$$<second>$$><$$<third>
<foo>$$><$$<bar>$$><$$<baz>
<foo>$$><$$<Green; kkkk 101; aaaa, bbb; [foo<1>>aaa<123>>xxx<1>>zzz<1.17989207 | 18187681 | asdf |>>
;sdf{
}
;ADD{
]>$$><$$<baz>"""

big_df = pd.read_csv(StringIO(temp),
                     encoding='utf8',
                     sep=r'\$\$><\$\$',
                     decimal=',',
                     engine='python')  # we can't use the optimized C parser because of our multi-character delimiter
# Strip a trailing $$> marker from the last column (regex=True is required in pandas >= 2.0)
big_df.iloc[:, -1] = big_df.iloc[:, -1].str.replace(r'\$\$>$', '', regex=True)
# Remove the leading < and trailing > around every value and column name
big_df = big_df.replace([r'^<', r'>$'], ['', ''], regex=True)
big_df.columns = big_df.columns.to_series().replace([r'^<', r'>$', r'>\$\$'], ['', '', ''], regex=True)
big_df
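To see why the second sample fails, it can help to count the fields on each physical line. With a multi-character (regex) separator the python engine splits the input line by line, and the pandas docs warn that regex delimiters tend to ignore quoting, so a value containing newlines gets torn across several rows. A minimal diagnostic using only the standard library (fields_per_line is a hypothetical helper name):

```python
import re

# Same delimiter as in the question: the literal text $$><$$
SEP = re.compile(r"\$\$><\$\$")

SAMPLE = """<first>$$><$$<second>$$><$$<third>
<foo>$$><$$<bar>$$><$$<baz>
<foo>$$><$$<Green; kkkk 101; aaaa, bbb; [foo<1>>aaa<123>>xxx<1>>zzz<1.17989207 | 18187681 | asdf |>>
;sdf{
}
;ADD{
]>$$><$$<baz>"""


def fields_per_line(text):
    """Hypothetical helper: number of fields on each physical line."""
    return [len(SEP.split(line)) for line in text.splitlines()]


print(fields_per_line(SAMPLE))  # [3, 3, 2, 1, 1, 1, 2]
```

Only the first two lines have the expected three fields; the third record is spread over five physical lines.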
Edit
As outlined in the comments, it works just fine when everything is put onto a single line.
How could I automate this, maybe via sed/awk?
awk '{printf("%s ",$0)} END{print ""}' sample.csv removes all newlines and concatenates everything into a single line, but I only want to remove the problematic newlines.
awk -F, 'NF < 4 {getline nextline; $0 = $0 nextline} 1' sample.csv already removes the normal newlines, but there are still additional blank lines.
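The awk one-liner above appends the next line whenever a line has too few fields. The same idea can be sketched in Python with the $$><$$ delimiter instead of a comma. Unlike a single getline, this loop keeps joining until the record is complete, and blank lines simply contribute nothing. It assumes every record has a known number of fields (n_fields, 3 here), and like the awk version it cannot detect a newline inside the last field of a record. join_until_complete is a hypothetical name.

```python
import re

SEP = re.compile(r"\$\$><\$\$")  # the question's $$><$$ delimiter


def join_until_complete(lines, n_fields):
    """Glue physical lines together until a record has at least n_fields fields.

    Assumption: every record has exactly n_fields fields. A newline inside the
    last field cannot be detected this way, because the record already looks
    complete as soon as that field starts.
    """
    records, buf = [], ""
    for line in lines:
        buf += line  # join without a separator, like awk's $0 = $0 nextline
        if len(SEP.split(buf)) >= n_fields:
            records.append(buf)
            buf = ""
    if buf:
        records.append(buf)  # incomplete trailing record, kept so it can be inspected
    return records


lines = [
    "<first>$$><$$<second>$$><$$<third>",
    "<foo>$$><$$<Green; [foo<1>>",
    ";sdf{",
    "",
    "]>$$><$$<baz>",
]
for record in join_until_complete(lines, n_fields=3):
    print(record)
```

This prints two records; the blank line and the two continuation lines end up inside the second one.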
So your "real" newlines are marked with $$>\n. Read your file into a string, replace $$>\n with a temporary placeholder, remove any remaining newlines, put the "real" newlines back, then pass the result to read_csv().
temp = temp.replace('$$>\n', '%%NEWLINE%%').replace('\n','').replace('%%NEWLINE%%', '\n')
big_df = pd.read_csv(StringIO(temp), ...)
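Putting the answer's steps together with the cleanup from the question gives roughly the following sketch for current pandas. Two assumptions to flag: the record boundary is a parameter (record_end), because the answer's $$>\n fits files whose lines end in $$>, while the sample in the question separates records with >\n< instead; and this variant restores the whole boundary rather than only the newline, so a single regex (WRAPPER) strips the leading <, the trailing > and an optional trailing $$> marker. collapse_embedded_newlines, read_marked_csv and WRAPPER are hypothetical names.

```python
from io import StringIO

import pandas as pd

SEP = r"\$\$><\$\$"  # field delimiter $$><$$ as a regex (needs engine="python")
# Leading "<", or trailing ">" optionally followed by a "$$>" record marker
WRAPPER = r"^<|>(?:\$\$>)?$"


def collapse_embedded_newlines(text, record_end, placeholder="%%NEWLINE%%"):
    """Hypothetical helper: remove every newline that is not part of record_end."""
    protected = text.replace(record_end, placeholder)
    flattened = protected.replace("\n", "")
    return flattened.replace(placeholder, record_end)


def read_marked_csv(text, record_end="$$>\n"):
    """Hypothetical helper: flatten embedded newlines, parse, strip the < > wrappers."""
    cleaned = collapse_embedded_newlines(text, record_end)
    df = pd.read_csv(StringIO(cleaned), sep=SEP, decimal=",", engine="python")
    df = df.replace(WRAPPER, "", regex=True)
    df.columns = df.columns.str.replace(WRAPPER, "", regex=True)
    return df


SAMPLE = """<first>$$><$$<second>$$><$$<third>
<foo>$$><$$<bar>$$><$$<baz>
<foo>$$><$$<Green; kkkk 101; aaaa, bbb; [foo<1>>aaa<123>>xxx<1>>zzz<1.17989207 | 18187681 | asdf |>>
;sdf{
}
;ADD{
]>$$><$$<baz>"""

# In this sample the records are separated by ">\n<", not by "$$>\n".
big_df = read_marked_csv(SAMPLE, record_end=">\n<")
print(big_df)
```

With the question's second sample this gives two rows, the second holding the whole Green; ... ;ADD{] value on one line. A field that itself contains the record_end sequence would still be split, so the marker has to be something that never occurs inside a value.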