python


pandas parse csv with newlines


A while ago I wrapped my data in quote characters on both sides and read it into pandas (see "pandas parse csv with left and right quote chars"). Now I also need to support newlines and some unusual characters inside the fields.
Minimal sample below: the first string (temp) parses just fine, but the second one does not.
import pandas as pd
import os
from io import StringIO  # pandas.compat.StringIO was removed in later pandas versions
temp=u"""<first>$$><$$<second>$$><$$<first>
<foo>$$><$$<bar>$$><$$<baz>"""
temp=u"""<first>$$><$$<second>$$><$$<third>
<foo>$$><$$<bar>$$><$$<baz>
<foo>$$><$$<Green; kkkk 101; aaaa, bbb; [foo<1>>aaa<123>>xxx<1>>zzz<1.17989207 | 18187681 | asdf |>>
;sdf{
}
;ADD{
]>$$><$$<baz>"""
big_df = pd.read_csv(StringIO(temp),
                     encoding='utf8',
                     sep=r'\$\$><\$\$',  # regex matching the literal delimiter $$><$$
                     decimal=',',
                     engine='python')  # the optimized C parser does not support regex delimiters
# strip a trailing $$> from the last column
big_df.iloc[:, -1] = big_df.iloc[:, -1].str.replace(r'\$\$>$', '', regex=True)
# strip the leading < and trailing > from the values and the column names
big_df = big_df.replace(['^<', '>$'], ['', ''], regex=True)
big_df.columns = big_df.columns.to_series().replace(['^<', '>$', r'>\$\$'], ['', '', ''], regex=True)
big_df
edit
As noted in the comments, it works just fine when everything is put on a single line.
How could I automate this, perhaps with sed or awk?
awk '{printf("%s ",$0)} END{print ""}' sample.csv removes all newlines and concatenates everything into a single line, but I only want to remove the problematic newlines.
awk -F, 'NF < 4 {getline nextline; $0 = $0 nextline} 1' sample.csv already removes the ordinary embedded newlines, but the additional blank lines remain.
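The idea behind the second awk command (keep joining physical lines until a record has enough fields) can also be sketched in Python. This is only a rough illustration: the helper name merge_continuation_lines and its parameters are hypothetical, it assumes every record has exactly three fields separated by $$><$$, and it breaks if a newline occurs inside the last field of a record.

```python
def merge_continuation_lines(lines, delimiter="$$><$$", n_fields=3):
    """Join physical lines until each record holds n_fields - 1 delimiters.

    Hypothetical helper; assumes embedded newlines never occur in the
    last field, because a record counts as complete as soon as the last
    delimiter has been seen.
    """
    records = []
    buffer = ""
    for line in lines:
        buffer += line.rstrip("\n")
        if buffer.count(delimiter) >= n_fields - 1:
            records.append(buffer)
            buffer = ""
    if buffer:  # keep an incomplete trailing record rather than dropping it
        records.append(buffer)
    return records


sample = """<first>$$><$$<second>$$><$$<third>
<foo>$$><$$<bar>$$><$$<baz>
<foo>$$><$$<Green; kkkk 101; [foo<1>>aaa<123>>
;sdf{
}
;ADD{
]>$$><$$<baz>"""

records = merge_continuation_lines(sample.split("\n"))
for record in records:
    print(record)
```

The merged records could then be joined with "\n" and passed to read_csv through StringIO, as in the sample above.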
So your "real" newlines are marked with $$>\n. Read your file into a string, replace $$>\n with a temporary placeholder, remove any remaining newlines, reinsert the "real" newlines, and then pass the result to read_csv().
temp = temp.replace('$$>\n', '%%NEWLINE%%').replace('\n','').replace('%%NEWLINE%%', '\n')
big_df = pd.read_csv(StringIO(temp), ...)
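A self-contained sketch of that approach is shown here. It assumes every real record ends with $$> directly before the newline, and that the placeholder %%NEWLINE%% never occurs in the data. The sample string and the helper name collapse_embedded_newlines are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: each real record ends with "$$>" before the newline,
# while the second field of the last record contains embedded newlines.
raw = (
    "<first>$$><$$<second>$$><$$<third>$$>\n"
    "<foo>$$><$$<bar>$$><$$<baz>$$>\n"
    "<foo>$$><$$<Green; [x<1>>\n;sdf{\n}\n]>$$><$$<baz>$$>\n"
)

PLACEHOLDER = "%%NEWLINE%%"  # assumed never to appear in the data


def collapse_embedded_newlines(text):
    """Keep only the newlines that directly follow the $$> record marker."""
    return (text.replace("$$>\n", PLACEHOLDER)  # protect the real record ends
                .replace("\n", "")              # drop the embedded newlines
                .replace(PLACEHOLDER, "\n"))    # restore the real record ends


cleaned = collapse_embedded_newlines(raw)

df = pd.read_csv(StringIO(cleaned),
                 sep=r"\$\$><\$\$",  # regex for the literal delimiter $$><$$
                 engine="python")    # regex delimiters need the python engine

# Strip the surrounding < and > from the column names and from every value.
df.columns = df.columns.str.replace(r"^<|>$", "", regex=True)
df = df.apply(lambda col: col.str.replace(r"^<|>$", "", regex=True))
print(df)
```

Because the $$> marker is consumed together with the newline, no trailing $$> is left on the last column, so the separate clean-up step for that column is not needed in this variant.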
